BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000712Z
LOCATION:605
DTSTART;TZID=America/Denver:20231113T111600
DTEND;TZID=America/Denver:20231113T112500
UID:submissions.supercomputing.org_SC23_sess441_ws_p3hpc113@linklings.com
SUMMARY:Performance Portability of Programming Strategies for Nearest-Neig
 hbor Communication with GPU-Aware MPI
DESCRIPTION:James B. White III (Hewlett Packard Enterprise (HPE))\n\nTo be
 tter advise HPC application developers, we have implemented Faces, a neare
 st-neighbor microbenchmark that quantifies performance trade-offs. The Fac
 es experiments presented here explore the following design choices: 1) few
 er dependent messages versus more independent messages, 2) fewer fused GPU
  kernels versus more simple kernels, 3) number of GPU streams, 4) size of 
 GPU thread blocks, and 5) linear versus blocked ordering of MPI ranks. We 
 present weak-scaling performance of a latency-sensitive "small" per-rank 
 domain and of a bandwidth-sensitive "large" per-rank domain, and we compa
 re results for two high-performance computers with contrasting CPU, GPU, a
 nd interconnect architectures: Summit and Frontier. We find that using mor
 e independent messages tends to give better performance than using fewer d
 ependent messages. We identify performance-portability recommendations for
  G
 PU streams and synchronization, but other aspects of performance show comp
 licated dependence on problem size and computer.\n\nTag: Performance Measu
 rement, Modeling, and Tools, Performance Optimization\n\nRegistration Cate
 gory: Workshop Reg Pass\n\nSession Chairs: Judith C. Hill (Lawrence Liverm
 ore National Laboratory (LLNL)), CJ Newburn (NVIDIA Corporation), Scott J.
  Parker (Argonne National Laboratory (ANL)), and John Pennycook (Intel Cor
 poration)\n\n
END:VEVENT
END:VCALENDAR
