BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000712Z
LOCATION:403-404
DTSTART;TZID=America/Denver:20231114T153000
DTEND;TZID=America/Denver:20231114T160000
UID:submissions.supercomputing.org_SC23_sess171_pap355@linklings.com
SUMMARY:Optimizing MPI Collectives on Shared Memory Multi-Cores
DESCRIPTION:Jintao Peng, Jianbin Fang, Jie Liu, Min Xie, Yi Dai, Bo Yang, 
 and Shengguo Li (National University of Defense Technology (NUDT), China) 
 and Zheng Wang (University of Leeds, School of Computing, UK)\n\nCollectiv
 e communication operations, such as broadcasting and reductions, often con
 tribute to performance bottlenecks in Message Passing Interface (MPI) prog
 rams. As the number of processor cores integrated into CPUs increases, run
 ning multiple MPI processes on shared-memory machines to leverage hardware
  parallelism is becoming increasingly common. In this context, optimizing 
 MPI collective communications for shared-memory execution is crucial. This
  paper identifies two primary limitations of existing MPI collective imple
 mentations on shared-memory systems. The first is the extensive redundant 
 data movements when performing reduction collectives, and the second is th
 e ineffective use of non-temporal instructions to optimize streamed data p
 rocessing. To address these challenges, we propose two optimization techni
 ques designed to minimize data movements and enhance the use of non-tempor
 al instructions. We integrate our optimizations into Open MPI and evalu
 ate their performance through micro-benchmarks and real-world application 
 tests on two multi-core clusters. Experiments show that our approach signi
 ficantly outperforms existing techniques by 1.2-6.4x.\n\nTag: Distributed 
 Computing, Message Passing, Programming Frameworks and System Software\n\n
 Registration Category: Tech Program Reg Pass\n\nAward Finalist: Best Stude
 nt Paper Finalist\n\nReproducibility Badges: Artifact Available\n\nSession
  Chair: Patrick Bridges (University of New Mexico)\n\n
END:VEVENT
END:VCALENDAR
