BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000712Z
LOCATION:507
DTSTART;TZID=America/Denver:20231113T114000
DTEND;TZID=America/Denver:20231113T120000
UID:submissions.supercomputing.org_SC23_sess444_ws_waccpd112@linklings.com
SUMMARY:Memory Transfer Decomposition: Exploring Smart Data Movement throu
 gh Architecture-Aware Strategies
DESCRIPTION:Diego A. Roa Perdomo (University of Delaware, CAPSL; Argonne N
 ational Laboratory (ANL)); Rodrigo Ceccato and Rémy Neveu (University of C
 ampinas, Argonne National Laboratory (ANL)); Hervé Yviquel (University of 
 Campinas); Xiaoming Li (University of Delaware); Jose M. Monsalve Diaz (Ar
 gonne National Laboratory (ANL)); and Johannes Doerfert (Lawrence Livermor
 e National Laboratory (LLNL))\n\nWe provide an automated framework that ut
 ilizes complex hardware links while preserving the simplified abstraction 
 level for the user. Through the decomposition of user-issued memory operat
 ions into architecture-aware sub-tasks, we automatically exploit generally
  underused connections of the system. The operations we support include mo
 ving, distributing, and consolidating memory across the node. For each of
  them, our Auto-Strategyzer framework proposes a task graph that transpa
 rently improves performance, in terms of latency or bandwidth, compared to
  naive strategies. For our evaluation, we integrated the Auto-Strategyzer 
 as a C++ library into the LLVM-OpenMP runtime infrastructure. We demonstra
 te that some memory operations can be improved by a factor of 5x compared 
 to naive versions. Integrated into LLVM/OpenMP, our Auto-Strategyzer accel
 erates cross-device memory movement by a factor of 1.9x for large transfer
 s, resulting in an approximately 6% decrease in end-to-end execution time
  for a scientific proxy application.\n\nTag: Accelerators, Artificial
  Intelligence/Mach
 ine Learning, Algorithms, Applications, Compilers, Data Movement and Memor
 y, Heterogeneous Computing, Modeling and Simulation, Performance Optimizat
 ion, Programming Frameworks and System Software, Runtime Systems\n\nRegist
 ration Category: Workshop Reg Pass\n\nSession Chairs: Maciej Cytowski (Paw
 sey Supercomputing Research Centre; Commonwealth Scientific and Industrial
  Research Organisation (CSIRO), Australia); Verónica G. Melesse Vergara (O
 ak Ridge National Laboratory (ORNL)); and Jose Manuel Monsalve Diaz (Advan
 ced Micro Devices (AMD))
END:VEVENT
END:VCALENDAR
