BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000712Z
LOCATION:501-502
DTSTART;TZID=America/Denver:20231113T155000
DTEND;TZID=America/Denver:20231113T161000
UID:submissions.supercomputing.org_SC23_sess440_ws_ai4s104@linklings.com
SUMMARY:Elastic Deep Learning through Resilient Collective Operations
DESCRIPTION:Jiali Li, George Bosilca, and Aurelien Bouteiller (University 
 of Tennessee) and Bogdan Nicolae (Argonne National Laboratory (ANL))\n\nA 
 robust solution that incorporates fault tolerance and elastic scaling capa
 bilities for distributed deep learning. Taking advantage of MPI resilient 
 capabilities, also known as User-Level Failure Mitigation (ULFM), this nov
 el approach promotes efficient and lightweight failure management and enco
 urages sm
 ooth scaling in volatile computational settings. The proposed ULFM MPI-cen
 tered mechanism outperforms the only officially supported elastic learning
  framework, Elastic Horovod (using Gloo and NCCL), by a significant factor
 . These results reinforce the capability of MPI extensions to deal wi
 th res
 iliency, and promote ULFM as an effective technique for fault management, 
 minimizing downtime, and thereby enhancing the overall performance of dist
 ributed applications, in particular elastic training in high-performance c
 omputing (HPC) environments and machine learning applications.\n\nTag: Art
 ificial Intelligence/Machine Learning\n\nRegistration Category: Workshop R
 eg Pass\n\nSession Chairs: Murali Emani (Argonne National Laboratory (ANL)
 ); Gokcen Kestor (Barcelona Supercomputing Center (BSC); University of Cal
 ifornia, Merced); and Dong Li (University of California, Merced)\n\n
END:VEVENT
END:VCALENDAR
