BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000712Z
LOCATION:607
DTSTART;TZID=America/Denver:20231113T095000
DTEND;TZID=America/Denver:20231113T095500
UID:submissions.supercomputing.org_SC23_sess448_ws_canolt108@linklings.com
SUMMARY:Enabling Performance for NGC Containers on the Slingshot 11 Interc
 onnect
DESCRIPTION:Alberto Madonna (Swiss National Supercomputing Centre (CSCS))\
 n\nContainers based on NVIDIA GPU Cloud (NGC) images have become increasin
 gly popular for deploying optimized software on NVIDIA GPUs, particularly 
 in the context of ML/AI frameworks and models. However, the software stack
  within NGC images lacks the components necessary to interact with the HPE
  Slingshot 11 interconnect, which is a high-sp
 eed network utilized in some of the world's most powerful supercomputers. 
 This limitation adds to the challenge of efficiently running containers fo
 r this noteworthy combination of systems and use cases.\n\nThis presentati
 on aims to share insights into the process of enabling NGC-based container
 s to leverage Slingshot 11. The discussion will cover key elements for opt
 imizing application performance, including the NCCL communication collecti
 ves, the libfabric communication framework, and GPUDirect RDMA. The presen
 tation will also feature quantitative results from synthetic benchmarks th
 at measure communication bandwidth and deep learning performance using the
  PyTorch framework.\n\nRegistration Category: Workshop Reg Pass\n\nSession
  Chairs: Richard Shane Canon (Lawrence Berkeley National Laboratory (LBNL)
 ); Alberto Madonna (ETH Zürich, Swiss National Supercomputing Centre (CSCS
 )); Laurie A. Stephey (Lawrence Berkeley National Laboratory (LBNL), Natio
 nal Energy Research Scientific Computing Center (NERSC)); and Andrew Young
 e (Sandia National Laboratories)\n\n
END:VEVENT
END:VCALENDAR
