BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000711Z
LOCATION:503-504
DTSTART;TZID=America/Denver:20231114T133000
DTEND;TZID=America/Denver:20231114T140000
UID:submissions.supercomputing.org_SC23_sess249_exforum131@linklings.com
SUMMARY:Scaling Up to 32 GPUs in a Single Node Without Changing a Single L
 ine of Code
DESCRIPTION:John Ihnotic (GigaIO)\n\nThis technical deep dive will demonst
 rate scaling an application up to 32 accelerators in a single node, whic
 h until now was only possible on a supercomputer. This is achieved with
 out needing to modify the application software for HPC or AI workloads,
  saving
  users considerable time and effort in porting software.\n\nThis new capab
 ility was made possible by a deep integration between the engineering team
 s of AMD and GigaIO. It utilizes off-the-shelf servers and GPUs connected 
 over GigaIO’s native PCIe memory fabric, which provides the same performan
 ce and latency as if those accelerators were housed within the server shee
 t metal.\n\nThis talk will cover the steps to create this first-of-its-ki
 nd server, the GigaIO SuperNODE, including how to identify and resolve iss
 ues that prevent the enumeration of large numbers of GPUs, such as hardcod
 ed limits within ROCm, physical address bit inconsistencies between CPUs (
 Milan, Genoa) and GPUs, and memory address issues in the VBIOS.\n\nGigaIO
  will demonstrate how frameworks such as PyTorch and TensorFlow “just work
 ” when run on this all-AMD system, without changing a single line of code.
  The plug-and-play nature of this solution opens new possibilities for ge
 nerative AI and machine learning workloads, especially given the curren
 t availability constraints on GPUs.\n\nLimitations encountered include
  the need
  for server vendors to be willing to modify their server BIOS to accommoda
 te the unexpected number of PCIe endpoints and to support dynamic allocat
 ion of resources. As such, this solution is only available on selected pla
 tforms from those server vendors who have undertaken that effort. Other li
 miting factors include the total number of bus IDs and MMIO space.\n\nTag:
  Accelerators, Artificial Intelligence/Machine Learning\n\nRegistration Ca
 tegory: Tech Program Reg Pass, Exhibits Reg Pass\n\nSession Chair: Jane He
 rriman (Lawrence Livermore National Laboratory (LLNL))\n\n
END:VEVENT
END:VCALENDAR
