BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000604Z
LOCATION:DEF Concourse
DTSTART;TZID=America/Denver:20231115T100000
DTEND;TZID=America/Denver:20231115T170000
UID:submissions.supercomputing.org_SC23_sess299_spostg116@linklings.com
SUMMARY:Scaling Infrastructure to Support Multi-Trillion Parameter LLM Tra
 ining
DESCRIPTION:Mikhail Isaev (Georgia Institute of Technology)\n\nThis poster
  discusses efficient system designs for scaling Large Language Models
  (LLMs) up to 128 trillion parameters. We use a comprehensive analytical
  performance model to analyze how such models could be trained on current
  systems while maintaining 75% Model FLOPS Utilization (MFU). We first
  show how tensor offloading alone can dramatically increase the size of
  trainable LLMs. We then analyze performance bottlenecks when scaling on
  systems of up to 16,384 GPUs and with models of up to 128T parameters.
  Our findings suggest that current H100 GPUs with 80 GiB of HBM,
  augmented with 512 GiB of tensor offloading capacity, allow scaling to
  11T-parameter LLMs; reaching 128T parameters requires 120 GiB of HBM
  and 2 TiB of offloading memory, yielding 75%+ MFU, which is uncommon
  even when training much smaller LLMs today.\n\nTag: Artificial
  Intelligence/Machine Learning, Algorithms, Applications, Architecture
  and Networks, Cloud Computing, Distributed Computing, Data Analysis,
  Visualization, and Storage, Performance Measurement, Modeling, and
  Tools, Programming Frameworks and System Software\n\nRegistration
  Category: Tech Program Reg Pass, Exhibits Reg Pass\n\n
END:VEVENT
END:VCALENDAR
