BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000711Z
LOCATION:DEF Concourse
DTSTART;TZID=America/Denver:20231114T100000
DTEND;TZID=America/Denver:20231114T170000
UID:submissions.supercomputing.org_SC23_sess291_rpost174@linklings.com
SUMMARY:Early Experience in Characterizing Training Large Language Models 
 on Modern HPC Clusters
DESCRIPTION:Hao Qi, Liuyao Dai, Weicong Chen, and Xiaoyi Lu (University of
  California, Merced)\n\nIn the realm of natural language processing, Large
  Language Models (LLMs) have emerged as powerful tools for tasks such as l
 anguage translation, text generation, and sentiment analysis. However, the
  immense parameter size and complexity of LLMs present significant challen
 ges. This work delves into the exploration and characterization of high-pe
 rformance interconnects in the distributed training of various LLMs. Our f
 indings reveal that high-performance network protocols, notably RDMA, sign
 ificantly outperform other protocols like IPoIB and TCP/IP in training per
 formance, offering improvements by factors of 2.51x and 4.79x respectively
 . Additionally, we observe that LLMs with more parameters tend to demand
  higher interconnect utilization. Despite these findings, our study sugges
 ts potential for further optimization in overall interconnect utilization.
  This research contributes to a deeper understanding of the performance ch
 aracteristics of LLMs over high-speed interconnects, paving the way for mo
 re efficient training methodologies.\n\nTag: Artificial Intelligence/Machi
 ne Learning, Architecture and Networks, Heterogeneous Computing, I/O and F
 ile Systems, Performance Measurement, Modeling, and Tools, Post-Moore Comp
 uting, Programming Frameworks and System Software, Quantum Computing\n\nRe
 gistration Category: Tech Program Reg Pass, Exhibits Reg Pass
END:VEVENT
END:VCALENDAR
