BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000712Z
LOCATION:DEF Concourse
DTSTART;TZID=America/Denver:20231114T100000
DTEND;TZID=America/Denver:20231114T170000
UID:submissions.supercomputing.org_SC23_sess291_rpost196@linklings.com
SUMMARY:Balancing Latency and Throughput of Distributed Inference by Inter
 leaved Parallelism
DESCRIPTION:Jiangsu Du, Jinhui Wei, and Jiazhi Jiang (Sun Yat-sen Universi
 ty, Guangzhou); Shenggan Cheng (National University of Singapore); and Zhi
 guang Chen, Dan Huang, and Yutong Lu (Sun Yat-sen University, Guangzhou)\n
 \nDistributed large model inference still faces a dilemma in balancing la
 tency and throughput, or rather cost and effect. Tensor parallelism, whil
 e capable of optimizing latency, entails a substantial expenditure. Conve
 rsely, pipeline parallelism excels in throughput but falls short in minimi
 zing execution time.\n\nTo address this challenge, we introduce a novel so
 lution - interleaved parallelism. This approach interleaves computation an
 d communication across requests. Our proposed runtime system harnesses GPU
  scheduling techniques to facilitate the overlapping of communication and 
 computation kernels, thereby enabling this pioneering parallelism for dist
 ributed large model inference. Extensive evaluations show that our proposa
 l outperforms existing parallelism approaches across models and devices, p
 resenting the best latency and throughput in most cases.\n\nTag: Artificia
 l Intelligence/Machine Learning, Architecture and Networks, Heterogeneous 
 Computing, I/O and File Systems, Performance Measurement, Modeling, and To
 ols, Post-Moore Computing, Programming Frameworks and System Software, Qua
 ntum Computing\n\nRegistration Category: Tech Program Reg Pass, Exhibits R
 eg Pass\n\n
END:VEVENT
END:VCALENDAR
