BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000713Z
LOCATION:505
DTSTART;TZID=America/Denver:20231115T153900
DTEND;TZID=America/Denver:20231115T154800
UID:submissions.supercomputing.org_SC23_sess308_spostg114@linklings.com
SUMMARY:Accelerating Collective Communications with Lossy Compression on G
 PU
DESCRIPTION:Jiajun Huang (University of California, Riverside)\n\nGPU-awar
 e collective communication has become a major bottleneck for modern comput
 ing platforms as GPU computing power rapidly rises. Traditional approach
 es address this by integrating lossy compression directly into GPU-aware
  collectives, but they still suffer from serious issues such as underutil
 ized GPU devices and uncontrolled data distortion.\n\nIn this poster, we
  propose GPU-LCC, a general framework that designs and optimizes GPU-awa
 re, compression-enabled collectives with well-controlled error propagati
 on. To validate our framework, we evaluate its performance on up to 512
  NVIDIA A100 GPUs with real-world applications and datasets. Experimenta
 l results demonstrate that our GPU-LCC-accelerated collective computatio
 n (Allreduce) can outperform NCCL and Cray MPI by up to 4.5X and 20.2X,
  respectively. Furthermore, our accuracy evaluation with an image-stacki
 ng application confirms the high quality of the data reconstructed by ou
 r accuracy-aware framework.\n\nRegistration Category: Tech Program Reg P
 ass\n\nSession Chair: Ana Gainaru (Oak Ridge National Laboratory (ORNL)
 )\n\n
END:VEVENT
END:VCALENDAR
