BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000711Z
LOCATION:505
DTSTART;TZID=America/Denver:20231112T162000
DTEND;TZID=America/Denver:20231112T165000
UID:submissions.supercomputing.org_SC23_sess428_misc262@linklings.com
SUMMARY:Enabling Large Dynamic Neural Network Training with Learning-Based
  Runtime Memory Management
DESCRIPTION:Dong Li (University of California, Merced)\n\nDynamic neural
  networks (DyNNs) enable high computational efficiency and strong
  representation capability. However, training a DyNN can run into a
  memory capacity problem because of growing model sizes or limited GPU
  memory. Managing tensors to save GPU memory is challenging because of
  the dynamic structure of DyNNs. We introduce DyNN-Offload, a
  memory-management runtime system for training DyNNs. DyNN-Offload uses
  a learned approach (a neural network called the pilot model) to make
  tensor accesses more predictable and thereby facilitate memory
  management. The key to DyNN-Offload is enabling fast inference with
  the pilot model, reducing its performance overhead while maintaining
  high prediction accuracy. DyNN-Offload reduces the input feature
  space and model complexity of the pilot model based on a new
  representation of DyNNs. DyNN-Offload enables training 8× larger
  DyNNs on a single GPU compared with using PyTorch alone
  (unprecedented among existing solutions). Evaluated with AlphaFold (a
  production-level, large-scale DyNN), DyNN-Offload outperforms
  unified virtual memory (UVM) and dynamic tensor rematerialization
  (DTR), the most advanced solutions for saving GPU memory for DyNNs,
  by 3× and 2.1×, respectively, in terms of maximum batch size.\n\nTag:
  Distributed Computing, Middleware and System Software, Runtime
  Systems\n\nRegistration Category: Workshop Reg Pass\n\nSession
  Chairs: Barbara Chapman (Hewlett Packard Enterprise (HPE), Stony
  Brook University); Joseph Manzano (Pacific Northwest National
  Laboratory (PNNL)); Shirley Moore (University of Texas at El Paso);
  EunJung (EJ) Park (Qualcomm); and Joshua Suetterlein (Pacific
  Northwest National Laboratory (PNNL))\n\n
END:VEVENT
END:VCALENDAR
