BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000605Z
LOCATION:DEF Concourse
DTSTART;TZID=America/Denver:20231116T100000
DTEND;TZID=America/Denver:20231116T170000
UID:submissions.supercomputing.org_SC23_sess300_spostg105@linklings.com
SUMMARY:Fast Checkpointing of Large Language Models with TensorStore CHFS
DESCRIPTION:Sohei Koyama (University of Tsukuba)\n\nThe frequency of check
 point creation in large language models is limited by the write bandwi
 dth to a parallel file system. In this study\, we aim to reduce checkpo
 int creation time by writing to the Intel Optane Persistent Memory inst
 alled on the compute nodes.\n\nWe propose TensorStore CHFS\, a storage d
 river that adds the ad hoc parallel file system CHFS to TensorStore. Th
 e proposed method increased the checkpoint creation bandwidth of th
 e T5 1.1 model by a factor of 4.5 on 32 nodes.\n\nRegistration Categ
 ory: Tech Program Reg Pass\, Exhibits Reg Pass
END:VEVENT
END:VCALENDAR
