BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000711Z
LOCATION:710
DTSTART;TZID=America/Denver:20231112T171000
DTEND;TZID=America/Denver:20231112T172000
UID:submissions.supercomputing.org_SC23_sess427_misc323@linklings.com
SUMMARY:Lightning Talk:  Trade-Offs For Developing File Aggregated I/O For
  Asynchronous Checkpointing
DESCRIPTION:Mikaila Gossman (Clemson University)\n\nAsynchronous checkpoin
 t-restart (C/R) has become popular in recent years for its ability to chec
 kpoint alongside the application. One implementation  is VELOC, which quic
 kly checkpoints to a local storage device, and then flushes the checkpoint
 s to the PFS concurrently with the application. VELOC natively adopts a fi
 le-per-process checkpointing strategy, meaning each distributed applicatio
 n process creates its own checkpoint file. File-per-process is easy to imp
 lement, enables embarrassing parallelism, and often provides high throughp
 ut by by-passing strict POSIX semantics. At sufficient scale, file-per-pro
 cess is difficult for users to manage, making the checkpoints hard to veri
 fy, migrate and/or manage. Further, file-per-process strategies are not sc
 alable, due to oversubscription of underlying storage hardware at signific
 ant scale, resulting in lower overall performance of the application and p
 ersisting the checkpoint.\n\nTo alleviate such challenges, asynchronous C/
 R must adopt file aggregation techniques. However, this is a nontrivial pr
 oblem to solve as aggregation requires coordination (e.g. synchronization)
  between processes and compute nodes to facilitate aggregation, while also
  respecting the complex nature of resource competition between the applica
 tion and asynchronous C/R implementation. The most common implementation o
 f aggregation is known as two-phase I/O, where-in a subset of processes ar
 e designated as I/O leaders to flush data to the parallel file system (PFS
 ) on behalf of all processes. State-of-the-art implementations of two-phas
 e I/O, such as MPI-IO, overlap the data exchange phase and the flushing ph
 ase. However, previous works have shown that state-of-the-art aggregation 
 methods, like MPI-IO, are insufficient for asynchronous checkpointing due 
 to the inherent synchronization cost to perform I/O aggregation. Further, 
 it has no mechanism for respecting resource consumption thereby negatively
  impacting the concurrently running application. This talk discusses our w
 ork towards developing a tunable I/O aggregation strategy that operates ef
 ficiently in the background to complement asynchronous C/R. We analyze tra
 de-offs and discuss the performance impact on large-scale microbenchmarks 
 for developing such strategies. Specifically, we explore how to (1) develo
 p efficient, thread-safe data receptions on limited-sized write buffers of
  I/O leaders, (2) prioritize remote (from non-leaders)  and local data on 
 I/O leaders to minimize checkpoint overhead, and (3) load-balance flushing
  on the I/O leaders.\n\nTag: Fault Handling and Tolerance\n\nRegistration 
 Category: Workshop Reg Pass\n\nSession Chairs: Gene Cooperman (Northeaster
 n University); Donglai Dai (Advanced Micro Devices (AMD)); Rebecca Hartman
 -Baker (National Energy Research Scientific Computing Center (NERSC), Lawr
 ence Berkeley National Laboratory (LBNL)); and Bogdan Nicolae (Argonne Nat
 ional Laboratory (ANL), Illinois Institute of Technology)\n\n
END:VEVENT
END:VCALENDAR
