BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000712Z
LOCATION:710
DTSTART;TZID=America/Denver:20231112T145000
DTEND;TZID=America/Denver:20231112T150000
UID:submissions.supercomputing.org_SC23_sess427_misc310@linklings.com
SUMMARY:Lightning Talk: Diaspora – Resilient Event Processing for Irregul
 ar, Distributed Scientific Applications
DESCRIPTION:Justin Wozniak (Argonne National Laboratory (ANL))\n\nModern s
 cience increasingly requires the coordinated use of advanced computing, ne
 tworking, instruments, and experimental facilities: collectively, research
  infrastructure. This infrastructure reaches from HPC systems to high-data
 -rate instruments and less well-connected edge systems, and also encompass
 es cloud-hosted services. These resources and their applications can gener
 ate many events, and because many science applications span locations, sci
 entists need to consume events from many sources. To meet this need, we ar
 e developing Diaspora, a resilient, hierarchical event streaming approach 
 that scales to meet the needs of modern science. Such complex, distributed
  applications have myriad hard and soft failure modes. The widely used coo
 rdinated checkpoint-restart resilience solution simply requires that proce
 sses agree on a globally consistent state, which they then capture indepen
 dently and piece-wise, and from which they restart in case of failures. Ho
 wever, such approaches have limited applicability at very large scales tha
 t may involve geographically distributed resources, because the problem of
  agreeing on a globally consistent state is not tractable. Under such circ
 umstances, new abstractions are needed to achieve resilience. This talk br
 iefly introduces the abstractions that we propose 
 in Diaspora. Notably, we envision an event-streaming backbone that allows
  both loosely and tightly coupled workflow components to communicate and p
 ersist data in a resilient fashion. This context opens new opportunities t
 o apply checkpointing techniques, which we will highlight. We will also de
 scribe the scientific applications targeted by the 
 project, including federated learning, astronomical image processing, and 
 x-ray image processing at advanced photon sources.\n\nTag: Fault Handling 
 and Tolerance\n\nRegistration Category: Workshop Reg Pass\n\nSession Chair
 s: Gene Cooperman (Northeastern University); Donglai Dai (Advanced Micro D
 evices (AMD)); Rebecca Hartman-Baker (National Energy Research Scientific 
 Computing Center (NERSC), Lawrence Berkeley National Laboratory (LBNL)); a
 nd Bogdan Nicolae (Argonne National Laboratory (ANL), Illinois Institute o
 f Technology)\n\n
END:VEVENT
END:VCALENDAR
