Session

Event Type
Workshop
Time
Sunday, 12 November 2023, 2pm - 5:30pm
Location
710
Tags
Fault Handling and Tolerance
Registration Categories
W
Presentations
2:00pm - 2:05pm Welcome to SuperCheck-SC23
2:05pm - 2:50pm AI-Augmented SWARM Based Resilience for Integrated Research Infrastructures
2:50pm - 3:00pm Lightning Talk: Diaspora – Resilient Event Processing for Irregular, Distributed Scientific Applications
3:00pm - 3:25pm SuperCheck-SC23 – Afternoon Break
3:25pm - 3:50pm Checkpoint/Restart for CUDA Kernels
3:50pm - 4:15pm Implementation-Oblivious Transparent Checkpoint-Restart for MPI
4:15pm - 4:40pm Asynchronous Multi-Level Checkpointing: An Enabler of Reproducibility using Checkpoint History Analytics
4:40pm - 4:50pm Lightning Talk: Update on Checkpointing and Localized Recovery for Nested Fork-Join Programs
4:50pm - 5:00pm Lightning Talk: Toward Efficient Asynchronous Checkpointing for Large-Language Models
5:00pm - 5:10pm Lightning Talk: Inherent Checkpointing Properties of Nested Parallelism
5:10pm - 5:20pm Lightning Talk: Trade-Offs For Developing File Aggregated I/O For Asynchronous Checkpointing
5:20pm - 5:30pm Lightning Talk: Datastates for Debugging – Using Productive Checkpointing for Improved Debugging