BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000714Z
LOCATION:607
DTSTART;TZID=America/Denver:20231113T094500
DTEND;TZID=America/Denver:20231113T095000
UID:submissions.supercomputing.org_SC23_sess448_misc217@linklings.com
SUMMARY:Preemptive Scheduling of Stateful GPU-Intensive HPC Applications i
 n Kubernetes
DESCRIPTION:Radostin Stoyanov (University of Oxford, Red Hat Inc); Adrian 
 Reber (Red Hat Inc); and Wesley Armour (University of Oxford)\n\nContainer
 s provide a new paradigm for building, packaging, deploying and managing a
 pplications consistently across varying infrastructures. However, the util
 ization of containers in HPC has been more difficult due to the combinatio
 n of security and performance requirements. High resource utilization acro
 ss GPU-intensive workloads is a crucial requirement for HPC clusters. Cont
 ainer orchestration platforms such as Kubernetes enable efficient manageme
 nt of HPC infrastructure for use by researchers who need access to scalabl
 e high performance facilities. However, the resource utilization of such o
 rchestration frameworks with GPU-intensive HPC workloads remains relativel
 y unexplored. In this paper, we present kube-criu-scheduler, a Kubernetes 
 scheduler that builds on a recently introduced container checkpointing fea
 ture to enable preemptive scheduling of GPU-accelerated HPC applications. 
 Our results show that the resulting efficiency and reliability gains are a
 chieved with negligible impact on application performance.\n\nRegistratio
 n Cat
 egory: Workshop Reg Pass\n\nSession Chairs: Richard Shane Canon (Lawrence 
 Berkeley National Laboratory (LBNL)); Alberto Madonna (ETH Zürich, Swiss N
 ational Supercomputing Centre (CSCS)); Laurie A. Stephey (Lawrence Berkele
 y National Laboratory (LBNL), National Energy Research Scientific Computin
 g Center (NERSC)); and Andrew Younge (Sandia National Laboratories)
END:VEVENT
END:VCALENDAR
