BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000713Z
LOCATION:503-504
DTSTART;TZID=America/Denver:20231112T091200
DTEND;TZID=America/Denver:20231112T091900
UID:submissions.supercomputing.org_SC23_sess422_ws_hpcsysp109@linklings.co
 m
SUMMARY:Clushible: Tidal Wave-Like Configuration with Ansible
DESCRIPTION:Jared Baker, John Blaas, and Jenett Tillotson (National Center
  for Atmospheric Research (NCAR))\n\nConfiguration of HPC nodes is an impo
 rtant aspect of maintaining any HPC cluster. Our flagship HPE/Cray EX supe
 rcomputer, Derecho, has approximately 2,500 compute nodes and is susceptib
 le to power interruptions from external factors such as lightning-strike-i
 nduced power sags and utility mishaps. These events challenged us to
  achieve an acceptable mean time to recovery. Ansible is our selected
  configuration management system, but it struggles with single
  large-scale configuration runs despite optimizations to individual
  runs, such as tuning the fork count and enabling pipelining. We needed
  a method to apply a large burst of configuration within a short time
  period, either to return the system to a functional state or to apply
  remediation such as security updates. We therefore wrote a utility,
  Clushible, which wraps Ansible with ClusterShell's Python API to scale
  out Ansible execution, effectively reducing our standard full-system
  run from multiple hours to minutes.\n\nTag: Artificial In
 telligence/Machine Learning, Cloud Computing, Distributed Computing, Data 
 Analysis, Visualization, and Storage, Data Movement and Memory, Fault Hand
 ling and Tolerance, I/O and File Systems, Large Scale Systems, Performance
  Optimization, Resource Management, Security, State of the Practice\n\nReg
 istration Category: Workshop Reg Pass\n\nSession Chairs: Matt Bidwell (Nat
 ional Renewable Energy Laboratory (NREL)) and John Blaas (National Center 
 for Atmospheric Research (NCAR))\n\n
END:VEVENT
END:VCALENDAR
