BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000713Z
LOCATION:605
DTSTART;TZID=America/Denver:20231112T162000
DTEND;TZID=America/Denver:20231112T162600
UID:submissions.supercomputing.org_SC23_sess434_ws_ftxs107@linklings.com
SUMMARY:Disk Failure Trends in Alpine Storage System
DESCRIPTION:Anjus George, Jesse Hanley, and Sarp Oral (Oak Ridge National 
 Laboratory (ORNL))\n\nLarge-scale HPC systems demand extensive disk-based 
 storage for data generated by HPC applications, necessitating scalable rel
 iability, availability, and failure management. Extracted failure data fro
 m HPC storage offers valuable insights for preventing and managing failure
 s, spanning understanding storage robustness, guiding system design and de
 ployment, and creating durable data protection schemes. This paper introdu
 ces a failure dataset from OLCF’s Summit supercomputer's file system, Alpi
 ne, encompassing 4000+ events over 2.75 years from 32000+ disks. Before an
 alysis, we delve into Alpine's components and introduce IBM Spectrum Scale
  technology, then assess collected data for failure distribution and burst
  correlations. We infer that proximity to enclosure fan modules heightens
  disk failure rates. Also, burst failure analysis highlights one-third of
  failures occurring in bursts, with 90% non-spatially correlated,
  impacting mul
 tiple racks.\n\nTag: Fault Handling and Tolerance, Large Scale Systems\n\n
 Registration Category: Workshop Reg Pass\n\nSession Chairs: John Daly (US 
 Department of Defense), Scott Levy (Sandia National Laboratories), and Kei
 ta Teranishi (Oak Ridge National Laboratory (ORNL))\n\n
END:VEVENT
END:VCALENDAR
