BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000623Z
LOCATION:605
DTSTART;TZID=America/Denver:20231112T140000
DTEND;TZID=America/Denver:20231112T173000
UID:submissions.supercomputing.org_SC23_sess434@linklings.com
SUMMARY:13th Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS 20
 23)
DESCRIPTION:FTXS 2023: Invited Speaker (Paolo Rech, "Quantum Computing Re
 liability: Problems, Tools, and Potential Solutions")\n\nQuantum computing
  is a new computational paradigm, expected to revolutionize the computing 
 field in the next few years. Qubits, the atomic units of a quantum circuit
 , exploit properties of quantum physics to increase the parallelism and s
 peed of computation. Unfortunately, qubits are both intrinsi...\n\n\nPaol
 o Rech (Università di Trento)\n---------------------\nRecovery from Silent
  Data Corruption via Spatial Data Prediction\n\nHigh-performance computing
  applications are central to advancement in many fields of science and eng
 ineering. Central to this advancement is the supposed reliability of the H
 PC system. However, as system size grows and hardware components run with 
 near-threshold voltages, transient upset events beco...\n\n\nKristen Guern
 sey, Sarah Placke, Alexandra Poulos, and Jon Calhoun (Clemson University)\
 n---------------------\nFTXS 2023 – Closing Remarks\n\nThank you for atten
 ding FTXS 2023. See you next year!\n\n\nKeita Teranishi (Oak Ridge Nation
 al Laboratory (ORNL))\n---------------------\nWhen to Checkpoint at the En
 d of a Fixed-Length Reservation?\n\nConsider an application executing for 
 a fixed duration. The checkpoint duration is a random variable that obeys
  some well-known probability distribution. The question is when to take a
  checkpoint towards the end of the execution, so that the ex
 pectation of the work done is maximized....\n\n\nQuentin Barbut and Anne B
 enoit (ENS Lyon); Thomas Herault (University of Tennessee); Yves Robert (E
 NS Lyon, University of Tennessee); and Frédéric Vivien (INRIA)\n----------
 -----------\nEvaluating the Resiliency of Posits for Scientific Computing\
 n\nIEEE-754 is the de-facto standard for the implementation of floating po
 int number systems in hardware, although recently, posits have been propos
 ed as a drop-in replacement. Recent work has suggested that posits can off
 er greater numerical accuracy and reproducibility than IEEE-754-compliant 
 floatin...\n\n\nBenjamin Schlueter, Jon Calhoun, and Alexandra Poulos (Cle
 mson University)\n---------------------\nFTXS 2023 – Afternoon Break\n----
 -----------------\nUsing Benford's Law to Identify Unusual Failure Regions
 \n\nFault tolerance remains a key challenge for current high performance c
 omputing systems. Effective and efficient scheduling of mitigation methods
  continues to be a critical issue in the face of dynamic and difficult-to-
 predict error rates found on many systems. Using failure data from the Ast
 ra super...\n\n\nKurt Ferreira (Sandia National Laboratories, University o
 f New Mexico) and Scott Levy (Sandia National Laboratories)\n-------------
 --------\nFTXS 2023 – Opening Remarks\n\nIntroduction and welcome to FTXS 
 2023.\n\n\nKeita Teranishi (Oak Ridge National Laboratory (ORNL))\n-------
 --------------\nDisk Failure Trends in Alpine Storage System\n\nLarge-scal
 e HPC systems demand extensive disk-based storage for data generated by HP
 C applications, necessitating scalable reliability, availability, and fail
 ure management. Extracted failure data from HPC storage offers valuable in
 sights for preventing and managing failures, spanning understanding ...\n\
 n\nAnjus George, Jesse Hanley, and Sarp Oral (Oak Ridge National Laborator
 y (ORNL))\n---------------------\nOptimizing Write Performance for Checkpo
 inting to Parallel File Systems Using LSM-Trees\n\nThe widening gap betwee
 n compute and I/O performance on modern HPC systems means that writing ch
 eckpoints to a parallel file system for fault tolerance is fast becoming a
  bottleneck to high performance. It is therefore vital that software is en
 gineered such that it can achieve the highest proportio...\n\n\nSerdar Bul
 ut and Steven A. Wright (University of York, England)\n-------------------
 --\nDynamic Selective Protection of Sparse Iterative Solvers via ML Predic
 tion of Soft Error Impacts\n\nSoft errors occur frequently on large comput
 ing platforms due to the increasing scale and complexity of HPC systems. V
 arious resilience techniques have been proposed to protect scientific appl
 ications from soft errors. Among them, system-level replication often invo
 lves duplicating or triplicating t...\n\n\nZizhao Chen (University of Kans
 as); Thomas Verrecchia (National Institute of Advanced Technology (ENSTA P
 aris)); Hongyang Sun (University of Kansas); Joshua Booth (University of A
 labama, Huntsville); and Padma Raghavan (Vanderbilt University)\n\nTag: Fa
 ult Handling and Tolerance, Large Scale Systems\n\nRegistration Category: 
 Workshop Reg Pass\n\nSession Chairs: John Daly (US Department of Defense),
  Scott Levy (Sandia National Laboratories), and Keita Teranishi (Oak Ridge
  National Laboratory (ORNL))
END:VEVENT
END:VCALENDAR
