BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000711Z
LOCATION:403-404
DTSTART;TZID=America/Denver:20231115T143000
DTEND;TZID=America/Denver:20231115T150000
UID:submissions.supercomputing.org_SC23_sess160_pap103@linklings.com
SUMMARY:5 ExaFlop/s HPL-MxP Benchmark with Linear Scalability on the 40-Mi
 llion-Core Sunway Supercomputer
DESCRIPTION:Rongfen Lin (National Research Center of Parallel Computer Eng
 ineering and Technology, China; Tsinghua University, China); Xinhui Yuan (
 National Research Center of Parallel Computer Engineering and Technology, 
 China); Wei Xue (Tsinghua University, China; Qinghai University); WanWang 
 Yin (National Research Center of Parallel Computer Engineering and Technol
 ogy, China); Jienan Yao (Tsinghua University, China); Junda Shi, Qiang Sun
 , and Chaobo Song (National Research Center of Parallel Computer Engineeri
 ng and Technology, China); and Fei Wang (Tsinghua University, China; Natio
 nal Research Center of Parallel Computer Engineering and Technology, China
 )\n\nHPL-MxP is an emerging high-performance benchmark used to measure the
  mixed-precision computing capability of leading supercomputers. This work
  presents our efforts on the new Sunway that linearly scales the benchmark
  to over 40 million cores, sustains an overall mixed-precision performance
  exceeding 5 ExaFlop/s, and achieves over 85% of peak performance, which i
 s the highest efficiency reached among all heterogeneous systems on the HP
 L-MxP list. The optimizations of our HPL-MxP implementation include: (1)
  a Two-Direction Look-Ahead and Overlap algorithm that enables all
  communication to overlap with computation; (2) a multi-level
  process-mapping and communication-scheduling method that uses the
  network as effectively as possible while maintaining a conflict-free
  algorithm flow; and (3) a CG-Fusion computing
  framework that eliminates up to 60% of inter-chip communications and remo
 ves the memory access bottleneck while serving both computation and commun
 ication simultaneously. This work could also provide useful insights for t
 uning cutting-edge applications on Sunway supercomputers as well as other 
 heterogeneous supercomputers.\n\nTag: Exascale, Large Scale Systems, State
  of the Practice\n\nRegistration Category: Tech Program Reg Pass\n\nAward 
 Finalist: Best Paper Finalist\n\nSession Chair: Taisuke Boku (University o
 f Tsukuba, Advanced HPC‑AI Research and Development Support Center (HAIRDE
 SC))\n\n
END:VEVENT
END:VCALENDAR
