BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000712Z
LOCATION:405-406-407
DTSTART;TZID=America/Denver:20231114T163000
DTEND;TZID=America/Denver:20231114T170000
UID:submissions.supercomputing.org_SC23_sess173_pap498@linklings.com
SUMMARY:Space Efficient Sequence Alignment for SRAM-Based Computing: X-Dro
 p on the Graphcore IPU
DESCRIPTION:Luk Burchard (Simula Research Laboratory); Max Xiaohang Zhao (
 Charité Universitätsmedizin Berlin); Johannes Langguth (Simula Research La
 boratory, University of Bergen); Aydın Buluç (Lawrence Berkeley National L
 aboratory (LBNL)); and Giulia Guidi (Cornell University)\n\nDedicated acce
 lerator hardware has become essential for processing AI-based workloads, l
 eading to the rise of novel accelerator architectures.  Furthermore, funda
 mental differences in memory architecture and parallelism have made these 
 accelerators targets for scientific computing.  The sequence alignment pro
 blem is fundamental in bioinformatics; we have implemented the X-Drop algo
 rithm, a heuristic method for pairwise alignment that reduces search space
 , on the Graphcore Intelligence Processing Unit (IPU) accelerator.  The X-
 Drop algorithm has an irregular computational pattern, which makes it
  diffi
 cult to accelerate due to load balancing.\n\nHere, we introduce a graph-ba
 sed partitioning and queue-based batch system to improve load balancing.  
 Our implementation achieves 10x speedup over a state-of-the-art GPU implem
 entation and up to 4.65x compared to CPU.  In addition, we introduce a mem
 ory-restricted X-Drop algorithm that reduces memory footprint by 55x and e
 fficiently uses the IPU's limited low-latency SRAM.  This optimization fur
 ther improves the strong scaling performance by 3.6x.\n\nTag: Accelerators
 , Applications, Graph Algorithms and Frameworks, Performance Measurement, 
 Modeling, and Tools, Programming Frameworks and System Software\n\nRegistr
 ation Category: Tech Program Reg Pass\n\nReproducibility Badges: Artifact 
 Available, Artifact Functional, Results Reproduced\n\nSession Chair: Mehme
 t E Belviranli (Colorado School of Mines)\n\n
END:VEVENT
END:VCALENDAR