BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000711Z
LOCATION:401-402
DTSTART;TZID=America/Denver:20231116T143000
DTEND;TZID=America/Denver:20231116T150000
UID:submissions.supercomputing.org_SC23_sess184_pap219@linklings.com
SUMMARY:Co-Design Hardware and Algorithm for Vector Search
DESCRIPTION:Wenqi Jiang (ETH Zurich - Swiss Federal Institute of Technolog
 y); Shigang Li (Beijing University of Posts and Telecommunications); Yu Zh
 u, Johannes de Fine Licht, Zhenhao He, and Runbin Shi (ETH Zurich - Swiss 
 Federal Institute of Technology); Cedric Renggli (Apple Inc); Shuai Zhang 
 (ETH Zurich - Swiss Federal Institute of Technology); Theodoros Rekatsinas
  (Apple Inc); and Torsten Hoefler and Gustavo Alonso (ETH Zurich - Swiss F
 ederal Institute of Technology)\n\nVector search has emerged as the founda
 tion for large-scale information retrieval and machine learning systems, w
 ith search engines like Google and Bing processing tens of thousands of qu
 eries per second on petabyte-scale document datasets by evaluating vector 
 similarities between encoded query texts and web documents. As performance
  demands for vector search systems surge, accelerated hardware offers a pr
 omising solution in the post-Moore's Law era. We introduce FANNS, an end-t
 o-end and scalable vector search framework on FPGAs. Given a user-provided
  recall requirement on a dataset and a hardware resource budget, FANNS aut
 omatically co-designs hardware and algorithm, subsequently generating the 
 corresponding accelerator. The framework also supports scale-out by incorp
 orating a hardware TCP/IP stack in the accelerator. FANNS attains up to 23
 .0x and 37.2x speedup compared to FPGA and CPU baselines, respectively, an
 d demonstrates superior scalability to GPUs, achieving 5.5x and 7.6x speed
 up in median and 95th percentile latency within an eight-accelerator confi
 guration.\n\nTag: Accelerators, Artificial Intelligence/Machine Learning, 
 Codesign, Fault Handling and Tolerance, Performance Measurement, Modeling,
  and Tools, Post-Moore Computing\n\nRegistration Category: Tech Program Re
 g Pass\n\nReproducibility Badges: Artifact Available, Artifact Functional\
 n\nSession Chair: Lishan Yang (George Mason University (GMU))\n\n
END:VEVENT
END:VCALENDAR
