BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000712Z
LOCATION:403-404
DTSTART;TZID=America/Denver:20231117T092000
DTEND;TZID=America/Denver:20231117T094000
UID:submissions.supercomputing.org_SC23_sess459_ws_h2rc102@linklings.com
SUMMARY:Chameleon: A Disaggregated CPU\, GPU\, and FPGA System for Retrie
 val-Augmented Language Models
DESCRIPTION:Wenqi Jiang and Gustavo Alonso (ETH Zurich - Swiss Federal Ins
 titute of Technology)\n\nA Retrieval-Augmented Language Model (RALM) augme
 nts a generative language model by retrieving context-specific knowledge f
 rom an external database via vector search. This strategy yields impre
 ssive text generation quality even with smaller models\, saving orde
 rs of magnitude of computational resources compared to large language m
 odels such as GPT-4. However\, RALMs pose significant challenges for sy
 stem design because the RALM components have diverse workload characte
 ristics. In this presentation\, we introduce Chameleon\, a heterogeneou
 s system that combines CPUs\, GPUs\, and FPGAs in a disaggregated mann
 er for efficient RALM serving. While GPUs still handle the computationa
 lly intensive model inference\, we design a distributed CPU-FPGA engin
 e for large-scale vector search\, which requires substantial memory cap
 acity and rapid quantized vector decoding. The CPU server manages the v
 ector index\, while FPGA-based disaggregated memory nodes scan datab
 ase vectors using near-memory accelerators. Chameleon vector search ach
 ieves 8.6-29.4x and 1.6-57.9x lower latency than CPU-based and GPU-base
 d systems\, respectively.\n\nTag: Architecture and Networks\n\nRegistra
 tion Category: Workshop Reg Pass\n\nSession Chairs: Jason Bakos (Univer
 sity of South Carolina)\; Franck Cappello (Argonne National Laborato
 ry (ANL)\, University of Illinois)\; Torsten Hoefler (ETH Zürich\, Micr
 osoft Corporation)\; Kenneth O'Brien (Advanced Micro Devices\, Inc. (A
 MD))\; and Christian Plessl (Paderborn University\, Germany)\n\n
END:VEVENT
END:VCALENDAR
