BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000713Z
LOCATION:503-504
DTSTART;TZID=America/Denver:20231115T110000
DTEND;TZID=America/Denver:20231115T113000
UID:submissions.supercomputing.org_SC23_sess251_exforum114@linklings.com
SUMMARY:Overcoming the Cost of Data Movement in AI Inference Accelerators
DESCRIPTION:Arun Iyengar (Untether AI)\n\nFetching weight and activation
  values prior to general matrix-vector (GEMV) or general matrix-matrix
  (GEMM) computation is the largest performance bottleneck and the
  largest consumer of energy in neural network acceleration. Traditional
  von Neumann architectures, even with large on-chip caches, spend as
  much as 90% of their energy on data movement and only 10% on actual
  computation, which in most cases limits their energy efficiency to low
  single-digit TOPs/W. Analog in-memory compute, in which the memory cell
  itself forms part of the multiply-accumulate (MAC) calculation, suffers
  from accuracy issues, and the additional support circuitry it requires,
  such as analog-to-digital and digital-to-analog converters and
  compensation, erodes its inherent low-power advantage, limiting the
  state of the art to 3 TOPs/W.\n\nThe novel Untether AI at-memory compute
  architecture stores all weights directly on chip in specially designed
  low-power SRAM, using high-density bit cells tuned to feed the
  processing elements (PEs) directly with minimal energy. Because the PEs
  sit directly adjacent to the SRAM cells, each bit access consumes only
  2 femtojoules. This represents an order-of-magnitude improvement over
  compiled memory cells and a three-order-of-magnitude improvement over
  fetching weights from external DRAM.\n\nTag: Accelerators, Artificial
  Intelligence/Machine Learning, Architecture and Networks, Hardware
  Technologies\n\nRegistration Category: Tech Program Reg Pass, Exhibits
  Reg Pass\n\nSession Chair: Nathan Hanford (Lawrence Livermore National
  Laboratory (LLNL))\n\n
END:VEVENT
END:VCALENDAR
