BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000713Z
LOCATION:503-504
DTSTART;TZID=America/Denver:20231115T103000
DTEND;TZID=America/Denver:20231115T110000
UID:submissions.supercomputing.org_SC23_sess251_exforum133@linklings.com
SUMMARY:Cost-Effective LLM Inference Solution Using SK hynix's AiM (Accele
 rator-in-Memory)
DESCRIPTION:Yongkee Kwon (SK hynix Inc)\n\nLarge language models (LLMs) ar
 e becoming increasingly popular for a variety of AI services, such as cha
 tbots and virtual assistants. However, serving LLMs can be challenging du
 e to their high operating costs and long service latency. The main challe
 nge in serving LLMs is the memory bandwidth bottleneck: LLMs require larg
 e amounts of memory to store their parameters, and the bandwidth to that
  memory can limit the speed of inference. As LLMs continue to grow in siz
 e, this problem will only get worse.\n\nWe propose a new solution to the
  memory bandwidth bottleneck in LLM serving. Our solution, called AiM (Ac
 celerator-in-Memory), is SK hynix's processing-in-memory (PIM) device spe
 cialized for serving LLMs. AiM exploits the abundant bandwidth available
  inside the memory device to accelerate GEMV operations, the most computa
 tionally expensive operations in LLM inference. We evaluated AiM on a var
 iety of LLMs and tasks. Our results show that AiM can significantly impro
 ve the performance and energy efficiency of LLM inference; for example, o
 n the GPT-3 model, AiM achieves up to a 10x speedup at lower cost and ene
 rgy consumption than state-of-the-art GPU systems.\n\nWe believe that AiM
  is a promising solution to the memory bandwidth bottleneck in LLM servin
 g. By significantly improving the performance and energy efficiency of LL
 M inference, AiM makes it practical to deploy LLMs in real-world applicat
 ions.\n\nTag: Accelerators, Artificial Intelligence/Machine Learning, Arc
 hitecture and Networks, Hardware Technologies\n\nRegistration Category: T
 ech Program Reg Pass, Exhibits Reg Pass\n\nSession Chair: Nathan Hanford
  (Lawrence Livermore National Laboratory (LLNL))\n\n
END:VEVENT
END:VCALENDAR
