BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000631Z
LOCATION:403-404
DTSTART;TZID=America/Denver:20231116T103000
DTEND;TZID=America/Denver:20231116T120000
UID:submissions.supercomputing.org_SC23_sess178@linklings.com
SUMMARY:Architecture-Specific Optimization
DESCRIPTION:VENOM: A Vectorized N:M Format for Unleashing the Power of Spa
 rse Tensor Cores\n\nThe increasing success and scaling of Deep Learning mo
 dels demands higher computational efficiency and power. Sparsification can
  lead to both smaller models as well as higher compute efficiency, and acc
 elerated hardware is becoming available. However, exploiting it efficientl
 y requires kernel implem...\n\n\nRoberto L. Castro (Universidade da Coruña
 ), Andrei Ivanov (ETH Zürich), Diego Andrade (Universidade da Coruña), Tal
  Ben-Nun (ETH Zürich), Basilio B. Fraguela (Universidade da Coruña), and T
 orsten Hoefler (ETH Zürich)\n---------------------\nCalculon: a Methodolog
 y and Tool for High-Level Codesign of Systems and Large Language Models\n\
 nThis paper presents a parameterized analytical performance model of trans
 former-based Large Language Models (LLMs) for guiding high-level algorithm
 -architecture codesign studies. This model derives from an extensive surve
 y of performance optimizations that have been proposed for the training an
 d inf...\n\n\nMikhail Isaev (Georgia Institute of Technology), Nic McDonal
 d and Larry Dennison (NVIDIA Corporation), and Richard Vuduc (Georgia Inst
 itute of Technology)\n---------------------\nOptimizing Direct Convolution
 s on ARM Multi-Cores\n\nConvolution kernels are widely seen in deep learni
 ng workloads and are often responsible for performance bottlenecks. Recent
  research has demonstrated that a direct convolution approach can outperfo
 rm the traditional convolution implementation based on tensor-to-matrix co
 nversions. However, existing...\n\n\nPengyu Wang, Weiling Yang, Jianbin Fa
 ng, Dezun Dong, Chun Huang, Peng Zhang, and Tao Tang (National University 
 of Defense Technology (NUDT), China) and Zheng Wang (University of Leeds, 
 School of Computing, UK)\n\nTag: Artificial Intelligence/Machine Learning,
  Codesign, Performance Optimization, Programming Frameworks and System Sof
 tware\n\nRegistration Category: Tech Program Reg Pass\n\nReproducibility B
 adges: Artifact Available, Artifact Functional, Results Reproduced\n\nSess
 ion Chair: Aparna Chandramowlishwaran (University of California, Irvine)
END:VEVENT
END:VCALENDAR
