BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000712Z
LOCATION:507
DTSTART;TZID=America/Denver:20231113T094000
DTEND;TZID=America/Denver:20231113T100000
UID:submissions.supercomputing.org_SC23_sess444_ws_waccpd106@linklings.com
SUMMARY:Specialized Kernels for Optimizing GPU Offload in OpenMP
DESCRIPTION:Dhruva Chakrabarti, Gregory Rodgers, Carlo Bertolli, Gheorghe-
 Teodor Bercea, Jan-Patrick Lehr, Lynd Stringer, Jan Leyonberg, Dan Palermo
 , and Ron Lieberman (Advanced Micro Devices (AMD) Inc)\n\nProgramming mode
 ls for general purpose GPU (GPGPU) computing include grid and non-grid lan
 guages.  Grid languages like CUDA and HIP map directly to the GPU hardware
  and can extract high performance from applications.  However, this low-le
 vel programming approach makes them more difficult to program than non-gri
 d languages such as C, C++, and Fortran with OpenMP target offload.  Furth
 ermore, grid languages often have more portability issues than non-grid la
 nguages.  However, code generated from non-grid languages using automatic 
 compiler and runtime techniques often incurs higher overhead while generat
 ing GPU kernels.\n\nThis presentation discusses compiler and runtime techn
 iques to generate specialized, high-performance kernels for OpenMP target 
 regions in certain common situations. We outline conditions under which sp
 ecialized kernels are generated for OpenMP target regions, both with and w
 ithout reduction clauses. Experimental results on AMD GPUs indicate that a
  large percentage of OpenMP target regions are amenable to specialization 
 and consequent performance improvement.\n\nTag: Accelerators, Artificial I
 ntelligence/Machine Learning, Algorithms, Applications, Compilers, Data Mo
 vement and Memory, Heterogeneous Computing, Modeling and Simulation, Perfo
 rmance Optimization, Programming Frameworks and System Software, Runtime S
 ystems\n\nRegistration Category: Workshop Reg Pass\n\nSession Chairs: Maci
 ej Cytowski (Pawsey Supercomputing Research Centre; Commonwealth Scientifi
 c and Industrial Research Organisation (CSIRO), Australia); Verónica G. Me
 lesse Vergara (Oak Ridge National Laboratory (ORNL)); and Jose Manuel Mons
 alve Diaz (Advanced Micro Devices (AMD))\n\n
END:VEVENT
END:VCALENDAR