BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000610Z
LOCATION:605
DTSTART;TZID=America/Denver:20231113T090000
DTEND;TZID=America/Denver:20231113T123000
UID:submissions.supercomputing.org_SC23_sess441@linklings.com
SUMMARY:2023 International Workshop on Performance, Portability, and Produ
 ctivity in HPC (P3HPC)
DESCRIPTION:High-Level GPU Code: A Case Study Examining JAX and OpenMP\n\
 nIn recent years, a new scientific software design pattern has emerged tha
 t pairs a Python interface with high-performance kernels in lower-level la
 nguages. The rise of general-purpose GPUs necessitates the rewriting of ma
 ny such kernels, posing challenges in GPU programming and ensuring future 
 porta...\n\n\nNestor Demeure, Theodore Kisner, Reijo Keskitalo, Rollin Tho
 mas, Julian Borrill, and Wahid Bhimji (Lawrence Berkeley National Laborato
 ry (LBNL))\n---------------------\nP3HPC – Morning Break\n----------------
 -----\nPorting Batched Iterative Solvers onto Intel GPUs with SYCL\n\nBatc
 hed linear solvers play a vital role in computational sciences, especially
  in the fields of plasma physics and combustion simulations. With the immi
 nent deployment of the Aurora Supercomputer and other upcoming systems equ
 ipped with Intel GPUs, there is a compelling demand to expand the capabili
 ...\n\n\nPhuong Nguyen (University of Tennessee, Innovative Computing Labo
 ratory (ICL)); Pratik Nayak (Karlsruhe Institute of Technology (KIT)); and
  Hartwig Anzt (University of Tennessee, Innovative Computing Laboratory (I
 CL))\n---------------------\nCuPBoP-AMD: Extending CUDA to AMD Platforms\
 n\nThe proliferation of artificial intelligence applications has underscor
 ed the need for increased portability among graphics processing units (GPU
 s) from different vendors. With CUDA as one of the most popular GPU progra
 mming languages, CuPBoP (CUDA for Parallelized and Broad-range Processors)
  aims t...\n\n\nJun Chen, Xule Zhou, and Hyesoon Kim (Georgia Institute of
  Technology)\n---------------------\nMany Cores, Many Models: GPU Program
 ming Model vs. Vendor Compatibility Overview\n\nIn recent history, GPUs be
 came a key driver of compute performance in HPC. With the installation of 
 the Frontier supercomputer, they became the enablers of the exascale era; 
 further largest-scale installations are in progress (Aurora, El Capitan, J
 UPITER). But the early-day dominance by NVIDIA and t...\n\n\nAndreas Herte
 n (Forschungszentrum Jülich; Juelich Supercomputing Centre (JSC), Institut
 e for Advanced Simulation)\n---------------------\nP3HPC – Wrapup\n-------
 --------------\nEvaluating the Performance Portability of SYCL across CPUs
  and GPUs on Bandwidth-Bound Applications\n\nWe evaluate the portability o
 f the SYCL programming model on some of the latest CPUs and GPUs from a wi
 de range of vendors, utilizing the two main compilers: DPC++ and hipSYCL/O
 penSYCL. Both compilers currently support GPUs from all three major vendor
 s; we evaluate performance on the Intel(R) Data C...\n\n\nIstván Z. Reguly
  (Pázmány Péter Catholic University, Hungary)\n---------------------\nBenc
 hmarking a Portable Lattice Quantum Chromodynamics Kernel Written in Kokko
 s and MPI\n\nSimulations of Lattice Quantum Chromodynamics (LQCD) are an i
 mportant application (two-digit percentage of cycles) on major High Perfor
 mance Computing (HPC) installations, including systems high up on and lead
 ing the TOP500 list. In the rapidly changing hardware landscape of HPC, bi
 nding workforce t...\n\n\nSimon Schlepphorst (Forschungszentrum Jülich) an
 d Stefan Krieg (Forschungszentrum Jülich, University of Bonn)\n-----------
 ----------\nPerformance Portability Evaluation of Blocked Stencil Computat
 ions on GPUs\n\nIn this new era where multiple GPU vendors are leading the
  supercomputing landscape, and multiple programming models are available t
 o users, the drive to achieve performance portability across platforms fac
 es new challenges. Consider stencil algorithms, where architecture-specif
 ic solutions are req...\n\n\nOscar Antepara, Samuel Williams, and Hans Joh
 ansen (Lawrence Berkeley National Laboratory (LBNL)) and Tuowen Zhao, Sama
 ntha Hirsch, Priya Goyal, and Mary Hall (University of Utah)\n------------
 ---------\nPerformance Portability of Programming Strategies for Nearest-N
 eighbor Communication with GPU-Aware MPI\n\nTo better advise HPC applicati
 on developers, we have implemented Faces, a nearest-neighbor microbenchmar
 k that quantifies performance trade-offs. The Faces experiments presented 
 here explore the following design choices: 1) fewer dependent messages ver
 sus more independent messages, 2) fewer fused GP...\n\n\nJames B. White II
 I (Hewlett Packard Enterprise (HPE))\n---------------------\nP3HPC – Welco
 me and Introduction\n\nScott Parker (Argonne National Laboratory (ANL))\n-
 --------------------\nMatRIS: Multilevel Math Library Abstraction for Hete
 rogeneity and Performance Portability Using IRIS Runtime\n\nVendor librari
 es are tuned for a specific architecture and are not portable to others. M
 oreover, they lack support for heterogeneity and multi-device orchestratio
 n, which is required for efficient use of contemporary HPC and cloud resou
 rces. To address these challenges, we introduce MatRIS-a multile...\n\n\nM
 ohammad Alaul Haque Monil, Narasinga Rao Miniskar, Keita Teranishi, Jeffre
 y S. Vetter, and Pedro Valero-Lara (Oak Ridge National Laboratory (ORNL))\
 n---------------------\nEvaluating the Performance of One-Sided Communicat
 ion on CPUs and GPUs\n\nAs high-performance GPU computing becomes the tren
 d, GPU-initiated one-sided communication becomes a viable solution for mul
 ti-GPU scaling. It also raises attention to the use of one-sided communica
 tion on CPUs. However, the lack of deep understanding of one-sided communi
 cation performance and its i...\n\n\nNan Ding, Muhammad Haseeb, Taylor Gro
 ves, and Samuel Williams (Lawrence Berkeley National Laboratory (LBNL))\n-
 --------------------\nPerformance Evaluation of Heterogeneous GPU Programm
 ing Frameworks for Hemodynamic Simulations\n\nPreparing for the deployment
  of large scientific and engineering codes on upcoming exascale systems wi
 th GPU-dense nodes is made challenging by the unprecedented diversity of d
 evice architectures and heterogeneous programming models. In this work, we
  evaluate the process of porting a massively paral...\n\n\nAristotle Marti
 n (Duke University); Geng Liu (Argonne National Laboratory (ANL)); William
  Ladd (Duke University); Seyong Lee, John Gounley, and Jeffrey Vetter (Oak
  Ridge National Laboratory (ORNL)); Saumil Patel, Silvio Rizzi, Victor Mat
 eevitsi, and Joseph Insley (Argonne National Laboratory (ANL)); and Amanda
  Randles (Duke University)\n---------------------\nA Performance-Portable 
 SYCL Implementation of CRK-HACC for Exascale\n\nThe first generation of ex
 ascale systems will include a variety of machine architectures, featuring 
 GPUs from multiple vendors. As a result, many developers are interested in
  adopting portable programming models to avoid maintaining multiple versio
 ns of their code. It is necessary to document experi...\n\n\nEsteban Range
 l (Argonne National Laboratory (ANL)), John Pennycook (Intel Corporation),
  Adrian Pope and Nicholas Frontiere (Argonne National Laboratory (ANL)), a
 nd Zhiqiang Ma and Varsha Madananth (Intel Corporation)\n\nTag: Performanc
 e Measurement, Modeling, and Tools, Performance Optimization\n\nRegistrati
 on Category: Workshop Reg Pass\n\nSession Chairs: Judith C. Hill (Lawrence
  Livermore National Laboratory (LLNL)), CJ Newburn (NVIDIA Corporation), S
 cott J. Parker (Argonne National Laboratory (ANL)), and John Pennycook (In
 tel Corporation)
END:VEVENT
END:VCALENDAR