BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000714Z
LOCATION:DEF Concourse
DTSTART;TZID=America/Denver:20231114T100000
DTEND;TZID=America/Denver:20231114T170000
UID:submissions.supercomputing.org_SC23_sess291@linklings.com
SUMMARY:Research Posters Display
DESCRIPTION:Parallel Optimization Methods for Direct Numerical Simulation 
 of High Reynolds Number Wall Turbulence with a Grid Size of 100 Billion\n\
 nDirect numerical simulation (DNS) is a technique that directly solves the
  fluid Navier-Stokes equations with high spatial and temporal resolutions.
  However, its utility in studying high Reynolds number (Re) wall turbulenc
 e of particular interest is limited by the rapidly growing grid size (i.e.
 , the...\n\n\nJiabin Xie, Guangnan Feng, Han Huang, Junxuan Feng, and Yuto
 ng Lu (Sun Yat-sen University, Guangzhou, China)\n---------------------\nH
 PC Accelerated Generative Deep Learning Approach for Creating Digital Twin
 s of Climate Models\n\nClimate models cannot perfectly represent the compl
 ex climate system, but by running them multiple times with small variation
 s in input parameters, it's possible to estimate uncertainties and explore
  different climate scenarios. Generating these ensembles demands significa
 nt computational resources ...\n\n\nJohannes Meuer, Christopher Kadow, and
  Thomas Ludwig (German Climate Computing Centre (DKRZ)) and Claudia Timmre
 ck (Max Planck Institute for Meteorology)\n---------------------\nSCALABLE
  – Scalable Lattice Boltzmann Leaps to Exascale\n\nThe SCALABLE project ai
 ms to enhance an industrial Lattice Boltzmann Method (LBM)-based computati
 onal fluid dynamics (CFD) solver for current and future extreme-scale arch
 itectures, while ensuring accessibility for end-users and developers. This
  is accomplished by transferring technology and knowled...\n\n\nJayesh Bad
 waik (Jülich Supercomputing Centre); Lubomír Říha, Radim Vavřík, Ondřej Vy
 socký, and Kristian Kadlubiak (IT4Innovations National Supercomputing Cent
 er, VŠB – Technical University of Ostrava); Gabriel Staffelbach (CERFACS, 
 France); Markus Holzer (CERFACS, France; Friedrich-Alexander University, E
 rlangen-Nuremberg); Philipp Suffa (Friedrich-Alexander University, Erlange
 n-Nuremberg); and Romain Cuidard and Denis Ricot (CS GROUP)\n-------------
 --------\nHybrid CPU-GPU Implementation of Edge-Connected Jaccard Similari
 ty in Graph Datasets\n\nTypical GPU programs consist of four steps: (1) da
 ta preparation, (2) host CPU-to-GPU data transfers, (3) execution of one o
 r more GPU kernels, and (4) transfer of results back to CPU. While the ker
 nel is running on the GPU, the CPU cores often remain idle, waiting on the
  GPU to finish kernel execu...\n\n\nAtharva Gondhalekar, Paul Sathre, and
  Wu-chun Feng (Virginia Tech)\n---------------------\nHigh-Performance PME
 M-Aware Collective I/Os\n\nCollective I/Os are widely used to transform sm
 all non-contiguous accesses into large contiguous accesses for parallel I/
 O optimization. The existing collective I/O techniques assume that compute
 r memory is volatile. They are limited both by the size of the buffer, whi
 ch must be small so data is not...\n\n\nKeegan Sanchez and Alex Gavin (Was
 hington State University, Vancouver); Suren Byna (Ohio State University); 
 Kesheng Wu (Lawrence Berkeley National Laboratory (LBNL)); and Xuechen Zha
 ng (Washington State University, Vancouver)\n---------------------\nParLei
 den: Boosting Parallelism of Distributed Leiden Algorithm on Large-Scale G
 raphs\n\nLeiden algorithm has demonstrated superior efficacy compared to t
 raditional Louvain algorithms in the field of community detection. However
 , parallelizing the Leiden algorithm while imposing community size limitat
 ions brings significant challenges in big data processing scenarios. We pr
 esent ParLeid...\n\n\nYongmin Hu (Douyin Vision Co., Ltd); Jing Wang (Shan
 ghai Jiao Tong University); Cheng Zhao (Douyin Vision Co., Ltd); Yibo Liu 
 (Shanghai Jiao Tong University); Cheng Chen and Xiaoliang Cong (Douyin Vis
 ion Co., Ltd); and Chao Li (Shanghai Jiao Tong University)\n--------------
 -------\nDFToy:  A New Proxy App for DFT Calculations\n\nDensity functiona
 l theory based codes are significant users of HPC resources, often ranking
  among the top users of core hours on these systems. However, despite thei
 r popularity and resource usage, they are not very well optimised for curr
 ent HPC architectures - and are not easily adapted. We presen...\n\n\nArje
 n Tamerus (University of Cambridge) and Phil Hasnip (University of York, E
 ngland)\n---------------------\nA High-Performance I/O Framework for Accel
 erating DNN Model Updates Within Deep Learning Workflow\n\nIn traditional 
 deep learning workflows, AI applications (producers) train DNN models offl
 ine using fixed datasets, while inference serving systems (consumers) load
  the trained models for offering real-time inference queries. In practice,
  AI applications often operate in a dynamic environment where d...\n\n\nJi
 e Ye and Jaime Cernuda (Illinois Institute of Technology), Bogdan Nicolae 
 (Argonne National Laboratory (ANL)), and Anthony Kougkas and Xian-He Sun (
 Illinois Institute of Technology)\n---------------------\nSimulating Large
 r Quantum Circuits with Circuit Cutting and Quantum Serverless\n\nQuantum 
 computation is an emerging technology that promises to be able to solve ce
 rtain tasks that are out of reach of classical machines alone. However, th
 e limited number and quality of qubits poses a challenge for practical usa
 ge of near-term quantum computation. Circuit cutting is a technique to...\
 n\n\nCaleb Johnson, Bryce Fuller, Jim Garrison, and Jennifer Glick (IBM Re
 search)\n---------------------\nThat's Right – The Same C++ STL Asynchrono
 us Parallel Code Runs on CPUs and GPUs\n\nHigh-performance computing appli
 cations running on modern-day supercomputers frequently encounter performa
 nce and portability challenges especially if using multiple programming mo
 dels, languages and compilers. In this work, we explore the proposed C++26
  language standard model for asynchronous para...\n\n\nMuhammad Haseeb, We
 ile Wei, Jack Deslippe, and Brandon Cook (Lawrence Berkeley National Labor
 atory (LBNL), National Energy Research Scientific Computing Center (NERSC)
 )\n---------------------\nSophisticated Tools for Performance Analysis and
  Auto-Tuning of Performance Portable Parallel Programming\n\nHPC Software 
 must offer tool support for productive programming of scientific applicati
 ons run on supercomputers using this HPC Software, especially for the soph
 isticated activities of performance analysis and auto-tuning. Given the em
 ergence of performance portable programming libraries having abst...\n\n\n
 Vivek Kale (Sandia National Laboratories), David Boehme (Lawrence Livermor
 e National Laboratory (LLNL)), Kevin Huck and Shravan Kale (University of 
 Oregon), and Vanessa Surjadidjaja and James Brandt (Sandia National Labora
 tories)\n---------------------\nPerformant Low-Order Matrix-Free Finite El
 ement Kernels on GPU Architectures\n\nNumerical methods such as the Finite
  Element Method (FEM) have successfully leveraged the computational power 
 of GPU accelerators. However, much of the effort around FEM on GPUs has b
 een focused on high order discretizations due to their higher arithmetic i
 ntensity and order of accuracy. For ...\n\n\nRandolph Settgast, William To
 bin, Nicola Castelletto, and Yohann Dudouit (Lawrence Livermore National L
 aboratory (LLNL)); Sergey Klevtsov (Stanford University); and Ben Corbett 
 (Lawrence Livermore National Laboratory (LLNL))\n---------------------\nPa
 nSim: A Performance-Portable Agent Based Model\n\nPanSim, a specialized ag
 ent-based model, was developed to analyze interventions against COVID-19. 
 Implemented in C++ and Thrust, it is a highly performant and portable code
 . Here we focus on different algorithmic formulations for calculating cumu
 lative values like infectiousness at different locatio...\n\n\nIstvan Z. R
 eguly, Bence Keömley-Horváth, Gábor Szederkényi, and Attila Csikász-Nagy (
 Pázmány Péter Catholic University, Hungary)\n---------------------\nPreser
 ving Data Locality in Multidimensional Variational Quantum Classification\
 n\nIn classical machine learning, the convolution operation is leveraged i
 n the eponymous class of convolutional neural networks (CNNs) capturing th
 e spatial and/or temporal locality of multidimensional input features. Pre
 serving data locality allows CNN models to reduce the number of training p
 aramete...\n\n\nMingyoung Jeng, Alvir Nobel, Vinayak Jha, David Levy, Dyla
 n Kneidel, Manu Chaudhary, Ishraq Islam, and Esam El-Araby (University of 
 Kansas)\n---------------------\nAutomating HPC Model Selection on Edge Dev
 ices\n\nThe increasing demand for processing power on resource-constrained
  edge devices necessitate efficient techniques for optimizing High Perform
 ance Computing (HPC) applications. We propose HPEE (HPC Parameter Explorat
 ion on Edge), a novel approach that formulates the parameter search space 
 problem as a...\n\n\nAbrar Hossain and Kishwar Ahmed (University of Toledo
 )\n---------------------\nIntegrating TEZIP into LibPressio:  A Case Study
  of Integrating a Dynamic Application into a Static C Environment\n\nLCLS-
 II at SLAC, SNS at Oak Ridge Laboratory, and other instruments use softwar
 e written in C and C++, producing huge volumes of time evolving data at hi
 gh rate. Data compression can decrease the volume of data we need to move 
 and store. TEZIP is a neural network (NN) based compressor designed for h.
 ..\n\n\nIsita Talukdar (University of California, Berkeley; RIKEN Center f
 or Computational Science (R-CCS)); Amarjit Singh (RIKEN Center for Computa
 tional Science (R-CCS)); Robert Underwood (Argonne National Laboratory (AN
 L)); Kento Sato (RIKEN Center for Computational Science (R-CCS)); and Weik
 uan Yu (Florida State University)\n---------------------\nGeospatial Filte
 r and Refine Computations on NVIDIA Bluefield Data Processing Units (DPU)\
 n\nIn this poster, we will show how to leverage Nvidia's Bluefield Data Pr
 ocessing Unit (DPU) in geospatial systems. Existing work in literature has
  explored DPUs in the context of machine learning, compression and MPI acc
 eleration. We show our designs on how to integrate DPUs into existing high
  perfor...\n\n\nDerda Kaymak (Marquette University) and Satish Puri (Misso
 uri University of Science and Technology)\n---------------------\nQuantum 
 Task Offloading with the OpenMP API\n\nMost of the widely used quantum pro
 gramming languages and libraries are not designed for the tightly coupled 
 nature of hybrid quantum-classical algorithms, which run on quantum resour
 ces that are integrated on-premise with classical HPC infrastructure. We p
 ropose a programming model using the API pr...\n\n\nJoseph K. L. Lee (Edin
 burgh Parallel Computing Centre (EPCC)), Martin Ruefenacht (Leibniz Superc
 omputing Centre), Johannes Doerfert (Lawrence Livermore National Laborator
 y (LLNL)), Oliver Thomson Brown and Mark Bull (Edinburgh Parallel Computin
 g Centre (EPCC)), Michael Klemm (AMD Research), and Martin Schulz (Technic
 al University of Munich)\n---------------------\nA Hybrid Factorization So
 lver with Mixed Precision Arithmetic for Sparse Matrices\n\nFor numerical 
 simulations, linear system with large sparse matrix with high condition nu
 mber needs to be solved. LDU-factorization with pivoting strategy provides
  robust solver for such system. Computational complexity of the factorizat
 ion solver is high and cannot be reduced in framework of the dir...\n\n\nA
 tsushi Suzuki (RIKEN Center for Computational Science (R-CCS))\n----------
 -----------\nReal-Time Change Point Detection in Molecular Dynamics Stream
 ing Data\n\nThe uniform sampling of molecular dynamics (MD) simulations ma
 y not accurately capture crucial scientific events. Deep learning approach
 es are being developed to detect these events within streaming data but ca
 n take significant resources on large datasets (PB+). To address these lim
 itations, we pro...\n\n\nVijayalakshmi Saravanan (University of South Dako
 ta) and Shinjae Yoo, Hubertus Van Dam, Christopher Kelly, Thomas Flynn, Pe
 rry Siehien, Kalyan Muppudojo, and Aniket Kumar Ramesh (Brookhaven Nationa
 l Laboratory)\n---------------------\nUnleashing CGRA Potential for HPC\n\
 nThis poster highlights our previous and future design-space exploration e
 ffort to optimize our CGRA architecture for HPC, i.e., intra-CGRA intercon
 nect optimization, FMA and transcendental operation on CGRA, programmable 
 buffer, systolic-array style execution on CGRA, predication support, and F
 PGA b...\n\n\nBoma Adhi, Emanuele Del Sozzo, and Carlos Cortes (RIKEN Cent
 er for Computational Science (R-CCS)); Xinyuan Wang (University of Toronto
 , RIKEN Center for Computational Science (R-CCS)); and Tomohiro Ueno and K
 entaro Sano (RIKEN Center for Computational Science (R-CCS))\n------------
 ---------\nGraph Based Anomaly Detection in Chimbuko:  Feasible or Fallibl
 e?\n\nPerformance anomaly detection can aid in discovering algorithmic ine
 fficiencies or hardware issues in an application’s environment. The Chimbu
 ko framework monitors large-scale workflow applications in real-time and i
 dentifies function executions which deviate from accumulated statistics (p
 erfo...\n\n\nChase Phelps, Ankur Lahiry, and Tanzima Z. Islam (Texas State
  University) and Christopher Kelly (Brookhaven National Laboratory)\n-----
 ----------------\nExploring Userspace Memory Mapping for RDMA-Enabled Netw
 ork-Attached Memory\n\nMemory-bound applications like graph processing app
 lications often require large memory capacity beyond a single node. Curren
 t HPC systems over-provision compute and memory resources to meet requirem
 ents of diverse workloads. In this work, we explore using network-attached
  memory for disaggregating ...\n\n\nJacob Wahlgren and Jennifer Faj (KTH R
 oyal Institute of Technology, Sweden); Eric Green and Maya Gokhale (Lawren
 ce Livermore National Laboratory (LLNL)); and Ivy Peng (KTH Royal Institut
 e of Technology, Sweden)\n---------------------\nCharacterizing GPU Effect
 iveness on NRP for IceCube fp32 Compute\n\nThe IceCube Neutrino Observator
 y is a cubic kilometer neutrino telescope located at the geographic South 
 Pole. Understanding detector systematic effects is a continuous process. T
 his requires the Monte Carlo simulation to be updated periodically to quan
 tify potential changes and improvements in scie...\n\n\nDavid Schultz (Uni
 versity of Wisconsin, Madison); Igor Sfiligoi (University of California, S
 an Diego (UCSD)); Benedikt Riedel (University of Wisconsin, Madison); and 
 Frank Würthwein (University of California, San Diego (UCSD))\n------------
 ---------\nWhy Wait!?  Hades:  An Active, Content-Aware System for Precalc
 ulating Derived Quantities\n\nModern scientific applications produce vast 
 amounts of data, typically stored in monolithic files on parallel file sys
 tems (PFS). Analyzing these large files often results in inefficiency due 
 to I/O stalls. To mitigate these stalls, certain data can be pre-computed 
 during the production phase and qu...\n\n\nJaime Cernuda, Luke Logan, Anth
 ony Kougkas, and Xian-He Sun (Illinois Institute of Technology)\n---------
 ------------\nModeling Parallel Programs Using Large Language Models\n\nIn
  the past year a large number of large language model (LLM) based tools fo
 r software development have been released.  These tools have the capabilit
 y to assist developers with many of the difficulties that arise from the e
 ver-growing complexity in the software stack.  As we enter the exascale er
 a,...\n\n\nDaniel Nichols (University of Maryland); Aniruddha Marathe, Har
 shitha Menon, and Todd Gamblin (Lawrence Livermore National Laboratory (LL
 NL)); and Abhinav Bhatele (University of Maryland)\n---------------------\
 nThe Many Facets of a Dynamic Graph Processing System\n\nGraphs are used t
 o model real-world systems that often evolve over time. We have developed 
 a streaming graph framework which, while ingesting an unbounded stream of 
 events mirroring a graph's evolution, dynamically updates the solution to 
 a user query, and is able to offer, on-demand and with low la...\n\n\nJunt
 ong Luo, Scott Sallinen, and Matei Ripeanu (University of British Columbia
 )\n---------------------\nGPU-Accelerated Dense Covariance Matrix Generati
 on for Spatial Statistics Applications\n\nLarge-scale parallel computing i
 s crucial in Gaussian regressions to reduce the complexity of spatial stat
 istics applications. The log-likelihood function is utilized to evaluate t
 he Gaussian model for a set of measurements in N geographical locations. S
 everal studies have shown a utilization of mod...\n\n\nZipei Geng, Sameh A
 bdulah, Hatem Ltaief, Ying Sun, Marc Genton, and David Keyes (King Abdulla
 h University of Science and Technology (KAUST))\n---------------------\nTr
 ansfer Learning Workflow for High-Quality I/O Bandwidth Prediction with Li
 mited Data\n\nThe I/O performance prediction is challenging due to multipl
 e intertwined variables inside a cluster. This situation makes I/O perform
 ance prediction a strong candidate for using machine learning because of t
 he complex variables involved. However, making a high-quality prediction r
 equires a large am...\n\n\nDmytro Povaliaiev (RWTH Aachen University); Rad
 ita Liem (RWTH Aachen University, IT Center); Julian Kunkel (University of
  Göttingen, GWDG, Germany); Jay Lofstead (Sandia National Laboratories); a
 nd Philip Carns (Argonne National Laboratory (ANL))\n---------------------
 \nExploring Julia as a Unifying End-to-End Workflow Language for HPC on Fr
 ontier\n\nWe evaluate the use of Julia as a single language and ecosystem 
 paradigm powered by LLVM for the development of high-performance computing
  (HPC) workflow components.  A Gray-Scott 2-variable diffusion-reaction ap
 plication using a memory-bound 7-point stencil kernel is run on Frontier, 
 the first exas...\n\n\nWilliam F. Godoy and Pedro Valero-Lara (Oak Ridge N
 ational Laboratory (ORNL)); Caira Anderson (Oak Ridge National Laboratory 
 (ORNL), Cornell University); Katrina W. Lee (Oak Ridge National Laboratory
  (ORNL); University of Texas, Dallas); and Ana Gainaru, Rafael Ferreira da
  Silva, and Jeffrey S. Vetter (Oak Ridge National Laboratory (ORNL))\n----
 -----------------\nDeveloping an Inverse Reinforcement Learning Methodolog
 y to Predict the Progression of Colorectal Cancer\n\nIn cancer biology, la
 rge amounts of high dimensional data (genomic, transcriptomic, proteomic, 
 phenotypic, etc.) are required for any computationally relevant work. The 
 problem is further complicated by the sheer size of the human genome, roug
 hly three billion base pairs long. Therefore, computation...\n\n\nSilba Do
 well, Daniel Hintz, Tyson Limato, Shad Sellers, and Milana Wolff (Universi
 ty of Wyoming); Nicholas Chia (Argonne National Laboratory (ANL)); and Liu
 dmila Mainzer (University of Wyoming)\n---------------------\nInvestigatin
 g Anomalies in Compute Clusters: An Unsupervised Learning Approach\n\nAs c
 ompute clusters used for running batch jobs continue to grow in scale and 
 complexity, the frequency of anomalies significantly increases. Timely det
 ection of anomalous events has become vital to maintain system efficiency 
 and availability. Our study presents an attention-based graph neural netwo
 ...\n\n\nYiyang Lu and Jie Ren (College of William & Mary); Yasir Alanazi,
  Ahmed Mohammed, Diana McSpadden, Laura Hild, Mark Jones, Wesley Moore, Ma
 lachi Schram, and Bryan Hess (Thomas Jefferson National Accelerator Facili
 ty); and Evgenia Smirni (College of William & Mary)\n---------------------
 \nA Methodology for Accelerating Variant Calling on GPU\n\nPointing out ge
 netic mutations is pivotal to enable clinicians to prescribe personalized 
 therapies to their patients. Genome Analysis ToolKit's HaplotypeCaller, re
 lying on the Pair Hidden Markov Model (PairHMM) algorithm, is one of the m
 ost used applications to identify such variants. However, the P...\n\n\nBe
 atrice Branchini, Alberto Zeni, and Marco Santambrogio (Polytechnic Univer
 sity of Milan)\n---------------------\nCharacterizing the Performance of t
 he Implicit Massively Parallel Particle-in-Cell iPIC3D Code\n\nOptimizing 
 iPIC3D, an implicit Particle-in-Cell (PIC) code, for large-scale 3D plasma
  simulations is crucial for space and astrophysical applications. This wor
 k focuses on characterizing iPIC3D’s communication efficiency through stra
 tegic measures like optimal node placement, communication and...\n\n\nJere
 my Johnathan Williams, Daniel Medeiros, Ivy Peng, and Stefano Markidis (KT
 H Royal Institute of Technology, Sweden)\n---------------------\nEarly Exp
 erience in Characterizing Training Large Language Models on Modern HPC Clu
 sters\n\nIn the realm of natural language processing, Large Language Model
 s (LLMs) have emerged as powerful tools for tasks such as language transla
 tion, text generation, and sentiment analysis. However, the immense parame
 ter size and complexity of LLMs present significant challenges. This work 
 delves into t...\n\n\nHao Qi, Liuyao Dai, Weicong Chen, and Xiaoyi Lu (Uni
 versity of California, Merced)\n---------------------\nEvaluating Performa
 nce Portability of GPU Programming Models\n\nMaintaining a single codebase
  that can achieve good performance on a range of accelerator-based superco
 mputing platforms is of extremely high value for productive scientific app
 lication development. However, the large quantity of programming models av
 ailable which claim to provide performance portab...\n\n\nJoshua H. Davis,
  Pranav Sivaraman, Isaac Minn, and Abhinav Bhatele (University of Maryland
 )\n---------------------\nTemporal Classification of Allocations for Reduc
 ed Memory Usage\n\nUmpire, a data and memory management API created at LLN
 L, provides memory pools which enable less expensive ways to allocate larg
 e amounts of memory in HPC environments. Memory pools commonly contain bot
 h allocations that persist for only a portion of the program (temporary) a
 nd those that persist f...\n\n\nKristi Belcher and David Beckingsale (Lawr
 ence Livermore National Laboratory (LLNL)), Sam Schwartz (University of Or
 egon), and Marty McFadden (Lawrence Livermore National Laboratory (LLNL))\
 n---------------------\nMPI Performance Analysis in Vlasiator:  Unraveling
  Communication Bottlenecks\n\nVlasiator is a popular and powerful massivel
 y parallel code for accurate magnetospheric and solar wind plasma simulati
 ons. This work provides an in-depth analysis of Vlasiator, focusing on MPI
  performance using the Integrated Performance Monitoring (IPM) tool. We sh
 ow that MPI non-blocking point-to-...\n\n\nJennifer Faj, Jeremy J. William
 s, and Ivy B. Peng (KTH Royal Institute of Technology, Sweden); Urs Ganse,
  Markus Battarbee, Yann Pfau-Kempf, Leo Kotipalo, and Minna Palmroth (Univ
 ersity of Helsinki); and Stefano Markidis (KTH Royal Institute of Technolo
 gy, Sweden)\n---------------------\nExploring the Impacts of Multiple I/O 
 Metrics in Identifying I/O Bottlenecks\n\nHPC systems, driven by the rise 
 of workloads with significant data requirements, face challenges in I/O pe
 rformance. To address this, a thorough I/O analysis is crucial to identify
  potential bottlenecks. However, the multitude of metrics makes it difficu
 lt to pinpoint the causes of low I/O performan...\n\n\nIzzet Yildirim (Ill
 inois Institute of Technology), Hariharan Devarajan (Lawrence Livermore Na
 tional Laboratory (LLNL)), Anthony Kougkas and Xian-He Sun (Illinois Insti
 tute of Technology), and Kathryn Mohror (Lawrence Livermore National Labor
 atory (LLNL))\n---------------------\nA Portable Software Environment for 
 Ultrahigh-Resolution ELM Development on GPUs\n\nA software tool, called SP
 EL, has been developed to port and optimize the ultrahigh-resolution E
 LM (uELM) code onto GPUs within a functional unit test framework. To promo
 te the widespread adoption of this approach for community-based uELM devel
 opment, this poster presents a portable software env...\n\n\nFranklin Eagl
 ebarger (Pellissippi State Community College) and Dali Wang (Oak Ridge Nat
 ional Laboratory (ORNL))\n---------------------\nPipit: Simplifying Analys
 is of Parallel Execution Traces\n\nPer-process per-thread traces enable in
 -depth analysis of parallel program execution to identify various kinds of
  performance issues. Often times, trace collection tools provide a graphic
 al tool to analyze the trace output. However, these GUI-based tools only s
 upport specific file formats, are diffi...\n\n\nAlexander Movsesyan, Rakri
 sh Dhakal, Aditya Ranjan, Jordan Marry, Onur Cankur, and Abhinav Bhatele (
 University of Maryland)\n---------------------\nTwo-Phase IO Enabling Larg
 e-Scale Performance Introspection\n\nNumerous sophisticated profiling and 
 visualization tools have been developed to enable programmers to expose se
 mantic information from their application components.  However, effective 
 and interactive exploration of the profiles of large-scale parallel progra
 ms remains a challenge due to the high I/...\n\n\nKe Fan and Sidharth Kuma
 r (University of Illinois, Chicago)\n---------------------\nQASM-to-HLS:  
 A Framework for Accelerating Quantum Circuit Emulation on High-Performance
  Reconfigurable Computers\n\nHigh-performance reconfigurable computers (HP
 RCs) make use of Field-Programmable Gate Arrays (FPGAs) for efficient emul
 ation of quantum algorithms. Generally, algorithm-specific architectures a
 re implemented on the FPGAs, and there is very little flexibility. Moreove
 r, mapping a quantum algorithm on...\n\n\nAnshul Maurya and Naveed Mahmud 
 (Florida Institute of Technology)\n---------------------\nToward Inductive
  Synthesis of Compiler Heuristics:  A Case Study with Register Allocation\
 n\nThere have been significant advances in machine learning-driven perform
 ance modeling in recent years. One key limitation of such approaches is th
 at their success depends, to a large degree, on the formulation of the out
 come or objective, which is typically done by human experts. In this paper
 , we pr...\n\n\nMohammad Ali and Apan Qasem (Texas State University)\n----
 -----------------\nScalable Fine-Grained Gang Scheduling for HPC Systems w
 ith Unreliable Broadcast Synchronization Mechanisms\n\nThe demand for inte
 ractivity on HPC systems is increasing, primarily driven by new HPC users 
 from the AI/ML research area. Traditional HPC users are accustomed to wait
 ing for job execution on a batch scheduling system while new users prefer 
 an interactive terminal such as Jupyter Notebook. To addres...\n\n\nHiroki
  Ohtsuji, Erika Hayashi, Reika Kinoshita, Masahiro Miwa, and Eiji Yoshida 
 (Fujitsu Ltd)\n---------------------\nRadium: Transparent Distributed Exec
 ution via Process Virtualization\n\nThe soaring demand for AI has led to a
  surge in specialized computation hardware, which poses challenges in shar
 ing resources through conventional virtualization methods among end users.
  Moreover, the extensive data required by AI often cannot be conveniently 
 co-located with the compute resources, r...\n\n\nAidan Cully, Husheng Zhou
 , Dusan Veljko, Hyojong Kim, Vance Miller, Joel Zambrano, and Mazhar Memon
  (VMware Inc)\n---------------------\nDelivering Digital Skills Across the
  Digital Divide:  Creating an Accessible On-Demand Self-Paced HPC Virtual 
 Training Lab\n\nThe training of new and existing HPC practitioners is reco
 gnized as a priority in the HPC community. Traditionally, delivering HPC S
 ystem Administrator training has been through physical face-to-face worksh
 ops, using cloud-based services or remote hardware to provide compute reso
 urces to emulate an ...\n\n\nBryan Johnston, Lara Timm, and Mabatho Hashat
 si (Council for Scientific and Industrial Research (CSIR), South Africa; C
 enter for High Performance Computing (CHPC), South Africa)\n--------------
 -------\nCharacterizing One-/Two-Sided Designs in OpenSHMEM Collectives\n\
 nOpenSHMEM is a widely used Partitioned Global Address Space (PGAS) progra
 mming model in the HPC community. The latest OpenSHMEM Specification v1.5 
 introduced the team concept and team-based collective communication that a
 re similar to the communicator and collective communication in the Message
  Pass...\n\n\nYuke Li (University of California, Merced); Yanfei Guo (Argo
 nne National Laboratory (ANL)); and Xiaoyi Lu (University of California, M
 erced)\n---------------------\nsys-sage:  A Fresh View on Dynamic Topologi
 es and Attributes of HPC Systems\n\nHPC systems are getting ever more powe
 rful, but this comes at the price of increasing system complexity. In orde
 r to use HPC systems efficiently, one has to be aware of their architectur
 al details, in particular details of their hardware topology, which is inc
 reasingly affected by dynamic runtime se...\n\n\nStepan Vanecek and Martin
  Schulz (Technical University of Munich)\n---------------------\nQuantum C
 omputing Case Study in Aerospace Field\n\nWith the demise of Moore’s empir
 ical law, we cannot expect a dramatic improvement in computer performance 
 in the future, but the need for supercomputer in JAXA for numerical simula
 tion and data processing etc. continues to rise. Until now, general-purpos
 e CPUs have been exclusively used, but t...\n\n\nNaoyuki Fujita and Yuusuk
 e Takemoto (Japan Aerospace Exploration Agency (JAXA)); Susumu Takatsu (TR
 C Inc.); and Yasuyuki Nishibayashi, Mitsuhiro Hashimoto, Ryo Sakurai, Mits
 uharu Takeori, Takahiro Yamamoto, and Jiayun Zhu (IBM Japan)\n------------
 ---------\nSimulating Application Agnostic Process Assignment for Graph Wo
 rkloads on Dragonfly and Fat Tree Topologies\n\nDistributed-memory graph a
 pplications are dominated by communication and synchronization overheads. 
 For such applications, the communication pattern comprises variable-siz
 ed data exchanges between process neighbors in a process graph topology, w
 hich unlike process grid for rectangular problems is...\n\n\nMd Nahid Newa
 z (Oakland University); Sayan Ghosh, Joshua Suetterlein, and Nathan Tallen
 t (Pacific Northwest National Laboratory (PNNL)); and Hua Ming (Oakland Un
 iversity)\n---------------------\nUnstructured Finite Element Models of Ca
 rdiac Electrophysiology Using a Deal.II-Based Library\n\nCardiovascular el
 ectrophysiology simulations often involve computationally expensive tasks 
 due to the inherent multiphysics complexity of the problems. Additionally,
  the use of complex patient-specific geometries and biophysically-detailed
  ionic models adds to the system's complexity. To numerically...\n\n\nLary
 ssa Abdala (University of North Carolina), Simone Rossi (Align Technology)
 , and David Wells and Boyce Griffith (University of North Carolina)\n-----
 ----------------\nScalable Algorithms for Analyzing Large Dynamic Networks
  Using CANDY\n\nAs the dynamic network’s topology undergoes temporal alter
 ations, associated graph properties must be updated to ensure their accu
 racy. Addressing this requirement efficiently in large dynamic networks le
 d to the proposal of a generic framework, CANDY (Cyberinfrastructure for A
 ccelerating In...\n\n\nAashish Pandey (University of North Texas), Arindam
  Khanda (Missouri University of Science and Technology), Sriram Srinivasan
  and Sudharshan Srinivasan (University of Oregon), S. M. Shovan (Missouri 
 University of Science and Technology), Farahnaz Hosseini (University of No
 rth Texas), Sajal Das (Missouri University of Science and Technology), Boy
 ana Norris (University of Oregon), and Sanjukta Bhowmick (University of No
 rth Texas)\n---------------------\nScaling K-Path Centrality Using Optimiz
 ed Distributed Data Structure\n\nK-Path centrality is based on the flow of
  information in a graph along simple paths of length at most K. This work 
 addresses the computational cost of estimating K-path centrality in large-
 scale graphs by introducing the random neighbor traversal graph (RaNT-Grap
 h). The distributed graph data struct...\n\n\nLance Fletcher (Texas A&M Un
 iversity, Lawrence Livermore National Laboratory (LLNL)); Trevor Steil (La
 wrence Livermore National Laboratory (LLNL)); and Roger Pearce (Lawrence L
 ivermore National Laboratory (LLNL), Texas A&M University)\n--------------
 -------\nOptimizing Uncertainty Quantification of Vision Transformers in D
 eep Learning on Novel AI Architectures\n\nDeep Learning (DL) methods have 
 shown substantial efficacy in computer vision (CV) and natural language pr
 ocessing (NLP). Despite their proficiency, the inconsistency in input data
  distributions can compromise prediction reliability. This study mitigates
  this issue by introducing uncertainty evaluat...\n\n\nErik Pautsch (Loyol
 a University, Chicago); John LI (University of California San Diego); Silv
 io Rizzi (Argonne National Laboratory (ANL)); George Thiruvathukal (Loyola
  University, Chicago); and Maria Pantoja (California Polytechnic State Uni
 versity, San Luis Obispo)\n---------------------\nOptimizing Workflow Perf
 ormance by Elucidating Semantic Data Flow\n\nDistributed scientific workfl
 ows are becoming data-intensive, and the data movement through storage sys
 tems often causes bottlenecks. Therefore, it is critical to understand
  data
  flow. Many scientific datasets incorporate domain semantics with formats 
 like HDF and NetCDF, enhancing the interpretabili...\n\n\nMeng Tang (Illin
 ois Institute of Technology), Nathan R. Tallent (Pacific Northwest Nationa
 l Laboratory (PNNL)), and Anthony Kougkas and Xian-He Sun (Illinois Instit
 ute of Technology)\n---------------------\nNeoRodinia: Evaluation of High
 -Level Parallel Programming Models and Compiler Transformation for GPU Off
 loading\n\nNeoRodinia is a comprehensive benchmark suite developed from Ro
 dinia, containing 23 real-world applications and 5 microbenchmarks. It add
 resses the limitations of Rodinia by optimizing OpenMP GPU offloading prog
 rams and introducing OpenACC variants. The evaluation involves thorough pe
 rformance asses...\n\n\nXinyao Yi, Anjia Wang, and Yonghong Yan (Universit
 y of North Carolina at Charlotte)\n---------------------\nThe Impact of Pr
 ocess Topology on RMA Programming Models: A Study on NERSC Perlmutter\n\n
 Remote Memory Access (RMA) provides an alternate mechanism for data moveme
 nt by separating communication from synchronization, exposing remote memor
 y access features via one-sided communication semantics to a global addres
 s space. Performance of the most popular asynchronous RMA interfaces like 
 MPI ...\n\n\nNikodemos Koutsoheras (Pacific Northwest National Laboratory 
 (PNNL)), Sayan Ghosh (University of Maryland), Nathan Tallent and Joshua S
 uetterlein (Pacific Northwest National Laboratory (PNNL)), and Abhinav Bha
 tele (University of Maryland)\n---------------------\nMinimizing Data Move
 ment Using Distant Futures\n\nScientific workflows execute a series of tas
 ks where each task may consume data as an input and produce data as an out
 put. Within these workflows, tasks often produce intermediate results that
  may serve as inputs to subsequent tasks within the workflow. These result
 s can vary in size and may need to...\n\n\nBarry Sly-Delgado and Douglas T
 hain (University of Notre Dame)\n---------------------\nSoftware Developme
 nt Case Study: The Acceleration of a Distributed Application Using GPUs\n\
 nWe present a practical approach for the acceleration of an industrial and
  scientific application using graphics processing units (GPUs). Our origin
 al application is a computational stratigraphy codebase that couples fluid
  flow and sediment deposition submodels. The application uses domain decom
 posit...\n\n\nMartin Kuhnel, Alex Loddoch, and Tao Sun (Chevron)\n--------
 -------------\nNeural Domain Decomposition for Variable Coefficient Poisso
 n Solvers\n\nThe computational bottleneck in many fluid simulations arises
  from solving the variable coefficient Poisson equation. To tackle this ch
 allenge, we propose a novel neural domain decomposition algorithm to accel
 erate its solution. Our approach hinges on two key ideas: first, using neu
 ral PDE solvers t...\n\n\nSebastian Barschkis and Zitong Li (University of
  California, Irvine); Hengjie Wang (Modular Inc); and Aparna Chandramowlis
 hwaran (University of California, Irvine)\n---------------------\nSimulati
 ng Quantum Systems with NWQ-Sim on HPC\n\nNWQ-Sim is a cutting-edge quantu
 m system simulation environment designed to run on classical multi-node, m
 ulti-CPU/GPU heterogeneous HPC systems. In this work, we provide a brief o
 verview of NWQ-Sim and its implementation in simulating quantum circuit a
 pplications, such as the transverse field Isin...\n\n\nIn-Saeng Suh (Oak R
 idge National Laboratory (ORNL)) and Ang Li (Pacific Northwest National La
 boratory (PNNL))\n---------------------\nImproving Memory Interfacing in H
 LS-Generated Accelerators with Custom Caches\n\nAccelerators based on reco
 nfigurable devices are becoming popular for data analytics in high perform
 ance computing and cloud computing systems. However, designing these accel
 erators is a hard problem. High-Level Synthesis tools can help by generati
 ng RTL designs from high-level languages, but they t...\n\n\nClaudio Baron
 e (Pacific Northwest National Laboratory (PNNL)), Giovanni Gozzi and Miche
 le Fiorito (Polytechnic University of Milan), Ankur Limaye and Antonino Tu
 meo (Pacific Northwest National Laboratory (PNNL)), and Fabrizio Ferrandi 
 (Polytechnic University of Milan)\n---------------------\nAres – Simulatin
 g Type Ia Supernovae on Heterogeneous HPC Architectures\n\nType Ia Superno
 vae are highly luminous thermonuclear explosions of white dwarfs which ser
 ve as standardizable distance markers for investigating the accelerating e
 xpansion of our Universe. Most existing supernovae simulation codes are on
 ly designed to run on homogeneous CPU-only systems and do not t...\n\n\nLa
 ndon Dyken (University of Illinois, Chicago); Alexander Holas (Heidelberg 
 Institute of Theoretical Studies); Mark Ivan Ugalino (University of Maryla
 nd); and Md Nageeb Bin Zaman (Louisiana State University)\n---------------
 ------\nAn Early Case Study with Multi-Tenancy Support in SPDK’s NVMe-over
 -Fabric Designs\n\nResource disaggregation is prevalent in datacenters sin
 ce it provides high resource utilization when compared to servers dedicate
 d to either compute, memory, or storage. NVMe-over-Fabrics (NVMe-oF) is th
 e standardized protocol used for accessing disaggregated storage over the 
 network. Currently, the...\n\n\nDarren Ng, Charles Parkinson, Andrew Lin, 
 Arjun Kashyap, and Xiaoyi Lu (University of California, Merced)\n---------
 ------------\nAccelerating Actor-Based Distributed Triangle Counting\n\nTr
 iangle counting is a cornerstone operation in large graph analytics. It ha
 s been a challenging problem historically, owing to the irregular and dyna
 mic nature of the algorithm, which not only inhibits compile-time optimiza
 tions, but also requires runtime optimizations such as message aggregation
  a...\n\n\nAniruddha Mysore, Kaushik Ravichandran, Youssef Elmougy, Akihir
 o Hayashi, and Vivek Sarkar (Georgia Institute of Technology)\n-----------
 ----------\nScalable Reduced-Order Modeling for Three-Dimensional Turbulen
 t Flow\n\nA neural network-based reduced order modeling method for three-d
 imensional turbulent flow simulation is proposed. This method was implemen
 ted as scalable distributed learning on Fugaku. Our method constitutes
  a dimensional reduction using a convolutional-autoencoder-like neural net
 work and the t...\n\n\nKazuto Ando and Rahul Bale (RIKEN Center for Comput
 ational Science (R-CCS); Kobe University, Japan); Akiyoshi Kuroda (RIKEN C
 enter for Computational Science (R-CCS)); and Makoto Tsubokura (RIKEN Cent
 er for Computational Science (R-CCS); Kobe University, Japan)\n-----------
 ----------\nTowards Enabling Digital Twins Capabilities for a Cloud Chambe
 r\n\nParticle-resolved direct numerical simulations (PR-DNS), which resolv
 e not only the smallest turbulent eddies but also track the development an
 d motion of individual particles, are arguably an essential tool for explo
 ring aerosol-cloud-turbulence interactions at the fundamental level. For i
 nstance, ...\n\n\nJiaqi Yang (Emory University); Mohammad Atif, Vanessa Lo
 pez-Marrero, Tao Zhang, Kwang Min Yu, Meifeng Lin, Lingda Li, Fan Yang, an
 d Yangang Liu (Brookhaven National Laboratory); and Abdullahalmut Sharfudd
 in and Foluso Ladeinde (Stony Brook University)\n---------------------\nEx
 ploring Green Cryptographic Hashing Algorithms for Eco-Friendly Blockchain
 s\n\nCryptographic hash functions are fundamental for ensuring data securi
 ty and integrity in all consensus algorithms in blockchains. While SHA256 
 has been widely used in many blockchain implementations, its throughput an
 d efficiency have led to the rise of a modern lightweight and speed
  superior i
 mplementa...\n\n\nAahad Abubaker (DePaul University); Tanmay Anand, Sonal 
 Gaikwad, Mahad Haider, Jacklyn McAninch, and Lan Nguyen (Illinois Institut
 e of Technology); Alexandru Orhean (DePaul University); and Ioan Raicu (Il
 linois Institute of Technology)\n---------------------\nEE-HPC – A Framewo
 rk for Energy Efficient HPC System Operation\n\nThe energy consumption of 
 HPC data centers is a decisive factor in the procurement and operation of 
 the systems. EE-HPC achieves a more efficient energy use of HPC systems by
  targeted job-specific control and optimization of the hardware. The proje
 ct started at the end of 2022 and builds on the existing st...\n\n\nJan
  Eitzinger
  and Thomas Gruber (Friedrich-Alexander University, Erlangen-Nuremberg; Er
 langen National High Performance Computing Center)\n---------------------\
 nBalancing Latency and Throughput of Distributed Inference by Interleaved 
 Parallelism\n\nDistributed large model inference is still in a dilemma whe
 n balancing latency and throughput, or rather cost and effect. Tensor par
 allelism, while capable of optimizing latency, entails a substantial expen
 diture. Conversely, pipeline parallelism excels in throughput but falls sh
 ort in minimizing e...\n\n\nJiangsu Du, Jinhui Wei, and Jiazhi Jiang (Sun 
 Yat-sen University, Guangzhou); Shenggan Cheng (National University of Sin
 gapore); and Zhiguang Chen, Dan Huang, and Yutong Lu (Sun Yat-sen Universi
 ty, Guangzhou)\n---------------------\nIntroducing Prefetching and Data Co
 mpression to Accelerate Checkpointing for Inverse Seismic Problems\n\nReve
 rse Time Migration (RTM) poses substantial computational challenges,
  demand
 ing large memory and extended processing times. Our RTM implementation pro
 cesses three-dimensional fields on multiple NVIDIA GPUs using the Revolve 
 algorithm for checkpointing. However, transferring data between the host a
 ...\n\n\nSandro Rigo, Thiago Maltempi, Marcio Pereira, and Hervé Yviquel (
 University of Campinas); Jessé Costa (Pará Federal University); and Guido 
 Araujo (University of Campinas)\n\nTag: Artificial Intelligence/Machine Le
 arning, Architecture and Networks, Heterogeneous Computing, I/O and File S
 ystems, Performance Measurement, Modeling, and Tools, Post-Moore Computing
 , Programming Frameworks and System Software, Quantum Computing\n\nRegistr
 ation Category: Tech Program Reg Pass, Exhibits Reg Pass
END:VEVENT
END:VCALENDAR