PASC Posters

Achieving Performance Portability on ECMWF’s Open-Source Operational Wave Model ecWAM Using Source-To-Source Translation and GPU-Aware Data-Structures

Adapting production numerical weather prediction (NWP) codes for GPU execution can be quite challenging. These codes have typically been developed and optimised for multi-core CPUs and are continually updated by domain scientists. Additional complexity arises from the vast size of the codebases, the growing diversity of platform architectures with native and derived programming models, and the vendor-specific modifications needed to achieve optimal performance. At ECMWF, we manage this complexity using Loki, our source-to-source translation toolchain, and FIELD API, a GPU-aware data-structures library. In this poster we present how these two tools have been used to achieve performance portability on ECMWF’s operational wave model, ecWAM. Starting from the original CPU-optimised Fortran code, we present different GPU-capable variants that can be generated via Loki. The variants differ in their optimisation strategies and employed programming models. As one of the highlights, Loki is capable of translating the original Fortran kernels to C-style kernels such as CUDA for NVIDIA GPUs and HIP for AMD GPUs. With this, we present not only performance across multiple architectures but also the potential performance benefits resulting from translation to native kernel languages.

Author(s): Michael Staneker (ECMWF), and Ahmad Nawab (ECMWF)

Domain: Climate, Weather and Earth Sciences


Advances in HPC-Oriented Refactoring Techniques with Coccinelle

Our collaboration around the Coccinelle tool aims at streamlining the maintenance of large software projects in HPC. We are developing techniques to modify large swathes of C/C++ code, e.g. to introduce GPU support, replace one API with another, introduce modern C++ according to guidelines, or change the parallelism model. Our aims are minimizing code rewriting, preserving the value and extending the lifetime of well-written old code, and saving energy by making code more efficient. This poster summarizes our recent efforts.

Author(s): Michele Martone (Leibniz Supercomputing Centre), Julia Lawall (INRIA), and Victor Gambier (INRIA)

Domain: Computational Methods and Applied Mathematics


Bit-IF: An Incremental Sparse Tensor Format for Maximizing Efficiency in Tensor-Vector Multiplications

This poster presents Bit-IF (Incremental Sparse Fibers with Bit Encoding), a novel sparse tensor format designed to reduce the storage requirements of large tensors and improve the efficiency of tensor operations, particularly tensor-vector multiplication (TVM). As datasets in many scientific fields increase in dimensionality, size, and sparsity, efficient storage and computation methods become essential. Current state-of-the-art sparse tensor formats achieve memory-efficient representations but often require extensive indexing or pre-computation, limiting flexibility and efficiency. Unlike existing formats, Bit-IF records only index increments, encoded in a compact bit array. This mode-independent approach allows arbitrary index traversal during the TVM. Bit-IF’s design significantly reduces memory overhead, improves data locality, and eliminates the need for multiple tensor copies or mode-specific preprocessing before performing a TVM. Our analysis and initial comparative studies show that Bit-IF reduces memory consumption and computation time compared to COO-based approaches. Its mode independence and incremental indexing allow for flexible traversal orders, enabling the use of space-filling curves such as Z-curves or Hilbert curves to improve data locality and scalability. We plan to extend the applicability of this method to other tensor operations, such as tensor-matrix and Khatri-Rao products.
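
The abstract does not spell out the exact layout, but the core idea of replacing explicit coordinates with increments recorded in a compact bit array, reconstructing multi-indices incrementally during a single traversal, can be sketched as follows. This is a deliberately simplified, hypothetical interpretation in NumPy (a plain bitmap over the linearized index space), not the actual Bit-IF encoding:

```python
import numpy as np

def encode(dense):
    """Encode a dense tensor as (shape, packed bitmap, nonzero values)."""
    flat = dense.reshape(-1)
    bitmap = flat != 0                      # one bit per linearized position
    return dense.shape, np.packbits(bitmap), flat[bitmap]

def tvm(shape, packed, values, mode, vector):
    """Tensor-vector multiplication contracting the given mode."""
    n = int(np.prod(shape))
    bitmap = np.unpackbits(packed, count=n).astype(bool)
    out = np.zeros(shape[:mode] + shape[mode + 1:])
    idx = [0] * len(shape)                  # multi-index, maintained incrementally
    values = iter(values)
    for pos in range(n):
        if bitmap[pos]:
            val = next(values)
            rest = tuple(idx[:mode] + idx[mode + 1:])
            out[rest] += val * vector[idx[mode]]
        for d in range(len(shape) - 1, -1, -1):   # odometer-style increment
            idx[d] += 1
            if idx[d] < shape[d]:
                break
            idx[d] = 0
    return out

x = np.zeros((3, 4, 5)); x[0, 1, 2] = 2.0; x[2, 3, 1] = -1.0
shape, packed, vals = encode(x)
v = np.arange(4, dtype=float)
assert np.allclose(tvm(shape, packed, vals, 1, v),
                   np.tensordot(x, v, axes=(1, 0)))
```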

Author(s): Xiaohe Niu (mcs Software AG), Georg Meyer (Friedrich-Alexander-Universität Erlangen-Nürnberg, Università della Svizzera italiana), Dimosthenis Pasadakis (Università della Svizzera italiana, Panua Technologies), Albert-Jan N. Yzelman (Huawei Zurich Research Center), and Olaf Schenk (Università della Svizzera italiana, Panua Technologies)

Domain: Computational Methods and Applied Mathematics


Calculation of Spin Hole Qubit Eigenstates with GPU-Accelerated Rayleigh–Chebyshev Subspace Iteration Method

Quantum computers leverage quantum mechanical effects to solve complex problems exponentially faster than classical computers. Their building blocks, or ‘qubits’, can be realized with different technologies. Silicon spin hole qubits are among the most promising, thanks to their long coherence times, potentially fast manipulation, and mature fabrication processes, as they can be integrated within conventional CMOS transistors. Nevertheless, the performance of spin hole qubits is still far from optimal. Hence, the availability of advanced modeling platforms is key to capturing qubits’ complex physics and optimizing this technology. The standard approach to simulating spin hole qubits consists of self-consistently solving the Schrödinger and Poisson equations to produce these systems’ ground-state energies and charge distributions. The core operation is solving sparse eigenvalue problems for the smallest eigenpairs. For this purpose, we developed a GPU-accelerated Rayleigh–Chebyshev subspace iteration solver. Our solver relies on custom CPU/GPU kernels written in C++/CUDA and on different CUDA library calls. Performance evaluations were conducted on the Alps supercomputer and its Grace Hopper superchips. Our implementation overcomes previous runtime limitations, achieving a speed-up of ~17x on a single GPU over the previous CPU Krylov approach and enabling high-resolution simulations of multi-qubit structures.
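
As a point of reference for the method, the sketch below shows textbook Chebyshev-filtered subspace iteration with a Rayleigh-Ritz step in NumPy, applied to a small dense symmetric stand-in matrix. The actual solver targets large sparse Hamiltonians with custom C++/CUDA kernels, and the Rayleigh–Chebyshev variant differs in its details (e.g., adaptive filter bounds), so treat this purely as an illustration of the underlying iteration:

```python
import numpy as np

def cheb_filter(H, X, deg, a, b):
    """Apply a degree-`deg` Chebyshev polynomial in H to the block X.
    The polynomial stays bounded on the unwanted interval [a, b] and grows
    rapidly below it, amplifying the lowest eigencomponents of X."""
    e, c = (b - a) / 2.0, (b + a) / 2.0
    Y = (H @ X - c * X) / e
    for _ in range(2, deg + 1):             # three-term Chebyshev recurrence
        Y_next = 2.0 * (H @ Y - c * Y) / e - X
        X, Y = Y, Y_next
    return Y

def smallest_eigenpairs(H, nev, deg=10, iters=30):
    n = H.shape[0]
    rng = np.random.default_rng(0)
    X = np.linalg.qr(rng.standard_normal((n, nev + 5)))[0]  # oversampled block
    lam_max = np.linalg.eigvalsh(H)[-1]     # in practice: a cheap spectral bound
    for _ in range(iters):
        ritz = np.sort(np.linalg.eigvalsh(X.T @ H @ X))
        X = cheb_filter(H, X, deg, ritz[nev - 1], lam_max)  # damp the unwanted part
        X, _ = np.linalg.qr(X)              # re-orthonormalize the block
        w, V = np.linalg.eigh(X.T @ H @ X)  # Rayleigh-Ritz rotation
        X = X @ V
    return w[:nev], X[:, :nev]

rng = np.random.default_rng(1)
R = rng.standard_normal((200, 200))
H = np.diag(np.arange(1.0, 201.0)) + 0.01 * (R + R.T)
w, _ = smallest_eigenpairs(H, nev=4)
print(w)   # close to the four smallest eigenvalues of H
```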

Author(s): Alexander Maeder (ETH Zurich), Ilan Bouquet (ETH Zurich), Vincent Maillou (ETH Zurich), Alexandros Nikolaos Ziogas (ETH Zurich), Chris Anderson (UCLA), and Mathieu Luisier (ETH Zurich)

Domain: Physics


Characterizing Genomic Protein Complex Binding Using Shared Motif Enrichment

Many proteins bind to genomic DNA to regulate critical processes such as transcription and replication. These proteins come together at specific locations to form complexes that perform necessary function(s) at each locus. One primary way for a protein to achieve specificity in its genomic binding locations is by recognizing specific DNA sequences (motifs). Motifs can be identified through chromatin immunoprecipitation (ChIP). One aspect of ChIP to note is that proteins that indirectly bind to the genome (i.e., bind to a cognate factor that itself binds to the genome) will display enrichment at the bound locations of their cognate factors. Making use of this aspect of ChIP, we have developed two software tools to be used in conjunction to identify distinct configurations of proteins on the genome: Clustered Alignment of Motif Profiles (CLAMP) and Decomposition of Protein Occupancy Profiles (DPOP). Together, these software tools allow us to decompose ChIP signal for proteins at bound motifs into distinct protein complexes based on protein composition, binding locations, and the ChIP pattern relative to each motif.

Author(s): Justin Cha (Cornell University)

Domain: Life Sciences


The Chatbot Update System (CUS): An Effective Interface to Train AI

With the rise of AI in many disciplines and the proliferation of chatbots in many applications, various chatbots need training to respond properly to human users. In this presentation, I report on a chatbot training interface that I developed named CUS, the Chatbot Update System. CUS was developed for use with a cybersecurity playable case study that immerses users in an experience akin to working at a cybersecurity firm. A chatbot plays the users’ coworkers in the simulation, and it needs training to recognize the meaning of various user inputs. CUS successfully provided a convenient and efficient way to supply appropriate responses to user input. With this presentation, I show the most recent version of CUS, which includes new features: gamified elements, small sets of corrections, a mobile-friendly interface, and an arbitration feature.

Author(s): Stephen Francis (Brigham Young University)

Domain: Computational Methods and Applied Mathematics


Code-Generation of Highly Efficient Finite Element Operations Using the MLIR Compiler Infrastructure

The immense financial and environmental cost of high performance computing (HPC) infrastructure demands highly efficient and hardware-specific software. On modern exascale hardware, the development of efficient kernels requires addressing both hardware heterogeneity and the memory bandwidth bottleneck. A popular approach to overcoming the latter is to maximize the floating-point operations (FLOPs) performed per byte of memory traffic, namely the arithmetic intensity of the numerical scheme. In computational fluid dynamics (CFD), higher-order numerical schemes such as the spectral/hp element method (SEM) have been found to have a desirably high and tunable arithmetic intensity, as the higher-order shape functions allow each element to contribute more resolved approximations of the dynamics being simulated. In this work we present the initial stages of the development of NektarIR, a compiler for the Nektar++ spectral/hp element framework. Built using the MLIR compiler infrastructure, NektarIR aims to facilitate the generation of highly efficient, hardware-specific implementations of finite element evaluations for Nektar++.
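
For reference, the quantity being tuned is the arithmetic intensity of the standard roofline model (general definitions, not specific to NektarIR):

```latex
I = \frac{W}{Q}\ \left[\frac{\text{FLOPs}}{\text{byte}}\right],
\qquad
P_{\text{attainable}} = \min\bigl(P_{\text{peak}},\; I \cdot B_{\text{mem}}\bigr),
```

where W is the floating-point work, Q the data moved to and from memory, B_mem the memory bandwidth, and P_peak the peak compute throughput; raising the polynomial order in SEM raises I and thus the attainable fraction of peak on bandwidth-bound hardware.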

Author(s): Edward Erasmie-Jones (King’s College London), Giacomo Castiglioni (ETH Zurich / CSCS), and David Moxey (King’s College London)

Domain: Engineering


Computational Prediction of Material Properties for New and Improved Super Alloys

The present study focuses on the CALculation of PHAse Diagrams (CALPHAD) method for Ni-based superalloys. This methodology can computationally predict the performance of different superalloy compositions while accounting for the thermodynamic interaction of multiple phases. Since the CALPHAD technique depends on empirically derived coefficients that correlate with the behaviour of compounds at the subatomic level, a lack of coefficients directly limits the number of compounds we can discover and analyse. In the present study, using Machine Learning (ML) approaches, we create surrogate models and test multi-class classifiers for predicting the phases of compounds against CALPHAD simulations. These models are expected to mimic the analytical solutions obtained by CALPHAD and could potentially be used to generate parameters for new compounds or alloys when data is not available in the literature. In addition, we investigate phase diagrams of binary and ternary Ni superalloy systems. Using the open-source pycalphad package, the energy, entropy, enthalpy, and heat capacity of mixtures in the systems of interest are investigated. In addition, a module is added to calculate the molar volume and thermal expansion coefficient for binary phases in Ni superalloys.
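
As an illustration of the kind of pycalphad queries involved, the sketch below computes single-phase thermodynamic properties and multi-phase equilibria; the database file name ni_al.tdb is a placeholder for whichever assessed Ni-system TDB the study uses, and API details may vary across pycalphad versions:

```python
import numpy as np
from pycalphad import Database, calculate, equilibrium, variables as v

dbf = Database('ni_al.tdb')               # TDB holding the empirical coefficients
phases = list(dbf.phases.keys())

# Single-phase molar Gibbs energy over a temperature range; outputs such as
# 'HM', 'SM', and 'CPM' give molar enthalpy, entropy, and heat capacity.
res = calculate(dbf, ['NI', 'AL', 'VA'], 'LIQUID',
                T=np.arange(300, 2000, 10), P=101325, output='GM')

# Multi-phase equilibria across temperature at fixed composition: the raw
# material both for binary phase diagrams and for phase labels against which
# ML classifiers can be trained.
eq = equilibrium(dbf, ['NI', 'AL', 'VA'], phases,
                 {v.X('AL'): 0.25, v.T: (300, 2000, 25), v.P: 101325})
print(eq.Phase.values.squeeze())          # stable phases at each temperature
```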

Author(s): Masumeh Gholamisheeri (STFC, Hartree Centre), Lara Kabalan (STFC, Hartree Centre), Zeynep Sumer (IBM Research), Harry Durnberger (STFC, Hartree Centre), Viktor Zolyomi (STFC, Hartree Centre), and Tim Pawell (STFC, Hartree Centre)

Domain: Chemistry and Materials


A Deep Dive into Deep Learning Frameworks for Protein Structure Prediction: Developing and Evaluating Classes of Biomolecular Complexes

Accurately predicting the structure of a protein has been a long-standing and extremely challenging problem in biology. In recent years, the rapid evolution and adoption of artificial intelligence (AI) in scientific domains, including biology, have made it possible to predict protein structures with deep learning (DL) frameworks at an accuracy rivaling that of experimental crystal structures. These advances are key to understanding protein function and play a central role in accelerating the drug discovery process. This work compares the performance of state-of-the-art protein structure prediction models across a predefined set of 7 challenging biomolecule categories evaluated using AlphaFold2, AF2Complex, AlphaFold-Multimer, AlphaFold3, Chai-1, and Boltz-1. The results compare the accuracy and performance of each method when applied to classes of challenging protein-protein complexes. The evaluation was conducted using computational resources at Oak Ridge National Laboratory (ORNL), including Frontier, ORNL’s exascale supercomputer. The data set constructed, the resulting biomolecular complex type classification, and the comprehensive set of guidelines derived can aid future experiment design. The results from this study also provide a deeper understanding of the advantages and limitations of each model when applied to specific classes of biomolecular complexes as opposed to individual complexes.

Author(s): Verónica G. Melesse Vergara (Oak Ridge National Laboratory), Érica Texeira Prates (Oak Ridge National Laboratory), Manesh Shah (Oak Ridge National Laboratory), and Dan Jacobson (Oak Ridge National Laboratory)

Domain: Life Sciences


Developing a Portable Implementation for the Next-Generation ECMWF Model

We present the development of a portable high-level Python implementation for the next-generation ECMWF global dynamical core designed to facilitate simulations at extreme numerical resolutions. This new model framework, called the Portable Model for Multi-Scale Atmospheric Prediction (PMAP), is an advancement of the Finite-Volume Module (FVM) originally developed at ECMWF using Fortran. The key aspect of the global PMAP is its implementation utilizing the latest version of the GridTools for Python (GT4Py) domain-specific library, named gt4py.next. This library is tailored to the applied conservative finite-volume discretization methods that support, among others, the operational octahedral grid at ECMWF. The gt4py.next library itself is being co-developed with various Swiss partners and is under continuous extension, optimization, and refinement alongside the PMAP framework. The model’s distributed multi-node configuration employs the Generic exascale-ready library for halo-exchange operations (GHEX). We present recent model validation results and report on performance, portability, and scalability across selected CPU- and GPU-based supercomputers.

Author(s): Sara Faghih-Naini (ECMWF), Till Ehrengruber (ETH Zurich / CSCS), Stefano Ubbiali (ETH Zurich), Lukas Papritz (ETH Zurich, ECMWF), and Christian Kühnlein (ECMWF)

Domain: Climate, Weather and Earth Sciences


Efficient Execution of Multiphysics Simulation Assembly Using Kokkos::Graph

The computation of elemental system matrices and right-hand-side vectors and their assembly into sparse linear algebra data structures is a key component of many multiphysics simulation codes. When the assembly involves multiple types of governing equations that might also change by subdomain (heterogeneous coefficients, different source terms, different types of boundary conditions, …), using a single computational kernel can lead to inefficient performance and memory footprint. Therefore, one may want to specialize the kernels, thus generating many kernels that need to be created and efficiently scheduled, potentially many times (e.g., iterative solver). These asynchronous kernels must also observe dependencies, thus naturally leading to a graph-based implementation. In this poster, we will present how we realised such an implementation in an in-house multiphysics FEM code using Kokkos::Graph. In particular, we will discuss how we designed a polymorphic hierarchy of functors for performing the assembly on device and how we map these functors to nodes in the graph, while avoiding polymorphic calls on device. We will illustrate the proposed approach and evaluate the performance in the context of a computational electromagnetism simulation relevant to diffraction gratings.

Author(s): Maarten Arnst (University of Liège), and Romin Tomasetti (University of Liège)

Domain: Computational Methods and Applied Mathematics


Electronic Structure Calculations Powered by DLA-Future

DLA-Future implements efficient multicore and GPU eigenvalue solvers, designed around C++’s std::execution concurrency proposal (P2300) as implemented in pika. DLA-Future takes advantage of asynchronous task-based programming and is designed to exploit modern heterogeneous architectures. DLA-Future also provides a C API, and a Fortran API shipped separately as DLA-Future-Fortran. These APIs allow DLA-Future to be used as a drop-in replacement for ScaLAPACK eigensolvers and have been integrated into popular electronic structure codes: CP2K and SIRIUS. DLA-Future-Fortran is also integrated into ELSI, a unified interface connecting electronic structure codes to high-performance eigensolvers, and is therefore usable in SIESTA and FHI-aims. We present weak scaling results for DLA-Future, up to matrix sizes of 500k x 500k, and results for realistic and large-scale electronic structure calculations with popular codes in which DLA-Future is integrated: CP2K and SIRIUS. Results are presented for modern hardware (NVIDIA GH200, AMD MI250x) available on the Alps research infrastructure at the Swiss National Supercomputing Centre (CSCS).

Author(s): John Biddiscombe (ETH Zurich / CSCS), Alberto Invernizzi (ETH Zurich / CSCS), Rocco Meli (ETH Zurich / CSCS), Auriane Reverdell (ETH Zurich / CSCS), Mikael Simberg (ETH Zurich / CSCS), and Raffaele Solcà (ETH Zurich / CSCS)

Domain: Chemistry and Materials


Enabling Lattice QCD Normalizing Flows in HPC Infrastructures

The Horizon Europe project interTwin aims at developing a prototype of a multidisciplinary Digital Twin Engine applicable across a whole spectrum of scientific disciplines: High Energy Physics (HEP), environment, climate, etc. As part of this effort, we explore the extent to which Machine Learning (ML) methods can speed up Lattice Gauge Theory simulations in challenging areas of the parameter space where Monte Carlo methods suffer from severe critical slowing down. The overall goal is to progress towards designing the digital twin of a HEP detector, where Lattice QCD simulations could provide future realistic simulations of the Standard Model. We are exploiting the advantages of the tools developed in the interTwin project, notably itwinai, to scale up and support the deployment of our simulations on HPC systems, while also enabling several new code features. The itwinai toolkit provides functionality for distributed machine learning on HPC, supporting distributed frameworks (DeepSpeed, Horovod, and PyTorch DistributedDataParallel) that implement different communication protocols across GPUs and suit different infrastructures. Furthermore, itwinai offers a profiling feature based on the PyTorch profiling backend, enabling the identification of communication and computation shares. This profiler will enable the identification of bottlenecks, and hence optimization of the code to improve performance.

Author(s): Matteo Bunino (CERN), Isabel Campos Plasencia (IFCA/CSIC), Javad Komijani (ETH Zurich), Marina Marinkovic (ETH Zurich), Gaurav Sinha Ray (IFCA/CSIC), Rakesh Sarma (Forschungszentrum Jülich), and Jarl Sondre Saether (CERN)

Domain: Computational Methods and Applied Mathematics


Enhancing Neural Network Robustness with Adversarial Images Generated by Swarm Algorithms

The effectiveness of neural networks is often compromised by misclassification vulnerabilities when exposed to adversarial attacks or biased data. This research explores a novel method for improving the robustness of neural networks through the generation of adversarial images using a swarm algorithm. This data can be generated with our custom-designed software, “Adversarial Observation”. The proposed approach utilizes the collaborative nature of swarm intelligence to generate realistic, yet misleading, images that mimic the characteristics of true data. These synthetic adversarial images are then incorporated into the training set of the neural network to enhance its ability to recognize and correctly classify data under adversarial conditions. The study evaluates the impact of this method on the neural network’s performance in the face of misclassification, providing insights into how adversarial training can protect against current data vulnerabilities and improve overall robustness. Through experiments and analysis, the research demonstrates that integrating swarm-based adversarial image generation can significantly contribute to the security and reliability of deep learning models, particularly in domains sensitive to data manipulation.
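
The poster’s Adversarial Observation tool targets real neural networks; the sketch below only illustrates the swarm mechanics on a toy logistic classifier, where particles search for an L-infinity-bounded perturbation that flips the prediction (all names and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps = 20, 0.3                                  # input dim, L-inf budget
w = rng.standard_normal(d)
model = lambda x: 1.0 / (1.0 + np.exp(-(x @ w)))  # P(class 1)
x0 = 0.1 * w / np.linalg.norm(w)                  # a clean class-1 input
assert model(x0) > 0.5

n_particles, iters = 30, 100
pos = rng.uniform(-eps, eps, (n_particles, d))    # candidate perturbations
vel = np.zeros_like(pos)
fitness = lambda P: model(x0 + P)                 # lower = more adversarial
pbest, pbest_f = pos.copy(), fitness(pos)
gbest = pbest[np.argmin(pbest_f)]

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, 1))
    # Standard PSO velocity update: inertia + cognitive + social terms.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -eps, eps)           # stay inside the budget
    f = fitness(pos)
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)]

print(model(x0), model(x0 + gbest))  # class-1 confidence before vs. after
# x0 + gbest can then be added to the training set as an adversarial example.
```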

Author(s): Jamil Gafur (The University of Iowa, National Renewable Energy Laboratory)

Domain: Computational Methods and Applied Mathematics


Enhancing Productivity and Performance Analysis on EuroHPC Systems

Large-scale EuroHPC High-Performance Computing (HPC) systems, such as Leonardo and LUMI, present significant challenges for developers. A key difficulty is adapting their software to new architectures, accelerators, and different compiler options in order to fully leverage the available resources. As a result, developers often spend a substantial amount of time manually running performance and scalability tests to ensure changes do not degrade performance or compromise portability across platforms. To streamline these repetitive, manual steps that hinder productivity, we propose an automated framework for integrating HPC performance testing and benchmarking into a Continuous Integration (CI) pipeline. By extending the ReFrame framework, a widely used HPC regression and benchmarking tool, our approach automates the monitoring of application performance. This enables both strong and weak scaling benchmarks and performance portability tests across multiple architectures. The system collects metrics, generates visual reports, and alerts developers to any performance regressions. With these capabilities, HPC developers can focus on scientific advancements rather than repetitive, time-consuming testing and benchmarking. We demonstrate this workflow using the ECsim space-plasma physics application, illustrating how integrating DevOps practices and ReFrame-driven automation can streamline HPC software development.
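
A minimal example of the kind of ReFrame test such a CI pipeline would run automatically is sketched below, assuming a recent ReFrame version with the sanity/performance builtins; the executable name, output format, and regexes are placeholders, not the actual ECsim harness:

```python
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class EcsimPerfCheck(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']                 # run on every configured system
    valid_prog_environs = ['*']
    executable = './ecsim'                # placeholder application binary
    num_tasks = 64

    @sanity_function
    def assert_finished(self):
        # Correctness gate: fail the pipeline if the run did not complete.
        return sn.assert_found(r'simulation completed', self.stdout)

    @performance_function('s')
    def time_per_step(self):
        # Metric tracked across commits; regressions trigger CI alerts.
        return sn.extractsingle(r'time/step:\s+(\S+)', self.stdout, 1, float)
```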

Author(s): Michael Redenti (CINECA), and Nitin Shukla (CINECA)

Domain: Physics


Estimation of Calving Law Parameters from Satellite Data

Capturing the calving front motion is critical for simulations of ice shelves and tidewater glaciers. Multiple physical processes, including sliding, water pressure, and failure, need to be understood to accurately model the front. Calving is particularly challenging due to its discontinuous nature, and modellers require more tools to examine it. A common technique for capturing the front in ice simulations is the level-set method: the front is represented implicitly by the zero isoline of a function. The movement of the front is described by a Hamilton-Jacobi PDE in which the velocity of the front comprises two components: the horizontal velocity of the ice sheet and the ablation rate, i.e., the sum of the melting and calving rates. We are developing scalable simulation code to solve the level-set problem and to estimate parameters of calving laws from satellite images using numerical optimization. The method is adaptable to different types of calving laws as well as other interface-capturing problems, and it handles temporal sparsity of observations and coupling with an ice sheet model. The code is sufficiently scalable for large-scale, high-resolution models of continental ice sheets.
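
In one common sign convention (an assumption here, since the abstract does not fix notation), the Hamilton-Jacobi equation described above reads

```latex
\frac{\partial \varphi}{\partial t} + \mathbf{v} \cdot \nabla \varphi
  = \dot{a}\,\lVert \nabla \varphi \rVert,
\qquad
\dot{a} = \dot{m} + \dot{c},
```

where the calving front is the zero isoline of \varphi, \mathbf{v} is the horizontal ice velocity, and the ablation rate \dot{a} is the sum of the melting rate \dot{m} and the calving rate \dot{c} prescribed by the calving law whose parameters are estimated.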

Author(s): Daniel Abele (German Aerospace Center, Technical University of Munich), Achim Basermann (German Aerospace Center), Martin Burger (DESY, University of Hamburg), Hans-Joachim Bungartz (Technical University of Munich), and Angelika Humbert (Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung; University of Bremen)

Domain: Computational Methods and Applied Mathematics


Flux-Form Semi-Lagrangian (FFSL) Schemes on a Triangular Mesh

Flux-Form Semi-Lagrangian (FFSL) schemes for the solution of hyperbolic partial differential equations are popular since they allow for high CFL numbers. There are applications in plasma physics and in weather and climate simulations. For the latter, icosahedral meshes on a spherical domain have recently come into use; an example is the ICON code. For our investigation we simplify this type of mesh to periodic two-dimensional equilateral triangular meshes. The dispersion relation of Miura’s scheme is evaluated numerically for different CFL numbers and wind directions. The type of mesh gives a 60-degree periodicity and a 30-degree symmetry of the wave propagation properties. From the dissipation of the scheme, the CFL-number limit for stability is determined, where it exists for particular variants of the scheme. The dispersion relation is further processed into a wave-packet propagation analysis, which quantifies exponential dissipation for this class of schemes. A system response of the discretisation is determined in order to evaluate hybrid schemes of spatially varying order. It confirms good wave propagation properties despite the discontinuities due to hybridisation.

Author(s): Andreas Jocksch (ETH Zurich / CSCS), Nina Burgdorfer (MeteoSwiss), Daniel Reinert (DWD), Christoph Mueller (MeteoSwiss), and David Strassmann (ETH Zurich)

Domain: Computational Methods and Applied Mathematics


Fostering the Wider Adoption of High-Performance Computing by UK-Based Arts and Humanities Researchers via National Training and Community-Building Initiatives

The integration of computing innovations into Arts and Humanities (A&H) research is crucial. However, high-performance computing (HPC) is not widely used in A&H, posing risks to interdisciplinary integration with sciences that use advanced computational methods. Efforts in the UK to address the digital skills gap in A&H are limited. The lack of national digital infrastructures and skills in digital workflows specifically hinders HPC adoption. Two initiatives, recently funded by UK Research and Innovation (UKRI), aim to overcome these limitations. The Digital Skills Network in the Arts and Humanities (DISKAH) will build researcher capacity to engage with UK Digital Research Infrastructures, including HPC and AI resources. It builds on a network of universities scaling up existing national digital skills training initiatives and fostering new communities of practice via a UK fellowship programme. The Collaborative Computational Project for Arts, Humanities, and Culture (CCP-AHC) will establish a community of practice centred on the responsible development of scientific software codes and workflows that have the potential to transform A&H research. Over the next two years, both projects will build capacity, run engagement events, identify and support promising projects, and promote best practices, aiming to foster transparency, collaboration, and inclusion in A&H research.

Author(s): Eamonn Bell (Durham University), and Karina Rodriguez Echavarria (University of Brighton)

Domain: Applied Social Sciences and Humanities


GPU Porting of ECMWF Physical Parametrizations Using a High-Level Programming Model

We present recent developments in the GPU porting, using the domain-specific library GridTools for Python (GT4Py), of three physical parametrizations from the Integrated Forecasting System (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF): the cloud microphysics packages CLOUDSC & CLOUDSC2, the radiation scheme ecRad, and the surface parametrization ecLand. We outline our high-level porting strategy for legacy Fortran physics codes, highlight performance results for CLOUDSC & CLOUDSC2, and discuss ongoing work on ecRad and ecLand.

Author(s): Gabriel Vollenweider (ETH Zurich), Stefano Ubbiali (ETH Zurich), Christian Kühnlein (ECMWF), and Heini Wernli (ETH Zurich)

Domain: Climate, Weather and Earth Sciences


GPU-Accelerated DEM Simulations for Complex Particle Shapes: Optimizing Spheropolyhedron Contact Detection

The Discrete Element Method (DEM) is an N-body numerical method widely used to model granular materials with various particle shapes, including complex geometries like spheropolyhedra. A major computational challenge in DEM lies in contact detection, particularly for such complex shapes, which involve multiple simultaneous contact points and intricate geometry requiring costly intersection evaluations. This work focuses on adapting existing methods to efficiently handle spheropolyhedron geometries on GPUs. Two key developments are presented: extending the PCCP (Parallelized by Contact Candidate Pair) algorithm, which redefines computational granularity by assigning GPU threads to contact pairs, to these complex shapes; and an optimized structure-of-arrays (SoA) memory layout for efficient GPU memory access and data locality. These contributions speed up the contact detection and force calculation phases. The effectiveness of these GPU optimization methods is demonstrated through their implementation in the ExaDEM open-source HPC code, with performance evaluations on NVIDIA A100 and Grace Hopper GPUs. These optimizations enable large-scale simulations, handling from a few hundred thousand to several million particles, while maintaining reasonable simulation times. This work represents a significant advancement in DEM by enabling efficient large-scale simulations with complex particle geometries.
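
The layout idea can be illustrated independently of ExaDEM: with one GPU thread per contact candidate pair, a structure-of-arrays layout makes thread k read element k of each field, giving coalesced accesses. A NumPy sketch with illustrative field names (not ExaDEM’s actual data model):

```python
import numpy as np

n_pairs = 1_000_000

# Array of structures (AoS): the fields of one pair are contiguous, but the
# same field across consecutive pairs is strided, which hurts coalescing when
# thread k processes pair k.
aos = np.zeros(n_pairs, dtype=[('i', np.int32), ('j', np.int32),
                               ('overlap', np.float64), ('fn', np.float64)])

# Structure of arrays (SoA): each field is contiguous across all pairs, so
# consecutive threads read consecutive memory (fully coalesced loads).
soa = {name: np.zeros(n_pairs, dtype=dt)
       for name, (dt, _) in aos.dtype.fields.items()}

# "One thread per contact candidate pair" becomes a flat elementwise kernel:
soa['fn'] = np.maximum(0.0, soa['overlap']) * 1e5   # toy normal-force law
```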

Author(s): Carlo Elia Doncecchi (CEA)

Domain: Engineering


GPU-Accelerated Fluid-Structure Interaction Resampling in FEM, Including Application of 3-Dimensional 4th-Order WENO

This work addresses the HPC challenges of fluid-structure interaction (FSI), focusing on the computational efficiency of mesh resampling. A major computational challenge arises from the fact that the method requires gather and scatter memory access, which introduces a significant memory bottleneck, and the associated fraction of the time to solution can be high. We have developed a collection of computational methods capable of achieving significant throughput on both GPU and CPU, and have had the opportunity to test them on the new Alps supercomputer at CSCS, based on the GH200 Grace Hopper superchip. We have also begun to explore the possibility of using three-dimensional fourth-order WENO interpolation as a transfer function, which allows the adaptive removal of the oscillatory behavior that occurs in classical interpolation methods, thus improving the quality of the results.

Author(s): Simone Riva (Università della Svizzera italiana, Euler institute), and Patrick Zulian (Università della Svizzera italiana, Euler institute; UniDistance Suisse)

Domain: Computational Methods and Applied Mathematics


A GPU-Accelerated Implementation of Spectrum Slicing for Plane-Wave Density Functional Theory in ABINIT

We consider the problem of accelerating the iterative diagonalization of Hamiltonian operators for electronic structure calculations in plane-wave Density Functional Theory. The complexity bottleneck of existing subspace iteration schemes is that the Rayleigh-Ritz procedure for extracting eigenvectors from a subspace has limited scalability with the number of processors. On the other hand, polynomial filtering methods for constructing subspaces can greatly benefit from the parallel efficiency of batched Hamiltonian applications. The latter feature becomes even more advantageous on Graphics Processing Unit architectures in particular. For this reason, we focus on the spectrum slicing method, a special type of polynomial filtering that allows the Rayleigh-Ritz step to be applied to fewer vectors. We present a new GPU implementation of this scheme in the ABINIT plane-wave code. We also propose a complexity model that allows us to balance the number of filtering operations and the number of vectors to which the Rayleigh-Ritz step is applied per slice. The numerical performance of our implementation is compared to that of existing algorithms in ABINIT, in particular Chebyshev Filtering and the Locally Optimal Block Preconditioned Conjugate Gradient method, for systems of up to ten thousand bands.

Author(s): Ioanna-Maria Lygatsika (CEA, Université Paris-Saclay), and Marc Torrent (CEA, Université Paris-Saclay)

Domain: Chemistry and Materials


GPU-Accelerated Matrix Decomposition and Selected Inversion for Banded Arrowhead Matrices

Matrix inversion is a fundamental operation in linear algebra that arises in various scientific problems. Many applications are cast as sparse linear systems which, when inverted, produce dense matrices. In some cases, only a subset of the complete inverse—referred to as the selected inverse—is required. This approach is especially relevant in fields like statistical learning and nano-electronics, where the underlying sparse matrices of interest often exhibit a banded arrowhead sparsity pattern or can be efficiently permuted to one. Efficient, GPU-accelerated implementations for the selected inversion of block tridiagonal arrowhead matrices exist within the Serinv library. Our work builds upon this foundation by extending the existing selected inversion routines to cover related sparsity patterns, such as banded arrowhead and n-block diagonal arrowhead matrices. Banded implementations operate only on the non-zero elements of the matrix but are challenging to implement efficiently on GPUs. To address this, we explore an n-block diagonal tiling approach. Although this method may introduce some zero elements, it allows for greater efficiency and is well suited for parallelization on GPUs. We rely on Python for ease of use and compatibility, alongside CuPy for efficient GPU computations. This combination enables us to deliver scalable and high-performance solutions for selected inversion tasks.
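
For the block-tridiagonal core of such matrices (before the arrowhead extension), the selected inverse can be computed with the classic two-pass recursion sketched below in NumPy; this is a simplified stand-in for illustration, not Serinv’s API or its GPU implementation:

```python
import numpy as np

def selected_inverse_diag(Ad, Al, Au):
    """Ad[i]: diagonal blocks; Al[i]: A[i+1,i]; Au[i]: A[i,i+1].
    Returns the diagonal blocks of inv(A) without forming the dense inverse."""
    n = len(Ad)
    gL = [None] * n                      # left-connected inverses (forward pass)
    gL[0] = np.linalg.inv(Ad[0])
    for i in range(1, n):
        gL[i] = np.linalg.inv(Ad[i] - Al[i - 1] @ gL[i - 1] @ Au[i - 1])
    G = [None] * n                       # selected diagonal blocks (backward pass)
    G[n - 1] = gL[n - 1]
    for i in range(n - 2, -1, -1):
        G[i] = gL[i] + gL[i] @ Au[i] @ G[i + 1] @ Al[i] @ gL[i]
    return G

# Verify against the dense inverse on a small random, diagonally dominant case:
rng = np.random.default_rng(0)
b, n = 3, 4
Ad = [rng.standard_normal((b, b)) + 10 * np.eye(b) for _ in range(n)]
Au = [rng.standard_normal((b, b)) for _ in range(n - 1)]
Al = [rng.standard_normal((b, b)) for _ in range(n - 1)]
A = np.zeros((b * n, b * n))
for i in range(n):
    A[i*b:(i+1)*b, i*b:(i+1)*b] = Ad[i]
    if i < n - 1:
        A[(i+1)*b:(i+2)*b, i*b:(i+1)*b] = Al[i]
        A[i*b:(i+1)*b, (i+1)*b:(i+2)*b] = Au[i]
G, Ainv = selected_inverse_diag(Ad, Al, Au), np.linalg.inv(A)
assert all(np.allclose(G[i], Ainv[i*b:(i+1)*b, i*b:(i+1)*b]) for i in range(n))
```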

Author(s): Carla Lopez Zurita (ETH Zurich), Lisa Gaedke-Merzhäuser (King Abdullah University of Science and Technology, Università della Svizzera italiana), Vincent Maillou (ETH Zurich), and Olaf Schenk (Università della Svizzera italiana)

Domain: Computational Methods and Applied Mathematics


Graph Abstraction for Efficient Scheduling of Synchronous Workloads on GPU

Many computational physics simulations need to efficiently execute asynchronous workloads (FEM assembly, linear algebra, etc.) that can be organised as a Directed Acyclic Graph (DAG). Ad hoc scheduling of these asynchronous workloads is an additional burden on the code and might not fully exploit the available execution resources (e.g. a multi-GPU node). By contrast, architecting the code around a graph abstraction exposes the whole computational graph to the compiler/driver ahead of execution, thereby enabling as many optimisations as possible. Therefore, a graph abstraction that can be prescribed either at compile time or at runtime is necessary, and it must be mappable to the best backend scheduler, thus maximising resource usage. We contribute to the Kokkos implementation of this graph abstraction, which allows for performance-portable, single-source code. More specifically, this poster will focus on recent contributions to Kokkos::Graph that make it evolve towards the C++ std::execution proposal for managing asynchronous execution on generic execution resources (P2300). We will demonstrate the benefits of using Kokkos::Graph both in terms of performance and software design. We will present several examples of varying complexity, including a FEM simulation of electromagnetic wave scattering.

Author(s): Romin Tomasetti (University of Liège), and Maarten Arnst (University of Liège)

Domain: Computational Methods and Applied Mathematics


GT4Py: A Python Framework for the Development of High-Performance Weather and Climate Applications

GT4Py is a Python framework for weather and climate applications simplifying the development and maintenance of high-performance codes in prototyping and production environments. GT4Py separates model development from hardware-dependent optimizations, instead of intermixing them in source code, as is regularly done in lower-level languages like Fortran or C. Domain scientists focus solely on numerical modeling using a declarative embedded domain-specific language supporting common computational patterns of dynamical cores and physical parametrizations. An optimizing toolchain then transforms this high-level representation into a finely tuned implementation for the target hardware architecture. This separation of concerns allows performance engineers to add optimizations or support new hardware architectures without modifying the application itself, increasing productivity for both domain scientists and performance engineers. We present recent developments from the past year, including a new performance backend based on DaCe, a framework for high-performance parallel programming that uses a data-centric intermediate representation (IR) based on Stateful DataFlow multiGraphs (SDFG). Performance results on NVIDIA GH200 and AMD MI300 are presented for two weather models: an ICON-based model ported to GT4Py as part of the EXCLAIM project at ETH, and PMAP, a portable atmospheric model developed at ECMWF, covering applications from large-eddy simulations to global numerical weather prediction.
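
To give a flavor of the declarative DSL, the sketch below loosely follows the public gt4py.next quickstart; the API is still evolving, so the names and signatures shown here may differ between versions:

```python
import numpy as np
import gt4py.next as gtx
from gt4py.next import float64

CellDim = gtx.Dimension("Cell")
KDim = gtx.Dimension("K", kind=gtx.DimensionKind.VERTICAL)
FieldT = gtx.Field[gtx.Dims[CellDim, KDim], float64]

@gtx.field_operator
def blend(a: FieldT, b: FieldT, w: float64) -> FieldT:
    # Declarative, dimension-aware arithmetic: no explicit grid loops; the
    # toolchain generates the tuned loop nest per target architecture.
    return w * a + (1.0 - w) * b

a = gtx.as_field([CellDim, KDim], np.ones((5, 4)))
b = gtx.as_field([CellDim, KDim], np.zeros((5, 4)))
out = gtx.as_field([CellDim, KDim], np.empty((5, 4)))
blend(a, b, 0.25, out=out, offset_provider={})   # embedded (debug) execution
```

Switching to a GPU is a backend choice at the call site, not a change to the numerical code above.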

Author(s): Mauro Bianco (ETH Zurich / CSCS), Yilu Chen (ETH Zurich), Till Ehrengruber (ETH Zurich / CSCS), Sara Faghih-Naini (ECMWF), Nicoletta Farabullini (ETH Zurich), Abishek Gopal (NCAR, ETH Zurich), Rico Häuselmann (ETH Zurich / CSCS), Samuel Kellerhals (ETH Zurich), Christos Kotsalos (ETH Zurich / CSCS), Ioannis Magkanaris (ETH Zurich / CSCS), Magdalena Luz (ETH Zurich), Christoph Müller (MeteoSwiss), Philip Müller (ETH Zurich / CSCS), Edoardo Paone (ETH Zurich / CSCS), Enrique González Paredes (ETH Zurich / CSCS), David Strassmann (ETH Zurich), Felix Thaler (ETH Zurich / CSCS), Hannes Vogt (ETH Zurich / CSCS), and Thomas Schulthess (ETH Zurich / CSCS)

Domain: Climate, Weather and Earth Sciences


Harnessing High Performance Computing for Advanced Biomarker Discovery from Wearable Device Data: A Pathway to Optimized Therapeutic Outcomes

The integration of data from smartphones and wearable devices offers a groundbreaking opportunity to apply machine learning for advancements in digital health. This project presents a case study demonstrating the application of advanced machine learning techniques to large-scale, heterogeneous datasets, with a focus on identifying clinically relevant biomarkers and enabling personalized therapeutic pathways. The project highlights the challenges inherent in managing and analyzing the complexity of physiological and environmental data streams, enriched by user annotations such as mood tracking and medication intake. By leveraging high-performance computing (HPC) infrastructures, the methodology addresses the heterogeneity, volume, and real-time requirements of these datasets. This poster will provide a detailed examination of how HPC-enabled workflows facilitate the preprocessing, feature extraction, and analysis of multi-modal data. It will also illustrate the scalability of the approach, offering insights into the translation of digital health data into innovative therapeutic interventions. The discussion will emphasize the computational techniques used, the challenges of HPC adaptation for machine learning, and the clinical relevance of the findings, while focusing on the scalability and reproducibility of the methodology.

Author(s): Silvano Coletti (Università degli Studi Guglielmo Marconi, CHELONIA SA), and Francesca Fallucchi (Università degli Studi Guglielmo Marconi)

Domain: Engineering


High-Resolution Atmospheric Inversion Modeling of Methane Across Europe Using Heterogeneous Computing and Satellite Data Assimilation

The atmospheric composition has been significantly altered by anthropogenic activities such as fossil fuel combustion and agriculture, which release greenhouse gases like methane, a driver of climate change. Understanding the sources and sinks of these emissions is critical for effective mitigation, and computational modeling has become an essential tool in this effort. Atmospheric inversion refines emission inventories by inferring emission sources from observations, enhancing our understanding of atmospheric processes. Leveraging GPU acceleration, we employ the ICON-ART atmospheric transport model integrated with an ensemble Kalman filter data assimilation system to perform high-resolution inversions. This framework achieves resolutions comparable to advanced satellite pixel sizes (e.g., TROPOMI, 5.5 km × 7 km), significantly improving the utilization of observational data. The Online Emissions Module (OEM) within ICON-ART processes emissions dynamically during simulations, enhancing modeling accuracy. Using OpenACC directives, we ported OEM and related code to GPUs, enabling computationally efficient, high-resolution inversions. Initial applications focus on methane emissions in Romania, validated against a previous measurement campaign. The framework will scale to Europe, assimilating TROPOMI data to optimize inventories. This GPU-accelerated atmospheric inversion system provides unprecedented spatial and temporal detail, offering actionable insights for policymakers and advancing the field of atmospheric modeling to address critical climate challenges.
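
The core of the assimilation step is a standard stochastic ensemble Kalman filter update; the NumPy sketch below uses a toy linear "transport" operator in place of ICON-ART and purely illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_obs, n_ens = 50, 20, 32

X = rng.standard_normal((n_state, n_ens)) + 1.0  # prior ensemble of emissions
H = rng.random((n_obs, n_state)) / n_state       # linearized "transport" operator
R = 0.01 * np.eye(n_obs)                         # observation-error covariance
y = H @ (1.5 * np.ones(n_state))                 # synthetic observations (truth = 1.5)

Xm = X.mean(axis=1, keepdims=True)
Xp = X - Xm                                      # ensemble perturbations
S = H @ Xp                                       # perturbations in observation space
# Kalman gain from the sample covariance: K = P H^T (H P H^T + R)^-1
K = Xp @ S.T @ np.linalg.inv(S @ S.T + (n_ens - 1) * R)
Y = y[:, None] + rng.multivariate_normal(np.zeros(n_obs), R, n_ens).T  # perturbed obs
Xa = X + K @ (Y - H @ X)                         # analysis ensemble

print(X.mean(), Xa.mean())  # posterior mean moves from ~1.0 toward 1.5
```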

Author(s): Arash Hamzehloo (Empa), Michael Steiner (Empa), and Dominik Brunner (Empa)

Domain: Climate, Weather and Earth Sciences


How to Build an Energy Dataset for HPC

Quantifying energy consumption in the HPC domain is becoming increasingly critical, driven by rising energy costs. To gain a comprehensive understanding of the energy footprint created by the significant power demand of modern systems like Alps, which exceeds its predecessor Piz Daint in energy usage, the Swiss National Supercomputing Centre (CSCS) decided to collect and aggregate energy consumption data from various sources associated with each job executed on Alps. The resulting energy dataset is stored in a MySQL database and consists of:
– Job metadata: fields such as job ID, CPU hours consumed, nodes utilized, start time, end time, and elapsed time, sourced from the SLURM workload manager via the jobcompletion plugin.
– Energy metrics: energy consumption data derived from telemetry sensors installed on the Alps supercomputer. These sensors, which follow DMTF’s Redfish® standard, capture raw data that is then processed to calculate the energy consumption of all nodes associated with a job and aggregate the results.
– SLURM energy data: added to the job metadata through the SLURM energy plugin.
– Quality indicators: additional fields to assess and ensure the reliability and accuracy of the computed energy consumption metrics.
– DCGM data: GPU metrics collected via NVIDIA’s Data Center GPU Manager.

Author(s): Mathilde Gianolli (ETH Zurich / CSCS), Massimo Benini (ETH Zurich / CSCS), Jean-Guillaume Piccinali (ETH Zurich / CSCS), Gianna Marano (ETH Zurich / CSCS), and Dino Conciatore (ETH Zurich / CSCS)

Domain: Computational Methods and Applied Mathematics


HQAC: Transforming AI, Quantum, and HPC into a Unified Open Framework

The rapid advancements in artificial intelligence (AI), quantum computing, and high-performance computing (HPC) present immense opportunities, yet their siloed development limits their collective impact. AI faces challenges with accessibility and energy demands, quantum computing is in its early stages with limited applications, and HPC requires modernization to integrate with these evolving paradigms. The need for a unified platform is urgent to maximize their potential. The High-Performance Quantum and AI Computing (HQAC) framework provides a transformative open platform that integrates AI, quantum computing, and HPC into a cohesive ecosystem. HQAC emphasizes openness, sustainability, heterogeneity, state-of-the-art performance, and expanding use cases. Early results underscore its effectiveness: a 2x speedup in HPC rendering workloads without quality degradation, a 6x improvement in AI inference latency with a 40% reduction in memory footprint and energy savings, and the successful launch of the Quantum AI Challenge, engaging over 100 participants from 19 countries, including 17 Historically Black Colleges and Universities (HBCUs). HQAC has been implemented as a competitive alternative to generic cloud-based AI and HPC systems and is accessible via the Education Cloud Cluster: https://hbcu1.ecc.fai.institute. It serves as a promising platform to spur research advancements in fields such as drug discovery, material science, and renewable energy.

Author(s): David Ojika (Flapmax)

Domain: Computational Methods and Applied Mathematics


Improving Productivity of Threaded Scientific Applications with Quo Vadis

Scientific discovery is increasingly enabled by heterogeneous hardware that includes multiple processor types, deep memory hierarchies, and heterogeneous memories. To effectively utilize this hardware, computational scientists must compose their applications using a combination of programming models, middleware, and runtime systems. Since these systems are often designed in isolation from each other, their concurrent execution results in resource contention and interference, which limits application performance and scalability. Consequently, real-world applications face numerous challenges on heterogeneous machines. This poster presents the thread interface of Quo Vadis, a runtime system that helps hybrid applications make efficient use of heterogeneous hardware, eases programmability in the presence of multiple programming abstractions, and enables portability across systems. Applications using OpenMP or POSIX threads can now benefit from Quo Vadis’ high-level abstractions to map and remap full physics packages to the machine dynamically with a handful of functions. Furthermore, the thread interface has similar semantics to the process interface, allowing scientists to leverage a single-semantics model for partitioning and assignment of resources to workers, whether they are processes or threads. With both process and thread interfaces, Quo Vadis broadens its applicability to improve the productivity of computational scientists across programming abstractions and heterogeneous hardware.

Author(s): Edgar A. Leon (Lawrence Livermore National Laboratory), Samuel K. Gutierrez (Los Alamos National Laboratory), and Guillaume Mercier (Bordeaux-INP; Inria, CNRS, LaBRI UMR 5800)

Domain: Computational Methods and Applied Mathematics


Integrating the ICON4Py Python-Based Dynamical Core into ICON

The integration of Python-based high-performance computing into legacy Fortran climate models offers new opportunities for flexibility and efficiency. This poster presents the integration into the Fortran ICON code of the dynamical core implemented in Python as part of ICON4Py, a still-in-progress Python-based implementation of the ICON climate model. ICON4Py leverages GT4Py to optimize the numerical computations. For interoperability, the Fortran application invokes the Python interpreter, which in turn executes the ICON4Py dynamical core. The performance of ICON4Py is enhanced through the Data-Centric (DaCe) backend of GT4Py, developed by SPCL at ETH Zurich. DaCe applies dataflow transformations to optimize memory locality and parallel execution, improving efficiency across CPU and GPU architectures. This approach provides computational performance while keeping the code maintainable. As we plan to use this implementation in a production setting, the poster will also address the verification and testing process. Performance evaluations demonstrate competitive or improved runtime efficiency compared to the legacy system. By integrating Python’s expressiveness with GT4Py and DaCe’s performance optimizations, this poster showcases a viable pathway for modernizing climate models. Our approach exemplifies how scientific software can evolve to leverage emerging computing paradigms while maintaining accuracy and performance.

Author(s): Mauro Bianco (ETH Zurich / CSCS), Magdalena Luz (ETH Zurich), Christoph Muller (MeteoSwiss), Daniel Hupp (MeteoSwiss), Anurag Dipankar (ETH Zurich), Edoardo Paone (ETH Zurich / CSCS), Xavier Lapillonne (MeteoSwiss), Nicoletta Farabullini (ETH Zurich), Enrique Gonzales Pareder (ETH Zurich / CSCS), Hannes Vogt (ETH Zurich / CSCS), Ong Chia Rui (ETH Zurich), Till Ehrengruber (ETH Zurich / CSCS), Yilu Chen (ETH Zurich), Philip Muller (ETH Zurich / CSCS), and Christos Kotsalos (ETH Zurich / CSCS)

Domain: Climate, Weather and Earth Sciences


Interactive Visualization of High-Energy Physics Events via Nvidia Omniverse

Simulations play a crucial role in high-energy, nuclear, and accelerator physics, aiding in both data analysis and hardware development. Over the years, several advanced programs have been created to generate detailed and precise simulated events, providing insights into complex physical processes. While visualizing particle motion provides a powerful means to grasp physical interactions, its widespread adoption has been hindered by high computational demands and the absence of truly interactive tools. This work leverages NVIDIA Omniverse as a platform to enhance simulation via interactive visualizations. Built on the Universal Scene Description (USD) framework, Omniverse provides robust tools for integrating 3D scene composition, animation, and real-time interactivity, enabling new possibilities for dynamic exploration. The project aims to develop a user-friendly interface where users define simulation parameters via Omniverse, with Geant4 powering the simulation in the background. A key innovation is the translation of Geant4 geometries into USD format, which allows users to interact with and modify simulation scenarios in real time within Omniverse. This contribution presents use cases developed with Omniverse and Geant4, illustrating the potential of this approach to enhance interactive visualization in physics simulations while addressing challenges posed by the intricate detector geometries and complex processes of real experiments.
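
The geometry-translation step can be sketched with Pixar’s pxr Python API; here the traversal of real Geant4 volumes is elided and replaced by a toy volume list, so this only illustrates how prims would be emitted to a USD stage:

```python
from pxr import Usd, UsdGeom, Gf

# Placeholder "detector" description: (name, translation, radius). A real
# translator would walk the Geant4 volume hierarchy via a Python binding.
volumes = [
    ("world", (0, 0, 0), 50.0),
    ("tracker", (0, 0, 10), 5.0),
]

stage = Usd.Stage.CreateNew("detector.usda")
UsdGeom.Xform.Define(stage, "/detector")           # root transform prim
for name, (x, y, z), r in volumes:
    sphere = UsdGeom.Sphere.Define(stage, f"/detector/{name}")
    sphere.GetRadiusAttr().Set(r)
    UsdGeom.XformCommonAPI(sphere.GetPrim()).SetTranslate(Gf.Vec3d(x, y, z))
stage.GetRootLayer().Save()
# The resulting .usda can be opened and manipulated interactively in Omniverse.
```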

Author(s): Felice Nenna (INFN Bari, University of Padova), Marcello Maggi (INFN Bari), Matteo Bunino (CERN), Stewart Boogert (University of Manchester), and Siobhan Alden (Royal Holloway, University of London)

Domain: Physics


itwinai: Enabling Scalable AI Workflows on HPC for Digital Twins in Science

The interTwin project is advancing the integration of Digital Twins across scientific domains, focusing on physics and climate research. A key component of this project is itwinai, a Python library designed to streamline scalable AI workflows on High-Performance Computing (HPC) systems. With its unified interface, itwinai simplifies the deployment and optimisation of AI models across leading frameworks for distributed machine learning. The library features tooling for profiling scalability and monitoring GPU utilisation, allowing scientists to better understand and show how well their code is distributed. It also helps to identify inefficiencies, enhancing sustainability and helping to develop greener AI solutions. Recent advancements include support for large-model parallelism and distributed hyperparameter optimization (HPO). By providing a uniform pipeline to run AI workflows easily and intuitively, itwinai lowers the barriers to these complex domains, empowering scientists to achieve reproducible, high-performance results on HPC infrastructure. Through integration with interLink, itwinai facilitates seamless offloading of compute-intensive tasks from cloud to HPC. Validated on diverse use cases in physics and climate research, including collaborations with CMCC, EURAC and CERFACS, itwinai has shown that it has the potential to address challenges in renewable energy, climate modelling, and sustainable development.

Author(s): Matteo Bunino (CERN), Anna Elisa Lappe (CERN), Jarl Sondre Sæther (CERN), Rakesh Sarma (FZ Jülich), Maria Girone (CERN), and Andreas Lintermann (FZ Jülich)

Domain: Engineering


KBase Research and Genome Annotation Agent

We leverage the latest developments in Large Language Models (LLMs) to create a KBase Research Agent that guides users in navigating and analyzing data within the KBase platform. The Agent currently converses with users, interprets their data and its relationship with public data on the KBase system, and uses this information to help users reach their analysis goals. It does this by building a custom analysis plan, discussing options with users via a human-in-the-loop tool, and orchestrating the execution of the plan through KBase’s Jupyter-notebook-like Narrative interface. Using KBase apps, the agent will help assemble and annotate the reads, ensuring quality control at each step. It will interpret outputs and guide each subsequent step in the workflow. This agent enables LLM-driven acceleration in making a large number of private, fully sequenced microbial genomes publicly available, significantly lowering the barrier.

Author(s): Prachi Gupta (Lawrence Berkeley National Laboratory), William Riehl (Lawrence Berkeley National Laboratory), Meghan Drake (Oak Ridge National Laboratory), Sean P. Jungbluth (Lawrence Berkeley National Laboratory), Christopher J. Neely (Lawrence Berkeley National Laboratory), Ziming Yang (Brookhaven National Laboratory), Marcin P. Joachimiak (Lawrence Berkeley National Laboratory), Mikaela Cashman (Lawrence Berkeley National Laboratory), Richard Shane Canon (Lawrence Berkeley National Laboratory), Adam P. Arkin (Lawrence Berkeley National Laboratory), and Paramvir S. Dehal (Lawrence Berkeley National Laboratory)

Domain: Life Sciences


Malware Detection Using Machine Learning

Our research presents a novel anti-malware technique based on image processing and advanced machine-learning algorithms. Traditional approaches such as signature-based detection and behavior analysis, which are conventionally used to detect malware, fail to identify many contemporary malware samples. In our work, we use samples of malware images from IEEE datasets to design a more effective and reliable detection model. The proposed solution combines clustering algorithms with several configurations to improve the detection of new malware variants. To achieve better results, the research combines malware and legitimate traffic data collected from resources such as the Canadian Institute for Cybersecurity. The dataset contains a large number of samples with features such as flow duration, packet counts, and flags. This approach goes beyond the usual detection systems and may help improve the accuracy and speed of threat assessment in the cyber domain. The investigation and findings presented here contribute to the development of automated cybersecurity tools and to safeguarding digital spaces, bringing tangible value to organizations and end consumers as the field continues to evolve with the emergence of new and more complex threats.
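
A generic sketch of the described flow-feature classification step, with synthetic placeholder data in place of the Canadian Institute for Cybersecurity datasets:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 5000
# Synthetic flows: (duration, packet count, flag bit) per sample.
X_benign = np.column_stack([rng.exponential(1.0, n),
                            rng.poisson(20, n),
                            rng.integers(0, 2, n)])
X_malware = np.column_stack([rng.exponential(0.2, n),
                             rng.poisson(60, n),
                             rng.integers(0, 2, n)])
X = np.vstack([X_benign, X_malware])
y = np.r_[np.zeros(n), np.ones(n)]           # 0 = benign, 1 = malware

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```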

Author(s): Manaswini Gopala (San Diego State University), Devashish Palate (San Diego State University), and Lokeshwari Anamalamudi (San Diego State University)

Domain: Computational Methods and Applied Mathematics


The MENTOR Interpretation Agent: From Network Embeddings to Mechanistic Narratives via Retrieval-Augmented LLMs

Despite an increasing number of complex omics data sets, extracting comprehensive mechanistic insights from these data remains challenging. To address this, we developed a human-in-the-loop LLM-based agentic retrieval-augmented generation (RAG) pipeline, the MENTOR Interpretation Agent (MENTOR-IA), to identify novel relationships among multi-omic gene sets. We applied MENTOR-IA to interpret a previously characterized set of 211 opioid addiction-related genes. We first partitioned these genes into clades using hierarchical clustering of random walk with restart (RWR)-based graph embeddings presented in a dendrogram using our previously described MENTOR algorithm. MENTOR-IA identified Akt, ERK, and BDNF signaling pathways known to be critical to synaptic plasticity, previously reported to be associated with the 211 opioid addiction-related genes. In addition, our pipeline identified novel biological processes like extracellular matrix remodeling and vasculogenesis that were not identified through prior manual review. These results illustrate that our integrative pipeline facilitates scalable interpretation of multi-omic datasets, accelerating our capability to comprehend complex biological traits. Ultimately, these innovations will enhance our ability to derive actionable insights for disease biology and therapeutic development from multi-omic data.
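
The embedding step rests on random walk with restart; a minimal NumPy sketch on a toy gene graph is shown below (MENTOR itself operates on large multiplex networks, so details such as normalization may differ):

```python
import numpy as np

def rwr(A, seed, restart=0.5, tol=1e-10):
    """Stationary distribution of a walk on adjacency A that teleports
    back to `seed` with probability `restart` at every step."""
    W = A / A.sum(axis=0, keepdims=True)   # column-normalized transitions
    e = np.zeros(A.shape[0]); e[seed] = 1.0
    p = e.copy()
    while True:
        p_new = (1 - restart) * W @ p + restart * e
        if np.abs(p_new - p).sum() < tol:
            return p_new
        p = p_new

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
# One embedding row per gene (its RWR visit profile); hierarchical clustering
# of these rows yields the clades shown in the MENTOR dendrogram.
E = np.stack([rwr(A, s) for s in range(A.shape[0])])
print(np.round(E, 3))
```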

Author(s): Anna H.C. Vlot (Oak Ridge National Laboratory), Matthew Lane (Oak Ridge National Laboratory; Bredesen Center for Interdisciplinary Graduate Research and Education, University of Tennessee-Knoxville), Kyle A. Sullivan (Oak Ridge National Laboratory), Peter Kruse (Oak Ridge National Laboratory; Bredesen Center for Interdisciplinary Graduate Research and Education, University of Tennessee-Knoxville), John Dandy (Oak Ridge National Laboratory), Selin Kaplanoglu (Oak Ridge National Laboratory), Alice Townsend (Oak Ridge National Laboratory; Bredesen Center for Interdisciplinary Graduate Research and Education, University of Tennessee-Knoxville), Jean Merlet (Oak Ridge National Laboratory; Bredesen Center for Interdisciplinary Graduate Research and Education, University of Tennessee-Knoxville), and Daniel A. Jacobson (Oak Ridge National Laboratory)

Domain: Life Sciences


Mixed Precision Customized for Discontinuous Galerkin Methods

We present an approach to enhance storage efficiency and reduce memory bandwidth utilization in modal Discontinuous Galerkin (DG) methods by introducing a customized mixed-precision representation for the solution vector. The approach leverages variations in floating-point accuracy requirements among the local degrees of freedom associated with different modal basis functions: using a common exponent, we represent the local solution vector compactly, significantly reducing memory usage while preserving numerical accuracy. To fully utilize this new representation, we design specialized arithmetic operations for the new datatype (addition, subtraction, multiplication, and division), ensuring stability and precision. The findings highlight the potential of mixed precision to balance accuracy and performance, enabling scalable and efficient implementations of DG methods on modern HPC architectures. This study provides practical insights and guidelines for integrating mixed-precision strategies into high-order numerical methods, promoting broader adoption in computational science.
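A toy version of the shared-exponent idea, assuming nothing about the authors' actual datatype beyond what the abstract states: all coefficients of a local solution vector share one exponent, and only integer mantissas are stored per degree of freedom.

```python
# Illustrative shared-exponent ("block floating-point") sketch, not the
# authors' datatype: one exponent per local DG solution vector, plus a
# short integer mantissa per degree of freedom.
import numpy as np

def encode(coeffs, mant_bits=16):
    """Store coefficients as integer mantissas plus one common exponent."""
    # Choose the exponent so the largest coefficient just fits the mantissa.
    e = int(np.ceil(np.log2(np.max(np.abs(coeffs)) + 1e-300))) - (mant_bits - 1)
    mant = np.round(coeffs / 2.0**e).astype(np.int32)
    return mant, e

def decode(mant, e):
    return mant.astype(np.float64) * 2.0**e

local_dofs = np.array([3.14159, 0.02718, -1.41421, 0.00057])
mant, e = encode(local_dofs)
print(decode(mant, e))   # close to local_dofs, up to mantissa rounding
```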

Author(s): Shivam Sundriyal (University of Bayreuth), Markus Büttner (University of Bayreuth), and Vadym Aizinger (University of Bayreuth)

Domain: Computational Methods and Applied Mathematics


Multi-Omic Single Cell Network Perturbation for Phenotypic Prediction

Drug repurposing offers a cost-effective strategy to identify new applications for existing medications, leveraging established safety profiles to accelerate therapeutic development. Advances in computational biology and large-scale multi-omics data enable systematic identification of novel therapeutic opportunities, addressing unmet medical needs and advancing precision medicine. This study employs a multiplex network integrating 10 literature-based layers from the HumanNet database and 320 data-driven predictive expression networks derived from single-cell RNA sequencing and bulk transcriptomic data. Constructed with the iRF-LOOP algorithm at a cost of over 500,000 compute hours on the Frontier supercomputer, this multiplex provides a framework for analyzing gene functions across diverse biological contexts. We applied the Random Walk with Restart algorithm to compute embeddings for 52,722 genes, quantifying their topological relevance within the network. Drug-gene interactions from DrugBank and disease-gene associations from UK Biobank GWAS were mapped to these embeddings, linking therapeutic agents to potential targets and revealing biomarkers. A case study on glucagon-like peptide 1 receptor (GLP-1R) agonists, initially developed for type 2 diabetes, identified genes topologically connected to GLP-1R (TMPRSS2, PNPLA3, DHX37, ZNF91, DTHD1, and IRX3) and associated diseases. This study demonstrates the power of multiplex networks and supercomputing in uncovering connections between genes, drugs, and diseases, offering insights into therapeutic discovery.
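The sketch below shows a dense, single-layer version of the Random Walk with Restart computation named above; the multiplex structure is not reproduced, and the restart parameter and toy adjacency matrix are illustrative assumptions.

```python
# Single-layer RWR sketch: column i of the stationary matrix is the
# network-proximity profile (embedding) of node i.
import numpy as np

def rwr(adj, restart=0.5, tol=1e-10):
    """Iterate R <- (1-c) P R + c I until the RWR profiles converge."""
    col_sums = adj.sum(axis=0)
    P = adj / np.where(col_sums == 0, 1.0, col_sums)   # column-stochastic
    n = adj.shape[0]
    E = np.eye(n)                                      # one restart vector per node
    R = E.copy()
    while True:
        R_next = (1 - restart) * (P @ R) + restart * E
        if np.abs(R_next - R).max() < tol:
            return R_next
        R = R_next

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], float)                  # toy gene network
embeddings = rwr(adj).T                                # row i: profile of gene i
print(embeddings.round(3))
```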

Author(s): Matthew Lane (Oak Ridge National Laboratory, University of Tennessee), Erica Prates (Oak Ridge National Laboratory), Alice Townsend (Oak Ridge National Laboratory, University of Tennessee), Jean Merlet (Oak Ridge National Laboratory, University of Tennessee), Christiane Alvarez (Oak Ridge National Laboratory), Alana Wells (Oak Ridge National Laboratory), and Daniel Jacobson (Oak Ridge National Laboratory, University of Tennessee)

Domain: Life Sciences


Optimizing Data Offload in the IFS Using GPU-Aware Data Structures and Source-To-Source Translation

The adaptation of ECMWF’s medium-range forecasting model, the Integrated Forecasting System (IFS), to heterogeneous computing architectures is an ongoing effort. The IFS consists of millions of lines of Fortran code that is highly optimized for modern CPUs. This poses significant challenges when porting the code to heterogeneous architectures, as data layouts and compute patterns need to be changed to utilise the hardware efficiently. To solve this problem, at ECMWF we use FIELD API, a GPU-aware data-structure library, and Loki, a freely programmable source-to-source translation toolchain written in Python, to generate architecture-specific optimised code. In this poster, we show how FIELD API and Loki can be used to generate efficient code for asynchronously offloading data to GPUs. We use “dwarf-cloudsc”, a computationally representative proxy of the IFS physics, to demonstrate the application of Loki to generate two versions of OpenACC-accelerated Fortran: one that offloads all fields over the same stream, and a second that blocks the offload of fields over multiple streams and overlaps computation with communication. We compare the performance of these two generated versions against the baseline, showing promising results for the offload of the full IFS to GPUs.
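The blocked-offload scheme can be pictured with a CuPy sketch (Python rather than the generated Fortran/OpenACC): field blocks are distributed round-robin over streams so that the host-to-device copy of one block can overlap with computation on another. Stream count, block sizes, and the dummy kernel are assumptions; true copy/compute overlap additionally requires pinned host memory, omitted here for brevity.

```python
# Conceptual analogue of blocked, multi-stream data offload.
import numpy as np
import cupy as cp

fields = [np.random.rand(1 << 20) for _ in range(8)]     # stand-in "fields"
streams = [cp.cuda.Stream(non_blocking=True) for _ in range(2)]
results = []

for i, block in enumerate(fields):
    with streams[i % 2]:                 # round-robin blocks over streams
        dev = cp.asarray(block)          # H2D copy enqueued on this stream
        results.append(cp.sqrt(dev))     # kernel queued behind its own copy

for s in streams:
    s.synchronize()                      # wait for all queued work
```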

Author(s): Johan Ericsson (ECMWF), Ahmad Nawab (ECMWF), Balthasar Reuter (ECMWF), Philippe Marguinaud (Meteo-France), Judicaël Grasset (Meteo-France), and Michael Lange (ECMWF)

Domain: Climate, Weather and Earth Sciences


Optimizing the ECsim Plasma Code for Exascale Architectures: GPU Acceleration, Portability, and Scalability

This work presents the adaptation of the plasma code ECsim to future exascale architectures. The code has three main blocks: the particle mover, moment gathering, and the field solver. The first two are the most computationally demanding, so we focused on optimizing them for GPU acceleration using OpenACC directives. Our approach prioritized GPU readiness with minimal code restructuring. The legacy CPU code makes extensive use of C++ structures and templates, which hinder a seamless GPU implementation; to overcome this, we managed data transfers manually through CUDA API calls. Performance profiling on NVIDIA GPUs reveals a speedup of 5x to 9x over the CPU implementation in a node-to-node comparison. Scaling tests conducted on multiple supercomputers demonstrate ECsim's scalability, achieving above 80% efficiency up to 1024 GPUs in weak and strong scaling tests for adequately sized problems. We further extended this work to OpenMP target directives; our memory management strategy for the GPU port required minimal additional effort in this case, enhancing the portability of ECsim across different GPU architectures. A comparative analysis on NVIDIA GPUs confirms the code's portability and shows significant speedups over the CPU with OpenMP target directives as well. Similar work is underway on an AMD GPU system at EuroHPC.

Author(s): Nitin Shukla (CINECA), Elisabetta Boella (E4 Computer Engineering), Filippo Spiga (NVIDIA Inc.), Michael Redenti (CINECA), Mozhgan Kabiri Chimeh (NVIDIA Inc.), and Maria Elena Innocenti (Ruhr University Bochum)

Domain: Physics


Performance Portability Across Different Mathematical Models, Hardware, and Simulation Scenarios in Molecular Dynamics

Due to the importance of Molecular Dynamics simulations within fields such as thermodynamics, numerous methods have been developed to speed up the force calculations, which typically dominate the runtime. None of these methods, however, is optimal for every molecular model, every hardware platform, and every distribution of molecules. For simulation developers who are not HPC experts, choosing and then implementing the best method for their particular simulation is a challenging task. Furthermore, different regions of the simulation can have different molecule distributions with different optimal methods, and these can change as the distribution evolves. AutoPas addresses this problem: it is an open-source C++17 particle simulation library that can be used to build a particle simulator which automatically selects and tunes the optimal algorithmic configuration from its internal library, optimising for either time or energy. This poster focuses on efforts to improve its performance for complex multi-site molecular models, where the amount of data required per molecule may vary, as well as on algorithm selection methods that use data-driven and expert-knowledge-based approaches to reduce the overhead of the selection process. It also highlights the practicalities of these approaches for non-HPC-expert users.
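The selection process can be caricatured in a few lines of Python (AutoPas itself is C++ and far more sophisticated): during a tuning phase, time each candidate configuration on the live workload and keep the fastest. The configuration names and dummy workload below are placeholders, though AutoPas really does tune over choices such as container type and data layout.

```python
# Toy auto-tuning loop in the spirit of AutoPas, not its API.
import time

def tune(configurations, step, samples=3):
    """Return the configuration with the lowest measured step time."""
    timings = {}
    for cfg in configurations:
        start = time.perf_counter()
        for _ in range(samples):
            step(cfg)                     # run real simulation steps with cfg
        timings[cfg] = (time.perf_counter() - start) / samples
    return min(timings, key=timings.get)

# Hypothetical configuration space: container type x data layout.
configs = [("linked_cells", "AoS"), ("linked_cells", "SoA"),
           ("verlet_lists", "AoS"), ("verlet_lists", "SoA")]
best = tune(configs, step=lambda cfg: time.sleep(0.001))  # dummy workload
print("selected:", best)
```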

Author(s): Samuel James Newcome (Technical University of Munich), Fabio Alexander Gratl (Technical University of Munich), Manish Kumar Mishra (Technical University of Munich), Markus Mühlhäußer (Technical University of Munich), Jonas Schumacher (Technical University of Munich), and Hans-Joachim Bungartz (Technical University of Munich)

Domain: Computational Methods and Applied Mathematics


A Portable Testing and Visualisation Framework for Scientific Libraries on HPC Systems

Benchmarking and regression testing are crucial for ensuring the reliability and performance of software applications in scientific computing. Our work addresses this need within the context of a EuroHPC consortium challenge, involving EUMaster4HPC students and focusing on developing a benchmarking and testing framework for scientific libraries. We contribute a Python interface for ReFrame, a framework for regression testing and benchmarking on HPC systems. This interface abstracts system configurations from test-specific parameters, enabling seamless portability across different HPC environments. It supports diverse compilation configurations, including hardware-specific optimisations and toolchains, making it highly adaptable to various platforms. Additionally, we developed a visualisation interface using Grafana, which consolidates test results into interactive dashboards. These dashboards provide real-time insights into performance trends and correctness metrics, simplifying the interpretation of complex data for researchers and developers. To validate our approach, we consider QMeCha, a quantum Monte Carlo package for simulating molecular systems. Testing on EuroHPC’s Deucalion supercomputer, across its x86 and ARM partitions, we leveraged our interface to evaluate performance differences among linear algebra libraries such as BLAS and LAPACK. Our work highlights the strengths and limitations of these configurations, demonstrating the utility and flexibility of our testing framework.
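A minimal ReFrame test of the kind such an interface generates might look as follows (assuming a recent ReFrame version with class-body builtins); the benchmark source file, regex patterns, and wildcard system/environment selectors are placeholders rather than the consortium's actual tests.

```python
# Sketch of a ReFrame performance/regression test.
import reframe as rfm
import reframe.utility.sanity as sn

@rfm.simple_test
class DgemmBench(rfm.RegressionTest):
    valid_systems = ['*']                # resolved from the system config
    valid_prog_environs = ['*']
    build_system = 'SingleSource'
    sourcepath = 'dgemm_bench.c'         # hypothetical benchmark source

    @sanity_function
    def assert_finished(self):
        # Test passes only if the benchmark printed a throughput line.
        return sn.assert_found(r'GFLOP/s', self.stdout)

    @performance_function('GFLOP/s')
    def gflops(self):
        return sn.extractsingle(r'(\S+)\s+GFLOP/s', self.stdout, 1, float)
```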

Author(s): Emanuele Bellini (Università della Svizzera italiana), Alberto Guerrini (Università della Svizzera italiana), Domenico Santarsiero (Università della Svizzera italiana), Melanie Tonarelli (Università della Svizzera italiana), and Tommaso Trabacchin (Università della Svizzera italiana)

Domain: Engineering


Pyccel: Automating Translation of Python Prototypes to C/Fortran Production Codes

Python is a widely popular programming language, valued for its simplicity, ease of learning, and vast ecosystem of packages, making it ideal for scientific applications. However, its execution speed is a major limitation compared to low-level languages. We present Pyccel, an intuitive transpiler that helps researchers accelerate their Python developments by generating recognisable, human-readable Fortran or C code. With the release of Pyccel 2.0, the entry barrier for users has been lowered further. While Pyccel’s clean error messages and simple Python-compatible type annotations already allowed researchers to quickly improve their execution times, the added support for common containers and classes (a feature often unsupported by tools like Numba, JAX, or Pythran) reduces the work required to identify and isolate translatable compute kernels. We will present benchmarks showing speed-ups of around 50x over the original Python code, as well as examples taken from the PyGyro code, where classes simplify large-scale simulations by improving code readability and maintainability. By empowering researchers to harness the performance of low-level languages without leaving the Python ecosystem, Pyccel narrows the chasm researchers face when making the jump from prototype to high-performance computing.
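To give a flavour of the workflow, here is a small kernel written with the string-based array annotations Pyccel documents; it can be translated ahead of time with the `pyccel` command or just-in-time with `epyccel`.

```python
# axpy kernel in type-annotated Python; Pyccel turns the explicit loop
# into a plain, recognisable loop nest in Fortran or C.
def axpy(a: float, x: 'float[:]', y: 'float[:]'):
    for i in range(x.shape[0]):
        y[i] = a * x[i] + y[i]

# Ahead-of-time: `pyccel axpy.py`
# JIT usage per Pyccel's documentation:
#   from pyccel.epyccel import epyccel
#   fast_axpy = epyccel(axpy)
```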

Author(s): Emily Bourne (EPFL), Mohamed Jalal Maaouni (UM6P), and Yaman Güçlü (Max Planck Institute for Plasma Physics)

Domain: Computational Methods and Applied Mathematics


pyGinkgo: Python Bindings for Ginkgo

Over the past decade, machine learning has achieved significant advances, with applications spanning diverse fields such as physics, medicine, economics, and energy. A pressing challenge in contemporary machine learning is optimizing models for time and energy efficiency, and one effective approach to improving time efficiency is sparsification. While contemporary machine learning libraries such as PyTorch, TensorFlow, and SciPy offer well-optimized kernels for dense matrix computations, their performance on sparse matrix operations often falls short. To bridge the performance gap between dense and sparse computations in the Python world, we present pyGinkgo, Python bindings for the Ginkgo library. pyGinkgo enables Python users to leverage Ginkgo’s advanced capabilities for sparse computations within Python, offering significant potential for improving the performance of sparse neural networks and beyond. In this poster, we share initial benchmark results demonstrating pyGinkgo’s potential to enhance the performance of sparse matrix computations, and hence of sparse neural networks, within Python-based workflows.
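The dense/sparse trade-off referred to above is easy to reproduce with a SciPy baseline (pyGinkgo's own API is not shown here); the matrix size and density below are arbitrary.

```python
# Same matrix-vector product, dense vs CSR, timed side by side.
import time
import numpy as np
import scipy.sparse as sp

n, density = 5_000, 1e-3
A_sparse = sp.random(n, n, density=density, format="csr", random_state=0)
A_dense = A_sparse.toarray()             # same values, dense storage
x = np.random.rand(n)

t0 = time.perf_counter(); _ = A_dense @ x; t_dense = time.perf_counter() - t0
t0 = time.perf_counter(); _ = A_sparse @ x; t_sparse = time.perf_counter() - t0
print(f"dense: {t_dense:.5f}s  CSR: {t_sparse:.5f}s")
```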

Author(s): Keshvi Tuteja (Karlsruhe Institute of Technology), Gregor Olenik (Karlsruhe Institute of Technology), Roman Mishchuk (Technical University of Munich), Nicolas Venkovic (Technical University of Munich), Markus Götz (Karlsruhe Institute of Technology), Achim Streit (Karlsruhe Institute of Technology), Hartwig Anzt (Technical University of Munich, University of Tennessee), and Charlotte Debus (Karlsruhe Institute of Technology)

Domain: Computational Methods and Applied Mathematics


QSAR Model for Predicting Anti-Cancer Activity of Small Molecules Against Breast Cancer Cell Lines (MCF-7)

Quantitative Structure-Activity Relationship (QSAR) modeling has emerged as a powerful computational approach for accelerating drug discovery and development. In this study, we developed a QSAR model to predict the anti-cancer activity of small molecules against the MCF-7 breast cancer cell line, measured as the half-maximal inhibitory concentration (IC50).
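Since the abstract does not specify descriptors or learner, the following is only a generic descriptor-based QSAR sketch with RDKit and scikit-learn; the SMILES strings and pIC50 values are placeholders.

```python
# Generic QSAR workflow: molecular descriptors -> regression on activity.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor

smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O"]   # placeholder molecules
pic50 = np.array([4.2, 5.1, 6.3])                        # placeholder activities

def featurize(smi):
    """Compute a few standard RDKit descriptors for one molecule."""
    mol = Chem.MolFromSmiles(smi)
    return [Descriptors.MolWt(mol), Descriptors.MolLogP(mol),
            Descriptors.TPSA(mol), Descriptors.NumHDonors(mol)]

X = np.array([featurize(s) for s in smiles])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, pic50)
print(model.predict(X))
```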

Author(s): Abdelouahab Dehimat (University of M’sila)

Domain: Life Sciences


Relationship Between Rising Temperatures and Air Quality in India

This research set out to find and analyze the connection between rising ambient temperatures due to climate change and the rapidly worsening air quality in India, using exploratory methods such as scatter plots. We hypothesized that the year-on-year temperature rise in India is connected to worsening air quality across the country, particularly in large urban areas. By examining trends in the datasets we gathered, we intended to demonstrate this correlation, and we were able to observe it by comparing the datasets and visualizing the data in a series of graphs. Had the hypothesis been false, no such correlation would have been visible.
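A minimal pandas version of this comparison might look as follows; the file names and column layout are assumptions standing in for the datasets used.

```python
# Merge annual temperature and air-quality series, then correlate.
import pandas as pd

temp = pd.read_csv("india_temperature.csv")   # assumed columns: year, mean_temp_c
aqi = pd.read_csv("india_air_quality.csv")    # assumed columns: year, mean_aqi

merged = temp.merge(aqi, on="year")
print(merged[["mean_temp_c", "mean_aqi"]].corr(method="pearson"))
merged.plot.scatter(x="mean_temp_c", y="mean_aqi")   # requires matplotlib
```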

Author(s): Sara Hazlewood (San Diego State University), Terry Tran (San Diego State University), Isabel Gilley (San Diego State University), and Max Winchester (San Diego State University)

Domain: Climate, Weather and Earth Sciences


Scalable Genomic Context Analysis with GCsnap2 on HPC Clusters

GCsnap2 Cluster is a scalable, Python-based, high-performance solution for genomic context analysis, co-developed by computer and life scientists to overcome the scalability limitations of its predecessor, GCsnap1 Desktop. Leveraging distributed computing with mpi4py.futures, GCsnap2 Cluster achieves a 30× improvement in execution time and can now perform genomic context analyses of hundreds of thousands of protein-coding gene sequences on HPC clusters. Its modular architecture enables the creation of task-specific workflows and flexible deployment in various computational environments, making it ideally suited for bioinformatics studies of large-scale datasets. This work highlights the potential of applying similar approaches to scalability challenges in other scientific domains that rely on large-scale data analysis pipelines.
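The distribution pattern can be sketched with mpi4py.futures as below; `analyze_context` is a hypothetical stand-in for one genomic-context task, and the chunk size is arbitrary. The script is launched under MPI, e.g. `mpiexec -n 64 python -m mpi4py.futures script.py`.

```python
# Fan independent per-sequence tasks out over MPI ranks.
from mpi4py.futures import MPIPoolExecutor

def analyze_context(seq_id):
    # placeholder for one genomic-context analysis task
    return seq_id, len(seq_id)

if __name__ == "__main__":
    seq_ids = [f"protein_{i}" for i in range(100_000)]
    with MPIPoolExecutor() as pool:
        for seq, result in pool.map(analyze_context, seq_ids, chunksize=256):
            pass  # aggregate results here
```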

Author(s): Reto Krummenacher (University of Basel), Osman Seckin Simsek (University of Basel), Michèle Leemann (University of Basel, Swiss Institute of Bioinformatics), Leila T. Alexander (University of Basel, Swiss Institute of Bioinformatics), Torsten Schwede (University of Basel, Swiss Institute of Bioinformatics), Florina M. Ciorba (University of Basel), and Joana Pereira (University of Basel, Swiss Institute of Bioinformatics)

Domain: Computational Methods and Applied Mathematics


SimFS: Developments & Roadmap for I/O, Compression, Visualization and other Auxiliary Features for Climate Simulations

This project aims to provide a host of tools for long-running simulations via a runtime daemon process originally developed for SimFS, a file virtualization system that employs user-defined rules for storing data from the time steps of simulation programs. This poster showcases the current developments along with a roadmap of new use cases, including data compression tools, integration of low-fidelity ML-based solvers for recomputation, and in-situ visualization and dashboarding, among others. Since the project currently caters to ICON4Py and the related weather and climate workflows, this poster invites deeper discussions with the potential user community so that its development can better serve its intended users.

Author(s): Prashanth Kanduri (ETH Zurich / CSCS)

Domain: Computational Methods and Applied Mathematics


Simulations of Giant Impacts with Material Strength in pkdgrav3

Giant impacts form the last stage of planet formation and play a key role in determining many properties of planetary systems, such as their final structure and the masses and compositions of their constituents. A common choice for numerically solving the equations of motion is the Smoothed Particle Hydrodynamics (SPH) method. We present a new SPH code with material strength built on top of the modern gravity code pkdgrav3. The code uses the Fast Multipole Method (FMM) on a distributed binary tree to achieve O(N) scaling and is designed for modern hardware (SIMD vectorization and GPUs). Neighbor finding in SPH is done for a whole group of particles at once and is tightly coupled to the FMM tree code, thereby preserving the scaling of the gravity code. A generalized Equation of State (EOS) interface allows the use of various material prescriptions, and a shear strength formulation enables the proper treatment of shock propagation in low-velocity impacts or smaller bodies, as well as the preservation of craters and other structures formed by impacts. Using the formation of the Caloris basin on Mercury (resolved with up to 2 billion particles) as an example, we demonstrate the advantages of high-resolution SPH simulations of planet-scale impacts.
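As background on the core SPH operation being accelerated, here is a brute-force 1D density summation with the standard cubic-spline kernel; it shows only the textbook method, none of pkdgrav3's grouped neighbor search or FMM coupling, and all parameters are arbitrary.

```python
# Textbook SPH density estimate in 1D (brute force, O(N^2)).
import numpy as np

def w_cubic(r, h):
    """Standard cubic-spline SPH kernel in 1D (normalization 2/(3h))."""
    q = np.abs(r) / h
    w = np.where(q < 1, 1 - 1.5 * q**2 + 0.75 * q**3,
        np.where(q < 2, 0.25 * (2 - q)**3, 0.0))
    return (2.0 / (3.0 * h)) * w

x = np.sort(np.random.rand(1000))      # particle positions on [0, 1]
m, h = 1.0 / x.size, 0.01              # equal masses, smoothing length
rho = np.array([(m * w_cubic(xi - x, h)).sum() for xi in x])
print(rho.mean())                      # ~1 for this uniform setup
```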

Author(s): Thomas Meier (University of Zurich), Christian Reinhardt (University of Zurich, University of Bern), Douglas Potter (University of Zurich), and Joachim Stadel (University of Zurich)

Domain: Physics


Spectral Methods for the Clustering of Cyclic and Acyclic Graphs

Traditional spectral clustering methods are designed for undirected graphs and fail to capture the directionality of the edges and of the connections between clusters. Our work centers on developing novel spectral methods for the clustering of directed graphs with block-cyclic and block-acyclic structures. Block-cyclic instances arise from phenomena with recurrent patterns, while block-acyclic ones capture hierarchical relationships and appear in real-world scenarios such as task scheduling between processors and trophic networks. We extend previously introduced spectral methods for the clustering of block-cyclic and block-acyclic graphs to novel algorithms, employing nonlinear graph Laplacians, that provide sharper approximations of the directed graph cuts and thus higher clustering accuracy. Additionally, we leverage diffusion principles in the transition matrices in question to effectively minimize the normalized cut between the partitions. The effectiveness of the introduced algorithms is validated through a series of experiments on synthetic and real-world graphs, with performance measured both by metrics based on the quality of the graph cut and by metrics based on the accuracy of label retrieval.
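The linear baseline that such nonlinear methods sharpen can be sketched as follows: for a k-block-cyclic digraph, embed each node using the transition matrix's eigenvectors for the k eigenvalues of largest modulus (which sit near the k-th roots of unity) and cluster the real and imaginary parts with k-means. The toy graph and all parameters are illustrative assumptions.

```python
# Baseline spectral clustering of a block-cyclic directed graph.
import numpy as np
from sklearn.cluster import KMeans

def block_cyclic_clusters(A, k):
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
    vals, vecs = np.linalg.eig(P)
    idx = np.argsort(-np.abs(vals))[:k]        # eigenvalues near roots of unity
    emb = np.hstack([vecs[:, idx].real, vecs[:, idx].imag])
    return KMeans(n_clusters=k, n_init=10).fit_predict(emb)

# Toy 3-block cycle: edges only from block b to block (b+1) mod 3.
rng = np.random.default_rng(0)
blocks = np.repeat([0, 1, 2], 20)
A = np.array([[1.0 if blocks[j] == (blocks[i] + 1) % 3 and rng.random() < 0.5
               else 0.0 for j in range(60)] for i in range(60)])
print(block_cyclic_clusters(A, 3))             # labels constant per block
```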

Author(s): Jacopo Palumbo (Università della Svizzera italiana, Politecnico di Milano), Dimosthenis Pasadakis (Università della Svizzera italiana), Albert-Jan Yzelman (Huawei), and Olaf Schenk (Università della Svizzera italiana)

Domain: Computational Methods and Applied Mathematics


SYCL and Block-Structured Grids: Performance Impact on Simulations of Complex Coastal Ocean Domains

Developing the next generation of climate modelling tools to increase throughput and ensure performance portability is crucial. The choice of an underlying grid for modelling the ocean, an important climate compartment, is difficult: the almost fractal-like boundaries of ocean domains and rapidly changing bathymetry often make unstructured triangular meshes the preferred choice, since their construction is simple and their adaptability high. An alternative is Block-Structured Grids (BSGs), an unstructured collection of blocks, each containing a topologically structured mesh. We present the performance impact of utilizing BSGs on diverse hardware employing SYCL, together with the current state of BSG generation. Computation on unstructured grids is associated with suboptimal performance due to irregular memory access patterns, whereas structured grids enable near-optimal efficiency. Our shallow water equations solver UTBEST exploits the regular per-block memory access pattern provided by BSGs to enable performance gains. This impact is studied with respect to the block size and the influence of unstructured blocks. As SYCL allows programming for heterogeneous parallel computing in C++ on CPUs, GPUs (and FPGAs), two major SYCL implementations (oneAPI, AdaptiveCpp) with multiple backends (CUDA, OpenMP, OpenCL) are evaluated.
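The memory-access argument can be felt even from Python: summing values through a random indirection array (the unstructured case) is substantially slower than summing a contiguous block (the structured case). This NumPy micro-benchmark is only an analogy for the SYCL kernels studied on the poster.

```python
# Indirect (gather) vs contiguous access over the same data.
import time
import numpy as np

n = 1 << 24
data = np.random.rand(n)
perm = np.random.permutation(n)        # "unstructured" neighbour indices

t0 = time.perf_counter(); s1 = data[perm].sum(); t_unstr = time.perf_counter() - t0
t0 = time.perf_counter(); s2 = data.sum();       t_struct = time.perf_counter() - t0
print(f"indirect {t_unstr:.3f}s vs contiguous {t_struct:.3f}s")
```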

Author(s): Jonathan Schmalfuß (University of Bayreuth), Daniel Zint (New York University), Sara Faghih-Naini (ECMWF), Julian Stahl (Friedrich-Alexander-Universität Erlangen-Nürnberg), Markus Büttner (University of Bayreuth), Roberto Grosso (Friedrich-Alexander-Universität Erlangen-Nürnberg), and Vadym Aizinger (University of Bayreuth)

Domain: Computational Methods and Applied Mathematics


Towards a Sparse BLAS Standard for Triangular Solvers on ARM Architectures

Sparse matrix computations are critical in scientific simulations and engineering, with the Sparse BLAS standard playing a growing role as a benchmark for performance and portability across diverse hardware, including x86 CPUs, GPUs, and ARM architectures. However, standardizing sparse matrix operations remains challenging due to differences in storage formats, accuracy requirements, and hardware-specific optimizations and will, therefore, require an iterative refinement process. Recent updates to the Arm Performance Libraries, such as the introduction of functions for sparse triangular solves and sparse vector operations, reflect significant industry efforts towards such standardization. This poster contributes to these ongoing efforts by highlighting the benefits of supernodal sparse matrix representations. Supernodes group columns with identical sparsity patterns into dense blocks, enabling efficient utilization of dense BLAS/LAPACK operations and thereby delivering substantial performance gains. We are collaborating with Arm to integrate supernodal representations into the Arm Performance Libraries, showcasing improved performance on ARM systems powered by state-of-the-art processors from the Ampere Altra Max, Azure Cobalt, and AWS Graviton series.
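The grouping idea can be illustrated in a few lines: scan the columns of a CSC matrix and merge consecutive columns with identical row patterns into one supernode, which can then be processed as a dense block. Real supernodal solvers apply this to triangular factors with more refined criteria; this is only a toy.

```python
# Toy supernode detection on a CSC matrix.
import numpy as np
import scipy.sparse as sp

def find_supernodes(A_csc):
    """Return (start, end) column ranges with identical sparsity patterns."""
    A = A_csc.tocsc()
    patterns = [tuple(A.indices[A.indptr[j]:A.indptr[j + 1]])
                for j in range(A.shape[1])]
    ranges, start = [], 0
    for j in range(1, A.shape[1] + 1):
        if j == A.shape[1] or patterns[j] != patterns[start]:
            ranges.append((start, j))   # columns [start, j) form a supernode
            start = j
    return ranges

A = sp.csc_matrix(np.array([[1, 1, 0, 0],
                            [1, 1, 0, 1],
                            [0, 0, 1, 1],
                            [1, 1, 0, 1]]))
print(find_supernodes(A))   # -> [(0, 2), (2, 3), (3, 4)]
```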

Author(s): Marco Julian Solanki (ETH Zürich), Lorenzo Migliari (Universitat Politècnica de Catalunya, University of Luxembourg), and Olaf Schenk (Università della Svizzera italiana)

Domain: Computational Methods and Applied Mathematics


Towards Exascale Particle-Mesh Methods: A Massively Parallel Performance Portable C++ Particle-in-Cell Framework

We showcase the Independent Parallel Particle Layer (IPPL), a performance-portable C++ library for particle-in-cell methods. IPPL makes use of Kokkos (a performance portability abstraction layer), heFFTe (a library for large-scale FFTs), and MPI (Message Passing Interface) to deliver a portable, massively parallel toolkit for particle-mesh methods. IPPL supports simulations in one to six dimensions, mixed precision, and asynchronous execution in different execution spaces (e.g. CPUs and GPUs). One advantage of such a framework is its ability to serve as a test-bed for new algorithms that seek to improve the runtime and efficiency of large-scale simulations, for example in the beam and plasma physics communities. More concretely, we showcase the performance and usability of our library using a set of plasma physics mini-apps, collectively known as ALPINE, as well as a cosmological structure formation mini-app. Performance is shown on multiple architectures, such as NVIDIA Grace Hopper (GH200) and NVIDIA A100 GPUs.
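For readers unfamiliar with the method, here is a minimal 1D electrostatic particle-in-cell step in plain NumPy, covering the deposit/solve/gather/push cycle that IPPL implements in portable, parallel C++; no IPPL API is shown and all parameters are arbitrary.

```python
# One 1D electrostatic PIC step on a periodic domain (units: eps0 = 1).
import numpy as np

ng, npart, L, dt = 64, 10_000, 2 * np.pi, 0.1
dx = L / ng
rng = np.random.default_rng(0)
x = rng.uniform(0, L, npart)            # particle positions
v = rng.normal(0, 1, npart)             # particle velocities
q = -L / npart                          # electron macro-charge

# 1) deposit charge on the mesh with linear (cloud-in-cell) weighting
g = np.floor(x / dx).astype(int)
w = x / dx - g
rho = np.zeros(ng)
np.add.at(rho, g % ng, q * (1 - w) / dx)
np.add.at(rho, (g + 1) % ng, q * w / dx)
rho -= rho.mean()                       # uniform neutralizing ion background

# 2) field solve: Gauss's law dE/dx = rho via FFT on the periodic mesh
k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)
k[0] = 1.0                              # avoid division by zero; mode set below
E_k = np.fft.fft(rho) / (1j * k)
E_k[0] = 0.0                            # zero-mean field on a periodic box
E = np.fft.ifft(E_k).real

# 3) gather the field to particles, then 4) push (electron q/m = -1)
E_p = E[g % ng] * (1 - w) + E[(g + 1) % ng] * w
v -= E_p * dt
x = (x + v * dt) % L
```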

Author(s): Sonali Mayani (Paul Scherrer Institute, ETH Zurich), Matthias Frey (University of St Andrews), Sriramkrishnan Muralikrishnan (Forschungszentrum Jülich), Ryan Ammann (Paul Scherrer Institute, ETH Zurich), and Andreas Adelmann (Paul Scherrer Institute, ETH Zurich)

Domain: Computational Methods and Applied Mathematics