Papers - PASC 2025

The Proceedings of the PASC Conference are published in the Association for Computing Machinery’s (ACM’s) Digital Library. In recognition of the high quality of the PASC Conference papers track, the ACM continues to provide the proceedings as an Open Table of Contents (OpenTOC). This means that the definitive versions of PASC Conference papers are available to everyone at no charge to the author and without any pay-wall constraints for readers.

The OpenTOC for the PASC Conference is hosted on the ACM’s SIGHPC website. PASC papers can be accessed for free at: www.sighpc.org/for-our-community/acm-open-tocs.

The following papers will be presented as talks at PASC25, and will be accessible on the OpenTOC library post-conference.

Ab Initio Molecular Dynamics with Sequential Electron Addition as a Tool to Find Initial Reductive Solid Electrolyte Interface Formation Reactions

Lithium-ion batteries (LIBs) are an essential building block for modern energy storage. The solid-electrolyte interface (SEI) is an important component of LIBs, which acts as a passivation layer and prevents electrode and electrolyte from further decomposition and, thus, from capacity loss. In this work, we investigated the first steps of SEI formation initiated from commonly used electrolyte com- pounds ethylene carbonate (EC), diethyl carbonate (DEC), vinylele carbonate (VC), and 1,3-propane sultone (PS). Ab initio molecular dynamics (AIMD) simulations based on density functional theory was used to discover chemical reactions without chemical intu- ition. In order to simulate the reductive potential at the electrode, electrons are added sequentially to the system, leading to electro- reductive decomposition of the compounds. It was observed that this progressive electron addition leads to the formation of various reaction products, which can act as further reactants in subsequent reactions. Further reaction products were observed, some of which were reactions known from the literature, but also other, ener- getically less favorable structures were discovered. The molecular structures found in the AIMD simulations agree closely with experi- mental findings, validating the accuracy and reliability of the herein presented approach of sequentially adding electrons in molecular simulations.

Author(s): Tom-Luka Zwarg (Heinrich Heine University), Martin Gouverneur (Fraunhofer-Einrichtung Forschungsfertigung Batteriezelle FFB), and Jan Meisner (Heinrich Heine University)

Domain: Chemistry and Materials

Accelerated CNN-based Scans for Traces of Positive Selection

Positive natural selection is the driving force that enables species to survive and reproduce in their environment. Localizing traces of positive selection has practical applications in studying virus evolution and designing more effective drug treatments. State-of-the-art methods for the detection of positive selection combine Convolutional Neural Networks (CNN) with sliding-window algorithms to scan genomic sequences with high precision, but require prohibitively long execution times to process whole genomes with fine-grained resolution. We present an FPGA-accelerated system for efficiently scanning whole genomes with high granularity, implementing a quantized version of FAST-NN, a CNN that has been designed through a hardware-aware neural architecture search. FAST-NN employs a compact representation of genomic data as features, which eliminates potential I/O bottlenecks in hardware. Our accelerator architecture consists of a dedicated stage for each CNN layer in a pipelined datapath that integrates a specialized buffer design; this enables data reuse between overlapping sliding windows by leveraging the dilated convolutions in FAST-NN. A design point implemented onto an Alveo U250 accelerator card achieves comparable accuracy to FAST-NN, with a maximum reduction of only 2.2% due to quantization, while producing a classification outcome in each clock cycle at a frequency of 100MHz. Scanning the entire human genome (excluding the sex chromosomes), we observed between 19.51× and 28.61× faster processing than a PyTorch implementation on a 16-core CPU, and between 1.22× and 2.89× faster processing than a high-end GPU. The architecture is adaptable to other domains where CNNs are deployed in sliding-window algorithms for large-scale data processing.

Author(s): Sjoerd van den Belt (University of Twente), and Nikolaos Alachiotis (University of Twente)

Domain: Engineering

ASiMoV-CCS – A New Solver for Scalable and Extensible CFD & Combustion Simulations

Solving industry-relevant CFD and combustion problems is computationally extremely challenging. Collaborations between industry and academia can drive research into new techniques or algorithms that improve the computational performance of solvers, however there is a tension between keeping commercially sensitive intellectual property safe, and benefitting from open developments. We have designed ASiMoV-CCS, a new CFD and combustion solver, from the ground up to enable a full separation of open and proprietary source code by leveraging modern Fortran features, in particular submodules. This paper describes the functionality, design choices and implementation details of the solver, and validates the implementation using two widely studied test cases, the Lid-Driven Cavity and the Taylor-Green Vortex. The performance and scalability is evaluated on the UK’s national supercomputer ARCHER2 and demonstrates near-linear strong scaling up to 160 nodes (20,480 cores) for medium-sized test cases.

Author(s): Paul Bartholomew (EPCC, The University of Edinburgh), Alexei Borissov (EPCC, The University of Edinburgh), Christopher Goddard (Rolls-Royce Plc), Shrutee Lakshminarasimha (Infosys), Sébastien Lemaire (EPCC, The University of Edinburgh), Justs Zarins (EPCC, The University of Edinburgh), and Michèle Weiland (EPCC, The University of Edinburgh)

Domain: Engineering

CAFE AU LAIT: Compute-Aware Federated Augmented Low-Rank AI Training

Federated finetuning is essential for unlocking the knowledge embedded in pretrained Large Language Models (LLMs) when data is distributed across clients. Unlike single-institution finetuning, federated finetuning enables collaboration across decentralized datasets while preserving data privacy. To address the high computing costs of LLM training and improve energy efficiency in Federated Learning (FL), Low-Rank Adaptation (LoRA) has gained popularity due to its reduced number of trainable parameters. However, this approach assumes all clients have sufficient computing resources, which is often unrealistic due to the heterogeneity of resources across clients. While some clients may access powerful GPUs, others have limited or no such resources. Federated finetuning using synthetic data allows participation without local LLM training but introduces a performance gap compared to local updates. To address this, we propose a novel two-stage algorithm leveraging the storage and computing power of a strong server. In the first stage, resource-constrained clients generate synthetic data under the coordination of the strong server, which is stored on the strong server. In the second stage, the strong server uses this synthetic data on behalf of constrained clients to perform federated LoRA finetuning alongside clients with sufficient resources. This ensures participation from all clients. Experimental results demonstrate that incorporating local updates from even a small fraction of clients improves performance compared to using synthetic data for all clients. Additionally, we integrate the Gaussian mechanism in both stages to ensure client-level differential privacy.

Author(s): Jiayi Wang (Oak Ridge National Laboratory), John Gounley (Oak Ridge National Laboratory), and Heidi Hanson (Oak Ridge National Laboratory)

Domain: Computational Methods and Applied Mathematics

Data Assimilation for Robust UQ Within Agent-Based Simulation on HPC Systems

Agent-based simulation provide a powerful tool for in silico system modeling. However, these simulations do not provide built-in methods for uncertainty quantification (UQ). A typical approach to UQ within these types of models is to run multiple realizations of the model, then compute aggregate statistics upon completion. This approach is limited due to the compute time required for a solution. When faced with an emerging biothreat, public health decisions need to be made quickly and solutions for integrating near real-time data with analytic tools are needed. We propose an integrated Bayesian UQ framework for agent-based models based on sequential Monte Carlo sampling. Given streaming or static data about the evolution of an emerging pathogen this Bayesian framework provides a distribution over the parameters governing the spread of a disease through a population, yielding accurate estimates of the spread of a disease to public health agencies seeking to abate the spread. By coupling agent-based simulations with Bayesian modeling in a data assimilation, our proposed framework provides a powerful tool for modeling dynamical systems in silico. We propose a method which reduces model error and provides a range of realistic possible outcomes. Our method addresses two primary limitations of ABMs: lack of UQ and inability to assimilate data. Our proposed framework combines the flexibility of an agent-based model with the rigorous UQ provided by the Bayesian paradigm in a workflow which scales well to HPC systems. We provide algorithmic details and results on a simulated outbreak with both static and streaming data.

Author(s): Adam Spannaus (Oak Ridge National Laboratory), Sifat Moon (Oak Ridge National Laboratory), John Gounley (Oak Ridge National Laboratory), and Heidi Hanson (Oak Ridge National Laboratory)

Domain: Computational Methods and Applied Mathematics

An Efficient GPU Parallelization Strategy for Binary Collisions in Particle-In-Cell Plasma Simulations

The Particle-In-Cell (PIC) algorithm coupled with binary collision modules is a widely applicable method to simulate plasmas over a broad range of regimes (from the collisionless kinetic regime to the collisional regime). While several popular PIC codes implement binary collision modules, their performance on GPUs can be constrained by the default parallelization strategy, which assigns one GPU thread per simulation cell. This approach can underutilize GPU resources for simulations with many macroparticles per cell, and relatively few cells per GPU. To address this limitation, we propose an alternative parallelization strategy that instead GPU distributes threads based on independent pairs of colliding particles. Our proposed strategy shows a speed improvement of up to $\sim 4 \times$ for cases with relatively few cells per GPU and a similar performance otherwise.

Author(s): Remi Lehe (Lawrence Berkeley National Laboratory), Muhammad Haseeb (Lawrence Berkeley National Laboratory), Justin Angus (Lawrence Livermore National Laboratory), David Grote (Lawrence Livermore National Laboratory), Roelof Groenewald (TAE), Arianna Formenti (Lawrence Berkeley National Laboratory), Axel Huebl (Lawrence Berkeley National Laboratory), Jack Deslippe (Lawrence Berkeley National Laboratory), and Jean-Luc Vay (Lawrence Berkeley National Laboratory)

Domain: Computational Methods and Applied Mathematics

HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights

The volume of scientific literature is growing exponentially, leading to underutilized discoveries, duplicated efforts, and limited cross-disciplinary collaboration. Retrieval-Augmented Generation (RAG) offers a way to assist scientists by improving the factuality of Large Language Models (LLMs) in processing this influx of information. However, scaling RAG to handle millions of articles introduces significant challenges, including the high computational costs associated with parsing documents and embedding scientific knowledge, as well as the algorithmic complexity of aligning these representations with the nuanced semantics of scientific content. To address these issues, we introduce HiPerRAG, a RAG workflow powered by high performance computing (HPC) to index and retrieve knowledge from more than 3.6 million scientific articles. At its core are Oreo, a high-throughput model for multimodal document parsing, and ColTrast, a query-aware encoder fine-tuning algorithm that enhances retrieval accuracy by using contrastive learning and late-interaction techniques. HiPerRAG delivers robust performance on existing scientific question answering (Q/A) benchmarks and two new benchmarks introduced in this work, achieving 90% accuracy on SciQ and 76% on PubMedQA—outperforming both domain-specific models like PubMedGPT and commercial LLMs such as GPT-4. Scaling to thousands of GPUs on the Polaris, Sunspot, and Frontier supercomputers, HiPerRAG delivers million document-scale RAG workflows for unifying scientific knowledge and fostering interdisciplinary innovation.

Author(s): Ozan Gokdemir (University of Chicago, Argonne National Laboratory), Carlo Siebenschuh (University of Chicago, Argonne National Laboratory), Alexander Brace (University of Chicago, Argonne National Laboratory), Azton Wells (Argonne National Laboratory), Brian Hsu (Argonne National Laboratory, University of Chicago), Kyle Hippe (University of Chicago, Argonne National Laboratory), Priyanka Setty (University of Chicago, Argonne National Laboratory), Aswathy Ajith (University of Chicago), J. Gregory Pauloski (University of Chicago), Varuni Sastry (Argonne National Laboratory), Sam Foreman (Argonne National Laboratory), Huihuo Zheng (Argonne National Laboratory), Heng Ma (Argonne National Laboratory), Bharat Kale (Argonne National Laboratory), Nicholas Chia (Argonne National Laboratory), Thomas Gibbs (NVIDIA Inc.), Michael Papka (Argonne National Laboratory, University of Illinois Chicago), Thomas Brettin (Argonne National Laboratory), Francis Alexander (Argonne National Laboratory), Anima Anandkumar (California Institute of Technology), Ian Foster (Argonne National Laboratory, University of Chicago), Rick Stevens (Argonne National Laboratory), Venkatram Vishwanath (Argonne National Laboratory), and Arvind Ramanathan (Argonne National Laboratory, University of Chicago)

Domain: Life Sciences

iMagine: AI-Powered Image Data Analysis in Aquatic Science

The iMagine platform leverages AI-driven tools to enhance the analysis of imaging data in marine and freshwater research, contributing to the study of ocean, sea, coastal, and inland water health. Connected to the European Open Science Cloud (EOSC), it enables the development, training, and deployment of AI models, collaborating with twelve aquatic science use cases to provide valuable insights. The platform refines existing solutions from data acquisition and preprocessing to provide trained models as a service for users. iMagine outlines various AI-based tools, techniques, and methodologies for aquatic science image processing, ensuring consistency and accuracy through clear annotation guidelines and verified tools. The preparation of training datasets, along with their metadata, ensures FAIRness and effective publishing in data repositories. Deep learning models, such as convolutional neural networks, are used for classification, object detection, and segmentation, with performance metrics and evaluation tools ensuring reproducibility and transparency. AI model drift and data FAIRness are also explored, alongside case studies on AI challenges in aquatic sciences. By implementing these practices, iMagine enhances data quality, promotes reproducibility, and fosters scientific progress in aquatic research while collaborating with projects like AI4EOSC and Blue-Cloud. The platform allows users to develop, train, share, and serve AI models on its marketplace. The AI models are encapsulated as Docker images and integrated with REST APIs to ensure their reproducibility. Researchers benefit from the platform’s flexibility, which enables seamless execution of these Docker containers on both federated clouds of the European Grid Infrastructure (EGI) and High-Performance Computing (HPC) infrastructures.

Author(s): Elnaz Azmi (Karlsruhe Institute of Technology (KIT)), Khadijeh Alibabaei (Karlsruhe Institute of Technology (KIT)), Valentin Kozlov (Karlsruhe Institute of Technology (KIT)), Álvaro López García (Instituto de Fisica de Cantabria (IFCA), CSIC-UC), Dick Schaap (Mariene Informatie Service BV (MARIS)), and Gergely Sipos (EGI Foundation)

Domain: Climate, Weather and Earth Sciences

In-Silico Predictions of Drug Resistance in Lung Cancers with Egfr Mutation

Cancer treatment is often hindered by the emergence of drug resistance, frequently driven by novel mutations in oncogenes or drug-targeted pathways. Predicting resistance mechanisms is critical for informing therapeutic strategies and improving patient outcomes. Here, we present a computational workflow that leverages high-performance computing (HPC) resources to systematically evaluate the impact of emerging mutations on drug efficacy. Our workflow integrates deep learning structure prediction, molecular dynamics simulations, molecular docking, and binding predictions of known compounds to predict resistance mechanisms and propose alternative therapeutic options. We also explore quantum chemical calculations as a tool to complement experimental validations to better understand the binding preferences between different protein forms.

Author(s): Ibrahim Imam (University of Kentucky), Usman Abbas (University of Kentucky), Christian Gosser (University of Kentucky), Christine Brainson (University of Kentucky), WA de Jong (Lawrence Berkeley National Laboratory), Xiaoqi Liu (University of Kentucky), Hunter Moseley (University of Kentucky), Shao Qing (University of Kentucky), Shulin Zhang (University of Kentucky), Ralph Zinner (University of Kentucky), and Sally Ellingson (University of Kentucky)

Domain: Life Sciences

Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing

The emergence of foundational models and generative artificial intelligence (GenAI) is poised to transform productivity in scientific computing, especially in code development, refactoring, and translating from one programming language to another. However, because the output of GenAI cannot be guaranteed to be correct, manual intervention remains necessary. Some of this intervention can be automated through task-specific tools, alongside additional methodologies for correctness verification and effective prompt development. We explored the application of GenAI in assisting with code translation, language interoperability, and codebase inspection within a legacy Fortran codebase used to simulate particle interactions at the Large Hadron Collider (LHC). In the process, we developed a tool, CodeScribe, which combines prompt engineering with user supervision to establish an efficient process for code conversion. In this paper, we demonstrate how CodeScribe assists in converting Fortran code to C++, generating Fortran-C APIs for integrating legacy systems with modern C++ libraries, and providing developer support for code organization and algorithm implementation. We also address the challenges of AI-driven code translation and highlight its benefits for enhancing productivity in scientific computing workflows.

Author(s): Akash Dhruv (Argonne National Laboratory), and Anshu Dubey (Argonne National Laboratory)

Domain: Engineering

OpenACC and OpenMP-Accelerated Fortran/C++ Gyrokinetic Fusion Code GENE-X for Heterogeneous Architectures

Achieving net-positive fusion energy and its commercialization requires not only engineering marvels but also state-of-the-art, massively parallel codes that can handle reactor-scale simulations. The GENE-X code is a global continuum gyrokinetic turbulence code designed to predict energy confinement and heat exhaust for future fusion reactors. GENE-X is capable of simulating plasma turbulence from the core region to the wall of a magnetic confinement fusion (MCF) device. Originally written in Fortran 2008, GENE-X leverages MPI+OpenMP for parallel computing. In this paper, we augment the Fortran-based compute operators in GENE-X to a C++-17 layer exposing them to a wide array of C++-compatible tools. Here we focus on offloading the augmented operators to GPUs via directive-based programming models such as OpenACC and OpenMP offload. The performance of GENE-X is comprehensively characterized, e.g., by roofline analysis on a single GPU and scaling analysis on multi-GPUs. The major compute operators achieve significant performance improvements, shifting the bottleneck to inter-GPU communications. We discuss additional opportunities to enhance further the performance, such as by reducing memory traffic and improving memory utilization efficiency.

Author(s): Jordy Trilaksono (Max Planck Institute for Plasma Physics), Philipp Ulbl (Max Planck Institute for Plasma Physics), Jeremy Williams (KTH Royal Institute of Technology), Carl-Martin Pfeiler (Max Planck Institute for Plasma Physics), Marion Finkbeiner (Max Planck Institute for Plasma Physics), Tilman Dannert (Max Planck Computing and Data Facility), Erwin Laure (Max Planck Computing and Data Facility), Stefano Markidis (KTH Royal Institute of Technology), and Frank Jenko (Max Planck Institute for Plasma Physics)

Domain: Physics

Performance Analysis of an Efficient Algorithm for Feature Extraction from Large Scale Meteorological Data Stores

In recent years, Numerical Weather Prediction (NWP) has undergone a major shift with the rapid move towards kilometer-scale global weather forecasts and the emergence of AI-based forecasting models. Together, these trends will contribute to a significant increase in the daily data volume generated by NWP models. Ensuring efficient and timely access to this growing data requires innovative data extraction techniques. As an alternative to traditional data extraction algorithms, the European Centre for Medium-Range Weather Forecasts (ECMWF) has introduced the Polytope feature extraction algorithm. This algorithm is designed to reduce data transfer between systems to a bare minimum by allowing the extraction of non-orthogonal shapes of data. In this paper, we evaluate Polytope’s suitability as a replacement for current extraction mechanisms in operational weather forecasting. We first adapt the Polytope algorithm to operate on ECMWF’s FDB (Fields DataBase) meteorological data stores, before evaluating this integrated system’s performance and scalability on real-time operational data. Our analysis shows that the low overhead of running the Polytope algorithm, which is in the order of a few seconds at most, is far outweighed by the benefits of significantly reducing the size of the extracted data by up to several orders of magnitude compared to traditional bounding box methods. Our ensuing discussion focuses on quantifying the strengths and limitations of each individual part of the system to identify potential bottlenecks and areas for future improvement.

Author(s): Mathilde Leuridan (ECMWF, University of Cologne), Christopher Bradley (ECMWF), James Hawkes (ECMWF), Tiago Quintino (ECMWF), and Martin Schultz (Forschungszentrum Jülich, University of Cologne)

Domain: Climate, Weather and Earth Sciences

Scalable Bayesian Inference of Large Simulations via Asynchronous Prefetching Multilevel Delayed Acceptance

Bayesian inference enables greater scientific insight into simulation models, determining model parameters and meaningful confidence regions from observed data. With hierarchical methods like Multilevel Delayed Acceptance (MLDA) drastically reducing compute cost, sampling Bayesian posteriors for computationally intensive models becomes increasingly feasible. Pushing MLDA towards the strong scaling regime (i.e. high compute resources, short time-to-solution) remains a challenge: Even though MLDA only requires a moderate number of high-accuracy simulation runs, it inherits the sequential chain structure and need for chain burn-in from Markov chain Monte Carlo (MCMC). We present fully asynchronous parallel prefetching for MLDA, adding an axis of scalability complementary to forward model parallelization and parallel chains. A thorough scaling analysis demonstrates that prefetching is advantageous in strong scaling scenarios. We investigate the behavior of prefetching MLDA in small-scale test problems. A large-scale geophysics application, namely parameter identification for non-linear earthquake modelling, highlights interaction with coarse-level quality and model scalability.

Author(s): Maximilian Kruse (Karlsruhe Institute of Technology), Zihua Niu (Ludwig Maximilian University of Munich), Sebastian Wolf (Technical University of Munich), Mikkel Lykkegaard (Danish Technological Institute), Michael Bader (Technical University of Munich), Alice-Agnes Gabriel (University of California San Diego, Ludwig Maximilian University of Munich), and Linus Seelinger (Karlsruhe Institute of Technology)

Domain: Computational Methods and Applied Mathematics

Scalable Genomic Context Analysis with GCsnap2 on HPC Clusters

GCsnap2 Cluster is a scalable, high performance tool for genomic context analysis, developed to overcome the limitations of its predecessor, GCsnap1 Desktop. Leveraging distributed computing with mpi4py.futures, GCsnap2 Cluster achieved a 22× improvement in execution time and can now perform genomic context analysis for hundreds of thousands of input sequences in HPC clusters. Its modular architecture enables the creation of task-specific workflows and flexible deployment in various computational environments, making it well suited for bioinformatics studies of large-scale datasets. This work highlights the potential for applying similar approaches to solve scalability challenges in other scientific domains that rely on large-scale data analysis pipelines.

Author(s): Reto Krummenacher (University of Basel), Osman Seckin Simsek (University of Basel), Michèle Leemann (University of Basel, Swiss Institute of Bioinformatics), Leila Alexander (University of Basel, Swiss Institute of Bioinformatics), Torsten Schwede (University of Basel, Swiss Institute of Bioinformatics), Florina Ciorba (University of Basel), and Joana Pereira (University of Basel, Swiss Institute of Bioinformatics)

Domain: Computational Methods and Applied Mathematics

Space-Time Parallel Scaling of Parareal with a Physics-Informed Fourier Neural Operator Coarse Propagator Applied to the Black-Scholes Equation

Iterative parallel-in-time algorithms like Parareal can extend scaling beyond the saturation of purely spatial parallelization when solving initial value problems. However, they require the user to build coarse models to handle the unavoidable serial transport of information in time. This is a time-consuming and difficult process since there is still limited theoretical insight into what constitutes a good and efficient coarse model. Novel approaches from machine learning to solve differential equations could provide a more generic way to find coarse-level models for parallel-in-time algorithms. This paper demonstrates that a physics-informed Fourier Neural Operator (PINO) is an effective coarse model for the parallelization in time of the two-asset Black-Scholes equation using Parareal. We demonstrate that PINO-Parareal converges as fast as a bespoke numerical coarse model and that, in combination with spatial parallelization by domain decomposition, it provides better overall speedup than both purely spatial parallelization and space-time parallelization with a numerical coarse propagator.

Author(s): Abdul Qadir Ibrahim (Hamburg University of Technology), Sebastian Götschel (Hamburg University of Technology), and Daniel Ruprecht (Hamburg University of Technology)

Domain: Computational Methods and Applied Mathematics

Tetrahedral and Voxel Finite Elements with Orthogonal Discontinuous Basis Functions for Explicit Scalar Wave Propagation Analysis

Scalar wave propagation analysis is one of the fundamental types of analysis used in many fields and has been the subject of much research. As measurement data accumulates, the need for faster and more accurate analysis using more detailed models has arisen. This paper proposes tetrahedral and voxel finite elements based on orthogonal discontinuous functions that enable fast and accurate analysis. Through accuracy and cost analysis on recent computers with actual implementations, we show that the cost of analysis can be significantly reduced and that faster and more accurate wave analysis can be expected as shown in the application example. In addition, many problems lead to operations with a large number of relatively small matrix-vector products like the problem in this paper. This paper showed that such computation can be handled efficiently by implementations taking advantage of recent computers, and is expected to provide insight for problems with similar operations.

Author(s): Kohei Fujita (The University of Tokyo, RIKEN), Tsuyoshi Ichimura (The University of Tokyo, RIKEN), Muneo Hori (Japan Agency for Marine-Earth Science and Technology), Lalith Maddegedara (The University of Tokyo), and Mizuki Kusumoto (The University of Tokyo)

Domain: Engineering

Toward More Usable, Reproducible, and Sustainable Scientific Software: The Impact of User-Centered Design in Research Software Development

Integrating user-centered approaches and methodologies is essential for advancing usability, reproducibility, and sustainability in scientific software. Scientific tools often prioritize technical functionality, however, creating barriers to adoption and workflow integration, especially in interdisciplinary collaborations where research and software development teams may not share the same technical background as domain scientists and end users. At the National Center for Supercomputing Applications at the University of Illinois Urbana-Champaign, and specifically as a part of the Molecule Maker Lab Institute, we address these challenges by embedding user-centered design into the development of chemistry-focused, open-source software. Through case studies from the AlphaSynthesis suite, we demonstrate how methodologies such as user discovery sessions, iterative design, and usability testing can be used to address domain scientist needs and workflows. Consistent design systems and interaction patterns enhance reproducibility, allowing scientists to replicate results and workflows effectively, while scalable components and community engagement strategies promote sustainability and long-term adaptability within the research ecosystem. This paper highlights the role of user-centered design in bridging the gap between computer science and domain science, advocating for its broader adoption within the research software development space to create impactful, enduring tools for interdisciplinary collaboration and innovation.

Author(s): Katherine Arneson (University of Illinois Urbana-Champaign), Lijiang Fu (University of Illinois Urbana-Champaign), and Lisa Gatzke (University of Illinois Urbana-Champaign)

Domain: Chemistry and Materials

Towards Automated Algebraic Multigrid Preconditioner Design Using Genetic Programming for Large-Scale Laser Beam Welding Simulations

Multigrid methods are asymptotically optimal algorithms ideal for large-scale simulations. But, they require making numerous algorithmic choices that significantly influence their efficiency. Unlike recent approaches that learn optimal multigrid components using machine learning techniques, we adopt a complementary strategy here, employing evolutionary algorithms to construct efficient multigrid cycles from available individual components. This technology is applied to finite element simulations of the laser beam welding process. The thermo-elastic behavior is described by a coupled system of time-dependent thermo-elasticity equations, leading to nonlinear and ill-conditioned systems. The nonlinearity is addressed using Newton’s method, and iterative solvers are accelerated with an algebraic multigrid (AMG) preconditioner using hypre BoomerAMG interfaced via PETSc. This is applied as a monolithic solver for the coupled equations. To further enhance solver efficiency, flexible AMG cycles are introduced, extending traditional cycle types with level-specific smoothing sequences and non-recursive cycling patterns. These are automatically generated using genetic programming, guided by a context-free grammar containing AMG rules. Numerical experiments demonstrate the potential of these approaches to improve solver performance in large-scale laser beam welding simulations.

Author(s): Dinesh Parthasarathy (Friedrich-Alexander-Universität Erlangen-Nürnberg), Tommaso Bevilacqua (Universität zu Köln), Martin Lanser (Universität zu Köln), Axel Klawonn (Universität zu Köln), and Harald Köstler (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Domain: Computational Methods and Applied Mathematics