Bulletin of the American Physical Society
77th Annual Meeting of the Division of Fluid Dynamics
Sunday–Tuesday, November 24–26, 2024; Salt Lake City, Utah
Session L16: CFD: HPC
Chair: Spencer Bryngelson, Georgia Institute of Technology
Room: 155 F
Monday, November 25, 2024 8:00AM - 8:13AM
L16.00001: Performance evaluation of the scalable asynchronous DG method
Konduri Aditya, Vidyesh R Dapse, Shubham Kumar Goswami
The scalability of time-dependent partial differential equation (PDE) solvers based on the discontinuous Galerkin (DG) method at extreme scales is significantly affected by data communication/synchronization across processing elements (PEs). To overcome such challenges, an asynchronous DG (ADG) method has recently been proposed that can provide high-order accurate solutions with relaxed communication/synchronization at a mathematical level. This study focuses on evaluating the performance of the ADG method in solving compressible flow problems. The method is implemented in the open-source finite element library deal.II, incorporating a communication-avoiding algorithm. The results show the accuracy limitations of standard DG schemes implemented with communication avoidance. Furthermore, the effectiveness of the newly developed asynchrony-tolerant fluxes in recovering accuracy is demonstrated. Strong scaling results are obtained for both the synchronous and asynchronous DG solvers, demonstrating a speedup of up to 80% with the ADG method at 9216 cores. The results highlight the potential benefits of the asynchronous approach for the development of accurate and scalable PDE solvers, paving the way for simulations of complex problems on massively parallel supercomputers.
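To make the relaxed-synchronization idea concrete, the following is a minimal, hypothetical Python sketch: two subdomains advance a first-order upwind advection update while exchanging halo values only every few steps. It illustrates asynchronous halo data only; it is not the authors' ADG scheme, asynchrony-tolerant fluxes, or deal.II implementation, and all names and parameters are illustrative assumptions.

```python
# Toy illustration of relaxed-synchronization time stepping (not the authors'
# ADG scheme): 1-D upwind advection on two "subdomains" whose halo values are
# refreshed only every SYNC_EVERY steps, mimicking delayed communication.
import numpy as np

N, SYNC_EVERY, CFL, a = 64, 4, 0.5, 1.0           # cells per subdomain, sync period, CFL, wave speed
dx = 1.0 / (2 * N)
dt = CFL * dx / a

x = (np.arange(2 * N) + 0.5) * dx
u = np.sin(2 * np.pi * x)                          # periodic initial condition
left, right = u[:N].copy(), u[N:].copy()           # two subdomains
halo_left = right[-1]                              # possibly stale ghost cell seen by the left subdomain
halo_right = left[-1]                              # possibly stale ghost cell seen by the right subdomain

for step in range(200):
    if step % SYNC_EVERY == 0:                     # "communication" happens only occasionally
        halo_left, halo_right = right[-1], left[-1]
    # first-order upwind update (a > 0); each subdomain uses its possibly stale halo value
    new_left = left - a * dt / dx * (left - np.concatenate(([halo_left], left[:-1])))
    new_right = right - a * dt / dx * (right - np.concatenate(([halo_right], right[:-1])))
    left, right = new_left, new_right

print("max |u| after 200 steps:", np.abs(np.concatenate([left, right])).max())
```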
Monday, November 25, 2024 8:13AM - 8:26AM
L16.00002: A customised implementation of implicit high-order compact finite difference schemes in Xcompact3d targeting heterogeneous architectures
Semih Akkurt, Sebastien Lemaire, Paul Bartholomew, Jacques Xing, Sylvain Laizet
Implicit high-order finite difference schemes have significant advantages over low-order schemes when simulating turbulent flows. However, they result in banded tridiagonal systems that are hard to solve in distributed memory environments. The authors have developed a new algorithm for solving tridiagonal systems on distributed heterogeneous architectures. The customised algorithm utilises a specialist data structure that results in a linear data access pattern to maximise bandwidth throughput, enables vectorisation on CPUs and thread-level parallelism on GPUs in all spatial directions, and reduces data movements between chip and main memory via a combination of cache blocking and fusion strategies. Additionally, the customised algorithm takes advantage of the diagonal dominance of the tridiagonal systems resulting from high-order implicit schemes by reducing the communication requirements significantly, with pseudo-local communications only between neighbouring subdomains. This new algorithm has been implemented in Xcompact3d, a suite of flow solvers dedicated to the study of turbulent flows. The potential and performance of the new algorithm will be shown with simulations of turbulent flows performed with Xcompact3d on CPUs and GPUs.
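The diagonal dominance the abstract exploits can be illustrated with the classic Thomas algorithm applied to the tridiagonal systems produced by compact schemes. The sketch below is a serial, single-domain stand-in; the function name, the 6th-order compact coefficient 1/3, and the test data are illustrative assumptions, not the authors' distributed, cache-blocked algorithm.

```python
# Minimal Thomas-algorithm sketch for the diagonally dominant tridiagonal
# systems produced by compact finite difference schemes (illustrative only).
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c, and right-hand side d (all 1-D arrays)."""
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Classic 6th-order compact first derivative: (1/3) f'_{i-1} + f'_i + (1/3) f'_{i+1} = RHS_i.
# The diagonal dominance (1 vs 1/3) makes remote coupling decay geometrically,
# which is what allows communication to stay local to neighbouring subdomains.
n = 32
a = np.full(n, 1.0 / 3.0); a[0] = 0.0
c = np.full(n, 1.0 / 3.0); c[-1] = 0.0
b = np.ones(n)
d = np.random.default_rng(0).standard_normal(n)
x = thomas(a, b, c, d)
residual = b * x + np.r_[0.0, a[1:] * x[:-1]] + np.r_[c[:-1] * x[1:], 0.0] - d
print("max residual:", np.max(np.abs(residual)))
```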
Monday, November 25, 2024 8:26AM - 8:39AM
L16.00003: Performance of a high-order finite difference compressible flow solver in Fortran on next-generation heterogeneous computing architectures
Nicholas Arnold-Medabalimi, Britton J Olson
Next-generation high-performance computing architectures will create new challenges for current computational fluid dynamics codes. Code must be adapted to new systems while maintaining capability on existing ones. The efforts made to prepare a Fortran-based high-order finite difference hydrodynamic code for upcoming HPC systems are discussed. Focus is placed on challenges specific to Fortran implementations, including OpenMP support, loop/memory structure requirements, and interoperability-based strategies. Porting strategies on upcoming vs. current systems and methods used to minimize code versioning for different systems are presented. A scaling study on current and upcoming systems applying the immersed boundary method to a supersonic flow is conducted. Achieved scaling and throughput are presented, and remaining performance bottlenecks are discussed.
Monday, November 25, 2024 8:39AM - 8:52AM
L16.00004: Development and Validation of a Highly Scalable Finite-Volume Unstructured LES Solver for Wind Farm Flows
Radouan Boukharfane
The expansion of wind energy projects, both in number and scale, alongside advancements in high-performance computing, necessitates the use of highly efficient, parallel simulation tools to model the complex flow fields around entire wind farms. Such simulations involve vast numbers of degrees of freedom and can provide crucial insights into farm-scale physical phenomena. In this work, we present a parallel implementation of the Actuator Line Method (ALM) integrated into a massively parallel finite volume solver. This implementation enables high-fidelity Large-Eddy Simulations (LES) of wind turbines and rotor-wake interactions on unstructured grids comprising up to billions of cells, with a focus on optimal workload balancing and minimal turnaround time. The effectiveness and accuracy of the LES/ALM technique in our solver have been validated through various test cases, demonstrating promising performance.
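For context, the standard actuator-line regularisation spreads each blade-point force onto the grid with a Gaussian kernel. The sketch below shows that projection step only; the function name project_alm_forces, the kernel width eps, and the toy inputs are illustrative assumptions, not the solver's parallel implementation or its load-balancing strategy.

```python
# Hypothetical sketch of the standard actuator-line force projection
# (Gaussian regularisation kernel), not the solver implementation itself.
import numpy as np

def project_alm_forces(cell_centres, point_positions, point_forces, eps):
    """Spread actuator-point forces onto grid cells with the Gaussian kernel
    eta(d) = exp(-d^2 / eps^2) / (eps^3 * pi^{3/2})  [force per unit volume]."""
    norm = 1.0 / (eps**3 * np.pi**1.5)
    body_force = np.zeros_like(cell_centres)           # shape (n_cells, 3)
    for x_p, f_p in zip(point_positions, point_forces):
        d2 = np.sum((cell_centres - x_p)**2, axis=1)
        body_force += np.outer(norm * np.exp(-d2 / eps**2), f_p)
    return body_force

# Tiny usage example: a few cells and two actuator points on a "blade".
cells = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]])
points = np.array([[0.4, 0.0, 0.0], [0.9, 0.0, 0.0]])
forces = np.array([[0.0, 10.0, 0.0], [0.0, 25.0, 0.0]])   # e.g. lift-dominated loads
print(project_alm_forces(cells, points, forces, eps=0.3))
```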
Monday, November 25, 2024 8:52AM - 9:05AM
L16.00005: Spectral Adaptivity for a High-Performance Kinetic Solver
Alexander A Hrabski, Oleksandr Koshkarov, Robert M Chiodi, Peter T Brady, Salomon Janhunen, Cale Harnish, Oleksandr Chapurin, Ryan T Wollaeger, Zach Jibben, Gian Luca Delzanno, Daniel Livescu
Flow systems with kinetic physics are high-dimensional, requiring efficient numerical methods to simulate a broad range of scales. A relevant treatment of the particle distribution function expands the velocity space in terms of Asymmetrically Weighted Hermite (AWH) bases, which completely describe the continuum limit in just a few low-order terms at each spatial point. While this method enables large-scale simulations of flows containing both near-continuum and kinetic regions, the efficiency of these low-order expansions must be balanced with the need for higher-order terms describing the kinetic physics of interest. We address this challenge with a novel spectral adaptivity scheme implemented for our high-performance kinetic model, MASS-APP.
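As a rough illustration of why near-continuum distributions need only a few modes while kinetic features demand more, the sketch below expands 1-D velocity distributions in ordinary orthonormal Hermite functions and applies a simple tail-magnitude adaptivity rule. The basis, the adapt criterion, and the tolerances are illustrative assumptions, not the AWH formulation or the MASS-APP scheme.

```python
# Illustrative sketch of a velocity-space Hermite expansion with a simple
# adaptivity rule (grow or shrink the number of retained modes based on the
# size of the highest coefficients).
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

v = np.linspace(-8.0, 8.0, 4001)
dv = v[1] - v[0]

def hermite_function(n, x):
    """Orthonormal Hermite function psi_n(x) = H_n(x) exp(-x^2/2) / sqrt(2^n n! sqrt(pi))."""
    coeffs = np.zeros(n + 1); coeffs[n] = 1.0
    return hermval(x, coeffs) * np.exp(-x**2 / 2) / sqrt(2.0**n * factorial(n) * sqrt(pi))

def expand(f, n_modes):
    """Project f(v) onto the first n_modes orthonormal Hermite functions."""
    return np.array([np.sum(f * hermite_function(n, v)) * dv for n in range(n_modes)])

def adapt(coeffs, tol=1e-8, tail=3):
    """Grow the expansion if the tail coefficients are large, shrink it if they are tiny."""
    if np.sqrt(np.sum(coeffs[-tail:]**2)) > tol:
        return len(coeffs) + tail            # request more modes
    while len(coeffs) > tail and abs(coeffs[-1]) < tol:
        coeffs = coeffs[:-1]                 # drop negligible high-order modes
    return len(coeffs)

# A near-Maxwellian needs few modes; a two-stream (kinetic) distribution needs more.
maxwellian = np.exp(-v**2 / 2) / sqrt(2 * pi)
two_stream = 0.5 * (np.exp(-(v - 2.5)**2 / 2) + np.exp(-(v + 2.5)**2 / 2)) / sqrt(2 * pi)
for name, f in [("Maxwellian", maxwellian), ("two-stream", two_stream)]:
    print(name, "-> suggested number of modes:", adapt(expand(f, 16)))
```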
Monday, November 25, 2024 9:05AM - 9:18AM
L16.00006: GPU-Enabled LICA Fluid Dynamic Solver for Large Scale Semiconductor Fabrication Plant Flow Simulation
Ki-Ha Kim, JunHwan Lee, Dongjin Lee, Sehyeong Oh, Jaehee Chang, Joonseon Jeong, Dongjin Ham, Seungwon Lee, Hyun Chul Lee
We present the GPU-LICA solver, a GPU-enabled, scalable solver for large-scale flow simulation in complex geometries for industrial applications. Our flow solver is parallelized using a hybrid approach that combines MPI, CUDA, and OpenACC for multi-GPU computational environments. The core components of the flow solver are the matrix solvers for the momentum and pressure equations, which are parallelized with MPI and CUDA. The momentum equation is solved by a highly parallel and scalable tridiagonal matrix solver, PaScaL_TDMA, which incurs significantly lower all-to-all communication overhead than existing tridiagonal solvers. The pressure equation is solved by a newly developed Discrete Cosine Transform solver, which significantly accelerates all-to-all communications by optimizing MPI topologies to take advantage of high-bandwidth NVLink interconnects. The GPU-LICA solver demonstrates excellent scalability, achieving 96% parallel efficiency for an 8.5-billion degrees-of-freedom (DoF) case in strong scaling and 97.5% parallel efficiency for a case with 134 million DoF per GPU in weak scaling. The GPU-LICA solver demonstrates its ability to simulate flow around complex geometry and facilities within a large-scale semiconductor fabrication plant at Re = 66700. The strong scaling results show 85.5% parallel efficiency on 128 GPUs, which is practically important as it enables the completion of a flow circulation simulation within a day.
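The DCT-based pressure solve rests on the fact that the cell-centred Laplacian with homogeneous Neumann boundary conditions is diagonalised by the DCT-II. The sketch below is a serial, 1-D stand-in using SciPy transforms; it is not PaScaL_TDMA or the multi-GPU DCT solver described above, and the grid size and test problem are illustrative assumptions.

```python
# Stand-in sketch of a DCT-based Poisson solve with homogeneous Neumann BCs:
# the second-order cell-centred Laplacian is diagonalised by the DCT-II.
import numpy as np
from scipy.fft import dct, idct

N = 256
h = 1.0 / N
x = (np.arange(N) + 0.5) * h                      # cell centres on [0, 1]

u_exact = np.cos(2 * np.pi * x)                   # satisfies u'(0) = u'(1) = 0
f = -4 * np.pi**2 * np.cos(2 * np.pi * x)         # right-hand side of u'' = f (zero mean)

f_hat = dct(f, type=2, norm='ortho')              # project onto the Laplacian eigenvectors
k = np.arange(N)
lam = (2 * np.cos(np.pi * k / N) - 2) / h**2      # eigenvalues of the discrete Neumann Laplacian
u_hat = np.zeros(N)
u_hat[1:] = f_hat[1:] / lam[1:]                   # the k = 0 mode (mean of u) is fixed to zero
u = idct(u_hat, type=2, norm='ortho')

print("max error:", np.max(np.abs(u - u.mean() - u_exact)))
```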
Monday, November 25, 2024 9:18AM - 9:31AM
L16.00007: Modernized and Parallelized Mapped Legendre Spectral Method Code for Unbounded Vortical Flow Simulations
Sangjoon Lee, Jinge Wang, Philip S Marcus
A modernized and parallelized flow simulation code package is presented, along with its scaling performance results, using several example rotating flow problems in an unbounded domain. The code package, MLegS (short for Mapped Legendre Spectral expansion, which accounts for the domain's unboundedness), employs associated Legendre functions with an algebraic mapping as the basis elements for the radial expansion of an arbitrary field quantity. Based on the numerical algorithm proposed by Matsushima & Marcus (J. Comput. Phys., vol. 137, no. 2, 1997, pp. 321-345), MLegS incorporates scalable multiprocessing interfaces for high-performance computing, supported by temporary multiple-precision arithmetic, which enables access to a larger number of basis elements with a higher normalization factor than is attainable with typical double-precision computing. Weak or strong scaling test results are provided for sample problems, and potential extensive applications of the code package are discussed.
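A minimal sketch of the mapped radial expansion idea is given below, assuming the algebraic map r = L (1 + x) / (1 - x) from [-1, 1) to [0, inf) and using ordinary Legendre polynomials rather than the associated Legendre basis of MLegS; the map parameter L, the mode counts, and the test profile are illustrative assumptions, not the MLegS implementation.

```python
# Illustrative mapped Legendre radial expansion on a semi-infinite domain.
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

L = 1.0                                            # map parameter sets the resolved radial scale
n_modes, n_quad = 16, 64
x_q, w_q = leggauss(n_quad)                        # Gauss-Legendre nodes/weights on (-1, 1)
r_q = L * (1 + x_q) / (1 - x_q)                    # mapped radial collocation points

f = np.exp(-r_q**2)                                # a smooth, decaying radial profile (e.g. a vortex core)

# Expansion coefficients c_n = (2n + 1)/2 * int_{-1}^{1} f(r(x)) P_n(x) dx
coeffs = np.array([
    (2 * n + 1) / 2.0 * np.sum(w_q * f * legval(x_q, np.eye(n_modes)[n]))
    for n in range(n_modes)
])
print("coefficient magnitudes:", np.abs(coeffs).round(6))

# Reconstruct on a few radii with the truncated expansion (accuracy improves with more modes)
r_test = np.array([0.0, 0.5, 1.0, 2.0])
x_test = (r_test - L) / (r_test + L)               # inverse map x = (r - L) / (r + L)
print("exp(-r^2):      ", np.exp(-r_test**2))
print("reconstruction: ", legval(x_test, coeffs))
```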
Monday, November 25, 2024 9:31AM - 9:44AM
L16.00008: Introducing FireX: A High-Performance Computing Branch of the NIST Fire Dynamics Simulator
Randall J McDermott, Chandan Paul, Marcos Vanella, Jason Floyd
The Fire Dynamics Simulator (FDS) is a low-Mach, large-eddy simulation code specifically tuned for fire protection engineering applications. In this talk, we will explore recent developments in FDS to utilize GPU (graphics processing unit) acceleration and scalable distributed memory computing. Modern computing hardware is driving a shift back to specialized coding in order to take advantage of the potential acceleration offered by large numbers of GPU cores. All of the leadership class supercomputers in the U.S. now rely on GPU hardware. A new development branch of FDS, called FireX, uses the PETSc (Portable Extensible Toolkit for Scientific Computation) library developed by Argonne National Laboratory to solve the pressure Poisson equation and Sundials (SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers) developed by Lawrence Livermore National Laboratory to solve stiff systems of ordinary differential equations for complex chemical kinetics. Both solvers may be configured to run on GPU. Preliminary tests of the code running on Summit at Oak Ridge National Laboratory (NVIDIA GPU) are presented with lessons learned about oversubscription of MPI processes to the GPU cores. Finally, we will present results from FireX with detailed chemistry applied to the Smyth slot burner to predict carbon monoxide concentrations and the University of Maryland line burner to model flame extinction.
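As a stand-in for the Sundials-based stiff chemistry integration (FireX itself uses the Sundials solvers, which can be configured for GPU), the sketch below integrates the classic Robertson kinetics problem with SciPy's BDF method; the problem, tolerances, and time span are illustrative only and are not taken from FireX.

```python
# Stand-in sketch of a stiff chemical-kinetics integration using SciPy's BDF
# solver; the Robertson problem is used purely as an illustration of stiffness.
import numpy as np
from scipy.integrate import solve_ivp

def robertson(t, y):
    """Robertson kinetics: a classic stiff 3-species reaction system."""
    y1, y2, y3 = y
    return [-0.04 * y1 + 1.0e4 * y2 * y3,
             0.04 * y1 - 1.0e4 * y2 * y3 - 3.0e7 * y2**2,
             3.0e7 * y2**2]

sol = solve_ivp(robertson, (0.0, 1.0e5), [1.0, 0.0, 0.0],
                method="BDF", rtol=1e-8, atol=1e-10)
print("final composition:", sol.y[:, -1], " sum =", sol.y[:, -1].sum())
```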
Monday, November 25, 2024 9:44AM - 9:57AM
L16.00009: Development of a ML-enabled high-order Discontinuous Galerkin solver for compressible flow simulations
Beverley K Yeo, Matthias Ihme
Modern high-performance computing and advanced algorithms have presented several opportunities for the advancement of engineering. Despite this, fluid dynamics simulations are not yet fully leveraging these opportunities to integrate CFD with machine learning on modern hardware architectures such as massively parallel GPUs. Thus, we present a new Discontinuous Galerkin (DG) solver built using the ML-enabled JAX library. The DG method offers a high order of accuracy as well as high arithmetic intensity, which maps well onto GPU hardware, thereby maximizing energy efficiency when tightly integrating scientific computing and machine-learning tasks. We demonstrate the scalability of the resulting JAX-based DG solver in applications to different flow problems.
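A minimal JAX sketch of the element-local structure that makes DG attractive on GPUs is given below: a jitted, vmapped dense matrix-vector product over all elements. The array shapes and the random stand-in differentiation matrix are illustrative assumptions, not the solver's actual kernels.

```python
# Minimal JAX sketch of a DG-style element-local kernel (illustrative only):
# apply a dense per-element differentiation matrix to every element's nodes.
import jax
import jax.numpy as jnp

P = 8                                          # nodes per element (polynomial order + 1)
n_elements = 4096
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
D = jax.random.normal(k1, (P, P))              # stand-in for a DG differentiation matrix
u = jax.random.normal(k2, (n_elements, P))     # per-element nodal solution values

@jax.jit
def element_derivative(u):
    # vmap maps the dense P x P matrix-vector product over all elements;
    # this dense per-element work is what gives DG its high arithmetic intensity.
    return jax.vmap(lambda u_e: D @ u_e)(u)

du = element_derivative(u)
print(du.shape)                                # (n_elements, P)
```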