Bulletin of the American Physical Society
76th Annual Meeting of the Division of Fluid Dynamics
Sunday–Tuesday, November 19–21, 2023; Washington, DC
Session T15: CFD: HPC I
Chair: Roberto Verzicco, Univ of Roma Tor Vergata
Room: 144C
Monday, November 20, 2023 4:25PM - 4:38PM
T15.00001: MASS-APP: A High-Performance Collisionless Plasma Simulation Tool
Robert M Chiodi, Peter T Brady, Cale Harnish, Zach Jibben, Oleksandr Koshkarov, Ryan Wollaeger, Svetlana Tokareva, Chris L Fryer, Gian Luca Delzanno, Daniel Livescu
Modern supercomputers leverage a mix of CPUs and GPUs to maximize available computing power, and this heterogeneity must be specifically accounted for when developing new software. This is becoming increasingly important as current exascale computers, such as LLNL's El Capitan and OLCF's Frontier, derive the majority of their FLOPs from GPUs, and even more tightly integrated systems, such as NVIDIA's Grace Hopper superchip, enter the market. In this talk, we will present our use of LANL's FleCSI library, an adaptable infrastructure created to facilitate asynchronous multiphysics applications, to develop a novel plasma dynamics simulation tool targeted for deployment on exascale machines. Within this tool, the heterogeneous computing environment is navigated using Kokkos. The impact of performing an asynchronous computation in place of a sequential MPI-based simulation will be discussed, along with the realized speedups from offloading work to GPUs. Results from large-scale simulations relevant to space weather will demonstrate the capabilities of the new simulation tool.
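As a minimal sketch of the performance-portability approach described above (a Kokkos kernel that runs unchanged on CPU or GPU backends), the following is illustrative only; the field names, sizes, and the simple velocity update are assumptions, not MASS-APP code.

```cpp
// Illustrative sketch, not MASS-APP source: a performance-portable Kokkos
// kernel of the kind a heterogeneous plasma solver relies on.
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1 << 20;
    // Views live in the default execution space's memory (GPU if enabled).
    Kokkos::View<double*> E("electric_field", n);   // hypothetical field
    Kokkos::View<double*> v("velocity", n);         // hypothetical field
    const double q_over_m = 1.0;   // illustrative charge-to-mass ratio
    const double dt = 1.0e-3;      // illustrative time step

    // One kernel, compiled for whichever backend Kokkos was built with.
    Kokkos::parallel_for("push_velocity", n, KOKKOS_LAMBDA(const int i) {
      v(i) += q_over_m * E(i) * dt;
    });
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}
```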
Monday, November 20, 2023 4:38PM - 4:51PM
T15.00002: Light Exascale Application (LEA): an exascale code for the simulation of compressible turbulent flows
Benjamin Dalman, Ivan Bermejo-Moreno
This work presents the development of the Light Exascale Application (LEA), a compressible flow solver built to conduct direct numerical simulation (DNS) and large-eddy simulation (LES) of turbulent flows. LEA is written in C++ and uses the Kokkos framework for intra-node parallelism together with MPI for inter-node parallelism. These features allow LEA to achieve performance portability across modern heterogeneous platforms. Software design principles were applied from the inception of LEA, enabling a high level of code flexibility and modularity. Results from benchmarking tests across several modern CPU- and GPU-based high-performance computing clusters are presented, considering both weak and strong scaling. Tests are conducted with no platform-specific modifications or optimizations to the code base. Verification and validation studies for LEA are presented for a Taylor-Green vortex, a zero-pressure-gradient flat-plate turbulent boundary layer, and a compressible turbulent channel flow. Numerical studies were completed with second- to fourth-order spatial operators and fourth-order time discretization. Performance profiling of LEA was conducted using the Kokkos Tools profiling suite, identifying various enhancements to the code base.
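A minimal sketch of the intra-node/inter-node split the abstract describes (Kokkos for on-node work, MPI across nodes); the reduced quantity, sizes, and variable names are assumptions, not LEA code.

```cpp
// Illustrative sketch, not LEA source: a rank-local Kokkos reduction
// (possibly executed on a GPU) combined with an inter-node MPI reduction.
#include <Kokkos_Core.hpp>
#include <mpi.h>
#include <cstdio>

int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);
  Kokkos::initialize(argc, argv);
  {
    const int n_local = 1 << 18;
    Kokkos::View<double*> ke("kinetic_energy", n_local);  // hypothetical field
    Kokkos::deep_copy(ke, 1.0);  // placeholder data

    // Intra-node reduction with Kokkos.
    double local_sum = 0.0;
    Kokkos::parallel_reduce("sum_ke", n_local,
        KOKKOS_LAMBDA(const int i, double& s) { s += ke(i); }, local_sum);

    // Inter-node reduction with MPI.
    double global_sum = 0.0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) std::printf("global sum = %f\n", global_sum);
  }
  Kokkos::finalize();
  MPI_Finalize();
  return 0;
}
```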
Monday, November 20, 2023 4:51PM - 5:04PM
T15.00003: IMEXLBM: A portable lattice Boltzmann solver based on the Kokkos library
Saumil S Patel, Chunheng Zhao, Taehun Lee, Ramesh Balakrishnan
An open-source, performance-portable, lattice Boltzmann-based solver, IMEXLBM, is developed for heterogeneous high-performance computing (HPC) platforms. Written in C++, the solver uses the Kokkos and Message Passing Interface (MPI) libraries, through which we achieve strong scalability on multi-node, GPU-based HPC systems. The implementation and performance metrics of the solver will be assessed across NVIDIA, Intel, and AMD GPU architectures. Different memory modes (e.g., unified shared memory and device memory) are explored to understand the trade-off between programmability and performance. The solver is validated against fundamental CFD benchmark problems, with GPU and CPU results in agreement. The 3D Taylor-Green vortex problem is used to study the scaling of the program. The performance of the two local functions that require no MPI data transfer is close to ideal scaling; for the non-local function, an evenly located MPI partition method is suggested to obtain better performance and scaling. We also present large-scale results for single- and multi-phase applications on the ThetaGPU and Polaris supercomputers at the Argonne Leadership Computing Facility (ALCF).
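A minimal sketch of the memory-mode trade-off mentioned above, assuming Kokkos with a GPU backend; Kokkos::SharedSpace is assumed available (recent Kokkos releases), and the view names are illustrative, not IMEXLBM code.

```cpp
// Illustrative sketch, not IMEXLBM source: explicit device memory with a
// host mirror versus a unified/shared memory space.
#include <Kokkos_Core.hpp>

void memory_mode_example() {
  const int n = 1 << 16;

  // Mode 1: device memory plus a host mirror and explicit deep copies.
  // More verbose, but accesses on the device hit fast device memory.
  Kokkos::View<double*> f_device("distribution_device", n);
  auto f_host = Kokkos::create_mirror_view(f_device);
  // ... fill f_host on the CPU ...
  Kokkos::deep_copy(f_device, f_host);

  // Mode 2: unified shared memory, addressable from host and device.
  // Simpler to program, but page migration can cost performance.
  Kokkos::View<double*, Kokkos::SharedSpace> f_shared("distribution_shared", n);
  // ... host and device code can both read/write f_shared directly ...
}
```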
Monday, November 20, 2023 5:04PM - 5:17PM
T15.00004: Towards spectral-element large-scale simulations of turbulent Rayleigh-Bénard convection in round cells
Martin Karp, Adalberto Perez, Niclas Jansson, Timofey Mukha, Roshan Samuel, Joerg Schumacher, Stefano Markidis, Philipp Schlatter
The (potential) transition to the ultimate regime in turbulent Rayleigh-Bénard convection (RBC) has still not been shown conclusively through direct numerical simulation. In this talk, we consider our developments towards simulations of RBC using the high-fidelity spectral-element code Neko. The code has a specific focus on large-scale direct numerical simulations of turbulence and incorporates a modular multi-backend design, enabling performance portability across a wide range of GPUs and CPUs, as well as a GPU-optimized preconditioner with task overlapping for the pressure-Poisson equation and in-situ data compression. We carry out initial simulations of RBC in a slender circular cell with aspect ratio 1:10 on both the LUMI and Leonardo supercomputers. Neko is able to strong scale to 16,384 GPUs for a mesh with over 100M spectral elements, corresponding to more than 37B grid points. The Nusselt number and its dependence on the Rayleigh number Ra are discussed for a few smaller simulations at lower Ra. We are currently under way with carrying out and preparing simulations at even higher Rayleigh numbers, to observe numerically whether and when the transition to the ultimate regime occurs.
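For context on the regimes referred to above, the heat-transport scalings commonly quoted in the RBC literature are roughly

$$\mathrm{Nu} \sim \mathrm{Ra}^{1/3} \ \text{(classical regime)}, \qquad \mathrm{Nu} \sim \mathrm{Ra}^{1/2} \ \text{(ultimate regime, up to logarithmic corrections)},$$

so the transition would appear as a steepening of the measured Nu(Ra) slope at sufficiently high Ra.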
Monday, November 20, 2023 5:17PM - 5:30PM
T15.00005: Efficiency-Enhanced High-Fidelity Simulation Solver for Incompressible Two-Phase Flows and Fluid-Structure Interactions
Han Liu, Lian Shen
The interaction of two-phase incompressible flows with structural elements is a pivotal aspect of fluid dynamics, significantly influencing applications such as marine ecosystem dynamics, advanced marine vessel design, environmental hazard mitigation, and renewable energy production systems. To address these complex dynamics and interactions, we have devised a high-fidelity computational solver tailored specifically for the accurate simulation of incompressible two-phase flows and fluid-structure interactions. Our solver employs the incompressible Navier-Stokes equations with a synergistic implementation of the level set and volume-of-fluid methods; this combination enables efficient capturing of the complex geometry of the two-fluid interface. An enhanced immersed boundary method is integrated to capture the fluid-structure interactions while minimizing divergence error. We validated this computational tool using several benchmark tests. The transition from our previous Fortran + MPI CPU-based code to a more versatile and adaptable C++ + CUDA GPU-oriented version, supplemented with systematic optimization of kernel functions and bandwidth management, substantially improved the solver's efficiency, as supported by quantitative comparisons. Moreover, through an in-depth analysis of the various stages in the computational process, we have obtained valuable insights for further optimization of the solver.
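A minimal sketch of one ingredient of the coupled level set/volume-of-fluid idea mentioned above: deriving a cell volume fraction from a level-set field through a smoothed Heaviside function. The grid handling, smoothing width, and function names are assumptions, not the authors' implementation.

```cpp
// Illustrative sketch: volume fraction from a signed-distance level set.
#include <cmath>
#include <vector>

// Smoothed Heaviside: 0 in one phase, 1 in the other, smooth across a band
// of half-width eps around the interface phi = 0.
double smoothed_heaviside(double phi, double eps) {
  const double pi = 3.14159265358979323846;
  if (phi < -eps) return 0.0;
  if (phi >  eps) return 1.0;
  return 0.5 * (1.0 + phi / eps + std::sin(pi * phi / eps) / pi);
}

// Map a level-set field (one value per cell) to a volume-fraction field.
std::vector<double> volume_fraction_from_levelset(
    const std::vector<double>& phi, double dx) {
  const double eps = 1.5 * dx;  // typical smoothing half-width
  std::vector<double> alpha(phi.size());
  for (std::size_t i = 0; i < phi.size(); ++i) {
    alpha[i] = smoothed_heaviside(phi[i], eps);
  }
  return alpha;
}
```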
Monday, November 20, 2023 5:30PM - 5:43PM
T15.00006: Exploring novel algorithmic strategies for optimal performance of discontinuous Galerkin-based flow solvers on GPU-based systems
Umesh Unnikrishnan, Kris Rowe, Saumil S Patel
Heterogeneous architectures, particularly those based on graphics processing units (GPUs), are becoming increasingly prevalent in high-performance computing. However, GPU programming is challenging and requires a different approach to achieve optimal performance than traditional CPU-based implementations. In this talk, we present a case study of a computational fluid dynamics (CFD) application based on the high-order discontinuous Galerkin (DG) method. While the DG method is well suited for parallelization, the surface-term computations for the gradients and fluxes present a significant performance bottleneck on GPUs due to non-contiguous memory accesses. To address this issue, we explore performance strategies such as the use of shared memory, intermediate memory coalescing, and alternate data layouts for the element nodes. The implementation and performance metrics for these strategies will be assessed across different GPU architectures using the OCCA portability framework. The overall speedup of the application for different polynomial orders will also be evaluated and presented. The goal is to investigate novel approaches for optimal performance of DG-based CFD applications on GPUs, which will contribute to the advancement of scientific studies using CFD simulations on modern supercomputers.
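A minimal sketch of the "alternate data layouts for the element nodes" idea; the abstract's solver uses OCCA, but for consistency with the other examples here the sketch uses Kokkos layouts, and all names and sizes are assumptions.

```cpp
// Illustrative sketch: the same logical element-node array u(node, element)
// stored with two different layouts, which changes which accesses coalesce
// on a GPU.
#include <Kokkos_Core.hpp>

void layout_example(int num_elements, int nodes_per_element) {
  // LayoutLeft: the node index is stride-1, so the nodes of one element
  // are contiguous in memory.
  Kokkos::View<double**, Kokkos::LayoutLeft>
      u_nodes_contig("u_nodes_contig", nodes_per_element, num_elements);

  // LayoutRight: the element index is stride-1, so the same node across
  // consecutive elements is contiguous; this coalesces when one GPU thread
  // handles one element.
  Kokkos::View<double**, Kokkos::LayoutRight>
      u_elems_contig("u_elems_contig", nodes_per_element, num_elements);

  // A simple traversal over all (node, element) pairs touching both arrays.
  Kokkos::parallel_for("copy",
      Kokkos::MDRangePolicy<Kokkos::Rank<2>>({0, 0},
          {nodes_per_element, num_elements}),
      KOKKOS_LAMBDA(const int n, const int e) {
        u_elems_contig(n, e) = u_nodes_contig(n, e);
      });
}
```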
Monday, November 20, 2023 5:43PM - 5:56PM
T15.00007: Revolutionizing OpenFOAM: GPU Integration for Unleashing the Power of Computational Fluid Dynamics
Zhao Wu, Xiajun Shi, Qi Yang, Lu Li, Haiyang Jiang, Xueqing Zhao, Jian Yang
Computational fluid dynamics (CFD) has achieved tremendous success in research and industrial communities partly due to the emergence of accelerator devices like graphics processing units (GPUs) that enable the exploration of systems and timescales that were not feasible before. GPUs offer substantial performance advancements, with expectations of becoming the primary source of raw floating-point operations per second (FLOPS) in upcoming exascale machines. However, fully leveraging these benefits necessitates the reformulation of existing CFD codes.
Monday, November 20, 2023 5:56PM - 6:09PM
T15.00008: Evaluating Poisson solvers for fire simulations
Marcos Vanella, Randall J McDermott
The Fire Dynamics Simulator (FDS) is a fire model commonly used by the fire protection community in the design and evaluation of fire suppression systems in buildings and structures, as well as in modeling outdoor fires. FDS computes thermally buoyant, chemically reacting flows by means of large-eddy simulation of a low-Mach approximation of the governing equations for fluid flow, heat, and mass transport. Parallel computations are performed on hundreds of cores using the MPI standard. Recently, as part of a large-scale fire simulation initiative, the developers started investigating the viability of GPU acceleration for computation-intensive portions of the algorithm. In this presentation we focus on the pressure Poisson equation arising from the discrete time integration of the momentum equations. We compare GPU-accelerated linear solvers being developed as part of PETSc and associated libraries, used in an unstructured-grid setting, with the classical fast trigonometric solvers used in combination with an immersed boundary method. Both single- and multiple-mesh cases, on multiple CPUs and GPUs, are compared in a representative fire scenario, providing some insights and suggestions.
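A minimal sketch of solving a Poisson-type system with a PETSc Krylov (KSP) solver of the kind compared above; it assumes a recent PETSc (the PetscCall macro) and is not the FDS implementation. In PETSc the backend can typically be switched at run time, e.g. via options such as -vec_type cuda and -mat_type aijcusparse for NVIDIA GPUs (see the PETSc documentation).

```cpp
// Illustrative sketch: assemble a 1D finite-difference Laplacian and solve
// it with a command-line-configurable PETSc KSP solver.
#include <petscksp.h>

int main(int argc, char** argv) {
  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  const PetscInt n = 100;  // illustrative 1D grid size
  Mat A; Vec x, b; KSP ksp;

  // Assemble the standard tridiagonal Laplacian.
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A));
  PetscInt istart, iend;
  PetscCall(MatGetOwnershipRange(A, &istart, &iend));
  for (PetscInt i = istart; i < iend; ++i) {
    if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));  // placeholder right-hand side

  // Krylov method and preconditioner chosen from the command line.
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(MatDestroy(&A));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(PetscFinalize());
  return 0;
}
```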