Bulletin of the American Physical Society
75th Annual Meeting of the Division of Fluid Dynamics
Volume 67, Number 19
Sunday–Tuesday, November 20–22, 2022; Indiana Convention Center, Indianapolis, Indiana.
Session Q29: CFD: High Performance Computing
Chair: Peter Brady, Los Alamos National Laboratory. Room: 237
Monday, November 21, 2022 1:25PM - 1:38PM
Q29.00001: Multi-Precision Solvers for Non-Linear Systems on AMR Grids Peter T Brady, Bobby Philip Modern hardware designed for high performance computing has become increasingly heterogeneous in an effort to increase peak performance and decrease power usage. One of the trends in recent hardware has been the introduction of specialized compute units for which peak performance can only be achieved with reduced precision arithmetic (e.g., Tensor Cores in the NVIDIA A100 GPU). Historically, flow calculations have relied exclusively on double precision floating point arithmetic, and it is unclear how much of a flow solver infrastructure can be moved to reduced precision without significantly compromising accuracy. In this work, we will address this question by utilizing iterative refinement and progressive precision to develop multi-precision solvers for non-linear systems on multi-level grids stemming from adaptive mesh refinement.
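For reference, the basic iterative-refinement pattern the abstract builds on can be sketched on a plain linear system: factor and solve in reduced precision, then form residuals and corrections in double precision. The matrix, tolerances, and function name below are illustrative assumptions, not the authors' AMR solver.

```python
# Minimal sketch of mixed-precision iterative refinement on a linear system
# (illustrative only: the abstract targets nonlinear systems on AMR grids,
# and the matrix, tolerances, and function name here are invented).
import numpy as np

def refine(A, b, tol=1e-12, max_iter=20):
    """Solve in float32, accumulate residuals and corrections in float64."""
    A32 = A.astype(np.float32)                    # reduced-precision operator
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                             # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        d = np.linalg.solve(A32, r.astype(np.float32))   # cheap correction solve
        x += d                                    # correction promoted to float64
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)) + 200.0 * np.eye(200)   # well-conditioned test matrix
b = rng.standard_normal(200)
x = refine(A, b)
print(np.linalg.norm(b - A @ x))                  # small residual despite single-precision solves
```

In practice the reduced-precision factorization would be computed once and reused for every correction solve, which is where the cost savings come from.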
Monday, November 21, 2022 1:38PM - 1:51PM
Q29.00002: A scalable asynchronous discontinuous-Galerkin method for massively parallel PDE solvers Shubham K Goswami, Konduri Aditya In recent years, the discontinuous-Galerkin (DG) method has received broad interest for developing PDE solvers due to its ability to provide high-order accurate solutions in complex geometries and capture discontinuities in solutions of non-linear hyperbolic problems. The method provides higher arithmetic intensity than finite-difference or finite-volume methods, resulting in good parallel efficiency. However, at extreme scales, data communication and synchronization remain a bottleneck in the scalability of DG solvers. In this work, we present an asynchronous DG method, which relaxes communication and/or synchronization between processing elements at a mathematical level, thus allowing computations to proceed regardless of the status of communications. The numerical properties of the proposed asynchronous DG method are investigated, where a loss in conservation and poor accuracy are observed. To mitigate these issues, new asynchrony-tolerant (AT) fluxes are derived that can provide arbitrary levels of accuracy. Preliminary results on the stability analysis of the asynchronous DG method will be presented. The computational performance of the method is verified with numerical experiments based on simple linear equations as well as the reacting compressible flow equations.
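The central idea behind asynchrony tolerance can be illustrated with a toy sketch: when a neighbor's interface value arrives several steps late, extrapolating across the delayed time levels recovers accuracy that using the stale value alone would lose. The signal, delay, and extrapolation formula below are illustrative assumptions, not the AT fluxes derived in this work.

```python
# Illustrative sketch of the asynchrony-tolerant idea: a neighbor's interface
# value arrives k steps late, and extrapolating across the delayed time levels
# is more accurate than using the stale value directly. (Toy example only,
# not the AT-DG fluxes of the abstract.)
import numpy as np

def u(t):
    """Smooth stand-in for a neighbor's interface value as a function of time."""
    return np.sin(2.0 * np.pi * t)

dt, t, k = 1.0e-3, 0.3, 3                          # time step, current time, delay in steps

stale = u(t - k * dt)                               # delayed value used as-is: O(dt) error
extrap = (k + 1) * u(t - k * dt) - k * u(t - (k + 1) * dt)   # linear-in-time extrapolation: O(dt^2) error

print(abs(u(t) - stale), abs(u(t) - extrap))        # extrapolation error is much smaller
```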
Monday, November 21, 2022 1:51PM - 2:04PM
Q29.00003: Neko: A new spectral element code applied to the simulation of a Flettner rotor Philipp Schlatter, Martin Karp, Daniele Massaro, Niclas Jansson, Stefano Markidis Ways to reduce the fuel consumption of ships have been gaining increased interest. One interesting approach is adding a Flettner rotor, i.e. a rotating cylinder that uses the Magnus effect to generate lift. While Flettner rotors have been considered in physical experiments and RANS-type simulations, no direct numerical simulation (DNS) of a Flettner rotor, focusing on the interaction with the surrounding turbulent boundary layer, has been carried out. Our simulation demonstrates our new CFD code Neko, a solver that pays homage to Nek5000. Neko is based on similar numerical methods, but differs in that it also accommodates modern computer architectures such as GPUs and has an object-oriented codebase written in modern Fortran. Special emphasis has been put on portability, scalability, and the possibility to extend the code easily with new features, thereby keeping the complexity for the domain scientist to a minimum. The flow case under consideration is a Flettner rotor submerged in a turbulent boundary layer, discretized with 1M spectral elements, which corresponds to 0.5B unique grid points. We discuss the strong scaling efficiency by comparing several architectures, including NVIDIA A100 GPUs, AMD Instinct GPUs, and AMD EPYC CPUs. We observe excellent parallel scaling, up to hundreds of nodes. Our initial findings for the lift are in excellent agreement with experimental data. The drag was found to be highly dependent on the Reynolds number. We observe a strong interaction between the rotor and the turbulent boundary layer, in terms of modified coherent structures.
Monday, November 21, 2022 2:04PM - 2:17PM
Q29.00004: Compact Finite Difference based Framework for Large Scale Simulations of Compressible Turbulent Flows Hang Song, Aditya S Ghate, Akshay Subramaniam, Kristen V Matsuno, Jacob R West, Sanjiva K Lele This work develops a computational framework modernizing the utilization of compact finite difference methods on large-scale high-performance computing platforms for simulations of compressible turbulent flows. The framework has two major components: a parallel algorithm to solve the cyclic banded system, and a unified discretization approach for the compressible Navier-Stokes equations. The linear solver considers the operations on distributed and shared memory hierarchies, assuming a flexible grid partition strategy, without all-to-all communication or iterations. The discretization approach stores all conservative variables at collocated grid points while fluxes are assembled at staggered grid points. Computational robustness is gained by implicit dealiasing in the nonlinear flux assembly, as well as enhanced spectral resolution from staggering and model-based subgrid dissipation. The framework has been demonstrated to scale up to 24,576 GPUs and maintains robust performance on Cartesian and curvilinear meshes without the need for numerical filtering.
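As background for the banded systems involved, a standard fourth-order Padé compact first derivative on a periodic grid yields a cyclic tridiagonal system; the sketch below assembles and solves it with a plain dense solve on a single rank. This is illustrative only; the abstract's contribution is a distributed solver for such systems without all-to-all communication or iterations.

```python
# Sketch of the cyclic (periodic) tridiagonal system behind a standard
# fourth-order Padé compact first derivative; the solve here is a plain dense
# solve on one rank, unlike the distributed banded solver in the abstract.
import numpy as np

n = 64
h = 2.0 * np.pi / n
x = np.arange(n) * h
f = np.sin(x)

# LHS: (1/4) f'_{i-1} + f'_i + (1/4) f'_{i+1}, with periodic wrap-around
A = np.eye(n) + 0.25 * (np.eye(n, k=1) + np.eye(n, k=-1))
A[0, -1] = A[-1, 0] = 0.25

# RHS: 3/(4h) * (f_{i+1} - f_{i-1})
rhs = (3.0 / (4.0 * h)) * (np.roll(f, -1) - np.roll(f, 1))

dfdx = np.linalg.solve(A, rhs)
print(np.max(np.abs(dfdx - np.cos(x))))            # fourth-order accurate derivative
```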
Monday, November 21, 2022 2:17PM - 2:30PM
Q29.00005: Advances on the multiblock extension of a DNS solver in the Legion framework Alboreno Voci, Mario Di Renzo, Sanjiva K Lele, Gianluca Iaccarino A multiblock extension of the Hypersonic Task-based Research (HTR) solver [Di Renzo et al., Comp. Phys. Comm. 255, 2020] is described in this work. The solver under investigation is a structured, scalable, single-block solver for the Navier-Stokes (NS) equations for a multispecies, chemically reacting, high-enthalpy flow. It uses the task-based programming model provided by the Legion runtime, which is also the building framework for this work, to schedule the work across all the available computational resources. In contrast to a conventional MPI-based approach, our multiblock solver uses Legion's data structures and their supported logical operations to handle task scheduling and parallelism. The advantage of this approach is that the whole computational domain is treated as a single logical instance that is cognizant of all the partitions, so the appropriate communications are performed automatically when needed. The specific patterns of these communications are determined at runtime by Legion using the boundary conditions and block connectivity information specified as user inputs. Verification tests for inviscid vortex rotation in a ring and an application example of a compressible flow through a pipe, simulated on a butterfly grid topology, will be discussed in this talk along with the computational performance of the proposed framework.
Monday, November 21, 2022 2:30PM - 2:43PM
Q29.00006: Turbulence simulations on the verge of Exascale: GPU algorithms and an alternative to long simulations at high resolutions Pui-Kuen Yeung, Kiran Ravikumar, Stephen Nichols With exascale computing having officially arrived in mid-2022, we report on the development of a new exascale-ready GPU algorithm for 3D homogeneous turbulence. Our goal is to push the envelope in simulation size while optimizing code performance aggressively by fully exploiting the particular strengths of leadership-class hardware and software. In particular, on "Frontier" at Oak Ridge National Laboratory, OpenMP offloading to GPUs, fast GPU-aware message passing, and reduced needs for host-device data copying are helping make turbulence simulations at $32768^3$ resolution a reality, and are also raising hopes for other computational challenges previously out of reach. However, resource requirements in turbulence grow with problem size so fast that long, well-sampled simulations at state-of-the-art resolution are becoming increasingly less feasible. We briefly discuss a new simulation paradigm (Yeung & Ravikumar, PRF 2020) which is well suited to problems requiring good statistics of the small scales, which evolve rapidly in time.
Monday, November 21, 2022 2:43PM - 2:56PM
Q29.00007: Performance benchmarking of a discontinuous Galerkin-based compressible flow solver on GPU computing platforms using cnsBench Umesh Unnikrishnan, Kris Rowe, Ali Karakus, Saumil S Patel Heterogeneous computing architectures have become an integral feature of modern supercomputers. In this work, we present performance benchmarking results of a discontinuous Galerkin, spectral-element solver for the compressible Navier-Stokes equations on different GPU-based computing platforms. The solver uses OCCA, an open-source library that provides the portability layer to offload targeted kernels across different architectures and vendor platforms and thereby achieve application portability. Profiling of the solver is conducted, and the most compute-expensive kernels are identified. A mini-app called cnsBench is developed from the full solver to benchmark and investigate the performance characteristics of the core kernels. The kernel performance metrics will be presented and compared across different GPU architectures, such as NVIDIA, Intel, and AMD GPUs, and programming models, such as CUDA, OpenCL, and SYCL. The kernel algorithms and memory access patterns are analyzed to provide insights regarding computational bottlenecks and approaches to further optimize the performance of these kernels. These efforts will guide future development of compressible flow applications that can leverage the full potential of next-generation exascale supercomputers and beyond.
Monday, November 21, 2022 2:56PM - 3:09PM
Q29.00008: Euler-Lagrange multiphase flow simulations in the GPU-accelerated spectral element solver NekRS Viral S Shah, Dimitrios K. Fytanidis, Yan Feng, Virendra P Ghate, Rao Kotamarthi, Malachi Phillips, Paul Fischer Direct and Large Eddy Simulations using coupled Euler-Lagrange approaches
Monday, November 21, 2022 3:09PM - 3:22PM
Q29.00009: A Novel Graphical Technique for Optimal High-Performance Computing Hardware Selection Reid Prichard, Wayne Strasser We present a novel graphical methodology for multi-variable, multi-objective optimization of High-Performance Computing (HPC) hardware selection and allocation. HPC performance rarely scales linearly with increasing CPU core count, and this scaling behavior varies across hardware types and across categories of models. Our methodology guides the optimization process with novel normalizations of speed and cost and an accompanying plotting method. This produces a Pareto front allowing easy and intuitive comparison of alternatives and re-evaluation of those alternatives when the user's goals change. We apply this methodology and demonstrate universal scaling behavior across four hardware types and three computational fluid dynamics models with mesh sizes from 3E5 to 5E7 cells.
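As a rough illustration of the Pareto-front step such a speed-versus-cost comparison relies on, the sketch below filters invented (cost, time) points for hypothetical hardware configurations down to the non-dominated set. The configuration names and numbers are assumptions, and the paper's speed and cost normalizations are omitted.

```python
# Illustrative Pareto filter over (cost, time) points for candidate hardware
# configurations; the configurations and numbers are invented, and the
# normalization step of the actual methodology is omitted.
import numpy as np

# name -> (cost per run in arbitrary units, wall-clock hours per run)
configs = {
    "A-32core":  (1.0, 10.0),
    "A-128core": (3.5,  3.2),
    "B-64core":  (2.0,  6.5),
    "B-256core": (9.0,  2.9),
    "GPU-node":  (4.0,  2.0),
}

def pareto_front(points):
    """Keep configurations not dominated in both cost and time (lower is better)."""
    names = list(points)
    vals = np.array([points[n] for n in names])
    front = []
    for i, p in enumerate(vals):
        dominated = any(np.all(q <= p) and np.any(q < p)
                        for j, q in enumerate(vals) if j != i)
        if not dominated:
            front.append(names[i])
    return front

print(pareto_front(configs))   # ['A-32core', 'A-128core', 'B-64core', 'GPU-node']
```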