Bulletin of the American Physical Society
74th Annual Meeting of the APS Division of Fluid Dynamics
Volume 66, Number 17
Sunday–Tuesday, November 21–23, 2021; Phoenix Convention Center, Phoenix, Arizona
Session A20: Computational Fluid Dynamics: High Performance Computing
Chair: Gianluca Iaccarino, Stanford University
Room: North 221 AB
Sunday, November 21, 2021 8:00AM - 8:13AM
A20.00001: A task-based GPU-compatible discontinuous Galerkin fluid solver Kihiro Bando, Matthias Ihme In order to address the ever-increasing heterogeneity of high-performance computing platforms, task-based programming models have emerged to facilitate the implementation and improve the scaling of computational physics solvers. Recently, we presented a discontinuous Galerkin solver for the compressible Navier-Stokes equations which leverages the Legion programming system for distributing and scheduling tasks. In this talk, we discuss the use of the Kokkos library to achieve performance portability by demonstrating efficient execution of the program on traditional CPUs and Nvidia GPUs from a single source code. Implementation and scaling challenges specific to the present application will be emphasized, in particular regarding data layout choices, the splitting of the algorithm into tasks, and the communication pattern. Preliminary results for single- and multi-GPU performance will be presented.
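The task-based decomposition described in this abstract can be illustrated with a language-agnostic sketch (the actual solver uses Legion tasks and Kokkos kernels, not Python; all names and the toy "residual" below are assumptions of this illustration): each chunk of cells becomes an independent task that reads one halo point owned by its neighbour, and the runtime is free to schedule the tasks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def residual_task(u, lo, hi):
    # Per-task work: a first-difference "flux" over cells lo..hi-1,
    # reading one halo point (u[hi]) owned by the neighbouring task.
    return u[lo + 1:hi + 1] - u[lo:hi]

def parallel_residual(u, n_tasks=4):
    n = len(u) - 1                       # last point acts as a halo
    bounds = np.linspace(0, n, n_tasks + 1, dtype=int)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(residual_task, u, lo, hi)
                   for lo, hi in zip(bounds[:-1], bounds[1:])]
        return np.concatenate([f.result() for f in futures])

u = np.arange(9.0)                       # 8 cells + 1 halo point
r = parallel_residual(u)                 # each entry is u[i+1] - u[i] = 1.0
```

Each task touches only its own index range plus the halo, which is the property a task-based runtime exploits to overlap execution and infer communication.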
Sunday, November 21, 2021 8:13AM - 8:26AM
A20.00002: A scalable time-parallel spectral Stokes solver for biological flows Mahdi Esmaily Simulation of unsteady creeping flows in complex geometries has traditionally required a time-stepping procedure, which is typically costly and unscalable. To reduce the cost and allow for computations at much larger scales, we propose an alternative approach formulated from the unsteady Stokes equation expressed in the time-spectral domain. The advantages of this new formulation are that 1) it resolves the time variation of the solution using a few modes rather than thousands of time steps, thus offering significant savings in cost and time-to-solution, 2) it exhibits super-convergence behavior with respect to the number of computed modes, and 3) it is embarrassingly parallelizable owing to the independence of the solution modes, thus enabling scalable calculations on a much larger number of processors. The proposed technique is compared against a standard stabilized finite element solver using two- and three-dimensional canonical and complex geometries. The results show that the proposed method can produce more accurate results at 1% to 11% of the cost of the standard technique for the studied cases.
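The independence of the temporal modes can be made concrete with a 1D heat-equation analogue of a time-spectral formulation (a deliberate simplification of the unsteady Stokes system; the discretization, parameters, and names below are assumptions of this sketch): each frequency yields its own steady complex linear system, so the modes can be solved in parallel with no coupling between them.

```python
import numpy as np

nu, n = 0.1, 64
x = np.linspace(0.0, 1.0, n + 2)[1:-1]           # interior points, u = 0 at ends
h = x[1] - x[0]
# Standard second-difference operator with homogeneous Dirichlet BCs.
D2 = (np.diag(np.full(n - 1, 1.0), -1) - 2.0 * np.eye(n)
      + np.diag(np.full(n - 1, 1.0), 1)) / h**2

def solve_mode(omega, f_hat):
    # In the time-spectral domain, each temporal mode satisfies
    # (i*omega*I - nu*D2) u_hat = f_hat: a steady, complex, Helmholtz-type
    # problem that is independent of every other mode.
    A = 1j * omega * np.eye(n) - nu * D2
    return np.linalg.solve(A, f_hat)

f_hat = np.sin(np.pi * x)                        # one forcing mode
# Embarrassingly parallel over modes: each solve could run on its own processor.
u_hats = [solve_mode(k * 2.0 * np.pi, f_hat) for k in range(4)]
```

A few such modes replace thousands of time steps when the solution is dominated by low temporal frequencies, which is the cost saving the abstract describes.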
Sunday, November 21, 2021 8:26AM - 8:39AM
A20.00003: A task-based parallel framework for ensemble simulations of rocket ignition Kazuki Maeda, Mario Di Renzo, Thiago Teixeira, Jonathan M Wang, Jeffrey M Hokanson, Caetano Melone, Steve Jones, Javier Urzay, Gianluca Iaccarino We present an integrated, parallel computational framework for exascale-oriented ensemble simulations of laser-induced ignition in a methane-oxygen rocket combustor. The framework employs the reacting flow solver HTR (Di Renzo et al., Comp. Phys. Comm. 2020). The solver uses a task-based programming model built on the Legion runtime system to achieve scalable simulation on supercomputers with heterogeneous architectures. The compressible, multi-species Navier-Stokes equations with finite-rate combustion chemistry are discretized on curvilinear grids using a low-dissipation conservative formulation. Laser-induced ignition is modeled by rapid, intense energy deposition. The solver is integrated into a continuous development environment to manage a large software development team. The framework also leverages Legion's mapper to efficiently execute ensembles simultaneously on GPUs and CPUs across multiple fidelities to carry out reliability and uncertainty quantification studies. We show verification examples and demonstrate the framework through combustor simulations using representative parameters.
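Stripped of Legion's mapper and the HTR physics, the ensemble-dispatch pattern might be sketched as follows; `run_member`, the energy samples, and the ignition threshold are purely hypothetical stand-ins for one combustor simulation and its outcome, used only to show the map-over-members structure behind an uncertainty quantification study.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def run_member(sample):
    # Stand-in for one ignition simulation: report whether the deposited
    # energy exceeded a (hypothetical) ignition threshold.
    energy, threshold = sample
    return energy >= threshold

def ensemble(samples, max_workers=4):
    # Launch all members concurrently; in the real framework the runtime's
    # mapper would place each member on a GPU or CPU according to fidelity.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_member, samples))
    return sum(results) / len(results)   # ignition-probability estimate

random.seed(0)
samples = [(random.uniform(0.0, 1.0), 0.5) for _ in range(100)]
p_ignite = ensemble(samples)
```

Because members are independent, the ensemble is trivially parallel; the hard part, which the abstract attributes to Legion's mapper, is packing heterogeneous members onto heterogeneous hardware efficiently.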
Sunday, November 21, 2021 8:39AM - 8:52AM
A20.00004: Achieving performance and portability on GPUs and CPUs for a high-order finite difference compressible flow solver using OpenMP and Fortran Britton J Olson, Brandon Blakeley We present a strategy for porting an existing finite-difference computational fluid dynamics (CFD) code written in modern Fortran for deployment on either CPUs or GPUs. The OpenMP library is used to achieve machine-independent source code suitable for compilation and execution on both CPU- and GPU-based architectures. To aid in the code transformation process and to maintain performance on the CPU, a Python-based pre-processor is developed which allows the Fortran code to be minimally altered and yet still achieve performance on the GPU. Examples of the transformed source code are included to demonstrate the effectiveness of the approach. In addition, we present performance data on the GPU compared to the CPU which show large speed-ups in the code's throughput. Scaling studies include a large-scale calculation of a turbulent jet with more than 8 billion computational zones using 4096 GPUs of the Sierra platform at Lawrence Livermore National Laboratory.
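The authors' pre-processor is not described in detail, but the general idea, injecting backend-specific OpenMP directives into otherwise unmodified Fortran, can be sketched. The `!$gpu` sentinel comment and the `transform` function are assumptions of this illustration; the OpenMP directives themselves (`target teams distribute parallel do` for offload, `parallel do` for host threading) are standard.

```python
import re

# Directive to emit for each backend (standard OpenMP 4.5+ syntax).
OMP_GPU = "!$omp target teams distribute parallel do"
OMP_CPU = "!$omp parallel do"

def transform(source, target="gpu"):
    # Replace each "!$gpu" sentinel line with the directive for the chosen
    # backend, leaving the surrounding Fortran untouched.
    directive = OMP_GPU if target == "gpu" else OMP_CPU
    return re.sub(r"^\s*!\$gpu\s*$", directive, source, flags=re.MULTILINE)

fortran = """!$gpu
do i = 1, n
  u(i) = u(i) + dt*rhs(i)
end do"""
print(transform(fortran, target="gpu"))
```

A single annotated source can thus be lowered to either a threaded CPU build or a GPU-offload build at pre-processing time, which is the portability property the abstract claims.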
Sunday, November 21, 2021 8:52AM - 9:05AM
A20.00005: A Novel Methodology to Optimize High-Performance Computing Hardware for Computational Fluid Dynamics Simulations Reid Prichard, Wayne Strasser We present a novel methodology for the multi-objective optimization of computational fluid dynamics (CFD) simulations. It is well known that CFD codes do not scale perfectly with increasing CPU core count. Different hardware types can exhibit qualitatively different scalability that cannot be predicted from hardware specifications, so testing is necessary. Furthermore, we demonstrate performance gains of over 10% by undersubscribing the hardware (that is, using fewer than the available number of CPU cores); testing is likewise needed to find the optimum undersubscription. Our approach compares alternatives through normalized quantities of speed and cost; plotting these quantities makes the comparison easy and intuitive.
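The normalized speed/cost comparison can be sketched with hypothetical benchmark numbers (every figure below is invented for illustration; the abstract reports only the methodology and the >10% undersubscription gain):

```python
# Illustrative data: measured throughput (iterations/s) and hourly cost for
# hypothetical hardware configurations at different core subscriptions.
configs = {
    "nodeA_64cores":  {"speed": 100.0, "cost": 3.0},
    "nodeA_56cores":  {"speed": 112.0, "cost": 3.0},   # undersubscribed: 12% faster
    "nodeB_128cores": {"speed": 150.0, "cost": 6.0},
}

def normalize(configs, baseline="nodeA_64cores"):
    # Express every configuration's speed and cost relative to a baseline,
    # so alternatives can be plotted on one dimensionless speed-cost chart.
    base = configs[baseline]
    return {name: {"speed": c["speed"] / base["speed"],
                   "cost": c["cost"] / base["cost"]}
            for name, c in configs.items()}

norm = normalize(configs)
# nodeA_56cores: speed 1.12 at cost 1.0 -> undersubscription dominates here.
```

Configurations with higher normalized speed at equal or lower normalized cost dominate outright; the remaining trade-offs form the frontier a user must choose from.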
Sunday, November 21, 2021 9:05AM - 9:18AM
A20.00006: Implementation and Results of a Novel Computational Fluid Dynamics Optimization Methodology Reid Prichard, Wayne Strasser Using a novel optimization methodology, we demonstrate the necessity of testing to determine the optimum hardware configuration for computational fluid dynamics (CFD). We present our optimization results for three disparate CFD models. One model included three meshes ranging from 3e5 to 2e7 cells, the second had meshes of 6e6 and 5e7 cells, and the third model's mesh was 1e8 cells. We show that hardware scaling behavior cannot be predicted from specifications and that results from one model or mesh cannot be applied to another. While it is unfortunate that testing results cannot be extrapolated, the optimization process is worthwhile: we achieved performance improvements of over 10% compared to what might be selected with an unoptimized approach.
Sunday, November 21, 2021 9:18AM - 9:31AM
A20.00007: A multiblock compressible Navier-Stokes solver in the Legion environment Alboreno Voci, Mario Di Renzo, Kazuki Maeda, Thiago Teixeira, Gianluca Iaccarino In this work we present the development of a parallel, multiblock flow solver using the Legion/Regent task-based programming framework. This work builds upon the Hypersonics Task-based Research (HTR) solver (Di Renzo et al., Comp. Phys. Comm. 2020), a highly scalable, compressible multi-species reacting Navier-Stokes solver designed for simulations of hypersonic turbulence and turbulent combustion. The novelty of this development lies in the treatment of general multiblock computational domains as a single logical instance (data layout in memory) composed of the union of grid blocks representing a complex computational domain. Our implementation leverages the capability of the Legion framework to manage complex index spaces by automatically performing memory allocations and data synchronization. This paradigm relieves the programmer and the user of the solver of manually setting up communication patterns among blocks and domain decomposition for multiprocessor computations. Correctness and scalability are demonstrated on both CPUs and GPUs for canonical flow problems.
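The "union of blocks as a single logical instance" idea can be mimicked in plain Python (Legion's index spaces, partitions, and automatic synchronization are far richer; this toy only shows the unified addressing, and all names are invented for illustration): every block's local indices are mapped into one global index space backed by a single allocation.

```python
import numpy as np

# Two grid blocks of a hypothetical multiblock domain, identified by name.
block_sizes = {"blockA": 5, "blockB": 3}

def build_global_index(block_sizes):
    # Map each (block, local index) pair to a unique global index, so the
    # union of blocks behaves as one contiguous logical region.
    mapping, offset = {}, 0
    for name, n in block_sizes.items():
        for i in range(n):
            mapping[(name, i)] = offset + i
        offset += n
    return mapping, offset

index, total = build_global_index(block_sizes)
u = np.zeros(total)                      # one allocation spans both blocks
u[index[("blockB", 0)]] = 1.0            # write through the unified view
```

With the unified view in place, solver kernels and inter-block exchanges can address data uniformly, which is the bookkeeping the abstract says Legion automates.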
Sunday, November 21, 2021 9:31AM - 9:44AM
A20.00008: Simulating Fluid Flows on the Tensor Processing Unit Platform Qing Wang, Xinle Liu, Sheide Chammas, Vivian Yang, Matthias Ihme, Yi-Fan Chen A simulation framework is developed for predicting complex flows on Tensor Processing Unit (TPU) platforms. The framework solves the three-dimensional Navier-Stokes equations along with constitutive models for fluid dynamics, combustion, heat transfer, and other thermodynamic processes. One application of this framework is the study of wildfire propagation. The framework is validated against predictions of prescribed wildfires spanning a wide range of physical factors, including wind speed, terrain, fuel density, and moisture. Additionally, we simulated the full event of the 2017 Tubbs fire. These high-fidelity simulations run significantly faster than conventional CPU-based CFD software, and at a lower computational cost, providing a foundation for scientifically studying the physics of wildfire propagation.
Sunday, November 21, 2021 9:44AM - 9:57AM
A20.00009: A low-storage Runge-Kutta time integration method for scalable asynchrony-tolerant numerical schemes Shubham K Goswami, Vinod J Matthew, Konduri Aditya Asynchrony-tolerant (AT) finite difference schemes, which relax communication and data synchronization requirements at a mathematical level, have been shown to significantly improve the scalability of flow solvers at extreme scales. These schemes are coupled with suitable high-order time integration methods, such as Adams-Bashforth or Runge-Kutta schemes, to achieve high-order accurate solutions of time-dependent PDEs. Low-storage Runge-Kutta (LSRK) schemes require less memory than the standard schemes and are widely used in flow solvers. However, they need data communication and synchronization at every stage of every time step to achieve high-order accuracy. This work proposes a novel approach to couple AT and LSRK schemes in solving time-dependent PDEs that significantly reduces communication overheads. The accuracy of this approach is investigated, both theoretically and numerically, using simple 1D model equations. Massively parallel 3D simulations of decaying compressible turbulence are performed to demonstrate the scalability of the proposed approach. At extreme scales, a speed-up of 2.8x was obtained. Overall, the approach shows promise in addressing the communication bottleneck as we move towards exascale.
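As a concrete example of the 2N-storage idea: Williamson's classic three-stage, third-order LSRK scheme keeps only the solution and one scratch register across stages. The coupling with asynchrony-tolerant stencils is the abstract's contribution and is not reproduced here; the sketch below is just the standard LSRK building block, verified on a scalar model problem.

```python
import numpy as np

# Williamson's 2N-storage three-stage, third-order Runge-Kutta coefficients:
# only the solution u and one register du persist across stages.
A = [0.0, -5.0 / 9.0, -153.0 / 128.0]
B = [1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0]

def lsrk3_step(u, rhs, dt, du):
    for a, b in zip(A, B):
        du = a * du + dt * rhs(u)   # overwrite the single scratch register
        u = u + b * du
    return u, du

# Accuracy check on du/dt = -u, u(0) = 1, whose exact solution is exp(-t).
u, du, dt, t_end = 1.0, 0.0, 0.01, 1.0
for _ in range(int(round(t_end / dt))):
    u, du = lsrk3_step(u, lambda v: -v, dt, du)
err = abs(u - np.exp(-1.0))         # third-order accurate: tiny for dt = 0.01
```

In a distributed solver, each evaluation of `rhs` normally triggers halo communication, three times per step here, which is exactly the per-stage synchronization cost the AT coupling aims to relax.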