Bulletin of the American Physical Society
70th Annual Meeting of the APS Division of Fluid Dynamics
Volume 62, Number 14
Sunday–Tuesday, November 19–21, 2017; Denver, Colorado
Session M30: High Performance Computing: CFD
Chair: Frederic Gibou, University of California, Santa Barbara; Room: 110
Tuesday, November 21, 2017 8:00AM - 8:13AM
M30.00001: A multithreaded and GPU-optimized compact finite difference algorithm for turbulent mixing at high Schmidt number using petascale computing
M. P. Clay, P. K. Yeung, D. Buaria, T. Gotoh
Turbulent mixing at high Schmidt number is a multiscale problem which places demanding requirements on direct numerical simulations to resolve fluctuations down to the Batchelor scale. We use a dual-grid, dual-scheme and dual-communicator approach in which velocity and scalar fields are computed by separate groups of parallel processes, the latter using a combined compact finite difference (CCD) scheme on a finer grid with a static 3-D domain decomposition free of the communication overhead of memory transposes. A high degree of scalability is achieved for an $8192^3$ scalar field at Schmidt number $512$ in turbulence with a modest inertial range, by overlapping communication with computation whenever possible. On the Cray XE6 partition of Blue Waters, use of a dedicated thread for communication combined with OpenMP locks and nested parallelism reduces CCD timings by 34\% compared to an MPI baseline. The code has been further optimized for the 27-petaflops Cray XK7 machine Titan using GPUs as accelerators with the latest OpenMP 4.5 directives, giving a 2.7X speedup compared to CPU-only execution at the largest problem size.
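Compact schemes couple derivative values implicitly across neighboring grid points, so each derivative evaluation requires a banded linear solve rather than an explicit stencil. As a simpler illustration of the idea (a standard fourth-order Pade first derivative on a periodic grid, not the authors' combined CCD scheme; all names are illustrative):

```python
import numpy as np

def pade_first_derivative(f, h):
    """Fourth-order compact (Pade) first derivative on a periodic grid.

    Solves the periodic tridiagonal system
        (1/4) f'_{i-1} + f'_i + (1/4) f'_{i+1} = 3/(4h) * (f_{i+1} - f_{i-1}),
    here with a dense solve for clarity; production codes use banded solvers.
    """
    n = f.size
    A = np.eye(n)
    for i in range(n):
        A[i, (i - 1) % n] = 0.25
        A[i, (i + 1) % n] = 0.25
    rhs = 3.0 / (4.0 * h) * (np.roll(f, -1) - np.roll(f, 1))
    return np.linalg.solve(A, rhs)

# Verify on a sine wave: the derivative of sin(x) is cos(x).
n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dfdx = pade_first_derivative(np.sin(x), x[1] - x[0])
print(np.max(np.abs(dfdx - np.cos(x))))  # small fourth-order error
```

The implicit coupling is what gives compact schemes spectral-like resolution at small stencil width, and also what makes their parallelization across a domain decomposition nontrivial.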
Tuesday, November 21, 2017 8:13AM - 8:26AM
M30.00002: ABSTRACT WITHDRAWN |
Tuesday, November 21, 2017 8:26AM - 8:39AM
M30.00003: GPU Accelerated DG-FDF Large Eddy Simulator
Medet Inkarbekov, Aidyn Aitzhan, Shervin Sammak, Peyman Givi, Aidarkhan Kaltayev
A GPU-accelerated simulator is developed and implemented for large eddy simulation (LES) of turbulent flows. The filtered density function (FDF) is utilized for modeling of the subgrid-scale quantities. The filtered transport equations are solved via a discontinuous Galerkin (DG) method, and the FDF is simulated via a particle-based Lagrangian Monte Carlo (MC) method. It is demonstrated that the GPU simulations are on the order of 100 times faster than the CPU-based calculations. This brings LES of turbulent flows to a new level, facilitating efficient simulation of more complex problems.
Tuesday, November 21, 2017 8:39AM - 8:52AM
M30.00004: A GPU-accelerated semi-implicit fractional step method for numerical solutions of incompressible Navier-Stokes equations
Sanghyun Ha, Junshin Park, Donghyun You
The computational power of modern Graphics Processing Units (GPUs) is exploited for solving incompressible Navier-Stokes equations integrated using a semi-implicit fractional-step method. Due to its serial and bandwidth-bound nature, the present choice of numerical methods is a good candidate for evaluating the potential of GPUs for solving Navier-Stokes equations with non-explicit time integration. An efficient algorithm is presented for GPU acceleration of the Alternating Direction Implicit (ADI) method and the Fourier-transform-based direct solution method used in the semi-implicit fractional-step method. OpenMP is employed for concurrent collection of turbulence statistics on a CPU while the Navier-Stokes equations are computed on a GPU. Extension to multiple NVIDIA GPUs is implemented using NVLink, supported by the Pascal architecture. Performance of the present method is evaluated on multiple Tesla P100 GPUs and compared with a single-core Xeon E5-2650 v4 CPU in simulations of boundary-layer flow over a flat plate.
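The ADI method makes each spatial direction implicit in turn, reducing each direction's update to many independent tridiagonal line solves, which is what makes it amenable to batched GPU execution. A minimal sketch of the classic Thomas algorithm for one such line solve (illustrative Python, not the authors' GPU implementation):

```python
import numpy as np

def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with sub-, main and super-diagonals
    a, b, c and right-hand side d by forward elimination and back
    substitution. Cost is O(n) per line; a[0] and c[-1] are unused."""
    n = b.size
    cp = np.empty(n)
    dp = np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Check against a dense solve on a small diagonally dominant system.
n = 8
rng = np.random.default_rng(1)
a = rng.random(n); c = rng.random(n); b = 4.0 + rng.random(n)
a[0] = 0.0; c[-1] = 0.0
d = rng.random(n)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
x = thomas_solve(a, b, c, d)
print(np.max(np.abs(A @ x - d)))  # residual near machine precision
```

On a GPU, one would typically assign one such line solve per thread (or use cyclic-reduction variants), since the recurrence within each line is inherently serial while the lines themselves are independent.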
Tuesday, November 21, 2017 8:52AM - 9:05AM
M30.00005: Particle Laden Turbulence in a Radiation Environment Using a Portable High Performance Solver Based on the Legion Runtime System
Hilario Torres, Gianluca Iaccarino
Soleil-X is a multi-physics solver being developed at Stanford University as part of the Predictive Science Academic Alliance Program II. Our goal is to conduct high-fidelity simulations of particle-laden turbulent flows in a radiation environment for solar energy receiver applications, as well as to demonstrate our readiness to effectively utilize next-generation exascale machines. The novel aspect of Soleil-X is that it is built upon the Legion runtime system to enable easy portability to different parallel distributed heterogeneous architectures, while also being written entirely in high-level/high-productivity languages (Ebb and Regent). An overview of the Soleil-X software architecture will be given. Results from coupled fluid flow, Lagrangian point-particle tracking, and thermal radiation simulations will be presented. Performance diagnostic tools and metrics corresponding to the same cases will also be discussed.
Tuesday, November 21, 2017 9:05AM - 9:18AM
M30.00006: Numerical Simulations of Reacting Flows Using Asynchrony-Tolerant Schemes for Exascale Computing
Emmet Cleary, Aditya Konduri, Jacqueline Chen
Communication and data synchronization between processing elements (PEs) are likely to pose a major challenge to the scalability of solvers at the exascale. Recently developed asynchrony-tolerant (AT) finite difference schemes address this issue by relaxing communication and synchronization between PEs at a mathematical level while preserving accuracy, resulting in improved scalability. The performance of these schemes has been validated for simple linear and nonlinear homogeneous PDEs. However, many problems of practical interest are governed by highly nonlinear PDEs with source terms, whose solution may be sensitive to perturbations caused by communication asynchrony. The current work applies the AT schemes to combustion problems with chemical source terms, yielding a stiff system of PDEs with nonlinear source terms highly sensitive to temperature. Examples shown will use single-step and multi-step CH4 mechanisms for 1D premixed and nonpremixed flames. Error analysis will be discussed in both physical and spectral space. Results show that additional errors introduced by the AT schemes are negligible and the schemes preserve their accuracy.
Tuesday, November 21, 2017 9:18AM - 9:31AM
M30.00007: Spatial and temporal accuracy of asynchrony-tolerant finite difference schemes for partial differential equations at extreme scales
Komal Kumari, Diego Donzis
Highly resolved computational simulations on massively parallel machines are critical to understanding the physics of a vast number of complex phenomena in nature governed by partial differential equations. Simulations at extreme levels of parallelism present many challenges, with communication between processing elements (PEs) being a major bottleneck. In order to fully exploit the computational power of exascale machines, one needs to devise numerical schemes that relax global synchronizations across PEs. These asynchronous computations, however, have a degrading effect on the accuracy of standard numerical schemes. We have developed asynchrony-tolerant (AT) schemes that maintain their order of accuracy despite relaxed communications. We show, analytically and numerically, that these schemes retain their numerical properties with multi-step higher-order temporal Runge-Kutta schemes. We also show that for a range of optimized parameters, the computation time and error for AT schemes are lower than for their synchronous counterparts. Stability of the AT schemes, which depends upon the history and random nature of the delays, is also discussed.
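The degrading effect of relaxed synchronization on a standard scheme can be seen in a toy experiment (illustrative only; the AT schemes themselves use modified stencils designed to cancel the delay error, which is not done here). One grid point of a 1D heat equation solve, playing the role of a PE boundary, always sees its right neighbor one time step late:

```python
import numpy as np

# Toy setup: u_t = u_xx on a periodic grid, forward Euler in time (FTCS).
n, nu, nsteps = 64, 1.0, 200
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
h = x[1] - x[0]
dt = 0.2 * h * h / nu                    # stable for FTCS
u_sync = np.sin(x)                       # fully synchronized run
u_async = np.sin(x)                      # run with one delayed interface
u_prev = u_async.copy()                  # one-step-old copy of the field
iface = n // 2
for _ in range(nsteps):
    lap = (np.roll(u_sync, 1) - 2 * u_sync + np.roll(u_sync, -1)) / h**2
    lap_a = (np.roll(u_async, 1) - 2 * u_async + np.roll(u_async, -1)) / h**2
    # at the interface, the right-neighbor value is one time step stale
    lap_a[iface] = (u_async[iface - 1] - 2 * u_async[iface]
                    + u_prev[(iface + 1) % n]) / h**2
    u_prev = u_async.copy()
    u_sync = u_sync + nu * dt * lap
    u_async = u_async + nu * dt * lap_a
exact = np.exp(-nu * nsteps * dt) * np.sin(x)
err_sync = np.max(np.abs(u_sync - exact))
err_async = np.max(np.abs(u_async - exact))
print(err_sync, err_async)
```

Even a single stale point introduces an extra error source at every step; the AT schemes' contribution is to construct stencils whose leading delay-error terms cancel, so that the formal order of accuracy survives such delays.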
Tuesday, November 21, 2017 9:31AM - 9:44AM
M30.00008: Error characterization for asynchronous computations: Proxy equation approach
Gabriella Sallai, Ankita Mittal, Sharath Girimaji
Numerical techniques for asynchronous fluid flow simulations are currently under development to enable efficient utilization of massively parallel computers. These numerical approaches attempt to accurately solve the time evolution of transport equations using spatial information at different time levels. The truncation error of asynchronous methods can be divided into two parts: delay-dependent ($E_A$), or asynchronous, error and delay-independent ($E_S$), or synchronous, error. The focus of this study is a specific asynchronous error mitigation technique called the proxy-equation approach. The aim is to examine these errors as a function of the characteristic wavelength of the solution. Mitigation of asynchronous effects requires that the asynchronous error be smaller than the synchronous truncation error. For a simple convection-diffusion equation, proxy-equation error analysis identifies a critical initial wavenumber, $\lambda_c$. At smaller wavenumbers, synchronous errors are larger than asynchronous errors. We examine various approaches to increase the value of $\lambda_c$ in order to improve the range of applicability of the proxy-equation approach.
Tuesday, November 21, 2017 9:44AM - 9:57AM
M30.00009: Algorithm for computing descriptive statistics for very large data sets and the exa-scale era
Izaak Beekman
An algorithm for Single-point, Parallel, Online, Converging Statistics (SPOCS) is presented. It is suited for \emph{in situ} analysis that traditionally would be relegated to post-processing, and can be used to monitor statistical convergence and estimate the error/residual in the quantity of interest, which is useful for uncertainty quantification as well. Today, data may be generated at an overwhelming rate by numerical simulations and by proliferating sensing apparatuses in experiments and engineering applications. Monitoring descriptive statistics in real time lets costly computations and experiments be gracefully aborted if an error has occurred, and monitoring the level of statistical convergence allows them to be run for the shortest amount of time required to obtain good results. This algorithm extends work by P\'{e}bay (Sandia Report SAND2008-6212). P\'{e}bay's algorithms are recast into a converging delta formulation with provably favorable properties. The mean, variance, covariances and arbitrary higher-order statistical moments are computed in one pass. The algorithm is tested using Sillero, Jim\'{e}nez,~\&~Moser's (2013, 2014) publicly available UPM high Reynolds number turbulent boundary layer data set, demonstrating numerical robustness, efficiency and other favorable properties.
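The single-pass, mergeable character of such statistics can be illustrated with the classic pairwise update formulas for the mean and second central moment, which P\'{e}bay's report generalizes to covariances and arbitrary higher moments. A minimal sketch (variable names are illustrative, not the SPOCS formulation):

```python
import numpy as np

def update(count, mean, M2, x):
    """Welford-style single-pass update of (count, mean, M2) with one
    new sample x, where M2 is the sum of squared deviations."""
    count += 1
    delta = x - mean
    mean += delta / count
    M2 += delta * (x - mean)
    return count, mean, M2

def merge(na, ma, M2a, nb, mb, M2b):
    """Merge two partial results, as when each parallel process
    accumulates statistics over its own portion of the data."""
    n = na + nb
    delta = mb - ma
    mean = ma + delta * nb / n
    M2 = M2a + M2b + delta * delta * na * nb / n
    return n, mean, M2

# Two "processes" each accumulate over half the data, then merge.
rng = np.random.default_rng(0)
data = rng.standard_normal(1000)
n1 = m1 = s1 = 0.0
for xv in data[:400]:
    n1, m1, s1 = update(n1, m1, s1, xv)
n2 = m2 = s2 = 0.0
for xv in data[400:]:
    n2, m2, s2 = update(n2, m2, s2, xv)
n, mean, M2 = merge(n1, m1, s1, n2, m2, s2)
print(mean, M2 / n)  # agrees with np.mean(data), np.var(data)
```

Because the merge is associative, partial results can be combined in any reduction tree across processes, which is what makes the approach attractive for in situ analysis at scale.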