Bulletin of the American Physical Society
APS March Meeting 2014
Volume 59, Number 1
Monday–Friday, March 3–7, 2014; Denver, Colorado
Session T27: Focus Session: Heterogeneous High Performance Computing Platforms on Computational Physics
Sponsoring Units: DCOMP
Chair: Bogdan Mihaila, National Science Foundation
Room: 501
Thursday, March 6, 2014 11:15AM - 11:51AM
T27.00001: Exploring Emerging Technologies in the HPC Co-Design Space
Invited Speaker: Jeffrey Vetter
Concerns about energy efficiency and reliability have forced our community to reexamine the full spectrum of architectures, software, and algorithms that constitute our ecosystem. While architectures and programming models remained relatively stable for almost two decades, new architectural features, such as heterogeneous processing, nonvolatile memory, and optical interconnection networks, will demand that applications be redesigned so that they expose massive amounts of hierarchical parallelism, carefully orchestrate data movement, and balance concerns over accuracy, reliability, and time to solution. In what we have termed ``co-design,'' teams of architects, software designers, and applications scientists are working collectively to realize an integrated solution to these challenges. Not surprisingly, this design space can be massive, uncertain, and disjointed. To assist in this design-space exploration, our team is using modeling, simulation, and measurement on prototype systems to assess the possible trajectories of these future systems. In this talk, I will sample these emerging technologies and discuss how we can prepare for these prospective systems.
Thursday, March 6, 2014 11:51AM - 12:03PM
T27.00002: ABSTRACT WITHDRAWN
Thursday, March 6, 2014 12:03PM - 12:15PM
T27.00003: A hybrid computing approach to accelerating the multiple scattering theory based {\em ab initio} methods
Yang Wang, G. Malcolm Stocks
The multiple scattering theory method, also known as the Korringa-Kohn-Rostoker (KKR) method, is considered an elegant approach to {\em ab initio} electronic structure calculation for solids. Its convenient access to the one-electron Green function has led to the development of the locally self-consistent multiple scattering (LSMS) method, a linear-scaling {\em ab initio} method that allows electronic structure calculations for complex structures requiring tens of thousands of atoms in the unit cell. It is one of the few applications that have demonstrated petascale computing capability. In this presentation, we discuss our recent efforts in developing a hybrid computing approach for accelerating the full-potential electronic structure calculation. Specifically, within the framework of our existing LSMS code in FORTRAN 90/95, we exploit the many-core resources of GPGPU accelerators by implementing the compute-intensive functions (the calculation of the multiple scattering matrices and the single-site solutions) in CUDA, moving these computational tasks to the GPGPUs whenever they are available. We explain in detail our approach to the CUDA programming and the code structure, and show the speed-up of the new hybrid code by comparing its performance on CPU/GPGPU nodes against CPU-only runs.
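As a hedged illustration of the CPU/GPGPU dispatch pattern this abstract describes (not the actual LSMS code; the function names and column-major layout are assumptions), a host routine might probe for a device and fall back to the CPU when none is found:

```cpp
// Sketch: offload a complex matrix product (the kind that dominates the
// scattering-matrix step) to the GPU when a device is present.
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <complex>

using cplx = std::complex<double>;

// Naive column-major CPU fallback, used when no GPGPU is available.
static void cpu_fallback(int n, const cplx* A, const cplx* B, cplx* C) {
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            cplx s = 0.0;
            for (int k = 0; k < n; ++k) s += A[i + (size_t)k*n] * B[k + (size_t)j*n];
            C[i + (size_t)j*n] = s;
        }
}

void scattering_matrix_product(int n, const cplx* A, const cplx* B, cplx* C) {
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0) {
        cpu_fallback(n, A, B, C);   // no GPGPU found: stay on the CPU
        return;
    }
    const size_t bytes = sizeof(cuDoubleComplex) * n * n;
    cuDoubleComplex *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t h;
    cublasCreate(&h);
    const cuDoubleComplex one = make_cuDoubleComplex(1.0, 0.0);
    const cuDoubleComplex zero = make_cuDoubleComplex(0.0, 0.0);
    // C = A * B on the device (complex double GEMM)
    cublasZgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &one, dA, n, dB, n, &zero, dC, n);

    cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost);
    cublasDestroy(h);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}
```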
Thursday, March 6, 2014 12:15PM - 12:27PM
T27.00004: Virtual X-Ray and Electron Diffraction Patterns from Atomistic Simulations on Heterogeneous Computing Platforms
Shawn Coleman, Yang Wang, Luis Cueva-Parra, Douglas Spearot
Electron and X-ray diffraction are well-established experimental methods used to explore the atomic scale structure of materials. In this work, a computational algorithm is developed to produce virtual electron and X-ray diffraction patterns directly from atomistic simulations. In this algorithm, the diffraction intensity is computed via the structure factor equation over a 3-dimensional mesh of \{hkl\} points in reciprocal space. To construct virtual selected area electron diffraction (SAED) patterns, a thin hemispherical slice of the reciprocal lattice map lying near the surface of the Ewald sphere is isolated and viewed parallel to a specified zone axis. X-ray diffraction $2\theta$ line profiles are created by virtually rotating the Ewald sphere around the origin of reciprocal space, binning intensities by their associated scattering angle. The diffraction code is parallelized using a heterogeneous mix of MPI and OpenMP. The atom positions are distributed via MPI, while the reciprocal space mesh is parallelized using either OpenMP threads launched on regular CPU cores or offloaded to MIC hardware. The complexity of heterogeneous MPI/OpenMP parallelization on mixed hardware will be discussed.
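To make the kernel concrete: the intensity at each mesh point $\mathbf{k}$ is $|F(\mathbf{k})|^2$ with $F(\mathbf{k}) = \sum_j f_j e^{2\pi i\,\mathbf{k}\cdot\mathbf{r}_j}$. A hedged OpenMP sketch of that sum (illustrative, not the authors' code; unit atomic scattering factors are assumed):

```cpp
// Sketch: OpenMP threads over reciprocal-space points, each summing phase
// factors over all atoms. Names and layout are illustrative.
#include <cmath>
#include <complex>
#include <vector>

struct Vec3 { double x, y, z; };

std::vector<double> diffraction_intensity(const std::vector<Vec3>& atoms,
                                          const std::vector<Vec3>& kmesh) {
    std::vector<double> I(kmesh.size());
    const double twoPi = 2.0 * std::acos(-1.0);
    #pragma omp parallel for schedule(static)
    for (long m = 0; m < (long)kmesh.size(); ++m) {
        std::complex<double> F(0.0, 0.0);
        for (const Vec3& r : atoms) {
            double phase = twoPi * (kmesh[m].x*r.x + kmesh[m].y*r.y + kmesh[m].z*r.z);
            F += std::complex<double>(std::cos(phase), std::sin(phase)); // f_j = 1
        }
        I[m] = std::norm(F);   // |F(k)|^2; bin by 2*theta for line profiles
    }
    return I;
}
```

In the MPI layer the abstract describes, each rank would presumably hold a subset of the atoms, so the partial complex sums $F(\mathbf{k})$ must be reduced across ranks before squaring.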
Thursday, March 6, 2014 12:27PM - 1:03PM
T27.00005: Replica Exchange Molecular Dynamics in the Age of Heterogeneous Architectures
Invited Speaker: Adrian Roitberg
The rise of GPU-based codes has allowed MD to reach timescales that were only dreamed of five years ago. Even within this new paradigm there is still a need for advanced sampling techniques. Modern supercomputers (e.g. Blue Waters, Titan, Keeneland) have made available to users a significant number of GPUs and CPUs, which in turn translates into amazing opportunities for dream calculations. Replica-exchange based methods can optimally use this combination of codes and architectures to explore conformational variability in large systems. I will show our recent work in porting the program Amber to GPUs, and the support for replica exchange methods, where the replicated dimension can be temperature, pH, Hamiltonian, umbrella windows, or combinations of those schemes.
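For the temperature dimension, the exchange step reduces to a short Metropolis test between neighboring replicas, accepted with probability $\min(1, e^{(\beta_i-\beta_j)(E_i-E_j)})$. A hedged MPI sketch (not Amber's implementation; the pairing scheme and seeding are placeholders):

```cpp
// Sketch: each MPI rank runs one replica at its own temperature; both ranks
// of a neighbor pair call this. The lower rank draws the random number and
// decides, so the pair agrees on the outcome. Illustrative only.
#include <mpi.h>
#include <cmath>
#include <random>

bool attempt_swap(int rank, int partner, double beta, double energy, MPI_Comm comm) {
    double local[2] = {beta, energy}, remote[2];
    MPI_Sendrecv(local, 2, MPI_DOUBLE, partner, 0,
                 remote, 2, MPI_DOUBLE, partner, 0, comm, MPI_STATUS_IGNORE);

    // (beta_i - beta_j)(E_i - E_j) is symmetric, so both ranks compute it alike
    double delta = (local[0] - remote[0]) * (local[1] - remote[1]);

    int accept;
    if (rank < partner) {
        static std::mt19937 rng(12345);                 // placeholder seed
        std::uniform_real_distribution<double> u(0.0, 1.0);
        accept = (delta >= 0.0) || (u(rng) < std::exp(delta));
        MPI_Send(&accept, 1, MPI_INT, partner, 1, comm);
    } else {
        MPI_Recv(&accept, 1, MPI_INT, partner, 1, comm, MPI_STATUS_IGNORE);
    }
    return accept != 0;   // on acceptance, the replicas swap temperatures
}
```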
Thursday, March 6, 2014 1:03PM - 1:15PM
T27.00006: HOOMD-blue -- scaling up from one desktop GPU to Titan
Jens Glaser, Joshua A. Anderson, Sharon C. Glotzer
Scaling molecular dynamics simulations from one to many GPUs presents unique challenges. Because a single GPU already achieves high parallel efficiency, communication becomes the bottleneck and limits scaling when multiple GPUs are combined in parallel. We show how the fastest general-purpose molecular dynamics code currently available for single GPUs, HOOMD-blue [1,2], has been extended using spatial domain decomposition to run efficiently on tens or hundreds of GPUs. A key to parallel efficiency is a highly optimized communication pattern using local load-balancing algorithms implemented fully on the GPU. We will discuss comparisons with other state-of-the-art codes (LAMMPS) and present preliminary benchmarks on the Titan supercomputer.
[1] http://arxiv.org/pdf/1308.5587
[2] http://codeblue.umich.edu/hoomd-blue
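A hedged sketch of the spatial domain decomposition communication step (illustrative, not HOOMD-blue's actual pattern): each rank owns one spatial domain and exchanges ghost particles with its neighbors, one axis at a time.

```cpp
// Sketch: send particles crossing the +x face to the right neighbor on a
// Cartesian communicator and receive the left neighbor's ghosts; a real
// code repeats this per direction and per axis. Names are illustrative.
#include <mpi.h>
#include <vector>

std::vector<double> exchange_ghosts_x(const std::vector<double>& outgoing,
                                      MPI_Comm cart) {
    int left, right;
    MPI_Cart_shift(cart, /*dim=*/0, /*disp=*/1, &left, &right);

    // First agree on message sizes, then exchange the coordinate data.
    int nsend = (int)outgoing.size(), nrecv = 0;
    MPI_Sendrecv(&nsend, 1, MPI_INT, right, 0,
                 &nrecv, 1, MPI_INT, left, 0, cart, MPI_STATUS_IGNORE);

    std::vector<double> incoming(nrecv);
    MPI_Sendrecv(outgoing.data(), nsend, MPI_DOUBLE, right, 1,
                 incoming.data(), nrecv, MPI_DOUBLE, left, 1,
                 cart, MPI_STATUS_IGNORE);
    return incoming;   // ghost coordinates from the -x neighbor
}
```

Per the abstract, HOOMD-blue packs and unpacks these buffers in GPU kernels, so the particle data never takes an unnecessary detour through the host.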
Thursday, March 6, 2014 1:15PM - 1:27PM
T27.00007: Towards Fast, Scalable Hard Particle Monte Carlo Simulations on GPUs
Joshua A. Anderson, M. Eric Irrgang, Jens Glaser, Eric S. Harper, Michael Engel, Sharon C. Glotzer
Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherently serial nature of the statistical sampling. We discuss the implementation of Monte Carlo for arbitrary hard shapes in HOOMD-blue [1], a GPU-accelerated particle simulation tool, to enable million-particle simulations in a field where thousands is the norm. In this talk, we discuss our progress on basic parallel algorithms [2], optimizations that maximize GPU performance, and communication patterns for scaling to multiple GPUs. Research applications include colloidal assembly and other uses in materials design, biological aggregation, and operations research.
[1] Anderson, Glotzer, arXiv:1308.5587 (2013), http://codeblue.umich.edu/hoomd-blue
[2] Anderson, Jankowski, Grubb, Engel, Glotzer, J. Comp. Phys. 254, 27 (2013)
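A minimal serial sketch of the hard-particle trial move underlying such simulations (hard spheres for simplicity; HOOMD-blue handles arbitrary shapes and runs these checks in parallel on the GPU):

```cpp
// Sketch: Metropolis trial moves for hard spheres. There is no energy
// scale: a move is accepted iff it creates no overlap. Serial, with
// periodic boundaries and neighbor lists omitted for brevity.
#include <random>
#include <vector>

struct P { double x, y, z; };

bool overlaps(const std::vector<P>& s, size_t i, const P& t, double d2) {
    for (size_t j = 0; j < s.size(); ++j) {
        if (j == i) continue;
        double dx = s[j].x - t.x, dy = s[j].y - t.y, dz = s[j].z - t.z;
        if (dx*dx + dy*dy + dz*dz < d2) return true;  // closer than one diameter
    }
    return false;   // a real code uses a cell list, not this O(N) scan
}

void sweep(std::vector<P>& s, double diameter, double step, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(-step, step);
    const double d2 = diameter * diameter;
    for (size_t i = 0; i < s.size(); ++i) {
        P t = {s[i].x + u(rng), s[i].y + u(rng), s[i].z + u(rng)};
        if (!overlaps(s, i, t, d2)) s[i] = t;   // accept iff no overlap
    }
}
```

The serial dependence the abstract mentions is visible here: each accepted move changes the configuration that the next overlap check must see, which is what makes parallelizing the sampling nontrivial.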
Thursday, March 6, 2014 1:27PM - 1:39PM
T27.00008: DFT-Based Electronic Structure Calculations on Hybrid and Massively Parallel Computer Architectures
Emil Briggs, Miroslav Hodak, Wenchang Lu, Jerry Bernholc
The latest generation of supercomputers is capable of multi-petaflop peak performance, achieved by using thousands of multi-core CPUs, often coupled with thousands of GPUs. However, efficient utilization of this computing power for electronic structure calculations presents significant challenges. We describe adaptations of the Real-Space Multigrid (RMG) code that enable it to scale well to thousands of nodes. A hybrid technique that uses one MPI process per node, rather than one per core, was adopted, with OpenMP and POSIX threads used for intra-node parallelization. This reduces the number of MPI processes by an order of magnitude or more and improves per-node memory utilization. GPU accelerators are also becoming common and are capable of extremely high performance on vector workloads. However, they typically have much lower scalar performance than CPUs, so achieving good performance requires that the workload be carefully partitioned and that data transfer between CPU and GPU be optimized. We have used a hybrid approach combining MPI, OpenMP, POSIX threads, and GPU accelerators to reach excellent scaling to over 100,000 cores on a Cray XE6 platform, as well as a factor-of-three performance improvement when using a Cray XK7 system with CPU-GPU nodes.
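A hedged sketch of the one-MPI-process-per-node launch pattern described above (illustrative, not RMG's code):

```cpp
// Sketch: one MPI rank per node, OpenMP threads for the cores within it.
// Typically launched with a node-level mapping, e.g. one rank per node.
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    int provided;
    // FUNNELED: only the main thread makes MPI calls; worker threads do
    // node-local work, so inter-node traffic is one message stream per node.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        // node-local work (grid updates, orbital blocks, ...) split over threads
        #pragma omp master
        std::printf("rank %d using %d threads\n", rank, omp_get_num_threads());
    }

    // ... halo exchanges between nodes go here, issued by the main thread:
    // one rank per node instead of one per core shrinks the MPI process
    // count by the core count and pools the node's memory in one address space.
    MPI_Finalize();
    return 0;
}
```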
Thursday, March 6, 2014 1:39PM - 1:51PM
T27.00009: Janus II: the new generation Special Purpose Computer for spin-system simulations
Sergio Perez-Gaviro
We present Janus II [1], our second grand-challenge effort in high performance computing for computational physics. This special-purpose computer, recently developed and commissioned by the Janus Collaboration, is based on a Field-Programmable Gate Array (FPGA) architecture. Janus II has been designed and developed as a multipurpose reprogrammable supercomputer, optimized for speeding up Monte Carlo simulations of a wide class of spin glass models. It builds and improves on the experience of its predecessor, Janus, which has been successfully running physics simulations for the last six years. Janus II will make it possible to carry out Monte Carlo simulation campaigns that would take several centuries if performed on currently available computer systems.
[1] The Janus Collaboration, Comput. Phys. Commun., in press (arXiv:1310.1032)
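For context, the inner loop such machines accelerate is the Metropolis update of an Edwards-Anderson spin glass, $H = -\sum_{\langle ij\rangle} J_{ij} s_i s_j$ with quenched random couplings. A scalar reference sketch of one site update follows (an assumption for illustration, not the Janus II design; the FPGA updates enormous numbers of such sites per clock cycle):

```cpp
// Sketch: Metropolis update of one Ising spin on an L x L Edwards-Anderson
// lattice with quenched couplings J in {-1,+1} and periodic boundaries.
#include <cmath>
#include <random>
#include <vector>

struct EA2D {
    int L;
    std::vector<int> s;        // spins, +1/-1, size L*L
    std::vector<int> Jx, Jy;   // coupling from each site to its +x / +y neighbor
    int idx(int x, int y) const { return (y % L) * L + (x % L); }

    void metropolis_site(int x, int y, double beta, std::mt19937& rng) {
        int i = idx(x, y);
        // local field from the four nearest neighbors; the -x / -y couplings
        // are stored at the neighboring sites
        int h = Jx[i] * s[idx(x + 1, y)]
              + Jx[idx(x + L - 1, y)] * s[idx(x + L - 1, y)]
              + Jy[i] * s[idx(x, y + 1)]
              + Jy[idx(x, y + L - 1)] * s[idx(x, y + L - 1)];
        int dE = 2 * s[i] * h;   // energy change if s_i flips
        std::uniform_real_distribution<double> u(0.0, 1.0);
        if (dE <= 0 || u(rng) < std::exp(-beta * dE)) s[i] = -s[i];
    }
};
```

The update touches only integers, nearest neighbors, and a random number, which is exactly the kind of fine-grained, fixed pattern that maps well onto FPGA logic.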
Thursday, March 6, 2014 1:51PM - 2:03PM
T27.00010: Hybrid density functional calculation accelerated using GPGPU
Yoshihide Yoshimoto
Although hybrid density functionals are known to improve several simulated physical properties, their computational cost is very high because the exchange interaction must be computed explicitly. For example, in the widely used plane-wave based simulation programs, a large number of fast Fourier transforms (FFTs) must be executed, and this part dominates the cost. In this presentation, the acceleration of this step using GPGPUs, as implemented in the program package xTAPP, will be presented. xTAPP is a plane-wave based first-principles calculation program package developed by the author and his collaborators. GPGPUs have the very high memory bandwidth that FFTs require. However, the data-transfer bandwidth between a GPGPU and a CPU is rather low, and this is the bottleneck when a GPGPU is used naively. In xTAPP, this bottleneck is resolved by blocking the computation of the exchange interaction. The exchange interaction is an aggregate of band-pair computations, each consisting of FFTs. By blocking these computations with respect to the bands, the volume of CPU-GPGPU data transfers is reduced relative to the FFT computation.
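A hedged sketch of the blocking idea (illustrative, not xTAPP's code): copy a block of bands to the device once and reuse it against every band of a partner block, so each transfer is amortized over many FFT-based pair computations. The cuFFT names are real; the data layout and omitted stages are assumptions.

```cpp
// Sketch: block the band-pair loop so each host->device copy of B bands is
// reused for O(B^2) pair computations. Assumes nbands is a multiple of B.
#include <cuda_runtime.h>
#include <cufft.h>

void exchange_blocked(const cufftDoubleComplex* h_bands, int nbands,
                      int nx, int ny, int nz, int B) {
    const size_t ngrid = (size_t)nx * ny * nz;
    const size_t blk = sizeof(cufftDoubleComplex) * B * ngrid;
    cufftDoubleComplex *d_bi, *d_bj, *d_prod;
    cudaMalloc(&d_bi, blk); cudaMalloc(&d_bj, blk); cudaMalloc(&d_prod, blk);

    int n[3] = {nx, ny, nz};
    cufftHandle plan;                      // batched 3D Z2Z FFT over B grids
    cufftPlanMany(&plan, 3, n, nullptr, 1, (int)ngrid,
                  nullptr, 1, (int)ngrid, CUFFT_Z2Z, B);

    for (int i0 = 0; i0 < nbands; i0 += B) {
        cudaMemcpy(d_bi, h_bands + i0 * ngrid, blk, cudaMemcpyHostToDevice);
        for (int j0 = 0; j0 <= i0; j0 += B) {          // one copy per block pair
            cudaMemcpy(d_bj, h_bands + j0 * ngrid, blk, cudaMemcpyHostToDevice);
            // form the real-space pair products into d_prod, B at a time
            // (device kernel omitted), then transform them:
            cufftExecZ2Z(plan, d_prod, d_prod, CUFFT_FORWARD);
            // ... apply the Coulomb kernel in reciprocal space, inverse
            //     FFT, and accumulate -- all staying on the device ...
        }
    }
    cufftDestroy(plan);
    cudaFree(d_bi); cudaFree(d_bj); cudaFree(d_prod);
}
```

Without blocking, each of the O(nbands^2) pairs would trigger its own transfers; with blocking, the transfer count drops by roughly a factor of B while the FFT work is unchanged.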
Thursday, March 6, 2014 2:03PM - 2:15PM
T27.00011: Sign problems and tensor renormalization group
Yuzhi Liu, Shailesh Chandrasekharan, Alan Denbleyker, Yannick Meurice, Mingpu Qin, Tao Xiang, Zhiyuan Xie, Ji-Feng Yu, Judah Unmuth-Yockey, Haiyuan Zou
Sign problems appear generically when simulating a system with a high density of fermions, where the Boltzmann weight oscillates rapidly. Sign problems also occur in models with complex couplings or complex temperature. They remain a challenging problem for Monte Carlo practitioners in condensed matter physics and particle physics. In this talk, I will present our latest results on calculating lattice spin models with complex coupling via the numerical tensor renormalization group method. I will also present results on the two-dimensional XY (or O(2)) model with a complex ``chemical potential'' term. A comparison with the world-line algorithm will be shown, and a discussion of possible extensions of the tensor renormalization group method to models with other gauge groups and in higher dimensions will follow.
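For readers unfamiliar with the method, the core operation of a tensor renormalization group step is a truncated SVD of the reshaped local tensor; complex couplings simply make the tensor complex, which the contraction handles exactly where importance sampling cannot. A hedged sketch assuming the Eigen library (illustrative, not the authors' code):

```cpp
// Sketch: the SVD truncation at the heart of a TRG step. A rank-4 tensor
// T(l,u,r,d) with bond dimension D is reshaped into a D^2 x D^2 matrix M,
// split as M = U S V^dagger, and kept to the chi largest singular values:
// M ~ A * B with A = U sqrt(S) and B = sqrt(S) V^dagger (assumes chi <= D^2).
#include <Eigen/Dense>
#include <complex>

using Mat = Eigen::MatrixXcd;

void svd_truncate(const Mat& M, int chi, Mat& A, Mat& B) {
    Eigen::JacobiSVD<Mat> svd(M, Eigen::ComputeThinU | Eigen::ComputeThinV);
    // singular values are real and sorted; keep the chi largest
    Eigen::VectorXcd sq = svd.singularValues()
                              .head(chi)
                              .cwiseSqrt()
                              .cast<std::complex<double>>();
    A = svd.matrixU().leftCols(chi) * sq.asDiagonal();
    B = sq.asDiagonal() * svd.matrixV().leftCols(chi).adjoint();
}
```

The recontraction of the A and B factors into the coarse-grained tensor is omitted; the truncation shown is what keeps the bond dimension, and hence the cost, bounded while the complex weights are carried without sampling.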