Bulletin of the American Physical Society

APS March Meeting 2014

Volume 59, Number 1

Monday–Friday, March 3–7, 2014; Denver, Colorado

Session D27: Focus Session: High Performance Computing in Density Functional Theory
Sponsoring Units: DCOMP, DCP; Chair: Aldo Romero, West Virginia University; Room: 501

Monday, March 3, 2014 2:30PM - 3:06PM	D27.00001: Accuracy, Speed, Scalability: the Challenges of Large-Scale DFT Simulations Invited Speaker: Francois Gygi First-Principles Molecular Dynamics (FPMD) simulations based on Density Functional Theory (DFT) have become popular in investigations of electronic and structural properties of liquids and solids. The current upsurge in available computing resources enables simulations of larger and more complex systems, such as solvated ions or defects in crystalline solids. The high cost of FPMD simulations, however, still strongly limits the size of feasible simulations, in particular when using hybrid-DFT approximations. In addition, the simulation times needed to extract statistically meaningful quantities also grow with system size, which puts a premium on scalable implementations. We discuss recent research in the design and implementation of scalable FPMD algorithms, with emphasis on controlled-accuracy approximations and accurate hybrid-DFT molecular dynamics simulations, using examples of applications to materials science and chemistry.
Monday, March 3, 2014 3:06PM - 3:18PM	D27.00002: Challenges and advances in large-scale DFT calculations on GPUs Heather Kulik Recent advances in reformulating electronic structure algorithms for stream processors such as graphics processing units (GPUs) have made DFT calculations on systems comprising up to $O(10^3)$ atoms feasible. Simulations on such systems that previously required half a week on traditional processors can now be completed in only half an hour. Here, we leverage these GPU-accelerated quantum chemistry methods to investigate large-scale quantum mechanical features in protein structure, mechanochemical depolymerization, and the nucleation and growth of heterogeneous nanoparticle structures. In each case, large-scale and rapid evaluation of electronic structure properties is critical for unearthing previously poorly understood properties and mechanistic features of these systems. We will also discuss outstanding challenges in the use of Gaussian localized-basis-set codes on GPUs pertaining to limitations in basis set size, and how we circumvent such challenges to computational efficiency with systematic, physics-based error corrections to basis set incompleteness.
Monday, March 3, 2014 3:18PM - 3:30PM	D27.00003: Implementation of the Small Box Fast Fourier Transformation Method within Orbital-Free Density Functional Theory Mohan Chen, Xiang-Wei Jiang, Lin-Wang Wang, Emily Carter Orbital-Free density functional theory (OFDFT) is a first-principles quantum mechanics method that uses the electron density as its only variable. The main computational cost in OFDFT comes from Fast Fourier Transforms (FFTs), used to evaluate both the kinetic energy density functional and the Coulomb term. The Small Box Fast Fourier Transform (SBFFT) technique is a newly developed method for solving the Poisson equation using a large number of processors [1]. We further adopt SBFFT for the non-local kinetic energy density functional (KEDF) term frequently used in OFDFT, for which multiple FFTs are required. An efficient truncation of a real-space KEDF kernel is proposed in order to take advantage of SBFFT. This new method yields results similar to those of the original OFDFT formulation, as tested on bulk crystals, defects, and surfaces. Finally, we report progress in implementing all the mentioned techniques in PROFESS (PRinceton Orbital-Free Electronic Structure Software) [2]. [1] Xiang-Wei Jiang, Shu-Shen Li and Lin-Wang Wang, Comp. Phys. Comm. (in press). [2] L. Hung, C. Huang, I. Shin, G. Ho, V. L. Ligneres, and E. A. Carter, Comput. Phys. Comm., 181, 2208 (2010).
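For orientation, the conventional reciprocal-space Poisson solve that FFT-based schemes such as the one in D27.00003 build on can be sketched as follows. This is not the SBFFT method and is not taken from PROFESS; it is a minimal 1D periodic illustration in Gaussian units. The function names, the box length, and the use of a naive O(N^2) DFT in place of a real FFT library are all assumptions made to keep the sketch self-contained.

```python
import cmath
import math

def dft(values, sign):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT).
    sign=-1 is the forward transform, sign=+1 the unnormalized inverse."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * m * j / n)
            for j, v in enumerate(values))
        for m in range(n)
    ]

def solve_poisson_periodic(rho, length):
    """Solve d2(phi)/dx2 = -4*pi*rho on a periodic 1D grid of given length."""
    n = len(rho)
    rho_k = dft(rho, -1)
    # The k = 0 component is set to zero, i.e. a uniform neutralizing
    # background is assumed (hypothetical choice for this sketch).
    phi_k = [0j] * n
    for m in range(1, n):
        freq = m if m <= n // 2 else m - n      # signed frequency index
        k = 2.0 * math.pi * freq / length       # wavenumber
        phi_k[m] = 4.0 * math.pi * rho_k[m] / (k * k)
    # Inverse transform, normalized by 1/n; the result is real for real rho.
    return [(c / n).real for c in dft(phi_k, +1)]

# Hypothetical test density: a single cosine mode in a box of length 10.
L_BOX = 10.0
N = 32
rho = [math.cos(2.0 * math.pi * j / N) for j in range(N)]
phi = solve_poisson_periodic(rho, L_BOX)
```

For a single cosine mode the potential is the same cosine scaled by 4*pi/k^2, which gives a quick analytical check. The global transform is the step whose all-to-all communication limits parallel scaling, which is the cost a localized "small box" scheme aims to reduce.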
Monday, March 3, 2014 3:30PM - 3:42PM	D27.00004: Planning the next generation of density functional codes Grady Schofield, James R. Chelikowsky, Yousef Saad Real-space pseudopotential density functional theory has proven to be an efficient avenue for computing the properties of matter in many different states and geometries, including liquids, wires, slabs and clusters with and without spin polarization. Fully self-consistent solutions have been routinely obtained for systems with thousands of atoms. However, there are still systems where quantum mechanical accuracy is desired, but scalability proves to be a hindrance, such as large biological molecules or complex interfaces. We will present an overview of our work on algorithms for this problem, which has taken the route of improved scalability by spectrum slicing in the eigensolver, i.e., the construction of a "parallel" eigensolver. We will also discuss how accurate forces can be obtained for "coarse grids."
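The idea of spectrum slicing mentioned in D27.00004 can be illustrated with a toy example: the spectrum is divided into energy windows, and the eigenvalues in each window are computed independently of the others, so the windows could be assigned to separate groups of processes. The sketch below is not the authors' eigensolver. It assumes a small symmetric tridiagonal matrix (a 1D finite-difference Laplacian) and uses Sturm-sequence counting with bisection inside each slice; all names and parameters are hypothetical.

```python
import math

def count_below(diag, off, x):
    """Sturm count: number of eigenvalues < x of the symmetric tridiagonal
    matrix with diagonal `diag` and off-diagonal `off`."""
    count, d = 0, 1.0
    for i, a in enumerate(diag):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - b2 / d
        if d == 0.0:
            d = 1e-300  # avoid division by zero in the next step
        if d < 0.0:
            count += 1
    return count

def eigenvalues_in_slice(diag, off, lo, hi, tol=1e-12):
    """Find all eigenvalues in [lo, hi) by bisection on the Sturm count.
    Needs no information from any other slice."""
    found = []
    for k in range(count_below(diag, off, lo), count_below(diag, off, hi)):
        a, b = lo, hi
        while b - a > tol:
            mid = 0.5 * (a + b)
            if count_below(diag, off, mid) > k:
                b = mid   # k-th eigenvalue lies below mid
            else:
                a = mid   # k-th eigenvalue lies at or above mid
        found.append(0.5 * (a + b))
    return found

# Toy matrix: 1D Laplacian stencil (2 on the diagonal, -1 off the diagonal).
n = 16
diag = [2.0] * n
off = [-1.0] * (n - 1)

# Gershgorin bounds for this matrix are [0, 4]; cut them into four slices.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
slices = [eigenvalues_in_slice(diag, off, lo, hi)
          for lo, hi in zip(edges, edges[1:])]
spectrum = [lam for part in slices for lam in part]
```

Each call to `eigenvalues_in_slice` is independent, so the list comprehension over slices is the place where a parallel map would go. Production codes face the harder problem of doing this for large sparse Hamiltonians, where counting and extracting the eigenpairs in a window requires more elaborate techniques than a tridiagonal Sturm sequence.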
Monday, March 3, 2014 3:42PM - 3:54PM	D27.00005: High Performance Computing for Large Systems: Using Real Space Pseudopotentials for Metal-Semiconductor Interfaces Jaime Souto, James R. Chelikowsky, Tzu-Liang Chan, Kai-Ming Ho, Cai-Zhuang Wang, Shengbai Zhang Solving for the electronic structure at an interface can be computationally intensive. Even at the interface between crystalline systems, the structural details may not be known. Mismatch between the crystalline systems can result in unit cells containing hundreds, if not thousands, of atoms. Until recently, such systems were not computationally tractable. Real-space pseudopotential density functional theory has proven to be an efficient avenue for computing the properties of such systems. Fully self-consistent solutions have been routinely obtained for systems with thousands of atoms. We illustrate this method applied to a Pb(111)/Si(111) interface and in particular examine the evolution of a Schottky barrier for this interface. We examine systems of up to 1,500 atoms and determine the details of how quantum confinement controls the electronic structure of this system.
Monday, March 3, 2014 3:54PM - 4:06PM	D27.00006: Linear-Scaling Density Functional Theory Simulations of Nanomaterials with the ONETEP code: vdW-DF and PAW methodology and OpenMP/MPI hybrid parallelism Nicholas Hine, Gabriel Constantinescu, Mike Payne, Lampros Andrinopoulos, Arash Mostofi, Peter Haynes, Karl Wilkinson, Jacek Dziedzic, Chris-Kriton Skylaris Methods based on traditional density functional theory (DFT) seek eigenstates of the Kohn-Sham Hamiltonian, and thus inevitably hit a scaling wall as system size increases, due to cubic scaling of the computational effort. However, useful contact with experiment in the study of nanomaterials (e.g. nanocrystals, interfaces, proteins, disordered molecular crystals) requires accurate calculations on systems comprising many thousands of atoms, beyond this scaling wall. Approaches based on the density matrix can exploit real-space localisation to achieve linear scaling with system size and make such calculations feasible and highly parallel. The ONETEP Linear-Scaling DFT code [1] combines the benefits of linear scaling, efficient parallelisation, and variational convergence akin to plane-wave approaches, with a wide-ranging set of features. I will present an overview of the code and recent developments: hybrid parallelism based on OpenMP and MPI, enabling scaling to tens of thousands of cores; Projector Augmented Wave methods, enabling study of transition metals; and van der Waals DF methods. These have combined to enable studies of C$_{60}$ molecular crystals and Transition Metal Dichalcogenide interfaces, e.g. MoS$_2$/MoSe$_2$. [1] C. Skylaris et al., J. Chem. Phys. 122, 084119 (2005).
Monday, March 3, 2014 4:06PM - 4:18PM	D27.00007: Plane Wave First-principles Materials Science Codes on Multicore Supercomputer Architectures Andrew Canning, Jack Deslippe, Steven G. Louie Plane wave first-principles codes based on 3D FFTs are among the largest users of supercomputer cycles in the world. Modern supercomputer architectures are constructed from chips having many CPU cores, with nodes containing multiple chips. Designs for future supercomputers are projected to have even more cores per chip. I will present new developments for hybrid MPI/OpenMP plane-wave (PW) codes, focusing on a specialized 3D FFT that gives greatly improved scaling over a pure MPI version on multicore machines. Scaling results will be presented for the full electronic structure codes PARATEC and BerkeleyGW, using the new hybrid 3D FFTs, threaded libraries and OpenMP to gain greatly improved scaling to very large core counts on Cray and IBM machines.
Monday, March 3, 2014 4:18PM - 4:54PM	D27.00008: ABINIT: Plane-Wave-Based Density-Functional Theory on High Performance Computers Invited Speaker: Marc Torrent For several years, a continuous effort has been made to adapt electronic structure codes based on Density-Functional Theory to future computing architectures. Among these codes, ABINIT [1] is based on a plane-wave description of the wave functions, which allows systems of any kind to be treated. Porting such a code to petascale architectures poses difficulties related to the many-body nature of the DFT equations. To improve the performance of ABINIT, especially for standard LDA/GGA ground-state and response-function calculations, several strategies have been followed. A full multi-level MPI parallelisation scheme has been implemented, exploiting all possible levels and distributing both computation and memory. It allows the number of distributed processes to be increased and could not have been achieved without a substantial restructuring of the code. The core algorithm used to solve the eigenproblem ("Locally Optimal Blocked Conjugate Gradient"), a Blocked-Davidson-like algorithm, is based on a distribution of processes combining plane waves and bands. In addition to the distributed-memory parallelisation, a full hybrid scheme has been implemented, using standard shared-memory directives (OpenMP/OpenACC) or porting some costly code sections to Graphics Processing Units (GPUs). As no simple performance model exists, the complexity of use has increased; the code efficiency strongly depends on the distribution of processes among the numerous levels. ABINIT is able to predict the performance of several process distributions and automatically choose the most favourable one. 
In addition, a substantial effort has been made to analyse the performance of the code on petascale architectures, showing which sections of the code have to be improved; they are all related to matrix algebra (diagonalisation, orthogonalisation). The different strategies employed to improve the code scalability will be described. They are based on an exploration of new diagonalisation algorithms, as well as the use of external optimized libraries. Part of this work has been supported by the European PRACE project (PaRtnership for Advanced Computing in Europe) [2] in the framework of its workpackage 8. [1] http://www.abinit.org [2] http://www.prace-ri.eu
Monday, March 3, 2014 4:54PM - 5:06PM	D27.00009: MBPT calculations with ABINIT Matteo Giantomassi, Georg Huhs, David Waroquiers, Xavier Gonze Many-Body Perturbation Theory (MBPT) defines a rigorous framework for the description of excited-state properties based on the Green's function formalism. Within MBPT, one can calculate charged excitations using e.g. Hedin's $GW$ approximation for the electron self-energy. In the same framework, neutral excitations are also well described through the solution of the Bethe-Salpeter equation (BSE). In this talk, we report on the recent developments concerning the parallelization of the MBPT algorithms available in the ABINIT code (www.abinit.org). In particular, we discuss how to improve the parallel efficiency thanks to a hybrid version that employs MPI for the coarse-grained parallelization and OpenMP (a de facto standard for parallel programming on shared memory architectures) for the fine-grained parallelization of the most CPU-intensive parts. Benchmark results obtained with the new implementation are discussed. Finally, we present results for the $GW$ corrections of amorphous SiO$_2$ in the presence of defects and the BSE absorption spectrum.
Monday, March 3, 2014 5:06PM - 5:18PM	D27.00010: SIESTA-PEXSI: Massively parallel method for efficient and accurate ab initio materials simulation Lin Lin, Georg Huhs, Alberto Garcia, Chao Yang We describe how to combine the pole expansion and selected inversion (PEXSI) technique with the SIESTA method, which uses numerical atomic orbitals for Kohn-Sham density functional theory (KSDFT) calculations. The PEXSI technique can efficiently utilize the sparsity pattern of the Hamiltonian matrix and the overlap matrix generated from codes such as SIESTA, and solves KSDFT without using a cubic-scaling matrix diagonalization procedure. The complexity of PEXSI scales at most quadratically with respect to the system size, and the accuracy is comparable to that obtained from full diagonalization. One distinct feature of PEXSI is that it achieves low-order scaling without using the near-sightedness property and can therefore be applied to metals as well as insulators and semiconductors, at room temperature or even lower temperatures. The PEXSI method is highly scalable, and the recently developed massively parallel PEXSI technique can make efficient use of 10,000 to 100,000 processors on high performance machines. We demonstrate the performance of the SIESTA-PEXSI method using several examples of large-scale electronic structure calculations, including a long DNA chain and graphene-like structures with more than 20,000 atoms.
Monday, March 3, 2014 5:18PM - 5:30PM	D27.00011: A linear-scaling implementation of time-dependent density-functional theory (TDDFT) in the linear-response formalism Tim Zuehlsdorff, Nicholas Hine, James Spencer, Nicholas Harrison, Jason Riley, Peter Haynes In recent years, linear-scaling approaches to density functional theory have been successfully used to predict ground state properties of nanostructures and large biological systems. While these methods are now well established, the linear-scaling computation of excited state properties via time-dependent density-functional theory (TDDFT) remains challenging. In this talk, we will present a fully linear-scaling implementation of TDDFT in the linear-response formalism that we developed recently (J. Chem. Phys. 139, 064104) and that is particularly suitable for calculating the low energy absorption spectra of systems containing thousands of atoms. The method avoids any reference to individual Kohn-Sham states. Instead, the occupied and unoccupied subspaces are represented by two effective density matrices that are expanded in terms of two independent sets of in-situ optimized localized orbitals. The double basis set approach avoids known problems of representing the unoccupied space with localized orbitals optimized for the occupied space, while the in-situ optimization procedure allows for efficient calculations using a minimal number of basis functions. The linear-scaling properties of the method will be demonstrated on a number of nanostructures.

