Bulletin of the American Physical Society

APS March Meeting 2020

Volume 65, Number 1

Monday–Friday, March 2–6, 2020; Denver, Colorado

Session G20: Data Science II: Machine Learning
Sponsoring Units: GDS
Chair: Brian Barnes, US Army Rsch Lab - Aberdeen
Room: 301

Tuesday, March 3, 2020 11:15AM - 11:27AM	G20.00001: Addressing the Elephant in the Room: Uncertainties in Physical Predictions From Machine-Learned Force Fields Stefan Chmiela, Huziel Sauceda, Klaus-Robert Müller, Alexandre Tkatchenko Learning molecular force fields (FFs) has played a leading role in the path towards reliable molecular dynamics simulations in biology, chemistry, and materials science[1,2]. However, a simulation’s predictive power is only as good as the underlying interatomic potential. Although it is common practice to evaluate the reliability of trained FF models based on typical error measures, this only quantifies the error on the database given a set of training points. The relevant question to ask is how well a learned FF reproduces the actual physical properties of a system. Here, we present an analysis of the uncertainties in properties derived from learned FFs, such as the vibrational spectrum and thermodynamics. A clear correlation is found between learning errors and the derived properties' uncertainty. The robustness of the symmetric gradient-domain machine learning (sGDML) framework[1] against this problem is evidenced by the rapid reduction of its uncertainty as the training set size grows. These results will serve as a reference for the development of robust and predictive learned physical models. [1] Chmiela et al. Sci. Adv. 3 (5), e1603015 (2017); Nat. Commun. 9 (1), 3887 (2018); Comput. Phys. Commun. 240, 38 (2019). [2] Sauceda et al. J. Chem. Phys. 150 (11), 114102 (2019); arXiv:1909.08565 (2019).
Tuesday, March 3, 2020 11:27AM - 11:39AM	G20.00002: Understanding key challenges in digitizing and contextualizing experimental results Ha-Kyung Kwon, Chirranjeevi Gopal, Brian D Storey, Santiago Caicedo, Jared Kirschner Recent advances in automated, high-throughput experimentation have enabled scientists to leverage machine learning methods and structured data in screening large parameter spaces. These approaches could be even more effective if supplemented by data collected in traditional experimental labs, where samples are handed off between collaborators at multiple processing and characterization steps. In these settings, unstructured experimental observations recorded in physical lab notebooks can provide context for data and metadata collected from a variety of instruments. Despite the number of electronic lab notebook products available on the market, digitization of experimental notes remains a challenge due to low levels of adoption by researchers. In this talk, we present our findings from user research conducted in three different academic labs on researchers’ behaviors and needs throughout the experimental process. We discuss methods based on human-centered design to guide the development of an easily adoptable solution that seeks to 1) integrate experimental notes into data-driven software platforms, 2) contextualize experimental data obtained from multiple sources, and 3) enhance knowledge transfer between collaborators, thereby accelerating scientific discovery in experimental labs.
Tuesday, March 3, 2020 11:39AM - 11:51AM	G20.00003: Using Machine Learning to Reduce Low-Q Disorder in Quasiparticle Interference Maps Aidan Witeck, Yu Liu, Jennifer E. Hoffman Scanning tunneling microscopy (STM) is a commonly used technique to examine a material on the atomic scale. Quasiparticle interference (QPI) extends its ability to resolve the band structure of materials in reciprocal space by imaging the scattering patterns of impurities in real space. Those scattering patterns are normally analyzed with the Fourier transform. However, the Fourier transform suffers from low-q noise arising from correlations between impurity centers, drastically decreasing band resolution in low-q regions. Here we present a novel algorithm that uses Fourier filtering and machine learning to reduce low-q noise. We validate this method using both simulated QPI data and real QPI data from various materials. Our method reduces low-q disorder without introducing artifacts, allowing us to examine the low-q band structure more clearly.
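The abstract does not say which Fourier filter is used, so the following is only a minimal sketch of the general idea of suppressing low-q weight in a real-space map. The Gaussian high-pass mask, the function name `suppress_low_q`, and the parameter `q_cut` are assumptions made for illustration, and the machine learning stage is not shown.

```python
import numpy as np

def suppress_low_q(image, q_cut):
    """Attenuate Fourier components with |q| well below q_cut (cycles per pixel).

    Hypothetical helper: the Gaussian high-pass mask is an assumption,
    not the filter described in the abstract.
    """
    image = np.asarray(image, dtype=float)
    ny, nx = image.shape
    qy = np.fft.fftfreq(ny)[:, None]   # vertical spatial frequencies
    qx = np.fft.fftfreq(nx)[None, :]   # horizontal spatial frequencies
    q = np.hypot(qx, qy)               # |q| on the full 2-D grid
    # Mask is exactly 0 at q = 0 and approaches 1 for q >> q_cut.
    mask = 1.0 - np.exp(-(q / q_cut) ** 2)
    filtered = np.fft.ifft2(np.fft.fft2(image) * mask)
    return filtered.real

# Toy map: a constant background plus a short-wavelength modulation.
x = np.arange(64)
wave = np.ones((64, 1)) * np.cos(2 * np.pi * 16 * x / 64)[None, :]
cleaned = suppress_low_q(5.0 + wave, q_cut=0.05)
```

In this toy case the constant background (the q = 0 component) is removed, while the modulation at 0.25 cycles per pixel passes through essentially unchanged.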
Tuesday, March 3, 2020 11:51AM - 12:03PM	G20.00004: CdTe nanoparticles as temperature sensors via machine learning of optical properties John Colton, James W Erikson, Charles Lewis, Carrie E McClure, Derek Sanchez, Troy Munro We have investigated using CdTe nanoparticles as non-invasive temperature sensors. Optical photoluminescence (PL) spectra and time-resolved photoluminescence (TRPL) were measured as functions of temperature and used as inputs to an artificial neural network (ANN) for purposes of machine learning. Two regimes were studied: low temperature, with data taken from 10-320 K in steps of 10 K; and high temperature, with data taken from 325-346 K in steps of 1 K. Five data sets were withheld for validation from the low temperature data and four from the high temperature data. We used preprocessing techniques of min-max normalization and (for the low temperature regime) interpolation to generate additional training samples. Best results for both regimes were obtained using a seven-layer fully connected ANN architecture. Hyperparameters varied to optimize the network include number and size of layers (including convolutional layers), batch normalization, activation functions, learning rates, and dividing the PL and TRPL data into separate input branches. Using a typical 80-20 training/testing split, the low temperature (high temperature) network was trained to 0.1 K (1.0 K) training error and 0.3 K (2.5 K) testing error, which results in an error on the withheld validation data of 3.4 K (5.5 K).
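The two preprocessing steps named in the abstract, min-max normalization and interpolation to generate extra training samples, might look roughly like the sketch below. The abstract does not state the interpolation scheme or whether normalization is applied per spectrum or across the data set; linear interpolation in temperature and a single global minimum and maximum are assumptions here, and the function names are hypothetical.

```python
import numpy as np

def min_max_normalize(x):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo)

def interpolate_spectra(temps, spectra, new_temps):
    """Linearly interpolate each spectral channel between measured temperatures.

    temps:     measured temperatures, increasing, shape (n_temps,)
    spectra:   measured intensities, shape (n_temps, n_channels)
    new_temps: temperatures at which synthetic spectra are wanted
    """
    spectra = np.asarray(spectra, dtype=float)
    columns = [np.interp(new_temps, temps, spectra[:, j])
               for j in range(spectra.shape[1])]
    return np.stack(columns, axis=1)

# Two measured spectra (two channels each) at 10 K and 20 K.
temps = np.array([10.0, 20.0])
spectra = np.array([[0.0, 2.0],
                    [2.0, 4.0]])
synthetic = interpolate_spectra(temps, spectra, np.array([15.0]))
scaled = min_max_normalize([2.0, 4.0, 6.0])
```

Interpolated spectra are not independent measurements, so in a setup like this they would normally be kept out of any validation set.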
Tuesday, March 3, 2020 12:03PM - 12:15PM	G20.00005: Machine Learning X-ray Spectra: Theoretical Training for Experimental Predictions Nicholas Marcella, Anatoly I Frenkel The energy-dependent X-ray absorption coefficient encodes the electronic and real-space structure of select element species within a material. We have found that a neural network (NN) is capable of modeling this relationship in X-ray absorption near edge structure (XANES) and extended X-ray absorption fine structure (EXAFS), resulting in a powerful analytic tool. To date, our NN-assisted analysis methods have been used to investigate the structure and dynamics of nanoparticles and oxide clusters as small as 4 atoms. The availability of large amounts of reliable training data, in terms of both labeling and quality, is key to an accurate NN model. For XANES and EXAFS, such a database is not experimentally obtainable, so we must use ab initio spectroscopy codes to create a theoretical database. Because theoretical training data only approximate real observations (due to theoretical limitations and experimental considerations such as noise, resolution, and glitching), many local minima emerge when optimizing for accurate experimental predictions. We will discuss how, by probing various local minima during training with a set of labeled experimental data, we can find an NN that generalizes theoretical features for use in experimental predictions.
Tuesday, March 3, 2020 12:15PM - 12:27PM	G20.00006: Machine learning on the electron-phonon spectral function and the superconductor gap function Ming-Chien Hsu, Wan-Ju Li, Ting-Kuo Lee, Shin-Ming Huang Phonon-mediated superconductors are described well by the Eliashberg equation. Once the electron-phonon spectral function is known, either from ab initio calculations or estimated from experiments, superconductor properties such as the gap function and the wave function renormalization can be solved self-consistently from the Eliashberg equation. However, it is also important to work in the reverse direction, inferring the possible original spectral function from known superconductor properties. This mapping can be learned using machine learning techniques. We generate various spectral functions with numerous peaks and different shapes and solve for their gap functions self-consistently. With these data, the relation between each gap function and its corresponding spectral function can be learned by the machine. The functions are learned and recognized in terms of basis functions found by the neural network. The loss function is greatly reduced each time a neuron successfully learns a basis function. We find that, in general, the neural network learns the correspondence between the electron-phonon spectral function and the superconductor functions mentioned above very efficiently.
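As a rough illustration of the data-generation step, the sketch below builds a multi-peak spectral function and evaluates the standard electron-phonon coupling constant $\lambda = 2\int d\omega\, \alpha^2F(\omega)/\omega$ as a sanity check. The Gaussian peak shape, the function names, and all numerical values are assumptions; the abstract does not specify how the spectral functions were parametrized, and the self-consistent Eliashberg solver is not shown.

```python
import numpy as np

def spectral_function(omega, centers, widths, weights):
    """Sum of Gaussian peaks; each weight is the integrated area of its peak.

    Hypothetical parametrization of alpha^2 F(omega), chosen for illustration.
    """
    w = np.asarray(omega, dtype=float)[:, None]
    c = np.asarray(centers, dtype=float)[None, :]
    s = np.asarray(widths, dtype=float)[None, :]
    a = np.asarray(weights, dtype=float)[None, :]
    peaks = a * np.exp(-0.5 * ((w - c) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    return peaks.sum(axis=1)

def coupling_lambda(omega, a2f):
    """lambda = 2 * integral of a2F(omega) / omega, by the trapezoid rule."""
    omega = np.asarray(omega, dtype=float)
    integrand = 2.0 * np.asarray(a2f, dtype=float) / omega
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(omega)))

# Frequency grid in arbitrary energy units, starting above zero to avoid 1/omega at 0.
omega = np.linspace(0.1, 100.0, 20001)

# One narrow peak of area 10 at omega = 20 gives lambda close to 2 * 10 / 20 = 1.
single = spectral_function(omega, [20.0], [0.5], [10.0])
lam_single = coupling_lambda(omega, single)

# A randomly drawn three-peak spectrum, as one sample of a training set.
rng = np.random.default_rng(0)
sample = spectral_function(omega,
                           rng.uniform(10.0, 80.0, 3),
                           rng.uniform(1.0, 5.0, 3),
                           rng.uniform(1.0, 10.0, 3))
```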
Tuesday, March 3, 2020 12:27PM - 12:39PM	G20.00007: Characteristic space of XRD patterns in machine-learning Keishu Uchimura, Masao Yano, Hiroyuki Kimoto, Kenta Hongo, Ryo Maezono X-ray diffraction (XRD) is a commonly used analytical technique to identify crystal structures. Recent advances in automatic measuring techniques enable one to obtain thousands of XRD patterns within a day. However, their analysis is still done by comparing measured patterns with reference ones, which relies on expert knowledge and considerable effort. Even with a computational implementation of Rietveld refinement, characterizing some XRD patterns can take a few hours (or more), even for experts. Thus, automation and acceleration of XRD analysis are highly desirable for managing high-throughput XRD patterns. In this study we adopted an unsupervised machine learning technique, the autoencoder, to analyze XRD patterns. After the XRD patterns are vectorized into feature vectors, the encoder maps the high-dimensional vectors onto low-dimensional (say, 2-dimensional) ones. We found that XRD patterns cluster into groups of different compositions in the reduced feature space. We thus concluded that our scheme can capture slight differences in lattice constant caused by atomic substitutions in magnetic alloys without any prior knowledge.
Tuesday, March 3, 2020 12:39PM - 12:51PM	G20.00008: Using machine learning to understand mutations Martha Villagran, Nikolaos Mitsakos, John Miller, Ricardo Azevedo A single harmful base substitution in a DNA sequence can, on occasion, cause a devastating fatal disease. Determining why this happens for some mutations, but not for others, is critical to developing effective treatments and poses a central challenge to modern medicine. Mitochondrial DNA (mtDNA) is especially vulnerable, mutating about 100 times faster than the nuclear genome. Uncovering how DNA’s electronic fingerprint influences its mutation spectrum is of critical importance to genetics and evolutionary biology. At the same time, machine learning, with the added capabilities of deep architectures that saw tremendous advances recently, provides a framework that allows recognizing patterns in data, even when these patterns are governed by very complex interlinked properties. We are investigating the capability of machine learning, with a focus on deep learning architectures, for detecting and predicting potential mutation locations in mtDNA. We demonstrate that these models can learn to discriminate between locations on the DNA where mutations can occur versus stable locations, to the extent that these situations are effectively represented in the available data.
Tuesday, March 3, 2020 12:51PM - 1:03PM	G20.00009: Machine Learning of Energetic Material Properties and Performance Brian Barnes, Betsy M Rice, Andrew E Sifain We present advances in accurate, rapid prediction of detonation pressure, detonation velocity, heat of formation, density, and melting point of energetic molecules. Molecules evaluated are CHNO-containing organic molecules drawn from public datasets and known explosives. These models may be integrated into a larger effort for high-throughput virtual screening or rapid pre-screening of molecules before any hazardous synthesis is attempted. Our research evaluates a message-passing neural network (MPNN) model with representation learned from 2D structure trained on a large body of data generated by physics-driven (quantum mechanically derived) models, and also a thermodynamic fingerprint representation used to train a gradient-boosted decision tree method on a smaller body of experimental data. The utility of each representation and statistical model is discussed. The Python workflow for each analysis is discussed. This data-driven approach is shown to provide advances in speed and accuracy for energetic material property prediction. A brief introduction to energetic materials and detonation physics is provided for non-experts.
Tuesday, March 3, 2020 1:03PM - 1:15PM	G20.00010: Simulation of atmospheric turbulence with generative machine learning models Arturo Rodriguez, Carlos R Cuellar, Luis Fernando Rodriguez, Armando Garcia, Jose Terrazas, VM Krushnarao Kotteda, Rao Gudimetla, Vinod Kumar, Jorge Munoz Large Eddy Simulation (LES) modeling of turbulence effects is computationally expensive even when not all scales are resolved, especially in the presence of deep turbulence effects in the atmosphere. Machine learning techniques provide a novel way to propagate the effects from the inner to the outer scale in the atmospheric turbulence spectrum and to accelerate the characterization of its effect on long-distance laser propagation. We simulated the turbulent flow of atmospheric air in an idealized box with a temperature difference of 10 degrees Celsius between the lower and upper surfaces using the LES method. The volume was voxelized and several quantities, such as the velocity and the pressure, were obtained at regularly spaced grid points. These values were binned and converted into symbols that were concatenated along the length of the box to create a ‘text’ that was used to train a long short-term memory (LSTM) neural network and a naïve Bayes model. LSTMs are used in speech recognition and handwriting recognition tasks, and naïve Bayes is used extensively in text categorization. The trained LSTM and naïve Bayes models were used to generate instances of turbulent-like flows.
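The step of binning field values and turning them into a symbolic 'text' can be sketched as follows. The abstract does not give the number of bins, the binning rule, or the symbol alphabet, so equal-width bins, lowercase letters, and the name `field_to_text` are assumptions made for illustration.

```python
import string
import numpy as np

def field_to_text(values, n_bins=8):
    """Map a 1-D sequence of field values to a string, one symbol per value.

    Hypothetical helper: equal-width bins between the minimum and maximum,
    labeled 'a', 'b', 'c', ... from the lowest bin upward.
    """
    if not 1 <= n_bins <= 26:
        raise ValueError("n_bins must be between 1 and 26 for a lowercase alphabet")
    values = np.asarray(values, dtype=float)
    # Interior bin edges only, so that np.digitize returns indices 0 .. n_bins - 1.
    edges = np.linspace(values.min(), values.max(), n_bins + 1)[1:-1]
    indices = np.digitize(values, edges)
    alphabet = string.ascii_lowercase[:n_bins]
    return "".join(alphabet[i] for i in indices)

# Velocity samples along the length of the box (made-up numbers).
text = field_to_text([0.0, 1.0, 2.0, 3.0], n_bins=4)
```

A string like this can then be fed to any sequence model that works on characters, which is what makes text-oriented models such as LSTMs and naïve Bayes applicable to the flow data.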
Tuesday, March 3, 2020 1:15PM - 1:27PM	G20.00011: Identification of informative acoustic features in the transition from non-violent to violent crowd behavior Katrina Pedersen, Brooks A Butler, Sean Warnick, Kent L Gee, Mark Transtrum Human crowds can exhibit a variety of collective behaviors. Here, we explore the transition from peaceful to violent behavior in human crowds using acoustic data. Predicting when a crowd will transition from a peaceful to a violent state has many potential applications, such as peace-keeping and security. Relative to video, acoustic data are easier to obtain and are less affected by lighting conditions, such as at night or in dark areas. We apply machine learning methods to a data set that includes both video and audio recordings of violent and non-violent crowds. Previous results showed that audio data was only marginally less effective than video alone for classifying violent/non-violent scenes. In this work, we conduct a feature-importance study to identify which acoustic metrics are most informative for correctly classifying peaceful and violent crowds.
Tuesday, March 3, 2020 1:27PM - 1:39PM	G20.00012: "Robust Speaker Identification System Under Adverse Conditions" Swati Prasad Speaker identification is the task of determining, from a speech utterance alone, which member of a group of reference speakers spoke it. It is also known as voice-based biometrics. Its success has great potential to bring a paradigm shift in the way we communicate with machines. It will facilitate easier and more secure communication between man and machine using speech, and it will particularly benefit elderly members of society. Factors like voice disguise, the emotional state of the person, background noise, and throat diseases create a mismatch between the training and the test speech data, referred to as the mismatch problem. This mismatch decreases speaker identification accuracy and needs to be addressed. To make the speaker identification system robust against these mismatched conditions, we have developed speech frame selection methods for feature extraction. These methods capture the characteristics of the speech signal efficiently from the time-domain signal. To model the speaker, a Gaussian mixture model with 64 components is used. The system has shown good performance under voice disguise and environmental noise conditions. Future work will test its performance for emotional and diseased speech.
Tuesday, March 3, 2020 1:39PM - 1:51PM	G20.00013: Hyperbolic non-metric multidimensional scaling reveals intrinsic geometric structure in high-dimensional data Yuansheng Zhou, Tatyana Olegivna Sharpee Modern datasets characterize objects with respect to many variables and assign distances between objects based on a Euclidean metric. However, recent results suggest that for data produced by underlying hierarchical tree-like networks a hyperbolic metric might be more appropriate than a Euclidean one. We develop non-metric multidimensional scaling (MDS) in hyperbolic space to perform hyperbolic embedding of points. Using simulations, we show that hyperbolic MDS, combined with Euclidean MDS, can be used to detect the intrinsic geometry of data. Applying hyperbolic MDS to human gene expression data, we find that samples taken from local clusters have Euclidean structure, but samples taken broadly from the whole population show a hyperbolic metric, which indicates that the human gene expression space is locally Euclidean but globally hyperbolic. Further, we quantify the hyperbolic radii of cells from other diverse biological systems, including different mouse organs, finding that mouse brain and embryonic stem cells are also hyperbolic, while organs like mouse lung, kidney, and placenta are Euclidean. Our method provides a quantitative approach to detecting hidden geometric structures and quantifying cell hierarchies of diverse biological systems.
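To make the contrast between the two metrics concrete, the sketch below compares the Euclidean distance with the hyperbolic distance in the Poincaré ball model of curvature -1, where $d(u,v) = \operatorname{arccosh}\left(1 + \frac{2\lVert u-v\rVert^2}{(1-\lVert u\rVert^2)(1-\lVert v\rVert^2)}\right)$. The abstract does not say which model of hyperbolic space or which curvature the authors use, so the choice of the Poincaré ball and the function name are assumptions, and the MDS optimization itself is not shown.

```python
import numpy as np

def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the unit ball (curvature -1).

    Hypothetical helper based on the standard Poincare ball formula.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    uu = float(np.dot(u, u))
    vv = float(np.dot(v, v))
    if uu >= 1.0 or vv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = u - v
    arg = 1.0 + 2.0 * float(np.dot(diff, diff)) / ((1.0 - uu) * (1.0 - vv))
    return float(np.arccosh(arg))

origin = np.array([0.0, 0.0])
near = np.array([0.5, 0.0])
far = np.array([0.99, 0.0])

d_near = poincare_distance(origin, near)   # equals 2 * artanh(0.5) = ln 3
d_far = poincare_distance(origin, far)     # much larger than the Euclidean 0.99
```

Distances grow without bound as points approach the boundary of the ball, which is the property that lets hyperbolic space accommodate tree-like hierarchies with little distortion.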
Tuesday, March 3, 2020 1:51PM - 2:03PM	G20.00014: Data Augmentation and Pre-training for Template-Based Retrosynthetic Prediction Mike Fortunato, Connor Coley, Brian Barnes, Klavs Jensen A key step in computer-aided synthesis planning (CASP) is the prioritization of candidate molecular transformations for retrosynthetic analysis. Recent methods obtaining state-of-the-art accuracy have used machine learning (ML) models as recommendation engines to rank reaction templates extracted from databases of recorded reactions. However, data scarcity limits the ability of ML models to recommend rare, often highly desired, transformations. In this work we discuss the augmentation of open-access reaction databases with synthetically generated molecular transformations to teach neural networks generalized template applicability. We use this as a pre-training strategy, which is followed by fine-tuning of the model parameters using true, recorded reactions, in order to increase the diversity of suggested retrosynthetic transformations. While previous methods have focused on learning a one-to-one mapping from featurized molecular inputs to a single template transformation, pre-training with general template applicability allows these new models to learn a one-to-many mapping to multiple templates. The implications of performing data augmentation and pre-training on differently sized datasets are discussed, as well as the changes in performance for rare reaction templates.
Tuesday, March 3, 2020 2:03PM - 2:15PM	G20.00015: Neural network-assisted analysis of X-ray absorption spectra of metal oxide clusters Yang Liu, Nicholas Marcella, Anatoly I Frenkel It is challenging to understand the reactivity of supported metal oxide clusters from a structural perspective. Many operando characterization techniques that could address this challenge are limited by the low metal loading and high-temperature conditions. Because X-ray absorption near edge structure (XANES) is sensitive to the local structure, we demonstrated that XANES, when combined with a supervised machine learning method, can be analyzed to provide structural information. In this work, we apply the neural network method to the analysis of grazing incidence XANES spectra of size-selected Cu oxide clusters on a flat support, measured under operando conditions. A convolutional neural network was trained to build the correlation between the XANES spectra and structural descriptors (Cu-Cu coordination numbers). Our results indicate that we can distinguish between different structural motifs of Cu oxide clusters under reaction conditions and invert the experimental XANES to obtain structural parameters, which helps in understanding the structure-property relations of the catalysts.


