Bulletin of the American Physical Society
APS March Meeting 2021
Volume 66, Number 1
Monday–Friday, March 15–19, 2021; Virtual; Time Zone: Central Daylight Time, USA
Session C04: Machine Learning for Biomolecular Design and Simulation (Focus Session, Live)
Sponsoring Units: DPOLY, DBIO, DCOMP, GSNP
Chair: Stefano Martiniani, University of Minnesota |
Monday, March 15, 2021 3:00PM - 3:12PM Live |
C04.00001: Rational optimization of drug-membrane selectivity by computational screening Bernadette Mohr, Kirill Shmilovich, Tristan Bereau, Andrew Ferguson Mitochondria are organelles of eukaryotic cells involved in a number of physiological pathways. Cardiolipin (CL) is a phospholipid unique to the inner mitochondrial membrane. It plays a central role in mitochondrial functions and dynamics, and CL abnormalities are implicated in diseases. Our goal is to find compounds with high selectivity that can act as CL probes. |
Monday, March 15, 2021 3:12PM - 3:48PM Live |
C04.00002: Learning molecular models from simulation and experimental data Invited Speaker: Cecilia Clementi The last years have seen an immense increase in high-throughput and high-resolution technologies for experimental observation as well as high-performance techniques to simulate molecular systems at a microscopic level, resulting in vast and ever-increasing amounts of high dimensional data. However, experiments provide only a partial view of macromolecular processes and are limited in their temporal and spatial resolution. On the other hand, atomistic simulations are still not able to sample the conformation space of large complexes, thus leaving significant gaps in our ability to study molecular processes at a biologically relevant scale. We present our efforts to bridge these gaps, by combining statistical physics with state-of-the-art machine-learning methods to design optimal coarse models for complex macromolecular systems. We derive simplified molecular models to reproduce the essential information contained both in microscopic simulation and experimental measurements. |
Monday, March 15, 2021 3:48PM - 4:00PM Live |
C04.00003: Toward Transferable Deep Learning Atomistic Potential for Biomolecular Simulations Olexandr Isayev In the sciences, computational chemists and physicists have been using ML for the prediction of physical phenomena, such as atomistic potential energy surfaces and reaction pathways. Transferable ML potentials, such as ANI-1x, have been developed with the goal of accurately simulating organic molecules containing the chemical elements H, C, N, and O. Here we provide an extension. The new model, dubbed ANI-2x, is trained to include sulfur and halogens. These new features open a wide range of new applications within organic chemistry and drug development. To show that these additions do not sacrifice accuracy, we have tested this model across a range of organic molecules and applications, including dihedral rotations, conformer scoring, and non-bonded interactions. ANI-2x is shown to accurately predict molecular energies compared to DFT with a ~10^6 factor speedup. The resulting model is a valuable tool for drug development that can potentially replace both quantum calculations and classical force fields for myriad applications. |
Monday, March 15, 2021 4:00PM - 4:12PM Live |
C04.00004: Accurate Molecular Polarizabilities with Coupled Cluster Theory and Machine Learning Yang Yang, Ka Un Lao, David M. Wilkins, Andrea Grisafi, Michele Ceriotti, Robert Distasio Despite the importance of the molecular dipole polarizability in governing key intra- and inter-molecular interactions (such as induction and dispersion), determining the spectroscopic signatures of molecules, and being an essential ingredient in polarizable force fields, an accurate and computationally efficient prediction of this fundamental quantum mechanical response property still remains a challenge to date. In this work, we present a benchmark database [1] of highly accurate static dipole polarizability tensors of 7,211 small organic molecules computed using linear response coupled cluster singles and doubles theory (LR-CCSD). Using a symmetry-adapted machine-learning approach [2], we also demonstrate that it is possible to predict these LR-CCSD polarizabilities with an error that is an order of magnitude smaller than that of hybrid density functional theory (DFT). The resulting AlphaML model is robust and transferable, and able to yield molecular polarizabilities for a diverse set of 52 larger molecules (including challenging conjugated systems, carbohydrates, small drugs, amino acids, nucleobases, and hydrocarbon isomers) with a similar level of accuracy and at a negligible computational cost. |
Monday, March 15, 2021 4:12PM - 4:24PM Live |
C04.00005: Machine Learning on a Quantum Hamiltonian shows that DNA is Much Stretchier than Classical Simulations Suggest Joshua Berryman The free energy to pull apart stacked DNA bases is found to be much lower than classical simulations to date have suggested. Thermodynamic calculations are made in explicit water using a machine learning molecular dynamics method, trained on a novel dataset of quantum calculations making an advanced treatment of dispersion interactions. While the novel results contrast with previous classical simulations, they are consistent with values extrapolated down to the nanoscale from single molecule pulling experiments on large DNA double helices. The presented machine learned Hamiltonian for DNA is generally applicable to nucleic acids and is efficient to apply, therefore suggesting wide application in the future. |
Monday, March 15, 2021 4:24PM - 4:36PM Live |
C04.00006: Machine learning for DNA self-assembly: a numerical case study Jörn Appeldorn, Arash Nikoubashman, Thomas Speck We study the spontaneous self-assembly of two single-stranded DNA (ssDNA) fragments using the coarse-grained oxDNA2 implementation [1]. Successful assembly is a rare event that requires crossing free energy barriers of several k_B T. To accurately determine different states and transition rates, we use trajectories from molecular dynamics simulations to construct a Markov state model. To this end, one needs one or more order parameters (OPs) that faithfully describe the transition towards an assembled state. We formulate these OPs based on structural information, which we map onto structural descriptors. Specifically, we investigate the latent space of EncoderMap [2] and how it changes with the amount of information contained in the descriptor. |
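Illustrative sketch (not part of the abstract): the step of constructing a Markov state model from trajectories can be pictured as counting transitions between discrete states at a fixed lag time and row-normalizing the counts. The function names, the toy trajectory, and the handling of unvisited states below are assumptions made for illustration. The authors' actual pipeline (oxDNA2 trajectories, EncoderMap order parameters, and their choice of estimator) is not specified here, and production MSM tools typically use more careful estimators, for example ones that enforce detailed balance.

```python
def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalize transition counts observed at the given lag time."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    counts = [[0] * n_states for _ in range(n_states)]
    # Pair each frame with the frame `lag` steps later and count the jump.
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1
    matrix = []
    for state, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a state with no outgoing counts gets a self-loop row.
            matrix.append([1.0 if k == state else 0.0 for k in range(n_states)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=200):
    """Approximate the stationary distribution by repeated application of pi <- pi T."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


if __name__ == "__main__":
    # Hypothetical discretized trajectory: 0 = unbound, 1 = assembled.
    toy_dtraj = [0, 0, 1, 0, 0, 1]
    T = estimate_transition_matrix(toy_dtraj, n_states=2, lag=1)
    print("transition matrix:", T)
    print("stationary distribution:", stationary_distribution(T))
```

In this picture, the quality of the model depends on how the frames are assigned to discrete states, which is where the choice of order parameters enters.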
Monday, March 15, 2021 4:36PM - 4:48PM Live |
C04.00007: Predicting Protein Developability via Convolutional Sequence Representation Alexander Golinski, Bryce Johnson, Sidharth Laxminarayan, Diya Saha, Sandhya Appiah, Benjamin Hackel, Stefano Martiniani Engineered proteins have emerged as novel diagnostics, therapeutics, and catalysts. Often, poor protein developability - quantified by expression, solubility, and stability - hinders commercialization. The ability to predict protein developability from amino acid sequence would reduce the experimental burden when selecting candidates. Recent advances in screening technologies enabled a high-throughput (HT) developability dataset for 10^5 of 10^20 possible variants of protein scaffold Gp2. In this work, we evaluate the ability of neural networks to learn a developability representation from the HT dataset and transfer the knowledge to predict recombinant expression beyond the observed sequences. Mimicking protein theory, our model convolves learned amino acid properties to predict expression levels 42% closer to the experimental variance compared to a non-embedded control. Analysis of learned amino acid embeddings highlights the uniqueness of cysteine and the importance of hydrophobicity and charge, and unimportance of aromaticity, when aiming to improve developability. We identify clusters of similar sequences with increased developability through nonlinear dimensionality reduction (UMAP) and explore the inferred developability landscape via nested sampling. |
Monday, March 15, 2021 4:48PM - 5:00PM Live |
C04.00008: Supremum modeling to extend model transferability in systems biology Cody Petrie, Christian Anderson, Mark Transtrum A goal of physical modeling is to relate system-level phenomena to physical mechanisms. The complexity of biological systems makes it difficult to build models that balance parsimony and physical realism. The Manifold Boundary Approximation Method can derive simple models with limited scope from biological first-principles. However, these models may not reliably transfer since they abstract away the mechanisms irrelevant for their target context. I describe an approach to improve the transferability of these reduced models. I consider the space of all possible reduced models. Given two minimal models, I construct the simplest model that can be reduced to them. This model is the least upper bound in complexity, and so we refer to it as the "supremum." By unifying the mechanistic explanations for different phenomena, the supremum model is predictive under diverse conditions. I illustrate for the Wnt signaling pathway. I build minimal models that describe Wnt signaling for two different developmental stages and construct their supremum. The supremum model predicts a new phenomenon: controlled pulsing, analogous to expression patterns of anterior-posterior axial development. The supremum principle is broadly applicable to create parsimonious, transferable models. |
Monday, March 15, 2021 5:00PM - 5:36PM Live |
C04.00009: Prospective experimental validation of machine learning for biological sequence design Invited Speaker: Lucy Colwell Prediction of protein functional properties from sequence is a central challenge that would allow us to discover new proteins with specific functionality. Experimental breakthroughs allow data on the relationship between sequence and function to be rapidly acquired that can be used to train and validate machine learning models that predict protein function directly from sequence. However, the cost and latency of wet-lab experiments require methods that find good sequences in few experimental rounds, where each round contains large batches of sequence designs. In this setting, I will discuss model-based optimization approaches that allow us to take advantage of sample inefficient methods and find diverse optimal sequence candidates for experimental evaluation. The potential of this approach is illustrated through the design and experimental validation of viable AAV capsid protein variants for gene therapy applications in addition to the design and validation of peptides as potential therapeutics. |
Monday, March 15, 2021 5:36PM - 5:48PM Live |
C04.00010: Recurrent networks for protein structure prediction using Frenet-Serret equations and latent residue representations Nazim Bouatta A novel version of the Recurrent Geometrical Network (RGN1) algorithm, which geometrically reasons over protein conformations, is used to predict protein structures. We use a transfer matrix formalism, which enables reasoning over protein backbones using a discrete version of the Frenet-Serret equations (dFSE) that leverages the fact that protein backbones are intrinsically discrete one-dimensional curves. dFSE-based RGNs are used with a context-based encoding of amino acid residues – AminoBERT – derived strictly from raw amino acid sequences without making explicit use of any evolutionary information. For building AminoBERT a reformulated version of the BERT language model is used to train a transformer over protein sequences to predict missing amino acids conditioned on the flanking sequence. Amino acid residues are thus mapped onto a higher-dimensional representation. |
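Illustrative sketch (not part of the abstract): a discrete Frenet-Serret construction can be pictured as propagating an orthonormal frame (n, b, t) along the chain with a transfer matrix built from a bond angle kappa and a torsion angle tau, and stepping along the tangent t by a fixed bond length. The sign convention of the transfer matrix, the function names, the starting frame, and the default 3.8 Å spacing (roughly the distance between consecutive alpha carbons) are assumptions chosen for illustration. How the RGN model parameterizes or predicts these angles is not described here.

```python
import math


def dfs_step(frame, kappa, tau):
    """Rotate the (n, b, t) frame by one discrete Frenet-Serret transfer matrix."""
    n, b, t = frame
    ck, sk = math.cos(kappa), math.sin(kappa)
    ct, st = math.cos(tau), math.sin(tau)
    # One common convention for the orthogonal transfer matrix (an assumption here).
    n_new = tuple(ck * ct * n[i] + ck * st * b[i] - sk * t[i] for i in range(3))
    b_new = tuple(-st * n[i] + ct * b[i] for i in range(3))
    t_new = tuple(sk * ct * n[i] + sk * st * b[i] + ck * t[i] for i in range(3))
    return n_new, b_new, t_new


def build_curve(kappas, taus, bond_length=3.8):
    """Build a discrete curve from bond angles and torsion angles (radians).

    N angle pairs produce N + 2 points: the origin, one step along the
    initial tangent, and one further step per angle pair.
    """
    if len(kappas) != len(taus):
        raise ValueError("kappas and taus must have the same length")
    # Arbitrary starting frame: n along x, b along y, t along z.
    frame = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    points = [(0.0, 0.0, 0.0)]
    t = frame[2]
    points.append(tuple(points[-1][i] + bond_length * t[i] for i in range(3)))
    for kappa, tau in zip(kappas, taus):
        frame = dfs_step(frame, kappa, tau)
        t = frame[2]
        points.append(tuple(points[-1][i] + bond_length * t[i] for i in range(3)))
    return points


if __name__ == "__main__":
    # Hypothetical angles; constant values trace out a helix-like curve.
    for point in build_curve([1.5] * 5, [0.9] * 5):
        print(tuple(round(c, 3) for c in point))
```

Because each transfer matrix is orthogonal, consecutive points stay exactly one bond length apart, which is the property that makes this representation convenient for chain-like molecules.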
Monday, March 15, 2021 5:48PM - 6:00PM Live |
C04.00011: Multi-fidelity integrated computational-experimental design of self-assembling π-conjugated optoelectronic peptides Kirill Shmilovich, Sayak Panda, John D. Tovar, Andrew Ferguson In this work we employ multi-fidelity Bayesian optimization to fuse experimental and computational datastreams for the design of self-assembling π-conjugated peptides with emergent optoelectronic properties. We consider a family of peptides composed of a central π-core flanked by oligopeptide wings that have been demonstrated to self-assemble into supramolecular pseudo-1D nanoaggregates with emergent optoelectronic properties. Exhaustive traversal of the molecular design space of π-cores and peptide wings by either simulation or experiment is prohibitively expensive. This motivated the construction of a multi-fidelity Bayesian optimization platform to fuse cheap, high-volume, and approximate simulation data with expensive, low-volume, and accurate experimental data to rationally traverse the design space and efficiently identify molecules with engineered optoelectronic properties. New molecules identified by this active learning platform for experimental synthesis and testing yield superior optoelectronic properties compared to the best performing previous candidates. |