Bulletin of the American Physical Society

APS March Meeting 2023

Volume 68, Number 3

Las Vegas, Nevada (March 5-10)
Virtual (March 20-22); Time Zone: Pacific Time

Session T08: Physics of Proteins III: Evolution and Function of Molecular Interactions

Sponsoring Units: DBIO
Chair: Xiaoqin Zou, University of Missouri
Room: Room 131

Thursday, March 9, 2023 11:30AM - 12:06PM	T08.00001: Exploring protein biophysics with deep learning Invited Speaker: Claus Wilke Deep learning approaches are becoming increasingly useful for studying protein biophysics. For example, AlphaFold is famously one of the best currently available tools for predicting protein structure from sequence. Beyond structure prediction, deep learning approaches can help predict mutational effects, protein function, or ligand binding. In all these applications, the primary challenge from a physics perspective is to understand the biophysical meaning of the predictions produced by machine learning, how the machine learning algorithms arrive at those predictions, and how to curate and select the right training data to obtain good results. Here, I will discuss several projects in this field that we are currently pursuing in my lab. First, I will describe how machine learning methods can be used to identify sites that are primed for mutation. Second, I will discuss the differences between models trained purely on sequence data and those trained on structure data. Finally, I will demonstrate how protein embedding models can be used to search sequence databases for proteins with specific biophysical characteristics.
Thursday, March 9, 2023 12:06PM - 12:18PM	T08.00002: GENERALIST: Generative Probabilistic Non-Linear Tensor Factorization Model for Proteins Hoda Akl, Brooke Emison, Xiaochuan Zhao, Purushottam Dixit Exploring the space of functional protein sequences beyond the naturally occurring ones requires generative models that leverage known natural sequences to learn the correlations between amino acid positions. For long protein sequences with datasets of limited sample size, inferring the protein sequence space can be challenging or infeasible. To address this gap, we present GENERALIST: a generative probabilistic model for protein sequences based on tensor factorization. GENERALIST infers a lower-dimensional latent representation of the natural sequences, which can then be used to generate novel sequences. The generated ensemble conserves several higher-order statistics of the natural alignment. GENERALIST also reproduces the statistics of the sequence ensemble, including the distribution of nearest-neighbor distances. Computational assessment of the sequence ensemble using AlphaFold2 suggests that the ensemble comprises structurally stable sequences. The model complexity in GENERALIST is tunable through the dimension of the latent space, which allows us to control the tradeoff between accuracy and generality. In this way, GENERALIST addresses the limitations of state-of-the-art generative models: its accuracy is robust to the size of the natural protein sequence alignment and to the length of the sequence. Notably, our framework is applicable to all types of categorical data, including nucleotide sequences and binary data such as the presence/absence of genes in genomes, neuronal spikes, etc.
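A minimal sketch of a latent-factor generative model for categorical sequences, in plain Python. It assumes (the abstract does not state this) a per-position softmax form, p(σ_i = a | z_n) ∝ exp(Σ_k z_nk θ_kia), with one latent vector z_n per natural sequence, fitted by plain gradient ascent on the log-likelihood. The function names, learning rate, step count, and the choice to generate new sequences by reusing fitted latent vectors are all hypothetical. The last function computes the nearest-neighbor distances used as one check on a generated ensemble.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in logits]
    s = sum(e)
    return [v / s for v in e]


def site_probs(z, theta, i, q):
    # Assumed form: p(sigma_i = a | z) is proportional to exp(sum_k z_k * theta[k][i][a])
    return softmax([sum(z[k] * theta[k][i][a] for k in range(len(z))) for a in range(q)])


def log_likelihood(seqs, Z, theta, q):
    return sum(math.log(site_probs(z, theta, i, q)[a])
               for z, s in zip(Z, seqs) for i, a in enumerate(s))


def fit(seqs, q, k, steps=300, lr=0.1, seed=0):
    """Gradient ascent over latent vectors Z (n x k) and weights theta (k x L x q)."""
    rng = random.Random(seed)
    n, L = len(seqs), len(seqs[0])
    Z = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    theta = [[[rng.gauss(0, 0.1) for _ in range(q)] for _ in range(L)] for _ in range(k)]
    for _ in range(steps):
        gZ = [[0.0] * k for _ in range(n)]
        gT = [[[0.0] * q for _ in range(L)] for _ in range(k)]
        for s_idx, (z, s) in enumerate(zip(Z, seqs)):
            for i, a_obs in enumerate(s):
                p = site_probs(z, theta, i, q)
                for a in range(q):
                    g = (a == a_obs) - p[a]  # d log p(a_obs) / d logit_a
                    for kk in range(k):
                        gZ[s_idx][kk] += g * theta[kk][i][a]
                        gT[kk][i][a] += g * z[kk]
        # Apply the accumulated gradients (scaled by L and n to keep step sizes comparable)
        for s_idx in range(n):
            for kk in range(k):
                Z[s_idx][kk] += lr * gZ[s_idx][kk] / L
        for kk in range(k):
            for i in range(L):
                for a in range(q):
                    theta[kk][i][a] += lr * gT[kk][i][a] / n
    return Z, theta


def sample(Z, theta, q, count, rng):
    """Draw new sequences, each from a randomly chosen fitted latent vector (one simple choice)."""
    L = len(theta[0])
    out = []
    for _ in range(count):
        z = rng.choice(Z)
        out.append([rng.choices(range(q), weights=site_probs(z, theta, i, q))[0] for i in range(L)])
    return out


def nearest_neighbor_distances(generated, natural):
    """Normalized Hamming distance from each generated sequence to its closest natural one."""
    return [min(sum(x != y for x, y in zip(g, s)) / len(g) for s in natural) for g in generated]
```

On a toy alignment made of two groups of identical sequences, the fitted log-likelihood should end well above the uniform baseline of -n·L·ln q. The latent dimension k is the complexity knob: larger k fits the alignment more closely at the cost of generality.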
Thursday, March 9, 2023 12:18PM - 12:30PM	T08.00003: Analyses of the cores of AlphaFold2 protein structure predictions Jillian Belluck, Alex T Grigas, Corey S O'Hern Developing computational methods to accurately predict the three-dimensional structure of a protein from its primary sequence of amino acids is an important and unsolved problem. AlphaFold2, a deep learning method developed by DeepMind to generate computational models of proteins, has been successful in recent Critical Assessment of protein Structure Prediction (CASP) competitions. In the present work, we assess AlphaFold2 computational models using the number of residues in the core, a feature that is strongly correlated with protein stability. We find that while AlphaFold2's predictions for the E. coli proteome resemble X-ray crystal structures, the eukaryotic protein predictions contain too few core residues. Our analysis considers the influence of intrinsically disordered sequences on the fraction of core residues, using both AlphaFold2's per-residue confidence levels and the average charge and hydrophobicity of each protein. The variability in the core size of AlphaFold2's predictions across organisms demonstrates that while machine learning methods have increased the accuracy of computational models for protein structure, significant improvements are still needed to achieve results comparable to experiment.
Thursday, March 9, 2023 12:30PM - 12:42PM	T08.00004: Evaluating Machine Learning Techniques for Decoy Detection of Protein-Protein Interactions Naomi Brandt, Alex T Grigas, Lynne Regan, Corey S O'Hern Generating accurate computational models of protein-protein interfaces (PPIs) and determining the quality of these models remain significant challenges. Over the past two decades, several methods have been developed to generate and score PPIs. There are two main approaches to PPI scoring: physics-based force fields that include protein stereochemistry and van der Waals and electrostatic interactions, and knowledge-based scoring functions derived from experimentally determined PPI structures in the Protein Data Bank. With advances in machine learning, neural networks can also be used for PPI model generation and scoring. In this work, we constructed a dataset of high-resolution X-ray crystal structures of protein heterodimers and generated PPI models for each experimental target using current computational protein docking methods: ZDOCK, HDOCK, and Rosetta. Each method applies its own scoring function to rank its models. To assess their accuracy, we scored all models with each of the three scoring functions, as well as with published neural networks trained to classify PPI models.
Thursday, March 9, 2023 12:42PM - 12:54PM	T08.00005: Local deformations of proteins through molecular simulations reveal allosteric couplings with implications for drug design Fabian Byléhn, Juan J De Pablo, Gustavo R Perez Lemus, Cintia A Menendez, Walter Alvarado Allosteric regulation is an important property of proteins with many applications in drug design, yet it is notoriously difficult to characterize for a general protein. Many proteins show very subtle conformational changes upon allosteric perturbation, with local changes that are not captured by global metrics such as the root-mean-square deviation (RMSD). This poses a challenge for drug design, where subtle allosteric changes induced by drug binding are missed and the computed efficacy of drugs is mischaracterized. We show that a more natural language for describing conformational changes is a local metric based on an elastic strain formalism, which captures local deformations induced by allosteric perturbations such as drug or peptide binding in molecular dynamics simulations. The shear strain tensor is calculated upon binding and reveals previously unknown allosteric sites and mechanisms. In particular, this formalism allows us to explain the mechanisms of repurposed drugs against key proteins of the SARS-CoV-2 proteome and to uncover previously unknown binding sites that can be exploited in drug design. This methodology paves the way for the design of new allosteric drugs against diseases that are hard to target with drugs acting at the functional site.
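The abstract does not spell out the strain formalism. The sketch below assumes one common version: for each atom, fit a local deformation gradient F by least squares from the displacements of its neighbors within a cutoff, form the Green-Lagrange strain E = (F^T F - I)/2, and report the magnitude of its deviatoric (shear) part. The cutoff of 8 coordinate units, the particular shear norm, and all function names are illustrative assumptions, not the authors' code.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inv3(M):
    """Inverse of a 3x3 matrix via the adjugate (transpose of the cofactor matrix)."""
    (a, b, c), (d, e, f), (g, h, k) = M
    det = a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("neighbor vectors are (nearly) coplanar")
    cof = [[e * k - f * h, -(d * k - f * g), d * h - e * g],
           [-(b * k - c * h), a * k - c * g, -(a * h - b * g)],
           [b * f - c * e, -(a * f - c * d), a * e - b * d]]
    return [[cof[col][row] / det for col in range(3)] for row in range(3)]


def outer_sum(us, vs):
    """Sum over pairs of the outer products u v^T."""
    return [[sum(u[a] * v[b] for u, v in zip(us, vs)) for b in range(3)] for a in range(3)]


def local_shear_strain(ref, cur, center, cutoff=8.0):
    """Shear strain at atom `center` going from reference (e.g. unbound) to current (e.g. bound) coordinates."""
    Xi, xi = ref[center], cur[center]
    dX, dx = [], []
    for j, Xj in enumerate(ref):
        if j != center and math.dist(Xj, Xi) < cutoff:
            dX.append([Xj[a] - Xi[a] for a in range(3)])
            dx.append([cur[j][a] - xi[a] for a in range(3)])
    # Least-squares deformation gradient: F = (sum dx dX^T)(sum dX dX^T)^-1
    F = matmul(outer_sum(dx, dX), inv3(outer_sum(dX, dX)))
    C = matmul(transpose(F), F)  # right Cauchy-Green tensor
    E = [[0.5 * (C[a][b] - (a == b)) for b in range(3)] for a in range(3)]  # Green-Lagrange strain
    mean_normal = sum(E[a][a] for a in range(3)) / 3.0
    dev = [[E[a][b] - (mean_normal if a == b else 0.0) for b in range(3)] for a in range(3)]
    return math.sqrt(0.5 * sum(dev[a][b] ** 2 for a in range(3) for b in range(3)))
```

Rigid rotations and uniform scaling give zero shear under this definition, so the measure responds only to local changes in shape around each atom.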
Thursday, March 9, 2023 12:54PM - 1:06PM	T08.00006: Characterising the intrinsically disordered region of ORF6 from SARS-CoV-2 Alice J Pettitt, Lydia Newton, Stephen McCarthy, Alethea B Tabor, Gabriella T Heller, Christian D Lorenz, D. Flemming Hansen Many viral proteins have flexible, disordered regions that lack a well-defined tertiary structure. These disordered regions are often functionally important for immune evasion and rapid replication. One such viral protein is ORF6, a 61-residue protein from SARS-CoV-2. ORF6 is a potent interferon antagonist that has been shown to bind, via its C-terminal region, to the heterodimer formed by ribonucleic acid export 1 (Rae1) and the GLEBS motif of nucleoporin 98 (Nup98). The binding of ORF6 to the Rae1-Nup98 heterodimer prevents the nuclear export of cellular mRNAs, which suppresses the antiviral immune response. ORF6 is predicted to be very flexible, and only very distant homologues of ORF6 are available, which makes homology modelling and AlphaFold2 structure predictions essentially impossible. To characterise the C-terminal region of ORF6 in the unbound state, we combined nuclear magnetic resonance (NMR) spectroscopy with advanced all-atom molecular dynamics (MD) simulations. Specifically, we employed enhanced MD sampling with NMR chemical shift restraints to improve force field accuracy (metadynamic metainference). Here, I present molecular-scale detail on the conformational sampling of the ORF6 C-terminal region and compare our MD simulations with experimental results. In agreement with the chemical shifts, the C-terminal region ensemble is mainly disordered. I will also present insights into the mechanisms of ORF6 gained by using this multidisciplinary approach to characterise the C-terminal region in the unbound state.
Thursday, March 9, 2023 1:06PM - 1:42PM	T08.00007: Evolution of the Structure and Function of the Cyanobacterial Orange Carotenoid Protein and its Quenching of the Cyanobacterial Light Harvesting Antenna Invited Speaker: Cheryl Kerfeld In contrast to those of plants, the photoprotective mechanisms of cyanobacteria have only recently begun to be characterized. One of the most prevalent, involving the Orange Carotenoid Protein (OCP), a photoreceptor, dissipates excess energy captured by the light-harvesting antenna (phycobilisome, or PBS). The OCP is a soluble, 34 kDa protein that binds a single carotenoid molecule. It is the only known photoactive protein that uses a carotenoid as its sole chromophore. The crystal structure of OCP^O shows that the protein is composed of two structural domains: a carotenoid-binding N-terminal domain (NTD), unique to cyanobacteria, and a C-terminal domain (CTD) with superficial structural similarity to BLUF and LOV domains. The carotenoid spans the two domains. The absorption of blue-green light causes the OCP to convert from a dark-stable orange form, OCP^O, to a light-activated red form, OCP^R. Structurally, photoactivation is characterized by a 12 Å shift in the position of the carotenoid and, as recently revealed by our cryo-EM structure of the quenching complex between the OCP and the PBS, a 60 Å/220° rotation of the CTD. The structure of the OCP^R-PBS complex also provides a high-resolution structural description of how four 34 kDa OCPs, each with a single carotenoid, are able to quench the 6.3 MDa PBS with its 396 bilin pigments. In conjunction with analysis of genomic sequence data from ecophysiologically diverse cyanobacteria, we find a variety of carotenoproteins that are single-domain homologs of the OCP. Collectively, our observations suggest a model for the evolution of OCP-mediated photoprotection and provide a framework for co-opting elements of the OCP, structurally and functionally, for the development of optogenetic and artificial photosynthesis systems.
Thursday, March 9, 2023 1:42PM - 1:54PM	T08.00008: The Molecular Origin of Various DNA-repair Quantum Yields in Photolyases Chao Yang Photolyase (PL) is a blue-light-activated flavoenzyme that uses FADH^- as the catalytic cofactor to repair UV-induced DNA lesions, including cyclobutane pyrimidine dimers (CPDs) and pyrimidine-pyrimidone (6-4) photoproducts (6-4 PPs). Different classes of CPD photolyases have diverse genetic sequences but similar folded structures. Class I CPD photolyases from bacteria have much higher repair quantum yields than class II CPD photolyases from plants. The difference mainly arises from a bifurcation in the initial electron transfer: class I CPD photolyases mainly use a tunnelling pathway (the electron tunnels directly from the isoalloxazine ring to the CPD substrate), whereas class II CPD photolyases mainly use a two-step hopping pathway (the electron first jumps to the adenine and then to the CPD substrate). In this study, we swapped two key residues in the active sites of class I (N341, R342 in EcPL from Escherichia coli) and class II photolyases (G381, F382 in AtPL from Arabidopsis thaliana). Steady-state repair quantum yield measurements show dramatically lower repair quantum yields in EcPL mutants than in EcPL, while a different trend is observed in AtPL and its mutants. To reveal how these residues affect the repair reaction, we used ultrafast laser spectroscopy to determine the rates of seven electron-transfer reactions in 10 elementary steps. We found that the repair quantum yield can be tuned by favoring either the electron-tunnelling or the hopping channel, thereby adjusting the quantum yield of electron injection into the CPD substrate. Photolyases evolved to have high affinity and specificity toward DNA lesions with reasonable repair quantum yields. Solely increasing the repair quantum yield by a single mutation may disrupt the delicate balance between substrate binding and CPD repair.
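How the tunnelling/hopping bifurcation feeds into the repair quantum yield can be illustrated with first-order branching ratios. The kinetic scheme below is a deliberately simplified, hypothetical one (three decay channels of the excited flavin, one relay step through the adenine, one splitting step); it is not the 10-step scheme measured in the study, and the rate values in the usage lines are made up for illustration.

```python
def branching(k_forward, *k_competing):
    """Fraction of a population that leaves through k_forward when first-order channels compete."""
    return k_forward / (k_forward + sum(k_competing))


def repair_yield(k_tunnel, k_hop, k_decay, k_relay, k_relay_back, k_split, k_back_et):
    """
    Hypothetical simplified scheme (all rates in the same units, e.g. 1/ps):
      FADH-*  --k_tunnel-->      CPD-   (direct tunnelling)
      FADH-*  --k_hop-->         Ade-   (first hop, to the adenine)
      FADH-*  --k_decay-->       ground state
      Ade-    --k_relay-->       CPD-   (second hop)
      Ade-    --k_relay_back-->  FADH   (back electron transfer, lost)
      CPD-    --k_split-->       repaired bases
      CPD-    --k_back_et-->     futile back electron transfer
    """
    # Electron-injection yield: direct tunnelling plus the surviving fraction of the hopping route
    phi_inj = (branching(k_tunnel, k_hop, k_decay)
               + branching(k_hop, k_tunnel, k_decay) * branching(k_relay, k_relay_back))
    return phi_inj * branching(k_split, k_back_et)


# Same rates except which initial channel dominates (illustrative numbers only)
tunnel_favored = repair_yield(9, 1, 0, 1, 1, 1, 0)  # 0.9 + 0.1 * 0.5, about 0.95
hop_favored = repair_yield(1, 9, 0, 1, 1, 1, 0)     # 0.1 + 0.9 * 0.5, about 0.55
```

In this toy scheme, shifting flux from tunnelling to a lossy hopping relay lowers the injection yield, and hence the repair yield, without changing any other step.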
Thursday, March 9, 2023 1:54PM - 2:06PM	T08.00009: Many-body van der Waals forces, polarization response and dynamical effects in poly-peptides Mario Galante, Alexandre Tkatchenko The modeling of the conformations and dynamics of supramolecular systems is of primary importance for understanding the physicochemical properties of soft matter. Although short-range interactions such as covalent and hydrogen bonding control local molecular arrangements, non-covalent interactions play a dominant role in determining the global character of the conformations. The many-body dispersion (MBD) approach includes non-pairwise contributions that consistently yield more accurate energies and longer-ranged forces than standard Lennard-Jones-like potentials. Here we focus on the signatures of such many-body forces in the dynamical properties of small peptides, both for simplified backbone models and for a 15-residue polyalanine treated with semiempirical quantum mechanics. We show that beyond-pairwise terms consistently yield a less rugged energy landscape and more compact, globally optimized conformations [arXiv:2110.06646]. This is intimately related to the delocalization of the force contributions, which derives from the greater versatility of the polarization response tensor. We therefore focus our analysis on these response properties and discuss coarse-graining strategies toward formulating MBD polarizabilities in terms of fragments rather than atoms.
Thursday, March 9, 2023 2:06PM - 2:18PM	T08.00010: Examining dynamic allostery and communication in proteins via machine learning and statistical analysis Freddie R Salsbury, Dizhou Wu We will present results from applying machine learning and statistical techniques to the analysis of molecular dynamics simulations, with a particular emphasis on understanding how ensembles change and how different regions of proteins and protein complexes move and potentially communicate. We focus on thrombin, a particularly interesting protein from both biomedical and physical viewpoints.
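The abstract does not name the statistics used. One standard way to quantify whether regions of a protein move together in an MD trajectory is the dynamical cross-correlation matrix, C_ij = ⟨Δr_i·Δr_j⟩ / sqrt(⟨|Δr_i|²⟩⟨|Δr_j|²⟩), where Δr_i is the displacement of residue i from its mean position. The sketch below assumes frames that have already been superposed onto a common reference; it is illustrative, not the authors' pipeline, and the function name is hypothetical.

```python
import math


def dccm(traj):
    """Dynamical cross-correlation matrix.

    traj: list of frames; each frame is a list of (x, y, z) positions, one per residue,
    assumed already superposed so that global rotation/translation is removed.
    """
    T, N = len(traj), len(traj[0])
    mean = [[sum(frame[i][a] for frame in traj) / T for a in range(3)] for i in range(N)]
    # Displacement of every residue from its time-averaged position, per frame
    disp = [[[frame[i][a] - mean[i][a] for a in range(3)] for i in range(N)] for frame in traj]

    def avg_dot(i, j):
        return sum(sum(d[i][a] * d[j][a] for a in range(3)) for d in disp) / T

    var = [avg_dot(i, i) for i in range(N)]
    return [[avg_dot(i, j) / math.sqrt(var[i] * var[j]) if var[i] > 0 and var[j] > 0 else 0.0
             for j in range(N)] for i in range(N)]
```

Entries near +1 indicate residues moving in phase and entries near -1 indicate anti-correlated motion; comparing such matrices across ensembles is one simple way to see how communication between regions changes.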
Thursday, March 9, 2023 2:18PM - 2:30PM	T08.00011: Investigating decoy detection for protein-protein interaction models using state-of-the-art scoring methods and a novel graph neural network Jacob Sumner, Grace Meng, Alex T Grigas, Corey S O'Hern Computational prediction and design of proteins is a difficult task that yields models of widely varying quality. Decoy detection algorithms seek to classify computational models as high- or low-quality without knowledge of the experimental structures. Recently, dramatic improvements have been made in decoy detection for models of single proteins, but decoy detection for models of protein-protein interfaces (PPIs) remains challenging. To assess the current state of the art in PPI decoy detection, we scored computational models generated by RosettaDock, ZDOCK, and HDOCK for a dataset of 32 heterodimeric proteins (with high-resolution X-ray crystal structures) and compared the scores with a standard measure of similarity to the X-ray crystal structure. We found that for some targets, the decoy scores were strongly correlated with the structural similarity scores. However, for other targets, the decoy scores were largely uncorrelated with the structural similarity scores, which indicates the importance of improving PPI scoring functions. To improve PPI decoy detection, we developed a graph attention neural network model. The model builds a graph with amino acids as nodes and with node features derived by natural language processing of the amino acid sequence. We show results for PPI decoy detection after training the model on the Dockground 1.0 and ZDock decoy datasets, totaling over 170 unique heterodimers.
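The per-target comparison between decoy scores and structural similarity can be quantified with a rank correlation. The abstract does not say which correlation statistic was used; Spearman's ρ, computed here as Pearson's r on tie-averaged ranks, is one reasonable choice, and the function names are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(scores, similarity):
    """Spearman's rho between one target's decoy scores and their similarity to the crystal structure."""
    rx, ry = ranks(scores), ranks(similarity)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

For a score where lower is better (such as an energy), good decoy detection on a target shows up as ρ close to -1 against similarity, while ρ near 0 corresponds to a target whose scores carry little information about model quality.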

