Bulletin of the American Physical Society

APS March Meeting 2016

Volume 61, Number 2

Monday–Friday, March 14–18, 2016; Baltimore, Maryland

Session F41: Maximum Entropy Models: A Promising Link Between Statistical Physics, Inference, and Biology

Sponsoring Units: DBIO GSNP GSOFT
Chair: Gasper Tkacik, IST Austria
Room: 344

Tuesday, March 15, 2016 11:15AM - 11:27AM	F41.00001: Learning probabilities from random observables in high dimensions: the maximum entropy distribution and others
Tomoyuki Obuchi, Simona Cocco, Remi Monasson
We consider the problem of learning a target probability distribution over a set of $N$ binary variables from the knowledge of the expectation values (under this target distribution) of $M$ observables, drawn uniformly at random. We study the space of all probability distributions compatible with these $M$ expectation values within some fixed accuracy, called the version space. We introduce a biased measure over the version space, which favors distributions according to their entropy, with a strength set by an arbitrary 'temperature'. The choice of the temperature allows us to interpolate between the flat measure over all the distributions and the pointwise measure concentrated at the maximum entropy distribution. Using the replica method we compute the volume of the version space and other quantities of interest, such as the distance $R$ between the target distribution and the center-of-mass distribution over the version space. Some phase transitions are found, corresponding to qualitative improvements in the learning of the target distribution and to the decrease of the distance $R$. However, the distance $R$ does not vary with the temperature, meaning that the maximum entropy distribution is not closer to the target distribution than any other.
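For orientation, the maximum entropy distribution referred to in this abstract takes the standard exponential-family form when the $M$ expectation values are imposed exactly (the abstract allows a fixed accuracy, so this is a simplification). The symbols $\mathbf{s}$ (a configuration of the $N$ binary variables), $O_\mu$ (the observables), $\lambda_\mu$ (Lagrange multipliers) and $\bar{O}_\mu$ (the target expectation values) are notation assumed here, not taken from the abstract:

```latex
P_{\mathrm{ME}}(\mathbf{s})
  = \frac{1}{Z(\boldsymbol{\lambda})}
    \exp\!\Big( \sum_{\mu=1}^{M} \lambda_\mu \, O_\mu(\mathbf{s}) \Big),
\qquad
Z(\boldsymbol{\lambda})
  = \sum_{\mathbf{s}} \exp\!\Big( \sum_{\mu=1}^{M} \lambda_\mu \, O_\mu(\mathbf{s}) \Big),
\qquad
\sum_{\mathbf{s}} P_{\mathrm{ME}}(\mathbf{s}) \, O_\mu(\mathbf{s}) = \bar{O}_\mu
\quad (\mu = 1, \dots, M).
```

The last condition fixes the multipliers $\lambda_\mu$; among all distributions satisfying it, $P_{\mathrm{ME}}$ is the one with the largest Shannon entropy.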
Tuesday, March 15, 2016 11:27AM - 11:39AM	F41.00002: Semiparametric energy-based models of systems exhibiting criticality
Jan Humplik, Gasper Tkacik
Over the last decade, several empirical studies have found evidence that many biological and natural systems exhibit critical fluctuations analogous to those observed during second-order phase transitions in equilibrium systems. In many cases, these fluctuations were shown to be equivalent to a thermodynamic version of Zipf's law: if the system is sufficiently large, then a log-log plot of the probability of a state vs. its rank yields a straight line with slope $-1$. Because the origin of critical fluctuations cannot be traced to a unique mechanism, it is important that data-driven phenomenological models of natural systems are flexible enough to capture any kind of criticality easily. Here we study a class of models with exactly this property. This class consists of energy-based models in which the exponential Boltzmann factor is replaced by an arbitrary nonlinear function. We demonstrate the usefulness of our method by modeling the spiking activity of a population of retinal neurons, and the distribution of light intensities in small patches of natural images. In light of recent work on models with hidden variables, the proposed method can separate interactions induced by an unknown fluctuating environment from interactions intrinsic to the system.
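The Zipf's-law statement in this abstract (a rank-probability plot on log-log axes with slope $-1$) can be illustrated with a minimal sketch. The function name `zipf_slope` and the toy distribution are assumptions made for illustration; nothing here reproduces the authors' models or data.

```python
import math

def zipf_slope(probabilities):
    """Least-squares slope of log(probability) against log(rank).

    Hypothetical helper: needs at least two states with nonzero probability.
    """
    # Rank 1 is the most probable state; zero-probability states are dropped.
    ranked = sorted((p for p in probabilities if p > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(p) for p in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Toy distribution (an assumption) that obeys Zipf's law exactly:
# the probability of the state with a given rank is proportional to 1/rank.
weights = [1.0 / rank for rank in range(1, 1001)]
total = sum(weights)
zipf_probs = [w / total for w in weights]

print(zipf_slope(zipf_probs))
```

For empirical data the probabilities would be estimated from state counts, and the low-probability tail of the fit is then dominated by sampling noise, so a fitted slope near $-1$ should be read with that caveat in mind.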
Tuesday, March 15, 2016 11:39AM - 11:51AM	F41.00003: From Maximum Entropy Models to Non-Stationarity and Irreversibility
Rodrigo Cofre, Bruno Cessac, Cesar Maldonado
The maximum entropy distribution can be obtained from a variational principle. This is important as a matter of principle and for the purpose of finding approximate solutions. One can exploit this fact to obtain relevant information about the underlying stochastic process. We report here on recent progress in three aspects of this approach.
1. Biological systems are expected to show some degree of irreversibility in time. Based on the transfer matrix technique to find the spatio-temporal maximum entropy distribution, we build a framework to quantify the degree of irreversibility of any maximum entropy distribution.
2. The maximum entropy solution is characterized by a functional called Gibbs free energy (solution of the variational principle). The Legendre transformation of this functional is the rate function, which controls the speed of convergence of empirical averages to their ergodic mean. We show how the correct description of this functional is decisive for a more rigorous characterization of first and higher order phase transitions.
3. We assess the impact of a weak time-dependent external stimulus on the collective statistics of spiking neuronal networks. We show how to evaluate this impact on any higher order spatio-temporal correlation.
Tuesday, March 15, 2016 11:51AM - 12:03PM	F41.00004: Learning Maximal Entropy Models from finite size datasets: a fast Data-Driven algorithm allows to sample from the posterior distribution.
Ulisse Ferrari
A maximal entropy model provides the least constrained probability distribution that reproduces experimental averages of a set of observables. In this work we characterize the learning dynamics that maximizes the log-likelihood in the case of large but finite datasets. We first show that the steepest descent dynamics is not optimal, as it is slowed down by the inhomogeneous curvature of the model parameter space. We then provide a way of rectifying this space which relies only on dataset properties and does not require large computational efforts. We conclude by solving the long-time limit of the parameter dynamics, including the randomness generated by the systematic use of Gibbs sampling. In this stochastic framework, rather than converging to a fixed point, the dynamics reaches a stationary distribution, which for the rectified dynamics reproduces the posterior distribution of the parameters. We sum up all these insights in a "rectified" Data-Driven algorithm that is fast and, by sampling from the parameter posterior, avoids both under- and over-fitting along all the directions of the parameter space. Through the learning of pairwise Ising models from the recording of a large population of retinal neurons, we show how our algorithm outperforms the steepest descent method.
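As a point of reference for the baseline this abstract compares against, the following is a minimal sketch of plain steepest ascent on the log-likelihood of a pairwise Ising model. It is not the rectified algorithm of the abstract. It assumes spins in {-1, +1}, replaces Gibbs sampling with exact enumeration (feasible only for very small `n`), and uses hypothetical names (`model_moments`, `fit_pairwise_ising`) and toy parameter values chosen for illustration.

```python
import itertools
import math

def model_moments(h, J, n):
    """Exact <s_i> and <s_i s_j> for P(s) ~ exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j)."""
    states = list(itertools.product([-1, 1], repeat=n))
    pairs = list(itertools.combinations(range(n), 2))
    weights = []
    for s in states:
        exponent = sum(h[i] * s[i] for i in range(n))
        exponent += sum(J[(i, j)] * s[i] * s[j] for i, j in pairs)
        weights.append(math.exp(exponent))
    z = sum(weights)
    mean = [sum(w * s[i] for w, s in zip(weights, states)) / z for i in range(n)]
    corr = {(i, j): sum(w * s[i] * s[j] for w, s in zip(weights, states)) / z
            for i, j in pairs}
    return mean, corr

def fit_pairwise_ising(target_mean, target_corr, n, learning_rate=0.1, steps=2000):
    """Plain steepest ascent: the log-likelihood gradient is (data moment - model moment)."""
    h = [0.0] * n
    J = {pair: 0.0 for pair in itertools.combinations(range(n), 2)}
    for _ in range(steps):
        mean, corr = model_moments(h, J, n)
        for i in range(n):
            h[i] += learning_rate * (target_mean[i] - mean[i])
        for pair in J:
            J[pair] += learning_rate * (target_corr[pair] - corr[pair])
    return h, J

# Toy target (an assumption): moments generated by a known 3-spin model.
n = 3
true_h = [0.2, -0.1, 0.3]
true_J = {(0, 1): 0.5, (0, 2): -0.3, (1, 2): 0.2}
target_mean, target_corr = model_moments(true_h, true_J, n)

fit_h, fit_J = fit_pairwise_ising(target_mean, target_corr, n)
fit_mean, fit_corr = model_moments(fit_h, fit_J, n)
print(max(abs(a - b) for a, b in zip(fit_mean, target_mean)))
```

Every parameter is updated with the same step size here, which is exactly where an inhomogeneous curvature of the parameter space would slow convergence; with real data the model moments would have to be estimated by sampling rather than enumerated.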
Tuesday, March 15, 2016 12:03PM - 12:15PM	F41.00005: UniEnt: uniform entropy model for the dynamics of a neuronal population
Damian Hernandez Lahme, Ilya Nemenman
Sensory information and motor responses are encoded in the brain in the collective spiking activity of a large number of neurons. Understanding the neural code requires inferring statistical properties of such collective dynamics from multicellular neurophysiological recordings. Questions of whether synchronous activity or silence of multiple neurons carries information about the stimuli or the motor responses are especially interesting. Unfortunately, detection of such high order statistical interactions from data is especially challenging due to the exponentially large dimensionality of the state space of neural collectives. Here we present UniEnt, a method for the inference of strengths of multivariate neural interaction patterns. The method is based on a Bayesian prior that makes no assumptions (uniform a priori expectations) about the value of the entropy of the observed multivariate neural activity, in contrast to popular approaches that maximize this entropy. We then study previously published multi-electrode recording data from salamander retina, exposing the relevance of higher order neural interaction patterns for information encoding in this system.
Tuesday, March 15, 2016 12:15PM - 12:27PM	F41.00006: Modeling the Mass Action Dynamics of Metabolism with Fluctuation Theorems and Maximum Entropy.
William Cannon, Dennis Thomas, Douglas Baxter, Jeremy Zucker, Garrett Goh
The laws of thermodynamics dictate the behavior of biotic and abiotic systems. Simulation methods based on statistical thermodynamics can provide a fundamental understanding of how biological systems function and are coupled to their environment. While mass action kinetic simulations are based on solving ordinary differential equations using rate parameters, analogous thermodynamic simulations of mass action dynamics are based on modeling states using chemical potentials. The latter have the advantage that standard free energies of formation/reaction and metabolite levels are much easier to determine than rate parameters, allowing one to model across a large range of scales. Bridging theory and experiment, statistical thermodynamics simulations allow us to both predict activities of metabolites and enzymes and use experimental measurements of metabolites and proteins as input data. Even if metabolite levels are not available experimentally, it is shown that a maximum entropy assumption is quite reasonable and in many cases results in both the most energetically efficient process and the highest material flux.
Tuesday, March 15, 2016 12:27PM - 12:39PM	F41.00007: On the sufficiency of pairwise interactions in maximum entropy models of networks
Ilya Nemenman, Lina Merchan
Biological information processing networks consist of many components, which are coupled by an even larger number of complex multivariate interactions. However, analyses of data sets from fields as diverse as neuroscience, molecular biology, and behavior have reported that observed statistics of states of some biological networks can be approximated well by maximum entropy models with only pairwise interactions among the components. Based on simulations of random Ising spin networks with p-spin (p > 2) interactions, here we argue that this reduction in complexity can be thought of as a natural property of some densely interacting networks in certain regimes, and not necessarily as a special property of living systems.
Tuesday, March 15, 2016 12:39PM - 12:51PM	F41.00008: Insights in connecting phenotypes in bacteria to coevolutionary information.
Ryan Cheng, Faruck Morcos, Ryan Hayes, Rodney Helm, Herbert Levine, Jose Onuchic
It has long been known that protein sequences are far from random. These sequences have been evolutionarily selected to maintain their ability to fold into stable, three-dimensional folded structures as well as their ability to form macromolecular assemblies, perform catalytic functions, etc. For these reasons, there exist quantifiable mutational patterns in the collection of sequence data for a protein family arising from the need to maintain favorable residue-residue interactions to facilitate folding as well as cellular function. Here, we focus on studying the correlated mutational patterns that give rise to interaction specificity in bacterial two-component signaling (TCS) systems. TCS proteins have evolved to be able to preferentially bind and transfer a phosphate group to their signaling partner while avoiding phosphotransfer with non-partners. We infer a Potts model Hamiltonian governing the correlated mutational patterns that are observed in the sequence data of TCS partners and apply this model to recently published in vivo mutational data. Our findings further support the notion that statistical models built from sequence data can be used to predict bacterial phenotypes as well as engineer interaction specificity between non-partner TCS proteins.
Tuesday, March 15, 2016 12:51PM - 1:03PM	F41.00009: Computational Amide I Spectroscopy for Refinement of Disordered Peptide Ensembles: Maximum Entropy and Related Approaches
Michael Reppert, Andrei Tokmakoff
The structural characterization of intrinsically disordered peptides (IDPs) presents a challenging biophysical problem. Extreme heterogeneity and rapid conformational interconversion make traditional methods difficult to interpret. Due to its ultrafast (ps) shutter speed, Amide I vibrational spectroscopy has received considerable interest as a novel technique to probe IDP structure and dynamics. Historically, Amide I spectroscopy has been limited to delivering global secondary structural information. More recently, however, the method has been adapted to study structure at the local level through incorporation of isotope labels into the protein backbone at specific amide bonds. Thanks to the acute sensitivity of Amide I frequencies to local electrostatic interactions (particularly hydrogen bonds), spectroscopic data on isotope-labeled residues report directly on local peptide conformation. Quantitative information can be extracted using electrostatic frequency maps, which translate molecular dynamics trajectories into Amide I spectra for comparison with experiment. Here we present our recent efforts in the development of a rigorous approach to incorporating Amide I spectroscopic restraints into refined molecular dynamics structural ensembles using maximum entropy and related approaches. By combining force field predictions with experimental spectroscopic data, we construct refined structural ensembles for a family of short, strongly disordered, elastin-like peptides in aqueous solution.
Tuesday, March 15, 2016 1:03PM - 1:39PM	F41.00010: Coevolutionary modeling of protein sequences: Predicting structure, function, and mutational landscapes
Invited Speaker: Martin Weigt
In recent years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods that unveil the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNA show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach, called Direct-Coupling Analysis, to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations.
References
[1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon, M. Weigt, "Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1", Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211
[2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt, "Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction", Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932
[3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M. Weigt, "Direct-coupling analysis of residue co-evolution captures native contacts across many protein families", Proc. Natl. Acad. Sci. 108, E1293-E1301 (2011)
Tuesday, March 15, 2016 1:39PM - 1:51PM	F41.00011: Phase transitions in Hidden Markov Models
John Bechhoefer, Emma Lathouwers
In Hidden Markov Models (HMMs), a Markov process is not directly accessible. In the simplest case, a two-state Markov model “emits” one of two “symbols” at each time step. We can think of these symbols as noisy measurements of the underlying state. With some probability, the symbol implies that the system is in one state when it is actually in the other. The ability to judge which state the system is in sets the efficiency of a Maxwell demon that observes state fluctuations in order to extract heat from a coupled reservoir. The state-inference problem is to infer the underlying state from such noisy measurements at each time step. We show that there can be a phase transition in such measurements (John Bechhoefer, New J. Phys. 17, 075003 (2015)): for measurement error rates below a certain threshold, the inferred state always matches the observation. For higher error rates, there can be continuous or discontinuous transitions to situations where keeping a memory of past observations improves the state estimate. We can partly understand this behavior by mapping the HMM onto a 1D random-field Ising model at zero temperature. We also present more recent work that explores a larger parameter space and more states.
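The state-inference problem described in this abstract can be illustrated with a minimal Bayesian filtering sketch for a symmetric two-state chain. The symmetric parametrization (`flip_prob` for the hidden state, `error_prob` for the emitted symbol), the function name, and the parameter values are all assumptions made for illustration; the sketch does not reproduce the authors' analysis or their threshold.

```python
def filter_two_state_hmm(observations, flip_prob, error_prob):
    """Return filtered estimates (0 or 1) and P(state = 1 | symbols seen so far)."""
    prob_one = 0.5  # uninformative prior over the hidden state
    estimates, posteriors = [], []
    for symbol in observations:
        # Predict: the hidden state flips with probability flip_prob per step.
        predicted = (1 - flip_prob) * prob_one + flip_prob * (1 - prob_one)
        # Update: the symbol misreports the state with probability error_prob.
        like_one = 1 - error_prob if symbol == 1 else error_prob
        like_zero = error_prob if symbol == 1 else 1 - error_prob
        evidence = like_one * predicted + like_zero * (1 - predicted)
        prob_one = like_one * predicted / evidence
        posteriors.append(prob_one)
        estimates.append(1 if prob_one > 0.5 else 0)
    return estimates, posteriors

# Toy symbol sequence (an assumption): four 1s followed by a single 0.
observations = [1, 1, 1, 1, 0]
low_noise, _ = filter_two_state_hmm(observations, flip_prob=0.3, error_prob=0.1)
high_noise, _ = filter_two_state_hmm(observations, flip_prob=0.1, error_prob=0.3)
print(low_noise)
print(high_noise)
```

In this toy setting the low-noise estimates simply copy the symbols, whereas in the high-noise case the final 0 is overridden by the memory of the preceding 1s, which is the qualitative distinction the abstract draws between the two regimes.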
Tuesday, March 15, 2016 1:51PM - 2:03PM	F41.00012: Distinguishing cell type using epigenotype
Thomas Wytock, Adilson E Motter
Recently, researchers have proposed that unique cell types are attractors of their epigenetic dynamics, including gene expression and chromatin conformation patterns. Traditionally, cell types have been classified by their function, morphology, cytochemistry, and other macroscopically observable properties. Because these properties are the result of many proteins working together, it should be possible to predict cell types from gene expression or chromatin conformation profiles. In this talk, I present a maximum entropy approach to identify and distinguish cell type attractors on the basis of correlations within these profiles. I will demonstrate the flexibility of this method through its separate application to gene expression and chromatin conformation datasets. I show that our method outperforms other machine-learning techniques and uncorrelated benchmarks. We adapt our method to predict growth rate from gene expression in E. coli and S. cerevisiae and compare our predictions with those from metabolic models. In addition, our method identifies a nearly convex region of state-space associated with each cell type attractor basin. Estimates of the growth rate and attractor basin make it possible to rationally control gene regulatory networks independent of a model.
Tuesday, March 15, 2016 2:03PM - 2:15PM	F41.00013: Can simple interactions capture complex features of neural activity underlying behavior in a virtual reality environment?
Leenoy Meshulam, Jeffrey Gauthier, Carlos Brody, David Tank, William Bialek
The complex neural interactions that are abundant in most recordings of neural activity are relatively poorly understood. A prime example of such interactions can be found in the in vivo neural activity that underlies complex behaviors of mice, imaged in brain regions such as hippocampus and parietal cortex. Experimental techniques now allow us to accurately follow these neural interactions in the simultaneous activity of large neuronal populations of awake behaving animals. Here, we demonstrate that pairwise maximum entropy models can predict a surprising number of properties of the neural activity. The models, which are constrained by activity rates and interactions between pairs of neurons, fit well the activity 'states' in the hippocampus and cortex of mice performing cognitive tasks while navigating in a virtual reality environment.

