Bulletin of the American Physical Society
APS March Meeting 2022
Volume 67, Number 3
Monday–Friday, March 14–18, 2022; Chicago
Session F03: Physics of Learning II: Artificial Systems (Focus Session; Recordings Available)

Sponsoring Units: DBIO, GSNP. Chair: Naama Brenner (Technion). Room: McCormick Place W176A
Tuesday, March 15, 2022, 8:00AM–8:36AM
F03.00001: Memorizing without overfitting: Overparameterization in machine learning, physics and biology Invited Speaker: Jason W Rocks Over the last decade, advances in Machine Learning, and in particular Deep Learning, have resulted in incredible progress in the ability to learn statistical relationships from large data sets and make accurate predictions. In contrast to models from classical statistics, Deep Learning models almost always have many more fit parameters than data points, a setting in which classical statistical intuitions such as the bias-variance tradeoff no longer apply. In this presentation, we analyze the generalization properties of two-layer neural networks to showcase some of the new, unaccounted-for behaviors that arise in these "overparameterized" models and that are not present in classical statistics. We also provide additional intuition by proposing a new geometric picture of generalization in overparameterized models. Finally, we discuss how overparameterization in Deep Learning models may reveal a deeper, more general understanding of a wide range of physical systems, including allosteric proteins, physics-based learning machines, and even eco-evolutionary models.
Tuesday, March 15, 2022, 8:36AM–8:48AM
F03.00002: When are Neural Networks Kernel Learners? Alexander B Atanasov, Blake Bordelon, Cengiz Pehlevan Certain limits of neural networks have been shown to be equivalent to kernel machines with a kernel that stays constant during training, known as the neural tangent kernel (NTK). These limits generally do not exhibit the phenomenon of feature learning, to which a large part of the success of deep learning is attributed. Can neural networks that learn features still be described by kernel machines with a data-dependent learned kernel? We demonstrate that this can indeed happen due to a phenomenon we term silent alignment, which requires that the NTK of a network evolves in eigenstructure while small in overall scale. We show that such an effect takes place in homogeneous neural networks with small initialization trained on approximately whitened data. We provide an analytical treatment of this effect in the linear network case. In general, we find that the kernel develops a low-rank contribution in the early phase of training, and then evolves in overall scale, yielding a function equivalent to a kernel regression solution with the final network's NTK. The early spectral learning of the kernel depends on both depth and on relative learning rates in each layer. We also demonstrate that non-whitened data can weaken the silent alignment effect.
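As a rough illustration of the central object in this abstract, the sketch below (not the authors' code; all names and sizes are illustrative) computes the empirical NTK of a small two-layer network as the Gram matrix of per-example parameter gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

def ntk_matrix(X, W1, w2):
    """Empirical NTK K[i,j] = <df(x_i)/dtheta, df(x_j)/dtheta>
    for the two-layer network f(x) = w2 . tanh(W1 x)."""
    grads = []
    for x in X:
        h = np.tanh(W1 @ x)                       # hidden activations
        gW1 = np.outer(w2 * (1.0 - h**2), x)      # df/dW1 (chain rule)
        grads.append(np.concatenate([gW1.ravel(), h]))  # df/dw2 = h
    G = np.stack(grads)                           # (n_samples, n_params)
    return G @ G.T

n, d, width = 5, 3, 50
X = rng.standard_normal((n, d))
W1 = rng.standard_normal((width, d)) / np.sqrt(d)
w2 = rng.standard_normal(width) / np.sqrt(width)

K = ntk_matrix(X, W1, w2)
# the NTK is a symmetric, positive-semidefinite kernel matrix
print(K.shape)
```

Tracking how the eigenvectors of `K` rotate early in training, while `K` itself remains small in norm, is one way to probe the silent-alignment effect described above.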
Tuesday, March 15, 2022, 8:48AM–9:00AM
F03.00003: Teaching a material to be adaptive Martin J Falk, Jiayi Wu, Vedant Sachdeva, Sidney R Nagel, Arvind Murugan Evolution in time-varying environments naturally leads to adaptable biological systems that can easily switch functionalities. Advances in the synthesis of environmentally responsive materials therefore open up the possibility of creating a wide range of synthetic materials that can also learn to be adaptable. By periodically switching targets in a given design algorithm, we can teach a material to perform distinct, diametrically opposed functionalities with minimal changes in design parameters. We exhibit this learning strategy for adaptability in two simulated settings: elastic networks that are designed to switch deformation modes with minimal bond changes; and heteropolymers whose folding pathway selections are controlled by a minimal set of residue interactions.
Tuesday, March 15, 2022, 9:00AM–9:12AM
F03.00004: Information theory of high dimensional linear regression Vudtiwat Ngampruetikorn, David J Schwab Quantitative characterization of generalization is key to understanding learning in virtually all settings, from classical statistical modeling to modern machine learning. While statistical learning in the abundance of data is well understood, relatively little is known about generalization in the overparametrized regime, where model parameters can far outnumber available data points. Here we demonstrate that recent advances in information-theoretic analyses of generalization provide a general framework for characterizing practical learning algorithms in both data-abundant and data-limited regimes. We consider randomized ridge regression in the thermodynamic limit, where we send the numbers of model parameters and data points to infinity while fixing their ratio. We quantify generalization errors using information-theoretic measures and analyze an information-theoretic analog of the bias-variance decomposition, varying regularization strength, data structure and the degree of overparametrization. Our results offer fresh insight into the phenomenon of benign overfitting, which describes the surprisingly good generalization properties of perfectly fitted models. Finally, we show how the information bottleneck method can be used to identify data-dependent optimal hyperparameters of learning algorithms in the spirit of meta-learning.
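A minimal numerical sketch of the setting this abstract studies (not the authors' analysis; the teacher model, sizes, and regularization values are illustrative): ridge regression with more parameters than data points, solved in the dual form so that only an n-by-n system is inverted, with test error measured against a planted teacher.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_test_error(lam, n=50, p=200, noise=0.5):
    """Ridge regression in the overparameterized regime p > n,
    via the dual form w = X^T (X X^T + lam I)^{-1} y."""
    beta = rng.standard_normal(p) / np.sqrt(p)    # planted teacher weights
    Xtr = rng.standard_normal((n, p))
    ytr = Xtr @ beta + noise * rng.standard_normal(n)
    w = Xtr.T @ np.linalg.solve(Xtr @ Xtr.T + lam * np.eye(n), ytr)
    Xte = rng.standard_normal((500, p))           # fresh test inputs
    return float(np.mean((Xte @ (w - beta)) ** 2))

# sweep the regularization strength, as in a bias-variance trade-off scan
errors = {lam: ridge_test_error(lam) for lam in (1e-6, 1.0, 100.0)}
print(errors)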
Tuesday, March 15, 2022, 9:12AM–9:24AM
F03.00005: Learning out of equilibrium in physical systems Menachem Stern, Sam J Dillavou, Marc Z Miskin, Douglas J Durian, Andrea J Liu Physical networks can adapt to external stimuli and learn to perform desired tasks by exploiting local 'learning rules' that govern learning degrees of freedom (e.g. edge resistances in resistor networks). So far, it has been assumed that such learning machines can succeed only if the learning degrees of freedom evolve slowly compared to their physical dynamics, such that the physical degrees of freedom (e.g. currents on edges) are effectively always equilibrated. However, this assumption slows down learning considerably, rendering machine learning algorithms based on local rules noncompetitive with standard algorithms. Inspired by natural learning systems, such as certain neuronal circuits, which learn on timescales similar to their relaxation, we relax the assumption of slow learning, showing in experiments and simulations that electric resistor networks can learn allosteric tasks up to a critical learning rate without loss in accuracy. Going beyond the critical learning rate, we find nonequilibrium learning oscillations, but the network can still learn allosteric tasks at much greater rates. These oscillations can be suppressed when the network passes by flat solutions to the learning task. Our results demonstrate that learning is robust even far from equilibrium.
Tuesday, March 15, 2022, 9:24AM–9:36AM
F03.00006: Learning Continuous Chaotic Attractors with a Reservoir Computer Lindsay M Smith, Jason Z Kim, Zhixin Lu, Danielle S Bassett Neural systems are well known for their ability to learn and store information as memories. Even more impressive is their ability to abstract these memories to create complex internal representations, enabling advanced functions such as the spatial manipulation of mental representations. While recurrent neural networks (RNNs) are capable of representing complex information, the exact mechanisms by which dynamical neural systems perform abstraction are still not well understood, thereby hindering the development of more advanced functions. Here, we train a 1000-neuron RNN — a reservoir computer (RC) — to abstract a continuous dynamical attractor memory from isolated examples of dynamical attractor memories. Further, we explain the abstraction mechanism with new theory. By training the RC on isolated and shifted examples of either stable limit cycles or chaotic Lorenz attractors, the RC learns a continuum of attractors, as quantified by an extra Lyapunov exponent equal to zero. We propose a theoretical mechanism of this abstraction by combining ideas from differentiable generalized synchronization and feedback dynamics. Our results quantify abstraction in simple neural systems, enabling us to design artificial RNNs for abstraction, and leading us towards a neural basis of abstraction.
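For readers unfamiliar with the architecture, a minimal reservoir computer (echo state network) sketch follows; it is not the authors' 1000-neuron setup, and the reservoir size, spectral radius, and the sine-wave task are illustrative stand-ins for the attractor memories studied above. Only the linear readout is trained, by ridge regression on the recorded reservoir states.

```python
import numpy as np

rng = np.random.default_rng(0)

# fixed random reservoir; only the linear readout is trained
N = 200
W = rng.standard_normal((N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1
w_in = rng.standard_normal(N)

u = np.sin(0.1 * np.arange(2000))                  # toy input signal
r = np.zeros(N)
states = []
for t in range(len(u) - 1):
    r = np.tanh(W @ r + w_in * u[t])               # reservoir update
    states.append(r.copy())

R = np.stack(states[200:])                         # discard transient
y = u[201:]                                        # one-step-ahead targets
# ridge-regression readout: (R^T R + eps I) w_out = R^T y
w_out = np.linalg.solve(R.T @ R + 1e-6 * np.eye(N), R.T @ y)

mse = float(np.mean((R @ w_out - y) ** 2))
print(mse)
```

Feeding the trained output back as input turns this one-step predictor into an autonomous dynamical system, which is the mode in which an RC can be said to store an attractor as a memory.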
Tuesday, March 15, 2022, 9:36AM–9:48AM
F03.00007: Learning Nonequilibrium Control Forces to Characterize Dynamical Phase Transitions Jiawei Yan, Hugo Touchette, Grant M Rotskoff Sampling the collective, dynamical fluctuations that lead to nonequilibrium pattern formation requires probing rare regions of trajectory space. Recent approaches to this problem based on importance sampling, cloning, and spectral approximations have yielded significant insight into nonequilibrium systems, but tend to scale poorly with the size of the system, especially near dynamical phase transitions. Here we propose a machine learning algorithm that samples rare trajectories and estimates the associated large deviation functions using a many-body control force, by leveraging the flexible function representation provided by deep neural networks, importance sampling in trajectory space, and stochastic optimal control theory. We show that this approach scales to hundreds of interacting particles and remains robust at dynamical phase transitions.
Tuesday, March 15, 2022, 9:48AM–10:00AM (Withdrawn)
F03.00008: A Bayesian Approach to Hyperbolic Embeddings Anoop Praturu, Tatyana O Sharpee Recent studies have increasingly demonstrated that hyperbolic geometry confers many advantages for analyzing hierarchical structure in complex systems. However, available embedding methods for hyperbolic spaces typically operate at fixed dimension (usually 2 or 3), do not vary curvature, and require knowledge of network connections between data points. To address these problems, we develop a Bayesian formulation of Multi-Dimensional Scaling (MDS) for embedding data in hyperbolic spaces that can fit the optimal values of geometric parameters such as curvature and dimension. We propose a novel, physics-based model of embedding uncertainty within this Bayesian framework which improves both the performance and the interpretability of the model. Because the method allows for variable curvature, it can also correctly embed Euclidean data using zero curvature, thus subsuming traditional Euclidean MDS models. We demonstrate that only a small amount of data is needed to constrain the geometry in our model and that the model is robust against false minima when scaling to large datasets. We show how the estimated geometry can be used to derive a new hierarchical clustering algorithm, and demonstrate its effectiveness for inferring latent hierarchical structure in the data. We demonstrate the capabilities of the model by applying it to a variety of biological datasets, uncovering hidden hierarchical relationships in datasets relating to aging and the COVID genome.
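The two ingredients of a hyperbolic MDS objective can be sketched in a few lines (this is a generic illustration, not the authors' Bayesian model; the curvature scaling and function names are assumptions): a geodesic distance in the Poincare-ball model, and a stress function comparing embedded to target distances.

```python
import numpy as np

def poincare_dist(u, v, curvature=-1.0):
    """Geodesic distance in the Poincare-ball model, rescaled by
    1/sqrt(-K) for constant curvature K < 0 (an illustrative convention)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / den) / np.sqrt(-curvature)

def stress(points, target, curvature=-1.0):
    """MDS-style stress: squared mismatch of embedded vs. target distances."""
    s = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            s += (poincare_dist(points[i], points[j], curvature)
                  - target[i][j]) ** 2
    return s

# equal Euclidean steps cost more geodesic distance near the boundary,
# which is why trees embed so efficiently in hyperbolic space
print(poincare_dist([0.0, 0.0], [0.5, 0.0]) <
      poincare_dist([0.4, 0.0], [0.9, 0.0]))   # → True
```

In a Bayesian treatment like the one above, `stress` would be replaced by a likelihood over observed distances, with `curvature` and the embedding dimension treated as parameters to be inferred.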
Tuesday, March 15, 2022, 10:00AM–10:12AM (Withdrawn)
F03.00009: A symbolic system that synthesizes an internal model of an algebraic theory of the data and prior knowledge Gonzalo de Polavieja Symbolic approaches to AI excel at mathematical transparency and reasoning, but without learning from data they have limited contact with the real world. Here we propose an approach inspired by Model Theory that combines the mathematical transparency of symbolic systems with the ability to learn internal models without the use of optimization. In a first step, we embed the properties of our data and prior formal knowledge into an algebraic theory consisting of first-order sentences using symbols that refer to objects, parts of objects or abstract concepts. In a second step, the system learns by synthesizing internal symbols, or atoms, that do not refer directly to items in the world but instead constitute a model of the algebraic theory. Specifically, we are interested in the freest atomized model: among all possible models of the algebraic theory, the one with the most negative sentences. We prove that this model is guaranteed to find a rule in the data if one exists and enough data are available. The subset of atoms of the freest model that is most stable during training is shown to be a generalizing model. For small datasets it can also obtain an approximation to, or even exactly, the underlying rule that the freest model finds in the large-data limit. We believe that these rule-seeking models open many new possibilities at the mathematical, cognitive and practical levels.
Tuesday, March 15, 2022, 10:12AM–10:24AM
F03.00010: Optimal learning despite a hundred distracting directions Michael C Abbott, Benjamin B Machta Learning from incomplete data requires a notion of measure on parameter space, which is most explicit in the Bayesian framework as a prior distribution. We demonstrate here that ostensibly neutral choices like Jeffreys prior can in fact introduce enormous bias in typical high-dimensional models. Models found in science typically have an effective dimensionality of accessible behaviors much smaller than the number of microscopic parameters. Naively using the invariant volume element, which treats all of these parameters equally, strongly distorts the measure projected onto the subspace of relevant parameters, due to variations in the local covolume of irrelevant directions. The fact that this covolume typically varies over many orders of magnitude is what introduces bias into predictions. We present results on principled choices of measure which avoid this issue and lead to unbiased posteriors. These measures allow optimal learning, despite the presence of many parameters which cannot be fixed.
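For concreteness, the "invariant volume element" the abstract refers to is the square root of the determinant of the Fisher information. A minimal one-parameter sketch (purely illustrative, not the authors' high-dimensional setting) for a Bernoulli model, where the Fisher information is I(p) = 1/(p(1-p)):

```python
import numpy as np

def jeffreys_prior_bernoulli(p):
    """Unnormalized Jeffreys prior, proportional to sqrt(det I(p));
    for Bernoulli(p), I(p) = 1/(p(1-p))."""
    return 1.0 / np.sqrt(p * (1.0 - p))

# the prior concentrates weight near the extremes p -> 0 and p -> 1
ps = np.linspace(0.05, 0.95, 19)
w = jeffreys_prior_bernoulli(ps)
print(w[0] > w[9])   # more weight at p = 0.05 than at p = 0.5 → True
```

In one dimension this reweighting is mild; the abstract's point is that in models with many irrelevant directions, the analogous sqrt-determinant factor can vary over many orders of magnitude and dominate the posterior.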
Tuesday, March 15, 2022, 10:24AM–10:36AM
F03.00011: Memory, Prediction and Computation in the Kuramoto model Chanin Kumpeerakij, David J Schwab, Thiparat Chotibut, Vudtiwat Ngampruetikorn Nonlinear dynamical systems, such as recurrent neural networks, have proved to be powerful models for temporal data, exhibiting remarkable predictive capacity even for chaotic time series. However, such performance relies on finding the right parameter regimes, a challenging process for the large dynamical systems required to model complex data. Here we investigate the computational capability of interacting phase oscillators, described by the Kuramoto model and coupled to synthetic input data with tunable correlation times. Our approach enables systematic exploration of qualitatively distinct parameter regimes, separated by phase transitions, as well as how they interact with the structure in the data. We use information-theoretic measures to quantify the memory and predictive capacities of many-oscillator systems and analyze their computational efficiency through the lens of the information bottleneck principle. Our work offers insight into the emergence of computation from the collective behaviors of large dynamical systems.
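A minimal sketch of the model underlying this abstract (without the input coupling or information-theoretic analysis; the coupling values and system size are illustrative): the mean-field Kuramoto equations integrated with a simple Euler step, using the complex order parameter both to drive the dynamics and to measure synchronization.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_kuramoto(K, n=200, dt=0.01, steps=5000):
    """Euler-integrate dtheta_i/dt = omega_i + (K/n) sum_j sin(theta_j - theta_i),
    written in mean-field form via the complex order parameter z."""
    omega = rng.standard_normal(n)             # natural frequencies
    theta = rng.uniform(0.0, 2.0 * np.pi, n)   # random initial phases
    for _ in range(steps):
        z = np.mean(np.exp(1j * theta))        # complex order parameter
        # identity: (1/n) sum_j sin(theta_j - theta_i) = Im(z e^{-i theta_i})
        theta += dt * (omega + K * np.imag(z * np.exp(-1j * theta)))
    return float(np.abs(np.mean(np.exp(1j * theta))))   # r in [0, 1]

# below vs. above the synchronization transition
r_weak, r_strong = simulate_kuramoto(0.5), simulate_kuramoto(5.0)
print(r_weak, r_strong)
```

The jump in the order parameter r between weak and strong coupling is the phase transition the abstract exploits: the distinct dynamical regimes on either side of it support qualitatively different memory and prediction capacities.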
© 2024 American Physical Society. All rights reserved.