Bulletin of the American Physical Society
APS March Meeting 2022
Volume 67, Number 3
Monday–Friday, March 14–18, 2022; Chicago
Session F03: Physics of Learning II: Artificial Systems (Focus Session; Recordings Available)

Sponsoring Units: DBIO, GSNP. Chair: Naama Brenner (Technion). Room: McCormick Place W176A
Tuesday, March 15, 2022, 8:00AM–8:36AM
F03.00001: Memorizing without overfitting: Overparameterization in machine learning, physics and biology Invited Speaker: Jason W Rocks Over the last decade, advances in Machine Learning, and in particular Deep Learning, have resulted in incredible progress in the ability to learn statistical relationships from large data sets and make accurate predictions. In contrast to models from classical statistics, Deep Learning models almost always have many more fit parameters than data points, a setting in which classical statistical intuitions such as the bias-variance tradeoff no longer apply. In this presentation, we analyze the generalization properties of two-layer neural networks to showcase some of the new, unaccounted-for behaviors that arise in these "overparameterized" models and that are not present in classical statistics. We also provide additional intuition by proposing a new geometric picture of generalization in overparameterized models. Finally, we discuss how overparameterization in Deep Learning models may reveal a deeper, more general understanding of a wide range of physical systems, including allosteric proteins, physics-based learning machines, and even eco-evolutionary models.
Tuesday, March 15, 2022, 8:36AM–8:48AM
F03.00002: When are Neural Networks Kernel Learners? Alexander B Atanasov, Blake Bordelon, Cengiz Pehlevan Certain limits of neural networks have been shown to be equivalent to kernel machines with a kernel that stays constant during training, known as the neural tangent kernel (NTK). These limits generally do not exhibit the phenomenon of feature learning, to which a large part of the success of deep learning is attributed. Can neural networks that learn features still be described by kernel machines with a data-dependent learned kernel? We demonstrate that this can indeed happen due to a phenomenon we term silent alignment, which requires that the NTK of a network evolves in eigenstructure while small in overall scale. We show that such an effect takes place in homogeneous neural networks with small initialization trained on approximately whitened data. We provide an analytical treatment of this effect in the linear network case. In general, we find that the kernel develops a low-rank contribution in the early phase of training, and then evolves in overall scale, yielding a function equivalent to a kernel regression solution with the final network's NTK. The early spectral learning of the kernel depends on both depth and on relative learning rates in each layer. We also demonstrate that non-whitened data can weaken the silent alignment effect.
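As a rough illustration of the central object in this abstract, the sketch below (not the authors' code; all names and sizes are illustrative) computes the empirical NTK of a small two-layer network as the Gram matrix of per-example parameter gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

def ntk_matrix(X, W1, w2):
    """Empirical NTK K[i,j] = <df(x_i)/dtheta, df(x_j)/dtheta>
    for the two-layer network f(x) = w2 . tanh(W1 x)."""
    grads = []
    for x in X:
        h = np.tanh(W1 @ x)                       # hidden activations
        gW1 = np.outer(w2 * (1.0 - h**2), x)      # df/dW1 (chain rule)
        grads.append(np.concatenate([gW1.ravel(), h]))  # df/dw2 = h
    G = np.stack(grads)                           # (n_samples, n_params)
    return G @ G.T

n, d, width = 5, 3, 50
X = rng.standard_normal((n, d))
W1 = rng.standard_normal((width, d)) / np.sqrt(d)
w2 = rng.standard_normal(width) / np.sqrt(width)

K = ntk_matrix(X, W1, w2)
# the NTK is a symmetric, positive-semidefinite kernel matrix
print(K.shape)
```

Tracking how the eigenvectors of `K` rotate early in training, while `K` itself remains small in norm, is one way to probe the silent-alignment effect described above.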
Tuesday, March 15, 2022, 8:48AM–9:00AM
F03.00003: Teaching a material to be adaptive Martin J Falk, Jiayi Wu, Vedant Sachdeva, Sidney R Nagel, Arvind Murugan Evolution in time-varying environments naturally leads to adaptable biological systems that can easily switch functionalities. Advances in the synthesis of environmentally responsive materials therefore open up the possibility of creating a wide range of synthetic materials that can also learn to be adaptable. By periodically switching targets in a given design algorithm, we can teach a material to perform distinct, diametrically opposed functionalities with minimal changes in design parameters. We exhibit this learning strategy for adaptability in two simulated settings: elastic networks that are designed to switch deformation modes with minimal bond changes; and heteropolymers whose folding pathway selections are controlled by a minimal set of residue interactions.
Tuesday, March 15, 2022, 9:00AM–9:12AM
F03.00004: Information theory of high dimensional linear regression Vudtiwat Ngampruetikorn, David J Schwab Quantitative characterization of generalization is key to understanding learning in virtually all settings, from classical statistical modeling to modern machine learning. While statistical learning in the abundance of data is well understood, relatively little is known about generalization in the overparametrized regime, where model parameters can far outnumber available data points. Here we demonstrate that recent advances in information-theoretic analyses of generalization provide a general framework for characterizing practical learning algorithms in both data-abundant and data-limited regimes. We consider randomized ridge regression in the thermodynamic limit, where we send the numbers of model parameters and data points to infinity while fixing their ratio. We quantify generalization errors using information-theoretic measures and analyze an information-theoretic analog of the bias-variance decomposition, varying regularization strength, data structure and the degree of overparametrization. Our results offer fresh insight into the phenomenon of benign overfitting, which describes the surprisingly good generalization properties of perfectly fitted models. Finally, we show how the information bottleneck method can be used to identify data-dependent optimal hyperparameters of learning algorithms in the spirit of meta-learning.
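A minimal numerical sketch of the setting this abstract studies (not the authors' analysis; the teacher model, sizes, and regularization values are illustrative): ridge regression with more parameters than data points, solved in the dual form so that only an n-by-n system is inverted, with test error measured against a planted teacher.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_test_error(lam, n=50, p=200, noise=0.5):
    """Ridge regression in the overparameterized regime p > n,
    via the dual form w = X^T (X X^T + lam I)^{-1} y."""
    beta = rng.standard_normal(p) / np.sqrt(p)    # planted teacher weights
    Xtr = rng.standard_normal((n, p))
    ytr = Xtr @ beta + noise * rng.standard_normal(n)
    w = Xtr.T @ np.linalg.solve(Xtr @ Xtr.T + lam * np.eye(n), ytr)
    Xte = rng.standard_normal((500, p))           # fresh test inputs
    return float(np.mean((Xte @ (w - beta)) ** 2))

# sweep the regularization strength, as in a bias-variance trade-off scan
errors = {lam: ridge_test_error(lam) for lam in (1e-6, 1.0, 100.0)}
print(errors)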
Tuesday, March 15, 2022, 9:12AM–9:24AM
F03.00005: Learning out of equilibrium in physical systems Menachem Stern, Sam J Dillavou, Marc Z Miskin, Douglas J Durian, Andrea J Liu Physical networks can adapt to external stimuli and learn to perform desired tasks by exploiting local 'learning rules' that govern learning degrees of freedom (e.g. edge resistances in resistor networks). So far, it has been assumed that such learning machines can succeed only if the learning degrees of freedom evolve slowly compared to their physical dynamics, such that the physical degrees of freedom (e.g. currents on edges) are effectively always equilibrated. However, this assumption slows down learning considerably, rendering machine learning algorithms based on local rules noncompetitive with standard algorithms. Inspired by natural learning systems, such as certain neuronal circuits, which learn on timescales similar to their relaxation, we relax the assumption of slow learning, showing in experiments and simulations that electric resistor networks can learn allosteric tasks up to a critical learning rate without loss in accuracy. Going beyond the critical learning rate, we find nonequilibrium learning oscillations, but the network can still learn allosteric tasks at much greater rates. These oscillations can be suppressed when the network passes by flat solutions to the learning task. Our results demonstrate that learning is robust even far from equilibrium.
Tuesday, March 15, 2022, 9:24AM–9:36AM
F03.00006: Learning Continuous Chaotic Attractors with a Reservoir Computer Lindsay M Smith, Jason Z Kim, Zhixin Lu, Danielle S Bassett Neural systems are well known for their ability to learn and store information as memories. Even more impressive is their ability to abstract these memories to create complex internal representations, enabling advanced functions such as the spatial manipulation of mental representations. While recurrent neural networks (RNNs) are capable of representing complex information, the exact mechanisms by which dynamical neural systems perform abstraction are still not well understood, thereby hindering the development of more advanced functions. Here, we train a 1000-neuron RNN — a reservoir computer (RC) — to abstract a continuous dynamical attractor memory from isolated examples of dynamical attractor memories. Further, we explain the abstraction mechanism with new theory. By training the RC on isolated and shifted examples of either stable limit cycles or chaotic Lorenz attractors, the RC learns a continuum of attractors, as quantified by an extra Lyapunov exponent equal to zero. We propose a theoretical mechanism of this abstraction by combining ideas from differentiable generalized synchronization and feedback dynamics. Our results quantify abstraction in simple neural systems, enabling us to design artificial RNNs for abstraction, and leading us towards a neural basis of abstraction.
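For readers unfamiliar with the architecture, a minimal reservoir computer (echo state network) sketch follows; it is not the authors' 1000-neuron setup, and the reservoir size, spectral radius, and the sine-wave task are illustrative stand-ins for the attractor memories studied above. Only the linear readout is trained, by ridge regression on the recorded reservoir states.

```python
import numpy as np

rng = np.random.default_rng(0)

# fixed random reservoir; only the linear readout is trained
N = 200
W = rng.standard_normal((N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1
w_in = rng.standard_normal(N)

u = np.sin(0.1 * np.arange(2000))                  # toy input signal
r = np.zeros(N)
states = []
for t in range(len(u) - 1):
    r = np.tanh(W @ r + w_in * u[t])               # reservoir update
    states.append(r.copy())

R = np.stack(states[200:])                         # discard transient
y = u[201:]                                        # one-step-ahead targets
# ridge-regression readout: (R^T R + eps I) w_out = R^T y
w_out = np.linalg.solve(R.T @ R + 1e-6 * np.eye(N), R.T @ y)

mse = float(np.mean((R @ w_out - y) ** 2))
print(mse)
```

Feeding the trained output back as input turns this one-step predictor into an autonomous dynamical system, which is the mode in which an RC can be said to store an attractor as a memory.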
Tuesday, March 15, 2022, 9:36AM–9:48AM
F03.00007: Learning Nonequilibrium Control Forces to Characterize Dynamical Phase Transitions Jiawei Yan, Hugo Touchette, Grant M Rotskoff Sampling the collective, dynamical fluctuations that lead to nonequilibrium pattern formation requires probing rare regions of trajectory space. Recent approaches to this problem based on importance sampling, cloning, and spectral approximations have yielded significant insight into nonequilibrium systems, but tend to scale poorly with the size of the system, especially near dynamical phase transitions. Here we propose a machine learning algorithm that samples rare trajectories and estimates the associated large deviation functions using a many-body control force, by leveraging the flexible function representation provided by deep neural networks, importance sampling in trajectory space, and stochastic optimal control theory. We show that this approach scales to hundreds of interacting particles and remains robust at dynamical phase transitions.
Tuesday, March 15, 2022, 9:48AM–10:00AM (Withdrawn)
F03.00008: A Bayesian Approach to Hyperbolic Embeddings Anoop Praturu, Tatyana O Sharpee Recent studies have increasingly demonstrated that hyperbolic geometry confers many advantages for analyzing hierarchical structure in complex systems. However, available embedding methods for hyperbolic spaces typically operate at fixed dimension (usually 2 or 3), do not vary curvature, and require knowledge of network connections between data points. To address these problems, we develop a Bayesian formulation of Multi-Dimensional Scaling (MDS) for embedding data in hyperbolic spaces that can fit the optimal values of geometric parameters such as curvature and dimension. We propose a novel, physics-based model of embedding uncertainty within this Bayesian framework which improves both the performance and the interpretability of the model. Because the method allows for variable curvature, it can also correctly embed Euclidean data using zero curvature, thus subsuming traditional Euclidean MDS models. We demonstrate that only a small amount of data is needed to constrain the geometry in our model and that the model is robust against false minima when scaling to large datasets. We show how the estimated geometry can be used to derive a new hierarchical clustering algorithm, and demonstrate its effectiveness for inferring latent hierarchical structure in the data. We demonstrate the capabilities of the model by applying it to a variety of biological datasets, uncovering hidden hierarchical relationships in datasets relating to aging and the COVID genome.
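The two ingredients of a hyperbolic MDS objective can be sketched in a few lines (this is a generic illustration, not the authors' Bayesian model; the curvature scaling and function names are assumptions): a geodesic distance in the Poincare-ball model, and a stress function comparing embedded to target distances.

```python
import numpy as np

def poincare_dist(u, v, curvature=-1.0):
    """Geodesic distance in the Poincare-ball model, rescaled by
    1/sqrt(-K) for constant curvature K < 0 (an illustrative convention)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / den) / np.sqrt(-curvature)

def stress(points, target, curvature=-1.0):
    """MDS-style stress: squared mismatch of embedded vs. target distances."""
    s = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            s += (poincare_dist(points[i], points[j], curvature)
                  - target[i][j]) ** 2
    return s

# equal Euclidean steps cost more geodesic distance near the boundary,
# which is why trees embed so efficiently in hyperbolic space
print(poincare_dist([0.0, 0.0], [0.5, 0.0]) <
      poincare_dist([0.4, 0.0], [0.9, 0.0]))   # → True
```

In a Bayesian treatment like the one above, `stress` would be replaced by a likelihood over observed distances, with `curvature` and the embedding dimension treated as parameters to be inferred.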
Tuesday, March 15, 2022, 10:00AM–10:12AM (Withdrawn)
F03.00009: A symbolic system that synthesizes an internal model of an algebraic theory of the data and prior knowledge Gonzalo de Polavieja Symbolic approaches to AI excel at mathematical transparency and reasoning, but without learning from data they have limited contact with the real world. Here we propose an approach inspired by Model Theory that combines the mathematical transparency of symbolic systems with the ability to learn internal models without the use of optimization. In a first step, we embed the properties of our data and prior formal knowledge into an algebraic theory consisting of first-order sentences using symbols that refer to objects, parts of objects or abstract concepts. In a second step, the system learns by synthesizing internal symbols, or atoms, that do not refer directly to items in the world but instead constitute a model of the algebraic theory. Specifically, we are interested in the freest atomized model: among all possible models of the algebraic theory, the one with the most negative sentences. We prove that this model is guaranteed to find a rule in the data if one exists and enough data are available. The subset of atoms of the freest model that is most stable during training is shown to be a generalizing model. For small datasets it can also obtain an approximation to, or even exactly, the underlying rule that the freest model finds in the large-data limit. We believe that these rule-seeking models open many new possibilities at the mathematical, cognitive and practical levels.
Tuesday, March 15, 2022, 10:12AM–10:24AM
F03.00010: Optimal learning despite a hundred distracting directions Michael C Abbott, Benjamin B Machta Learning from incomplete data requires a notion of measure on parameter space, which is most explicit in the Bayesian framework as a prior distribution. We demonstrate here that ostensibly neutral choices like Jeffreys prior can in fact introduce enormous bias in typical high-dimensional models. Models found in science typically have an effective dimensionality of accessible behaviors much smaller than the number of microscopic parameters. Naively using the invariant volume element, which treats all of these parameters equally, strongly distorts the measure projected onto the subspace of relevant parameters, due to variations in the local covolume of irrelevant directions. The fact that this covolume typically varies over many orders of magnitude is what introduces bias into predictions. We present results on principled choices of measure which avoid this issue and lead to unbiased posteriors. These measures allow optimal learning, despite the presence of many parameters which cannot be fixed.
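For concreteness, the "invariant volume element" the abstract refers to is the square root of the determinant of the Fisher information. A minimal one-parameter sketch (purely illustrative, not the authors' high-dimensional setting) for a Bernoulli model, where the Fisher information is I(p) = 1/(p(1-p)):

```python
import numpy as np

def jeffreys_prior_bernoulli(p):
    """Unnormalized Jeffreys prior, proportional to sqrt(det I(p));
    for Bernoulli(p), I(p) = 1/(p(1-p))."""
    return 1.0 / np.sqrt(p * (1.0 - p))

# the prior concentrates weight near the extremes p -> 0 and p -> 1
ps = np.linspace(0.05, 0.95, 19)
w = jeffreys_prior_bernoulli(ps)
print(w[0] > w[9])   # more weight at p = 0.05 than at p = 0.5 → True
```

In one dimension this reweighting is mild; the abstract's point is that in models with many irrelevant directions, the analogous sqrt-determinant factor can vary over many orders of magnitude and dominate the posterior.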
Tuesday, March 15, 2022, 10:24AM–10:36AM
F03.00011: Memory, Prediction and Computation in the Kuramoto model Chanin Kumpeerakij, David J Schwab, Thiparat Chotibut, Vudtiwat Ngampruetikorn Nonlinear dynamical systems, such as recurrent neural networks, have proved to be powerful models for temporal data, exhibiting remarkable predictive capacity even for chaotic time series. However, such performance relies on finding the right parameter regimes, a challenging process for the large dynamical systems required to model complex data. Here we investigate the computational capability of interacting phase oscillators, described by the Kuramoto model and coupled to synthetic input data with tunable correlation times. Our approach enables systematic exploration of qualitatively distinct parameter regimes, separated by phase transitions, as well as how they interact with the structure in the data. We use information-theoretic measures to quantify the memory and predictive capacities of many-oscillator systems and analyze their computational efficiency through the lens of the information bottleneck principle. Our work offers insight into the emergence of computation from the collective behaviors of large dynamical systems.
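A minimal sketch of the model underlying this abstract (without the input coupling or information-theoretic analysis; the coupling values and system size are illustrative): the mean-field Kuramoto equations integrated with a simple Euler step, using the complex order parameter both to drive the dynamics and to measure synchronization.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_kuramoto(K, n=200, dt=0.01, steps=5000):
    """Euler-integrate dtheta_i/dt = omega_i + (K/n) sum_j sin(theta_j - theta_i),
    written in mean-field form via the complex order parameter z."""
    omega = rng.standard_normal(n)             # natural frequencies
    theta = rng.uniform(0.0, 2.0 * np.pi, n)   # random initial phases
    for _ in range(steps):
        z = np.mean(np.exp(1j * theta))        # complex order parameter
        # identity: (1/n) sum_j sin(theta_j - theta_i) = Im(z e^{-i theta_i})
        theta += dt * (omega + K * np.imag(z * np.exp(-1j * theta)))
    return float(np.abs(np.mean(np.exp(1j * theta))))   # r in [0, 1]

# below vs. above the synchronization transition
r_weak, r_strong = simulate_kuramoto(0.5), simulate_kuramoto(5.0)
print(r_weak, r_strong)
```

The jump in the order parameter r between weak and strong coupling is the phase transition the abstract exploits: the distinct dynamical regimes on either side of it support qualitatively different memory and prediction capacities.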
© 2024 American Physical Society. All rights reserved.