Bulletin of the American Physical Society
APS March Meeting 2023
Volume 68, Number 3
Las Vegas, Nevada (March 5-10)
Virtual (March 20-22); Time Zone: Pacific Time
Session F02: Statistical Physics Meets Machine Learning II (Focus Session)
Sponsoring Units: GSNP DSOFT DBIO GDS
Chair: David Schwab, The Graduate Center, CUNY
Room: Room 125
Tuesday, March 7, 2023 8:00AM - 8:36AM
F02.00001: Statistical Physics and Geometry of Overparameterization Invited Speaker: Pankaj Mehta Modern machine learning often employs overparameterized statistical models with many more parameters than training data points. In this talk, I will review recent work from our group on such models, emphasizing intuitions centered on the bias-variance tradeoff and a new geometric picture for overparameterized regression.
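The geometric picture above concerns minimum-norm solutions of overparameterized least squares. As a point of reference (a generic toy, not the speaker's model; all parameter choices are illustrative), the following sketch fits a misspecified linear model with an increasing number p of features and shows how the test error typically peaks near the interpolation threshold p = n and can fall again for p >> n:

```python
# Minimal sketch of overparameterized regression: minimum-norm least squares
# with the first p of d features, n training samples.  Test error typically
# peaks near the interpolation threshold p = n ("double descent"-style curve).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 50, 500, 200          # samples, test points, ambient dimension
beta = rng.normal(size=d) / np.sqrt(d)     # teacher coefficients
X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr = X_tr @ beta + 0.1 * rng.normal(size=n_train)
y_te = X_te @ beta

for p in [10, 25, 45, 50, 55, 100, 200]:   # number of features used in the fit
    w = np.linalg.pinv(X_tr[:, :p]) @ y_tr          # minimum-norm solution
    test_mse = np.mean((X_te[:, :p] @ w - y_te) ** 2)
    print(f"p = {p:3d}  test MSE = {test_mse:.3f}")
```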
Tuesday, March 7, 2023 8:36AM - 8:48AM
F02.00002: Zipf's criticality in learning systems Sean A Ridout, Ilya M Nemenman Many high-dimensional complex systems, including biological ones such as populations of neurons, exhibit Zipf's law. That is, the $r$-th most frequently observed value is seen with frequency proportional to $1/r$. Although this has been proposed to be a signature of fine-tuning, previous work shows that Zipf's law can also emerge from a generic coupling between an observed system with many degrees of freedom and an unobserved fluctuating variable. In this context, the emergence of Zipf's law can be related to the fact that an observation of the large system tightly constrains the value of the unobserved variable. Recently, Zipf's law has been observed in the distribution of functions that may be produced by a neural network of a given architecture. We show that these results hold true for many learning machines (not necessarily deep networks) in regimes where learning is possible. This relates the observation of Zipf's law to the ability of a system to learn a model from data, also suggesting ways to improve learning algorithms.
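For orientation, the latent-variable mechanism mentioned above can be illustrated with a toy model (my construction, not the authors' code): binary units that are conditionally independent given a shared, unobserved fluctuating field can show Zipf-like rank-frequency statistics.

```python
# Sketch: N binary units coupled only through an unobserved field h.  The
# rank-frequency curve of observed N-bit patterns is approximately Zipf-like,
# P(rank r) ~ 1/r, over the frequent states.  All parameters are illustrative.
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)
N, n_samples = 20, 100_000
states = []
for _ in range(n_samples):
    h = rng.normal(scale=2.0)                   # unobserved fluctuating variable
    p_on = 1.0 / (1.0 + np.exp(-h))             # common bias of all units given h
    states.append(tuple(rng.random(N) < p_on))  # one observed N-bit pattern

counts = np.array(sorted(Counter(states).values(), reverse=True), dtype=float)
freqs = counts / counts.sum()
ranks = np.arange(1, len(freqs) + 1)
# Zipf's law predicts log f vs log r with slope near -1 for the frequent states.
slope = np.polyfit(np.log(ranks[:200]), np.log(freqs[:200]), 1)[0]
print(f"rank-frequency slope over top 200 states: {slope:.2f} (Zipf: about -1)")
```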
Tuesday, March 7, 2023 8:48AM - 9:00AM
F02.00003: How SGD noise affects performance in distinct regimes of deep learning Antonio Sclocchi, Mario Geiger, Matthieu Wyart Understanding when the noise in stochastic gradient descent (SGD) improves generalization of neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise, or `temperature' T, affects performance as the scale of initialization α is varied. α is a key parameter that controls whether the network is `lazy' and behaves as a kernel (α >> 1) or instead learns features (α << 1).
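The α-dependence described here follows the standard lazy-training parameterization, in which the output is the α-scaled, centered function f = α(g_θ − g_{θ_0}). A minimal sketch (the toy two-layer network and step-size choices are my assumptions, not the authors' setup) showing that parameters barely move for α >> 1 and move substantially for α << 1:

```python
# Sketch of the lazy vs feature-learning regimes controlled by an output scale
# alpha: f(x) = alpha * (g_theta(x) - g_theta0(x)).  For alpha >> 1 the
# parameters barely move (kernel-like regime); for alpha << 1 they move a lot.
import numpy as np

def train(alpha, steps=3000):
    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 10)
    y = np.sin(3 * x)                          # toy regression target
    w = rng.normal(size=16); a = rng.normal(size=16) / 4.0
    w0, a0 = w.copy(), a.copy()
    lr = 0.02 / alpha**2                       # standard rescaling of the step size
    for _ in range(steps):
        h, h0 = np.tanh(np.outer(x, w)), np.tanh(np.outer(x, w0))
        f = alpha * (h @ a - h0 @ a0)          # centered, alpha-scaled output
        err = f - y                            # dL/df for 0.5 * mean squared error
        grad_a = alpha * h.T @ err / len(x)
        grad_w = alpha * ((err[:, None] * (1 - h**2) * a) * x[:, None]).sum(0) / len(x)
        a -= lr * grad_a
        w -= lr * grad_w
    return np.sqrt(np.sum((w - w0)**2) + np.sum((a - a0)**2))

for alpha in [0.1, 1.0, 10.0]:
    print(f"alpha = {alpha:5.1f}   parameter movement = {train(alpha):.3f}")
```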
Tuesday, March 7, 2023 9:00AM - 9:12AM
F02.00004: Stochastic gradient descent introduces an effective landscape-dependent regularization favoring flat solutions Ning Yang, Yuhai Tu, Chao Tang Generalization is one of the most important problems in deep learning (DL). In the overparameterized regime of neural networks, there exist many low-loss solutions that fit the training data equally well. The key question is which solution is more generalizable. Empirical studies have shown a strong correlation between the flatness of the loss landscape at a solution and its generalizability, and that stochastic gradient descent (SGD) is crucial in finding flat solutions. To understand how SGD drives the learning system to flat solutions, we construct a simple model whose loss landscape has a continuous set of degenerate (or near-degenerate) minima. By solving the Fokker-Planck equation of the underlying stochastic learning dynamics, we show that, due to its strong anisotropy, the SGD noise introduces an additional effective loss term that decreases with flatness and whose overall strength increases with the learning rate and batch-to-batch variation. We find that this additional landscape-dependent SGD loss breaks the degeneracy and serves as an effective regularization for finding flat solutions.
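A toy version of this mechanism (my construction, not the authors' model) is a valley of degenerate minima whose curvature varies along the valley: minibatch noise then produces a landscape-dependent drift toward the flattest point, as in the sketch below.

```python
# Per-sample loss l_i(x, y) = 0.5 * a(x) * (y - eps_i)^2 with E[eps_i] = 0, so
# every (x, y = 0) minimizes the full loss, but the curvature a(x) = 1 + x^2 is
# smallest (flattest) at x = 0.  Minibatch SGD drifts along the valley to x = 0.
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(scale=1.0, size=10_000)     # fixed "dataset" of label noise
lr, batch, steps = 0.05, 8, 20_000
x, y = 2.0, 0.5                              # start away from the flattest point

for _ in range(steps):
    e = rng.choice(eps, size=batch).mean()   # minibatch average of the labels
    a, da = 1.0 + x**2, 2.0 * x              # curvature along the valley
    grad_y = a * (y - e)                     # minibatch gradient in y
    grad_x = 0.5 * da * (y - e) ** 2         # minibatch gradient in x
    y -= lr * grad_y
    x -= lr * grad_x

print(f"final x = {x:.3f}  (flattest minimum of the valley is at x = 0)")
```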
Tuesday, March 7, 2023 9:12AM - 9:24AM
F02.00005: Contrastive learning through non-equilibrium memory Arvind Murugan, Adam Strupp, Benjamin Scellier, Martin J Falk Learning algorithms based on backpropagation have been very powerful in silico, but alternatives based on local rules offer potential benefits for learning in physical systems. A broad class of such local learning rules - contrastive learning rules - requires comparing the spontaneous behavior of the system with its behavior when driven to a desired state. The fundamental physical requirements on memory needed for such contrastive learning are not yet understood. Here, we show how the simplest form of non-equilibrium memory in each `synapse' of a network allows for contrastive rules such as equilibrium propagation. In this framework, the free and clamped states are seen in sequence over time as part of a sawtooth-like protocol that breaks the symmetry in time. We identify principles for optimal protocols and determine the fundamental Landauer energy cost of supervised learning through physical dynamics. These principles are also applicable to mechanical, chemical, or other physical systems where non-equilibrium synaptic memory can naturally arise through ubiquitous feedback circuits.
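For reference, a generic equilibrium-propagation update on a tiny energy-based network (the standard contrastive rule with free and nudged phases; the network, nudging strength β, and task are my illustrative choices, not the authors' protocol):

```python
# Sketch of a contrastive local rule: relax a small energy-based network to its
# free equilibrium, then to a weakly clamped ("nudged") equilibrium, and update
# each weight from the difference of the local correlations in the two phases.
import numpy as np

rng = np.random.default_rng(0)
w1, w2 = 0.3, 0.2          # chain x -> h -> y, energy E = 0.5(h^2+y^2) - w1*x*h - w2*h*y
eta, beta = 0.2, 0.5       # learning rate and nudging strength

def relax(x, t=None):
    """Fixed-point relaxation of dE/dh = dE/dy = 0 (with beta-nudging toward t if given)."""
    h = y = 0.0
    for _ in range(100):
        h = w1 * x + w2 * y
        y = w2 * h if t is None else (w2 * h + beta * t) / (1.0 + beta)
    return h, y

for step in range(3000):
    x = rng.uniform(-1, 1)
    t = 0.5 * x                                  # target: output should learn 0.5 * x
    h0, y0 = relax(x)                            # free phase
    h1, y1 = relax(x, t)                         # nudged (weakly clamped) phase
    w1 += eta / beta * (x * h1 - x * h0)         # contrastive, purely local updates
    w2 += eta / beta * (h1 * y1 - h0 * y0)

print(f"free-phase output for x = 1: {relax(1.0)[1]:.3f}  (target: 0.5)")
```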
Tuesday, March 7, 2023 9:24AM - 9:36AM
F02.00006: Data-driven irreversibility measurement for biological patterns Junang Li, Chih-Wei Joshua Liu, Michal Szurek, Nikta Fakhri Thermodynamic irreversibility is a crucial property of living matter. Irreversible processes maintain the spatiotemporally complex structures and functions characteristic of living systems. Robust and general quantification of irreversibility remains a challenging task due to nonlinearities and the influence of many coupled degrees of freedom. Here we use deep learning to reveal tractable, low-dimensional representations of patterns in a canonical protein signaling process, the Rho-GTPase system, as well as in complex Ginzburg-Landau dynamics. We show that our representation recovers the activity levels and irreversibility trends for a range of patterns. Additionally, we find that our irreversibility estimates serve as a dynamical order parameter, distinguishing stable and chaotic dynamics in these nonlinear systems. Our method leverages advances in deep learning to quantify the nonequilibrium and nonlinear behavior of general, complex living processes.
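A simple data-driven irreversibility proxy of the kind alluded to here (a coarse-grained estimator used for illustration, not the authors' deep-learning pipeline) compares forward and time-reversed transition statistics of a trajectory:

```python
# Irreversibility proxy: sum over state pairs of p(i,j) * log[p(i,j) / p(j,i)]
# estimated from transition counts of a discretized trajectory.  A reversible
# process gives ~0; a driven process gives a positive value.
import numpy as np

def irreversibility(traj, n_states):
    counts = np.ones((n_states, n_states))          # +1 pseudocount avoids log(0)
    for a, b in zip(traj[:-1], traj[1:]):
        counts[a, b] += 1
    p = counts / counts.sum()
    return float(np.sum(p * np.log(p / p.T)))

rng = np.random.default_rng(0)
n, steps = 10, 100_000
for bias, label in [(0.5, "reversible (unbiased ring walk)"),
                    (0.8, "driven (biased ring walk)     ")]:
    state, traj = 0, []
    for _ in range(steps):
        traj.append(state)
        state = (state + 1) % n if rng.random() < bias else (state - 1) % n
    print(f"{label}: irreversibility estimate = {irreversibility(traj, n):.3f}")
```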
Tuesday, March 7, 2023 9:36AM - 9:48AM
F02.00007: Training elastic neural networks with the Hamiltonian Monte Carlo sampling algorithm Théophile Louvet, Vincent Maillou, Finn T Bohte, Lars Gebraad, Marc Serra-Garcia Because of their low damping and highly nonlinear characteristics, artificial neural networks (ANNs) made of nonlinear elastic resonators are promising candidates for low-power computing, as illustrated by recent demonstrations of passive speech recognition. However, designing information-processing elastic structures is a hard optimization problem: while the training of software-based ANNs can be facilitated by increasing the network size (converting local minima into saddle points) and by choosing activation functions with beneficial properties, there are usually hard limits on the size and activation functions in physically implemented neural networks. Here we train resource-constrained elastic ANNs by applying the Hamiltonian Monte Carlo method, a variant of the Metropolis-Hastings algorithm used in statistical physics to sample probability distributions whose landscapes present a large number of local minima. While our work focuses on computers consisting of physical elastic resonators, our conclusions can be applied to low-power, resource-constrained machine learning in general.
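For readers unfamiliar with the method, this is the standard Hamiltonian Monte Carlo loop (generic algorithm with an illustrative rugged potential, not the authors' elastic-network code): leapfrog integration of fictitious Hamiltonian dynamics followed by a Metropolis accept/reject step.

```python
# Generic Hamiltonian Monte Carlo: sample parameters q from p(q) ~ exp(-U(q)).
import numpy as np

def hmc_sample(U, grad_U, q0, n_samples=2000, eps=0.1, n_leapfrog=20, seed=0):
    rng = np.random.default_rng(seed)
    q, samples = np.array(q0, dtype=float), []
    for _ in range(n_samples):
        p = rng.normal(size=q.shape)                 # resample momenta
        q_new, p_new = q.copy(), p.copy()
        p_new -= 0.5 * eps * grad_U(q_new)           # leapfrog: half momentum step
        for _ in range(n_leapfrog - 1):
            q_new += eps * p_new                     # full position step
            p_new -= eps * grad_U(q_new)             # full momentum step
        q_new += eps * p_new
        p_new -= 0.5 * eps * grad_U(q_new)           # final half momentum step
        dH = (U(q_new) - U(q)) + 0.5 * (p_new @ p_new - p @ p)
        if rng.random() < np.exp(-dH):               # Metropolis acceptance
            q = q_new
        samples.append(q.copy())
    return np.array(samples)

# Usage on a rugged 1D "loss" with many local minima, where plain gradient
# descent would get stuck but HMC can still explore the low-loss regions.
U = lambda q: 0.5 * q[0]**2 + np.cos(5.0 * q[0])
grad_U = lambda q: np.array([q[0] - 5.0 * np.sin(5.0 * q[0])])
samples = hmc_sample(U, grad_U, [3.0])
print("mean, std of sampled q:", samples.mean(), samples.std())
```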
Tuesday, March 7, 2023 9:48AM - 10:00AM
F02.00008: Scalable and interpretable machine learning for inference in stochastic transcriptional systems Maria Carilli
Tuesday, March 7, 2023 10:00AM - 10:12AM
F02.00009: Finding the Function-Determining Subset of Amino Acids in Protein Sequence Data Peter Fields, Vudtiwat Ngampruetikorn, Rama Ranganathan, David J Schwab, Stephanie E Palmer Energy-based models (EBMs) fit to aligned sequences of a protein family have demonstrated the ability to generate novel functional protein sequences. This suggests that sequence-level statistics encode the salient features that underpin the structure and function of a protein. Understanding the extent to which EBMs can model such features is paramount to providing insight into their ability to sample new sequences and, consequently, into the biology. Specifically, we consider EBMs' ability to capture protein sectors, the roughly 10 to 20 percent of sequence positions that correlate strongly with biological function. To this end, we fit pairwise models, restricted Boltzmann machines (RBMs), and hybrid semi-restricted Boltzmann machines (sRBMs) to synthetic sequences drawn from a minimal model endowed with a notion of sectors. The aim of incorporating an RBM is to model the sector. EBMs fit to the synthetic data are benchmarked by directly relating how well they model the sector to their generative performance. These benchmarks guide insight into the generative performance of EBMs fit to real data, which are tested directly via lab experiments probing the functionality of sampled sequences.
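As a reference point for the model class, a minimal binary RBM trained with one-step contrastive divergence is sketched below (the binary visible units and toy "sector" data are my simplifications; the authors' RBMs/sRBMs for protein sequences use categorical amino-acid variables and are more elaborate):

```python
# Minimal binary RBM trained with one-step contrastive divergence (CD-1).
# Two correlated blocks of visible units stand in for a crude "sector".
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 16, 8, 0.05
W = 0.01 * rng.normal(size=(n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)

data = np.zeros((500, n_vis))                 # toy "sequences": two anticorrelated blocks
block = rng.integers(0, 2, size=500)
data[:, :8] = block[:, None]
data[:, 8:] = 1 - block[:, None]

for epoch in range(200):
    v0 = data[rng.integers(0, len(data), size=32)]        # minibatch of visible patterns
    ph0 = sigmoid(v0 @ W + b_h)                           # p(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b_v)                         # one-step reconstruction
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)         # CD-1 updates
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

ph = sigmoid(data @ W + b_h)
recon = sigmoid(ph @ W.T + b_v)
print(f"mean reconstruction error: {np.mean((data - recon) ** 2):.3f}")
```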