Bulletin of the American Physical Society
APS March Meeting 2023
Volume 68, Number 3
Las Vegas, Nevada (March 5-10)
Virtual (March 20-22); Time Zone: Pacific Time
Session F02: Statistical Physics Meets Machine Learning II (Focus Session)
Sponsoring Units: GSNP DSOFT DBIO GDS
Chair: David Schwab, The Graduate Center, CUNY
Room: Room 125
Tuesday, March 7, 2023 8:00AM - 8:36AM
F02.00001: Statistical Physics and Geometry of Overparameterization Invited Speaker: Pankaj Mehta Modern machine learning often employs overparameterized statistical models with many more parameters than training data points. In this talk, I will review recent work from our group on such models, emphasizing intuitions centered on the bias-variance tradeoff and a new geometric picture for overparameterized regression.
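The geometric picture above concerns minimum-norm solutions of overparameterized least squares. As a point of reference (a generic toy, not the speaker's model; all parameter choices are illustrative), the following sketch fits a misspecified linear model with an increasing number p of features and shows how the test error typically peaks near the interpolation threshold p = n and can fall again for p >> n:

```python
# Minimal sketch of overparameterized regression: minimum-norm least squares
# with the first p of d features, n training samples.  Test error typically
# peaks near the interpolation threshold p = n ("double descent"-style curve).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 50, 500, 200          # samples, test points, ambient dimension
beta = rng.normal(size=d) / np.sqrt(d)     # teacher coefficients
X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr = X_tr @ beta + 0.1 * rng.normal(size=n_train)
y_te = X_te @ beta

for p in [10, 25, 45, 50, 55, 100, 200]:   # number of features used in the fit
    w = np.linalg.pinv(X_tr[:, :p]) @ y_tr          # minimum-norm solution
    test_mse = np.mean((X_te[:, :p] @ w - y_te) ** 2)
    print(f"p = {p:3d}  test MSE = {test_mse:.3f}")
```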
Tuesday, March 7, 2023 8:36AM - 8:48AM
F02.00002: Zipf's criticality in learning systems Sean A Ridout, Ilya M Nemenman Many high-dimensional complex systems, including biological ones such as populations of neurons, exhibit Zipf's law. That is, the $r$-th most frequently observed value is seen with frequency proportional to $1/r$. Although this has been proposed to be a signature of fine-tuning, previous work shows that Zipf's law can also emerge from a generic coupling between an observed system with many degrees of freedom and an unobserved fluctuating variable. In this context, the emergence of Zipf's law can be related to the fact that an observation of the large system tightly constrains the value of the unobserved variable. Recently, Zipf's law has been observed in the distribution of functions that may be produced by a neural network of a given architecture. We show that these results hold true for many learning machines (not necessarily deep networks) in regimes where learning is possible. This relates the observation of Zipf's law to the ability of a system to learn a model from data, also suggesting ways to improve learning algorithms.
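For orientation, the latent-variable mechanism mentioned above can be illustrated with a toy model (my construction, not the authors' code): binary units that are conditionally independent given a shared, unobserved fluctuating field can show Zipf-like rank-frequency statistics.

```python
# Sketch: N binary units coupled only through an unobserved field h.  The
# rank-frequency curve of observed N-bit patterns is approximately Zipf-like,
# P(rank r) ~ 1/r, over the frequent states.  All parameters are illustrative.
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)
N, n_samples = 20, 100_000
states = []
for _ in range(n_samples):
    h = rng.normal(scale=2.0)                   # unobserved fluctuating variable
    p_on = 1.0 / (1.0 + np.exp(-h))             # common bias of all units given h
    states.append(tuple(rng.random(N) < p_on))  # one observed N-bit pattern

counts = np.array(sorted(Counter(states).values(), reverse=True), dtype=float)
freqs = counts / counts.sum()
ranks = np.arange(1, len(freqs) + 1)
# Zipf's law predicts log f vs log r with slope near -1 for the frequent states.
slope = np.polyfit(np.log(ranks[:200]), np.log(freqs[:200]), 1)[0]
print(f"rank-frequency slope over top 200 states: {slope:.2f} (Zipf: about -1)")
```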
Tuesday, March 7, 2023 8:48AM - 9:00AM
F02.00003: How SGD noise affects performance in distinct regimes of deep learning Antonio Sclocchi, Mario Geiger, Matthieu Wyart Understanding when the noise in stochastic gradient descent (SGD) improves generalization of neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise, or `temperature' T, affects performance as the scale of initialization α is varied. α is a key parameter that controls whether the network is `lazy' and behaves as a kernel (α >> 1) or instead learns features (α << 1).
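The α-dependence described here follows the standard lazy-training parameterization, in which the output is the α-scaled, centered function f = α(g_θ − g_{θ_0}). A minimal sketch (the toy two-layer network and step-size choices are my assumptions, not the authors' setup) showing that parameters barely move for α >> 1 and move substantially for α << 1:

```python
# Sketch of the lazy vs feature-learning regimes controlled by an output scale
# alpha: f(x) = alpha * (g_theta(x) - g_theta0(x)).  For alpha >> 1 the
# parameters barely move (kernel-like regime); for alpha << 1 they move a lot.
import numpy as np

def train(alpha, steps=3000):
    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 10)
    y = np.sin(3 * x)                          # toy regression target
    w = rng.normal(size=16); a = rng.normal(size=16) / 4.0
    w0, a0 = w.copy(), a.copy()
    lr = 0.02 / alpha**2                       # standard rescaling of the step size
    for _ in range(steps):
        h, h0 = np.tanh(np.outer(x, w)), np.tanh(np.outer(x, w0))
        f = alpha * (h @ a - h0 @ a0)          # centered, alpha-scaled output
        err = f - y                            # dL/df for 0.5 * mean squared error
        grad_a = alpha * h.T @ err / len(x)
        grad_w = alpha * ((err[:, None] * (1 - h**2) * a) * x[:, None]).sum(0) / len(x)
        a -= lr * grad_a
        w -= lr * grad_w
    return np.sqrt(np.sum((w - w0)**2) + np.sum((a - a0)**2))

for alpha in [0.1, 1.0, 10.0]:
    print(f"alpha = {alpha:5.1f}   parameter movement = {train(alpha):.3f}")
```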
Tuesday, March 7, 2023 9:00AM - 9:12AM
F02.00004: Stochastic gradient descent introduces an effective landscape-dependent regularization favoring flat solutions Ning Yang, Yuhai Tu, Chao Tang Generalization is one of the most important problems in deep learning (DL). In the overparameterized regime of neural networks, there exist many low-loss solutions that fit the training data equally well. The key question is which solution is more generalizable. Empirical studies have shown a strong correlation between the flatness of the loss landscape at a solution and its generalizability, and that stochastic gradient descent (SGD) is crucial in finding flat solutions. To understand how SGD drives the learning system to flat solutions, we construct a simple model whose loss landscape has a continuous set of degenerate (or near-degenerate) minima. By solving the Fokker-Planck equation of the underlying stochastic learning dynamics, we show that, due to its strong anisotropy, the SGD noise introduces an additional effective loss term that decreases with flatness and whose overall strength increases with the learning rate and batch-to-batch variation. We find that this additional landscape-dependent SGD loss breaks the degeneracy and serves as an effective regularization for finding flat solutions.
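A toy version of this mechanism (my construction, not the authors' model) is a valley of degenerate minima whose curvature varies along the valley: minibatch noise then produces a landscape-dependent drift toward the flattest point, as in the sketch below.

```python
# Per-sample loss l_i(x, y) = 0.5 * a(x) * (y - eps_i)^2 with E[eps_i] = 0, so
# every (x, y = 0) minimizes the full loss, but the curvature a(x) = 1 + x^2 is
# smallest (flattest) at x = 0.  Minibatch SGD drifts along the valley to x = 0.
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(scale=1.0, size=10_000)     # fixed "dataset" of label noise
lr, batch, steps = 0.05, 8, 20_000
x, y = 2.0, 0.5                              # start away from the flattest point

for _ in range(steps):
    e = rng.choice(eps, size=batch).mean()   # minibatch average of the labels
    a, da = 1.0 + x**2, 2.0 * x              # curvature along the valley
    grad_y = a * (y - e)                     # minibatch gradient in y
    grad_x = 0.5 * da * (y - e) ** 2         # minibatch gradient in x
    y -= lr * grad_y
    x -= lr * grad_x

print(f"final x = {x:.3f}  (flattest minimum of the valley is at x = 0)")
```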
Tuesday, March 7, 2023 9:12AM - 9:24AM
F02.00005: Contrastive learning through non-equilibrium memory Arvind Murugan, Adam Strupp, Benjamin Scellier, Martin J Falk Learning algorithms based on backpropagation have been very powerful in silico, but alternatives based on local rules offer potential benefits for learning in physical systems. A broad class of such local learning rules - contrastive learning rules - requires comparing the spontaneous behavior of the system with its behavior when driven to a desired state. The fundamental physical requirements on memory needed for such contrastive learning are not yet understood. Here, we show how the simplest form of non-equilibrium memory in each `synapse' of a network allows for contrastive rules such as equilibrium propagation. In this framework, the free and clamped states are seen in sequence over time as part of a sawtooth-like protocol that breaks the symmetry in time. We identify principles for optimal protocols and determine the fundamental Landauer energy cost of supervised learning through physical dynamics. These principles are also applicable to mechanical, chemical, or other physical systems where non-equilibrium synaptic memory can naturally arise through ubiquitous feedback circuits.
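For reference, a generic equilibrium-propagation update on a tiny energy-based network (the standard contrastive rule with free and nudged phases; the network, nudging strength β, and task are my illustrative choices, not the authors' protocol):

```python
# Sketch of a contrastive local rule: relax a small energy-based network to its
# free equilibrium, then to a weakly clamped ("nudged") equilibrium, and update
# each weight from the difference of the local correlations in the two phases.
import numpy as np

rng = np.random.default_rng(0)
w1, w2 = 0.3, 0.2          # chain x -> h -> y, energy E = 0.5(h^2+y^2) - w1*x*h - w2*h*y
eta, beta = 0.2, 0.5       # learning rate and nudging strength

def relax(x, t=None):
    """Fixed-point relaxation of dE/dh = dE/dy = 0 (with beta-nudging toward t if given)."""
    h = y = 0.0
    for _ in range(100):
        h = w1 * x + w2 * y
        y = w2 * h if t is None else (w2 * h + beta * t) / (1.0 + beta)
    return h, y

for step in range(3000):
    x = rng.uniform(-1, 1)
    t = 0.5 * x                                  # target: output should learn 0.5 * x
    h0, y0 = relax(x)                            # free phase
    h1, y1 = relax(x, t)                         # nudged (weakly clamped) phase
    w1 += eta / beta * (x * h1 - x * h0)         # contrastive, purely local updates
    w2 += eta / beta * (h1 * y1 - h0 * y0)

print(f"free-phase output for x = 1: {relax(1.0)[1]:.3f}  (target: 0.5)")
```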
Tuesday, March 7, 2023 9:24AM - 9:36AM
F02.00006: Data-driven irreversibility measurement for biological patterns Junang Li, Chih-Wei Joshua Liu, Michal Szurek, Nikta Fakhri Thermodynamic irreversibility is a crucial property of living matter. Irreversible processes maintain the spatiotemporally complex structures and functions characteristic of living systems. Robust and general quantification of irreversibility remains a challenging task due to nonlinearities and the influence of many coupled degrees of freedom. Here we use deep learning to reveal tractable, low-dimensional representations of patterns in a canonical protein signaling process, the Rho-GTPase system, as well as in complex Ginzburg-Landau dynamics. We show that our representation recovers the activity levels and irreversibility trends for a range of patterns. Additionally, we find that our irreversibility estimates serve as a dynamical order parameter, distinguishing stable and chaotic dynamics in these nonlinear systems. Our method leverages advances in deep learning to quantify the nonequilibrium and nonlinear behavior of general, complex living processes.
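A simple data-driven irreversibility proxy of the kind alluded to here (a coarse-grained estimator used for illustration, not the authors' deep-learning pipeline) compares forward and time-reversed transition statistics of a trajectory:

```python
# Irreversibility proxy: sum over state pairs of p(i,j) * log[p(i,j) / p(j,i)]
# estimated from transition counts of a discretized trajectory.  A reversible
# process gives ~0; a driven process gives a positive value.
import numpy as np

def irreversibility(traj, n_states):
    counts = np.ones((n_states, n_states))          # +1 pseudocount avoids log(0)
    for a, b in zip(traj[:-1], traj[1:]):
        counts[a, b] += 1
    p = counts / counts.sum()
    return float(np.sum(p * np.log(p / p.T)))

rng = np.random.default_rng(0)
n, steps = 10, 100_000
for bias, label in [(0.5, "reversible (unbiased ring walk)"),
                    (0.8, "driven (biased ring walk)     ")]:
    state, traj = 0, []
    for _ in range(steps):
        traj.append(state)
        state = (state + 1) % n if rng.random() < bias else (state - 1) % n
    print(f"{label}: irreversibility estimate = {irreversibility(traj, n):.3f}")
```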
Tuesday, March 7, 2023 9:36AM - 9:48AM
F02.00007: Training elastic neural networks with the Hamiltonian Monte Carlo sampling algorithm Théophile Louvet, Vincent Maillou, Finn T Bohte, Lars Gebraad, Marc Serra-Garcia Because of their low damping and highly nonlinear characteristics, artificial neural networks (ANNs) made of nonlinear elastic resonators are promising candidates for low-power computing, as illustrated by recent demonstrations of passive speech recognition. However, designing information-processing elastic structures is a hard optimization problem: while the training of software-based ANNs can be facilitated by increasing the network size (converting local minima into saddle points) and by choosing activation functions with beneficial properties, there are usually hard limits on the size and activation functions in physically implemented neural networks. Here we train resource-constrained elastic ANNs by applying the Hamiltonian Monte Carlo method, a variant of the Metropolis-Hastings algorithm used in statistical physics to sample probability distributions whose landscapes present a large number of local minima. While our work focuses on computers consisting of physical elastic resonators, our conclusions can be applied to low-power, resource-constrained machine learning in general.
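For readers unfamiliar with the method, this is the standard Hamiltonian Monte Carlo loop (generic algorithm with an illustrative rugged potential, not the authors' elastic-network code): leapfrog integration of fictitious Hamiltonian dynamics followed by a Metropolis accept/reject step.

```python
# Generic Hamiltonian Monte Carlo: sample parameters q from p(q) ~ exp(-U(q)).
import numpy as np

def hmc_sample(U, grad_U, q0, n_samples=2000, eps=0.1, n_leapfrog=20, seed=0):
    rng = np.random.default_rng(seed)
    q, samples = np.array(q0, dtype=float), []
    for _ in range(n_samples):
        p = rng.normal(size=q.shape)                 # resample momenta
        q_new, p_new = q.copy(), p.copy()
        p_new -= 0.5 * eps * grad_U(q_new)           # leapfrog: half momentum step
        for _ in range(n_leapfrog - 1):
            q_new += eps * p_new                     # full position step
            p_new -= eps * grad_U(q_new)             # full momentum step
        q_new += eps * p_new
        p_new -= 0.5 * eps * grad_U(q_new)           # final half momentum step
        dH = (U(q_new) - U(q)) + 0.5 * (p_new @ p_new - p @ p)
        if rng.random() < np.exp(-dH):               # Metropolis acceptance
            q = q_new
        samples.append(q.copy())
    return np.array(samples)

# Usage on a rugged 1D "loss" with many local minima, where plain gradient
# descent would get stuck but HMC can still explore the low-loss regions.
U = lambda q: 0.5 * q[0]**2 + np.cos(5.0 * q[0])
grad_U = lambda q: np.array([q[0] - 5.0 * np.sin(5.0 * q[0])])
samples = hmc_sample(U, grad_U, [3.0])
print("mean, std of sampled q:", samples.mean(), samples.std())
```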
Tuesday, March 7, 2023 9:48AM - 10:00AM
F02.00008: Scalable and interpretable machine learning for inference in stochastic transcriptional systems Maria Carilli
Tuesday, March 7, 2023 10:00AM - 10:12AM
F02.00009: Finding the Function-Determining Subset of Amino Acids in Protein Sequence Data Peter Fields, Vudtiwat Ngampruetikorn, Rama Ranganathan, David J Schwab, Stephanie E Palmer Energy-based models (EBMs) fit to aligned sequences of a protein family have demonstrated the ability to generate novel functional protein sequences. This suggests that sequence-level statistics encode the salient features that underpin the structure and function of a protein. Understanding the extent to which EBMs can model such features is paramount to providing insight into their ability to sample new sequences and, consequently, into the biology. Specifically, we consider EBMs' ability to capture protein sectors, the roughly 10 to 20 percent of sequence positions that correlate strongly with biological function. To this end, we fit pairwise models, restricted Boltzmann machines (RBMs), and hybrid semi-restricted Boltzmann machines (sRBMs) to synthetic sequences drawn from a minimal model endowed with a notion of sectors. The aim of incorporating an RBM is to model the sector. EBMs fit to the synthetic data are benchmarked by directly relating how well they model the sector to their generative performance. These benchmarks guide insight into the generative performance of EBMs fit to real data, which are tested directly via lab experiments probing the functionality of sampled sequences.
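As a reference point for the model class, a minimal binary RBM trained with one-step contrastive divergence is sketched below (the binary visible units and toy "sector" data are my simplifications; the authors' RBMs/sRBMs for protein sequences use categorical amino-acid variables and are more elaborate):

```python
# Minimal binary RBM trained with one-step contrastive divergence (CD-1).
# Two correlated blocks of visible units stand in for a crude "sector".
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 16, 8, 0.05
W = 0.01 * rng.normal(size=(n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)

data = np.zeros((500, n_vis))                 # toy "sequences": two anticorrelated blocks
block = rng.integers(0, 2, size=500)
data[:, :8] = block[:, None]
data[:, 8:] = 1 - block[:, None]

for epoch in range(200):
    v0 = data[rng.integers(0, len(data), size=32)]        # minibatch of visible patterns
    ph0 = sigmoid(v0 @ W + b_h)                           # p(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b_v)                         # one-step reconstruction
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)         # CD-1 updates
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

ph = sigmoid(data @ W + b_h)
recon = sigmoid(ph @ W.T + b_v)
print(f"mean reconstruction error: {np.mean((data - recon) ** 2):.3f}")
```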