Bulletin of the American Physical Society
APS March Meeting 2023
Volume 68, Number 3
Las Vegas, Nevada (March 5-10)
Virtual (March 20-22); Time Zone: Pacific Time
Session D02: Statistical Physics Meets Machine Learning I (Focus Session)
Sponsoring Units: GSNP DSOFT DBIO GDS | Chair: Yuhai Tu, IBM T. J. Watson Research Center | Room: Room 125
Monday, March 6, 2023 3:00PM - 3:36PM |
D02.00001: Scaling Laws in Deep Neural Networks: Insights from Statistical Mechanics and Exactly Solvable Models Invited Speaker: Yasaman Bahri Artificial deep neural networks are complex, nonlinear statistical models whose learning and function often depend strongly on the model structure and choice of data and algorithm. Empirically, it has been observed that the generalization ability of such networks in learning tasks is frequently governed by power-law trends with respect to simple scaling variables, such as the amount of data available to learn from and the number of learnable parameters. A full understanding -- and in particular, a prescriptive theoretical framework -- for what governs this scaling is lacking. Towards this end, I will discuss our work introducing a classification of different regimes of behavior -- notions of "resolution-limited" and "variance-limited" regimes -- based on the mechanistic origins behind the scaling. Along the way, I will review and then leverage insights from recently discovered exactly solvable models for deep neural networks, a setting in which we can derive the different regimes exactly. I'll close by discussing implications and remaining challenges. |
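[Illustrative sketch, not from the talk: power-law scaling of this kind is typically quantified by fitting log test error against log dataset size; the exponent, prefactor, and noise level below are made-up assumptions, shown only to make the fitting step concrete.]

import numpy as np

rng = np.random.default_rng(0)
n = np.logspace(2, 6, 9)                     # dataset sizes
alpha_true = 0.5                             # assumed scaling exponent
err = 3.0 * n ** (-alpha_true) * np.exp(0.05 * rng.standard_normal(n.size))

# fit log(err) = log(C) - alpha * log(n) by least squares
slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
print(f"estimated exponent alpha = {-slope:.3f}, prefactor C = {np.exp(intercept):.3f}")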
Monday, March 6, 2023 3:36PM - 3:48PM |
D02.00002: Results from a Mapping Between Reinforcement Learning and Non-Equilibrium Statistical Mechanics Jacob Adamczyk, Argenis Arriojas Maldonado, Stas Tiomkin, Rahul V Kulkarni Reinforcement learning (RL), a field of machine learning that can be used to solve sequential decision-making problems, has recently become a popular tool for obtaining solutions to a variety of complex problems in physics. Despite this success as a tool, there has been limited work focusing on the relationship between the theoretical frameworks of RL and statistical mechanics. Our recent work has established a mapping between average-reward entropy-regularized RL and non-equilibrium statistical mechanics (NESM) using large deviation theory. We highlight how this mapping allows one to approach problems in NESM from an RL perspective and vice versa. As an example, we discuss how results from RL research on "reward shaping" can be extended using the framework of statistical mechanics of trajectories. In this setting, we derive results in RL that are analogous to the Gibbs-Bogoliubov inequality in equilibrium statistical mechanics. We propose methods to iteratively improve this bound based on results from RL. The mapping established in our work can thus lead to new results and algorithms in both RL and NESM. |
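[Illustrative sketch, not the authors' construction: entropy-regularized RL replaces the hard max of the Bellman backup with a log-sum-exp ("soft max"), the free-energy-like structure that underlies mappings to the statistical mechanics of trajectories. The toy MDP below is discounted rather than average-reward, and all sizes and parameters (nS, nA, beta, gamma) are assumptions.]

import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma, beta = 5, 3, 0.9, 2.0
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s'] transition probabilities
R = rng.uniform(size=(nS, nA))                   # rewards r(s, a)

V = np.zeros(nS)
for _ in range(500):
    Q = R + gamma * P @ V                        # Q[s, a]
    V_new = np.log(np.exp(beta * Q).sum(axis=1)) / beta   # soft max over actions
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
print("soft state values:", np.round(V, 3))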
Monday, March 6, 2023 3:48PM - 4:00PM Author not Attending |
D02.00003: The Onset of Variance-Limited Behavior for Neural Networks at Finite Width and Sample Size Alexander B Atanasov, Cengiz Pehlevan, Blake Bordelon, Sabarish Sainathan For small training set sizes, the generalization error of wide neural networks is well-approximated by the error of an infinite-width neural network. However, beyond a certain training set size, the finite-width network's generalization begins to worsen compared to the infinite-width performance. We empirically study the transition from the infinite-width behavior to this variance-limited regime as a function of training set size, network width, and network initialization scale. We find that finite-size effects can become relevant for dataset sizes as small as the square root of the width for polynomial regression with ReLU networks. We discuss the source of this finite-size behavior based on the variance of the network's final neural tangent kernel (NTK). Using this, we provide a toy model that also exhibits the same scaling and has sample-size-dependent benefits from feature learning. |
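[Illustrative sketch, not the authors' experiment: one way to see finite-width fluctuations of the empirical NTK is to evaluate a kernel entry K(x, x') over many random initializations of a two-layer ReLU network and watch its variance shrink as the width grows. The architecture and all sizes below are assumptions.]

import numpy as np

def empirical_ntk_entry(x1, x2, width, rng):
    d = x1.size
    W = rng.standard_normal((width, d))          # first-layer weights
    v = rng.standard_normal(width)               # readout weights

    def grads(x):                                # gradient of f(x) = v . relu(W x) / sqrt(width)
        pre = W @ x
        act = np.maximum(pre, 0.0)
        dv = act / np.sqrt(width)
        dW = (v * (pre > 0))[:, None] * x[None, :] / np.sqrt(width)
        return np.concatenate([dv, dW.ravel()])

    return grads(x1) @ grads(x2)

rng = np.random.default_rng(2)
x1, x2 = rng.standard_normal(10), rng.standard_normal(10)
for width in (64, 256, 1024):
    samples = [empirical_ntk_entry(x1, x2, width, rng) for _ in range(200)]
    print(f"width {width:5d}: mean {np.mean(samples):.3f}, variance {np.var(samples):.4f}")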
Monday, March 6, 2023 4:00PM - 4:12PM Author not Attending |
D02.00004: Feature learning and overfitting in neural networks Francesco Cagnetta It is widely believed that the success of deep networks lies in their ability to learn a meaningful representation of the features of the data. Yet, understanding when and how this feature learning improves performance remains a challenge: for example, for a fixed task such as classifying images, feature learning is beneficial in modern architectures but detrimental in standard fully-connected feed-forward networks. Here we propose an explanation for this puzzle, by showing that feature learning can result in poor generalization performance as it leads to a `sparse' neural representation, where only a fraction of the connections in the original network are active. Although sparsity is known to be essential for learning anisotropies in the data, it is detrimental when the target function is constant or smooth along certain directions of input space. We illustrate this phenomenon in two settings: (i) regression of Gaussian random functions on the d-dimensional unit sphere and (ii) classification of benchmark image datasets. For (i), we compute the scaling of the generalization error with the number of training points analytically, thus showing quantitatively how methods that do not learn features generalize better if the target function is sufficiently smooth. For (ii), we show empirically that learning features can indeed lead to sparse and thus less smooth representations. Since an image classifier must be highly smooth with respect to small deformations of the image, this is likely the cause of poor performance. |
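[Illustrative sketch of the fixed-kernel baseline contrasted in the abstract, not the paper's setup: kernel ridge regression on inputs drawn from the unit sphere, tracking how the test error decays with the number of training points. The Laplace kernel, the smooth target, and the ridge strength are assumptions.]

import numpy as np

rng = np.random.default_rng(3)
d, n_test, ridge = 5, 500, 1e-6

def sphere(n):                                   # uniform points on the unit sphere
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def kernel(A, B):                                # Laplace kernel
    return np.exp(-np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1))

target = lambda X: np.cos(3.0 * X[:, 0])         # smooth target function
X_te = sphere(n_test)
y_te = target(X_te)

for n in (50, 200, 800):
    X_tr = sphere(n)
    alpha = np.linalg.solve(kernel(X_tr, X_tr) + ridge * np.eye(n), target(X_tr))
    y_pred = kernel(X_te, X_tr) @ alpha
    print(f"n = {n:4d}: test MSE = {np.mean((y_pred - y_te) ** 2):.2e}")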
Monday, March 6, 2023 4:12PM - 4:24PM |
D02.00005: Flatter, Faster: Scaling Momentum for Optimal Speedup of SGD Aditya Cowsik, Tankut U Can, Paolo Glorioso
|
Monday, March 6, 2023 4:24PM - 4:36PM |
D02.00006: Statistical Mechanics of Infinitely-Wide Convolutional Networks Alessandro Favero, Francesco Cagnetta, Matthieu Wyart Understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional tasks remains a challenge. A popular belief is that these models harness the translationally invariant, local, and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how this structure affects performance. To study this problem, we consider wide CNNs in the kernel limit, where generalisation can be characterised using statistical mechanics methods. We introduce a stylised teacher-student framework where a CNN is trained on the output of another CNN with random weights. In this framework, we control the structure of the target function by adding weight sharing and by tuning the size of the neuron receptive fields and the depth of the teacher network. First, we find that translational invariance does not change the scaling of learning curves, which measure the decay of the generalisation error with the number of training examples, and therefore is not enough to beat the curse of dimensionality. Then, we show that if the target function has a local structure, i.e., it depends only on low-dimensional subsets of adjacent input variables, CNNs beat the curse of dimensionality. In fact, the learning curve scaling is controlled by the dimension of these subsets and not by the full input dimension. Finally, we show that the hierarchical structure of CNNs is too rich to be efficiently learnable in high dimensions and discuss further classes of hierarchical target functions. |
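[Illustrative sketch, not the paper's teacher: in a stylised teacher-student setting, labels can be generated by a random one-hidden-layer convolutional teacher that only sees local patches of size s, so the target has the local structure discussed above. The patch size, number of filters, and nonlinearity are assumptions.]

import numpy as np

rng = np.random.default_rng(4)
d, s, n_filters = 16, 3, 8                       # input dim, receptive field size, channels

W = rng.standard_normal((n_filters, s))          # shared teacher filters
a = rng.standard_normal(n_filters)               # readout weights

def teacher(X):
    # slide the filters over all contiguous patches and average the responses
    patches = np.stack([X[:, i:i + s] for i in range(d - s + 1)], axis=1)  # (n, P, s)
    h = np.maximum(patches @ W.T, 0.0)           # ReLU responses, (n, P, n_filters)
    return (h @ a).mean(axis=1)                  # pool over patch positions

X = rng.standard_normal((1000, d))
y = teacher(X)
print("teacher-labelled examples:", X.shape[0], "- label std:", round(float(y.std()), 3))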
Monday, March 6, 2023 4:36PM - 4:48PM |
D02.00007: Phase diagram of training dynamics in deep neural networks: effect of learning rate, depth, and width Dayal Singh Kalra, Maissam Barkeshli We systematically analyze optimization dynamics in deep feed-forward neural networks (DNNs) trained with stochastic gradient descent (SGD) over long time scales and study carefully the effect of learning rate, depth, and width of the neural network. By analyzing the top eigenvalue λ_t of the Hessian of the loss, which is a proxy for sharpness, we find that the dynamics can show four distinct regimes: (i) an early time transient regime, (ii) an intermediate saturation regime, (iii) a progressive sharpening regime, and finally (iv) a late time "edge of stability" regime. The early and intermediate regimes (i) and (ii) exhibit a rich phase diagram depending on learning rate η = c/λ_0, depth d, and width w. We identify four critical values of c: c_critical, c_loss, c_sharp, and c_max, which separate qualitatively distinct phenomena. In particular, we discover a regime c_critical < c < c_loss, which opens up with increasing d/w, in which the sharpness decreases significantly but without an initial increase in the loss, violating the simple picture of catapulting out of a local basin and into a wider one by traversing up a barrier. Our results have important implications for the question of how to scale learning rate as the DNN depth and width are increased in order to remain in the same phase of learning. |
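[Illustrative sketch, not the authors' code: the top Hessian eigenvalue λ_0 at initialization can be estimated by power iteration on Hessian-vector products (here obtained by finite differences of an analytic gradient), after which the learning rate is set as η = c/λ_0. The logistic-regression loss and all sizes are assumptions.]

import numpy as np

rng = np.random.default_rng(5)
n, d = 200, 20
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, size=n).astype(float)

def grad(w):                                     # gradient of the mean logistic loss
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / n

def hvp(w, v, eps=1e-5):                         # Hessian-vector product by finite differences
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

w0 = rng.standard_normal(d)                      # parameters at "initialization"
v = rng.standard_normal(d)
for _ in range(100):                             # power iteration for the top eigenvector
    v = hvp(w0, v)
    v /= np.linalg.norm(v)
lam0 = v @ hvp(w0, v)                            # top Hessian eigenvalue lambda_0

c = 2.0                                          # dimensionless learning-rate scale
print(f"lambda_0 = {lam0:.4f}, learning rate eta = c / lambda_0 = {c / lam0:.4f}")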
Monday, March 6, 2023 4:48PM - 5:00PM |
D02.00008: Generative probabilistic matrix model of data with different low-dimensional latent structures Philipp Fleig, Ilya M Nemenman Complex biological and social features are often modelled by effective models with latent features, which serve the role of emergent, collective degrees of freedom. In many contexts, identification of such features needs to proceed directly from data. Unfortunately, we know very little about how different types of latent feature models manifest themselves in data, which makes inference hard. In this work, we investigate properties of data produced by different types of latent feature models, all described as special cases of a general model involving mixing of latent features. A key ingredient of our model is that we allow for statistical dependence between the mixing coefficients, as well as latent features with a statistically dependent structure. Latent dimensionality and correlation patterns of the data are controlled by three model parameters. The model's special cases include hierarchical clusters, sparse mixing, and non-negative mixing. We describe the correlation and eigenvalue distributions of these patterns within the general model and discuss how our model can be used to generate structured training data for supervised learning. |
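[Illustrative sketch, with structure and parameter names that are assumptions rather than the authors' model: data generated by mixing latent features, X = A F + noise, where different choices of the mixing coefficients A give sparse or non-negative mixing as mentioned above.]

import numpy as np

rng = np.random.default_rng(6)
n_samples, n_dims, n_latent = 500, 50, 4

F = rng.standard_normal((n_latent, n_dims))           # latent features
A = rng.standard_normal((n_samples, n_latent))        # mixing coefficients

A_sparse = A * (rng.uniform(size=A.shape) < 0.2)      # sparse mixing variant
A_nonneg = np.abs(A)                                  # non-negative mixing variant

X = A_nonneg @ F + 0.1 * rng.standard_normal((n_samples, n_dims))
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
print("leading covariance eigenvalues:", np.round(eigvals[:6], 2))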
Monday, March 6, 2023 5:00PM - 5:12PM |
D02.00009: The Evolution of the Fisher Information Matrix During Deep Neural Network Training Chase W Goddard, David J Schwab
|
Monday, March 6, 2023 5:12PM - 5:24PM |
D02.00010: When does Dual Dimensionality Reduction perform better than Single Dimensionality Reduction? Eslam Abdelaleem, K. Michael Martini, Ahmed H Roman, Ilya M Nemenman Current experiments in many fields often generate large-dimensional datasets with multiple modalities (e.g., neural activity and animal behavior). Often the first step in understanding these experiments is finding correlations between different modalities, which requires dimensionality reduction (DR). We previously introduced the concept of Dual Dimensionality Reduction (DDR) approaches, which simultaneously compress both data modalities to maximize the covariation between their reduced descriptions. We argued that DDR requires significantly fewer data points to detect correlations than performing DR on each modality independently and then identifying relations between the reduced descriptions. Here we use a generative model of multivariate correlated data and linear dimensionality reduction approaches to carefully explore under which conditions DDR methods outperform independent approaches. We extend the argument to nonlinear reduction methods as well, using Deep Canonical Correlation Analysis as a nonlinear DDR and autoencoders for independent reduction of individual modalities. We believe that our analysis points to a general principle that DDR methods are often more data efficient in detecting weak correlations than their independent DR equivalents. |
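[Illustrative sketch on toy data, not the paper's generative model: linear canonical correlation analysis (CCA) is a dual reduction in the above sense, compressing the two modalities X and Y jointly; the independent alternative would reduce each modality separately and correlate afterwards.]

import numpy as np

rng = np.random.default_rng(7)
n, dx, dy = 1000, 30, 40
z = rng.standard_normal(n)                            # shared latent signal
X = np.outer(z, rng.standard_normal(dx)) + rng.standard_normal((n, dx))
Y = np.outer(z, rng.standard_normal(dy)) + rng.standard_normal((n, dy))

def cca_top_correlation(X, Y, reg=1e-6):
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Cxx = Xc.T @ Xc / n + reg * np.eye(dx)
    Cyy = Yc.T @ Yc / n + reg * np.eye(dy)
    Cxy = Xc.T @ Yc / n
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    return np.sqrt(np.max(np.real(np.linalg.eigvals(M))))

print(f"top canonical correlation: {cca_top_correlation(X, Y):.3f}")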
Monday, March 6, 2023 5:24PM - 5:36PM |
D02.00011: Physics-Informed featurization of spectral functions Shubhang Goswami, Kipton M Barros, Matthew R Carbone Spectral functions are key observables of interacting many-body systems, and characterizing them is of great interest. We investigate two methods for approximating spectral functions via rational approximations, i.e., approximations as a ratio of two polynomials. The first approach, VFIT, approximates individual spectral functions of lattice polaron models in a reliable, simple, and accurate way. We also introduce a second fitting procedure, which we call the Smooth Rational Approximation (SRA), that simultaneously fits an entire batch of spectral functions. This fitting procedure can be regularized such that the predicted spectral functions vary smoothly with the governing physical parameters. |
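[Illustrative sketch, not VFIT or SRA themselves: a rational approximation fits a spectral function A(w) by a ratio of two polynomials p(w)/q(w). With q normalized so its constant term is 1, the condition A(w) q(w) = p(w) becomes a linear least-squares problem in the coefficients. The polynomial degrees and the Lorentzian test function are assumptions.]

import numpy as np

w = np.linspace(-5, 5, 400)
A = (0.5 / np.pi) / ((w - 1.0) ** 2 + 0.25)            # Lorentzian "spectral function"

deg_p, deg_q = 2, 2
Vp = np.vander(w, deg_p + 1, increasing=True)          # basis for p(w)
Vq = np.vander(w, deg_q + 1, increasing=True)[:, 1:]   # basis for q(w), constant term fixed to 1

# linearized fit: A(w) * (1 + sum_k b_k w^k) = sum_j a_j w^j, solved by least squares
M = np.hstack([Vp, -A[:, None] * Vq])
coeffs, *_ = np.linalg.lstsq(M, A, rcond=None)
a, b = coeffs[:deg_p + 1], coeffs[deg_p + 1:]

A_fit = (Vp @ a) / (1.0 + Vq @ b)
print(f"max abs error of rational fit: {np.max(np.abs(A_fit - A)):.2e}")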
Monday, March 6, 2023 5:36PM - 5:48PM |
D02.00012: Efficient Modelling of Ge15Te85 using Active Learning Methods Thomas Arbaugh, Francis W Starr Germanium-Telluride is a phase-change material (PCM) that shows promise for potential applications in advanced memory materials. In the 15:85 composition, several anomalous features, including a sharp density maximum and a likely fragile-to-strong transition in the dynamics, occur upon cooling. Unfortunately, accurate simulations of PCM materials typically rely on Density Functional Theory (DFT) and are very limited in the accessible size and time scales, making it difficult to model the properties of these materials. To overcome this challenge, we utilize recently developed machine-learning interatomic potentials (MLIPs) that enable the creation of lightweight and efficient potentials. These potentials are trained on and reproduce DFT-accurate descriptions of materials over a broad range of the phase diagram. We discuss active learning, compare training methods, and evaluate the ability of trained MLIPs to match experimentally known quantities of Ge15Te85. In particular, we find that these potentials reproduce the experimentally known structure to a high degree of accuracy. |
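[Illustrative sketch, not the authors' workflow: a common active-learning step for interatomic potentials is query by committee, where an ensemble of surrogate models is evaluated on candidate configurations and those with the largest ensemble disagreement are sent for new DFT calculations. The random linear models and descriptors below are placeholders purely to show the selection step.]

import numpy as np

rng = np.random.default_rng(8)
n_candidates, n_desc, n_models = 200, 10, 5

descriptors = rng.standard_normal((n_candidates, n_desc))   # candidate configurations
ensemble = rng.standard_normal((n_models, n_desc))           # committee of surrogate models

predictions = descriptors @ ensemble.T                       # (candidates, models)
disagreement = predictions.std(axis=1)                       # ensemble spread per configuration

n_select = 10
selected = np.argsort(disagreement)[-n_select:]              # most uncertain candidates
print("configurations selected for new DFT labels:", np.sort(selected))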
Monday, March 6, 2023 5:48PM - 6:00PM |
D02.00013: A simple model for Grokking modular arithmetic Andrey Gromov
|