Bulletin of the American Physical Society
2024 APS March Meeting
Monday–Friday, March 4–8, 2024; Minneapolis & Virtual
Session T28: Statistical Physics Meets Machine Learning II (Focus Session)
Sponsoring Units: GSNP DSOFT GDS
Chair: David Schwab, The Graduate Center, CUNY
Room: 101I
Thursday, March 7, 2024 11:30AM - 12:06PM
T28.00001: Scaling Laws and Emergent Behaviors in Foundation Models Invited Speaker: Irina Rish Large-scale unsupervised pre-trained models, a.k.a. "foundation models", are taking the AI field by storm, achieving state-of-the-art performance and impressive few-shot generalization abilities on a variety of tasks in multiple domains. Clearly, predicting the performance and other metrics of interest (robustness, truthfulness, etc.) at scale, including potential emergent behaviors, is crucial for (1) choosing learning methods that are likely to stand the test of time as larger compute becomes available, and (2) ensuring the safe behavior of AI systems by anticipating potential emergent behaviors ("phase transitions"). We investigate both an "open-box" approach, in which access to the learning dynamics and internal metrics of a neural network is available (e.g., in the case of "grokking" behavior), and a "closed-box" approach, in which predictions of future behavior must be made solely from previous behavior, without internal measurements of the system being available.
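As a concrete illustration of the "closed-box" setting described above, the minimal sketch below fits a saturating power law to hypothetical small-scale loss-versus-compute measurements and extrapolates to a larger compute budget. The functional form, data points, and budgets are illustrative assumptions, not results from the talk.

```python
# Minimal sketch (not the speaker's code): fit a saturating power law
# L(C) = a * C**(-b) + c to hypothetical loss-vs-compute measurements and
# extrapolate to a larger compute budget, in the spirit of "closed-box"
# prediction of future behavior from previous behavior alone.
import numpy as np
from scipy.optimize import curve_fit

def power_law(C, a, b, c):
    return a * C**(-b) + c

# Hypothetical (compute, validation loss) pairs from small-scale runs.
compute = np.array([1e15, 3e15, 1e16, 3e16, 1e17])
loss = np.array([2.12, 1.95, 1.80, 1.68, 1.56])

params, _ = curve_fit(power_law, compute, loss, p0=(100.0, 0.1, 1.0), maxfev=10000)
a, b, c = params
print(f"fitted exponent b = {b:.3f}, irreducible loss c = {c:.3f}")
print(f"predicted loss at 1e19 FLOPs: {power_law(1e19, *params):.3f}")
```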
Thursday, March 7, 2024 12:06PM - 12:18PM
T28.00002: Reliable emulation of complex functionals by active learning with error control Xinyi Fang, Mengyang Gu, Jianzhong Wu A statistical emulator can be used as a surrogate for complex physics-based calculations to drastically reduce the computational cost. Its successful implementation hinges on an accurate representation of the nonlinear response surface over a high-dimensional input space. Conventional "space-filling" designs, including random sampling and Latin hypercube sampling, become inefficient as the dimensionality of the input variables increases, and the predictive accuracy of the emulator can degrade substantially for a test input far from the training input set. To address this fundamental challenge, we develop a reliable emulator for predicting complex functionals by active learning with error control (ALEC). The algorithm is applicable to infinite-dimensional mappings and provides high-fidelity predictions with a controlled predictive error. Its computational efficiency is demonstrated by emulating classical density functional theory (cDFT) calculations, a statistical-mechanical method widely used in modeling the equilibrium properties of complex molecular systems. We show that ALEC is more accurate than conventional emulators based on Gaussian processes with "space-filling" designs and alternative active learning methods, and it is computationally more efficient than direct cDFT calculations. ALEC can be a reliable building block for emulating expensive functionals owing to its minimal computational cost, controllable predictive error, and fully automatic operation.
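The sketch below illustrates the general idea of error-controlled active learning with a Gaussian-process surrogate: keep querying the expensive calculation at the input with the largest predictive uncertainty until that uncertainty falls below a tolerance. It is a generic illustration, not the authors' ALEC algorithm; `expensive_model`, the kernel, and the tolerance are stand-ins.

```python
# Minimal sketch of error-controlled active learning with a Gaussian-process
# surrogate (generic illustration, not the authors' ALEC implementation or a
# cDFT functional). "expensive_model" stands in for the costly calculation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_model(x):
    # Placeholder for a physics-based calculation (e.g., a cDFT functional).
    return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1] ** 2)

rng = np.random.default_rng(0)
pool = rng.uniform(-1, 1, size=(500, 2))     # candidate inputs
train_idx = list(range(10))                  # small initial design
tol = 0.05                                   # target predictive error

for _ in range(50):
    X = pool[train_idx]
    y = expensive_model(X)
    gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
    gp.fit(X, y)
    _, std = gp.predict(pool, return_std=True)
    std[train_idx] = 0.0                     # already evaluated
    worst = int(np.argmax(std))
    if std[worst] < tol:                     # predictive error under control
        break
    train_idx.append(worst)                  # query the most uncertain input

print(f"stopped after {len(train_idx)} expensive evaluations, "
      f"max predicted std = {std.max():.3f}")
```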
Thursday, March 7, 2024 12:18PM - 12:30PM
T28.00003: Towards measuring generalization performance of deep neural networks via the Fisher information matrix Chase W Goddard, David J Schwab The problem of generalization in deep neural networks (DNNs) with many parameters is still not well understood. In particular, there is clear empirical evidence that DNNs generalize well even in the overparameterized regime, where the networks have many more parameters than there are training examples, and generically do not overfit the training data. While some measures, such as the flatness of the minimum found by the optimizer (Jiang et al., 2019), have been shown empirically to correlate well with the generalization ability of a model, these measures often work only in particular regimes (Kaur et al., 2022). Here, we aim to construct a generalization measure based on the Fisher information matrix of a model, which we show is tractable to compute for large models. We provide theoretically motivated intuition for why our Fisher-based measure should be predictive of generalization. Further, we investigate the performance of our measure in various settings and across choices of hyperparameters, and compare its performance to traditional generalization measures such as the flatness of the loss function. We show that our measure predicts generalization performance across a range of settings.
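A common, computationally cheap proxy for the Fisher information of a classifier is the expected squared norm of per-example gradients of log p(y|x), which equals the trace of the Fisher matrix. The sketch below estimates this quantity for a small PyTorch model; it illustrates the kind of object involved and is not necessarily the authors' exact measure.

```python
# Minimal sketch (a generic proxy, not necessarily the authors' measure):
# estimate the trace of the Fisher information matrix of a small classifier
# as the mean squared norm of per-example gradients of log p(y|x), with the
# labels y drawn from the model's own predictive distribution.
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3)
)
X = torch.randn(256, 20)  # hypothetical inputs

def fisher_trace(model, X):
    trace = 0.0
    for x in X:
        logits = model(x)
        probs = F.softmax(logits, dim=0)
        y = torch.multinomial(probs, 1).item()   # sample label from the model
        loss = -F.log_softmax(logits, dim=0)[y]  # -log p(y|x)
        grads = torch.autograd.grad(loss, list(model.parameters()))
        trace += sum((g ** 2).sum().item() for g in grads)
    return trace / len(X)

print(f"estimated tr(F) = {fisher_trace(model, X):.4f}")
```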
Thursday, March 7, 2024 12:30PM - 12:42PM
T28.00004: Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos Dayal Singh Kalra, Tianyu He, Maissam Barkeshli In the gradient descent dynamics of neural networks, the top eigenvalue of the Hessian of the loss (the sharpness) displays a variety of robust phenomena throughout training. These include an early-time regime in which the sharpness may decrease (sharpness reduction), and later-time behaviors such as progressive sharpening and the edge of stability. We demonstrate that a simple two-layer linear network (the uv model) trained on a single training example exhibits all of the essential sharpness phenomenology observed in real-world scenarios. By analyzing the structure of dynamical fixed points in function space and the vector field of function updates, we uncover the underlying mechanisms behind these sharpness trends. Our analysis reveals (i) the mechanism behind early sharpness reduction and progressive sharpening, (ii) the conditions required for the edge of stability, and (iii) a period-doubling route to chaos on the edge-of-stability manifold as the learning rate is increased. Finally, we demonstrate that various predictions from this simplified model generalize to real-world scenarios and discuss its limitations.
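The uv model mentioned above is simple enough to simulate in a few lines. The sketch below trains f(x) = u·v·x by gradient descent on the single example (x, y) = (1, 1) and tracks the sharpness, i.e. the top eigenvalue of the 2x2 Hessian of the squared loss; the initialization and learning rate are illustrative choices, not the paper's.

```python
# Minimal sketch of the uv model: f(x) = u*v*x trained by gradient descent on a
# single example (x, y) = (1, 1), tracking the sharpness (top eigenvalue of the
# 2x2 Hessian of L = 0.5*(u*v - y)^2) through training. Values are illustrative.
import numpy as np

def sharpness(u, v, y=1.0):
    H = np.array([[v * v, 2 * u * v - y],
                  [2 * u * v - y, u * u]])   # exact Hessian of the loss
    return np.linalg.eigvalsh(H)[-1]

u, v = 2.0, 0.05                             # unbalanced initialization
lr = 0.05
for step in range(200):
    r = u * v - 1.0                          # residual on the single example
    gu, gv = r * v, r * u                    # gradients of L
    u, v = u - lr * gu, v - lr * gv
    if step % 50 == 0:
        print(f"step {step:3d}  loss {0.5 * r * r:.4f}  "
              f"sharpness {sharpness(u, v):.3f}")
```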
Thursday, March 7, 2024 12:42PM - 1:18PM
T28.00005: Statistical Mechanics of Semantic Compression Invited Speaker: Tankut U Can I consider the problem of semantic compression: how short can you make a message while keeping its meaning? This can be quantified by introducing a measure of semantic similarity. The idea of a continuous semantic vector space has gained traction in both experimental cognitive psychology and machine learning and artificial intelligence. In such a space, input data such as words, text, or pictures are mapped to high-dimensional vectors, and the relationships between these "semantic embeddings" determine their meaning. This suggests that a natural metric for semantic similarity is simply the Euclidean distance in this semantic vector space. Equipped with this, I formulate a combinatorial optimization problem for determining the minimal length of a message that satisfies a bound on the semantic distance, or distortion. This optimization problem can be mapped to a statistical-mechanical model of a two-spin Hopfield spin glass, which can be solved using replica theory. In this work, I map out the phase diagram of this model in an idealized setting in which the elements of a finite but large lexicon all have random embeddings. I will finally comment on extensions and implications of these results for semantic compression with more realistic embeddings, such as those encountered in machine learning.
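The combinatorial optimization problem can be stated very concretely. The brute-force sketch below, for a small random lexicon, finds the shortest sub-message whose mean embedding stays within a Euclidean distortion bound of the full message's embedding; it only illustrates the problem setup, not the replica-theory solution, and the embedding model and distortion bound are illustrative assumptions.

```python
# Minimal sketch of the combinatorial problem (random embeddings, brute force;
# not the speaker's replica calculation): find the shortest sub-message whose
# mean embedding stays within a Euclidean distortion bound of the full message.
import itertools
import numpy as np

rng = np.random.default_rng(1)
d, n_words = 16, 10
lexicon = rng.normal(size=(n_words, d)) / np.sqrt(d)   # random word embeddings
message = list(range(n_words))                          # the full message
target = lexicon[message].mean(axis=0)                  # its semantic embedding
D = 0.25                                                # distortion bound

best = message
for k in range(1, n_words + 1):
    candidates = [
        s for s in itertools.combinations(message, k)
        if np.linalg.norm(lexicon[list(s)].mean(axis=0) - target) <= D
    ]
    if candidates:
        best = min(candidates, key=lambda s: np.linalg.norm(
            lexicon[list(s)].mean(axis=0) - target))
        break

print(f"minimal compressed length: {len(best)} of {n_words} words")
```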
Thursday, March 7, 2024 1:18PM - 1:30PM
T28.00006: Bounds on learning with power-law priors Sean A Ridout, Ilya M Nemenman, Ard A Louis, Chris Mingard, Radosław Grabarczyk, Kamaludin Dingle, Guillermo Valle Pérez, Charles London Modern machine-learning architectures often achieve good generalization despite having enough parameters to express any function on the training data. This is surprising, since such flexibility suggests they should "overfit" and generalize poorly. In order to generalize well in the regime where any function can be expressed, a learning machine must have a good "inductive bias": although any function may be expressed, some must be strongly disfavored. We study the inductive biases of many expressive classifiers through the distribution of functions produced by random parameter values, a proxy for their induced Bayesian priors and the corresponding inductive bias. These experiments reveal a universal power-law, "Zipfian" prior in the space of functions. Here we rationalize the universality of this prior by studying the implications of power-law tails in the prior for Bayesian learning in the overparameterized regime. We show that any tail broader than Zipfian implies that a learning machine will fail to generalize on unseen data, while a narrower tail limits the number of functions that can be learned. This implies that the type of prior distribution seen in commonly used learning machines is the only type of prior that allows successful learning in the overparameterized regime.
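The "distribution of functions produced by random parameter values" can be probed numerically. The sketch below samples random parameters of a small ReLU network, records the Boolean function it computes on all 2^5 inputs, and fits the slope of the rank-frequency plot; a slope near -1 would indicate a Zipfian prior. The architecture and sample sizes are illustrative, not the authors' experimental setup.

```python
# Minimal sketch (illustrative, not the authors' experiments): sample random
# parameters of a small ReLU network, record the Boolean function it computes
# on all 2^5 inputs, and inspect the rank-frequency ("Zipf") plot of functions.
from collections import Counter
import itertools
import numpy as np

rng = np.random.default_rng(0)
inputs = np.array(list(itertools.product([0, 1], repeat=5)), dtype=float)

def random_function(inputs, hidden=16):
    W1 = rng.normal(size=(inputs.shape[1], hidden))
    b1 = rng.normal(size=hidden)
    w2 = rng.normal(size=hidden)
    h = np.maximum(inputs @ W1 + b1, 0.0)        # ReLU hidden layer
    return tuple((h @ w2 > 0).astype(int))        # Boolean labelling of all inputs

counts = Counter(random_function(inputs) for _ in range(100_000))
freqs = sorted(counts.values(), reverse=True)
ranks = np.arange(1, len(freqs) + 1)
# A roughly constant log-log slope near -1 would indicate a Zipfian prior.
slope = np.polyfit(np.log(ranks[:100]), np.log(freqs[:100]), 1)[0]
print(f"distinct functions sampled: {len(freqs)}, top-100 log-log slope ~ {slope:.2f}")
```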
Thursday, March 7, 2024 1:30PM - 1:42PM
T28.00007: In-depth analysis of the learning process for a small artificial neural network Xiguang Yang, Krish Arora, Michael Bachmann Machine learning and artificial neural networks are among the most rapidly advancing tools in many fields, including physics. Neural networks have already proven to be valuable optimization methods in numerous scientific applications. Although the potential hidden inside these network architectures is tremendous, unleashing it requires a thorough understanding of the deep-learning mechanism used to train the networks. In our study, we investigate the loss landscape and the backpropagation dynamics of convergence for the logical exclusive-OR (XOR) gate by means of one of the simplest artificial neural networks, composed of sigmoid neurons. We identify various optimal parameter sets of weights and biases that enable the correct logical mapping from the input neurons, via a single layer with two hidden neurons, to the output neuron. The state space of the neural network is a nine-dimensional loss landscape, but three-dimensional cross sections already exhibit distinct features such as plateaus and channels. Our analysis of the learning process helps explain why backpropagation efficiently achieves convergence toward zero loss while the values of the weights and biases keep drifting.
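For reference, the sketch below implements the setup described above: a 2-2-1 sigmoid network with nine parameters (weights and biases) trained on the XOR truth table by full-batch gradient descent with explicit backpropagation. The learning rate and initialization are illustrative; depending on the seed, training may converge to near-zero loss or linger on a plateau, which is part of the landscape structure discussed in the abstract.

```python
# Minimal sketch of a 2-2-1 sigmoid network (nine parameters) trained on XOR by
# full-batch gradient descent on the squared loss, with explicit backpropagation.
# Convergence to zero loss depends on the random initialization (some seeds
# land on a plateau), which is exactly the landscape structure studied above.
import numpy as np

rng = np.random.default_rng(3)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)   # 4 + 2 parameters
w2, b2 = rng.normal(size=2), rng.normal()              # 2 + 1 parameters
lr = 2.0

for step in range(20000):
    h = sigmoid(X @ W1 + b1)            # hidden layer
    out = sigmoid(h @ w2 + b2)          # output neuron
    err = out - y
    loss = 0.5 * np.mean(err ** 2)
    # Backpropagation through the two sigmoid layers.
    d_out = err * out * (1 - out) / len(X)
    d_h = np.outer(d_out, w2) * h * (1 - h)
    w2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum()
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print("final loss:", loss, "predictions:", np.round(out, 2))
```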
Thursday, March 7, 2024 1:42PM - 1:54PM
T28.00008: Understanding Neural Network Generalizability from the Perspective of Entropy Entao Yang, Xiaotian Zhang, Ge Zhang Neural networks (NNs) have shown remarkable success in solving a wide range of machine learning problems, ranging from image recognition to natural-language conversation (e.g., ChatGPT). The generalizability of NNs, which measures their ability to perform well on data unseen during training, is a critical evaluation metric for their usefulness in real-world applications. However, the underlying mechanism that determines NNs' varying degrees of generalizability remains an open question. While it has long been suspected in the machine learning community that NNs at a flatter minimum of the loss-function landscape tend to generalize better, recent works suggest that flatness itself may not be sufficient to determine generalizability. In statistical physics, the flatness of a minimum in an energy landscape can be quantified by entropy. We therefore calculate the entropy of the loss-function landscape of NNs using Wang-Landau molecular dynamics and explore the potential correlation between this entropy and NNs' generalizability. Testing on both synthetic and real-world datasets, we find that the entropy-equilibrium state is better than, or at least comparable to, the state reachable via a classical training optimizer (e.g., stochastic gradient descent).
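In the Wang-Landau picture, the entropy S(L) is the logarithm of the density of states at loss L. The sketch below estimates S(L) with plain Wang-Landau Monte Carlo for a two-parameter linear-regression landscape; it only illustrates the quantity being computed and is far simpler than the Wang-Landau molecular dynamics applied to real networks in this work. The toy model, loss window, and flatness criterion are illustrative assumptions.

```python
# Minimal sketch: Wang-Landau Monte Carlo estimate of the density of states
# g(L) over loss values for a toy two-parameter regression model. S(L) = ln g(L)
# quantifies how many states sit near a minimum, i.e. how "flat" it is.
# (Illustration only; not the authors' Wang-Landau molecular dynamics.)
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=20)

def loss(w):
    return np.mean((X @ w - y) ** 2)

bins = np.linspace(0.0, 5.0, 51)             # loss window of interest
log_g = np.zeros(len(bins) - 1)              # running estimate of ln g(L)
hist = np.zeros_like(log_g)
ln_f = 1.0                                   # Wang-Landau modification factor

w = np.linalg.lstsq(X, y, rcond=None)[0]     # start near the minimum
b = np.digitize(loss(w), bins) - 1

while ln_f > 1e-2:
    for _ in range(10000):                   # proposals between flatness checks
        w_new = w + 0.1 * rng.normal(size=2)
        L_new = loss(w_new)
        if L_new < bins[-1]:
            b_new = np.digitize(L_new, bins) - 1
            if np.log(rng.uniform()) < log_g[b] - log_g[b_new]:
                w, b = w_new, b_new          # accept with prob g(b)/g(b_new)
        log_g[b] += ln_f
        hist[b] += 1
    visited = hist[hist > 0]
    if visited.min() > 0.8 * visited.mean():  # histogram roughly flat
        hist[:] = 0
        ln_f /= 2.0

entropy = log_g - log_g.max()                 # S(L) up to an additive constant
print("relative entropy of the lowest-loss bins:", np.round(entropy[:5], 2))
```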
Thursday, March 7, 2024 1:54PM - 2:06PM
T28.00009: Deep Variational Multivariate Information Bottleneck K. Michael Martini, Eslam Abdelaleem, Ilya M Nemenman Variational dimensionality reduction methods are known for their high accuracy, generative abilities, and robustness. We introduce a unifying principle rooted in information theory to rederive, generalize, and design variational methods. We base our framework on an interpretation of the multivariate information bottleneck, in which the information in an encoder graph is traded off against the information in a decoder graph. The encoder graph specifies the compression of the data, and the decoder graph specifies a generative model for the data. Using this framework, we can rederive the deep variational information bottleneck and variational autoencoders, and we generalize deep variational CCA (DVCCA) to beta-DVCCA. We also design a new method, the deep variational symmetric information bottleneck (DVSIB), which simultaneously compresses two variables so as to preserve information between their compressed representations. We implement all of these algorithms and evaluate their ability to produce shared low-dimensional latent spaces on a modified noisy MNIST dataset. We show that algorithms better matched to the structure of the data (beta-DVCCA and DVSIB in our case) produce better latent spaces, as measured by classification accuracy and the dimensionality of the latent variables.
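For orientation, the sketch below writes out the standard deep variational information bottleneck objective, one of the methods the framework rederives: a stochastic encoder q(z|x), a decoder p(y|z), and a loss that trades the prediction term against beta times a KL term to a standard-normal prior. It is a generic PyTorch illustration, not the authors' DVSIB implementation; the architecture, beta, and the random stand-in for noisy MNIST are assumptions.

```python
# Minimal sketch of the standard deep variational information bottleneck
# objective (illustrative; not the authors' DVSIB code): stochastic encoder
# q(z|x), decoder p(y|z), and loss = cross-entropy + beta * KL(q(z|x) || N(0, I)).
import torch
import torch.nn.functional as F

class VIB(torch.nn.Module):
    def __init__(self, in_dim=784, z_dim=16, n_classes=10):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, 2 * z_dim))          # outputs mean and log-variance
        self.decoder = torch.nn.Linear(z_dim, n_classes)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.decoder(z), mu, logvar

def vib_loss(logits, y, mu, logvar, beta=1e-3):
    ce = F.cross_entropy(logits, y)                                     # decoder term
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()  # encoder term
    return ce + beta * kl

model = VIB()
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # stand-in for noisy MNIST
logits, mu, logvar = model(x)
print("loss:", vib_loss(logits, y, mu, logvar).item())
```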
Thursday, March 7, 2024 2:06PM - 2:18PM
T28.00010: Nonlinear classification of neural manifolds with context information: geometrical properties and storage capacity Francesca Mignacco, Chi-Ning Chou, SueYeon Chung Understanding how neural systems process information through high-dimensional representations is a fundamental challenge at the interface of theoretical neuroscience and machine learning. A commonly adopted approach to this problem involves the analysis of statistical and geometrical attributes that link neural activity to task implementation in high-dimensional spaces. Here, we explore an analytically solvable classification model that derives its decision-making rules from a collection of input-dependent "expert" neurons, each associated with distinct contexts through half-space gating mechanisms. This formulation allows us to consider tasks that are not linearly separable. We investigate the interplay between the geometry of object representations and the correlations within the context functions. By examining these connections, we aim to elucidate how these properties influence the disentanglement of representations, which we measure through the storage capacity.
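The sketch below sets up a toy version of such a model: expert linear readouts gated by random half-space context functions, with a crude numerical probe of storage capacity given by the fraction of random labelings of P random patterns that a least-squares fit followed by a sign readout can realize. This only lower-bounds the true capacity and is meant to make the model concrete, not to reproduce the authors' replica analysis; the dimensions and number of contexts are illustrative.

```python
# Minimal sketch (illustrative, not the authors' analysis): a classifier built
# from "expert" readouts gated by random half-space contexts, with a crude
# capacity probe: the fraction of random labelings of P random patterns that a
# least-squares fit plus a sign readout can realize (a lower bound on capacity).
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 4                                   # input dimension, number of contexts
contexts = rng.normal(size=(K, N))             # half-space gating vectors

def features(X):
    gates = (X @ contexts.T > 0).astype(float)            # context gates g_k(x)
    return np.concatenate([gates[:, k:k + 1] * X for k in range(K)], axis=1)

def fit_fraction(P, trials=20):
    hits = 0
    for _ in range(trials):
        X = rng.normal(size=(P, N))
        y = rng.choice([-1.0, 1.0], size=P)                # random labels
        w, *_ = np.linalg.lstsq(features(X), y, rcond=None)
        hits += np.all(np.sign(features(X) @ w) == y)
    return hits / trials

for P in (50, 100, 150, 200, 250):
    print(f"P = {P:3d} patterns: fraction of labelings realized = {fit_fraction(P):.2f}")
```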