Bulletin of the American Physical Society
APS March Meeting 2022
Volume 67, Number 3
Monday–Friday, March 14–18, 2022; Chicago
Session B42: Statistical Physics Meets Machine Learning (Invited; Live Streamed)
Sponsoring Units: GSNP | Chair: Herbie Levine | Room: McCormick Place W-375A
Monday, March 14, 2022, 11:30AM - 12:06PM
B42.00001: Statistical physics insights on learning in high dimensions
Invited Speaker: Francesca Mignacco
The very purpose of physics is to come to an understanding of empirically observed behaviour. From this point of view, the current success of machine learning provides a myriad of yet-mysterious empirical observations that call for explanation, in particular in high-dimensional non-convex settings. Inspired by physics, where simple models are at the core of our theoretical understanding of the world, we study models of neural networks that are simple yet able to capture the salient features of real systems. In this talk, I will present several high-dimensional and non-convex statistical learning problems and highlight the importance of the associated theoretical questions. The common point of these settings is that the data come from a probabilistic generative model, leading to problems for which, in the high-dimensional limit, statistical physics provides exact closed solutions for the performance of gradient-based algorithms as well as the optimally achievable performance, taken as a benchmark. I will describe some of our recent progress in the hunt for suitable models to study how the interplay between data and optimisation strategy can result in efficient learning.
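As a hedged illustration of the teacher-student setups such analyses typically consider (this is a generic sketch, not a model from the talk): a random "teacher" vector labels Gaussian inputs, a logistic-regression "student" is trained by gradient descent, and the student-teacher overlap plays the role of the order parameter tracked in these theories.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 100                      # samples, input dimension

# Probabilistic generative model: a random "teacher" labels Gaussian inputs.
teacher = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ teacher)              # labels in {-1, +1}

# "Student": logistic regression trained by full-batch gradient descent.
w = np.zeros(d)
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid probabilities
    grad = X.T @ (p - (y + 1) / 2) / n        # logistic-loss gradient
    w -= lr * grad

# Order parameter: overlap between student and teacher directions.
overlap = w @ teacher / (np.linalg.norm(w) * np.linalg.norm(teacher))
print(f"student-teacher overlap: {overlap:.2f}")
```

In the high-dimensional limit, quantities like this overlap are exactly what the closed-form statistical-physics solutions predict as a function of the sample ratio n/d.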
Monday, March 14, 2022, 12:06PM - 12:42PM
B42.00002: Effective Theory of Deep Neural Networks
Invited Speaker: Sho Yaida
Large neural networks perform extremely well in practice, providing the backbone of modern machine learning. The goal of this talk is to provide a blueprint for theoretically analyzing these large models from first principles. In particular, we'll overview how the statistics and dynamics of deep neural networks drastically simplify at large width and become analytically tractable. In so doing, we'll see that the idealized infinite-width limit is too simple to capture several important aspects of deep learning such as representation learning. To address them, we'll step beyond the idealized limit and systematically incorporate finite-width corrections.
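The large-width simplification can be glimpsed numerically. The following is a minimal sketch (not code from the talk): sampling many random one-hidden-layer ReLU networks at a fixed input shows that the output distribution over initializations is near-Gaussian, with non-Gaussianity entering as finite-width corrections.

```python
import numpy as np

rng = np.random.default_rng(1)
d, width, trials = 10, 1024, 2000
x = rng.standard_normal(d)            # one fixed input

# Sample many random one-hidden-layer ReLU networks at the same input,
# using the standard 1/sqrt(fan-in) initialization scaling.
outs = np.empty(trials)
for i in range(trials):
    W = rng.standard_normal((width, d)) / np.sqrt(d)   # hidden weights
    v = rng.standard_normal(width) / np.sqrt(width)    # readout weights
    outs[i] = v @ np.maximum(W @ x, 0.0)

# Over random initializations the output is near-Gaussian at large width:
# zero mean, finite variance, excess kurtosis suppressed with 1/width.
mean, var = outs.mean(), outs.var()
excess_kurtosis = ((outs - mean) ** 4).mean() / var**2 - 3.0
print(f"mean {mean:.3f}  var {var:.3f}  excess kurtosis {excess_kurtosis:.3f}")
```

Tracking how statistics like the excess kurtosis scale with width is one concrete entry point to the finite-width corrections the talk describes.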
Monday, March 14, 2022, 12:42PM - 1:18PM
B42.00003: Information bottleneck approaches to representation learning
Invited Speaker: David J Schwab
Extracting relevant information from data is crucial for all forms of learning. The information bottleneck (IB) method formalizes this, offering a mathematically precise and conceptually appealing framework for understanding learning phenomena. However, the nonlinearity of the IB problem makes it computationally expensive and analytically intractable in general. Here we explore a few recent approaches towards making IB practical. First, we derive a perturbation theory for the IB method and report the first complete characterization of the learning onset, the limit of maximum relevant information per bit extracted from data. We test our results on synthetic probability distributions, finding good agreement with the exact numerical solution near the onset of learning. Next, we discuss earlier work on an alternative formulation that replaces mutual information with entropy, which we call the deterministic information bottleneck. As suggested by its name, the solution turns out to be a deterministic encoder, or hard clustering, as opposed to the stochastic encoder that is optimal under IB. We show that IB and this approach perform similarly in terms of the IB cost function, but that IB significantly underperforms when measured by this modified objective. Finally, we turn to the question of characterizing optimal representations for supervised learning. We propose the Decodable Information Bottleneck (DIB), which considers information retention and compression from the perspective of a desired predictive family. Empirically, DIB can be used to enforce a small generalization gap on downstream classifiers and to predict the generalization performance of neural networks.
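For readers unfamiliar with IB, the classical self-consistent equations of Tishby, Pereira, and Bialek can be iterated directly on a small discrete joint distribution. The sketch below is a generic illustration, not code from the work discussed; the toy p(x, y) and the name `ib_iterate` are invented for the example. At a large trade-off parameter beta the soft encoder becomes nearly deterministic, i.e., a hard clustering.

```python
import numpy as np

def ib_iterate(pxy, n_clusters, beta, n_iter=200, seed=0):
    """Iterate the self-consistent IB equations on a discrete p(x, y)."""
    rng = np.random.default_rng(seed)
    px = pxy.sum(axis=1)                        # p(x)
    py_x = pxy / px[:, None]                    # p(y|x)
    # random soft encoder p(t|x), shape (|X|, |T|)
    qt_x = rng.random((len(px), n_clusters))
    qt_x /= qt_x.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        qt = px @ qt_x                          # p(t)
        qy_t = (qt_x * px[:, None]).T @ py_x / (qt[:, None] + 1e-12)  # p(y|t)
        # KL(p(y|x) || p(y|t)) for each (x, t) pair
        kl = np.sum(py_x[:, None, :] * (np.log(py_x + 1e-12)[:, None, :]
                                        - np.log(qy_t + 1e-12)[None, :, :]),
                    axis=2)
        qt_x = qt[None, :] * np.exp(-beta * kl)
        qt_x /= qt_x.sum(axis=1, keepdims=True)
    return qt_x

# Toy joint: four x-values; two pairs share the same p(y|x) pattern.
pxy = np.array([[0.20, 0.05],
                [0.20, 0.05],
                [0.05, 0.20],
                [0.05, 0.20]])
enc = ib_iterate(pxy, n_clusters=2, beta=10.0)
print(np.round(enc, 2))   # rows 0,1 share one cluster; rows 2,3 the other
```

At high beta the encoder groups the x-values with identical p(y|x) together, illustrating the hard-clustering limit that the deterministic-IB discussion contrasts with the stochastic IB optimum.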
Monday, March 14, 2022, 1:18PM - 1:54PM
B42.00004: Towards General and Robust Deep Learning at Scale
Invited Speaker: Irina Rish
Modern AI systems have achieved impressive results in many specific domains, from image and speech recognition to natural language processing and mastering complex games such as chess and Go. However, they often remain inflexible, fragile and narrow: unable to continually adapt to a wide range of changing environments and novel tasks without "catastrophically forgetting" what they have learned before, to infer higher-order abstractions allowing for systematic generalization to out-of-distribution data, or to achieve the level of robustness necessary to "survive" various perturbations in their environment, a natural property of most biological intelligent systems and a necessary property for successfully deploying AI systems in real-life applications. In this talk, I will provide a brief overview of our recent efforts towards making AI more broad (i.e., general/versatile) and more robust, focusing on continual learning, invariance and adversarial robustness. I will also emphasize the importance of developing an empirical science of AI behaviors, and focus on the rapidly expanding field of neural scaling laws, which allow us to better compare and extrapolate the behavior of various algorithms and models with increasing amounts of data, model size and computational resources.
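As a toy illustration of how scaling-law analyses proceed (synthetic numbers, not results from the talk): loss data following a power law in dataset size can be fit by linear regression in log-log space and then extrapolated to larger scales.

```python
import numpy as np

# Synthetic "scaling law": loss follows a power law in dataset size n,
# L(n) = a * n**(-alpha). The constants here are made up for illustration.
n = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
true_alpha, true_a = 0.35, 12.0
loss = true_a * n ** (-true_alpha)

# Fit the exponent by linear regression in log-log space:
# log L = log a - alpha * log n
slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
alpha_hat, a_hat = -slope, np.exp(intercept)
print(f"fitted exponent alpha = {alpha_hat:.2f}")

# Extrapolate to a much larger dataset, as scaling laws are used in practice.
pred = a_hat * (1e9) ** (-alpha_hat)
print(f"predicted loss at n = 1e9: {pred:.4f}")
```

Real scaling-law fits add noise, a plateau term for irreducible loss, and joint dependence on data, parameters and compute, but the log-log regression above is the basic mechanic.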
Monday, March 14, 2022, 1:54PM - 2:30PM
B42.00005: Dynamics of Deep Learning: Landscape-dependent Noise, Inverse Einstein Relation, and Flat Minima
Invited Speaker: Yuhai Tu
Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. In this talk, we discuss our recent work [1,2] on establishing a theoretical framework based on nonequilibrium statistical physics to understand the SGD learning dynamics, the loss function landscape, and their relationship. Our study shows that SGD dynamics follows a low-dimensional drift-diffusion motion in the weight space and the loss function is flat with large values of flatness (inverse of curvature) in most directions. Furthermore, our study reveals a robust inverse relation between the weight variance in SGD and the landscape flatness, opposite to the fluctuation-response relation in equilibrium systems. We develop a statistical theory of SGD based on properties of the ensemble of minibatch loss functions and show that the noise strength in SGD depends inversely on the landscape flatness, which explains the inverse variance-flatness relation. Our study suggests that SGD serves as a "smart" annealing strategy where the effective temperature self-adjusts according to the loss landscape in order to find the flat minimum regions that contain generalizable solutions. Finally, we discuss an application of these insights for efficiently reducing catastrophic forgetting in sequential multi-task learning.
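A minimal numerical sketch of landscape-dependent SGD noise (a generic toy model, not the models from the papers): at the minimum of a least-squares problem with one stiff (high-curvature) and one flat direction, minibatch gradients fluctuate far more along the stiff direction, illustrating how the SGD noise strength couples to the local landscape rather than being isotropic as in equilibrium diffusion.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, batch = 1000, 2, 32

# Per-sample losses L_i(w) = 0.5 * (a_i . w - b_i)^2 with anisotropic inputs:
# the first coordinate is a stiff direction, the second a flat one.
A = rng.standard_normal((n, d)) * np.array([3.0, 0.3])
w_true = np.array([1.0, -1.0])
b = A @ w_true + 0.5 * rng.standard_normal(n)   # noisy labels

# Sit at (approximately) the minimum; all fluctuations come from minibatching.
w = w_true.copy()

# Sample many minibatch gradients and estimate the SGD noise covariance.
grads = []
for _ in range(5000):
    idx = rng.choice(n, batch, replace=False)
    g = A[idx].T @ (A[idx] @ w - b[idx]) / batch
    grads.append(g)
C = np.cov(np.asarray(grads).T)

# Noise is landscape-dependent: much larger along the stiff direction.
ratio = C[0, 0] / C[1, 1]
print(f"noise variance, stiff/flat ratio: {ratio:.1f}")
```

The large ratio is the toy analogue of the inverse variance-flatness relation: directions with less flatness (more curvature) carry stronger minibatch noise.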