Bulletin of the American Physical Society
APS March Meeting 2022
Volume 67, Number 3
Monday–Friday, March 14–18, 2022; Chicago
Session B42: Statistical Physics Meets Machine Learning (Invited; Live Streamed)

Sponsoring Units: GSNP | Chair: Herbie Levine | Room: McCormick Place W375A
Monday, March 14, 2022, 11:30AM–12:06PM
B42.00001: Statistical physics insights on learning in high dimensions Invited Speaker: Francesca Mignacco The very purpose of physics is to come to an understanding of empirically observed behaviour. From this point of view, the current success of machine learning provides a myriad of as-yet mysterious empirical observations that call for explanation, in particular in high-dimensional non-convex settings. Inspired by physics, where simple models are at the core of our theoretical understanding of the world, we study models of neural networks that are simple yet able to capture the salient features of real systems. In this talk, I will present several high-dimensional and non-convex statistical learning problems, and I will highlight the importance of the associated theoretical questions. The common point of these settings is that the data come from a probabilistic generative model, leading to problems for which, in the high-dimensional limit, statistical physics provides exact closed solutions for the performance of gradient-based algorithms as well as the optimally achievable performance, taken as a benchmark. I will describe some of our recent progress in the hunt for suitable models to study how the interplay between data and optimisation strategy can result in efficient learning.
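[An illustrative aside, not part of the abstract: the teacher-student setup mentioned in this line of work can be sketched in a few lines. Below is a minimal online-SGD experiment for a linear student learning a random linear "teacher" in high dimension; the dimension, learning rate, and error measure are choices made here for illustration, not the speaker's model.]

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000                 # input dimension (proxy for the high-dimensional limit)
lr = 0.5                 # learning rate
steps = 5 * d            # online steps: O(d) fresh samples

w_teacher = rng.standard_normal(d)   # ground-truth ("teacher") weights
w = rng.standard_normal(d)           # random student initialization

def gen_error(w, w_t):
    # angle-based generalization error: 0 when aligned, 1/2 when orthogonal
    cos = w @ w_t / (np.linalg.norm(w) * np.linalg.norm(w_t))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

for _ in range(steps):
    x = rng.standard_normal(d) / np.sqrt(d)  # fresh i.i.d. sample each step
    y = w_teacher @ x                        # noiseless teacher label
    w -= lr * (w @ x - y) * x                # one online SGD step on squared loss

print(gen_error(w, w_teacher))               # small after O(d) samples
```

Because each step uses a fresh sample, the dynamics self-average as d grows, which is what makes closed-form descriptions of such algorithms possible in the high-dimensional limit.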
Monday, March 14, 2022, 12:06PM–12:42PM
B42.00002: Effective Theory of Deep Neural Networks Invited Speaker: Sho Yaida Large neural networks perform extremely well in practice, providing the backbone of modern machine learning. The goal of this talk is to provide a blueprint for theoretically analyzing these large models from first principles. In particular, we'll overview how the statistics and dynamics of deep neural networks drastically simplify at large width and become analytically tractable. In so doing, we'll see that the idealized infinite-width limit is too simple to capture several important aspects of deep learning, such as representation learning. To address them, we'll step beyond the idealized limit and systematically incorporate finite-width corrections.
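[An illustrative aside, not part of the abstract: a standard signature of the large-width simplification is that the output of a randomly initialized network, viewed over the ensemble of initializations, becomes Gaussian. The sketch below checks this numerically for a one-hidden-layer tanh network; the width, input, and ensemble size are arbitrary choices for illustration.]

```python
import numpy as np

rng = np.random.default_rng(1)
width = 2048            # hidden width (large, approximating the infinite-width limit)
n_nets = 1000           # ensemble of random initializations
x = rng.standard_normal(16)   # one fixed input

outs = []
for _ in range(n_nets):
    # standard 1/sqrt(fan-in) initialization
    W1 = rng.standard_normal((width, x.size)) / np.sqrt(x.size)
    W2 = rng.standard_normal(width) / np.sqrt(width)
    outs.append(W2 @ np.tanh(W1 @ x))        # scalar network output
outs = np.array(outs)

# at large width, the output distribution over initializations is near-Gaussian
print(outs.mean(), outs.std())
```

Finite-width corrections show up as small deviations from Gaussianity (e.g. nonzero connected four-point statistics) that shrink as the width grows; at the widths used here they are already hard to see.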
Monday, March 14, 2022, 12:42PM–1:18PM
B42.00003: Information bottleneck approaches to representation learning Invited Speaker: David J Schwab Extracting relevant information from data is crucial for all forms of learning. The information bottleneck (IB) method formalizes this, offering a mathematically precise and conceptually appealing framework for understanding learning phenomena. However, the nonlinearity of the IB problem makes it computationally expensive and analytically intractable in general. Here we explore a few recent approaches towards making IB practical. First, we derive a perturbation theory for the IB method and report the first complete characterization of the learning onset, the limit of maximum relevant information per bit extracted from data. We test our results on synthetic probability distributions, finding good agreement with the exact numerical solution near the onset of learning. Next, we discuss earlier work on an alternative formulation that replaces mutual information with entropy, which we call the deterministic information bottleneck. As suggested by its name, the solution turns out to be a deterministic encoder, or hard clustering, as opposed to the stochastic encoder that is optimal under IB. We show that IB and this approach perform similarly in terms of the IB cost function, but that IB significantly underperforms when measured by this modified objective. Finally, we turn to the question of characterizing optimal representations for supervised learning. We propose the Decodable Information Bottleneck (DIB), which considers information retention and compression from the perspective of a desired predictive family. Empirically, DIB can be used to enforce a small generalization gap on downstream classifiers and to predict the generalization performance of neural networks.
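[An illustrative aside, not part of the abstract: for small discrete distributions, the standard IB self-consistent equations can be iterated directly, which makes the "computationally expensive in general" point concrete. Below is a minimal iterative IB solver on a random toy joint distribution p(x, y); the problem sizes and the trade-off parameter beta are arbitrary choices for illustration.]

```python
import numpy as np

rng = np.random.default_rng(2)
nx, ny, nt = 4, 3, 2     # sizes of X, Y, and the compressed variable T
beta = 5.0               # relevance-compression trade-off parameter

# toy joint distribution p(x, y) and the conditionals derived from it
pxy = rng.random((nx, ny)); pxy /= pxy.sum()
px = pxy.sum(1)
py_x = pxy / px[:, None]

# random initial stochastic encoder p(t|x)
pt_x = rng.random((nx, nt)); pt_x /= pt_x.sum(1, keepdims=True)

for _ in range(200):
    pt = px @ pt_x                                        # p(t)
    py_t = (pt_x * px[:, None]).T @ py_x / pt[:, None]    # decoder p(y|t)
    # KL(p(y|x) || p(y|t)) for each pair (x, t)
    kl = (py_x[:, None, :] * np.log(py_x[:, None, :] / py_t[None, :, :])).sum(-1)
    # self-consistent encoder update: p(t|x) ∝ p(t) exp(-beta * KL)
    pt_x = pt[None, :] * np.exp(-beta * kl)
    pt_x /= pt_x.sum(1, keepdims=True)

print(pt_x)   # converged soft assignment of each x to a cluster t
```

At large beta the optimal encoder approaches a hard clustering, which is the deterministic-encoder behavior the abstract contrasts with the stochastic IB solution.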
Monday, March 14, 2022, 1:18PM–1:54PM
B42.00004: Towards General and Robust Deep Learning at Scale Invited Speaker: Irina Rish Modern AI systems have achieved impressive results in many specific domains, from image and speech recognition to natural language processing and mastering complex games such as chess and Go. However, they often remain inflexible, fragile and narrow: unable to continually adapt to a wide range of changing environments and novel tasks without "catastrophically forgetting" what they have learned before, to infer higher-order abstractions allowing for systematic generalization to out-of-distribution data, and to achieve the level of robustness necessary to "survive" various perturbations in their environment, a natural property of most biological intelligent systems, and a necessary property for successfully deploying AI systems in real-life applications. In this talk, I will provide a brief overview of our recent efforts towards making AI more broad (i.e., general/versatile) and more robust, focusing on continual learning, invariance and adversarial robustness. I will also emphasize the importance of developing an empirical science of AI behaviors, and focus on the rapidly expanding field of neural scaling laws, which allow us to better compare and extrapolate the behavior of various algorithms and models with increasing amounts of data, model size and computational resources.
Monday, March 14, 2022, 1:54PM–2:30PM
B42.00005: Dynamics of Deep Learning: Landscape-dependent Noise, Inverse Einstein Relation, and Flat Minima Invited Speaker: Yuhai Tu Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. In this talk, we discuss our recent work^{1,2} on establishing a theoretical framework based on nonequilibrium statistical physics to understand the SGD learning dynamics, the loss function landscape, and their relationship. Our study shows that SGD dynamics follows a low-dimensional drift-diffusion motion in the weight space and that the loss function is flat, with large values of flatness (inverse of curvature), in most directions. Furthermore, our study reveals a robust inverse relation between the weight variance in SGD and the landscape flatness, opposite to the fluctuation-response relation in equilibrium systems. We develop a statistical theory of SGD based on properties of the ensemble of minibatch loss functions and show that the noise strength in SGD depends inversely on the landscape flatness, which explains the inverse variance-flatness relation. Our study suggests that SGD serves as a "smart" annealing strategy, where the effective temperature self-adjusts according to the loss landscape in order to find the flat-minimum regions that contain generalizable solutions. Finally, we discuss an application of these insights for efficiently reducing catastrophic forgetting when learning multiple tasks sequentially.
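[An illustrative aside, not part of the abstract: the qualitative point that minibatch gradient noise depends on the local landscape can be seen in a one-dimensional toy model where each sample contributes a quadratic loss 0.5 · h · (w − c_i)² with scattered minima c_i. This toy construction is chosen here for illustration and is not the framework of the talk.]

```python
import numpy as np

rng = np.random.default_rng(3)
n, batch = 10000, 32          # dataset size and minibatch size

def sgd_noise_std(curvature):
    # per-sample quadratic losses 0.5 * h * (w - c_i)^2 with scattered minima c_i
    c = rng.standard_normal(n)
    w_star = c.mean()         # full-batch minimum
    # minibatch gradient at w_star: h * (w_star - mean of the sampled minima)
    grads = [curvature * (w_star - rng.choice(c, batch).mean())
             for _ in range(2000)]
    return np.std(grads)

# sharper landscape (larger curvature) -> stronger SGD noise
print(sgd_noise_std(1.0), sgd_noise_std(10.0))
```

In this toy model the noise strength grows with curvature (i.e. it is larger where the landscape is less flat), consistent in spirit with the landscape-dependent noise discussed in the talk.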
© 2024 American Physical Society. All rights reserved.