Bulletin of the American Physical Society
APS March Meeting 2022
Volume 67, Number 3
Monday–Friday, March 14–18, 2022; Chicago
Session F09: Physics of Machine Learning I (Focus Session; Recordings Available)
Sponsoring Units: GSNP, GDS, DCOMP, DSOFT
Chair: Yuhai Tu, IBM T. J. Watson Research Center
Room: McCormick Place W-180
Tuesday, March 15, 2022 8:00AM - 8:36AM
F09.00001: Toward Statistical Mechanics of Deep Learning Invited Speaker: Haim I Sompolinsky The groundbreaking success of deep learning in many real-world tasks has triggered an intense effort to theoretically understand the power and limitations of deep learning in the training and generalization of complex tasks. I will present progress in the theory of Deep Learning, based on the statistical mechanics of weight space in the appropriate thermodynamic limit. I will first discuss Deep Linear Neural Networks (DLNNs). Despite the linearity of the units, learning in DLNNs is highly nonlinear, hence studying its properties reveals some of the essential features of nonlinear Deep Neural Networks.
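The nonlinearity of learning in a linear architecture is easy to see numerically. The sketch below (illustrative only, not from the talk) trains a two-layer linear network by gradient descent: the input-output map is linear in the data, yet the loss trajectory shows the plateau-and-drop behavior characteristic of nonlinear training dynamics.

```python
# Minimal sketch: gradient descent in a deep linear network y = W2 @ W1 @ x.
# The end-to-end map is linear in x, but the loss is non-convex in (W1, W2),
# so the learning dynamics are nonlinear (plateaus followed by sudden drops).
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 10, 10, 500
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)                      # linear teacher targets

W1 = 0.01 * rng.normal(size=(h, d))             # small init: saddle-dominated
W2 = 0.01 * rng.normal(size=(1, h))
lr = 0.05
for step in range(3000):
    err = X @ W1.T @ W2.T - y[:, None]          # (n, 1) residuals
    gW2 = 2 * err.T @ (X @ W1.T) / n            # chain rule through W2 @ W1
    gW1 = 2 * W2.T @ err.T @ X / n
    W2 -= lr * gW2
    W1 -= lr * gW1
    if step % 500 == 0:
        print(step, (err ** 2).mean())          # plateau, then rapid descent
```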
Tuesday, March 15, 2022 8:36AM - 8:48AM
F09.00002: AI Pontryagin or: How Artificial Neural Networks Learn to Control Dynamical Systems Lucas Boettcher, Thomas Asikis, Nino Antulov-Fantulin The efficient control of complex dynamical systems has many applications in the natural and applied sciences. In most real-world control problems, control energy and cost constraints play a significant role. Although such optimal control problems can be formulated within the framework of variational calculus, their solution for complex systems is often analytically and computationally intractable. To overcome this outstanding challenge, we present AI Pontryagin, a versatile neural ordinary-differential-equation-based control framework that automatically learns control signals that steer high-dimensional dynamical systems towards a desired target state within a predefined amount of time. We demonstrate the ability of AI Pontryagin to learn control signals that closely resemble those found by corresponding optimal control frameworks in terms of control energy and deviation from the desired target state. Our results suggest that AI Pontryagin is capable of solving a wide range of control and optimization problems, including those that are analytically intractable.
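As a concrete illustration of this approach, here is a minimal sketch on a hypothetical one-dimensional toy system (not the authors' implementation): a small network parameterizes the control signal, the controlled dynamics are integrated with a differentiable Euler scheme, and gradient descent on the terminal deviation plus a control-energy penalty trains the network.

```python
# Hypothetical toy illustration: a network u_theta(x, t) supplies the control,
# the dynamics x' = f(x) + u are integrated with a differentiable Euler scheme,
# and the terminal deviation plus a control-energy penalty is minimized.
import torch

torch.manual_seed(0)
f = lambda x: -x * (x - 1.0) * (x + 1.0)        # assumed bistable drift
x0, x_target = torch.tensor([-1.0]), torch.tensor([1.0])
T, steps = 2.0, 50
dt = T / steps

net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for epoch in range(300):
    x, energy = x0.clone(), 0.0
    for k in range(steps):
        u = net(torch.cat([x, torch.tensor([k * dt])]))
        energy = energy + (u ** 2).sum() * dt   # accumulated control energy
        x = x + (f(x) + u) * dt                 # differentiable Euler step
    loss = ((x - x_target) ** 2).sum() + 1e-3 * energy
    opt.zero_grad(); loss.backward(); opt.step()
print("terminal state:", x.item())              # should approach the target +1
```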
Tuesday, March 15, 2022 8:48AM - 9:00AM
F09.00003: Probing the Theoretical and Computational Limits of Dissipative Design Shriram Chennakesavalu, Grant M Rotskoff Self-assembly, the process by which interacting components form well-defined and often intricate structures, is typically thought of as a spontaneous process arising from equilibrium dynamics. When a system is driven by external nonequilibrium forces, states statistically inaccessible to the equilibrium dynamics can arise, a process sometimes termed directed self-assembly. However, if we fix a given target state and a set of external control variables, it is not well understood (i) how to design a protocol that drives the system towards the desired state, nor (ii) what energetic cost persistently perturbing the stationary distribution incurs. Here we first derive a bound that relates the proximity to the chosen target with the dissipation associated with the external drive, showing that high-dimensional external control can guide systems towards a target distribution, but with an inevitable entropic cost. Second, we investigate the performance of deep multi-agent reinforcement learning algorithms and provide evidence for the realizability of complex protocols that stabilize otherwise inaccessible states of matter. Furthermore, we find that when agents share relevant information about the system, the learned protocols can more closely realize a given target state.
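The reinforcement-learning ingredient can be reduced to a very small example. The sketch below is a hypothetical single-agent REINFORCE loop on a toy driven Langevin system (the work itself uses deep multi-agent methods): a policy chooses a drive level at each step, and the return trades target proximity against a crude dissipation proxy.

```python
# Hedged sketch: single-agent policy gradient (REINFORCE) on a toy driven
# system, standing in for the multi-agent protocols described above.
import torch

torch.manual_seed(0)
policy = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
                             torch.nn.Linear(16, 3))    # 3 discrete drive levels
drives = torch.tensor([0.0, 0.5, 1.0])
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    x = torch.zeros(1)                                  # system state
    logps, work = [], 0.0
    for _ in range(20):
        dist = torch.distributions.Categorical(logits=policy(x.detach()))
        a = dist.sample()
        logps.append(dist.log_prob(a))
        u = drives[a]
        work += float(u * 0.1)                          # crude dissipation proxy
        x = x + (-x + u) * 0.1 + 0.1 * torch.randn(1)   # driven Langevin step
    reward = -float((x - 1.0) ** 2) - 0.05 * work       # proximity vs. cost
    loss = -reward * torch.stack(logps).sum()           # REINFORCE estimator
    opt.zero_grad(); loss.backward(); opt.step()
```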
Tuesday, March 15, 2022 9:00AM - 9:12AM
F09.00004: The Role of Data in the Sloppiness of Deep Networks Pratik Chaudhari, Rubing Yang, Jialin Mao We study how the dataset may be the cause of the anomalous generalization performance of deep networks. We show that the data correlation matrix of typical classification datasets has an eigenspectrum where, after a sharp initial drop, a large number of small eigenvalues are distributed uniformly over an exponentially large range. This structure is mirrored in a network trained on this data: we show that the Hessian and the Fisher Information Matrix (FIM) have eigenvalues that are spread uniformly over exponentially large ranges. For such ``sloppy'' eigenspectra, sets of weights corresponding to small eigenvalues can be modified by large magnitudes without affecting the loss. Networks trained on atypical, non-sloppy synthetic data do not share these traits. We show how this structure in the data sheds light on the generalization performance of deep networks using PAC-Bayesian analysis.
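This eigenspectrum structure is straightforward to probe numerically. The sketch below (an illustration on the small scikit-learn digits set, not necessarily the datasets used in the work) computes the data correlation matrix and inspects the log-spacing of its eigenvalues.

```python
# Empirical check of a "sloppy" data eigenspectrum: after a sharp initial drop,
# eigenvalues are spread roughly uniformly over an exponentially large range,
# i.e. near-constant spacing in log10(eigenvalue).
import numpy as np
from sklearn.datasets import load_digits

X = load_digits().data.astype(float)
X -= X.mean(axis=0)                              # center the features
C = X.T @ X / len(X)                             # data correlation matrix
eig = np.sort(np.linalg.eigvalsh(C))[::-1]
eig = eig[eig > 1e-12 * eig[0]]                  # discard numerical zeros
print(np.round(np.diff(np.log10(eig)), 2))       # roughly constant after the head
```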
Tuesday, March 15, 2022 9:12AM - 9:24AM
F09.00005: Statistical Mechanics of Kernel Regression and Wide Neural Networks Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan A theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks where overparameterization leads to better performance. Here, we study this problem for kernel regression, which, besides being a popular machine learning method, also describes infinitely overparameterized neural networks. We develop an analytical theory of generalization in kernel regression using replica theory of statistical mechanics. This theory is applicable to any kernel and data distribution. Experiments with practical kernels including those arising from wide neural networks show perfect agreement with our theory. Further, our theory accurately predicts generalization performance of neural networks with modest widths. We provide an in-depth analysis of our analytical expression for kernel generalization. We show that kernel machines employ an inductive bias towards simple functions, preventing them from overfitting the data. We characterize whether a kernel is compatible with a learning task in terms of sample efficiency. We identify a first-order phase transition in our theory where more data may impair generalization when the task is noisy or not expressible by the kernel. Finally, we extend these results to out-of-distribution generalization.
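For readers who want to see the object the theory describes, here is a minimal kernel-regression learning-curve experiment on an illustrative RBF-kernel toy problem (the replica-theory prediction itself is not reproduced here):

```python
# Empirical learning curve for kernel (ridge) regression: generalization error
# versus training-set size P on a toy 1-D regression task.
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

target = lambda X: np.sin(3 * X[:, 0])
Xte = rng.uniform(-1, 1, size=(500, 1))
for P in [10, 40, 160]:                                 # training-set sizes
    Xtr = rng.uniform(-1, 1, size=(P, 1))
    alpha = np.linalg.solve(rbf(Xtr, Xtr) + 1e-6 * np.eye(P), target(Xtr))
    pred = rbf(Xte, Xtr) @ alpha                        # kernel ridge predictor
    print(P, ((pred - target(Xte)) ** 2).mean())        # error falls with P
```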
Tuesday, March 15, 2022 9:24AM - 9:36AM
F09.00006: Machine learning in and out of equilibrium Michael Hinczewski, Shishir Adhikari, Alkan Kabakcioglu, Alexander Strang, Deniz Yuret The algorithms used to train neural networks, like stochastic gradient descent (SGD), have close parallels to natural processes that navigate a high-dimensional parameter space, for example protein folding or evolution. Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels in a single, unified framework. We focus in particular on the stationary state of the system in the long-time limit. In contrast to its biophysical analogues, conventional SGD leads to a nonequilibrium stationary state exhibiting persistent currents in the space of network parameters. The effective loss landscape that determines the shape of this stationary distribution depends sensitively on training details, e.g., whether minibatches are sampled with or without replacement. We also demonstrate that the state satisfies the integral fluctuation theorem, a nonequilibrium generalization of the second law of thermodynamics. Finally, we introduce an alternative ``thermalized'' SGD procedure, designed to achieve an equilibrium stationary state. Deployed as a secondary training step, after conventional SGD has converged, thermalization is an efficient method to implement Bayesian machine learning, allowing us to estimate the posterior distribution of network predictions.
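A standard way to realize such a thermalized second stage is stochastic gradient Langevin dynamics; whether this matches the authors' exact procedure is an assumption. The injected noise is balanced against the gradient step so that the stationary distribution is the Gibbs measure exp(-loss/T):

```python
# Hedged sketch of a "thermalized" update step (stochastic gradient Langevin
# dynamics): theta <- theta - lr * grad + sqrt(2 * lr * T) * xi samples, for
# small lr, the equilibrium distribution proportional to exp(-loss / T).
import torch

def thermalized_step(params, loss, lr=1e-4, temperature=1e-3):
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            noise = torch.randn_like(p) * (2 * lr * temperature) ** 0.5
            p.add_(-lr * g + noise)       # gradient drift plus thermal kick
```

Run after conventional SGD has converged and store parameter snapshots every few steps; the snapshots approximate posterior samples, and their predictions estimate the posterior distribution of network outputs.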
Tuesday, March 15, 2022 9:36AM - 9:48AM
F09.00007: Can Artificial Intelligence "formulate" Quantum Mechanics? An Illustration for Planck's Blackbody Radiation Vishnu Shankar, Sadasivan Shankar With the success of deep neural networks (DNNs) in various applications, these methods have been widely proposed to solve scientific and engineering problems. To assess this premise, we investigated whether DNN formalisms can really formulate the underlying physics, using the example of blackbody radiation (BBR). To resolve the “ultraviolet catastrophe”, which incorrectly predicted blackbody emission of infinite energy at high frequency, Planck derived an “interpolation formula” that matched the experimental spectral intensity data under the assumption that radiation can only be emitted in quanta, connecting classical and quantum mechanics. To evaluate the ability of DNNs to similarly characterize BBR data, we tested multiple architectures, varying network dimensions, activation functions, learning rates, etc. Our analysis underscores the difficulty of extracting the physics of frequency correlation, independent of the amount of noise-free spectral data. By studying the functional forms that drive model predictions and examining the reasons for these difficulties, our findings, more broadly, exemplify the challenge of extracting physical laws using machine learning even with “hints”, and why physical intuition continues to be critical in scientific discoveries.
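A toy version of such an experiment is easy to set up (hypothetical setup, not the authors' code): fit a small MLP to noise-free Planck spectral data and note that nothing in the fit enforces the exponential structure of the law, so extrapolation beyond the training band generally fails.

```python
# Fit an MLP to Planck's law B(nu, T) = (2 h nu^3 / c^2) / (exp(h nu / kB T) - 1)
# at fixed T; the network interpolates the curve without learning its physics.
import numpy as np
import torch

h, c, kB, T = 6.626e-34, 2.998e8, 1.381e-23, 5000.0
nu = np.linspace(1e13, 2e15, 400)
B = 2 * h * nu ** 3 / c ** 2 / np.expm1(h * nu / (kB * T))    # Planck's law

x = torch.tensor(nu / nu.max(), dtype=torch.float32)[:, None]  # normalized inputs
y = torch.tensor(B / B.max(), dtype=torch.float32)[:, None]
net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(3000):
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("train MSE:", loss.item())     # small inside the training band only
```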
Tuesday, March 15, 2022 9:48AM - 10:00AM
F09.00008: Non-Gaussian effects in finite Bayesian neural networks Jacob Zavatone-Veth, Abdulkadir Canatar, Benjamin S Ruben, Cengiz Pehlevan Bayesian neural networks (BNNs) are theoretically well-understood only in the infinite-width limit, where Gaussian priors over network weights yield Gaussian priors over network outputs. Recent works have suggested that finite BNNs may outperform their infinite cousins because finite networks can adapt their internal representations of data, but our understanding of the non-Gaussian priors and learned hidden layer representations of finite networks remains incomplete. Here, we take two steps towards an understanding of non-Gaussian effects in finite BNNs. First, we argue that the leading finite-width corrections to the feature kernels for any BNN with linear readout have a largely universal form. We illustrate this for three tractable network architectures: deep linear fully-connected and convolutional networks, and networks with a single nonlinear hidden layer. Second, we derive exact solutions for the marginal function space priors of a class of finite feedforward BNNs. These results unify previous descriptions of finite BNN priors in terms of their tail decay and asymptotics. In total, our work begins to elucidate how finite BNNs differ from their infinite cousins.
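The basic non-Gaussianity is visible in a direct Monte Carlo experiment (illustrative, not the paper's calculation): sample one-hidden-layer network priors and watch the excess kurtosis of the output prior vanish as the width grows.

```python
# Sample f(x) = sum_i a_i tanh(w_i . x) / sqrt(N) under Gaussian priors on
# (a, W) at a fixed input x; excess kurtosis measures the non-Gaussianity of
# the induced function-space prior and vanishes as N -> infinity.
import numpy as np

rng = np.random.default_rng(0)
x = np.ones(5) / np.sqrt(5.0)                   # fixed unit-norm input
for N in [2, 10, 100]:
    W = rng.normal(size=(10000, N, 5))          # hidden weights, 10000 prior draws
    a = rng.normal(size=(10000, N))             # readout weights
    f = (a * np.tanh(W @ x)).sum(-1) / np.sqrt(N)
    kurt = (f ** 4).mean() / (f ** 2).mean() ** 2 - 3.0
    print(N, round(kurt, 3))                    # excess kurtosis -> 0 with width
```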
Tuesday, March 15, 2022 10:00AM - 10:12AM
F09.00009: Equilibrium and non-Equilibrium regimes in the learning of Restricted Boltzmann Machines Aurélien Decelle, Beatriz Seoane, Cyril Furtlehner Training Restricted Boltzmann Machines (RBMs) has been challenging for a long time.
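For context, the standard contrastive-divergence training loop for an RBM looks roughly as follows (an illustrative sketch with biases omitted, not the authors' code). When the k-step Gibbs chain is too short to equilibrate, training enters the out-of-equilibrium regime the title refers to.

```python
# CD-k sketch for an RBM: the weight update is the difference between the
# data ("positive") and model ("negative") correlations, with the negative
# phase approximated by a short Gibbs chain started from the data.
import numpy as np

rng = np.random.default_rng(0)
nv, nh, lr = 20, 10, 0.05
W = 0.01 * rng.normal(size=(nv, nh))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd_update(W, v0, k=1):
    h0 = sigmoid(v0 @ W)                          # positive phase
    v, h = v0, h0
    for _ in range(k):                            # k steps of (approximate) Gibbs
        v = (rng.random(nv) < sigmoid(W @ h)).astype(float)
        h = sigmoid(v @ W)                        # mean-field h for simplicity
    return W + lr * (np.outer(v0, h0) - np.outer(v, h))

v_data = (rng.random(nv) < 0.5).astype(float)     # placeholder training sample
W = cd_update(W, v_data, k=1)
```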
Tuesday, March 15, 2022 10:12AM - 10:24AM
F09.00010: Long range memory in deep neural networks' neural activations Ling Feng, Nicholas Jia Le Chong Neurons in the biological brain exhibit various signatures of critical phase transitions, among them the long-range memory phenomenon. One common hypothesis is that the healthy brain operates at some critical point, leading to such long-range effects. In artificial neural networks, in particular deep learning models, it has recently been found that they operate optimally at the critical state between the periodic-cycle phase and the chaotic phase in various benchmark tests. Hence, it remains to be seen whether such a critical state leads to a long-range memory effect quantitatively similar to that of the biological brain. Here, we investigate several widely adopted deep learning models of different architectures and look for evidence of such a long-range memory effect in the neurons' activations when the model achieves the highest accuracy on benchmark datasets. In some of the models, we find signatures of long-range memory in the high-frequency region, while the low-frequency region is governed by short-memory effects. The robustness of the phenomenon is also investigated across different types of training datasets to test the dependence of the long-range memory on the long memory of the input data.
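One simple diagnostic for long-range memory in a time series is the slope of its power spectrum over a frequency band (the authors' exact estimator is not specified here): a spectrum S(f) ~ 1/f^beta with beta > 0 in a band indicates long memory there, while a flat spectrum indicates short memory.

```python
# Estimate the spectral exponent beta of a (activation) time series by a
# log-log fit of the periodogram within a chosen frequency band.
import numpy as np

def spectral_slope(series, lo, hi):
    f = np.fft.rfftfreq(len(series))
    S = np.abs(np.fft.rfft(series - series.mean())) ** 2
    band = (f >= lo) & (f <= hi) & (f > 0)
    slope, _ = np.polyfit(np.log(f[band]), np.log(S[band]), 1)
    return -slope                # beta estimate in the chosen band

rng = np.random.default_rng(0)
print(spectral_slope(rng.normal(size=4096), 0.01, 0.4))             # ~0: no memory
print(spectral_slope(np.cumsum(rng.normal(size=4096)), 0.01, 0.4))  # ~2: strong memory
```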
Tuesday, March 15, 2022 10:24AM - 10:36AM
F09.00011: Identifying symmetries in the statistical ensemble of coarse-graining rules Doruk Efe Gokmen, Zohar Ringel, Sebastian Huber, Maciej Koch-Janusz In statistical physical systems, symmetries which are absent at the level of microscopic building blocks often emerge in the collective behaviour. Moreover, without prior knowledge even microscopic symmetries may be difficult to identify from unstructured Monte Carlo or experimental data. Here we take an information-theoretic perspective to address this challenge systematically. To this end we focus on real-space mutual information (RSMI), leveraging the crucial observation that coarse-graining transformations maximising RSMI can be formally identified as the relevant operators of the effective field theory. Using the recently introduced RSMI-NE algorithm, statistical ensembles of such coarse-graining transformations can be efficiently generated. In this work we study the information contained in this ensemble, and show how symmetries, broken and also emergent, can be identified. We also demonstrate the extraction of the phase diagram and the order parameters for equilibrium systems. Our approach paves the way towards automated data-driven discovery of emergent symmetries of complex statistical systems.
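The core quantity here is a mutual information between a coarse-grained block and its distant environment, typically estimated with a neural lower bound. A minimal InfoNCE-style bound looks like this (conceptual sketch; the actual RSMI-NE algorithm additionally optimizes the coarse-graining map and uses a buffer structure):

```python
# InfoNCE lower bound on I(block; environment): matched pairs sit on the
# diagonal of the critic's similarity matrix, and the bound is
# log(batch_size) - cross_entropy(scores, diagonal labels).
import torch

def infonce_bound(critic, block, env):
    # block, env: (batch, dim) tensors, paired samples from the same configuration
    scores = critic(block) @ critic(env).T        # (batch, batch) similarities
    labels = torch.arange(len(block))             # matched pairs on the diagonal
    ce = torch.nn.functional.cross_entropy(scores, labels)
    return torch.log(torch.tensor(float(len(block)))) - ce   # <= I(block; env)
```

Maximizing this bound over both the critic and a parameterized coarse-graining map yields transformations that retain the most relevant long-range information.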
Tuesday, March 15, 2022 10:36AM - 10:48AM
F09.00012: Exploring the loss landscape with Langevin dynamics Théo Jules, Yohai Bar-Sinai In supervised learning, neural network training is founded on the minimization of a high-dimensional loss function. A better understanding of its landscape is crucial for designing better-performing learning algorithms. We explore the loss landscape of an over-parametrized deep network through numerical experiments. Starting from a global minimum, we study the dynamics of SGD with added random noise that generates competition between diffusion and gradient descent. Most notably, we observe unexpected catastrophic dynamics and investigate how they relate to the values of the hyperparameters, such as the learning rate and the batch size, and to the characteristics of the loss landscape.
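A toy version of this protocol (an illustrative two-dimensional landscape in place of a deep network) is gradient descent with injected Gaussian noise started from a minimum; large noise relative to the learning rate can trigger sudden escapes from the basin.

```python
# Noisy gradient descent (Langevin-like) on the toy loss (|w|^2 - 1)^2, which
# has a ring of global minima; watch whether the walker leaves the ring.
import numpy as np

rng = np.random.default_rng(0)
grad = lambda w: 4.0 * w * (w @ w - 1.0)    # gradient of (|w|^2 - 1)^2
w = np.array([1.0, 0.0])                    # start at a global minimum
lr, noise = 1e-2, 0.15
for step in range(10000):
    w = w - lr * grad(w) + noise * np.sqrt(lr) * rng.normal(size=2)
print(w, (w @ w - 1.0) ** 2)                # final position and residual loss
```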
Tuesday, March 15, 2022 10:48AM - 11:00AM
F09.00013: Learning actions from data using invertible neural networks Claudia Merger, Carsten Honerkamp, Alexandre René, Moritz Helias Many problems in physics can be cast into the form of a polynomial action, whose coefficients determine physical properties. A typical approach is to derive these coefficients from a theory of microscopic interactions. However, this may not always be possible, or a microscopic theory may not be known. We here use invertible neural networks (INNs), trained in an unsupervised manner, to describe data distributions. We choose a nonlinearity for which the coefficients of the corresponding action can be computed from the trained weights. A diagrammatic language expresses the change in the action from one layer of the INN to the next. Inverting the network allows us to extract the coefficients of the data distribution and to trace how the INN parameters shape the interaction terms in its action. We test this formalism on a reduced model of Ising spins.
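The basic mechanism is the change-of-variables formula: an invertible network gives an explicit log-density, i.e. an action S(x) = -log p(x). The sketch below uses simple additive coupling layers (the paper's specific nonlinearity and diagrammatic coefficient extraction are not reproduced):

```python
# Minimal invertible flow: additive coupling layers map data toward a standard
# normal base; S(x) = -log p(x) = 0.5 |z(x)|^2 up to a constant, since additive
# couplings have unit Jacobian and contribute no log-determinant.
import torch

class AdditiveCoupling(torch.nn.Module):
    def __init__(self, d):
        super().__init__()
        self.net = torch.nn.Sequential(torch.nn.Linear(d // 2, 32), torch.nn.Tanh(),
                                       torch.nn.Linear(32, d // 2))
    def inverse(self, x):                        # data -> latent direction
        a, b = x.chunk(2, dim=-1)
        return torch.cat([a, b - self.net(a)], dim=-1)

def action(layers, x):
    for layer in layers:                         # compose the inverse maps
        x = layer.inverse(x)
    return 0.5 * (x ** 2).sum(-1)                # standard-normal base distribution

layers = [AdditiveCoupling(4) for _ in range(3)]
print(action(layers, torch.randn(8, 4)))         # action evaluated on 8 samples
```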