Why Deep Learning Works: Self-Regularization in Neural Networks

Presented by Charles Martin

Thursday, December 13, 2018
2:30 p.m.
ICSI Lecture Hall
 

Title: Why Deep Learning Works: Self-Regularization in Neural Networks

Abstract:

Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production-quality, pre-trained models and smaller models trained from scratch. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of self-regularization, sculpting a more regularized energy or penalty landscape. In particular, the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to the Universality classes of Heavy-Tailed matrices, and applying them to these empirical results, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization is like traditional Tikhonov regularization, in that there appears to be a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of heavy-tailed self-regularization, similar to the self-organization seen in the statistical physics of disordered systems. Moreover, we can use these heavy-tailed results to form a VC-like average-case complexity metric that resembles the product norm used in analyzing toy NNs, and we can use it to predict the test accuracy of pre-trained DNNs without peeking at the test data.
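
As a rough illustration of the kind of analysis described in the abstract (a minimal sketch, not the speaker's actual code), the Python snippet below computes the empirical spectral density of a single layer weight matrix and estimates a power-law exponent for its tail with a crude Hill-style fit; the layer shape, the tail fraction, and the fitting procedure are all illustrative assumptions.

    # Illustrative sketch (assumptions noted above, not the speaker's code):
    # compute the empirical spectral density (ESD) of a layer weight matrix
    # and estimate a power-law exponent for its tail.
    import numpy as np

    def layer_esd(W):
        """Eigenvalues of the normalized correlation matrix X = W^T W / N."""
        N, M = W.shape                     # N rows, M columns of the layer matrix
        X = W.T @ W / N
        return np.linalg.eigvalsh(X)       # ascending eigenvalues

    def hill_alpha(eigs, tail_frac=0.1):
        """Crude Hill-style MLE for a power-law tail p(x) ~ x^(-alpha), x >= xmin."""
        eigs = np.sort(eigs)
        k = max(int(tail_frac * len(eigs)), 2)   # tail fraction is an assumption
        tail = eigs[-k:]
        xmin = tail[0]
        return 1.0 + k / np.sum(np.log(tail / xmin))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        W = rng.normal(size=(512, 256))    # stand-in for a trained layer matrix
        eigs = layer_esd(W)
        print(f"largest eigenvalue: {eigs[-1]:.3f}")
        print(f"estimated tail exponent alpha: {hill_alpha(eigs):.2f}")

For an actual pre-trained DNN one would run this over each layer's weight matrix rather than over a random Gaussian stand-in; a heavier tail (smaller exponent) is the signature of the heavy-tailed self-regularization described above.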

Bio:

Dr. Martin is the founder and chief scientist of Calculation Consulting, a consultancy and software development company specializing in machine learning and data science. He received his PhD in Theoretical Chemistry from the University of Chicago. He was an NSF postdoctoral fellow in the Theoretical Chemistry and Physics department at the University of Illinois, Urbana-Champaign, and at the National Center for Supercomputing Applications (NCSA), working with neural networks and early forms of deep learning. Dr. Martin has devoted 30 years to the academic study and business practice of numerical scientific computation, machine learning, and AI. Having worked for over 20 years in Silicon Valley, he has applied machine learning and AI with companies such as eBay, Aardvark (acquired by Google), BlackRock, and GoDaddy. He was instrumental in making Demand Media the first billion-dollar IPO since Google. His consultancy, Calculation Consulting, helps companies apply mathematical modeling and software engineering to complex big data, analytics, and artificial intelligence (AI) problems.