Scalable Second-order Methods for Training, Designing, and Deploying Machine Learning Models

Principal Investigator(s): 
Michael Mahoney

Scalable algorithms that can handle the large-scale nature of modern datasets are an integral part of many applications of machine learning (ML). Among these, efficient optimization algorithms, as the bread and butter of many ML methods, hold a special place. Optimization methods that use only first-derivative information, i.e., first-order methods, are the most common tools for training ML models. This is despite the fact that many of these methods come with inherent disadvantages, such as slow convergence, high communication costs in distributed settings, and the need for laborious hyper-parameter tuning. Second-order methods, i.e., those that also use second-derivative (Hessian) information, can mitigate many of these disadvantages, yet they are far less used within the ML community.
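As a toy illustration of this contrast (not part of the project's methods), the sketch below compares a fixed-step gradient descent update with a Newton update on an ill-conditioned quadratic; the problem and all names are illustrative. Curvature information lets the Newton step rescale each direction correctly, while the first-order method crawls along the poorly scaled axis.

```python
import numpy as np

# Strongly convex quadratic f(x) = 0.5 x^T A x - b^T x with an
# ill-conditioned Hessian A (illustrative example, not project code).
A = np.array([[10.0, 0.0], [0.0, 0.1]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)  # exact minimizer

def grad(x):
    return A @ x - b

# First-order update: gradient descent with a fixed step size.
x_gd = np.zeros(2)
for _ in range(100):
    x_gd -= 0.1 * grad(x_gd)

# Second-order update: Newton's method solves against the Hessian.
# On a quadratic, a single Newton step lands on the minimizer.
x_nt = np.zeros(2)
x_nt -= np.linalg.solve(A, grad(x_nt))
```

After 100 iterations, gradient descent is still far from the minimizer along the low-curvature direction, whereas the single Newton step is essentially exact.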

The primary goal of this project is to develop, implement, and apply novel methods that, through innovative use of second-order information, enable enhanced design, diagnostics, and training of ML models. The theoretical developments will tackle the challenges of training large-scale non-convex ML models from four general angles: finding high-quality local minima; distributed computing environments; generalization performance; and acceleration. This research will also develop efficient Hessian-based diagnostic tools for analyzing both the training process and already-trained models. The final goal is to study a range of improvements and applications of the proposed methods in a variety of settings: improved communication properties; exploiting adversarial data; and exploring how these ideas can be applied to more challenging problems, such as improving neural architecture design and search. In all cases, the researchers will provide high-quality, user-friendly implementations for both shared-memory and distributed computing environments.
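One common building block behind Hessian-based diagnostics of the kind described above is the matrix-free Hessian-vector product, which avoids ever forming the full Hessian. The hypothetical sketch below estimates the largest Hessian eigenvalue of a loss by power iteration on finite-difference Hessian-vector products; the function names and the toy quadratic loss are illustrative assumptions, not the project's actual tooling.

```python
import numpy as np

def hvp(grad_fn, x, v, eps=1e-5):
    # Matrix-free Hessian-vector product via central finite differences
    # on the gradient: H v ~ (g(x + eps*v) - g(x - eps*v)) / (2*eps).
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

def top_eigenvalue(grad_fn, x, iters=100, seed=0):
    # Power iteration using only Hessian-vector products.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = hvp(grad_fn, x, v)
        v = hv / np.linalg.norm(hv)
    # Rayleigh quotient at the converged direction.
    return v @ hvp(grad_fn, x, v)

# Toy quadratic loss with known Hessian diag(3, 1): top eigenvalue is 3.
H = np.diag([3.0, 1.0])
grad_fn = lambda x: H @ x
lam = top_eigenvalue(grad_fn, np.array([1.0, -1.0]))
```

The same access pattern (gradients only, no explicit Hessian) is what makes such spectral diagnostics feasible at the scale of modern ML models.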

This work is funded by the National Science Foundation grant #2107000.