Combining stochastics and numerics for more meaningful matrix computations

Principal Investigator(s): Michael Mahoney

The amount of data in our world has exploded, and data are at the heart of modern economic activity, innovation, and growth. In many cases, data are modeled as matrices, since an m x n matrix A provides a natural structure for encoding information about m objects, each of which is described by n features. As a result, linear algebraic algorithms, and in particular matrix decompositions, have proven extremely successful in the analysis of datasets in the form of matrices. RandNLA (Randomized Numerical Linear Algebra), which integrates the complementary perspectives that Theoretical Computer Science and Numerical Linear Algebra bring to matrix computations, has led to nontrivial theory and high-quality implementations, and it has proven useful in a range of scientific and internet applications.

Perhaps surprisingly, then, there has been very little analysis of the statistical properties of these algorithms, of the interplay between their statistical properties and the numerical aspects of their implementations, or of how these algorithms behave inside downstream convex and non-convex optimization pipelines. For example, the randomness inside a RandNLA algorithm can act as an implicit regularizer, making the algorithm useful in downstream applications in ways that existing theory does not capture.

This project addresses these issues, with a focus on the complementary stochastic and numerical aspects of RandNLA algorithms, as well as on how RandNLA primitives are used in realistic convex and non-convex machine learning pipelines. The work will lead to new insights in algorithmic and statistical theory, as well as to more useful algorithms in practical implementations and applications. One goal of the proposed work, for example, is to revisit classical statistical resampling theory in light of the scalable algorithmic random sampling theory developed in RandNLA.
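To make the flavor of a RandNLA primitive concrete, the following is a minimal, illustrative sketch (in Python/NumPy) of randomized low-rank SVD in the style of Halko, Martinsson, and Tropp: a random projection compresses the matrix, and exact linear algebra then runs on the small compressed problem. The function name and parameters here are illustrative assumptions, not part of the project's software; it shows one well-known RandNLA primitive, not the project's specific methods.

import numpy as np

def randomized_svd(A, rank, n_oversample=10, n_power_iter=2, seed=None):
    """Approximate rank-`rank` SVD of A via a Gaussian random sketch.

    Illustrative RandNLA primitive: random projection compresses A to
    a small subspace, then deterministic linear algebra (QR, SVD) runs
    on the compressed problem.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + n_oversample, n)

    # Sketch: multiply A by a random Gaussian test matrix.
    Omega = rng.standard_normal((n, k))
    Y = A @ Omega

    # Power iterations sharpen the sketch when singular values decay slowly.
    for _ in range(n_power_iter):
        Y = A @ (A.T @ Y)

    # Orthonormal basis for the sketched range, then SVD of the small matrix.
    Q, _ = np.linalg.qr(Y)
    B = Q.T @ A                                   # k x n, small
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Example: recover a planted rank-5 structure from a 2000 x 500 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 5)) @ rng.standard_normal((5, 500))
U, s, Vt = randomized_svd(A, rank=5, seed=0)
err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
print(f"relative error: {err:.2e}")

The key design point this sketch illustrates is that the expensive computation touches A only through a small number of matrix-vector products, which is what makes such primitives scalable; the statistical behavior of the injected randomness, including any implicit regularization effect, is exactly the kind of question the project studies.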

Funding provided by NSF