Randomized Numerical Linear Algebra (RandNLA) for Multi-Linear and Non-Linear Data

Principal Investigator(s): 
Michael Mahoney

This project investigates two important, non-linear, structural settings in order to start making progress toward using RandNLA (Randomized Numerical Linear Algebra) approaches to big data analysis in situations where the underlying data exhibit non-linear structure. First, researchers investigate how to design the next generation of RandNLA algorithms that can handle data that exhibit multi-linear structures captured by tensors. Second, they investigate the applicability of RandNLA approaches to data that exhibit non-linear structure, as captured by non-linear dimensionality reduction techniques, local spectral methods, and related semi-supervised eigenvector tools. In addition, they evaluate the proposed approaches on data applications where the PIs have significant expertise, such as the statistical analysis of population genetics data and astronomical data.

The amount of data in the world has exploded, with data being at the heart of modern economic activity, innovation, and growth. Big data are often modeled as matrices, since an m × n matrix A provides a natural structure to encode information about m objects, each of which is described by n features. As a result, linear algebraic algorithms, and in particular matrix decompositions, have proven extremely successful in the analysis of datasets in the form of matrices. RandNLA, which integrates the complementary perspectives that theoretical computer science and numerical linear algebra bring to matrix computations, is a new paradigm for the design and analysis of such algorithms and for using the resulting insight to solve important problems.

Despite the many successes of RandNLA both from a theoretical as well as a practical perspective, all the proposed approaches essentially focus on the case where the input matrix exhibits linear structure, as captured by a good low-rank approximation. An obvious question arises: Is RandNLA useful if the underlying data exhibit non-linear structural properties? Equivalently, are the methods, techniques, and intuitions developed in RandNLA useful when analyzing data that lie in non-linear manifolds or when the data have other non-linearities? This project aims to improve RandNLA's usefulness for analyzing non-linear data.

Funding provided by NSF grant #1447534