Robust, Efficient, and Local Machine Learning Primitives

Principal Investigator(s): 
Michael Mahoney

The large-scale data being generated in many application domains promise to revolutionize scientific discovery, engineering and technological development, social science understanding, and our ability to monitor masses and influence behavior in subtle ways. In most applications, however, this promise has yet to be fulfilled. One major reason for this is the difficulty of using, in a low-friction manner, cutting-edge algorithmic and statistical tools to explore the data and develop domain-informed models of the processes generating the data. In Internet and social media analysis, (1) knowledge of the data-generation processes is particularly weak; and (2) there are large teams of engineers and data scientists devoted to developing and applying sophisticated data science tools. Developing models and methods that are appropriate when these two conditions are not present is a major challenge impeding progress in many applications.

In this project, ICSI big data researchers are developing, implementing, and applying a suite of theoretically principled algorithmic and statistical primitives that are easy for the non-expert to use and that map cleanly to the intuition and understanding that domain experts have about their data and the processes generating their data. Most of their efforts will focus on machine learning (ML) and data analysis (DA) primitives for analyzing data that are modeled by matrices or graphs, with an emphasis on primitives that (when combined appropriately) give complementary algorithmic and statistical advantages.

Funding provided by DARPA and the Air Force Research Laboratory under agreement number FA8750-17-2-0122.