ICSI hosts basic, pre-competitive research of fundamental importance to computer science and engineering. Projects are chosen based on the interests of the Institute’s principal investigators and the strengths of its researchers and affiliated UC Berkeley faculty.

Recent projects are listed below; the full list of each group's projects is accessible via the links listed in the sidebar.

Scalable Second-order Methods for Training, Designing, and Deploying Machine Learning Models

Scalable algorithms that can handle the large-scale nature of modern datasets are an integral part of many applications of machine learning (ML). Among these, efficient optimization algorithms, as the bread and butter of many ML methods, hold a special place. Optimization methods that use only first-derivative information, i.e., first-order methods, are the most common tools used in training ML models. This is despite the fact that many of these methods come with inherent disadvantages such as slow convergence, high communication costs in distributed settings, and the need for laborious hyperparameter tuning.
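
To make the contrast with first-order methods concrete, here is a minimal sketch (not the project's code) of one scalable second-order technique: Newton's method for L2-regularized logistic regression in which each Newton step is computed by conjugate gradient (CG) using only Hessian-vector products, so the d x d Hessian is never formed. The function names, the Armijo constant 1e-4, and the NumPy setting are illustrative assumptions.

```python
import numpy as np

def logistic_loss_grad(w, X, y, lam):
    """L2-regularized logistic loss and gradient; labels y are 0/1."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))  # predicted probabilities
    # log(1 + e^z) - y*z, computed stably with logaddexp
    loss = np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)
    grad = X.T @ (p - y) / len(y) + lam * w
    return loss, grad, p

def hessian_vec(v, X, p, lam):
    """Hessian-vector product H @ v, never forming the d x d Hessian."""
    s = p * (1.0 - p)  # per-example curvature weights
    return X.T @ (s * (X @ v)) / X.shape[0] + lam * v

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A given only x -> A x."""
    x = np.zeros_like(b)
    r = b.copy()
    d = r.copy()
    rs = r @ r
    b_norm = np.sqrt(rs)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ad = matvec(d)
        step = rs / (d @ Ad)
        x += step * d
        r -= step * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x

def newton_cg(X, y, lam=1e-3, iters=50, tol=1e-8):
    """Newton's method with CG-solved steps and Armijo backtracking."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        loss, g, p = logistic_loss_grad(w, X, y, lam)
        if np.linalg.norm(g) < tol:
            break
        # Approximately solve H @ direction = -g using Hessian-vector products only.
        direction = conjugate_gradient(lambda v: hessian_vec(v, X, p, lam), -g)
        t = 1.0
        while t > 1e-10 and (
            logistic_loss_grad(w + t * direction, X, y, lam)[0]
            > loss + 1e-4 * t * (g @ direction)
        ):
            t *= 0.5
        w = w + t * direction
    return w
```

Each CG iteration costs about two matrix-vector products with X, which is what keeps this family of methods practical when X is large.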

RDL, Big Data
Towards an Extensible Internet

The Internet has created a world of universal connectivity, where any two devices can communicate as long as they are both connected to the Internet. The Internet architecture is a miraculous feat of engineering, remaining largely unchanged while scaling from an early prototype to the centerpiece of the global communications infrastructure. Having grown to such an unprecedented scale, the Internet is now the victim of its own success in that the Internet’s core protocol, IP, is now embedded in every router and thus is essentially impossible to change in any fundamental way. Yet to achieve goals like better performance and greater security, it is clear that the Internet must eventually change. The central technical question facing the Internet is thus: can we fundamentally change the Internet without changing IP?

In this project, researchers at ICSI argue that the answer to this question is most definitely “yes”. Leveraging insights from the large private networks recently built by cloud and content providers and a long line of academic research, they describe an approach called the Extensible Internet (EI).

Extensible Internet, Research Initiatives
Foregrounding Bystanders as Stakeholders in Smart Home Product Design

As computing advances, we are faced with tough decisions, such as how to balance individual privacy with the potential for innovation. People are often uncomfortable with how data is collected and used, yet we continue to see new data-driven technologies deployed. The oft-touted approach of transparency and control has not been an effective solution for individual privacy. People are ill-equipped to decipher how systems work, and so cannot effectively use tools intended to put them in control. And as technology expands beyond devices for individuals, privacy expands beyond individual choice.

Usable Security and Privacy
Narrowing the Gap Between Privacy Expectations and Reality in Mobile Health

ICSI and St. Mary's College are collaborating on an NSF-funded project that seeks to answer important questions about privacy and security practices in mobile health technologies (mHealth), such as health apps.

Usable Security and Privacy

The Pangeo project is a community platform for Big Data geoscience. 2i2c collaborates with Pangeo by developing and running JupyterHub infrastructure for the Pangeo Hubs. This work focuses on building collaborative data science platforms that can draw from large cloud datasets, as well as integrating JupyterHub with scalable computing in the cloud via Dask Gateway.

This research is funded by the Moore Foundation and is a collaboration with Columbia University.

Backdoor Detection via Eigenvalues, Hessians, Internal Behaviors, and Robust Statistics

Although Deep Neural Networks (DNNs) have achieved impressive performance in many applications, they exhibit several sensitivities that are by now well known. Perhaps the most prominent of these is sensitivity to various types of adversarial environments. For example, it is common in practice to outsource the training of a model (known as Machine Learning as a Service, or MLaaS) or to use third-party pre-trained networks (and then perform fine-tuning or transfer learning).
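
One well-known robust-statistics approach in this spirit is the "spectral signatures" idea: poisoned training examples often leave a detectable trace along the top singular direction of a layer's learned representations. The sketch below scores each example by its squared projection onto that direction and flags the highest-scoring fraction. It is illustrative only, not the project's method; `reps` stands for hypothetical per-example feature vectors taken from one class at some internal layer, and `remove_frac` is an assumed poisoning budget.

```python
import numpy as np

def spectral_scores(reps):
    """Outlier score per row of reps (n examples x d features)."""
    centered = reps - reps.mean(axis=0)
    # Top right singular vector of the centered representation matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[0]
    return (centered @ top) ** 2  # squared projection onto that direction

def flag_suspicious(reps, remove_frac):
    """Indices of the remove_frac fraction of examples with the largest scores."""
    scores = spectral_scores(reps)
    k = int(round(remove_frac * len(scores)))
    return np.argsort(scores)[::-1][:k]
```

Flagged examples would then be removed and the model retrained; the approach relies on the poisoned subset shifting the representation mean enough to dominate the top singular direction.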

RDL, Big Data

2i2c makes interactive computing more accessible and powerful for research and education. The 2i2c team strives to accelerate research and discovery, and to empower education to be more accessible, intuitive, and enjoyable. They do this by providing managed services for interactive computing infrastructure for research and education, as well as by supporting open source tools and communities that underlie this infrastructure. For more information, see

Funding provided by the Chan Zuckerberg Science Initiative.

Multimodal Video Summarization

ICSI researchers have been working with DAC to identify and acquire datasets that are sufficient for training Automated Speech Recognition (ASR) models. They are researching and developing ASR models that are robust to noise, music, babble and reverberation. This may include, but is not limited to, the research and implementation of signal processing algorithms that remove segments of an audio stream that do not include speech.
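
Removing non-speech segments is commonly done with a voice activity detector. As a deliberately simple baseline (an assumption, not the project's algorithm), the sketch below frames the signal, computes per-frame energy in dB, and keeps only frames above a fixed threshold. The 25 ms frames, 10 ms hop, and -40 dB threshold are hypothetical defaults.

```python
import numpy as np

def frame_energy_db(signal, frame_len, hop):
    """Mean-square energy (in dB) of each overlapping frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies[i] = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)  # epsilon avoids log(0)
    return energies

def drop_non_speech(signal, sr, frame_ms=25, hop_ms=10, threshold_db=-40.0):
    """Keep samples covered by at least one frame louder than threshold_db.

    Assumes a float signal scaled to [-1, 1] and at least one full frame long.
    Returns the kept samples and the boolean keep-mask.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies = frame_energy_db(signal, frame_len, hop)
    keep = np.zeros(len(signal), dtype=bool)
    for i, e in enumerate(energies):
        if e > threshold_db:
            keep[i * hop : i * hop + frame_len] = True
    return signal[keep], keep
```

A pure energy rule will also keep loud music or babble, which is exactly why the noise-robust models described above are needed; it is useful mainly as a point of comparison.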

Audio and Multimedia
Identifying Semantic Components

Identifying potential WMD-related threats before they materialize requires the ability to discover and analyze low-observable WMD-related information from data of all types, including social media. To help build the robust natural language understanding (NLU) systems needed for this goal, this project investigates the automatic identification of semantic components, sub-lexical elements of linguistic meaning that may be composed in different ways to capture the meanings of words.

Multilingual FrameNet: Merging FrameNets for Cross-linguistic Research

One of the greatest challenges to NLP is the increasing variety of languages on the internet; part of the answer to this challenge can come from the FrameNet lexical database, which has been developed for English since 1997 at the International Computer Science Institute (ICSI) based on the principles of Frame Semantics (Fillmore 1977; Fillmore 1985). The lexicon is organized by semantic frames, with valence information derived from attested, manually annotated corpus examples (Fillmore & Baker 2010).

PacketLab: A Universal Measurement Endpoint Interface

The right vantage point is critical to the success of any active measurement. However, most research groups cannot afford to design, deploy, and maintain their own network of measurement endpoints, and thus rely on measurement infrastructure shared by others. Unfortunately, the mechanism by which we share access to measurement endpoints today is not frictionless; indeed, issues of compatibility, trust, and a lack of incentives get in the way of efficiently sharing measurement infrastructure.

Networking and Security
Rethinking Home Networking for the Ultrabroadband Era

People generally center their lives around their residence. This center of gravity is where we can be contacted, store our things, do our homework, play games, meet after disparate activities, eat our meals, and so on. Our digital lives, however, are not organized around such a hub. Rather, we use a myriad of services to communicate with one another, store pictures, work on documents, share videos, keep our music, deal with calendars, etc. In this arrangement we are the hub. Our content and information come to us from a range of places to wherever we happen to be at the moment.

Networking and Security
Privacy Risk in Machine Learning Pipelines

ICSI researchers are working with researchers at Carnegie Mellon University on tracking private data through machine learning pipelines. They will develop stronger notions of proxy that account for why a classifier is using information by: 

Networking and Security
Previous Work: Implement and Evaluate Matrix Algorithms in Spark on High Performance Computing Platforms for Science Applications

The overall goal of this project is to enable the Berkeley Data Analytics Stack (BDAS) to run efficiently on the Cray XC30 and Cray XC40 supercomputer platforms. BDAS has a rich set of capabilities and is of interest as a computational environment for very large-scale machine learning and data analysis applications. To extend the capabilities of BDAS, ICSI researchers will consider the performance of deterministic and randomized matrix algorithms for problems such as least-squares approximation and low-rank matrix approximation that underlie many common machine-learning algorithms.
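
As an example of the randomized matrix algorithms in question, here is a minimal single-machine NumPy sketch of a randomized low-rank approximation in the style of Halko, Martinsson, and Tropp: multiply by a random Gaussian test matrix to capture the range of A, refine with a few power iterations, then take an exact SVD of the small projected matrix. It is not the Spark implementation evaluated in the project; the oversampling and power-iteration defaults are assumptions.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, power_iters=2, seed=0):
    """Approximate top-`rank` SVD of A via a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)      # size of the random sketch
    omega = rng.standard_normal((n, k))   # Gaussian test matrix
    Y = A @ omega                         # sample the range of A
    for _ in range(power_iters):          # power iterations sharpen the spectrum
        Q, _ = np.linalg.qr(Y)
        Y = A @ (A.T @ Q)
    Q, _ = np.linalg.qr(Y)                # orthonormal basis for the sampled range
    B = Q.T @ A                           # small k x n matrix
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    return U[:, :rank], s[:rank], Vt[:rank]
```

In a distributed setting, the products with A are the parts that would be parallelized; the QR factorizations and the final SVD act on matrices with only rank + oversample columns.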

Big Data
Previous Work: Local Algorithms for Large Informatics Graphs

A serious problem with many existing machine learning and data analysis tools in the complex networks area is that they are often very brittle and/or do not scale well to larger networks. As a consequence, analysts often develop intuition on small networks, with 10² or 10³ nodes, and then try to apply these methods on larger networks, with 10⁵ or 10⁷ or more nodes. Larger networks, however, often have very different static and dynamic properties than smaller networks.
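
Local algorithms sidestep this scaling problem by touching only a neighborhood of a seed node. A standard example (used here purely for illustration; the project's own algorithms may differ) is the Andersen-Chung-Lang "push" procedure for approximate personalized PageRank, whose total work is bounded by O(1/(alpha * eps)) rather than by the size of the graph. The sketch assumes an undirected graph given as an adjacency dict with no isolated nodes; `alpha` and `eps` are illustrative values.

```python
from collections import deque

def approximate_ppr(graph, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank by repeated local "push" operations.

    graph: dict mapping node -> list of neighbors (undirected, no isolated nodes).
    Returns (p, r): the approximation and the leftover residual. Only nodes near
    the seed are ever touched, so the work is bounded independently of graph size.
    """
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(graph[u])
        r_u = r.get(u, 0.0)
        if r_u < eps * deg_u:
            continue  # residual too small to be worth pushing
        p[u] = p.get(u, 0.0) + alpha * r_u
        r[u] = (1 - alpha) * r_u / 2  # lazy walk: half of the rest stays at u
        share = (1 - alpha) * r_u / (2 * deg_u)
        for v in graph[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            if old < eps * len(graph[v]) <= r[v]:
                queue.append(v)  # v just crossed its push threshold
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p, r
```

Ranking the touched nodes by p[v] / deg(v) and sweeping over prefixes of that order is the usual way to turn the vector into a local cluster around the seed.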

Big Data
Liquid Data Networking

Packet loss often occurs when transmitting over wireless links, due to interference, intermittent obstacles, routing changes as conditions vary, etc. As examples, interference and the resulting packet loss are major concerns as the Internet is extended using wireless mesh networks, 5G millimeter wave transmission is known to be prone to packet loss with even the slightest obstruction, and slight variations in atmospheric conditions cause packet loss in laser communications between ground stations and drones or satellites.

Exploring the Boundaries of Passive Listening in Voice Assistants

Various forms of voice assistants, whether stand-alone devices or those built into smartphones, are becoming increasingly popular among consumers. Currently, these systems react when you directly speak to them using a specific wake-word, such as “Alexa,” “Siri,” or “Ok Google.” However, with advancements in speech recognition, the next generation of voice assistants is expected to always listen to the acoustic environment and proactively provide services and recommendations based on human conversations or other audio signals, without being explicitly invoked.

Usable Security and Privacy
Low-Latency/High-Reliability Streaming Video Prototype Based on RaptorQ

We are happy to announce the availability of prototype software implementing low latency/high reliability video streaming based on RaptorQ. The prototype uses the ROUTE protocol, which in turn uses RaptorQ to provide protection against packet loss. ROUTE and RaptorQ are specified in A/331, “Signaling, Delivery, Synchronization, and Error Protection”, as part of the ATSC 3.0 standard for delivery of media and non-timed data.
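
RaptorQ itself (specified in RFC 6330) is an intricate fountain code, but the core idea it builds on can be shown with a much simpler one. The sketch below is a random linear fountain code over GF(2), not RaptorQ: each encoded symbol is the XOR of a random subset of the K source symbols, tagged with a bitmask naming that subset, and the receiver recovers the block by Gaussian elimination once any K linearly independent symbols arrive, regardless of which ones were lost. Symbol sizes, seeds, and function names are illustrative assumptions.

```python
import random

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(source, n_symbols, seed=0):
    """Random linear fountain encoder over GF(2).

    source: list of K equal-length byte strings.
    Returns (mask, payload) pairs; bit i of mask says source[i] was XORed in.
    """
    rng = random.Random(seed)
    k = len(source)
    out = []
    while len(out) < n_symbols:
        mask = rng.getrandbits(k)
        if mask == 0:
            continue  # an empty combination carries no information
        payload = bytes(len(source[0]))
        for i in range(k):
            if mask >> i & 1:
                payload = xor_bytes(payload, source[i])
        out.append((mask, payload))
    return out

def decode(received, k):
    """Recover the K source symbols by Gaussian elimination, or return None."""
    pivots = {}  # leading bit -> (mask, payload); each mask's top set bit is its key
    for mask, payload in received:
        for bit in range(k - 1, -1, -1):  # reduce from the highest bit down
            if mask >> bit & 1 and bit in pivots:
                p_mask, p_payload = pivots[bit]
                mask ^= p_mask
                payload = xor_bytes(payload, p_payload)
        if mask:
            pivots[mask.bit_length() - 1] = (mask, payload)
        if len(pivots) == k:
            break
    if len(pivots) < k:
        return None  # not enough linearly independent symbols yet
    for bit in range(k):  # back-substitute so each mask becomes a single bit
        mask, payload = pivots[bit]
        for j in range(bit):
            if mask >> j & 1:
                mask ^= pivots[j][0]
                payload = xor_bytes(payload, pivots[j][1])
        pivots[bit] = (mask, payload)
    return [pivots[b][1] for b in range(k)]
```

Dense Gaussian elimination becomes expensive for large K; RaptorQ's structured design is what makes fast encoding and decoding of large blocks practical.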

Intelligent Channel Management for Wireless Mesh Networks

ICSI's TCS team is working with Facebook to build and evaluate a system-wide network channel management tool for a wireless mesh network. The aim of the collaboration is to increase wireless capacity, decrease interference between devices (self and external), and provide more automated and intelligent network channel management and planning.
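
A common starting point for channel management is to model which radios interfere as a graph and treat channel selection as approximate graph coloring. The sketch below is a hypothetical greedy heuristic, not the tool being built with Facebook: it visits nodes in decreasing order of interference degree and gives each one the channel least used by its already-assigned interfering neighbors. The example channel set 1, 6, and 11 (the commonly used non-overlapping 2.4 GHz Wi-Fi channels) is an assumption for illustration.

```python
def assign_channels(interference, channels):
    """Greedy channel assignment on an interference graph.

    interference: dict mapping node -> set of nodes it interferes with (symmetric).
    channels: list of available channel identifiers.
    """
    # Most-constrained nodes first: they have the fewest good options.
    order = sorted(interference, key=lambda n: len(interference[n]), reverse=True)
    assignment = {}
    for node in order:
        usage = {c: 0 for c in channels}
        for nb in interference[node]:
            if nb in assignment:
                usage[assignment[nb]] += 1
        # Pick the channel shared with the fewest assigned neighbors (first wins ties).
        assignment[node] = min(channels, key=lambda c: usage[c])
    return assignment

def conflicts(interference, assignment):
    """Number of interfering pairs that ended up on the same channel."""
    return sum(
        1
        for u in interference
        for v in interference[u]
        if u < v and assignment[u] == assignment[v]
    )
```

A production tool would also weigh measured interference levels, external networks, and traffic, but a conflict count like the one above is a simple objective for comparing candidate assignments.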

Funding is provided by Facebook Connectivity.

Enhancements for Wireless Mesh Networks

ICSI's TCS team is working with Facebook to simulate, understand, and enhance wireless mesh networks. The main goal of the effort is to build a high-fidelity simulator for wireless mesh networks in order to understand the performance scaling of such networks and to improve coverage, density, and throughput. Potential enhancements include extending the reach and performance of wireless mesh networks, that is, supporting more hops and/or providing superior internet connectivity, using technology that avoids deep changes to the network or wireless technology.