ICSI hosts basic, pre-competitive research of fundamental importance to computer science and engineering. Projects are chosen based on the interests of the Institute’s principal investigators and the strengths of its researchers and affiliated UC Berkeley faculty.

Recent projects are listed below; the full list of each group's projects is accessible via the links listed in the sidebar.

Previous Work: MetaNet: A Multilingual Metaphor Repository

Researchers from ICSI, UC San Diego, University of Southern California, and UC Merced are building a system capable of understanding metaphors used in American English, Iranian Persian, Russian as spoken in Russia, and Mexican Spanish. The team includes computer scientists, linguists, psychologists, and cognitive scientists.

Previous Work: California Connects

California Connects is a state-level program administered by the Foundation for California Community Colleges that seeks to advance digital opportunity for underserved communities by promoting and enabling digital competency. Among other services, the program provides laptops to community college students, who in return teach people in their communities how to use computers and the Internet. The program also provides free classes in low-income Central Valley communities. The California Connects team at ICSI provides research support for the initiative, evaluating the program's structure and effectiveness in the context of its target population and making recommendations for its future.

Previous Work: BFOIT

BFOIT (the Berkeley Foundation for Opportunities in Information Technology) supports historically underrepresented ethnic minorities and women in their desire to become leaders in the fields of computer science, engineering, and information technology. The intent is to provide youth with knowledge, resources, practical programming skills, and guidance in their pursuit of higher education and production of technology. For more information, visit the BFOIT Web site.

Previous Work: SWORDFISH

Researchers are developing ways to find spoken phrases in audio from multiple languages. A working group, called SWORDFISH, includes scientists from ICSI, the University of Washington, Northwestern University, Ohio State University, and Columbia University. The acronym expands to a rough description of the effort: Spoken WOrdsearch with Rapid Development and Frugal Invariant Subword Hierarchies.

Previous Work: Privacy Literacy with San Jose Public Library

ICSI researchers are collaborating with the San Jose Public Library and San Jose State University's Game Development club to develop an online tool which will help individuals understand privacy in the digital age and make informed decisions about their online activity. Beyond the standard educational aid, this tool will be non-biased, acknowledging that people have many different definitions of privacy and may have different needs based on what kind of online persona they have created.

Audio and Multimedia, Usable Security and Privacy
Previous Work: Project OUCH - Outing Unfortunate Characteristics of HMMs (Used for Speech Recognition)

Project OUCH has been completed, and the final report is available here.

The central idea behind this project is that if we want to improve recognition performance through acoustic modeling, then we should first quantify how the current best model, the hidden Markov model (HMM), fails to adequately model speech data and how these failures impact recognition accuracy. We are undertaking a diagnostic analysis that is an essential component of statistical modeling but, for various reasons, has been largely ignored in the field of speech recognition. In particular, we believe that previous attempts to improve upon the HMM have largely failed because this diagnostic information was not readily available. In our initial research, we are using simulation and a novel sampling process to generate pseudo test data that deviate from the HMM in a controlled fashion. These processes allow us to generate pseudo data that, at one extreme, agree with all of the model's assumptions, and at the other extreme, deviate from the model in exactly the way real data does. In between, we precisely control the degree of data/model mismatch. By measuring recognition performance on this pseudo test data, we are able to quantify the effect of this controlled data/model residual on recognition accuracy.
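
As a rough illustration of the "agrees with all of the model's assumptions" extreme only, the following minimal sketch draws pseudo data directly from a small HMM. It is not the project's sampling process: the two-state model, its parameters, and the function name `sample_hmm` are hypothetical, and single-Gaussian scalar emissions are assumed for brevity.

```python
import random

def sample_hmm(n_frames, start, trans, means, stdevs, seed=0):
    """Draw a state path and observations that satisfy the HMM assumptions exactly.

    Each observation depends only on the current state, and each state
    depends only on the previous state.
    """
    rng = random.Random(seed)
    states = list(range(len(start)))
    state = rng.choices(states, weights=start)[0]
    path, obs = [], []
    for _ in range(n_frames):
        path.append(state)
        # Emission: one Gaussian per state (an assumption of this sketch).
        obs.append(rng.gauss(means[state], stdevs[state]))
        # Transition: next state drawn from the current state's row.
        state = rng.choices(states, weights=trans[state])[0]
    return path, obs

# Hypothetical two-state model, chosen only for illustration.
start = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
means = [0.0, 3.0]
stdevs = [1.0, 0.5]

path, obs = sample_hmm(200, start, trans, means, stdevs)
```

Data generated this way match the model by construction, so any recognition errors measured on them cannot be attributed to data/model mismatch.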

Multimodal Location Estimation

Location estimation is the task of estimating the geo-coordinates of the content recorded in digital media. The Berkeley Multimodal Location Estimation project aims to leverage the GPS-tagged media available on the web as a training set for an automatic location estimator. The idea is that visual and acoustic cues can narrow down the possible recording location for a given image, video, or audio track. We also investigate the human baseline of location estimation, i.e., how well a human does in comparison to a computer.

Audio and Multimedia

Researchers are exposing the ways in which it is possible to aggregate public and seemingly innocuous information from different media and Web sites to attack the privacy of users. The project seeks to help users, particularly younger ones, understand the privacy implications of the information they share publicly on the Internet and to help them understand what control they can exercise over it.

Audio and Multimedia, Networking and Security
User-Centric Networking

In collaboration with Case Western Reserve University, we are investigating foundational architectural constructs that bring users into networked systems in a way that has not been possible to this point. Rather than relegating users to an artifact of the application layer, we seek to accommodate users and their relationships at all layers of the system and to give users new controls over how their traffic is handled by the system.

Funding provided by NSF grant 1213157, NeTS: Large: Collaborative Research: User-Centric Network Measurement.

Networking and Security
Open Software-Defined Networks

Today's routers and switches are both complicated and closed. The forwarding path on these boxes involves sophisticated ASICs, and the large base of installed software is typically closed and proprietary. Thus, functionality can only evolve on hardware design timescales, and only through the actions of the vendors. At ICSI, in collaboration with our colleagues at Stanford University, we are pursuing a radically different approach, which we call Open Software-Defined Networks.

Networking and Security
Future Internet Architecture

Along with research groups around the world, we are exploring fundamental questions about Internet architecture. In particular, we are asking, "If we were to redesign the Internet, what would it look like?" This effort involves looking at all aspects of the Internet architecture, including addressing, intradomain routing, interdomain routing, naming, name resolution, network API, monitoring, and troubleshooting. Moreover, the effort involves both in-depth investigations of these isolated topics and a synthesis of these aspects into a coherent and comprehensive future Internet architecture.

Networking and Security
Detecting and Preventing Network Attacks

We conduct extensive research on technology for analyzing network traffic streams to detect attacks, either in "real time" as they occur, or in support of post facto forensic exploration. The particular context for much of this research is the open-source "Bro" network intrusion detection system authored by ICSI staff. Bro runs 24x7 operationally at a number of institutes, and we have particularly close ties with the Lawrence Berkeley National Laboratory, where Bro deployments have formed an integral part of the Institute's cybersecurity operations for more than a decade.

Networking and Security
Investigating the Underground Economy

One of the most disturbing recent shifts in Internet attacks has been the change from attackers motivated by glory or vanity to attackers motivated by commercial (criminal) gain. This shift threatens to greatly accelerate the "arms race" between defenders developing effective counters to attacks and highly motivated, well funded attackers finding new ways to circumvent these innovations.

Networking and Security
Understanding and Taming the Privacy Footprint

Typical Web pages may contain numerous third-party components, ranging from advertisement networks to analytics tools to third-party APIs necessary for page function. All of these components may leak information to third parties about the users' current activity. We are attempting to quantify this information leakage through a policy written in the Bro IDS. Preliminary analysis paints a bleak picture, as more than 1 percent of all HTTP requests observed by ICSI users are deliberately leaking information through Google Analytics alone.

Networking and Security
Previous Work: Speaker Recognition

This project is concerned with the discovery of highly speaker-characteristic behaviors ("speaker performances") for use in speaker recognition and related speech technologies. The intention is to move beyond the usual low-level short-term spectral features which dominate speaker recognition systems today, instead focusing on higher-level sources of speaker information, including idiosyncratic word usage and pronunciation, prosodic patterns, and vocal gestures.

Previous Work: Robust Automatic Transcription of Speech

This DARPA-funded program seeks to significantly improve the accuracy of several speech processing tasks (speech activity detection, speaker identification, language identification, and keyword spotting) for degraded audio sources. As part of the SRI Speech Content Extraction from Noisy Information Channels (SCENIC) Team, we are working primarily on feature extraction (drawing on our experience with biologically motivated signal processing and machine learning) and speech activity detection (drawing on our experience with speech segmentation).

Funding provided by DARPA.

Video Concept Detection

Massive numbers of video clips are generated daily on many types of consumer electronics and uploaded to the Internet. In contrast to videos that are produced for broadcast or from planned surveillance, the "unconstrained" video clips produced by anyone who has a digital camera present a significant challenge for manual as well as automated analysis. Such clips can include any possible scenes and events, and generally have limited quality control.

Audio and Multimedia
Previous Work: Color, Language, and Thought

In 1978, the World Color Survey (WCS) collected color naming data in 110 unwritten languages from around the world. The ICSI WCS staff (Paul Kay and Richard Cook of ICSI, Terry Regier of the University of Chicago) put these data into a single database, available to the scientific community. Several outside laboratories have already used this database for studies.

Previous Work: Finding Conserved Protein Modules

A long-term goal of computational molecular biology is to extract, from large data sets, information about how proteins work together to carry out life processes at a cellular level. We are investigating protein-protein interaction (PPI) networks, in which the vertices are the proteins within a species and the edges indicate direct interactions between proteins. Our goal is to discover conserved protein modules: richly interacting sets of proteins whose patterns of interaction are conserved across two or more species.
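
To make the graph formulation concrete, here is a minimal sketch, not the project's algorithm. It represents each PPI network as a list of edges, keeps the interactions in one species whose counterparts also interact in a second species, and reports connected groups of at least three proteins as candidate modules. The protein names, the one-to-one `orthologs` mapping, and the size threshold are all hypothetical assumptions of this example.

```python
def conserved_edges(edges_a, edges_b, orthologs):
    """Return edges of species A whose mapped endpoints also interact in species B."""
    b = {frozenset(e) for e in edges_b}  # undirected edges of species B
    kept = []
    for u, v in edges_a:
        if u in orthologs and v in orthologs:
            if frozenset((orthologs[u], orthologs[v])) in b:
                kept.append((u, v))
    return kept

def components(edges):
    """Group vertices into connected components with a depth-first search."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Hypothetical toy networks for two species.
edges_a = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"), ("a3", "a4")]
edges_b = [("b1", "b2"), ("b2", "b3"), ("b1", "b3"), ("b4", "b5")]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}

kept = conserved_edges(edges_a, edges_b, orthologs)
modules = [c for c in components(kept) if len(c) >= 3]
```

In this toy input, the triangle a1, a2, a3 survives because all three of its interactions have counterparts in the second species, while the a3-a4 interaction does not. Real data would call for many-to-many orthology and noise-tolerant scoring, which this sketch ignores.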

Research Initiatives, Algorithms
Previous Work: Analysis of Genome-Wide Association Studies for Common Diseases

In these studies, sets of cases (individuals carrying a disease) and controls (background population) are collected and genotyped for genetic variants, normally single nucleotide polymorphisms (SNPs). Our group is collaborating closely with groups of geneticists and epidemiologists who have collected such samples. We take part in the analysis of these studies, and in some cases also in the design of the studies.

Research Initiatives, Algorithms