ICSI hosts basic, pre-competitive research of fundamental importance to computer science and engineering. Projects are chosen based on the interests of the Institute’s principal investigators and the strengths of its researchers and affiliated UC Berkeley faculty.

Recent projects are listed below; the full list of each group's projects is accessible via the links listed in the sidebar.

Previous Work: Speaker Recognition

This project is concerned with the discovery of highly speaker-characteristic behaviors ("speaker performances") for use in speaker recognition and related speech technologies. The intention is to move beyond the low-level, short-term spectral features that dominate speaker recognition systems today and to focus instead on higher-level sources of speaker information, including idiosyncratic word usage and pronunciation, prosodic patterns, and vocal gestures.

Previous Work: Robust Automatic Transcription of Speech

This DARPA-funded program seeks to significantly improve the accuracy of several speech processing tasks (speech activity detection, speaker identification, language identification, and keyword spotting) for degraded audio sources. As part of the SRI Speech Content Extraction from Noisy Information Channels (SCENIC) Team, we are working primarily on feature extraction (drawing on our experience with biologically motivated signal processing and machine learning) and speech activity detection (drawing on our experience with speech segmentation).

Funding provided by DARPA.

Video Concept Detection

Massive numbers of video clips are generated daily on many types of consumer electronics and uploaded to the Internet. In contrast to videos that are produced for broadcast or from planned surveillance, the "unconstrained" video clips produced by anyone who has a digital camera present a significant challenge for manual as well as automated analysis. Such clips can include any possible scenes and events, and generally have limited quality control.

Audio and Multimedia

Previous Work: Color, Language, and Thought

In 1978, the World Color Survey (WCS) collected color naming data in 110 unwritten languages from around the world. The ICSI WCS staff (Paul Kay and Richard Cook of ICSI, Terry Regier of the University of Chicago) put these data into a single database, which is available to the scientific community. Several outside laboratories have already used this database for studies.


Neural Theory of Language

The NTL (Neural Theory of Language) project of the AI Group works in collaboration with other units on the UC Berkeley campus and elsewhere. It combines basic research in several disciplines with applications to natural language understanding systems. Basic efforts include studies in the computational, linguistic, neurobiological, and cognitive bases for language and thought. This research continues to yield a variety of theoretical and practical findings.


FrameNet

The FrameNet project is building a semantically rich lexicon of English and a corresponding set of annotated texts, based on more than 600 semantic frames and 130,000 sentences. Comparable FrameNet projects are underway for Spanish, German, and other languages. By providing a layered semantic representation of text, FrameNet delivers a key component of next-generation question answering, machine translation, and other natural language processing applications. Learn more on the FrameNet Web site.

Previous Work: Finding Conserved Protein Modules

A long-term goal of computational molecular biology is to extract, from large data sets, information about how proteins work together to carry out life processes at a cellular level. We are investigating protein-protein interaction (PPI) networks, in which the vertices are the proteins within a species and the edges indicate direct interactions between proteins. Our goal is to discover conserved protein modules: richly interacting sets of proteins whose patterns of interaction are conserved across two or more species.
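
As a rough illustration of the network representation described above, the following Python sketch stores two PPI networks as adjacency sets and scores how well a candidate set of proteins is conserved across species. It is not the project's algorithm. The function names, the toy protein identifiers, and the one-to-one `orthologs` mapping between species are all assumptions made for this example.

```python
from itertools import combinations


def make_network(edges):
    """Build an undirected PPI network as a dict: protein -> set of partners."""
    net = {}
    for a, b in edges:
        net.setdefault(a, set()).add(b)
        net.setdefault(b, set()).add(a)
    return net


def conserved_edges(module, net_a, net_b, orthologs):
    """Return the pairs in `module` (proteins of species A) that interact in
    species A and whose mapped counterparts also interact in species B."""
    kept = []
    for p, q in combinations(sorted(module), 2):
        if q not in net_a.get(p, set()):
            continue  # no direct interaction in species A
        p2, q2 = orthologs.get(p), orthologs.get(q)
        if p2 is None or q2 is None:
            continue  # at least one protein has no counterpart in species B
        if q2 in net_b.get(p2, set()):
            kept.append((p, q))
    return kept


def conservation_score(module, net_a, net_b, orthologs):
    """Fraction of all possible protein pairs in `module` that are conserved edges."""
    n = len(module)
    possible = n * (n - 1) // 2
    if possible == 0:
        return 0.0
    return len(conserved_edges(module, net_a, net_b, orthologs)) / possible


# Toy data (hypothetical): a triangle of interactions present in both species,
# plus one extra interaction (a3, a4) that exists only in species A.
net_a = make_network([("a1", "a2"), ("a2", "a3"), ("a1", "a3"), ("a3", "a4")])
net_b = make_network([("b1", "b2"), ("b2", "b3"), ("b1", "b3")])
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3"}  # a4 has no counterpart

tight = conservation_score({"a1", "a2", "a3"}, net_a, net_b, orthologs)
loose = conservation_score({"a1", "a2", "a3", "a4"}, net_a, net_b, orthologs)
print(tight, loose)
```

In this toy setting, a richly interacting and fully conserved set scores higher than the same set padded with a protein whose interactions are not conserved. Real data would call for many-to-many homology mappings and a statistical model of noisy interactions, which this sketch leaves out.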

Research Initiatives, Algorithms

Previous Work: Analysis of Genome-Wide Association Studies for Common Diseases

In genome-wide association studies, sets of cases (individuals carrying a disease) and controls (background population) are collected and genotyped for genetic variants, normally single nucleotide polymorphisms (SNPs). Our group is collaborating closely with groups of geneticists and epidemiologists who have collected such samples. We take part in the analysis of these studies and, in some cases, in their design as well.

Research Initiatives, Algorithms