Projects
ICSI hosts basic, pre-competitive research of fundamental importance to computer science and engineering. Projects are chosen based on the interests of the Institute’s principal investigators and the strengths of its researchers and affiliated UC Berkeley faculty.
Recent projects are listed below; the full list of each group's projects is accessible via the links listed in the sidebar.
Today's routers and switches are both complicated and closed. The forwarding path on these boxes involve sophisticated ASICs, and the large base of installed software is typically closed and proprietary. Thus, functionality can only evolve on hardware design timescales, and only through the actions of the vendors. At ICSI, in collaboration with our colleagues at Stanford University, we are pursuing a radically different approach which we call Open Software-Defined Networks.
Sudderth and Jordan have proposed a nonparametric image segmentation engine that imposes a hierarchical Pitman-Yor prior on the data and trains the probabilistic model through variational learning. We propose to extend the framework to the video domain by incorporating temporal and optical flow information. The project tackles both the scarcity of segmented, annotated video datasets and the attendant computational challenges, addressing the latter via GPU computation and distributed computing (e.g., EC2, Hadoop clusters).
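The appeal of the Pitman-Yor prior is the heavy-tailed, power-law distribution it induces over cluster sizes, which matches the statistics of segment sizes in natural images. The following sketch, with generic parameter names that are not taken from the project, draws cluster assignments from a Pitman-Yor Chinese restaurant process to illustrate that behavior:

```python
import random

def pitman_yor_sample(n, discount=0.5, strength=1.0, seed=0):
    """Draw n cluster assignments from a Pitman-Yor Chinese restaurant process.

    Illustrative sketch only: the heavy-tailed cluster-size distribution this
    process produces is what makes the Pitman-Yor prior a good fit for
    natural-image segment statistics.
    """
    rng = random.Random(seed)
    counts = []       # customers seated at each table (cluster sizes)
    assignments = []  # table index chosen for each customer
    for i in range(n):
        total = i + strength
        # Probability mass for opening a new table:
        # (strength + discount * num_tables) / (i + strength)
        r = rng.random() * total
        new_table_mass = strength + discount * len(counts)
        if r < new_table_mass:
            counts.append(1)
            assignments.append(len(counts) - 1)
        else:
            # Existing table t is chosen with mass (counts[t] - discount)
            r -= new_table_mass
            for t, c in enumerate(counts):
                r -= c - discount
                if r < 0:
                    counts[t] += 1
                    assignments.append(t)
                    break
            else:
                # Guard against floating-point edge cases
                counts[-1] += 1
                assignments.append(len(counts) - 1)
    return assignments, counts
```

Setting `discount=0` recovers the ordinary Dirichlet-process restaurant process; a positive discount fattens the tail of the cluster-size distribution.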
Along with research groups around the world, we are exploring fundamental questions about Internet architecture. In particular, we are asking, "If we were to redesign the Internet, what would it look like?" This effort involves looking at all aspects of the Internet architecture, including addressing, intradomain routing, interdomain routing, naming, name resolution, network API, monitoring, and troubleshooting. Moreover, the effort involves both in-depth investigations of these isolated topics and a synthesis of these aspects into a coherent and comprehensive future Internet architecture.
We conduct extensive research on technology for analyzing network traffic streams to detect attacks, either in "real time" as they occur, or in support of post facto forensic exploration. The particular context for much of this research is the open-source "Bro" network intrusion detection system authored by ICSI staff. Bro runs 24x7 operationally at a number of institutes, and we have particularly close ties with the Lawrence Berkeley National Laboratory, where Bro deployments have formed an integral part of the Institute's cybersecurity operations for more than a decade.
The Sufficient Dimensionality Reduction (SDR) framework seeks a latent subspace of the covariates that captures as much information as possible about the output labels, formalized through conditional independence. As a particular instance of SDR, the kernel dimensionality reduction (KDR) algorithm achieves conditional independence by minimizing the trace of the cross-covariance operator. We propose to extend the existing framework for static data to incorporate sequential information through kernel design and dynamic time warping.
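The trace criterion at the heart of KDR can be illustrated with a closely related, simpler statistic: the trace of the centered kernel cross-covariance (an HSIC-style dependence measure). This is a sketch under that simplification, not the project's actual objective, and all names here are illustrative:

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gram matrix of a Gaussian RBF kernel over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Trace-based kernel dependence measure between X and Y.

    KDR-style methods search for a projection minimizing a related
    conditional cross-covariance trace; this sketch shows only the
    trace criterion itself.
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K = H @ rbf_kernel(X, sigma) @ H          # centered Gram matrix of X
    L = H @ rbf_kernel(Y, sigma) @ H          # centered Gram matrix of Y
    return np.trace(K @ L) / (n - 1) ** 2
```

The statistic is near zero when X and Y are independent and grows with dependence, which is why minimizing such a trace between the residual subspace and the labels enforces (approximate) conditional independence.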
One of the most disturbing recent shifts in Internet attacks has been the change from attackers motivated by glory or vanity to attackers motivated by commercial (criminal) gain. This shift threatens to greatly accelerate the "arms race" between defenders developing effective counters to attacks and highly motivated, well-funded attackers finding new ways to circumvent these innovations.
Typical Web pages may contain numerous third-party components, ranging from advertisement networks to analytics tools to third-party APIs necessary for page function. All of these components may leak information to third parties about a user's current activity. We are attempting to quantify this information leakage through a policy written for the Bro IDS. Preliminary analysis paints a bleak picture: more than 1 percent of all HTTP requests observed from ICSI users deliberately leak information through Google Analytics alone.
Many problems in machine learning involve datasets composed of multiple independent feature sets, or views, such as audio and video, text and images, and multi-sensor data. In this setting, each view provides a potentially redundant sample of the class or event of interest. Techniques in multi-view learning exploit this property to learn under weak supervision by maximizing the agreement of a set of classifiers, one defined in each view, over the training data.
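The agreement signal that multi-view methods exploit can be made concrete with a minimal sketch: train one simple classifier per view on a few labeled examples, then measure how often the two classifiers agree on the unlabeled pool. All names and the nearest-centroid model here are illustrative assumptions, not the project's method:

```python
import numpy as np

def centroid_classifier(X, y):
    """Return a predict function based on per-class centroids (one view)."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    def predict(Xq):
        d = ((Xq[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        return classes[np.argmin(d, axis=1)]
    return predict

def agreement_rate(X1, X2, y_labeled, n_labeled):
    """Fraction of unlabeled points on which the two per-view classifiers agree.

    Multi-view learning uses agreement over unlabeled data like this as a
    surrogate training signal under weak supervision.
    """
    f1 = centroid_classifier(X1[:n_labeled], y_labeled)
    f2 = centroid_classifier(X2[:n_labeled], y_labeled)
    p1 = f1(X1[n_labeled:])  # view-1 predictions on the unlabeled pool
    p2 = f2(X2[n_labeled:])  # view-2 predictions on the same pool
    return float(np.mean(p1 == p2))
```

When the views are truly redundant, agreement is informative even without labels, which is what lets these methods trade labeled data for unlabeled data.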
This NSF-funded center is a joint effort with researchers at UC San Diego focused on the growing problem of large-scale subversion of Internet systems. The purview of CCIED is to: (1) analyze this threat, spanning the range from theoretical models to empirical assessments to potentially dangerous innovations yet to develop; (2) devise defenses, both point-wise (for single systems or sites) and more global; and (3) investigate the surrounding legal and policy issues that in practice affect and constrain approaches for countering the threat.
Traditionally, object recognition requires manually labeled images of objects for training. However, there often exist additional sources of information that can be used as weak labels, reducing the need for human supervision. In this project we use different modalities and information sources to help learn visual models of object categories. The first type of information we use is the speech uttered by a user referring to an object. Such spoken utterances can occur in interaction with an assistant robot, voice-tagging a photo, etc.
Despite the omnipresence of transparent objects in our daily environment, little research has been conducted on how to recognize and detect such objects. The difficulty of this task lies in the complex interactions between scene geometry and illuminants, which lead to changing refraction patterns. Realizing that a complete physical model of these phenomena is currently out of reach, we seek machine learning solutions to this problem.
In our everyday life, we manipulate many nonrigid objects, such as clothes. In the context of personal robotics, it is therefore important for a robot to correctly recognize and track such objects in order to interact with them. While tracking and recognition of rigid objects has received a lot of attention in the computer vision community, similar tasks for deformable objects remain largely unstudied. The main challenges arise from the much larger appearance variability of such objects.
A common problem in large-scale data analysis is quickly extracting the nearest neighbors to a query from a large database. In computer vision, for example, this problem arises in content-based image retrieval, 3-D image reconstruction, human body pose estimation, object recognition, and other tasks. This project focuses on developing algorithms for quickly and accurately performing large-scale image searches using hashing techniques.
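A minimal sketch of hashing-based retrieval, assuming a classic random-hyperplane locality-sensitive hashing scheme (not necessarily the project's actual algorithm): vectors are reduced to short binary signatures, and a query is compared only against database items sharing its signature.

```python
import numpy as np

def make_lsh_index(X, n_bits=16, seed=0):
    """Index rows of X by random-hyperplane bit signatures (a simple LSH scheme)."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_bits, X.shape[1]))
    codes = (X @ planes.T > 0)               # one n_bits signature per row
    buckets = {}
    for i, code in enumerate(codes):
        buckets.setdefault(code.tobytes(), []).append(i)
    return planes, buckets

def query(planes, buckets, X, q):
    """Return the index of the closest database vector in q's bucket (or None)."""
    code = (planes @ q > 0).tobytes()
    cand = buckets.get(code, [])
    if not cand:
        return None
    d = ((X[cand] - q) ** 2).sum(axis=1)     # exact distances, but only
    return cand[int(np.argmin(d))]           # over the small candidate set
```

Because nearby vectors tend to fall on the same side of random hyperplanes, the bucket lookup prunes most of the database before any exact distance is computed; practical systems probe multiple hash tables to boost recall.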
This project explores how to define the meaning of prepositions using visual data. One potential application is commanding a robot to arrange objects in a room. For example, in order for a robot to follow the command "Put the cup there, on the front of the table," the robot must identify the target location of the cup. The robot can only identify this location if it understands the meaning of each of these components.
During a disaster a large number of children may become separated from their families. Many of these children, especially the younger ones, may be unable or unwilling to identify themselves, making the task of reuniting them with their families especially difficult. Without a system in place for hospitals to document their unidentified children and to help parents search, families could be separated for months. After Hurricane Katrina it was six months until the last child was reunited with her family.
Multiple kernel learning approaches form a set of techniques for performing classification that can easily combine information from multiple data sources, e.g., by adding or multiplying kernels. Most methods, however, are limited by their assumption of a per-view kernel weighting. For many problems, the set of features important for discriminating between examples can vary locally.
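The global per-view weighting that the paragraph above criticizes can be shown in a few lines: standard MKL combines base Gram matrices with a single weight per kernel, applied uniformly across all examples. This is an illustrative sketch with hypothetical weights (in practice the weights are learned from data):

```python
import numpy as np

def rbf(X, Z, sigma):
    """Gaussian RBF Gram matrix between the rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def combine_kernels(K_list, weights):
    """Convex combination of base Gram matrices, as in standard MKL.

    A single weight per view, shared by every example, is exactly the
    assumption that fails when feature importance varies locally.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize to a convex combination
    return sum(wi * K for wi, K in zip(w, K_list))
```

A convex combination (or an elementwise product) of positive semidefinite kernels is again a valid kernel, which is what makes this composition safe to plug into any kernel classifier; relaxing the per-view weighting to vary per example is the harder, local version of the problem.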
For many real world object recognition tasks a common difficulty is the high cost of generating labels for a large pool of unlabeled images. In order to learn a concept with the help of a human expert, we aim at picking only a small subset of examples that is most "helpful" for the classifier. The concept of active learning tackles this setting by enabling a classifier to pose specific queries that are chosen from an unlabeled dataset.
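One common way to operationalize "most helpful" is uncertainty sampling: query the unlabeled example the current classifier is least sure about. The sketch below uses a deliberately simple nearest-centroid model and a distance-margin criterion; both are illustrative assumptions, and real active learning systems use richer models and query criteria.

```python
import numpy as np

def most_uncertain_query(X_labeled, y_labeled, X_pool):
    """Index of the pool example a nearest-centroid classifier is least sure about.

    Uncertainty is approximated by the margin between the distances to the
    two closest class centroids; a small margin means an ambiguous example.
    """
    classes = np.unique(y_labeled)
    centroids = np.array([X_labeled[y_labeled == c].mean(axis=0) for c in classes])
    d = np.sqrt(((X_pool[:, None, :] - centroids[None, :, :]) ** 2).sum(-1))
    d_sorted = np.sort(d, axis=1)
    margin = d_sorted[:, 1] - d_sorted[:, 0]   # small margin = uncertain
    return int(np.argmin(margin))
```

In an active learning loop, the selected example is sent to the human expert for a label, added to the labeled set, and the classifier is retrained, concentrating the labeling budget where it changes the decision boundary most.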
This project has two components.
(1) Cortically-inspired speech recognition: Acoustic events such as speech exhibit distinctive spectro-temporal amplitude modulations. These types of modulations are not well captured by conventional feature extraction methods, which apply either spectral processing or temporal processing, but not both jointly.
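The difference between separate and joint processing can be illustrated with a minimal sketch: a 2-D Fourier transform over a (frequency, time) spectrogram patch exposes joint spectro-temporal modulations that purely spectral or purely temporal analysis would miss. This is only an illustration; cortically-inspired front ends typically use 2-D Gabor filter banks rather than a raw 2-D FFT.

```python
import numpy as np

def modulation_spectrum(spectrogram):
    """Joint spectro-temporal modulation magnitudes of a log-spectrogram patch.

    Rows index frequency channels, columns index time frames; the 2-D FFT
    measures modulation energy jointly along both axes.
    """
    patch = spectrogram - spectrogram.mean()   # remove the DC component
    return np.abs(np.fft.fft2(patch))
```

A purely temporal modulation (e.g., a 4-cycle amplitude ripple across time, constant across frequency) shows up at a single temporal-modulation bin with zero spectral modulation, while a moving ripple in speech lights up off-axis bins that one-dimensional processing cannot distinguish.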
Go to any meeting or lecture with the younger generation of researchers, business people, or government employees, and there is a laptop or smart phone at every seat. Each laptop and smart phone is capable not only of recording and transmitting video and audio in real time, but also of advanced analytics on the data (e.g., speech recognition, speaker identification, face detection, etc.). Yet this rich resource goes largely unexploited, mostly because there are not enough good training data for machine learning algorithms.
This project is concerned with the discovery of highly speaker-characteristic behaviors ("speaker performances") for use in speaker recognition and related speech technologies. The intention is to move beyond the usual low-level short-term spectral features which dominate speaker recognition systems today, instead focusing on higher-level sources of speaker information, including idiosyncratic word usage and pronunciation, prosodic patterns, and vocal gestures.

