Audio and Multimedia Projects

Multimodal Perceptual Grounding for Robots (DARPA BOLT Activity E)

Capabilities for perceptually grounded deep semantic language acquisition would provide a fundamental advance in language technologies. Practical applications include methods to ground in-the-field dialog for translation or command, so that soldiers commanding robots could refer to actual objects or qualities of the environment when specifying instructions. They also include systems for grounded translation of human-to-human dialog, so that discourse involving physical properties could be accurately understood and conveyed in another language.

Multimodal Location Estimation

Location estimation is the task of estimating the geo-coordinates of the content recorded in digital media. The Berkeley Multimodal Location Estimation project aims to leverage the GPS-tagged media available on the web as a training set for an automatic location estimator. The idea is that visual and acoustic cues can narrow down the possible recording location for a given image, video, or audio track. We also investigate the human baseline of location estimation, i.e., how well a human performs in comparison to a computer.
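
As a rough illustration of the task setup only, and not the project's actual method, the sketch below treats GPS-tagged media as a training set and estimates a query's location by copying the geo-tag of the nearest training item in feature space. The two-dimensional feature vectors, the toy coordinates, and the function names `estimate_location` and `haversine_km` are all assumptions made for this example; real systems would use visual and acoustic features. The great-circle (haversine) distance is one common way to score an estimate against the true coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a standard approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def estimate_location(query_features, training_set):
    """Return the geo-tag of the training item closest to the query in feature space.

    training_set is a list of (feature_vector, (lat, lon)) pairs.
    """
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    _, coords = min(training_set, key=lambda item: sq_dist(item[0], query_features))
    return coords


# Hypothetical toy data: made-up features paired with GPS tags.
training_set = [
    ([0.9, 0.1], (37.87, -122.27)),  # tagged near Berkeley
    ([0.1, 0.8], (48.86, 2.35)),     # tagged near Paris
]

guess = estimate_location([0.8, 0.2], training_set)
error_km = haversine_km(guess, (37.77, -122.42))  # assumed true location for the query
print(guess, round(error_km, 1))
```

The same distance function could be used to score human guesses, which would put the human baseline and the automatic estimator on a common scale.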

GeoTube

Researchers are exposing the ways in which it is possible to aggregate public and seemingly innocuous information from different media and Web sites to attack the privacy of users. The project seeks to help users, particularly younger ones, understand the privacy implications of the information they share publicly on the Internet and to help them understand what control they can exercise over it.

Automated Low-Level Analysis and Description of Diverse Intelligence Video (ALADDIN)

Massive numbers of video clips are generated daily on many types of consumer electronics and uploaded to the Internet. In contrast to videos that are produced for broadcast or from planned surveillance, the "unconstrained" video clips produced by anyone who has a digital camera present a significant challenge for manual as well as automated analysis. Such clips can include any possible scene or event, and generally have limited quality control.