Multimodal Perceptual Grounding for Robots (DARPA BOLT Activity E)
Capabilities for perceptually grounded, deep semantic language acquisition would be a fundamental advance in language technologies. Practical applications include grounding in-the-field dialog for translation or command, so that a soldier commanding a robot could refer to actual objects or qualities of the environment when giving instructions, and grounded translation of human-to-human dialog, so that discourse involving physical properties could be accurately understood and conveyed in another language.
Presently, no system exists that exhibits perceptually grounded noun, verb, and prepositional structures based on real visual, acoustic, and haptic sensing in an active agent that interacts with objects. The goal of the project is for a robot to simulate the language-acquisition skills of a two-year-old using visual, acoustic, and haptic sensors. This is a collaboration with the computer vision group, UC Berkeley, and UPenn.
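As a minimal sketch of the underlying idea only (not the project's actual system; every word, feature, and value below is invented for illustration), a cross-situational learner can ground words by averaging the multimodal percepts they co-occur with, then name a new percept by its nearest learned grounding:

    """Toy cross-situational word grounding over multimodal features.
    Hypothetical throughout: features (redness, hardness, loudness)
    stand in for real visual, acoustic, and haptic sensor channels."""
    from collections import defaultdict
    import math

    # Invented training pairs: (utterance, multimodal percept).
    DATA = [
        ("red ball", (0.9, 0.2, 0.1)),
        ("red cup",  (0.8, 0.7, 0.1)),
        ("soft ball", (0.3, 0.1, 0.1)),
    ]

    def learn_groundings(pairs):
        """Ground each word as the mean of the percepts it co-occurs with."""
        sums = defaultdict(lambda: [0.0, 0.0, 0.0])
        counts = defaultdict(int)
        for utterance, percept in pairs:
            for word in utterance.split():
                counts[word] += 1
                for i, value in enumerate(percept):
                    sums[word][i] += value
        return {w: tuple(v / counts[w] for v in sums[w]) for w in sums}

    def describe(percept, groundings):
        """Name a new percept with the word whose grounding lies closest."""
        return min(groundings, key=lambda w: math.dist(groundings[w], percept))

    if __name__ == "__main__":
        groundings = learn_groundings(DATA)
        print(describe((0.85, 0.4, 0.1), groundings))  # prints "red"

A real system in this setting would learn far richer grounded structures (verbs and prepositions over live sensor and actuator streams, not static feature triples); the sketch only illustrates how word meanings can be anchored in perception rather than in text alone.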
