Multimodal Video Summarization

Principal Investigator(s): 
Gerald Friedland

ICSI researchers have been working with DAC to identify and acquire datasets suitable for training automatic speech recognition (ASR) models. They are researching and developing ASR models that are robust to background noise, music, babble (overlapping background talkers), and reverberation. This work may include, but is not limited to, researching and implementing signal processing algorithms that remove non-speech segments from an audio stream. ICSI researchers are also working with the DAC team to ensure that these models are compliant with the DAC Video Information Summarization, Captioning, Analysis, and Rank Ordering (VISCARO) model. Finally, they will research and develop a joint model that combines automatic speech recognition with speaker recognition, to determine whether modeling the two tasks jointly improves accuracy.