Publication Details
Title: Multi-Modal Speaker Diarization of Real-World Meetings Using Compressed-Domain Video Features
Author: G. Friedland, H. Hung, and C. Yeo
Group: ICSI Technical Reports
Date: October 2008
PDF: http://www.icsi.berkeley.edu/pubs/techreports/tr-08-007.pdf
Overview:
Speaker diarization was originally defined as the task of determining “who spoke when” given an audio track and no other prior knowledge of any kind. The following article shows a multi-modal approach in which we improve a state-of-the-art speaker diarization system by combining standard acoustic features (MFCCs) with compressed-domain video features. The approach is evaluated on over 4.5 hours of the publicly available AMI meetings dataset, which contains challenges such as people standing up and walking out of the room. We show a consistent relative improvement of about 34% in speaker error rate (21% DER) compared to a state-of-the-art audio-only baseline.
Bibliographic Information:
ICSI Technical Report TR-08-007
Bibliographic Reference:
G. Friedland, H. Hung, and C. Yeo. Multi-Modal Speaker Diarization of Real-World Meetings Using Compressed-Domain Video Features. ICSI Technical Report TR-08-007, October 2008
