Overlap in Meetings: ASR Effects and Analysis by Dialog Factors, Speakers, and Collection Site
Title | Overlap in Meetings: ASR Effects and Analysis by Dialog Factors, Speakers, and Collection Site |
Publication Type | Conference Paper |
Year of Publication | 2006 |
Authors | Çetin, Ö., & Shriberg, E. |
Published in | Proceedings of the Third Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI 2006) |
Page(s) | 212-224 |
Other Numbers | 1972 |
Abstract | We analyze speaker overlap in multiparty meetings both in terms of automatic speech recognition (ASR) performance, and in terms of distribution of overlap with respect to various factors (collection site, speakers, dialog acts, and hot spots). Unlike most previous work on overlap or crosstalk, our ASR error analysis uses an approach that allows comparison of the same foreground speech with and without naturally occurring overlap, using a state-of-the-art meeting recognition system. We examine 101 meetings. For analysis of ASR, we use 26 meetings from the NIST meeting transcription evaluations, and discover a number of interesting phenomena. First, overlaps tend to occur at high-perplexity regions in the foreground talker's speech. Second, overlap regions tend to have higher perplexity than those in nonoverlaps, if trigrams or 4-grams are used, but unigram perplexity within overlaps is considerably lower than that of nonoverlaps. Third, word error rate (WER) after overlaps is consistently lower than that before the overlap, apparently because the foreground speaker reduces perplexity shortly after being overlapped. These appear to be robust findings, because they hold in general across meetings from different collection sites, even though meeting style and absolute rates of overlap vary by site. Further analyses of overlap with respect to speakers and meeting content were conducted on a set of 75 additional meetings collected and annotated at ICSI. These analyses reveal interesting relationships between overlap and dialog acts, as well as between overlap and "hot spots" (points of increased participant involvement). Finally, results from this larger data set show that individual speakers have widely varying rates of being overlapped. |
URL | http://www.icsi.berkeley.edu/pubs/speech/CetinShriberg0506.pdf |
Bibliographic Notes | Proceedings of the Third Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI 2006), Washington DC, pp. 212-224 |
Abbreviated Authors | O. Cetin and E. Shriberg |
ICSI Research Group | Speech |
ICSI Publication Type | Article in conference proceedings |