Overlap in Meetings: ASR Effects and Analysis by Dialog Factors, Speakers, and Collection Site

Title: Overlap in Meetings: ASR Effects and Analysis by Dialog Factors, Speakers, and Collection Site
Publication Type: Conference Paper
Year of Publication: 2006
Authors: Çetin, Ö. Ç., & Shriberg E.
Published in: Proceedings of the Third Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI 2006)
Page(s): 212-224
Other Numbers: 1972
Abstract

We analyze speaker overlap in multiparty meetings both in terms of automatic speech recognition (ASR) performance and in terms of the distribution of overlap with respect to various factors (collection site, speakers, dialog acts, and hot spots). Unlike most previous work on overlap or crosstalk, our ASR error analysis uses an approach that allows comparison of the same foreground speech with and without naturally occurring overlap, using a state-of-the-art meeting recognition system. We examine 101 meetings. For analysis of ASR, we use 26 meetings from the NIST meeting transcription evaluations, and discover a number of interesting phenomena. First, overlaps tend to occur at high-perplexity regions in the foreground talker's speech. Second, overlap regions tend to have higher perplexity than nonoverlap regions when trigrams or 4-grams are used, but unigram perplexity within overlaps is considerably lower than that of nonoverlaps. Third, word error rate (WER) after overlaps is consistently lower than that before the overlap, apparently because the foreground speaker reduces perplexity shortly after being overlapped. These appear to be robust findings, because they hold in general across meetings from different collection sites, even though meeting style and absolute rates of overlap vary by site. Further analyses of overlap with respect to speakers and meeting content were conducted on a set of 75 additional meetings collected and annotated at ICSI. These analyses reveal interesting relationships between overlap and dialog acts, as well as between overlap and "hot spots" (points of increased participant involvement). Finally, results from this larger data set show that individual speakers have widely varying rates of being overlapped.

URL: http://www.icsi.berkeley.edu/pubs/speech/CetinShriberg0506.pdf
Bibliographic Notes

Proceedings of the Third Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI 2006), Washington DC, pp. 212-224

Abbreviated Authors

O. Cetin and E. Shriberg

ICSI Research Group

Speech

ICSI Publication Type

Article in conference proceedings