Using Prosody for Automatic Sentence Segmentation of Multi-Party Meetings
Title | Using Prosody for Automatic Sentence Segmentation of Multi-Party Meetings |
Publication Type | Conference Paper |
Year of Publication | 2006 |
Authors | Kolar, J., Shriberg, E., & Liu, Y. |
Published in | Proceedings of 9th International Conference on Text, Speech and Dialogue (TSD 2006) |
Page(s) | 629-636 |
Other Numbers | 1988 |
Abstract | We explore the use of prosodic features beyond pauses, including duration, pitch, and energy features, for automatic sentence segmentation of ICSI meeting data. We examine two different approaches to boundary classification: score-level combination of independent language and prosodic models using HMMs, and feature-level combination of models using a boosting-based method (BoosTexter). We report classification results for reference word transcripts as well as for transcripts from a state-of-the-art automatic speech recognizer. We also compare results using the lexical model plus a pause-only prosody model, versus results using additional prosodic features. Results show that: (1) information from pauses is important, including both pause duration at the boundary, and at the previous and following word boundaries; (2) adding duration, pitch, and energy features yields significant improvement over pause alone; (3) the integrated boosting-based model performs better than the HMM for ASR conditions; (4) training the boosting-based model on recognized words yields further improvement. |
URL | http://www.icsi.berkeley.edu/pubs/speech/tsd178a.pdf |
Bibliographic Notes | Proceedings of 9th International Conference on Text, Speech and Dialogue (TSD 2006), Brno, Czech Republic, pp. 629-636 |
Abbreviated Authors | J. Kolar, E. Shriberg, and Y. Liu |
ICSI Research Group | Speech |
ICSI Publication Type | Article in conference proceedings |
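
As an illustration of what "score-level combination" in the abstract means, the sketch below merges two independent per-boundary posteriors, one from a lexical model and one from a prosodic model, by log-linear interpolation, then places a sentence boundary wherever the combined score passes a threshold. This is a simplified stand-in and not the HMM-based decoding used in the paper. The function names, the interpolation weight, the threshold, and the example probabilities are all hypothetical and chosen only for illustration.

```python
import math


def combine_scores(p_lex, p_pros, weight=0.5):
    """Log-linearly interpolate two boundary posteriors and renormalize.

    p_lex and p_pros are hypothetical P(boundary) estimates from separate
    lexical and prosodic models; weight is an assumed tuning parameter.
    """
    eps = 1e-12  # guard against log(0)
    log_b = (weight * math.log(max(p_lex, eps))
             + (1 - weight) * math.log(max(p_pros, eps)))
    log_n = (weight * math.log(max(1 - p_lex, eps))
             + (1 - weight) * math.log(max(1 - p_pros, eps)))
    b, n = math.exp(log_b), math.exp(log_n)
    return b / (b + n)


def segment(words, p_lex, p_pros, weight=0.5, threshold=0.5):
    """Split words into sentences at boundaries whose combined score
    reaches the (assumed) threshold. Scores refer to the boundary
    that follows each word."""
    sentences, current = [], []
    for word, pl, pp in zip(words, p_lex, p_pros):
        current.append(word)
        if combine_scores(pl, pp, weight) >= threshold:
            sentences.append(current)
            current = []
    if current:  # keep any trailing words as a final segment
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    # Made-up posteriors for demonstration only.
    words = ["okay", "so", "we", "start", "now", "right"]
    p_lex = [0.7, 0.1, 0.1, 0.2, 0.6, 0.4]
    p_pros = [0.8, 0.2, 0.1, 0.1, 0.9, 0.3]
    for sentence in segment(words, p_lex, p_pros):
        print(" ".join(sentence))
```

A feature-level combination, by contrast, would feed lexical and prosodic features jointly into a single classifier rather than merging the outputs of two separately trained models.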