Publication Details

Title: Using Prosody for Automatic Sentence Segmentation of Multi-Party Meetings
Authors: J. Kolar, E. Shriberg, and Y. Liu
Group: Speech
Date: September 2006
PDF: http://www.icsi.berkeley.edu/pubs/speech/tsd178a.pdf

Overview:
We explore the use of prosodic features beyond pauses, including duration, pitch, and energy features, for automatic sentence segmentation of ICSI meeting data. We examine two approaches to boundary classification: score-level combination of independent language and prosodic models using HMMs, and feature-level combination of models using a boosting-based method (BoosTexter). We report classification results for reference word transcripts as well as for transcripts from a state-of-the-art automatic speech recognizer. We also compare results from the lexical model plus a pause-only prosody model against results that use additional prosodic features. Results show that: (1) information from pauses is important, including pause duration both at the boundary itself and at the previous and following word boundaries; (2) adding duration, pitch, and energy features yields significant improvement over pause alone; (3) the integrated boosting-based model performs better than the HMM for ASR conditions; (4) training the boosting-based model on recognized words yields further improvement.

Bibliographic Information:
Proceedings of the 9th International Conference on Text, Speech and Dialogue (TSD 2006), Brno, Czech Republic, pp. 629-636

Bibliographic Reference:
J. Kolar, E. Shriberg, and Y. Liu. Using Prosody for Automatic Sentence Segmentation of Multi-Party Meetings. Proceedings of the 9th International Conference on Text, Speech and Dialogue (TSD 2006), Brno, Czech Republic, pp. 629-636, September 2006.