Detecting Sentence-like Boundaries in Speech Using Maximum Entropy and HMM Approaches
Finding sentence boundaries in speech recognition output can help downstream language processing modules and also makes the output easier for humans to read. We use multiple knowledge sources, including prosodic features and textual information, to detect sentence-like boundaries in speech.
We compare two modeling approaches for this problem: a hidden Markov model (HMM) approach and a maximum entropy approach. Each approach rests on assumptions that do not hold exactly, but each also has distinct advantages. The maximum entropy approach can integrate highly related textual information, and it optimizes the conditional likelihood of the training data, an objective that better matches the evaluation metric for the sentence detection task. The HMM approach has the advantage of incorporating prosodic information more fully. Experiments are conducted on two corpora: broadcast news and conversational telephone speech. The performance of each approach degrades in the face of speech recognition errors, with the maximum entropy approach suffering slightly more than the HMM approach. Combining the results of the two approaches achieves the best performance across all test conditions.