Publication Details
Title: Comparing HMM, Maximum Entropy, and Conditional Random Fields for Disfluency Detection
Authors: Y. Liu, E. Shriberg, A. Stolcke, and M. Harper
Group: Speech
Date: September 2005
PDF: [Not available online]
Overview:
Automatic detection of disfluencies in spoken language is important for making speech recognition output more readable and for aiding downstream language processing modules. We compare a generative hidden Markov model (HMM)-based approach and two conditional models, a maximum entropy (Maxent) model and a conditional random field (CRF), for detecting disfluencies in speech. The conditional modeling approaches provide a more principled way to model correlated features. In particular, the CRF approach directly detects the reparandum regions, and thus avoids the use of ad-hoc heuristic rules. We evaluate the performance of these three models across two different corpora (conversational speech and broadcast news) and for two types of transcriptions (human transcriptions and recognition output). Overall, we find that the conditional modeling approaches (Maxent and CRF) tend to outperform the HMM approach, with one exception. Effects of speaking style and word recognition errors are discussed, along with future directions.
Bibliographic Information:
Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005-Eurospeech 2005), Lisboa, Portugal, pp. 3313-3316
Bibliographic Reference:
Y. Liu, E. Shriberg, A. Stolcke, and M. Harper. Comparing HMM, Maximum Entropy, and Conditional Random Fields for Disfluency Detection. Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005-Eurospeech 2005), Lisboa, Portugal, pp. 3313-3316, September 2005
