Structural Metadata Research in the EARS Program

Title: Structural Metadata Research in the EARS Program
Publication Type: Conference Paper
Year of Publication: 2005
Authors: Liu, Y., Shriberg, E., Stolcke, A., Peskin, B., Ang, J., Hillard, D., Ostendorf, M., Tomalin, M., Woodland, P., & Harper, M. P.
Published in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005)
Page(s): 957-960
Other Numbers: 7
Abstract

Both human and automatic processing of speech require recognition of more than just words. In this paper, we provide a brief overview of research on structural metadata extraction in the DARPA EARS rich transcription program. Tasks include detection of sentence boundaries, filler words, and disfluencies. Modeling approaches combine lexical, prosodic, and syntactic information, using various modeling techniques for knowledge source integration. The performance of these methods is evaluated by task, by data source (broadcast news versus spontaneous telephone conversations), and by whether transcriptions come from humans or from an (errorful) automatic speech recognizer. A representative sample of results shows that combining multiple knowledge sources (words, prosody, syntactic information) is helpful, that prosody is more helpful for news speech than for conversational speech, that word errors significantly impact performance, and that discriminative models generally provide benefit over maximum likelihood models. Important remaining issues, both technical and programmatic, are also discussed.

URL: http://www.icsi.berkeley.edu/ftp/global/pub/speech/papers/icassp2005-mde.pdf
Bibliographic Notes

Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005), Philadelphia, Pennsylvania, pp. 957-960

Abbreviated Authors

Y. Liu, E. Shriberg, A. Stolcke, B. Peskin, J. Ang, D. Hillard, M. Ostendorf, M. Tomalin, P. Woodland, and M. Harper

ICSI Research Group

Speech

ICSI Publication Type

Article in conference proceedings