Language Modeling in the ICSI-SRI Spring 2005 Meeting Speech Recognition Evaluation System

Publication Type: Technical Report
Year of Publication: 2005
Authors: Çetin, Ö., & Stolcke, A.
Other Numbers: 1611
Abstract

In this report, we describe the language models (LMs) used in the ICSI-SRI system for the NIST Spring 2005 Meeting Rich Transcription (RT-05S) evaluation. Our LMs are linear interpolations of $n$-gram models trained on a small number of in-domain sources and a large number of out-of-domain sources, which include conference proceedings and newly collected web data in addition to other commonly used corpora. Although no training data were available for the lecture recognition task in the evaluation, we design effective LMs for this task. Compared to the LMs of the ICSI-SRI-UW system for the NIST Spring 2004 Meeting Rich Transcription (RT-04S) evaluation, our LMs yield significant improvements in perplexity and word error rate (WER), mainly due to the additional training data from the web and conference proceedings.
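The central modeling step named in the abstract, linearly interpolating several $n$-gram models, can be sketched as follows. This is a minimal illustration, not the ICSI-SRI implementation: it assumes each component model's probability for every held-out token has already been computed (and is nonzero, i.e., the models are smoothed), and it chooses the mixture weights by expectation-maximization (EM) on held-out data, a standard choice that the abstract does not specify. The function names `interpolate`, `em_weights`, and `perplexity` are hypothetical.

```python
import math
from typing import List, Sequence


def interpolate(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Linear interpolation: P(w|h) = sum_i lambda_i * P_i(w|h)."""
    return sum(lam * p for lam, p in zip(weights, probs))


def em_weights(token_probs: List[List[float]], iters: int = 50) -> List[float]:
    """Estimate interpolation weights by EM on held-out data (an assumption here).

    token_probs[t][i] is P_i(w_t | h_t), the probability that hypothetical
    component model i assigns to held-out token t. All values must be > 0.
    """
    k = len(token_probs[0])
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(iters):
        counts = [0.0] * k
        for probs in token_probs:
            mix = interpolate(probs, weights)
            for i in range(k):
                # E-step: posterior probability that component i generated this token
                counts[i] += weights[i] * probs[i] / mix
        # M-step: new weight is the average posterior over held-out tokens
        weights = [c / len(token_probs) for c in counts]
    return weights


def perplexity(token_probs: List[List[float]], weights: Sequence[float]) -> float:
    """Perplexity of the interpolated model on the held-out tokens."""
    log_sum = sum(math.log(interpolate(p, weights)) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


if __name__ == "__main__":
    # Toy per-token probabilities from a hypothetical in-domain model (column 0)
    # and a hypothetical out-of-domain web-data model (column 1).
    held_out = [[0.2, 0.05], [0.01, 0.1], [0.3, 0.2], [0.05, 0.05]]
    w = em_weights(held_out)
    print("weights:", w)
    print("perplexity (EM weights):", perplexity(held_out, w))
    print("perplexity (uniform):", perplexity(held_out, [0.5, 0.5]))
```

Because each EM iteration never decreases held-out likelihood, the tuned weights give a perplexity no higher than the uniform starting point; with many out-of-domain sources, the same procedure simply runs over more columns.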

URL: http://www.icsi.berkeley.edu/ftp/global/pub/techreports/2005/tr-05-006.pdf
Bibliographic Notes: ICSI Technical Report TR-05-006
Abbreviated Authors: O. Cetin and A. Stolcke
ICSI Research Group: Speech
ICSI Publication Type: Technical Report