Publication Details

Title: Language Modeling in the ICSI-SRI Spring 2005 Meeting Speech Recognition Evaluation System
Author: O. Cetin and A. Stolcke
Group: ICSI Technical Reports
Date: July 2005
PDF: ftp://ftp.icsi.berkeley.edu/pub/techreports/2005/tr-05-006.pdf

Overview:
In this report, we describe the language models (LMs) used in the ICSI-SRI system for the NIST Spring 2005 Meeting Rich Transcription (RT-05S) evaluation. Our LMs are linear interpolations of $n$-gram models trained on a small number of in-domain sources and a large number of out-of-domain sources, which include conference proceedings and newly collected web data in addition to other commonly used corpora. Although no training data were available for the lecture recognition task in the evaluation, we were able to design effective LMs for this task. Compared with the LMs of the ICSI-SRI-UW system for the NIST Spring 2004 Meeting Rich Transcription (RT-04S) evaluation, we obtain significant improvements in perplexity and word error rate (WER), mainly due to the additional training data from the web and conference proceedings.
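
Illustration (not taken from the report): a linear interpolation of component models assigns each word the probability $P(w \mid h) = \sum_i \lambda_i P_i(w \mid h)$, with weights $\lambda_i \ge 0$ summing to 1, and perplexity is the exponentiated average negative log-probability on held-out text. The minimal Python sketch below assumes the per-word probabilities from each component model are already available; the function names and the toy numbers are hypothetical and say nothing about the models, weights, or results of the actual system.

```python
import math

def interpolate(component_probs, weights):
    """Return sum_i weights[i] * component_probs[i] for a single word."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(w * p for w, p in zip(weights, component_probs))

def perplexity(stream_probs, weights):
    """Perplexity of the interpolated model over a held-out word stream.

    stream_probs[t][i] is the probability that component model i assigns
    to the t-th held-out word given its history.
    """
    log_sum = sum(math.log(interpolate(probs, weights)) for probs in stream_probs)
    return math.exp(-log_sum / len(stream_probs))

# Toy, made-up numbers: two component models scored on three held-out words.
held_out = [(0.10, 0.02), (0.05, 0.20), (0.01, 0.04)]
first_model_only = perplexity(held_out, (1.0, 0.0))
mixed = perplexity(held_out, (0.5, 0.5))
```

On these toy numbers the equal-weight mixture has lower perplexity than the first model alone, because each word is covered by whichever component scores it better. In practice the weights would be tuned on held-out in-domain text; how the report's weights were chosen is not stated in this overview.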

Bibliographic Information:
ICSI Technical Report TR-05-006

Bibliographic Reference:
O. Cetin and A. Stolcke. Language Modeling in the ICSI-SRI Spring 2005 Meeting Speech Recognition Evaluation System. ICSI Technical Report TR-05-006, July 2005.