next up previous
Next: Conclusions Up: Precise n-gram Probabilities from Previous: Summary

Experiments

The algorithm described here has been implemented, and is being used to generate bigrams for a speech recognizer that is part of the BeRP spoken-language system [Jurafsky et al. 1994]. An early prototype of BeRP was used in an experiment to assess the benefit of using bigram probabilities obtained through SCFGs versus estimating them directly from the available training corpus.gif The system's domain are inquiries about restaurants in the city of Berkeley. The training corpus used had only 2500 sentences, with an average length of about 4.8 words/sentence.

Our experiments made use of a context-free grammar hand-written for the BeRP domain. With 1200 rules and a vocabulary of 1100 words, this grammar was able to parse 60% of the training corpus. Computing the bigram probabilities from this SCFG takes about 24 hours on a SPARCstation 2-class machine.gif

In experiment 1, the recognizer used bigrams that were estimated directly from the training corpus, without any smoothing, resulting in a word error rate of 35.1%. In experiment 2, a different set of bigram probabilities was used, computed from the context-free grammar, whose probabilities had previously been estimated from the same training corpus, using standard EM techniques. This resulted in a word error rate of 35.3%. This may seem surprisingly good given the low coverage of the underlying CFGs, but notice that the conversion into bigrams is bound to result in a less constraining language model, effectively increasing coverage.

Finally, in experiment 3, the bigrams generated from the SCFG were augmented by those from the raw training data, in a proportion of 200,000 : 2500. We have not attempted to optimize this mixture proportion, e.g., by deleted interpolation [Jelinek and Mercer1980].gif With the bigram estimates thus obtained, the word error rate dropped to 33.5%. (All error rates were measured on a separate test corpus.)

The experiment therefore supports our earlier argument that more sophisticated language models, even if far from perfect, can improve n-gram estimates obtained directly from sample data.


next up previous
Next: Conclusions Up: Precise n-gram Probabilities from Previous: Summary

Andreas Stolcke
Fri Jun 28 20:57:43 PDT 1996