Talks at the International Computer Science Institute

The International Computer Science Institute
is pleased to present a talk:


"Discretized Continuous Semantic Space Language Modeling (DISCUSS LM)"

Yan Huang
ICSI
yan@icsi.berkeley.edu

Tuesday, August 31, 2004
ICSI, Conference Room 5A
12:30 pm

Abstract:

This talk proposes Discretized Continuous Semantic Space Language Modeling (DISCUSS LM). A continuous semantic space is first constructed by Latent Semantic Analysis (LSA) and then discretized with a Self-Organizing Map (SOM). The resulting semantic label of the word history, which acts as a long-term constraint, is integrated with a traditional n-gram language model, which acts as a local constraint. This approach provides a better encoding scheme for word histories than a traditional word n-gram language model. Moreover, it scales easily to high-order histories without constraints. Initial experiments on English Broadcast News show improvements in language model perplexity.
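
The two steps named in the abstract, building an LSA space and discretizing it with a SOM, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the talk: the toy term-document counts, the one-dimensional SOM with three units, the use of the mean word vector to represent a history, and the function names lsa_word_vectors, train_som, and semantic_label. The abstract does not say how the semantic label is combined with the n-gram model, so that step is left out.

```python
import numpy as np


def lsa_word_vectors(term_doc, k):
    """Map each word (a row of term_doc) to a k-dimensional LSA vector via truncated SVD."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k] * s[:k]


def train_som(data, n_units, epochs=20, lr=0.5, seed=0):
    """Train a 1-D self-organizing map and return its codebook, shape (n_units, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise the units from randomly chosen training vectors.
    codebook = data[rng.choice(len(data), n_units, replace=True)].astype(float)
    positions = np.arange(n_units)
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = lr * (1.0 - frac)                           # decaying learning rate
        radius = max(n_units / 2.0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
        for x in data[rng.permutation(len(data))]:
            winner = np.argmin(np.linalg.norm(codebook - x, axis=1))
            # Units close to the winner on the 1-D grid are pulled more strongly.
            influence = np.exp(-((positions - winner) ** 2) / (2.0 * radius ** 2))
            codebook += rate * influence[:, None] * (x - codebook)
    return codebook


def semantic_label(history_ids, word_vecs, codebook):
    """Discretize a word history: average its LSA vectors, return the nearest SOM unit."""
    h = word_vecs[history_ids].mean(axis=0)
    return int(np.argmin(np.linalg.norm(codebook - h, axis=1)))


# Hypothetical toy data: 6 words (rows) counted over 4 documents (columns).
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 1, 1, 1],
], dtype=float)

word_vecs = lsa_word_vectors(term_doc, k=2)

# Hypothetical word histories (lists of word indices) used to train the SOM.
histories = [[0, 1], [1, 2], [3, 4], [4, 5], [0, 2], [3, 5]]
history_vecs = np.array([word_vecs[h].mean(axis=0) for h in histories])
codebook = train_som(history_vecs, n_units=3)

# The discrete label stands in for an arbitrarily long history.
label = semantic_label([0, 1, 2], word_vecs, codebook)
print("semantic label:", label)
```

Because the label comes from an averaged vector, the history can be of any length while the number of distinct labels stays fixed at the number of SOM units, which is one way to read the abstract's remark about scaling to high-order histories.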