"Discretized Continuous Semantic Space Language Modeling (DISCUSS LM)"
This talk proposes Discretized Continuous Semantic Space Language Modeling (DISCUSS LM). A continuous semantic space is first constructed by Latent Semantic Analysis (LSA) and then discretized with a Self-Organizing Map (SOM). The semantic label of the word history, which acts as a long-term constraint, is integrated with a traditional n-gram language model, which acts as a local constraint. This approach provides a better encoding scheme for word histories than a traditional word n-gram language model. Moreover, it scales easily to high-order histories without restriction. Initial experiments on English Broadcast News show improvements in language model perplexity.
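The pipeline described above (LSA space, SOM discretization, semantic label for a word history) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the setup used in the talk: the toy corpus, the function names, the 2x2 SOM grid, the learning-rate schedule, and the choice to represent a history as the mean of its LSA word vectors.

```python
import numpy as np

def lsa_word_vectors(docs, rank):
    """Word vectors from a truncated SVD of the term-document count matrix."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d:
            counts[index[w], j] += 1.0
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return index, u[:, :rank] * s[:rank]

def train_som(data, grid=(2, 2), epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM; each grid node becomes one discrete semantic label."""
    rng = np.random.default_rng(seed)
    coords = np.array([(r, c) for r in range(grid[0]) for c in range(grid[1])],
                      dtype=float)
    # Initialise node weights from randomly chosen data points.
    weights = data[rng.integers(0, len(data), size=len(coords))].copy()
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best matching unit
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def semantic_label(history, index, vectors, weights):
    """Map a word history to the index of its nearest SOM node."""
    known = [vectors[index[w]] for w in history if w in index]
    if not known:
        return -1  # hypothetical convention: no known word in the history
    h = np.mean(known, axis=0)  # assumed history representation
    return int(np.argmin(((weights - h) ** 2).sum(axis=1)))

# Toy corpus (illustrative only).
docs = [["stocks", "market", "bank"], ["market", "trade", "bank"],
        ["rain", "storm", "wind"], ["storm", "wind", "cloud"]]
index, vectors = lsa_word_vectors(docs, rank=2)
weights = train_som(vectors, grid=(2, 2))
label = semantic_label(["market", "bank"], index, vectors, weights)
print("semantic label for history:", label)
```

In a model of this kind, the resulting label would be one more conditioning variable alongside the local n-gram context. Because the label is computed from the whole history, its cost does not grow with the n-gram order.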