"Recent work on Hidden Activation TRAPS (HATS)"
Incorporating long-term (\~{}500-1000 ms) temporal information using multi-layered perceptrons (MLPs) has improved performance on ASR tasks, especially when used to complement traditional short-term (\~{}25-100 ms) features. This talk further studied techniques for incorporating long-term temporal information in the acoustic model by presenting experiments showing: 1) that simply widening acoustic context by using more frames of full band speech energies as input to the MLP is suboptimal compared to a more constrained two-stage approach that first focuses on long-term temporal patterns in each critical band separately and then combines them, 2) that the best two-stage approach studied utilizes hidden activation values of MLPs trained on the log critical band energies (LCBEs) of 51 consecutive frames, and 3) that combining the best two-stage approach with conventional short-term features significantly reduces word error rates on the 2001 NIST Hub-5 conversational telephone speech (CTS) evaluation set with models trained using the Switchboard Corpus.