Publication Details

Title: Stream Combination Before and/or After the Acoustic Model
Author: D. P.W. Ellis
Group: ICSI Technical Reports
Date: April 2000
PDF: ftp://ftp.icsi.berkeley.edu/pub/techreports/2000/tr-00-007.pdf

Overview:
Combining diverse feature streams has proven to be a flexible and beneficial technique in speech recognition. In hybrid connectionist-HMM recognition, feature streams can be combined at several points. In this work, we compare two forms of combination: at the input to the acoustic model, by concatenating the feature streams into a single vector (feature combination, or FC), and at the output of the acoustic model, by averaging the logs of the estimated posterior probabilities of each subword unit (posterior combination, or PC). Based on four feature streams with varying degrees of mutual dependence, we find that the best strategy uses both feature and posterior combination. Streams that are more independent, as measured by an approximation to conditional mutual information, show more benefit from posterior combination.
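
The two combination points described in the overview can be illustrated with a minimal Python sketch. This is not the report's implementation: the function names, the small `eps` floor inside the logarithm, and the final renormalization step are all assumptions made for illustration. The overview itself only states that FC concatenates feature streams and that PC averages log posteriors.

```python
import math


def feature_combination(stream_a, stream_b):
    """FC sketch: concatenate per-frame feature vectors from two streams.

    Each stream is a list of frames; each frame is a list of floats.
    The concatenated frames would feed a single acoustic model.
    """
    return [list(a) + list(b) for a, b in zip(stream_a, stream_b)]


def posterior_combination(posteriors_per_model, eps=1e-12):
    """PC sketch: average the log posteriors from several acoustic models.

    posteriors_per_model holds one posterior vector per model for a
    single frame, each with one entry per subword unit.
    eps and the renormalization below are assumptions, not from the report.
    """
    n_models = len(posteriors_per_model)
    n_units = len(posteriors_per_model[0])

    # Average log posterior for each subword unit across models.
    avg_log = [
        sum(math.log(p[k] + eps) for p in posteriors_per_model) / n_models
        for k in range(n_units)
    ]

    # Assumed step: map back to a distribution that sums to one.
    peak = max(avg_log)
    unnormalized = [math.exp(v - peak) for v in avg_log]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

Averaging in the log domain amounts to a geometric mean of the posteriors, so a subword unit that any one model scores very low is penalized strongly in the combined estimate. A combined FC and PC setup, as the overview recommends, would apply `feature_combination` to the more dependent streams before their model and `posterior_combination` across the resulting model outputs.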

Bibliographic Information:
ICSI Technical Report TR-00-007

Bibliographic Reference:
D. P.W. Ellis. Stream Combination Before and/or After the Acoustic Model. ICSI Technical Report TR-00-007, April 2000.