Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR
Title | Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Gergen, S., Zeiler, S., Abdelaziz, A. H., Nickel, R., & Kolossa, D. |
Published in | Proceedings of Interspeech 2016 |
Abstract | Automatic speech recognition (ASR) enables very intuitive human-machine interaction. However, signal degradations due to reverberation or noise reduce the accuracy of audio-based recognition. The introduction of a second signal stream that is not affected by degradations in the audio domain (e.g., a video stream) increases the robustness of ASR against degradations in the original domain. Here, depending on the signal quality of audio and video at each point in time, a dynamic weighting of both streams can optimize the recognition performance. In this work, we introduce a strategy for estimating optimal weights for the audio and video streams in turbo-decoding-based ASR using a discriminative cost function. The results show that turbo decoding with this maximally discriminative dynamic weighting of information yields higher recognition accuracy than turbo-decoding-based recognition with fixed stream weights or optimally dynamically weighted audiovisual decoding using coupled hidden Markov models. |
ICSI Research Group | Speech |
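
The dynamic stream weighting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common log-linear form of stream weighting, in which the combined per-state score at frame t is lambda_t * log p_audio + (1 - lambda_t) * log p_video, with lambda_t between 0 and 1. The function name `combine_log_likelihoods`, the toy likelihoods, and the hand-picked weights are hypothetical; the sketch does not reproduce the paper's turbo decoder or its discriminative weight estimator.

```python
import math


def combine_log_likelihoods(log_audio, log_video, weights):
    """Combine per-frame, per-state log-likelihoods of two streams.

    log_audio, log_video: one list per frame, each holding one
        log-likelihood per state.
    weights: one audio stream weight per frame, each in [0, 1];
        the video stream receives 1 - weight.
    Illustrative helper only, not the method of the paper.
    """
    if not (len(log_audio) == len(log_video) == len(weights)):
        raise ValueError("all inputs need one entry per frame")
    combined = []
    for frame_a, frame_v, lam in zip(log_audio, log_video, weights):
        if not 0.0 <= lam <= 1.0:
            raise ValueError("stream weights must lie in [0, 1]")
        if len(frame_a) != len(frame_v):
            raise ValueError("streams must score the same states")
        combined.append(
            [lam * a + (1.0 - lam) * v for a, v in zip(frame_a, frame_v)]
        )
    return combined


# Toy data: two frames, two states (assumed values, not from the paper).
log_audio = [[math.log(0.8), math.log(0.2)], [math.log(0.5), math.log(0.5)]]
log_video = [[math.log(0.6), math.log(0.4)], [math.log(0.1), math.log(0.9)]]

# Frame 1: audio assumed reliable. Frame 2: audio assumed degraded.
weights = [0.9, 0.2]

scores = combine_log_likelihoods(log_audio, log_video, weights)
best_states = [frame.index(max(frame)) for frame in scores]
print(best_states)
```

In the second frame the audio stream is uninformative (both states are equally likely), so the low audio weight lets the video stream decide the state. A fixed weight would apply the same trade-off to every frame regardless of signal quality.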