Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement

Title: Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Zeiler, S., Meutzner, H., Abdelaziz, A. H., & Kolossa, D.
Published in: Proceedings of Interspeech 2016
Abstract

Models for automatic speech recognition (ASR) hold detailed information about the spectral and spectro-temporal characteristics of clean speech signals. Using these models for speech enhancement is desirable and has been the target of past research efforts. In such model-based speech enhancement systems, a powerful ASR system is imperative. To increase recognition rates, especially in low-SNR conditions, we suggest using the additional visual modality, which is largely unaffected by degradations in the acoustic channel. An optimal integration of acoustic and visual information is achievable by joint inference over both modalities within the turbo-decoding framework. By thus combining turbo-decoding with Twin-HMMs for speech enhancement, notable improvements can be achieved, not only in instrumental estimates of speech quality, but also in actual speech intelligibility. This is verified through listening tests, which show that in highly challenging noise conditions, average human recognition accuracy can be improved from 64% without signal processing to 80% with the presented architecture.
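The core idea of turbo-decoding here is that the audio and video models repeatedly exchange "extrinsic" information about the HMM states, each modality refining its posterior with evidence from the other. The following is a minimal, hypothetical sketch of that exchange for a single frame with per-state observation likelihoods; all names and values are invented for illustration, and the actual Turbo-Twin-HMM operates on full state sequences via forward-backward decoding, not single frames.

```python
# Hypothetical single-frame sketch of turbo-style posterior exchange
# between an audio and a video model (not the paper's full algorithm).

def normalize(p):
    """Scale a list of nonnegative scores so it sums to one."""
    s = sum(p)
    return [x / s for x in p]

def turbo_posteriors(lik_audio, lik_video, iterations=3):
    """Iteratively refine per-state posteriors by exchanging
    extrinsic information between the two modalities."""
    ext_v = [1.0] * len(lik_video)   # extrinsic info from video, initially flat
    post_a = normalize(lik_audio)
    post_v = normalize(lik_video)
    for _ in range(iterations):
        # Audio decoder: own likelihood weighted by video's extrinsic info.
        post_a = normalize([la * ev for la, ev in zip(lik_audio, ext_v)])
        # Extrinsic info passed on = posterior with the incoming prior divided out.
        ext_a = [pa / ev for pa, ev in zip(post_a, ext_v)]
        # Video decoder performs the symmetric update.
        post_v = normalize([lv * ea for lv, ea in zip(lik_video, ext_a)])
        ext_v = [pv / ea for pv, ea in zip(post_v, ext_a)]
    return post_a, post_v

# Toy example with three HMM states: the audio likelihood is ambiguous
# between states 0 and 1, the video clearly favors state 0, so the joint
# posterior sharpens toward state 0.
pa, pv = turbo_posteriors([0.45, 0.45, 0.10], [0.80, 0.15, 0.05])
```

For conditionally independent observations this iteration converges to the joint posterior (proportional to the product of the two likelihoods); the turbo framework becomes genuinely iterative once the HMM transition structure couples the frames.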

ICSI Research Group: Speech