Publication Details
Title: Improved Speech Activity Detection Using Cross-Channel Features for Recognition of Multiparty Meetings
Authors: K. Boakye and A. Stolcke
Group: Speech
Date: September 2006
PDF: http://www.icsi.berkeley.edu/pubs/speech/boakye_stolcke.pdf
Overview:
We describe the development of a speech activity detection system using an HMM-based segmenter for automatic speech recognition on individual headset microphones in multispeaker meetings. We investigate cross-channel features (energy-based and correlation-based) for incorporation into the segmenter, with the aim of addressing errors related to cross-channel phenomena such as crosstalk. Results demonstrate that these features provide a marked improvement (18% relative) over a baseline system using single-channel features, as well as an improvement (8% relative) over our previous solution of separate speech activity detection and cross-channel analysis. In addition, the simple cross-channel energy features are shown to be more robust, and consequently better performing, than the more common correlation-based features.
Bibliographic Information:
Proceedings of the 9th International Conference on Spoken Language Processing (ICSLP-Interspeech 2006), Pittsburgh, Pennsylvania, pp. 1962-1965
Bibliographic Reference:
K. Boakye and A. Stolcke. Improved Speech Activity Detection Using Cross-Channel Features for Recognition of Multiparty Meetings. Proceedings of the 9th International Conference on Spoken Language Processing (ICSLP-Interspeech 2006), Pittsburgh, Pennsylvania, pp. 1962-1965, September 2006
