Multimodal Addressee Detection in Multiparty Dialogue Systems

Title: Multimodal Addressee Detection in Multiparty Dialogue Systems
Publication Type: Conference Paper
Year of Publication: 2015
Authors: Tsai, T. J., Stolcke, A., & Slaney, M.
Other Numbers: 3811

Addressee detection answers the question, “Are you talking to me?” When multiple users interact with a dialogue system, it is important to know when a user is speaking to the computer and when he or she is speaking to another person. We approach this problem from a multimodal perspective, using lexical, acoustic, visual, dialog state, and beam-forming information. Using data from a multiparty dialogue system, we demonstrate the benefit of using multiple modalities over using a single modality. We also assess the relative importance of the various modalities in predicting the addressee. In our experiments, we find that acoustic features are by far the most important, that ASR and system-state information are useful, and that visual and beamforming features provide little additional benefit. Our study suggests that acoustic, lexical, and system state information are an effective, economical combination of modalities to use in addressee detection.

Bibliographic Notes

Proceedings of the 40th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), Brisbane, Australia

Abbreviated Authors

TJ Tsai, A. Stolcke, and M. Slaney

ICSI Research Group


ICSI Publication Type

Article in conference proceedings