The "Poor Quality" Meetings Corpus

Go to any meeting or lecture with the younger generation of researchers, business people, or government employees, and there is a laptop or smartphone at every seat. Each laptop and smartphone is capable not only of recording and transmitting video and audio in real time, but also of performing advanced analytics on the data (e.g., speech recognition, speaker identification, face detection). Yet this rich resource goes largely unexploited, mostly because there is not enough good training data for machine learning algorithms.

The first step in exploiting this resource is to collect a corpus of audio, video, and annotations of "natural" meetings using the participants' own laptops and cell phones, allowing both analysis of the meetings and training of machine learning algorithms. This one-year planning project involves designing the corpus (including collection protocols, signals, formats, and annotations), collecting a small pilot corpus, and interacting with the community on an ongoing basis through mailing lists, forums, wikis, and a workshop hosted at ICSI. The annotations include the words spoken; events such as laughter, a telephone ringing, or a new participant entering the room; who is speaking; who appears on which camera; head and hand gestures; the participants' focus of attention; and summaries and topics. One possible record format is sketched below.
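
As a concrete illustration of how such annotations might be represented, the following Python sketch defines a minimal time-aligned annotation record. The field names, labels, and JSON layout here are hypothetical assumptions for illustration only; the actual signals, formats, and annotation conventions are exactly what the design phase of the project is meant to decide.

    # A minimal sketch of one possible time-aligned annotation record.
    # All field names and labels below are hypothetical illustrations,
    # not the project's finalized corpus format.
    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class Annotation:
        start: float    # segment start, seconds from meeting start
        end: float      # segment end, seconds
        speaker: str    # hypothetical speaker ID, e.g. "spkr_03"
        channel: str    # recording device, e.g. "laptop_02_mic"
        kind: str       # "words", "event", "gesture", "attention", ...
        value: str      # transcript text, event label, gesture label, etc.

    # Example records covering several of the annotation types listed above.
    records = [
        Annotation(12.40, 15.85, "spkr_01", "laptop_01_mic", "words",
                   "let's move on to the next agenda item"),
        Annotation(15.90, 17.10, "spkr_02", "laptop_02_mic", "event",
                   "laughter"),
        Annotation(16.00, 18.50, "spkr_03", "phone_01_cam", "gesture",
                   "head_nod"),
    ]

    print(json.dumps([asdict(r) for r in records], indent=2))

A flat, time-aligned layout like this has the advantage that annotations from different devices and modalities (audio channels, cameras, human transcription) can be merged and sorted on a common timeline, though a hierarchical format would serve equally well.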