"Meeting Recorder Dialog Act (MRDA) Corpus and Research"
This will be two talks in one:
I will first give a high-level overview of the Meeting Recorder Dialog Act (MRDA) corpus, which consists of annotations of the 75 meetings in the ICSI Meeting Recorder corpus. We plan to release the MRDA corpus in the next couple of weeks. The talk will describe the data and files included (dialog act, adjacency pair, and hot spot annotations, documentation, etc.) and present examples and related statistics on the corpus.
I will then describe recent research on automatic dialog act (DA) segmentation and classification with the MRDA corpus. Using a train/dev/test split of the ICSI meetings, we found that simple prosodic models improved on the results of word-only models for both DA segmentation and classification. The research established baselines for these tasks in this domain (i.e., meetings), which the community has only recently begun to explore. We also determined several evaluation metrics for the segmentation task and for the joint segmentation and classification task.