Joint Modeling for Entity Analysis

Greg Durrett

UC Berkeley

Tuesday, November 25, 2014
12:30 p.m., Conference Room 5A

Many core NLP tasks require reasoning about the entities in a document: who is mentioned, what are they doing, and what else do we know about them?  We present a joint model of three core tasks in the entity analysis stack: coreference resolution (within-document clustering), named entity recognition (coarse semantic typing), and entity linking (matching to Wikipedia entities).  Our model is formally a structured conditional random field.  It builds on top of highly accurate component models for each task: we demonstrate that the coreference component performs well due to a feature set that captures key linguistic phenomena in a simple, uniform way.  Factors in the joint model then represent cross-task interactions, such as the constraint that coreferent mentions have the same semantic type. On the ACE 2005 and OntoNotes datasets, we achieve state-of-the-art results for all three tasks.  Moreover, joint modeling improves performance on all tasks over the component models in isolation.
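To make the joint-factor idea concrete, here is a toy sketch (not the speaker's actual model or code; all names and scores are hypothetical) of how a joint score might combine per-task component scores with a cross-task factor that rewards coreferent mentions sharing a semantic type:

```python
# Illustrative sketch only: a toy joint objective combining component-model
# scores with one cross-task agreement factor, in the spirit of the
# constraint that coreferent mentions have the same semantic type.

def joint_score(antecedents, ner_types, coref_scores, ner_scores,
                agreement_weight=1.0):
    """Score one joint assignment over a document's mentions.

    antecedents[i] is the antecedent chosen for mention i (i itself means
    "starts a new cluster"); ner_types[i] is the coarse type assigned to
    mention i. coref_scores[i][a] and ner_scores[i][t] are (hypothetical)
    component-model log scores.
    """
    score = 0.0
    for i, a in enumerate(antecedents):
        score += coref_scores[i][a]           # coreference component score
        score += ner_scores[i][ner_types[i]]  # typing component score
        # Cross-task factor: reward type agreement along a coreference link.
        if a != i and ner_types[i] == ner_types[a]:
            score += agreement_weight
    return score
```

In the real model, inference would search over joint assignments rather than score a fixed one, but this shows how a cross-task factor can tip the decision toward globally consistent analyses: an assignment where a pronoun and its antecedent share a type scores higher than one where they disagree, all else equal.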

Bio:

Greg Durrett is a PhD candidate at UC Berkeley, advised by Dan Klein. He works on a range of topics in statistical natural language processing including coreference resolution, morphological analysis, and syntactic parsing.