Talks at the International Computer Science Institute

The International Computer Science Institute
is pleased to present a talk:


"Natural Language Processing: Unsupervised, Adaptive, and Efficient Tools"

Dan Klein
UC Berkeley
http://www.cs.berkeley.edu/~klein

Tuesday, March 1, 2005
ICSI, Conference Room 5A
12:30 pm

Abstract:

With so much text data available, natural language problems are everywhere. So why aren't practical NLP solutions more widespread? One issue is that state-of-the-art NLP tools are supervision-hungry. They require large amounts of human-annotated training data to perform well, and degrade badly when applied out of domain. For example, newswire-trained parsers have trouble with medical text and conversational speech, but it is infeasible to create a new training set for each language, domain, and problem that arises. I'll discuss work along several lines, ranging from adaptation methods, which can blunt the effects of domain change, to unsupervised methods, which require no labeled training data whatsoever. A second issue is that deep linguistic processing is generally too time-consuming to be applied over huge document collections. I'll discuss where and why issues of scale arise from linguistic complexity, and how simplified models and representations can lead to substantially faster analysis methods. This talk will include a mix of completed work and previews of newer projects.