Information Distillation at ICSI
Sibel Yaman
ICSI
Tuesday, March 03, 2009
12:30
Information distillation aims to analyze and interpret massive multilingual speech and text archives and to provide the queried information to the user. In a typical distillation system, the user query is first processed by an information retrieval system, which finds the most relevant documents in a huge document collection. Each sentence in these documents is then processed to answer the user’s query.
In this talk, I will describe our work on the information distillation problem, which was stated by BAE as finding exact answers to 5-wh questions, viz. WHO, WHAT, WHEN, WHERE, and WHY. The focus will be on our submission to the 2008 evaluation, which relies on low-level analysis of the semantic and syntactic structure of the given sentences for exact answer extraction. More specifically, BAE requires returning answers to the 5-wh questions for only one top-level predicate in a given sentence. The ICSI/SRI system achieves this by first processing the given sentence to extract one top-level predicate and its arguments, and then applying to it a set of rules derived from syntactic parse trees. In the second part of the talk, I will discuss knowledge- and data-driven approaches to combining multiple 5W extraction systems. As it turns out, when the systems make complementary mistakes, even very simple rules help select the best system for a particular sentence and hence improve performance.
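
As a rough illustration of how a very simple rule might select among several 5W extraction systems for one sentence, consider the sketch below. It is not the ICSI/SRI combination method, which the abstract does not specify. The `Extraction` class, the `select_best` function, the system names, and the rule itself (prefer the system that fills the most of the five slots, with ties going to the system listed first) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The five question types named in the abstract.
SLOTS = ("who", "what", "when", "where", "why")


@dataclass
class Extraction:
    """Hypothetical output of one 5W extraction system for one sentence."""
    system: str
    answers: Dict[str, str] = field(default_factory=dict)

    def filled(self) -> int:
        # Count the slots that have a non-empty answer.
        return sum(1 for slot in SLOTS if self.answers.get(slot, "").strip())


def select_best(candidates: List[Extraction]) -> Extraction:
    """Illustrative rule: pick the extraction that fills the most slots.

    max() returns the first maximal element, so systems listed earlier
    win ties. This ordering is an assumption, not part of the original work.
    """
    if not candidates:
        raise ValueError("need at least one candidate extraction")
    return max(candidates, key=lambda e: e.filled())


# Two made-up system outputs for the same sentence.
a = Extraction("parse_rules", {"who": "the committee",
                               "what": "approved the budget"})
b = Extraction("semantic_roles", {"who": "the committee",
                                  "what": "approved the budget",
                                  "when": "on Friday"})
print(select_best([a, b]).system)
```

A real combination scheme would presumably use richer evidence than slot counts, such as parser confidence or agreement between systems; the point here is only that the selection step can be a small function applied per sentence.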