Talks at the International Computer Science Institute

The International Computer Science Institute
is pleased to present a talk:


Mining from Open Answers from Questionnaire Data
and
Generalizing Case Frames Using a Thesaurus and the MDL Principle

Hang Li
Microsoft Research China
hangli@microsoft.com

http://www.research.microsoft.com/users/hangli/

Thursday, August 30, 2001
ICSI, Rm 607
11:00 am - Noon

Abstract:

1. Mining from Open Answers from Questionnaire Data

Surveys are an important part of marketing and customer relationship management, and open answers (i.e., answers to open questions) in particular may contain valuable information and provide an important basis for making business decisions. We have developed a text mining system that provides a new way of analyzing open answers in questionnaire data. The system performs two functions: (A) accurate extraction of characteristics for individual analysis targets, and (B) accurate extraction of the relationships among characteristics of analysis targets. In this talk, I will describe how our text mining system works. It employs two statistical learning techniques, rule analysis and correspondence analysis, to perform these two functions. The system has already been put into use by a number of large corporations in Japan for text mining on various types of survey data, including open answers about brand images, open answers about company images, complaints about products, comments written on home pages, business reports, and help desk records. In these applications, it has been found useful in forming a basis for effective business decisions.

(This work was done jointly with Kenji Yamanishi while the speaker was at NEC Research.)

2. Generalizing Case Frames Using a Thesaurus and the MDL Principle

In this talk, I will describe a method for generalizing case frame slots (or learning case slot patterns) from corpus data. We formalize this problem as that of estimating a probability model, which we call a case slot model. We restrict the class of case slot models to that of tree cut models by using an existing thesaurus. In this way, the problem of generalizing the values of a case slot becomes that of estimating a model from the class of tree cut models for some fixed thesaurus tree. We employ the Minimum Description Length (MDL) principle for model estimation, together with an efficient algorithm that provably obtains the optimal tree cut model in terms of MDL.

(This work was done jointly with Naoki Abe while the speaker was at NEC Research.)
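
For readers unfamiliar with tree cut models, the idea in the second abstract can be illustrated with a small sketch. This is not the speaker's implementation. The thesaurus, the counts, and the exact description length formula below are assumptions made for illustration: each class in a cut is charged half of log2 of the sample size as parameter cost, and the probability of a class is assumed to be shared uniformly among the nouns under it. Because this assumed cost is a sum over the classes in a cut, the best cut for a subtree can be chosen bottom-up, one node at a time.

```python
import math

# Hypothetical toy thesaurus: each internal class maps to its children.
# Anything that is not a key here is a leaf (a noun).
TREE = {
    "ANIMAL": ["BIRD", "INSECT"],
    "BIRD": ["swallow", "crow", "eagle", "bird"],
    "INSECT": ["bug", "bee", "insect"],
}

# Made-up counts of nouns seen in one case slot (say, the subject of "fly").
COUNTS = {"swallow": 4, "crow": 4, "eagle": 4, "bird": 4,
          "bug": 0, "bee": 8, "insect": 0}
TOTAL = sum(COUNTS.values())


def leaves(node):
    """Return the nouns (leaf nodes) dominated by a thesaurus node."""
    children = TREE.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(child)]


def description_length(cut):
    """Assumed MDL cost in bits of a list of classes covering a subtree."""
    # Parameter cost: half of log2(sample size) per class in the cut.
    param_cost = len(cut) / 2 * math.log2(TOTAL)
    # Data cost: each observed noun costs -log2 of its probability, where a
    # class's estimated probability is split evenly among its nouns.
    data_cost = 0.0
    for cls in cut:
        members = leaves(cls)
        freq = sum(COUNTS[m] for m in members)
        if freq > 0:
            data_cost -= freq * math.log2(freq / (TOTAL * len(members)))
    return param_cost + data_cost


def find_mdl_cut(node):
    """Bottom-up search: keep a class whole only if that is cheaper."""
    children = TREE.get(node)
    if not children:
        return [node]
    # Best cut of this subtree that does not stop at `node` itself.
    below = [cls for child in children for cls in find_mdl_cut(child)]
    if description_length([node]) < description_length(below):
        return [node]
    return below


best_cut = find_mdl_cut("ANIMAL")
print(best_cut)
```

With these made-up counts, the evenly used bird nouns are generalized to the single class BIRD, while the skewed insect nouns stay separate, so the sketch selects the cut BIRD, bug, bee, insect.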

This talk will be held in the Main Lecture Hall at ICSI.
1947 Center Street, Sixth Floor, Berkeley, CA 94704-1198
(on Center between Milvia and Martin Luther King Jr. Way)