Talks at the International Computer Science Institute

The International Computer Science Institute
is pleased to present a talk:


"Topic Identification - New research direction at ICSI"

Brigitte Bigi
ICSI
brigitte@icsi.berkeley.edu

Tuesday, April 6, 2004
ICSI, Conference Room 5A
12:30 pm

Abstract:

1st part:

The main objective of topic identification (TID) is to assign one or more topic labels to a stream of textual data. Labels are chosen from a set of topics fixed a priori. Topic identification differs from newspaper article categorization, for which numerous approaches have already been proposed in the literature: TID is a dynamic process in which the topic is discovered during text recognition. I will present the method I developed during my Ph.D. Each topic is defined by a set of automatically selected keywords, together with a static probability distribution over them. This distribution is continuously compared with the time-varying probability distribution of the words held in a cache memory. The comparison uses a symmetric Kullback-Leibler distance, which changes over time as new words are taken into account.
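The cache-versus-topic comparison can be illustrated with a minimal Python sketch. It is not the method from the thesis. It rests on several assumptions: each topic is a hypothetical dictionary mapping keywords to probabilities, the cache is a fixed-size window of recent words (the size of 100 is arbitrary), only topic keywords are counted in the cache, and zero probabilities receive a small epsilon so the logarithms stay finite. The symmetric distance used here is D(p||q) + D(q||p), which equals the sum over words of (p - q) log(p / q).

```python
import math
from collections import Counter, deque


def symmetric_kl(p, q, vocab, eps=1e-6):
    """Symmetric Kullback-Leibler distance D(p||q) + D(q||p) over a shared vocabulary.

    Assumption: eps is added to every probability (then renormalized) so that
    words missing from one distribution do not produce log(0).
    """
    ps = {w: p.get(w, 0.0) + eps for w in vocab}
    qs = {w: q.get(w, 0.0) + eps for w in vocab}
    zp, zq = sum(ps.values()), sum(qs.values())
    d = 0.0
    for w in vocab:
        pw, qw = ps[w] / zp, qs[w] / zq
        # (p - q) * log(p / q) sums both KL directions in one term
        d += (pw - qw) * math.log(pw / qw)
    return d


class TopicTracker:
    """Hypothetical tracker: compares static topic distributions with a word cache."""

    def __init__(self, topics, cache_size=100):
        self.topics = topics  # topic name -> {keyword: probability}
        self.vocab = set().union(*(t.keys() for t in topics.values()))
        self.cache = deque(maxlen=cache_size)  # oldest words drop out automatically

    def cache_distribution(self):
        # Assumption: only topic keywords contribute to the cache distribution
        counts = Counter(w for w in self.cache if w in self.vocab)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()} if total else {}

    def observe(self, word):
        """Add one recognized word and return the distance to every topic."""
        self.cache.append(word)
        q = self.cache_distribution()
        return {name: symmetric_kl(p, q, self.vocab) for name, p in self.topics.items()}


if __name__ == "__main__":
    topics = {
        "sport": {"match": 0.5, "goal": 0.5},
        "finance": {"market": 0.5, "stock": 0.5},
    }
    tracker = TopicTracker(topics, cache_size=50)
    for w in "the market fell as stock prices dropped".split():
        dists = tracker.observe(w)
    print(min(dists, key=dists.get))  # topic with the smallest distance
```

Because the distances are recomputed after every word, the closest topic can change as the text unfolds, which reflects the dynamic character of TID described above.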

2nd part:

The classical way to train a statistical language model (SLM) is to apply a function C to the corpus, which produces a vector W of counts for all encountered word sequences. A function F then builds the SLM from W. I propose a new research direction: defining a new function C' that produces an optimized count vector W'. C' will use a time series of count vectors W1, ..., Wt, obtained by applying C to the corpus split by date.
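A minimal Python sketch can make the C / F pipeline and the time-series input of C' concrete. The abstract does not say how C' optimizes W'. The exponentially decayed weighting below is therefore a purely hypothetical placeholder and should not be read as the proposed method. Further assumptions: "word sequences" are taken to be bigrams, F is a plain maximum-likelihood bigram estimate with no smoothing, and the names `count_ngrams`, `combine_time_series`, `bigram_model`, and `decay` are illustrative.

```python
from collections import Counter


def count_ngrams(tokens, n=2):
    """Function C (assumed n = 2): map a tokenized corpus slice to a count vector W."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def combine_time_series(count_vectors, decay=0.5):
    """Hypothetical C': merge W1..Wt (oldest first) into a single vector W'.

    Placeholder rule: the slice at lag k (k = 0 is the most recent) is weighted
    by decay ** k, so recent dates count more than older ones.
    """
    merged = Counter()
    t = len(count_vectors)
    for i, w in enumerate(count_vectors):
        weight = decay ** (t - 1 - i)
        for ngram, c in w.items():
            merged[ngram] += weight * c
    return merged


def bigram_model(counts):
    """Function F (assumed): maximum-likelihood estimate of P(w2 | w1) from bigram counts."""
    history = Counter()
    for (w1, _w2), c in counts.items():
        history[w1] += c
    return {(w1, w2): c / history[w1] for (w1, w2), c in counts.items()}


if __name__ == "__main__":
    # Corpus split by date; each slice is counted separately, so no bigram spans two dates
    slices = {"2003": "stock market falls".split(), "2004": "stock market rises".split()}
    W = [count_ngrams(slices[d]) for d in sorted(slices)]  # W1, ..., Wt
    W_prime = combine_time_series(W, decay=0.5)
    model = bigram_model(W_prime)
    print(model[("market", "rises")])  # recent slice outweighs the older one
```

In this toy run, "market rises" (from the more recent slice) receives a higher conditional probability than "market falls", which is the kind of date-dependent reweighting a time-series-aware C' could exploit. An actual C' would replace the fixed decay with whatever optimization criterion the research defines.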

Speaker Bio:

Brigitte Bigi received a doctorate in computer science from the University of Avignon (France) in September 2000. From 2000 to 2002, she was a postdoctoral researcher in the PAROLE team at LORIA, Nancy (France). Since 2002, she has been a CNRS researcher in the GEOD group at the CLIPS-IMAG Laboratory, Grenoble (France). Her research interests include natural topic identification and segmentation, and statistical language modelling.