Featured Research: Spoken Language Understanding

The information age has led to an information overload; there is so much within easy access, but sorting through it all can be terribly unwieldy. A Google search of "How many bombings happened in Iraq this week?" brought up over 2.5 million highly redundant search results, but no direct answer. As increasing amounts of data become accessible, the need to sift through it quickly and intelligently will become all the more urgent. Within ICSI's large and diverse Speech Group, Dr. Dilek Hakkani–Tür leads a team of researchers that is training computers to answer questions, as well as distill information and summarize multiple information streams; essentially, to help computers understand language independent of the source — text, speech, or translation — and genre.

Information Distillation

Information distillation — the retrieval of specific information in response to a query — is, on its surface, a straightforward task of combining information retrieval and topic–focused summarization: dig through the various data sources you have at your disposal, and return the information that responds to the original query. Consider a basic query such as "LIST FACTS ABOUT [US Secretary of Defense Rumsfeld's visit to China]." It is possible to search through several documents and identify a number of sentences that contain Rumsfeld's name or title along with a mention of China. It requires a significantly more robust system to be able to handle more complex queries such as "DESCRIBE THE ARRESTS OF PERSONS FROM [Al–Qaeda in Iraq] AND GIVE THEIR ROLE IN THE ORGANIZATION," which is more indicative of the types of queries that ICSI has worked with as part of the DARPA–funded multi–site GALE project. This query asks for more detailed information: names, affiliations, locations, and functions within a group. Since these pieces of information won't all occur explicitly in a given sentence, it requires the system to resolve pronouns, and understand the correlation between different named entities. For example, the distillation system has to understand that arrests of persons from Al–Qaeda in cities of Iraq like Basra and Baghdad should be included in its answer to this query. As queries and question–answering systems grow, their methods for returning relevant and non–redundant information must become more nuanced.
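The simpler case described above, finding sentences that name a person (by name or title) alongside a place, can be sketched in a few lines of Python. This is only an illustration of keyword co-occurrence, not the GALE distillation system; the function name `sentences_mentioning`, the naive sentence splitter, and the sample text are all assumptions made for the example.

```python
import re

def sentences_mentioning(text, *term_groups):
    """Return the sentences in which every term group is represented.

    Each group is a tuple of alternative surface forms (for example a name
    and a title); a sentence matches a group if it contains any one of them.
    """
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        if all(any(term.lower() in lowered for term in group)
               for group in term_groups):
            hits.append(sentence)
    return hits

# Made-up sample text, for illustration only.
sample = ("Rumsfeld arrived in China on Tuesday. "
          "The Secretary of Defense met reporters. "
          "Trade with China grew. "
          "The Secretary of Defense praised China's hosts.")
person = ("Rumsfeld", "Secretary of Defense")
place = ("China",)
matches = sentences_mentioning(sample, person, place)
```

A filter like this finds only sentences where both mentions are explicit. It cannot resolve a pronoun or connect Basra to Iraq, which is exactly the gap the more complex queries expose.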

The GALE program — Global Autonomous Language Exploitation — presents some challenging goals. Distillation systems must handle complex queries and return results from a number of different sources such as print news, broadcast news, broadcast conversations, and blogs in different languages, namely English and Chinese. There are, naturally, some difficulties in realizing this goal. The distillation systems are being developed and evaluated with training data provided by DARPA; to deploy this kind of distillation system over real–world data presents new difficulties. Distilling information from spoken data requires accurate automatic speech recognition (ASR). Dr. Sibel Yaman points out that if an ASR system makes an error on a word or name it hasn't encountered before — like the name Sibel Yaman or, at one time, even the company–name–turned–verb "Google" — the ASR system is likely to also botch the word boundaries around the name and render the entire sentence unusable.

Even when ASR systems work, there is no method of verifying data or accounting for any inconsistencies. It's not hard to imagine events that might be described very differently by an official news source and a blog; a distillation system in this setting could return potentially conflicting statements. It's still up to the user to make judgment calls in these situations.

The continuing growth of distillation systems suggests that solutions to these problems are within reach. Information retrieved by current distillation systems is parsed in a number of useful ways. The systems can indeed make correlations like Baghdad ~ Iraq; they understand people's titles and group affiliations, filter out redundant information, and can even identify causation. New developments include the ability to sort the information coming back to the user; instead of just regurgitating relevant sentences, the information can be broken down by who, what, where, how, and why, allowing for quick sorting of relevant information.

Summarization

If a distillation system can answer specific queries from a pool of data, it seems like a logical next step to develop a system that can go through the data and summarize its contents. The Speech Group is also working on just such a process. Whereas distillation determines if given information is relevant to a specific query, summarization finds the important elements from a source and summarizes them, either independently, or with content ranking by the user.

When a human summarizes a set of documents, he or she will read them, intuitively understand what they are talking about, and generate novel sentences explaining the key points. Robust language generation is still quite difficult for computers, so the current approach to computer summarization differs from human methods. First, the computer must determine what the important concepts are in a set of documents, and then it must present those concepts back in the summary. ICSI's work in this field was initially fueled by the DARPA–funded CALO project (Cognitive Assistant that Learns and Organizes), led by ICSI's research partner SRI. The Speech Group's portion of the CALO work developed ASR tools to summarize meetings by the key items discussed. A robust diarization system looks at patterns of who speaks when to determine who the important speakers are, which in turn helps identify action items. University of Texas at Dallas visitor Shasha Xie's work incorporates prosodic features to identify salient sentences in meetings.

ICSI grad student Dan Gillick observes that "people have tried a lot of different techniques, and things have gotten fancier over time. But nobody has ever figured out the simplest thing that works really well." Accordingly, ICSI's approach to summarization tries to be as straightforward as possible. ICSI researchers use n–grams (sequences of n words) to identify the most often discussed concepts in the documents. The greater the frequency of a concept, the greater its value. Taking into account the natural tendency of longer sentences to cover more concepts, the system treats a sentence with a greater weight as more likely to be useful in a summary. The value of a summary is the sum of the values of the unique concepts it contains; the ideal summary contains the most important concepts within the size constraint.
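The scoring idea in the paragraph above can be sketched as follows. This is a minimal illustration, not ICSI's implementation: it assumes bigrams (n = 2) as concepts, whitespace tokenization, and the number of documents containing a concept as its value. The function names and sample documents are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def concept_values(documents, n=2):
    """Value of a concept = number of documents it appears in (an assumption)."""
    counts = Counter()
    for doc in documents:
        # set() so a concept counts at most once per document
        counts.update(set(ngrams(doc.lower().split(), n)))
    return counts

def summary_value(sentences, values, n=2):
    """Sum the values of the unique concepts covered by the chosen sentences."""
    covered = set()
    for sentence in sentences:
        covered.update(ngrams(sentence.lower().split(), n))
    return sum(values.get(concept, 0) for concept in covered)

# Made-up documents, for illustration only.
docs = ["the levee broke", "the levee broke again", "traffic was slow"]
values = concept_values(docs)
single = summary_value(["the levee broke"], values)
repeated = summary_value(["the levee broke", "the levee broke"], values)
```

Because each concept is counted once per summary, repeating a sentence adds nothing to the score (`single` and `repeated` are equal), which is how this kind of objective discourages redundancy.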

Now working with a pool of sentences that are both important and minimally redundant, the system condenses the data into a small summary, on the order of about 100 words. The search for the greatest number of important concepts in a limited space is an optimization problem. This particular NP–hard combinatorial problem is a variation of the knapsack problem, and is similar to one addressed by ICSI Professor Richard M. Karp in his landmark 1972 paper "Reducibility Among Combinatorial Problems." The Speech Group's form of extractive summarization is proving itself very successful in creating summaries that contain large amounts of information; the group was among the leaders at the 2008 Text Analysis Conference evaluation.
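To make the optimization concrete, here is a small exhaustive search over sentence subsets under a word budget. It is a sketch for toy inputs only: the source does not say which solver the Speech Group uses, and brute force grows exponentially with the number of sentences. The function name `best_summary`, the input format, and the sample values are assumptions.

```python
from itertools import combinations

def best_summary(sentences, values, budget):
    """Pick the subset of sentences with the highest total concept value.

    sentences: list of (length_in_words, set_of_concepts) pairs
    values:    dict mapping each concept to its value
    budget:    maximum total length of the summary, in words
    Returns (best_value, tuple_of_chosen_indices).
    """
    best_value, best_subset = 0, ()
    indices = range(len(sentences))
    for k in range(1, len(sentences) + 1):
        for subset in combinations(indices, k):
            length = sum(sentences[i][0] for i in subset)
            if length > budget:
                continue  # over the size constraint
            # Each concept counts once, however many sentences mention it.
            covered = set().union(*(sentences[i][1] for i in subset))
            value = sum(values[c] for c in covered)
            if value > best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset

# Made-up values and sentences, for illustration only.
values = {"levee": 3, "flood": 2, "rescue": 2, "traffic": 1}
pool = [
    (6, {"levee", "flood"}),
    (5, {"levee", "rescue"}),
    (4, {"flood", "rescue"}),
    (3, {"traffic"}),
]
value, chosen = best_summary(pool, values, budget=10)
```

Unlike the classic knapsack problem, the value of an item here depends on what else has been chosen: the two highest-scoring sentences both cover "levee", and together they exceed the budget, so the search prefers a pairing that covers three distinct concepts.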

A summary may contain a lot of information, but that doesn't mean that it is going to be easy to read or even sensible. Some of the same problems that make it hard for a computer to create well–formed and reader–friendly language also make it hard for a computer to determine if its summary is easily readable. An additional layer of constraints encourages the summary to include sentences that will work well together. Sentences that are more independent (those without relative dates or unresolved pronouns, for example) will work better when concatenated. It may seem natural that the closer a sentence is to the beginning of a document, the more useful it will be in summarizing the document's contents; however, this is only partly true. Weighting all sentences based on their location in a document doesn't actually increase the quality of the summary. ICSI has recently experimented with a baseline system that uses only the very first sentence of each document, and it turns out that this system produces summaries that score as highly as those of any other system currently in use. Other than the first sentence, it's not the case that earlier sentences are more useful than later ones. ICSI Postdoc Benoit Favre points out that it's easier for these systems to achieve satisfactory results, because returning complete sentences carries little risk of error. A more robust, semantically derived system would construct more intelligent results, but runs a greater risk of making errors in evaluations.

In order to keep the summaries compact, the Speech Group is developing methods of condensing the data within sentences. Sentence compression based solely on syntactic roles can leave sentences syntactically well–formed, but nonsensical. ICSI is exploring the use of semantic role labeling to enhance sentence compression. If a sentence has been annotated, the computer will be able to isolate the important parts of the sentence. Different annotation systems offer different benefits. PropBank, the annotation system currently in use, is a practical choice because of the large amount of annotated data within the corpus, but PropBank's annotation only extends deep enough to syntactically connect verbs with their prepositions, e.g. X gave Y to Z. A more robust annotation system, such as ICSI's own FrameNet, would greatly serve the summarization work being done in the Speech Group. FrameNet's robustness is not only its greatest strength, but also a major obstacle in applying it to other technologies. FrameNet's annotation process is very rich, and thus takes more time (and funding) to develop into a comparably comprehensive corpus. As FrameNet grows in size, it will prove to be another invaluable collaborative tool that ICSI researchers can leverage by simply walking down the hall.

What's in store?

In addition to the more established research directions discussed above, Dr. Hakkani–Tür is actively exploring other fields where ASR–fueled information extraction can be helpful.

NSF has funded some recent ICSI work introducing distillation to the field of emergency response. In times of emergencies and disasters, emergency response dispatchers can get absolutely swamped with calls, with some callers having to wait upwards of 10 minutes to speak with a dispatcher. A large number of these calls can be about the same incident, e.g. a broken levee or a car accident. Dr. Hakkani–Tür is training a system to distill information from callers on hold to determine the content of their calls. An additional level of information is extracted from the prosody (voice quality) of the speaker. These data work together to prioritize calls, so that someone with an urgent health need won't be stuck waiting behind a dozen calls about a single car accident.

SRI has also funded some exploratory work on using ASR in elderly care centers. Health care decisions in situations like this can be very subjective; moreover, the questions on which these decisions are based can be uncomfortable for some. Dr. Hakkani–Tür's group has been studying to what extent automating these conversations can yield improvements in service. ASR can help these conversations happen more frequently, more privately, and more objectively.

Finally, Speech Group researchers focusing on language processing have collaborated with ICSI Networking researchers to explore how information distillation can be used to track black market activity on the internet. Conversational language recognition technology is useful in filtering out advertisements from chat logs to locate illegal solicitations in multiple languages. These tools can combine to identify the locations and content of various illicit activities that occur online, and trace the locations and behaviors of people trading stolen credit card information, social security numbers, and other exploitable personal information.