In general, this page is always "under construction".
However, current Speech Group projects include the following:
- Effective, Affordable, Reusable
Speech-to-Text (EARS):
This is the major DARPA speech program that started in 2002. We are
participating in
both major components of this program, namely:
- Rich Transcription of
Conversational Speech - in collaboration with SRI and the
University of Washington, we are working
to generate readable transcriptions of conversational speech in
multiple
languages. "Readable" here means incorporating capitalization,
punctuation, and speaker markers; but it also means making major
improvements in speech recognition performance, since word errors are
still
significant in this type of task.
- Novel Approaches - we
are studying replacements for the standard
spectral envelope as the speech representation of choice (typically
with cepstral transformation). This includes work on the acoustic
"front end" as well as research on statistical modeling for the new
features that are being generated.
- The Meeting Recorder project,
which seeks to develop speech recognizers that would be useful in
conventional meeting contexts, as well as information retrieval and
other applications that such recognition would make possible. One
component of this work
is the SpeechCorder, which is a
project to design a portable digital tape recorder that uses robust
speech recognition to
create an indexable and annotatable word stream for archiving meetings.
SpeechCorder is also associated with the IRAM project.
- Speaker Recognition: The goal of
this project is to use
higher-level features (such as word usage, prosodic characteristics,
pronunciation patterns, idiosyncratic laughs or other non-speech
events) to improve speaker recognition.
- In addition, there are a number of ongoing projects (and others
just starting up) in a range of areas pertaining to speech
processing that is robust to noise and reverberation, to the automatic
derivation of speech categories, to the incorporation of prosodic
features for predicting punctuation, disfluencies, and overlaps,
and to the automatic classification of frustration during human-machine
interaction.
And see here for a number of
earlier
projects.
Nelson Morgan
- $Date: 2003/10/23 18:56:27 $