I am a PhD student in the ICSI Speech Group. My research is on automatic speech recognition (ASR): making computers turn speech into text. I can be reached by email (gelbart at icsi dot berkeley dot edu) or by phone (778-997-6098).
I am now concluding my thesis research on feature selection for multi-stream ASR, or in other words, ensemble feature selection for ASR. I am influenced by the ensemble feature selection work of Tin Kam Ho, David Opitz, Alexey Tsymbal, and Padraig Cunningham. Here's a summary of my motivation: In multi-stream ASR systems in which diversity comes from the use of different feature vectors by different classifiers, features are normally grouped together in a feature vector only if they were calculated by the same feature extraction algorithm. Furthermore, feature vectors for different streams normally have no elements in common. There are conceptual and experimental reasons to suppose that it may be useful to relax these constraints and allow features to be mixed in a more fluid way.
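To make the two constraints concrete, here is a minimal sketch that contrasts the conventional grouping with a relaxed one. It is only an illustration based on random sampling and is not the selection method used in the thesis. The feature pool, its names and sizes, and the function names are all hypothetical.

```python
import random

# Hypothetical feature pool: the extractor names and sizes are illustrative only.
FEATURE_POOL = {
    "mfcc": [f"mfcc_{i}" for i in range(13)],
    "plp": [f"plp_{i}" for i in range(13)],
}


def conventional_streams(pool):
    """One stream per extraction algorithm, so streams share no features."""
    return [list(features) for features in pool.values()]


def mixed_streams(pool, n_streams, stream_size, seed=0):
    """Draw each stream's features at random from the whole pool.

    Sampling is without replacement inside a stream but independent across
    streams, so a stream may mix extractors and streams may overlap.
    """
    rng = random.Random(seed)
    all_features = [f for features in pool.values() for f in features]
    return [sorted(rng.sample(all_features, stream_size)) for _ in range(n_streams)]
```

Calling `mixed_streams(FEATURE_POOL, n_streams=2, stream_size=13)` returns two 13-element streams that can each contain both kinds of features. A real ensemble feature selection method would replace the random draw with a search guided by some measure of accuracy or diversity.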
The tools I used for benchmarking on the OGI ISOLET and OGI Numbers data sets can be found here. That page links to recognizer configuration files, recognizer scripts, and scripts that create noisy versions of the audio files by adding background noise.
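For readers unfamiliar with how noisy test data is typically made, the following is a generic sketch of mixing noise into speech at a chosen signal-to-noise ratio. It is not taken from the scripts on that page. The function name, the use of a target SNR in dB, and the representation of audio as lists of float samples are all assumptions made for illustration.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the given SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")
    # Pick the gain so that speech_power / (gain**2 * noise_power)
    # equals 10 ** (snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Power here is measured over the whole file. Measuring speech power only over speech-active regions is a common alternative and would give different noise levels for recordings with long silences.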
I have experimented with a mean subtraction algorithm for reverberation compensation that was developed as part of Carlos Avendano's thesis work at OGI. I redesigned the algorithm to produce time-domain output, making it much easier to integrate with existing ASR software. I then evaluated it using data from a corpus of spoken-digit recordings collected (not by me) using tabletop microphones at ICSI. We published a paper on this at ASRU 2001. Please also see this page, which contains corrections, source code, audio files, and additional results that were not included in the paper. That page also has a bibliography of related publications. We published further results at AVIOS 2002 and ICSLP 2002, and other research groups have since published results using the time-domain output version of the algorithm.
I co-authored a EUROSPEECH 2003 paper with Docio and Morgan in which we compared the performance of different types of tabletop microphones and investigated the performance of noise reduction.
Human speech recognition accuracy is often much higher than computer accuracy, even in tasks (such as recognizing nonsense syllables) where semantic understanding does not play a role. This has motivated work on computer speech recognition that uses signal processing inspired by the human hearing system. I have co-authored papers on this topic with Werner Hemmert and others.
I helped Michael Kleinschmidt with his thesis work on the use of Gabor filters for speech recognition. The work has since been continued by Bernd Meyer and others. This page contains a bibliography, links to source code, and some information about unpublished results.
Technologically, it is becoming increasingly simple to record and preserve the audio of meetings. The value of such recordings is higher if ASR-based speech indexing and search is possible, much as old emails are more useful if one can search them for particular keywords. There are also potential uses of ASR technology while a meeting is ongoing.
ICSI has been doing quite a bit of work in this area. My main contribution was to extend Transcriber to support multiple-talker transcription. The modified tool was used by a number of people, way back in 2001. Transcriber and other tools have made progress since then, so I doubt my version is useful anymore.
I also helped integrate noise reduction into ICSI's ASR system for meetings. The code can be found here. (I didn't write the core code; I just cleaned it up a bit and made it easier to use with meeting data.)
I have some ideas about adding reverberation to non-reverberant conversational speech training data (such as Switchboard), in order to increase the amount of reverberant training data available for meeting recognition. However, due to other priorities, I'm not working on this at the moment. If you are interested, please feel free to get in touch.
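One common way to add reverberation to clean audio is to convolve it with a room impulse response. The sketch below shows that generic technique only. It is an assumption about how such data might be produced, not a description of the specific ideas mentioned above, and the function name and the toy impulse response in the usage note are hypothetical.

```python
def add_reverberation(dry, impulse_response):
    """Convolve a dry signal with a room impulse response (direct form)."""
    if not dry or not impulse_response:
        raise ValueError("both inputs must be non-empty")
    # Full convolution: the output is longer than the input by the
    # length of the impulse response minus one sample (the reverberant tail).
    wet = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            wet[i + j] += x * h
    return wet
```

For example, `add_reverberation([1.0, 0.0, 0.0], [1.0, 0.5])` passes the direct sound through unchanged and adds one echo at half amplitude a sample later. Real room responses are thousands of samples long, so a practical version would use FFT-based convolution instead of this quadratic-time loop.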