Hello and welcome to my ICSI research webpage.
I am no longer at ICSI, but this page is still up to date and
covers all the technical material I developed during my stay there.
My PhD thesis, "Robust Speaker Diarization for Meetings", can be downloaded
here. It was defended on December 21st in Barcelona to obtain a PhD in
Telecommunications at UPC.
BeamformIt Acoustic Beamforming software
This acoustic beamforming package was developed at ICSI, as part of my
thesis, to handle the multiple distant microphone conditions (MDM, ADM,
MSLA...) present in the recordings used in the NIST RT evaluations. A
further explanation of the software, some examples and the source code
are available on this webpage. Please subscribe to the mailing list and
post any questions, bugs or ideas you might have. You are more than
welcome to develop new algorithms within the proposed framework or to
modify the existing ones; please contact me so that such changes can be
made publicly available to the community.
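The core idea behind acoustic beamforming with distant microphones is to estimate how much later each microphone hears the speech than a reference microphone, undo that delay, and average the aligned channels (delay-and-sum). The sketch below is only a minimal illustration of that general idea, not BeamformIt's implementation: it assumes one fixed delay per channel over the whole recording, estimated with GCC-PHAT, and equal channel weights. A practical system would re-estimate delays on short windows and weight or discard unreliable channels. The function names `gcc_phat_delay`, `advance` and `delay_and_sum` are hypothetical.

```python
import numpy as np

def gcc_phat_delay(sig, ref, max_shift):
    """Estimate how many samples `sig` lags behind `ref` using GCC-PHAT."""
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    # Rearrange so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return int(np.argmax(cc)) - max_shift

def advance(x, d):
    """Shift x earlier by d samples (later if d < 0), zero-filling the gap."""
    y = np.zeros_like(x)
    if d >= 0:
        y[:len(x) - d] = x[d:]
    else:
        y[-d:] = x[:len(x) + d]
    return y

def delay_and_sum(channels, ref_index=0, max_shift=400):
    """Align every channel to the reference channel and average them."""
    ref = channels[ref_index]
    aligned = [advance(ch, gcc_phat_delay(ch, ref, max_shift)) for ch in channels]
    return np.mean(aligned, axis=0)
```

For example, with `channels` a list of equal-length NumPy arrays sampled at 16 kHz, `max_shift=400` allows up to 25 ms of delay between any microphone and the reference.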
ICSI-SRI forced alignments for the RT evaluations
In my thesis report and for the RT06s speaker diarization system, we
tuned and evaluated our systems using forced-alignment reference files
instead of the hand-made transcription files provided by NIST. One of
the main reasons is that forced-alignment transcriptions are consistent
across years.
To obtain them, we:
- ran the ICSI-SRI ASR system in forced-alignment mode on the acoustic
data of each Individual Headset Microphone (IHM) channel, using its
hand-made word transcription;
- grouped the word-based forced alignments of all IHM channels into one
file;
- merged all word segments from the same speaker that are separated by
less than 0.3 s of non-speech.
The files are available here. The package contains the RTTM files for
all meetings in RT02-RT06s and for the RT06s lecture room data. It also
contains the original CTM files (the output of the ASR system) and the
script used to convert them to RTTM. Any comments are welcome.
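The merging step and the CTM-to-RTTM conversion described above can be sketched as follows. This is not the script distributed with the files; it is a minimal illustration assuming CTM lines of the form `file channel start duration word [conf]`, a hypothetical `channel_to_speaker` mapping from IHM channel to speaker name, and the usual NIST RTTM `SPEAKER` line layout with unused fields set to `<NA>`. Only the 0.3 s threshold comes from the description above; all function names are hypothetical.

```python
from collections import defaultdict

MAX_GAP = 0.3  # seconds of non-speech allowed inside one merged segment

def parse_ctm(lines, channel_to_speaker):
    """Yield (meeting, speaker, start, end) for every word in CTM lines.

    Assumes the usual CTM layout "file channel start duration word [conf]"
    and a caller-supplied mapping from IHM channel to speaker name.
    """
    for line in lines:
        if not line.strip() or line.startswith(";;"):  # skip blanks and comments
            continue
        fields = line.split()
        start, dur = float(fields[2]), float(fields[3])
        yield fields[0], channel_to_speaker[fields[1]], start, start + dur

def merge_words(words, max_gap=MAX_GAP):
    """Merge consecutive words of one speaker separated by < max_gap seconds."""
    spans = defaultdict(list)
    for meeting, speaker, start, end in words:
        spans[(meeting, speaker)].append((start, end))
    segments = []
    for (meeting, speaker), times in spans.items():
        times.sort()
        seg_start, seg_end = times[0]
        for start, end in times[1:]:
            if start - seg_end < max_gap:
                seg_end = max(seg_end, end)  # short pause: extend current segment
            else:
                segments.append((meeting, speaker, seg_start, seg_end))
                seg_start, seg_end = start, end
        segments.append((meeting, speaker, seg_start, seg_end))
    return sorted(segments, key=lambda s: (s[0], s[2]))

def to_rttm(segments):
    """Format merged segments as RTTM SPEAKER lines (channel fixed to 1)."""
    return [f"SPEAKER {m} 1 {s:.2f} {e - s:.2f} <NA> <NA> {spk} <NA>"
            for m, spk, s, e in segments]
```

For example, `to_rttm(merge_words(parse_ctm(ctm_lines, {"A": "spkA", "B": "spkB"})))` turns the word-level alignments of two headset channels into speaker-turn segments, joining words of the same speaker whenever the pause between them is shorter than 0.3 s.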
- Speaker Segmentation: I'm interested in exploring new ways of
segmenting speech recordings (broadcast news, meeting recordings...)
quickly and effectively.
One of the most widely used methods for detecting acoustic boundaries
is the Bayesian Information Criterion (BIC).
I proposed a novel method called XBIC, which is computationally more
efficient than BIC, at the cost of a higher overall error (see [1]).
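As a rough illustration of the standard BIC criterion (not XBIC, whose details are in [1]), the sketch below models a window of N feature vectors of dimension d (e.g. MFCCs) either with one full-covariance Gaussian or with two Gaussians split at frame t, and computes ΔBIC(t) = (N/2) log|Σ| - (N1/2) log|Σ1| - (N2/2) log|Σ2| - λP, with P = ½(d + d(d+1)/2) log N. A positive maximum over t suggests an acoustic change at that frame. The function names, the `margin` that keeps each side large enough to estimate a covariance, and λ = 1 are illustrative assumptions.

```python
import numpy as np

def delta_bic(x, split, lam=1.0):
    """ΔBIC for splitting the frames x (N x d) at index `split`."""
    n, d = x.shape

    def logdet_cov(seg):
        # Maximum-likelihood covariance; slogdet is safer than log(det(.))
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        return np.linalg.slogdet(cov)[1]

    n1, n2 = split, n - split
    penalty = 0.5 * (d + d * (d + 1) / 2) * np.log(n)  # extra model parameters
    return (0.5 * n * logdet_cov(x)
            - 0.5 * n1 * logdet_cov(x[:split])
            - 0.5 * n2 * logdet_cov(x[split:])
            - lam * penalty)

def best_change_point(x, margin=20, lam=1.0):
    """Return (frame, score) of the best split, or (None, score) if ΔBIC <= 0."""
    scores = [(delta_bic(x, t, lam), t) for t in range(margin, len(x) - margin)]
    score, t = max(scores)
    return (t, score) if score > 0 else (None, score)
```

Evaluating every candidate split recomputes full covariances each time, which is exactly the kind of cost that motivates cheaper alternatives; λ trades missed changes against false alarms.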
- Speaker Clustering and Diarization: Given a speech recording, speaker
clustering finds the speaker change points and assigns the same label to
all segments spoken by the same speaker.
I initially helped out with the EARS diarization evaluation on broadcast
news (see [2]).
Afterwards, I worked on clustering meeting recordings using only the
distant microphones.
My PhD dissertation focuses on segmentation and clustering in the
meeting environment using the far-field channels. For a detailed
description, read my thesis proposal document ([3]); the thesis was
scheduled to be defended by the end of 2006.
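A common way to implement the labelling described above is bottom-up (agglomerative) clustering: start with one cluster per segment and keep merging the two most similar clusters until a stopping criterion fires. The sketch below assumes each segment is a NumPy array of feature vectors (frames x dimensions), uses a single-Gaussian ΔBIC as the merge score, and stops when no pair has a negative ΔBIC. It is a textbook variant for illustration, not the system described in the thesis; the function names and λ = 1 are assumptions.

```python
import numpy as np

def bic_distance(a, b, lam=1.0):
    """ΔBIC between two segments modelled by full-covariance Gaussians.

    Negative values favour modelling a and b with one Gaussian (same speaker).
    """
    x = np.vstack((a, b))
    n, d = x.shape

    def term(seg):
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        return 0.5 * len(seg) * np.linalg.slogdet(cov)[1]

    penalty = 0.5 * (d + d * (d + 1) / 2) * np.log(n)
    return term(x) - term(a) - term(b) - lam * penalty

def agglomerative_cluster(segments, lam=1.0):
    """Greedily merge the closest pair of clusters while its ΔBIC is negative.

    Returns one integer cluster label per input segment.
    """
    clusters = [[i] for i in range(len(segments))]  # segment indices per cluster
    data = list(segments)                           # pooled frames per cluster
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = bic_distance(data[i], data[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # no remaining pair looks like the same speaker
            break
        # j > i, so popping j leaves index i valid
        clusters[i] += clusters.pop(j)
        data[i] = np.vstack((data[i], data.pop(j)))
    labels = [0] * len(segments)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels
```

Recomputing every pairwise score on each iteration keeps the sketch short; practical systems cache the scores and only update those involving the newly merged cluster.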
- Dialog systems: I'm interested in how computers and people can
communicate with each other. In the past I experimented with a tool
called OpenVXI, which is available for building dialog systems, and
worked on a few small dialog system projects. I am not working in this
area right now.
- TTS: Text-to-speech synthesis is the technology that allows computers
to talk to people. It is the other side of speech technology,
complementary to speech recognition. Nowadays the most widely used
approach is concatenative synthesis, which joins units taken from a
prerecorded database of sounds to create the desired speech.
In my prior life in the USA, I lived in Southern California and worked
for Panasonic Speech Technologies Lab, where I developed the company's
Spanish TTS system. I am not working in this area at present.
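The final joining step of concatenative synthesis can be illustrated with a short sketch. Real systems first select units from a large database using target and join costs and then adjust pitch and duration (for example with PSOLA); this example only shows how already-selected waveform units might be glued together with a linear cross-fade to soften the joints. It is not the Panasonic system; the function name, the diphone-style unit names and the placeholder `np.ones` waveforms (standing in for recorded audio at a common sample rate) are hypothetical.

```python
import numpy as np

def concatenate_units(units, overlap):
    """Join prerecorded waveform units, cross-fading `overlap` samples at each joint.

    Each unit must be a 1-D array at least `overlap` samples long.
    """
    out = np.asarray(units[0], dtype=float)
    if overlap == 0:  # plain concatenation, no smoothing
        return np.concatenate([out] + [np.asarray(u, dtype=float) for u in units[1:]])
    fade_in = np.linspace(0.0, 1.0, overlap)  # ramps the incoming unit up
    fade_out = 1.0 - fade_in                  # ramps the outgoing unit down
    for unit in units[1:]:
        unit = np.asarray(unit, dtype=float)
        joint = out[-overlap:] * fade_out + unit[:overlap] * fade_in
        out = np.concatenate((out[:-overlap], joint, unit[overlap:]))
    return out

# Hypothetical unit database: unit name -> waveform samples (placeholders here)
database = {"#-o": np.ones(800), "o-l": np.ones(800),
            "l-a": np.ones(800), "a-#": np.ones(800)}
speech = concatenate_units([database[u] for u in ["#-o", "o-l", "l-a", "a-#"]],
                           overlap=80)
```

Each joint overlaps two units, so the output is shorter than the sum of the unit lengths by `overlap` samples per joint.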
- Robotics: Robots in the Human Interaction Loop. I believe that robots
can ease communication between humans and computers by giving computers
a human-like form. I've been a robotics hobbyist since my university
years, and now I see robots as a way to enhance communication.
During university I built two "sumo" fighters and two line followers to
compete in contests between university student teams. On a couple of
occasions my team won some of the categories.
After university I built two robots oriented towards human-machine
interaction: Marta and Marta2. Marta was built on a plastic robotic
platform, to which I attached home-made electronics to connect it to a
laptop computer. With it I created a small dialog demo in which I could
talk to Marta and ask her to move around and dance for me. Marta2 was
built on an Evolution Robotics platform that came with all the
electronics included. In that case I worked on wireless communication
with the robot, commanding it from a PDA by voice and with arrows on the
touch screen. The robot streamed video and audio back to the PDA to
indicate its position.