ICSI Page for Xavier Anguera




Hello and welcome to my ICSI research webpage.
I am no longer at ICSI, but this page is still up to date and covers all the technical material I developed during my stay there.
If you would like to visit my personal webpage, please go here.
For my most current resume, please go here.


My PhD thesis, "Robust Speaker Diarization for Meetings", can be downloaded here. It was defended on December 21st in Barcelona to obtain a PhD in Telecommunications at UPC.

BeamformIt Acoustic Beamforming software

This acoustic beamforming package was created at ICSI, as part of my thesis, to deal with the multiple distant microphone conditions (MDM, ADM, MSLA...) present in the recordings available for the NIST RT evaluations. On this webpage you can find further explanation of the software, some examples and the source code. Please subscribe to the mailing list and post any questions, bugs or ideas you might have. You are more than welcome to develop new algorithms within the proposed framework or to modify the existing ones; please contact me so that such changes can be made publicly available to the community.
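The basic idea behind combining several distant microphones can be illustrated with a delay-and-sum beamformer: estimate each channel's delay relative to a reference channel, align the channels and average them. This is a minimal sketch, not BeamformIt's implementation: it uses plain time-domain cross-correlation with integer-sample delays and equal channel weights, whereas practical beamformers usually rely on a weighted delay estimator such as GCC-PHAT and per-channel weights. The function names `best_lag` and `delay_and_sum` are hypothetical.

```python
# Minimal delay-and-sum beamformer (illustrative sketch, NOT BeamformIt's code).
# Assumptions: integer-sample delays, plain cross-correlation instead of
# GCC-PHAT, equal weights for all channels, channel 0 used as the reference.
from typing import List, Sequence


def best_lag(ref: Sequence[float], sig: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) maximising sum_n ref[n] * sig[n + lag] in [-max_lag, max_lag]."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(ref[n] * sig[n + lag]
                    for n in range(len(ref)) if 0 <= n + lag < len(sig))
        if score > best_score:
            best, best_score = lag, score
    return best


def delay_and_sum(channels: List[Sequence[float]], max_lag: int = 10) -> List[float]:
    """Align every channel to channel 0 and average the aligned samples."""
    ref = channels[0]
    # The reference channel is aligned with itself by definition (lag 0).
    lags = [0] + [best_lag(ref, ch, max_lag) for ch in channels[1:]]
    out = []
    for n in range(len(ref)):
        # Sample of each channel that lines up with ref[n], if it exists.
        aligned = [ch[n + lag] for ch, lag in zip(channels, lags)
                   if 0 <= n + lag < len(ch)]
        out.append(sum(aligned) / len(aligned))
    return out


if __name__ == "__main__":
    ref = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
    delayed = [0, 0, 0] + ref[:-3]  # same signal arriving 3 samples later
    print(best_lag(ref, delayed, 5))  # 3
    print(delay_and_sum([ref, delayed], 5))
    # [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Averaging the aligned channels reinforces the speech component, which is coherent across microphones, while uncorrelated noise partially cancels out.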

ICSI-SRI forced alignments for the RT evaluations

In my thesis report and for the RT06s speaker diarization system, we tuned and evaluated our systems using reference files obtained from forced alignments instead of the manual transcription files provided by NIST. One of the main reasons for this is that the forced-alignment transcriptions are consistent across years.
To obtain them, we:
  1. ran the ICSI-SRI ASR system in forced-alignment mode over the acoustic data of each individual headset microphone (IHM) channel, using its hand-made word transcription.
  2. grouped the (word-based) forced alignments of all IHM channels into one file.
  3. merged all word segments from the same speaker that are separated by less than 0.3 s of non-speech.
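The merging rule in step 3 can be sketched as follows. This is a minimal illustration, not the conversion script included with the files: words are assumed to be `(speaker, start, end)` tuples in seconds, and the helper name `merge_words` is hypothetical.

```python
# Sketch of step 3: merge one speaker's words separated by < 0.3 s of non-speech.
# Assumption: words are (speaker, start_s, end_s) tuples; this is NOT the
# script distributed with the files, just an illustration of the rule.
from typing import Dict, List, Tuple

Segment = Tuple[str, float, float]  # (speaker, start in s, end in s)


def merge_words(words: List[Segment], max_gap: float = 0.3) -> List[Segment]:
    """Merge consecutive words of the same speaker whose gap is below max_gap.

    Speakers are processed independently, so overlapping speech from
    different speakers is kept as separate segments.
    """
    per_speaker: Dict[str, List[Tuple[float, float]]] = {}
    for spk, start, end in words:
        per_speaker.setdefault(spk, []).append((start, end))

    merged: List[Segment] = []
    for spk, spans in per_speaker.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start - cur_end < max_gap:   # short pause: extend the segment
                cur_end = max(cur_end, end)
            else:                           # long pause: close the segment
                merged.append((spk, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((spk, cur_start, cur_end))
    return sorted(merged, key=lambda s: (s[1], s[0]))


if __name__ == "__main__":
    words = [("A", 0.0, 0.5), ("A", 0.6, 1.0), ("A", 1.5, 2.0), ("B", 0.8, 1.2)]
    print(merge_words(words))
    # [('A', 0.0, 1.0), ('B', 0.8, 1.2), ('A', 1.5, 2.0)]
```

A gap of exactly 0.3 s is not merged, matching the "less than 0.3 s" rule.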
The files are available here. They include the RTTM files for all meetings in RT02-RT06s and for the RT06s lecture room data, as well as the original CTM files (output of the ASR system) and the script used to convert them to RTTM. Any comments are welcome.
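As a rough sketch of the CTM-to-RTTM conversion, assuming the usual NIST CTM layout `<file> <channel> <start> <duration> <word> [<confidence>]` and a 9-field RTTM SPEAKER record with unused fields set to `<NA>` (the NIST RTTM specification should be checked for the exact field list used in each evaluation). The function names and the speaker label are hypothetical; the script shipped with the files may work differently.

```python
# Sketch of CTM -> RTTM helpers (hypothetical names; NOT the script shipped
# with the files). Assumes CTM lines of the form
#   <file> <channel> <start> <duration> <word> [<confidence>]
# and writes 9-field RTTM SPEAKER records with unused fields set to <NA>.
from typing import Iterable, List, Tuple


def parse_ctm(lines: Iterable[str]) -> List[Tuple[str, float, float, str]]:
    """Return (file, start, end, word) for every non-comment CTM line."""
    words = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith(";;"):  # blank or comment line
            continue
        start, duration = float(fields[2]), float(fields[3])
        words.append((fields[0], start, start + duration, fields[4]))
    return words


def rttm_speaker_line(meeting: str, speaker: str, start: float, end: float) -> str:
    """Format one speaker segment as an RTTM SPEAKER record (channel 1)."""
    return (f"SPEAKER {meeting} 1 {start:.2f} {end - start:.2f} "
            f"<NA> <NA> {speaker} <NA>")


if __name__ == "__main__":
    ctm = ["mtg1_ch0 1 10.00 0.50 hello 0.9", ";; a comment",
           "mtg1_ch0 1 10.60 0.40 world"]
    for file_id, start, end, word in parse_ctm(ctm):
        print(file_id, start, end, word)
    # In a real pipeline the speaker name would come from the IHM channel;
    # "spkA" here is a made-up label.
    print(rttm_speaker_line("mtg1", "spkA", 10.0, 11.0))
    # SPEAKER mtg1 1 10.00 1.00 <NA> <NA> spkA <NA>
```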

Topics of Interest
  • Speaker Segmentation: I'm interested in exploring new ways of quickly and effectively segmenting speech recordings (broadcast news, meeting recordings, ...).

        One of the most widely used methods for detecting acoustic boundaries is the Bayesian Information Criterion (BIC).

        I propose a novel method called XBIC, which is computationally more efficient than BIC, at the cost of a higher overall error (see [1]).
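To make the comparison concrete, here is a sketch on 1-D features with one Gaussian per model. `delta_bic` is the standard ΔBIC test (one Gaussian for the pooled data versus one per segment, minus a λ-weighted complexity penalty). `cross_likelihood` is only a generic cross-likelihood score in the spirit of XBIC, where each segment is scored under the other segment's model; it is an assumption for illustration, not the exact XBIC formula, which is defined in the cited work. Real systems use multi-dimensional features such as MFCCs.

```python
# ΔBIC and a cross-likelihood score on 1-D features, one Gaussian per model.
# Assumption: cross_likelihood only conveys the idea behind XBIC; it is NOT
# the exact XBIC formula, which is defined in the cited work.
import math
from typing import Sequence, Tuple


def gauss_fit(x: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood mean and variance (floored to avoid log(0))."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return mu, max(var, 1e-10)


def avg_loglik(x: Sequence[float], mu: float, var: float) -> float:
    """Average per-sample log-likelihood of x under N(mu, var)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
               for v in x) / len(x)


def delta_bic(x1: Sequence[float], x2: Sequence[float], lam: float = 1.0) -> float:
    """Standard ΔBIC: > 0 favours two separate models, i.e. a boundary."""
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    _, var = gauss_fit(list(x1) + list(x2))
    _, var1 = gauss_fit(x1)
    _, var2 = gauss_fit(x2)
    d = 1  # feature dimension
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    return 0.5 * (n * math.log(var) - n1 * math.log(var1)
                  - n2 * math.log(var2)) - lam * penalty


def cross_likelihood(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Score each segment with the other segment's model; low = dissimilar."""
    mu1, var1 = gauss_fit(x1)
    mu2, var2 = gauss_fit(x2)
    return avg_loglik(x1, mu2, var2) + avg_loglik(x2, mu1, var1)


if __name__ == "__main__":
    a = [0.1, -0.2, 0.3, -0.1, 0.0, 0.2, -0.3, 0.1, -0.1, 0.0]
    b = [v + 10.0 for v in a]  # same shape, shifted: a different "speaker"
    print(delta_bic(a, b) > 0, delta_bic(a[:5], a[5:]) > 0)  # True False
    print(cross_likelihood(a[:5], a[5:]) > cross_likelihood(a, b))  # True
```

Unlike `delta_bic`, `cross_likelihood` never fits a model to the pooled data, which hints at where a computational saving can come from.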

  • Speaker Clustering and Diarization: Given a speech recording, speaker diarization finds the speaker changes and clusters together all segments that come from the same speaker, giving them the same label.

I initially helped out with the EARS diarization evaluation on broadcast news (see [2]).
Afterwards, I worked on clustering meeting recordings using only the distant (far-field) microphones.

My PhD dissertation focuses on segmentation and clustering in the meetings domain using the far-field channels. For a detailed description, read my thesis proposal document ([3]); the thesis is to be defended by the end of 2006.
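Bottom-up (agglomerative) clustering, a common strategy for speaker diarization, can be sketched as follows: start with one cluster per segment, repeatedly merge the pair with the highest merge gain, and stop when no merge gains anything. The merge gain here is negative ΔBIC on 1-D features with a single Gaussian per cluster; this is a simplifying assumption for illustration, not the system described in my papers. The function names are hypothetical.

```python
# Toy bottom-up (agglomerative) speaker clustering on 1-D features.
# Assumptions: one Gaussian per cluster and standard ΔBIC as the merge
# criterion; a simplification for illustration, not the ICSI system.
import math
from typing import List, Sequence


def _log_var(x: Sequence[float]) -> float:
    """Log of the maximum-likelihood variance (floored to avoid log(0))."""
    mu = sum(x) / len(x)
    return math.log(max(sum((v - mu) ** 2 for v in x) / len(x), 1e-10))


def merge_gain(a: Sequence[float], b: Sequence[float], lam: float = 1.0) -> float:
    """Negative ΔBIC: positive when one Gaussian explains a + b well enough."""
    n = len(a) + len(b)
    delta = 0.5 * (n * _log_var(list(a) + list(b))
                   - len(a) * _log_var(a) - len(b) * _log_var(b))
    penalty = lam * 0.5 * (1 + 1) * math.log(n)  # d = 1: mean + variance
    return -(delta - penalty)


def cluster_segments(segments: List[List[float]]) -> List[int]:
    """Return one speaker label per input segment."""
    clusters = [list(s) for s in segments]
    members = [[i] for i in range(len(segments))]  # segment ids per cluster
    while len(clusters) > 1:
        gain, i, j = max((merge_gain(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if gain <= 0:  # no merge is justified any more: stop
            break
        clusters[i] += clusters.pop(j)  # j > i, so index i is unaffected
        members[i] += members.pop(j)
    labels = [0] * len(segments)
    for label, ids in enumerate(members):
        for seg_id in ids:
            labels[seg_id] = label
    return labels


if __name__ == "__main__":
    segs = [[0.1, -0.1, 0.2, 0.0], [0.0, 0.1, -0.2, 0.1],
            [10.1, 9.9, 10.2, 10.0], [10.0, 10.1, 9.8, 10.1]]
    print(cluster_segments(segs))  # [0, 0, 1, 1]
```

The stopping rule comes for free from the criterion: clustering ends as soon as every remaining pair is better explained by separate models.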

  • Dialog systems: I'm interested in how computers and people can communicate with each other. In the past I have worked with OpenVXI, a tool available for building dialog systems, and on several small dialog system projects. I am not working in this area right now.
  • TTS: Text-to-Speech synthesis is the technology that allows computers to talk to people. It is the counterpart of speech recognition within speech technology. Nowadays the most widely used approach is concatenative synthesis, which creates the desired speech by joining together units taken from a prerecorded database of sounds.
In my prior life in the USA, I lived in Southern California and worked for the Panasonic Speech Technologies Lab, where I developed the company's Spanish TTS system. I am not working in this area at present.
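A toy illustration of the concatenative idea: look up one prerecorded unit per phone in a hypothetical inventory and join the units with a short linear cross-fade to soften the joints. This is a sketch under simplifying assumptions; practical systems typically choose among many candidate units for each sound and also modify prosody. The function and unit names are hypothetical.

```python
# Toy concatenative synthesis: one prerecorded unit per phone, joined with a
# linear cross-fade. Assumption: the inventory and unit names are hypothetical;
# real systems choose among many candidate units and also modify prosody.
from typing import Dict, List


def crossfade_concat(units: List[List[float]], overlap: int) -> List[float]:
    """Concatenate waveforms, cross-fading `overlap` samples at each joint."""
    out: List[float] = list(units[0])
    for unit in units[1:]:
        k = min(overlap, len(out), len(unit))
        for i in range(k):
            w = (i + 1) / (k + 1)   # weight of the incoming unit
            idx = len(out) - k + i  # position inside the overlap region
            out[idx] = (1 - w) * out[idx] + w * unit[i]
        out.extend(unit[k:])
    return out


def synthesize(phones: List[str], inventory: Dict[str, List[float]],
               overlap: int = 2) -> List[float]:
    """Look up the unit for each phone and join them into one waveform."""
    return crossfade_concat([inventory[p] for p in phones], overlap)


if __name__ == "__main__":
    inventory = {"a": [1.0, 1.0, 1.0, 1.0], "b": [0.0, 0.0, 0.0, 0.0]}
    print(synthesize(["a", "b"], inventory))
    # approx. [1.0, 1.0, 0.67, 0.33, 0.0, 0.0]
```

The cross-fade shortens the output by `overlap` samples per joint; with `overlap=0` the units are simply appended.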
  • Robotics: Robots in the Human Interaction Loop. I believe that robots can ease communication between humans and computers, since they are computers that resemble humans. I have been a robotics hobbyist since my university years, and now I see robots as a way to enhance communication.

I built two "sumo" fighters and two line followers during university. They were built to take part in contests between university student teams, and on a couple of occasions my team won some of the categories.

After university I built two robots oriented towards human-machine interaction: Marta and Marta2. Marta was built from a plastic robotic platform to which I attached home-made electronics to connect it to a laptop computer. With it I created a small dialog demo in which I could talk to Marta and ask her to move around and dance for me. Marta2 was built from an Evolution Robotics platform whose electronics came already integrated with the purchase. In that case I worked on wireless communication with the robot, commanding it from a PDA by voice and with arrows on the touch screen. The robot streamed video and audio back to the PDA to indicate its position.


Selected Papers (for papers written at ICSI, see here)
  1. "Evolutive Speaker Segmentation using a Repository System", Xavier Anguera and Javier Hernando. ICSLP, Korea, 2004.
  2. "Segmentació de locutor per a la indexació automàtica de bases de dades multimèdia en català" [Speaker segmentation for the automatic indexing of multimedia databases in Catalan], Xavier Anguera, Mireia Farrús, Javier Hernando and Alberto Abad. II Congrés d'enginyeria en llengua catalana, Andorra, 2004.
  3. "Els sistemes de reconeixement de veu i traducció automàtica en català: present i futur" [Speech recognition and machine translation systems for Catalan: present and future], Mireia Farrús, Jan Anguita, Xavier Anguera, Josep M. Crego, Adrià de Gispert, Javier Hernando and Climent Nadeu. II Congrés d'enginyeria en llengua catalana, Andorra, 2004.
  4. "XBIC: Nueva Medida para Segmentación de Locutor hacia el Indexado Automático de la Señal de Voz" [XBIC: a new measure for speaker segmentation towards automatic indexing of the speech signal], Xavier Anguera, Javier Hernando and Jan Anguita. III Jornadas en Tecnología del Habla, Valencia, 17-10 Nov 2004.
  5. "Towards Robust Speaker Segmentation: The ICSI-SRI Fall 2004 Diarization System", Chuck Wooters, James Fung, Barbara Peskin and Xavier Anguera. EARS Program RT-04 Workshop, Nov 7-10, 2004.
  6. "XBIC: Real-Time Cross Probabilities Measure for Speaker Segmentation", Xavier Anguera. International Computer Science Institute Technical Report TR-05-008.
  7. "Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System", Xavier Anguera, Chuck Wooters, Barbara Peskin and Mateu Aguilo. Rich Transcription 2005 Spring Meeting Recognition Evaluation Workshop, Edinburgh, UK, July 2005.
  8. "Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System", Andreas Stolcke, Xavier Anguera, Kofi Boakye, Ozgur Cetin, Frantisek Grezl, Adam Janin, Arindam Mandal, Barbara Peskin, Chuck Wooters and Jing Zheng. Rich Transcription 2005 Spring Meeting Recognition Evaluation Workshop, Edinburgh, UK, July 2005.
  9. "PETRA: Advanced Oral Interfaces for Unified Messaging Applications", David Hernando, Javier Hernando and Xavier Anguera. Buran magazine, IEEE Barcelona student branch, Number 22, September 2005.
  10. "Speaker Diarization for Multi-Party Meetings Using Acoustic Fusion", Xavier Anguera, Chuck Wooters and Javier Hernando. Automatic Speech Recognition and Understanding (ASRU), Puerto Rico, November 2005.
  11. "Purity Algorithms for Speaker Diarization of Meetings Data", Xavier Anguera, Chuck Wooters and Javier Hernando. ICASSP 2006, Toulouse, France, May 2006.
  12. "Speaker Diarization for Multi-Microphone Meetings Using only Between-Channel Differences", Jose M. Pardo, Xavier Anguera and Chuck Wooters. MLMI 2006, Washington, USA, May 2006.
  13. "Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization", Xavier Anguera, Chuck Wooters and Javier Hernando. MLMI 2006, Washington, USA, May 2006.
  14. "Purity Algorithms for Speaker Diarization of Meetings Data", Xavier Anguera, Chuck Wooters and Javier Hernando. MMUA 2006, Toulouse, France, May 2006.
  15. "Hybrid Speech/Non-Speech Detector Applied to Speaker Diarization of Meetings", Xavier Anguera, Mateu Aguilo, Chuck Wooters, Climent Nadeu and Javier Hernando. Speaker Odyssey 2006, San Juan de Puerto Rico, USA, June 2006.
  16. "Friends and Enemies: A novel Initialization for Speaker Diarization", Xavier Anguera, Chuck Wooters and Javier Hernando. ICSLP 2006, Pittsburgh, USA, September 2006.
  17. "Robust Speaker Diarization for Meetings: ICSI RT06s evaluation system", Xavier Anguera, Chuck Wooters and Javier Hernando. ICSLP 2006, Pittsburgh, USA, September 2006.
  18. "Multi-Stream Speaker Diarization Systems for the Meeting Domain", A. Gallardo-Antolin, X. Anguera and C. Wooters. ICSLP 2006, Pittsburgh, USA, September 2006.
  19. "Speaker Diarization for Multiple Distant Microphone Meetings: Mixing Acoustic Features and Inter-Channel Time Differences", J. Pardo, X. Anguera and C. Wooters. ICSLP 2006, Pittsburgh, USA, September 2006 (to appear).


Other documents
  1. "Robust Speaker Segmentation and Clustering for Meetings", presented as my PhD thesis proposal at UPC (Barcelona, Spain).

Pictures
  • Pictures from my home town in Spain: Tarragona
 


Interesting Links