Event

When People Sound More Like Themselves: Studies of Intrinsic Variation in Speaker Recognition

Liz Shriberg

SRI and ICSI

Tuesday, July 21, 2009
12:30

A key problem in automatic speaker recognition is variability between different recordings of the same talker. Most previous work has addressed "extrinsic" factors (microphone, channel, noise). But not all real-world variability is extrinsic. This talk describes recent SRI studies on "intrinsic" variation, that is, variation associated with the speaker him- or herself (for example, variation in speaking style, emotion, health, or linguistic context).

Part I will describe studies using the SRI FRTIV corpus, a new corpus that varies both speaking style (conversational, interview, read, and oration) and vocal effort (normal, low, high) while controlling for extrinsic variation. We find that performance varies dramatically depending on the type and degree of train/test mismatch. We also find that a feature subspace modeling approach works surprisingly well for this data, even if trained on mismatched data.

Part II will present results from modeling local variation (in NIST data), using phonetic and prosodic information to condition cepstral features. The idea here is to compare like regions (within a recording) in train and test. The approach gave excellent results in a recent NIST evaluation but, more importantly, it provides interpretable information that can be used to better understand where (within their speech) speakers are most idiosyncratic.

The studies and corpus represent joint work with SRI colleagues Martin Graciarena, Sachin Kajarekar, Andreas Stolcke, Nicolas Scheffer, Luciana Ferrer, Harry Bratt, and Andreas Kathol.

Copyright © 2005 International Computer Science Institute. All Rights Reserved.