Finding Difficult Speakers in Automatic Speaker Recognition

Title: Finding Difficult Speakers in Automatic Speaker Recognition
Publication Type: Thesis
Year of Publication: 2011
Authors: Stoll, L.
Other Numbers: 3247

The task of automatic speaker recognition, wherein a system verifies or determines a speaker’s identity using a sample of speech, has been studied for a few decades. In that time, a great deal of progress has been made in improving the accuracy of the system’s decisions, through the use of more successful machine learning algorithms, and the application of channel compensation techniques and other methodologies aimed at addressing sources of errors such as noise or data mismatch. In general, errors can be expected to have one or more causes, involving both intrinsic and extrinsic factors. Extrinsic factors correspond to external influences, including reverberation, noise, and channel or microphone effects. Intrinsic factors relate inherently to the speaker himself, and include sex, age, dialect, accent, emotion, speaking style, and other voice characteristics. This dissertation focuses on the relatively unexplored issue of dependence of system errors on intrinsic speaker characteristics. In particular, I investigate the phenomenon that some speakers within a given population have a tendency to cause a large proportion of errors, and explore ways of finding such speakers.

There are two main components to this thesis. First, I establish the dependence of system performance on speaker characteristics, building upon and expanding previous work demonstrating the existence of speakers with tendencies to cause false alarm or false rejection errors. To this end, I explore two different data sets: one that is an older collection of telephone channel conversational speech, and one that is a more recent collection of conversational speech recorded on a variety of channels, including the telephone, as well as various types of microphones. Furthermore, in addition to considering a traditional speaker recognition system approach, for the second data set I utilize the outputs of a more contemporary approach that is better able to handle variations in channel. The results of such analysis repeatedly show variations in behavior across speakers, both for true speaker and impostor speaker cases. Variation occurs both at the level of speech utterances, wherein a given speaker’s performance can depend on which of his speech utterances is used, as well as on the speaker level, wherein some speakers have overall tendencies

Bibliographic Notes

University of California, Berkeley PhD thesis, Berkeley, California

Abbreviated Authors

L. Stoll

ICSI Publication Type

PhD thesis