How Do Humans Recognize Speakers? -- Speaker Recognition Reading List
(Thanks to Barbara Peskin and Joe Campbell)
Papers on Human SID Performance ("familiar vs unfamiliar" talkers):
A. Schmidt-Nielsen & K. Stern (1985) "Identification of known voices as
a function of familiarity and narrow-band coding," JASA 77, 658-663.
G. Papcun, J. Kreiman & A. Davis (1989) "Long-term memory for unfamiliar
voices," JASA 85, 913-925.
H. Hollien, W. Majewski & E.T. Doherty (1982) "Perceptual identification
of voices under normal, stress and disguise speaking conditions,"
J. Phonetics 10, 139-148.
D. Van Lancker, J. Kreiman & K. Emmorey (1985) "Familiar voice recognition:
Patterns and parameters--Recognition of backward voices," J. Phonetics 13,
19-38.
P. Ladefoged & J. Ladefoged (1980) "The ability of listeners to identify
voices," UCLA Working Papers in Phonetics 49, 43-51.
T.C. Feustel, A.G. Velius..., "Human and Machine Performance on Speaker
Identity Verification," Speech Tech. 89, pp. 169-170.
Papers on Human Recognition
HUMAN (and/vs MACHINE) SPEECH & SPEAKER RECOGNITION (thanks to R. Lippmann)
0. Richard Lippmann, "Speech Recognition by Humans and Machine,"
Speech Communication, 1997, 22, 1-15. Available:
http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6V1C-3SP2NWW-1-1&_cdi=5671&_orig=search&_coverDate=07%2F31%2F1997&_qd=1&_sk=999779998&wchp=dGLbVlz-lSztz&_acct=C000022659&_version=1&_userid=501045&md5=332de60f60dcf515e58b45d233dc0eb0&ie=f.pdf
1. Martin Cooke review of Auditory Streaming for human and machine
speech recognition NIPS 2002. Available:
http://www.dcs.shef.ac.uk/~martin/nips.ppt
2. The auditory organization of speech and other sources in listeners
and computational models, Martin Cooke and Daniel Ellis,
Speech Communication, vol. 35, nos.3-4, pp.141-177, 2001. Available:
http://citeseer.nj.nec.com/386802.html
3. Large-Vocabulary Audio-Visual Speech Recognition by Machines and
Humans (2002) Potamianos, Neti, Iyengar, and Helmuth. Available:
http://citeseer.nj.nec.com/potamianos01largevocabulary.html
4. Speaker Verification by Human Listeners: Experiments Comparing Human
and Machine Performance Using the NIST 1998 Speaker Evaluation Data,
Astrid Schmidt-Nielsen and Thomas H. Crystal,
Digital Signal Processing 10, 249-266 (2000). Available:
http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6WDJ-45F541V-K&_user=501045&_coverDate=01%2F31%2F2000&_rdoc=16&_fmt=summary&_orig=browse&_srch=%23toc%236768%232000%23999899998%23294357!&_cdi=6768&_sort=d&_docanchor=&_acct=C000022659&_version=1&_urlVersion=0&_userid=501045&md5=30dfbea4f043b07393b9d1d2db3767f3
5. Analytic Assessment of Telephone Transmission Impact on ASR
Performance Using a Simulation Model (2001) Sebastian Möller,
Hervé Bourlard, http://citeseer.nj.nec.com/445169.html
6. Flexible, Robust, And Efficient Human Speech Recognition (1997),
Louis C. W. Pols http://citeseer.nj.nec.com/pols97flexible.html
AUTHOR ID (thanks to Larry Heck)
http://www.clsp.jhu.edu/ws2002/groups/supersid/071002_High_Level_Info_for_Spkr_Rec.pdf
Holmes, 1985: Determining Authorship of Written Text
Doddington, 2001: Idiolect
Idiosyncratic Word/Phrase Usage in Written Text
Lessons Learned from Authorship
*Average Word Length (number of letters)
- Mendenhall (1901): Shakespeare or Bacon?
- Brinegar (1963): Frequency of 2-4 letter words:
Was Quintus Curtius Snodgrass written by Mark Twain?
*Sentence Length (number of words)
- C.B. Williams in 1940: log of sentence length has
normal distribution
- Morton in 1965: analyzed ancient Greek writings
*Vocabulary Richness
- Thisted & Efron in 1987: Newly discovered poem by Shakespeare?
*Hierarchical Cluster Analysis
- Holmes in 1992: Detected changes in authorship in Mormon scripture
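The surface features listed above (word length, share of 2-4 letter words, sentence length and its log, vocabulary richness) can be sketched in a few lines. This is a rough illustration only, not the method of any of the cited authors: the function name `stylometric_features`, the regex tokenization, and the use of type-token ratio as a stand-in for vocabulary richness are all assumptions made for the example.

```python
import math
import re


def stylometric_features(text):
    """Return a few simple style statistics for a text sample.

    Tokenization is deliberately crude (an assumption of this sketch):
    words are runs of ASCII letters, sentences end at '.', '!' or '?'.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        raise ValueError("text contains no words")

    # Number of words in each non-empty sentence.
    sentences = re.split(r"[.!?]+", text)
    sent_lengths = [len(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    sent_lengths = [n for n in sent_lengths if n > 0]

    n_words = len(words)
    n_sents = len(sent_lengths)
    return {
        # Average word length in letters.
        "avg_word_length": sum(len(w) for w in words) / n_words,
        # Fraction of words that are 2-4 letters long.
        "short_word_fraction": sum(2 <= len(w) <= 4 for w in words) / n_words,
        # Average sentence length in words.
        "avg_sentence_length": sum(sent_lengths) / n_sents,
        # Mean of log sentence length.
        "mean_log_sentence_length": sum(math.log(n) for n in sent_lengths) / n_sents,
        # Type-token ratio: a simple proxy for vocabulary richness,
        # not the estimator used in the cited Shakespeare work.
        "type_token_ratio": len(set(words)) / n_words,
    }


if __name__ == "__main__":
    sample = "The cat sat. The dog ran away!"
    for name, value in stylometric_features(sample).items():
        print(f"{name}: {value:.3f}")
```

Real studies compare such statistics across long samples of known and disputed authorship; a single short sample like the one in the usage example says very little.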
Idiosyncratic Word/Phrase Usage in Written Text
Lessons Learned from Authorship Work
* The Best Features -> Frequency of Function Words
(conjunctions, prepositions, & pronouns)
- F. Mosteller & D. Wallace (1963) Authorship of
The Federalist papers: Hamilton or Madison?
. Practically "twins" w/ respect to average sentence length
. Used function words -> successfully assigned authorship
- Peng & Hengartner, 2002
. Canonical Discriminant Analysis: determined small set of
useful words
- R.D. Peng and N.W. Hengartner "Quantitative Analysis of Literary Styles,"
The American Statistician, 56 (3), 175-185.
. Extension of work by Mosteller & Wallace on The Federalist Papers
. Authorship discrimination between 9 authors
. Used 69 function words from Miller-Newman-Friedman list
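As a rough illustration of the function-word idea, the sketch below computes each function word's rate per 1000 words and compares two texts by the summed absolute difference of those rates. Everything here is an assumption for the example: `FUNCTION_WORDS` is a short hypothetical list (not the Miller-Newman-Friedman list), and the distance is a simple stand-in, not the Bayesian analysis of Mosteller & Wallace or the canonical discriminant analysis of Peng & Hengartner.

```python
import re
from collections import Counter

# Hypothetical short list for illustration only;
# NOT the 69-word Miller-Newman-Friedman list.
FUNCTION_WORDS = [
    "a", "an", "and", "by", "in", "of",
    "on", "the", "to", "upon", "while", "whilst",
]


def function_word_rates(text, function_words=FUNCTION_WORDS):
    """Return the rate per 1000 words of each function word in text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(words)
    total = len(words)
    return {w: 1000.0 * counts[w] / total for w in function_words}


def rate_distance(rates_a, rates_b):
    """Sum of absolute rate differences over the shared word list.

    A simple stand-in for a real classifier, used only to show
    how the rate vectors of two texts might be compared.
    """
    return sum(abs(rates_a[w] - rates_b[w]) for w in rates_a)


if __name__ == "__main__":
    known = function_word_rates("The cat sat on the mat upon the hill")
    disputed = function_word_rates("While the dog ran to the gate and barked")
    print(rate_distance(known, disputed))
```

In practice the rates would be estimated from long texts of known authorship, and the word list would be chosen for its power to separate the candidate authors.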
More Papers on Human Speaker ID
(Thanks to Reva Schwartz)
McGehee, F. (1937) "The Reliability of the Identification of the Human
Voice", J. Gen. Psychol., 17:249-271.
Pollack, I., Pickett, J.M., and Sumby, W.H. (1954) "On the Identification of
Speakers by Voice", J. Acoust. Soc. Am. 26(3):403-412.
Compton, A. (1963) "Effects of Filtering and Vocal Duration upon the
Identification of Speakers, Aurally", J. Acoust. Soc. Am. 35:1748-1752.
Bricker, P. and Pruzansky, S. (1966) "Effects of Stimulus Content and
Duration on Talker Identification", J. Acoust. Soc. Am. 40:1441-1450.
Young, M., and Campbell, R. (1967) "Effects of Context on Talker
Identification", J. Acoust. Soc. Am. 42(6):1250-1254.
Reich, A.R. and Duke, J.E. (1979) "Effects of Selective Vocal Disguise Upon
Speaker Identification by Listening", J. Acoust. Soc. Am. 66:1023-1028.
Saslove, H. and Yarmey, A.D. (1980) "Long Term Auditory Memory: Speaker
Identification", J. Applied Psychol. 65:111-116.
Clifford, B.R. (1980) "Voice Identification by Human Listeners: On
Earwitness Reliability", Law Hum. Behav., 4:373-394.
Nolan, J.F. (1983) "The Phonetic Bases of Speaker Recognition", Cambridge:
Cambridge Univ. Press.
Yarmey, A.D. (1991) "Descriptions of Distinctive and Non-Distinctive Voices
Over Time", J. Forensic Sci., 31:421-428.
Kunzel, H. (1994) "On the Problem of Speaker Identification by Victims and
Witnesses", Forensic Linguistics, 1:45-58.
Broeders, A. and Rietveld, A. (1995) "Speaker Identification by
Earwitnesses", Studies in Forensic Phonetics, 64:20-40.
Schiller, N.O. and Koster, O. (1998) "The Ability of Expert Witnesses to
Identify Voices: A Comparison Between Trained and Untrained Listeners",
Forensic Linguistics, 5:1-9.
DeJong, G. (1998) "Earwitness Characteristics and Speaker Identification
Accuracy", Ph.D. dissertation, University of Florida.
Hollien, H. and Schwartz, R. (2001) "Speaker Identification Utilizing
Noncontemporary Speech", J. Forensic Sci., 46:63-67.
Yarmey, A.D., Yarmey, A.L., Yarmey, M.J., and Parliament, L. (2001)
"Commonsense Beliefs and the Identification of Familiar Voices", Appl.
Cognit. Psychol. 15:283-299.
See also the excerpt from the 2nd edition of Doug O'Shaughnessy's
Speech Communications book, which has a nice referenced review of human
speaker recognition (and language/accent ID).