How Do Humans Recognize Speakers?  --  Speaker Recognition Reading List

(Thanks to Barbara Peskin and Joe Campbell)

Papers on Human SID Performance ("familiar vs unfamiliar" talkers):

A. Schmidt-Nielsen & K. Stern (1985) "Identification of known voices as
a function of familiarity and narrow-band coding," JASA 77, 658-663.

G. Papcun, J. Kreiman & A. Davis (1989) "Long-term memory for unfamiliar
voices," JASA 85, 913-925.

H. Hollien, W. Majewski & E.T. Doherty (1982) "Perceptual identification
of voices under normal, stress and disguise speaking conditions,"
J. Phonetics 10, 139-148.

D. Van Lancker, J. Kreiman & K. Emmorey (1985) "Familiar voice recognition:
Patterns and parameters--Recognition of backward voices," J. Phonetics 13,
19-38.

P. Ladefoged & J. Ladefoged (1980) "The ability of listeners to identify
voices," UCLA Working Papers in Phonetics 49, 43-51.

T.C. Feustel, A.G. Velius..., "Human and Machine Performance on Speaker
Identity Verification," Speech Tech. 89, pp. 169-170.

Papers on Human Recognition


HUMAN (and/vs MACHINE) SPEECH & SPEAKER RECOGNITION (thanks to R. Lippmann)

0. Richard Lippmann, "Speech Recognition by Machines and Humans,"
Speech Communication, 1997, 22, 1-15.  Available:
http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6V1C-3SP2NWW-1-1&_cdi=5671&_orig=search&_coverDate=07%2F31%2F1997&_qd=1&_sk=999779998&wchp=dGLbVlz-lSztz&_acct=C000022659&_version=1&_userid=501045&md5=332de60f60dcf515e58b45d233dc0eb0&ie=f.pdf


1. Martin Cooke review of Auditory Streaming for human and machine
speech recognition NIPS 2002. Available:
 http://www.dcs.shef.ac.uk/~martin/nips.ppt

2. The auditory organization of speech and other sources in listeners
and computational models, Martin Cooke and Daniel Ellis,
Speech Communication, vol. 35, nos.3-4, pp.141-177, 2001. Available:
http://citeseer.nj.nec.com/386802.html

3. Large-Vocabulary Audio-Visual Speech Recognition by Machines and
Humans (2002) Potamianos, Neti, Iyengar, and Helmuth.  Available:
http://citeseer.nj.nec.com/potamianos01largevocabulary.html

4. Speaker Verification by Human Listeners: Experiments Comparing Human
and Machine Performance Using the NIST 1998 Speaker Evaluation Data,
Astrid Schmidt-Nielsen and Thomas H. Crystal,
Digital Signal Processing 10, 249-266 (2000).  Available:
http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6WDJ-45F541V-K&_user=501045&_coverDate=01%2F31%2F2000&_rdoc=16&_fmt=summary&_orig=browse&_srch=%23toc%236768%232000%23999899998%23294357!&_cdi=6768&_sort=d&_docanchor=&_acct=C000022659&_version=1&_urlVersion=0&_userid=501045&md5=30dfbea4f043b07393b9d1d2db3767f3

5. Analytic Assessment of Telephone Transmission Impact on ASR
Performance Using a Simulation Model (2001) Sebastian Möller,
Hervé Bourlard, http://citeseer.nj.nec.com/445169.html

6. Flexible, Robust, And Efficient Human Speech Recognition (1997),
Louis C. W. Pols http://citeseer.nj.nec.com/pols97flexible.html


AUTHOR ID (thanks to Larry Heck)
http://www.clsp.jhu.edu/ws2002/groups/supersid/071002_High_Level_Info_for_Spkr_Rec.pdf

Holmes, 1985: Determining Authorship of Written Text

Doddington, 2001: Idiolect

Idiosyncratic Word/Phrase Usage in Written Text
Lessons Learned from Authorship
*Average Word Length (number of letters)
- Mendenhall (1901): Shakespeare or Bacon?
- Brinegar (1963): Frequency of 2-4 letter words:
  Was Quintus Curtius Snodgrass written by Mark Twain?

*Sentence Length (number of words)
- C.B. Williams in 1940: log of sentence length has
  normal distribution
- Morton in 1965: analyzed ancient Greek writings

*Vocabulary Richness
- Thisted & Efron in 1987: Newly discovered poem by Shakespeare?

*Hierarchical Cluster Analysis
- Holmes in 1992: Detected changes in authorship in Mormon scripture

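The word-length and sentence-length features listed above can be sketched in a few lines. This is only an illustrative sketch, not the procedure used in any of the cited studies: the function names, the naive tokenization (runs of ASCII letters), and the sentence splitting on ".", "!" and "?" are all assumptions made for the example.

```python
import math
import re

def average_word_length(text):
    """Mean number of letters per word (word-length feature)."""
    words = re.findall(r"[A-Za-z]+", text)  # assumption: a word is a run of letters
    return sum(len(w) for w in words) / len(words) if words else 0.0

def sentence_lengths(text):
    """Number of words in each sentence, splitting naively on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(re.findall(r"[A-Za-z]+", s)) for s in sentences]

def mean_log_sentence_length(text):
    """Mean of log(sentence length), the quantity whose distribution
    is described above as approximately normal."""
    lengths = [n for n in sentence_lengths(text) if n > 0]
    return sum(math.log(n) for n in lengths) / len(lengths) if lengths else 0.0

sample = "The cat sat. It was a very good cat!"
print(average_word_length(sample))
print(sentence_lengths(sample))
print(mean_log_sentence_length(sample))
```

Real texts need more careful handling of abbreviations, apostrophes, and quotations than this sketch provides.
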
* The Best Features -> Frequency of Function Words
  (conjunctions, prepositions, & pronouns)
- F. Mosteller & D. Wallace (1963) Authorship of
  The Federalist papers: Hamilton or Madison?
  . Practically "twins" w/ respect to average sentence length
  . Used function words -> successfully assigned authorship
- Peng & Hengartner, 2002
  . Canonical Discriminant Analysis: determined small set of useful words
- R.D. Peng and N.W. Hengartner (2002) "Quantitative Analysis of Literary Styles,"
  The American Statistician, 56 (3), 175-185.
  . Extension of work by Mosteller & Wallace on The Federalist Papers
  . Authorship discrimination between 9 authors
  . Used 69 function words from Miller-Newman-Friedman list
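
The function-word feature described above amounts to counting how often each word from a fixed list occurs, normalized by text length. The sketch below is a minimal illustration under stated assumptions: the six-word list is hypothetical and stands in for a real list (it is not the 69-word Miller-Newman-Friedman list), and the per-1000-words normalization and the function name are choices made for the example.

```python
import re
from collections import Counter

# Hypothetical, tiny function-word list for illustration only;
# this is NOT the 69-word Miller-Newman-Friedman list.
FUNCTION_WORDS = ["the", "of", "and", "to", "upon", "by"]

def function_word_rates(text, vocabulary=FUNCTION_WORDS):
    """Occurrences of each listed function word per 1000 words of text."""
    words = re.findall(r"[a-z]+", text.lower())  # assumption: naive tokenization
    counts = Counter(words)
    total = len(words)
    return {w: (1000.0 * counts[w] / total if total else 0.0) for w in vocabulary}

text = "Upon the whole, the power of the people is to be relied upon."
rates = function_word_rates(text)
for word, rate in rates.items():
    print(f"{word:>5}: {rate:.1f} per 1000 words")
```

The resulting rate vectors, one per document, are the kind of input a discriminant analysis or other classifier could compare across candidate authors.
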


More Papers on Human Speaker ID


(Thanks to Reva Schwartz)

McGehee, F. (1937) "The Reliability of the Identification of the Human
Voice", J. Gen. Psychol., 17:249-271.

Pollack, I, Pickett, J.M., and Sumby, W.H. (1954) "On the Identification of
Speakers by Voice", J. Acoust. Soc. Am. 26(3):403-412.

Compton, A. (1963) "Effects of Filtering and Vocal Duration upon the
Identification of Speakers, Aurally", J. Acoust. Soc. Am. 35:1748-1752.

Bricker, P. and Pruzansky, S. (1966) "Effects of Stimulus Content and
Duration on Talker Identification", J. Acoust. Soc. Am. 40:1441-1450.

Young, M., and Campbell, R. (1967) "Effects of Context on Talker
Identification", J. Acoust. Soc. Am. 42(6):1250-1254.

Reich, A.R. and Duke, J.E. (1979) "Effects of Selective Vocal Disguise Upon
Speaker Identification by Listening", J. Acoust. Soc. Am. 66:1023-1028.

Saslove, H. and Yarmey, A.D. (1980) "Long Term Auditory Memory: Speaker
Identification", J. Applied Psychol. 65:111-116.

Clifford, B.R. (1980) "Voice Identification by Human Listeners: On
Earwitness Reliability", Law Hum. Behav., 4:373-394.

Nolan, J.F. (1983) "The Phonetic Bases of Speaker Recognition", Cambridge:
Cambridge Univ. Press.

Yarmey, A.D. (1991) "Descriptions of Distinctive and Non-Distinctive Voices
Over Time", J. Forensic Sci., 31:421-428.

Kunzel, H. (1994) "On the Problem of Speaker Identification by Victims and
Witnesses", Forensic Linguistics, 1:45-58.

Broeders, A. and Rietveld, A. (1995) "Speaker Identification by
Earwitnesses", Studies in Forensic Phonetics, 64:20-40.

Schiller, N.O. and Koster, O. (1998) "The Ability of Expert Witnesses to
Identify Voices: A Comparison Between Trained and Untrained Listeners",
Forensic Linguistics, 5:1-9.

DeJong, G. (1998) "Earwitness Characteristics and Speaker Identification
Accuracy", Ph.D. dissertation, University of Florida.

Hollien, H. and Schwartz, R. (2001) "Speaker Identification Utilizing
Noncontemporary Speech", J. Forensic Sci., 46:63-67.

Yarmey, A.D., Yarmey, A.L., Yarmey, M.J., and Parliament, L. (2001)
"Commonsense Beliefs and the Identification of Familiar Voices", Appl.
Cognit. Psychol. 15:283-299.

And here is a link to an excerpt from the 2nd edition of Doug O'Shaughnessy's
Speech Communications book, which has a nice referenced review of human
speaker recognition (and language/accent ID).