Publication Details

Title: The Auditory Organization of Speech in Listeners and Machines
Authors: M. Cooke and D. P. W. Ellis
Group: ICSI Technical Reports
Date: June 1998
PDF: ftp://ftp.icsi.berkeley.edu/pub/techreports/1998/tr-98-016.pdf

Overview:
Speech is typically perceived against a background of other sounds. Listeners are adept at extracting target sources from the acoustic mixture reaching the ears. The auditory scene analysis account holds that this feat is the result of a two-stage process. In the first stage, sound is decomposed both within and across auditory nuclei. Subsequent processes of perceptual organization are informed both by cues that suggest a common source of origin and by prior experience. These processes operate on the decomposed auditory scene to extract coherent evidence for one or more sources, which is then passed on for further processing. Auditory scene analysis in listeners has been studied for several decades, and recent years have seen a steady accumulation of computational models of perceptual organization. The purpose of this review is to describe the evidence for auditory organization in listeners and to explore the computational models that have been motivated by such evidence. The primary focus is on speech rather than on sources such as polyphonic music or nonspeech ambient backgrounds, although these other domains are equally amenable to auditory organization. The review concludes with a discussion of the relationship between auditory scene analysis and alternative approaches to sound source segregation.

Bibliographic Information:
ICSI Technical Report TR-98-016

Bibliographic Reference:
M. Cooke and D. P. W. Ellis. The Auditory Organization of Speech in Listeners and Machines. ICSI Technical Report TR-98-016, June 1998.