Publication Details

Title: Pitch-based Vocal Tract Length Normalization
Author: A. Faria
Group: ICSI Technical Reports
Date: November 2003
PDF: ftp://ftp.icsi.berkeley.edu/pub/techreports/2003/tr-03-001.pdf

Overview:
This paper investigates the correlation between fundamental frequency and resonant frequencies in speech, and exploits this relation for vocal tract length normalization (VTLN). From a speaker's average pitch, it is possible to estimate a frequency warping factor that transforms a spectral representation into one with less variation in the formants. I use a function of pitch that maps to a corresponding frequency warping factor. The parameters of this function are optimized through an exploration of speaker and vowel characteristics in the TIMIT speech corpus. The approach presented here is a potentially simpler alternative to existing VTLN algorithms that derive the warping factor by other means. Recognizer results indicate that the pitch-based approach compares favorably against other methods; moreover, performance could be improved further by using a warping function that is not strictly linear.
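
The idea of mapping average pitch to a warping factor can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the linear form of the pitch-to-factor map, the names `warp_factor_from_pitch` and `warp_frequency`, the reference pitch, the slope, the clamping range, and the convention that a factor above 1 compresses the frequency axis. The report's actual function and its TIMIT-optimized parameters are not reproduced.

```python
def warp_factor_from_pitch(mean_f0_hz, ref_f0_hz=150.0, slope=0.001,
                           lo=0.88, hi=1.12):
    """Map a speaker's average pitch (Hz) to a frequency warping factor.

    Hypothetical linear map: a speaker at the reference pitch gets a
    factor of 1.0 (no warping). All default values are placeholders,
    not the parameters fitted in the report.
    """
    alpha = 1.0 + slope * (mean_f0_hz - ref_f0_hz)
    # Clamp to a plausible range so extreme pitch estimates stay bounded.
    return max(lo, min(hi, alpha))


def warp_frequency(freq_hz, alpha):
    """Apply a strictly linear frequency warp.

    Assumed convention: alpha > 1 (higher-pitched speaker) scales
    frequencies down, moving formants toward the reference speaker.
    """
    return freq_hz / alpha


if __name__ == "__main__":
    for f0 in (100.0, 150.0, 220.0):
        alpha = warp_factor_from_pitch(f0)
        print(f0, round(alpha, 3), round(warp_frequency(1000.0, alpha), 1))
```

Under these assumptions, a single pitch estimate per speaker selects the factor directly, with no search over candidate factors; a warp that is not strictly linear would replace only `warp_frequency`.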

Bibliographic Information:
ICSI Technical Report TR-03-001

Bibliographic Reference:
A. Faria. Pitch-based Vocal Tract Length Normalization. ICSI Technical Report TR-03-001, November 2003.