Regarding the long-standing kinematic vs. static
controversy in vowel perception, our claim, based on recent experiments
on glides and vowel reduction (carried out by Cathiard et al. and Loevenbruck
et al.), is that kinematics is needed when signals are (i) undersampled
or (ii) noisy. Point-light talking faces or sinewave speech typically belong
to the first case. Undershot vowels are a further instance of undersampling, while
cue-tracking for speech in noise is a typical task for CASA (computational auditory
scene analysis), whether performed by a general-purpose device or by one specific
to speech and/or music.
Our perceptual results with natural stimuli and
with a non-linear biomechanical model of control will be discussed in the
framework of a double-component account of vowel gestures. Recalling that
the concept can probably be traced back to Sweet, we recast this 2-COMP-VOWEL
in a theory formulated within a neural model of goal-directed actions (developed
by Jeannerod & Boussaoud). Basically, speech sound mouthing is considered
as an outgrowth, an evolutionary *homologue* - say an *exaptation*, via
the vocal self-monitoring system - departing from the stem function of
mouth grasping, itself related to hand capture in primates. The carrier
component of vowel mouthing is named *placing*, homologous to the reaching-convey
component in mouth "handling". The second component is called *shaping*,
homologous to grip formation (preshaping). When both components are fairly
synchronous, "steady-state" vowels (typically French and German long vowels)
are produced. Glides occur when *shaping* is relaxed asynchronously with respect
to *placing* changes. Such an asynchrony gives rise to glide-epenthesis
in the transition between vowels (French, German), or to diphthongal on/off-glides
(in English, Swedish, etc.).
Of course, like other epenthetic phenomena, glides
can be recovered as true phonological controls. But even in this case there
is no evidence that motion in the vocalic gesture could offer more than
a mere processing benefit for undersampled signals, through a neural *shape-from-motion*
mechanism, without implying any kinematic status for the long-term phonetic representation.