Structure of Human Speech Chris Darwin Vocal Tract.
Pitch and Formants
1. Harmonics (giving pitch) are produced by vocal cord vibration.
   Fundamental frequency: 125 Hz
2. Formant frequencies are resonances of the vocal tract.
   Formants: F1 = 396 Hz, F2 = 1520 Hz, F3 = 1940 Hz
3. Formant frequencies change as you change the shape of your vocal tract.
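The relation between the harmonics and the formants listed above can be checked with a little arithmetic. The sketch below is only an illustration: it takes the 125 Hz fundamental and the three formant values from the slide, and the function names `harmonics` and `nearest_harmonic` are made up for this example. The point is that formants are resonance peaks and need not coincide with any harmonic of the voice source.

```python
F0 = 125.0  # fundamental frequency in Hz, from the slide
FORMANTS = {"F1": 396.0, "F2": 1520.0, "F3": 1940.0}  # Hz, from the slide


def harmonics(f0, max_hz):
    """Return the harmonic frequencies k * f0 that lie at or below max_hz."""
    n = int(max_hz // f0)
    return [k * f0 for k in range(1, n + 1)]


def nearest_harmonic(f0, formant_hz):
    """Return (harmonic number, frequency) of the harmonic closest to formant_hz."""
    k = max(1, round(formant_hz / f0))
    return k, k * f0


for name, hz in FORMANTS.items():
    k, h = nearest_harmonic(F0, hz)
    print(f"{name} = {hz:.0f} Hz; nearest harmonic is number {k} at {h:.0f} Hz")
```

With these values the nearest harmonics fall at 375 Hz, 1500 Hz and 2000 Hz, so each formant peak sits between harmonics and not exactly on one.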
Prosody vs Segments
Segmental: consonants / vowels -> words
Prosodic: pitch contour, stress. Emphasis, pragmatics.
“I thought she was married?” & "….!"
“I thought she was married.”
“I thought she was married!”
“I thought she was married!”
NB: in tone languages, pitch is used segmentally.
Are bird/mammal communication systems like human prosody?
Yes! They generally use different pitch contours.
Mynah bird speech
Klatt & Stefanski (1974) How does a mynah bird imitate human speech? J Acoust Soc Amer, 55, 822-832.
Mynah / Grey parrot
• The mynah produces "formants", but probably by changing syrinx resonances, not by changing vocal tract shape. (Klatt & Stefanski, 1974, J Acoust Soc Amer)
• The Grey parrot has a longer vocal tract and may use changes in its shape to produce formant variation (more like human speech). (Warren, Patterson & Pepperberg, 1996, Auk)
Characteristics of speech
• No gaps between words
• Smoothly changing sound from one speech sound to the next
• So you can’t just shuffle the acoustic “words”
The only silence is the /g/ of “ago”
Narrow-band spectrogram
Speech is more like semaphore than like music
• Music: discrete targets giving discrete acoustic events
• Semaphore: discrete targets with transitions between targets
• Speech: articulatory transitions between targets
Formants in a wide-band spectrogram
[Figure: wide-band spectrogram of “w e g o”, with F1, F2, F3, the formant transitions and the burst labelled.]
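The difference between the narrow-band and wide-band spectrograms shown in these slides comes down to the length of the analysis window. As a rough rule of thumb, a window of duration T has an analysis bandwidth of about 1/T. The sketch below applies that rule to the 125 Hz fundamental from the earlier slide. The two window lengths (30 ms and 4 ms) are assumptions chosen for illustration, not values from the lecture, and the function names are hypothetical.

```python
def approx_bandwidth_hz(window_s):
    """Rough analysis bandwidth (about 1/T) for a window lasting window_s seconds."""
    return 1.0 / window_s


def resolves_harmonics(window_s, f0_hz):
    """True if the analysis bandwidth is narrower than the harmonic spacing f0_hz."""
    return approx_bandwidth_hz(window_s) < f0_hz


F0 = 125.0  # harmonic spacing in Hz, from the earlier slide

# Window lengths below are illustrative assumptions.
for label, window_s in [("narrow-band", 0.030), ("wide-band", 0.004)]:
    bw = approx_bandwidth_hz(window_s)
    shows = "individual harmonics" if resolves_harmonics(window_s, F0) else "formant bands"
    print(f"{label}: {window_s * 1000:.0f} ms window, about {bw:.0f} Hz bandwidth, shows {shows}")
```

Under these assumptions the long window separates harmonics 125 Hz apart, while the short window smears them into broad formant bands but follows rapid events such as formant transitions and bursts more closely in time.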
Speech is more like speech than like semaphore
Speech does not have invariant acoustic targets:
consonants change with the vowel.
Compare /s/ in /si/ with /s/ in /su/.
This is due to co-articulation.
Different transition - same consonant
[Figure: formant transitions for “dee” and “da”, with 1400 Hz marked.]
Liberman et al. (1967) Perception of the speech code. Psych Rev 74, 431-461
Co-articulation
Arises (mainly) because consonant gestures don’t involve all the articulators:
e.g. /b/ is lips only, so the tongue is free to take up position for the next vowel.
/d/ and /s/ just involve the tongue tip touching the alveolar ridge, so the tongue body and lips are free to take up position for the next vowel - viz. /si/ /su/.
Same noise - different consonant
[Figure: “pea” and “ka”, with the burst, 1400 Hz, F1 and F2 labelled.]
Liberman et al. (1967) Perception of the speech code. Psych Rev 74, 431-461
Two articulatory systems
Öhman suggested that articulation can be decomposed into two semi-independent systems:
Slow movement from one vowel target to the next, e.g. /i/ -> /u/
Rapid consonantal movement superimposed, e.g. /b/ /d/
So the /b/ in /ibu/ is not the same as in /ibi/.
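A toy numerical sketch can make this two-system idea concrete. Everything in it is an assumption for illustration and none of it is Öhman's actual model: the tongue body glides linearly between two vowel positions on a made-up 0 to 1 backness scale, the lips close briefly for /b/ with a Gaussian-shaped gesture, and lip rounding for /u/ is ignored. The names `VOWEL_TONGUE`, `tongue_body`, `lip_aperture` and `state_at_closure` are hypothetical.

```python
import math

# Hypothetical tongue-body backness scale: 0 = front (/i/), 1 = back (/u/).
VOWEL_TONGUE = {"i": 0.0, "u": 1.0}


def tongue_body(t, v1, v2):
    """Slow system: linear glide of the tongue body from vowel v1 to v2, t in [0, 1]."""
    a, b = VOWEL_TONGUE[v1], VOWEL_TONGUE[v2]
    return a + (b - a) * t


def lip_aperture(t, centre=0.5, width=0.1):
    """Fast system: lips open (1.0) except for a brief closure (0.0) centred at `centre`."""
    return 1.0 - math.exp(-((t - centre) / width) ** 2)


def state_at_closure(v1, v2):
    """Articulator positions at the moment of /b/ closure in the sequence v1-b-v2."""
    t = 0.5
    return {"lips": lip_aperture(t), "tongue": tongue_body(t, v1, v2)}


print("ibi:", state_at_closure("i", "i"))
print("ibu:", state_at_closure("i", "u"))
```

In this sketch the lips are fully closed in both sequences, but the tongue body is at the /i/ position in /ibi/ and already halfway to /u/ in /ibu/, so the vocal tract shape during the "same" /b/ differs.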
Co-articulation
Advantages
1. Information about different segments is spread across time (Hockett’s squashed eggs).
You know that a /u/ is coming because of the type of /s/ you have heard.
Co-articulation
2. Liberman thought that this spreading across time makes it easier to transmit information at a fast rate. Liberman et al. (1967) Psych Rev 74, 431-461
The disadvantage of co-articulation for perception is that there are no constant acoustic targets in speech.
The same phoneme can be represented as different sounds in different contexts (e.g. /s/ before /u/ or /i/).
Conversely, the same sound can be heard as different consonants in different contexts (e.g. as /p/ before /i/ and /u/ but as /k/ before /a/).
Co-articulation - 2
Speech Code
• Articulatory movement
• Co-articulation
• Rapid speech /djewonega?at/
• Different vocal-tract sizes: men's vocal tracts are about 15% longer than women's
• Different dialects
[Figure: dialect versions of /au/ as in "now" (Berks, Somerset, Cockney, RP, N Ireland, Wales) plotted on F1 and F2 axes, with reference vowel labels.]
Factors that make it hard (for machines) to recognise speech
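The effect of vocal-tract size listed above can be sketched with a standard textbook idealisation that is not part of the slides: a uniform tube closed at one end, whose resonances fall at F_n = (2n - 1) c / (4 L). The 17.5 cm tract length and the speed of sound used below are assumed round figures, the 15% ratio is from the slide, and `tube_formants` is a hypothetical helper name.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed approximate speed of sound in the vocal tract


def tube_formants(length_cm, n=3):
    """First n resonances (Hz) of a uniform tube closed at one end (idealised model)."""
    return [(2 * k - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm) for k in range(1, n + 1)]


LONG_TRACT_CM = 17.5                    # assumed length for the longer tract
SHORT_TRACT_CM = LONG_TRACT_CM / 1.15   # the longer tract is 15% longer (from the slide)

long_f = tube_formants(LONG_TRACT_CM)
short_f = tube_formants(SHORT_TRACT_CM)

for i, (lo, hi) in enumerate(zip(long_f, short_f), start=1):
    print(f"F{i}: {lo:.0f} Hz (longer tract) vs {hi:.0f} Hz (shorter tract)")
```

In this idealisation every resonance of the shorter tract is 15% higher, so the same vowel has different absolute formant frequencies for different speakers, which is one reason a recogniser cannot rely on fixed formant values.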