Structure of Human Speech Chris Darwin Vocal Tract.

Structure of Human Speech

Chris Darwin

Vocal Tract

Pitch and Formants

1. Harmonics (giving pitch) produced by vocal cord vibration

Fundamental frequency: 125 Hz

Formants: F1 = 396 Hz, F2 = 1520 Hz, F3 = 1940 Hz

2. Formant frequencies: resonances of the vocal tract

3. Formant frequencies change as you change the shape of your vocal tract
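
As a rough illustration of points 1 and 2, the sketch below lists the harmonics of the 125 Hz fundamental and finds which harmonic lies closest to each formant value given on the slide (396, 1520 and 1940 Hz). The function names are hypothetical and chosen only for this example.

```python
def harmonics(f0_hz, max_hz):
    """Return the harmonic frequencies n * f0 up to and including max_hz."""
    count = int(max_hz // f0_hz)
    return [n * f0_hz for n in range(1, count + 1)]


def nearest_harmonic(f0_hz, formant_hz):
    """Return (harmonic number, frequency) of the harmonic closest to a formant."""
    n = max(1, round(formant_hz / f0_hz))
    return n, n * f0_hz


F0 = 125.0                                             # fundamental from the slide
FORMANTS = {"F1": 396.0, "F2": 1520.0, "F3": 1940.0}   # formants from the slide

print("Harmonics up to 1 kHz:", harmonics(F0, 1000.0))
for name, freq in FORMANTS.items():
    n, h = nearest_harmonic(F0, freq)
    print(f"{name} = {freq:.0f} Hz is closest to harmonic {n} ({h:.0f} Hz)")
```

The vocal cords fix where the harmonics fall; the vocal tract only decides which of them are emphasised, so a formant peak need not coincide exactly with any harmonic.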

Source & Filter

Larynx (source) -> Vocal tract (filter) -> Output sound
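
A minimal sketch of the source-filter idea, under stated assumptions: the larynx is stood in for by an impulse train at 125 Hz, and the vocal tract by a cascade of three two-pole digital resonators tuned to the formant values quoted earlier. The 16 kHz sample rate, the formant bandwidths and all function names are assumptions made for this example, not values from the slides.

```python
import math


def impulse_train(f0_hz, duration_s, sample_rate):
    """Source stand-in: one unit impulse per pitch period, zeros elsewhere."""
    period = round(sample_rate / f0_hz)
    n_samples = round(duration_s * sample_rate)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * centre_hz * t))
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vowel(f0_hz, formants_hz, bandwidths_hz, duration_s=0.2, sample_rate=16000):
    """Pass the source through one resonator per formant (the filter)."""
    sound = impulse_train(f0_hz, duration_s, sample_rate)
    for centre, bandwidth in zip(formants_hz, bandwidths_hz):
        sound = resonator(sound, centre, bandwidth, sample_rate)
    return sound


# Bandwidths (60, 90, 150 Hz) are assumed, illustrative values.
samples = vowel(125.0, [396.0, 1520.0, 1940.0], [60.0, 90.0, 150.0])
print(len(samples), "samples; peak amplitude", max(abs(s) for s in samples))
```

Changing only the first argument alters the pitch; changing only the formant list alters the vowel quality. That independence is the point of the source-filter description.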

Sex change

Me (male)

Higher pitch

Shorter vocal tract (higher formants)

Both (-> female)
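
The two manipulations can be sketched numerically. The example below assumes the textbook idealisation of the vocal tract as a uniform tube closed at one end, whose resonances are (2n - 1) * c / (4 * L); this model, the 17.5 cm tract length, the speed of sound and the pitch factor of 1.7 are all assumptions for illustration, not values from the slides.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, moist air


def tube_formants(length_cm, count=3):
    """Resonances of a uniform tube closed at one end: (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


def scale_voice(f0_hz, formants_hz, pitch_factor=1.0, tract_length_factor=1.0):
    """Raise pitch by pitch_factor; rescale formants for a tract whose
    length is multiplied by tract_length_factor (shorter -> higher)."""
    new_f0 = f0_hz * pitch_factor
    new_formants = [f / tract_length_factor for f in formants_hz]
    return new_f0, new_formants


male_f0 = 125.0
male_formants = tube_formants(17.5)   # assumed tract length in cm

conditions = {
    "higher pitch": scale_voice(male_f0, male_formants, pitch_factor=1.7),
    "shorter tract": scale_voice(male_f0, male_formants,
                                 tract_length_factor=1 / 1.15),
    "both": scale_voice(male_f0, male_formants, pitch_factor=1.7,
                        tract_length_factor=1 / 1.15),
}
for label, (f0, formants) in conditions.items():
    print(label, round(f0), [round(f) for f in formants])
```

In this idealised tube every formant is inversely proportional to tract length, so shortening the tract shifts all formants up by the same ratio while leaving the pitch untouched.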

Prosody vs Segments

Segmental: consonants / vowels -> words

Prosodic: pitch contour, stress. Emphasis, pragmatics.

“I thought she was married?”

“I thought she was married.”

“I thought she was married!”

NB: in tone languages, pitch is used segmentally.

Are bird/mammal systems like human prosody?

Yes! They generally use different pitch contours.

Vowel production

narrow-band spectrogram

sine-wave speech

Sine-wave speech

Orchestra in your throat

Tuvan throat music

Mynah bird speech

Klatt & Stefanski (1974) How does a mynah bird imitate human speech? J Acoust Soc Amer, 55, 822-832.

Mynah / Grey parrot

• Mynah produces "formants" but probably through changing syrinx resonances, not through changing vocal tract shape. (Klatt & Stefanski, 1974, J Acoust Soc Amer)

• Grey parrot has a longer vocal tract and may use changes in its shape to produce formant variation (more like human speech). (Warren, Patterson & Pepperberg, 1996, Auk)

Characteristics of speech

• No gaps between words

• Smoothly changing sound from one speech sound to the next

• So you can’t just shuffle the acoustic “words”

The only silence is the /g/ of “ago”

Narrow-band spectrogram

from Clive Frankish

Speech is more like semaphore than like music

• Music: discrete targets giving discrete acoustic events

• Semaphore: discrete targets with transitions between targets

• Speech: articulatory transitions between targets

Semaphore

Formants in a wide-band spectrogram

[Figure: wide-band spectrogram of “we go” against time, with F1, F2, F3, the formant transitions and the burst labelled]

Where are the segments?

[Figure: spectrogram of "bag" (/bæg/) with F1, F2 and F3 labelled]

Speech is more like speech than like semaphore

Speech does not have invariant acoustic targets: consonants change with the vowel.

Compare /s/ in /si/ with /s/ in /su/.

This is due to co-articulation.

Different transition - same consonant

[Figure: formant transitions for “dee” and “da”, with 1400 Hz marked]

Liberman et al. (1967) Perception of the speech code. Psych Rev 74, 431-461

Co-articulation

Arises (mainly) because consonant gestures don’t involve all the articulators:

e.g. /b/ uses the lips only, so the tongue is free to take up position for the next vowel.

/d/ and /s/ involve just the tongue tip touching the alveolar ridge; the tongue body and lips are free to take up position for the next vowel, viz. /si/ vs /su/.

Same noise - different consonant

[Figure: “pea” and “ka” with the same burst, with F1, F2 and 1400 Hz labelled]

Liberman et al. (1967) Perception of the speech code. Psych Rev 74, 431-461

Two articulatory systems

Öhman suggested that articulation can be decomposed into two semi-independent systems:

Slow movement from one vowel target to the next, e.g. /i/ -> /u/

Rapid consonantal movement superimposed, e.g. /b/ /d/

So the /b/ in /ibu/ is not the same as in /ibi/
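
A toy numerical sketch of this decomposition: a slow tongue-body glide from one vowel to the next, with a rapid lip closure for /b/ superimposed. The position values, timings and function names are arbitrary choices for illustration and are not taken from Öhman's model; the only point is that the state of the slow system during the /b/ closure depends on the following vowel.

```python
VOWEL_TONGUE = {"i": 1.0, "u": -1.0}   # arbitrary front/back tongue-body targets


def tongue_body(v1, v2, t):
    """Slow system: glide linearly from vowel v1 (t = 0) to vowel v2 (t = 1)."""
    return (1.0 - t) * VOWEL_TONGUE[v1] + t * VOWEL_TONGUE[v2]


def lip_aperture(t, closure_centre=0.5, closure_width=0.1):
    """Fast system: lips open (1.0) except during a brief /b/ closure (0.0)."""
    return 0.0 if abs(t - closure_centre) <= closure_width / 2.0 else 1.0


def vcv(v1, v2, steps=11):
    """Sample both systems over a vowel-/b/-vowel utterance."""
    times = [k / (steps - 1) for k in range(steps)]
    return [(t, tongue_body(v1, v2, t), lip_aperture(t)) for t in times]


for word in ("ibi", "ibu"):
    v1, v2 = word[0], word[2]
    during_b = [tongue for _, tongue, lips in vcv(v1, v2) if lips == 0.0]
    print(word, "tongue body during /b/ closure:", during_b)
```

The lip gesture is identical in both words, but the tongue body is already on its way to /u/ in /ibu/, so the overall vocal-tract shape at the moment of closure differs.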


Advantages

1. Information about different segments is spread across time (Hockett’s squashed eggs). You know that a /u/ is coming because of the type of /s/ you have heard.

2. Liberman thought that this spreading across time makes it easier to transmit information at a fast rate. Liberman et al. (1967) Psych Rev 74, 431-461

The disadvantage of co-articulation for perception is that there are no constant acoustic targets in speech.

The same phoneme can be represented as different sounds in different contexts (/s/ before /u/ or /i/).

Conversely, the same sound can be heard as different consonants in different contexts (e.g. as /p/ before /i/ and /u/ but as /k/ before /a/).

Co-articulation - 2

Speech Code

• Articulatory movement

• Co-articulation

• Rapid speech /djewonega?at/

• Different vocal-tract sizes: men's are 15% longer than women's

• Different dialects --->

[Figure: dialect versions of /au/ as in "now" (Berks, Somerset, Cockney, RP, N Ireland, Wales) plotted on F1 and F2 axes among the vowels i, e, E, a, A, o, u]

Factors that make it hard (for machines) to recognise speech
