Acoustic Phonetics
Investigating physical properties of speech sounds
SONUS Reviving 2
Speech Sound Representation Reconsidered
Articulatory phonetic approach: describing sounds according to how they are produced
Problems with this approach
  Representation is only in terms of symbols; real sounds are not like that
  It does not reflect that some sounds are more easily confused with each other in perception than others
    E.g., i/e vs. a/k, s/f vs. s/m
So we need another way of describing speech sounds
Acoustic representation of speech sounds
Representing sounds as they are
  Visual rather than symbolic representation
Depending more upon perception than production or articulation
Physical properties are analyzed
Similarities and differences between sounds are disclosed
Acoustic definition of sound
Variation in air pressure
Movements of air particles
An audible disturbance of a medium produced by a source
  The source: any object that vibrates
    E.g., musical instruments, human vocal cords, microphone
  The medium: any elastic object that carries vibration
    E.g., air, water
Advantages of acoustic representation
The real, physical mechanism of speech communication is represented
  No convention, no confusion, no controversy
Gradual changes in sounds are shown
  Example: how loud a sound is
Small variations are shown
Helpful for understanding how computers synthesize speech and how speech recognition works
What to represent?
Three aspects in which sounds can differ
  Pitch
  Loudness
  Quality
  (Length)
How to represent acoustically?
Sound is air particle movement
The best and agreed way of expressing air particle movements: Waveform
Another necessary way of representing sound: Spectrum
Waveforms
Waveform properties
Simple harmonic movement + elapsed time gives the waveform
Individual particles move only backward and forward
[Figure: air particle movement plotted as displacement against time, with labels for no force, initial force, inertia, and elasticity]
Simple Waveform
Simulated Air Particle Movement
Waveform as Air Pressure Representation
Speech sound properties shown in waveforms
Differentiation of sounds
  Sounds differ from one another, which is crucial to human speech as a communication method
Ways in which sounds can differ
  Perceptually: Pitch, Loudness, Quality
  Acoustically: Frequency, Amplitude, Phase
Waveform shows differences in
  Amplitude, the acoustic correlate of loudness
  Frequency, the acoustic correlate of pitch
Waves of Different Amplitudes (Loudness)
[Figure: two waveforms, (a) and (b), each over 0 to 0.05 s; one ranges from -0.5 to 0.5 and the other from -1 to 1, with the amplitude marked]
Amplitude (cntd.)
Air pressure fluctuation
The extent of the maximum variation in air pressure from normal during a sound
Unit: Bel, Decibel (dB; 1/10 of a Bel)
  dB: ten times the common logarithm of a power ratio
Twice the amplitude is not heard as twice as loud
Loud sound: particles move farther and more rapidly
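The decibel definition above can be sketched in a few lines. This is an illustration only: the function name and the reference amplitude of 1.0 are assumptions, and the factor of 20 follows from power being proportional to amplitude squared.

```python
import math

def amplitude_ratio_to_db(amplitude, reference):
    """Level in dB of `amplitude` relative to `reference` (both must be > 0)."""
    # Power is proportional to amplitude squared, so 10*log10 of the power
    # ratio becomes 20*log10 of the amplitude ratio.
    return 20.0 * math.log10(amplitude / reference)

doubling_db = amplitude_ratio_to_db(2.0, 1.0)   # twice the amplitude
tenfold_db = amplitude_ratio_to_db(10.0, 1.0)   # ten times the amplitude

print(f"2x amplitude:  {doubling_db:.2f} dB")
print(f"10x amplitude: {tenfold_db:.2f} dB")
```

Doubling the amplitude adds only about 6 dB, which is one way to see why twice the amplitude is not heard as twice as loud.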
Waves of Different Frequencies (Pitch)
[Figure: two waveforms, (a) and (b), each over 0 to 0.05 s and ranging from -0.5 to 0.5, differing in frequency; one period (P) is marked]
Frequency (cntd.)
The rate at which a sound source vibrates
  Sound sources: tuning forks, vocal cords, etc.
Units: Hz, cps (cycles per second)
Depends upon
  Length of the pendulum
  Length of tuning fork prongs
F(requency) = 1/T(period)
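As a rough illustration of F = 1/T, the sketch below measures the period of a synthetic tone from its upward zero crossings and converts it to a frequency. The sampling rate, the 100 Hz test tone, and the small phase offset are all assumptions chosen for the example.

```python
import numpy as np

FS = 10000            # sampling rate in Hz (assumed for this sketch)
N_SAMPLES = 500       # 0.05 s of signal at FS
TRUE_FREQ = 100.0     # frequency of the synthetic test tone in Hz (assumed)

t = np.arange(N_SAMPLES) / FS
# A small phase offset keeps the zero crossings away from exact sample points.
wave = 0.5 * np.sin(2 * np.pi * TRUE_FREQ * t + 0.3)

# Indices where the wave crosses zero going upward; successive
# upward crossings are one period apart.
rising = np.flatnonzero((wave[:-1] < 0) & (wave[1:] >= 0))
period_s = np.mean(np.diff(rising)) / FS
estimated_freq = 1.0 / period_s   # F = 1/T

print(f"T = {period_s * 1000:.2f} ms, F = {estimated_freq:.1f} Hz")
```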
Frequency (cntd.)
Standard A frequency: 440 Hz
Octave: a note which is exactly twice the frequency of another note
  E.g., A (440 Hz), A’ (880 Hz), A’’ (1760 Hz)
Audible frequency
  Human: 20 Hz (or 16 Hz) to 20 kHz
  Bats: 20 kHz to 100 kHz
Fastest telephone vibration: 35 kHz
Most human speech sound frequencies: below 8 kHz
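The octave relation can be written as repeated doubling. The sketch below is illustrative; the function names are hypothetical, and the audible range uses the 20 Hz to 20 kHz figures quoted above.

```python
A4_HZ = 440.0   # standard A

def octave_above(freq_hz, n_octaves=1):
    """Frequency `n_octaves` octaves above `freq_hz` (each octave doubles it)."""
    return freq_hz * 2 ** n_octaves

def in_human_range(freq_hz, low_hz=20.0, high_hz=20000.0):
    """True if the frequency lies inside the quoted human audible range."""
    return low_hz <= freq_hz <= high_hz

octaves = [octave_above(A4_HZ, n) for n in range(3)]   # A, A', A''
print(octaves)
print([in_human_range(f) for f in octaves])
```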
Frequency (cntd.)
Pitch and frequency are not linearly related
  Only at low frequencies is the relationship fairly linear
  A 600-700 Hz difference sounds greater than a 3600-3700 Hz difference
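One common way to approximate this nonlinearity is the mel scale. The slides do not name a particular scale, so the formula below (a widely used mel approximation) is an assumption brought in purely to illustrate the point.

```python
import math

def hz_to_mel(freq_hz):
    """Approximate perceived pitch in mels (a commonly used formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

# The same 100 Hz step, taken low and high in the frequency range.
low_step = hz_to_mel(700) - hz_to_mel(600)
high_step = hz_to_mel(3700) - hz_to_mel(3600)

print(f"600 -> 700 Hz:   {low_step:.1f} mel")
print(f"3600 -> 3700 Hz: {high_step:.1f} mel")
```

Under this approximation the low-frequency step covers several times more mels than the high-frequency step, in line with the statement above.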
Waves of Different Phases
[Figure: two waveforms, each over 0 to 0.05 s and ranging from -0.5 to 0.5, differing in phase]
Phase (cntd.)
Phase differences cause different waveforms
But human ears do not perceive phase differences
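A small numerical check can make this concrete. In the sketch below the two waves contain the same three tones with different starting phases: the waveforms differ, yet the amplitude of each frequency component is unchanged. The sampling rate, tone frequencies, and phase values are assumptions for illustration.

```python
import numpy as np

FS = 8000                      # sampling rate in Hz (assumed)
t = np.arange(400) / FS        # 0.05 s, a whole number of cycles for each tone

def complex_wave(phases):
    """Sum of 100, 200 and 300 Hz tones with the given starting phases (radians)."""
    freqs = (100, 200, 300)
    return sum(np.sin(2 * np.pi * f * t + p) for f, p in zip(freqs, phases))

wave_a = complex_wave((0.0, 0.0, 0.0))
wave_b = complex_wave((0.0, np.pi / 2, np.pi))   # same tones, shifted phases

# Magnitude spectra ignore phase and keep only component amplitudes.
mag_a = np.abs(np.fft.rfft(wave_a))
mag_b = np.abs(np.fft.rfft(wave_b))

same_shape = bool(np.allclose(wave_a, wave_b))
same_spectrum = bool(np.allclose(mag_a, mag_b, atol=1e-6))
print("waveforms identical:", same_shape)
print("magnitude spectra identical:", same_spectrum)
```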
Waveform is not sufficient
Two sounds with the same pitch and loudness can still differ
  Example: Violin A vs. Piano A
  Example: [i] vs. [a]
Another way of representation is needed: Spectrum
More about waveform first
To understand the spectrum and how it represents quality, we need to know more about waveforms
Types of Waveforms: Pure tones vs. Complex waves
Most sound sources, including human speech, produce complex vibrations
Pure tone: simple harmonic motion (SHM), with only one frequency
Complex wave: more than one harmonic motion, multiple frequencies
  Pure tone + pure tone of the same frequency and phase = another pure tone
  Pure tone + pure tones of different frequencies = a complex tone
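The two addition rules can be checked numerically. This is a minimal sketch with assumed values (8000 Hz sampling rate, tones at 100, 200, and 300 Hz); `pure_tone` is a hypothetical helper.

```python
import numpy as np

FS = 8000                     # sampling rate in Hz (assumed)
t = np.arange(400) / FS       # 0.05 s of signal

def pure_tone(freq_hz, amplitude=1.0):
    """A sine wave with a single frequency, starting at phase zero."""
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

# Same frequency and phase: the sum is just a larger pure tone.
same_freq_sum = pure_tone(100, 1.0) + pure_tone(100, 0.5)
still_pure = bool(np.allclose(same_freq_sum, pure_tone(100, 1.5)))

# Different frequencies: the sum is a complex tone.
complex_tone = pure_tone(100) + pure_tone(200) + pure_tone(300)
reference = pure_tone(100)
# Best-fitting scaled 100 Hz sinusoid, and what is left over after removing it.
scale = np.dot(complex_tone, reference) / np.dot(reference, reference)
residual = complex_tone - scale * reference
is_single_sinusoid = bool(np.allclose(residual, 0.0))

# The complex tone still repeats once per cycle of its lowest component.
n = FS // 100                 # samples per 100 Hz cycle
repeats = bool(np.allclose(complex_tone[:n], complex_tone[n:2 * n]))

print(still_pure, is_single_sinusoid, repeats)
```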
Pure Tone (Simple Wave, Simple Harmonic Motion, Sinusoid, Sine Wave)
Complex Wave
[Figure: complex wave, 100 Hz + 200 Hz + 300 Hz, over 0 to 0.05 s]
Complex wave
[Figure: two complex waves, a synthetic one over 0 to 0.05 s and [a] produced by a female speaker over about 0.02 s]
Types of Waveform: Repetitive vs. non-repetitive waves
Strictly repetitive (periodic): sine wave, ideal sounds
Virtually repetitive (periodic): vowels, sonorants
Non-repetitive (aperiodic): obstruents
  white noise (most complex)
  click
Periodic vs non-periodic wave
[Figure: waveforms comparing a periodic wave, [a], with an aperiodic wave, [s]]
Limitation of Waveform Representation
Sounds can differ in three ways
  Loudness, Pitch, Quality
Quality can’t be represented directly in waveforms
  A new way of representation is needed: Spectrum
Spectrum
Background Knowledge on Spectrum
Sound waves can be either simple or complex
  Simple: sinusoid
  Complex: simple waves of different frequencies combined
Sound quality is determined by the way such simple waves combine into a complex wave
If a complex wave can be split into its simple waves, we can see what gives the sound its quality
Waveform and Spectrum (100 Hz + 200 Hz + 300 Hz)
[Figure: waveform of the complex wave (range -5.354 to 5.354) and its spectrum, with components at 100, 200, and 300 Hz]
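Splitting a complex wave into its simple components is what a Fourier transform does. The sketch below builds a 100 Hz + 200 Hz + 300 Hz wave and recovers its spectrum with NumPy's FFT. The component amplitudes and the sampling rate are hypothetical; they are not read from the figure.

```python
import numpy as np

FS = 8000                                  # sampling rate in Hz (assumed)
N = 800                                    # 0.1 s, so FFT bins fall every 10 Hz
t = np.arange(N) / FS
COMPONENTS = {100: 3.0, 200: 2.0, 300: 1.0}   # hypothetical amplitudes per frequency

# Waveform: the sum of the simple waves.
wave = sum(a * np.sin(2 * np.pi * f * t) for f, a in COMPONENTS.items())

# Spectrum: amplitude of each frequency component.
amplitudes = np.abs(np.fft.rfft(wave)) / (N / 2)
freqs = np.fft.rfftfreq(N, d=1 / FS)

spectrum = {
    int(round(f)): round(float(a), 3)
    for f, a in zip(freqs, amplitudes)
    if a > 0.01                            # keep only components with real energy
}
print(spectrum)
```

The waveform shows how pressure changes over time; the spectrum shows which frequencies are present and how strong each one is.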
Example of Spectrum
Formants shown in spectrum
Formant: frequency component(s) with boosted energy
Formant frequency: the frequency of that component
Reason for formant shaping: filtering function of the vocal tract
Decisive aspect of sound quality
For vowels, three formants (F1, F2, F3) are especially important for their distinction
How Formants are Made (source-filter theory)
Source: sound creation
  airflow formation
    e.g., pulmonic airstream mechanism
    e.g., egressive, ingressive
  airflow modulation
    e.g., phonation
Filter: sound modification
  resonance properties of the vocal tract
  articulator movements
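The source-filter idea can be sketched in the frequency domain: a harmonic source spectrum is multiplied by the response of a few resonances, and the harmonics nearest the resonances come out boosted. Everything numeric here is an assumption for illustration (100 Hz fundamental, a source falling at roughly 12 dB per octave, resonances at 700, 1200, and 2600 Hz with 80 Hz bandwidth, two-pole digital resonators); it is not a model taken from the slides.

```python
import numpy as np

FS = 16000                       # sampling rate in Hz (assumed)
F0 = 100                         # fundamental frequency of the source in Hz (assumed)
FORMANTS = [700, 1200, 2600]     # hypothetical resonance frequencies in Hz
BANDWIDTH = 80                   # hypothetical resonance bandwidth in Hz

def resonator_gain(freqs_hz, centre_hz, bandwidth_hz, fs=FS):
    """Magnitude response of a two-pole digital resonator at the given frequencies."""
    r = np.exp(-np.pi * bandwidth_hz / fs)       # pole radius
    theta = 2 * np.pi * centre_hz / fs           # pole angle
    z_inv = np.exp(-1j * 2 * np.pi * np.asarray(freqs_hz) / fs)
    return 1.0 / np.abs(1 - 2 * r * np.cos(theta) * z_inv + r**2 * z_inv**2)

# Source: harmonics of F0 whose amplitude falls off with harmonic number.
harmonics = F0 * np.arange(1, 41)                # 100 Hz ... 4000 Hz
source = 1.0 / np.arange(1, 41) ** 2

# Filter: cascade (product) of the resonances.
filt = np.ones(len(harmonics))
for fc in FORMANTS:
    filt *= resonator_gain(harmonics, fc, BANDWIDTH)

output = source * filt                           # spectrum of the filtered sound

# Harmonics stronger than both neighbours: the boosted components.
is_peak = (output[1:-1] > output[:-2]) & (output[1:-1] > output[2:])
peaks = [int(f) for f in harmonics[1:-1][is_peak]]
print(peaks)
```

The source alone has no peaks; the boosted harmonics in the output come entirely from the filter, which is the sense in which the vocal tract shapes formants.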
How Formants are Made (cntd.) (source-filter theory)
An Example of Formant: Vowel []
[Figure: F1, F2, and F3 marked]
An Example of Formant: Vowel []
[Figure: spectrum with amplitude (0 to 6) against frequency (50 to 4050 Hz), with F1, F2, and F3 marked]
Disadvantages of Spectrum Representation
Less intuitive
  X-axis denotes frequency
  No time-varying representation
Hard to see the interaction with waveforms
Thus, a new way of representation is needed: Spectrogram
Spectrogram & its reading
What is a spectrogram?
In use since the 1940s
Another representation of frequency-domain analysis
The most popular way of representing spectral information
Three-dimensional representation
  X-axis: Time
  Y-axis: Frequency
  Darkness (or color): Energy
Waveform & Spectrogram aligned
Spectrogram example (color resolution of word “compute”)
Spectrogram example (grayscale of word “compute”)
Types of spectrogram
Wideband spectrogram: better time resolution
Narrowband spectrogram: better frequency resolution
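The trade-off comes from the length of the analysis window: a short window localizes events in time but blurs frequency, and a long window does the reverse. The sketch below uses the rule of thumb that frequency resolution is roughly the inverse of the window length. The 5 ms and 30 ms window lengths and the 100 Hz voice are assumed values, not figures from the slides.

```python
def window_resolution(window_s):
    """Return (time resolution in ms, approximate frequency resolution in Hz)."""
    return window_s * 1000.0, 1.0 / window_s

F0 = 100.0   # assumed fundamental frequency, so harmonics are 100 Hz apart

wide_time_ms, wide_freq_hz = window_resolution(0.005)      # short window: wideband
narrow_time_ms, narrow_freq_hz = window_resolution(0.030)  # long window: narrowband

print(f"wideband:   {wide_time_ms:.0f} ms, about {wide_freq_hz:.0f} Hz")
print(f"narrowband: {narrow_time_ms:.0f} ms, about {narrow_freq_hz:.0f} Hz")
print("narrowband separates harmonics:", narrow_freq_hz < F0)
print("wideband separates harmonics:  ", wide_freq_hz < F0)
```

With these values only the narrowband setting is fine enough in frequency to separate individual harmonics, while the wideband setting follows rapid changes in time more closely.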
Wideband vs. narrowband spectrograms of the question "Is Pat sad, or mad?"
The 5th, 10th and 15th harmonics have been marked by white squares in two of the vowels
Advantages & Disadvantages
Advantages
  Time alignment
Disadvantages
  Less reliable than waveform
Spectrogram Reading
Vowel Spectrogram
Formant frequencies are critical cues for vowel distinction
F1: Height
  high vowels: low F1
F2: Backness
  back vowels: low F2
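These two cues can be turned into a rough labelling rule. The thresholds below (400 Hz, 600 Hz, 1500 Hz) are hypothetical cut-offs chosen only to illustrate the direction of each relationship; real vowel categories overlap and vary with the speaker.

```python
def describe_vowel(f1_hz, f2_hz,
                   high_f1_max=400.0, low_f1_min=600.0, front_f2_min=1500.0):
    """Label a vowel's height and backness from F1 and F2 (illustrative thresholds)."""
    # Low F1 corresponds to a high vowel, high F1 to a low vowel.
    if f1_hz <= high_f1_max:
        height = "high"
    elif f1_hz >= low_f1_min:
        height = "low"
    else:
        height = "mid"
    # Low F2 corresponds to a back vowel, high F2 to a front vowel.
    backness = "front" if f2_hz >= front_f2_min else "back"
    return height, backness

print(describe_vowel(280, 2250))
print(describe_vowel(710, 1100))
```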
Examples of formant frequencies of English monophthongs
(values in Hz; one column per vowel)
F3   2900  2550  2490  2490  2640  2380  2300  2500  2390
F2   2250  1900  1770  1660  1100  1030   870  1500  1190
F1    280   400   550   690   710   450   310   900   640
"heed, hid, head, had, hod, hawed, hood, who'd" (a male speaker, American English)
From http://hctv.humnet.ucla.edu/departments/linguistics
Consonant Spectrogram
General
  Acoustic structure more complicated than that of vowels
  Adjacent sounds (especially vowels) convey important information: locus
  High-frequency characteristics, especially for fricatives and affricates
What is LOCUS
Information carried by formant transitions from vowels into obstruents or from obstruents into vowels
The target frequency that each formant transition heads toward as an obstruction is made, or the frequency the transition comes from as the obstruction is released
Characteristic of the consonant's place and manner; roughly the same in different vowel contexts
Stops
General
  Fairly distinct locus for each place
  Burst
  Silence during the closure (only at syllable onset position)
  Virtually no difference during the closure
Stops (cntd.)
Voicing distinction
  voiced: vertical striations, less abrupt burst, frequently weakened to resemble fricatives or approximants
  voiceless: generally abrupt burst in the higher frequency area
Example (voiceless vs voiced stops)
Stops (cntd.)
Place distinction
  bilabial
    relatively low F2, F3 locus, rising into and falling out of the vowel
    weak and spread vertical lines
  alveolar
    F2 locus at about 1800 Hz
    strong vertical lines
  velar
    velar pinch: the vowel's F2 and F3 merging
    often double burst
    long formant transitions
Stops (cntd.)
Manner distinction: silence duration, VOT, F0 of the following vowel
             silence   VOT     F0
Aspirated    short     long    high
Tense        long      short   high
Lax          mid       mid     low
Examples -- “a bab, a dad, a gag”
Place dependent loci
Fricatives
General
  Random noise pattern, especially in high frequency regions
Place distinction
  Labiodental [f, v]: rising locus into the following vowel
  Dental [θ, ð]: major energy above 6000 Hz
  Alveolar [s, z]: major energy above 4000 Hz
  Alveopalatal [ʃ, ʒ]: major energy above 2000 Hz
  Glottal [h]: the trace of formant frequencies of neighbouring vowels
Fricatives (cntd.)
Weak vs. strong
  Strong []: darker bands
  Weak []: spread and fainter
Voiced []: often so weak as to be confused with nasals or approximants
Cues to tell [] from []: higher formants of [] fall into adjacent vowels
Example – voiceless fricatives
Example – “fie, thigh, sigh, shy”
Example – “ever, weather, fizzer, pleasure”
Affricates
General
  Silence as in stops, or a low energy interval
  Noise as in fricatives
Voiceless [tʃ]: silence, then noise as in [ʃ]
Voiced [dʒ]: low frequency energy (or voice bar), then noise as in [ʒ]
Example – affricates “I gotcha, Jerry”
Nasals
General
  Formants similar to vowels but fainter
  Relatively rapid formant transitions
  Very low F1 (about 250 Hz), with F2 at about 2500 Hz and F3 at about 3250 Hz
Place distinction
  bilabial [m]: downward F2, F3 locus
  alveolar [n]: smaller F2 transition
  velar [ŋ]: velar pinch
Example – Nasals “many angles”
Examples -- “a Pam, a tan, a kang”
Liquids & Approximants
General
  Formants similar to vowels but fainter (especially at high frequency regions)
  Approximately F1 (250 Hz), F2 (1200 Hz), F3 (2400 Hz)
  Slow formant movements
Liquids & Approximants (cntd.)
Phone specific properties
  Labial glide [w]:
    very low F1, F2 (600-1000 Hz), which come very close to each other
    relatively low F3
    rapid falloff of spectral amplitude (formant movements)
  Palatal glide [y]:
    extremely low F1
    extremely high F2, F3
Liquids & Approximants (cntd.)
Phone specific properties (cntd.)
  Flap [ɾ]: soft burst, short duration
  Retroflex []:
    F3 dipping down close to F2
    general lowering of F3, F4
  Lateral [l]:
    low F1, F2 (approx. F1 250 Hz, F2 1200 Hz)
    usually substantial energy in the high frequency region
    relatively slow formant transition (cf. [n])
Example – liquids & glides “a yellow array”
Example – “led, red, wed, yell”
Final remarks on spectrogram
The spectrogram is not the only cue for acoustic distinction of speech sounds.
When there is a mismatch between waveform and spectrogram, the waveform is generally more reliable.
Useful links
Spectrogram cues for phonemes (GA accent) CSLU at OGI