ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

36
ECE 598: The Speech ECE 598: The Speech Chain Chain Lecture 7: Fourier Lecture 7: Fourier Transform; Transform; Speech Sources and Speech Sources and Filters Filters

Transcript of ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Page 1: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

ECE 598: The Speech ECE 598: The Speech ChainChain

Lecture 7: Fourier Transform;Lecture 7: Fourier Transform;

Speech Sources and FiltersSpeech Sources and Filters

Page 2: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

TodayToday Fourier Series (Discrete Fourier Transform)Fourier Series (Discrete Fourier Transform)

Sawtooth wave exampleSawtooth wave example Finding the coefficientsFinding the coefficients

SpectrogramSpectrogram Frequency ResponseFrequency Response Speech Source-Filter TheorySpeech Source-Filter Theory Speech SourcesSpeech Sources

TransientTransient FricationFrication AspirationAspiration VoicingVoicing Formant TransitionsFormant Transitions

Page 3: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Topic #1: Discrete Fourier Topic #1: Discrete Fourier TransformTransform

Page 4: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Sawtooth WaveSawtooth Wave

Sawtooth Sawtooth first first harmonic:harmonic: Amplitude: Amplitude:

0.640.64 Phase: Phase:

0.520.52 radiansradians

Page 5: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Sawtooth Second HarmonicSawtooth Second Harmonic

Sawtooth Sawtooth second second harmonic:harmonic: Amplitude: Amplitude:

0.320.32 Phase: Phase:

0.530.53 radiansradians

Page 6: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Sawtooth Third HarmonicSawtooth Third Harmonic

Sawtooth Sawtooth third third harmonic:harmonic: Amplitude: Amplitude:

0.210.21 Phase: Phase:

0.550.55 radiansradians

Page 7: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Sawtooth: Calculating the Sawtooth: Calculating the Fourier TransformFourier Transform

Average over Average over one full one full period of period of (125Hz (125Hz sawtooth * sawtooth * 125Hz 125Hz cosine) = -cosine) = -0.030.03

Page 8: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Sawtooth: Calculating the Sawtooth: Calculating the Fourier TransformFourier Transform

Average Average over one full over one full period of period of (125Hz (125Hz sawtooth * sawtooth * 125Hz sine) 125Hz sine) = 0.636= 0.636

Page 9: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Sawtooth: Amplitude and Sawtooth: Amplitude and Phase of the First HarmonicPhase of the First Harmonic

First Fourier Coefficient of the Sawtooth Wave:First Fourier Coefficient of the Sawtooth Wave: -0.03 + 0.636j-0.03 + 0.636j |-0.03 + 0.636j| = 0.64|-0.03 + 0.636j| = 0.64 phase(-0.03 + 0.636j) = 0.52phase(-0.03 + 0.636j) = 0.52 XX11 = -0.03+0.636j = 0.64 e = -0.03+0.636j = 0.64 ej0.52j0.52

First Harmonic:First Harmonic: hh11(t) = 0.64 e(t) = 0.64 ej(250j(250t+0.52t+0.52))

Re{h1(t)} = 0.64 cos(250Re{h1(t)} = 0.64 cos(250t+0.52t+0.52))

Page 10: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Sawtooth: Calculating the Sawtooth: Calculating the Fourier TransformFourier Transform

Average over Average over one full one full period of period of (125Hz (125Hz sawtooth * sawtooth * 250Hz 250Hz cosine) = -cosine) = -0.030.03

Page 11: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Sawtooth: Calculating the Sawtooth: Calculating the Fourier TransformFourier Transform

Average Average over one full over one full period of period of (125Hz (125Hz sawtooth * sawtooth * 250Hz sine) 250Hz sine) = -0.3173= -0.3173

Page 12: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Sawtooth: Amplitude and Sawtooth: Amplitude and Phase of the Second HarmonicPhase of the Second Harmonic

Second Fourier Coefficient of the Sawtooth Second Fourier Coefficient of the Sawtooth Wave:Wave: -0.03 + 0.3173j-0.03 + 0.3173j |-0.03 + 0.3173j| = 0.32|-0.03 + 0.3173j| = 0.32 phase(-0.03 + 0.3173j) = 0.53phase(-0.03 + 0.3173j) = 0.53 XX11 = -0.03+0.3173j = 0.32 e = -0.03+0.3173j = 0.32 ej0.53j0.53

First Harmonic:First Harmonic: hh22(t) = 0.32 e(t) = 0.32 ej(500j(500t+0.53t+0.53))

Re{hRe{h22(t)} = 0.64 cos(250(t)} = 0.64 cos(250t+0.53t+0.53))

Page 13: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

How to Reconstruct a Sawtooth How to Reconstruct a Sawtooth from its Harmonicsfrom its Harmonics

At sampling frequency of FS=8000Hz At sampling frequency of FS=8000Hz (8000 samples/second), the period of a (8000 samples/second), the period of a 125Hz sawtooth is 125Hz sawtooth is (8000 samples/second)/(125 cycles/second) = (8000 samples/second)/(125 cycles/second) =

64 samples64 samples The “0”th harmonic is DC (0*125Hz = 0 The “0”th harmonic is DC (0*125Hz = 0

Hz)Hz) So we need to add up 63 harmonics:So we need to add up 63 harmonics:

x(t) = hx(t) = h11(t) + h(t) + h22(t) + … + h(t) + … + h6363(t) (t)

Page 14: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Why Does This Work???Why Does This Work??? It’s a property of sines and cosinesIt’s a property of sines and cosines They are a They are a BASIS SETBASIS SET, which means that…, which means that… Let <> mean “average over one full period:”Let <> mean “average over one full period:”

<cos<cos22(m(m00t)> = 0.5 for any integer mt)> = 0.5 for any integer m

<cos(m<cos(m00t)cos(nt)cos(n00t)> = 0 if m ≠ nt)> = 0 if m ≠ n

above two properties also hold for sineabove two properties also hold for sine

<cos(m<cos(m00t) sin(nt) sin(n00t)> = 0t)> = 0

00 = 2 = 2FF00 = 2 = 2/T/T00 is the “fundamental frequency” is the “fundamental frequency” in radians/secondin radians/second

Page 15: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Topic #2: SpectrogramTopic #2: Spectrogram

Page 16: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

SpectrogramSpectrogram

Time (seconds)

Frequ

ency

(H

z)

Page 17: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

How to Create a How to Create a SpectrogramSpectrogram

Divide the waveform into overlapping windowDivide the waveform into overlapping window Example: window length = 6msExample: window length = 6ms Example: window spacing = one per 2msExample: window spacing = one per 2ms Result: neighboring windows overlap by 4msResult: neighboring windows overlap by 4ms

Fourier transform each frame to create the “short-Fourier transform each frame to create the “short-time Fourier transform” Xtime Fourier transform” Xkk(t)(t) Read: kth frequency bin, taken from the frame at time t Read: kth frequency bin, taken from the frame at time t

secondsseconds Could also write X(t,f) = Fourier coefficient from the frame Could also write X(t,f) = Fourier coefficient from the frame

at time=t seconds, from the frequency bin centered at f at time=t seconds, from the frequency bin centered at f Hz.Hz.

In pixel (k,t), plot 20*log10(abs(XIn pixel (k,t), plot 20*log10(abs(Xkk(t))(t)) The Fourier transform expressed in decibels!The Fourier transform expressed in decibels!

Page 18: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Reading a SpectrogramReading a Spectrogram

Time (seconds)

Frequ

ency

(H

z)

Vowel

Turbulence(Frication orAspiration)

Stop

Glide(ExtremeFormant Frequencies)

Each vertical stripe is a glottal closure instant(BANG – high energy at all frequencies)Inter-bang period = pitch period ≈ 10 ms

Nasals

Page 19: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Topic #3: Frequency Topic #3: Frequency ResponseResponse

andandSpeech Source-Filter TheorySpeech Source-Filter Theory

Page 20: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Frequency ResponseFrequency Response Any sound can be windowedAny sound can be windowed

Call the window length “N”Call the window length “N” Its windows can be Fourier transformed:Its windows can be Fourier transformed:

x(t) = hx(t) = h11(t) + h(t) + h22(t) + …(t) + …

x(t) = Xx(t) = X11eejj00tt + X + X22ee2j2j00tt + … + …

What happens when you filter a cosine?What happens when you filter a cosine?Input = eInput = ejj00tt --- Output = H( --- Output = H(00)e)ejj00tt

Input = eInput = e2j2j00tt --- Output = H(2 --- Output = H(200)e)e2j2j00tt

……

Page 21: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Frequency ResponseFrequency Response

If x(t) goes through any filter:If x(t) goes through any filter:x(t) x(t) → ANY FILTER →→ ANY FILTER → x(t) x(t)

Then we can find y(t) using freq Then we can find y(t) using freq response!response!

x(t) = Xx(t) = X11eejj00tt + X + X22ee2j2j00tt + … + …

y(t) = H(y(t) = H(00)X)X11eejj00tt + H(2 + H(200)X)X22ee2j2j00tt + … + …

Page 22: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Frequency ResponseFrequency Response

In general, we can writeIn general, we can write Y(Y() = H() = H()X()X())

Page 23: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Examples of FiltersExamples of Filters Low-Pass Filter:Low-Pass Filter:

Eliminates all frequency components above some cutoff Eliminates all frequency components above some cutoff frequencyfrequency

Result sounds muffledResult sounds muffled High-Pass Filter:High-Pass Filter:

Eliminates all frequency components below some cutoff Eliminates all frequency components below some cutoff frequencyfrequency

Result sounds fizzyResult sounds fizzy Band-Pass Filter:Band-Pass Filter:

Eliminates all components except those between fEliminates all components except those between f11 and f and f22

Result: sounds like an old radio broadcastResult: sounds like an old radio broadcast Band-Stop Filter:Band-Stop Filter:

Eliminates only the frequency components between f1 and Eliminates only the frequency components between f1 and f2f2

Result: usually hard to hear that anything’s happened!Result: usually hard to hear that anything’s happened!

Page 24: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Examples of FiltersExamples of Filters A Room:A Room:

Adds echoes to the signalAdds echoes to the signal A Quarter-Wave Resonator:A Quarter-Wave Resonator:

Emphasizes frequency components at f=(c/4L)+Emphasizes frequency components at f=(c/4L)+(nc/2L), for all integer n(nc/2L), for all integer n

Components at other frequencies are passed Components at other frequencies are passed without any emphasiswithout any emphasis

A Vocal Tract:A Vocal Tract: Emphasizes any part of the input that is near a Emphasizes any part of the input that is near a

formant frequencyformant frequency Any energy at other frequencies is passed Any energy at other frequencies is passed

without emphasiswithout emphasis

Page 25: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

The Source-Filter Model of The Source-Filter Model of Speech ProductionSpeech Production

Input: Weak SourcesInput: Weak Sources Glottal closureGlottal closure Turbulence (frication, aspiration)Turbulence (frication, aspiration)

Filter: The Vocal TractFilter: The Vocal Tract Energy near formant frequency is emphasized Energy near formant frequency is emphasized

by a factor of 10 or a factor of 100by a factor of 10 or a factor of 100 Energy at other frequencies is passed without Energy at other frequencies is passed without

emphasisemphasis Speech = (Vocal Tract Transfer Function) X Speech = (Vocal Tract Transfer Function) X

(Excitation)(Excitation) S(S() = T() = T() E() E())

Page 26: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Topic #4: Speech Sources.Topic #4: Speech Sources.Presented as:Presented as:

Events in the Release of a Events in the Release of a Stop ConsonantStop Consonant

Page 27: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Events in the Release of a Events in the Release of a StopStop

“Burst” = transient + frication (the part of the spectrogram whose transfer function has poles only at the front cavity resonance frequencies, not at the back cavity resonances).

Page 28: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Pre-voicing during ClosurePre-voicing during ClosureTo make a voiced stop in most European languages:

Tongue root is relaxed, allowing it to expandm so that vocal folds can continue to vibrating for a little while after oral closure.

Result is a low-frequency “voice bar” that may continue well into closure.

In English, closure voicing is typical of read speech, but not casual speech.

“the bug”

Page 29: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Burst = Transient and FricationBurst = Transient and FricationFrication = Turbulence at a Frication = Turbulence at a

ConstrictionConstriction

Front cavity resonance frequency: FR = c/4Lf

Turbulence striking an obstacle makes noise

The fricative “sh” (this ship)

Page 30: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Aspiration = Turbulence at the Glottis Aspiration = Turbulence at the Glottis (like /h/). Transfer Function During (like /h/). Transfer Function During

Aspiration…Aspiration…

Page 31: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Formant Formant Transitions: Transitions:

Labial Labial ConsonantsConsonants

“the mom”

“the bug”

Page 32: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Formant Formant Transitions: Transitions:

Alveolar Alveolar ConsonantsConsonants

“the tug”

“the supper”

Page 33: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Formant Formant Transitions: Transitions: Post-alveolar Post-alveolar ConsonantsConsonants

“the shoe”

“the zsazsa”

Page 34: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Formant Formant Transitions: Transitions:

Velar Velar ConsonantsConsonants

“the gut”

“sing a song”

Page 35: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Formant Transitions: A Formant Transitions: A Perceptual StudyPerceptual Study

The study: (1) Synthesize speech with different formant patterns, (2) recordsubject responses. Delattre, Liberman and Cooper, J. Acoust. Soc. Am. 1955.

Page 36: ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Perception of Formant Perception of Formant TransitionsTransitions