Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying...

54
Topic 2 Signal Processing Review (Some slides are adapted from Bryan Pardo’s course slides on Machine Perception of Music)

Transcript of Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying...

Page 1: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Topic 2

Signal Processing Review

(Some slides are adapted from Bryan Pardo’s course slides on Machine Perception of Music)

Page 2: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Recording Sound

MechanicalVibration

PressureWaves

Motion->VoltageTransducer

Voltage over time

2ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 3: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Microphones

http://www.mediacollege.com/audio/microphones/how-microphones-work.html

3ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 4: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Pure Tone = Sine Wave

time amplitude frequency initial phase

𝑥 𝑡 = 𝐴 sin(2𝜋𝑓𝑡 + 𝜑)

Time (ms)

Am

plit

ude

0 2 4 6-1

0

1440Hz

Period T

4ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 5: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Reminders

• Frequency, 𝑓 = 1/𝑇, is measured in cycles per second , a.k.a. Hertz (Hz).

• One cycle contains 2𝜋 radians.

• Angular frequency Ω, is measured in radians per second and is related to frequency by Ω = 2𝜋𝑓.

• So we can rewrite the sine wave as

𝑥 𝑡 = 𝐴 sin(Ω𝑡 + 𝜑)

5ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 6: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Fourier Transform

Time (ms)

Am

plit

ude

0 2 4 6-1

0

1

𝑋 𝑓 = −∞

𝑥(𝑡)𝑒−𝑗2𝜋𝑓𝑡𝑑𝑡

Am

plit

ude

Frequency (Hz)0 440-440

|𝑋 𝑓 |

6ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 7: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

We can also write

Time (ms)

Am

plit

ude

0 2 4 6-1

0

1

𝑋 Ω = −∞

𝑥(𝑡)𝑒−𝑗Ω𝑡𝑑𝑡

Am

plit

ude

Angular Frequency (radians)

0 440 × 2𝜋−440 × 2𝜋

|𝑋 𝑓 |

7ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 8: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Complex Tone = Sine Waves

0 10 20 30 40 50 60 70 80 90 100-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

0 10 20 30 40 50 60 70 80 90 100-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

0 10 20 30 40 50 60 70 80 90 100-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

0 10 20 30 40 50 60 70 80 90 100-2.5

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

2.5

+

+

=

220 Hz

660 Hz

1100 Hz

8ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 9: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Frequency Domain

Am

plit

ude

Frequency (Hz)

Time (ms)

Am

plit

ude

0 10 20 30 40 50 60 70 80 90 100-2.5

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

2.5

220 660 1100

𝑋 𝑓 = −∞

𝑥(𝑡)𝑒−𝑗2𝜋𝑓𝑡𝑑𝑡|𝑋 𝑓 |

9ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 10: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Harmonic Sound

• 1 or more sine waves

• Strong components at integer multiples of a fundamental frequency (F0) in the range of human hearing (20 Hz ~ 20,000 Hz)

• Examples

– 220 + 660 + 1100 is harmonic

– 220 + 375 + 770 is not harmonic

10ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 11: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Noise

• Lots of sines at random freqs. = NOISE

• Example: 100 sines with random frequencies, such that 100 < 𝑓 < 10000.

0 0.5 1 1.5 2 2.5 3 3.5

x 104

-30

-20

-10

0

10

20

30

11ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 12: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

How strong is the signal?

• Instantaneous value?

• Average value?

• Something else?

0 2 4 6-1

0

1

0 0.5 1 1.5 2 2.5 3 3.5

x 104

-30

-20

-10

0

10

20

30

𝑥 𝑡

12ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 13: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Acoustical or Electrical

• Acoustical

𝐼 =1

𝜌𝑐

1

𝑇𝐷 0

𝑇𝐷

𝑥2 𝑡 𝑑𝑡

• Electrical

𝑃 =1

𝑅

1

𝑇𝐷 0

𝑇𝐷

𝑥2 𝑡 𝑑𝑡

View 𝑥 𝑡 as

sound pressureAverage intensity

View 𝑥 𝑡 as

electric voltageAverage power

densitysound speed

resistance

13ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 14: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Root-Mean-Square (RMS)

𝑥𝑅𝑀𝑆 =1

𝑇𝐷 0

𝑇𝐷

𝑥2 𝑡 𝑑𝑡

• 𝑇𝐷 should be long enough.

• 𝑥(𝑡) should have 0 mean, otherwise the DC component will be integrated.

• For sinusoids

𝑥𝑅𝑀𝑆 =1

𝑇 0

𝑇

𝐴2sin2 2𝜋𝑓𝑡 𝑑𝑡 = 𝐴2/2 = 0.707𝐴

14ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 15: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Sound Pressure Level (SPL)

• Softest audible sound intensity 0.000000000001 watt/m2

• Threshold of pain is around 1 watt/m2

• 12 orders of magnitude difference

• A log scale helps with this

• The decibel (dB) scale is a log scale, with respect to a reference value

15ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 16: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

The Decibel

• A logarithmic measurement that expresses the magnitude of a physical quantity (e.g., power or intensity) relative to a specified reference level.

• Since it expresses a ratio of two (same unit) quantities, it is dimensionless.

𝐿 − 𝐿ref = 10 log10𝐼

𝐼ref

= 20 log10𝑥𝑅𝑀𝑆𝑥ref,𝑅𝑀𝑆

16ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 17: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Lots of references!

• dB-SPL – A measure of sound pressure level. 0dB-SPL is approximately the quietest sound a human can hear, roughly the sound of a mosquito flying 3 meters away.

• dbFS – relative to digital full-scale. 0 dbFS is the maximum allowable signal. Values typically negative.

• dBV – relative to 1 Volt RMS. 0dBV = 1V.

• dBu – relative to 0.775 Volts RMS with an unloaded, open circuit.

• dBmV – relative to 1 millivolt across 75 Ω. Widely used in cable television networks.

• ……

17ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 18: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Typical Values

• Jet engine at 3m

• Pain threshold

• Loud motorcycle, 5m

• Vacuum cleaner

• Quiet restaurant

• Rustling leaves

• Human breathing, 3m

• Hearing threshold

140 db-SPL

130 db-SPL

110 db-SPL

80 db-SPL

50 db-SPL

20 db-SPL

10 db-SPL

0 db-SPL

18ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 19: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Digital Sampling

0

1

2

3

-1

-2

AM

PLIT

UD

E

TIME

quantization increment

sampleinterval

011

010

001

101

100

000

RECONSTRUCTION

19ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 20: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

More quantization levels = more dynamic range

0

1

2

3

4

5

6

-4

-3

-2

-1

0000

0001

0010

0110

0100

0101

0011

1001

1010

1011

1000

AM

PLIT

UD

E

TIMEsampleinterval

quantization increment

20ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 21: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Bit Depth and Dynamics

• More bits = more quantization levels = better sound

• Compact Disc: 16 bits = 65,536 levels

• POTS (plain old telephone service): 8 bits = 256 levels

• Signal-to-quantization-noise ratio (SQNR), if the signal is uniformly distributed in the whole range

SQNR = 20 log10 2𝑁 ≈ 6.02𝑁 dB

– E.g., N = 16 bits depth gives about 96dB SQNR.

21ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 22: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

RMS

𝑥𝑅𝑀𝑆 =1

𝑁

𝑛=0

𝑁−1

𝑥2[𝑛]

Am

plit

ude

0 2 4 6-1

0

1

The red dots form the discrete signal 𝑥[𝑛]

22ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 23: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Aliasing and Nyquist

0

1

2

3

4

5

6

AM

PLIT

UD

E

TIME

-4

-3

-2

-1

sampleinterval

23ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 24: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Aliasing and Nyquist

0

1

2

3

4

5

6

AM

PLIT

UD

E

TIME

-4

-3

-2

-1

sampleinterval

24ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 25: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Aliasing and Nyquist

0

1

2

3

4

5

6

AM

PLIT

UD

E

TIME

-4

-3

-2

-1

sampleinterval

25ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 26: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Nyquist-Shannon Sampling Theorem

• You can’t reproduce the signal if your sample rate isn’t faster than twice the highest frequency in the signal.

• Nyquist rate: twice the frequency of the highest frequency in the signal.

– A property of the continuous-time signal.

• Nyquist frequency: half of the sampling rate

– A property of the discrete-time system.

26ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 27: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Discrete-Time Fourier Transform (DTFT)

Am

plit

ude

0 2 4 6-1

0

1

𝑋 𝜔 =

𝑛=−∞

𝑥[𝑛]𝑒−𝑗𝜔𝑛

Am

plit

ude

Angular frequency 𝜔0−2𝜋

|𝑋 𝜔 |

The red dots form the discrete signal 𝑥[𝑛], where 𝑛 = 0,±1,±2,…

2𝜋

𝑋(𝜔) is Periodic.We often only show −𝜋, 𝜋

𝜔 is a continuous variable

−𝜋 𝜋

27ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 28: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Relation between FT and DTFT

𝑋 𝜔 =1

𝑇

𝑘=−∞

𝑋𝑐𝜔

𝑇+2𝜋𝑘

𝑇

• Scaling: 𝜔 = Ω𝑇, i.e., 𝜔 = 2𝜋 corresponds to

Ω =2𝜋

𝑇= 2𝜋𝑓𝑠, which corresponds to 𝑓 = 𝑓𝑠.

• Repetition: 𝑋 𝜔 contains infinite copies of 𝑋𝑐, spaced by 2𝜋.

Am

plit

ude

0 2 4 6-1

0

1

Time (ms)

Sampling: 𝑥 𝑛 = 𝑥𝑐(𝑛𝑇)

FT: 𝑋𝑐(Ω) = −∞∞𝑥𝑐(𝑡)𝑒

−𝑗Ω𝑡𝑑𝑡

DTFT: 𝑋 𝜔 = 𝑛=−∞∞ 𝑥[𝑛]𝑒−𝑗𝜔𝑛

28ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 29: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Aliasing

Ω0

|𝑋𝑐 Ω |

1800𝜋 3600𝜋−3600𝜋 −1800𝜋

Complex tone900Hz + 1800Hz

Sampling rate = 8000Hz

0

|𝑋 𝜔 |

𝜋 2𝜋−2𝜋 −𝜋3600𝜋

8000

𝜔

Sampling rate = 2000Hz

𝜔0 𝜋 2𝜋−2𝜋 −𝜋

|𝑋 𝜔 | 3600𝜋

2000

1800𝜋

2000

200Hz

29ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 30: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Fourier Series

• FT and DTFT do not require the signal to be periodic, i.e., the signal may contain arbitrary frequencies, which is why the frequency domain is continuous.

• Now, if the signal is periodic:𝑥 𝑡 + 𝑚𝑇 = 𝑥 𝑡 ∀𝑚 ∈ Ζ

• It can be reproduced by a series of sine and cosine functions:

𝑥 𝑡 = 𝐴0 +

𝑛=1

𝐴𝑛 cos Ω𝑛𝑡 + 𝐵𝑛 sin Ω𝑛𝑡

• In other words, the frequency domain is discrete.

ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 31: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Discrete Fourier Transform (DFT)

• FT and DTFT are great, but the infinite integral or summations are hard to deal with.

• In digital computers, everything is discrete, including both the signal and its spectrum.

𝑋 𝑘 =

𝑛=0

𝑁−1

𝑥[𝑛]𝑒−𝑗2𝜋𝑘𝑛/𝑁

frequency domain index

time domain index

Length of the signal, i.e. length of DFT

31ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 32: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

DFT and IDFT

𝑋 𝑘 = 𝑛=0

𝑁−1

𝑥[𝑛]𝑒−𝑗2𝜋𝑘𝑛/𝑁

𝑥 𝑛 =1

𝑁 𝑘=0

𝑁−1

𝑋[𝑘]𝑒𝑗2𝜋𝑘𝑛/𝑁

• Both 𝑥[𝑛] and 𝑋[𝑘] are discrete and of length 𝑁.

• Treats 𝑥[𝑛] as if it were infinite and periodic.

• Treats 𝑋[𝑘] as if it were infinite and periodic.

• Only one period is involved in calculation.

DFT:

IDFT:

32ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 33: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Discrete Fourier Transform

• If the time-domain signal has no imaginary part (like an audio signal) then the frequency-domain signal is conjugate symmetric around N/2.

DFT0 N-1

0 N-1

0 N-1

0 N-1

Real portion

Imaginary portion

N/2

N/2

Real portion

Imaginary portion

Time domain 𝑥[𝑛] Frequency domain 𝑋[𝑘]

IDFT

33ECE 477 - Computer Audition, Zhiyao Duan 2019

DC fs/2

Page 34: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Kinds of Fourier Transforms

Fourier TransformSignals: continuous, aperiodicSpectrum: aperiodic, continuous

Fourier SeriesSignals: continuous, periodicSpectrum: aperiodic, discrete

Discrete Time Fourier TransformSignals: discrete, aperiodicSpectrum: periodic, continuous

Discrete Fourier TransformSignals: discrete, periodicSpectrum: periodic, discrete

34ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 35: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Duality

ECE 477 - Computer Audition, Zhiyao Duan 2019 35

Fourier Transform

DTFT

Fourier Series

DFT

continuous discrete

aperiodic periodic

continuous

discrete

aperiodic

periodic

Time domain

Tim

e d

om

ain

Frequency domain

Fre

qu

en

cy d

om

ain

Page 36: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

The FFT

• Fast Fourier Transform

– A much, much faster way to do the DFT

– Introduced by Carl F. Gauss in 1805

– Rediscovered by J.W. Cooley and John Tukeyin 1965

– The Cooley-Tukey algorithm is the one we use today (mostly)

– Big O notation for this is O(N log N)

– Matlab functions fft and ifft are standard.

36ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 37: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Windowing

• A function that is zero-valued outside of some chosen interval.

– When a signal (data) is multiplied by a window function, the product is zero-valued outside the interval: all that is left is the "view" through the

window.

x[n] w[n] z[n]

x =

Example: windowing x[n] with a rectangular window

37ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 38: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Some famous windows

• Rectangular

𝑤 𝑛 = 1

• Triangular (Bartlett)

𝑤 𝑛 =2

𝑁−1

𝑁−1

2− 𝑛 −

𝑁−1

2

• Hann

𝑤 𝑛 = 0.5 1 − cos2𝜋𝑛

𝑁−1

Note: we assume w[n] = 0 outside some range [0,N]

sample

am

plit

ude

sample

am

plit

ude

sample

am

plit

ude

38ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 39: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Why window shape matters

• Don’t forget that a DFT assumes the signal in the window is periodic

• The boundary conditions mess things up…unless you manage to have a window whose length is exactly 1 period of your signal

• Making the edges of the window less prominent helps suppress undesirable artifacts

39ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 40: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Fourier Transform of Windows

-4 -2 0 2 4-30

-20

-10

0

10

20

30

40

Normalized angular frequency

Am

plit

ude (

dB

)

Main lobe

SidelobesWe want- Narrow main lobe- Low sidelobes

40ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 41: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Which window is better?

-4 -2 0 2 4-150

-100

-50

0

50

Normalized angular frequency

Am

plit

ude (

dB

)

-4 -2 0 2 4-60

-40

-20

0

20

40

Normalized angular frequency

Am

plit

ude (

dB

)

Hann window

𝑤 𝑛 = 0.5 1 − cos2𝜋𝑛

𝑁 − 1

Hamming window

𝑤 𝑛 = 0.54 − 0.46 × cos2𝜋𝑛

𝑁 − 1

41ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 42: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Multiplication v.s. Convolution

Time domain Frequency Domain

𝑥[𝑛] ∙ 𝑦[𝑛]1

𝑁𝑋[𝑘] ∗ 𝑌[𝑘]

𝑥[𝑛] ∗ 𝑦[𝑛] 𝑋[𝑘] ∙ 𝑌[𝑘]

• Windowing is multiplication in time domain, so the spectrum will be a convolution between the signal’s spectrum and the window’s spectrum

• Convolution in time domain takes 𝑂(𝑁2), but if we perform in the frequency domain…• FFT takes 𝑂 𝑁 log𝑁• Multiplication takes 𝑂 𝑁• IFFT takes 𝑂 𝑁 log𝑁

42ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 43: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Windowed Signal

ECE 477 - Computer Audition, Zhiyao Duan 2019 43

0 50 100 150 200 250 300 350 400-3

-2

-1

0

1

2

3

0 50 100 150 200 250 300 350 400-3

-2

-1

0

1

2

3

Page 44: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Spectrum of Windowed Signal

0 1000 2000 3000 4000 5000-80

-60

-40

-20

0

20

40

Frequency (Hz)

Am

plit

ude (

dB

)

• Two sinusoids: 1000Hz + 1500Hz

• Sampling rate: 10KHz

• Window length: 100 (i.e. 100/10K = 0.01s)

• FFT length: 400 (i.e. 4 times zero padding)

44ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 45: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Zero Padding

• Add zeros after (or before) the signal to make it longer

• Perform DFT on the padded signal

ECE 477 - Computer Audition, Zhiyao Duan 2019 45

0 200 400 600 800 1000 1200 1400 1600-3

-2

-1

0

1

2

3

Windowed signal

Padded zeros

Page 46: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Why Zero Padding?

• Zero padding in time domain gives the ideal interpolation in the frequency domain.

• It doesn’t increase (the real) frequency resolution!

– 4 times is generally enough

– Here the resolution is always fs/L=100Hz

0 1000 2000 3000 4000 5000-80

-60

-40

-20

0

20

40

Frequency (Hz)

Am

plit

ude (

dB

)

0 1000 2000 3000 4000 5000-80

-60

-40

-20

0

20

40

Frequency (Hz)

Am

plit

ude (

dB

)

No zero padding 4 times zero padding

0 1000 2000 3000 4000 5000-80

-60

-40

-20

0

20

40

Frequency (Hz)

Am

plit

ude (

dB

)

8 times zero padding

46ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 47: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

How to increase frequency resolution?

• Time-frequency resolution tradeoff∆𝑡 ⋅ ∆𝑓 = 1

(second) (Hz)

0 1000 2000 3000 4000 5000-80

-60

-40

-20

0

20

40

60

Frequency (Hz)

Am

plit

ude (

dB

)

0 1000 2000 3000 4000 5000-100

-50

0

50

Frequency (Hz)

Am

plit

ude (

dB

)

0 1000 2000 3000 4000 5000-80

-60

-40

-20

0

20

40

Frequency (Hz)

Am

plit

ude (

dB

)

47ECE 477 - Computer Audition, Zhiyao Duan 2019

Window length: 10ms Window length: 20ms Window length: 40ms

Page 48: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Short time Fourier Transform

•Break signal into frames•Window each frame•Calculate DFT of each windowed frame

48ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 49: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

The Spectrogram

• There is a “spectrogram” function in matlab.

49ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 50: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

A Fun Example

(Thanks to Robert Remez)

50ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 51: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Overlap-Add Synthesis

• IDFT on each spectrum.

– Use the complex, full spectrum.

– Don’t forget the phase (often using the original phase).

– If you do it right, the time signal you get is real.

• (optional) Multiply with a synthesis window (e.g., Hamming) to suppress signals at edges.

– Not dividing the analysis window

• Overlap and add different frames together.

51ECE 477 - Computer Audition, Zhiyao Duan 2019

Page 52: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Constant Overlap Add (COLA)

• Windows of all frames add up to a constant function. Perfect reconstruction!

𝑚

𝑤[𝑛 −𝑚𝑅] = 𝑐𝑜𝑛𝑠𝑡

• Requires special design of 𝑤 and 𝑅

– Rectangular window: 𝑅 ≤ 𝐿

– Triangular window: 𝑅 =𝐿

𝑘, 𝑘 ≥ 2, 𝑘 ∈ ℕ

– Hamming/hann window: 𝑅 =𝐿

2𝑘, k ∈ ℕ

ECE 477 - Computer Audition, Zhiyao Duan 2019 52

Window function

Frame index

Frame hop size

Window size

Page 53: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Shepard Tones

Continuous Risset scale

Barber’s pole

ECE 477 - Computer Audition, Zhiyao Duan 2019 53

Page 54: Topic 2 - University of Rochesterzduan/teaching/ece477... · roughly the sound of a mosquito flying 3 meters away. • dbFS –relative to digital full-scale. 0 dbFS is the maximum

Shepard Tones

• Make a sound composed of sine waves spaced at octave intervals.

• Control their amplitudes by imposing a Gaussian (or something like it) filter in the log-frequency dimension.

• Move all the sine waves up a musical ½ step.

• Wrap around in frequency.

ECE 477 - Computer Audition, Zhiyao Duan 2019 54