E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 {...

16
E85.2607: Lecture 6 – Time-segment Processing E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 1 / 16

Transcript of E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 {...

Page 1: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

E85.2607: Lecture 6 – Time-segment Processing

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 1 / 16

Page 2: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

Variable speed replay

How can we modify a sound to make ‘faster’ or ‘slower’?i.e. song played at half tempoWhy not just slow it down?

xs(t) = xo(vt), v = speedup factor (> 1→ faster)

equivalent to playback at a different sampling rate(resample in Matlab)

2.35 2.4 2.45 2.5 2.55 2.6-0.1

-0.05

0

0.05

0.1

2.35 2.4 2.45 2.5 2.55 2.6-0.1

-0.05

0

0.05

0.1

time/s

Original

2x slower

v = 12

Source: Dan Ellis, EE6820 Lecture 1

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 2 / 16

Page 3: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

Variable speed replay in the frequency domain

Compress in time → shiftfrequencies higher

“chipmunk” effect

Beware aliasing

LPF before resampling

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

Variable speed replay (v=1), time domain signals

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(v=0.5)

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(v=2)

n !

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

Variable speed replay (v=1), spectra

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(v=0.5)

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(v=2)

f in Hz !

Source: DAFX: Fig. 7.1

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 3 / 16

Page 4: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

Timescale modification

Want to change timeaxis

Leave pitch alone

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

Time stretching (!=1), time domain signals

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=0.5)

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=2)

n "

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

Time stretching (!=1), spectra

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=0.5)

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=2)

f in Hz "

Source: DAFX: Fig. 7.4

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 4 / 16

Page 5: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

Block processing

time / s

4

4

1

1

3

3

2

2

5

5

Chop signal into short segments

Resample individual segments,but keep time axis

Will this work?

c = 1;for i = 0:N:length(x)y(:,c) = x(i + [1:N]);c = c + 1;

end

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 5 / 16

Page 6: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

TSM in the time-domain

Problem: want to preserve local time structurebut alter global time structureRepeat or discard segments

but: artifacts from abrupt edges

Cross-fade & overlap-add (OLA)

ym[mL + n] = ym−1[mL + n] + w [n] · x [bvmc L + n]

2.35 2.4 2.45 2.5 2.55 2.6-0.1

0

0.1

4.7 4.75 4.8 4.85 4.9 4.95-0.1

0

0.1

1

1

1 1 2 2 3 3 4 4 5 5 6

2

2

3

3

4

4

5 6

6

5

time / s

time / sSource: Dan Ellis, EE6820 Lecture 1

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 6 / 16

Page 7: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

Aside: Fast convolution (fftfilt)

Source: Dan Ellis, EE4810: Lecture 3

Use FFT to implement block-wise convolution: O(N log N) vs O(N2)

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 7 / 16

Page 8: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

Synchronous overlap-add (SOLA)

Idea: allow some leeway in placing window to optimize alignment ofwaveforms to minimize artifacts

1

2

Km maximizes alignment of 1 and 2

Hence,

ym[mL + n] = ym−1[mL + n] + w [n] · x [bvmc L + n + Km]

Where Km chosen by cross-correlation (xcorr):

Km = argmax0≤K≤Ku

∑Novn=0 ym−1[mL + n] · x [bvmc L + n + K ]√∑(ym−1[mL + n])2

∑(x [bvmc L + n + K ])2

Source: Dan Ellis, EE6820 Lecture 1

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 8 / 16

Page 9: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

Pitch-synchronous OLA (PSOLA): Analysis

Assuming input signal has pitch, can get even cleaner TSM

1 Identify pitch period T usingauto-correlation

2 Find peaks corresponding topitch pulse

3 Segment signal using length 2Twindow centered on each pitchpulse

Pitch Synchronous Analysis

P

pitch marks

PSOLA analysis

segments

Source: DAFX: Fig. 7.9

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 9 / 16

Page 10: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

PSOLA: Synthesis

1 Set up grid of pitch marks foroutput segments

2 Match up closest input segmentto each output pitch mark

3 Overlap-add...

pitch marks

segments

PSOLA time stretching

Synthesispitch marks

Overlap and add

Source: DAFX: Fig. 7.10

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 10 / 16

Page 11: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

PSOLA gotchas

Pitch Synchronous Analysis

P

pitch marks

PSOLA analysis

segments

Source: DAFX: Fig. 7.8

PSOLA analysis assumes constant pitchUse more sophisticated pitch tracker

What if there is no pitch?Fall back to OLA/SOLAComplicates analysis – how to decide whether given section has pitch ornot?

Tends to introduce artifacts on noisy sectionsRepeating noisy segments introduces artificial periodicity (e.g. repeatedtransients)Fix by reversing repeated segments

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 11 / 16

Page 12: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

Pitch-shifting

Leave the time-axis alone, but shift frequency-axis

Approach: Resample to shift both, then correct time axis

or vice-versa

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

Pitch scaling (!=1), time domain signals

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=0.5)

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=2)

n "

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

Pitch scaling (!=1), spectra

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=0.5)

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=2)

f in Hz "

Resampling(ratio N /N )1 2

x(n) y(n)Time Scaling(ratio N /N )2 1

Source: DAFX: Fig. 7.13

v =N2

N1

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 12 / 16

Page 13: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

Pitch-shifting example

0 2000 4000 6000 8000

−0.5

0

0.5x(

n)

Pitch scaling (!=1), time domain signals

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=0.5)

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=2)

n "

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

Pitch scaling (!=1), spectra

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=0.5)

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=2)

f in Hz "

Resampling(ratio N /N )1 2

x(n) y(n)Time Scaling(ratio N /N )2 1

Source: DAFX: Fig. 7.12

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 13 / 16

Page 14: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

TSM with the Spectrogram

Just stretch out the spectrogram?

Time

Fre

quen

cy

0 0.2 0.4 0.6 0.8 1 1.2 1.40

1000

2000

3000

4000

Time

Fre

quen

cy

0 0.2 0.4 0.6 0.8 1 1.2 1.40

1000

2000

3000

4000

how to resynthesize?spectrogram is only |Y [k,m]|Source: Dan Ellis, EE6820 Lecture 1

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 14 / 16

Page 15: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

A look ahead: Phase vocoder

Time/pitch modification in thetime-frequency domain

Like OLA, but also take DFT of eachsegment

Like STFT, but special attention paid tophase...

Magnitude from ‘stretched’ spectrogram:

|Y [k,m]| = |X [k , vr ]|

But preserve phase increment between slices:

θ̇Y [k ,m] = θ̇X [k , vm]

Magnitude Phase

Sh

ort

-tim

eF

ou

rie

rtr

an

sfo

rm

FFT

Windowing

Time-domainsignal

An

aly

sis

Sta

ge

Syn

the

sis

Sta

ge

IFFT

Magnitude and Phasecalculation

Overlap-add

Time-domainsignal

Time-frequencyProcessing

Inve

rse

sho

rt-t

ime

Fo

urie

rtr

an

sfo

rm

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 15 / 16

Page 16: E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 { Time-segment Processing 2010-03-04 2 / 16 Variable speed replay in the frequency domain

Reading

DAFX 7.1–7.4 TSM, SOLA, PSOLA

DAFX 8.1–8.2 Phase vocoder basics

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 16 / 16