E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 {...

E85.2607: Lecture 6 – Time-segment Processing

E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 1 / 16

Variable speed replay

How can we modify a sound to make ‘faster’ or ‘slower’?i.e. song played at half tempoWhy not just slow it down?

xs(t) = xo(vt), v = speedup factor (> 1→ faster)

equivalent to playback at a different sampling rate(resample in Matlab)

2.35 2.4 2.45 2.5 2.55 2.6-0.1

-0.05

0

0.05

0.1

2.35 2.4 2.45 2.5 2.55 2.6-0.1

-0.05

0

0.05

0.1

time/s

Original

2x slower

v = 12

Source: Dan Ellis, EE6820 Lecture 1


http://www.ee.columbia.edu/~dpwe/e6820/sounds/mpgr1_sx419-8k.wav

http://www.ee.columbia.edu/~dpwe/e6820/sounds/mpgr-slowdown.wav

http://www.ee.columbia.edu/~dpwe/e6820/lectures/L01-intro+dsp.pdf

Variable speed replay in the frequency domain

Compress in time → shiftfrequencies higher

“chipmunk” effect

Beware aliasing

LPF before resampling

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

Variable speed replay (v=1), time domain signals

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(v=0.5)

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(v=2)

n !

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

Variable speed replay (v=1), spectra

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(v=0.5)

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(v=2)

f in Hz !

Source: DAFX: Fig. 7.1


Timescale modification

Want to change timeaxis

Leave pitch alone

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

Time stretching (!=1), time domain signals

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=0.5)

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=2)

n "

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

Time stretching (!=1), spectra

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=0.5)

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=2)

f in Hz "



Block processing

time / s

4

4

1

1

3

3

2

2

5

5

Chop signal into short segments

Resample individual segments,but keep time axis

Will this work?

c = 1;for i = 0:N:length(x)y(:,c) = x(i + [1:N]);c = c + 1;

end


TSM in the time-domain

Problem: want to preserve local time structurebut alter global time structureRepeat or discard segments

but: artifacts from abrupt edges

Cross-fade & overlap-add (OLA)

ym[mL + n] = ym−1[mL + n] + w [n] · x [bvmc L + n]

2.35 2.4 2.45 2.5 2.55 2.6-0.1

0

0.1

4.7 4.75 4.8 4.85 4.9 4.95-0.1

0

0.1

1

1

1 1 2 2 3 3 4 4 5 5 6

2

2

3

3

4

4

5 6

6

5

time / s

time / sSource: Dan Ellis, EE6820 Lecture 1


http://www.ee.columbia.edu/~dpwe/e6820/sounds/mpgr-fixwin.wav


Aside: Fast convolution (fftfilt)

Source: Dan Ellis, EE4810: Lecture 3

Use FFT to implement block-wise convolution: O(N log N) vs O(N2)


http://www.ee.columbia.edu/~dpwe/e4810/lectures/L03-fourier.pdf

Synchronous overlap-add (SOLA)

Idea: allow some leeway in placing window to optimize alignment ofwaveforms to minimize artifacts

1

2

Km maximizes alignment of 1 and 2

Hence,

ym[mL + n] = ym−1[mL + n] + w [n] · x [bvmc L + n + Km]

Where Km chosen by cross-correlation (xcorr):

Km = argmax0≤K≤Ku

∑Novn=0 ym−1[mL + n] · x [bvmc L + n + K ]√∑(ym−1[mL + n])2

∑(x [bvmc L + n + K ])2

Source: Dan Ellis, EE6820 Lecture 1


http://www.ee.columbia.edu/~dpwe/e6820/sounds/mpgr-solafs.wav


Pitch-synchronous OLA (PSOLA): Analysis

Assuming input signal has pitch, can get even cleaner TSM

1 Identify pitch period T usingauto-correlation

2 Find peaks corresponding topitch pulse

3 Segment signal using length 2Twindow centered on each pitchpulse

Pitch Synchronous Analysis

P

pitch marks

PSOLA analysis

segments



PSOLA: Synthesis

1 Set up grid of pitch marks foroutput segments

2 Match up closest input segmentto each output pitch mark

3 Overlap-add...

pitch marks

segments

PSOLA time stretching

Synthesispitch marks

Overlap and add



PSOLA gotchas

Pitch Synchronous Analysis

P

pitch marks

PSOLA analysis

segments


PSOLA analysis assumes constant pitchUse more sophisticated pitch tracker

What if there is no pitch?Fall back to OLA/SOLAComplicates analysis – how to decide whether given section has pitch ornot?

Tends to introduce artifacts on noisy sectionsRepeating noisy segments introduces artificial periodicity (e.g. repeatedtransients)Fix by reversing repeated segments


Pitch-shifting

Leave the time-axis alone, but shift frequency-axis

Approach: Resample to shift both, then correct time axis

or vice-versa

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

Pitch scaling (!=1), time domain signals

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=0.5)

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=2)

n "

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

Pitch scaling (!=1), spectra

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=0.5)

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=2)

f in Hz "

Resampling(ratio N /N )1 2

x(n) y(n)Time Scaling(ratio N /N )2 1


v =N2

N1


Pitch-shifting example

0 2000 4000 6000 8000

−0.5

0

0.5x(

n)

Pitch scaling (!=1), time domain signals

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=0.5)

0 2000 4000 6000 8000

−0.5

0

0.5

x(n)

(!=2)

n "

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

Pitch scaling (!=1), spectra

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=0.5)

0 1000 2000 3000 4000 5000−80

−60

−40

−20

0

X(f)

(!=2)

f in Hz "

Resampling(ratio N /N )1 2

x(n) y(n)Time Scaling(ratio N /N )2 1



TSM with the Spectrogram

Just stretch out the spectrogram?

Time

Fre

quen

cy

0 0.2 0.4 0.6 0.8 1 1.2 1.40

1000

2000

3000

4000

Time

Fre

quen

cy

0 0.2 0.4 0.6 0.8 1 1.2 1.40

1000

2000

3000

4000

how to resynthesize?spectrogram is only |Y [k,m]|Source: Dan Ellis, EE6820 Lecture 1



A look ahead: Phase vocoder

Time/pitch modification in thetime-frequency domain

Like OLA, but also take DFT of eachsegment

Like STFT, but special attention paid tophase...

Magnitude from ‘stretched’ spectrogram:

|Y [k,m]| = |X [k , vr ]|

But preserve phase increment between slices:

θ̇Y [k ,m] = θ̇X [k , vm]

Magnitude Phase

Sh

ort

-tim

eF

ou

rie

rtr

an

sfo

rm

FFT

Windowing

Time-domainsignal

An

aly

sis

Sta

ge

Syn

the

sis

Sta

ge

IFFT

Magnitude and Phasecalculation

Overlap-add

Time-domainsignal

Time-frequencyProcessing

Inve

rse

sho

rt-t

ime

Fo

urie

rtr

an

sfo

rm


http://www.ee.columbia.edu/~dpwe/e6820/sounds/mpgr-pvoc.wav

Reading

DAFX 7.1–7.4 TSM, SOLA, PSOLA

DAFX 8.1–8.2 Phase vocoder basics


E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 {...

Documents

Transcript of E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 {...