TIME-SHIFTED PRINCIPAL COMPONENT ANALYSIS BASED CUE EXTRACTION FOR STEREO AUDIO SIGNALS

Jianjun HE, Ee-Leng Tan, Woon-Seng Gan

Digital Signal Processing Lab, School of EEE, Nanyang Technological University, Singapore

15th May 2013

Email: [email protected]



Outline


1. Introduction

2. Stereo signal model

3. Cue extraction using PCA

4. Cue extraction using Shifted PCA

5. Experimental results

6. Conclusions

Introduction

Cue: Where does the sound come from?

Ambience: Where are you?

Stereo Signal Model

Signal = Cue + Ambience

$x_L = c_L + a_L$ (left channel)

$x_R = c_R + a_R$ (right channel)

Assumptions

• Cues highly correlated: $c_R = k c_L$
• Ambience uncorrelated: $a_L \perp a_R$
• Cue and ambience uncorrelated: $c_L \perp a_L$, $c_R \perp a_R$
• Ambience power balanced: $P_{a_L} = P_{a_R}$
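The signal model and its four assumptions can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `synthesize_stereo`, the use of white Gaussian noise as ambience, and the way the ambience is scaled to hit a target cue energy ratio are assumptions made here, not details taken from the slides.

```python
import numpy as np

def synthesize_stereo(cue, k, cer, rng):
    """Build x_L = c_L + a_L and x_R = k*c_L + a_R (hypothetical helper).

    cue : 1-D array used as the left-channel cue c_L
    k   : panning factor, so that c_R = k * c_L
    cer : target ratio of total cue energy to cue-plus-ambience energy, in (0, 1]
    rng : numpy.random.Generator used to draw the ambience
    """
    c_l = np.asarray(cue, dtype=float)
    c_r = k * c_l                                  # cues fully correlated
    cue_energy = np.sum(c_l**2) + np.sum(c_r**2)
    amb_energy = cue_energy * (1.0 - cer) / cer    # total ambience energy
    a_l = rng.standard_normal(c_l.size)
    a_r = rng.standard_normal(c_l.size)            # independent draw: uncorrelated ambience
    # Give each channel half of the ambience energy (balanced ambience power).
    a_l *= np.sqrt(0.5 * amb_energy / np.sum(a_l**2))
    a_r *= np.sqrt(0.5 * amb_energy / np.sum(a_r**2))
    return c_l + a_l, c_r + a_r, (c_l, c_r, a_l, a_r)
```

The returned tuple keeps the true cues and ambience so that an extraction method can later be scored against them.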

Stereo Signal Model

Cue panning factor (CPF):

$$k = \frac{c_R}{c_L} = \frac{r_{RR} - r_{LL}}{2 r_{LR}} + \sqrt{\left(\frac{r_{RR} - r_{LL}}{2 r_{LR}}\right)^2 + 1}$$

Cue energy ratio (CER):

$$\mathrm{CER} = \frac{\text{Total cue energy}}{\text{Total signal energy}} = \frac{2 r_{LR} + (r_{RR} - r_{LL})\,k}{(r_{RR} + r_{LL})\,k}, \quad \mathrm{CER} \in [0, 1]$$

$r_{LL}$, $r_{RR}$: autocorrelation of the left and right channels; $r_{LR}$: cross-correlation between the left and right channels.

Panning scale for $k$: 1/10 (left), 1 (center), 10 (right).
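As a rough numerical check of the CPF and CER expressions, the sketch below estimates both from a pair of channel signals. It assumes the correlations are taken at zero lag as plain inner products and that the cross-correlation is positive; the function name `estimate_cpf_cer` is hypothetical.

```python
import numpy as np

def estimate_cpf_cer(x_l, x_r):
    """Estimate the cue panning factor k and the cue energy ratio (CER).

    Assumes zero-lag correlations computed as inner products and r_LR > 0.
    """
    r_ll = np.dot(x_l, x_l)          # autocorrelation of the left channel
    r_rr = np.dot(x_r, x_r)          # autocorrelation of the right channel
    r_lr = np.dot(x_l, x_r)          # cross-correlation between the channels
    d = (r_rr - r_ll) / (2.0 * r_lr)
    k = d + np.sqrt(d**2 + 1.0)      # cue panning factor
    cer = (2.0 * r_lr + (r_rr - r_ll) * k) / ((r_rr + r_ll) * k)
    return k, cer
```

For a noiseless pair such as `x_r = 3 * x_l`, the two formulas reduce to k = 3 and CER = 1; with added uncorrelated ambience the estimates only approach the true values as the signal gets longer.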

Cue extraction using Principal component analysis (PCA)

$$u_0 = \arg\max_{u_0} \left( \left| u_0^H x_L \right|^2 + \left| u_0^H x_R \right|^2 \right)$$

$$x_L = c_L + a_L, \quad x_R = k c_L + a_R$$

$$\lambda_0 = 0.5\left( r_{LL} + r_{RR} + \sqrt{(r_{LL} - r_{RR})^2 + 4 r_{LR}^2} \right)$$

$$u_0 = r_{LR}\, x_L + (\lambda_0 - r_{LL})\, x_R$$

$$\hat{c}_L = \frac{u_0^H x_L}{u_0^H u_0}\, u_0, \quad \hat{c}_R = \frac{u_0^H x_R}{u_0^H u_0}\, u_0$$

$$\hat{c}_L = \frac{x_L + k x_R}{1 + k^2}, \quad \hat{c}_R = k\,\frac{x_L + k x_R}{1 + k^2}$$

[Figure: vector diagram of $x_L$, $x_R$, $\hat{c}_L$, $\hat{c}_R$, $u_0$ and $u_1$]

M. Goodwin and J. M. Jot, “Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement,” IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, April 2007.
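The PCA step can be sketched directly from the eigenvalue and projection formulas on this slide. This is a minimal sketch for real-valued signals, with correlations taken as zero-lag inner products; the name `pca_extract_cues` is hypothetical, and the code assumes the principal component $u_0$ is nonzero (for example, $r_{LR} \neq 0$).

```python
import numpy as np

def pca_extract_cues(x_l, x_r):
    """Project both channels onto the principal component u_0."""
    r_ll = np.dot(x_l, x_l)
    r_rr = np.dot(x_r, x_r)
    r_lr = np.dot(x_l, x_r)
    # Largest eigenvalue of the 2x2 correlation matrix [[r_ll, r_lr], [r_lr, r_rr]]
    lam0 = 0.5 * (r_ll + r_rr + np.sqrt((r_ll - r_rr)**2 + 4.0 * r_lr**2))
    # Principal component expressed as a combination of the two channels
    u0 = r_lr * x_l + (lam0 - r_ll) * x_r
    norm = np.dot(u0, u0)
    c_l_hat = (np.dot(u0, x_l) / norm) * u0   # extracted left cue
    c_r_hat = (np.dot(u0, x_r) / norm) * u0   # extracted right cue
    return c_l_hat, c_r_hat
```

Both outputs are scaled copies of the same vector `u0`, so the extracted cues are always perfectly aligned in time regardless of any delay between the input channels.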

PCA based cue extraction

$$\hat{c}_L = c_L + \frac{a_L + k a_R}{1 + k^2}, \quad \hat{c}_R = c_R + k\,\frac{a_L + k a_R}{1 + k^2}$$

Problems remain with partially correlated cues.

$n_L$, $n_R$: uncorrelated components decomposed from partially correlated cues

[Equation: error terms in $n_L$, $n_R$ and $k$; not recoverable from the transcript]

Practically, ITD ≡ 0

ECR = Error power / True cue power

Extraction similarity = Correlation($\hat{c}_{L(R)}$, $c_{L(R)}$)

Localization parameters:
• Inter-channel time difference (ITD)
• Inter-channel level difference (ILD)

[Figure: Performance of PCA based cue extraction with varying cue correlation (k=3). (a) ECR of extracted cues in two channels; (b) extracted and true cue correlation in left channel; (c) extracted and true cue correlation in right channel; (d) ILD extraction error of the cues (dB). Legend values: 0.5, 0.7, 0.9.]
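The evaluation measures named on this slide can be written as small helper functions. This is a sketch under stated assumptions: the similarity is taken as the zero-lag normalized correlation without mean removal, and the ILD is taken as right-channel power over left-channel power in dB; the slides do not spell out either convention, and the function names are hypothetical.

```python
import numpy as np

def ecr(c_hat, c):
    """Error to cue power ratio: error power divided by true cue power."""
    return np.sum((c_hat - c)**2) / np.sum(c**2)

def similarity(c_hat, c):
    """Normalized zero-lag correlation between extracted and true cue."""
    return np.dot(c_hat, c) / np.sqrt(np.dot(c_hat, c_hat) * np.dot(c, c))

def ild_db(c_l, c_r):
    """Inter-channel level difference in dB (right over left, by assumption)."""
    return 10.0 * np.log10(np.sum(c_r**2) / np.sum(c_l**2))
```

Under this ILD convention, a cue panned with k = 3 has an ILD of 20·log10(3) dB, roughly 9.5 dB.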

PCA based cue extraction

Problems remain with partially correlated cues.

So what can we do?

Shifted PCA based cue extraction

Stereo input signal → ITD Estimation → Time Shifting → PCA Decomposition → Output Mapping → Cues

[Figure: A simple example of ITD estimation. Inter-channel correlation coefficient versus lag (-50 to 50), with the ITD marked.]
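The shifted PCA chain (ITD estimation, time shifting, PCA decomposition, output mapping) can be sketched as follows. Several choices here are assumptions rather than details from the slides: the ITD is an integer number of samples found at the peak of the inter-channel correlation coefficient, shifts are circular (`np.roll`), a positive ITD means the right channel lags the left, and the function names are hypothetical. The PCA step uses the closed form $\hat{c}_L = (x_L + k x_R)/(1+k^2)$ with $k$ from the CPF formula, which assumes a positive cross-correlation after alignment.

```python
import numpy as np

def estimate_itd(x_l, x_r, max_lag):
    """Lag (in samples) that maximizes the inter-channel correlation coefficient."""
    lags = np.arange(-max_lag, max_lag + 1)
    denom = np.sqrt(np.dot(x_l, x_l) * np.dot(x_r, x_r))
    # np.roll(x_r, -lag) advances the right channel by `lag` samples (circularly)
    coeffs = [np.dot(x_l, np.roll(x_r, -lag)) / denom for lag in lags]
    return int(lags[int(np.argmax(coeffs))])

def spca_extract_cues(x_l, x_r, max_lag=50):
    """Shifted PCA sketch: align channels, run PCA, then restore the delay."""
    itd = estimate_itd(x_l, x_r, max_lag)          # ITD estimation
    x_r_aligned = np.roll(x_r, -itd)               # time shifting
    r_ll = np.dot(x_l, x_l)
    r_rr = np.dot(x_r_aligned, x_r_aligned)
    r_lr = np.dot(x_l, x_r_aligned)
    d = (r_rr - r_ll) / (2.0 * r_lr)
    k = d + np.sqrt(d**2 + 1.0)                    # cue panning factor
    c_l_hat = (x_l + k * x_r_aligned) / (1.0 + k**2)   # PCA decomposition
    c_r_hat = np.roll(k * c_l_hat, itd)            # output mapping: put the delay back
    return c_l_hat, c_r_hat, itd
```

Because the delay is reapplied in the output mapping, the extracted right cue keeps the estimated ITD instead of being forced into alignment with the left cue.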

Performance comparison between PCA and SPCA

Synthesized signals:
• Cue: speech amplitude panned by 3 and shifted by 40 time units
• Ambience: uncorrelated white Gaussian noise
• CER: [0.5, 1]

[Figure: six panels plotted against cue energy ratio (0.5 to 1). Error to cue energy ratio in extracted left channel cue (ECR_L); error to cue energy ratio in extracted right channel cue (ECR_R); correlation between extracted cue and input cue in left channel; correlation between extracted cue and input cue in right channel; ITD in the cues (lag); ILD in the cues (dB). Legends: input, PCA, SPCA.]

Conclusions

In practice, cues in a stereo signal can be time shifted and amplitude panned.

                   SPCA         PCA
ITD                Retained     Lost
Cue correlation    Increased    Dropped
ILD                Corrected    Exaggerated
Error              Reduced      Higher
Similarity         Increased    Lower

With the time-shifting operation, SPCA outperforms PCA in cue extraction.

