Communications & Multimedia Signal Processing Meeting 7 Esfandiar Zavarehei Department of Electronic...

Communications & Multimedia Signal Processing

Meeting 7

Esfandiar Zavarehei

Department of Electronic and Computer Engineering

Brunel University

23 November, 2005

Contents

• Kalman Filter: Speech and noise tracking

• HNM Model: The degree of “Harmonicity”

• Bandwidth extension

• Future work: noise reduction using HNM

Kalman Filter: Speech and noise tracking

• Previous method: Modelling speech with an AR model

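State-space model of the DFT trajectory in frequency bin $r$ at frame $n$ (X: Noisy, S: Speech, D: Noise):

$$\mathbf{S}_r(n) = \mathbf{F}_r(n)\,\mathbf{S}_r(n-1) + \mathbf{G}\,e_r(n)$$

$$X_r(n) = \mathbf{H}\,\mathbf{S}_r(n) + D_r(n)$$

$$\mathbf{S}_r(n) = \begin{bmatrix} S_r(n-N+1) & \cdots & S_r(n) \end{bmatrix}^T$$

$$S_r(n) = \sum_{k=1}^{N} a_k(n)\,S_r(n-k) + e_r(n)$$

$$\mathbf{F}_r(n) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_N(n) & a_{N-1}(n) & a_{N-2}(n) & \cdots & a_1(n) \end{bmatrix}$$

Kalman update, where $\hat{\mathbf{S}}_r^{-}(n)$ is the predicted state, $\mathbf{P}_r(n)$ the predicted error covariance and $\sigma_r^2(n)$ the noise variance:

$$\hat{\mathbf{S}}_r(n) = \hat{\mathbf{S}}_r^{-}(n) + \mathbf{K}_r(n)\left[X_r(n) - \mathbf{H}\,\hat{\mathbf{S}}_r^{-}(n)\right]$$

$$\mathbf{K}_r(n) = \mathbf{P}_r(n)\,\mathbf{H}^T\left[\mathbf{H}\,\mathbf{P}_r(n)\,\mathbf{H}^T + \sigma_r^2(n)\right]^{-1}$$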
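The recursion on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's implementation: it assumes a real-valued trajectory, fixed AR coefficients, a state ordered oldest sample first, and the hypothetical names `kalman_ar_track`, `q` (excitation variance) and `sigma2` (noise variance).

```python
def kalman_ar_track(x, a, q, sigma2):
    """Kalman-filter a noisy sequence x with an AR(N) model of the clean signal.

    a      : AR coefficients [a_1, ..., a_N] (assumed fixed over time here)
    q      : variance of the AR excitation e(n) (assumed known)
    sigma2 : variance of the additive observation noise D(n)
    """
    N = len(a)
    c = list(reversed(a))                      # last row of F: [a_N, ..., a_1]
    s = [0.0] * N                              # state [S(n-N+1), ..., S(n)]
    P = [[float(i == j) for j in range(N)] for i in range(N)]
    out = []
    for xn in x:
        # Predict the state: shift up, newest element from the AR recursion.
        s_pred = s[1:] + [sum(c[k] * s[k] for k in range(N))]
        # Predict the covariance: P_pred = F P F^T + G q G^T.
        FP = [row[:] for row in P[1:]]
        FP.append([sum(c[k] * P[k][j] for k in range(N)) for j in range(N)])
        P_pred = [FP[i][1:] + [sum(FP[i][k] * c[k] for k in range(N))]
                  for i in range(N)]
        P_pred[N - 1][N - 1] += q
        # Gain for H = [0, ..., 0, 1]: last column over innovation variance.
        denom = P_pred[N - 1][N - 1] + sigma2
        K = [P_pred[i][N - 1] / denom for i in range(N)]
        # Correct with the innovation X(n) - H s_pred.
        innov = xn - s_pred[N - 1]
        s = [s_pred[i] + K[i] * innov for i in range(N)]
        P = [[P_pred[i][j] - K[i] * P_pred[N - 1][j] for j in range(N)]
             for i in range(N)]
        out.append(s[N - 1])
    return out
```

With `sigma2 = 0` the filter trusts the observation completely and returns the input; with a very large `sigma2` it falls back on the AR prediction.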

Kalman Filter: Speech and noise tracking (cont.)

• New method: Modelling speech AND noise with AR models

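AR model of the noise trajectory:

$$D_r(n) = \sum_{k=1}^{M} b_k(n)\,D_r(n-k) + g_r(n)$$

Augmented state-space model:

$$\mathbf{X}_r(n) = \mathbf{A}_r(n)\,\mathbf{X}_r(n-1) + \mathbf{G}_c\,\mathbf{E}_r(n)$$

$$X_r(n) = \mathbf{H}_c\,\mathbf{X}_r(n)$$

$$\mathbf{X}_r(n) = \begin{bmatrix} \mathbf{S}_r^T(n) & \mathbf{D}_r^T(n) \end{bmatrix}^T$$

$$\mathbf{A}_r(n) = \begin{bmatrix} \mathbf{F}_r(n) & \mathbf{0} \\ \mathbf{0} & \mathbf{B}_r(n) \end{bmatrix}$$

$$\mathbf{B}_r(n) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ b_M(n) & b_{M-1}(n) & b_{M-2}(n) & \cdots & b_1(n) \end{bmatrix}$$

Kalman update, where $\hat{\mathbf{X}}_r^{-}(n)$ is the predicted augmented state:

$$\hat{\mathbf{X}}_r(n) = \hat{\mathbf{X}}_r^{-}(n) + \mathbf{K}_{rc}(n)\left[X_r(n) - \mathbf{H}_c\,\hat{\mathbf{X}}_r^{-}(n)\right]$$

$$\mathbf{K}_{rc}(n) = \mathbf{P}_{rc}(n)\,\mathbf{H}_c^T\left[\mathbf{H}_c\,\mathbf{P}_{rc}(n)\,\mathbf{H}_c^T\right]^{-1}$$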
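To make the augmented model concrete, the sketch below builds a block-diagonal transition matrix from a speech AR model and a noise AR model and applies it once. The helper names (`companion`, `block_diag`, `matvec`) and all numeric values are hypothetical; both blocks are assumed to be companion matrices with the coefficients in the last row and the state ordered oldest sample first.

```python
def companion(coeffs):
    """Companion matrix: ones on the superdiagonal, [c_M, ..., c_1] in the last row."""
    m = len(coeffs)
    rows = [[1.0 if j == i + 1 else 0.0 for j in range(m)] for i in range(m - 1)]
    rows.append([float(c) for c in reversed(coeffs)])
    return rows

def block_diag(F, B):
    """Stack F and B on the diagonal, zeros elsewhere."""
    n, m = len(F), len(B)
    return [row + [0.0] * m for row in F] + [[0.0] * n + row for row in B]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

a = [0.9]          # toy speech AR(1) coefficients (illustrative values)
b = [0.5, -0.2]    # toy noise AR(2) coefficients (illustrative values)
A = block_diag(companion(a), companion(b))

# H_c picks the newest speech sample and the newest noise sample,
# so the observation is their sum and needs no separate noise term.
H_c = [0.0] * (len(a) - 1) + [1.0] + [0.0] * (len(b) - 1) + [1.0]

state = [1.0, 2.0, 3.0]            # [S(n-1), D(n-2), D(n-1)]
next_state = matvec(A, state)      # excitation-free prediction of [S(n), D(n-1), D(n)]
observation = sum(h * x for h, x in zip(H_c, next_state))
```

Because the noise is part of the state, the gain expression contains no additive noise-variance term: the observation is an exact linear function of the augmented state.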

Kalman Filter: Speech and noise tracking (cont.)

• The noise AR models are obtained from noise-only periods.

• Results from the new model sound more natural
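One common way to obtain AR coefficients from a noise-only segment is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which estimator was used, so the sketch below is an assumption; `autocorr` and `levinson_durbin` are hypothetical helper names.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of a sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an AR model of the given order.

    Returns (b, err) such that D(n) is predicted by sum_k b[k-1] * D(n-k),
    with err the remaining prediction-error variance.
    """
    b = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(b[k] * r[m - 1 - k] for k in range(m - 1))
        refl = acc / err                      # reflection coefficient
        b = [b[k] - refl * b[m - 2 - k] for k in range(m - 1)] + [refl]
        err *= 1.0 - refl * refl
    return b, err
```

In use, `autocorr` would be applied to samples of a trajectory taken from frames judged to contain noise only, and the resulting coefficients would fill the noise block of the model.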

Samples: Clean Speech, Noisy Speech, Kalman Old, Kalman New

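Scores by method at each input SNR:

| Noise | Method | -5 dB | 0 dB | 5 dB | 10 dB |
|-------|--------|-------|------|------|-------|
| Car   | DFTKUN | 2.41 | 2.80 | 3.13 | 3.43 |
| Car   | DFTKCN | 2.51 | 2.90 | 3.20 | 3.49 |
| Car   | MMSE   | 2.39 | 2.75 | 3.10 | 3.38 |
| Car   | Wiener | 2.36 | 2.74 | 3.10 | 3.36 |
| Car   | PSS    | 2.44 | 2.79 | 3.08 | 3.28 |
| Train | DFTKUN | 1.81 | 2.22 | 2.62 | 2.98 |
| Train | DFTKCN | 1.90 | 2.30 | 2.69 | 3.05 |
| Train | MMSE   | 1.78 | 2.20 | 2.58 | 2.89 |
| Train | Wiener | 1.48 | 1.99 | 2.45 | 2.82 |
| Train | PSS    | 1.65 | 2.12 | 2.51 | 2.84 |
| WGN   | DFTKUN | 1.90 | 2.29 | 2.64 | 2.92 |
| WGN   | DFTKCN | 1.99 | 2.35 | 2.68 | 3.02 |
| WGN   | MMSE   | 1.90 | 2.22 | 2.58 | 2.90 |
| WGN   | Wiener | 1.85 | 2.21 | 2.61 | 2.91 |
| WGN   | PSS    | 1.95 | 2.26 | 2.58 | 2.84 |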

HNM Model

• Each harmonic sub-band is modelled as the sum of a Gaussian-shaped harmonic component and a random noise component

[Figure: the harmonic shape function $W_H(f)$ plotted from -60 Hz to 60 Hz, normalised to unit energy, $\sum_f W_H^2(f) = 1$]

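Amplitude and harmonicity of the $k$th harmonic sub-band, centred at $f_k$ with half-width $\Delta f$:

$$A_k = \sqrt{\sum_{f=f_k-\Delta f}^{f_k+\Delta f} |X(f)|^2}$$

$$V_k = 1 - \frac{1}{A_k}\sqrt{\sum_{f=f_k-\Delta f}^{f_k+\Delta f} \bigl(|X(f)| - A_k W_H(f-f_k)\bigr)^2}$$

$W_H(f)$ is the Gaussian-shaped harmonic function; the slide defines it by an exponential expression involving the constants 2.2 and 60, whose exact form is not recoverable from this transcript.

Sub-band model:

$$|X(f)| = A_k\bigl[V_k\,W_H(f-f_k) + (1-V_k)\,R(f-f_k)\bigr]$$

R: random noise with Rayleigh distribution, normalised to unit energy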
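As a rough illustration of how an amplitude and a harmonicity value could be computed for one sub-band, the sketch below fits a unit-energy Gaussian shape to the sub-band magnitudes. Several things here are assumptions rather than the slide's definitions: the amplitude is taken as the square root of the sub-band energy, the harmonicity as one minus the normalised fitting error (clipped at zero), and the Gaussian width `width_hz` is an arbitrary illustrative value. The function names are hypothetical.

```python
import math

def gaussian_window(offsets_hz, width_hz=25.0):
    """Gaussian-shaped harmonic profile sampled at the given frequency offsets
    and scaled to unit energy. width_hz is an assumed shape parameter."""
    w = [math.exp(-(f / width_hz) ** 2) for f in offsets_hz]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def subband_params(mag, window):
    """Return (A, V) for one sub-band.

    mag    : magnitude-spectrum samples across the sub-band
    window : unit-energy harmonic shape sampled at the same offsets
    """
    A = math.sqrt(sum(m * m for m in mag))
    if A == 0.0:
        return 0.0, 0.0
    err = math.sqrt(sum((m - A * w) ** 2 for m, w in zip(mag, window)))
    V = max(0.0, 1.0 - err / A)
    return A, V

offsets = list(range(-60, 61, 10))
W = gaussian_window(offsets)
A_h, V_h = subband_params([3.0 * w for w in W], W)      # purely harmonic band
A_n, V_n = subband_params([1.0] * len(offsets), W)      # flat, noise-like band
```

A sub-band that matches the Gaussian shape exactly yields a harmonicity of 1, while a flat, noise-like sub-band yields a much lower value.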

HNM Model

• Sample Reconstructed Frame

[Figure: original and reconstructed spectra of a sample frame; x-axis: Hz (0 to 8000), y-axis: dB]

HNM Model

[Figure, three panels: Ak versus harmonic index; Vk versus harmonic index; original and synthesized spectra in dB versus Hz (0 to 8000). Synthesized, PESQ: 3.91]

HNM Model

• Noise severely affects the amplitudes Ak

• Pitch, harmonicity and harmonic frequencies are much less distorted by noise

• Simple analysis/synthesis of noisy speech improves its quality at SNRs below 10 dB

[Figure, two panels ("Clean Pitch" and "Noisy Pitch"): PESQ versus SNR (dB) from -5 to 20. Curves: All Noisy, Noisy Amp, Noisy Track, Noisy Harmonicity, Clean All, Actual Noisy Signal]

LP-HNM

• Decompose the signal into an LP model (AR or LSF) and an HNM model of the residual (fk, Ak, Vk)

• The amplitudes Ak can be assumed equal, because the residual is whitened by inverse LP modelling

• The frequencies may also be assumed to be multiples of the fundamental frequency (later displaced slightly by LP modelling)

LP-HNM synthesized, PESQ: 3.50
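The two simplifying assumptions above can be illustrated with a short sketch: inverse LP filtering to obtain a whitened residual, and harmonic frequencies placed at integer multiples of the fundamental. The function names are hypothetical and the LP coefficients are assumed to be given.

```python
def lp_residual(x, a):
    """Inverse LP filter: e(n) = x(n) - sum_k a[k-1] * x(n-k).

    a holds the predictor coefficients [a_1, ..., a_p]; samples before
    the start of x are treated as zero.
    """
    out = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        out.append(x[n] - pred)
    return out

def harmonic_frequencies(f0_hz, fs_hz):
    """Multiples of the fundamental up to the Nyquist frequency."""
    count = int((fs_hz / 2) // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]
```

For a signal generated by a first-order AR filter with coefficient 0.5 and a single unit impulse, `lp_residual` recovers the impulse, which is the sense in which the residual is whitened.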

Bandwidth Extension

• One application of the model is bandwidth extension: obtaining 16 kHz speech quality from 8 kHz speech

[Diagram: an 8 kHz speech signal and a 16 kHz speech signal each pass through LP-HNM analysis; the results feed the trained LP-HNM model]

Bandwidth Extension

• Codebook mapping is used to obtain the higher-band LSF coefficients from the lower-band LSF coefficients extracted from the 8 kHz signal

• A similar method is used to obtain the harmonicity degree of the higher sub-bands

[Diagram: the codebook holds LSF 1 to LSF 12; the shadow codebook holds either LSF 1 to LSF 24 or LSF 13 to LSF 24]
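Codebook mapping of this kind can be sketched as a nearest-neighbour lookup: the narrowband vector selects an index in the codebook, and the entry at the same index in the shadow codebook is returned. The function names and the squared Euclidean distance are assumptions, and the vectors in the usage example are toy values, not trained LSFs.

```python
def nearest_index(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((v - c) ** 2 for v, c in zip(vec, codebook[j])),
    )

def map_to_wideband(narrow_vec, codebook, shadow_codebook):
    """Return the shadow-codebook entry paired with the nearest codebook entry."""
    return shadow_codebook[nearest_index(narrow_vec, codebook)]

# Toy example: 2-dimensional narrowband entries paired with 4-dimensional ones.
codebook = [[0.1, 0.2], [0.5, 0.6]]
shadow_codebook = [[0.1, 0.2, 0.7, 0.8], [0.5, 0.6, 0.9, 1.0]]
wideband = map_to_wideband([0.45, 0.62], codebook, shadow_codebook)
```

The same lookup applies whether the shadow codebook stores the full wideband vector or only its upper part; only the stored entries change.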

Bandwidth Extension

• A shadow codebook for LP gain ratio (G8/G16) is used for gain mapping

• Phase is extrapolated assuming a linear phase for the harmonics; random noise is added to the unvoiced sub-bands

[Block diagram of the bandwidth extension system. Blocks: Extract LSF1-12, G8 and Excitation; Harmonicity degree of Excitation Harmonics; Pitch Extraction; three Codebook Mapping blocks (L1-12 to L1-24, Vk8 to Vk16, G8 to G16); Phase Reconstruction (Phase 16KHz); HNM Magnitude Synthesis; LP Magnitude Synthesis; multiplier (x); LPF and HPF branches summed (+)]

• The performance of the system deteriorates in noise

Future Work

• Tracking the HNM parameters using a Kalman filter. In other words, rather than tracking DFT trajectories within fixed frequency bins, it might be better to track only the harmonic bins (reduced computational complexity), following the harmonic frequencies over time (which intuitively makes more sense)

[Figure: frequency (Hz, 0 to 8000) versus frame index (0 to 1000)]
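Restricting the tracking to harmonic bins amounts to mapping each harmonic frequency of a frame to its nearest DFT bin. The sketch below shows that mapping under the assumption that the pitch of the frame is already known; `harmonic_bins` is a hypothetical helper, and the values in the usage line are illustrative.

```python
def harmonic_bins(f0_hz, fs_hz, nfft):
    """Nearest DFT bin index of each harmonic k * f0 up to the Nyquist frequency.

    Returns an empty list for unvoiced frames (f0_hz <= 0).
    """
    if f0_hz <= 0:
        return []
    bins = []
    k = 1
    while k * f0_hz <= fs_hz / 2:
        bins.append(round(k * f0_hz * nfft / fs_hz))
        k += 1
    return bins

bins = harmonic_bins(200.0, 16000.0, 512)   # 40 harmonics instead of 257 bins
```

Since the pitch changes from frame to frame, the bin indices change too, so each trajectory follows a harmonic rather than a fixed frequency.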

Future Work

• Some harmonics proved very difficult to recover from noise (e.g. harmonics 1-3). Investigate a model-based approach, similar to the BWE method, for estimating the parameters of those harmonics. The harmonicity of the sub-bands and the reciprocal of the noise level at those frequencies may be used as weights in the mapping process.

Clean Speech

De-noised Speech

Future Work

• A is the parameter vector, W is the weighting vector (e.g. the reciprocal of the normalized noise spectrum), and Bj is the jth entry of the codebook.

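$$\hat{\mathbf{A}} = \arg\min_{\mathbf{B}_j} \sum_i W_i\,\bigl(A_i - B_{j,i}\bigr)^2$$

$$\tilde{\mathbf{A}} = \mathbf{W} \odot \mathbf{A} + (1 - \mathbf{W}) \odot \hat{\mathbf{A}}$$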

• The result can be used for reconstructing speech
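The weighted codebook search described on this slide can be sketched as follows. The name `weighted_search` and the toy values are hypothetical; the weights are assumed to lie between 0 (element dominated by noise) and 1 (element reliable).

```python
def weighted_search(A, W, codebook):
    """Return the codebook entry B_j minimising sum_i W[i] * (A[i] - B_j[i])**2."""
    def cost(B):
        return sum(w * (a - b) ** 2 for a, w, b in zip(A, W, B))
    return min(codebook, key=cost)

# Toy example: the second element of A is unreliable, so its weight is 0.
A = [1.0, 5.0]
W = [1.0, 0.0]
codebook = [[1.0, 0.0], [0.0, 5.0]]
best = weighted_search(A, W, codebook)
unweighted = weighted_search(A, [1.0, 1.0], codebook)
```

In the toy example the weighted search ignores the unreliable element and picks the first entry, whereas an unweighted search would be pulled towards the second entry by the corrupted value.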
