
1. Digital Speech Processing—Lecture 13

Linear Predictive Coding (LPC): Introduction

2. LPC Methods

• LPC methods are the most widely used in speech coding, speech synthesis, speech recognition, speaker recognition and verification, and speech storage.
  – LPC methods provide extremely accurate estimates of speech parameters, and do so extremely efficiently.
  – Basic idea of linear prediction: the current speech sample can be closely approximated as a linear combination of past samples, i.e.,
      s(n) ≈ Σ_{k=1}^{p} α_k s(n−k)   for some value of p and coefficients α_k

3. LPC Methods

• For periodic signals with period N_p, it is obvious that s(n) ≈ s(n − N_p), but that is not what LP is doing; it is estimating s(n) from the p (p << N_p) most recent values of s(n) by linearly predicting its value.
• For LP, the predictor coefficients (the α_k's) are determined (computed) by minimizing, over a finite interval, the sum of squared differences between the actual speech samples and the linearly predicted ones.

4. LPC Methods

• LP is based on speech production and synthesis models:
  – speech can be modeled as the output of a linear, time-varying system, excited by either quasi-periodic pulses or noise;
  – assume that the model parameters remain constant over the speech analysis interval.
• LP provides a robust, reliable and accurate method for estimating the parameters of the linear system (the combined vocal tract, glottal pulse, and radiation characteristic for voiced speech).

5. LPC Methods

• LP methods have been used in control and information theory, where they are called methods of system estimation and system identification.
  – used extensively in speech under a group of names including:
    1. covariance method
    2. autocorrelation method
    3. lattice method
    4. inverse filter formulation
    5. spectral estimation formulation
    6. maximum likelihood method
    7. inner product method

6. Basic Principles of LP

• The speech model is the all-pole system
      H(z) = S(z)/(G·U(z)) = 1/(1 − Σ_{k=1}^{p} a_k z^(−k))
  with time-domain difference equation
      s(n) = Σ_{k=1}^{p} a_k s(n−k) + G·u(n)
• The time-varying digital filter represents the effects of the glottal pulse shape, the vocal tract impulse response, and radiation at the lips.
• The system is excited by an impulse train for voiced speech, or a random noise sequence for unvoiced speech.
• This 'all-pole' model is a natural representation for non-nasal voiced speech, but it also works reasonably well for nasals and unvoiced sounds.


7. LP Basic Equations

• A p-th order linear predictor is a system of the form
      s̃(n) = Σ_{k=1}^{p} α_k s(n−k)  ⟺  P(z) = S̃(z)/S(z) = Σ_{k=1}^{p} α_k z^(−k)
• The prediction error, e(n), is of the form
      e(n) = s(n) − s̃(n) = s(n) − Σ_{k=1}^{p} α_k s(n−k)
• The prediction error is the output of a system with transfer function
      A(z) = E(z)/S(z) = 1 − Σ_{k=1}^{p} α_k z^(−k)
• If the speech signal obeys the production model exactly, and if α_k = a_k, 1 ≤ k ≤ p, then e(n) = G·u(n) and A(z) is an inverse filter for H(z), i.e.,
      H(z) = G/A(z)
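To make the inverse-filter relation concrete, here is a minimal MATLAB sketch (the coefficients and gain are made-up illustrative values, not from the lecture): synthesize s(n) from the all-pole model, then verify that filtering s(n) with A(z) recovers the scaled excitation.

    % Minimal sketch: A(z) as an inverse filter for H(z) = G/A(z).
    % a and G are made-up illustrative values (p = 2, stable poles).
    a = [1.2 -0.6];                 % model coefficients a_1, a_2
    G = 0.5;
    u = zeros(100,1); u(1) = 1;     % impulse excitation (voiced-like)
    s = filter(G, [1 -a], u);       % s(n) = 1.2 s(n-1) - 0.6 s(n-2) + G u(n)
    e = filter([1 -a], 1, s);       % apply A(z) = 1 - 1.2 z^-1 + 0.6 z^-2
    max(abs(e - G*u))               % ~0, i.e., e(n) = G u(n) when alpha_k = a_k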

8. LP Estimation Issues

• Need to determine {α_k} directly from speech such that they give good estimates of the time-varying spectrum.
• Need to estimate {α_k} from short segments of speech.
• Need to minimize the mean-squared prediction error over short segments of speech.
• The resulting {α_k} are assumed to be the actual {a_k} in the speech production model.
⟹ intend to show that all of this can be done efficiently, reliably, and (for speech) accurately.

9. Solution for {αk}

• The short-time average prediction squared error is defined as
      E_n̂ = Σ_m e_n̂²(m) = Σ_m ( s_n̂(m) − s̃_n̂(m) )²
          = Σ_m ( s_n̂(m) − Σ_{k=1}^{p} α_k s_n̂(m−k) )²
• Select a segment of speech s_n̂(m) = s(m + n̂) in the vicinity of sample n̂.
• The key issue to resolve is the range of m for the summation (to be discussed later).

10. Solution for {αk}

• Can find the values of α_k that minimize E_n̂ by setting
      ∂E_n̂/∂α_i = 0,  i = 1, 2, ..., p
  giving the set of equations
      Σ_m s_n̂(m−i) [ s_n̂(m) − Σ_{k=1}^{p} α̂_k s_n̂(m−k) ] = 0,  1 ≤ i ≤ p
      ⟹ Σ_m s_n̂(m−i) e_n̂(m) = 0,  1 ≤ i ≤ p
  where the α̂_k are the values of α_k that minimize E_n̂ (from now on just use α_k rather than α̂_k for the optimum values).
• The prediction error e_n̂(m) is orthogonal to the signal s_n̂(m−i) for delays i of 1 to p.

11. Solution for {αk}

• Defining
      φ_n̂(i,k) = Σ_m s_n̂(m−i) s_n̂(m−k)
  we get
      Σ_{k=1}^{p} α_k φ_n̂(i,k) = φ_n̂(i,0),  i = 1, 2, ..., p
  leading to a set of p equations in p unknowns that can be solved in an efficient manner for the {α_k}.

12. Solution for {αk}

• The minimum mean-squared prediction error has the form
      E_n̂ = Σ_m s_n̂²(m) − Σ_{k=1}^{p} α_k Σ_m s_n̂(m) s_n̂(m−k)
  which can be written in the form
      E_n̂ = φ_n̂(0,0) − Σ_{k=1}^{p} α_k φ_n̂(0,k)
• Process:
  1. compute φ_n̂(i,k) for 1 ≤ i ≤ p, 0 ≤ k ≤ p
  2. solve the matrix equation for the α_k
• Need to specify the range of m over which φ_n̂(i,k) is computed.
• Need to specify s_n̂(m).


13. Autocorrelation Method

• Assumption #1: assume s_n̂(m) exists for 0 ≤ m ≤ L−1 and is exactly zero everywhere else (i.e., a window of length L samples):
      s_n̂(m) = s(m + n̂) w(m),  0 ≤ m ≤ L−1
  where w(m) is a finite-length window of length L samples.
  [Figure: the window selects samples n̂ through n̂+L−1 of s(n), which become s_n̂(m), m = 0, ..., L−1]

14. Autocorrelation Method

• If s_n̂(m) is non-zero only for 0 ≤ m ≤ L−1, then
      e_n̂(m) = s_n̂(m) − Σ_{k=1}^{p} α_k s_n̂(m−k)
  is non-zero only over the interval 0 ≤ m ≤ L−1+p, giving
      E_n̂ = Σ_{m=−∞}^{∞} e_n̂²(m) = Σ_{m=0}^{L+p−1} e_n̂²(m)
• At values of m near 0 (i.e., m = 0, 1, ..., p−1) we are predicting the signal from zero-valued samples outside the window range ⟹ e_n̂(m) will be (relatively) large.
• At values of m near L (i.e., m = L, L+1, ..., L+p−1) we are predicting zero-valued samples (outside the window range) from non-zero samples ⟹ e_n̂(m) will be (relatively) large.
• For these reasons, we normally use windows that taper the segment to zero (e.g., a Hamming window).
  [Figure: error regions m = 0 to p−1 and m = L to L+p−1]

15. Autocorrelation Method
  [Figure]

16. The "Autocorrelation Method"

      R_n̂[k] = Σ_{m=0}^{L−1−k} s_n̂[m] s_n̂[m+k],  k = 0, 1, 2, ..., p
  where s_n̂[m] = s[m + n̂] w[m] over m = 0, ..., L−1 (samples n̂ to n̂+L−1).

17. The "Autocorrelation Method"
  [Figure: windowed segment s_n̂[m] = s[m+n̂]w[m] and prediction error e_n̂[m] = s_n̂[m] − Σ_{k=1}^{p} α_k s_n̂[m−k] over 0 ≤ m ≤ L−1+p, showing large errors at the ends of the window]

18. Autocorrelation Method

• For calculation of φ_n̂(i,k): since s_n̂(m) = 0 outside the range 0 ≤ m ≤ L−1, then
      φ_n̂(i,k) = Σ_{m=0}^{L+p−1} s_n̂(m−i) s_n̂(m−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
  which is equivalent to the form
      φ_n̂(i,k) = Σ_{m=0}^{L−1−(i−k)} s_n̂(m) s_n̂(m+i−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
• There are L − |i−k| non-zero terms in the computation of φ_n̂(i,k) for each value of i and k; one can easily show that
      φ_n̂(i,k) = R_n̂(i−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
  where R_n̂(i−k) is the short-time autocorrelation of s_n̂(m) evaluated at i−k, with
      R_n̂(k) = Σ_{m=0}^{L−1−k} s_n̂(m) s_n̂(m+k)


19. Autocorrelation Method

• Since R_n̂(k) is even, then
      φ_n̂(i,k) = R_n̂(|i−k|),  1 ≤ i ≤ p, 0 ≤ k ≤ p
• Thus the basic equation Σ_{k=1}^{p} α_k φ_n̂(i,k) = φ_n̂(i,0), 1 ≤ i ≤ p, becomes
      Σ_{k=1}^{p} α_k R_n̂(|i−k|) = R_n̂(i),  1 ≤ i ≤ p
  with the minimum mean-squared prediction error of the form
      E_n̂ = φ_n̂(0,0) − Σ_{k=1}^{p} α_k φ_n̂(0,k) = R_n̂(0) − Σ_{k=1}^{p} α_k R_n̂(k)

20. Autocorrelation Method

• As expressed in matrix form, ℜα = r:
      | R_n̂(0)     R_n̂(1)    ..  R_n̂(p−1) | | α_1 |   | R_n̂(1) |
      | R_n̂(1)     R_n̂(0)    ..  R_n̂(p−2) | | α_2 |   | R_n̂(2) |
      |    .           .      ..     .      | |  .  | = |    .   |
      | R_n̂(p−1)   R_n̂(p−2)  ..  R_n̂(0)   | | α_p |   | R_n̂(p) |
  with solution α = ℜ^(−1) r.
• ℜ is a p × p Toeplitz matrix ⟹ symmetric with all diagonal elements equal ⟹ there exist more efficient algorithms to solve for the {α_k} than simple matrix inversion.
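The complete autocorrelation method fits in a few lines of MATLAB. This is a minimal sketch with illustrative variable names (it is not the course's autolpc routine): window a segment, compute R_n̂(0), ..., R_n̂(p), and solve the Toeplitz system directly; the more efficient Levinson-Durbin solution is developed later in the lecture.

    % Minimal sketch of the autocorrelation method (illustrative, not autolpc).
    L = 301; p = 10;
    s = randn(L,1);                           % stand-in for a speech segment
    w = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1)); % Hamming window (tapers the ends)
    sn = s .* w;                              % s_nhat(m), m = 0..L-1
    R = zeros(p+1,1);                         % short-time autocorrelation R(0..p)
    for k = 0:p
        R(k+1) = sum( sn(1:L-k) .* sn(1+k:L) );
    end
    alpha = toeplitz(R(1:p)) \ R(2:p+1);      % solve the Toeplitz normal equations
    En = R(1) - alpha.' * R(2:p+1);           % minimum mean-squared prediction error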

21. Covariance Method

• There is a second basic approach to defining the speech segment s_n̂(m) and the limits on the sums, namely Assumption #2: fix the interval over which the mean-squared error is computed, giving
      E_n̂ = Σ_{m=0}^{L−1} e_n̂²(m) = Σ_{m=0}^{L−1} ( s_n̂(m) − Σ_{k=1}^{p} α_k s_n̂(m−k) )²
      φ_n̂(i,k) = Σ_{m=0}^{L−1} s_n̂(m−i) s_n̂(m−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p

22. Covariance Method

• Changing the summation index gives
      φ_n̂(i,k) = Σ_{m=−i}^{L−1−i} s_n̂(m) s_n̂(m+i−k),  1 ≤ i ≤ p, 0 ≤ k ≤ p
      φ_n̂(i,k) = Σ_{m=−k}^{L−1−k} s_n̂(m+k−i) s_n̂(m),  1 ≤ i ≤ p, 0 ≤ k ≤ p
• The key difference from the autocorrelation method is that the limits of summation include terms before m = 0 ⟹ the window extends p samples backwards, from s(n̂−p) to s(n̂+L−1).
• Since we are extending the window backwards, we don't need to taper it using a Hamming window, since there is no transition at the window edges.
  [Figure: interval m = −p to m = L−1]

23. Covariance Method
  [Figure]

24. Covariance Method

• Cannot use the autocorrelation formulation ⟹ this is a true cross-correlation.
• Need to solve a set of equations of the form
      Σ_{k=1}^{p} α_k φ_n̂(i,k) = φ_n̂(i,0),  i = 1, 2, ..., p
      E_n̂ = φ_n̂(0,0) − Σ_{k=1}^{p} α_k φ_n̂(0,k)
• In matrix form, Φα = ψ, with solution α = Φ^(−1)ψ:
      | φ_n̂(1,1)  φ_n̂(1,2)  ..  φ_n̂(1,p) | | α_1 |   | φ_n̂(1,0) |
      | φ_n̂(2,1)  φ_n̂(2,2)  ..  φ_n̂(2,p) | | α_2 |   | φ_n̂(2,0) |
      |    .          .      ..     .      | |  .  | = |     .    |
      | φ_n̂(p,1)  φ_n̂(p,2)  ..  φ_n̂(p,p) | | α_p |   | φ_n̂(p,0) |


25. Covariance Method

• We have φ_n̂(i,k) = φ_n̂(k,i) ⟹ a symmetric but not Toeplitz matrix, whose diagonal elements are related as
      φ_n̂(i+1, k+1) = φ_n̂(i,k) + s_n̂(−i−1) s_n̂(−k−1) − s_n̂(L−1−i) s_n̂(L−1−k)
• All terms φ_n̂(i,k) have a fixed number of terms contributing to the computed values (L terms).
• φ_n̂(i,k) is a covariance matrix ⟹ a specialized solution for the {α_k}, called the Covariance Method.
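A minimal MATLAB sketch of the covariance normal equations (illustrative names, not from the lecture). The only practical difference from the autocorrelation sketch above is that the segment array must also hold the p samples before the analysis interval, and no tapering window is applied.

    % Minimal sketch of the covariance method. x holds s(nhat-p)..s(nhat+L-1),
    % so x(p+1+m) is s_nhat(m) for m = -p..L-1.
    L = 300; p = 10;
    x = randn(L+p,1);                        % stand-in for the L+p needed samples
    m = (0:L-1)';                            % fixed error interval 0 <= m <= L-1
    Phi = zeros(p+1,p+1);                    % Phi(i+1,k+1) = phi(i,k), i,k = 0..p
    for i = 0:p
        for k = 0:p
            Phi(i+1,k+1) = sum( x(p+1+m-i) .* x(p+1+m-k) );
        end
    end
    alpha = Phi(2:end,2:end) \ Phi(2:end,1); % solve Phi*alpha = psi
    En = Phi(1,1) - alpha.' * Phi(2:end,1);  % minimum prediction error

In practice the symmetric system is solved with the Cholesky (square-root) method developed later in the lecture rather than with backslash.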

26. Summary of LP

• Use a p-th order linear predictor to predict s(n̂) from the p previous samples.
• Minimize the mean-squared error, E_n̂, over an analysis window of duration L samples.
• The solution for the optimum predictor coefficients, {α_k}, is based on solving a matrix equation ⟹ two solutions have evolved:
  – autocorrelation method ⟹ the signal is windowed by a tapering window in order to minimize discontinuities at the beginning (predicting speech from zero-valued samples) and end (predicting zero-valued samples from speech samples) of the interval; the matrix φ_n̂(i,k) is shown to be an autocorrelation function; the resulting autocorrelation matrix is Toeplitz and can be readily solved using standard matrix solutions.
  – covariance method ⟹ the signal is extended by p samples outside the normal range of 0 ≤ m ≤ L−1 to include p samples occurring prior to m = 0; this eliminates large errors in computing the signal from values prior to m = 0 (they are available) and eliminates the need for a tapering window; the resulting matrix of correlations is symmetric but not Toeplitz ⟹ a different method of solution, with a somewhat different set of optimal prediction coefficients, {α_k}.

27. LPC Summary

1. Speech Production Model:
      s(n) = Σ_{k=1}^{p} a_k s(n−k) + G·u(n)
      H(z) = S(z)/(G·U(z)) = 1/(1 − Σ_{k=1}^{p} a_k z^(−k))

2. Linear Prediction Model:
      s̃(n) = Σ_{k=1}^{p} α_k s(n−k)
      P(z) = S̃(z)/S(z) = Σ_{k=1}^{p} α_k z^(−k)
      e(n) = s(n) − s̃(n) = s(n) − Σ_{k=1}^{p} α_k s(n−k)
      A(z) = E(z)/S(z) = 1 − Σ_{k=1}^{p} α_k z^(−k)

28. LPC Summary

3. LPC Minimization:
      E_n̂ = Σ_m e_n̂²(m) = Σ_m [ s_n̂(m) − s̃_n̂(m) ]²
          = Σ_m ( s_n̂(m) − Σ_{k=1}^{p} α_k s_n̂(m−k) )²
      ∂E_n̂/∂α_i = 0,  i = 1, 2, ..., p
      ⟹ Σ_m s_n̂(m−i) s_n̂(m) = Σ_{k=1}^{p} α_k Σ_m s_n̂(m−i) s_n̂(m−k)
      φ_n̂(i,k) = Σ_m s_n̂(m−i) s_n̂(m−k)
      Σ_{k=1}^{p} α_k φ_n̂(i,k) = φ_n̂(i,0),  i = 1, 2, ..., p
      E_n̂ = φ_n̂(0,0) − Σ_{k=1}^{p} α_k φ_n̂(0,k)

29. LPC Summary

4. Autocorrelation Method:
      s_n̂(m) = s(m + n̂) w(m),  0 ≤ m ≤ L−1
      e_n̂(m) = s_n̂(m) − Σ_{k=1}^{p} α_k s_n̂(m−k),  0 ≤ m ≤ L−1+p
   s_n̂(m) is defined for 0 ≤ m ≤ L−1; e_n̂(m) is defined for 0 ≤ m ≤ L−1+p
   ⟹ large errors for 0 ≤ m ≤ p−1 and for L ≤ m ≤ L−1+p
      E_n̂ = Σ_{m=0}^{L+p−1} e_n̂²(m)
      φ_n̂(i,k) = R_n̂(i−k) = Σ_{m=0}^{L−1−(i−k)} s_n̂(m) s_n̂(m+i−k) = R_n̂(|i−k|)
      Σ_{k=1}^{p} α_k R_n̂(|i−k|) = R_n̂(i),  1 ≤ i ≤ p
      E_n̂ = R_n̂(0) − Σ_{k=1}^{p} α_k R_n̂(k)

30. LPC Summary

4. Autocorrelation Method, resulting matrix equation: ℜα = r, or α = ℜ^(−1) r:
      | R_n̂(0)     R_n̂(1)    ..  R_n̂(p−1) | | α_1 |   | R_n̂(1) |
      | R_n̂(1)     R_n̂(0)    ..  R_n̂(p−2) | | α_2 |   | R_n̂(2) |
      |    .           .      ..     .      | |  .  | = |    .   |
      | R_n̂(p−1)   R_n̂(p−2)  ..  R_n̂(0)   | | α_p |   | R_n̂(p) |
• The matrix equation is solved using the Levinson or Durbin method.


31. LPC Summary

5. Covariance Method: fix the interval for the error signal:
      E_n̂ = Σ_{m=0}^{L−1} e_n̂²(m) = Σ_{m=0}^{L−1} ( s_n̂(m) − Σ_{k=1}^{p} α_k s_n̂(m−k) )²
   Need the signal from s(n̂−p) to s(n̂+L−1) ⟹ L+p samples.
      Σ_{k=1}^{p} α_k φ_n̂(i,k) = φ_n̂(i,0),  i = 1, 2, ..., p
      E_n̂ = φ_n̂(0,0) − Σ_{k=1}^{p} α_k φ_n̂(0,k)
   Expressed as a matrix equation: Φα = ψ, or α = Φ^(−1)ψ, with Φ a symmetric matrix:
      | φ_n̂(1,1)  φ_n̂(1,2)  ..  φ_n̂(1,p) | | α_1 |   | φ_n̂(1,0) |
      | φ_n̂(2,1)  φ_n̂(2,2)  ..  φ_n̂(2,p) | | α_2 |   | φ_n̂(2,0) |
      |    .          .      ..     .      | |  .  | = |     .    |
      | φ_n̂(p,1)  φ_n̂(p,2)  ..  φ_n̂(p,p) | | α_p |   | φ_n̂(p,0) |

32. Computation of Model Gain

• It is reasonable to expect the model gain, G, to be determined by matching the signal energy with the energy of the linearly predicted samples.
• From the basic model equations we have
      G·u(n) = s(n) − Σ_{k=1}^{p} a_k s(n−k)   ⟹ model
  whereas for the prediction error we have
      e(n) = s(n) − Σ_{k=1}^{p} α_k s(n−k)   ⟹ best fit to model
• When α_k = a_k (i.e., a perfect match to the model), then e(n) = G·u(n).
• Since it is virtually impossible to guarantee that α_k = a_k, we cannot use this simple matching property for determining the gain; instead we use an energy matching criterion (energy in the error signal = energy in the excitation):
      G² Σ_{m=0}^{L+p−1} u²(m) = Σ_{m=0}^{L+p−1} e²(m) = E_n̂

33. Gain Assumptions

• Assumptions about the excitation are needed to solve for G:
  – voiced speech: G·u(n) = G·δ(n) ⟹ L on the order of a single pitch period; predictor order p large enough to model the glottal pulse shape, vocal tract impulse response, and radiation.
  – unvoiced speech: u(n) is a zero-mean, unity-variance, stationary white noise process.

34. Solution for Gain (Voiced)

• For voiced speech the excitation is G·δ(n), with output G·h̃(n) (since it is the impulse response of the system):
      h̃(n) = Σ_{k=1}^{p} α_k h̃(n−k) + G·δ(n);   H̃(z) = G/A(z)
  with autocorrelation R̃(m) (of the impulse response) satisfying the relations shown below:
      R̃(m) = Σ_{n=0}^{∞} h̃(n) h̃(m+n) = R̃(−m),  0 ≤ m < ∞
      R̃(m) = Σ_{k=1}^{p} α_k R̃(m−k),  1 ≤ m < ∞
      R̃(0) = Σ_{k=1}^{p} α_k R̃(k) + G²,  m = 0

35. Solution for Gain (Voiced)

• Since R̃(m) and R_n̂(m) have the identical form, it follows that
      R̃(m) = c · R_n̂(m),  0 ≤ m ≤ p
  where c is a constant to be determined. Since the total energies in the signal (R_n̂(0)) and the impulse response (R̃(0)) must be equal, the constant c must be 1, and we obtain the relation
      G² = R_n̂(0) − Σ_{k=1}^{p} α_k R_n̂(k) = E_n̂
• Since R̃(m) = R_n̂(m), 0 ≤ m ≤ p, and the energy of the impulse response equals the energy of the signal ⟹ the first p+1 coefficients of the autocorrelation of the impulse response of the model are identical to the first p+1 coefficients of the autocorrelation function of the speech signal. This condition is called the autocorrelation matching property of the autocorrelation method.

36. Solution for Gain (Unvoiced)

• For unvoiced speech the input is white noise with zero mean and unity variance, i.e., E[u(n) u(n−m)] = δ(m).
• If we excite the system with input G·u(n) and call the output g̃(n), then
      g̃(n) = Σ_{k=1}^{p} α_k g̃(n−k) + G·u(n)
• Since the autocorrelation function of the output is the convolution of the autocorrelation function of the impulse response with the autocorrelation function of the white noise input,
      E[ g̃(n) g̃(n−m) ] = R̃(m) ∗ δ(m) = R̃(m)
• Letting R̃(m) denote the autocorrelation of g̃(n) gives
      R̃(m) = E[ g̃(n) g̃(n−m) ] = Σ_{k=1}^{p} α_k E[ g̃(n−k) g̃(n−m) ] + E[ G·u(n) g̃(n−m) ]
           = Σ_{k=1}^{p} α_k R̃(m−k),  m ≠ 0
  since E[ G·u(n) g̃(n−m) ] = 0 for m > 0, because u(n) is uncorrelated with any signal prior to u(n).


37. Solution for Gain (Unvoiced)

• For m = 0 we get
      R̃(0) = Σ_{k=1}^{p} α_k R̃(k) + G·E[ u(n) g̃(n) ] = Σ_{k=1}^{p} α_k R̃(k) + G²
  since E[ u(n) g̃(n) ] = E[ u(n) ( G·u(n) + terms prior to n ) ] = G.
• Since the energy in the signal must equal the energy in the response to G·u(n), we get
      R̃(m) = R_n̂(m)
      G² = R_n̂(0) − Σ_{k=1}^{p} α_k R_n̂(k) = E_n̂
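For either excitation type the gain therefore reduces to G² = E_n̂, so in code it comes for free once the normal equations are solved. A minimal sketch, reusing R and alpha from the autocorrelation-method sketch above:

    % Energy-matching gain: G^2 = E_nhat for both voiced and unvoiced models.
    En = R(1) - alpha.' * R(2:end);   % minimum mean-squared prediction error
    G  = sqrt(En);                    % model gain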

38. Frequency Domain Interpretations of Linear Predictive Analysis

39. The Resulting LPC Model

• The final LPC model consists of the LPC parameters, {α_k, k = 1, 2, ..., p}, and the gain, G, which together define the system function
      H̃(z) = G / (1 − Σ_{k=1}^{p} α_k z^(−k))
  with frequency response
      H̃(e^jω) = G / A(e^jω) = G / (1 − Σ_{k=1}^{p} α_k e^(−jωk))
  with the gain determined by matching the energy of the model to the short-time energy of the speech signal, i.e.,
      G² = E_n̂ = Σ_m e_n̂²(m) = R_n̂(0) − Σ_{k=1}^{p} α_k R_n̂(k)

40. LPC Spectrum

    x = s .* hamming(301);
    X = fft( x , 1000 );
    [ A , G , r ] = autolpc( x , 10 );
    H = G ./ fft( A , 1000 );

      H̃(e^jω) = G / (1 − Σ_{k=1}^{p} α_k e^(−jωk))

• LP analysis is seen to be a method of short-time spectrum estimation with removal of the excitation fine structure (a form of wideband spectrum analysis).
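The snippet on this slide relies on the course's autolpc routine. A self-contained equivalent, using the alpha and G computed in the sketches above, would be:

    % Self-contained version of the slide's computation (autolpc is from the
    % course toolkit; alpha and G come from the earlier sketches).
    A = [1; -alpha];                  % prediction error polynomial A(z)
    H = G ./ fft(A, 1000);            % LPC spectrum G/A(e^jw) on 1000 bins
    % plot(20*log10(abs(H(1:500))))   % log magnitude over [0, pi)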

41. LP Short-Time Spectrum Analysis

• Defined the speech segment as:
      s_n̂[m] = s[m + n̂] w[m]
• The discrete-time Fourier transform of this windowed segment is:
      S_n̂(e^jω) = Σ_{m=−∞}^{∞} s[m + n̂] w[m] e^(−jωm)
• The short-time FT and the LP spectrum are linked via the short-time autocorrelation.

42. LP Short-Time Spectrum Analysis

(a) Voiced speech segment obtained using a Hamming window
(b) Corresponding short-time autocorrelation function used in LP analysis (heavy line shows values used in LP analysis)
(c) Corresponding short-time log magnitude Fourier transform and short-time log magnitude LPC spectrum (FS = 16 kHz)


43. LP Short-Time Spectrum Analysis

(a) Unvoiced speech segment obtained using a Hamming window
(b) Corresponding short-time autocorrelation function used in LP analysis (heavy line shows values used in LP analysis)
(c) Corresponding short-time log magnitude Fourier transform and short-time log magnitude LPC spectrum (FS = 16 kHz)

44. Frequency Domain Interpretation of Mean-Squared Prediction Error

• The LP spectrum provides a basis for examining the properties of the prediction error (or, equivalently, the excitation of the vocal tract).
• The mean-squared prediction error at sample n̂ is:
      E_n̂ = Σ_{m=0}^{L+p−1} e_n̂²[m]
  which, by Parseval's Theorem, can be expressed as:
      E_n̂ = (1/2π) ∫_{−π}^{π} |S_n̂(e^jω)|² |A(e^jω)|² dω = G²
  where S_n̂(e^jω) is the FT of s_n̂[m] and
      A(e^jω) = 1 − Σ_{k=1}^{p} α_k e^(−jωk)
  is the corresponding prediction error frequency response.

45. Frequency Domain Interpretation of Mean-Squared Prediction Error

• The LP spectrum is of the form:
      H̃(e^jω) = G / A(e^jω)
• Thus we can express the mean-squared error as:
      E_n̂ = (G²/2π) ∫_{−π}^{π} |S_n̂(e^jω)|² / |H̃(e^jω)|² dω = G²
• We see that minimizing the total squared prediction error is equivalent to finding the gain and predictor coefficients such that the integral of the ratio of the energy spectrum of the speech segment to the magnitude squared of the frequency response of the model linear system is unity.
• Thus |S_n̂(e^jω)|² can be interpreted as a frequency-domain weighting function ⟹ LP weights frequencies where |S_n̂(e^jω)| is large more heavily than where |S_n̂(e^jω)| is small.

46. LP Interpretation Example 1

• Much better spectral matches to STFT spectral peaks than to STFT spectral valleys, as predicted by the spectral interpretation of error minimization.

47. LP Interpretation Example 2

• Note the small differences in spectral shape between the STFT, autocorrelation spectrum, and covariance spectrum when using a short window duration (L = 51 samples).

48. Effects of Model Order

• The autocorrelation function R_n̂[m] of the speech segment s_n̂[m] and the autocorrelation function R̃[m] of the impulse response h̃[m], corresponding to the system function H̃(z), are equal for the first (p+1) values.
• Thus, as p → ∞ the autocorrelation functions are equal for all values, and thus:
      lim_{p→∞} |H̃(e^jω)|² = |S_n̂(e^jω)|²
• Thus, if p is large enough, the frequency response of the all-pole model, H̃(e^jω), can approximate the signal spectrum with arbitrarily small error.


49. Effects of Model Order
  [Figure]

50. Effects of Model Order
  [Figure]

51. Effects of Model Order
  [Figure]

52. Effects of Model Order

• The plots show the Fourier transform of the segment and LP spectra for various orders:
  – as p increases, more details of the spectrum are preserved;
  – need to choose a value of p that represents the spectral effects of the glottal pulse, vocal tract, and radiation, and nothing else.

53. Linear Prediction Spectrogram

• The speech spectrogram was previously defined as an image plot of:
      20 log |S_r[k]| = 20 log | Σ_{m=0}^{L−1} s[rR + m] w[m] e^(−j(2π/N)km) |
  for a set of times, t_r = rRT, and a set of frequencies, F_k = kF_S/N, k = 1, 2, ..., N/2, where R is the time shift (in samples) between adjacent STFTs, T = 1/F_S is the sampling period, F_S is the sampling frequency, and N is the size of the discrete Fourier transform used to compute each STFT estimate.
• Similarly we can define the LP spectrogram as an image plot of:
      20 log |H̃_r[k]| = 20 log ( G_r / |A_r(e^(j(2π/N)k))| )
  where G_r and A_r are the gain and prediction error polynomial at analysis time rR.

54. Linear Prediction Spectrogram
  [Figure: L = 81, R = 3, N = 1000, 40 dB dynamic range]
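A minimal MATLAB sketch of the LP spectrogram computation defined above (illustrative; s is a stand-in signal, and each frame repeats the autocorrelation-method steps from the earlier sketch):

    % Minimal LP spectrogram sketch: one column of the image per analysis frame.
    s = randn(10000,1);                        % stand-in for a speech signal
    L = 81; Rhop = 3; N = 1000; p = 10;        % Rhop is the slide's frame shift R
    w = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));  % Hamming window
    nFrames = floor((length(s)-L)/Rhop) + 1;
    S = zeros(N/2, nFrames);
    for r = 1:nFrames
        x = s((r-1)*Rhop + (1:L)) .* w;        % windowed frame starting at (r-1)R
        Rk = zeros(p+1,1);
        for k = 0:p, Rk(k+1) = sum(x(1:L-k).*x(1+k:L)); end
        alpha = toeplitz(Rk(1:p)) \ Rk(2:p+1); % frame predictor coefficients
        G = sqrt(Rk(1) - alpha.'*Rk(2:p+1));   % frame gain
        H = G ./ fft([1; -alpha], N);          % LP spectrum of the frame
        S(:,r) = 20*log10(abs(H(1:N/2)));      % log magnitude, frequencies 0..pi
    end
    % imagesc(S); axis xy                      % display as a spectrogram image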


55. Comparison to Other Spectrum Analysis Methods

Spectra of synthetic vowel /IY/:
(a) Narrowband spectrum using a 40 msec window
(b) Wideband spectrum using a 10 msec window
(c) Cepstrally smoothed spectrum
(d) LPC spectrum from a 40 msec section using a p = 12 order LPC analysis

56. Comparison to Other Spectrum Analysis Methods

• Natural speech spectral estimates using cepstral smoothing (solid line) and linear prediction analysis (dashed line).
• Note the fewer (spurious) peaks in the LP analysis spectrum, since LP used p = 12, which restricted the spectral match to a maximum of 6 resonance peaks.
• Note the narrow bandwidths of the LP resonances versus the cepstrally smoothed resonances.

57. Selective Linear Prediction

• It is possible to apply LP methods to selected parts of the spectrum:
  – 0-4 kHz for voiced sounds ⟹ use a predictor of order p_1
  – 4-8 kHz for unvoiced sounds ⟹ use a predictor of order p_2
• The key idea is to map the frequency region {F_A, F_B} linearly to {0, 0.5} (normalized frequency) or, equivalently, the region ω_A ≤ ω ≤ ω_B maps linearly to 0 ≤ ω′ ≤ π via the transformation
      ω′ = π (ω − ω_A) / (ω_B − ω_A)
• We must modify the calculation of the autocorrelation to give:
      R′_n̂(m) = (1/2π) ∫_{−π}^{π} |S_n̂(e^jω′)|² e^(jω′m) dω′

58. Selective Linear Prediction

• 0-5 kHz region modeled using p1 = 14
• 5-10 kHz region modeled using p2 = 5
• discontinuity in the model spectra at 5 kHz
• 0-10 kHz region modeled using p = 28
• no discontinuity in the model spectra at 5 kHz

59. Solutions of LPC Equations: Covariance Method (Cholesky Decomposition Method)

60. LPC Solutions-Covariance Method

• For the covariance method we need to solve the matrix equation
      Σ_{k=1}^{p} α_k φ_n̂(i,k) = φ_n̂(i,0),  i = 1, 2, ..., p
  (in matrix notation, Φα = ψ).
• Φ is a positive definite, symmetric matrix with (i,j) element φ_n̂(i,j), and α and ψ are column vectors with elements α_i and φ_n̂(i,0).
• The solution of the matrix equation is called the Cholesky decomposition, or square root method:
      Φ = V D V^t
  where V is a lower triangular matrix with 1's on the main diagonal and D is a diagonal matrix.


61. LPC Solutions-Covariance Method

• Can readily determine the elements of V and D by solving for the (i,j) elements of the matrix equation, as follows:
      φ_n̂(i,j) = Σ_{k=1}^{j} V_ik d_k V_jk,  1 ≤ j ≤ i−1
  giving
      V_ij d_j = φ_n̂(i,j) − Σ_{k=1}^{j−1} V_ik d_k V_jk,  1 ≤ j ≤ i−1
  and for the diagonal elements
      φ_n̂(i,i) = Σ_{k=1}^{i} V_ik d_k V_ik
  giving
      d_i = φ_n̂(i,i) − Σ_{k=1}^{i−1} V_ik² d_k,  i ≥ 2
  with d_1 = φ_n̂(1,1).

62. Cholesky Decomposition Example

• Consider an example with p = 4 and matrix elements φ_ij = φ_n̂(i,j):

      | φ_11 φ_21 φ_31 φ_41 |   | 1    0    0    0 | | d_1 0   0   0  | | 1 V_21 V_31 V_41 |
      | φ_21 φ_22 φ_32 φ_42 | = | V_21 1    0    0 | | 0   d_2 0   0  | | 0 1    V_32 V_42 |
      | φ_31 φ_32 φ_33 φ_43 |   | V_31 V_32 1    0 | | 0   0   d_3 0  | | 0 0    1    V_43 |
      | φ_41 φ_42 φ_43 φ_44 |   | V_41 V_42 V_43 1 | | 0   0   0   d_4| | 0 0    0    1    |

63. Cholesky Decomposition Example

• Solve the matrix for d_1, V_21, V_31, V_41, d_2, V_32, V_42, d_3, V_43, d_4:
  step 1:  d_1 = φ_11
  step 2:  V_21 d_1 = φ_21;  V_31 d_1 = φ_31;  V_41 d_1 = φ_41
           ⟹ V_21 = φ_21/d_1;  V_31 = φ_31/d_1;  V_41 = φ_41/d_1
  step 3:  d_2 = φ_22 − V_21² d_1
           V_32 d_2 = φ_32 − V_31 d_1 V_21  ⟹  V_32 = (φ_32 − V_31 d_1 V_21)/d_2
           V_42 d_2 = φ_42 − V_41 d_1 V_21  ⟹  V_42 = (φ_42 − V_41 d_1 V_21)/d_2
  step 4:  iterate the procedure to solve for d_3, V_43, d_4

64. LPC Solutions-Covariance Method

• Now need to solve for α using a 2-step procedure:
      V D V^t α = ψ
  writing this as
      V Y = ψ  with  Y = D V^t α,  or  V^t α = D^(−1) Y
• From V (which is now known) solve for the column vector Y using a simple recursion of the form
      Y_i = ψ_i − Σ_{j=1}^{i−1} V_ij Y_j,  2 ≤ i ≤ p
  with initial condition Y_1 = ψ_1.

65. LPC Solutions-Covariance Method

• Now can solve for α using the recursion
      α_i = Y_i / d_i − Σ_{j=i+1}^{p} V_ji α_j,  1 ≤ i ≤ p−1
  with initial condition
      α_p = Y_p / d_p
• The calculation proceeds backwards from i = p−1 down to i = 1.

66. Cholesky Decomposition Example

• Continuing the example, we solve V Y = ψ for Y:
      | 1    0    0    0 | | Y_1 |   | ψ_1 |
      | V_21 1    0    0 | | Y_2 | = | ψ_2 |
      | V_31 V_32 1    0 | | Y_3 |   | ψ_3 |
      | V_41 V_42 V_43 1 | | Y_4 |   | ψ_4 |
  first solving for Y we get
      Y_1 = ψ_1
      Y_2 = ψ_2 − V_21 Y_1
      Y_3 = ψ_3 − V_31 Y_1 − V_32 Y_2
      Y_4 = ψ_4 − V_41 Y_1 − V_42 Y_2 − V_43 Y_3


67. Cholesky Decomposition Example

• Next solve for α from the equation V^t α = D^(−1) Y:
      | 1  V_21  V_31  V_41 | | α_1 |   | Y_1/d_1 |
      | 0  1     V_32  V_42 | | α_2 | = | Y_2/d_2 |
      | 0  0     1     V_43 | | α_3 |   | Y_3/d_3 |
      | 0  0     0     1    | | α_4 |   | Y_4/d_4 |
  giving the results
      α_4 = Y_4/d_4
      α_3 = Y_3/d_3 − V_43 α_4
      α_2 = Y_2/d_2 − V_32 α_3 − V_42 α_4
      α_1 = Y_1/d_1 − V_21 α_2 − V_31 α_3 − V_41 α_4
  completing the solution.
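The whole decomposition and two-pass solve in a minimal MATLAB sketch (illustrative; Phi is the (p+1) x (p+1) array from the covariance sketch above, whose first row and column hold phi(i,0)). MATLAB's built-in chol computes an equivalent factorization; the loops below follow the slides' V D V^t form directly.

    % V D V^t decomposition and two-pass solve, following the slides.
    p = size(Phi,1) - 1;                 % reuse Phi from the covariance sketch
    C = Phi(2:end,2:end); psi = Phi(2:end,1);
    V = eye(p); d = zeros(p,1);
    for i = 1:p
        for j = 1:i-1
            V(i,j) = (C(i,j) - (V(i,1:j-1).*V(j,1:j-1))*d(1:j-1)) / d(j);
        end
        d(i) = C(i,i) - (V(i,1:i-1).^2)*d(1:i-1);
    end
    Y = zeros(p,1);
    for i = 1:p                          % forward pass: V*Y = psi
        Y(i) = psi(i) - V(i,1:i-1)*Y(1:i-1);
    end
    alpha = zeros(p,1);
    for i = p:-1:1                       % backward pass: V'*alpha = D^(-1)*Y
        alpha(i) = Y(i)/d(i) - V(i+1:p,i).'*alpha(i+1:p);
    end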

68. Covariance Method Minimum Error

• The minimum mean-squared error can be written in the form
      E_n̂ = φ_n̂(0,0) − Σ_{k=1}^{p} α_k φ_n̂(0,k) = φ_n̂(0,0) − α^t ψ
• Since Y = D V^t α, we can write this as
      E_n̂ = φ_n̂(0,0) − Y^t D^(−1) Y = φ_n̂(0,0) − Σ_{k=1}^{p} Y_k² / d_k
• This computation for E_n̂ can be used for all values of LP order from 1 to p ⟹ can understand how LP order reduces the mean-squared error.

69. [Figure]

70. Solutions of LPC Equations: Autocorrelation Method via Levinson-Durbin Algorithm

71. Levinson-Durbin Algorithm 1

• Autocorrelation equations (at each frame n̂):
      Σ_{k=1}^{p} α_k R[|i−k|] = R[i],  1 ≤ i ≤ p      (Rα = r)
• R is a positive definite symmetric Toeplitz matrix.
• The set of optimum predictor coefficients satisfies:
      Σ_{k=1}^{p} α_k^(p) R[|i−k|] − R[i] = 0,  1 ≤ i ≤ p
  with minimum mean-squared prediction error of:
      E^(p) = R[0] − Σ_{k=1}^{p} α_k^(p) R[k]

∑72

Levinson-Durbin Algorithm 2

( )1( )2

1[0] [1] [2] ... [ ][1] [0] [1] ... [ 1][2] [1] [0] ... [ 2]. . . . .[ ] [ 1] [ 2] ... [0]

By combining the last two equations we get a larger matrix equation of the form:

p

p

R R R R pR R R R pR R R R p

R p R p R p R

αα

⎡ ⎤⎢ ⎥ −−⎢ ⎥

−⎢ ⎥−⎢ ⎥⎢ ⎥⎢ ⎥− −⎣ ⎦

( )

( )

00

. .0

1) ( 1) expanded ( x matrix is still Toeplitz and can be solved iterativelyby incorporating new correlation value at each iteration andsolving for next

p

pp

E

p p

α

⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥− ⎣ ⎦⎣ ⎦

+ +

higher order predictor in terms of new correlation value and previous predictor


73. Levinson-Durbin Algorithm 3

• Show how the i-th order solution can be derived from the (i−1)st order solution; i.e., given the solution α^(i−1), E^(i−1), we derive the solution α^(i), E^(i).
• The (i−1)st order solution can be expressed as:
      | R[0]   R[1]   R[2]   ..  R[i−1] | |  1             |   | E^(i−1) |
      | R[1]   R[0]   R[1]   ..  R[i−2] | | −α_1^(i−1)     |   |  0      |
      | R[2]   R[1]   R[0]   ..  R[i−3] | | −α_2^(i−1)     | = |  0      |
      |  .      .      .     ..   .     | |    .           |   |  .      |
      | R[i−1] R[i−2] R[i−3] ..  R[0]   | | −α_(i−1)^(i−1) |   |  0      |

74. Levinson-Durbin Algorithm 4

• Appending a 0 to the vector α^(i−1) and multiplying by the (i+1) × (i+1) matrix R gives a new set of equations of the form:
      | R[0]  R[1]   ..  R[i]   | |  1             |   | E^(i−1) |
      | R[1]  R[0]   ..  R[i−1] | | −α_1^(i−1)     |   |  0      |
      | R[2]  R[1]   ..  R[i−2] | | −α_2^(i−1)     | = |  0      |
      |  .     .     ..   .     | |    .           |   |  .      |
      | R[i−1] ..    ..  R[1]   | | −α_(i−1)^(i−1) |   |  0      |
      | R[i]  R[i−1] ..  R[0]   | |  0             |   | γ^(i−1) |
  where
      γ^(i−1) = R[i] − Σ_{j=1}^{i−1} α_j^(i−1) R[i−j]
  is introduced.

75. Levinson-Durbin Algorithm 5

• The key step is that since the Toeplitz matrix has a special symmetry, we can reverse the order of the equations (first equation last, last equation first), giving:
      | R[0]  R[1]   ..  R[i]   | |  0             |   | γ^(i−1) |
      | R[1]  R[0]   ..  R[i−1] | | −α_(i−1)^(i−1) |   |  0      |
      | R[2]  R[1]   ..  R[i−2] | | −α_(i−2)^(i−1) | = |  0      |
      |  .     .     ..   .     | |    .           |   |  .      |
      | R[i−1] ..    ..  R[1]   | | −α_1^(i−1)     |   |  0      |
      | R[i]  R[i−1] ..  R[0]   | |  1             |   | E^(i−1) |

76. Levinson-Durbin Algorithm 6

• To get the equation into the desired form (a single E^(i) component in the right-hand vector) we combine the two sets of matrices (with a multiplicative factor k_i), giving:
      R ( [ 1, −α_1^(i−1), ..., −α_(i−1)^(i−1), 0 ]^t − k_i [ 0, −α_(i−1)^(i−1), ..., −α_1^(i−1), 1 ]^t )
        = [ E^(i−1), 0, ..., 0, γ^(i−1) ]^t − k_i [ γ^(i−1), 0, ..., 0, E^(i−1) ]^t
• Choose k_i so that the vector on the right has only a single non-zero entry, i.e.,
      k_i = γ^(i−1) / E^(i−1) = ( R[i] − Σ_{j=1}^{i−1} α_j^(i−1) R[i−j] ) / E^(i−1)

77. Levinson-Durbin Algorithm 7

• The first element of the right-hand-side vector is now:
      E^(i) = E^(i−1) − k_i γ^(i−1) = E^(i−1) (1 − k_i²)
• The k_i parameters are called PARCOR coefficients.
• With this choice of k_i, the vector of i-th order predictor coefficients is:
      [ 1, −α_1^(i), ..., −α_(i−1)^(i), −α_i^(i) ]^t
        = [ 1, −α_1^(i−1), ..., −α_(i−1)^(i−1), 0 ]^t − k_i [ 0, −α_(i−1)^(i−1), ..., −α_1^(i−1), 1 ]^t
  yielding the updating procedure
      α_j^(i) = α_j^(i−1) − k_i α_(i−j)^(i−1),  j = 1, 2, ..., i−1
      α_i^(i) = k_i

78. Levinson-Durbin Algorithm 8

• The final solution for order p is:
      α_j = α_j^(p),  1 ≤ j ≤ p
  with prediction error
      E^(p) = R[0] Π_{m=1}^{p} (1 − k_m²)
• If we use normalized autocorrelation coefficients, r[k] = R[k]/R[0], we get normalized errors of the form:
      V^(i) = E^(i)/R[0] = 1 − Σ_{k=1}^{i} α_k^(i) r[k] = Π_{m=1}^{i} (1 − k_m²)
  where 0 < V^(i) ≤ 1 and −1 < k_i < 1.
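The full recursion in a minimal MATLAB sketch (illustrative; R is the column vector of autocorrelation values R[0..p] from the autocorrelation-method sketch above). The Signal Processing Toolbox function levinson implements the same algorithm.

    % Levinson-Durbin recursion: solve the Toeplitz normal equations in O(p^2).
    % R(1) = R[0], ..., R(p+1) = R[p]; returns alpha, PARCORs k, and errors E.
    p = length(R) - 1;
    alpha = zeros(p,1); k = zeros(p,1); E = zeros(p+1,1);
    E(1) = R(1);                                  % E^(0) = R[0]
    for i = 1:p
        gamma = R(i+1) - alpha(1:i-1).' * R(i:-1:2);  % gamma^(i-1)
        k(i) = gamma / E(i);                      % PARCOR coefficient k_i
        alpha(1:i-1) = alpha(1:i-1) - k(i)*alpha(i-1:-1:1);  % update lower coeffs
        alpha(i) = k(i);                          % alpha_i^(i) = k_i
        E(i+1) = E(i) * (1 - k(i)^2);             % E^(i) = E^(i-1)(1 - k_i^2)
    end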


79. Levinson-Durbin Algorithm

• The recursion can also be written directly on the prediction error polynomial:
      A^(i)(z) = A^(i−1)(z) − k_i z^(−i) A^(i−1)(z^(−1))

80. Autocorrelation Example

• Consider a simple solution of the form p = 2:
      | R(0)  R(1) | | α_1 |   | R(1) |
      | R(1)  R(0) | | α_2 | = | R(2) |
• With solution:
      E^(0) = R(0)
      k_1 = R(1)/R(0) = α_1^(1)
      E^(1) = R(0) − R(1)²/R(0) = ( R(0)² − R(1)² ) / R(0)

81. Autocorrelation Example

      k_2 = ( R(2) R(0) − R(1)² ) / ( R(0)² − R(1)² ) = α_2^(2)
      α_1^(2) = α_1^(1) − k_2 α_1^(1) = R(1)( R(0) − R(2) ) / ( R(0)² − R(1)² )
• With final coefficients:
      α_1 = R(1)( R(0) − R(2) ) / ( R(0)² − R(1)² )
      α_2 = ( R(2) R(0) − R(1)² ) / ( R(0)² − R(1)² )
• E^(i) is the prediction error for a predictor of order i.

E i82

Prediction Error as a Function of p

11

0 0[ ]

[ ] [ ]α

=

= = −∑p

n nn k

kn n

E R kVR R

Model order is usually determinedby the following rule of thumb:

Fs/1000 poles for vocal tract2-4 poles for radiation2 poles for glottal pulse
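Since the Levinson-Durbin sketch above already returns E^(i) for every order i, the normalized error curve used to judge model order comes for free:

    % Normalized prediction error V^(i) = E^(i)/R[0] for orders 0..p,
    % using the E vector from the Levinson-Durbin sketch above.
    V = E / R(1);
    % plot(0:p, V); xlabel('model order'); ylabel('normalized error')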

83. Autocorrelation Method Properties

• mean-squared prediction error always non-zero
  – decreases monotonically with increasing model order
• autocorrelation matching property
  – model and data autocorrelations match up to order p
• spectrum matching property
  – favors peaks of the short-time FT
• minimum-phase property
  – zeros of A(z) are inside the unit circle
• Levinson-Durbin recursion
  – efficient algorithm for finding prediction coefficients
  – PARCOR coefficients and MSE are by-products