System Identification


Transcript of System Identification


System Identification
Lecture 4: Transfer function averaging and smoothing

Roy Smith

2019-10-4

Averaging

Multiple estimates

Multiple experiments: $u_r(k)$, $y_r(k)$, $r = 1, \dots, R$, and $k = 0, \dots, K-1$.

Multiple estimates (ETFE):

$$\hat G_r(e^{j\omega_n}) = \frac{Y_r(e^{j\omega_n})}{U_r(e^{j\omega_n})}$$

(drop the $N$ from the $Y_N(e^{j\omega_n})$ notation)

Averaging to improve the estimate.

$$\hat G(e^{j\omega_n}) = \sum_{r=1}^{R} \alpha_r\, \hat G_r(e^{j\omega_n}), \qquad \text{with } \sum_{r=1}^{R} \alpha_r = 1.$$

How to choose $\alpha_r$?

The “average” is calculated with $\alpha_r = 1/R$.


Averaging

Optimal weighted average

If the variance of $\hat G_r(e^{j\omega_n})$ is $\sigma_r^2(e^{j\omega_n})$ (with uncorrelated errors and identical means), then

$$\mathrm{variance}\left(\hat G(e^{j\omega_n})\right) = \mathrm{variance}\left(\sum_{r=1}^{R} \alpha_r(e^{j\omega_n})\, \hat G_r(e^{j\omega_n})\right) = \sum_{r=1}^{R} \alpha_r^2\, \sigma_r^2(e^{j\omega_n}).$$

This is minimized by

$$\alpha_r(e^{j\omega_n}) = \frac{1/\sigma_r^2(e^{j\omega_n})}{\displaystyle\sum_{r=1}^{R} 1/\sigma_r^2(e^{j\omega_n})}.$$

If $\mathrm{variance}\left(\hat G_r(e^{j\omega_n})\right) = \dfrac{\phi_v(e^{j\omega_n})}{\tfrac{1}{N}\big|U_r(e^{j\omega_n})\big|^2}$, then $\alpha_r(e^{j\omega_n}) = \dfrac{\big|U_r(e^{j\omega_n})\big|^2}{\displaystyle\sum_{r=1}^{R} \big|U_r(e^{j\omega_n})\big|^2}.$
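As an illustration, a minimal Matlab sketch of this weighted average, assuming the $R$ input and output records are stored as the columns of $N$-by-$R$ matrices u and y (the variable names here are hypothetical):

U = fft(u);                     % N-point FFT of each input record (column-wise)
Y = fft(y);                     % N-point FFT of each output record
Gr = Y./U;                      % ETFE of each experiment (N-by-R)
alpha = abs(U).^2;              % weights proportional to |U_r|^2
alpha = alpha./sum(alpha,2);    % normalise so the weights sum to 1 at each frequency
Gavg = sum(alpha.*Gr,2);        % weighted average of the R estimates

The row-wise normalisation uses Matlab's implicit expansion; note that these weights reduce to $\alpha_r = 1/R$ when all of the inputs have identical magnitude spectra.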


Averaging

Variance reduction

Best* result: $\big|U_r(e^{j\omega_n})\big|$ is the same for all $r = 1, \dots, R$.

This gives,

$$\mathrm{variance}\left(\hat G(e^{j\omega_n})\right) = \frac{\mathrm{variance}\left(\hat G_r(e^{j\omega_n})\right)}{R}.$$

If the estimates are biased, we will not get as much variance reduction.

(The method of splitting the data record and averaging the periodograms is attributed to Bartlett (1948, 1950).)

* This is “best” given that we have $R$ experiments and an upper bound on $\big|U_r(e^{j\omega_n})\big|$.


Averaging example (noisy data)

Splitting the data record: $K = 72$, $R = 3$, $N = 24$.

[Figure: input $u(k)$ and output $y(k)$ for $-20 \le k \le 70$, together with the three length-24 output sub-records $y_1(k)$, $y_2(k)$, $y_3(k)$.]


Averaging example (noisy data)

Estimates: $\hat G_r(e^{j\omega_n})$, $r = 1, 2, 3$, and the weighted average, $\hat G_{\mathrm{avg}}(e^{j\omega_n})$.

[Figure: magnitudes of the true $G(e^{j\omega})$ and the estimates $\hat G_1(e^{j\omega_n})$, $\hat G_2(e^{j\omega_n})$, $\hat G_3(e^{j\omega_n})$, $\hat G_{\mathrm{avg}}(e^{j\omega_n})$, versus frequency $\omega$ (rad/sample), up to $\pi$.]


Averaging example (noisy data)

Estimates: $\hat G_r(e^{j\omega_n})$, $r = 1, 2, 3$, and the weighted average, $\hat G_{\mathrm{avg}}(e^{j\omega_n})$.

Errors: $E_r(e^{j\omega_n}) = G(e^{j\omega_n}) - \hat G_r(e^{j\omega_n})$.

[Figure: magnitudes of the errors $E_1(e^{j\omega_n})$, $E_2(e^{j\omega_n})$, $E_3(e^{j\omega_n})$ and $E_{\mathrm{avg}}(e^{j\omega_n})$, versus frequency $\omega$ (rad/sample).]


Averaging with periodic signals

Splitting the data record: $K = 72$, $N = 24$, $R = 3$; $u(k)$ has period $M = 24$.

[Figure: periodic input $u(k)$ (period $M = 24$) and output $y(k)$ for $-20 \le k \le 70$, together with the three output sub-records $y_1(k)$, $y_2(k)$, $y_3(k)$.]


Averaging with periodic signals

Estimates: $\hat G_r(e^{j\omega_n})$ and $\hat G_{\mathrm{avg}}(e^{j\omega_n}) = \left(\hat G_2(e^{j\omega_n}) + \hat G_3(e^{j\omega_n})\right)/2$.

[Figure: magnitudes of the true $G(e^{j\omega})$ and the estimates $\hat G_2(e^{j\omega_n})$, $\hat G_3(e^{j\omega_n})$, $\hat G_{\mathrm{avg}}(e^{j\omega_n})$, versus frequency $\omega$ (rad/sample).]


Averaging with periodic signals

Estimates: $\hat G_r(e^{j\omega_n})$ and $\hat G_{\mathrm{avg}}(e^{j\omega_n}) = \left(\hat G_2(e^{j\omega_n}) + \hat G_3(e^{j\omega_n})\right)/2$.

Errors: $E_r(e^{j\omega_n}) = G(e^{j\omega_n}) - \hat G_r(e^{j\omega_n})$.

[Figure: magnitudes of the errors $E_2(e^{j\omega_n})$, $E_3(e^{j\omega_n})$, $E_{\mathrm{avg}}(e^{j\omega_n})$ and, for comparison, $E_{\mathrm{avg}}(e^{j\omega_n})$ from the non-periodic case, versus frequency $\omega$ (rad/sample).]


Bias-variance trade-offs in data record splitting

Bartlett’s procedure

Divide a (long) data record into smaller parts for averaging.

Data: $\{u(k), y(k)\}$, $k = 0, \dots, K-1$, with $K$ very large.

Choose $R$ records and a calculation length $N$, with $NR \le K$:

$$u_r(n) = u(rN + n), \qquad r = 0, \dots, R-1, \quad n = 0, \dots, N-1.$$

And average the resulting estimates:

$$\hat G(e^{j\omega_n}) = \frac{1}{R}\sum_{r=0}^{R-1} \hat G_r(e^{j\omega_n}) = \frac{1}{R}\sum_{r=0}^{R-1} \frac{Y_r(e^{j\omega_n})}{U_r(e^{j\omega_n})}.$$

Bias and variance effects

As $R$ increases:

- the number of points calculated, $N$, decreases (so that $NR \le K$);
- the variance decreases (by up to $1/R$);
- the bias increases (due to non-periodicity transients).
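A minimal Matlab sketch of Bartlett's procedure, assuming u and y are column vectors holding a single long record of length at least $NR$ (the values of N and R are illustrative):

N = 1024; R = 8;                    % illustrative segment length and record count
G = zeros(N,1);
for r = 0:R-1,
    ur = u(r*N+1:(r+1)*N);          % u_r(n) = u(rN + n)
    yr = y(r*N+1:(r+1)*N);          % y_r(n) = y(rN + n)
    G = G + (fft(yr)./fft(ur))/R;   % accumulate the 1/R-weighted ETFEs
end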


Bias-variance trade-offs in data record splitting

Mean-square error

$$\mathrm{MSE} = \mathrm{bias}^2 + \mathrm{variance}.$$

- Transient bias grows linearly with the number of data splits.
- Variance decays at a rate of up to 1/(number of averages).

Optimal bias-variance trade-off

[Figure: MSE, bias error and variance error versus the number of records, $R$; the MSE is minimised at the optimal bias-variance trade-off.]


Smoothing transfer function estimates

What if we have no option of running periodic input experiments?

- The system will not tolerate periodic inputs; or
- The data has already been taken and given to us.
- More data just gives more frequencies (with the same error variance).

Exploit the assumed smoothness of the underlying system

$G(e^{j\omega})$ is assumed to be a low-order (smooth) system.

$$\mathrm{E}\left\{\big(G(e^{j\omega_n}) - \hat G_N(e^{j\omega_n})\big)\big(G(e^{j\omega_s}) - \hat G_N(e^{j\omega_s})\big)\right\} \longrightarrow 0 \quad (n \ne s),$$

(asymptotically at least).


Smoothing the ETFE

Smooth transfer function assumption

Assume the true system to be close to constant for a range of frequencies.

$$G(e^{j\omega_{n+r}}) \approx G(e^{j\omega_n}) \qquad \text{for } r = 0, \pm 1, \pm 2, \dots, \pm R.$$

Smooth minimum variance estimate

The minimum variance smoothed estimate is,

$$\hat{\hat G}_N(e^{j\omega_n}) = \frac{\displaystyle\sum_{r=-R}^{R} \alpha_r\, \hat G_N(e^{j\omega_{n+r}})}{\displaystyle\sum_{r=-R}^{R} \alpha_r}, \qquad \alpha_r = \frac{\tfrac{1}{N}\big|U_N(e^{j\omega_{n+r}})\big|^2}{\phi_v(e^{j\omega_{n+r}})}.$$

(Here we smooth over $2R + 1$ points.)
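A minimal Matlab sketch of this smoother, assuming Gest is the $N$-point ETFE (a column vector), U the input DFT, phiv a noise spectrum estimate on the same frequency grid, and R the half-width of the smoothing interval (all names hypothetical). Frequency indices wrap modulo $N$:

alpha = (abs(U).^2/N)./phiv;       % alpha = (1/N)|U_N|^2 / phi_v at each bin
Gs = zeros(N,1);
for n = 1:N,
    idx = mod((n-1-R:n-1+R),N)+1;  % the 2R+1 bins centred on bin n, wrapped
    Gs(n) = sum(alpha(idx).*Gest(idx))/sum(alpha(idx));
end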


Smoothing the ETFE

If $N$ is large (many closely spaced frequencies), the summation can be approximated by an integral,

$$\hat{\hat G}_N(e^{j\omega_n}) = \frac{\displaystyle\sum_{r=-R}^{R} \alpha_r\, \hat G_N(e^{j\omega_{n+r}})}{\displaystyle\sum_{r=-R}^{R} \alpha_r} \approx \frac{\displaystyle\int_{\omega_{n-R}}^{\omega_{n+R}} \alpha(e^{j\xi})\, \hat G_N(e^{j\xi})\, d\xi}{\displaystyle\int_{\omega_{n-R}}^{\omega_{n+R}} \alpha(e^{j\xi})\, d\xi}, \qquad \text{with } \alpha(e^{j\xi}) = \frac{\tfrac{1}{N}\big|U_N(e^{j\xi})\big|^2}{\phi_v(e^{j\xi})}.$$


Smoothing the ETFE

Smoothing window

$$\hat{\hat G}_N(e^{j\omega_n}) = \frac{\displaystyle\frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma(e^{j(\xi-\omega_n)})\, \alpha(e^{j\xi})\, \hat G_N(e^{j\xi})\, d\xi}{\displaystyle\frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma(e^{j(\xi-\omega_n)})\, \alpha(e^{j\xi})\, d\xi}, \qquad \text{with } \alpha(e^{j\xi}) = \frac{\tfrac{1}{N}\big|U_N(e^{j\xi})\big|^2}{\phi_v(e^{j\xi})}.$$

The $1/2\pi$ scaling will make it easier to derive time-domain windows later.


Smoothing the ETFE

Assumptions on $\phi_v(e^{j\omega})$

Assume $\phi_v(e^{j\omega})$ is also a smooth (and flat) function of frequency:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma(e^{j(\xi-\omega_n)}) \left|\frac{1}{\phi_v(e^{j\xi})} - \frac{1}{\phi_v(e^{j\omega_n})}\right| d\xi \approx 0.$$

Then use,

$$\alpha(e^{j\xi}) = \frac{\tfrac{1}{N}\big|U_N(e^{j\xi})\big|^2}{\phi_v(e^{j\omega_n})},$$

to get,

$$\hat{\hat G}_N(e^{j\omega_n}) = \frac{\displaystyle\frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma(e^{j(\xi-\omega_n)})\, \tfrac{1}{N}\big|U_N(e^{j\xi})\big|^2\, \hat G_N(e^{j\xi})\, d\xi}{\displaystyle\frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma(e^{j(\xi-\omega_n)})\, \tfrac{1}{N}\big|U_N(e^{j\xi})\big|^2\, d\xi}$$

(the constant $\phi_v(e^{j\omega_n})$ cancels between the numerator and denominator).
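This $|U_N|^2$-weighted form is exactly what the Matlab calculation below implements.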


Weighting functions

Frequency smoothing window characteristics: width (specified by the $\gamma$ parameter)

The wider the frequency window (i.e., decreasing $\gamma$):

- the more adjacent frequencies are included in the smoothed estimate,
- the smoother the result,
- the lower the noise-induced variance,
- the higher the bias.


Weighting functions

Typical window (Hann) as a function of the width parameter, $\gamma$.

[Figure: Hann window $W_\gamma(e^{j\omega_n})$ for $\gamma = 1, 5, 10, 20$ with $N = 256$, versus frequency $\omega \in [-\pi, \pi]$.]


Weighting functions

Window characteristics: shape

Some common choices:

Bartlett: $W_\gamma(e^{j\omega}) = \dfrac{1}{\gamma}\left(\dfrac{\sin(\gamma\omega/2)}{\sin(\omega/2)}\right)^2$

Hann: $W_\gamma(e^{j\omega}) = \dfrac{1}{2}D_\gamma(\omega) + \dfrac{1}{4}D_\gamma(\omega - \pi/\gamma) + \dfrac{1}{4}D_\gamma(\omega + \pi/\gamma)$,

where $D_\gamma(\omega) = \dfrac{\sin\big(\omega(\gamma + 0.5)\big)}{\sin(\omega/2)}$.

Others include: Hamming, Parzen, Kaiser, ...

The differences are mostly in how the energy leaks to adjacent frequencies, and in the ability to resolve closely spaced frequency peaks.
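For illustration, a Matlab sketch that evaluates these two windows on a frequency grid; patching the removable singularity at $\omega = 0$ with its limit value is an implementation choice, not part of the definitions above:

N = 256; gamma = 10;
omega = 2*pi*(-N/2:N/2-1)'/N;               % frequency grid on [-pi, pi)
Dg = @(w) sin(w*(gamma+0.5))./sin(w/2);     % Dirichlet kernel D_gamma(w)
Wbart = (1/gamma)*(sin(gamma*omega/2)./sin(omega/2)).^2;
Whann = 0.5*Dg(omega) + 0.25*Dg(omega-pi/gamma) + 0.25*Dg(omega+pi/gamma);
Wbart(omega==0) = gamma;                    % limit of the Bartlett window at 0
Whann(omega==0) = 0.5*(2*gamma+1) + 0.5*Dg(pi/gamma);  % limit of the Hann window at 0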


Weighting functions

Example frequency-domain windows.

[Figure: Welch, Hann, Hamming and Bartlett windows $W_\gamma(e^{j\omega_n})$ for $\gamma = 10$, $N = 256$, versus frequency $\omega \in [-\pi, \pi]$.]


ETFE smoothing example: Matlab calculations

U = fft(u);                               % calculate N point FFTs
Y = fft(y);
Gest = Y./U;                              % ETFE estimate
Gs = 0*Gest;                              % smoothed estimate
[omega,Wg] = WfHann(gamma,N);             % window (centered)
zidx = find(omega==0);                    % shift to start at zero
omega = [omega(zidx:N);omega(1:zidx-1)];  % frequency grid
Wg = [Wg(zidx:N);Wg(1:zidx-1)];
a = U.*conj(U);                           % variance weighting
for wn = 1:N,
    Wnorm = 0;                            % reset normalisation
    for xi = 1:N,
        widx = mod(xi-wn,N)+1;            % wrap window index
        Gs(wn) = Gs(wn) + Wg(widx) * Gest(xi) * a(xi);
        Wnorm = Wnorm + Wg(widx) * a(xi);
    end
    Gs(wn) = Gs(wn)/Wnorm;                % weight normalisation
end
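The double loop computes a circular correlation of the window with the weighted ETFE, so for large N it can be vectorised with FFTs. An equivalent form (a sketch, using the same variables as above):

b = Gest.*a;                                    % ETFE weighted by |U|^2
num = ifft(fft(b).*conj(fft(Wg)));              % circular correlation with the window
den = ifft(fft(a).*conj(fft(Wg)),'symmetric');  % real-valued normalisation
Gs = num./den;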


Window properties

Properties and characteristic values of window functions

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma(e^{j\xi})\, d\xi = 1 \qquad \text{(normalised)}$$

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \xi\, W_\gamma(e^{j\xi})\, d\xi = 0 \qquad \text{(“even”, sort of)}$$

$$M(\gamma) := \frac{1}{2\pi}\int_{-\pi}^{\pi} \xi^2\, W_\gamma(e^{j\xi})\, d\xi$$

$$W(\gamma) := \frac{1}{2\pi}\int_{-\pi}^{\pi} W_\gamma^2(e^{j\xi})\, d\xi$$

Bartlett: $M(\gamma) = 2.78/\gamma$, $W(\gamma) \approx 0.67\gamma$ (for $\gamma > 5$)

Hamming: $M(\gamma) = \pi^2/(2\gamma^2)$, $W(\gamma) \approx 0.75\gamma$ (for $\gamma > 5$)
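For instance, the Bartlett values can be checked numerically in Matlab (a sketch; each integral is evaluated over half the interval, using the even symmetry of the integrand, which also avoids the removable singularity of the window at $\xi = 0$):

gamma = 20;
Wb = @(x) (1/gamma)*(sin(gamma*x/2)./sin(x/2)).^2;   % Bartlett window
M  = 2*integral(@(x) x.^2.*Wb(x)/(2*pi), 1e-9, pi);  % approx. 2.78/gamma = 0.139
Wc = 2*integral(@(x) Wb(x).^2/(2*pi), 1e-9, pi);     % approx. 0.67*gamma = 13.4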


Smoothed estimate properties

Asymptotic bias properties

$$\mathrm{E}\left\{\hat{\hat G}_N(e^{j\omega_n})\right\} - G(e^{j\omega_n}) = M(\gamma)\left(\frac{1}{2}\,G''(e^{j\omega_n}) + G'(e^{j\omega_n})\,\frac{\phi_u'(e^{j\omega_n})}{\phi_u(e^{j\omega_n})}\right) + \underbrace{O\big(C_3(\gamma)\big)}_{\to\,0 \text{ as } \gamma\to\infty} + \underbrace{O\big(1/\sqrt{N}\big)}_{\to\,0 \text{ as } N\to\infty}$$

Increasing $\gamma$:

- makes the frequency window narrower;
- averages over fewer frequency values;
- makes $M(\gamma)$ smaller; and
- reduces the bias of the smoothed estimate, $\hat{\hat G}_N(e^{j\omega_n})$.


Smoothed estimate properties

Asymptotic variance properties

$$\mathrm{E}\left\{\Big|\hat{\hat G}_N(e^{j\omega_n}) - \mathrm{E}\big\{\hat{\hat G}_N(e^{j\omega_n})\big\}\Big|^2\right\} = \frac{1}{N}\,W(\gamma)\,\frac{\phi_v(e^{j\omega_n})}{\phi_u(e^{j\omega_n})} + \underbrace{o\big(W(\gamma)/N\big)}_{\to\,0 \text{ as } \gamma\to\infty,\ N\to\infty,\ \gamma/N\to 0}$$

Increasing $\gamma$:

- makes the frequency window narrower;
- averages over fewer frequency values;
- makes $W(\gamma)$ larger; and
- increases the variance of the smoothed estimate, $\hat{\hat G}_N(e^{j\omega_n})$.


Smoothed estimate properties

Asymptotic MSE properties

$$\mathrm{E}\left\{\Big|\hat{\hat G}_N(e^{j\omega_n}) - G(e^{j\omega_n})\Big|^2\right\} \approx M^2(\gamma)\,\big|F(e^{j\omega_n})\big|^2 + \frac{1}{N}\,W(\gamma)\,\frac{\phi_v(e^{j\omega_n})}{\phi_u(e^{j\omega_n})},$$

where

$$F(e^{j\omega_n}) = \frac{1}{2}\,G''(e^{j\omega_n}) + G'(e^{j\omega_n})\,\frac{\phi_u'(e^{j\omega_n})}{\phi_u(e^{j\omega_n})}.$$

If $M(\gamma) = \bar M/\gamma^2$ and $W(\gamma) = \bar W\gamma$, then the MSE is minimised by

$$\gamma_{\mathrm{optimal}} = \left(\frac{4\bar M^2\,\big|F(e^{j\omega_n})\big|^2\, \phi_u(e^{j\omega_n})}{\bar W\, \phi_v(e^{j\omega_n})}\right)^{1/5} N^{1/5},$$

and the MSE at $\gamma_{\mathrm{optimal}}$ is approximately $C\,N^{-4/5}$.
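As a numerical illustration, a Matlab sketch of this rule; here Fn2 = $|F(e^{j\omega_n})|^2$, phiu and phiv are assumed to be available estimates at the frequency of interest (hypothetical names), and the Hamming constants come from the window property values above:

Mbar = pi^2/2;  Wbar = 0.75;   % Hamming: M(gamma) = Mbar/gamma^2, W(gamma) = Wbar*gamma
N = 4096;                      % data length (illustrative)
gopt = (4*Mbar^2*Fn2*phiu/(Wbar*phiv))^(1/5) * N^(1/5);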


Bibliography

Windowing and ETFE smoothing

P. Stoica & R. Moses, Introduction to Spectral Analysis (see Chapters 1 and 2), Prentice-Hall, 1997.

Lennart Ljung, System Identification: Theory for the User (see Section 6.4), Prentice-Hall, 2nd Ed., 1999.

M.S. Bartlett, “Smoothing Periodograms from Time-Series with Continuous Spectra,” Nature, vol. 161(4096), pp. 686–687, 1948.

M.S. Bartlett, “Periodogram analysis and continuous spectra,” Biometrika, vol. 37, pp. 1–16, 1950.
