Digital Signal Processing Table of contents
1 Systems and Signals  1-1
  Discrete and continuous signals  1-1
  Basic discrete signals  1-1
  Continuous signals  1-3
  Discrete and continuous systems  1-5
  How can we describe LTI systems?  1-5
  Continuous time  1-7
  Discrete time Fourier series  1-8
  Continuous time Fourier transforms  1-13
  Minimum error approximation  1-15
  Gibbs phenomenon  1-16
  Fourier transform of non periodic signals  1-16
  Sampling  1-19
  Relation between spectra of discrete and continuous time signals  1-22
  DFT Processing  1-23
  The Fast Fourier Transform  1-24
  Speed up of FFT relative to DFT  1-25
  Concluding remarks  1-26
  References  1-26
2 The z-transform  2-1
  Introduction  2-1
  Definition and properties of the z-transform  2-1
  Inverse z-transform: contour integration  2-4
  More properties of the z-transform  2-5
  z-Plane poles and zeros  2-6
  System stability  2-7
  Geometrical evaluation of the Fourier transform in the z-plane  2-8
  First and second order LTI systems  2-8
  Nonzero auxiliary conditions  2-10
3 Design of nonrecursive (FIR) filters  3-1
  Introduction  3-1
  Moving average filters  3-2
  The Fourier transform method  3-4
  Windowing  3-6
  Rectangular window  3-6
  Triangular window  3-6
  Von Hann and Hamming windows  3-6
  Kaiser window  3-7
  Equiripple filters  3-9
  Digital differentiators  3-10
4 Design of recursive (IIR) filters  4-1
  Introduction  4-1
  Simple designs based on z-plane poles and zeros  4-1
  Filters derived from analog designs  4-5
  The bilinear transformation  4-6
  Impulse invariant filters  4-8
Versie 1.1 i 1994
  Frequency sampling filters  4-12
  Digital integrators  4-14
  Running sum  4-14
  Trapezoid rule  4-15
  Simpson's rule  4-15
  Comparison  4-15
5 Spectral analysis  5-1
  Introduction  5-1
  Spectral leakage  5-1
  Windowing  5-3
  Investigating LTI systems  5-4
6 Time series analysis  6-1
  Discrete-time difference equation models  6-1
  Stochastic processes  6-2
  Autocovariance and autocorrelation functions  6-2
  Gaussian processes  6-3
  Intermezzo: Spectral representation  6-4
  Wiener-Khintchine theorem  6-5
  Autocorrelation function of autoregressive processes  6-8
  The partial autocorrelation function  6-9
  Properties of autoregressive, moving average and mixed ARMA processes  6-10
  Linear nonstationary models  6-12
  Addition of explanatory variables: ARMAX  6-13
  Co-integration  6-13
  Spectral representation of stationary stochastic processes  6-14
  Cross-covariance and cross-correlation functions  6-16
  Linear system with noise  6-20
  Estimation in the time domain  6-21
  Estimation of the mean  6-22
  Estimation of the autocovariance function  6-22
  Estimation of the autocorrelation function  6-23
  Estimation of parameters in autoregressive models  6-24
  Estimation of parameters in moving average models  6-25
  Estimation of parameters in ARMA models  6-26
  Determining the order of the model  6-27
  Estimation in the frequency domain  6-27
  Properties of the periodogram of a linear process  6-28
  Sampling properties of the periodogram  6-30
  Consistent estimates of the spectral density function; spectral windows  6-30
  Sampling properties of spectral estimates  6-32
  Approximate expression for the bias  6-33
  Estimation of cross-spectra  6-35
  Parametric spectral estimation  6-37
  Use of the Fast Fourier Transform  6-46
  Smoothing, prediction and filtering  6-47
  Minimum mean square error estimation  6-47
  Smoothing  6-48
  Prediction  6-50
  Updating the forecasts of an ARMA process  6-53
  The Wiener filter  6-54
  The Kalman filter  6-57
  ARMA signals in white noise  6-59
  State space representation of Kalman filter  6-62
  References  6-65
7 Stochastic point processes  7-1
  The Poisson process  7-2
  Shot noise  7-5
  Application of point processes and correlation to auditory neurophysiology  7-6
  Time dependent correlation functions and coincidence histograms  7-6
  Linear systems analysis of the frog middle ear  7-9
  Linear systems analysis applied to the auditory nerve fibre responses  7-10
  References  7-14
Appendix 1 Matrix fundamentals  A-1
Appendix 2 Probability theory  A-1
  Transformation of random variables  A-4
  Chi-squared, F and t distributions  A-4
Systems and Signals Discrete and continuous signals
Chapter 1 Systems and Signals

To analyse with a computer a physical entity which varies as a function of time, we have to transform it into an electrical signal, which has to be quantized in value and sampled in time. The general scheme is given in figure 1-1. First the physical entity is transformed by a transducer or sensor into an electrical signal x(t), which varies continuously as a function of time. This signal is then quantized in value at a fixed sample rate by an analog-to-digital converter (ADC), resulting in a discrete signal x[n], which exists only for integer values of n, corresponding to time values t = nΔT, in which ΔT is the sampling interval. This discrete signal is processed by the computer with the digital signal processing (DSP) module. When the processed signal y[n] has to be presented as a continuous signal y(t), a digital-to-analog converter (DAC) is needed.

figure 1-1. Set-up of a digital signal processing system: physical entity → transducer → electrical signal x(t) (continuous) → ADC → discrete signal x[n] → DSP → y[n] → DAC → y(t).

First we will investigate properties of systems and signals. We will concentrate on the discrete case and show the comparable continuous situation. When the systems are linear and time-invariant, the Fourier transform is a powerful tool to describe these systems. In the last section we will concentrate on the transition between the continuous and the discrete domain, and derive the sampling theorem of Nyquist and Shannon.

1.1 Discrete and continuous signals

We will denote a continuous signal as a function of time by round brackets, so x(t) is a continuous signal, defined for all t.

We will denote a discrete signal by square brackets: x[n]. This signal exists only for integer values of n and is not defined for other values.

figure 1-2. (a) Continuous signal x(t) as a function of time; (b) discrete signal x[n].

1.2 Basic discrete signals

unit step function:

    u[n] = 0,  n < 0
    u[n] = 1,  n ≥ 0        Eq. 1-1
Natuurkundige Informatica 1 3 2003
unit impulse function:

    δ[n] = 0,  n ≠ 0
    δ[n] = 1,  n = 0        Eq. 1-2

The unit step function u[n] is the running sum of the unit impulse δ[n], and δ[n] is the difference of two step functions shifted by one sample:

    u[n] = Σ_{m=-∞}^{n} δ[m] = Σ_{l=0}^{∞} δ[n-l]        Eq. 1-3

    δ[n] = u[n] - u[n-1]        Eq. 1-4

Multiplying a discrete signal with an impulse function selects one signal value, as x[n]δ[n] = x[0]δ[n], or in general

    x[n]δ[n-k] = x[k]δ[n-k]        Eq. 1-5

An exponential discrete signal is in general given by

    x[n] = A e^{βn}        Eq. 1-6

where β is real. When β > 0 we are dealing with exponential growth, when β < 0 with exponential decay. In general the signal will start at a certain moment, so we will assume that the signal is zero until a certain moment n = k. Without loss of generality we will assume that the signal starts at n = 0, so

    x[n] = A e^{βn},  n ≥ 0
    x[n] = 0,         n < 0        Eq. 1-7

or

    x[n] = A e^{βn} u[n]        Eq. 1-8

When β is purely imaginary, so β = jΩ, the signal is given by

    x[n] = A e^{jnΩ} = A cos(nΩ) + jA sin(nΩ)        Eq. 1-9

When this signal is periodic, so x[n+N] = x[n] for some N, we require

    e^{j(n+N)Ω} = e^{jnΩ}  →  e^{jNΩ} = 1        Eq. 1-10

This occurs when ΩN is a multiple of 2π, so ΩN = m·2π, which results in

    Ω₀ = m(2π/N)        Eq. 1-11
which means that when the fundamental frequency is (a multiple of) 2π/N, the signal is periodic. Thus periodic in time corresponds to discrete in frequency. When we investigate the discrete signals

    φ_k[n] = e^{jkΩ₀n},  with Ω₀ = 2π/N        Eq. 1-12

we see that all these signals are periodic with period N. These signals are called harmonically related. There are only N different frequencies, because

    φ_{k+N}[n] = e^{j(k+N)(2π/N)n} = e^{jk(2π/N)n} e^{j2πn} = φ_k[n]        Eq. 1-13

This means that there is an ambiguity in the discrete signal: the discrete signal with frequency k equals the discrete signal with frequency k+N. This is also called aliasing. The spectrum of a discrete signal is periodic. Thus discrete in time corresponds to periodic in frequency. (What are the frequencies of the cosines in figure 1-3? And what is the highest frequency?)

figure 1-3. Ambiguity in discrete signals.

In the general case we have the signals

    x[n] = A e^{β₀n} e^{jΩn}        Eq. 1-14

which results in an exponential growth or decay of the signal. When the signals originate from the real physical world we have in general to do with real-valued signals, in which case x[n] is given by

    x[n] = A e^{β₀n} cos(nΩ + φ) = (A/2) e^{β₀n} e^{jΩn + jφ} + (A/2) e^{β₀n} e^{-jΩn - jφ}        Eq. 1-15

1.3 Continuous signals

In the continuous case the step function u(t) is given by

    u(t) = 0,  t < 0
    u(t) = 1,  t > 0        Eq. 1-16

The step function is discontinuous at t = 0; it is the integral of the impulse function:

    u(t) = ∫_{-∞}^{t} δ(τ) dτ        Eq. 1-17
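Before moving on, the frequency ambiguity of Eq. 1-13 is easy to confirm numerically. The sketch below (Python with NumPy; the language and the choices N = 8, k = 3 are ours, not part of the notes) builds the harmonically related signals and shows that frequencies k and k+N give exactly the same sequence:

```python
import numpy as np

# Eq. 1-13: with Omega0 = 2*pi/N, the discrete signal at frequency k and the
# one at frequency k + N are identical sample for sample (aliasing).
N = 8
n = np.arange(N)
Omega0 = 2 * np.pi / N
k = 3
phi_k  = np.exp(1j * k * Omega0 * n)
phi_kN = np.exp(1j * (k + N) * Omega0 * n)
assert np.allclose(phi_k, phi_kN)
```

The same check works for any integer k, which is why only N distinct frequencies exist.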
To investigate this impulse function let us first define a function δ_Δ(t) which is rectangularly shaped and has area 1, so

    δ_Δ(t) = 1/Δ,  0 < t < Δ
    δ_Δ(t) = 0,    t < 0 or t > Δ        Eq. 1-18

and

    u_Δ(t) = ∫_{-∞}^{t} δ_Δ(τ) dτ        Eq. 1-19

The impulse function can be viewed as the limit of δ_Δ(t) when Δ → 0:

    δ(t) = lim_{Δ→0} δ_Δ(t)        Eq. 1-20

The area of the impulse function is 1. A discussion in more depth can be found in §8.7 of Arfken.

figure 1-4. (a) Growing discrete-time sinusoidal signal; (b) decaying discrete-time sinusoid.

figure 1-5. u_Δ(t), rising from 0 at t = 0 to 1 at t = Δ, and δ_Δ(t), a rectangular pulse of height 1/Δ on 0 < t < Δ.
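The two defining facts about δ_Δ(t) — unit area for every width Δ, and the sifting behaviour of Eq. 1-21 in the limit — can be checked with a simple numerical integration. A sketch in Python with NumPy (our own addition; the test signal x(t) = cos(t) and the grid sizes are arbitrary choices):

```python
import numpy as np

# The rectangular pulse delta_Delta(t) of Eq. 1-18 has area 1 for every width
# Delta, and the integral of x(t)*delta_Delta(t) approaches x(0) as Delta -> 0
# (the sifting property, Eq. 1-21). Here x(t) = cos(t).
def sift(Delta, n_pts=100001):
    t = np.linspace(0, Delta, n_pts)
    dt = t[1] - t[0]
    pulse = np.full_like(t, 1 / Delta)
    area = np.sum(pulse) * dt                      # numerical integral of the pulse
    weighted = np.sum(np.cos(t) * pulse) * dt      # integral of x(t)*pulse
    return area, weighted

for Delta in (1.0, 0.1, 0.001):
    area, _ = sift(Delta)
    assert abs(area - 1) < 1e-3                    # area stays 1 as Delta shrinks
assert abs(sift(0.001)[1] - np.cos(0.0)) < 1e-3    # integral -> x(0)
```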
The function δ(t) plays a crucial role in the sampling of signals. Since x(t)δ_Δ(t) ≈ x(0)δ_Δ(t) for Δ → 0, when we again take the limit we obtain x(t)δ(t) = x(0)δ(t), or in general

    x(t)δ(t - t₀) = x(t₀)δ(t - t₀)        Eq. 1-21

1.4 Discrete and continuous systems

A system transforms signals:

    x[n]  →  [ system ]  →  y[n]

A system is time-invariant when it gives the same output for a specific input, independent of the time when this input is given. So the system will give today the same output for that input as yesterday, and as it will give tomorrow.

    Time-invariant:  x[n] → y[n]  ⇒(TI)  x[n-n₀] → y[n-n₀]        Eq. 1-22

A system is linear when it is both additive and homogeneous.

    Linear:  x₁[n] → y₁[n],  x₂[n] → y₂[n]  ⇒(L)  a x₁[n] + b x₂[n] → a y₁[n] + b y₂[n]        Eq. 1-23

A result of Eq. 1-23 is that when for a linear system the input equals zero, the output should also equal zero, since 0 = 0·x → 0·y = 0. In the remainder of this chapter we will restrict ourselves to linear time-invariant (LTI) systems.

1.5 How can we describe LTI systems?

Given a discrete signal x[n], then x[n]δ[n] = x[0]δ[n] holds, or

    x[n]δ[n-k] = x[k]δ[n-k]        Eq. 1-24

So we can give a discrete signal as a summation of weighted impulse functions:

    x[n] = Σ_{k=-∞}^{∞} x[k] δ[n-k]        Eq. 1-25
When the input signal is an impulse, the output signal of the system is called the impulse response:

    x[n] = δ[n]  ⇒  y[n] = h[n]        Eq. 1-26

So when the system is time-invariant this means that

    δ[n-k] → h[n-k]        Eq. 1-27

Linear and time-invariant means that

    x[n]δ[n-k] = x[k]δ[n-k]  →  x[k]h[n-k]        Eq. 1-28

or, using Eq. 1-25,

    y[n] = Σ_{k=-∞}^{∞} x[k] h[n-k]        Eq. 1-29

This result is also called the convolution sum and denoted as y[n] = x[n] * h[n]. It shows that an LTI system is completely described by its impulse response: when we know the impulse response h[n] of an LTI system we can calculate the output signal y[n] for each input signal x[n] by this convolution sum. The convolution is commutative:

    y[n] = x[n] * h[n] = h[n] * x[n]

Examples:

#1 Given a system which calculates the moving average over three samples. What is its impulse response?

    y[n] = (x[n] + x[n-1] + x[n-2]) / 3        Eq. 1-30

x[n] = δ[n] gives

    h[n] = (δ[n] + δ[n-1] + δ[n-2]) / 3        Eq. 1-31

#2 Given a system (with |a| < 1) described by the following recursive relation, what is its impulse response?

    y[n] = a y[n-1] + x[n]        Eq. 1-32

x[n] = δ[n] gives

    h[n] = a h[n-1] + δ[n]
    h[n-1] = a h[n-2] + δ[n-1]
    h[n] = δ[n] + a δ[n-1] + a² h[n-2]

so

    h[n] = Σ_{k=0}^{∞} aᵏ δ[n-k] = aⁿ u[n]        Eq. 1-33
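Both impulse responses can be obtained by simply driving the difference equations of examples #1 and #2 with δ[n]. A minimal sketch in Python with NumPy (the language, the window length 8 and the value a = 0.5 are our own choices, not part of the notes):

```python
import numpy as np

n_max = 8
delta = np.zeros(n_max); delta[0] = 1.0

# Example #1: y[n] = (x[n] + x[n-1] + x[n-2]) / 3 driven by delta (Eq. 1-30).
x = np.concatenate(([0.0, 0.0], delta))            # pad so x[n-2] exists for n = 0
h_ma = np.array([(x[n + 2] + x[n + 1] + x[n]) / 3 for n in range(n_max)])
assert np.allclose(h_ma[:3], 1 / 3)                # Eq. 1-31: three taps of 1/3
assert np.allclose(h_ma[3:], 0)

# Example #2: y[n] = a*y[n-1] + x[n] driven by delta gives h[n] = a**n (Eq. 1-33).
a, h, y_prev = 0.5, [], 0.0
for n in range(n_max):
    y_prev = a * y_prev + delta[n]
    h.append(y_prev)
assert np.allclose(h, a ** np.arange(n_max))
```

The first response dies out after three samples (finite impulse response); the second never reaches exactly zero, which is the point made in the next paragraph.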
In contrast to the previous example this is an infinite impulse response.

#3 Given the impulse response of an LTI system by h[n] = aⁿ u[n], with |a| < 1. What is the output signal when the input is a unit step function x[n] = u[n]?

    y[n] = Σ_{k=-∞}^{∞} u[k] a^{n-k} u[n-k]

so for all n ≥ 0

    y[n] = Σ_{k=0}^{n} a^{n-k} = aⁿ Σ_{k=0}^{n} (a⁻¹)ᵏ = (1 - a^{n+1}) / (1 - a)        Eq. 1-34

figure 1-6. Output for example #3.

A system is called causal if h[n] = 0 for n < 0. When a system is not causal it already has an output before an input is present. A system is called stable if Σ_{k=-∞}^{∞} |h[k]| exists. This means that a bounded input will give a bounded output.

1.6 Continuous time

Now we have to do with an LTI system of which the impulse response is h(t) when the input signal is δ(t):

    δ(t)  →(LTI)  h(t)        Eq. 1-35

The output of the system is now given by a convolution integral instead of a convolution sum. The convolution integral is given by

    y(t) = ∫_{-∞}^{∞} x(τ) h(t-τ) dτ        Eq. 1-36

We may think of the continuous signal as consisting of columns of width Δ, so that the signal consists of the weighted sum of pulses δ_Δ:
    x(t) = lim_{Δ→0} Σ_{k=-∞}^{∞} x(kΔ) δ_Δ(t - kΔ) · Δ        Eq. 1-37

figure 1-7. Building a continuous signal x(t) from rectangular pulses δ_Δ(t - kΔ) of height 1/Δ.

For a time-invariant system δ_Δ(t - kΔ) results in the response h_Δ(t - kΔ), which equals h_Δ(t) shifted over a time interval kΔ, so y(t) is given by

    y(t) = lim_{Δ→0} Σ_{k=-∞}^{∞} x(kΔ) h_Δ(t - kΔ) · Δ = ∫_{-∞}^{∞} x(τ) h(t-τ) dτ        Eq. 1-38

#4 Given an LTI system with impulse response h(t) = e^{-at} u(t). What is the output when the input signal is the unit step function x(t) = u(t)?

    y(t) = ∫_0^t e^{-aτ} dτ = (1/a)(1 - e^{-at})   for t > 0        Eq. 1-39

Example #4 is the famous leaky integrator, of which example #3 is the discrete analogue.

1.7 Discrete time Fourier series

Given a linear time-invariant system with an impulse response h[n], the output y[n] for an input signal x[n] is given by

    y[n] = x[n] * h[n] = Σ_{k=-∞}^{∞} x[n-k] h[k]        Eq. 1-40

Now we ask ourselves which signal, after being transformed by an LTI system, will give the same output signal (apart from a multiplicative factor). So what are the eigenfunctions φ_k[n] of an LTI system:

    φ_k[n]  ⇒(LTI)  λ_k φ_k[n]        Eq. 1-41

These eigenfunctions are given by zⁿ, in which z is a complex number, or
    x[n] = φ[n] = zⁿ        Eq. 1-42

This can easily be seen by substituting Eq. 1-42 into Eq. 1-40, resulting in

    y[n] = Σ_{k=-∞}^{∞} z^{n-k} h[k] = zⁿ Σ_{k=-∞}^{∞} h[k] z^{-k} = H(z) zⁿ = H(z) φ[n]        Eq. 1-43

in which H(z) is defined by

    H(z) = Σ_{k=-∞}^{∞} h[k] z^{-k}        Eq. 1-44

Note that in Eq. 1-43 a time shift is expressed by multiplication with the shift operator z⁻¹. As z is a complex number it may be represented as z = r e^{jΩ}. We will restrict ourselves now to r = 1, or

    z = e^{jΩ}        Eq. 1-45

If we take Ω = kΩ₀, then we obtain a harmonic sequence given by

    zⁿ = e^{jkΩ₀n}        Eq. 1-46

We will show now that when x[n] is a periodic function with period N, it can be written as a sum of eigenfunctions φ_k[n] = e^{jkΩ₀n} in which Ω₀ = 2π/N. So

    x[n] = Σ_{k=0}^{N-1} a_k e^{jkΩ₀n}        Eq. 1-47

with x[n] = x[n+N]. We already saw in Eq. 1-13 that there exist only N different functions φ_k[n]. The validity of Eq. 1-47 is demonstrated by showing that we can calculate the coefficients a_k. We can obtain a_k in the following way: first multiply Eq. 1-47 with e^{-jlΩ₀n}; this results in

    e^{-jlΩ₀n} x[n] = Σ_{k=0}^{N-1} a_k e^{jkΩ₀n} e^{-jlΩ₀n}        Eq. 1-48

Next we sum Eq. 1-48 over n, resulting in

    Σ_{n=0}^{N-1} e^{-jlΩ₀n} x[n] = Σ_{n=0}^{N-1} Σ_{k=0}^{N-1} a_k e^{j(k-l)Ω₀n} = Σ_{k=0}^{N-1} a_k ( Σ_{n=0}^{N-1} e^{j(k-l)Ω₀n} )        Eq. 1-49
We now can distinguish two cases, k = l and k ≠ l:

    k = l  ⇒  Σ_{n=0}^{N-1} e^{j0} = N        Eq. 1-50

    k ≠ l  ⇒  Σ_{n=0}^{N-1} e^{j(k-l)Ω₀n} = [(1 - x^N)/(1 - x)]_{x = e^{j(k-l)Ω₀}} = (1 - e^{j(k-l)Ω₀N}) / (1 - e^{j(k-l)Ω₀}) = 0        Eq. 1-51

since e^{j(k-l)Ω₀N} = e^{j(k-l)2π} = 1. In other words, e^{-jlΩ₀n} and e^{-jkΩ₀n} are orthogonal functions on the interval [0, N-1]. So only for k = l does the term between parentheses in Eq. 1-49 differ from zero, and

    Σ_{n=0}^{N-1} e^{-jkΩ₀n} x[n] = N a_k        Eq. 1-52

This results in the Fourier series of a periodic discrete signal, given by

    a_k = (1/N) Σ_{n=0}^{N-1} e^{-jkΩ₀n} x[n] = (1/N) Σ_{n=⟨N⟩} e^{-jkΩ₀n} x[n]
    x[n] = Σ_{k=0}^{N-1} a_k e^{jkΩ₀n} = Σ_{k=⟨N⟩} a_k e^{jkΩ₀n}        Eq. 1-53

Because the eigenfunctions are periodic, instead of summing from 0 to N-1 we can sum over N successive values starting from an arbitrary value, which is denoted by ⟨N⟩: one period of the signal. As φ_k[n] = φ_{k+N}[n], the Fourier coefficients are also periodic with period N: a_k = a_{k+N}. So instead of by the N different values x[n], the signal is also completely described by the N different Fourier coefficients.

Examples:

#5 x[n] = δ[n] for 0 ≤ n ≤ N-1, with x[n] = x[n+N]:

    a_k = (1/N) Σ_{n=0}^{N-1} δ[n] e^{-jkn·2π/N} = (1/N) e⁰ = 1/N        Eq. 1-54

This means that all N frequencies are equally strongly present in the impulse signal: its frequency spectrum is flat.

#6 x[n] = δ[n-1]:

    a_k = (1/N) Σ_{n=0}^{N-1} δ[n-1] e^{-jkn·2π/N} = (1/N) e^{-jk·2π/N}        Eq. 1-55
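Since a_k is exactly (1/N) times the discrete Fourier transform of one period, the coefficients of examples #5 and #6 can be verified with an FFT. A sketch in Python with NumPy (our own addition; N = 16 is an arbitrary choice):

```python
import numpy as np

# The coefficients of Eq. 1-53 equal fft(x)/N for one period x of the signal.
N = 16

# Example #5: one period of the periodic impulse -> flat spectrum a_k = 1/N.
x5 = np.zeros(N); x5[0] = 1.0
a5 = np.fft.fft(x5) / N
assert np.allclose(a5, 1 / N)                      # Eq. 1-54

# Example #6: delayed impulse -> |a_k| = 1/N with a linear phase (Eq. 1-55).
x6 = np.zeros(N); x6[1] = 1.0
a6 = np.fft.fft(x6) / N
k = np.arange(N)
assert np.allclose(a6, np.exp(-1j * k * 2 * np.pi / N) / N)
```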
In this case again the modulus of a_k is independent of the frequency, but the phase is a linear function of the frequency.

Let us now return to the fact that the φ_k[n] = e^{jkΩ₀n} are the eigenfunctions of an LTI system. This means that the Fourier coefficients b_k of the output are those of the input, a_k, multiplied by the eigenvalues λ_k. When the signal is periodic with period N and the impulse response of the LTI system is periodic with the same period N, it can be shown that the output y[n] of the system is given by the periodic convolution ⊗:

    y[n] = x[n] ⊗ h[n] = Σ_{k=0}^{N-1} x[n-k] h[k]        Eq. 1-56

The discrete Fourier coefficients b_k of y[n] are in this case given by

    b_k = (1/N) Σ_{n=⟨N⟩} e^{-jkΩ₀n} Σ_{m=⟨N⟩} x[n-m] h[m]
        = Σ_{m=⟨N⟩} h[m] [ (1/N) Σ_{n=⟨N⟩} x[n-m] e^{-jkΩ₀n} ]
        = Σ_{m=⟨N⟩} h[m] [ (1/N) Σ_{l=⟨N⟩} x[l] e^{-jkΩ₀(l+m)} ]
        = Σ_{m=⟨N⟩} h[m] e^{-jkΩ₀m} a_k = N a_k c_k        Eq. 1-57

with c_k the Fourier coefficients of the impulse response function. So we see that a convolution in the time domain corresponds to a product in the frequency domain, and the eigenvalues of the LTI system equal N times the Fourier coefficients of the impulse response function. In the frequency domain an LTI system is thus described by the product with its transfer function. Similarly it can be derived that the discrete Fourier transform of a product of two functions in the time domain corresponds to a convolution in the frequency domain: the modulation property. If x₁[n] and x₂[n] possess, respectively, Fourier coefficients a_k and b_k, then

    x₁[n] x₂[n]  ⇒  Σ_{m=⟨N⟩} a_m b_{k-m}        (modulation) Eq. 1-58

We see in Eq. 1-58 the duality of modulation and convolution (Eq. 1-56 and Eq. 1-57):

    Σ_{m=⟨N⟩} x₁[m] x₂[n-m]  ⇒  N a_k b_k        (convolution) Eq. 1-59

Other properties are:

linear:

    p x₁[n] + q x₂[n]  →  p a_k + q b_k        Eq. 1-60
time shift (of which example #6 is an example with n₀ = 1):

    x[n - n₀]  ⇒  e^{-jkΩ₀n₀} a_k        Eq. 1-61

since

    (1/N) Σ_{n=⟨N⟩} e^{-jkΩ₀n} x[n-n₀] = (1/N) Σ_{n=⟨N⟩} e^{-jkΩ₀(n-n₀+n₀)} x[n-n₀] = e^{-jkΩ₀n₀} (1/N) Σ_{l=⟨N⟩} e^{-jkΩ₀l} x[l]

Parseval:

    (1/N) Σ_{n=⟨N⟩} |x[n]|² = Σ_{k=⟨N⟩} |a_k|²        Eq. 1-62

since

    (1/N) Σ_{n=⟨N⟩} x[n] (x[n])* = (1/N) Σ_{n=⟨N⟩} (x[n])* Σ_{k=⟨N⟩} a_k e^{jkΩ₀n}
        = Σ_{k=⟨N⟩} a_k [ (1/N) Σ_{n=⟨N⟩} x[n] e^{-jkΩ₀n} ]* = Σ_{k=⟨N⟩} a_k a_k*

We could think of (1/N) Σ_{n=⟨N⟩} (x[n])² as the energy present in one period of a signal: when x[n] is the voltage across a resistor of 1 Ω, the dissipated power would be V²/R = (x[n])²/1 = (x[n])². Parseval's theorem means that the energy in the signal is the same whether we express it in its time distribution or in its frequency distribution.

When the signal x[n] is real, the spectrum satisfies a_k* = a_{-k}, so its modulus is even. Only when x[n] is real and even is the spectrum also real and even.

So far we only looked at periodic signals x[n], but what happens if the signal is not periodic? We can investigate this by letting N go to infinity. We will now call N·a_k = X(Ω_k), where Ω_k = kΩ₀:

    x[n] = Σ_{k=-(N-1)/2}^{(N-1)/2} (1/N) X(Ω_k) e^{jΩ_k n} = (1/2π) Σ_{k=-(N-1)/2}^{(N-1)/2} (2π/N) X(Ω_k) e^{jΩ_k n}        Eq. 1-63

Here 2π/N = Ω_{k+1} - Ω_k = ΔΩ. For N → ∞ we have lim_{N→∞} ΔΩ = dΩ, and the limits of Ω_k are ±(2π/N)(N-1)/2 → ±π, so

    x[n] = (1/2π) ∫_{-π}^{π} X(Ω) e^{jΩn} dΩ        X(Ω) = Σ_{n=-∞}^{∞} x[n] e^{-jΩn}        Eq. 1-64

We see that when x[n] is an aperiodic function of the discrete n, the frequency spectrum is a periodic function of a continuous Ω. The periodicity of the spectrum is the direct result of the
discrete nature of the signal. That X(Ω) is periodic with period 2π can easily be verified by showing that X(Ω + 2π) = X(Ω). We will see later that the frequency 2π corresponds to the sampling frequency.

Here too a convolution in the time domain corresponds to a product in the frequency domain: if x₁[n] → X₁(Ω) and x₂[n] → X₂(Ω), then

    x₁[n] * x₂[n]  →  X₁(Ω) · X₂(Ω)        Eq. 1-65

Examples:

#7 The discrete Fourier transform of an impulse function is again flat (compare to example #5), but its value is now 1:

    x[n] = δ[n]  ⇒(DFT)  X(Ω) = Σ_{n=-∞}^{∞} δ[n] e^{-jΩn} = e⁰ = 1        Eq. 1-66

#8 In the same way we obtain for a delayed impulse function (compare to example #6)

    x[n] = δ[n-1]  ⇒(DFT)  X(Ω) = Σ_{n=-∞}^{∞} δ[n-1] e^{-jΩn} = e^{-jΩ}        Eq. 1-67

#9 Frequency spectrum of a moving average filter of three terms (see example #1):

    h[n] = (δ[n-1] + δ[n] + δ[n+1]) / 3        Eq. 1-68

    H(Ω) = Σ_{n=-∞}^{∞} h[n] e^{-jΩn} = (1/3) Σ_{n=-1}^{1} e^{-jΩn} = (1 + 2 cos(Ω)) / 3        Eq. 1-69

figure 1-8. Frequency response of three-point moving average lowpass filter.

We see that a moving average filter suppresses higher frequencies, up to half the sampling frequency, but these frequencies are still present.
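Eq. 1-69 can be checked by evaluating the defining sum on a frequency grid. A sketch in Python with NumPy (our own addition; the grid of 101 points is an arbitrary choice):

```python
import numpy as np

# DTFT of the symmetric three-point moving average h[n] = 1/3 for n = -1, 0, 1:
# it should equal (1 + 2*cos(Omega)) / 3 (Eq. 1-69).
Omega = np.linspace(-np.pi, np.pi, 101)
H = sum(np.exp(-1j * Omega * n) / 3 for n in (-1, 0, 1))
assert np.allclose(H.imag, 0, atol=1e-12)          # symmetric h[n] -> real H
assert np.allclose(H.real, (1 + 2 * np.cos(Omega)) / 3)
# At half the sampling frequency (Omega = +/- pi) the gain is -1/3, not 0:
# the highest frequencies are attenuated but still present.
assert np.isclose(H.real[0], -1 / 3)
```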
1.8 Continuous time Fourier transforms

In the same way as in the discrete case we can pose the question: what are the eigenfunctions of a continuous LTI system? These are e^{st}, as can easily be verified by substituting e^{st} into Eq. 1-36:

    y(t) = x(t) * h(t) = h(t) * x(t) = ∫_{-∞}^{∞} h(τ) x(t-τ) dτ        Eq. 1-70

    ∫_{-∞}^{∞} h(τ) e^{s(t-τ)} dτ = e^{st} ∫_{-∞}^{∞} h(τ) e^{-sτ} dτ = e^{st} H(s)

Note that, analogous to Eq. 1-43, a time shift is expressed by multiplication with e^{-sτ}. So

    e^{st}  ⇒(LTI)  H(s) e^{st}        Eq. 1-71

with H(s) = ∫_{-∞}^{∞} h(τ) e^{-sτ} dτ. We will restrict ourselves to the eigenfunctions which are periodic with period T₀, so e^{jkω₀t} with k = 0, ±1, ±2, … and ω₀ = 2π/T₀. We can write a periodic signal with period T₀ as

    x(t) = x(t + T₀) = Σ_{k=-∞}^{∞} a_k e^{jkω₀t}        a_k = (1/T₀) ∫_{T₀} x(t) e^{-jkω₀t} dt        Eq. 1-72

This is the well known Fourier series of periodic functions. A Fourier series exists under the Dirichlet conditions:

1) Over any period T₀, x(t) must be absolutely integrable: ∫_{T₀} |x(t)| dt < ∞.
2) In any finite interval x(t) has a finite number of maxima and minima.
3) In any finite interval there are only a finite number of discontinuities.

The derivation of Eq. 1-72 is analogous to the discrete case.

Example:

#10 Let us take a block wave given by

    x(t) = 1,  |t| < T₁
    x(t) = 0,  T₁ < |t| < T₀/2        Eq. 1-73

which is periodic with period T₀. We will apply Eq. 1-72 now to the interval [-T₀/2, T₀/2]. So

    a₀ = (1/T₀) ∫_{-T₁}^{T₁} dt = 2T₁/T₀        Eq. 1-74

and

    a_k = (1/T₀) ∫_{-T₁}^{T₁} e^{-jkω₀t} dt = [e^{-jkω₀t} / (-jkω₀T₀)]_{-T₁}^{T₁} = 2(e^{jkω₀T₁} - e^{-jkω₀T₁}) / (2j · kω₀T₀) = 2 sin(kω₀T₁) / (kω₀T₀)        Eq. 1-75
Using ω₀T₀ = 2π we find a_k = sin(kω₀T₁) / (kπ) for k ≠ 0.

figure 1-9.
1.9 Minimum error approximation

x(t) in Eq. 1-72 contains an infinite number of terms a_k. How many of them are necessary? Assume that we approximate a periodic signal by 2N+1 terms, so

    x_N(t) = Σ_{k=-N}^{N} a_k e^{jkω₀t}        Eq. 1-76

The error is in this case

    e_N(t) = x(t) - Σ_{k=-N}^{N} a_k e^{jkω₀t}        Eq. 1-77

The energy of this error signal is

    E_N = ∫_{T₀} |e_N(t)|² dt = ∫_{T₀} e_N(t) e_N*(t) dt        Eq. 1-78

It can be shown that minimization of E_N gives exactly the Fourier coefficients. So the Fourier coefficients minimize the energy in the error signal, and the minimization of each coefficient is independent of the other coefficients. This is a consequence of the fact that the basis functions of the Fourier series are orthogonal.

figure 1-10. Fourier series coefficients for the periodic square wave, for T₀/T₁ = (a) 4; (b) 8; (c) 16.
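The shrinking error energy E_N can be seen numerically for the block wave of example #10. A sketch in Python with NumPy (our own addition; the values T₀ = 2π, T₁ = π/2 and the chosen N are arbitrary). Note that while E_N decreases steadily, the maximum of x_N(t) keeps overshooting the jump, which is the subject of the next section:

```python
import numpy as np

# Partial Fourier sums x_N(t) (Eq. 1-76) for the block wave of example #10,
# here with T0 = 2*pi and T1 = pi/2, using the coefficients of Eq. 1-74/1-75.
T0, T1 = 2 * np.pi, np.pi / 2
w0 = 2 * np.pi / T0
t = np.linspace(-T0 / 2, T0 / 2, 20001)
x = (np.abs(t) < T1).astype(float)                 # one period of the block wave

def partial_sum(N):
    xN = np.full_like(t, 2 * T1 / T0)              # a_0, Eq. 1-74
    for k in range(1, N + 1):
        a_k = np.sin(k * w0 * T1) / (k * np.pi)    # Eq. 1-75 with w0*T0 = 2*pi
        xN += 2 * a_k * np.cos(k * w0 * t)         # a_k = a_{-k} for this wave
    return xN

dt = t[1] - t[0]
E = [np.sum((x - partial_sum(N)) ** 2) * dt for N in (5, 20, 80)]
assert E[0] > E[1] > E[2]                          # error energy E_N decreases
assert 1.05 < partial_sum(80).max() < 1.13         # the overshoot does not vanish
```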
1.10 Gibbs phenomenon

(See also Arfken §14.5.) When there is a discontinuity in the signal x(t), an approximation of the signal by Eq. 1-76 will always result in an overshoot, independent of the number of coefficients. For a unit jump this overshoot reaches about 1.09. The energy of the error goes to zero, but the overshoot remains, and moves towards the discontinuity when N increases.

figure 1-11. Convergence of the Fourier series representation of a square wave: an illustration of the Gibbs phenomenon. Here we have the finite series approximation x_N(t) = Σ_{k=-N}^{N} a_k e^{jkω₀t} for several values of N.

1.11 Fourier transform of non periodic signals

Just as we did in the discrete case, we will investigate the limiting case when the period T goes to infinity. We will call T·a_k = X_T(ω) and ω = 2πk/T = kω₀, with ω₀ = Δω = 2π/T. Substitution into Eq. 1-72 gives

    x(t) = (1/2π) Σ_{k=-∞}^{∞} (2π/T) X_T(kω₀) e^{jkω₀t}        Eq. 1-79

and taking the limit:

    x(t) = (1/2π) ∫_{-∞}^{∞} X(ω) e^{jωt} dω        X(ω) = ∫_{-∞}^{∞} x(t) e^{-jωt} dt        Eq. 1-80

The Fourier expressions are summarized in the following table:

    discrete, periodic:     x[n] = Σ_{k=⟨N⟩} a_k e^{jkΩ₀n}              a_k = (1/N) Σ_{n=⟨N⟩} x[n] e^{-jkΩ₀n}
    discrete, aperiodic:    x[n] = (1/2π) ∫_{-π}^{π} X(Ω) e^{jΩn} dΩ    X(Ω) = Σ_{n=-∞}^{∞} x[n] e^{-jΩn}
    continuous, periodic:   x(t) = Σ_{k=-∞}^{∞} a_k e^{jkω₀t}           a_k = (1/T₀) ∫_{T₀} x(t) e^{-jkω₀t} dt
    continuous, aperiodic:  x(t) = (1/2π) ∫_{-∞}^{∞} X(ω) e^{jωt} dω    X(ω) = ∫_{-∞}^{∞} x(t) e^{-jωt} dt

Examples:

#11 For an impulse signal the frequency spectrum is again flat:

    x(t) = δ(t)  →(F)  X(ω) = 1        Eq. 1-81
#12 When x(t) is a rectangular function, so

    x(t) = 1,  |t| < T₁
    x(t) = 0,  |t| > T₁        Eq. 1-82

then

    X(ω) = ∫_{-T₁}^{T₁} e^{-jωt} dt = (e^{-jωT₁} - e^{jωT₁}) / (-jω) = 2 sin(ωT₁) / ω        Eq. 1-83

Conversely, when the spectrum is rectangular,

    X(ω) = 1,  |ω| < W
    X(ω) = 0,  |ω| > W        Eq. 1-84

the signal is a sinc function:

    x(t) = (1/2π) ∫_{-W}^{W} e^{jωt} dω = sin(Wt) / (πt) = (W/π) sinc(Wt/π)        Eq. 1-85

where sinc(x) is defined as sinc(x) = sin(πx) / (πx).

figure 1-12. Fourier transform pairs of rectangular pulse (left) and of rectangular spectrum (right).
Some properties of the Fourier transform (which we will denote by F) are:

    DC component:    X(0) = ∫_{-∞}^{∞} x(t) dt        Eq. 1-86

    Linear:          a x₁(t) + b x₂(t)  ↔  a X₁(ω) + b X₂(ω)        Eq. 1-87

    Time shift: when x(t) ↔ X(ω), then
                     x(t - t₀)  ↔  X(ω) e^{-jωt₀}        Eq. 1-88
                     x(t) e^{jω₀t}  ↔  X(ω - ω₀)

    Differentiation: (d/dt) x(t)  ↔  jω X(ω)        Eq. 1-89
                     -jt x(t)  ↔  (d/dω) X(ω)

    Scaling:         x(at)  ↔  (1/|a|) X(ω/a)        Eq. 1-90

    Convolution:     x₁(t) * x₂(t)  ↔  X₁(ω) · X₂(ω)        Eq. 1-91

    Modulation:      x₁(t) · x₂(t)  ↔  (1/2π) X₁(ω) * X₂(ω)        Eq. 1-92

We could also ask whether there exists a Fourier transform of a periodic signal. This is indeed the case, and the spectrum can be derived from the Fourier series coefficients. If x(t) = x(t+T), then x(t) can be written as x(t) = Σ_{k=-∞}^{∞} a_k e^{jkω₀t} (Eq. 1-72), thus

    X(ω) = F[x(t)] = F[ Σ_{k=-∞}^{∞} a_k e^{jkω₀t} ] = Σ_{k=-∞}^{∞} a_k F[e^{jkω₀t}]        Eq. 1-93

According to Eq. 1-88, when we shift the spectrum over a frequency ω₀, the signal is multiplied by e^{jω₀t}. As the inverse Fourier transform of X(ω) = δ(ω) equals 1/2π, e^{jω₀t} results in a shifted δ-function as spectrum:
    (1/2π) e^{jω₀t}  ↔(F)  δ(ω - ω₀)        Eq. 1-94

Substitution of Eq. 1-94 into Eq. 1-93 results in

    X(ω) = Σ_{k=-∞}^{∞} 2π a_k δ(ω - kω₀)        Eq. 1-95

This means that there is a direct relation between the coefficients of a Fourier series and the spectrum of a periodic signal. Eq. 1-95 gives the relation between the frequencies in the real-world continuous system and the integer k representing the frequencies in the discrete representation, k ↔ k(2π/T), and the value is 2π·a_k.

1.12 Sampling

After the discussion of the discrete and the continuous case, we are now ready to investigate the conversion from continuous to discrete signals. Sampling can be described by multiplying the signal with a sampling function p(t) consisting of an infinite sequence of δ-functions. When x(t) is the continuous signal and x_p(t) is the sampled signal, then:

    x_p(t) = x(t) p(t)        Eq. 1-96

where p(t) = Σ_{n=-∞}^{∞} δ(t - nT), and T denotes the sampling interval. So x_p(t) is given by

    x_p(t) = Σ_{n=-∞}^{∞} x(nT) δ(t - nT)        Eq. 1-97

and x[n] = x(nT). The Fourier transform of x_p(t) is given by (combine Eq. 1-92 and Eq. 1-96):

    X_p(ω) = (1/2π) X(ω) * P(ω)        Eq. 1-98

Now the question is: what is P(ω)? p(t) is a periodic signal, so its Fourier coefficients are given by Eq. 1-72:

    a_k = (1/T) ∫_T p(t) e^{-jkω₀t} dt = (1/T) ∫_{-T/2}^{T/2} δ(t) e^{-jkω₀t} dt = 1/T        Eq. 1-99

and its Fourier transform is found using Eq. 1-95:

    P(ω) = (2π/T) Σ_{k=-∞}^{∞} δ(ω - kω₀)        Eq. 1-100
Natuurkundige Informatica 1 22 2003
which is again a sequence of δ-functions, but now in the Fourier domain, at an interval ω₀ = 2π/T. And

  X_p(ω) = (1/T) Σ_{k=−∞}^{∞} X(ω − kω₀)        Eq. 1-101

This means that the spectrum X(ω) of x(t) is repeated at multiples of the sampling frequency ω₀. Sampling of the signal results in a periodic spectrum. This we know already from our analysis of discrete signals. When the signal is limited in frequency, and its maximum frequency is smaller than half the sampling frequency, the repeated spectra will not overlap, and the original signal can be reconstructed. This is the well-known sampling theorem of Nyquist and Shannon, which states the following:

If x(t) is a bandlimited signal with X(ω) = 0 when |ω| > ω_M, then x(t) is uniquely determined by its samples x(nT), n = 0, ±1, ±2, …, if ω_s > 2ω_M with ω_s = 2π/T.

The effect that the spectra overlap when ω_s < 2ω_M is called aliasing.

figure 1-13. Effect in the frequency domain of sampling in the time domain: (a) spectrum of original signal; (b) spectrum of sampling function; (c) spectrum of sampled signal with ω_s > 2ω_M; (d) spectrum of sampled signal with ω_s < 2ω_M.

To reconstruct the original continuous signal we have to multiply the periodic spectrum with a window H(ω) which filters out one period:

  H(ω) = T for |ω| < ω_s/2
  H(ω) = 0 for |ω| > ω_s/2        Eq. 1-102

(the gain T compensates the factor 1/T in Eq. 1-101, so that the samples are recovered exactly). This means a convolution in the time domain with a reconstruction filter given by
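The aliasing effect described above can be checked numerically. This is a minimal sketch; the frequencies (7 Hz and 3 Hz) and the sampling rate (10 Hz) are arbitrary example values, not taken from the text.

```python
import math

fs = 10.0            # sampling frequency (Hz)
T = 1.0 / fs         # sampling interval

def sample(f, n):
    """n-th sample of a unit-amplitude cosine of frequency f Hz, sampled at fs."""
    return math.cos(2 * math.pi * f * n * T)

# A 7 Hz cosine violates the Nyquist condition (fs > 2*7 Hz is false), so its
# samples coincide with those of the alias at fs - 7 = 3 Hz.
for n in range(20):
    assert abs(sample(7.0, n) - sample(3.0, n)) < 1e-12
```

Once sampled, the two signals are indistinguishable, which is exactly the spectral overlap of Figure 1-13(d).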
  h(t) = (1/2π) ∫_{−∞}^{∞} H(ω) e^{jωt} dω = T·sin(ω_s t/2)/(πt) = sinc(ω_s t/(2π))        Eq. 1-103

and the reconstructed signal x_r(t) is given by

  x_r(t) = x_p(t) ∗ h(t) = Σ_{n=−∞}^{∞} x(nT) · T·sin(ω_s(t − nT)/2)/(π(t − nT))        Eq. 1-104

It is interesting to note that the zeros of h(t) are at ω_s t/2 = kπ (k ≠ 0), or t = 2πk/ω_s = kT. This we could expect, as the sampled values are exact at the sample points. As the sinc function is a very expensive function for convolution, often simpler reconstruction filters are used. As these have a poorer frequency response, the sampling frequency should then be a little higher than twice the maximum frequency (see e.g. Lynn and Fuerst).

figure 1-14. Ideal bandlimited interpolation using the sinc function.

1.13 Relation between spectra of discrete and continuous time signals

We will call X_c(ω) the spectrum of the continuous time signal and X_p(ω) the spectrum of the discrete time signal; then:

  X_p(ω) = (1/T) Σ_{k=−∞}^{∞} X_c(ω − kω_s)        Eq. 1-105
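A truncated version of the interpolation sum of Eq. 1-104 can be checked numerically. This sketch uses the normalization in which the reconstruction filter has gain T in the passband, so h(t) = sinc(t/T); the 1 Hz test signal and the truncation length are arbitrary choices.

```python
import math

def sinc(x):
    # normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

T = 0.1                                           # sampling interval (fs = 10 Hz)

def x(t):                                         # 1 Hz cosine, well below fs/2
    return math.cos(2 * math.pi * t)

def reconstruct(t, N=4000):
    # truncated Eq. 1-104: x_r(t) = sum_n x(nT) * sinc((t - nT)/T)
    return sum(x(n * T) * sinc((t - n * T) / T) for n in range(-N, N + 1))

assert abs(reconstruct(0.3) - x(0.3)) < 1e-6      # exact at a sample point, t = 3T
assert abs(reconstruct(0.234) - x(0.234)) < 1e-2  # close between sample points
```

The slow 1/t decay of the sinc kernel is visible here: thousands of terms are needed for a good approximation, which is why practical systems use cheaper reconstruction filters.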
  x_p(t) = Σ_{n=−∞}^{∞} x_c(nT) δ(t − nT)        Eq. 1-106

We already saw that

  δ(t) → 1,    δ(t − nT) → e^{−jωnT}        Eq. 1-107

We can view Eq. 1-106 as a summation of delta functions multiplied by coefficients x_c(nT). As the Fourier transform of the delta function is given by Eq. 1-107, and using the fact that the Fourier transform is linear, taking the Fourier transform of Eq. 1-106 results in

  X_p(ω) = Σ_{n=−∞}^{∞} x_c(nT) e^{−jωnT}        Eq. 1-108

The discrete time Fourier transform of x[n] was given by Eq. 1-64:

  X(Ω) = Σ_{n=−∞}^{∞} x[n] e^{−jΩn} = Σ_{n=−∞}^{∞} x_c(nT) e^{−jΩn}        Eq. 1-109

Comparing Eq. 1-108 and Eq. 1-109 results in

  X(Ω) = X_p(Ω/T)        Eq. 1-110

which means that Ω = 2π in the discrete case corresponds to the sampling frequency in the continuous case: 2π/T, which is ω_s.

So far we have seen that a periodic and discrete signal results in a periodic and discrete spectrum. The Fourier transform of an aperiodic and discrete signal results in a periodic and continuous spectrum. In the computer we can only represent discrete signals and spectra, so we need a discrete spectrum. Now there exists a dual sampling theorem: if we observe a signal for a limited time, say from 0 to T₀, then the spectrum is completely described by samples at an interval 1/T₀ Hz, so Δω = 2π/T₀. If we have N samples in a time frame T₀ with sampling interval T = T₀/N, then in the discrete spectrum ΔΩ = Δω·T = 2πT/T₀ = 2π/N, so N samples over one period of 2π are sufficient. If we transform a certain time frame of an aperiodic signal according to the Discrete Fourier Transform, then by doing so we make the signal periodic: we repeat that time frame periodically.

1.14 DFT Processing

Spectral analysis gives a decomposition of a signal in its frequency components. Spectral analysis is used for instance in the analysis of natural signals and in the investigation of systems like vibrations in buildings and mechanical systems, and in radar and sonar. Spectral analysis with the DFT means discrete time and discrete frequencies, so a limited-time observation window, which is repeated to obtain a periodic time signal, in order to apply the DFT. The DFT of a signal that contains only harmonic frequencies (multiples) of the fundamental
frequency Ω₀ = 2π/N results in a line spectrum. The DFT of a signal that contains frequencies which are not harmonic frequencies of the fundamental frequency gives a widening of the spectral lines; this widening is called leakage. Now we may ask ourselves what is the cause thereof.

The reason is that we observe the signal only during a limited time window, say from −T₁ to T₁. We may see this as multiplying the signal with a rectangular window w(t) given by:

  w(t) = 1 for |t| < T₁
  w(t) = 0 for |t| > T₁        Eq. 1-111

and x_b(t) = x(t)·w(t). Now the continuous Fourier transform of x_b(t) would have been

  X_b(ω) = (1/2π) X(ω) ∗ W(ω)        Eq. 1-112

and the Fourier transform W(ω) of w(t) is given by Eq. 1-83:

  W(ω) = 2 sin(ωT₁)/ω        Eq. 1-113

When we sample this signal, the resulting discrete frequency is given by Ω = ωT, so

  W(Ω) = 2T sin(ΩT₁/T)/Ω        Eq. 1-114

So let us assume that we have N samples on the limited time window (−T₁, T₁); then T₁ = NT/2 and

  W(Ω) = 2T sin(ΩN/2)/Ω        Eq. 1-115

figure 1-15. Relationships between the DFT, Fourier transform and discrete Fourier series.
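The leakage mechanism can be illustrated with a direct DFT. This is a sketch; N = 64 and the two test frequencies (exactly on bin 5, and halfway between bins 5 and 6) are arbitrary example choices.

```python
import cmath, math

N = 64

def dft(x):
    # direct evaluation of the DFT
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# a frequency that is a multiple of Omega_0 = 2*pi/N, and one halfway between bins
harmonic   = [math.cos(2 * math.pi * 5.0 * n / N) for n in range(N)]
inharmonic = [math.cos(2 * math.pi * 5.5 * n / N) for n in range(N)]

Xh = [abs(v) for v in dft(harmonic)]
Xi = [abs(v) for v in dft(inharmonic)]

# harmonic input: a line spectrum, energy only in bins 5 and N - 5
assert all(Xh[k] < 1e-9 for k in range(N) if k not in (5, N - 5))
# inharmonic input: the spectral line widens, neighbouring bins pick up energy
assert Xi[4] > 1.0 and Xi[6] > 1.0
```

For the harmonic input, the zeros of Eq. 1-115 fall exactly on the DFT sample frequencies, so no leakage is visible; for the half-bin offset they do not, and the energy spreads.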
The zeros of this function occur when ΩN/2 = kπ, k ≠ 0, so Ω = 2kπ/N. These are multiples of the fundamental frequency Ω₀ = 2π/N, and thus harmonic frequencies of Ω₀. When the signal x(t) contains only harmonic frequencies of the fundamental frequency, in the convolution we do not see the leakage, as the leakage is just zero in the discrete frequency samples. When the signal contains frequencies for which Eq. 1-115 is not zero, we do see the leakage.

We define the amount of leakage as the distance between the first two zeros of W(Ω), which equals 2Ω₀ = 4π/N. The larger we take N, the smaller the leakage. The leakage can be decreased by taking other types of windows with lower side lobes (see e.g. Lynn and Fuerst).

figure 1-16. Fourier transformation of (a) a signal containing three exact Fourier harmonics, and (b) a signal containing both harmonic and non-harmonic components (each abscissa: 512 samples).

1.15 The Fast Fourier Transform

The Discrete Fourier Transform was given by Eq. 1-53; we will call this for the moment X[k] = N·a_k:

  X[k] = N·a_k = Σ_{n=0}^{N−1} e^{−jkΩ₀n} x[n]        Eq. 1-116

This means that for complex numbers we need four multiplications for e^{−jkΩ₀n}·x[n], and one
X[k] takes 4N multiplications. As there are N of them, in total we need 4N² multiplications. So the complexity of the DFT is O(N²). In particular for large N this becomes very time-consuming. The Fast Fourier Transform gives a solution to this problem. We could write Eq. 1-116 as

  X[k] = Σ_{n=0}^{N−1} x[n] w_N^{kn}        Eq. 1-117

in which w_N is given by

  w_N = e^{−jΩ₀} = e^{−j2π/N}        Eq. 1-118

There are only N different values of w_N^{kn}, because as soon as kn > N this value is the same as that with mod(kn, N). Let us assume now that N is a power of two; then we can apply decimation in time: split X[k] in the even and odd terms:

  X[k] = Σ_{r=0}^{N/2−1} x[2r] w_N^{2rk} + Σ_{r=0}^{N/2−1} x[2r+1] w_N^{(2r+1)k}
       = Σ_{r=0}^{N/2−1} x[2r] (w_N²)^{rk} + w_N^k Σ_{r=0}^{N/2−1} x[2r+1] (w_N²)^{rk}        Eq. 1-119

and by definition w_N² = e^{−j2π/(N/2)} = w_{N/2}, thus

  X[k] = Σ_{r=0}^{N/2−1} x[2r] (w_{N/2})^{rk} + w_N^k Σ_{r=0}^{N/2−1} x[2r+1] (w_{N/2})^{rk} = G[k] + w_N^k H[k]        Eq. 1-120

So instead of an N-point DFT we now have two N/2-point DFTs. We could repeat this trick on G[k] and H[k] and obtain four N/4-point DFTs, until we end up with two-point DFTs whose inputs are N/2 apart. Now how many multiplications are needed? We have N multiplications with w's in each step. In total there are log₂N steps. So the total number of multiplications is N·log₂N, and the complexity of the FFT is O(N·log₂N) instead of O(N²) for the DFT. The speed-up can easily be demonstrated by Table 1.

1.16 Concluding remarks

In this chapter we have discussed the basic theory of systems and signals and concentrated on the relation between continuous and discrete time. A further discussion can be found in (5). The situations which occur in real-world signals are analysed by the computer. Important issues such as digital filter design and stability of digital systems fall outside the scope of this chapter. An introduction to these topics can be found in (3), who also give PASCAL programs for illustration and filter design. For further reading we recommend (2, 4, 6).
1.17 References

1 Arfken, G. Mathematical Methods for Physicists. 3rd ed. Academic Press: Orlando, 1985.
2 Ludeman, L.C. Fundamentals of Digital Signal Processing. Wiley: New York, 1987.
3 Lynn, P.A.; Fuerst, W. Introductory Digital Signal Processing with Computer Applications. Wiley: Chichester, 1990.
4 Oppenheim, A.V.; Schafer, R.W. Digital Signal Processing. Prentice-Hall: Englewood Cliffs, NJ, 1975.
5 Oppenheim, A.V.; Willsky, A.S.; Young, I.T. Signals and Systems. Prentice-Hall: London, 1983.
6 Roberts, R.A.; Mullis, C.T. Digital Signal Processing. Addison-Wesley: Reading, 1987.
Table 1: Speed up of FFT relative to DFT

  N        N·log₂N     N²          speed up
  16       64          256         4
  64       384         4096        10
  256      2048        65×10³      32
  1024     10×10³      1×10⁶       102
  4096     49×10³      16×10⁶      342
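The decimation-in-time recursion of Eq. 1-120 can be sketched directly and checked against the direct DFT of Eq. 1-116. This is a minimal recursive sketch, not the in-place implementation used in practice.

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    G = fft(x[0::2])                  # DFT of the even-indexed samples
    H = fft(x[1::2])                  # DFT of the odd-indexed samples
    w = [cmath.exp(-2j * cmath.pi * k / N) for k in range(N // 2)]
    # Eq. 1-120: X[k] = G[k] + w_N^k H[k]; since G and H have period N/2 and
    # w_N^(k+N/2) = -w_N^k, the upper half comes for free with a sign change.
    return ([G[k] + w[k] * H[k] for k in range(N // 2)] +
            [G[k] - w[k] * H[k] for k in range(N // 2)])

def dft(x):                           # direct O(N^2) transform for comparison
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [complex(n % 5, 0) for n in range(16)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(x), dft(x)))
```

Each recursion level does O(N) multiplications and there are log₂N levels, which is exactly the N·log₂N count of Table 1.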
2 The z-transform
2.0 Introduction
The z-transform should be regarded as a generalization of the Discrete Fourier Transform. It offers a technique for the frequency analysis of digital signals and systems, providing an extremely compact notation for describing digital signals and systems. It is widely used by DSP designers and in the DSP literature. The so-called pole-zero description of a system is a great help in visualizing its stability and frequency response characteristics.
2.1 Definition and properties of the z-transform
We recollect that the eigenfunctions of an LTI system in discrete time are given by zⁿ, with z a complex number. The output y[n] of a system with impulse response h[n] is given by:

  (block diagram: x[n] → LTI system h[n] → y[n])

  y[n] = x[n] ∗ h[n] = Σ_{k=−∞}^{∞} h[k] x[n−k]        Eq. 2-1

Inserting x[n] = zⁿ we find

  y[n] = Σ_{k=−∞}^{∞} h[k] z^{n−k} = zⁿ Σ_{k=−∞}^{∞} h[k] z^{−k} = zⁿ H(z)        Eq. 2-2

Thus the zⁿ are eigenfunctions with eigenvalue H(z) = Σ_{k=−∞}^{∞} h[k] z^{−k}.

The z-transform is now defined as

  X(z) = Σ_{n=−∞}^{∞} x[n] z^{−n}        Eq. 2-3

As z is a complex number it may be represented as z = a + jb = r·e^{jΩ}. Previously we restricted ourselves to the unit circle in the complex plane, r = 1, which corresponds to the Fourier transform:

  X(e^{jΩ}) = Σ_{n=−∞}^{∞} x[n] (e^{jΩ})^{−n} = Σ_{n=−∞}^{∞} x[n] e^{−jΩn}        Eq. 2-4
versie 1.1 2-1 1994
Example #2.1: the Fourier and the z-transform of an exponentially decaying signal x[n] = aⁿu[n]:

  X(e^{jΩ}) = Σ_{n=0}^{∞} (a e^{−jΩ})ⁿ = 1/(1 − a e^{−jΩ})        Eq. 2-5

where the sum converges if |a e^{−jΩ}| < 1 and thus |a| < 1. When |a| ≥ 1 the DFT sum diverges. However, the z-transform of this signal is given by

  X(z) = Σ_{n=−∞}^{∞} x[n] z^{−n} = Σ_{n=0}^{∞} (a z^{−1})ⁿ = 1/(1 − a z^{−1}) = z/(z − a)        Eq. 2-6

The region of convergence (ROC) of this sum is defined by |a z^{−1}| < 1 or, equivalently, |z| > |a|. For |a| ≥ 1 the ROC does not include the unit circle, consistent with the fact that for those a-values the DFT does not converge.

For a = 1, x[n] = u[n] is the unit step with

  X(z) = z/(z − 1),  |z| > 1        Eq. 2-7

Now let x[n] = −aⁿu[−n−1]; then the z-transform of this signal is given by

  X(z) = −Σ_{n=−∞}^{−1} (a z^{−1})ⁿ = −Σ_{n=1}^{∞} (a^{−1}z)ⁿ = 1 − Σ_{n=0}^{∞} (a^{−1}z)ⁿ = 1 − 1/(1 − a^{−1}z) = z/(z − a)        Eq. 2-8

where the sum converges if |a^{−1}z| < 1 or, equivalently, |z| < |a|. Thus the z-transforms of these two signals differ only in their ROC.

Figure 2.1. Pole-zero plot and region of convergence for the z-transform of the exponentially decaying signals of Example #2.1: (left) x[n] = aⁿu[n]; (right) x[n] = −aⁿu[−n−1].
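The two regions of convergence in Example #2.1 can be checked with partial sums of Eq. 2-6. This is a sketch; a = 0.9 and the probe points are arbitrary example values.

```python
# Partial sums of Eq. 2-6 for x[n] = a^n u[n]: they converge to z/(z - a)
# only inside the ROC |z| > |a|, and blow up for |z| < |a|.
a = 0.9

def partial_sum(z, terms):
    return sum((a / z) ** n for n in range(terms))   # sum of (a z^-1)^n

z_in = 1.2 + 0.5j                  # |z| = 1.3 > 0.9: inside the ROC
assert abs(partial_sum(z_in, 200) - z_in / (z_in - a)) < 1e-6

z_out = 0.5                        # |z| = 0.5 < 0.9: terms grow without bound
assert abs(partial_sum(z_out, 200)) > 1e9
```

Evaluating the same series on the unit circle (|z| = 1 > 0.9) reproduces the DFT of Eq. 2-5, which is why the ROC containing the unit circle is what makes the Fourier transform exist.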
We calculate the z-transform of a cosine

  x[n] = cos(nΩ₀) u[n] = ½(e^{jnΩ₀} + e^{−jnΩ₀}) u[n]

with the help of Eq. 2-6:

  X(z) = ½ · 1/(1 − e^{jΩ₀}z^{−1}) + ½ · 1/(1 − e^{−jΩ₀}z^{−1}) = z(z − cos Ω₀)/(z² − 2z cos Ω₀ + 1)        Eq. 2-9

When we are dealing with a causal signal (x[n] = 0 for n < 0) we find the unilateral z-transform:

  X(z) = Σ_{n=0}^{∞} x[n] z^{−n}        Eq. 2-10

If x[n] is a causal signal (right-sided sequence) then the following property holds: if the circle |z| = r₀ is in the ROC, then all finite values of z for which |z| > r₀ will also be in the ROC. This can be seen as follows: if the circle |z| = r₀ is in the ROC, then x[n]·r₀^{−n} is absolutely summable. Since x[n] is right-sided, x[n] multiplied by any real exponential sequence which decays faster than r₀^{−n} will also be absolutely summable.

An important property of the (unilateral) z-transform is its relation to time shifting. Let us consider the z-transform of an impulse:

  x[n] = δ[n]      →  X(z) = Σ_{n=−∞}^{∞} δ[n] z^{−n} = 1
  x[n] = δ[n − n₀]  →  X(z) = Σ_{n=−∞}^{∞} δ[n − n₀] z^{−n} = z^{−n₀}        Eq. 2-11

We can view z as a time-shift operator: multiplication by z^{−1} is equivalent to a delay of one sampling interval, a backward shift, whereas multiplication by z is equivalent to a forward shift. More formally, the z-transform of x[n − n₀]u[n − n₀] is given by:

  Σ_{n=−∞}^{∞} x[n − n₀] u[n − n₀] z^{−n} = z^{−n₀} Σ_{n=−∞}^{∞} x[n] u[n] z^{−n} = z^{−n₀} X(z)        Eq. 2-12

The convolution property also holds for the z-transform:

  y[n] = h[n] ∗ x[n]  →  Y(z) = H(z)·X(z)        Eq. 2-13

where H(z) is the transfer function. This can easily be seen by taking the z-transform of both sides of Eq. 2-1, y[n] = Σ_{k=−∞}^{∞} h[k] x[n−k]:
  Σ_{n=−∞}^{∞} y[n] z^{−n} = Σ_{n=−∞}^{∞} Σ_{k=−∞}^{∞} h[k] x[n−k] z^{−n}        Eq. 2-14

  Y(z) = Σ_{k=−∞}^{∞} h[k] z^{−k} Σ_{n=−∞}^{∞} x[n−k] z^{−(n−k)} = H(z)·X(z)        Eq. 2-15

2.2 Inverse z-transform: contour integration

The inverse z-transform is defined by

  x[n] = (1/2πj) ∮ X(z) z^{n−1} dz        Eq. 2-16

where the integration is along a counterclockwise circular contour centered at the origin with radius r, where r can be chosen as any value for which X(z) converges.

A useful procedure to find the inverse of a rational z-transform consists of expanding the algebraic expression into a partial fraction expansion and recognizing the sequence associated with the individual terms. For example, we calculate the signal x[n] having z-transform X(z) = 1/(z(z−1)(2z−1)) as follows: assume that we can write X(z) as

  X(z) = 1/(z(z−1)(2z−1)) = A/z + B/(z−1) + C/(2z−1)        Eq. 2-17

from which we have to solve for A, B and C. After a little algebra we find A = 1, B = 1 and C = −4. Thus

  X(z) = 1/z + 1/(z−1) − 4/(2z−1) = z^{−1} [ 1 + z/(z−1) − 2·z/(z−0.5) ]        Eq. 2-18

We now recall that multiplication by z^{−1} is equivalent to a delay of one sampling interval. The terms between brackets produce the inverse transform δ[n] + u[n] − 2(0.5ⁿ u[n]) (where we have used Eq. 2-6, Eq. 2-7 and Eq. 2-11), so the required x[n] is given by

  x[n] = δ[n−1] + u[n−1] − 2(0.5^{n−1} u[n−1])        Eq. 2-19

The z-transform may also represent an LTI system, which we denote by H(z). The corresponding time function must correspond to the system's impulse response h[n]. From Eq. 2-13 we have

  H(z) = Y(z)/X(z)        Eq. 2-20

Consider again our previous example
  H(z) = 1/(z(z−1)(2z−1)) = Y(z)/X(z)        Eq. 2-21

giving

  z(z−1)(2z−1) Y(z) = X(z)        Eq. 2-22

which we can write as

  (2 − 3z^{−1} + z^{−2}) Y(z) = z^{−3} X(z)        Eq. 2-23

Using again the time-shift property we find

  2y[n] − 3y[n−1] + y[n−2] = x[n−3]        Eq. 2-24

To find the corresponding time function h[n], we deliver a unit impulse δ[n] as input signal and evaluate h[n] term by term:

  h[n] = 1.5 h[n−1] − 0.5 h[n−2] + 0.5 δ[n−3]        Eq. 2-25

Evaluation of this recursive relation gives the impulse response described by Eq. 2-19. Thus we find h[3] = 0.5, h[4] = 0.75, h[5] = 0.875, and so on.

2.3 More properties of the z-transform

We have already discussed some properties of the z-transform in Section 2.1: linearity, convolution and time-shift. Note the corresponding properties of the Fourier transform, since z = e^{jΩ}.

The initial value theorem for a causal sequence x[n] may be stated as follows:

  x[0] = lim_{z→∞} X(z)        Eq. 2-26

which can easily be seen by inserting the definition of the (unilateral) z-transform: in lim_{z→∞} Σ_{n=0}^{∞} x[n] z^{−n} only the term n = 0 will remain.

The final value theorem may be stated as follows:

  lim_{n→∞} x[n] = lim_{z→1} ((z−1)/z) X(z)        Eq. 2-27

Note that (z−1)/z = 1 − z^{−1} is the z-transform of δ[n] − δ[n−1]. We use Eq. 2-27 to calculate the steady-state responses of a system, for example the steady-state response of a system with transfer function H(z) = z/(z−0.8) to a step input u[n]:
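The recursion of Eq. 2-25 and the closed form of Eq. 2-19 can be compared directly (a quick sketch):

```python
def h_closed(n):
    # Eq. 2-19: delta[n-1] + u[n-1] - 2 * (0.5**(n-1)) * u[n-1]
    if n < 1:
        return 0.0
    return (1.0 if n == 1 else 0.0) + 1.0 - 2.0 * 0.5 ** (n - 1)

def impulse_response(n_max):
    # Eq. 2-25: h[n] = 1.5 h[n-1] - 0.5 h[n-2] + 0.5 delta[n-3],
    # started from rest (h[-1] = h[-2] = 0)
    h = {-2: 0.0, -1: 0.0}
    for n in range(n_max + 1):
        h[n] = 1.5 * h[n - 1] - 0.5 * h[n - 2] + (0.5 if n == 3 else 0.0)
    return [h[n] for n in range(n_max + 1)]

hr = impulse_response(10)
assert hr[3] == 0.5 and hr[4] == 0.75 and hr[5] == 0.875
assert all(abs(hr[n] - h_closed(n)) < 1e-12 for n in range(11))
```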
  lim_{n→∞} y[n] = lim_{z→1} ((z−1)/z) Y(z) = lim_{z→1} ((z−1)/z) X(z) H(z)
                 = lim_{z→1} ((z−1)/z) · (z/(z−1)) · (z/(z−0.8)) = 1/(1 − 0.8) = 5.0        Eq. 2-28

2.4 z-Plane poles and zeros

The z-transform of an exponential signal (aⁿ) is a ratio of polynomials. The z-transform of any real digital signal or transfer function can be written as a rational function of the frequency variable z:

  X(z) = N(z)/D(z)        Eq. 2-29

Apart from a gain factor K this transform may be completely specified by the roots of the numerator and denominator:

  X(z) = N(z)/D(z) = K (z − z₁)(z − z₂)… / ((z − p₁)(z − p₂)…)        Eq. 2-30

where the z_i and p_i are called the zeros and poles of X(z). When the corresponding time function is real, the poles and zeros are themselves either real, or occur in complex conjugate pairs. A useful representation of a z-transform is obtained by plotting its poles and zeros in the complex z-plane.

Figure 2.2. Properties of the unilateral z-transform.
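The steady-state value of Eq. 2-28 can be confirmed by simulating the difference equation of H(z) = z/(z − 0.8), which is y[n] = 0.8·y[n−1] + x[n] (a quick sketch):

```python
# Drive y[n] = 0.8 y[n-1] + x[n] with a unit step; the final value theorem
# (Eq. 2-28) predicts lim y[n] = 1/(1 - 0.8) = 5.0.
y = 0.0
for n in range(200):
    y = 0.8 * y + 1.0            # unit step input x[n] = 1
assert abs(y - 5.0) < 1e-9
```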
2.5 System stability

A system is called stable when a bounded input results in a bounded output: |x| < µ results in |y| < µ′, which is equivalent to Σ_{n=−∞}^{∞} |h[n]| < ∞. When |z| = 1 we must have

Figure 2.3. Unilateral z-transform pairs.
  Σ_{n=−∞}^{∞} |h[n] z^{−n}| < ∞

thus the z-transform must exist on the unit circle. For a causal system we get the condition Σ_{n=0}^{∞} |h[n] z^{−n}| < ∞, thus the z-transform must exist on the unit circle and outside of it, for |z| ≥ 1. The ROC is determined by singularities. Thus for a causal and stable system all poles must be inside the unit circle.

An example of a causal and stable system is given by the exponentially decaying signal described in Example #2.1: x[n] = aⁿu[n] with |a| < 1.

2.6 Geometrical evaluation of the Fourier transform in the z-plane

Assume that we want to evaluate the Fourier transform for a certain frequency. We draw a vector from each pole and zero to a point on the unit circle representing the sinusoidal frequency of interest. Then the magnitude of the spectral function equals the product of all zero-vector lengths, divided by the product of all pole-vector lengths (disregarding the gain factor K). The phase equals the sum of all zero-vector phases, minus the sum of all pole-vector phases. Thus with poles close to the unit circle the spectral magnitude function peaks, whereas with zeros close to or on the unit circle it goes through a minimum.

An example of a transfer function H(z) = (z − 0.8)/(z + 0.8), with a pole at z = −0.8 and a zero at z = 0.8, is shown in Figure 2.4.

Figure 2.4. Visualizing the frequency response of an LTI system.

Substituting z = e^{jΩ} gives the frequency response of the system:

  H(Ω) = (e^{jΩ} − 0.8)/(e^{jΩ} + 0.8)        Eq. 2-31

2.7 First and second order LTI systems

First and second order systems can be considered as building blocks for more complicated systems. Thus a system with transfer function H(z) can be viewed as a cascade of first and second order subsystems with transfer functions:
  H₁(z) = (z − z₁)/(z − p₁)        Eq. 2-32

  H₂(z) = (z − z₂)(z − z₃) / ((z − p₂)(z − p₃))        Eq. 2-33

as the poles and zeros of a real function are either real or occur in complex conjugate pairs.

Examples of a first order system with H₁(z) = z/(z − α) are shown in Figure 2.5. With the pole on the positive real axis we get a low-pass filter, whereas a pole on the negative real axis results in a high-pass filter.

Next we consider a second order system with a complex conjugate pole pair p₂ = r·e^{jθ}, p₃ = r·e^{−jθ}, as shown in the right-hand part of Figure 2.5. The frequency at which the peak gain occurs (the center frequency) is determined by the parameter θ. The selectivity (or bandwidth) of the system is determined by the parameter r. The two zeros are placed at the origin to ensure that the impulse response begins at n = 0. Dividing numerator and denominator by z² gives

Figure 2.5. (left) Characteristics of first-order systems. (right) The z-plane pole-zero configuration of a second-order system.
  H₂(z) = Y(z)/X(z) = 1 / (1 − 2r·cos(θ)·z^{−1} + r²·z^{−2})        Eq. 2-34

and hence the difference equation

  y[n] = 2r·cos(θ)·y[n−1] − r²·y[n−2] + x[n]        Eq. 2-35

2.8 Nonzero auxiliary conditions

The unilateral z-transform can also cope with nonzero auxiliary (or initial) conditions. A system of order k requires k auxiliary conditions. For example, we consider here a first order system with difference equation:

  y[n] − α·y[n−1] = x[n]        Eq. 2-36

The z-transform of y[n−1] is given by:

  Y₁(z) = Σ_{n=0}^{∞} y[n−1] z^{−n} = y[−1] + Σ_{n=1}^{∞} y[n−1] z^{−n} = y[−1] + z^{−1} Σ_{n=0}^{∞} y[n] z^{−n} = y[−1] + z^{−1} Y(z)        Eq. 2-37

Thus, taking the z-transform of Eq. 2-36 we find

  Y(z) − α(y[−1] + z^{−1} Y(z)) = X(z)

which leads to

  Y(z) = (X(z) + α·y[−1]) / (1 − α·z^{−1})        Eq. 2-38

Thus with nonzero auxiliary (or initial) conditions (y[−1] ≠ 0) the ratio of Y(z) and X(z) is not equal to H(z).
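Eq. 2-38 splits the response into a zero-state part, X(z)/(1 − αz^{−1}), and a zero-input part, αy[−1]/(1 − αz^{−1}). A short simulation confirms this (a sketch; α = 0.5 and y[−1] = 2 are arbitrary example values):

```python
alpha, y_init = 0.5, 2.0                  # arbitrary example values
x = [1.0, 0.0, 0.0, 0.0, 0.0]             # impulse input, so X(z) = 1

# simulate y[n] = alpha*y[n-1] + x[n] with the auxiliary condition y[-1] = y_init
y, prev = [], y_init
for xn in x:
    prev = alpha * prev + xn
    y.append(prev)

# Inverting the two terms of Eq. 2-38 gives
#   y[n] = alpha^n            (zero-state part, from X(z)/(1 - alpha z^-1))
#        + y[-1]*alpha^(n+1)  (zero-input part, from alpha*y[-1]/(1 - alpha z^-1))
for n in range(len(x)):
    assert abs(y[n] - (alpha ** n + y_init * alpha ** (n + 1))) < 1e-12
```

With y[−1] = 0 the second term vanishes and the usual relation Y(z) = H(z)·X(z) is recovered.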
3 Design of nonrecursive (FIR) filters

3.0 Introduction

The general form of the difference equation for a causal LTI system is given by:

  Σ_{k=0}^{N} a_k y[n−k] = Σ_{k=0}^{M} b_k x[n−k]        Eq. 3-1

In a nonrecursive filter the output depends only on present and previous inputs and not on previous outputs (N = 0):

  y[n] = Σ_{k=0}^{M} b_k x[n−k]        Eq. 3-2

The coefficients b_k are simply the successive terms in the impulse response of the filter. Since the number of coefficients b_k must be finite, a practical nonrecursive filter is called FIR (finite impulse response). The transfer function is found by taking the z-transform of Eq. 3-2:

  H(z) = Y(z)/X(z) = Σ_{k=0}^{M} b_k z^{−k}        Eq. 3-3

and the frequency response is found by putting z = e^{jΩ} in Eq. 3-3:

  H(Ω) = Σ_{k=0}^{M} b_k e^{−jkΩ}        Eq. 3-4

The question is now how to choose the coefficients b_k of a desired filter. Idealized filter frequency responses are shown in Figure 3.1.

Figure 3.1. Idealized digital filter frequency responses: (a) low-pass, (b) high-pass, (c) bandpass, and (d) bandstop.
An FIR filter is inherently stable, because it has no poles outside of the origin. As the impulse response is finite, it can be chosen symmetrical in form. This produces an ideal linear-phase characteristic, equivalent to a pure time delay of all frequency components passing through the filter (no phase distortion). To illustrate this last point we start with a noncausal filter y[n] = Σ_{k=−M}^{M} b_k x[n−k] with symmetric coefficients (b_{−k} = b_k) and transfer function

  H(Ω) = Σ_{k=−M}^{M} b_k e^{−jkΩ} = b₀ + 2 Σ_{k=1}^{M} b_k cos(kΩ)        Eq. 3-5

which is a real function of Ω, implying a zero-phase filter (no phase shift at any frequency). To make this filter causal we shift its impulse response h[n] by M sampling intervals: h′[n] = h[n−M], and thus H′(Ω) = e^{−jΩM} H(Ω), converting the zero-phase characteristic into a pure linear-phase one.

Figure 3.2. Impulse responses giving (a) zero-phase, and (b) linear-phase characteristics.

3.1 Moving average filters

The impulse response of a simple moving average filter is given by

  h[n] = 1/(2M+1) for |n| ≤ M
  h[n] = 0        for |n| > M        Eq. 3-6

The z-transform of this h[n] is given by:

  H(z) = (1/(2M+1)) Σ_{n=−M}^{M} z^{−n} = (1/(2M+1)) · (z^{2M+1} − 1)/(z^{M}(z − 1))        Eq. 3-7

To find its transfer function we substitute b_k = 1/(2M+1) into Eq. 3-5:

  H(Ω) = (1/(2M+1)) [1 + 2 Σ_{k=1}^{M} cos(kΩ)] = (1/(2M+1)) · sin(Ω(2M+1)/2)/sin(Ω/2)        Eq. 3-8
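The two forms of Eq. 3-8, the cosine sum and the closed form, can be checked against each other numerically (a quick sketch with M = 5):

```python
import math

M = 5

def H_sum(omega):
    # Eq. 3-8, cosine-sum form
    return (1.0 + 2.0 * sum(math.cos(k * omega) for k in range(1, M + 1))) / (2 * M + 1)

def H_closed(omega):
    # Eq. 3-8, closed form (a Dirichlet kernel)
    return math.sin(omega * (2 * M + 1) / 2) / ((2 * M + 1) * math.sin(omega / 2))

for omega in (0.1, 0.5, 1.0, 2.0, 3.0):
    assert abs(H_sum(omega) - H_closed(omega)) < 1e-12
assert abs(H_sum(1e-9) - 1.0) < 1e-6     # unity gain at DC: a low-pass filter
```

The closed form makes the zeros at Ω = 2kπ/(2M+1) explicit, which is the pattern of unit-circle zeros discussed below.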
The causal filter possesses 2M poles in the origin (because of the time shift we add) and 2M zeros spaced around the unit circle, but the zero at z = 1 is missing (Eq. 3-7), which accounts for the passband centered at Ω = 0. Examples with M = 2 and M = 10 are shown in Figure 3.3.

Figure 3.3. Frequency response magnitude characteristics of low-pass moving-average filters: (a) 5-term, and (b) 21-term. Parts (c) and (d) show their respective z-plane pole-zero configurations.

Note that the magnitude responses of these low-pass filters, which are often used in practice, are far from the ideal low-pass filter characteristic of Figure 3.1(a).

From a low-pass filter a simple high-pass or bandpass filter can be derived. The basic idea is to multiply, or modulate, the original impulse response by cos(nΩ₀), where Ω₀ is the desired center frequency of the filter. By the modulation property of the Fourier transform we find

  cos(nΩ₀)·h[n]  →  H(Ω) ∗ [½δ(Ω − Ω₀) + ½δ(Ω + Ω₀)]        Eq. 3-9

For our moving-average low-pass filter we get, taking for example Ω₀ = π/3 (and using Eq. 3-6):

  h[n] = (1/(2M+1)) cos(nπ/3) for |n| ≤ M
  h[n] = 0                    for |n| > M        Eq. 3-10

whose characteristics with M = 10 are shown in Figure 3.4.

Figure 3.4. Deriving a simple bandpass filter from a low-pass prototype: (a) impulse response, and (b) frequency response magnitude function.
As noted before, this filter characteristic is far from the ideal bandpass filter characteristic. Rather than start with a simple form of impulse response, we should calculate the impulse response which best approximates a specified frequency response.

3.2 The Fourier transform method

In principle the impulse response h[n] is found from the inverse Fourier transform of the desired frequency response H(Ω):

  h[n] = (1/2π) ∫_{2π} H(Ω) e^{jΩn} dΩ        Eq. 3-11

Thus for an ideal low-pass filter with cut-off frequency Ω₁ we find

  h[n] = (1/2π) ∫_{−Ω₁}^{Ω₁} e^{jΩn} dΩ = sin(nΩ₁)/(nπ) = (Ω₁/π) sinc(nΩ₁)        Eq. 3-12

To shift the passband to Ω₀ we multiply this expression by cos(nΩ₀):

  h[n] = (Ω₁/π) sinc(nΩ₁) cos(nΩ₀)        Eq. 3-13

To find the frequency response characteristic of a truncated (FIR) h[n] we substitute it into Eq. 3-5:

  H(Ω) = Ω₁/π + 2 Σ_{k=1}^{M} h[k] cos(kΩ)        Eq. 3-14

which will give a better approximation with increasing M. Thus a close-to-ideal filter requires many coefficients (Figure 3.6). When x[n] contains a step-like transition (u[n]), the output y[n] will show oscillations and overshoots as a result
of the high frequencies present in the transition, a phenomenon which is called ringing.

The Fourier transform design method gives the best approximation in a least-squares sense. Denoting the desired and actual frequency response functions by H_d(Ω) and H_a(Ω) respectively, the overall error e, defined as

  e = (1/2π) ∫_{2π} |H_d(Ω) − H_a(Ω)|² dΩ        Eq. 3-15

is minimal.

Figure 3.5. Impulse responses of two ideal, zero-phase, low-pass filters.

Figure 3.6. Frequency responses of three linear-phase bandpass filters, obtained by truncating the 'ideal' impulse response.
3.3 Windowing

Truncation of h[n] in the time domain (as we did in the previous section) is equivalent to multiplication with a rectangular window function w[n]. Because of the modulation property of the Fourier transform this is equivalent to a convolution in the frequency domain:

  h_a[n] = h_d[n]·w[n]  →  H_a(Ω) = H_d(Ω) ∗ W(Ω)        Eq. 3-16

So let us now investigate some window functions.

3.3.1 Rectangular window

We recall from Eq. 3-6:

  w[n] = 1 for |n| ≤ M, 0 for |n| > M
  →  W(Ω) = 1 + 2 Σ_{k=1}^{M} cos(kΩ)        Eq. 3-17

3.3.2 Triangular window

  w[n] = ((M+1) − |n|)/(M+1)² for |n| ≤ M, 0 for |n| > M
  →  W(Ω) = 1/(M+1) + (2/(M+1)²) Σ_{k=1}^{M} (M+1−k) cos(kΩ)        Eq. 3-18

The triangular window can be regarded as a self-convolution of a rectangular window. Now time-domain convolution is equivalent to frequency-domain multiplication. Therefore, when plotted on a logarithmic scale, the (2M+1 terms) triangular window has sidelobe levels half as great as those of the (M+1 terms) rectangular window, as can be seen by comparing Figure 3.7 and Figure 3.8. The most widely used logarithmic measure of spectral magnitude (or gain) G is the decibel, which is defined as dB: 20·log₁₀ G.

3.3.3 Von Hann and Hamming windows

Since all practical windows involve a compromise between the shape of the main lobe and the sidelobe levels, there must be a trade-off between a sharp passband-stopband transition and low ripple levels in the actual filter. Two windows which have a main spectral lobe similar to that of a triangular window, but smaller sidelobe levels (see Figure 3.9), are defined by:

  w[n] = A + (1−A) cos(nπ/B) for |n| ≤ M, 0 for |n| > M
  →  W(Ω) = 1 + 2 Σ_{k=1}^{M} w[k] cos(kΩ)        Eq. 3-19
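The effect of windowing a truncated ideal low-pass response (Eq. 3-16) can be sketched by comparing a rectangular window with a von Hann window. The cut-off π/4 and M = 25 are arbitrary example choices.

```python
import math

M = 25
omega_c = math.pi / 4                       # desired cut-off frequency

def ideal_lp(n):                            # ideal low-pass taps, Eq. 3-12
    return omega_c / math.pi if n == 0 else math.sin(n * omega_c) / (n * math.pi)

def von_hann(n):                            # Eq. 3-19 with A = 0.5, B = M + 1
    return 0.5 + 0.5 * math.cos(n * math.pi / (M + 1))

def gain(h, omega):                         # zero-phase response, Eq. 3-5
    return h[0] + 2.0 * sum(h[k] * math.cos(k * omega) for k in range(1, M + 1))

rect = [ideal_lp(n) for n in range(M + 1)]
hann = [ideal_lp(n) * von_hann(n) for n in range(M + 1)]

stopband = [1.3 + 0.02 * i for i in range(90)]       # frequencies well past cut-off
ripple_rect = max(abs(gain(rect, w)) for w in stopband)
ripple_hann = max(abs(gain(hann, w)) for w in stopband)
assert ripple_hann < ripple_rect            # the window lowers the sidelobes
assert abs(gain(hann, 0.2) - 1.0) < 0.05    # passband gain stays near unity
```

This is the trade-off mentioned above in action: the von Hann window buys lower stopband ripple at the cost of a wider main lobe, i.e. a slower passband-stopband transition.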
Von Hann: A = 0.5, B = M+1 (Figure 3.9(b); also referred to as the Hanning window).

Hamming: A = 0.54, B = M (Figure 3.9(c)).

3.3.4 Kaiser window

In contrast to the previous windows, which have fixed shapes, the Kaiser window offers the designer the possibility to adjust the trade-off. It is defined as:

Figure 3.7. Spectra of rectangular windows with (a) 21 terms, and (b) 51 terms.

Figure 3.8. (a) A triangular function, and (b) the spectrum of a 41-term triangular window.
  w[n] = I₀(α·√(1 − (n/M)²)) / I₀(α) for |n| ≤ M, 0 for |n| > M        Eq. 3-20

where I₀ is the modified Bessel function of the first kind and of zero order, which may be expanded as a power series:

  I₀(x) = 1 + Σ_{n=1}^{∞} [ (1/n!) (x/2)ⁿ ]²

If α = 5.44 the Kaiser window is similar to the Hamming window. The design of the Kaiser window is based on the following findings: the parameter α depends upon the allowable ripple value δ; the transition width Δ is then related to the window length, so if Δ is specified we can find the parameter M. The ripple level is expressed as an attenuation in decibels:

  A = −20·log₁₀ δ        Eq. 3-21

The following empirical formulae are often used:

Figure 3.9. Spectra of 51-term windows: (a) triangular, (b) von Hann, and (c) Hamming.
versie 1.1 3-8 1994
Digital Signal Processing Design of nonrecursive (FIR) filters
Eq. 3-22:
α = 0.1102(A − 8.7)  if A ≥ 50
α = 0.5842(A − 21)^0.4 + 0.07886(A − 21)  if 21 < A < 50
α = 0  if A ≤ 21
M ≥ (A − 7.95)/(28.72∆)

Figure 3.10. Specifying the design of a Kaiser-window filter.

3.4 Equiripple filters
The basic idea is to distribute the error between desired and actual response more equally over the range 0 ≤ Ω ≤ π. We illustrate this for a low-pass equiripple filter.
In the passband (0 ≤ Ω ≤ Ω_p) the acceptable level of ripple is ±δ₁; in the stopband (Ω_s ≤ Ω ≤ π) the acceptable level of ripple is ±δ₂. The width of the transition band is Ω_s − Ω_p. The ripple peaks and troughs occur at Ω₁, Ω₂, …. We start with an impulse response which is symmetric about n = 0. The frequency response takes the general form:

Eq. 3-23:  H(Ω) = h[0] + 2 Σ_{k=1}^{M} h[k] cos(kΩ)

Figure 3.11. Specifying an equiripple low-pass filter.

Now a term cos(kΩ) can always be expressed as a sum of powers of cos(Ω). Therefore
Eq. 3-23 can be recast as:

Eq. 3-24:  H(Ω) = Σ_{k=0}^{M} c_k cosᵏ(Ω)

an Mth order trigonometric polynomial which can display up to (M − 1) local extrema within the range 0 < Ω < π, corresponding to ripple peaks and troughs. Differentiating Eq. 3-24 with respect to Ω we obtain

Eq. 3-25:  H′(Ω) = dH(Ω)/dΩ = −sin(Ω) Σ_{k=1}^{M} c_k k cosᵏ⁻¹(Ω)

Since sin(Ω) = 0 when Ω = 0, π there are extrema at these frequencies as well. Hence there are at most M + 1 local extrema within the range 0 ≤ Ω ≤ π. The widely used approach of Parks and McClellan allows one to specify M, Ω_p and Ω_s and the ripple ratio δ₁/δ₂, while allowing the actual value of δ₁ to vary. Their approach has the advantage that the transition bandwidth (Ω_s − Ω_p) is properly controlled.

3.5 Digital differentiators
An LTI system which forms the first order difference (FOD) of an input signal,

Eq. 3-26:  y[n] = x[n] − x[n−1]

may be thought of as a 'differentiator'. The corresponding frequency response is

Eq. 3-27:  H(Ω) = 1 − e^{−jΩ} = 2j e^{−jΩ/2} sin(Ω/2)

with magnitude function

Eq. 3-28:  |H(Ω)| = 2 sin(Ω/2)

However, accurate differentiation is only achieved for the lower part of the frequency range 0 ≤ Ω ≤ π. An ideal differentiator has H(Ω) = jΩ, since differentiating a Fourier term proportional to e^{jnΩ} with respect to n gives jΩ e^{jnΩ}. A magnitude response proportional to Ω

Figure 3.12. Frequency responses of digital differentiators.
is found only with small values of Ω in Eq. 3-28. In general

Eq. 3-29:  H(Ω) = A(Ω) + jB(Ω)

Until now we only considered real impulse responses h[n], which were symmetrical about n = 0, leading to real transfer functions H(Ω) = A(Ω). An odd, purely imaginary H(Ω) = jB(Ω) corresponds to an impulse response which is antisymmetrical about n = 0. The inverse Fourier transform of H(Ω) = jΩ is given by

Eq. 3-30:  h[n] = (1/2π) ∫_{−π}^{π} jΩ e^{jΩn} dΩ = (1/2π) [jΩ e^{jΩn}/(jn)]_{−π}^{π} − (1/2π) ∫_{−π}^{π} (e^{jΩn}/n) dΩ = (1/2π) [e^{jΩn} (Ω/n − 1/(jn²))]_{−π}^{π}

where we have integrated by parts. We thus find for an ideal differentiating filter

Eq. 3-31:  h[n] = 0 for n = 0;  h[n] = 1/n for n = ±2, ±4, …;  h[n] = −1/n for n = ±1, ±3, …

Again, multiplication with a window function is necessary. Examples are shown in Figure 3.13. and Figure 3.14.

Figure 3.13. Impulse response of the differentiator of Eq. 3-31, truncated to 21 terms and shifted to begin at n = 0.

Figure 3.14. Frequency responses of two nonrecursive differentiators based on (a) a rectangular window, and (b) a Hamming window.
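Both differentiators above can be compared numerically. A Python/NumPy sketch (truncation length M = 25 and the test frequency are our choices); Eq. 3-31 can be summarized as h[n] = (−1)ⁿ/n for n ≠ 0, and its truncated response should stay close to the ideal |H(Ω)| = Ω at moderate frequencies:

```python
import numpy as np

M = 25
n = np.arange(-M, M + 1)

def fod_magnitude(omega):
    """First order difference, Eq. 3-28: |1 - e^{-j omega}| = 2 sin(omega/2)."""
    return np.abs(1 - np.exp(-1j * omega))

# Truncated ideal differentiator of Eq. 3-31: h[0] = 0, otherwise h[n] = (-1)^n / n
denom = np.where(n == 0, 1, n)                 # dummy 1 at n = 0 avoids division by zero
h = np.where(n == 0, 0.0, (-1.0) ** np.abs(n) / denom)

def response(h, omega):
    """H(Omega) = sum_n h[n] e^{-j n Omega}."""
    return np.sum(h * np.exp(-1j * n * omega))

mag = np.abs(response(h, 0.4 * np.pi))         # should be close to 0.4*pi
```

Since h[n] is antisymmetric, the response is purely imaginary, as required for H(Ω) = jB(Ω).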
4 Design of recursive (IIR) filters
4.0 Introduction
The output from a recursive digital filter depends upon one or more previous output values, as well as on inputs. The great advantage thereof is computational economy: a filter characteristic requiring say 100 coefficients in a nonrecursive realization can often be obtained using just a few recursive coefficients. However, there are two potential disadvantages: (a) the recursive filter may become unstable if its feedback coefficients are chosen badly; (b) recursive designs cannot generally provide the linear phase responses so readily achieved by nonrecursive methods, so there is phase distortion.
In most cases a recursive filter has an infinite impulse response (IIR). Although the impulse response h[n] decays towards zero as n → ∞, it theoretically continues forever. Assuming the filter is causal (h[n] = 0 for n < 0) this means that the impulse response cannot be symmetrical in form, and therefore the filter cannot display a pure linear-phase characteristic. In contrast to the nonrecursive filter the recursive filter has one or more strategically placed z-plane poles. We may write the difference equation (N > 0, M ≥ 0):

Eq. 4-1:  Σ_{k=0}^{N} a_k y[n−k] = Σ_{k=0}^{M} b_k x[n−k]

and transfer function

Eq. 4-2:  H(z) = Y(z)/X(z) = Σ_{k=0}^{M} b_k z^{−k} / Σ_{k=0}^{N} a_k z^{−k}

Factorizing the numerator and denominator polynomials of Eq. 4-2 we obtain the pole-zero description of the filter:

Eq. 4-3:  H(z) = K (z − z₁)(z − z₂)… / [(z − p₁)(z − p₂)…]

with frequency response

Eq. 4-4:  H(Ω) = K (e^{jΩ} − z₁)(e^{jΩ} − z₂)… / [(e^{jΩ} − p₁)(e^{jΩ} − p₂)…]

4.1 Simple designs based on z-plane poles and zeros
As discussed in Section 2.6 and Section 2.7 a pole close to the unit circle gives rise to a well-defined response peak, whereas a zero close to (or on) the unit circle produces a trough (or null). Our aim in this section is to show how z-plane poles and zeros can be positioned to give a variety of simple, but useful, recursive filters.
Suppose we specify a real pole at z = α. It contributes the following factor to the denominator of H(Ω):
Eq. 4-5:  F₁(Ω) = e^{jΩ} − α = (cos(Ω) − α) + j sin(Ω)

Its magnitude contribution is therefore:

Eq. 4-6:  |F₁(Ω)| = √[(cos(Ω) − α)² + sin²(Ω)] = √(1 − 2α cos(Ω) + α²)

A real zero gives an identical contribution, but to the numerator of H(Ω). When α = 0 in Eq. 4-6 then |F₁(Ω)| = 1 and thus constant. When α is close to ±1, |F₁(Ω)| becomes very small for Ω ≈ 0 or Ω = ±π respectively, resulting in large values of the transfer function.
A complex conjugate pole-pair, or zero-pair, with polar coordinates (r, ±θ) makes a contribution:

Eq. 4-7:  F₂(Ω) = (e^{jΩ} − r e^{jθ})(e^{jΩ} − r e^{−jθ}) = e^{2jΩ} − 2r cos(θ) e^{jΩ} + r² = (cos(2Ω) − 2r cos(θ) cos(Ω) + r²) + j(sin(2Ω) − 2r cos(θ) sin(Ω))

and the magnitude is

Eq. 4-8:  |F₂(Ω)| = √[(cos(2Ω) − 2r cos(θ) cos(Ω) + r²)² + (sin(2Ω) − 2r cos(θ) sin(Ω))²]

When r is close to 1 this results in small values of |F₂(Ω)| for Ω = ±θ, and thus in large values of the transfer function.
We can build up an overall response by assessing the contributions of individual poles or pole-pairs, and zeros or zero-pairs in turn. Examples are shown in Figure 4.2. and Figure 4.1. This is equivalent to synthesizing the system as a series of cascaded first and second order subsystems. Such a realization is often referred to as the cascade canonic form in the DSP literature.
As a second example we design a recursive bandpass filter with the following characteristics: (a) a passband centered at Ω = π/2, with a bandwidth of π/40 between -3 dB points and a peak gain of unity; (b) steady-state rejection of components at Ω = 0 and Ω = π.
To meet the passband centering we place a complex conjugate pole pair at θ = ±π/2. It is assumed that a pole close to the unit circle is entirely responsible for the response peak. The

Figure 4.1. (a) A pole-zero configuration, and (b) the equivalent spectral magnitude function, normalized to a peak value of 0 dB.
radius r can be found as follows, and is illustrated in Figure 4.3.
The -3 dB bandwidth corresponds to a distance 2d = 2(1 − r) around the unit circle (which approximates a straight line in this region), and hence to a change in Ω of 2(1 − r) radians:

Eq. 4-9:  2(1 − r) = π/40, giving r = 0.961

To reject at Ω = 0, π we place zeros on the unit circle at z = ±1. The complete pole-zero configuration is shown in Figure 4.4.

Figure 4.2. Spectral magnitude functions produced by (a) a single real pole at z = 0.9; (b) a second order zero at z = −0.8; (c) a complex conjugate pole pair at r = 0.975, θ = ±150°; and (d) a complex conjugate zero pair on the unit circle at θ = ±50°.

Figure 4.3. (a) Measuring the -3 dB bandwidth; (b) relationship between bandwidth and the radius of a z-plane pole.
The filter's transfer function becomes

Eq. 4-10:  H(z) = K (z − 1)(z + 1) / [(z − rj)(z + rj)] = K (z² − 1)/(z² + r²) = K (1 − z^{−2})/(1 + r²z^{−2})

with K = 0.03824 to ensure a maximum gain of unity at z = j, where |H(j)| = 2K/(1 − r²). The corresponding difference equation is

Eq. 4-11:  y[n] + r² y[n−2] = K {x[n] − x[n−2]}

In the third example we design a simple bandstop filter for rejecting a narrow band of unwanted frequencies. In addition to a pair of complex conjugate zeros at the appropriate frequencies a pair of complex conjugate poles is placed close to these zeros. Then over most of the frequency range the pole and zero vectors are almost identical in length, and the response is close to unity. Only in the immediate vicinity of Ω = Ω₀, the frequency of the zeros, does the zero vector become much shorter than the pole vector, producing a narrow notch; see Figure 4.5. (compare with Figure 4.2.b).

Figure 4.4. Pole-zero configuration and magnitude response of a simple bandpass filter.

Figure 4.5. (a) Poles and zeros of a 'notch' filter; (b) response of a notch design for rejecting mains-supply interference from an EKG signal.
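The bandpass design of Eqs. 4-9 to 4-11 can be verified numerically by evaluating H(z) on the unit circle. A Python/NumPy sketch (function names are ours): the gain should be unity at Ω = π/2, zero at Ω = 0 and π, and about 1/√2 at the band edges Ω = π/2 ± π/80:

```python
import numpy as np

r, K = 0.961, 0.03824

def H(omega):
    """Eq. 4-10 evaluated at z = e^{j omega}: H = K (1 - z^-2) / (1 + r^2 z^-2)."""
    zm2 = np.exp(-2j * omega)                     # z^{-2}
    return K * (1 - zm2) / (1 + r * r * zm2)

gain_center = abs(H(np.pi / 2))                   # design goal: 1
gain_dc = abs(H(0.0))                             # rejected: 0
gain_nyq = abs(H(np.pi))                          # rejected: 0
gain_edge = abs(H(np.pi / 2 + np.pi / 80))        # -3 dB point: about 0.707
```

The -3 dB check is only approximate, as the text's straight-line argument for Eq. 4-9 is itself an approximation.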
4.2 Filters derived from analog designs
Continuous time filters are defined by differential equations:

Eq. 4-12:  Σ_{k=0}^{N} a_k dᵏy(t)/dtᵏ = Σ_{k=0}^{M} b_k dᵏx(t)/dtᵏ

The role of the z-transform in the discrete time domain is played by the Laplace transform X(s) in the continuous time domain, and the substitution z → e^{jΩ} is equivalent to s → jω. It follows that the imaginary axis (s = jω) in the s-plane corresponds to the unit circle in the z-plane. Using the Fourier transform property dx(t)/dt → jωX(ω) = sX(s) we find for the frequency response function

Eq. 4-13:  H(ω) = Y(ω)/X(ω) = Σ_{k=0}^{M} b_k (jω)ᵏ / Σ_{k=0}^{N} a_k (jω)ᵏ

whereas the transfer function can be described in the general form

Eq. 4-14:  H(s) = K (s − z₁)(s − z₂)… / [(s − p₁)(s − p₂)…]

where the filter is characterized by its poles p₁, p₂, … and zeros z₁, z₂, …, which can be plotted in the complex s-plane.
One of the most effective ways of converting an analog filter into a digital filter is by means of the bilinear transformation. But first we summarize the characteristics of two important types of analog filters. The magnitude functions are given by

Eq. 4-15:  |H(ω)| = [1 + (ω/ω₁)^{2n}]^{−1/2}   (Butterworth)
           |H(ω)| = [1 + ε² C_n²(ω/ω₁)]^{−1/2}   (Chebyshev)

where n is the filter order and ω₁ is the nominal cut-off frequency. C_n is the so-called Chebyshev polynomial of nth order; C_n² oscillates between 0 and 1 in the passband (if n > 0), rising to large values in the stopband. The amount of passband ripple δ is related to the parameter ε by the expression

Eq. 4-16:  δ = 1 − (1 + ε²)^{−1/2}

Chebyshev polynomials may be generated from the recursion formula

Eq. 4-17:  C_n(x) = 2x C_{n−1}(x) − C_{n−2}(x),  with C₀(x) = 1 and C₁(x) = x
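The recursion of Eq. 4-17 and the magnitude expressions of Eqs. 4-15 and 4-16 translate directly into code. A Python sketch (names are ours); for example, C₃(x) = 4x³ − 3x, and C_n(1) = 1 so the Chebyshev magnitude at the cut-off is (1 + ε²)^(−1/2):

```python
import numpy as np

def cheb(order, x):
    """Chebyshev polynomial C_n(x) from the recursion of Eq. 4-17."""
    if order == 0:
        return 1.0
    c_prev, c = 1.0, x
    for _ in range(order - 1):
        c_prev, c = c, 2 * x * c - c_prev
    return c

def cheb_mag(omega_ratio, order, eps):
    """Analog Chebyshev magnitude of Eq. 4-15, omega_ratio = omega / omega_1."""
    return (1 + eps ** 2 * cheb(order, omega_ratio) ** 2) ** -0.5

eps = 0.5
ripple = 1 - (1 + eps ** 2) ** -0.5     # Eq. 4-16: passband ripple for this epsilon
```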
4.2.1 The bilinear transformation
Let us consider the complex function

Eq. 4-18:  F(z) = (z − 1)/(z + 1)

which is 'bilinear' in the sense that its numerator and denominator are both linear in z. Substituting z = e^{jΩ} we obtain

Eq. 4-19:  F(Ω) = (e^{jΩ} − 1)/(e^{jΩ} + 1) = 2j sin(Ω/2) / (2 cos(Ω/2)) = j tan(Ω/2)

When 0 < Ω < π then 0 < |F(Ω)| < ∞, so the transformation maps the unit circle onto the imaginary (jω) axis. The complete response of an analog filter is generated as ω varies from 0 to ∞. If we substitute F(Ω) = j tan(Ω/2) for s = jω in the transfer function of Eq. 4-14 we obtain a function H(Ω) in which the complete frequency response of the analog filter is compressed into the range 0 < Ω < π. The compression of the frequency scale is nonlinear. The shape of the tan function means that the compression, or 'warping', effect is very small near Ω = 0; but it increases dramatically as we approach Ω = π.
The bilinear transformation preserves the 'maximally flat', or 'equiripple', amplitude properties of the filters when the frequency axis is compressed. There is no aliasing of the analog frequency response. Thus the response of a low-pass filter falls to zero at Ω = π. The magnitude responses of the low-pass digital filters derived from Eq. 4-15 are given by

Eq. 4-20:  |H(Ω)| = [1 + (tan(Ω/2)/tan(Ω₁/2))^{2n}]^{−1/2}   (Butterworth)
           |H(Ω)| = [1 + ε² C_n²(tan(Ω/2)/tan(Ω₁/2))]^{−1/2}   (Chebyshev)

A Butterworth low-pass digital filter of nth order has n poles arranged on a circular locus in

Figure 4.6. Typical frequency response (magnitude) functions of Butterworth and Chebyshev analog low-pass filters.
the z-plane, and an nth order real zero at z = −1. The poles are given by the values of P_m falling inside the unit circle, where the real and imaginary parts of P_m are respectively

Eq. 4-21:  P_Rm = (1 − tan²(Ω₁/2)) / d ;  P_Im = 2 tan(Ω₁/2) sin(mπ/n) / d

where

Eq. 4-22:  d = 1 − 2 tan(Ω₁/2) cos(mπ/n) + tan²(Ω₁/2)

and m = 0, 1, …, (2n − 1). If n is even the terms mπ/n are replaced by (2m + 1)π/2n.
An example is the design of a Butterworth low-pass filter with a cut-off frequency Ω₁ = 0.2π whose response should be at least 30 dB down at Ω = 0.4π. Substituting Ω₁ = 0.2π and Ω = 0.4π into Eq. 4-20 gives

Eq. 4-23:  |H(0.4π)| = [1 + (tan(0.2π)/tan(0.1π))^{2n}]^{−1/2} = [1 + (2.236)^{2n}]^{−1/2}

Now -30 dB corresponds to a response ratio |H(Ω)| = 10^{−3/2} (using −30 = 20 log₁₀|H(Ω)|). Hence we find 1 + (2.236)^{2n} ≥ 10³ = 1000, or n ≥ 4.29. Since the filter order must be an integer, we choose n = 5. The pole locations (r, θ) found from Eq. 4-21 and Eq. 4-22 are (0.50953, 0°), (0.83221, ±34.644°), and (0.59619, ±23.125°). They are sketched in Figure 4.7.
A convenient way to derive the filter's difference equation is to treat it as a cascaded set of

Figure 4.7. A 5th-order Butterworth low-pass digital filter.
first and second order subfilters, as shown in Figure 4.7.b. We will distribute the five zeros at z = −1 equally over the first and second order subfilters, resulting in terms (z + 1) and (z + 1)². The transfer function of the first order subfilter then takes the form

Eq. 4-24:  V(z)/X(z) = (z + 1)/(z − α)

with α = 0.50953. It gives the difference equation

Eq. 4-25:  v[n] = α v[n−1] + x[n] + x[n−1]

Each second-order subfilter has a transfer function of the form

Eq. 4-26:  W(z)/V(z) = (z + 1)² / [(z − r e^{jθ})(z − r e^{−jθ})] = (z² + 2z + 1) / (z² − 2r cos(θ) z + r²)

yielding a difference equation

Eq. 4-27:  w[n] = 2r cos(θ) w[n−1] − r² w[n−2] + x[n] + 2x[n−1] + x[n−2]

The three difference equations can be used together, or alternatively a single high-order difference equation involving just x and y can be derived.
From a low-pass filter a bandpass filter can be derived analogous to Eq. 3-9: multiplication with cos(nΩ₀), which means convolution in the frequency domain with the Fourier transform thereof (two δ-functions at ±Ω₀). This can be viewed as a rotation by Ω₀ of the poles in the z-plane and addition of the complex conjugates, thus going from order n to 2n.
Suppose we require a bandpass filter with a lower cut-off frequency Ω₂ and an upper cut-off frequency Ω₃, and center frequency Ω₀ = (Ω₃ + Ω₂)/2. We start with finding the poles and zeros of a low-pass prototype with cut-off frequency Ω₁ = Ω₃ − Ω₂. A pole (or zero) located at z = α then gives two poles (or zeros) in the bandpass design, at locations

Eq. 4-28:  z = 0.5A(1 + α) ± √[(0.5A(1 + α))² − α]

where A = cos((Ω₃ + Ω₂)/2) / cos((Ω₃ − Ω₂)/2). The zeros of the low-pass filter at z = −1 are converted by Eq. 4-28 to z = ±1.

4.2.2 Impulse invariant filters
In this case the design criterion is that the impulse response of the digital filter should be a sampled version of that of the reference analog filter. Sampling of an analog signal causes repetition of its spectrum. This is illustrated in Figure 4.9. To avoid extensive aliasing an adequate sampling rate is necessary (part (b)). The effect of halving the sampling rate is illustrated in part (c). The effectiveness of the impulse invariant technique depends on an adequate sampling rate, and on choosing an analog reference filter with a limited bandwidth. Recall that Ω is equivalent to ωT, where T is the sampling interval. Hence in part (b) the value
Ω = π corresponds to ω = π/T₁, whereas in part (c) it corresponds to ω = π/T₂.
The starting point of a recursive design technique is the transfer function H(s) of the reference analog filter, Eq. 4-14. Assuming there are no repeated poles we use the partial fraction expansion to express H(s) in the following parallel form

Eq. 4-29:  H(s) = Σᵢ Kᵢ / (s − pᵢ)

In effect we are decomposing the analog filter into a set of single pole subfilters, whose outputs are added together. The impulse response of each analog subfilter takes a simple exponential form

Eq. 4-30:  hᵢ(t) = Kᵢ e^{pᵢt} for t ≥ 0;  hᵢ(t) = 0 for t < 0

The impulse response of the impulse invariant digital subfilter is therefore hᵢ[n] = hᵢ(nT),

Figure 4.8. A 10th-order Chebyshev bandpass filter.

Figure 4.9. The idea of impulse-invariance.
where T is the chosen sampling interval. This gives us

Eq. 4-31:  hᵢ[n] = Kᵢ e^{pᵢTn} for n ≥ 0;  hᵢ[n] = 0 for n < 0

with transfer function

Eq. 4-32:  Hᵢ(z) = Σ_{n=0}^{∞} Kᵢ e^{pᵢTn} z^{−n} = Kᵢ / (1 − e^{pᵢT} z^{−1}) = Kᵢ z / (z − e^{pᵢT})

The overall digital filter is now built up as a parallel set of subfilters, as shown in Figure 4.10.

Figure 4.10. Designing an impulse-invariant filter by parallel decomposition.

As an example we calculate the impulse invariant digital equivalent of an analog third order Butterworth low-pass filter with a cut-off frequency of 1 radian/second, whose transfer function is

Eq. 4-33:  H(s) = 1 / [(s + 1)(s − p₁)(s − p₁*)]

with p₁ = −0.5 − 0.866j. Given a sampling interval T = 0.5 we find, using Eq. 4-32,

Eq. 4-34:  H(z) = z/(z − e^{−0.5}) + Kz/(z − e^{0.5p₁}) + K*z/(z − e^{0.5p₁*})

with K = −0.5 + 0.2887j. The filter can be implemented either in this form, as parallel first and second order subsystems, or we can convert it into the series form, using

Eq. 4-35:  H(z) = 0.087z(z + 0.73) / (z³ − 2.02z² + 1.46z − 0.37)
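The parallel form of Eq. 4-34 and the rounded series form of Eq. 4-35 describe the same filter, which can be checked numerically. A Python/NumPy sketch (names and the test frequency are ours); the residue K follows from the partial fraction expansion of Eq. 4-33:

```python
import numpy as np

T = 0.5
p1 = -0.5 - 0.866j                              # complex analog Butterworth pole
K = 1.0 / ((p1 + 1) * (p1 - np.conj(p1)))       # residue at p1; about -0.5 + 0.2887j

def H_parallel(z):
    """Eq. 4-34: sum of impulse-invariant single-pole subfilters."""
    return (z / (z - np.exp(-0.5))
            + K * z / (z - np.exp(T * p1))
            + np.conj(K) * z / (z - np.exp(T * np.conj(p1))))

def H_series(z):
    """Eq. 4-35: the same filter with the text's rounded coefficients."""
    return 0.087 * z * (z + 0.73) / (z ** 3 - 2.02 * z ** 2 + 1.46 * z - 0.37)

z = np.exp(1j * np.pi / 4)
rel_err = abs(H_parallel(z) - H_series(z)) / abs(H_parallel(z))
```

The small residual discrepancy comes entirely from the rounding of the coefficients in Eq. 4-35.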
Figure 4.11. Responses of 3rd-order Butterworth low-pass filters designed by (a) impulse-invariance, and (b) the bilinear transformation.

Figure 4.12. (a) Pole-zero configuration, and (b) impulse response of an impulse-invariant Butterworth low-pass filter.
4.3 Frequency sampling filters
The frequency sampling method is an example of a DSP technique developed from basic principles. It also produces FIR filters which offer the advantage of a linear-phase response. We start by considering a digital resonator having a complex conjugate pole-pair on the unit circle in the z-plane, and a second order zero at the origin.
The impulse response of a resonator is given by

Eq. 4-36:  h[n] = sin(nθ) u[n] = (1/2j)(e^{jθn} − e^{−jθn}) u[n]

with transfer function

Eq. 4-37:  H(z) = (1/2j)[1/(1 − e^{jθ}z^{−1}) − 1/(1 − e^{−jθ}z^{−1})] = z^{−1} sin(θ) / [(1 − e^{jθ}z^{−1})(1 − e^{−jθ}z^{−1})]

An infinite impulse response h[n] = cos(nθ) u[n] has transfer function

Eq. 4-38:  H(z) = z² / [(z − e^{jθ})(z − e^{−jθ})] = z² / (z² − 2cos(θ) z + 1)

corresponding to the difference equation

Eq. 4-39:  y[n] = 2cos(θ) y[n−1] − y[n−2] + x[n]

An example with cos(θ) = 0.5 is shown in Figure 4.13. As it stands such a resonator is not a useful processor, because it is unstable. However, its impulse response can be made finite by cascading with a very simple form of nonrecursive filter, known as the comb filter. The combination of comb filter and resonator provides the basic building block for a complete frequency sampling filter.
The comb filter is described by the difference equation

Eq. 4-40:  y[n] = x[n] − x[n−m]

with transfer function

Eq. 4-41:  H(z) = Y(z)/X(z) = 1 − z^{−m} = (z^m − 1)/z^m

giving m zeros spaced uniformly around the unit circle. These produce a comb frequency response, illustrated in part (c) for the case when m = 24. The overall pole-zero configuration of comb filter and resonator is shown in part (d). The poles of the resonator are exactly cancelled by two of the comb filter's zeros. The recursive difference equation is

Eq. 4-42:  y[n] = x[n] − x[n−24] + y[n−1] − y[n−2]

which requires only three additions/subtractions. A nonrecursive realization would need many more additions and subtractions. Part (e) shows that inclusion of the resonator converts the comb filter into an elementary bandpass characteristic. The center of the passband corresponds to the resonator pole locations. If the parameter m is increased, the width of the main passband reduces, and the characteristic tends to a sin(x)/x, or sinc, function. Such a filter forms the basis for a complete frequency-sampling filter.
Suppose we require a digital filter with the response magnitude characteristic of Figure 4.14.a. We first sample it, as in Figure 4.14.b. The required response is now built up by superposing a set of sinc functions, each weighted by one of the sample values aᵢ, and arranged around it (Figure 4.14.c). Each of the sinc functions is provided by a comb filter-resonator combination. The complete frequency sampling filter uses a single comb filter which feeds all the resonators in parallel (Figure 4.14.d). Note that alternate weights must be inverted, because there is a phase reversal between the outputs of adjacent resonators. Two remarks are in order. First, the actual filter characteristic will always be an approximation to the desired one. The superposition of sinc functions does not give an exact replica of the desired response, particularly near any sharp discontinuity. Second, if we attempt to place poles (and cancelling zeros) exactly on the unit circle, very small arithmetic errors may prevent exact cancellation and cause poles to move outside the unit circle. Thus for stability reasons poles and zeros are placed at a radius just less than unity.

Figure 4.13. Basis of the frequency-sampling technique.
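The pole-zero cancellation behind Eq. 4-42 can be demonstrated by simulating the difference equation directly. A Python/NumPy sketch (names are ours): with cos(θ) = 0.5 the resonator impulse response repeats with period 6, which divides the comb length m = 24, so the combined impulse response must be exactly zero from n = 24 onwards:

```python
import numpy as np

m = 24                    # comb filter length; resonator poles at theta = 60 degrees
N = 100

x = np.zeros(N)
x[0] = 1.0                # unit impulse
y = np.zeros(N)
for k in range(N):        # Eq. 4-42: y[n] = x[n] - x[n-24] + y[n-1] - y[n-2]
    y[k] = x[k]
    if k >= m:
        y[k] -= x[k - m]
    if k >= 1:
        y[k] += y[k - 1]
    if k >= 2:
        y[k] -= y[k - 2]

tail = np.max(np.abs(y[m:]))      # should be exactly zero: the response is finite
```

Although the recursion uses marginally stable poles, the comb zeros cancel them exactly here because the coefficients are exact integers; with rounded coefficients the text's warning about moving poles slightly inside the unit circle applies.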
4.4 Digital integrators
Integration can be performed digitally if we assume that the signal samples represent an underlying analog waveform. In this section we consider several well-known integration algorithms as digital filtering operations, and compare their properties in the time and frequency domains.
4.4.1 Running sum
The simplest integration algorithm, with difference equation

Eq. 4-43:  y[n] = y[n−1] + x[n]

and corresponding transfer function and frequency response

Eq. 4-44:  H(z) = Y(z)/X(z) = 1/(1 − z^{−1}) = z/(z − 1)

Figure 4.14. Building up a complete frequency-sampling filter.
Eq. 4-45:  H(Ω) = 1/(1 − e^{−jΩ}) = e^{jΩ}/(e^{jΩ} − 1)

4.4.2 Trapezoid rule

Eq. 4-46:  y[n] = y[n−1] + (1/2)(x[n] + x[n−1])

Eq. 4-47:  H(z) = (1/2)(1 + z^{−1})/(1 − z^{−1}) = (z + 1)/(2(z − 1))

Eq. 4-48:  H(Ω) = (e^{jΩ} + 1)/(2(e^{jΩ} − 1))

4.4.3 Simpson's rule

Eq. 4-49:  y[n] = y[n−2] + (1/3)(x[n] + 4x[n−1] + x[n−2])

Eq. 4-50:  H(z) = (1/3)(1 + 4z^{−1} + z^{−2})/(1 − z^{−2}) = (z² + 4z + 1)/(3(z² − 1))

Eq. 4-51:  H(Ω) = (e^{2jΩ} + 4e^{jΩ} + 1)/(3(e^{2jΩ} − 1))

4.4.4 Comparison
We may look on each of these methods as giving a polynomial approximation to the underlying analog signal. The running sum uses a zero-order polynomial, whereas the trapezoid and Simpson's rules are based on, respectively, a first- and second-order polynomial. This is illustrated in Figure 4.15.

Figure 4.15. Digital integration: (a) the running-sum technique; (b) the trapezoid rule; (c) Simpson's rule.

An ideal analog integrator would have a magnitude response inversely proportional to the frequency, with a phase shift of π/2 because ∫cos(ωt) dt = sin(ωt)/ω.
Pole-zero configurations of the three types of digital integrators are shown in Figure 4.16. All have a pole on the unit circle at z = 1, and the Simpson algorithm possesses another one at z = −1. Therefore we plot the frequency response over a limited frequency range in Figure 4.17.
The greatest differences are in the higher frequency part (Ω > 0.4π). For example, if we need to integrate a signal contaminated with random fluctuations or 'noise' (much of which is generally high frequency) it may be best to use the trapezoid rule, which will reduce the effects of the noise.

Figure 4.16. Pole-zero configurations of the running-sum, trapezoid and Simpson integrators.

Figure 4.17. Frequency responses of four digital integrators over the range 0.05π < Ω < 0.95π: (a) ideal; (b) running sum; (c) trapezoid; (d) Simpson.
5 Spectral analysis
5.0 Introduction
In this chapter we return to the application of digital spectral analysis to real-world signals. With naturally occurring signals some applications involve searching for a wanted signal in the presence of unwanted disturbances or 'noise', on the basis of their different spectral distributions. Examples arise in the analysis of speech and of biomedical signals such as the EKG (electrocardiogram). A different application is the measurement of the response of a system which is deliberately disturbed with a suitable input signal. Spectral analysis then yields information about the frequency dependent properties of the system. Examples arise in the testing of electronic circuits and filters, the analysis of vibrations in buildings and structures, and in radar, sonar and seismology. In chapter 1 we discussed sampling, aliasing and (spectral) leakage. Here we continue our discussion of leakage, discuss spectral resolution, and present some examples which illustrate the effects of windowing and zero-padding.
5.1 Spectral leakage
Spectral analysis with the discrete Fourier transform (DFT) implies discrete time and discrete frequencies, and hence a limited observation window in time, which is repeated to obtain a periodic time signal so that the DFT can be applied.
The DFT of a signal that contains only harmonic frequencies (multiples) of the fundamental frequency Ω₀ = 2π/N results in a line spectrum. The DFT of a signal that contains frequencies which are not harmonics of the fundamental frequency gives a widening of the spectral lines; this widening is called leakage. The origin lies again in the finite length of the sum over N. Mathematically this can be regarded as the result of "windowing". An alternative explanation for leakage, which gives valuable insight into the nature of the DFT, is to regard the DFT as a type of filtering process. A DFT behaves like a set of elementary bandpass filters which split the signal into its various frequency components. This is illustrated in Figure 5.1. It is important to notice that the peak response of each filter coincides with zero response in its neighbours. Figure 5.1.a does not give the complete picture, because each elementary filter characteristic has substantial sidelobes to either side of its main lobe. As the transform length increases, each characteristic tends to a sinc function (Figure 5.1.b). The width of its main lobe is 4π/N radians, the sidelobes are 2π/N radians

Figure 5.1. An 8-point DFT considered as a set of elementary bandpass filters.
wide, with amplitudes decreasing away from the center frequency Ω_c. Note that the zero crossings coincide with the center frequencies of the other filters. Thus, a signal component at an exact harmonic frequency only produces an output from one of the filters. If a component is displaced slightly from the filter's center frequency, it gives a smaller peak response, plus a whole series of sidelobe responses from adjacent filters. This spectral leakage effect is illustrated in Figure 5.2.
Increasing N, the number of FFT filters, will increase the frequency resolution in proportion. If we wish to resolve closely-spaced frequency components, we must work with a long portion of signal. Furthermore, increasing N will decrease the leakage, since a nonharmonic frequency (as in Figure 5.2.) will usually lie closer to a filter center frequency, and the leakage will extend over a more limited portion of the frequency domain.
Spectral analysis may provide a useful method for detecting a signal in the presence of noise. Suppose we believe the data shown in Figure 5.3.a contains a periodic square wave buried in noise. The transform in Figure 5.3.b shows a pronounced peak at the 32nd harmonic, and a lesser peak at the 96th harmonic. Now random noise in which successive time-domain samples are statistically independent has a flat spectrum, the individual spectral lines displaying chance amplitude variations.
The FFT is consistent with the view that the time domain data consist of white noise, plus a signal which is strong in the 32nd and 96th harmonics. This is probably a square wave with a
Figure 5.2. Spectral leakage effects for (a) a component lying midway between two harmonics, and (b) a component lying a quarter of the way between two harmonics.
Figure 5.3. Using the FFT to detect a signal in noise.
fundamental frequency corresponding to the FFT's 32nd harmonic. The reason for the FFT's success in this example is that the signal's spectral energy is well concentrated, whereas the noise is wideband. Such techniques are likely to be valuable whenever signal and noise have substantially different spectral distributions.
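The leakage effect described above is easy to reproduce. A Python/NumPy sketch (signal choices are ours): an exact harmonic occupies just two FFT bins (positive and negative frequency), while a component midway between harmonics spreads over many bins:

```python
import numpy as np

N = 64
n = np.arange(N)

harmonic = np.cos(2 * np.pi * 8 * n / N)        # exact 8th harmonic
between = np.cos(2 * np.pi * 8.5 * n / N)       # midway between two harmonics

Xh = np.abs(np.fft.fft(harmonic))
Xb = np.abs(np.fft.fft(between))

# Number of bins carrying significant energy in each case
bins_h = np.sum(Xh > 0.05 * Xh.max())           # just bins 8 and N-8
bins_b = np.sum(Xb > 0.05 * Xb.max())           # many bins: spectral leakage
```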
5.2 WindowingA signal observed during a limited time window may be seen as the signal multiplied by arectangular window (Eq. 1-111, Eq. 3-17). It follows that time-domain windowing causes thespectrum of the ‘raw’ signal to be convolved with that of the window. The rectangular, or‘do-nothing’, window has the narrowest possible main lobe, but large, sinc-function sidelobes. It causes no spreading of exact harmonic components, but it produces a lot of spectralleakage with non-harmonics (figure 1-14.). As we learned in our discussion of truncating aninfinite impulse response to arrive at a FIR filter in section 3.3 a variety of other windowsexist. They all involve a different trade-off between a narrow main spectral lobe (to preventlocal spreading of the spectrum), and low sidelobe levels (to reduce distant spectral leakage).
This is illustrated in Figure 5.4. The signal contains an exact harmonic (the 9th), two closely spaced exact harmonics (the 51st and 53rd) and a non-harmonic (midway between the 24th and 25th). The magnitudes of the FFT are shown on the right. With the rectangular window (Figure 5.4.a) the harmonics stand out clearly as single spectral lines; however, there is a lot of distant leakage around the non-harmonic component. With the triangular window (Figure 5.4.b) the individual spectral lines have broadened, and there are significant sidelobes. The 51st and 53rd harmonic terms can hardly be disentangled. However, leakage around the non-harmonic component has been considerably reduced. With the Hamming window (Figure 5.4.c) distant leakage is dramatically reduced compared to Figure 5.4.b. This is because of the low sidelobe levels of the Hamming function (see also Figure 3.9). The effect of tapering just the ends of the data is shown in Figure 5.4.d. Note that it is now easier to distinguish the close-spaced 51st and 53rd harmonics, but there is generally more spectral leakage than in Figure 5.4.c.

In conclusion, spectral analysis and windowing are rather complicated. In some cases it may be best to leave the data alone (rectangular, or 'do-nothing', window), for example if a signal has close-spaced components of roughly the same magnitude. Conversely, if the amplitudes are very different, a window with low sidelobes will reduce leakage around the large component, and should make the small one easier to detect. Finally, it is very important to remember that sensible interpretation of an FFT depends on knowing what form of window has been used.

Figure 5.4. The use of windows in FFT analysis: (a) rectangular; (b) triangular; (c) Hamming, and (d) Hamming applied to the first and last 20 values.
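The leakage trade-off described above is easy to reproduce numerically. The following sketch (assuming NumPy; the signal length and the bin numbers 9 and 24.5 are illustrative choices in the spirit of Figure 5.4, not taken from the text) compares an exact harmonic with a non-harmonic component under rectangular and Hamming windows:

```python
import numpy as np

N = 128
n = np.arange(N)

# Exact harmonic (bin 9): the rectangular window causes no spreading at all.
exact = np.cos(2 * np.pi * 9 * n / N)
spec_exact = np.abs(np.fft.fft(exact))

# Non-harmonic component midway between bins 24 and 25: lots of distant
# leakage with the rectangular ('do-nothing') window, much less with Hamming.
x = np.cos(2 * np.pi * 24.5 * n / N)
spec_rect = np.abs(np.fft.fft(x))
spec_hamm = np.abs(np.fft.fft(x * np.hamming(N)))

print(spec_exact[9], spec_exact[40])   # all energy in bin 9 (and its mirror image)
print(spec_rect[60], spec_hamm[60])    # distant leakage: Hamming is far lower
```

The Hamming window broadens the two spectral lines slightly, but the leakage far from the component drops by orders of magnitude, exactly the trade-off discussed above.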
5.3 Investigating LTI systems

The investigation of an LTI system by means of FFT analysis is summarized in Figure 5.5. A wideband input signal x[n] disturbs the system under test, and its output or response y[n] is recorded. (By wideband it is meant that x[n] must contain a significant amount of all frequencies likely to be transmitted by the system. Only if this condition is met can we expect to characterize the system completely.) FFT analysis of the output signal gives the output spectrum, which after division by the input spectrum yields the system's frequency response. Thus:

H[k] = Y[k] / X[k]    Eq. 5-1

The simplest implementation of Eq. 5-1 involves using an input impulse. The output is then the impulse response, which transforms directly to the frequency response. When this is practically impossible, a step input may be preferred.

Of course we cannot expect the impulse response of a system being investigated to have a number of sample values equal to an exact integer power of 2 needed to perform an FFT. So the usual approach is to add zeros to the time-domain data to bring it up to the required length. The addition of zeros is referred to as zero-filling or zero-padding. Figure 5.6. shows the typical effects of zero-filling. The impulse response h[n] represents an LTI system with close-spaced humps or 'resonances' in its frequency response, and is therefore useful for illustrating spectral resolution. Note that h[n] has about 180 sample values of significant size. In Figure 5.6.a it is zero-filled up to 256 points. The magnitude of the FFT displays two frequency response peaks, and also responds significantly around zero frequency (DC). A 180-point DFT or FFT would give a frequency response with slightly lower resolution, but the same 'envelope'. The harmonic frequencies would be multiples of 2π/180, rather than 2π/256, radians. Figure 5.6.b shows the effect of zero-filling up to 512 points. In the frequency domain the number of spectral lines and the resolution have doubled, thus 'oversampling' the 'envelope'. Although this does not improve the basic information content of the frequency response, it may aid interpretation. For instance, in Figure 3.3. the plots of H(Ω) are greatly oversampled. However, when a 5-term moving average impulse response is transformed by a 5-point DFT, the impulse response is constant, resulting in only one nonzero coefficient of the transform, at zero frequency; see Figure 5.7. The other frequency-domain samples coincide with nulls (at Ω = ±2π/5 and ±4π/5). We conclude that although such a DFT is theoretically adequate, it may be unhelpful for visualizing the detailed shape of a response.

Figure 5.5. Using the FFT to explore system properties.

Figure 5.6. Effects of zero-filling on spectral resolution.

Of course, these comments on zero-filling (made in the context of impulse responses and LTI systems) are also valid for the analysis of digital signals.

Finally, Figure 5.6.c illustrates the 128-point FFT of the truncated h[n]. The time function is now only an approximation to the true impulse response, leading to errors in the spectral function. The truncated version of h[n] displays sudden discontinuities when repeated end-on-end, producing spectral leakage. Spectral resolution has been halved (compared to Figure 5.6.a), making interpretation more difficult.

Figure 5.7. (a) 5-term moving average impulse response, with its (b) 5-point DFT magnitude.
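The moving-average example of Figure 5.7 can be checked directly. A minimal sketch (assuming NumPy; the padded length 64 is an illustrative choice): the 5-point DFT of the constant impulse response has a single nonzero coefficient at DC, because the remaining samples land exactly on the nulls of H(Ω), while zero-padding 'oversamples' the sinc-shaped envelope:

```python
import numpy as np

h = np.ones(5) / 5                 # 5-term moving average impulse response

H5 = np.abs(np.fft.fft(h))         # 5-point DFT: samples coincide with the nulls
H64 = np.abs(np.fft.fft(h, 64))    # zero-padded: traces out the envelope |H(Omega)|

print(H5)                          # only the zero-frequency coefficient is nonzero
print(H64[:8])                     # padded transform descends along the main lobe
```

Zero-padding adds no new information (the same 5 samples determine everything), but the interpolated envelope is far easier to interpret, as the text notes.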
Digital Signal Processing Time series analysis
6 Time series analysis

In this chapter we will consider methods for building, identifying, fitting and checking models
for time series and dynamic systems. The methods discussed are appropriate for discrete
(sampled data) systems, where observation of the system and an opportunity to take control
action occur at equally spaced intervals of time (Δ). In general we will assume that the
processes are stationary and possess zero mean, unless noted otherwise. We will abbreviate
sampled data as y(nΔ) = y(n) = y[n] = y_n, scaling the time axis so that Δ = 1. Continuous
data are written as y(t).
Three important applications of time series analysis are:
• the determination of the transfer function of a system.
• the forecasting of future values of a time series from current and past values (for
example predicting the world average temperature or stock prices).
• the design of feedforward and feedback control schemes by means of which potential
deviations of the system output from a desired target may be compensated, so far as
possible (for instance chemical processes).
We will concentrate on linear time invariant (LTI) systems. After a brief sketch of the model
we will discuss stochastic processes and the tools of spectral analysis and correlation
functions. Thereafter we will consider the parameter estimation aspects, returning to the well
known maximum likelihood and least squares methods. We will only consider the first two
applications.
Discrete-time difference equation models
A model frequently used to describe linear dynamic discrete-time transformations of an arbitrary stochastic process a_n into a process y_n is

y_n = φ_1 y_{n-1} + φ_2 y_{n-2} + … + φ_p y_{n-p} + a_n − θ_1 a_{n-1} − … − θ_q a_{n-q}    Eq. 6.1

When θ_1 = … = θ_q = 0 we have a regression of y_n upon its previous values y_{n-1}, …, y_{n-p}, and Eq. 6.1 is termed an autoregressive (AR) model (also called an all-pole or linear prediction model). When φ_1 = … = φ_p = 0, y_n is a convolution of a with θ, and the model is referred to as a moving average (MA) model. The general model is therefore referred to as an autoregressive-moving average (ARMA) model. In practice small numbers p and q (0, 1, 2, …) are sufficient.
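As a concrete illustration, the general model of Eq. 6.1 can be simulated directly by iterating the difference equation. A minimal sketch (assuming NumPy; the ARMA(1,1) coefficients 0.8 and 0.3 are illustrative, not from the text):

```python
import numpy as np

def simulate_arma(phi, theta, n, rng):
    """Iterate Eq. 6.1: y_t = sum_i phi_i*y_{t-i} + a_t - sum_j theta_j*a_{t-j}."""
    p, q = len(phi), len(theta)
    m = max(p, q)
    a = rng.standard_normal(n + m)          # Gaussian white noise a_t
    y = np.zeros(n + m)
    for t in range(m, n + m):
        y[t] = (sum(phi[i] * y[t - 1 - i] for i in range(p))
                + a[t]
                - sum(theta[j] * a[t - 1 - j] for j in range(q)))
    return y[m:]                            # drop the start-up samples

rng = np.random.default_rng(0)
y = simulate_arma([0.8], [0.3], 10_000, rng)   # ARMA(1,1)
print(y.mean(), y.var())                       # near zero mean, finite variance
```

With an empty theta list the same routine reproduces a pure AR process, and with an empty phi list a pure MA process.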
Stochastic processes

A stochastic (or random) process, y(t), is a family of random variables indexed by the symbol t, where t belongs to some given index set T. If t takes a continuous range of real values, so that T ⊆ R¹, y(t) is said to be a continuous parameter process. If t takes a discrete set of values, t = 0, ±1, ±2, …, then y(t) = y_t is said to be a discrete parameter process. A particular record of y(t) is called a realization of the process, whereas the collection of all possible records is called the ensemble.

Theorem 6.1. For any positive integer n, let t_1, t_2, …, t_n be any admissible set of values of t. Then under general conditions the probabilistic structure of the stochastic process y(t) is completely specified if we are given the joint probability distribution of y(t_1), …, y(t_n) for all values of n and all choices of t_1, t_2, …, t_n.

The process is said to be completely stationary if for any k the joint probability distribution of y(t_1), …, y(t_n) is identical with the joint probability distribution of y(t_1 + k), …, y(t_n + k). It is stationary up to order 2 (also called wide-sense stationary, WSS) if the joint moments up to order 2 exist and are independent of time: E[y(t)] = μ, E[y²(t)] = σ², and cov(y(t), y(s)) = E[y(t) y(s)] − μ² is a function of t − s only.
Autocovariance and autocorrelation functions

The autocovariance function of a stationary stochastic process y(t) is defined as

r(τ) = E[y(t) y(t+τ)] − μ²,   or in discrete time   r_τ = E[y_t y_{t+τ}] − μ²    Eq. 6.2

where r(0) = σ² = r_0. The autocorrelation function is defined as

ρ(τ) = r(τ) / r(0),   ρ_τ = r_τ / r_0    Eq. 6.3

By definition r(τ) and ρ(τ) are symmetric. Analogous to Theorem A.1 we have the

Theorem 6.2. The function r(t) is positive semi-definite in the sense that for any set of time points t_1, t_2, …, t_n and all real z_1, z_2, …, z_n

Σ_{p,q=1}^{n} r(t_p − t_q) z_p z_q = z^T D(y) z ≥ 0    Eq. 6.4

Proof: define the random variable w = Σ_{p=1}^{n} z_p y(t_p) = z^T y. Then (see Eq. A.10)

0 ≤ D(w) = D(z^T y) = z^T D(y) z = z^T E[(y − μ1)(y − μ1)^T] z = Σ_{p,q=1}^{n} z_p E[(y_p − μ)(y_q − μ)] z_q = Σ_{p,q=1}^{n} r(t_p − t_q) z_p z_q

Gaussian processes

y(t) is called a Gaussian (normal) process if for any n and any admissible subset t_1, t_2, …, t_n the joint probability distribution of y(t_1), …, y(t_n) is multivariate normal. A stationary Gaussian process is completely determined by its mean E[y(t)] and autocovariance function r(τ). Thus for Gaussian processes stationarity up to order 2 implies complete stationarity.

Example 1. ν_t, t = 0, ±1, …, is called a purely random process if it consists of uncorrelated random variables: r_τ = σ² δ_{τ,0}. This process has a flat power spectrum (to which we return later) and is therefore referred to as white noise. In the following ν_t will denote Gaussian white noise.

Example 2. A first order autoregressive process satisfies the difference equation

y_t = φ y_{t−1} + ν_t    Eq. 6.5

Assuming the initial condition y_0 = 0, so that y_1 = ν_1, we find

y_t = Σ_{i=1}^{t} φ^{t−i} ν_i = Σ_{i=0}^{t−1} φ^i ν_{t−i}    Eq. 6.6
Thus

E[y_t] = μ_ν (1 − φ^t)/(1 − φ)  if φ ≠ 1,   E[y_t] = μ_ν t  if φ = 1.

Thus if μ_ν ≠ 0, E[y_t] is a function of t and y(t) is not stationary even to order 1. Assuming μ_ν = 0 we find for the autocovariance

E[y_t y_{t+τ}] = E[(Σ_{p=1}^{t} φ^{t−p} ν_p)(Σ_{q=1}^{t+τ} φ^{t+τ−q} ν_q)] = σ_ν² Σ_{p=1}^{t} φ^{t−p} φ^{t+τ−p} = σ_ν² φ^τ (1 − φ^{2t})/(1 − φ²)  if φ ≠ 1  (and σ_ν² t if φ = 1)

for τ ≥ 0. Now if |φ| < 1 then for t sufficiently large σ_y² = σ_ν²/(1 − φ²) and r_{y,τ} = σ_ν² φ^τ/(1 − φ²). Thus y(t) is asymptotically stationary up to order 2 if |φ| < 1. Solving the difference equation

y_t − φ y_{t−1} = (1 − φB) y_t = ν_t    Eq. 6.7

where B is the backward shift operator defined by B y_t = y_{t−1}. Thus B equals the shift operator z^{−1} known from the z-transform in Digital Signal Processing. A particular solution of Eq. 6.7 is given by

y_t = (1 − φB)^{−1} ν_t = (Σ_{s=0}^{∞} φ^s B^s) ν_t = Σ_{s=0}^{∞} φ^s ν_{t−s}    Eq. 6.8

Thus the general solution is

y_t = c φ^t + Σ_{s=0}^{∞} φ^s ν_{t−s}    Eq. 6.9

where c denotes a constant whose value is determined by the initial condition. The term c φ^t decays to zero if |φ| < 1, which again is the condition for asymptotic stationarity.
Intermezzo: Spectral representation

The Fourier series of a periodic function f(x) can be written as

f(x) = Σ_{r=−∞}^{∞} a_r e^{iω_r x}    Eq. 6.10

whereas the Fourier transform of a non-periodic, absolutely integrable, function f(x) is

f(x) = ∫_{−∞}^{∞} p(ω) e^{iωx} dω    Eq. 6.11

Eq. 6.10 and Eq. 6.11 can be combined into the Fourier-Stieltjes transform

f(x) = ∫_{−∞}^{∞} e^{iωx} dP(ω)    Eq. 6.12

where P(ω) is differentiable for non-periodic, absolutely integrable f(x): dP(ω) = p(ω) dω. On the other hand, if f(x) is periodic, P(ω) is a step function with increments dP(ω) = a_r at ω_r, and dP(ω) = 0 otherwise (see Fig. 6.1.). An important property of Eq. 6.12 is that it also allows the representation of other processes, where dP(ω) ∼ √dω.

Fig. 6.1. P(ω) for a periodic function.

Wiener-Khintchine theorem

Important theorems expressing the relation between the autocorrelation function and the power spectrum are the following (proofs can be found in Priestley, 1981, p. 218-226).

Theorem 6.3. (Wiener-Khintchine) A necessary and sufficient condition for ρ(τ) to be the autocorrelation function of some stochastically continuous stationary process, y(t), is that there exists a function, F(ω), having the properties of a distribution function on (−∞, ∞) (i.e. F(−∞) = 0, F(∞) = 1, and F non-decreasing), such that, for all τ, ρ(τ) may be expressed in the form

ρ(τ) = ∫_{−∞}^{∞} e^{iωτ} dF(ω)    Eq. 6.13
Theorem 6.4. (Wold) A necessary and sufficient condition for the sequence ρ_τ, τ = 0, ±1, ±2, …, to be the autocorrelation function of some discrete parameter stationary process, y_t, t = 0, ±1, ±2, …, is that there exists a function, F(ω), having the properties of a distribution function on (−π, π) (i.e. F(−π) = 0, F(π) = 1, and F(ω) is non-decreasing), such that

ρ_τ = ∫_{−π}^{π} e^{iωτ} dF(ω),   τ = 0, ±1, ±2, …    Eq. 6.14

The distribution function can be decomposed into a differentiable part and a step function (remember the Fourier-Stieltjes transform introduced in the Intermezzo)

F(ω) = c_1 F_1(ω) + c_2 F_2(ω)    Eq. 6.15

ARMA processes, purely random processes and general linear processes have (c_1, c_2) = (1, 0). Thus dF(ω) = dF_1(ω) = f(ω) dω and we have the Fourier pairs

ρ(τ) = ∫_{−∞}^{∞} e^{iωτ} f(ω) dω,   f(ω) = (1/2π) ∫_{−∞}^{∞} ρ(τ) e^{−iωτ} dτ    Eq. 6.16

ρ_τ = ∫_{−π}^{π} e^{iωτ} f(ω) dω,   f(ω) = (1/2π) Σ_{τ=−∞}^{∞} ρ_τ e^{−iωτ},   −π ≤ ω ≤ π    Eq. 6.17

Processes having (c_1, c_2) = (0, 1) possess a purely discrete spectrum. An example is the harmonic process

y(t) = Σ_{k=1}^{K} a_k cos(ω_k t + ϕ_k)    Eq. 6.18

When c_1 > 0 and c_2 > 0 the process possesses a mixed spectrum.

Example 1 (continued). From Eq. 6.17 we find, using ρ_τ = δ_{τ,0}, that

f(ω) = 1/2π,   −π ≤ ω ≤ π    Eq. 6.19

Thus the power spectrum of Gaussian white noise is independent of frequency. Hence, by analogy with white light, which contains all colours (frequencies), the term white noise.
Example 2 (continued). The autocorrelation function of the AR(1) process is given by

ρ_τ = E[y_t y_{t+τ}] / E[y_t y_t] = φ^{|τ|}   (|φ| < 1)    Eq. 6.20

From Eq. 6.17 we find

f(ω) = (1/2π) Σ_{τ=−∞}^{∞} φ^{|τ|} e^{−iωτ} = (1/2π) [−1 + Σ_{τ=0}^{∞} ((φ e^{−iω})^τ + (φ e^{iω})^τ)]
     = (1/2π) [−1 + 1/(1 − φ e^{−iω}) + 1/(1 − φ e^{iω})] = (1/2π) (1 − φ²)/(1 − 2φ cos ω + φ²)    Eq. 6.21

For φ > 0 most of the power is concentrated at the low frequency end of the spectrum, whereas when φ < 0 the power is concentrated mainly at the high frequency end.

Example 3. Moving average process of order 1 (MA(1)):

y_t = ν_t − θ ν_{t−1} = (1 − θB) ν_t    Eq. 6.22

Then E[y_t] = 0 and

r_τ = E[(ν_t − θν_{t−1})(ν_{t+τ} − θν_{t+τ−1})] = σ_ν² [(1 + θ²) δ_{τ,0} − θ (δ_{τ,1} + δ_{τ,−1})]    Eq. 6.23

We thus find that ρ_1 = ρ_{−1} = −θ/(1 + θ²), whereas ρ_r = 0 if r > q = 1. In general, ρ_r = 0 if r > q, the order of the moving average process. The spectrum is calculated from

f(ω) = (1/2π) Σ_{τ=−∞}^{∞} ρ_τ e^{−iωτ} = (1/2π) (1 − 2θ cos ω + θ²)/(1 + θ²)    Eq. 6.24

which is shown for two examples in the right-hand part of Fig. 6.2. Note that the numerator of Eq. 6.24 (1 − 2θ cos ω + θ²) corresponds to the denominator in Eq. 6.21 (1 − 2φ cos ω + φ²). Inverting Eq. 6.22 we find

ν_t = (1 − θB)^{−1} y_t = Σ_{j=0}^{∞} (θB)^j y_t = Σ_{j=0}^{∞} θ^j y_{t−j}   (if |θ| < 1)    Eq. 6.25

The process is thus invertible if |θ| < 1.
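Both spectral densities can be evaluated and checked numerically: by Eq. 6.17 with τ = 0, each normalized spectrum must integrate to ρ_0 = 1 over (−π, π). A sketch (assuming NumPy; φ = 0.7 and θ = 0.5 are illustrative parameter choices):

```python
import numpy as np

w = np.linspace(-np.pi, np.pi, 200_000, endpoint=False)
dw = w[1] - w[0]

phi, theta = 0.7, 0.5

# Eq. 6.21: AR(1) spectral density of rho_tau
f_ar = (1 - phi**2) / (2 * np.pi * (1 - 2 * phi * np.cos(w) + phi**2))

# Eq. 6.24: MA(1) spectral density of rho_tau
f_ma = (1 - 2 * theta * np.cos(w) + theta**2) / (2 * np.pi * (1 + theta**2))

# Eq. 6.17 with tau = 0: the integral over (-pi, pi) equals rho_0 = 1
print((f_ar * dw).sum(), (f_ma * dw).sum())
```

With φ > 0 the AR(1) density peaks at ω = 0 (low-frequency power), while the MA(1) density with θ > 0 peaks at ω = ±π, mirroring the discussion of Fig. 6.2.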
Autocorrelation function of autoregressive processes

We consider an autoregressive process of order p:

y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + … + φ_p y_{t−p} + ν_t    Eq. 6.26

multiply by y_{t−k} and take expectations:

E[y_t y_{t−k}] = E[(φ_1 y_{t−1} + φ_2 y_{t−2} + … + φ_p y_{t−p} + ν_t) y_{t−k}],   k > 0    Eq. 6.27

Thus we find

ρ_k = φ_1 ρ_{k−1} + … + φ_p ρ_{k−p},   k > 0    Eq. 6.28

Fig. 6.2. Realizations from two first-order autoregressive (left) and moving average (right) processes and their corresponding theoretical autocorrelation and spectral density functions.
which is analogous to the difference equation Eq. 6.26. Eq. 6.28 can be rewritten as

φ(B) ρ_k = 0,   with φ(B) = 1 − φ_1 B − … − φ_p B^p    Eq. 6.29

where B now operates on k. The general solution is

ρ_k = Σ_{i=1}^{p} a_i g_i^k    Eq. 6.30

where the g_i^{−1} are the roots of the characteristic equation φ(B) = 0. For stationarity we require that |g_i| < 1. We distinguish two situations:

• a root g_i is real; we refer to this as a damped exponential
• a pair of roots g_i, g_j is complex, in which case they contribute a term d^k sin(ωk + ψ) to the autocorrelation function in Eq. 6.30, which follows a damped sine wave.

In general the autocorrelation function of a stationary autoregressive process will consist of a mixture of damped exponentials and damped sine waves.

If we substitute k = 1, …, p in Eq. 6.28 we obtain a set of linear equations for φ_1, …, φ_p in terms of ρ_1, …, ρ_p, which are called the Yule-Walker equations:

[ρ_1]   [1        ρ_1      …  ρ_{p−1}] [φ_1]
[ρ_2] = [ρ_1      1        …  ρ_{p−2}] [φ_2]
[ … ]   [ …        …       …    …    ] [ … ]
[ρ_p]   [ρ_{p−1}  ρ_{p−2}  …  1     ] [φ_p]

in short ρ = P φ    Eq. 6.31

NB: higher-order autocorrelation coefficients contain no independent information, e.g. ρ_{p+1} = φ_1 ρ_p + … + φ_p ρ_1, where we have used Eq. 6.28.

The partial autocorrelation function

When the order p of the autoregressive process is unknown we can determine it from the partial autocorrelation function. Consider a process of order k, with Yule-Walker equations

P_k φ_k = ρ_k    Eq. 6.32

where φ_k ≡ (φ_{k1}, …, φ_{kk})^T, ρ_k ≡ (ρ_1, …, ρ_k)^T, and P_{k,ij} = ρ_{|i−j|} with 1 ≤ i, j ≤ k. We can solve Eq. 6.32 for k = 1, 2, 3, … successively to obtain
φ_11 = ρ_1,   φ_22 = |1 ρ_1; ρ_1 ρ_2| / |1 ρ_1; ρ_1 1| = (ρ_2 − ρ_1²)/(1 − ρ_1²),   etc.    Eq. 6.33

For an autoregressive process of order p the partial autocorrelation function φ_kk will be nonzero for k ≤ p and zero for k > p. This is reminiscent of ρ_k being zero for k > q with a moving average process of order q. Conversely, φ_kk of this moving average process is infinite in extent and is dominated by damped exponentials and/or damped sine waves.

Example 3 (continued). It can be shown that for the MA(1) process

φ_kk = −θ^k (1 − θ²) / (1 − θ^{2(k+1)})

(Box and Jenkins, 1976, p. 70).

We summarize the properties of stationary ARMA processes in Table 1 and illustrate ρ_k and φ_kk for various AR(2), MA(2) and ARMA(1,1) processes in Fig. 6.3.

Table 1. Properties of autoregressive, moving average and mixed ARMA processes

                          Autoregressive          Moving average          Mixed (ARMA)
Model in terms of         φ(B) y_t = ν_t          θ^{−1}(B) y_t = ν_t     θ^{−1}(B) φ(B) y_t = ν_t
previous y's
Model in terms of         y_t = φ^{−1}(B) ν_t     y_t = θ(B) ν_t          y_t = φ^{−1}(B) θ(B) ν_t
previous ν's
Stationarity condition    roots of φ(B) = 0       always stationary       roots of φ(B) = 0
                          lie outside unit                                lie outside unit
                          circle                                          circle
Invertibility condition   always invertible       roots of θ(B) = 0       roots of θ(B) = 0
                                                  lie outside unit        lie outside unit
                                                  circle                  circle
Autocorrelation function  infinite (damped        finite: cuts off        infinite (damped exponentials
                          exponentials and/or     after lag q             and/or damped sine waves after
                          damped sine waves):                             the first q − p lags):
                          tails off                                       tails off
Partial autocorrelation   finite: cuts off        infinite (dominated     infinite (dominated by damped
function                  after lag p             by damped exponentials  exponentials and/or damped sine
                                                  and/or sine waves):     waves after the first p − q lags):
                                                  tails off               tails off
Fig. 6.3.
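The partial autocorrelation function can be computed exactly as described above, by solving the Yule-Walker system Eq. 6.32 for successive k and keeping the last coefficient φ_kk. A sketch (assuming NumPy; φ = 0.6 and θ = 0.5 are illustrative parameter values):

```python
import numpy as np

def pacf(rho, kmax):
    """Solve P_k phi_k = rho_k (Eq. 6.32) for k = 1..kmax; return the phi_kk."""
    out = []
    for k in range(1, kmax + 1):
        P = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
        rhs = np.array([rho[i + 1] for i in range(k)])
        out.append(np.linalg.solve(P, rhs)[-1])   # last component is phi_kk
    return np.array(out)

# AR(1): rho_tau = phi^tau, so phi_kk cuts off after lag 1
phi = 0.6
print(pacf(phi ** np.arange(10), 4))

# MA(1): rho_1 = -theta/(1+theta^2), rho_k = 0 for k > 1.
# Example 3: phi_kk = -theta^k (1 - theta^2) / (1 - theta^(2(k+1)))
theta = 0.5
rho = np.zeros(10)
rho[0], rho[1] = 1.0, -theta / (1 + theta**2)
print(pacf(rho, 3))
```

For the AR(1) input the routine returns φ_11 = 0.6 and essentially zero thereafter (cut-off after lag p = 1), while for the MA(1) input the values tail off and agree with the closed-form expression of Example 3.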
Linear nonstationary models
We found that the AR(p) process was stationary if the roots of φ(B) = 0 lie outside the unit circle. A special kind of nonstationary process is obtained if we assume that d roots are equal to 1. Thus let us consider

ϕ(B) y_t = φ(B) (1 − B)^d y_t = θ(B) ν_t    Eq. 6.34

where φ(B) is a stationary autoregressive operator. Now 1 − B corresponds to the difference operator ∇. We can write Eq. 6.34 as

φ(B) w_t = θ(B) ν_t,   with ∇^d y_t = w_t    Eq. 6.35

The inverse of the difference operator is the infinite summation operator S:

S x_t = Σ_{h=−∞}^{t} x_h = (1 + B + B² + …) x_t = (1 − B)^{−1} x_t = ∇^{−1} x_t    Eq. 6.36

Thus

y_t = S^d w_t    Eq. 6.37

which implies that the process of Eq. 6.34 can be obtained by summing (or "integrating") the stationary process w_t d times. Therefore we call the process of Eq. 6.34 an autoregressive integrated moving average (ARIMA) process. Examples of ARIMA processes are shown in Fig. 6.4.

Fig. 6.4. Two kinds of homogeneous nonstationary behaviour which can be described as ARIMA processes with d = 1 and d = 2.
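The operators ∇ and S of Eq. 6.35-6.37 correspond directly to NumPy's diff and cumsum. A sketch (the white-noise driving series is an illustrative choice) showing that differencing d times recovers the stationary process, as in Fig. 6.4:

```python
import numpy as np

rng = np.random.default_rng(2)
nu = rng.standard_normal(1000)          # stationary driving process w_t

y1 = np.cumsum(nu)                      # S w: ARIMA behaviour with d = 1 (random walk)
y2 = np.cumsum(y1)                      # S^2 w: d = 2, as in Fig. 6.4

# Differencing undoes the integration: nabla^d y recovers the stationary series
w1 = np.diff(y1, prepend=0.0)
w2 = np.diff(np.diff(y2, prepend=0.0), prepend=0.0)
print(np.allclose(w1, nu), np.allclose(w2, nu))
```

The prepend=0.0 argument encodes the initial condition y_0 = 0, so that ∇ and S are exact inverses on these finite records.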
Addition of explanatory variables: ARMAX
In addition to using past values to model a series it is often desirable to use explanatory or regression variables. The regression variables may simply be a constant (intercept) term, a deterministic function of time, or lagged values of another time series. In particular, if trends in y tend to be anticipated by changes in x, economists call x a leading indicator for y. A necessary condition for such a model is that the explanatory variables are weakly exogenous. The ARMAX model is defined as

φ(B) y_t = Σ_{i=1}^{k} β_i(B) x_{ti} + θ(B) ν_t    Eq. 6.38

where (p, s_1, …, s_k, q) indicates the orders of the respective polynomials in the lag operator B. Any transfer function may be expressed as an ARMAX model with restrictions on the lag polynomials. When q = 0 the model is referred to as an autoregressive distributed lag.

Efficient estimation of the parameters in the ARMAX model of Eq. 6.38 requires that no information on the parameters is lost by conditioning on the explanatory variables. In other words, the explanatory variables may be treated as though they are fixed in repeated samples, even though they may be generated by a stochastic mechanism in the same way as y_t. If this condition is satisfied, the explanatory variables are said to be weakly exogenous. For purposes of prediction, stronger conditions must be placed on the explanatory variables. Specifically, there must be no feedback between them and the dependent variable. If this condition holds, the explanatory variables are said to be strongly exogenous. Strong exogeneity implies weak exogeneity, but not vice versa.

Co-integration

Suppose some kind of steady-state relationship exists between the dependent variable y and explanatory variables x, while both are non-stationary time series. It may still be possible to model this with an ARMAX model, provided the series are co-integrated. Two series y_t and x_t are said to be co-integrated of order d, b (denoted as CI(d, b)) if (i) they are both stationary after differencing d times (denoted as I(d), integrated of order d, Eq. 6.37), and
(ii) there exists a linear combination of them for which the order of integration is smaller than d, i.e. it is I(d − b) with b > 0.

Example 4. An example of a steady-state relationship between two I(1) variables is:

y_t = β_0 + β_1 x_t + ν_t    Eq. 6.39

where β_0 and β_1 are parameters and ν_t is a zero-mean stationary disturbance term. The series y_t and x_t are CI(1, 1) because there exists a stationary linear combination

z_t = y_t − β_1 x_t = β_0 + ν_t    Eq. 6.40

However, estimation of the parameters is non-trivial; see the discussion in Harvey (1990).

Spectral representation of stationary stochastic processes

Consider first one realization of the process. It is neither periodic, nor absolutely integrable (because it does not decay to zero as t → ±∞). But each realization can be expressed as a Fourier-Stieltjes transform:

y(t) = ∫_{−∞}^{∞} e^{iωt} dz(ω)    Eq. 6.41

where z(ω), which is a complex-valued stochastic process, is not differentiable, for then Eq. 6.41 would reduce to an ordinary Fourier integral. Eq. 6.41 is called the spectral representation of the process y(t), and it tells us that (virtually) any stationary process can be represented as (the limit of) a sum of sine and cosine functions with random coefficients dz(ω), or more precisely, with random amplitudes |dz(ω)| and random phases arg dz(ω). The relationship between z(ω) and the spectral properties of y(t) is expressed in

E[|dz(ω)|²] = dH(ω) = σ_y² dF(ω)    Eq. 6.42

Analogous to Eq. 6.15 we have H(ω) = c_1 H_1(ω) + c_2 H_2(ω). When (c_1, c_2) = (1, 0) we have dz(ω) ∼ √dω and

h(ω) = E[|dz(ω)|²] / dω = σ_y² f(ω)    Eq. 6.43
where h(ω) represents the (non-normalized) power spectral density function of the process. It is used as a means of describing the energy/frequency properties of the stochastic process. Eq. 6.41 is expressed more formally in the following

Theorem 6.5. Let y(t), −∞ < t < ∞, be a zero-mean stochastically continuous stationary process. Then there exists an orthogonal process, z(ω), such that for all t, y(t) may be written in the form of Eq. 6.41, the integral being defined in the mean square sense. The process z(ω) has the following properties:

E[dz(ω)] = 0,  all ω
E[|dz(ω)|²] = dH(ω),  all ω

and, for any two distinct frequencies ω, ω′ (ω ≠ ω′),

cov[dz(ω), dz(ω′)] = E[dz*(ω) dz(ω′)] = 0    Eq. 6.44

Theorem 6.6. Let y_t, t = 0, ±1, …, be a zero-mean stationary process. Then there exists an orthogonal process, z(ω), −π ≤ ω ≤ π, such that for all integral t,

y_t = ∫_{−π}^{π} e^{iωt} dz(ω)    Eq. 6.45

The process z(ω) has the same properties as in Theorem 6.5, and in particular E[|dz(ω)|²] = dH(ω), −π ≤ ω ≤ π.

Proof (brief sketch): Regard y_t as an element of a Hilbert space, and define the inner product as (u, v) = E[u* v], so that the norm of u is given by (u, u) = E[|u|²]. We now introduce the forward shift operator F:

F y_t = y_{t+1}    Eq. 6.46

Now since y_t is a stationary process we have for all s, t

(F y_s, F y_t) = E[y*_{s+1} y_{t+1}] = E[y*_s y_t] = (y_s, y_t)    Eq. 6.47

Thus the operator F preserves inner products and is a unitary operator. A basic result states that a unitary operator has a spectral representation of the form
F = ∫_{−π}^{π} e^{iω} de(ω)    Eq. 6.48

where the de(ω) are the so-called orthogonal projection operators. From the orthogonality property of de(ω) it may be shown that, for all integral t, F^t has the representation

F^t = ∫_{−π}^{π} e^{iωt} de(ω)    Eq. 6.49

The spectral representation of y_t is now immediate on writing

y_t = F^t y_0 = ∫_{−π}^{π} e^{iωt} de(ω) y_0 = ∫_{−π}^{π} e^{iωt} dz(ω)    Eq. 6.50

where z(ω) = e(ω) y_0. The orthogonality property of z(ω) follows from the corresponding property of e(ω). Alternatively, we can write down the autocovariance function, inserting Eq. 6.41 into Eq. 6.2:

r_s = E[∫_{−π}^{π} ∫_{−π}^{π} e^{−iωt} dz*(ω) e^{iω′(t+s)} dz(ω′)]    Eq. 6.51

Now the left-hand side is a function of s only; hence the right-hand side must be a function of s only and must not depend on t. This can only be true if the contribution to the double integral is zero when ω ≠ ω′, which tells us that

E[dz*(ω) dz(ω′)] = 0   for all ω ≠ ω′    Eq. 6.52

Thus Eq. 6.51 (in combination with Eq. 6.43) becomes an alternative proof of the Wold theorem:

r_s = ∫_{−π}^{π} e^{iωs} E[|dz(ω)|²] = ∫_{−π}^{π} e^{iωs} dH(ω) = ∫_{−π}^{π} e^{iωs} σ_y² dF(ω)    Eq. 6.53

Cross-covariance and cross-correlation functions

The bivariate process (y_{1,t}, y_{2,t}) is stationary when y_{1,t} and y_{2,t} are stationary and cov(y_{1,t}, y_{2,s}) is a function of (s − t) only. Next to the usual auto-covariance and
correlation functions, Eq. 6.2 and Eq. 6.3, we now also have the cross-covariance function, which describes the correlation structure between the two processes

r_{21,s} = cov(y_{1,t}, y_{2,t+s}) = E[(y_{1,t} − μ_1)* (y_{2,t+s} − μ_2)]    Eq. 6.54

and the cross-correlation function, which is given by

ρ_{21,s} = r_{21,s} / √(r_{11,0} r_{22,0})    Eq. 6.55

Substituting the spectral representations (Eq. 6.41)

y_{1,t} = ∫_{−π}^{π} e^{iωt} dz_1(ω),   y_{2,t} = ∫_{−π}^{π} e^{iωt} dz_2(ω)    Eq. 6.56

into Eq. 6.54 gives

r_{21,s} = E[∫_{−π}^{π} ∫_{−π}^{π} e^{−iωt} dz_1*(ω) e^{iω′(t+s)} dz_2(ω′)]    Eq. 6.57

Now the left-hand side is a function of s only; hence the right-hand side must be a function of s only and must not depend on t. This can only be true if the contribution to the double integral is zero when ω ≠ ω′, which tells us that

E[dz_1*(ω) dz_2(ω′)] = 0   for all ω ≠ ω′    Eq. 6.58

i.e. dz_1(ω) and dz_2(ω) are not only orthogonal, but also cross-orthogonal. Using this property, Eq. 6.57 reduces to

r_{21,s} = ∫_{−π}^{π} e^{iωs} E[dz_1*(ω) dz_2(ω)]    Eq. 6.59

Analogous to Eq. 6.17 and Eq. 6.43 we now have, for processes with continuous spectra and for −π ≤ ω ≤ π,

ρ_{21,τ} = ∫_{−π}^{π} e^{iωτ} f_{21}(ω) dω,   f_{21}(ω) = (1/2π) Σ_{τ=−∞}^{∞} ρ_{21,τ} e^{−iωτ}    Eq. 6.60

r_{21,τ} = ∫_{−π}^{π} e^{iωτ} h_{21}(ω) dω,   h_{21}(ω) = (1/2π) Σ_{τ=−∞}^{∞} r_{21,τ} e^{−iωτ}
h_{21}(ω) dω = E[dz_1*(ω) dz_2(ω)]    Eq. 6.61

The interpretation of the (non-normalized) cross-spectral density is that h_{21}(ω) dω represents the average value of the product of the coefficients of e^{iωt} in y_{1,t} and y_{2,t}. Alternatively, we may say that whereas h_{11}(ω) dω and h_{22}(ω) dω represent the variances of dz_1(ω) and dz_2(ω), respectively, h_{21}(ω) dω represents the covariance between dz_1(ω) and dz_2(ω).

The complex coherency (at frequency ω) is defined by

w_{21}(ω) = h_{21}(ω) / √(h_{11}(ω) h_{22}(ω)) = cov[dz_1(ω), dz_2(ω)] / √(var(dz_1(ω)) var(dz_2(ω)))    Eq. 6.62

so that w_{21}(ω) may be interpreted as the correlation coefficient between the random coefficients of the components in y_{1,t} and y_{2,t} at frequency ω. The graph of |w_{21}(ω)| as a function of ω is called the coherency spectrum. From Eq. 6.62 it follows that

0 ≤ |w_{21}(ω)| ≤ 1    Eq. 6.63

Example 5. Suppose y_{1,t} and y_{2,t} satisfy the relationship

y_{1,t} = α y_{2,t} + ν_t    Eq. 6.64

where ν_t is uncorrelated with y_{2,t}. Then the cross-covariance function is

r_{12,s} = α r_{22,s}    Eq. 6.65

so that

h_{12}(ω) = α h_{22}(ω)    Eq. 6.66

Furthermore h_{11}(ω) = α² h_{22}(ω) + h_ν(ω), thus

|w_{12}(ω)| = [1 + h_ν(ω)/(α² h_{22}(ω))]^{−1/2}    Eq. 6.67

Example 6. A model for price and supply. Let y_{1,t} and y_{2,t} denote respectively the price and supply at time t. The model is

y_{2,t} = a y_{1,t−1} + ν_{2,t},   y_{1,t} = −b y_{2,t} + ν_{1,t}    Eq. 6.68

where a > 0, b > 0, and ν_{1,t} and ν_{2,t} are uncorrelated white noise processes. Now the spectral
representation Eq. 6.41 gives us for $y_{2,t-1}$:
Eq. 6.69
$$y_{2,t-1} = \int_{-\pi}^{\pi}e^{i\omega(t-1)}\,dz_2(\omega) = \int_{-\pi}^{\pi}e^{-i\omega}e^{i\omega t}\,dz_2(\omega)$$
Thus we may write for the model of Eq. 6.68:
Eq. 6.70
$$dz_2(\omega) = a\,e^{-i\omega}dz_1(\omega) + dz_{\nu_2}(\omega), \qquad dz_1(\omega) = -b\,dz_2(\omega) + dz_{\nu_1}(\omega)$$
which gives
Eq. 6.71
$$dz_1(\omega) = \frac{dz_{\nu_1}(\omega) - b\,dz_{\nu_2}(\omega)}{1 + ab\,e^{-i\omega}}, \qquad dz_2(\omega) = \frac{a\,e^{-i\omega}dz_{\nu_1}(\omega) + dz_{\nu_2}(\omega)}{1 + ab\,e^{-i\omega}}$$
Now we use Eq. 6.19 and Eq. 6.43, together with $E[|dz_{\nu_1}(\omega)|^2] = \sigma_1^2\,d\omega/2\pi$, to find the spectra, which gives
Eq. 6.72
$$h_{11}(\omega) = \frac{\sigma_1^2 + b^2\sigma_2^2}{2\pi\left(1 + a^2b^2 + 2ab\cos\omega\right)}, \qquad h_{22}(\omega) = \frac{a^2\sigma_1^2 + \sigma_2^2}{2\pi\left(1 + a^2b^2 + 2ab\cos\omega\right)}, \qquad h_{12}(\omega) = \frac{a\,e^{-i\omega}\sigma_1^2 - b\,\sigma_2^2}{2\pi\left(1 + a^2b^2 + 2ab\cos\omega\right)}$$
Example 7. Consider again the ARMA model of Eq. 6.1. On inserting the spectral representation we find:
Eq. 6.73
$$\phi\!\left(e^{-i\omega}\right)dz_y(\omega) = \theta\!\left(e^{-i\omega}\right)dz_a(\omega)$$
with
Eq. 6.74
$$\phi\!\left(z^{-1}\right) = 1 - \phi_1 z^{-1} - \phi_2 z^{-2} - \ldots - \phi_p z^{-p}, \qquad \theta\!\left(z^{-1}\right) = 1 - \theta_1 z^{-1} - \theta_2 z^{-2} - \ldots - \theta_q z^{-q}$$
and power spectrum
Eq. 6.75
$$\sigma_y^2\,f_y(\omega)\,\left|\phi\!\left(e^{-i\omega}\right)\right|^2 = \sigma_a^2\,f_a(\omega)\,\left|\theta\!\left(e^{-i\omega}\right)\right|^2$$
When $a_t$ is Gaussian white noise we have
Eq. 6.76
$$\sigma_y^2\,f_y(\omega) = \frac{\sigma_a^2\,\left|\theta\!\left(e^{-i\omega}\right)\right|^2}{2\pi\,\left|\phi\!\left(e^{-i\omega}\right)\right|^2}$$
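Eq. 6.76 is straightforward to evaluate on a frequency grid. The sketch below is my illustration (the ARMA(1,1) coefficients are arbitrary); as a sanity check, the spectrum must integrate to the process variance $r_0$, which for an ARMA(1,1) model is known in closed form:

```python
import numpy as np

# arbitrary ARMA(1,1): y_t = phi1 y_{t-1} + a_t - theta1 a_{t-1}
phi1, theta1, sigma_a2 = 0.5, 0.3, 1.0

def arma_spectrum(w, phi, theta, s2):
    """Non-normalized power spectrum of an ARMA(1,1) process, Eq. 6.76."""
    z = np.exp(-1j * w)
    return s2 * np.abs(1 - theta * z) ** 2 / (2 * np.pi * np.abs(1 - phi * z) ** 2)

w = np.linspace(-np.pi, np.pi, 20001)
h = arma_spectrum(w, phi1, theta1, sigma_a2)

# trapezoidal integration of the spectrum over (-pi, pi)
dw = w[1] - w[0]
integral = np.sum(h[:-1] + h[1:]) * dw / 2

# closed-form variance of an ARMA(1,1) process
r0 = sigma_a2 * (1 - 2 * phi1 * theta1 + theta1 ** 2) / (1 - phi1 ** 2)
print(integral, r0)
```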
Thus the spectral density function of an ARMA process is a rational function of $e^{-i\omega}$. Conversely, given a stationary process $y_t$ which satisfies Eq. 6.76, then $y_t$ is an ARMA process.
Linear system with noise
Consider a single input/single output system in which the output is corrupted by a noise disturbance $n_t$, which is uncorrelated with the input:
Eq. 6.77
$$y_t = \sum_{u=-\infty}^{\infty}g_u\,x_{t-u} + n_t$$
On inserting the spectral representation Eq. 6.41:
Eq. 6.78
$$\int_{-\pi}^{\pi}e^{i\omega t}\,dz_y(\omega) = \sum_{u=-\infty}^{\infty}g_u\int_{-\pi}^{\pi}e^{i\omega t}e^{-i\omega u}\,dz_x(\omega) + \int_{-\pi}^{\pi}e^{i\omega t}\,dz_n(\omega)$$
Defining the Fourier transform of $g$:
Eq. 6.79
$$\Gamma(\omega) = \sum_{u=-\infty}^{\infty}g_u\,e^{-i\omega u}$$
we find
Eq. 6.80
$$dz_y(\omega) = \Gamma(\omega)\,dz_x(\omega) + dz_n(\omega)$$
where $dz_x(\omega)$ and $dz_n(\omega)$ are uncorrelated. Eq. 6.80 has the form of a simple linear regression between $dz_y(\omega)$ and $dz_x(\omega)$ (for each $\omega$). Thus on multiplying both sides with $dz_x^*(\omega)$ and taking expectations we obtain
Eq. 6.81
$$E[dz_y(\omega)\,dz_x^*(\omega)] = \Gamma(\omega)\,E[|dz_x(\omega)|^2]$$
Thus the "least squares estimate" of $\Gamma(\omega)$ is given by
Eq. 6.82
$$\Gamma(\omega) = \frac{\operatorname{cov}[dz_y(\omega),\,dz_x(\omega)]}{\operatorname{var}[dz_x(\omega)]} = \frac{h_{yx}(\omega)}{h_{xx}(\omega)}$$
We see that minimizing $E[n_t^2]$ is equivalent to minimizing $E[|dz_n(\omega)|^2]$ for each $\omega$:
Eq. 6.83
$$E[n_t^2] = \int_{-\pi}^{\pi}h_{nn}(\omega)\,d\omega = \int_{-\pi}^{\pi}E[|dz_n(\omega)|^2]$$
Furthermore we can calculate the power spectrum of y:
Eq. 6.84
$$h_{yy}(\omega) = |\Gamma(\omega)|^2\,h_{xx}(\omega) + h_{nn}(\omega)$$
Combining Eq. 6.62, Eq. 6.82, Eq. 6.83 and Eq. 6.84 gives
Eq. 6.85
$$\sigma_n^2 = \int_{-\pi}^{\pi}h_{nn}(\omega)\,d\omega = \int_{-\pi}^{\pi}\left[h_{yy}(\omega) - |\Gamma(\omega)|^2 h_{xx}(\omega)\right]d\omega = \int_{-\pi}^{\pi}\left[h_{yy}(\omega) - \frac{|h_{yx}(\omega)|^2}{h_{xx}(\omega)}\right]d\omega = \int_{-\pi}^{\pi}h_{yy}(\omega)\left[1 - |w_{yx}(\omega)|^2\right]d\omega$$
The final expression is called the residual variance bound, since it gives the value of the residual variance after fitting the best possible linear relationship between $x_t$ and $y_t$, allowing an infinite number of parameters. Eq. 6.85 can be interpreted as: the residual variance is equal to the total variance of $y$ minus the variance due to "regression on $x$". We can also write
Eq. 6.86
$$h_{nn}(\omega) = h_{yy}(\omega)\left(1 - |w_{yx}(\omega)|^2\right)$$
which can be compared with the analysis of variance expressions:
Eq. 6.87
$$|w_{yx}(\omega)|^2 = \frac{h_{yy}(\omega) - h_{nn}(\omega)}{h_{yy}(\omega)}, \qquad r_{y\hat y}^2 = \frac{SSR - SSM}{SST - SSM}$$
which strengthens the interpretation of $|w_{yx}(\omega)|$ as the correlation coefficient in the frequency domain.
Estimation in the time domain
Our aim is to estimate functions like $\rho_\tau$ given a part of a realization: $y_t$, $t = 1, \ldots, n$. In contrast to the usual regression assumptions we no longer have iid observations. In the first place the observations are, in general, correlated, and secondly, only when the process is completely stationary will the observations have a common probability distribution. When we assume that $y_t$ is stationary up to order four and ergodic we can find estimators.
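The "least squares" transfer function estimate $\hat\Gamma(\omega) = \hat h_{yx}(\omega)/\hat h_{xx}(\omega)$ of Eq. 6.82 can be mimicked with raw periodograms smoothed across frequency. The sketch below is my addition; the two-tap filter, the noise level and the smoothing span are arbitrary, and for simplicity the filter is applied circularly via `np.roll`:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8192
x = rng.standard_normal(n)
# known two-tap filter g_0 = 1.0, g_1 = 0.5 plus output noise (Eq. 6.77);
# np.roll makes the filtering circular, harmless for this illustration
y = 1.0 * x + 0.5 * np.roll(x, 1) + rng.standard_normal(n)

zx, zy = np.fft.rfft(x), np.fft.rfft(y)

def smooth(a, m=64):                            # crude Daniell-type averaging
    return np.convolve(a, np.ones(2 * m + 1) / (2 * m + 1), mode="same")

gamma_hat = smooth(zy * np.conj(zx)) / smooth(np.abs(zx) ** 2)   # Eq. 6.82

w = np.linspace(0, np.pi, len(gamma_hat))
gamma_true = 1.0 + 0.5 * np.exp(-1j * w)        # Gamma(w) of Eq. 6.79
err = np.median(np.abs(gamma_hat - gamma_true))
print(err)
```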
Estimation of the mean
In general we assume that the process is zero mean, but when it is not we must estimate the mean $\mu$ in order to correct for it. We estimate the mean by the sample mean
Eq. 6.88
$$\hat\mu = \frac{1}{n}\sum_{t=1}^{n}y_t$$
which is an unbiased estimator, with variance
Eq. 6.89
$$\operatorname{var}\hat\mu = \frac{1}{n^2}\sum_{s,t=1}^{n}\operatorname{cov}(y_s, y_t) = \left(\frac{\sigma}{n}\right)^2\sum_{s,t=1}^{n}\rho_{t-s}$$
We now change the summation variables to $k = t - s$ and s. Thus the double summation, which extends over the lattice points in a square, is replaced by summing over the diagonals of constant $k = t - s$ and adding the diagonal sums together. The summation over k goes from $-(n-1)$ to $(n-1)$; for $k > 0$, s goes from 1 to $n-k$, while for $k < 0$, s goes from $1-k$ to $n$. The summand in Eq. 6.89 is a function of k only, and hence summation over s gives the expression $n - |k|$. We thus obtain
Eq. 6.90
$$\operatorname{var}\hat\mu = \left(\frac{\sigma}{n}\right)^2\sum_{k=-(n-1)}^{n-1}\left(n - |k|\right)\rho_k = \frac{\sigma^2}{n}\sum_{k=-(n-1)}^{n-1}\left(1 - \frac{|k|}{n}\right)\rho_k$$
Now if the sum is finite when $n\to\infty$ then $\operatorname{var}\hat\mu \to 0$ and $\hat\mu$ is a consistent estimate. If $y_t$ has a purely continuous spectrum we can use Eq. 6.17 to find
Eq. 6.91
$$\lim_{n\to\infty}\sum_{k=-(n-1)}^{n-1}\left(1 - \frac{|k|}{n}\right)\rho_k = \sum_{k=-\infty}^{\infty}\rho_k = 2\pi f(0)$$
Whereas $\mu$ is the ensemble average of $y_t$, $\hat\mu = \bar y$ is the time average of $y_t$, $t = 1, \ldots, n$.
Example 2 (continued) With the AR(1) process we use Eq. 6.21 to find for large n $\operatorname{var}\hat\mu \approx (\sigma^2/n)\,(1+\phi)/(1-\phi)$. Thus the "equivalent number of independent observations" is $n\,(1-\phi)/(1+\phi)$.
Estimation of the autocovariance function
We form the observations into pairs $(y_1, y_{k+1})$, $(y_2, y_{k+2})$, $\ldots$, $(y_{n-k}, y_n)$. In general,
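The large-n approximation in Example 2 can be compared with the exact finite-n result of Eq. 6.90. This small check is my addition; the values of $\phi$, $\sigma^2$ and n are arbitrary, and for the AR(1) process $\rho_k = \phi^{|k|}$:

```python
import numpy as np

phi, sigma2, n = 0.6, 1.0, 500          # arbitrary AR(1) example

k = np.arange(-(n - 1), n)
rho = phi ** np.abs(k)                  # AR(1) autocorrelation rho_k = phi^|k|

# exact finite-n variance of the sample mean, Eq. 6.90
var_exact = (sigma2 / n) * np.sum((1 - np.abs(k) / n) * rho)

# large-n approximation from Example 2
var_approx = (sigma2 / n) * (1 + phi) / (1 - phi)

# "equivalent number of independent observations"
n_equiv = n * (1 - phi) / (1 + phi)
print(var_exact, var_approx, n_equiv)
```

For these values the exact and approximate variances agree to within a fraction of a percent, and the 500 correlated observations are worth about 125 independent ones.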
these pairs will come from different bivariate distributions; however, we know that they share the same covariance. Furthermore we assume that $\mu = 0$. We now use a biased estimate:
Eq. 6.92
$$\hat r_k = \frac{1}{n}\sum_{t=1}^{n-k}y_t\,y_{t+k}$$
with $E[\hat r_k] = (1 - k/n)\,r_k$. This estimate is preferred over the unbiased estimate for two reasons. Firstly, $\hat r$ shares with $r$ the property of positive semidefiniteness (Theorem 6.2). Secondly, the variance of the unbiased estimate increases when k approaches $n-1$, whereas in the region $k \ll n$ the bias is negligible. The calculation of the (co)variance of $\hat r_k$ involves fourth moments of $y_t$ ($k, k+l \ge 0$):
Eq. 6.93
$$\operatorname{cov}(\hat r_k, \hat r_{k+l}) = E[\hat r_k\hat r_{k+l}] - E[\hat r_k]E[\hat r_{k+l}] = \frac{1}{n^2}\sum_{t=1}^{n-k}\sum_{s=1}^{n-k-l}E[y_t\,y_{t+k}\,y_s\,y_{s+k+l}] - \left(1-\frac{k}{n}\right)\left(1-\frac{k+l}{n}\right)r_k\,r_{k+l}$$
After some calculations (Priestley, 1981, p. 326-327) it follows that when $y_t$ is Gaussian
Eq. 6.94
$$\operatorname{cov}(\hat r_k, \hat r_{k+l}) \approx \frac{1}{n}\sum_{m=-\infty}^{\infty}\left[r_m\,r_{m+l} + r_{m+k+l}\,r_{m-k}\right]$$
In particular, when $k = l = 0$ we have $\hat r_0 \equiv s_y^2 = \frac{1}{n}\sum_{t=1}^{n}y_t^2$ with $\operatorname{var}(\hat r_0) \approx \frac{2}{n}\sum_{m=-\infty}^{\infty}r_m^2$.
Example 2 (continued) With $r_m = \sigma^2\rho_m = \sigma^2\phi^{|m|}$ we find $\operatorname{var}(\hat r_0) \approx \frac{2\sigma^4}{n}\,\frac{1+\phi^2}{1-\phi^2}$, from which we get the "equivalent number of independent observations" $n\,(1-\phi^2)/(1+\phi^2)$.
Estimation of the autocorrelation function
A natural estimate follows from Eq. 6.92 and is called the sample autocorrelation function:
Eq. 6.95
$$\hat\rho_k = \hat r_k / \hat r_0$$
For a Gaussian process the covariance of this estimator can be approximated by ($k, k+l \ge 0$)
Eq. 6.96
$$\operatorname{cov}(\hat\rho_k, \hat\rho_{k+l}) \approx \frac{1}{n}\sum_{i=-\infty}^{\infty}\left[\rho_i\rho_{i+l} + \rho_{i+k+l}\,\rho_{i-k} + 2\rho_k\rho_{k+l}\,\rho_i^2 - 2\rho_k\,\rho_{i-k-l}\,\rho_i - 2\rho_{k+l}\,\rho_{i-k}\,\rho_i\right]$$
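The biased estimate of Eq. 6.92 and its positive semidefiniteness are easy to verify numerically. This sketch is mine; the input is just a white noise series, and the Toeplitz matrix built from $\hat r_0, \ldots, \hat r_m$ is checked for non-negative eigenvalues:

```python
import numpy as np

def sample_autocov(y, max_lag):
    """Biased estimate r^_k = (1/n) sum_t y_t y_{t+k} (Eq. 6.92; zero mean assumed)."""
    n = len(y)
    return np.array([y[: n - k] @ y[k:] / n for k in range(max_lag + 1)])

rng = np.random.default_rng(2)
y = rng.standard_normal(1000)
r = sample_autocov(y, 20)

# the divisor n (rather than n-k) keeps the estimated sequence positive
# semidefinite: the Toeplitz matrix from r^_0 ... r^_m has no negative eigenvalues
m = 20
T = np.array([[r[abs(i - j)] for j in range(m + 1)] for i in range(m + 1)])
min_eig = np.linalg.eigvalsh(T).min()
print(r[0], min_eig)
```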
For processes with decaying autocorrelation functions we approximate $\rho_s \approx 0$ for $s \ge k$, where k is sufficiently large. Then Eq. 6.96 reduces to
Eq. 6.97
$$\operatorname{cov}(\hat\rho_k, \hat\rho_{k+l}) \approx \frac{1}{n}\sum_{i=-\infty}^{\infty}\rho_i\,\rho_{i+l}$$
Example 1 (continued) With Gaussian white noise we have $\rho_k = \delta_{k,0}$ and thus
Eq. 6.98
$$\operatorname{cov}(\hat\rho_k, \hat\rho_{k+l}) \approx \frac{1}{n}\left(\delta_{l,0} + \delta_{2k+l,0} - 2\delta_{k,0}\,\delta_{l,0}\right)$$
which is zero when $l > 0$. Furthermore $\operatorname{var}\hat\rho_k \approx 1/n$ when $k > 0$.
Example 2 (continued) For the AR(1) process $\rho_i = \phi^{|i|}$ and for k sufficiently large
Eq. 6.99
$$\operatorname{cov}(\hat\rho_k, \hat\rho_{k+l}) \approx \frac{\phi^l}{n}\left[\frac{1+\phi^2}{1-\phi^2} + l\right], \qquad \operatorname{var}\hat\rho_k \approx \frac{1}{n}\,\frac{1+\phi^2}{1-\phi^2}$$
Estimation of parameters in autoregressive models
Let us first assume that the order p of the process is known. We now rewrite Eq. 6.26:
Eq. 6.100
$$\nu_t = y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \ldots - \phi_p y_{t-p}$$
In contrast to normal linear least squares analysis the observations are not independent and even appear in the design matrix X with elements $X_{tj} = y_{t-j}$. Thus we cannot be certain that minimization of
Eq. 6.101
$$S(\phi) = \sum_{t=p+1}^{n}\nu_t^2$$
will provide optimal results. Note that $\nu_t^2$, $t = 1, \ldots, p$, are not included, since they cannot be computed from Eq. 6.26. In fact we are maximizing the conditional log-likelihood function
Eq. 6.102
$$L(\phi) = -(n-p)\log\!\left(\sigma_\nu\sqrt{2\pi}\right) - \frac{1}{2\sigma_\nu^2}\sum_{t=p+1}^{n}\left(y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \ldots - \phi_p y_{t-p}\right)^2$$
given that the first p observations are exact. Eq. 6.102 leads to the normal equations
Eq. 6.103
$$\sum_{t=p+1}^{n}\left(y_t - \phi_1 y_{t-1} - \ldots - \phi_p y_{t-p}\right)y_{t-j} = 0, \qquad j = 1, \ldots, p$$
As a further approximation we consider the terms in Eq. 6.103 as estimates of the autocovariance function:
Eq. 6.104
$$\hat r_j = \phi_1\,\hat r_{j-1} + \ldots + \phi_p\,\hat r_{j-p}, \qquad j = 1, \ldots, p$$
Eq. 6.105
$$\begin{pmatrix}\hat r_1\\ \hat r_2\\ \vdots\\ \hat r_p\end{pmatrix} = \begin{pmatrix}\hat r_0 & \hat r_1 & \ldots & \hat r_{p-1}\\ \hat r_1 & \hat r_0 & \ldots & \hat r_{p-2}\\ \vdots & \vdots & & \vdots\\ \hat r_{p-1} & \hat r_{p-2} & \ldots & \hat r_0\end{pmatrix}\begin{pmatrix}\phi_1\\ \phi_2\\ \vdots\\ \phi_p\end{pmatrix} \qquad\text{or}\qquad \hat r = \hat R_p\,\phi$$
which is identical in form to the Yule-Walker equations Eq. 6.31. Thus approximate expressions for $\hat\phi$ may be obtained by computing the sample autocovariances $\hat r_0, \ldots, \hat r_p$ and then solving Eq. 6.31.
The partial autocorrelation function, defined as $\phi_{kk}$, is estimated with the help of Eq. 6.31. When the order of the fitted model is larger than p we have approximately (Box and Jenkins, 1976, p. 65)
Eq. 6.106
$$\operatorname{var}(\hat\phi_{kk}) \approx \frac{1}{n}, \qquad k > p$$
To estimate $\sigma_\nu$ we combine Eq. 6.101 with Eq. 6.103 to find
Eq. 6.107
$$S(\hat\phi) = \sum_{t=p+1}^{n}\hat\nu_t^2 = (n-p)\left(\hat r_0 - \hat\phi_1\hat r_1 - \ldots - \hat\phi_p\hat r_p\right), \qquad \hat\sigma_\nu^2 = \frac{S(\hat\phi)}{n-2p}$$
We can compare Eq. 6.31 with the linear least squares normal equations $X^T y = X^T X\,\theta$. Thus the role of $X^T X$ is played by $(n-p)\hat R_p$ and we find the approximate covariance matrix
Eq. 6.108
$$D(\hat\phi) \approx \frac{\sigma_\nu^2\,R_p^{-1}}{n-p} \approx \frac{\hat r_0 - \hat\phi^T\hat r}{n}\,\hat R_p^{-1}$$
Example 2 (continued) For the AR(1) process we find the theoretical variance
$$\operatorname{var}\hat\phi \approx \frac{r_0 - \phi\,r_1}{n}\,r_0^{-1} = \frac{1-\phi^2}{n}$$
Estimation of parameters in moving average models
Again we assume that the order q of the process is known. We have
Eq. 6.109
$$y_t = \nu_t - \theta_1\nu_{t-1} - \ldots - \theta_q\nu_{t-q} = \theta(B)\,\nu_t$$
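The Yule-Walker route of Eq. 6.105 can be verified on exact autocovariances: fed the true $\rho_k$ of a known AR(2) process it must return the true coefficients (with sample autocovariances $\hat r_k$ it returns estimates instead). A small sketch of mine, with arbitrary coefficients:

```python
import numpy as np

phi_true = np.array([0.75, -0.5])       # arbitrary stationary AR(2)

# exact autocorrelations from the Yule-Walker recursion
# rho_k = phi1 rho_{k-1} + phi2 rho_{k-2}, with rho_1 = phi1/(1 - phi2)
rho = np.empty(6)
rho[0] = 1.0
rho[1] = phi_true[0] / (1 - phi_true[1])
for k in range(2, 6):
    rho[k] = phi_true[0] * rho[k - 1] + phi_true[1] * rho[k - 2]

# solve r = R_p phi (Eq. 6.105) with the true autocorrelations
p = 2
R = np.array([[rho[abs(i - j)] for j in range(p)] for i in range(p)])
phi_hat = np.linalg.solve(R, rho[1 : p + 1])
print(phi_hat)
```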
We assume that $\nu_0 = \nu_{-1} = \ldots = \nu_{-q+1} = 0$ to find that
Eq. 6.110
$$\nu_1 = y_1, \quad \nu_2 = y_2 + \theta_1\nu_1, \quad \nu_3 = y_3 + \theta_1\nu_2 + \theta_2\nu_1 = y_3 + \theta_1 y_2 + \left(\theta_1^2 + \theta_2\right)y_1, \quad \ldots$$
$$\nu_q = y_q + \theta_1\nu_{q-1} + \ldots + \theta_{q-1}\nu_1, \qquad \nu_t = y_t + \theta_1\nu_{t-1} + \ldots + \theta_q\nu_{t-q}, \quad q < t \le n$$
We find the conditional log-likelihood function
Eq. 6.111
$$L(\theta) = \text{constant} - \frac{1}{2\sigma_\nu^2}\,S(\theta) \qquad\text{with}\qquad S(\theta) = \sum_{t=1}^{n}\nu_t^2$$
Thus we are faced with a nonlinear least squares problem.
Estimation of parameters in ARMA models
To ensure identifiability we impose three conditions (see also Table 1):
• the polynomials $\phi(B)$ and $\theta(B)$ have no common factors
• all the roots of $\phi(B) = 0$ and $\theta(B) = 0$ lie outside the unit circle
• $\phi_p$ and $\theta_q$ are not both zero.
We now assume that $\nu_0 = \nu_{-1} = \ldots = \nu_{-q+1+p} = 0 = \nu_1 = \ldots = \nu_p$ to find that
Eq. 6.112
$$\nu_{p+1} = y_{p+1} - \phi_1 y_p - \ldots - \phi_p y_1, \quad \ldots$$
$$\nu_{p+q} = y_{p+q} + \theta_1\nu_{p+q-1} + \ldots + \theta_{q-1}\nu_{p+1} - \phi_1 y_{p+q-1} - \ldots - \phi_p y_q$$
$$\nu_t = y_t + \theta_1\nu_{t-1} + \ldots + \theta_q\nu_{t-q} - \phi_1 y_{t-1} - \ldots - \phi_p y_{t-p}, \qquad p+q < t \le n$$
We now find the conditional log-likelihood function
Eq. 6.113
$$L(\phi,\theta) = \text{constant} - \frac{1}{2\sigma_\nu^2}\,S(\phi,\theta) \qquad\text{with}\qquad S(\phi,\theta) = \sum_{t=p+1}^{n}\nu_t^2$$
Again we arrive at a nonlinear least squares problem. We estimate $\sigma_\nu$ from
Eq. 6.114
$$\hat\sigma_\nu^2 = \frac{S(\hat\phi,\hat\theta)}{n - 2p - q}$$
An approximate $100(1-\alpha)\%$ joint confidence region for all the parameters $\phi, \theta$ is defined by
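The conditional recursion of Eq. 6.110 (and its ARMA analogue, Eq. 6.112) is the computational heart of these estimators: given trial parameters it turns the data into residuals whose sum of squares is then minimized numerically. A sketch of mine for the pure MA case, with arbitrary MA(2) coefficients:

```python
import numpy as np

def ma_residuals(y, theta):
    """Recover nu_t from an MA(q) series via the recursion of Eq. 6.110,
    conditional on nu_0 = ... = nu_{-q+1} = 0."""
    q = len(theta)
    nu = np.zeros(len(y))
    for t in range(len(y)):
        back = sum(theta[j] * nu[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        nu[t] = y[t] + back              # nu_t = y_t + theta_1 nu_{t-1} + ...
    return nu

# sanity check: build an MA(2) series with known shocks and zero start-up;
# the recursion must then reproduce the shocks exactly
rng = np.random.default_rng(3)
theta = np.array([0.4, 0.2])             # arbitrary invertible MA(2)
nu_true = rng.standard_normal(200)
y = np.copy(nu_true)
for t in range(200):                     # y_t = nu_t - theta_1 nu_{t-1} - theta_2 nu_{t-2}
    for j, th in enumerate(theta, start=1):
        if t - j >= 0:
            y[t] -= th * nu_true[t - j]

nu_rec = ma_residuals(y, theta)
print(np.max(np.abs(nu_rec - nu_true)))
```

`(nu_rec ** 2).sum()` is the conditional sum of squares $S(\theta)$ of Eq. 6.111; a nonlinear optimizer would evaluate it repeatedly for different trial values of `theta`.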
Eq. 6.115
$$S(\phi,\theta) \le S(\hat\phi,\hat\theta)\left[1 + \frac{p+q}{n-2p-q}\,F_{p+q,\,n-2p-q,\,\alpha}\right]$$
Determining the order of the model
The estimates of the full and partial autocorrelation functions can be examined to find whether an AR(p) or MA(q) model is appropriate, cf. Table 1 and Fig.6.3. However, one has to take into account the variances of $\hat\rho_k$ and $\hat\phi_{kk}$ (Eq. 6.98 and Eq. 6.106). For instance, with an AR(p) model $|\hat\phi_{kk}|$ should be approximately smaller than $2/\sqrt{n}$ for $k > p$.
Several criteria have been proposed which are based upon the residual variance $\hat\sigma_\nu^2$ (Priestley, 1981, §5.4.5). A popular one is Akaike's Information Criterion (AIC), which adds a penalty for each additional parameter and is defined as
Eq. 6.116
$$\mathrm{AIC}(p,q) = n\log\!\left(\frac{S(\hat\phi,\hat\theta)}{n}\right) + 2(p+q)$$
The model with the minimum value of $\mathrm{AIC}(p,q)$ is chosen.
Estimation in the frequency domain
Our aim is to estimate the (non-normalized) power spectrum $h(\omega)$. At first glance this seems easy, since by Eq. 6.17 and Eq. 6.53 $h(\omega)$ and $r_s$ form a Fourier pair. Thus we define
Eq. 6.117
$$I_n(\omega) = \frac{1}{2\pi}\sum_{s=-(n-1)}^{n-1}\hat r_s\,e^{-i\omega s} = \frac{1}{2\pi}\sum_{s=-(n-1)}^{n-1}\hat r_s\cos(\omega s)$$
where $I_n(\omega)$ is called the sample spectral density function, also referred to as the (modified) periodogram. In the second part of Eq. 6.117 we have used the fact that $\hat r_s$ is symmetric. We now prove that $I_n(\omega)$ is an asymptotically unbiased estimate of $h(\omega)$.
We introduce the finite Fourier transform $\zeta_y(\omega)$, which is defined by
Eq. 6.118
$$\zeta_y(\omega) = \frac{1}{\sqrt{2\pi n}}\sum_{t=1}^{n}y_t\,e^{-i\omega t}, \qquad -\pi \le \omega \le \pi$$
Using the spectral representation of $y_t$ (Eq. 6.41) we find
Eq. 6.119
$$\zeta_y(\omega) = \frac{1}{\sqrt{2\pi n}}\int_{-\pi}^{\pi}\left[\sum_{t=1}^{n}e^{-i\omega t}e^{i\theta t}\right]dz(\theta)$$
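The identity $I_n(\omega) = |\zeta_y(\omega)|^2$ (Eqs. 6.117, 6.118 and 6.123 below) can be confirmed numerically. This sketch of mine computes the periodogram both ways on the Fourier grid $\omega_k = 2\pi k/n$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 256
y = rng.standard_normal(n)

# periodogram from the sample autocovariances (Eq. 6.117)
r = np.correlate(y, y, mode="full") / n        # r^_s for s = -(n-1), ..., n-1
s = np.arange(-(n - 1), n)
w = 2 * np.pi * np.arange(n) / n               # Fourier grid
I_acv = np.array([(r * np.cos(wk * s)).sum() for wk in w]) / (2 * np.pi)

# ... and as the squared finite Fourier transform (Eqs. 6.118, 6.123)
zeta = np.fft.fft(y) / np.sqrt(2 * np.pi * n)
I_fft = np.abs(zeta) ** 2

print(np.max(np.abs(I_acv - I_fft)))
```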
Now the summation yields
Eq. 6.120
$$\sum_{t=1}^{n}e^{i\phi t} = \frac{1 - e^{i\phi(n+1)}}{1 - e^{i\phi}} - 1 = e^{i(n+1)\phi/2}\,\frac{\sin(n\phi/2)}{\sin(\phi/2)}$$
We define the Fejér kernel as
Eq. 6.121
$$F_n(\phi) = \frac{1}{2\pi n}\left[\frac{\sin(n\phi/2)}{\sin(\phi/2)}\right]^2$$
Thus
Eq. 6.122
$$\zeta_y(\omega) = \int_{-\pi}^{\pi}F_n^{1/2}(\theta-\omega)\,e^{i(n+1)(\theta-\omega)/2}\,dz(\theta)$$
Furthermore (analogous to the argument leading from Eq. 6.89 to Eq. 6.90)
Eq. 6.123
$$I_n(\omega) = \frac{1}{2\pi}\sum_{s=-(n-1)}^{n-1}\hat r_s\,e^{-i\omega s} = \frac{1}{2\pi n}\sum_{s=-(n-1)}^{n-1}\;\sum_{t=1}^{n-|s|}y_t\,y_{t+|s|}\,e^{-i\omega s} = \frac{1}{2\pi n}\sum_{s'=1}^{n}\sum_{t'=1}^{n}y_{t'}\,y_{s'}\,e^{-i\omega s'}e^{i\omega t'} = \zeta_y(\omega)\,\zeta_y^*(\omega) = |\zeta_y(\omega)|^2$$
The expectation of the periodogram is, using Eq. 6.43:
Eq. 6.124
$$E[I_n(\omega)] = \int_{-\pi}^{\pi}F_n(\theta-\omega)\,E[|dz(\theta)|^2] = \int_{-\pi}^{\pi}F_n(\theta-\omega)\,h(\theta)\,d\theta$$
Taking the limit $n\to\infty$ the Fejér kernel tends to a $\delta$-function, and $I_n(\omega)$ is an asymptotically unbiased estimate of $h(\omega)$.
Properties of the periodogram of a linear process
We will find that in its raw state the periodogram is an extremely poor (if not a useless) estimate of the spectral density function. The reasons for this rather surprising result are:
• $I_n(\omega)$ is not a consistent estimate of $h(\omega)$, in the sense that $\operatorname{var}I_n(\omega)$ does not tend to zero as $n\to\infty$
• as a function of $\omega$, $I_n(\omega)$ typically has an erratic and wildly fluctuating form, in contrast to the smooth $h(\omega)$.
The general linear process
Eq. 6.125
$$y_t = \sum_{u=-\infty}^{\infty}g_u\,\nu_{t-u}$$
obeys the relation (compare the derivation from Eq. 6.77 to Eq. 6.80)
Eq. 6.126
$$dz_y(\omega) = \Gamma(\omega)\,dz_\nu(\omega)$$
where $\Gamma(\omega)$ is defined in Eq. 6.79. The proof of the following theorem can be found in Priestley, 1981, §6.2.2.
Theorem 6.7. Let $y_t$ be a general linear process of the form Eq. 6.125 in which the $\nu_t$ are independent with $E[\nu_t] = 0$, $E[|\nu_t|^m] < \infty$ ($m = 2, 4$) and $\sum_{u=-\infty}^{\infty}|g_u|\,|u|^{\alpha} < \infty$ ($\alpha > 0$). Then
Eq. 6.127
$$\zeta_y(\omega) = \Gamma(\omega)\,\zeta_\nu(\omega) + c_n(\omega)$$
with, uniformly in $\omega$,
Eq. 6.128
$$E[|c_n(\omega)|^m] = O\!\left(n^{-m\alpha}\right)$$
Furthermore (using e.g. Eq. 6.84)
Eq. 6.129
$$I_{n,y}(\omega) = 2\pi\,h(\omega)\,\sigma_\nu^{-2}\,I_{n,\nu}(\omega) + d_n(\omega)$$
with, uniformly in $\omega$,
Eq. 6.130
$$E[|d_n(\omega)|^2] = O\!\left(n^{-2\alpha}\right)$$
The covariance of the periodogram is given by (uniformly in $\omega_1$, $\omega_2$)
Eq. 6.131
$$\operatorname{cov}\left(I_{n,y}(\omega_1),\,I_{n,y}(\omega_2)\right) = \left[\frac{\varepsilon}{n} + \frac{2\pi}{n}\left(F_n(\omega_1+\omega_2) + F_n(\omega_1-\omega_2)\right)\right]h(\omega_1)\,h(\omega_2) + O\!\left(n^{-\alpha}\right)$$
where $\varepsilon = E[\nu_t^4] - 3$.
Setting $\omega_1 = \omega_2 = \omega$, Eq. 6.131 gives for all $\omega$ ($\omega \ne 0, \pm\pi$)
Eq. 6.132
$$\operatorname{var}I_{n,y}(\omega) = h^2(\omega)\left[1 + \frac{\varepsilon}{n}\right] + O\!\left(n^{-\alpha}\right) \to h^2(\omega), \qquad\text{as } n\to\infty$$
The basic reason why the variance of the periodogram does not decay to zero as $n\to\infty$ lies in the fact that it consists of n (consistent) estimates $\hat r_s$, each having a variance $O(1/n)$. The sum of these, however, possesses a variance $O(1)$.
Sampling properties of the periodogram
When $y_t$ is a Gaussian white noise process, i.e. a sequence of independent random variables distributed for each t as $N(0, \sigma_y^2)$, then for all $\omega$ ($\omega \ne 0, \pm\pi$) the samples of the periodogram, being the sum of the squares of two independent zero mean normal variables (the real and imaginary parts of Eq. 6.123), have a distribution which is proportional to $\chi_2^2$ (chi-squared, two degrees of freedom, see appendix). Applying this to Eq. 6.129 we find that asymptotically ($\omega \ne 0, \pm\pi$) $I_{n,y}(\omega) \sim \tfrac{1}{2}h(\omega)\,\chi_2^2$, whereas for $\omega = 0, \pm\pi$ we have $I_{n,y}(\omega) \sim h(\omega)\,\chi_1^2$.
Consistent estimates of the spectral density function; spectral windows
To find a consistent estimate $\hat h(\omega)$ we introduce the lag window $\lambda_s$:
Eq. 6.133
$$\hat h(\omega) = \frac{1}{2\pi}\sum_{s=-(n-1)}^{n-1}\lambda_s\,\hat r_s\,e^{-i\omega s}$$
Combining this with Eq. 6.117 we express $\hat h(\omega)$ as a smoothed version of the periodogram $I_n(\omega)$:
Eq. 6.134
$$\hat h(\omega) = \int_{-\pi}^{\pi}\left[\frac{1}{2\pi}\sum_{s=-(n-1)}^{n-1}\lambda_s\,e^{-i(\omega-\theta)s}\right]I_n(\theta)\,d\theta = \int_{-\pi}^{\pi}I_n(\theta)\,W(\omega-\theta)\,d\theta$$
where the spectral window $W(\theta)$ is defined as the Fourier transform of the lag window $\lambda_s$:
Eq. 6.135
$$W(\theta) = \frac{1}{2\pi}\sum_{s=-(n-1)}^{n-1}\lambda_s\,e^{-i\theta s}$$
Thus we have two equivalent approaches: weighting the sample autocovariance function so as to reduce the number of contributing $\hat r_s$ has exactly the same effect as smoothing the periodogram. A few examples of windows are given below (see also Priestley, 1981, §6.2.3).
Example 8. The truncated periodogram window. Consider the lag window
Eq. 6.136
$$\lambda_s = \begin{cases}1, & |s| \le m\\ 0, & |s| > m\end{cases}$$
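A lag-window estimate per Eq. 6.133 takes only a few lines. The sketch below is my illustration (white-noise input, arbitrary m); it uses Bartlett's triangular window of Eq. 6.138, whose spectral window is the non-negative Fejér kernel, so the estimate itself stays non-negative:

```python
import numpy as np

def lag_window_estimate(y, m, w_grid):
    """Spectral estimate of Eq. 6.133 with Bartlett's lag window (Eq. 6.138)."""
    n = len(y)
    r = np.correlate(y, y, mode="full") / n     # biased r^_s, s = -(n-1), ..., n-1
    s = np.arange(-(n - 1), n)
    lam = np.where(np.abs(s) <= m, 1.0 - np.abs(s) / m, 0.0)
    return np.array([(lam * r * np.cos(w * s)).sum() for w in w_grid]) / (2 * np.pi)

rng = np.random.default_rng(5)
y = rng.standard_normal(512)                    # white noise: h(w) = sigma^2/(2 pi)
w_grid = np.linspace(0, np.pi, 129)
h_hat = lag_window_estimate(y, m=16, w_grid=w_grid)

# non-negative everywhere, unlike the truncated periodogram (Dirichlet kernel)
print(h_hat.min(), h_hat.mean(), 1 / (2 * np.pi))
```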
which corresponds to the spectral window (using Eq. 6.120)
Eq. 6.137
$$W(\theta) = \frac{1}{2\pi}\sum_{s=-m}^{m}e^{-i\theta s} = \frac{1}{2\pi}\,\frac{\sin\!\left[(m + 1/2)\theta\right]}{\sin(\theta/2)} = D_m(\theta)$$
where the function $D_m(\theta)$ is known as the Dirichlet kernel. Its form is illustrated in Fig.6.5. The window parameter m determines the truncation point, and at the same time the width of the smoothing kernel. Thus a small m produces a smaller variance of $\hat h(\omega)$, at the cost of
Fig.6.5. Examples of lag windows (left) and accompanying spectral windows (right).
resolution in the $\omega$-domain.
Example 9. Bartlett's window. The triangular lag window is defined as
Eq. 6.138
$$\lambda_s = \begin{cases}1 - |s|/m, & |s| \le m\\ 0, & |s| > m\end{cases}$$
which corresponds to the spectral window (using Eq. 6.120 and Eq. 6.121)
Eq. 6.139
$$W(\theta) = \frac{1}{2\pi}\sum_{s=-m}^{m}\left(1 - \frac{|s|}{m}\right)e^{-i\theta s} = \frac{1}{2\pi m}\left[\frac{\sin(m\theta/2)}{\sin(\theta/2)}\right]^2 = F_m(\theta)$$
Since the Fejér kernel is non-negative everywhere, it follows that the Bartlett estimate is similarly non-negative everywhere. This in contrast to the Dirichlet kernel.
Example 10. The Daniell (or rectangular) window. This is simply the average of the periodogram over a small interval:
Eq. 6.140
$$W(\theta) = \begin{cases}m/2\pi, & |\theta| \le \pi/m\\ 0, & \text{otherwise}\end{cases}$$
with corresponding lag window
Eq. 6.141
$$\lambda_s = \frac{m}{2\pi}\int_{-\pi/m}^{\pi/m}e^{is\theta}\,d\theta = \frac{\sin(\pi s/m)}{\pi s/m}, \qquad\text{all } s$$
Sampling properties of spectral estimates
Our aim is to find a consistent estimate $\hat h(\omega)$. We already found that $I_n(\omega)$ represents an asymptotically unbiased estimate, but with variance $O(1)$. Now we must consider the price for our window operations.
We assume that $\lambda_s$ is an even sequence ($\lambda_s = \lambda_{-s}$) so that $W(\theta)$ is a real valued even function of $\theta$. Furthermore we assume that $\lambda_s$ is such that $W(\theta) \ge 0$, $\int_{-\pi}^{\pi}W(\theta)\,d\theta = 1$ (all n), $\int_{-\pi}^{\pi}W^2(\theta)\,d\theta < \infty$ and
$$\left(\sum_{s=-(n-1)}^{n-1}\frac{|s|}{n}\,\lambda_{n,s}^2\right)\Big/\left(\sum_{s=-(n-1)}^{n-1}\lambda_{n,s}^2\right) \to 0 \qquad\text{as } n\to\infty$$
To ensure that $W_n(\theta)$ is not too narrow in relation to $F_n(\theta)$, i.e. that its width is much greater than $O(1/n)$, we must have that $m/n \to 0$ as $n\to\infty$, e.g. with $m = n^{\alpha}$, $0 < \alpha < 1$. Then, for any $\varepsilon > 0$, $W_n(\theta) \to 0$ uniformly as $n\to\infty$ for $|\theta| > \varepsilon$. It is useful to note that the sequence $\int_{-\pi}^{\pi}W_n^2(\theta)\,d\theta \to \infty$ as $n\to\infty$, which follows from the fact that $W_n(\theta)$ has the
limiting form of a $\delta$-function. With these properties it can be proved that, when $h(\omega)$ has a bounded first derivative,
Eq. 6.142
$$E[\hat h(\omega)] = \int_{-\pi}^{\pi}h(\theta)\,W_n(\omega-\theta)\,d\theta + O\!\left(\frac{\log n}{n}\right) \equiv \bar h(\omega) + O\!\left(\frac{\log n}{n}\right)$$
Because $\lim_{n\to\infty}W_n(\omega-\theta) = \delta(\omega-\theta)$, $\hat h(\omega)$ is asymptotically unbiased:
Eq. 6.143
$$\lim_{n\to\infty}E[\hat h(\omega)] = h(\omega)$$
with asymptotic variance
Eq. 6.144
$$n\operatorname{var}\hat h(\omega) \approx \varepsilon\,h^2(\omega) + 2\pi\int_{-\pi}^{\pi}h^2(\theta)\,W_n(\omega-\theta)\left[W_n(\omega-\theta) + W_n(\omega+\theta)\right]d\theta$$
The first term is equal to zero when $y_t$ is Gaussian, and is negligible compared with the second term, which $\to\infty$ as $n\to\infty$. Because $\lim_{n\to\infty}W_n(\omega-\theta) = \delta(\omega-\theta)$, the term with $W_n(\omega-\theta)\,W_n(\omega+\theta)$ will vanish as $n\to\infty$, unless $\omega = 0, \pm\pi$, for which we introduce the symbol $\delta_{\omega,0,\pi}$. Thus Eq. 6.144 simplifies to
Eq. 6.145
$$\operatorname{var}\hat h(\omega) \approx \left(1 + \delta_{\omega,0,\pi}\right)\frac{2\pi}{n}\,h^2(\omega)\int_{-\pi}^{\pi}W_n^2(\omega-\theta)\,d\theta = \left(1 + \delta_{\omega,0,\pi}\right)h^2(\omega)\,\frac{c_W}{n}$$
Thus $\hat h(\omega)$ is consistent if
Eq. 6.146
$$\lim_{n\to\infty}\frac{c_W}{n} \equiv \lim_{n\to\infty}\frac{2\pi}{n}\int_{-\pi}^{\pi}W_n^2(\theta)\,d\theta \equiv \lim_{n\to\infty}\frac{1}{n}\sum_{s}\lambda_{n,s}^2 = 0$$
Using the asymptotic normality of $\hat h(\omega)$, the approximate $100(1-\alpha)\%$ confidence interval for $h(\omega)$ is given by
Eq. 6.147
$$\hat h(\omega)\left(1 \pm c(\alpha)\sqrt{\frac{c_W}{n}}\right)$$
where $c(\alpha)$ is the two-sided $100\alpha\%$ point of the $N(0,1)$ distribution.
Approximate expression for the bias
Although $\hat h(\omega)$ is asymptotically unbiased, it will nevertheless be biased for finite n. This is
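The constant $c_W = \sum_s \lambda_{n,s}^2$ controls both the variance (Eq. 6.145) and the confidence interval (Eq. 6.147). The check below is mine (m and n arbitrary); it confirms the scale-parameter limit of Eq. 6.151 for Bartlett's window, where $\int\kappa^2(u)\,du = 2/3$, and forms the interval half-width of Eq. 6.147:

```python
import numpy as np

m, n = 50, 5000
s = np.arange(-(m - 1), m)
lam = 1 - np.abs(s) / m                 # Bartlett lag window, Eq. 6.138
c_w = np.sum(lam ** 2)                  # c_W = sum of squared lag-window weights
print(c_w / m, 2 / 3)                   # Eq. 6.151: c_W/m -> 2/3 for Bartlett

# approximate 95% interval h^(w) (1 +/- 1.96 sqrt(c_W/n)), Eq. 6.147
half_width = 1.96 * np.sqrt(c_w / n)
print(half_width)
```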
simply the price we have to pay for reducing the variance by smoothing the periodogram. This trade-off between bias and variance is an essential characteristic of the estimation of the spectral density function. From Eq. 6.142 we derive the bias $b(\omega) \equiv h(\omega) - E[\hat h(\omega)]$ (using $\int_{-\pi}^{\pi}W(\theta)\,d\theta = 1$):
Eq. 6.148
$$b(\omega) = \int_{-\pi}^{\pi}\left[h(\omega) - h(\theta)\right]W_n(\omega-\theta)\,d\theta + O\!\left(\frac{\log n}{n}\right)$$
Suppose now that $h(\omega)$ can be expanded as a Taylor series:
Eq. 6.149
$$h(\omega-\theta) = h(\omega) - \theta\,h'(\omega) + \frac{\theta^2}{2}\,h''(\omega) + o(\theta^2)$$
Substituting Eq. 6.149 into Eq. 6.148 and using the fact that $\int_{-\pi}^{\pi}\theta\,W(\theta)\,d\theta = 0$ because $W(\theta)$ is an even function of $\theta$, we obtain
Eq. 6.150
$$b(\omega) \approx -\frac{1}{2}\,h''(\omega)\int_{-\pi}^{\pi}\theta^2\,W(\theta)\,d\theta$$
Thus the bias depends upon the curvature of $h(\omega)$ and upon the width of the window. When we think of $W(\theta)$ as a probability distribution function, the integral in Eq. 6.150 represents its variance.
In the particular case when the lag window is of the scale parameter form $\lambda_s = \kappa(s/m)$ (for example Eq. 6.136, Eq. 6.138 and Eq. 6.141) we have
Eq. 6.151
$$\frac{1}{m}\sum_{s=-(n-1)}^{n-1}\lambda_{n,s}^2 = \frac{1}{m}\sum_{s=-(n-1)}^{n-1}\kappa^2(s/m) \to \int_{-\infty}^{\infty}\kappa^2(u)\,du \qquad\text{as } n\to\infty$$
Example 8 (continued) For the truncated periodogram we have
$$\kappa(u) = \begin{cases}1, & |u| \le 1\\ 0, & |u| > 1\end{cases}$$
and hence $\int_{-\infty}^{\infty}\kappa^2(u)\,du = 2$. By combining Eq. 6.145 and Eq. 6.151 we find
$$\operatorname{var}\hat h(\omega) \approx \left(1 + \delta_{\omega,0,\pi}\right)\frac{2m}{n}\,h^2(\omega).$$
Example 9 (continued) For the Bartlett window
$$\kappa(u) = \begin{cases}1 - |u|, & |u| \le 1\\ 0, & |u| > 1\end{cases}$$
so that
$\int_{-\infty}^{\infty}\kappa^2(u)\,du = 2\int_0^1(1-u)^2\,du = 2/3$ and $\operatorname{var}\hat h(\omega) \approx \left(1 + \delta_{\omega,0,\pi}\right)\frac{2m}{3n}\,h^2(\omega)$.
Furthermore it can be derived that the bias $b(\omega) \approx h^{(1)}(\omega)/m$, where
Eq. 6.152
$$h^{(l)}(\omega) = \frac{1}{2\pi}\sum_{s=-\infty}^{\infty}|s|^l\,r_s\,e^{-i\omega s}$$
Example 10 (continued) With the Daniell window $\kappa(u) = \sin(\pi u)/(\pi u)$ and $\int_{-\infty}^{\infty}\kappa^2(u)\,du = \frac{2}{\pi}\int_0^{\infty}\left(\sin x / x\right)^2 dx = 1$, hence $\operatorname{var}\hat h(\omega) \approx \left(1 + \delta_{\omega,0,\pi}\right)\frac{m}{n}\,h^2(\omega)$, which could also be derived directly from Eq. 6.145. The bias is approximately equal to $b(\omega) \approx h^{(2)}(\omega)\,\pi^2/(6m^2) = -h''(\omega)\,\pi^2/(6m^2)$.
These examples illustrate the trade-off between bias and variance: a large m decreases the bias, at the expense of increasing the variance, and vice versa.
Example 2 (continued) In Fig.6.6. we illustrate the estimation of $f(\omega) = h(\omega)/\sigma_y^2$ for an AR(1) process, with three different values of the window parameter m. Simulated were 500 samples of an AR(1) process with $\phi = 0.6$, $\nu_t \sim N(0,1)$. The conditional maximum likelihood estimate of $\phi$ was 0.55, with $\operatorname{var}\hat\phi \approx 0.00128$, thus providing an approximate 95% confidence interval for $\phi$ of $(0.48, 0.62)$; $\hat\sigma_\nu^2 = 0.98$. Abscissa: $\omega$ in units of $\pi/128$. Ordinate: the solid lines indicate the theoretical normalized spectral density function $f(\omega)$, whereas the dashed lines indicate $\hat f(\omega)$ as obtained with a Parzen window. The relative standard deviation is approximately given by 0.23 (m=50), 0.16 (m=25), 0.11 (m=12). Selected approximate 95% confidence intervals for $f(\omega)$ are shown with $m = 50$.
Estimation of cross-spectra
The previous results can be generalized to multivariate processes relatively easily.
The cross-covariance function is estimated by (compare Eq. 6.92)
Eq. 6.153
$$\hat r_{ij,s} = \frac{1}{n}\sum_t y_{j,t}\,y_{i,t+s}, \qquad s = 0, \pm 1, \ldots, \pm(n-1)$$
where the summation extends from $t = \max(1, 1-s)$ to $t = \min(n-s, n)$. $\hat r_{ij,s}$ is asymptotically
Fig.6.6. Example of time and frequency domain analysis of an AR(1) process.
unbiased, and when the processes are Gaussian we have (compare Eq. 6.94)
Eq. 6.154
$$\operatorname{cov}\left(\hat r_{ij,s},\,\hat r_{ij,u}\right) \approx \frac{1}{n}\sum_{k=-\infty}^{\infty}\left[r_{ii,k}\,r_{jj,k+u-s} + r_{ij,k+u}\,r_{ji,k-s}\right]$$
The cross-spectral density function is estimated by (compare Eq. 6.133)
Eq. 6.155
$$\hat h_{ij}(\omega) = \frac{1}{2\pi}\sum_{s=-(n-1)}^{n-1}\lambda_s\,\hat r_{ij,s}\,e^{-i\omega s} = \int_{-\pi}^{\pi}I_{n,ij}(\theta)\,W_n(\omega-\theta)\,d\theta$$
where the cross-periodogram $I_{n,ij}(\omega)$ is defined as (compare Eq. 6.123)
Eq. 6.156
$$I_{n,ij}(\omega) = \zeta_{y_i}(\omega)\,\zeta_{y_j}^*(\omega)$$
The covariance is approximately given by (compare Eq. 6.145)
Eq. 6.157
$$\operatorname{cov}\left(\hat h_{ij}(\omega),\,\hat h_{kl}(\omega)\right) \approx \frac{c_W}{n}\,h_{ik}(\omega)\,h_{jl}^*(\omega)\left(1 + \delta_{\omega,0,\pi}\right)$$
The coherency is estimated as
Eq. 6.158
$$\hat w_{ij}(\omega) = \frac{\hat h_{ij}(\omega)}{\sqrt{\hat h_{ii}(\omega)\,\hat h_{jj}(\omega)}}$$
with variance
Eq. 6.159
$$\operatorname{var}\hat w_{ij}(\omega) \approx \frac{c_W}{2n}\left(1 - |w_{ij}(\omega)|^2\right)$$
Note that the raw estimate of the coherency, based upon the periodogram, is
Eq. 6.160
$$|\hat w_{ij}(\omega)| = \frac{\left|\zeta_{y_i}(\omega)\,\zeta_{y_j}^*(\omega)\right|}{\sqrt{|\zeta_{y_i}(\omega)|^2\,|\zeta_{y_j}(\omega)|^2}} \equiv 1 \qquad\text{for all } \omega$$
This becomes obvious when we realize that using the raw periodogram matrix is equivalent to estimating the correlation coefficient between $dz_1(\omega)$ and $dz_2(\omega)$ from the single pair of observations $\zeta_{y_i}(\omega)$, $\zeta_{y_j}(\omega)$. When we apply smoothing, we are assuming that the correlation coefficient (i.e. the coherency) has the same value for a number of neighbouring frequencies. Thus estimation of the coherency depends strongly upon a careful window choice.
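The degenerate raw coherency of Eq. 6.160 and the effect of smoothing can be demonstrated directly. This sketch is my addition; the series length, noise level and smoothing span are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4096
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)                  # shared part plus independent noise

zx, zy = np.fft.rfft(x), np.fft.rfft(y)
Ixx, Iyy, Ixy = np.abs(zx) ** 2, np.abs(zy) ** 2, zx * np.conj(zy)

# the raw periodogram coherency is identically 1 (Eq. 6.160) ...
raw = np.abs(Ixy) / np.sqrt(Ixx * Iyy)

# ... only smoothing over neighbouring frequencies yields a usable estimate
def smooth(a, m=32):
    return np.convolve(a, np.ones(2 * m + 1) / (2 * m + 1), mode="same")

w_hat = np.abs(smooth(Ixy)) / np.sqrt(smooth(Ixx) * smooth(Iyy))
print(raw[1:5], np.median(w_hat))
```

Here $h_{yy} = h_{xx} + h_{\text{noise}}$ with equal white spectra, so the true coherency is $1/\sqrt{2} \approx 0.707$ at every frequency; the smoothed estimate scatters around that value while the raw one is stuck at 1.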
Parametric spectral estimation
The spectral density function estimates which we considered so far are based upon smoothing the periodogram via a suitable spectral window. This method is valid for stationary processes with continuous spectra, and is non-parametric in the sense that it does not assume a specific parametric model for the observed series. Since the periodogram is the Fourier transform of the windowed estimate of the autocovariance (Eq. 6.117, Eq. 6.133), we can as well interpret it as an MA estimate of order less than n, since for a moving average process of order q we have $\rho_k = r_k = 0$, $k > q$, see e.g. Eq. 6.23. Thus if we assume the spectrum to be MA of order q we have the so-called Blackman-Tukey estimate
Eq. 6.161
$$\hat h(\omega) = \frac{1}{2\pi}\sum_{s=-q}^{q}\hat r_s\,e^{-i\omega s}$$
One can also start with the estimation of the parameters of an ARMA(p,q) model in the time domain, and use Eq. 6.76 to estimate
Eq. 6.162
$$\hat h(\omega) = \frac{\hat\sigma_a^2\,\left|\hat\theta\!\left(e^{-i\omega}\right)\right|^2}{2\pi\,\left|\hat\phi\!\left(e^{-i\omega}\right)\right|^2}$$
When an AR(p) process is assumed the method is known as autoregressive spectral estimation. A criterion like the AIC (Eq. 6.116) is necessary to choose the order (p,q) of the ARMA model.
Line spectra (Eq. 6.18) naturally give rise to parametric spectral estimation. The autocovariance function of a harmonic process is given by
Eq. 6.163
$$r_\tau = E[y_{t+\tau}\,y_t] = \sum_{k,l=1}^{K}\frac{a_k a_l}{(2\pi)^2}\int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi}\cos\!\big(\omega_k(t+\tau)+\varphi_k\big)\cos\!\big(\omega_l t+\varphi_l\big)\,d\varphi_k\,d\varphi_l$$
$$= \sum_{k,l=1}^{K}\frac{a_k a_l}{(2\pi)^2}\int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi}\frac{1}{2}\Big[\cos\!\big((\omega_l-\omega_k)t+\varphi_l-\varphi_k-\omega_k\tau\big)+\cos\!\big((\omega_l+\omega_k)t+\varphi_l+\varphi_k+\omega_k\tau\big)\Big]\,d\varphi_k\,d\varphi_l = \sum_{k=1}^{K}\frac{a_k^2}{2}\cos(\omega_k\tau)$$
Thus we find the spectrum
Eq. 6.164
$$h(\omega) = \frac{1}{2\pi}\sum_{\tau=-\infty}^{\infty}r_\tau\,e^{-i\omega\tau} = \frac{1}{2\pi}\sum_{\tau=-\infty}^{\infty}\sum_{k=1}^{K}\frac{a_k^2}{2}\cos(\omega_k\tau)\,e^{-i\omega\tau} = \sum_{k=1}^{K}\frac{a_k^2}{4}\left[\delta(\omega-\omega_k)+\delta(\omega+\omega_k)\right]$$
A more realistic assumption is that the observation contains additive errors:
Eq. 6.165
$$y_t = \sum_{k=1}^{K}a_k\cos(\omega_k t + \varphi_k) + \nu_t$$
and thus we are faced with a nonlinear least squares problem with parameters K, $(a_k, \omega_k, \varphi_k)$, $k = 1, \ldots, K$, and $\sigma_\nu^2$. Apart from the additive noise $\nu_t$ the harmonic process is predictable.
The most popular models for signal spectra involve a linear combination from the triplet (white noise, ARMA, lines), e.g. a line spectrum plus white noise.
Example 11. Canadian lynx series. The annual trappings of Canadian lynx over the period 1821-1934 have been registered. This celebrated set of data shows oscillations with a period of about ten years, but with irregular variations in amplitude (Fig.6.7.). The form of both the data and the autocovariance function suggests either that the data contain a strictly periodic component corrupted by error, or alternatively that the data conform to some pseudo-periodic type of ARMA model. One obvious candidate of the latter type is the AR(2) model, which can generate pseudo-periodic behaviour. This fit was unsatisfactory, and among AR(p) models the AIC was least for an AR(11) model. A subset of the AR(11) model retaining only $\phi_1, \phi_2, \phi_4, \phi_{10}, \phi_{11}$ produced the least AIC. Ignoring $\phi_4$, this model could roughly be factorized in the form $(1-B)\left(1-0.3B^{10}\right)$, where the second factor corresponds to a damped periodic component with a period of ten years. The roots of $\phi(B) = 0$ possessed moduli between 0.87 and 0.98, which is very close to the unit circle, as one would expect from the largely cyclical form of the data. An ARMA(3,3) model also provided a satisfactory fit.
The periodogram confirms the cyclical behaviour of the data. The (logarithm of the) smoothed periodogram resembles the spectrum calculated from the AR(11) fit.
Campbell and Walker (1977) adopted a mixed spectrum model with a harmonic component of 9.63 years and fitted an AR(2) model to the residuals. Their final model is
Eq. 6.166
$$x_t = 2.9036 + 0.0895\cos\!\left(\frac{2\pi t}{9.63}\right) - 0.6249\sin\!\left(\frac{2\pi t}{9.63}\right) + y_t$$
$$y_t - 0.9717\,y_{t-1} + 0.2654\,y_{t-2} = \nu_t, \qquad \sigma_\nu^2 = 0.042$$
Fig.6.7. Analysis of Canadian lynx trappings 1821-1934.
Example 12. Autoregressive spectral estimation of the MA(1) series $y_t = \nu_t + 0.95\,\nu_{t-1}$, $\nu_t \sim N(0, 1^2)$, 500 observations (compare Fig.6.2.). Since $\theta_1$ is close to 1, the coefficients in the AR representation of this model
Eq. 6.167
$$\left(1 + \theta_1 B\right)^{-1}y_t = \sum_{k=0}^{\infty}\left(-\theta_1 B\right)^k y_t = \nu_t$$
decay very slowly. The AIC selected an AR(17) model, whose spectral estimate shows quite marked oscillations (Fig.6.8.). In contrast, the smoothed periodogram resembles the theoretical spectral density function well.
Fig.6.8.
Example 13. Mixed spectrum: MA plus lines. The data can relatively easily be generated:
Eq. 6.168
$$y_t = 2\cos\!\left(\frac{\pi t}{2}\right) + 2\cos\!\left(\frac{11\pi t}{20}\right) + x_t, \qquad x_t = (1-B)(1+B)^3\,\nu_t, \quad \sigma_\nu^2 = 1$$
The resulting power spectrum is given by (using Eq. 6.76 and Eq. 6.164)
Eq. 6.169
$$h(\omega) = \delta\!\left(\omega-\frac{\pi}{2}\right) + \delta\!\left(\omega+\frac{\pi}{2}\right) + \delta\!\left(\omega-\frac{11\pi}{20}\right) + \delta\!\left(\omega+\frac{11\pi}{20}\right) + \frac{16}{2\pi}\sin^2\omega\left(1+\cos\omega\right)^2$$
The MA portion of the spectrum has power 10, and each sinusoid has power 2.
A 256-point dataset was generated via Eq. 6.168, and periodograms were calculated from 64-point portions of the data. In the upper left hand corner of Fig.6.9. only one periodogram is shown. Then averages of two and four periodograms are shown; the one with $M = 4$ uses all 256 data values. The true spectrum is also shown on the same plots. The averaging gradually reduces the variance, and the result approaches the expected value $E[I_n(\omega)]$, denoted in the figure by $ES(\theta)$. Note that this expected value differs from the true power spectrum because of the finite data set (Eq. 6.124, Fejér kernel shown in Fig.6.5.).
When instead of a rectangular window we choose a Hanning window (which possesses a broader main lobe but smaller side lobes)
Eq. 6.170
$$\lambda_s = \beta\left[1 + \cos\!\left(\frac{(2s - n + 1)\pi}{n+1}\right)\right], \qquad 0 \le s \le n-1$$
(where $\beta$ is chosen so that the window sequence has total energy n) we considerably smooth the periodograms, at the expense of a loss of resolution (evidenced in the treatment of the lines), see Fig.6.10..
Finally, we show the results of autoregressive spectral estimation of this MA(4) plus lines process. An AR(20) model was assumed. The (anomalous) peaks in Fig.6.11. correspond roughly to those in Fig.6.9., suggesting that they are inherent in the data. As the length of the data increases these false peaks and valleys smooth out.
A tutorial on model based spectral analysis is Kay and Marple (1981).
Fig.6.9. MA(4) plus lines: true spectrum and periodograms
Fig.6.10. MA(4) plus lines: Hanning window
Fig.6.11. MA(4) plus lines: AR(20) spectral estimate
Use of the Fast Fourier Transform
In practice one does not compute the entire periodogram I_n(ω), but merely samples of it. A set of
equally spaced samples of I_n(ω) can most efficiently be obtained by using a fast Fourier
transform algorithm for the DFT. Let N ≥ n be a power of two and let ω_0 = 2π/N; then

Eq. 6.171:  I(mω_0) = |ζ_y(mω_0)|² = |Y(m)|²/(2πN)

where y_0, …, y_{n−1}, 0, …, 0 (N values in total) and Y(0), …, Y(N−1) form an N-point DFT pair. Thus one can
get samples of the periodogram by extending the data to length N by zero-padding,
performing a length-N FFT, and then normalizing the magnitude-squared transform. It should
be understood that increasing the size of the DFT in no way improves the "resolution" (bias)
of the spectral estimate. It serves only to obtain more samples of the periodogram. This point
is illustrated in Fig.6.12. The periodogram is the continuous curve, while the use of the FFT
(with N = n = 64) provides only the samples shown by the vertical lines. Increasing N (and
zero-padding the data) would decrease the sample interval. This would allow for a more
accurate representation of the true heights of peaks and valleys.
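The recipe of Eq. 6.171 can be sketched in a few lines of Python (a minimal illustration assuming NumPy; the function name is our own):

```python
import numpy as np

def periodogram_samples(y, N):
    """Sample the periodogram at frequencies m*w0, w0 = 2*pi/N (Eq. 6.171).

    The data y_0..y_{n-1} are zero-padded to length N >= n, an N-point FFT
    is taken, and the magnitude-squared transform is normalized.
    """
    n = len(y)
    if N < n:
        raise ValueError("N must be at least the data length n")
    Y = np.fft.fft(y, n=N)                  # np.fft.fft zero-pads to length N
    return np.abs(Y) ** 2 / (2 * np.pi * N)
```

Increasing N only refines the frequency grid; as stressed above, it does not improve the resolution (bias) of the estimate.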
Smoothing, prediction and filtering
Minimum mean square error estimation
We wish to estimate the present value s_t of a stochastic process in terms of the values of
another process x_ξ specified for every ξ in an interval a ≤ ξ ≤ b. The desirable linear estimate
ŝ_t of s_t is a sum

Eq. 6.172:  ŝ_t = ∑_{k=a}^{b} α_k x_k

The values of α_k follow from minimizing the mean square error (MSE) criterion P:

Eq. 6.173:  E[(s_t − ŝ_t)²] = E[(s_t − ∑_{k=a}^{b} α_k x_k)²] ≡ E[ε²] = P

Taking the derivative of Eq. 6.173 with respect to α_i we find

Eq. 6.174:  ∂P/∂α_i = E[(s_t − ∑_{k=a}^{b} α_k x_k) x_i] = E[ε x_i] = 0

Thus the MSE P is minimum when the error ε is orthogonal to the data x_i, a ≤ i ≤ b.
When x_t and s_t are normally distributed with zero mean then ŝ is the conditional expectation
of s, given the data x: ŝ_t = E[s_t | x_a, …, x_b], and s has probability density function:

Eq. 6.175:  p(s | x_i, a ≤ i ≤ b) = (2πP)^{−1/2} exp[−(s − α_a x_a − … − α_b x_b)² / 2P]

We now distinguish three cases (see also Fig.6.13.):
• If the time t is in the interior of the data interval [a, b], then the estimate ŝ_t of s_t will be
called smoothing.
• If t is outside this interval and x_t = s_t (no additive noise) then ŝ_t is a predictor of s_t. If
t > b then ŝ_t is a forward predictor or forecast; if t < a then ŝ_t is a backward
predictor or backcast. We will only discuss forecasting (Fig.6.14.).
• Finally, if t is outside the data interval and x_t ≠ s_t, then the estimate is called filtering
and prediction. In this case we want to filter out the noise.
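The orthogonality conditions of Eq. 6.174 are precisely the normal equations ∑_k α_k E[x_k x_i] = E[s_t x_i]. A small Monte-Carlo sketch in Python (NumPy; the toy signal model is our own invention) shows the resulting error being orthogonal to every data value:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = 200_000

# Hypothetical toy model: a correlated zero-mean signal observed at three
# instants through additive white noise, x_k = s_k + nu_k.
mix = np.array([[1.0, 0.6, 0.3],
                [0.0, 1.0, 0.6],
                [0.0, 0.0, 1.0]])
s = rng.normal(size=(samples, 3)) @ mix          # signal values s_0, s_1, s_2
x = s + 0.5 * rng.normal(size=(samples, 3))      # noisy data
target = s[:, 1]                                 # estimate the interior value s_1

# Normal equations from the orthogonality principle (Eq. 6.174):
# sum_k alpha_k E[x_k x_i] = E[s_1 x_i] for every data index i.
R = x.T @ x / samples                            # sample E[x_k x_i]
r = x.T @ target / samples                       # sample E[s_1 x_i]
alpha = np.linalg.solve(R, r)

err = target - x @ alpha                         # epsilon = s_1 - s_hat_1
orth = x.T @ err / samples                       # sample version of E[eps x_i]
```

Since t lies inside the data interval, this is a smoothing estimate in the terminology above.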
Smoothing
The noncausal estimate of s_n in terms of the data x_n = s_n + ν_n,

Eq. 6.176:  ŝ_n = ∑_{k=−∞}^{∞} g_k x_{n−k},

is the output of an LTI non-causal system with impulse response g_n (see also Linear system
with noise on page 20). From the orthogonality principle of Eq. 6.174 we find
Eq. 6.177:  E[(s_n − ∑_{k=−∞}^{∞} g_k x_{n−k}) x_{n−m}] = 0

Hence

Eq. 6.178:  r_{sx,m} = ∑_{k=−∞}^{∞} g_k r_{xx,m−k},  −∞ < m < ∞

Taking Fourier transforms on both sides (Eq. 6.60) we obtain h_{sx}(ω) = Γ(ω) h_{xx}(ω) and

Eq. 6.179:  Γ(ω) = h_{sx}(ω) / h_{xx}(ω)

in agreement with Eq. 6.82. The system described by g (or Γ(ω)) can be termed the (discrete
time) noncausal Wiener filter. Thus a prerequisite for smoothing is knowledge of r_{sx,m} (or
h_{sx}(ω)), which reduces to the signal autocovariance r_{ss,m} (or spectrum h_{ss}(ω)) when signal
and noise are uncorrelated.

Example 14. Suppose that s_n is an AR(1) process. We have (Eq. 6.76)

Eq. 6.180:  h_{ss}(ω) = N_0 / |1 − φe^{−iω}|²,  h_{sν}(ω) = 0,  h_{νν}(ω) = N

where N_0, respectively N, represent the power of the noise that generates the AR(1) process
and of the disturbance ν_n. In this case

Eq. 6.181:  h_{xx}(ω) = h_{ss}(ω) + h_{νν}(ω) = (N_0 + N |1 − φe^{−iω}|²) / |1 − φe^{−iω}|²

Combining Eq. 6.179-Eq. 6.181 we find

Eq. 6.182:  Γ(ω) = h_{ss}(ω) / (h_{ss}(ω) + h_{νν}(ω)) = N_0 / (N_0 + N |1 − φe^{−iω}|²)

Thus the frequency response function is close to unity when the signal to noise ratio
h_{ss}(ω)/h_{νν}(ω) is large, and close to zero when this ratio is small, which is intuitively satisfying.
Now if b + b^{−1} = φ + φ^{−1} + N_0/(φN), with 0 < b < φ < 1, we can write

Eq. 6.183:  Γ(ω) = N_0 / (φN |1 − be^{−iω}|²),  g_n = c b^{|n|},  c = bN_0 / (φN(1 − b²))

Hence we need geometrically decaying weights g_n for smoothing an AR(1) process.
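As a quick numerical sketch (Python; the function name is our own), the constants b and c of Eq. 6.183 follow from the quadratic b² − (φ + 1/φ + N_0/(φN)) b + 1 = 0, taking the root inside the unit circle:

```python
import numpy as np

def ar1_smoother_constants(phi, N0, N):
    """Return (b, c) of Eq. 6.183, so that g_n = c * b**abs(n).

    b solves b + 1/b = phi + 1/phi + N0/(phi*N); we take the root
    with 0 < b < phi < 1.
    """
    s = phi + 1.0 / phi + N0 / (phi * N)       # = b + 1/b
    b = (s - np.sqrt(s * s - 4.0)) / 2.0       # root inside the unit circle
    c = b * N0 / (phi * N * (1.0 - b * b))
    return b, c
```

For φ = 0.8, N_0 = 0.36, N = 1 this gives b = 0.5 and c = 0.3, the values used again in the continuation of Example 14 below.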
Prediction
There are many different approaches to the prediction problem. We will treat here the solution
using the difference equation of the ARMA model of the stochastic process (Eq. 6.1), which is
called the Box-Jenkins approach, and may be regarded as a special case of the more general
and more powerful Kalman filter. We shall be concerned with forecasting a value y_{t+l}, l ≥ 1,
when we are currently standing at time t. This forecast is said to be made at origin t for lead
time l. Adopting this notation Eq. 6.1 turns into

Eq. 6.184:  y_{t+l} = φ_1 y_{t+l−1} + φ_2 y_{t+l−2} + … + φ_p y_{t+l−p} + a_{t+l} − θ_1 a_{t+l−1} − … − θ_q a_{t+l−q}

Let us assume that we have estimated the parameters of the ARMA model, as well as the
innovations a_t, a_{t−1}, …, a_{t+1−q}. We now take the conditional expectation of Eq. 6.184,
which we denote as:

Eq. 6.185:  E[y_{t+l} | y_t, y_{t−1}, …] ≡ E_t[y_{t+l}] = ŷ_t(l)

We now have:

Eq. 6.186:
E_t[y_{t+j}] = ŷ_t(j),  j = 1, 2, …
E_t[y_{t−j}] = y_{t−j},  j = 0, 1, 2, …
E_t[a_{t+j}] = 0,  j = 1, 2, …
E_t[a_{t−j}] = a_{t−j} = y_{t−j} − ŷ_{t−j−1}(1),  j = 0, 1, 2, …

Therefore, to obtain the forecast ŷ_t(l) one writes down the model for y_{t+l} and applies Eq.
6.186. In words, observations y_{t+j} which have not yet happened are replaced by their
forecasts; the other observations y_{t−j} are left unchanged. Innovations a_{t+j} which have not yet
happened are replaced by zeroes; the other innovations are available from
a_{t−j} = y_{t−j} − ŷ_{t−j−1}(1), the one step ahead forecast errors.
Prediction can be nicely illustrated in Hilbert space (Fig.6.15.). In this Hilbert space of all
complex random variables U, orthogonal corresponds to uncorrelated, and squared distance
corresponds to mean square:
Eq. 6.187:
E[U] = 0
(U, V) = E[U* V]
‖U‖² = (U, U) = E[|U|²]
‖U − V‖² = E[|U − V|²]

Fig.6.15. Illustration of prediction in Hilbert space.

The subspace H_t consists of all linear combinations of y_t, y_{t−1}, y_{t−2}, … (in Fig.6.15. indicated
by X_t, X_{t−1}, X_{t−2}, …). The vector representation of the random variable y_{t+l} (in Fig.6.15.
X_{t+m}) will in general lie outside the subspace H_t, and the essence of the linear prediction
problem is to find the vector ŷ_{t+l} (in Fig.6.15. X̂_{t+m}) which is closest to y_{t+l}. The minimum
mean square error criterion corresponds to a minimal distance between y_{t+l} and ŷ_{t+l}. The
simple geometrical solution for this is the orthogonal projection of y_{t+l} on the subspace H_t.
The innovation or one step ahead forecast error a_{t+1} = y_{t+1} − ŷ_t(1) is illustrated in the right
hand part of Fig.6.15., where it is indicated by ε_{t+1}. In geometrical terms, it denotes the part
of y_{t+1} which is orthogonal to the subspace H_t.

Example 15. The temperature readings of a chemical process shown in Fig.6.16. are closely
represented by the model

Eq. 6.188:  (1 − 0.8B)(1 − B) y_{t+l} = a_{t+l}

that is

Eq. 6.189:  y_{t+l} = 1.8 y_{t+l−1} − 0.8 y_{t+l−2} + a_{t+l}

The forecasts at origin t are given by
Eq. 6.190:
ŷ_t(1) = 1.8 y_t − 0.8 y_{t−1}
ŷ_t(2) = 1.8 ŷ_t(1) − 0.8 y_t
ŷ_t(l) = 1.8 ŷ_t(l−1) − 0.8 ŷ_t(l−2),  l = 3, 4, …

Fig.6.16. Time series C: temperature readings of a chemical process (top) and forecasts

In general, if the moving average operator θ(B) is of degree q, the forecast equations for
ŷ_t(1), …, ŷ_t(q) will depend directly on the a's, but forecasts at longer lead times will not.
Taking conditional expectations of Eq. 6.184 we have, for l > q,

Eq. 6.191:  ŷ_t(l) − φ_1 ŷ_t(l−1) − … − φ_p ŷ_t(l−p) = 0

where ŷ_t(−j) = y_{t−j} for j ≥ 0. The difference equation Eq. 6.191 has the solution

Eq. 6.192:  ŷ_t(l) = b_0^{(t)} f_0(l) + b_1^{(t)} f_1(l) + … + b_{p−1}^{(t)} f_{p−1}(l)

where f_0(l), f_1(l), …, f_{p−1}(l) are functions of the lead time l. In general, they could
include polynomials, exponentials, sines and cosines, and products of these functions. For a
given origin t the coefficients b_j^{(t)} are constants applying to all lead times l, but they change
from one origin to the next, adapting themselves appropriately to the particular part of the
series being considered. The function ŷ_t(l) defined in Eq. 6.192 is termed the eventual forecast
function, and it is the general autoregressive operator φ(B) which determines its
mathematical form. Examples for AR(p) processes are shown in Fig.6.17.

Updating the forecasts of an ARMA process

Consider the ARMA(1,1) model

Eq. 6.193:  y_{t+l} = φ y_{t+l−1} + a_{t+l} − θ a_{t+l−1}

with forecasts

Eq. 6.194:
ŷ_t(1) = φ y_t − θ a_t
ŷ_t(l) = φ ŷ_t(l−1),  l ≥ 2

Since |φ| < 1 the forecasts decay geometrically to zero. The one step ahead forecast error is

Eq. 6.195:  a_t = y_t − ŷ_{t−1}(1)

Combining Eq. 6.194 and Eq. 6.195 we find

Eq. 6.196:  ŷ_t(1) = φ(ŷ_{t−1}(1) + a_t) − θ a_t = φ ŷ_{t−1}(1) + (φ − θ) a_t = ŷ_{t−1}(2) + (φ − θ) a_t
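The updating identity of Eq. 6.196 is easy to check numerically; a minimal Python sketch (the function name and parameter values are our own):

```python
def arma11_forecasts(y_t, a_t, phi, theta, L):
    """Forecasts yhat_t(1..L) for the ARMA(1,1) model of Eq. 6.193,
    made at origin t from the latest observation y_t and innovation a_t
    (Eq. 6.194)."""
    f = [phi * y_t - theta * a_t]        # yhat_t(1)
    for _ in range(2, L + 1):
        f.append(phi * f[-1])            # yhat_t(l) = phi * yhat_t(l-1)
    return f

phi, theta = 0.9, 0.4
f_old = arma11_forecasts(1.0, 0.2, phi, theta, 2)   # forecasts at origin t-1
a_new = 0.15                                        # next innovation
y_new = f_old[0] + a_new                            # y_t = yhat_{t-1}(1) + a_t
f_new = arma11_forecasts(y_new, a_new, phi, theta, 1)
```

Here f_new[0] agrees with f_old[1] + (φ − θ)·a_new, i.e. the old two step ahead forecast corrected by the new innovation, exactly as Eq. 6.196 states.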
as well as

Eq. 6.197:  ŷ_t(1) = φ y_t − θ(y_t − ŷ_{t−1}(1)) = (φ − θ) y_t + θ ŷ_{t−1}(1)

Eq. 6.196 tells us that going from origin t − 1 to origin t we can simply update our prediction
(at time t + 1) by adding the one step ahead forecast error multiplied by φ − θ.
Eq. 6.197 tells us that the new forecast is a linear combination of the old forecast and
the new observation at time t.
Combining Eq. 6.194 and Eq. 6.196 we find (l ≥ 1)

Eq. 6.198:  ŷ_t(l) = ŷ_{t−1}(l + 1) + (φ − θ) φ^{l−1} a_t

For a general ARMA model a similar equation exists, with a different coefficient for a_t.

The Wiener filter

Consider again the problem of estimating a signal in noise

Eq. 6.199:  x_n = s_n + ν_n

In contrast to the section Smoothing on page 49 we will now estimate a causal filter, the so-
called Wiener filter. Our problem is the determination of the future value s_{n+l} of a stochastic
process in terms of the present and past values of another process x_n:

Eq. 6.200:  ŝ_{n+l}^{l} = ∑_{k=0}^{∞} g_{x,k}^{l} x_{n−k}

From the orthogonality principle, or equivalently the minimum MSE criterion, we have

Eq. 6.201:  s_{n+l} − ŝ_{n+l}^{l} ⊥ x_{n−m},  m ≥ 0

Thus, after multiplication with x_{n−m} and taking the expectation we find

Eq. 6.202:  r_{sx,m+l} = ∑_{k=0}^{∞} g_{x,k}^{l} r_{xx,m−k},  m ≥ 0

This is called the discrete time Wiener-Hopf equation. In contrast to the smoothing Eq. 6.178
we cannot Fourier transform this equation because it only applies for m ≥ 0 (due to causality).
Instead we proceed differently. We express ŝ_{n+l} in terms of the innovations i_{x,n} of x_n;
compare the difference Eq. 6.1 with i_x instead of a, and a solution like Eq. 6.9 in case of an
AR model or Eq. 6.22 in case of a MA model:

Eq. 6.203:  ŝ_{n+l}^{l} = ∑_{k=0}^{∞} g_{ix,k}^{l} i_{x,n−k}

Now since the innovations are orthogonal we get, after multiplication with i_{x,n−m} and taking
the expectation

Eq. 6.204:  r_{six,m+l} = ∑_{k=0}^{∞} g_{ix,k}^{l} δ_{m−k} = g_{ix,m}^{l},  m ≥ 0

because r_{ixix,m−k} = δ_{m−k}. Hence we have the causal impulse response

Eq. 6.205:  g_{ix,m}^{l} = Θ_m r_{six,m+l},  all m

with Θ the step function. On the other hand, the cross covariance function r_{six,m} can be
expressed in terms of r_{sx,m}, by means of the whitening filter γ_{x,k} (with z-transform Γ_x(z)),
see Fig.6.18.

Eq. 6.206:  i_{x,n} = ∑_{k=0}^{∞} γ_{x,k} x_{n−k}

Now we multiply Eq. 6.206 by s_{n+m} and take the expectation value, arriving at the relation:

Eq. 6.207:  E[i_{x,n} s_{n+m}] = ∑_{k=0}^{∞} γ_{x,k} E[x_{n−k} s_{n+m}],  i.e.  r_{six,m} = ∑_{k=0}^{∞} γ_{x,k} r_{sx,m+k}

The z-transform of this relation gives (recall that correlation in one domain corresponds to
multiplication with the complex conjugate in the other domain)

Eq. 6.208:  h_{six}(z) = h_{sx}(z) Γ_x*(z)
Now again we assume that the noise is white and orthogonal to the signal (compare Eq. 6.180)

Eq. 6.209:  h_{sν}(ω) = 0,  h_{νν}(ω) = N

We first treat the pure filter case, with lead time l = 0. Now

Eq. 6.210:
h_{sx}(z) = h_{ss}(z) + h_{sν}(z) = h_{ss}(z)
h_{xx}(z) = h_{ss}(z) + h_{νν}(z) = h_{ss}(z) + N

In practically all cases h_{xx}(z) possesses a spectral factorization

Eq. 6.211:  h_{xx}(z) = |L_x(z)|²

with L(z) the inverse of the whitening filter: L(z) = 1/Γ(z). Combining Eq. 6.210 and
Eq. 6.211 we find

Eq. 6.212:  h_{sx}(z) = |L_x(z)|² − N

which we insert into Eq. 6.208:

Eq. 6.213:  h_{six}(z) = (|L_x(z)|² − N) Γ_x*(z) = L_x(z) − N Γ_x*(z)

Keeping in mind Eq. 6.205 we now wish to find the causal part of Eq. 6.213, including the
value of its inverse at n = 0. Since the inverse z-transform of Γ_x*(z) = Γ_x(z^{−1}) is zero for
n > 0 and for n = 0 it equals Γ_x(0),

Eq. 6.214:
γ_{x,n} = ∮ Γ_x(z) z^{n} dz/(2πiz) = 0,  n < 0
∮ Γ_x(z^{−1}) z^{n} dz/(2πiz) = 0,  n > 0
∮ Γ_x(z^{−1}) dz/(2πiz) = Γ_x(0),  n = 0

we find for the z-transform of g_{ix,m}^{0}

Eq. 6.215:  G_{ix}^{0}(z) = L_x(z) − N Γ_x(0)

and since G_x^{l}(z) = Γ_x(z) G_{ix}^{l}(z) (see Fig.6.18., where H_r indicates G^{l}) we finally find the
Wiener filter

Eq. 6.216:  G_x^{0}(z) = 1 − N Γ_x(0) Γ_x(z)

It can be proved (Papoulis, p.452) that with lead time l > 0
Parameter Estimation and System Identification Time series analysis
Eq. 6.217:  G_x^{l}(z) = z^{l} [1 − L_x^{−1}(z) ∑_{k=0}^{l−1} λ_{x,k} z^{−k}]

where λ_{x,k} is the inverse z-transform of L_x(z), the innovations filter for x_n.

Example 14, continued. We want to find the one step ahead Wiener filter for an AR(1) process
corrupted by white noise. First we need to find L_x(z), the innovations filter for x_n, from the
spectral factorization of h_{xx}(ω) (Eq. 6.181):

Eq. 6.218:  L_x(z) = √(Nφ/b) (1 − bz^{−1})/(1 − φz^{−1})

Inserting this into Eq. 6.217 with l = 1 gives

Eq. 6.219:  G_x^{1}(z) = z(1 − L_x^{−1}(z) λ_{x,0}) = z(1 − (z − φ)/(z − b)) = (φ − b) z/(z − b)

since λ_{x,0} = ∮ L_x(z) dz/(2πiz) = res_{z=0} + res_{z=φ} = √(Nφ/b) (b/φ + (φ − b)/φ) = √(Nφ/b). The
impulse response of the one step ahead Wiener filter is thus given by

Eq. 6.220:  g_{x,n}^{1} = (φ − b) b^{n} Θ_n

which like the smoothing filter of Eq. 6.183 decays geometrically.

The Kalman filter

We now extend the preceding results to nonstationary processes with causal data. Again we
assume a white noise disturbance ν_t, and furthermore we assume that the signal is an ARMA
process. In contrast to the Wiener filter we will now assume that we have a finite number of
observations x_0, …, x_n of the data x_n = s_n + ν_n:

Eq. 6.221:  ŝ_{n+l}^{l} = E[s_{n+l} | x_k, 0 ≤ k ≤ n] = ∑_{k=0}^{n} g_x^{l}[n, k] x_k

Thus ŝ_{n+l}^{l} is the output of a causal, time-varying system with input x_n Θ(n), and our
problem is to find its impulse response g_x^{l}[n, k]. From the orthogonality principle we have

Eq. 6.222:  s_{n+l} − ŝ_{n+l}^{l} ⊥ x_{n−m},  0 ≤ m ≤ n

Again, after multiplication with x_{n−m} and taking the expectation we find
February 26, 2003 page 6-58 VU NI
Eq. 6.223:  r_{sx}[n+l, m] = ∑_{k=0}^{n} g_x^{l}[n, k] r_{xx}[k, m],  0 ≤ m ≤ n

For a specific n this yields n + 1 equations for the n + 1 unknowns g_x^{l}[n, k].
Analogous to the Wiener filter we proceed by expressing the desired estimate ŝ_{n+l}^{l}
in terms of the Kalman innovations i_{x,n}:

Eq. 6.224:  i_{x,n} = ∑_{k=0}^{n} γ_x[n, k] x_k

of the process x_n Θ_n, where γ_x[n, k] is the Kalman whitening filter. The process i_{x,n} is
orthonormal (white noise with unit power) and, if the data are linearly independent, then the
processes x_n and i_{x,n} are linearly equivalent. Thus analogous to Eq. 6.203 we have

Eq. 6.225:  ŝ_{n+l}^{l} = ∑_{k=0}^{n} g_{ix}^{l}[n, k] i_{x,k}

To determine g_{ix}^{l}[n, k] we apply the orthogonality principle (compare Eq. 6.204)

Eq. 6.226:  r_{six}[n+l, m] = ∑_{k=0}^{n} g_{ix}^{l}[n, k] δ_{m−k} = g_{ix}^{l}[n, m],  0 ≤ m ≤ n

And analogous to Eq. 6.207 we can express this in terms of the cross covariance r_{sx}[m, n]:

Eq. 6.227:  r_{six}[m, n] = ∑_{k=0}^{n} γ_x[n, k] r_{sx}[m, k]

Thus for a specific m, r_{six}[m, n] is the response of the Kalman whitening filter of x_n to the
function r_{sx}[m, n] where n is the variable. To complete the specification of ŝ_{n+l}^{l} we must
cascade the filter g_{ix}^{l}[n, k] with the whitening filter γ_x[n, k] as in Fig.6.19., which can be
considered the analog of Fig.6.18. for nonstationary processes.
ARMA signals in white noise
The numerical implementation of the Kalman filter can be drastically simplified for ARMA
signals in white noise (orthogonal to the signal):

Eq. 6.228:  r_{νν}[m, n] = N_n δ_{m−n},  r_{sν}[m, n] = 0

Proofs of the following can be found in Papoulis, §13-6.
The difference x_n − ŝ_n^{0} between the data and the estimated signal is proportional to the Kalman
innovations i_{x,n} of the data:

Eq. 6.229:  x_n − ŝ_n^{0} = D_n i_{x,n},  D_n² = E[(x_n − ŝ_n^{0})²]

The estimate ŝ_{n+l}^{l} of s_{n+l} equals the pure predictor of the estimate ŝ_n^{0} of s_n:

Eq. 6.230:  ŝ_{n+l}^{l} = ∑_{k=0}^{n} g^{l}[n, k] ŝ_k^{0}

Thus filtering and prediction can be reduced to a cascade of a pure filter and a pure predictor,
which is illustrated in Fig.6.20. When the signal is a time varying ARMA process

Eq. 6.231:  s_n − a_1^{n} s_{n−1} − … − a_M^{n} s_{n−M} = ∑_{k=0}^{M−1} b_k^{n} ζ_{n−k},  r_{ζζ}[m, n] = V^{n} δ_{m−n}

then the estimate ŝ_n^{0} is also an ARMA process where the AR coefficients are the same as in
Eq. 6.231 and the MA coefficients are M constants to be determined:

Eq. 6.232:  ŝ_n^{0} − a_1^{n} ŝ_{n−1}^{0} − … − a_M^{n} ŝ_{n−M}^{0} = ∑_{k=0}^{M−1} c_k^{n} i_{x,n−k}

These equations are illustrated in Fig.6.21.
The recursion Eq. 6.232 can be written as a system of M first-order equations (state equations)
or, equivalently, as a first order vector equation. The unknowns are the scalar D_n and the
coefficients c_k^{n}. We shall illustrate an AR(1) model, which provides a simple scalar case.
If

Eq. 6.233:  s_n − A_n s_{n−1} = ζ_n,  E[ζ_n²] = V^{n}

then Eq. 6.232 yields

Eq. 6.234:  ŝ_n^{0} − A_n ŝ_{n−1}^{0} = K_n (x_n − A_n ŝ_{n−1}^{0})

where

Eq. 6.235:  K_n = P_n/N_n = (A_n² P_{n−1} + V^{n})/(A_n² P_{n−1} + V^{n} + N_n)

We can compare Eq. 6.234 with the forecast updates Eq. 6.196 and Eq. 6.198, where we found
that the new prediction ŷ_t(l) is derived from the old prediction ŷ_{t−1}(l + 1) plus a correction
term proportional to the innovation; here the new estimate follows from the old prediction
A_n ŝ_{n−1}^{0} in the same way.
P_n in Eq. 6.235 is the mean square error criterion P defined in Eq. 6.173:

Eq. 6.236:  P_n = E[ε_n²],  ε_n = s_n − ŝ_n^{0}

The corresponding system is shown in Fig.6.22., where we also show the realization of the
one-step predictor of s_{n+1}:

Eq. 6.237:  ŝ_{n+1}^{1} = A_n ŝ_n^{0}

The estimate ŝ_n^{0} of s_n is determined recursively: if K_{n−1} and ŝ_{n−1}^{0} are known, then K_n is
determined from Eq. 6.235 and ŝ_n^{0} from Eq. 6.234. To start the iteration we must specify the
initial conditions of Eq. 6.233. We shall assume that

Eq. 6.238:  s_0 = ζ_0,  ŝ_0^{0} = K_0 x_0,  K_0 = V^{0}/(V^{0} + N_0),  P_0 = V^{0} N_0/(V^{0} + N_0)

Example 14, continued. We shall determine the noncausal, causal and Kalman estimate ŝ_n of an
AR(1) process s_n in terms of the data x_n = s_n + ν_n, and the corresponding MSE P. We
assume that the process satisfies

Eq. 6.239:  s_n − 0.8 s_{n−1} = ζ_n

and that r_{ζζ,m} = 0.36 δ_m, r_{ζν,m} = 0, r_{νν,m} = δ_m. This is a special case of Eq. 6.180 with
φ = 0.8, N = 1, N_0 = 0.36, b = 0.5.
• Smoothing: x_k is available for all k. We use the impulse response of Eq. 6.183 to find

Eq. 6.240:  g_n = 0.3 × 0.5^{|n|}

with MSE

Eq. 6.241:  P = E[(s_n − ∑_{k=−∞}^{∞} g_k x_{n−k}) s_n] = r_{ss,0} − ∑_{k=−∞}^{∞} g_k r_{sx,k} = (N_0/(1 − φ²))(1 − c ∑_{k=−∞}^{∞} (φb)^{|k|}) = bN_0/(φ(1 − b²)) = 0.3

• Causal (Wiener) filter: x_k is available for k ≤ n. We now use Eq. 6.216 and Eq. 6.218
together with Γ_x(0) = λ_{x,0}^{−1} = √(b/(Nφ)) to find the estimator

Eq. 6.242:  G_x^{0}(z) = 1 − N (b/(Nφ)) (1 − φz^{−1})/(1 − bz^{−1}) = (φ − b)/(φ(1 − bz^{−1})) = 0.375 z/(z − 0.5)

with impulse response 0.375 × 0.5^{n} Θ_n. The estimate ŝ_n satisfies the recursion equation

Eq. 6.243:  ŝ_n − 0.5 ŝ_{n−1} = 0.375 x_n,  n ≥ 0

The resulting MSE equals

Eq. 6.244:  P = r_{ss,0} − ∑_{k=0}^{∞} g_k r_{sx,k} = 0.375

• Kalman filter: x_k is available for 0 ≤ k ≤ n. Our case is a special case of Eq. 6.233 with
A_n = 0.8, V^{n} = 0.36 and N_n = 1. Solution of Eq. 6.235 using Eq. 6.238 yields

Eq. 6.245:  K_n = P_n = (0.48 z_1^{n} − 0.12 z_2^{n})/(1.28 z_1^{n} + 0.08 z_2^{n}),  z_1 = 1.6,  z_2 = 0.4

with K_n = P_n = 0.375 for n ≥ 4. Then Eq. 6.234 yields

Eq. 6.246:  ŝ_n^{0} − 0.8 ŝ_{n−1}^{0} = 0.375 (x_n − 0.8 ŝ_{n−1}^{0})

which is equal to Eq. 6.243. The above shows that, if the process s_n is WSS (wide sense
stationary), then its Kalman filter approaches the Wiener filter as n → ∞.
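The convergence K_n → 0.375 can be reproduced by iterating Eq. 6.235 directly (a small Python sketch; the function name is our own):

```python
def scalar_kalman_gains(A, V, N, steps):
    """Iterate the scalar gain recursion of Eq. 6.235, starting from the
    initial MSE P_0 = V*N/(V + N) of Eq. 6.238; returns K_0..K_steps,
    where K_n = P_n / N_n with constant measurement noise power N_n = N."""
    P = V * N / (V + N)
    gains = [P / N]
    for _ in range(steps):
        P = N * (A * A * P + V) / (A * A * P + V + N)   # P_n from P_{n-1}
        gains.append(P / N)
    return gains

gains = scalar_kalman_gains(A=0.8, V=0.36, N=1.0, steps=10)
```

With the parameters of Example 14 the gains rise from K_0 = 0.36/1.36 ≈ 0.265 to the steady-state Wiener value 0.375 within a few steps.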
State space representation of Kalman filter

ARMA processes and the state space formulation of a linear system are equivalent
representations (recall that any finite order difference equation can be expressed as a vector
first order equation). For example, if we take the AR(2) model

Eq. 6.247:  y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + ε_t

and write x_t^{(2)} = y_t, x_t^{(1)} = −φ_2 y_{t−1} = −φ_2 x_{t−1}^{(2)}, then Eq. 6.247 may be re-written as

Eq. 6.248:  (x_t^{(1)}, x_t^{(2)})ᵀ = [[0, −φ_2], [−1, φ_1]] (x_{t−1}^{(1)}, x_{t−1}^{(2)})ᵀ + (0, 1)ᵀ ε_t,  y_t = (0, 1)(x_t^{(1)}, x_t^{(2)})ᵀ

Suppose we have the following linear discrete-time state space model
Eq. 6.249:
x_{t+1} = A_t x_t + ε_t    (the evolution equation)
y_t = C_t x_t + ν_t    (the measurement equation)

with x_t an n-dimensional state vector to be estimated from the m × 1 measurement vector y_t,
using A_t, an n × n known system matrix, C_t, an m × n known measurement matrix, and the
fact that ε_t and ν_t are mutually independent Gaussian white noise processes with zero mean
and known covariance matrices V_t and N_t. Furthermore we need the initial condition that x_0
is a Gaussian distributed random vector with mean x̂_0^{0} and with known covariance matrix P_0^{0}.
We will write x̂_t^{s}, where the subscript t denotes time, whereas the superscript denotes
filtering in case s = t and prediction in case s < t. The Kalman filter solves the following
problem: compute the unbiased recursive minimum variance estimate x̂_{t+1}^{t+1} of the stochastic
vector x_{t+1} at time t + 1, provided y_1 up to y_{t+1} have been measured. The solution is:

Eq. 6.250:
x̂_{t+1}^{t} = A_t x̂_t^{t}    (prediction equation)
x̂_{t+1}^{t+1} = x̂_{t+1}^{t} + K_{t+1}(y_{t+1} − C_{t+1} x̂_{t+1}^{t})    (filtering equation)
P_{t+1}^{t} = A_t P_t^{t} A_t^{T} + V_t    (prediction equation)
P_{t+1}^{t+1} = (I − K_{t+1} C_{t+1}) P_{t+1}^{t}    (filtering equation)
K_{t+1} = P_{t+1}^{t} C_{t+1}^{T} (C_{t+1} P_{t+1}^{t} C_{t+1}^{T} + N_{t+1})^{−1}    (Kalman gain matrix)

In case of an AR(1) process we find Eq. 6.233-Eq. 6.237, where s_n represents the state and x_n
the measurement with additive noise ν_n, with ŝ_n^{0} = x̂_t^{t}, ŝ_{n+1}^{1} = x̂_{t+1}^{t} and A_n = A_t.
The Kalman gain matrix can be easily understood if we take for C_t the identity matrix. We
then find K_{t+1} = P_{t+1}^{t}(P_{t+1}^{t} + N_{t+1})^{−1}, which gives in combination with the filtering
equation K_{t+1} = P_{t+1}^{t+1} N_{t+1}^{−1}. Thus the gain matrix is proportional to the uncertainty in the
estimate and inversely proportional to the measurement noise. With small K the difference
between the actual and predicted measurements will only be used for small corrections in the
estimate, whereas with large K this difference will lead to large corrections in the estimates.

Example 16. Estimation of position using two sensors: a distance sensor A (with large random
errors) and a shaft encoder B (incremental encoder with small random errors but with a
systematic error). Four Kalman filters have been used: A, B, A+B, and A+B in an Extended
Kalman filter (which models the systematic error nonlinearly). The state vector is composed of
position, velocity and acceleration. Uniformly accelerated motion gives the evolution
equation:

Eq. 6.251:  (s, v, a)ᵀ_{t+1} = [[1, 1, 1/2], [0, 1, 1], [0, 0, 1]] (s, v, a)ᵀ_t + ε_t

distance sensor: C_t = (1, 0, 0); shaft encoder: C_t = (0, 1, −1/2). The prediction errors of a
cart simulation are illustrated in Fig.6.23.
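A minimal sketch of this filter in Python (NumPy; the noise levels, initial values and 200-step run are our own illustrative assumptions, not those used for Fig.6.23.):

```python
import numpy as np

def kalman_step(x, P, y, A, C, V, N):
    """One prediction/filtering cycle of Eq. 6.250."""
    x_pred = A @ x                                            # prediction equation
    P_pred = A @ P @ A.T + V                                  # prediction equation
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + N)    # Kalman gain matrix
    x_new = x_pred + K @ (y - C @ x_pred)                     # filtering equation
    P_new = (np.eye(len(x)) - K @ C) @ P_pred                 # filtering equation
    return x_new, P_new

# Constant-acceleration model of Eq. 6.251, observed by the distance sensor only.
A = np.array([[1.0, 1.0, 0.5],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
C = np.array([[1.0, 0.0, 0.0]])    # distance sensor reads position
V = 1e-6 * np.eye(3)               # assumed (small) process noise covariance
N = np.array([[4.0]])              # assumed (large) sensor variance

rng = np.random.default_rng(1)
truth = np.array([0.0, 1.0, 0.2])                 # true s, v, a
x, P = np.zeros(3), 10.0 * np.eye(3)
for _ in range(200):
    truth = A @ truth
    y = C @ truth + rng.normal(0.0, 2.0, size=1)  # noisy distance reading
    x, P = kalman_step(x, P, y, A, C, V, N)
```

Even though only the noisy position is measured, the filter recovers velocity and acceleration through the evolution model, which is the mechanism exploited in Example 16.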
References

Box, G.E.P., and Jenkins, G.M. (1976) Time Series Analysis: forecasting and control. Holden-Day, San Francisco.
Harvey, A.C. (1989) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge.
Harvey, A.C. (1990) The econometric analysis of time series. 2nd edition. Philip Allan, New York.
Kay, S.M., and Marple, S.L. (1981) Spectrum analysis. A modern perspective. Proc. IEEE 69, 1380-1419.
Lagerberg, J. (1991) Handout Caput College Autonome Robotica, FWI, UvA.
Papoulis, A. (1984) Probability, random variables and stochastic processes. McGraw-Hill Book Co., Singapore.
Priestley, M.B. (1981) Spectral Analysis and Time Series. Vol. I+II. Academic Press, London.
Roberts, R.A., and Mullis, C.T. (1987) Digital signal processing. Addison Wesley Publ. Co, Reading, MA.
Digital Signal Processing Stochastic point processes
7 Stochastic point processes

A point process on a space Ω is a stochastic process which by chance indicates a discrete set
of points of Ω. Usually Ω is a time interval, but there are also examples where "events" occur in a plane.
We will only consider point processes where the events are indistinguishable (there are also
marked point processes).
Some examples:
• emissions from a radioactive source produce a random time series, where each emission
corresponds to an event of the point process
• detection times of photons from a radioactive source (different from the previous
example because each photon counted causes a “dead time” of the detector)
• action potentials generated by a neuron. Sensory neurons, interneurons and
motor neurons transfer information, among other mechanisms, by means of action
potentials: short duration (about one millisecond) pulse-like electric activity which
propagates along the neuronal membrane and releases transmitter at synapses. A series
of action potentials can be regarded as a realization of a point process. See Fig.7.1.
• queueing problems. Queueing theory describes a large class of phenomena involving the
arrivals, waiting, servicing and departures of objects (customers).
• the occurrences (in time or place) of catastrophes like earthquakes, airplane crashes, car
accidents, lightning, soldiers falling from their horses, etc.
• positions of stars around a center of gravity in space.
• heart beats
• electric organ discharges from weakly electric pulse fish (e.g. the elephant-nose fish)
Among ordinary stochastic processes the simplest was (Gaussian) white noise, where the
values at different times are uncorrelated and thus the power spectrum is flat. The
analogue of white noise for stochastic point processes is the well known Poisson process.
The Poisson process
Fig.7.1. Examples of extracellular recordings of action potentials from neurons, which
appear as spikes on a noisy baseline. The middle trace shows spikes of different
shape which belong to different neurons.

The Poisson process is the most important point process. Events are independent and
the probability of an event in a small interval of length Δt is proportional to this Δt. We will
first consider stationary, or homogeneous, processes. Define the counting process N(t) as the
number of events which have occurred up to time t, and the change of N within a small
interval of length Δt as:

Eq. 7.1:  ΔN(t) ≡ N(t + Δt) − N(t)

We have for the probabilities of occurrence of events:

Eq. 7.2:
P[ΔN(t) = 1] = λΔt + o(Δt)
P[ΔN(t) = 0] = 1 − λΔt + o(Δt)
P[ΔN(t) > 1] = o(Δt)

where λ denotes the intensity of the Poisson process. A point process with a vanishing
probability of more than one event within the interval Δt is called orderly. When λ depends on
time t we are dealing with a nonstationary or inhomogeneous Poisson process. Properties of
the Poisson process are:
• N(t) is an independent increment process: if t_1 < t_2 < … < t_k then
N(t_1), N(t_2) − N(t_1), …, N(t_k) − N(t_{k−1}) are statistically independent.
• The number of events in an interval I is Poisson distributed with parameter ∫_I λ(s) ds.
With a homogeneous Poisson process and interval [0, t) we have:

Eq. 7.3:  P[N(t) = k] = ((λt)^k / k!) e^{−λt}

The characteristic function of a stochast X is defined by φ(θ) = E[e^{iθX}] = ∫ e^{iθx} p(x) dx.
For the Poisson distribution we find p(x) = ∑_{k=0}^{∞} ((λt)^k / k!) e^{−λt} δ(x − k) and thus

Eq. 7.4:  φ(θ) = E[e^{iθN}] = ∑_{k=0}^{∞} e^{iθk} ((λt)^k / k!) e^{−λt} = e^{−λt} ∑_{k=0}^{∞} (λt e^{iθ})^k / k! = exp(λt(e^{iθ} − 1))

Moments are defined by μ_k ≡ E[X^k] = ∫ x^k p(x) dx and can be found from φ(θ) by
differentiation:

Eq. 7.5:  μ_k = i^{−k} ∂^k φ(θ)/∂θ^k |_{θ=0}

For the Poisson distribution we find

Eq. 7.6:  μ_1 = λt,  μ_2 = λt + (λt)²,  var(N) = μ_2 − μ_1² = μ_1

Fig.7.2. Relation between counting process N(t) and sum of Poisson impulses z(t).

• Autocorrelation and spectrum: if z(t) is a sum of Poisson impulses
    z(t) = Σ_i δ(t − t_i)                                                    Eq. 7.7

then z(t) is a stationary process with mean

    E[z(t)] = λ                                                              Eq. 7.8

and the autocovariance of z(t) is given by

    r(τ) = E[z(t) z(t+τ)] − E²[z(t)] = λ δ(τ)                                Eq. 7.9

Proof: z(t) is the derivative of the Poisson counting process N(t), thus

    E[z(t)] = E[dN(t)/dt] = d E[N(t)]/dt = λ                                 Eq. 7.10

where in the second step we have used the fact that differentiation and expectation are both
linear operators and thus can be interchanged. The second moment of N(t) is given by
(assume t_2 ≥ t_1)

    μ_{2,N}(t_1, t_2) = E[N(t_1) N(t_2)] = E[N(t_1)(N(t_1) + N(t_2) − N(t_1))]
                      = λ²t_1² + λt_1 + λ²(t_2 − t_1)t_1 = λ²t_1t_2 + λt_1
                      = λ²t_1t_2 + λ min(t_1, t_2)                           Eq. 7.11

where we have used the fact that N(t) is an independent increment process and Eq. 7.6.
For the second moment of z(t) we twice interchange differentiation and expectation:

    μ_{2,z}(t_1, t_2) = E[∂N(t_1)/∂t_1 · ∂N(t_2)/∂t_2]
                      = ∂²/∂t_1∂t_2 (λ²t_1t_2 + λ min(t_1, t_2))
                      = ∂/∂t_1 (λ²t_1 + λΘ(t_1 − t_2)) = λ² + λδ(t_1 − t_2)  Eq. 7.12

where Θ is the step function, the integral of the δ-function. Insertion of Eq. 7.10 and Eq. 7.12
into Eq. 7.9 gives the desired result.
Thus from Eq. 7.9 we find that the power spectrum of the sum of Poisson impulses z(t) is
flat, analogous to white noise:

    h(ω) = (1/2π) ∫_{−∞}^{∞} r(τ) e^{−iωτ} dτ = λ/2π                         Eq. 7.13
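The δ-shaped autocovariance of Eq. 7.9 can be seen in a discretized simulation; in the sketch below (rate and bin width are illustrative assumptions) z(t) is approximated by binned counts divided by the bin width, so the lag-0 autocovariance grows like λ/∆t while every nonzero lag stays near zero, consistent with the flat spectrum of Eq. 7.13:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, T, dt = 50.0, 200.0, 1e-3      # illustrative rate, duration, bin width

# Approximate z(t) = sum_i delta(t - t_i) by bin counts divided by dt
z = rng.poisson(lam * dt, size=int(T / dt)) / dt
zc = z - z.mean()

# r(tau) = lam*delta(tau) (Eq. 7.9): lag 0 gives ~lam/dt in this
# discretization, any nonzero lag gives ~0
r0 = np.mean(zc * zc)
r5 = np.mean(zc[:-5] * zc[5:])
print(r0 * dt, r5 * dt)             # ~lam and ~0
```

The delta function shows up as the 1/∆t blow-up at lag zero: the impulse train has all its covariance concentrated in a single bin.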
Shot noise
In an electronic device, the emission of electrons or holes from an electrode (e.g. the collector
of a transistor or cathode of a thermionic valve) occurs in a random manner. This results in
variations in the current output from the device, called shot noise. Shot noise can be modelled
as a filtered sum of Poisson points:

    s(t) = ∫ g(t − u) z(u) du = ∫ g(t − u) dN(u) = Σ_{i=1}^{N(t)} g(t − t_i)           Eq. 7.14

where the impulse response g(t) results from each event occurring at time t_i.

Fig.7.3. Generation of shot noise.

We now find for the mean of the shot noise (assuming the input is a stationary Poisson
process):

    E[s(t)] = ∫_0^∞ g(u) E[z(t − u)] du = λ ∫_0^∞ g(u) du = λ G(0)                     Eq. 7.15

where G(ω) is the Fourier transform of g(t). The power spectrum of the shot noise is given
by the product of the square of G(ω) and the (flat) power spectrum of the Poisson impulses:

    h_s(ω) = |G(ω)|² λ                                                                 Eq. 7.16

For the covariance of the shot noise we thus find:

    r_s(t) = ∫ h_s(ω) e^{iωt} dω = λ ∫ |G(ω)|² e^{iωt} dω = λ ∫ g(u) g(u + t) du       Eq. 7.17

where in the last step we used the property of the Fourier transform that correlation in one
domain corresponds to multiplication with the complex conjugate in the other domain
(compare convolution in one domain and multiplication in the other domain):

    ∫ g(u) g(u + t) du = ∫ du ∫ G(ω') e^{iω'u} dω' ∫ G(ω) e^{iω(u+t)} dω
                       = ∫ G(ω') dω' ∫ G(ω) e^{iωt} δ(ω + ω') dω
                       = ∫ |G(ω)|² e^{iωt} dω                                          Eq. 7.18

Eq. 7.17 is known as Campbell's theorem.
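Campbell's theorem is easy to check by simulation. The sketch below assumes an exponential impulse response g(t) = e^{−t/τ_g} (an illustrative choice, not from the text), for which E[s] = λ∫g du = λτ_g and var(s) = r_s(0) = λ∫g² du = λτ_g/2:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, tau_g, T = 20.0, 0.05, 5000.0    # event rate, g decay time, duration (illustrative)

n_events = rng.poisson(lam * T)
events = np.sort(rng.uniform(0.0, T, n_events))   # homogeneous Poisson event times

# s(t) = sum_i g(t - t_i) with g(t) = exp(-t/tau_g) for t >= 0   (Eq. 7.14)
t_sample = np.linspace(10.0, T, 20_000)
lo = np.searchsorted(events, t_sample - 2.0)      # 2 s window = 40 decay times of g
hi = np.searchsorted(events, t_sample)
s = np.array([np.exp(-(t - events[l:h]) / tau_g).sum()
              for t, l, h in zip(t_sample, lo, hi)])

# Campbell: E[s] = lam * integral(g) ;  var(s) = lam * integral(g^2)
print(s.mean())   # ~ lam*tau_g   = 1.0
print(s.var())    # ~ lam*tau_g/2 = 0.5
```

Truncating the sum to events within 40 decay times of each sample point introduces a negligible error here.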
Application of point processes and correlation to auditory neurophysiology
The famous theoretical neurobiologist Warren McCulloch (of the McCulloch-Pitts
neural networks of the forties) wrote: “If I point with my finger, don’t look at my finger”. This
idea has been applied elegantly in the investigation of (the first neuronal parts of) the auditory
system. Sound impinging on the tympanum is translated to point processes in the auditory
nerve, which are processed further in other nuclei in the brain. The brain somehow makes
sense of these point processes and we sense or even perceive noise, speech, and music. Two
tasks can be distinguished: the identification and the localization of sound. We will sketch
some applications related to identification, on the one hand looking at the neuronal response
to different sound stimuli (forward correlation), on the other hand trying to characterize the
sound preceding the action potentials of an auditory neuron (reverse correlation, a method
introduced by De Boer (UvA), and Johannesma, Aertsen, Eggermont (KUN)).
Although most of the auditory system is non-linear, some methods developed from linear
systems theory are still useful: sometimes linearization around a working point is possible,
and sometimes a Taylor-like expansion of system properties makes sense.

Time dependent correlation functions and coincidence histograms
In general the response of a neuron to a stationary stimulus like noise or Poisson clicks is
nonstationary because of processes like adaptation (short-term or long-term) and habituation
(getting used to a stimulus, so that the stimulus is no longer interesting). To study time
dependent correlations between non-stationary point processes, time dependent correlation
functions were introduced by Van Stokkum, Johannesma, and Eggermont (KUN).
We start with a realization of the two point processes A and B, which we represent as series
of δ-functions:
    A(t) = Σ_{i=1}^{N_A} δ(t − a_i),    B(t) = Σ_{j=1}^{N_B} δ(t − b_j)              Eq. 7.19

and define the time dependent crosscorrelation function as

    CC_{AB}(t, τ) = Σ_i δ(t − a_i) Σ_j δ(t + τ − b_j)
                  = Σ_i δ(t − a_i) Σ_j δ(τ − (b_j − a_i))                            Eq. 7.20

Thus we estimate CC_{AB}(t, τ) from a single realization of the two point processes A and B.
Under the assumption that the processes are stationary, time averaging makes sense. We divide
the time difference τ in bins of width ∆ and find the crosscoincidence histogram

    CCH_{AB}(m) = (1/T∆) ∫_0^T dt ∫_{(m−1/2)∆}^{(m+1/2)∆} dτ CC_{AB}(t, τ)
                = (1/T∆) ∫_{(m−1/2)∆}^{(m+1/2)∆} dτ Σ_{i,j} δ(τ − (b_j − a_i))       Eq. 7.21

The definitions for the time dependent autocorrelation function and autocoincidence
histogram follow naturally by replacing B with A in Eq. 7.19-Eq. 7.21.
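Eq. 7.21 reduces to histogramming all time differences b_j − a_i in bins of width ∆. A minimal sketch (event rates and bin parameters are illustrative assumptions): for two independent stationary Poisson trains the crosscoincidence histogram should be flat at the product of the rates.

```python
import numpy as np

def cch(a, b, delta, n_bins, T):
    """Crosscoincidence histogram (Eq. 7.21): bin the differences b_j - a_i
    into bins of width delta centred at m*delta, normalized by T*delta."""
    w = (n_bins + 0.5) * delta
    lo = np.searchsorted(b, a - w)            # only pairs within the lag range
    hi = np.searchsorted(b, a + w)
    diffs = np.concatenate([b[l:h] - ai for ai, l, h in zip(a, lo, hi)])
    edges = (np.arange(-n_bins, n_bins + 2) - 0.5) * delta
    counts, _ = np.histogram(diffs, bins=edges)
    return counts / (T * delta)

rng = np.random.default_rng(3)
T, rate_a, rate_b = 1000.0, 5.0, 8.0
a = np.sort(rng.uniform(0.0, T, rng.poisson(rate_a * T)))
b = np.sort(rng.uniform(0.0, T, rng.poisson(rate_b * T)))

h = cch(a, b, delta=0.01, n_bins=50, T=T)
print(h.mean())        # ~ rate_a * rate_b = 40 for independent trains
```

Any structure above or below this flat baseline (as in Fig.7.4-7.5) signals a dependence between the two point processes.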
The effect of interchanging the point processes A and B illustrates the differences between
forward and reverse correlation. Let A be a Poisson distributed click stimulus and let B be the
action potentials of an auditory midbrain neuron in response to this stimulus. The forward
correlation, CC_{AB}(t, τ), is illustrated in Fig.7.4.b, where vertical bars are visible, indicating
that the neuron responds with a single spike within an interval of 50 ms, preferably at a
latency of about 10 or 30 ms. The reverse correlation is illustrated in Fig.7.4.a. The clicks
following a spike (τ > 0) are randomly distributed with respect to the time delay τ. The clicks
preceding a spike constitute the so-called Pre Event Stimulus Ensemble (PESE, introduced by
Johannesma). A clear structure is visible: the stimuli of the PESE consist mostly of two clicks
(as can be deduced from the different symbols) separated by an interval of 20 ms, and situated
at 30 and 10 ms before the spike. The very selective response of this neuron is probably
related to its selectivity for low frequency sounds, which was found when presenting other
stimuli.
Concerning the non-stationarity of the response, the neuron displayed a spontaneous activity
of 2 spikes/s which was first suppressed by the click stimulus (top of Fig.7.4.). This
suppression gradually declined. The time averaged Cross Coincidence Histograms at the
bottom show the averaged pre event click stimulus (left) and the post stimulus time histogram
(PSTH). The latter is related to the excitatory influence of a click, and thus to the generator
potential of a neuron.

Fig.7.4.

In Fig.7.5. two more examples of time dependent crosscorrelation diagrams between click
stimulus and spikes from brainstem (Dorsal Medullary Nucleus and Torus Semicircularis)
neurons are shown. The shaded histogram at the bottom indicates the first click preceding a
spike. At the left the DMN neuron again shows facilitation: two clicks within an interval of
about 6 ms are necessary to elicit a spike. The TS neuron in Fig.7.5.b shows suppression of
spontaneous activity followed by activation. Again the shading indicates that multiple clicks
are necessary to elicit a spike. Thus, interpreted in the forward direction, the influence of a
click is first inhibitory and then excitatory.
Fig.7.5.

Linear systems analysis of the frog middle ear
When studying a linear system, three different paradigms in principle give rise to the same
characterization in terms of impulse response or transfer function:
• harmonic analysis or frequency approach (Fig.7.6.)
• impulse response or structured multi-frequency stimulus approach (Fig.7.7.)
• unstructured multi-frequency stimuli: the white noise approach (Fig.7.8.)
Linear systems analysis applied to the auditory nerve fibre responses
In contrast with the continuous output signals of the previous section, we now deal with point
processes. The usual assumption is stationarity of the response, and then time averaging
results in spike rate histograms as a function of frequency (Fig.7.9. left), or Post Stimulus
Time Histograms after a single (condensation or rarefaction) click repeatedly administered
(Fig.7.10.). Before we consider the white noise approach we first present a simplified model
for the peripheral auditory system (excluding the linear middle ear). It can be shown that
Fig.7.6.
Fig.7.7.
Fig.7.8.
Fig.7.9.
Fig.7.10.
    n_0(t) = ∫_0^∞ k(σ) ∫_0^∞ h(τ) x(t − τ − σ) dτ dσ                                Eq. 7.22

Thus if x(t) is a click then n_0(t) is the impulse response of the cascaded linear bandpass
filter and the low-pass filter. The compound PSTH (Fig.7.10. left) therefore does not betray
the algebraic non-linearity. It appears that a logical extension of the methods for the analysis
of linear systems to the auditory nervous system is the reverse correlation method using white
noise as input signal. For a linear system this method should also yield the impulse response
of the neuron.
Let us write down the time dependent correlation between the spikes z(t) and the stimulus
x(t):

    CC_{xz}(t, τ) = x(t) z(t + τ) = Σ_{i=1}^N δ(t + τ − t_i) x(t)
                  = Σ_{i=1}^N δ(t + τ − t_i) x(t_i − τ)                              Eq. 7.23

where the x(t_i − τ) constitute the elements of the Pre Event Stimulus Ensemble. Time
averaging of the PESE gives us the reverse correlation function:

    RevCor(τ) = (N/T) (1/N) Σ_{i=1}^N x(t_i − τ)                                     Eq. 7.24

Fig.7.11.
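The spike-triggered average of Eq. 7.24 can be illustrated on a toy model neuron (a hypothetical damped-sine band-pass filter followed by half-wave rectification; this model and all parameters are assumptions for illustration, not the frog data of Fig.7.12-7.13). For Gaussian white-noise input, the reverse correlation then recovers the shape of the filter h(τ):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000
x = rng.standard_normal(n)                   # Gaussian white-noise stimulus

# hypothetical band-pass "neuron" filter h: a damped sine
tt = np.arange(80)
h = np.sin(2 * np.pi * tt / 16.0) * np.exp(-tt / 20.0)

y = np.convolve(x, h, mode="full")[:n]       # filter output
p = np.clip(y, 0.0, None)                    # half-wave rectification ->
p = 0.05 * p / p.max()                       # spike probability per sample
spikes = np.nonzero(rng.random(n) < p)[0]
spikes = spikes[spikes >= 80]

# Eq. 7.24: RevCor(tau) is (up to rate scaling) the average of x(t_i - tau)
revcor = np.array([x[spikes - tau].mean() for tau in range(80)])
```

Even though the rectifier makes the model non-linear, the spike-triggered average remains proportional to h(τ) for Gaussian inputs (Bussgang's theorem), which is why reverse correlation is useful beyond strictly linear systems.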
indicating that RevCor(τ) equals the average value of the signal that precedes the spikes
times the neuron’s average firing rate. For white noise as a stimulus it can be shown that

    2 RevCor(τ) = ∫_0^∞ k(σ) h(τ − σ) dσ                                             Eq. 7.25

As we can see from Eq. 7.25 a prerequisite for a non-zero RevCor(τ) is that the cut-off
frequency of the low-pass filter k(σ) is higher than the centre frequency of the band-pass
filter h(τ). Two examples of reverse correlation functions from the frog DMN are shown
in Fig.7.12. and Fig.7.13. Only neurons with best frequencies below about 600 Hz show a
non-zero RevCor(τ) in these cold blooded animals. In mammals the cut-off frequency for
phase lock (and thus related to k(σ)) is about 4 kHz. Some animals (e.g. weakly electric fish)
have specialized receptors enabling electric synaptic transmission and thus a much higher
cut-off frequency for phase lock.
The concept of the Pre Event Stimulus Ensemble helps us to generate a second order
characterization of the stimulus preceding a spike: the Coherent Spectro-Temporal Intensity
Density function, which integrates temporal and spectral information. This CoSTID is defined
in terms of the analytic signal ξ(t) as

    CoSTID(ω, t) = ξ*(ω) e^{−iωt} ξ(t)                                               Eq. 7.26

For example when x(t) = a cos(ωt) + b sin(ωt) then ξ(t) = (a − ib) e^{iωt}. The Fourier
transform ξ(ω) thus only contains power for positive frequencies ω. Two examples of the
average PESE CoSTID from the DMN are shown in Fig.7.12. and Fig.7.13.
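The worked example for the analytic signal can be verified with the standard FFT construction (zero the negative-frequency half of the spectrum); the signal parameters below are illustrative:

```python
import numpy as np

def analytic(x):
    """FFT-based analytic signal: keep DC, double the positive frequencies,
    zero the negative-frequency half (even-length x assumed)."""
    n = x.size
    X = np.fft.fft(x)
    X[1:n // 2] *= 2.0
    X[n // 2 + 1:] = 0.0
    return np.fft.ifft(X)

n = 1024
t = np.arange(n)
w = 2 * np.pi * 8 / n            # exactly 8 cycles, so the DFT is exact
a, b = 2.0, 3.0
x = a * np.cos(w * t) + b * np.sin(w * t)

xi = analytic(x)
print(np.allclose(xi, (a - 1j * b) * np.exp(1j * w * t)))   # True

X = np.fft.fft(xi)               # power only at the positive-frequency bin
```

This confirms both claims in the text: ξ(t) = (a − ib)e^{iωt}, and its Fourier transform vanishes at negative frequencies.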
References
Eggermont, J.J., Johannesma, P.I.M., Aertsen, A.M.H.J. (1983) Reverse correlation methods
in auditory research. Quart. Rev. Biophys. 16, 341-414.
Papoulis, A. (1984) Probability, random variables and stochastic processes. McGraw-Hill
Book Co., Singapore.
Van Stokkum, I.H.M., Johannesma, P.I.M., Eggermont, J.J. (1986) Representation of time-
dependent correlation and recurrence-time functions. A new method to analyse non-
stationary point-processes. Biological Cybernetics 55, 17-24.
Van Stokkum, I.H.M. (1987) Sensitivity of neurons in the dorsal medullary nucleus of the
grassfrog to spectral and temporal characteristics of sound. Hearing Research 29, 223-235.
Fig.7.12.
Fig.7.13.
Appendix 1 Matrix fundamentals
Vectors and matrices are represented by, respectively, lower case and upper case characters, if
possible italic. Underlining of characters denotes stochastic variables. A hat (â) denotes an
estimator. A^T is the transpose of A. A† is the Moore-Penrose generalized inverse of A.
Let f be a scalar function of x, then the gradient of f with respect to x is defined by the row
vector ∂f/∂x = (∂f/∂x_1, ∂f/∂x_2, …). The gradient vector g is defined as g = (∂f/∂x)^T.
The second derivative with respect to x is a matrix which is called the Hessian and is defined
by ∂²f(x)/∂x² = ∂/∂x (∂f(x)/∂x)^T = H, with H_ij = ∂²f/∂x_i∂x_j. Let x and y be m×1 and n×1
column vectors, then the derivative of y with respect to x is an n×m matrix called the
Jacobian and is defined by ∂y/∂x = J, with J_ij = ∂y_i/∂x_j. Thus the Jacobian of the gradient is
the Hessian.
The quadratic form q(x) is defined as q(x) = x^T Q x. Now since Q = (Q + Q^T)/2 + (Q − Q^T)/2
and the latter part does not contribute to q(x) we will further assume that Q is symmetric.
Quadratic forms are classified according to their sign. If x^T Q x > 0 for all vectors x ≠ 0, then
Q is positive definite (≥ 0: positive semidefinite; < 0: negative definite; ≤ 0: negative
semidefinite; otherwise: sign indefinite). We mention a few useful identities:

    ∂(c^T x)/∂x = c^T,    ∂(x^T Q x)/∂x = 2x^T Q,    ∂²(x^T Q x)/∂x² = 2Q           Eq. A.1
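The identities of Eq. A.1 are easy to confirm against central finite differences; a minimal sketch (random Q, c and x are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
Q0 = rng.standard_normal((n, n))
Q = (Q0 + Q0.T) / 2                 # symmetric part; Eq. A.1 assumes Q symmetric
c = rng.standard_normal(n)
x = rng.standard_normal(n)

def num_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

g1 = num_grad(lambda v: c @ v, x)        # gradient of c^T x   -> c
g2 = num_grad(lambda v: v @ Q @ v, x)    # gradient of x^T Q x -> 2 Q x
print(np.allclose(g1, c, atol=1e-5), np.allclose(g2, 2 * Q @ x, atol=1e-5))
```

Note that without the symmetrization step the gradient of x^T Q x would be (Q + Q^T)x rather than 2Qx, which is exactly why the text restricts Q to be symmetric.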
Appendix 2 Probability theory
Some definitions and theorems from probability theory which are used in parameter
estimation are summarized below.
For an n×1 random vector y (NB the underscore indicates that y is stochastic) the
probability distribution function F_y is defined as the probability that y_1 ≤ y_1, y_2 ≤ y_2,
…, y_n ≤ y_n (the underlined symbol denoting the stochastic variable, the plain symbol its
value); in formula

    F_y(y) = P(y_1 ≤ y_1, y_2 ≤ y_2, …, y_n ≤ y_n)                                   Eq. A.2

Under reasonable conditions the probability distribution function possesses a derivative which
is called the probability density function (PDF) f_y and which satisfies
    F_y(y) = ∫_{−∞}^{y_1} ∫_{−∞}^{y_2} … ∫_{−∞}^{y_n} f_y(ψ_1, ψ_2, …, ψ_n) dψ_1 dψ_2 … dψ_n    Eq. A.3

    f_y(y) = ∂^n F_y(y) / ∂y_1 ∂y_2 … ∂y_n                                           Eq. A.4

Marginal distribution functions and marginal density functions result from letting one or more
of the y_i → ∞, e.g. the probability distribution function for y_1 is given by

    F_y(y_1, ∞, …, ∞) = F_{y_1}(y_1)                                                 Eq. A.5

Saying that two random variables y_1 and y_2 are independent means that their joint
distribution function is equal to the product of the marginal distributions:
F_{y_1,y_2}(y_1, y_2) = F_{y_1}(y_1) F_{y_2}(y_2). Generalizing: n random variables
y_1, y_2, …, y_n with joint PDF f_y(y) and marginal PDF’s f_{y_i}(y_i) are independent if and
only if

    f_y(y) = ∏_{i=1}^n f_{y_i}(y_i)                                                  Eq. A.6

The ensemble average of a function of random variables is defined in terms of the expectation
operation. The expected value of a (vector) function g of y is defined as

    E[g(y)] = ∫…∫ g(y_1, …, y_n) f_y(y) dy_1 … dy_n ≡ ∫ g(y) f_y(y) dy               Eq. A.7

Thus expectation is a linear operation on the function g.
E[y] is called the mean value and shall be denoted as μ (or μ_y). The (auto)covariance
matrix D(y) is the n×n matrix whose ij-th element is E[(y_i − μ_i)(y_j − μ_j)]. Note that the
diagonal elements of D(y) are the variances of the individual random variables:
D(y_i) = E[(y_i − μ_i)²] = E[y_i²] − μ_i². If two random vectors y and z are independent
then their covariance matrix equals zero:

    D(y, z) = E[(y − μ_y)(z − μ_z)^T] = E[y − μ_y] E[(z − μ_z)^T] = 0                Eq. A.8

Vectors satisfying Eq. A.8 are said to be uncorrelated. Uncorrelated vectors are not necessarily
independent. For normally distributed random vectors uncorrelated implies independence.
Correlation is scaled covariance. We estimate the correlation coefficient r with
    r_{x,y} = [ (1/n) Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) ] / (s_x s_y)                     Eq. A.9

where the mean and the variance are estimated as usual: x̄ = (1/n) Σ_{i=1}^n x_i and
s_x² = (1/n) Σ_{i=1}^n (x_i − x̄)². Illustrations of correlations are shown in Fig.A.1.

Fig.A.1.

The covariance matrix (denoted by D) of two random vectors y and z is defined by:

    D(y, z) = E[(y − E[y])(z − E[z])^T] = E[y z^T] − E[y] E[z^T]                      Eq. A.10

Theorem A.1 The autocovariance matrix is positive semidefinite:

    D(y) ≡ D(y, y) ≥ 0                                                               Eq. A.11

Proof:

    D(Ay + b) = E[(Ay + b − E[Ay + b])(Ay + b − E[Ay + b])^T]
              = E[A(y − E[y])(y − E[y])^T A^T] = A D(y) A^T                          Eq. A.12

Now if we substitute A = a^T and b = 0 we get D(a^T y) = a^T D(y) a. Since a^T y is a
scalar its variance must be non-negative, thus a^T D(y) a ≥ 0 which proves Eq. A.11.
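Both Theorem A.1 and the transformation rule Eq. A.12 hold for sample covariance matrices as well; a quick numerical sketch (the data and the matrix A are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.standard_normal((100_000, 3)) * np.array([1.0, 2.0, 0.5])  # independent columns
Dy = np.cov(y, rowvar=False)

# Theorem A.1: every autocovariance matrix is positive semidefinite
print(np.linalg.eigvalsh(Dy).min())      # smallest eigenvalue, > 0 here

# Eq. A.12: D(Ay + b) = A D(y) A^T -- holds exactly for the sample covariance
A = rng.standard_normal((2, 3))
b = np.ones(2)
Dz = np.cov(y @ A.T + b, rowvar=False)
print(np.allclose(Dz, A @ Dy @ A.T))     # True
```

The shift b drops out, as in the proof: centring removes any constant offset before the outer products are averaged.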
Transformation of random variables
Suppose that g is a one-to-one mapping of R^n into R^n and that g^{−1} is the inverse mapping
such that for y = g(x) we have x = g^{−1}(y). If x possesses PDF f_x then the PDF of y is

    f_y(y) = f_x(g^{−1}(y)) |J(y)|                                                   Eq. A.13

where |J(y)| is the absolute value of the determinant of the Jacobian

    J(y) = det( ∂g^{−1}(y)/∂y )                                                      Eq. A.14

We apply Eq. A.13 to derive the general form of the multivariate normal distribution. Assume
that x is a vector of n iid N(0, 1) normally distributed random variables with PDF

    f_x(x) = (2π)^{−n/2} e^{−x^T x / 2}                                              Eq. A.15

Consider the transformation y = Ax + μ with A a regular matrix. Then
x^T x = (A^{−1}(y − μ))^T (A^{−1}(y − μ)) = (y − μ)^T A^{−T} A^{−1} (y − μ) and the
Jacobian is given by ∂(A^{−1}(y − μ))/∂y = A^{−1}. Thus from Eq. A.13 we find

    f_y(y) = (2π)^{−n/2} |det A^{−1}| exp(−(1/2) (y − μ)^T A^{−T} A^{−1} (y − μ))    Eq. A.16

Now define A^{−T} A^{−1} = V^{−1}, then with det V^{−1} = (det A^{−1})² > 0 we find for a
multivariate Gaussian distribution with mean μ and covariance matrix V:

    f_y(y) = ((2π)^n det V)^{−1/2} exp(−(1/2) (y − μ)^T V^{−1} (y − μ))              Eq. A.17
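The identification V^{−1} = A^{−T}A^{−1}, i.e. V = AA^T, can be checked by sampling the transformation y = Ax + μ directly; A and μ below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
A = np.tril(rng.standard_normal((n, n))) + 2 * np.eye(n)  # a regular (invertible) matrix
mu = np.array([1.0, -2.0, 0.5])
V = A @ A.T                     # since V^{-1} = A^{-T} A^{-1}  (Eq. A.17)

x = rng.standard_normal((200_000, n))   # rows of iid N(0, 1) samples (Eq. A.15)
y = x @ A.T + mu                        # the transformation y = A x + mu

print(np.allclose(y.mean(axis=0), mu, atol=0.05))
print(np.allclose(np.cov(y, rowvar=False), V, atol=0.3))
```

This is also the standard recipe for generating correlated Gaussian vectors with a prescribed covariance V: factor V = AA^T (e.g. by Cholesky decomposition) and transform iid standard normals.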
Chi-squared, F and t distributions
When x is N(0, I) then x^T x = u has the central χ² distribution with n degrees of freedom.
The random variable u = x^T x ~ χ²_n possesses expectation and variance:

    E[u] = n,    D(u) = 2n                                                           Eq. A.18

Two independent variables each having central χ² distributions form the basis of the central
F-distribution. If u_1 ~ χ²_{n_1} and u_2 ~ χ²_{n_2} then
v = (u_1/n_1)/(u_2/n_2) ~ F_{n_1,n_2}, the central F-distribution
with n_1 and n_2 degrees of freedom. The random variable v has expectation and variance:

    E[v] = n_2/(n_2 − 2),    D(v) = 2n_2² (1 + (n_2 − 2)/n_1) / ((n_2 − 2)² (n_2 − 4))    Eq. A.19

Finally, the ratio of a normally distributed variable x to the square root of u/n, where u has a
χ² distribution with n degrees of freedom, is the basis of Student’s t-distribution. Thus
z = x/√(u/n) ~ t_n. Its mean and variance are given by

    E[z] = 0,    D(z) = n/(n − 2)                                                    Eq. A.20

Note that z² = x²/(u/n) ~ F_{1,n}. Some examples of these three distributions are shown in
Fig.A.2.

Fig.A.2. Probability densities of several χ², F and t-distributions.
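The moments in Eq. A.18 and Eq. A.20 follow directly from simulated samples; a sketch with an illustrative choice of n = 10 degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 10                                  # degrees of freedom (illustrative)

x2 = rng.standard_normal((500_000, n))
u = (x2 ** 2).sum(axis=1)               # u = x^T x ~ chi^2_n

print(u.mean(), u.var())                # ~ n and ~ 2n             (Eq. A.18)

x = rng.standard_normal(500_000)        # independent N(0, 1)
z = x / np.sqrt(u / n)                  # z ~ t_n
print(z.mean(), z.var())                # ~ 0 and ~ n/(n - 2)      (Eq. A.20)
```

Note that x must be drawn independently of u, exactly as the definition of the t-distribution requires.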
Index

Numerics
-3 dB point 4-2

A
ADC 1-1; AIC 6-27, 6-38; Akaike’s Information Criterion 6-27; aliasing 1-20, 4-6, 4-8; all-pole 6-1; ambiguity 1-3; analog filter 4-5, 4-8; AR 6-1; ARIMA 6-12; ARMA 6-2, 6-26; ARMAX 6-13; autocorrelation 7-3; autocorrelation function 6-2, 6-23; autocovariance 7-4; autocovariance function 6-2, 6-22; autocovariance matrix A-2, A-3; autoregressive 6-1, 6-24; autoregressive distributed lag 6-13; autoregressive spectral estimation 6-38, 6-42

B
backcast 6-47; backward difference operator 6-4; backward predictor 6-47; bandpass filter 3-3, 4-2, 4-8; bandstop filter 4-4; bias 6-33; bilinear transformation 4-5; Blackman-Tukey estimate 6-38; Box-Jenkins approach 6-50; Butterworth 4-6

C
Canadian lynx series 6-39; cascade 2-8, 4-2, 4-7; causal signal 2-3; characteristic function 7-3; Chebyshev polynomial 4-5; chi-squared distribution A-4; coherency 6-18, 6-37; coherency spectrum 6-18; coincidence histogram 7-7; Co-integration 6-13; comb filter 4-12; complex conjugate pair 4-2; conditional expectation 6-47; consistent 6-22; continuous signal 1-1; convolution 1-18, 2-3; convolution integral 1-7; convolution sum 1-6; correlation coefficient 6-18, 6-21, A-2; covariance matrix A-2, A-3; cross-correlation function 6-17; cross-covariance function 6-17, 6-35; cross-periodogram 6-37; cross-spectra 6-35; cross-spectral density 6-18

D
D() A-2; DAC 1-1; decibel (dB) 3-6; definiteness A-1; DFT 1-23, 6-46; difference equation models 6-1; difference operator 6-12; digital resonator 4-12; digital signal processing 1-1; Dirichlet conditions 1-14; Dirichlet kernel 6-31; Discrete Fourier Transform 1-11, 1-24; discrete parameter process 6-2; discrete signal 1-1; DSP 1-1; dynamic systems 6-1

E
E A-2; eigenfunctions 1-8, 1-14, 2-1; ensemble 6-2; ensemble average A-2; exogeneity 6-13; expectation A-2; expected value A-2; explanatory variables 6-13; exponentially decaying signal 2-2, 2-8

F
Fast Fourier Transform 1-24; F-distribution A-4; feedback 4-1; Fejer kernel 6-28; FFT 1-25, 6-46; filtering and prediction 6-47, 6-59; final value theorem 2-5; FIR (finite impulse response) 3-1; first order difference (FOD) 3-10; forecast 6-47, 6-50; forecast error 6-50, 6-53; forward difference operator 6-15; forward predictor 6-47; Fourier series 1-8, 1-14, 6-4; Fourier transform 1-16, 2-1, 6-4, 6-27; Fourier-Stieltjes transform 6-5; frequency sampling 4-12; fundamental frequency 1-3, 1-23, 5-1

G
Gaussian processes 6-3; Gibbs phenomenon 1-16; gradient A-1

H
Hamming window 3-6, 3-8, 5-4; Hanning window 3-7, 6-42; harmonic process 6-38; Hessian A-1; Hilbert space 6-50

I
Idealized filter frequency response 3-1; impulse response 1-5, 1-7; independent increment process 7-3; independent variables A-2; initial value theorem 2-5; innovations 6-50, 6-54, 6-58; invertibility 6-10

J
Jacobian A-1

K
Kaiser window 3-7; Kalman filter 6-57; Kalman gain matrix 6-63

L
lag window 6-30; Laplace transform 4-5; lead time 6-50; leading indicator 6-13; leakage 1-23, 5-1; Line spectra 6-38; linear prediction 6-1; linear systems 6-1; linear time-invariant systems 1-5; linear-phase 3-2, 4-1, 4-12; log-likelihood function 6-24; low-pass filter 3-3, 3-4; LTI 1-5; LTI system 5-4

M
MA 6-1; mean square error 6-47, 6-51, 6-60; minimum error approximation 1-15; mixed spectrum model 6-39, 6-41; modulation 1-11, 1-18; moving average 6-1, 6-25; MSE 6-47, 6-54; multivariate Gaussian distribution A-4; multivariate normal distribution A-4

N
noise 4-16, 5-2; noncausal estimate 6-48; noncausal Wiener filter 6-49; non-parametric 6-38; nonstationarity 6-12; nonstationary processes 6-57; notation convention A-1; notch filter 4-4; Nyquist 1-20

O
oversampling 5-5; overshoot 1-16

P
parallel 4-10, 4-13; parametric model 6-38; Parks and McClellan 3-10; Parseval’s theorem 1-12; partial autocorrelation function 6-9, 6-25; partial fraction expansion 2-4, 4-9; passband 4-2; passband-stopband transition 3-6; PDF A-1; periodic convolution 1-11; periodogram 6-27; peripheral auditory system 7-10; phase distortion 3-2, 4-1; point process 7-1; Poisson distribution 7-3; Poisson process 7-2; positive semidefinite A-3; post stimulus time histogram 7-8; power 6-5, 6-7, 6-15, 6-27; Pre Event Stimulus Ensemble 7-7; Prediction 6-50; predictor 6-61; probability density function 6-47, A-1; pseudo periodic behaviour 6-39

Q
quadratic form A-1

R
random process 6-2; realization 6-2; reconstruction filter 1-21; rectangular window 3-6, 5-4; region of convergence (ROC) 2-2; residual variance bound 6-21; resolution 5-5, 6-46; resonator 4-12; reverse correlation 7-7, 7-13; ringing 3-5; ripple 3-6, 3-8, 3-9

S
sampled data 6-1; sampling 1-19, 4-8; sampling frequency 1-20; sampling theorem 1-20; selectivity 2-9; Shannon 1-20; shot noise 7-5; sidelobe 3-6, 5-3; sinc function 1-17, 4-12; smoothing 6-30, 6-47, 6-48; spectral analysis 1-23; spectral factorization 6-56; spectral representation 6-4, 6-14; spectral window 6-30; spectrum 6-5, 6-6; stability 4-13; state equations 6-59; state space 6-62; stationarity 6-2, 6-10; steady state response 2-5; stochastic process 6-2; summation operator 6-12

T
tapering 5-4; t-distribution A-5; time delay 3-2; time dependent correlation 7-6; time series 6-1; time shift 1-11, 1-18, 2-3; time-invariant 1-5; transfer function 1-11, 2-3, 6-1; triangular window 3-6, 5-4

U
uncorrelated variables A-2; unilateral z-transform 2-3; Updating the forecasts 6-53

V
variance A-2; Von Hann window 3-6

W
white noise 6-3; whitening filter 6-55, 6-58; wideband 5-4; Wiener filter 6-54; Wiener-Hopf equation 6-54; window 1-21, 6-30

Y
Yule-Walker equations 6-9, 6-25

Z
zero-filling 5-4; zero-padding 5-4, 6-46