
Darmstadt University of Technology

Digital Signal Processing

Abdelhak M. Zoubir

Signal Processing Group
Institute of Telecommunications

Darmstadt University of Technology
Merckstr. 25, 64283 Darmstadt, Germany

e-mail: [email protected]: http://www.spg.tu-darmstadt.de

Manuscript draft for Digital Signal Processing

© Abdelhak Zoubir 2006. Last revised: October 2006.


Contents

Preface

1 Discrete-Time Signals and Systems
   1.1 Signals
       1.1.1 The unit sample sequence
       1.1.2 The unit step sequence
       1.1.3 Exponential and sinusoidal sequences
       1.1.4 Periodic Sequences
   1.2 Systems
       1.2.1 The unit sample response of linear systems
   1.3 Convolution
   1.4 Stability and Causality
       1.4.1 Stability
       1.4.2 Causality

2 Digital Processing of Continuous-Time Signals
   2.1 Periodic Sampling
   2.2 Reconstruction of Band-Limited Signals
   2.3 Discrete-Time Processing of Continuous-Time Signals

3 Filter Design
   3.1 Introduction
   3.2 Digital Filters
   3.3 Linear Phase Filters
       3.3.1 Generalised Linear Phase for FIR Filters
   3.4 Filter Specification
   3.5 Properties of IIR Filters
       3.5.1 Stability
       3.5.2 Linear phase

4 Finite Impulse Response Filter Design
   4.1 Introduction
   4.2 Design of FIR Filters by Windowing
   4.3 Importance of the Window Shape and Length
   4.4 Kaiser Window Filter Design
   4.5 Optimal Filter Design
   4.6 Implementation of FIR Filters

5 Design of Infinite Impulse Response Filters
   5.1 Introduction
   5.2 Impulse Invariance Method
   5.3 Bilinear Transformation
   5.4 Implementation of IIR Filters
       5.4.1 Direct Form I
       5.4.2 Direct Form II
       5.4.3 Cascade form
   5.5 A Comparison of FIR and IIR Digital Filters

6 Introduction to Digital Spectral Analysis
   6.1 Why Spectral Analysis?
   6.2 Applications

7 Random Variables and Stochastic Processes
   7.1 Random Variables
       7.1.1 Distribution Functions
       7.1.2 Properties
       7.1.3 Uniform Distribution
       7.1.4 Exponential Distribution
       7.1.5 Normal Distribution
       7.1.6 χ2 Distribution
       7.1.7 Multiple Random Variables
       7.1.8 Properties
       7.1.9 Operations on Random Variables
       7.1.10 Expectation
       7.1.11 Expected Value of a Function of Random Variables
       7.1.12 The Variance
       7.1.13 Transformation of a Random Variable
       7.1.14 The Characteristic Function
   7.2 Covariance
   7.3 Stochastic Processes (Random Processes)
       7.3.1 Stationarity and Ergodicity
       7.3.2 Second-Order Moment Functions and Wide-Sense Stationarity
   7.4 Power Spectral Density
       7.4.1 Definition of Power Spectral Density
       7.4.2 Relationship between the PSD and SOMF: Wiener-Khintchine Theorem
       7.4.3 Properties of the PSD
       7.4.4 Cross-Power Spectral Density
   7.5 Definition of the Spectrum
   7.6 Random Signals and Linear Systems
       7.6.1 Convergence of Random Variables
       7.6.2 Linear filtering
       7.6.3 Linear filtered process plus noise

8 The Finite Fourier Transform
   8.1 Statistical Properties of the Finite Fourier Transform
       8.1.1 Preliminaries

9 Spectrum Estimation
   9.1 Use of bandpass filters

10 Non-Parametric Spectrum Estimation
   10.1 Consistency of Spectral Estimates
   10.2 The Periodogram
       10.2.1 Mean of the Periodogram
       10.2.2 Variance of the Periodogram
   10.3 Averaging Periodograms - Bartlett Estimator
   10.4 Windowing Periodograms - Welch Estimator
   10.5 Smoothing the Periodogram
   10.6 A General Class of Spectral Estimates
   10.7 The Log-Spectrum
   10.8 Estimation of the Spectrum using the Sample Covariance
       10.8.1 Blackman-Tukey Method
   10.9 Cross-Spectrum Estimation
       10.9.1 The Cross-Periodogram

11 Parametric Spectrum Estimation
   11.1 Spectrum Estimation using an Autoregressive Process
   11.2 Spectrum Estimation using a Moving Average Process
   11.3 Model order selection

12 Discrete Kalman Filter
   12.1 State Estimation
   12.2 Derivation of Kalman Filter Equations
   12.3 Extended Kalman Filter

Appendix
   .1 Discrete-Time Fourier Transform


List of Figures

1.1 Example of a continuous-time signal.
1.2 Example of a discrete-time signal.
1.3 Example of a sampled continuous-time signal.
1.4 The sequence p(n).
1.5 Discrete Time Signal.
1.6 Exponential Sequence with 0 < α < 1.
1.7 Exponential Sequence with −1 < α < 0.
1.8 Sinusoidal Sequence.
1.9 Real and Imaginary parts of x(n).
1.10 Representation of a discrete-time system.
1.11 Linear Shift Invariant System.
1.12 Transversal Filter input and output for q = 3.
1.13 Convolution operation.
2.1 Continuous-Time to digital converter.
2.2 Effect in the frequency domain of sampling in the time domain: (a) Spectrum of the continuous-time signal, (b) Spectrum of the sampling function, (c) Spectrum of the sampled signal with Ωs > 2ΩB, (d) Spectrum of the sampled signal with Ωs < 2ΩB.
2.3 Discrete-time processing of continuous-time signals.
2.4 Frequency Spectra.
2.5 Continuous-Time Filter.
3.1 A basic system for discrete-time filtering of a continuous-time signal.
3.2 Linear and time-invariant filtering of s(n).
3.3 Examples of the four types of linear phase FIR filters.
3.4 Frequency response of type I filter of Example 3.3.1. Magnitude (left). Phase (right).
3.5 Frequency response of type II filter of Example 3.3.2. Magnitude (left). Phase (right).
3.6 Frequency response of type III filter of Example 3.3.3. Magnitude (left). Phase (right).
3.7 Frequency response of type IV filter of Example 3.3.4. Magnitude (left). Phase (right).
3.8 Frequency response of ideal low-pass, high-pass, band-pass, and band-stop filters.
3.9 Specifications of a low-pass filter.
3.10 Low-pass filter with transition band.
3.11 Tolerance scheme for digital filter design. (a) Low-pass. (b) High-pass. (c) Bandpass. (d) Bandstop.
4.1 The sequence h(n).
4.2 Magnitude of the Fourier transform of a rectangular window (N = 8).
4.3 Commonly used windows for FIR filter design, N = 200.
4.4 Fourier transform (magnitude) of rectangular window (N = 51).
4.5 Fourier transform (magnitude) of Bartlett (triangular) window (N = 51).
4.6 Fourier transform (magnitude) of Hanning window (N = 51).
4.7 Fourier transform (magnitude) of Hamming window (N = 51).
4.8 Fourier transform (magnitude) of Blackman window (N = 51).
4.9 Fourier transform of a Hamming window with window length of N = 101.
4.10 Fourier transform of a Hamming window with window length of N = 201.
4.11 (a) Kaiser windows for β = 0 (solid), 3 (semi-dashed), and 6 (dashed) and N = 21. (b) Fourier transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser windows with β = 6 and N = 11 (solid), 21 (semi-dashed), and 41 (dashed).
4.12 Response functions. (a) Impulse response. (b) Log magnitude.
4.13 Kaiser window estimate. (a) Impulse response. (b) Log magnitude.
4.14 Filter Specification.
4.15 Flowchart of Parks-McClellan algorithm.
4.16 Signal flow graph for a delay element.
4.17 Signal flow graph representing the convolution sum.
4.18 Efficient signal flow graph for the realisation of a type I FIR filter.
5.1 Low-pass filter specifications.
5.2 Magnitude transfer function Hc(jΩ).
5.3 Filter specifications for Example 5.2.2.
5.4 Mapping of the s-plane onto the z-plane using the bilinear transformation.
5.5 Mapping of the continuous-time frequency axis onto the unit circle by bilinear transformation (Td = 1).
5.6 Signal flow graph of direct form I structure for an (N − 1)th-order system.
5.7 Signal flow graph of direct form II structure for an (N − 1)th-order system.
5.8 Cascade structure for a 6th-order system with a direct form II realisation of each second-order subsystem.
6.1 A Michelson Interferometer.
7.1 Probability function.
7.2 χ²_ν distribution with ν = 1, 2, 3, 5, 10.
7.3 The quadrant D of two arbitrary real numbers x and y.
7.4 Bivariate probability distribution function.
7.5 Properties for n = 2.
7.6 Probability that (X1, X2) is in the rectangle D3.
7.7 Voltage waveforms emitted from a noise source.
7.8 Measurement of SOMF.
7.9 A linear filter.
7.10 Interaction of two linear systems.
7.11 Interaction of two linear systems.
7.12 Cascade of L linear systems.
7.13 Cascade of two linear systems.
7.14 A linear filter plus noise.
8.1 Transfer function of an ideal bandpass filter.
9.1
9.2 Transfer function of an ideal band-pass filter.
10.1 The window function (1/N)|ΔN(e^{jω})|² for N = 20, N = 10 and N = 5.
10.2 The periodogram estimate of the spectrum (solid line) and the true spectrum (dotted line). The abscissa displays the frequency ω/2π while the ordinate shows the estimate.
10.3 The periodogram as an estimate of the spectrum using 4 non-overlapping segments. The abscissa displays the frequency ω/2π while the ordinate shows the estimate.
10.4 Function Am(e^{jω}) for N = 15 and m = 0, 1, 3, 5.
10.5 The smoothed periodogram for values of m = 2, 5, 10 and 20 (from the top left to the bottom right). The abscissa displays the frequency ω/2π while the ordinate shows the estimate.
10.6 Function ASW(e^{jω}) for N = 15 and m = 4 with weights Wk in the shape of (starting from the upper left to the lower right) a rectangular, Bartlett, Hamming and Blackman window.
11.1 Linear and time-invariant filtering of white noise.
11.2 Filter model for a moving average process.
11.3 Estimate of an MA process with N = 100 and q = 5.
11.4 True and estimated AR(6) spectrum. The abscissa displays the frequency ω/2π while the ordinate shows the estimate.
11.5 Akaike and MDL information criteria for the AR(6) process for orders 1 through 10.
12.1 Relationship between system state and measurements assumed for Kalman filter.


List of Tables

3.1 The four classes of FIR filters and their associated properties
4.1 Suitability of a given class of filter for the four FIR types
4.2 Properties of different windows


Preface

This draft is provided as an aid for students studying the subject Digital Signal Processing. It covers the material presented during lectures. Useful references on this subject include:

1. Oppenheim, A.V. & Schafer, R.W. Discrete-Time Signal Processing, Prentice-Hall, 1989.

2. Proakis, J.G. & Manolakis, D.G. Digital Signal Processing: Principles, Algorithms, and Applications, Maxwell MacMillan Editions, 1992.

3. Mitra, S.K. Digital Signal Processing: A Computer-Based Approach, Prentice-Hall, 1998.

4. Diniz, P. & da Silva, E. & Netto, S. Digital Signal Processing: System Analysis and Design, Cambridge University Press, 2002.

5. Porat, B. A Course in Digital Signal Processing, J. Wiley & Sons Inc., 1997.

6. Kammeyer, K.D. & Kroschel, K. Digitale Signalverarbeitung, Teubner Studienbücher, 1998.

7. Hänsler, E. Statistische Signale, Springer Verlag, 3. Auflage, 2001.

8. Böhme, J.F. Stochastische Signale, Teubner Studienbücher, 1998.

9. Brillinger, D.R. Time Series: Data Analysis and Theory, Holden Day Inc., 1981.

10. Priestley, M.B. Spectral Analysis and Time Series, Academic Press, 2001


Chapter 1

Discrete-Time Signals and Systems

1.1 Signals

Most signals can in practice be classified into three broad groups:

1. Analog or continuous-time signals are continuous in both time and amplitude. Examples: speech, image, seismic and radar signals. An example of a continuous-time signal is shown in Figure 1.1.

Figure 1.1: Example of a continuous-time signal.

2. Discrete-time signals are discrete in time and continuous in amplitude. An example of such a signal is shown in Figure 1.2. A common way to generate discrete-time signals is by sampling continuous-time signals. There exist, however, signals which are intrinsically discrete in time, such as the average monthly rainfall.

3. Digital signals are discrete in both time and amplitude. Digital signals may be generated through the amplitude quantisation of discrete-time signals. The signals used by computers are digital, being discrete in both time and amplitude.

The development of digital signal processing concepts based on digital signals requires a detailed treatment of amplitude quantisation, which is extremely difficult and tedious. In addition, many useful insights are lost in such a treatment because of the mathematical complexity.

For this reason, digital signal processing concepts have been developed based on discrete-time signals and continuous-time signals. Experience shows that the theories developed based on continuous-time or discrete-time signals are often applicable to digital signals.

Figure 1.2: Example of a discrete-time signal.

A discrete-time signal (sequence) will be denoted by functions whose arguments are integers. For example, x(n) represents a sequence which is defined for all integer values of n. If n is not an integer, x(n) is not zero; it is simply undefined. The notation x(n) refers either to the discrete-time function x or to the value of the function x at a specific n. Figure 1.3 gives an example of a sampled signal with sampling period T. Herein x(n) = x(t)|_{t=nT} = x(nT), with n integer and T > 0.

Figure 1.3: Example of a sampled continuous-time signal.

There are some sequences which play an important role in digital signal processing. We discuss these next.

1.1.1 The unit sample sequence

The unit sample sequence is denoted by δ(n) and is defined as

δ(n) = { 1, n = 0 ; 0, otherwise } .

The sequence δ(n) may be regarded as an analog to the impulse function used in continuous-time system analysis. An important aspect of the impulse sequence is that any sequence can be represented as a linear combination of delayed impulses. For example, the sequence p(n) in Figure 1.4 can be expressed as

p(n) = a_{−1} δ(n + 1) + a_1 δ(n − 1) + a_3 δ(n − 3) .


Figure 1.4: The sequence p(n).

In general,

x(n) = Σ_{k=−∞}^{∞} x(k) δ(n − k) ,

as shown in Figure 1.5.

Figure 1.5: Discrete Time Signal.
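The sifting relation above lends itself to a quick numerical check. The following Python sketch is a minimal illustration only; the numerical values of a_{−1}, a_1 and a_3 are our assumptions, since Figure 1.4 shows them only symbolically.

    import numpy as np

    n = np.arange(-5, 10)                        # index range covering the support
    delta = lambda m: (m == 0).astype(float)     # unit sample sequence

    a_m1, a_1, a_3 = 2.0, 1.5, 0.5               # assumed values of a_{-1}, a_1, a_3
    p = a_m1 * delta(n + 1) + a_1 * delta(n - 1) + a_3 * delta(n - 3)

    # Rebuild p(n) from its own samples via the sifting sum x(n) = sum_k x(k) delta(n-k).
    rebuilt = sum(p[k] * delta(n - n[k]) for k in range(len(n)))
    assert np.allclose(p, rebuilt)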

1.1.2 The unit step sequence

The unit step sequence, denoted by u(n), is defined as

u(n) = { 1, n ≥ 0 ; 0, otherwise } .

The sequence u(n) is related to δ(n) by

u(n) = Σ_{k=−∞}^{n} δ(k) ,


or alternatively

u(n) = Σ_{k=0}^{∞} δ(n − k) .

Conversely, the impulse sequence can be expressed as the first backward difference of the unit step sequence, i.e.,

δ(n) = u(n) − u(n − 1) = Σ_{k=−∞}^{n} δ(k) − Σ_{k=−∞}^{n−1} δ(k) .
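The backward-difference relation is easy to confirm numerically; a hedged one-assertion sketch (the index range is arbitrary):

    import numpy as np

    n = np.arange(-10, 11)
    u = (n >= 0).astype(float)                    # unit step sequence
    delta = (n == 0).astype(float)                # unit sample sequence
    # u(n) - u(n-1); prepending 0 supplies u(-11), which is 0 for the unit step
    assert np.allclose(np.diff(u, prepend=0.0), delta)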

1.1.3 Exponential and sinusoidal sequences

Exponential and sinusoidal sequences play an important role in digital signal processing. The general form of an exponential sequence is

x(n) = A · α^n , (1.1)

where A and α may be complex numbers. For real A > 0 and real 0 < α < 1, x(n) is a decreasing sequence similar in appearance to Figure 1.6.

Figure 1.6: Exponential Sequence with 0 < α < 1.

For −1 < α < 0, the sequence values alternate in sign but again decrease in magnitude with increasing n, as shown in Figure 1.7. If |α| > 1, then x(n) grows in magnitude as n increases.

A sinusoidal sequence has the general form

x(n) = A · cos(ω0 n + φ) ,

with A and φ real. The sinusoidal sequence is shown in Figure 1.8 for φ = 3π/2.

Now let A and α be complex, with A = |A| e^{jφ} and α = |α| e^{jω0}; then

x(n) = A α^n = |A| e^{jφ} |α|^n e^{jnω0} = |A| · |α|^n e^{j(ω0 n + φ)} = |A| · |α|^n [cos(ω0 n + φ) + j sin(ω0 n + φ)] .

A plot of the real part of x(n) = A · α^n is given in Figure 1.9 for |α| > 1. In this case we set ω0 = π and φ = 0, i.e.,

ℜ{x(n)} = |A| · |α|^n cos(πn) = |A| · |α|^n (−1)^n .


Figure 1.7: Exponential Sequence with −1 < α < 0.

Figure 1.8: Sinusoidal Sequence.

The imaginary part is zero for all n. For |α| = 1, x(n) is referred to as the complex exponential sequence, and has the form

x(n) = |A| · e^{j(ω0 n + φ)} = |A| [cos(ω0 n + φ) + j sin(ω0 n + φ)] ,

i.e., the real and imaginary parts of e^{jω0 n} vary sinusoidally with n.

Note. In the continuous-time case, the quantity ω0 is called the frequency and has the dimension radians per second. In the discrete-time case n is a dimensionless integer, so the dimension of ω0 must be radians. To maintain a closer analogy with the continuous-time case, we can specify the units of ω0 to be radians per sample and the units of n to be samples. Following this rationale, we can define the period of a complex exponential sequence as 2π/ω0, which has the dimension samples.


Figure 1.9: Real and Imaginary parts of x(n).

1.1.4 Periodic Sequences

A sequence x(n) is called periodic with period N if x(n) satisfies the following condition:

x(n) = x(n + N) ∀ n,

where N is a positive integer. For example, the sequence cos(πn) is periodic with period 2 (and hence with any period 2k, k a positive integer), but the sequence cos(n) is not periodic. Note that a sinusoidal discrete-time sequence A cos(ω0 n) is periodic with period N if ω0 N = 2πk, k ∈ Z. A similar statement holds for the complex exponential sequence C · e^{jω0 n}, i.e., periodicity with period N requires

e^{jω0(n+N)} = e^{jω0 n} ,

which is true only for

ω0 N = 2πk, k ∈ Z . (1.2)

Consequently, complex exponential and sinusoidal sequences are not necessarily periodic with period 2π/ω0 and, depending on the value of ω0, may not be periodic at all. For example, for ω0 = 3π/4, the smallest value of N for which Equation (1.2) can be satisfied with k an integer is N = 8 (corresponding to k = 3). For ω0 = 1, there are no integer values of N or k for which Equation (1.2) is satisfied.

When we combine the condition of Equation (1.2) with the observation that ω0 and ω0 + 2πr, r ∈ Z, are indistinguishable frequencies, it is clear that there are only N distinguishable frequencies for which the corresponding sequences are periodic with period N. One set of frequencies is ωk = 2πk/N, k = 0, 1, . . . , N − 1. These properties of complex exponential and sinusoidal sequences are basic to both the theory and the design of computational algorithms for discrete-time Fourier analysis.
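The condition ω0 N = 2πk can be checked mechanically when ω0 is a rational multiple of π. A small illustrative sketch (the function name is ours, not from the text):

    from fractions import Fraction

    def fundamental_period(omega0_over_pi: Fraction) -> int:
        """Smallest N with omega0*N = 2*pi*k for some integer k.
        omega0 is given as a rational multiple of pi; an irrational
        multiple (e.g. omega0 = 1) admits no period at all."""
        # omega0*N = 2*pi*k  <=>  N/k = 2 / (omega0/pi) as a reduced fraction
        r = Fraction(2, 1) / omega0_over_pi
        return r.numerator                        # smallest N (with k = r.denominator)

    print(fundamental_period(Fraction(3, 4)))     # omega0 = 3*pi/4  ->  N = 8 (k = 3)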

1.2 Systems

A system T that maps an input x(n) to an output y(n) is represented by:

y(n) = T[x(n)]


Figure 1.10: Representation of a discrete-time system.

Example 1.2.1 The ideal delay system is defined by y(n) = x(n − nd), nd ∈ Z+. The system shifts the input to the right by nd samples, corresponding to a time delay.

This definition of a system is very broad. Without some restrictions, we cannot completely characterise a system from knowledge of its output for a certain set of inputs. Two types of restrictions greatly simplify the characterisation and analysis of systems: linearity and shift invariance (LSI). Many systems can be approximated in practice by a linear and shift invariant system. A system T is said to be linear if

T[ax1(n) + bx2(n)] = ay1(n) + by2(n)

holds, where T[x1(n)] = y1(n), T[x2(n)] = y2(n), and a and b are any scalar constants. The property

T[ax(n)] = aT[x(n)]

is called homogeneity, and

T[x1(n) + x2(n)] = T[x1(n)] + T[x2(n)]

is called additivity. Additivity and homogeneity are collectively known as the Principle of Superposition. A system is said to be shift invariant (SI) if

T[x(n − n0)] = y(n − n0)

where y(n) = T[x(n)] and n0 is any integer.

Example 1.2.2 A system described by the relation

y(n) = Σ_{k=−∞}^{n} x(k)

is called an accumulator. This system is both linear and shift invariant.

Example 1.2.3 Consider a system which is described by

y(n) = T[x(n)] = x(n) · g(n)

where g(n) is an arbitrary sequence. This system is shown in Figure 1.11. Linearity of the system is easily verified since

T[a x1(n) + b x2(n)] = g(n)[a x1(n) + b x2(n)] = a g(n) x1(n) + b g(n) x2(n) = a T[x1(n)] + b T[x2(n)] .

Figure 1.11: Linear Shift Invariant System.

However, the system is shift variant because

T[x(n − n0)] = x(n − n0) g(n)

does not equal

y(n − n0) = x(n − n0) g(n − n0) .

Example 1.2.4 Consider a system described by

y(n) = T[x(n)] = x²(n) .

Although this system is shift-invariant, it is non-linear. The system is non-linear because

T[x1(n) + x2(n)] = [x1(n) + x2(n)]² ,

which is different from

y1(n) + y2(n) = x1²(n) + x2²(n) .

The system is shift-invariant because

T[x(n − n0)] = x²(n − n0) ,

and

y(n − n0) = x²(n − n0) .
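Superposition and shift invariance can also be probed numerically. The sketch below reproduces the conclusions of Examples 1.2.3 and 1.2.4; note that random-input probing can only falsify, never prove, these properties, and all names and values here are our assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def seems_linear(T, n_len=32, trials=5):
        # Probe T[a*x1 + b*x2] == a*T[x1] + b*T[x2] on random inputs.
        for _ in range(trials):
            x1, x2 = rng.normal(size=n_len), rng.normal(size=n_len)
            a, b = rng.normal(), rng.normal()
            if not np.allclose(T(a * x1 + b * x2), a * T(x1) + b * T(x2)):
                return False
        return True

    g = rng.normal(size=32)                  # an assumed, arbitrary g(n)
    modulator = lambda x: x * g              # Example 1.2.3: linear, shift variant
    squarer = lambda x: x ** 2               # Example 1.2.4: non-linear, shift invariant

    print(seems_linear(modulator))           # True
    print(seems_linear(squarer))             # False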

1.2.1 The unit sample response of linear systems

Consider a linear system T. Using the relationship

x(n) = Σ_{k=−∞}^{∞} x(k) δ(n − k)

and the linearity property, the output y(n) of a linear system T for an input x(n) is given by

y(n) = T[x(n)] = T[ Σ_{k=−∞}^{∞} x(k) δ(n − k) ]
     = T[ · · · + x(−1) δ(n + 1) + x(0) δ(n) + x(1) δ(n − 1) + · · · ]
     = Σ_{k=−∞}^{∞} x(k) T[δ(n − k)] .


This result indicates that a linear system can be completely characterised by the response of the system to a unit sample sequence δ(n) and its delays δ(n − k). Moreover, if the system is shift-invariant, we have:

h(n − n0) = T[δ(n − n0)]

where h(n) = T[δ(n)] is the unit sample response of a system T to an input δ(n).

1.3 Convolution

The input-output relation of a linear and shift-invariant (LSI) system T is given by

y(n) = T[x(n)] = Σ_{k=−∞}^{∞} x(k) h(n − k) .

This equation shows that knowledge of h(n) completely characterises the system, i.e., knowing the unit sample response, h(n), of an LSI system T allows us to completely determine the output y(n) for a given input x(n). This equation is referred to as convolution and is denoted by the convolution operator ⋆. For an LSI system, we have

y(n) = x(n) ⋆ h(n) = Σ_{k=−∞}^{∞} x(k) h(n − k) .

Remark 1: The unit sample response h(n) loses its significance for a nonlinear or shift-variant system.

Remark 2: The choice of δ(n) as an input in characterising an LSI system is the simplest, both conceptually and in practice.

Example 1.3.1 A linear transversal filter is given by

y(n) = Σ_{k=0}^{q} x(n − k) a(k) = a(0) x(n) + a(1) x(n − 1) + . . . + a(q) x(n − q) .

If we take a(0) = a(1) = . . . = a(q) = 1/(q + 1), the filter output sequence is an average of q + 1 samples of the input sequence around the nth sample. A transversal filter input and output for q = 3 are shown in Figure 1.12.
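As a concrete sketch of this moving-average transversal filter (the input values are illustrative only):

    import numpy as np

    q = 3
    a = np.full(q + 1, 1.0 / (q + 1))            # a(0) = ... = a(q) = 1/(q+1)
    x = np.array([0., 0., 1., 2., 3., 2., 1., 0., 0.])
    y = np.convolve(x, a)                        # y(n) = sum_k a(k) x(n-k)
    print(y)                                     # a smoothed version of x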

Example 1.3.2 To compute the relationship between the input and the output of a transversal filter, h(−k) is first plotted against k; h(−k) is simply h(k) reflected or “flipped” around k = 0, as shown in Figure 1.13. Replacing k by k − n, where n is a fixed integer, leads to a shift in the origin of the sequence h(−k) to k = n. y(n) is then obtained as Σ_k x(k) h(n − k). See Figure 1.13.

1.4 Stability and Causality

To implement an LSI system in practice (realisability), we have to impose two additional constraints: stability and causality.


Figure 1.12: Transversal Filter input and output for q = 3.

Figure 1.13: Convolution operation.

1.4.1 Stability

A system is considered stable in the bounded-input bounded-output (BIBO) sense if and only if a bounded input always leads to a bounded output. A necessary and sufficient condition for an LSI system to be stable is that the unit sample response h(n) is absolutely summable, i.e.,

Σ_{n=−∞}^{∞} |h(n)| < ∞ .

Such a sequence h(n) is sometimes referred to as a stable sequence. The proof is quite simple. First let x(n) be bounded by M, i.e., |x(n)| ≤ M < ∞ ∀ n; then, using convolution,

|y(n)| = | Σ_{k=−∞}^{∞} h(k) x(n − k) | ≤ Σ_{k=−∞}^{∞} |h(k)| |x(n − k)| ≤ M Σ_{k=−∞}^{∞} |h(k)| .

So, given a bounded input |x(n)| ≤ M, the output y(n) is bounded if and only if h(n) is absolutely summable. (For necessity, note that if the sum diverges, the bounded input x(n) = sign(h(−n)) produces an unbounded output at n = 0.)
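For the exponential sequence of Section 1.1.3 this condition is easy to examine numerically; a small sketch (truncating the infinite sum, which is harmless here since the tail is geometric):

    import numpy as np

    alpha = 0.5                                  # |alpha| < 1: stable
    n = np.arange(0, 200)                        # truncated support of h(n) = alpha^n u(n)
    h = alpha ** n
    # Compare the truncated sum with the closed form 1/(1 - |alpha|).
    print(np.sum(np.abs(h)), 1.0 / (1.0 - abs(alpha)))   # both approximately 2.0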

1.4.2 Causality

Causality is a fundamental law of physics stating that information cannot travel from the future to the past. In the context of systems, we say that a system is causal if and only if the current output y(n) does not depend on any future values of the input, such as x(n + 1), x(n + 2), . . .. Causality holds for any practical system, regardless of linearity or shift invariance.

A necessary and sufficient condition for an LSI system to be causal is that h(n) = 0 for n < 0. The proof follows from convolution,

y(n) = Σ_{k=−∞}^{∞} x(n − k) h(k) .

To obtain y at index n, causality implies that we cannot assume knowledge of x at index n − k for n − k > n, or simply, for k < 0. Setting h(k) = 0 for k < 0 ensures this.


Chapter 2

Digital Processing of Continuous-Time Signals

Discrete-time signals arise in many ways, but they most commonly occur in representations of continuous-time signals. This is partly due to the fact that the processing of continuous-time signals is often carried out on discrete-time sequences obtained by sampling continuous-time signals. It is remarkable that, under reasonable constraints, a continuous-time signal can be adequately represented by a sequence. In this chapter we discuss the process of periodic sampling.

2.1 Periodic Sampling

Let xc(t) be a continuous-time signal and its Fourier transform be denoted by Xc(jΩ)

Xc(jΩ) = ∫_{−∞}^{+∞} xc(t) e^{−jΩt} dt .

The signal xc(t) is obtained from Xc(jΩ) by

xc(t) = (1/2π) ∫_{−∞}^{+∞} Xc(jΩ) e^{+jΩt} dΩ .

The typical method of obtaining a discrete-time representation of a continuous-time signal is through periodic sampling, wherein a sequence of samples x(n) is obtained from the continuous-time signal xc(t) according to

x(n) = xc(t)|_{t=nT} = xc(nT), n = 0, ±1, ±2, . . . , T > 0.

Herein, T is the sampling period; its reciprocal, fs = 1/T, is the sampling frequency in samples per second, and Ωs = 2πfs is the sampling frequency in rad/sec. This equation is the input-output relationship of an ideal A/D converter. It is generally not invertible, since many continuous-time signals can produce the same output sequence of samples. This ambiguity can, however, be removed by restricting the class of input signals to the sampler.

It is convenient to represent the sampling process in two stages, as shown in Figure 2.1. This consists of an impulse train modulator followed by conversion of the impulse train into a discrete sequence. The system G in Figure 2.1 represents a system that converts an impulse train into a discrete-time sequence.

Figure 2.1: Continuous-Time to digital converter.

The modulation signal p(t) is a periodic impulse train,

p(t) = Σ_{n=−∞}^{∞} δ(t − nT) ,

where δ(t) is called the Dirac delta function. Consequently, we have

xs(t) = xc(t) · p(t) = xc(t) · Σ_{n=−∞}^{∞} δ(t − nT) = Σ_{n=−∞}^{∞} xc(nT) δ(t − nT) = Σ_{n=−∞}^{∞} x(n) δ(t − nT) .

The Fourier transform of xs(t), Xs(jΩ), is obtained by convolving Xc(jΩ) and P(jΩ). The Fourier transform of a periodic impulse train is a periodic impulse train, i.e.,

P(jΩ) = (2π/T) Σ_{k=−∞}^{∞} δ(Ω − 2πk/T) = (2π/T) Σ_{k=−∞}^{∞} δ(Ω − kΩs) .

Since

Xs(jΩ) = (1/2π) Xc(jΩ) ∗ P(jΩ) ,

it follows that

Xs(jΩ) = (1/T) Σ_{k=−∞}^{∞} Xc(j(Ω − 2πk/T)) = (1/T) Σ_{k=−∞}^{∞} Xc(j(Ω − kΩs)) .

The same result can be obtained by noting that p(t) is periodic, so that it can be represented as a Fourier series with coefficients 1/T, i.e.,

p(t) = (1/T) Σ_{n=−∞}^{∞} e^{jnΩs t} .

Thus

xs(t) = xc(t) p(t) = (1/T) Σ_{n=−∞}^{∞} xc(t) e^{jnΩs t} .

Taking the Fourier transform of xs(t) leads to

Xs(jΩ) = (1/T) Σ_{k=−∞}^{∞} Xc(j(Ω − kΩs)) .


Figure 2.2 depicts the frequency domain representation of impulse train sampling. Figure 2.2(a) represents the Fourier transform of a band-limited signal, where the highest non-zero frequency component in Xc(jΩ) is at ΩB. Figure 2.2(b) shows the periodic impulse train P(jΩ), and Figure 2.2(c) shows Xs(jΩ), which is the result of convolving Xc(jΩ) with P(jΩ). From Figure 2.2(c) it is evident that when

Ωs − ΩB > ΩB , or Ωs > 2ΩB ,

the replicas of Xc(jΩ) do not overlap, and therefore xc(t) can be recovered from xs(t) with an ideal low-pass filter with cut-off frequency of at least ΩB but not exceeding Ωs − ΩB. However, in Figure 2.2(d) the replicas of Xc(jΩ) overlap, and as a result the original signal cannot be recovered by ideal low-pass filtering. The reconstructed signal is related to the original continuous-time signal through a distortion referred to as aliasing.

If xc(t) is a band-limited signal with no components above ΩB rad/sec, then the sampling frequency has to be equal to or greater than 2ΩB rad/sec. This is known as the Nyquist criterion, and 2ΩB is known as the Nyquist rate.
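Aliasing is easy to exhibit numerically: two sinusoids whose frequencies differ by the sampling frequency yield identical samples. A minimal sketch (the frequencies are illustrative):

    import numpy as np

    fs = 8.0                                     # sampling frequency in Hz
    T = 1.0 / fs
    n = np.arange(16)
    x1 = np.cos(2 * np.pi * 1.0 * n * T)         # 1 Hz, below fs/2
    x2 = np.cos(2 * np.pi * (1.0 + fs) * n * T)  # 9 Hz, above fs/2
    assert np.allclose(x1, x2)                   # indistinguishable after sampling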

2.2 Reconstruction of Band-Limited Signals

If x(n) is the input to an ideal low-pass filter with frequency response Hr(jΩ) and impulse response hr(t), then the output of the filter will be

xr(t) = Σ_{n=−∞}^{∞} x(n) hr(t − nT) ,

where T is the sampling interval. The reconstruction filter commonly has a gain of T and a cutoff frequency of Ωc = Ωs/2 = π/T. This choice is appropriate for any relationship between Ωs and ΩB that avoids aliasing, i.e., so long as Ωs > 2ΩB. The impulse response of such a filter is given by

hr(t) = sin(πt/T) / (πt/T) .

Consequently, the relation between xr(t) and x(n) is given by

xr(t) = Σ_{n=−∞}^{∞} x(n) sin[π(t − nT)/T] / [π(t − nT)/T] ,

with xc(nT ) = x(n).
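A direct, if brute-force, sketch of this interpolation formula: np.sinc(u) equals sin(πu)/(πu), which matches hr(t) above with u = t/T. The signal and grid below are our illustrative choices, and truncating the sum introduces small errors near the edges of the grid.

    import numpy as np

    T = 0.125                                        # fs = 8 Hz
    n = np.arange(-64, 65)
    x = np.cos(2 * np.pi * 1.0 * n * T)              # samples of a 1 Hz cosine

    t = np.linspace(-2.0, 2.0, 401)                  # dense reconstruction grid
    xr = np.array([np.sum(x * np.sinc((ti - n * T) / T)) for ti in t])
    print(np.max(np.abs(xr - np.cos(2 * np.pi * t))))  # small reconstruction error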

2.3 Discrete-Time Processing of Continuous-Time Signals

A continuous-time signal is processed by digital signal processing techniques using analog-to-digital (A/D) and digital-to-analog (D/A) converters before and after processing. This concept is illustrated in Figure 2.3. The low-pass filter limits the bandwidth of the continuous-time signal to reduce the effect of aliasing. The spectral characteristics (magnitude) of the low-pass filter are shown in Figure 2.5.

Example 2.3.1 Assume that we want to restore the signal sc(t) from yc(t) = sc(t) + nc(t), where nc(t) is background noise. The spectral characteristics of these signals are shown in Figure 2.4.

Figure 2.2: Effect in the frequency domain of sampling in the time domain: (a) Spectrum of the continuous-time signal, (b) Spectrum of the sampling function, (c) Spectrum of the sampled signal with Ωs > 2ΩB, (d) Spectrum of the sampled signal with Ωs < 2ΩB.

We can perform the processing for this problem using either continuous-time or digital techniques. Using a continuous-time approach, we would recover sc(t) by passing yc(t) through a continuous-time low-pass filter. Alternatively, we could adopt a digital approach, as shown in Figure 2.3. We would first low-pass filter yc(t) to prevent aliasing, and then convert the continuous-time signal into a discrete-time signal using an A/D converter. Any processing is then performed digitally on the discrete-time signal. The processed signal is then converted back to continuous form using a D/A converter.

Figure 2.3: Discrete-time processing of continuous-time signals.

Figure 2.4: Frequency Spectra.

Figure 2.5: Continuous-Time Filter.

The advantages of processing signals digitally as compared with an analog approach are:


1. Flexibility: if we want to change the continuous-time filter because of a change in signal and noise characteristics, we have to change hardware components; using the digital approach, we only need to modify the software.

2. Better control of accuracy requirements. Tolerances in continuous-time circuit components make it extremely difficult for the system designer to control the accuracy of the system.

3. The signals can be stored without deterioration or loss of signal quality.

4. Lower cost of the digital implementation.


Chapter 3

Filter Design

3.1 Introduction

Filters are an important class of linear time-invariant systems. Their objective is to modify the input signal. In practice, filters are frequency selective, meaning they pass certain frequency components of the input signal and totally reject others. The objective of this chapter and the following ones is to introduce the concepts of frequency-selective filters and the most popular filter design techniques.

The design of filters (continuous-time or digital) involves the following three steps.

1. Filter specification

This step consists of setting the required characteristics of the filter. For example, to restore a signal which has been degraded by background noise, the filter characteristics will depend on the spectral characteristics of both the signal and the background noise. Typically (but not always) specifications are given in the frequency domain.

2. Filter design

Determine the unit sample response h(n) of the filter, or its system function H(z) and frequency response H(e^{jω}).

3. Filter implementation

Implement a discrete-time system with the given h(n) or H(z), or one which approaches it.

These three steps are closely related. For instance, it does not make much sense to specify a filter that cannot be designed or implemented. We shall therefore consider a particular class of digital filters which possess the following practical properties:

1. h(n) is real.

2. h(n) is causal, i.e., h(n) = 0 for n < 0.

3. h(n) is stable, i.e.,

Σ_{n=0}^{∞} |h(n)| < ∞ .


In a practical setting, the desired filter is often implemented digitally on a chip and is used to filter a signal derived from a continuous-time signal by means of periodic sampling followed by analog-to-digital conversion. For this reason, we refer to digital filters as discrete-time filters, even though the underlying design techniques relate to the discrete-time nature of the signals and systems.

Figure 3.1: A basic system for discrete-time filtering of a continuous-time signal.

A basic system for discrete-time filtering of continuous-time signals is given in Figure 3.1. Assuming the input, xc(t), is band-limited and the sampling frequency is high enough to avoid aliasing, the overall system behaves as a linear time-invariant system with frequency response

Heff(jΩ) = { H(e^{jΩT}), |Ω| < π/T ; 0, |Ω| > π/T } .

In such cases, it is straightforward to convert from specifications on the effective continuous-time filter to specifications on the discrete-time filter through the relation ω = ΩT. That is, H(e^{jω}) is specified over one period by the equation

H(e^{jω}) = Heff(jω/T) , |ω| < π .

This concept will be treated in detail in Chapter 5.
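As a small worked illustration of the relation ω = ΩT (the numbers are ours, not from the text): a continuous-time passband edge of 2 kHz sampled at 10 kHz maps to ωp = 2π · 2000 / 10000 = 0.4π rad/sample.

    import numpy as np

    fs = 10_000.0                     # assumed sampling frequency in Hz
    Omega_p = 2 * np.pi * 2_000.0     # continuous-time passband edge in rad/s
    omega_p = Omega_p / fs            # omega = Omega * T with T = 1/fs
    print(omega_p / np.pi)            # 0.4, i.e. omega_p = 0.4*pi rad/sample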

3.2 Digital Filters

Throughout the text we will use the terms system and filter interchangeably. Linear time-invariant discrete-time systems can be characterised by linear constant coefficient difference equations with initial rest condition of the type

Σ_{k=0}^{M} a_k y(n − k) = Σ_{r=0}^{N−1} b_r x(n − r) , (3.1)

where x(n) and y(n), n = 0, ±1, . . ., are the input and output of the system and max(N − 1, M) is the order.

The unit sample response of the system may be of either finite duration or infinite duration. It is useful to distinguish between these two classes, as the properties of the digital processing techniques for each differ. If the unit sample response is of finite duration, we have a finite impulse response (FIR) filter; if the unit sample response is of infinite duration, we have an infinite impulse response (IIR) filter.

If M = 0, so that

y(n) = (1/a0) Σ_{r=0}^{N−1} b_r x(n − r) , (3.2)


then the system corresponds to an FIR filter. In fact, comparison with the convolution sum

y(n) = Σ_{k=−∞}^{∞} h(k) x(n − k)

shows that Equation (3.2) is identical to the convolution sum, hence it follows that

h(n) = { b_n/a0 , n = 0, . . . , N − 1 ; 0, otherwise } .

An FIR filter can always be described by a difference equation of the form (3.1) with M = 0; M must be greater than zero for an IIR filter.
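The FIR case of Equation (3.1) can be checked numerically: with M = 0, recursive evaluation of the difference equation and direct convolution must coincide. A hedged sketch using SciPy (the coefficients and input are illustrative):

    import numpy as np
    from scipy.signal import lfilter

    b = np.array([1.0, 2.0, 3.0, 2.0, 1.0])     # b_0 ... b_{N-1}
    a0 = 2.0
    x = np.random.default_rng(1).normal(size=50)

    y_de = lfilter(b / a0, [1.0], x)            # difference equation, no feedback
    y_conv = np.convolve(x, b / a0)[:len(x)]    # h(n) = b_n / a0, convolved with x
    assert np.allclose(y_de, y_conv)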

3.3 Linear Phase Filters

A digital filter with unit sample response h(n) is said to have linear phase if H(e^{jω}) can be expressed as

H(e^{jω}) = HM(e^{jω}) e^{−jωα} , |ω| < π ,

where HM(e^{jω}) is a real function of ω and α is a constant. The significance of a linear phase filter lies in the result that if a discrete-time signal is filtered by a linear shift-invariant (LSI) system whose transfer function has magnitude HM(e^{jω}) = 1 and linear phase (see Figure 3.2), then the spectrum of the output is

Figure 3.2: Linear and time-invariant filtering of s(n).

S(e^{jω}) e^{−jωα} ↔ s(n − α) = y(n) .

Simply put, a linear phase filter preserves the shape of the signal. Should the filter have the property that |H(e^{jω})| = 1 but not have linear phase, then the shape of the signal will be distorted. We generalise the notion of linear phase to

H(e^{jω}) = HM(e^{jω}) e^{−jωα} e^{jβ} = HM(e^{jω}) e^{−jφ(ω)} , φ(ω) = −β + ωα ,

where HM(e^{jω}) is a real function of ω, and α and β are constants. The case where β = 0 is referred to as linear phase, as opposed to generalised linear phase, where we have

H(e^{jω}) = HM(e^{jω}) cos(β − ωα) + j HM(e^{jω}) sin(β − ωα)
          = Σ_{n=−∞}^{∞} h(n) e^{−jωn}
          = Σ_{n=−∞}^{∞} h(n) cos(ωn) − j Σ_{n=−∞}^{∞} h(n) sin(ωn) , h(n) real,


leading to

tan(β − ωα) = sin(β − ωα)/cos(β − ωα) = − [ Σ_{n=−∞}^{∞} h(n) sin(ωn) ] / [ Σ_{n=−∞}^{∞} h(n) cos(ωn) ]

and

Σ_{n=−∞}^{∞} h(n) sin(ω(n − α) + β) = 0 . (3.3)

To have φ(ω) = −β + ωα, (3.3) must hold for all ω; this is a necessary, but not sufficient, condition on h(n), α and β. For example, one can show that

β = 0 or π , 2α = N − 1 , h(2α − n) = h(n) (3.4)

satisfy (3.3). Alternatively,

β = π/2 or 3π/2 , 2α = N − 1 , h(2α − n) = −h(n) (3.5)

also satisfies (3.3). To have a causal system we require

Σ_{n=0}^{∞} h(n) sin(ω(n − α) + β) = 0

to hold for all ω. Causality and the previous conditions (3.4) and (3.5) imply that h(n) = 0 for n < 0 and n > N − 1, i.e., causal FIR filters have generalised linear phase if they have impulse response length N and satisfy either h(2α − n) = h(n) or h(2α − n) = −h(n). Specifically, if

h(n) = { h(N − 1 − n), 0 ≤ n ≤ N − 1 ; 0, otherwise } ,

then

H(e^{jω}) = He(e^{jω}) e^{−jω(N−1)/2} ,

where He(e^{jω}) is a real, even, periodic function of ω. Similarly, if

h(n) = { −h(N − 1 − n), 0 ≤ n ≤ N − 1 ; 0, otherwise } ,

then

H(e^{jω}) = j HO(e^{jω}) e^{−jω(N−1)/2} = HO(e^{jω}) e^{−jω(N−1)/2 + jπ/2} ,

where HO(e^{jω}) is a real, odd, periodic function of ω. The above two equations for h(n) are sufficient, but not necessary, conditions which guarantee a causal system with generalised linear phase.

3.3.1 Generalised Linear Phase for FIR Filters

We shall restrict our discussion to linear phase FIR filters. We classify FIR filters into four types, depending on whether h(n) is symmetric (h(n) = h(N − 1 − n)) or anti-symmetric (h(n) = −h(N − 1 − n)) about the point of symmetry or anti-symmetry, and whether the filter length N is odd or even. The four types of FIR filters and their associated properties are discussed below and summarised in Table 3.1. Figure 3.3 shows examples of the four types of FIR filters.


Type I FIR linear phase filter. A type I filter has a symmetric impulse response

h(n) = h(N − 1 − n), 0 ≤ n ≤ N − 1 ,

with N an odd integer. The corresponding frequency response is

H(e^{jω}) = Σ_{n=0}^{N−1} h(n) e^{−jωn} = e^{−jω(N−1)/2} [ Σ_{k=0}^{(N−1)/2} a(k) cos(ωk) ] ,

where a(0) = h((N−1)/2) and a(k) = 2h((N−1)/2 − k), k = 1, 2, . . . , (N−1)/2.

Type II FIR linear phase filter. A type II filter has a symmetric impulse response with N an even integer. The corresponding frequency response is

H(e^{jω}) = e^{−jω(N−1)/2} [ Σ_{k=1}^{N/2} b(k) cos(ω(k − 1/2)) ] ,

where b(k) = 2h(N/2 − k), k = 1, 2, . . . , N/2.

Type III FIR linear phase filter. A type III filter has an antisymmetric impulse response

h(n) = −h(N − 1 − n), 0 ≤ n ≤ N − 1 ,

with N an odd integer. The corresponding frequency response is

H(e^{jω}) = j e^{−jω(N−1)/2} [ Σ_{k=1}^{(N−1)/2} c(k) sin(ωk) ] ,

where c(k) = 2h((N − 1)/2 − k), k = 1, 2, . . . , (N − 1)/2.

Type IV FIR linear phase filter. A type IV filter has an antisymmetric impulse response with N an even integer. The corresponding frequency response is

H(e^{jω}) = j e^{−jω(N−1)/2} [ Σ_{k=1}^{N/2} d(k) sin(ω(k − 1/2)) ] ,

where d(k) = 2h(N/2 − k), k = 1, 2, . . . , N/2.


Table 3.1: The four classes of FIR filters and their associated properties

Type I: h(n) = h(N − 1 − n) (symmetric), N odd, α = (N − 1)/2, β = 0 or π.
HM(e^{jω}) = Σ_{n=0}^{(N−1)/2} a(n) cos(ωn); HM(e^{jω}) e^{jβ} is real; no further constraint.

Type II: h(n) = h(N − 1 − n) (symmetric), N even, α = (N − 1)/2, β = 0 or π.
HM(e^{jω}) = Σ_{n=1}^{N/2} b(n) cos(ω(n − 1/2)); HM(e^{jω}) e^{jβ} is real; constraint HM(e^{jπ}) = 0.

Type III: h(n) = −h(N − 1 − n) (anti-symmetric), N odd, α = (N − 1)/2, β = π/2 or 3π/2.
HM(e^{jω}) = Σ_{n=1}^{(N−1)/2} c(n) sin(ωn); HM(e^{jω}) e^{jβ} is purely imaginary; constraints HM(e^{j0}) = HM(e^{jπ}) = 0.

Type IV: h(n) = −h(N − 1 − n) (anti-symmetric), N even, α = (N − 1)/2, β = π/2 or 3π/2.
HM(e^{jω}) = Σ_{n=1}^{N/2} d(n) sin(ω(n − 1/2)); HM(e^{jω}) e^{jβ} is purely imaginary; constraint HM(e^{j0}) = 0.

Remarks

1. A filter will have no phase distortion if H(e^{jω}) is real and even, since h(n) will then be real and even.

2. Non-causal FIR filters can be made causal by an appropriate time shift so that the first non-zero value occurs at n = 0. The point of symmetry, (N − 1)/2, then falls on a sample when N is odd, or midway between two discrete-time points when N is even.

Example 3.3.1 Example of a type I FIR filter. Given a filter with unit sample response

h(n) = { 1, 0 ≤ n ≤ 4 ; 0, otherwise } ,

we can evaluate the frequency response to be

H(e^{jω}) = Σ_{n=0}^{4} e^{−jωn} = (1 − e^{−j5ω}) / (1 − e^{−jω}) = e^{−j2ω} sin(5ω/2) / sin(ω/2) .

The magnitude and the phase of the frequency response of the filter are depicted in Figure 3.4.
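The closed form above can be verified numerically; a short sketch using scipy.signal.freqz (the first frequency bin is skipped to avoid the removable 0/0 at ω = 0):

    import numpy as np
    from scipy.signal import freqz

    h = np.ones(5)                               # the length-5 all-ones filter
    w, H = freqz(h, worN=1024)                   # frequency grid on [0, pi)
    amp = np.sin(5 * w[1:] / 2) / np.sin(w[1:] / 2)
    print(np.allclose(np.abs(H[1:]), np.abs(amp)))   # True: magnitudes agree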

Example 3.3.2 Example of a type II FIR filter. Consider extending the impulse response of the filter in Example 3.3.1 by one sample. Then the frequency response of the system is given by

H(e^{jω}) = Σ_{n=0}^{5} e^{−jωn} = e^{−jω5/2} sin(3ω) / sin(ω/2) ,

whose magnitude and phase are depicted in Figure 3.5.


Figure 3.3: Examples of the four types of linear phase FIR filters.

Example 3.3.3 Example of a type III FIR filter. Consider a filter with unit impulse response

h(n) = δ(n) − δ(n − 2) ,

where δ(n) is the Kronecker delta, defined as

δ(n) = { 1, n = 0 ; 0, n ≠ 0 } .

We can then evaluate the frequency response as

H(e^{jω}) = 1 − e^{−j2ω} = j[2 sin(ω)] e^{−jω} = 2 e^{−j(ω−π/2)} sin(ω) .

The magnitude and the phase of H(e^{jω}) are depicted in Figure 3.6.


Figure 3.4: Frequency response of type I filter of Example 3.3.1. Magnitude (left). Phase (right).

Figure 3.5: Frequency response of type II filter of Example 3.3.2. Magnitude (left). Phase (right).

Figure 3.6: Frequency response of type III filter of Example 3.3.3. Magnitude (left). Phase (right).


Example 3.3.4 Example of a type IV FIR filter. Consider a filter with unit impulse response

h(n) = δ(n) − δ(n − 1) .

We can then evaluate the frequency response as

H(e^{jω}) = 1 − e^{−jω} = j[2 sin(ω/2)] e^{−jω/2} = 2 e^{−j(ω/2−π/2)} sin(ω/2) .

The magnitude and the phase of H(e^{jω}) are depicted in Figure 3.7.

Figure 3.7: Frequency response of type IV filter of Example 3.3.4. Magnitude (left). Phase (right).

3.4 Filter Specification

Like continuous-time filters, digital filters are specified in the frequency domain. Since H(e^{jω}) = H(e^{j(ω+2π)}) for all ω, H(e^{jω}) is completely specified for 0 ≤ ω < 2π. In addition, since h(n) is real, we have H(e^{jω}) = H(e^{−jω})*. This implies that specifying H(e^{jω}) for 0 ≤ ω < π completely specifies H(e^{jω}) for all ω. To uniquely specify the complex frequency response H(e^{jω}), we need to specify both the magnitude and the phase of H(e^{jω}). However, since we are dealing with linear phase FIR filters, we need only specify the magnitude.

The ideal magnitude of the frequency response of a filter is given in Figure 3.8 for the low-pass, high-pass, band-pass and band-stop cases.

Example 3.4.1 Specification of a low-pass filter. The ideal low-pass filter is composed of a pass-band and a stop-band region. In practice we also have a transition band, as a sharp transition cannot be achieved. Ideally, |H(e^{jω})| is unity in the passband and zero in the stopband. In practice, we supply error tolerances that require

1 − α1 ≤ |H(e^{jω})| ≤ 1 + α1 for 0 ≤ ω ≤ ωp ,

and

|H(e^{jω})| < α2 for ωs ≤ ω ≤ π ,

where α1 and α2 are called the passband tolerance and stopband tolerance, respectively. A low-pass filter is then completely specified by the four parameters α1, α2, ωp and ωs. Figures 3.9 and 3.10 illustrate the concept of tolerance bands in filter design.
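Such a tolerance scheme can be checked programmatically for any candidate design. A sketch: the design method (a Hamming-windowed FIR from scipy.signal.firwin, its default window) and all numbers below are our assumptions, not part of the specification above.

    import numpy as np
    from scipy.signal import firwin, freqz

    omega_p, omega_s = 0.4 * np.pi, 0.6 * np.pi   # band edges, rad/sample
    alpha1, alpha2 = 0.05, 0.01                   # passband / stopband tolerances

    h = firwin(101, 0.5)                 # cutoff 0.5*pi (normalised to Nyquist)
    w, H = freqz(h, worN=4096)
    mag = np.abs(H)
    print(np.all(np.abs(mag[w <= omega_p] - 1) <= alpha1))   # passband met?
    print(np.all(mag[w >= omega_s] <= alpha2))               # stopband met?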


Figure 3.8: Frequency response of ideal low-pass, high-pass, band-pass, and band-stop filters.

A similar arrangement holds for high-pass, band-stop and band-pass filters (see Figure 3.11).

3.5 Properties of IIR Filters

As discussed, an IIR filter has a unit sample response of infinite duration. In general, an IIR filter with arbitrary unit sample response h(n) cannot be realised, since computing each output sample by direct convolution would require an unbounded number of arithmetic operations. Therefore we restrict ourselves to the class of filters which:

1. have a unit sample response h(n) that is real, causal, and stable.


Figure 3.9: Specifications of a low-pass filter.

Figure 3.10: Low-pass filter with transition band (passband edge ωp, stopband edge ωs).

2. possess a rational z-transform with a0 = 1, i.e.,

H(z) = ( Σ_{k=0}^{N−1} b_k z^{−k} ) / ( 1 − Σ_{r=1}^{M} a_r z^{−r} ).

A causal h(n) with a rational z-transform can be realised by a linear constant coefficient difference equation (LCCDE) with initial rest conditions (IRC), where the output is computed recursively. Thus IIR filters can be implemented efficiently.

Example 3.5.1 Consider a causal impulse response described by

h(n) = (1/2)^n · u(n),

with H(z) given by

H(z) = 1 / (1 − (1/2) z^{−1}).


Figure 3.11: Tolerance scheme for digital filter design. (a) Low-pass. (b) High-pass. (c) Band-pass. (d) Band-stop.

This can be realised by

y(n) = (1/2) y(n − 1) + x(n)

assuming IRC. Note that although h(n) is of infinite extent, the number of arithmetic operations required per output sample is only one multiplication and one addition using an LCCDE formulation.
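A minimal sketch of this recursion in Python — SciPy's lfilter implements exactly such an LCCDE with initial rest conditions:

```python
import numpy as np
from scipy.signal import lfilter

# LCCDE realisation of H(z) = 1 / (1 - 0.5 z^{-1}):
# y(n) = 0.5*y(n-1) + x(n), initial rest conditions
b, a = [1.0], [1.0, -0.5]

x = np.zeros(10); x[0] = 1.0   # unit sample input
y = lfilter(b, a, x)           # one multiply and one add per output sample
print(y)                       # [1, 0.5, 0.25, ...] = (1/2)^n * u(n)
```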

3.5.1 Stability

An FIR filter is stable provided h(n) is finite for all n; therefore stability is not a problem in practical design or implementation. With an IIR filter, all the poles of H(z) must lie inside the unit circle to ensure filter stability.


3.5.2 Linear phase

It is simple to achieve linear phase with FIR filters, to the point that we require all FIR filters to possess this property. It is much more difficult to control the phase of an IIR filter. As a result, we specify only the magnitude response for an IIR filter and accept the resulting phase characteristics.


Chapter 4

Finite Impulse Response Filter Design

4.1 Introduction

The filter design problem is that of determining h(n), H(e^{jω}) or H(z) to meet the design specification. The first step is to choose the type of filter from the previously discussed four types of FIR filters.

Some types of filters may not be suitable for a given design specification. For example, from Table 3.1 a Type II filter is not suitable for a high-pass filter design, as it always has H(e^{jω}) = 0 at ω = π since cos(π(n − 1/2)) = 0. Table 4.1 shows which filter types are not suitable for a given class of filter.

Table 4.1: Suitability of the four FIR types for a given class of filter

Type   Low-pass       High-pass      Band-pass   Band-stop
I      suitable       suitable       suitable    suitable
II     suitable       not suitable   suitable    not suitable
III    not suitable   not suitable   suitable    not suitable
IV     not suitable   suitable       suitable    not suitable

The two standard approaches for FIR filter design are:

1. The window method.

2. The optimal filter design method.

We now discuss each of these approaches in turn.

4.2 Design of FIR Filters by Windowing

Many idealised systems are defined by piecewise-constant or piecewise-functional frequency responses with discontinuities at the boundaries between bands. As a result, they have noncausal impulse responses that are infinitely long. The most straightforward approach for obtaining a causal FIR approximation to such systems is to truncate the ideal response.


Let Hd(e^{jω}) be the desired frequency response of the filter. Then

Hd(e^{jω}) = Σ_{n=−∞}^{∞} hd(n) e^{−jωn},

where hd(n) is the corresponding unit sample response sequence given by

hd(n) = (1/2π) ∫_{−π}^{π} Hd(e^{jω}) e^{jωn} dω.

The simplest way to obtain a causal FIR filter from hd(n) is to define a new system with unit sample response

h(n) = { hd(n), 0 ≤ n ≤ N − 1; 0, elsewhere },

which can be re-written as

h(n) = hd(n) · w(n),

and therefore

H(e^{jω}) = (1/2π) ∫_{−π}^{π} Hd(e^{jθ}) · W(e^{j(ω−θ)}) dθ.

Note that H(e^{jω}) represents a smeared version of Hd(e^{jω}). If hd(n) is symmetric (or anti-symmetric) with respect to a certain point of symmetry (or anti-symmetry) and w(n) is also symmetric with respect to the same point, then hd(n) · w(n) is the unit sample response of a linear phase filter.

Example 4.2.1 Low-pass filter design with specifications α1, α2, ωp, ωs.
We choose a Type I filter for the design. Since the desired frequency response Hd(e^{jω}) should be consistent with the type of filter used, from Table 3.1, Hd(e^{jω}) is given by

Hd(e^{jω}) = { e^{−jωα}, 0 ≤ |ω| ≤ ωc; 0, ωc < |ω| ≤ π },

where α = (N − 1)/2, ωc = (ωp + ωs)/2, and N is the filter length (odd). The inverse Fourier transform of Hd(e^{jω}) is

hd(n) = sin(ωc(n − α)) / (π(n − α)).

If we apply a rectangular window to hd(n), the resulting filter impulse response h(n) is

h(n) = { sin(ωc(n − α)) / (π(n − α)), 0 ≤ n ≤ N − 1; 0, otherwise }.

The sequence h(n) is shown in Figure 4.1 for N = 7 and ωc = π/2. Note that this filter may or may not meet the design specification.


Figure 4.1: The sequence h(n).

4.3 Importance of the Window Shape and Length

In the time domain we have

h(n) = hd(n) · w(n),

which corresponds in the frequency domain to

H(e^{jω}) = Hd(e^{jω}) ⊛ W(e^{jω}) = (1/2π) ∫_{−π}^{π} Hd(e^{jθ}) W(e^{j(ω−θ)}) dθ.   (4.1)

If we do not truncate at all, so that w(n) = 1 for all n, W(e^{jω}) is a periodic impulse train with period 2π, and therefore H(e^{jω}) = Hd(e^{jω}). This interpretation suggests that if w(n) is chosen so that W(e^{jω}) is concentrated in a narrow band of frequencies around ω = 0, then H(e^{jω}) approximates Hd(e^{jω}) very closely except where Hd(e^{jω}) changes very abruptly. Thus, the choice of window is governed by the desire to have w(n) as short as possible in duration, so as to minimise computation in the implementation of the filter, while having W(e^{jω}) approximate an impulse; that is, we want W(e^{jω}) to be highly concentrated in frequency so that the convolution (4.1) faithfully reproduces the desired frequency response. These are conflicting requirements. Take, for example, the case of the rectangular window, where

W(e^{jω}) = Σ_{n=0}^{N−1} e^{−jωn} = (1 − e^{−jωN}) / (1 − e^{−jω}) = e^{−jω(N−1)/2} sin(ωN/2) / sin(ω/2).

The magnitude of the function sin(ωN/2)/sin(ω/2) is shown in Figure 4.2 for the case N = 8. Note that W(e^{jω}) for the rectangular window has a generalised linear phase. As N increases, the width of the "main-lobe" decreases. The main-lobe is usually defined as the region between the first zero crossing on either side of the origin. For the rectangular window, the main-lobe width is Δω = 4π/N. However, for the rectangular window, the side-lobes are large and, in fact, as N increases, the peak amplitudes of the main-lobe and the side-lobes grow in such a manner that the area under each lobe is constant even though the width of each lobe decreases with N. Consequently, as W(e^{j(ω−θ)}) "slides by" a discontinuity of Hd(e^{jω}) with increasing ω, the integral of W(e^{j(ω−θ)})Hd(e^{jω}) will oscillate as each side-lobe of W(e^{j(ω−θ)}) moves past the discontinuity.


Figure 4.2: Magnitude of the Fourier transform of a rectangular window (N = 8), with main-lobe and peak side-lobe indicated.

We can reduce this effect by tapering the window smoothly to zero at each end, so that the height of the side-lobes is diminished; however, this is achieved at the expense of a wider main-lobe and thus a wider transition at the discontinuity.

Five common windows are used in practice: the rectangular, Bartlett, Hanning, Hamming and Blackman windows. The formulae for these windows are given below.

Rectangular

w(n) = { 1, 0 ≤ n ≤ N − 1; 0, elsewhere }.

Bartlett

w(n) = { 2n/(N − 1), 0 ≤ n ≤ (N − 1)/2; 2 − 2n/(N − 1), (N − 1)/2 ≤ n ≤ N − 1 }.

Hanning

w(n) = (1/2)[1 − cos(2πn/(N − 1))], 0 ≤ n ≤ N − 1.

Hamming

w(n) = 0.54 − 0.46 cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1.

Blackman

w(n) = 0.42 − 0.5 cos(2πn/(N − 1)) + 0.08 cos(4πn/(N − 1)), 0 ≤ n ≤ N − 1.

Figure 4.3 compares these window functions for N = 200. A good choice of window corresponds to choosing one with a small main-lobe width and small side-lobe amplitude. Clearly, the shape of a window affects both main-lobe and side-lobe behavior.

Figures 4.4, 4.5, 4.6, 4.7 and 4.8 show these windows in the frequency domain for N = 51, using a 256-point FFT. For comparison, Figures 4.9 and 4.10 show |W(e^{jω})| of the Hamming


Figure 4.3: Commonly used windows for FIR filter design (N = 200): rectangular, Bartlett, Hanning, Hamming and Blackman.

window for N = 101 and 201, respectively. This shows that main-lobe behavior is affected primarily by window length.
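A sketch generating these five windows directly from the formulae above and inspecting one of them with a 256-point FFT, as in Figures 4.4-4.8:

```python
import numpy as np

def windows(N):
    """The five window formulae above (symmetric, length N); a minimal sketch."""
    n = np.arange(N)
    return {
        "rectangular": np.ones(N),
        "bartlett":    1.0 - np.abs(2.0 * n / (N - 1) - 1.0),   # equivalent two-piece form
        "hanning":     0.5 * (1.0 - np.cos(2.0 * np.pi * n / (N - 1))),
        "hamming":     0.54 - 0.46 * np.cos(2.0 * np.pi * n / (N - 1)),
        "blackman":    0.42 - 0.5 * np.cos(2.0 * np.pi * n / (N - 1))
                            + 0.08 * np.cos(4.0 * np.pi * n / (N - 1)),
    }

# 256-point FFT magnitude (dB) of the N = 51 Hamming window, cf. Figure 4.7
w = windows(51)["hamming"]
W = np.fft.fft(w, 256)
mag_db = 20 * np.log10(np.abs(W) / np.abs(W).max() + 1e-12)
print(mag_db[:8].round(1))
```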

The effect of the shape and size of the window can be summarised as follows:

1. Window shape controls side-lobe as well as main-lobe behavior.

2. Window size controls main-lobe behavior only.

Since side-lobe behavior, and therefore passband and stop-band tolerance, is affected only by the shape of the window, the passband and stop-band requirements dictate which window is chosen. The window size is then determined from the transition width requirements. Table 4.2 (columns 1 and 2) contains information useful in choosing the shape and length of the window.

Table 4.2: Properties of different windows.

Window Type   Peak side-lobe amplitude   Approximate main-lobe   Peak approximation error
              (dB, relative)             width Δωm (rad)         20 log10(min(α1, α2)) (dB)
Rectangular   −13                        4π/N                    −21
Bartlett      −25                        8π/(N − 1)              −25
Hanning       −31                        8π/(N − 1)              −44
Hamming       −41                        8π/(N − 1)              −53
Blackman      −57                        12π/(N − 1)             −74

Example 4.3.1 Filter design problem.


Figure 4.4: Fourier transform (magnitude) of rectangular window (N = 51).

Figure 4.5: Fourier transform (magnitude) of Bartlett (triangular) window (N = 51).

Figure 4.6: Fourier transform (magnitude) of Hanning window (N = 51).


Figure 4.7: Fourier transform (magnitude) of Hamming window (N = 51).

Figure 4.8: Fourier transform (magnitude) of Blackman window (N = 51).

Figure 4.9: Fourier transform of a Hamming window with window length of N = 101.


Figure 4.10: Fourier transform of a Hamming window with window length of N = 201.

We have to design a low-pass filter with a required stop-band attenuation of −50 dB. Which window is most appropriate?

The third column of Table 4.2 contains information on the best stop-band (or passband) attenuation we can expect from a given window shape. We see that the rectangular, Bartlett and Hanning windows will not meet the specification. The Hamming window may just meet the requirements, but the Blackman window will clearly be the most appropriate.

From the second column of Table 4.2 the required window length can be derived. The distance between the peak ripples on either side of a discontinuity is approximately equal to the main-lobe width Δωm. The transition width Δω = ωs − ωp is therefore somewhat less than the main-lobe width. For example, for the Blackman window, N can be determined from the relationship 12π/(N − 1) ≃ Δω, i.e. N ≈ 12π/Δω + 1 points.

4.4 Kaiser Window Filter Design

We can quantify the trade-off between main-lobe width and side-lobe area by seeking the window function that is maximally concentrated around ω = 0 in the frequency domain. Kaiser proposed the window function

w(n) = { I0[β(1 − [(n − α)/α]²)^{1/2}] / I0(β), 0 ≤ n ≤ N − 1; 0, elsewhere },

where α = (N − 1)/2, and I0(·) is the zeroth-order modified Bessel function of the first kind. In contrast to the previously discussed windows, this one has two parameters: the length N and a shape parameter β. Thus, by varying N and β we can trade side-lobe amplitude for main-lobe width. Figure 4.11 shows the trade-off that can be achieved. If the window is tapered more, the side-lobes of the Fourier transform become smaller, but the main-lobe becomes wider. Alternatively, we can hold β constant and increase N, causing the main-lobe to decrease in width without affecting the amplitude of the side-lobes.


Figure 4.11: (a) Kaiser windows for β = 0 (solid), 3 (semi-dashed), and 6 (dashed) and N = 21. (b) Fourier transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser windows with β = 6 and N = 11 (solid), 21 (semi-dashed), and 41 (dashed).


Kaiser obtained formulae that permit the filter designer to estimate the values of N and β needed to meet a given filter specification. Consider a frequency selective filter with the specifications shown in Figure 3.11(a). Kaiser found that over a usefully wide range of conditions α1 is determined by the choice of β. Given α1, the passband cutoff frequency ωp of the low-pass filter is defined to be the highest frequency such that |H(e^{jω})| ≥ 1 − α1. The stop-band cutoff frequency ωs is defined to be the lowest frequency such that |H(e^{jω})| ≤ α2. The transition region width is therefore

Δω = ωs − ωp

for the low-pass filter approximation. With A being the negative peak approximation error (pae) in dB,

A = −20 log10 min(α1, α2),

Kaiser determined empirically that to achieve a specified value of A, we require

β = { 0.1102(A − 8.7), A > 50;
      0.5842(A − 21)^{0.4} + 0.07886(A − 21), 21 ≤ A ≤ 50;
      0, A < 21 }.

To achieve prescribed values of A and Δω, N must satisfy

N = ⌈ (A − 8)/(2.285 Δω) + 1 ⌉.

With these formulae, no iteration or trial-and-error is required.

Example 4.4.1 Given the specifications of a low-pass filter, ωp = 0.4π, ωs = 0.6π, α1 = α2 = 0.001 and Δω = 0.2π, find the unit sample response using Kaiser's method. We find

A = −20 log10 α1 = 60

and

β = 5.653, N = 38.

Then,

h(n) = { [sin(ωc(n − α))/(π(n − α))] · I0[β(1 − [(n − α)/α]²)^{1/2}]/I0(β), 0 ≤ n ≤ N − 1; 0, elsewhere },

where α = (N − 1)/2 = 37/2 = 18.5 and ωc = (ωs + ωp)/2. Since N is an even integer, the resulting linear phase system is of type II. Figure 4.12 shows h(n) and H(e^{jω}).
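A minimal Python sketch of Kaiser's design procedure for this example (the window itself is generated with NumPy's kaiser routine; the β formula evaluates to approximately 5.653):

```python
import numpy as np

def kaiser_beta(A):
    """Kaiser's empirical beta formula; A is the peak approximation error in dB."""
    if A > 50:
        return 0.1102 * (A - 8.7)
    if A >= 21:
        return 0.5842 * (A - 21) ** 0.4 + 0.07886 * (A - 21)
    return 0.0

def kaiser_length(A, delta_w):
    """Estimated filter length N for attenuation A (dB), transition width delta_w (rad)."""
    return int(np.ceil((A - 8) / (2.285 * delta_w) + 1))

# Example 4.4.1: wp = 0.4*pi, ws = 0.6*pi, alpha1 = alpha2 = 0.001
A = -20 * np.log10(0.001)                 # 60 dB
delta_w = 0.2 * np.pi
beta, N = kaiser_beta(A), kaiser_length(A, delta_w)
print(beta, N)                            # ~5.653, 38

# Kaiser-windowed ideal low-pass, wc = (wp + ws)/2 = 0.5*pi
alpha = (N - 1) / 2
n = np.arange(N)
wc = 0.5 * np.pi
h = (wc / np.pi) * np.sinc(wc * (n - alpha) / np.pi) * np.kaiser(N, beta)
```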

Example 4.4.2 Design of a high-pass filter.
In the following, we consider a high-pass filter with

ωs = 0.35π, ωp = 0.5π, α1 = α2 = 0.021.

Again using the formulae proposed by Kaiser, we get β = 2.6 and N = 25. Figure 4.13 shows h(n) and H(e^{jω}). The resulting peak error is 0.0213, which exceeds the specified α1 = α2 = 0.021. Thus we increase N to 26. In this case, however, the filter would be of type II, which is not appropriate for a high-pass design. Therefore we use N = 27, which exceeds the required specifications.


Figure 4.12: Response functions. (a) Impulse response. (b) Log magnitude.

The design of FIR filters by windowing is straightforward to apply, and it is quite general even though it has a number of limitations. However, we often wish to design a filter that is the "best" that can be achieved for a given value of N. We describe two methods that attempt to improve upon the windowing method by introducing a new criterion of optimality.


Figure 4.13: Kaiser window estimate. (a) Impulse response. (b) Log magnitude.


4.5 Optimal Filter Design.

As seen, the design of a filter with the window method is straightforward and easy to handle, but it is not optimal in the sense of minimising the maximum error. Through the Fourier approximation, any filter design which uses windowing attempts to minimise the integrated squared error

ε² = (1/2π) ∫_{−π}^{π} |Hd(e^{jω}) − H(e^{jω})|² dω.

In many applications we get a better result by minimising the maximum error. Using the windowing method and Fourier approximation we always encounter the Gibbs phenomenon, which implies that the maximum error occurs at the discontinuities of Hd(e^{jω}). Better results can be achieved if the approximation error is spread equally over the whole frequency band.

In the following section only type I linear phase FIR filters are taken into account, but the extension to the other linear phase FIR types is straightforward. Take a non-causal FIR filter with the impulse response

h(n) = h(−n),

the frequency response is

H(e^{jω}) = Σ_{n=−L}^{L} h(n) e^{−jωn}
          = h(0) + Σ_{n=1}^{L} 2h(n) cos(ωn)
          = Σ_{n=0}^{L} a(n) cos(ωn)
          = Σ_{n=0}^{L} a(n) [ Σ_{k=0}^{n} β_{kn} cos(ω)^k ]
          = Σ_{n=0}^{L} a(n) cos(ω)^n,

which is real and even. The filter specifications for Hd(e^{jω}) are shown in Figure 4.14.

To design the filter, the Parks-McClellan algorithm formulates the problem as one of polynomial approximation, which can be solved by Chebyshev approximation over the disjoint set of frequencies F = {0 ≤ ω ≤ ωp} ∪ {ωs ≤ ω ≤ π}. The approximation error function E(ω) is defined as

E(ω) = W(ω)[Hd(e^{jω}) − H(e^{jω})].   (4.2)

E(ω) measures the error between the optimum and attained frequency response, weighted by W(ω), where

W(ω) = { α1/α2, 0 ≤ ω ≤ ωp; 1, ωs ≤ ω ≤ π }.


Figure 4.14: Filter specification (passband between 1 − α1 and 1 + α1 up to ωp; stopband between −α2 and α2 from ωs).

The optimal H(e^{jω}) minimises the maximum weighted error, i.e. we search for an H(e^{jω}) which fulfills

min_{a(n), 0≤n≤L} ( max_{ω∈F} |E(ω)| ) = min_{a(n), 0≤n≤L} ( max_{ω∈F} |W(ω)[Hd(e^{jω}) − H(e^{jω})]| ).

Theorem 4.5.1 (Alternation Theorem)
With a closed disjoint subset F of the real axis, a necessary and sufficient condition for

H(e^{jω}) = Σ_{n=0}^{L} a(n) cos(ω)^n

to be the unique, best weighted Chebyshev approximation of Hd(e^{jω}) in F is that the error function E(ω) has at least L + 2 alternating frequencies in F. This means that there must exist at least L + 2 frequencies ωi in F such that ω0 < ω1 < · · · < ωL+1 with E(ωi) = −E(ωi+1) and

|E(ωi)| = max_{ω∈F} |E(ω)|, i = 0, . . . , L + 1.

The maxima of the alternating frequencies can be found from the derivative of E(ω). If we calculate the derivative of E(ω) and take into account that Hd(e^{jω}) is constant in F, we get for the maximal frequencies

dE(ω)/dω = (d/dω) { W(ω)[Hd(e^{jω}) − H(e^{jω})] } = −W(ω) dH(e^{jω})/dω   (4.3)
         = W(ω) sin(ω) Σ_{n=1}^{L} n a(n) cos(ω)^{n−1} = 0.

This means that there are at most L + 1 maxima belonging to the polynomial structure and to the frequencies at ω = 0 and ω = π. Two more maxima will occur at ω = ωp and ω = ωs


because of the discontinuity and the inability to approximate it exactly. As can be seen from Eq. (4.3), the error function E(ω) and the approximation function H(e^{jω}) possess maxima at the same frequencies ωi. Thus from Theorem 4.5.1 we can write the error E(ω) at the ωi, i = 0, . . . , L + 1, as

E(ωi) = W(ωi)[Hd(e^{jωi}) − H(e^{jωi})] = (−1)^i ρ, i = 0, . . . , L + 1,

and can express the ideal frequency response as

Hd(e^{jωi}) = H(e^{jωi}) + (−1)^i ρ / W(ωi) = Σ_{n=0}^{L} a(n) cos(ωi n) + (−1)^i ρ / W(ωi),

or, in matrix form,

[ 1   cos(ω0)      · · ·   cos(Lω0)       1/W(ω0)             ] [ a0 ]   [ Hd(e^{jω0})    ]
[ ⋮       ⋮          ⋱         ⋮               ⋮              ] [ ⋮  ] = [      ⋮         ]
[ 1   cos(ωL+1)    · · ·   cos(LωL+1)   (−1)^{L+1}/W(ωL+1)    ] [ aL ]   [ Hd(e^{jωL+1})  ]
                                                                [ ρ  ]

To solve these equations for the parameters a(n), we use an exchange algorithm called "Remez", developed for this problem by Rabiner et al. in 1975.

After an initial guess of the L + 2 extremal frequencies of H(e^{jω}), we can compute ρ according to

ρ = [ γ0 Hd(e^{jω0}) + · · · + γL+1 Hd(e^{jωL+1}) ] / [ γ0/W(ω0) + · · · + (−1)^{L+1} γL+1/W(ωL+1) ]   (4.4)

with

γk = ∏_{n=0, n≠k}^{L+1} 1/(cos(ωk) − cos(ωn)).

Since E(ω) has the same extremal frequencies as H(e^{jω}), we can write

H(e^{jωi}) = Hd(e^{jωi}) − (−1)^i ρ / W(ωi).

Using Lagrange interpolation it follows that

H(e^{jω}) = [ Σ_{k=0}^{L} H(e^{jωk}) βk/(cos(ω) − cos(ωk)) ] / [ Σ_{k=0}^{L} βk/(cos(ω) − cos(ωk)) ]

with

βk = ∏_{n=0, n≠k}^{L} 1/(cos(ωk) − cos(ωn)).

With H(e^{jω}) it is possible to calculate the error E(ω) according to Eq. (4.2). After calculating E(ω), the L + 2 extremal frequencies belonging to the largest alternating peaks can be selected and the algorithm can start again using Eq. (4.4). The algorithm is summarised in Figure 4.15.


By using the Remez algorithm we are able to find the ρ that defines the upper bound of |E(ω)|, which is the optimum solution of the Chebyshev approximation. After the algorithm converges, the filter coefficients h(n) can be calculated from L + 1 samples of H(e^{jω}) by the inverse DFT

h(n) = (1/(L + 1)) Σ_{k=0}^{L} H(e^{jωk}) e^{j2πkn/(L+1)}.

Property of Optimal Filters

The error between the desired and attained filter frequency response achieves minima and maxima consecutively at a finite number of frequencies in the frequency range of interest (passband and stopband regions), called alternation frequencies. The minimum number of alternating frequencies for a Type I optimum filter is (N + 3)/2 (with L = (N − 1)/2). Because of this characteristic, optimal filters are called equiripple filters.

Remarks

1. Designing an optimal filter requires much more computation than the windowing method.

2. The length of a filter designed by the window method is typically 20% to 50% greater than that of the optimal filter meeting the same specification.

3. Optimal filter design is the most widely used FIR filter design method. In MATLAB, the function that performs optimum filter design is called "firpm"; a sketch using SciPy's equivalent routine is given below.
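The following sketch uses scipy.signal.remez, SciPy's counterpart to firpm, on a hypothetical low-pass specification (band edges 0.2π and 0.3π, i.e. 0.10 and 0.15 in cycles per sample):

```python
import numpy as np
from scipy.signal import remez, freqz

# Equiripple (Parks-McClellan) low-pass design, analogous to MATLAB's firpm
N = 31                                     # filter length (odd -> Type I)
h = remez(N,
          bands=[0.0, 0.10, 0.15, 0.5],    # band edges, in units of the sampling rate
          desired=[1.0, 0.0],              # desired gain in each band
          weight=[1.0, 1.0])               # relative error weighting W(w)

w, H = freqz(h, worN=2048)
print("max passband deviation:", np.abs(np.abs(H[w <= 0.2 * np.pi]) - 1).max())
print("max stopband magnitude:", np.abs(H[w >= 0.3 * np.pi]).max())
```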

4.6 Implementation of FIR Filters

Here we discuss the implementation of a discrete-time system with unit sample response h(n), where h(n) may be a filter. The input-output relationship is given by the convolution sum

y(n) = Σ_{k=−∞}^{∞} h(k) x(n − k) = Σ_{k=0}^{N−1} h(k) x(n − k),

i.e., the output y(n) represents a sum of weighted and delayed versions of the input x(n). The signal flow graph for a single delay element is shown in Figure 4.16. The convolution sum can be represented as in Figure 4.17.

The computational efficiency of the algorithm in Figure 4.17 may be improved by exploiting the symmetry (or anti-symmetry) of h(n). For type I filters, we have h(n) = h(N − 1 − n) and


N is odd. This leads to

y(n) = Σ_{k=0}^{N−1} h(k) · x(n − k)
     = Σ_{k=0}^{(N−1)/2 − 1} h(k) · x(n − k) + h((N−1)/2) · x(n − (N−1)/2) + Σ_{k=(N−1)/2 + 1}^{N−1} h(k) · x(n − k),

and, combining the symmetric terms,

y(n) = Σ_{k=0}^{(N−1)/2 − 1} h(k) · [x(n − k) + x(n − (N − 1 − k))] + h((N−1)/2) · x(n − (N−1)/2).

This realisation is shown by the flow graph in Figure 4.18. This method reduces the number of multiplications by 50% without affecting the number of additions or storage elements required.
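A Python sketch of this symmetric realisation, checked against direct convolution (illustrative only, not an optimised implementation):

```python
import numpy as np

def fir_symmetric(h, x):
    """Type I FIR filtering exploiting h(n) = h(N-1-n).

    Roughly halves the multiplication count compared with direct convolution."""
    N = len(h)
    assert N % 2 == 1 and np.allclose(h, h[::-1]), "expects a symmetric, odd-length h"
    M = (N - 1) // 2
    xp = np.concatenate([np.zeros(N - 1), x])      # initial rest conditions
    y = np.zeros(len(x))
    for n in range(len(x)):
        s = n + N - 1                              # index of x(n) in the padded signal
        acc = h[M] * xp[s - M]                     # centre tap
        for k in range(M):
            # one multiplication serves the pair x(n-k) and x(n-(N-1-k))
            acc += h[k] * (xp[s - k] + xp[s - (N - 1 - k)])
        y[n] = acc
    return y

# Quick check against full convolution
h = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
x = np.random.randn(20)
assert np.allclose(fir_symmetric(h, x), np.convolve(h, x)[:len(x)])
```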


[Flowchart: input filter parameters → initial guess of (L + 2) extremal frequencies → calculate the optimum ρ on the extremal set → interpolate through (L + 1) points to obtain H(e^{jω}) → calculate the error E(ω) and find the local maxima where |E(ω)| ≥ ρ → if more than (L + 2) extrema, retain the (L + 2) largest → check whether the extremal points changed: if changed, iterate; if unchanged, the best approximation has been found.]

Figure 4.15: Flowchart of the Parks-McClellan algorithm.


Figure 4.16: Signal flow graph for a delay element.

Figure 4.17: Signal flow graph representing the convolution sum.

Figure 4.18: Efficient signal flow graph for the realisation of a type I FIR filter.


Chapter 5

Design of Infinite Impulse Response Filters

5.1 Introduction

When designing IIR filters one must determine a rational H(z) which meets a given magnitude specification. The two different approaches are:

1. Design a digital filter from an analog or continuous-time filter.

2. Design the filter directly.

Only the first approach is considered, since it is much more useful. It consists of the following four steps:

Step 1 : Specification of the digital filter.

Step 2 : Translation to a continuous-time filter specification.

Step 3 : Determination of the continuous-time filter system function Hc(s).

Step 4 : Transformation of Hc(s) to a digital filter system function H(z).

Remarks

1. Step 2 depends on Step 4, since the translation of the digital filter specification to a continuous-time filter specification depends on how the continuous-time filter is transformed back to the digital domain.

2. It is desirable for Step 4 to always transform a causal and stable Hc(s) to a causal and stable H(z), and also that the transformation results in a digital filter with a rational H(z) when the continuous-time filter has a rational Hc(s).

3. The continuous-time filter frequency response Hc(jΩ) = Hc(s)|_{s=jΩ} should map onto the digital filter frequency response H(e^{jω}) = H(z)|_{z=e^{jω}}. Typically Butterworth, Chebyshev and elliptic filters are used as continuous-time prototypes.

The impulse invariance method and the bilinear transformation are two approaches that satisfy all the above properties and are often used in designing IIR filters. We discuss their implementation in the following sections.


5.2 Impulse Invariance Method

In the impulse invariance method, we obtain the unit sample response of the digital filter by sampling the impulse response of the continuous-time filter:

1. From Hc(s) and Hc(jΩ), we determine hc(t).

2. The sequence h(n) is obtained by h(n) = Td · hc(nTd), where Td is the design sampling interval, which does not need to be the same as the sampling period T associated with A/D and D/A conversion.

3. H(z) is obtained from h(n).

Hc(s) → hc(t) → h(n) → H(z)

Example 5.2.1 A continuous-time filter is described by

Hc(s) = 1/(s − sc).

The inverse Laplace transform is

hc(t) = e^{sc t} · u(t).

Therefore,

h(n) = hc(t)|_{t=nTd} = e^{sc n} · u(n),

assuming Td = 1, which leads to

H(z) = 1/(1 − e^{sc} z^{−1}).

In general, to transform Hc(s) to H(z), Hc(s) is first expressed as a linear combination of system functions of the form 1/(s − sc), each of which is then transformed to give H(z); i.e., we perform a partial fraction expansion and transform each component. If Hc(s) has multiple poles at s = sc, Hc(s) has a component of the form 1/(s − sc)^p with p > 1. Its z-transform can be obtained by following the same steps as in the above example. In previous chapters, we have seen that

H(e^{jω}) = Σ_{k=−∞}^{∞} Hc( jω/Td + j2πk/Td ).   (5.1)

So, H(e^{jω}) is not simply related to Hc(jΩ) because of aliasing. If there were no aliasing effect,

H(e^{jω}) = Hc(jω/Td) for −π ≤ ω ≤ π,   (5.2)

and the shape of H(e^{jω}) would be the same as that of Hc(jΩ). Note that in practice Hc(jΩ) is not band-limited and therefore aliasing is present, which introduces some difficulty into the design. In practice, we suppose that the relation H(e^{jω}) = Hc(jω/Td) holds over [−π, π) and we carry out


the remaining steps, instead of using (5.1). Note that the discrete-time frequency ω is related to the continuous-time frequency Ω by ω = ΩTd. Because of the approximation, the resulting filter may not satisfy the digital filter specifications and some iterations may be needed. Also, due to aliasing, it is difficult to design band-stop or high-pass filters that have no significant aliasing.

In the impulse invariance design procedure, we transform the continuous-time filter specifications through the use of Eq. (5.2) to the discrete-time filter specifications. We illustrate this in the following example.

Example 5.2.2 Assume that we have the low-pass filter specifications shown in Figure 5.1 with α1 = 0.10875, α2 = 0.17783, ωp = 0.2π, Td = 1 and ωs = 0.3π. In this case the passband is between 1 − α1 and 1 rather than between 1 − α1 and 1 + α1.

Figure 5.1: Low-pass filter specifications.

Historically, most continuous-time filter approximation methods were developed for the design of passive systems (having gain less than 1). Therefore, the design specifications for continuous-time filters have typically been formulated so that the passband is less than or equal to 1.

We will design a filter by applying impulse invariance to an appropriate Butterworth continuous-time filter. A Butterworth low-pass filter is defined by the property that the magnitude-squared function is

|Hc(jΩ)|² = 1 / (1 + (Ω/Ωc)^{2N}),

where N is an integer corresponding to the order of the filter. The magnitude response of an Nth-order Butterworth filter is maximally flat in the passband, i.e., the first (2N − 1) derivatives of the magnitude-squared function are zero at Ω = 0. An example of this characteristic is shown in Figure 5.2. The first step is to transform the discrete-time specifications to continuous-time filter specifications. Assuming that the aliasing involved in the transformation from Hc(jΩ) to H(e^{jω}) is negligible, we obtain the specification on Hc(jΩ) by applying the relation

Ω = ω/Td

to obtain the continuous-time filter specifications shown in Figure 5.3. The Butterworth filter should have the following specifications:

1 − α1 = 0.89125 ≤ |Hc(jΩ)| ≤ 1, 0 ≤ |Ω| ≤ 0.2π,

and

|Hc(jΩ)| ≤ 0.17783 = α2, 0.3π ≤ |Ω| ≤ π.


Figure 5.2: Magnitude transfer function Hc(jΩ) for Butterworth filters of order N = 2, 4, 8.

Figure 5.3: Filter specifications for Example 5.2.2.

Because |Hc(jΩ)| is a monotonic function of Ω, the specifications will be satisfied if

|Hc(j0.2π)| ≥ 0.89125

and

|Hc(j0.3π)| ≤ 0.17783.

Now, we determine N and Ωc, which are the parameters of

|Hc(jΩ)|² = 1/(1 + (Ω/Ωc)^{2N}),

to meet the specifications. Thus, we have

1 + (0.2π/Ωc)^{2N} = (1/0.89125)²


and

1 + (0.3π/Ωc)^{2N} = (1/0.17783)²,

which leads to N = 5.8858 and Ωc = 0.70474. Rounding N up to 6, we find Ωc = 0.7032. With these values, the passband specification is met exactly and the stopband specification is exceeded, reducing the potential for aliasing. With N = 6, the 12 poles of

Hc(s)Hc(−s) = 1/[1 + (s/jΩc)^{2N}]

are uniformly distributed around a circle of radius Ωc = 0.7032. Consequently, the poles of Hc(s) are the 3 pole pairs in the left half of the s-plane with coordinates

s1 = −0.182 ± j0.679,
s2 = −0.497 ± j0.497,
s3 = −0.679 ± j0.182,

so that

Hc(s) = 0.12093 / [(s² + 0.364s + 0.4945)(s² + 0.994s + 0.4945)(s² + 1.358s + 0.4945)].

If we express Hc(s) as a partial fraction expansion and transform each term 1/(s − sk) into 1/(1 − e^{sk} z^{−1}), we get

H(z) = (0.2871 − 0.4466z^{−1})/(1 − 1.2971z^{−1} + 0.6949z^{−2}) + (−2.1428 + 1.1455z^{−1})/(1 − 1.0691z^{−1} + 0.3699z^{−2}) + (1.8557 − 0.6303z^{−1})/(1 − 0.9972z^{−1} + 0.2570z^{−2}).

5.3 Bilinear Transformation

The other main approach to the design problem is the bilinear transformation. With this approach, H(z) is obtained directly from the continuous-time system function Hc(s) by replacing s by

s = (2/Td) [ (1 − z^{−1}) / (1 + z^{−1}) ],

so that

H(z) = Hc( (2/Td) (1 − z^{−1})/(1 + z^{−1}) ).

A point on the unit circle, z = e^{jω}, corresponds to the point

s = (2/Td) · (1 − e^{−jω})/(1 + e^{−jω}) = j(2/Td) tan(ω/2).

The bilinear transformation maps the imaginary axis of the s-plane onto the unit circle, as shown in Figure 5.4. The inverse transform is given by

z = (2/Td + s)/(2/Td − s) = (1 + (Td/2)s)/(1 − (Td/2)s) = (1 + σTd/2 + jΩTd/2)/(1 − σTd/2 − jΩTd/2).

The region ℜ{s} < 0 is transformed to |z| < 1, i.e. the left half-plane ℜ{s} < 0 of the s-plane is mapped inside the unit circle. This means the bilinear transformation preserves the stability of systems.


Figure 5.4: Mapping of the s-plane onto the z-plane using the bilinear transformation (the jΩ-axis maps onto the unit circle, the left half-plane into its interior).

In the definition of the transformation, 2/Td is a scaling factor. It controls the deformation of the frequency axis when we use the bilinear transformation to obtain a digital transfer function H(z) from the continuous-time transfer function Hc(s). From s = j(2/Td) tan(ω/2) = σ + jΩ, it follows that

Ω = (2/Td) tan(ω/2) and σ = 0.

For low frequencies, the deformation between Ω and ω is very small, i.e. for small ω we can assume Ω ≈ ω/Td. This was the criterion for choosing 2/Td as the scaling factor; the mapping is shown in Figure 5.5. The choice of another criterion would yield another scaling factor.

Figure 5.5: Mapping of the continuous-time frequency axis onto the unit circle by the bilinear transformation (Td = 1).

Remarks

1. This transformation is the one most used in practice. It calculates digital filters from known continuous-time filter responses.

2. Because of the frequency deformation

Ω = (2/Td) tan(ω/2),


the detailed shape of Hc(jΩ) is not preserved.

3. The delay time is also modified:

τd = τc [1 + ((ωc Td)/2)²].

If the continuous-time filter has a constant delay time, the resulting digital filter will not have this property.

4. To design a digital filter using this method, one must first transform the discrete-time filter specifications to continuous-time specifications, design a continuous-time filter with these new requirements, and then apply the bilinear transformation.

Example 5.3.1 Consider the discrete-time filter specifications of Example 5.2.2, where we illustrated the impulse invariance technique for the design of a discrete-time filter. In carrying out the design using the bilinear transformation, the critical frequencies of the discrete-time filter must be prewarped to the corresponding continuous-time frequencies using Ω = (2/Td) tan(ω/2), so that the frequency distortion inherent in the bilinear transformation maps them back to the correct discrete-time critical frequencies. For this specific filter, with |Hc(jΩ)| representing the magnitude response function of the continuous-time filter, we require that

0.89125 ≤ |Hc(jΩ)| ≤ 1, 0 ≤ |Ω| ≤ (2/Td) tan(0.2π/2),

|Hc(jΩ)| ≤ 0.17783, (2/Td) tan(0.3π/2) ≤ |Ω| < ∞.

For convenience we choose Td = 1. Also, as in Example 5.2.2, since a continuous-time Butterworth filter has a monotonic magnitude response, we can equivalently require that

|Hc(j2 tan(0.1π))| ≥ 0.89125

and

|Hc(j2 tan(0.15π))| ≤ 0.17783.

The form of the magnitude-squared function for the Butterworth filter is

|Hc(jΩ)|² = 1/(1 + (Ω/Ωc)^{2N}).

Solving for N and Ωc with the equality sign we obtain

1 + (2 tan(0.1π)/Ωc)^{2N} = (1/0.89125)²   (5.3)

and

1 + (2 tan(0.15π)/Ωc)^{2N} = (1/0.17783)²,   (5.4)

and solving for N gives

N = log[ ((1/0.17783)² − 1) / ((1/0.89125)² − 1) ] / ( 2 log[ tan(0.15π)/tan(0.1π) ] ) = 5.30466.


Since N must be an integer, we choose N = 6. Substituting this N into Eq. (5.4), we obtain Ωc = 0.76622. For this value of Ωc, the passband specifications are exceeded and the stopband specifications are met exactly. This is reasonable for the bilinear transformation, since we do not have to be concerned with aliasing. That is, with proper prewarping, we can be certain that the resulting discrete-time filter will meet the specifications exactly at the desired stopband edge.

In the s-plane, the 12 poles of the magnitude-squared function are uniformly distributed in angle on a circle of radius 0.76622. The system function of the continuous-time filter obtained by selecting the left half-plane poles is

Hc(s) = 0.20238 / [(s² + 0.3966s + 0.5871)(s² + 1.0836s + 0.5871)(s² + 1.4802s + 0.5871)].

We then obtain the system function H(z) for the discrete-time filter by applying the bilinear transformation to Hc(s) with Td = 1:

H(z) = 0.0007378 (1 + z^{−1})⁶ / [(1 − 1.268z^{−1} + 0.7051z^{−2})(1 − 1.0106z^{−1} + 0.3583z^{−2})(1 − 0.9044z^{−1} + 0.2155z^{−2})].

Since the bilinear transformation maps the entire jΩ-axis of the s-plane onto the unit circle, the magnitude response of the discrete-time filter falls off much more rapidly than that of the original continuous-time filter; i.e., the entire imaginary axis of the s-plane is compressed onto the unit circle. Hence, the behaviour of H(e^{jω}) at ω = π corresponds to the behaviour of Hc(jΩ) at Ω = ∞. Therefore, the discrete-time filter will have a sixth-order zero at z = −1, since the continuous-time Butterworth filter has a sixth-order zero at s = ∞.

Note that, as with impulse invariance, the parameter Td is of no consequence in the design procedure, since we assume that the design problem always begins with discrete-time filter specifications. When these specifications are mapped to continuous-time specifications and the continuous-time filter is then mapped back to a discrete-time filter, the effect of Td is cancelled. Although we retained the parameter Td in our discussion, in specific problems any convenient value of Td can be chosen.
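A sketch of the same design carried out with SciPy: butter provides the analog prototype and bilinear performs the transformation; the prewarping and the order/cutoff computation follow the equations above:

```python
import numpy as np
from scipy.signal import butter, bilinear

# Bilinear-transform design of Example 5.3.1 (Td = 1)
Td = 1.0
wp, ws = 0.2 * np.pi, 0.3 * np.pi

# Prewarp the digital band edges to continuous-time frequencies
Wp = (2 / Td) * np.tan(wp / 2)
Ws = (2 / Td) * np.tan(ws / 2)

# Order and cutoff from the two equality conditions (5.3) and (5.4)
N = np.log(((1/0.17783)**2 - 1) / ((1/0.89125)**2 - 1)) / (2 * np.log(Ws / Wp))
N = int(np.ceil(N))                                   # -> 6
Wc = Ws / (((1/0.17783)**2 - 1) ** (1 / (2 * N)))     # meet stopband exactly -> 0.76622

b_c, a_c = butter(N, Wc, analog=True)     # continuous-time Butterworth prototype
b_d, a_d = bilinear(b_c, a_c, fs=1/Td)    # substitutes s = (2/Td)(1-z^-1)/(1+z^-1)
print(N, Wc)
```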

5.4 Implementation of IIR Filters

An IIR filter has an impulse response of infinite extent and cannot be implemented by direct convolution in practice. For this reason, we require the IIR filter to have a rational z-transform which, as we have seen, can be realised by a difference equation.

5.4.1 Direct Form I

The difference equation that describes the system is given by

y(n) = Σ_{k=1}^{M} a_k y(n − k) + Σ_{r=0}^{N−1} b_r x(n − r).

This can be realised in a straightforward manner as depicted in Figure 5.6, which represents the pair v(n) = Σ_{r=0}^{N−1} b_r x(n − r) and y(n) = Σ_{k=1}^{M} a_k y(n − k) + v(n) for (N − 1) = M.


This representation can be viewed as an implementation of H(z) through the decomposition

H(z) = H2(z)H1(z) = [ 1 / (1 − Σ_{k=1}^{M} a_k z^{−k}) ] · ( Σ_{r=0}^{N−1} b_r z^{−r} ).

Figure 5.6: Signal flow graph of the direct form I structure for an (N − 1)th-order system.

5.4.2 Direct Form II

The previous system can also be viewed as a cascade of two systems whose system functions are given by H1(z) and H2(z). Since

H1(z)H2(z) = H(z) = H2(z)H1(z),

we can change the order of H1(z) and H2(z) without affecting the overall system function. After interchanging, both delay chains operate on the same internal signal, so the delay elements are redundant and can be combined. We then obtain the direct form II structure. The advantage of the second form lies in the reduction of the number of storage elements required, while the number of arithmetic operations stays the same. The flow graph of the direct form II structure is depicted in Figure 5.7.


Figure 5.7: Signal flow graph of the direct form II structure for an (N − 1)th-order system.
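A minimal Python sketch of the direct form II recursion with a single shared delay line; note the sign convention of the difference equation above, where the a_k enter with a plus sign (illustrative only, not optimised):

```python
import numpy as np

def direct_form_ii(b, a, x):
    """Direct form II with the text's sign convention:
    w(n) = x(n) + sum_{k=1}^{M} a_k w(n-k),  y(n) = sum_{r=0}^{N-1} b_r w(n-r).

    b = [b_0, ..., b_{N-1}], a = [a_1, ..., a_M]."""
    M, N = len(a), len(b)
    K = max(M, N - 1)
    w = [0.0] * (K + 1)                 # w[k] holds w(n-k); w[0] is w(n)
    y = np.zeros(len(x))
    for n in range(len(x)):
        w[0] = x[n] + sum(a[k - 1] * w[k] for k in range(1, M + 1))
        y[n] = sum(b[r] * w[r] for r in range(N))
        w = [0.0] + w[:-1]              # shift the shared delay line by one sample
    return y

# Check against Example 3.5.1: H(z) = 1/(1 - 0.5 z^{-1}) -> b = [1.0], a = [0.5]
x = np.zeros(6); x[0] = 1.0
print(direct_form_ii([1.0], [0.5], x))  # (1/2)^n
```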

5.4.3 Cascade form

The rational system function

H(z) = ( Σ_{r=0}^{N−1} b_r z^{−r} ) / ( 1 − Σ_{k=1}^{M} a_k z^{−k} )

can be expressed in the general form

H(z) = A ∏_k Hk(z),

where Hk(z) is given by

Hk(z) = (1 + β1k z^{−1} + β2k z^{−2}) / (1 − α1k z^{−1} − α2k z^{−2}).

The constants A, β1k, β2k, α1k and α2k are real, since h(n) is real. The previous equations show that H(z) can be represented by a cascade of second-order sections Hk(z). The cascade form is less sensitive to coefficient quantisation: if we change a branch transmittance in one of the two direct forms, all the poles and zeros of the system function are affected, whereas in the cascade form a change in one branch transmittance affects the poles or zeros only in the second-order section to which that branch belongs. Figure 5.8 gives the cascade structure for a 6th-order system with a direct form II realisation of each second-order subsystem.


Figure 5.8: Cascade structure for a 6th-order system with a direct form II realisation of each second-order subsystem.
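SciPy provides this factorisation directly; a sketch (note that SciPy writes each section denominator as 1 + a1 z^{-1} + a2 z^{-2}, i.e. with the signs of α1k, α2k flipped relative to the text):

```python
import numpy as np
from scipy.signal import butter, tf2sos, sosfilt

# Cascade-of-second-order-sections realisation of some 6th-order IIR filter
b, a = butter(6, 0.3)            # example filter, cutoff 0.3 in Nyquist units
sos = tf2sos(b, a)               # each row: [b0k, b1k, b2k, 1, -alpha1k, -alpha2k]
print(sos.shape)                 # (3, 6): three second-order sections

x = np.random.randn(1000)
y = sosfilt(sos, x)              # filter the signal section by section
```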

5.5 A Comparison of FIR and IIR Digital Filters

After having discussed a wide range of methods for the design of both infinite and finite duration impulse response filters, several questions naturally arise: Which type of system is best, IIR or FIR? Which method yields the best results? The answer is: neither. We have discussed a wide range of methods for both IIR and FIR filters because no single type of filter, nor a single design method, is best for all circumstances.

The choice between an FIR filter and an IIR filter depends upon the relative weight that one attaches to the advantages and disadvantages of each type of filter. For example, IIR filters have the advantage that a variety of frequency selective filters can be designed using closed-form design formulae. That is, once the problem has been defined in terms of appropriate specifications for a given kind of filter (e.g., Butterworth, Chebyshev, or elliptic), the coefficients (or poles and zeros) of the desired digital filter are obtained by straightforward substitution into a set of design equations. This kind of simplicity in the design procedure is attractive if only a few filters are to be designed or if limited computational facilities are available.

In the case of FIR filters, closed-form design equations do not exist. While the window method can be applied in a straightforward manner, some iteration may be necessary to meet a prescribed specification. Most of the other FIR design methods are iterative procedures which require relatively powerful computational facilities for their implementation. In contrast, it is often possible to design frequency selective IIR digital filters using only a hand calculator and tables of continuous-time filter design parameters. A drawback of IIR filters is that the closed-form IIR designs are primarily limited to low-pass, band-pass, and high-pass filters, etc. Furthermore, these designs generally disregard the phase response of the filter. For example, with a relatively simple computational procedure we may obtain excellent amplitude response characteristics with an elliptic low-pass filter, while the phase response will be very nonlinear (especially at the band edge).

In contrast, FIR filters can have precise linear phase. Also, the window method and most algorithmic methods afford the possibility of approximating more arbitrary frequency response characteristics with little more difficulty than is encountered in the design of low-pass filters. Also, it appears that the design problem for FIR filters is much more under control than the IIR design problem, because there is an optimality theorem for FIR filters that is meaningful in a wide range of practical situations.

Finally, there are questions of economics in implementing a digital filter. Concerns of this type are usually measured in terms of hardware complexity or computational speed. Both these factors are strongly related to the filter order required to meet a given specification. Putting


aside phase considerations, a given amplitude response specification will, in general, be met most efficiently with an IIR filter. However, in many cases, the linear phase available with an FIR filter may be well worth the extra cost.

Thus a multitude of tradeoffs must be considered in designing a digital filter. Clearly, the final choice will most often be made in terms of engineering judgement on such questions as the formulation of specifications, the method of implementation, and the computational facilities available for carrying out the design.


Chapter 6

Introduction to Digital Spectral Analysis

6.1 Why Spectral Analysis?

There are three principal reasons for spectral analysis.

1. To provide useful descriptive statistics.

2. As a diagnostic tool to indicate which further analysis method might be relevant.

3. To check postulated theoretical models.

6.2 Applications

We encounter many areas where spectral analysis is needed. A few examples are given below.

Physics. In spectroscopy we wish to investigate the distribution of power in a radiation field as a function of frequency (wavelength). Consider, for example, the situation shown in Figure 6.1, where a Michelson interferometer is used to measure the spectrum of a light source. Let S(t) be a zero-mean stationary stochastic process, which may be complex-valued. It can be seen that

X(t) = A (S(t − t0) + S(t − t0 − τ))

and

V(t) = |X(t)|² = A² |S(t − t0) + S(t − t0 − τ)|².

Using the notation t for t − t0, we get

V(t) = A² [S(t)S(t)* + S(t)S(t − τ)* + S(t − τ)S(t)* + S(t − τ)S(t − τ)*].

After low-pass filtering, we obtain (ideally)

w(τ) = E[V(t)] = A² [cSS(0) + cSS(τ) + cSS(−τ) + cSS(0)] = A² [2cSS(0) + 2ℜ{cSS(τ)}].

Performing a Fourier transform leads to

W(e^{jω}) = F{w(τ)} = A² (2cSS(0) δ(e^{jω}) + CSS(e^{jω}) + CSS(e^{−jω})).


[Figure: a light source S(t) passes a beam splitter towards a fixed mirror and a moveable mirror (displacement d, delay τ = 2d/c); the recombined beams AS(t − t0) and AS(t − t0 − τ) form X(t), which reaches a detector producing V(t); V(t) is low-pass filtered to give w(τ) and Fourier transformed to give W(e^{jω}).]

Figure 6.1: A Michelson interferometer.

Thus, the output of a Michelson interferometer is proportional to the spectrum of the light source. One of the objectives in spectroscopic interferometry is to get a good estimate of the spectrum of S(t) without moving the mirror over a large distance d.

Electrical Engineering. Measurement of the power in various frequency bands of some electromagnetic signal of interest. In radar there is the problem of signal detection and interpretation.

Acoustics. The power spectrum plays the role of a descriptive statistic. We may use the spectrum to calculate the direction of arrival (DOA) of an incoming wave or to characterise the source.

Geophysics. Earthquakes give rise to a number of different waves such as pressure waves, shear waves and surface waves. These waves are received by an array of sensors. Even for a


100 km distance between the earthquake and the array, these types of waves can arrive at almost the same time at the array. These waves must be characterised.

Mechanical Engineering. Detection of abnormal combustion in a spark ignition engine. The power spectrum is a descriptive statistic to test whether a combustion cycle is a knocking one.

Medicine. EEG (electroencephalogram) and ECG (electrocardiogram) analysis. It is well known that the adult brain exhibits several major modes of oscillation; the level of brain activity in these various frequency bands may be used as a diagnostic tool.


Chapter 7

Random Variables and Stochastic Processes

7.1 Random Variables

A random variable is a number X(ζ) assigned to every outcome ζ ∈ S of an experiment. This could be the gain in a game of chance, the voltage of a random source, or the cost of a random component.

Given ζ ∈ S, the mapping ζ → X(ζ) ∈ R is a random variable if, for all x ∈ R, the set {ζ : X(ζ) ≤ x} is also an event,

{X ≤ x} := {ζ : X(ζ) ≤ x},

with probability

P(X ≤ x) := P({ζ : X(ζ) ≤ x}).

7.1.1 Distribution Functions

The mapping x ∈ R → FX(x) = P(X ≤ x) is called the probability distribution function of the random variable X.

7.1.2 Properties

The probability distribution function has the following properties

1. 0 ≤ FX(x) ≤ 1, with FX(−∞) = 0 and FX(∞) = 1.

2. FX(x) is continuous from the right, i.e., lim_{ε→0+} FX(x + ε) = FX(x).

3. If x1 < x2 then FX(x1) ≤ FX(x2), i.e., FX(x) is a non-decreasing function of x; in particular,

P(x1 < X ≤ x2) = FX(x2) − FX(x1) ≥ 0.


4. The probability density function is defined as

fX(x) := dFX(x)/dx

and, if it exists, has the property fX(x) ≥ 0. To find FX(x), we integrate fX(x), i.e.,

P(X ≤ x) = FX(x) = ∫_{−∞}^{x} fX(u) du,

and because FX(∞) = 1,

∫_{−∞}^{∞} fX(x) dx = 1.

Further,

P(x1 < X ≤ x2) = FX(x2) − FX(x1) = ∫_{x1}^{x2} fX(x) dx.

5. For a continuous random variable X, P(x1 < X < x2) = P(x1 < X ≤ x2), because

P(X = x) = P((−∞, x]) − P((−∞, x)) = FX(x) − FX(x − 0) = 0.

7.1.3 Uniform Distribution

X is said to be uniformly distributed on [a, b] if it admits the probability density function

fX(x) = (1/(b − a)) [u(x − a) − u(x − b)], a < b,

and probability distribution function

FX(x) = ∫_{−∞}^{x} fX(u) du = (1/(b − a)) [(x − a)u(x − a) − (x − b)u(x − b)].

Both FX(x) and fX(x) are displayed in Figure 7.1.

7.1.4 Exponential Distribution

Let X be exponentially distributed with parameter λ. Then

fX(x) = λ e^{−λx} u(x), λ > 0,

and

FX(x) = (1 − e^{−λx}) u(x),

where

u(x) = { 1, x ≥ 0; 0, x < 0 }.

The exponential distribution is useful in many engineering applications, for example, to describe the life-time X of a transistor. The breakdown rate of a transistor is defined by

lim_{Δ→0+} (1/Δ) · P(x < X ≤ x + Δ) / P(x < X)

and is given by

fX(x) / (1 − FX(x)) = λ = constant.


Figure 7.1: Probability density function fX(x) and distribution function FX(x) of the uniform distribution on [a, b].

7.1.5 Normal Distribution

X is normally distributed with mean µ and variance σ² (X ∼ N(µ, σ²)) if

fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}, σ² > 0, −∞ < µ < ∞.

A closed-form expression for FX(x) does not exist. For µ = 0, σ² = 1, X is said to be standard normally distributed, and its distribution function is denoted by ΦX(x); in general we have

FX(x) = ΦX((x − µ)/σ).

Example 7.1.1 Show that for the normal distribution $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\,dx$$
If we let $s = \frac{x-\mu}{\sigma}$, then
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}s^2}\,ds.$$
Because $\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2} > 0$, we calculate
$$\begin{aligned}
\left(\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}\,dx\right)^{\!2}
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{2\pi}\, e^{-\frac{1}{2}(x^2+y^2)}\,dx\,dy \\
&= \int_0^{2\pi}\int_0^{\infty} \frac{1}{2\pi}\, e^{-\frac{1}{2}r^2}\, r\,dr\,d\varphi \\
&= \int_0^{\infty} e^{-\frac{1}{2}r^2}\, r\,dr = \int_0^{\infty} e^{-s}\,ds = 1,
\end{aligned}$$
where $r = \sqrt{x^2+y^2}$ and $\varphi = \arctan(y/x)$.


Example 7.1.2 A random variable $X \sim \mathcal{N}(1000, 50^2)$. Find the probability that X is between 900 and 1050. We have
$$P(900 \le X \le 1050) = F_X(1050) - F_X(900).$$
Numerical values of $\Phi_X(x)$ over a range of x are shown in Table 1. Using Table 1, we conclude that
$$P(900 \le X \le 1050) = \Phi_X(1) + \Phi_X(2) - 1 = 0.819,$$
since $\Phi_X(-x) = 1 - \Phi_X(x)$.

Often the following is defined as the error function. Note that $\mathrm{erf}(x) = \Phi_X(x) - \frac{1}{2}$, though there are other definitions in the literature:
$$\mathrm{erf}(x) = \frac{1}{\sqrt{2\pi}}\int_0^x e^{-u^2/2}\,du = \Phi_X(x) - \frac{1}{2}.$$

Table 1: Numerical values for the erf(x) function.

x     erf(x)     x     erf(x)     x     erf(x)     x     erf(x)
0.05  0.01994    0.80  0.28814    1.55  0.43943    2.30  0.48928
0.10  0.03983    0.85  0.30234    1.60  0.44520    2.35  0.49061
0.15  0.05962    0.90  0.31594    1.65  0.45053    2.40  0.49180
0.20  0.07926    0.95  0.32894    1.70  0.45543    2.45  0.49286
0.25  0.09871    1.00  0.34134    1.75  0.45994    2.50  0.49379
0.30  0.11791    1.05  0.35314    1.80  0.46407    2.55  0.49461
0.35  0.13683    1.10  0.36433    1.85  0.46784    2.60  0.49534
0.40  0.15542    1.15  0.37493    1.90  0.47128    2.65  0.49597
0.45  0.17364    1.20  0.38493    1.95  0.47441    2.70  0.49653
0.50  0.19146    1.25  0.39495    2.00  0.47726    2.75  0.49702
0.55  0.20884    1.30  0.40320    2.05  0.47982    2.80  0.49744
0.60  0.22575    1.35  0.41149    2.10  0.48214    2.85  0.49781
0.65  0.24215    1.40  0.41924    2.15  0.48422    2.90  0.49813
0.70  0.25804    1.45  0.42647    2.20  0.48610    2.95  0.49841
0.75  0.27337    1.50  0.43319    2.25  0.48778    3.00  0.49865
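Such table look-ups can be checked numerically. The following sketch (added for illustration, using SciPy) reproduces the probability from Example 7.1.2:

```python
from scipy.stats import norm

mu, sigma = 1000.0, 50.0
p = norm.cdf(1050, mu, sigma) - norm.cdf(900, mu, sigma)
print(f"P(900 <= X <= 1050) = {p:.3f}")   # approximately 0.819
```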

7.1.6 χ2 Distribution

Let $X = \sum_{k=1}^{\nu} Y_k^2$ be a random variable where $Y_k \sim \mathcal{N}(0, 1)$. X is then $\chi^2$ distributed with ν degrees of freedom. The probability density function of a $\chi^2_\nu$ random variable is displayed in Figure 7.2 for $\nu = 1, 2, 3, 5$ and 10. The probability density function is
$$f_X(x) = \frac{x^{\frac{\nu}{2}-1}}{2^{\frac{\nu}{2}}\,\Gamma(\frac{\nu}{2})}\, e^{-\frac{x}{2}}\, u(x),$$
where $\Gamma(x)$ is the gamma function,
$$\Gamma(x) = \int_0^{\infty} \xi^{x-1} e^{-\xi}\,d\xi.$$

Figure 7.2: $\chi^2_\nu$ distribution with $\nu = 1, 2, 3, 5, 10$.

The mean of X is
$$E[X] = \nu$$
and the variance is
$$\sigma_X^2 = 2\nu.$$
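A quick simulation (an illustrative sketch, not from the manuscript; the choice ν = 5 and the sample size are arbitrary) confirms these two moments by constructing X as a sum of squared standard normal variables:

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 5                                  # assumed degrees of freedom
Y = rng.standard_normal((100_000, nu))
X = np.sum(Y**2, axis=1)                # X = sum of nu squared N(0,1) variables
print(f"mean ~ {X.mean():.2f} (theory {nu})")
print(f"var  ~ {X.var():.2f} (theory {2 * nu})")
```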

7.1.7 Multiple Random Variables

The probabilities of the two events $A = \{X_1 \le x_1\}$ and $B = \{X_2 \le x_2\}$ have already been defined as functions of $x_1$ and $x_2$, respectively, called probability distribution functions:
$$F_{X_1}(x_1) = P(X_1 \le x_1), \qquad F_{X_2}(x_2) = P(X_2 \le x_2).$$
We must introduce a new concept to include the probability of the joint event $\{X_1 \le x_1, X_2 \le x_2\} = \{(X_1, X_2) \in D\}$, where $x_1$ and $x_2$ are two arbitrary real numbers and D is the quadrant shown in Figure 7.3.

The joint distribution $F_{X_1X_2}(x_1, x_2)$ of two random variables $X_1$ and $X_2$ is the probability of the event $\{X_1 \le x_1, X_2 \le x_2\}$,
$$F_{X_1X_2}(x_1, x_2) = P(X_1 \le x_1, X_2 \le x_2).$$
For n random variables $X_1, \ldots, X_n$ the joint distribution function is denoted by
$$F_{X_1X_2\ldots X_n}(x_1, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n).$$

Figure 7.3: The quadrant D defined by two arbitrary real numbers $x_1$ and $x_2$.

7.1.8 Properties

The multivariate probability distribution function has the following properties.

1)
$$\lim_{x_1 \to \infty, \ldots, x_n \to \infty} F_{X_1X_2\ldots X_n}(x_1, \ldots, x_n) = F(\infty, \infty, \ldots, \infty) = 1$$
$$\lim_{x_i \to -\infty} F_{X_1X_2\ldots X_n}(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) = F_{X_1X_2\ldots X_n}(x_1, \ldots, x_{i-1}, -\infty, x_{i+1}, \ldots, x_n) = 0$$

2) F is continuous from the right in each component $x_i$ when the remaining components are fixed.

3) F is a non-decreasing function of each component $x_i$ when the remaining components are fixed.

4) For the nth difference,
$$\Delta_a^b F = \sum_{k \in \{0,1\}^n} (-1)^{\left[n - \sum_{i=1}^n k_i\right]} F\!\left(b_1^{k_1}a_1^{1-k_1}, \ldots, b_n^{k_n}a_n^{1-k_n}\right) \ge 0,$$
where $a = (a_1, \ldots, a_n)^T$, $b = (b_1, \ldots, b_n)^T$ with $a_i < b_i$ $(i = 1, \ldots, n)$, and T denotes transpose. For $n = 1$, $\Delta_a^b F$ is equivalent to $F(b) - F(a)$, and for $n = 2$, $\Delta_a^b F$ is equivalent to $F(b_1, b_2) - F(b_1, a_2) - F(a_1, b_2) + F(a_1, a_2)$. To show that (1)–(3) are not sufficient to define a probability, one should discuss the function $F_{X_1X_2}(x_1, x_2)$, defined as
$$F_{X_1X_2}(x_1, x_2) = \begin{cases} 0, & x_1 + x_2 < 0, \\ 1, & \text{elsewhere,} \end{cases}$$
which is depicted in Figure 7.4.

Suppose that in a combined experiment the result of experiment i depends on the first to (i−1)th experiments. With the density function $f_{X_1}(x_1)$ for the first experiment, one constructs the density function of the total experiment:
$$f_{X_1\ldots X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1)\prod_{i=2}^{n} f_{X_i}(x_i \mid x_1, \ldots, x_{i-1}) = f_{X_1}(x_1)\prod_{i=2}^{n} f_{X_i}(x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}).$$

Page 87: Digital Signal Processin-TU Darmstadt

7.1. RANDOM VARIABLES 75

-

6

x1

x2

FX1,X2(x1, x2) = 1

FX1,X2(x1, x2) = 0

Figure 7.4: Bivariate probability distribution function.

If the ith experiment is independent of the previous ones, then
$$f_{X_i}(x_i \mid x_1, \ldots, x_{i-1}) = f_{X_i}(x_i)$$
and
$$f_{X_1\ldots X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i),$$
or equivalently
$$F_{X_1\ldots X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i),$$
which means that the events $\{X_1 \le x_1\}, \ldots, \{X_n \le x_n\}$ are independent. For the special case $n = 2$, we have the following properties:

1)
$$F_{X_1X_2}(\infty, \infty) = P(X_1 \le \infty, X_2 \le \infty) = 1$$
$$F_{X_1X_2}(-\infty, x_2) = P(X_1 \le -\infty, X_2 \le x_2) = 0 = F_{X_1X_2}(x_1, -\infty)$$
because $\{X_1 = -\infty, X_2 \le x_2\} \subset \{X_1 = -\infty\}$ and $\{X_1 \le x_1, X_2 = -\infty\} \subset \{X_2 = -\infty\}$.

2)
$$\lim_{\epsilon \to 0} F_{X_1X_2}(x_1 + \epsilon, x_2) = \lim_{\epsilon \to 0} F_{X_1X_2}(x_1, x_2 + \epsilon) = F_{X_1X_2}(x_1, x_2).$$

3) The event $\{x_{11} < X_1 \le x_{12}, X_2 \le x_2\}$ consists of all the points $(X_1, X_2)$ in $D_1$, and the event $\{X_1 \le x_1, x_{21} < X_2 \le x_{22}\}$ consists of all the points in $D_2$. From Figure 7.5, we see that
$$\{X_1 \le x_{12}, X_2 \le x_2\} = \{X_1 \le x_{11}, X_2 \le x_2\} \cup \{x_{11} < X_1 \le x_{12}, X_2 \le x_2\}.$$
The two summands are mutually exclusive, hence
$$P(X_1 \le x_{12}, X_2 \le x_2) = P(X_1 \le x_{11}, X_2 \le x_2) + P(x_{11} < X_1 \le x_{12}, X_2 \le x_2)$$
and therefore
$$P(x_{11} < X_1 \le x_{12}, X_2 \le x_2) = F_{X_1X_2}(x_{12}, x_2) - F_{X_1X_2}(x_{11}, x_2).$$

Figure 7.5: The regions $D_1$ and $D_2$ used in the properties for $n = 2$.

Similarly,
$$P(X_1 \le x_1, x_{21} < X_2 \le x_{22}) = F_{X_1X_2}(x_1, x_{22}) - F_{X_1X_2}(x_1, x_{21}).$$

4)
$$P(x_{11} < X_1 \le x_{12}, x_{21} < X_2 \le x_{22}) = F_{X_1X_2}(x_{12}, x_{22}) - F_{X_1X_2}(x_{11}, x_{22}) - F_{X_1X_2}(x_{12}, x_{21}) + F_{X_1X_2}(x_{11}, x_{21}) \ge 0.$$
This is the probability that $(X_1, X_2)$ is in the rectangle $D_3$ (see Figure 7.6).

Figure 7.6: Probability that $(X_1, X_2)$ is in the rectangle $D_3$.

Example 7.1.3 The bivariate normal distribution
$$f_{X_1X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_{X_1}\sigma_{X_2}\sqrt{1-\rho^2}}\exp\!\left[-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_{X_1}}{\sigma_{X_1}}\right)^{\!2} - \frac{2\rho(x_1-\mu_{X_1})(x_2-\mu_{X_2})}{\sigma_{X_1}\sigma_{X_2}} + \left(\frac{x_2-\mu_{X_2}}{\sigma_{X_2}}\right)^{\!2}\right]\right],$$
where $\mu_{X_1}$ and $\mu_{X_2}$ are respectively the means of $X_1$ and $X_2$, $\sigma_{X_1}^2 > 0$ and $\sigma_{X_2}^2 > 0$ the corresponding variances, and $-1 < \rho < 1$ is the correlation coefficient. $\mathcal{N}(\mu_{X_1}, \sigma_{X_1}^2)$ and $\mathcal{N}(\mu_{X_2}, \sigma_{X_2}^2)$ are the marginal distributions.

The conditional density function $f_{X_2}(x_2 \mid x_1)$ of $X_2$ when $X_1 = x_1$ is $\mathcal{N}\!\left(\mu_{X_2} + \rho\frac{\sigma_{X_2}}{\sigma_{X_1}}(x_1 - \mu_{X_1}),\; \sigma_{X_2}^2(1-\rho^2)\right)$. If $X_1$ and $X_2$ are independent then $\rho = 0$, and clearly $f_{X_1X_2}(x_1, x_2) = f_{X_1}(x_1)f_{X_2}(x_2)$.

7.1.9 Operations on Random Variables

The concept of a random variable was previously introduced as a means of providing a systematic definition of events on a sample space; it forms a mathematical model for describing characteristics of real, physical-world quantities. Many useful operations on random variables are based on a single concept – expectation.

7.1.10 Expectation

The mathematical expectation of a random variable X,
$$E[X],$$
which may be read “the expected value of X” or “the mean value of X”, is defined as
$$E[X] := \int_{-\infty}^{\infty} x\, f_X(x)\,dx,$$
where $f_X(x)$ is the probability density function of X. If X happens to be discrete with N possible values $x_i$ having probabilities $P(x_i)$ of occurrence, then
$$f_X(x) = \sum_{i=1}^{N} P(x_i)\,\delta(x - x_i)$$
and
$$E[X] = \sum_{i=1}^{N} x_i P(x_i).$$

Example 7.1.4

i) Normal Distribution: If $X \sim \mathcal{N}(\mu, \sigma^2)$, then
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left\{-\frac{1}{2}(x-\mu)^2/\sigma^2\right\}$$
$$\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x f_X(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x\exp\!\left\{-\frac{1}{2}(x-\mu)^2/\sigma^2\right\} dx \\
&= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} (\sigma u + \mu)\exp\!\left\{-\frac{1}{2}u^2\right\}\sigma\,du \\
&= \frac{1}{\sqrt{2\pi}}\left[\sigma\int_{-\infty}^{\infty} u\exp\!\left\{-\frac{1}{2}u^2\right\} du + \mu\int_{-\infty}^{\infty}\exp\!\left\{-\frac{1}{2}u^2\right\} du\right] \\
&= \frac{1}{\sqrt{2\pi}}\left[\sigma\cdot 0 + \mu\sqrt{2\pi}\right] = \mu
\end{aligned}$$

ii) Uniform distribution: If X is uniformly distributed on the interval $[a, b]$, then
$$f_X(x) = \frac{1}{b-a}\cdot\mathrm{rect}_{[a,b]}(x),$$
where
$$\mathrm{rect}_{[a,b]}(x) = \begin{cases} 1, & a \le x \le b \\ 0, & \text{else.} \end{cases}$$
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx = \frac{1}{b-a}\int_a^b x\,dx = \frac{1}{b-a}\cdot\frac{1}{2}(b^2 - a^2) = \frac{(b-a)(b+a)}{2(b-a)} = \frac{a+b}{2},$$
which can be verified graphically.

iii) Exponential Distribution: If X is exponentially distributed, then
$$f_X(x) = \frac{1}{b}\exp\!\left[-\frac{x-a}{b}\right] u(x-a)$$
$$\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x\, f_X(x)\,dx = \int_{-\infty}^{\infty} x\cdot\frac{1}{b}\exp\!\left[-\frac{x-a}{b}\right] u(x-a)\,dx \\
&= \frac{1}{b}\int_a^{\infty} x\exp\!\left[-\frac{x-a}{b}\right] dx = \frac{1}{b}\exp\!\left[\frac{a}{b}\right]\int_a^{\infty} x\exp\!\left[-\frac{x}{b}\right] dx \\
&= a + b
\end{aligned}$$

Note: If a random variable’s density is symmetric about a line $x = a$, then $E[X] = a$, i.e.,
$$E[X] = a \quad \text{if} \quad f_X(x + a) = f_X(-x + a).$$
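These expectations are easy to sanity-check by numerical integration; the sketch below (added for illustration, with arbitrarily chosen parameters a and b) evaluates $E[X] = \int x f_X(x)\,dx$ for the uniform and shifted exponential densities:

```python
import numpy as np
from scipy.integrate import quad

a, b = 2.0, 5.0   # assumed parameters

# Uniform on [a, b]: E[X] should be (a + b) / 2
mean_unif, _ = quad(lambda x: x / (b - a), a, b)
print(f"uniform: {mean_unif:.3f} (theory {(a + b) / 2})")

# Shifted exponential f(x) = (1/b) exp(-(x - a)/b), x >= a: E[X] should be a + b
mean_exp, _ = quad(lambda x: x * np.exp(-(x - a) / b) / b, a, np.inf)
print(f"exponential: {mean_exp:.3f} (theory {a + b})")
```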

7.1.11 Expected Value of a Function of Random Variables

Suppose we are interested in the mean of the random variable $Y = g(X)$,
$$E[Y] = \int_{-\infty}^{\infty} y\, f_Y(y)\,dy,$$
and we are given $f_X(x)$. Then,
$$E[Y] = E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx.$$

Example 7.1.5 The average power in a 1 Ω resistor can be found as
$$E[V^2] = \int_{-\infty}^{\infty} v^2 f_V(v)\,dv,$$
where V and $f_V(v)$ are respectively the random voltage and its probability density function.

In particular, if we apply the transformation $g(\cdot)$ to the random variable X, defined as
$$g(X) = X^n, \quad n = 0, 1, 2, \ldots,$$
the expectation of $g(X)$ is known as the nth order moment of X and denoted by $\mu_n$:
$$\mu_n := E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx.$$
It is also of importance to use central moments of X around the mean. These are defined as
$$\mu_n' := E[(X - E[X])^n] = \int_{-\infty}^{\infty} (x - E[X])^n f_X(x)\,dx, \quad n = 0, 1, 2, \ldots$$
Clearly the first order central moment is zero.

7.1.12 The Variance

The variance of a random variable X is by definition
$$\sigma^2 := \int_{-\infty}^{\infty} (x - E[X])^2 f_X(x)\,dx.$$
The positive constant σ is called the standard deviation of X. From the definition it follows that $\sigma^2$ is the mean of the random variable $(X - E[X])^2$. Thus,
$$\begin{aligned}
\sigma^2 = E\left[(X - E[X])^2\right] &= E\left[X^2 - 2X E[X] + (E[X])^2\right] \\
&= E[X^2] - 2E[X E[X]] + (E[X])^2 \\
&= E[X^2] - 2(E[X])^2 + (E[X])^2 \\
&= E[X^2] - (E[X])^2 = E[X^2] - \mu^2,
\end{aligned}$$
where $\mu = E[X]$.

Example 7.1.6 We will now find the variance of three common distributions.

i) Normal distribution: Let $X \sim \mathcal{N}(\mu, \sigma^2)$. The probability density function of X is
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right].$$
Clearly
$$\int_{-\infty}^{\infty}\exp\!\left\{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right\} dx = \sigma\sqrt{2\pi}.$$
Differentiating with respect to σ, we obtain
$$\int_{-\infty}^{\infty}\frac{(x-\mu)^2}{\sigma^3}\exp\!\left\{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right\} dx = \sqrt{2\pi}.$$
Multiplying both sides by $\sigma^2/\sqrt{2\pi}$, we conclude that
$$E\left[(X-\mu)^2\right] = E\left[(X - E[X])^2\right] = \sigma^2.$$

ii) Uniform distribution:
$$E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2 = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - \left(\frac{a+b}{2}\right)^{\!2} = \int_a^b \frac{x^2}{b-a}\,dx - \left(\frac{a+b}{2}\right)^{\!2} = \frac{(b-a)^2}{12}.$$

iii) Exponential distribution:
$$f_X(x) = \lambda\exp\{-\lambda x\}\, u(x), \quad \lambda > 0$$
$$E[X] = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}.$$

7.1.13 Transformation of a Random Variable

In practice, one may wish to transform a random variable X into a new random variable Y by means of a transformation
$$Y = g(X),$$
where the probability density function $f_X(x)$ or distribution function of X is known. The problem is to find the probability density function $f_Y(y)$ or distribution function $F_Y(y)$ of Y. Let us assume that $g(\cdot)$ is continuous and differentiable at all values of x for which $f_X(x) \ne 0$. Furthermore, we assume that $g(\cdot)$ is monotone, so that the inverse $g^{-1}(\cdot)$ exists. Then (for monotonically increasing g),
$$F_Y(y) = P(Y \le y) = P(g(X) \le y) = P\left(X \le g^{-1}(y)\right) = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx$$
holds. The density function of $Y = g(X)$ is obtained via differentiation, which leads to
$$f_Y(y) = f_X\!\left(g^{-1}(y)\right)\cdot\left|\frac{dg^{-1}(y)}{dy}\right|.$$

Example 7.1.7 Let
$$Y = aX + b.$$
To find the probability density function of Y knowing $F_X(x)$ or $f_X(x)$, we calculate, for $a > 0$,
$$F_Y(y) = P(Y \le y) = P(aX + b \le y) = P\left(X \le \frac{y-b}{a}\right) = F_X\!\left(\frac{y-b}{a}\right),$$
and, for $a < 0$,
$$F_Y(y) = 1 - F_X\!\left(\frac{y-b}{a}\right).$$
By differentiating $F_Y(y)$, we obtain the probability density function
$$f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y-b}{a}\right).$$

Example 7.1.8 Let
$$Y = X^2.$$
To find the probability density function of Y knowing $F_X(x)$ or $f_X(x)$, we calculate
$$F_Y(y) = P(X^2 \le y) = \begin{cases} 0, & y < 0 \\ P(-\sqrt{y} \le X \le \sqrt{y}), & y \ge 0 \end{cases}$$
$$F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y}), \quad y \ge 0.$$
By differentiating $F_Y(y)$, we obtain the probability density function
$$f_Y(y) = \begin{cases} 0, & y < 0 \\ \dfrac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}}, & y > 0. \end{cases}$$
$F_Y(y)$ is not differentiable at $y = 0$. We may arbitrarily choose $f_Y(0) = 0$.
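The transformation result can be verified empirically. The sketch below (added for illustration, assuming $X \sim \mathcal{N}(0, 1)$) compares the empirical distribution of $Y = X^2$ with the derived expression $F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y})$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
Y = rng.standard_normal(1_000_000) ** 2     # Y = X^2 with X ~ N(0, 1)

for y in [0.5, 1.0, 2.0]:
    F_emp = np.mean(Y <= y)                                   # empirical F_Y(y)
    F_der = norm.cdf(np.sqrt(y)) - norm.cdf(-np.sqrt(y))      # F_X(sqrt y) - F_X(-sqrt y)
    print(f"y = {y}: empirical {F_emp:.4f}, derived {F_der:.4f}")
```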

7.1.14 The Characteristic Function

The characteristic function (cf) of a random variable X is defined as follows:
$$\Phi_X(\omega) := E\left[e^{j\omega X}\right] \tag{7.1}$$
$$= \int_{-\infty}^{\infty} f_X(x)\, e^{j\omega x}\,dx. \tag{7.2}$$
This can be interpreted as the expectation of the function $g(X) = e^{j\omega X}$, or as the Fourier transform (with reversed sign of the exponent) of the probability density function (pdf) of X. Considering Equation (7.2) as a Fourier transform, we can readily apply all the properties of Fourier transforms to the characteristic function. Most importantly, the inverse relationship of Equation (7.2) is given by
$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Phi_X(\omega)\, e^{-j\omega x}\,d\omega.$$
Bounding the modulus of the integrand in (7.2) shows that $|\Phi_X(\omega)| \le 1$, and thus the characteristic function always exists.

Example 7.1.9 We shall derive the cf of a Gaussian distributed random variable. First, we consider $Z \sim \mathcal{N}(0, 1)$:
$$\begin{aligned}
\Phi_Z(\omega) &= \int f_Z(z)\, e^{j\omega z}\,dz = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}z^2} e^{j\omega z}\,dz \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(z - j\omega)^2 + \frac{1}{2}(j\omega)^2}\,dz \\
&= e^{-\frac{1}{2}\omega^2}\int f_Z(z - j\omega)\,dz = e^{-\frac{1}{2}\omega^2}.
\end{aligned}$$
Now we consider the more general case of $X = \sigma Z + \mu \sim \mathcal{N}(\mu, \sigma^2)$:
$$\Phi_X(\omega) = E\left[e^{j\omega(\sigma Z + \mu)}\right] = e^{j\omega\mu}\,\Phi_Z(\sigma\omega) = e^{j\omega\mu - \frac{1}{2}\sigma^2\omega^2}.$$

The cf is also known as the moment-generating function, as it may be used to determine the nth order moment of a random variable. This can be seen from the following derivation. Consider the power series expansion of an exponential term for $|x| < \infty$:
$$e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!}.$$
Substituting this into Equation (7.2) yields
$$\Phi_X(\omega) = \int f_X(x)\left[1 + j\omega x + \frac{(j\omega x)^2}{2!} + \cdots\right] dx = 1 + j\omega E[X] + \frac{(j\omega)^2}{2!}E[X^2] + \cdots \tag{7.3}$$
Differentiating the above expression n times and evaluating at the origin gives
$$\left(\frac{d}{d\omega}\right)^{\!n}\Phi_X(\omega)\bigg|_{\omega=0} = j^n E[X^n],$$
which means that the nth order moment of X can be determined according to
$$E[X^n] = \frac{1}{j^n}\left(\frac{d}{d\omega}\right)^{\!n}\Phi_X(\omega)\bigg|_{\omega=0}.$$
This is known as the moment theorem. When the power series expansion of Equation (7.3) converges, the characteristic function and hence the pdf of X are completely determined by the moments of X.
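As a numerical illustration of the moment theorem (a sketch added here, not from the manuscript; μ and σ are arbitrary assumptions), one can approximate the second derivative of the Gaussian cf at the origin by central differences and recover $E[X^2] = \mu^2 + \sigma^2$:

```python
import numpy as np

mu, sigma = 1.5, 2.0                      # assumed parameters
phi = lambda w: np.exp(1j * w * mu - 0.5 * sigma**2 * w**2)   # cf of N(mu, sigma^2)

h = 1e-4
d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2   # second derivative at omega = 0
EX2 = (d2 / 1j**2).real                          # E[X^n] = j^{-n} (d/dw)^n Phi at 0
print(f"E[X^2] ~ {EX2:.4f} (theory {mu**2 + sigma**2})")
```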


7.2 Covariance

If the statistical properties of the two random variables $X_1$ and $X_2$ are described by their joint probability density function $f_{X_1X_2}(x_1, x_2)$, then we have
$$E[g(X_1, X_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1, x_2)\, f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2.$$
For $g(X_1, X_2) = X_1^n X_2^k$,
$$\mu_{n,k} := E\left[X_1^n X_2^k\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1^n x_2^k\, f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2, \quad n, k = 0, 1, 2, \ldots$$
are called the joint moments of $X_1$ and $X_2$. Clearly $\mu_{0k} = E[X_2^k]$, while $\mu_{n0} = E[X_1^n]$. The second order moment $E[X_1X_2]$ of $X_1$ and $X_2$ we denote¹ by $r_{X_1X_2}$,
$$r_{X_1X_2} = E[X_1X_2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\, f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2.$$
If $r_{X_1X_2} = E[X_1X_2] = E[X_1]\cdot E[X_2]$, then $X_1$ and $X_2$ are said to be uncorrelated. If $r_{X_1X_2} = 0$, then $X_1$ and $X_2$ are orthogonal.

The moments $\mu_{nk}'$, $n, k = 0, 1, 2, \ldots$ are known as joint central moments and defined by
$$\mu_{nk}' = E\left[(X_1 - E[X_1])^n(X_2 - E[X_2])^k\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 - E[X_1])^n(x_2 - E[X_2])^k f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2.$$
The second order central moments
$$\mu_{20}' = \sigma_{X_1}^2, \qquad \mu_{02}' = \sigma_{X_2}^2$$
are the variances of $X_1$ and $X_2$, respectively. The joint moment $\mu_{11}'$ is called the covariance of $X_1$ and $X_2$ and denoted by $c_{X_1X_2}$:
$$c_{X_1X_2} = E[(X_1 - E[X_1])(X_2 - E[X_2])] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 - E[X_1])(x_2 - E[X_2])\, f_{X_1X_2}(x_1, x_2)\,dx_1\,dx_2.$$
Clearly,
$$c_{X_1X_2} = r_{X_1X_2} - E[X_1]E[X_2].$$
If two random variables are independent, i.e. $f_{X_1X_2}(x_1, x_2) = f_{X_1}(x_1)f_{X_2}(x_2)$, then $c_{X_1X_2} = 0$. The converse, however, is not always true (except in the Gaussian case). The normalised second-order moment
$$\rho = \mu_{11}'/\sqrt{\mu_{20}'\mu_{02}'} = \frac{c_{X_1X_2}}{\sigma_{X_1}\sigma_{X_2}}$$
is known as the correlation coefficient of $X_1$ and $X_2$. It can be shown that
$$-1 \le \rho \le 1.$$
We will now show that two uncorrelated random variables are not necessarily independent.

¹In the engineering literature $r_{X_1X_2}$ is often called correlation. We reserve this term to describe the normalised covariance, as is common in the statistical literature.


Example 7.2.1 U is uniformly distributed on $[-\pi, \pi)$, and
$$X_1 = \cos U, \qquad X_2 = \sin U.$$
$X_1$ and $X_2$ are not independent, because $X_1^2 + X_2^2 = 1$. However,
$$c_{X_1X_2} = E[(X_1 - E[X_1])(X_2 - E[X_2])] = E[X_1X_2] = E[\cos U\sin U] = \tfrac{1}{2}E[\sin 2U] = 0.$$
Thus $X_1$ and $X_2$ are uncorrelated but not independent.
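A short simulation (an added sketch, not in the original) makes this concrete: the sample correlation of $\cos U$ and $\sin U$ is essentially zero although the two variables are deterministically linked:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(-np.pi, np.pi, 1_000_000)
X1, X2 = np.cos(U), np.sin(U)

print(f"correlation coefficient ~ {np.corrcoef(X1, X2)[0, 1]:.4f}")       # close to 0
print(f"max |X1^2 + X2^2 - 1| = {np.abs(X1**2 + X2**2 - 1).max():.2e}")   # exact dependence
```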

Example 7.2.2 Let $X_i$, $i = 1, \ldots, N$ be random variables and $\alpha_i$, $i = 1, \ldots, N$ be scalar values. We calculate the mean and the variance of
$$Y = \sum_{i=1}^{N}\alpha_i X_i,$$
given the means and variances (and covariances) of the $X_i$:
$$E[Y] = \sum_{i=1}^{N}\alpha_i E[X_i]$$
$$\sigma_Y^2 = E\left[(Y - E[Y])^2\right] = \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j c_{X_iX_j}.$$
For the special case $c_{X_iX_j} = 0$, $i \ne j$,
$$\sigma_Y^2 = \sum_{i=1}^{N}\alpha_i^2\sigma_{X_i}^2.$$

7.3 Stochastic Processes (Random Processes)

Random processes naturally arise in many engineering applications. For example, the voltage waveform of a noise source or the bit stream of a binary communications signal are continuous-time random processes. The monthly rainfall values or the daily stock market index are examples of discrete-time random processes. However, random sequences are most commonly encountered when sampling continuous-time processes, as discussed in the preceding chapter. The remainder of the manuscript focuses solely on discrete-time random processes; the results presented herein, however, apply in a similar fashion to the continuous-time case.

Definition 7.3.1 A real random process is an indexed set of real functions of time that has certain statistical properties. We characterise a real random process by a set of real functions and an associated probability description.

In general, $x_i(n) = X(n, \zeta_i)$, $n \in \mathbb{Z}$, $\zeta_i \in S$, denotes the waveform that is obtained when the event $\zeta_i$ of the process occurs. $x_i(n)$ is called a sample function of the process (see Figure 7.7). The set of all possible sample functions $X(n, \zeta_i)$ is called the ensemble and defines the random process $X(n)$ that describes a noise source.

Figure 7.7: Voltage waveforms emitted from a noise source.

A random process can be seen as a function of two variables, time n and elementary event ζ. When n is fixed, $X(n, \zeta)$ is simply a random variable. When ζ is fixed, $X(n, \zeta) = x(n)$ is a function of time known as a “sample function” or realisation. If $n = n_1$ then $X(n_1, \zeta) = X(n_1) = X_1$ is a random variable. If the noise source has a Gaussian distribution, then any of the random variables would be described by
$$f_{X_i}(x_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}}\, e^{-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}},$$
where $X_i = X(n_i, \zeta)$. Also, all joint distributions are multivariate Gaussian. At each point in time, the value of the waveform is a random variable described by its probability density function at that time n, $f_X(x; n)$.

The probability that the process $X(n)$ will have a value in the range $[a, b]$ at n is given by
$$P(a \le X(n) \le b) = \int_a^b f_X(x; n)\,dx.$$

The first moments of $X(n)$ are
$$\mu_1(n) = E[X(n)] = \int_{-\infty}^{\infty} x\, f_X(x; n)\,dx$$
$$\mu_2(n) = E[X(n)^2] = \int_{-\infty}^{\infty} x^2 f_X(x; n)\,dx$$
$$\vdots$$
$$\mu_l(n) = E[X(n)^l] = \int_{-\infty}^{\infty} x^l f_X(x; n)\,dx,$$
where $\mu_1(n)$ is the mean of $X(n)$, and $\mu_2(n) - \mu_1(n)^2$ is the variance of $X(n)$. Because, in general, $f_X(x; n)$ depends on n, the moments will depend on time. The probability that a process $X(n)$ lies in the interval $[a, b]$ at time $n_1$ and in the interval $[c, d]$ at time $n_2$ is given by the joint density function $f_{X_1X_2}(x_1, x_2; n_1, n_2)$,
$$P(a \le X_1 \le b,\, c \le X_2 \le d) = \int_a^b\int_c^d f_{X_1X_2}(x_1, x_2; n_1, n_2)\,dx_1\,dx_2.$$
A complete description of a random process requires knowledge of all such joint densities for all possible point locations and orders, $f_{X_1X_2\ldots X_N}(x_1, \ldots, x_N; n_1, \ldots, n_N)$.

• A continuous random process consists of a random process with associated continuously distributed random variables $X(n)$, $n \in \mathbb{Z}$. An example is noise in communication circuits at a given time: the measured voltage can take any value.

• A discrete random process consists of a random process with associated discrete random variables, e.g. the output of an ideal limiter.

7.3.1 Stationarity and Ergodicity

Definition 7.3.2 A random process $X(n)$, $n \in \mathbb{Z}$ is said to be stationary to the order N if, for any $n_1, n_2, \ldots, n_N$, any $x_1, x_2, \ldots, x_N$, and any $n_0$,
$$F_X(x_1, \ldots, x_N; n_1, \ldots, n_N) = P(X(n_1) \le x_1, X(n_2) \le x_2, \ldots, X(n_N) \le x_N)$$
$$= P(X(n_1 + n_0) \le x_1, X(n_2 + n_0) \le x_2, \ldots, X(n_N + n_0) \le x_N) = F_X(x_1, \ldots, x_N; n_1 + n_0, \ldots, n_N + n_0)$$
holds, or equivalently:
$$f_X(x_1, x_2, \ldots, x_N; n_1, \ldots, n_N) = f_X(x_1, x_2, \ldots, x_N; n_1 + n_0, \ldots, n_N + n_0).$$
The process is said to be strictly stationary if it is stationary to infinite order. If a process is stationary, it can be translated in time without changing its statistical description.

Ergodicity is a topic dealing with the relationship between statistical averages and time averages. Suppose that, for example, we wish to determine the mean $\mu_1(n)$ of a process $X(n)$. For this purpose, we observe a large number of samples $X(n, \zeta_i)$ and we use their ensemble average as the estimate of $\mu_1(n) = E[X(n)]$:
$$\hat{\mu}_1(n) = \frac{1}{L}\sum_{i=1}^{L} X(n, \zeta_i).$$


Suppose, however, that we have access only to a single sample $x(n) = X(n, \zeta)$ of $X(n)$ for $-N \le n \le N$. Can we then use the time average
$$\overline{X(n)} = \lim_{N \to \infty}\frac{1}{2N+1}\sum_{n=-N}^{N} X(n, \zeta)$$
as the estimate of $\mu_1(n)$? This is, of course, impossible if $\mu_1(n)$ depends on n. However, if $\mu_1(n) = \mu_1$ is a constant, then under general conditions $\overline{X(n)}$ approaches $\mu_1$.

Definition 7.3.3 A random process is said to be ergodic if all time averages of any sample function are equal to the corresponding ensemble averages (expectations).

Example 7.3.1 DC and RMS values are defined in terms of time averages, but if the process is ergodic, they may be evaluated by use of ensemble averages. The DC and RMS values of $X(n)$ are
$$X_{dc} = E[X(n)], \qquad X_{rms} = \sqrt{E[X(n)^2]}.$$
If $X(n)$ is ergodic, the time average is equal to the ensemble average, that is,
$$X_{dc} = \lim_{N \to \infty}\frac{1}{2N+1}\sum_{n=-N}^{N} X(n) = E[X(n)] = \int_{-\infty}^{\infty} x f_X(x; n)\,dx = \mu_X.$$
Similarly, we have for the rms value:
$$X_{rms} = \sqrt{\lim_{N \to \infty}\frac{1}{2N+1}\sum_{n=-N}^{N} X(n)^2} = \sqrt{E[X(n)^2]} = \sqrt{\sigma_X^2 + \mu_X^2} = \sqrt{\int_{-\infty}^{\infty} x^2 f_X(x; n)\,dx},$$
where $\sigma_X^2$ is the variance of $X(n)$.

Ergodicity and Stationarity

If a process is ergodic, all time and ensemble averages are interchangeable. The time average is by definition independent of time. As it equals the ensemble averages (such as moments), the latter are also independent of time. Thus, an ergodic process must be stationary. But not all stationary processes are ergodic.

Example 7.3.2 An Ergodic Random Process
Let the random process be given by $X(n) = A\cos(\omega_0 n + \Phi)$, where A and $\omega_0$ are constants and Φ is a random variable that is uniformly distributed over $[0, 2\pi)$. First we evaluate some ensemble averages (expectations):
$$E[X(n)] = \int_{-\infty}^{\infty} x(\varphi)\, f_\Phi(\varphi)\,d\varphi = \int_0^{2\pi} A\cos(\omega_0 n + \varphi)\frac{1}{2\pi}\,d\varphi = 0.$$
Furthermore,
$$E[X(n)^2] = \int_0^{2\pi} A^2\cos(\omega_0 n + \varphi)^2\frac{1}{2\pi}\,d\varphi = \frac{A^2}{2}.$$
In this example, the time parameter n disappeared when the ensemble averages were evaluated. The process is stationary up to the second order.

Now, evaluate the corresponding time averages using a sample function of the random process: $X(n, \zeta_1) = A\cos\omega_0 n$, which occurs when $\Phi = 0$. The zero phase $\Phi = 0$ corresponds to one of the events. The time average for any of the sample functions can be calculated by letting φ be any value between 0 and 2π. The time averages are:

• First moment: $\overline{X(n)} = \lim_{N \to \infty}\dfrac{1}{2N+1}\displaystyle\sum_{n=-N}^{N} A\cos(\omega_0 n + \varphi) = 0$

• Second moment: $\overline{X(n)^2} = \lim_{N \to \infty}\dfrac{1}{2N+1}\displaystyle\sum_{n=-N}^{N}\left[A\cos(\omega_0 n + \varphi)\right]^2 = \dfrac{A^2}{2}$

In this example, φ disappears when the time average is calculated. We see that the time average is equal to the ensemble average for the first and second moments.
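The agreement between time and ensemble averages in this example is easy to reproduce numerically. A minimal sketch (added here for illustration; A, ω0 and the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A, w0 = 2.0, 0.3

# Ensemble average at a fixed time n: average over many realisations of Phi
n = 7
phi = rng.uniform(0, 2 * np.pi, 100_000)
ensemble_2nd = np.mean((A * np.cos(w0 * n + phi)) ** 2)

# Time average of a single sample function (one fixed phase)
n_axis = np.arange(-50_000, 50_001)
time_2nd = np.mean((A * np.cos(w0 * n_axis + phi[0])) ** 2)

print(f"ensemble ~ {ensemble_2nd:.3f}, time ~ {time_2nd:.3f}, theory {A**2 / 2}")
```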

We have not yet proved that the process is ergodic, because we would have to evaluate moments and averages of all orders. In general, it is difficult to prove that a process is ergodic. In this manuscript we will focus on stationary processes.

Example 7.3.3 Suppose that the amplitude in the previous example is now random. Let us assume that A and Φ are independent. In this case, the process is stationary but not ergodic, because if $E[A^2] = \sigma^2$, then
$$E[X(n)^2] = E[A^2]\cdot E[\cos(\omega_0 n + \Phi)^2] = \frac{\sigma^2}{2},$$
but
$$\lim_{N \to \infty}\frac{1}{2N+1}\sum_{n=-N}^{N} X(n)^2 = \frac{1}{2}A^2,$$
which is a random variable rather than the constant $\sigma^2/2$.

Summary:

• An ergodic process has to be stationary.

• However, if a process is stationary, it may or may not be ergodic.

7.3.2 Second-Order Moment Functions and Wide-Sense Stationarity

Definition 7.3.4 The second-order moment function (SOMF) of a real random process $X(n)$ is defined as
$$r_{XX}(n_1, n_2) = E[X(n_1)X(n_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\, f_{X_1X_2}(x_1, x_2; n_1, n_2)\,dx_1\,dx_2,$$
where $X_1 = X(n_1)$ and $X_2 = X(n_2)$.


If the process is stationary to the second order, the SOMF is only a function of the time difference $\kappa = |n_2 - n_1|$, and
$$r_{XX}(\kappa) = E[X(n+\kappa)X(n)].$$

Definition 7.3.5 A random process is said to be wide-sense stationary if:

1. $E[X(n)]$ is a constant.

2. $r_{XX}(n_1, n_2) = r_{XX}(\kappa)$, where $\kappa = |n_2 - n_1|$.

A process that is stationary to order 2 or greater is certainly wide-sense stationary. However, a wide-sense stationary process is not necessarily stationary to order 2.

Properties of the SOMF

1. $r_{XX}(0) = E[X(n)^2] = \sigma_X^2 + \mu_X^2$ is the second order moment.

2. $r_{XX}(\kappa) = r_{XX}(-\kappa)$

3. $r_{XX}(0) \ge |r_{XX}(\kappa)|$, $|\kappa| > 0$

Proofs for 1 and 2 follow from the definition. The third property holds because
$$E\left[(X(n+\kappa) \pm X(n))^2\right] \ge 0$$
$$E[X(n+\kappa)^2] \pm 2E[X(n+\kappa)X(n)] + E[X(n)^2] \ge 0$$
$$r_{XX}(0) \pm 2r_{XX}(\kappa) + r_{XX}(0) \ge 0.$$

Definition 7.3.6 The “cross-SOMF” for two real processes $X(n)$ and $Y(n)$ is
$$r_{XY}(n_1, n_2) = E[X(n_1)Y(n_2)].$$
If $X(n)$ and $Y(n)$ are jointly stationary, the cross-SOMF becomes
$$r_{XY}(n_1, n_2) = r_{XY}(n_1 - n_2) = r_{XY}(\kappa),$$
where $\kappa = n_1 - n_2$.

Properties of the Cross-SOMF of a Jointly Stationary Real Process

1. $r_{XY}(-\kappa) = r_{YX}(\kappa)$

2. $|r_{XY}(\kappa)| \le \sqrt{r_{XX}(0)\, r_{YY}(0)}$.

This holds because
$$E\left[(X(n+\kappa) - aY(n))^2\right] \ge 0 \quad \forall a \in \mathbb{R}$$
$$E[X(n+\kappa)^2] + a^2 E[Y(n)^2] - 2aE[X(n+\kappa)Y(n)] \ge 0 \quad \forall a \in \mathbb{R}$$
$$r_{XX}(0) + a^2 r_{YY}(0) - 2a\, r_{XY}(\kappa) \ge 0.$$
This quadratic form in a is non-negative for all a only if it has at most a double real root, i.e., if its discriminant is non-positive. Therefore
$$r_{XY}(\kappa)^2 - r_{XX}(0)r_{YY}(0) \le 0$$
$$r_{XY}(\kappa)^2 \le r_{XX}(0)r_{YY}(0) \;\Rightarrow\; |r_{XY}(\kappa)| \le \sqrt{r_{XX}(0)r_{YY}(0)}.$$
Sometimes
$$\frac{r_{XY}(\kappa)}{\sqrt{r_{XX}(0)r_{YY}(0)}}$$
is denoted by $\rho_{XY}(\kappa)$.

3. $|r_{XY}(\kappa)| \le \frac{1}{2}\left[r_{XX}(0) + r_{YY}(0)\right]$

Property 2 constitutes a tighter bound than that of property 3, because the geometric mean of two positive numbers cannot exceed the arithmetic mean, that is,
$$\sqrt{r_{XX}(0)r_{YY}(0)} \le \frac{1}{2}\left[r_{XX}(0) + r_{YY}(0)\right].$$

• Two random processes $X(n)$ and $Y(n)$ are said to be uncorrelated if
$$r_{XY}(\kappa) = E[X(n+\kappa)]E[Y(n)] = \mu_X\mu_Y \quad \forall \kappa \in \mathbb{Z}.$$

• Two random processes $X(n)$ and $Y(n)$ are orthogonal if
$$r_{XY}(\kappa) = 0 \quad \forall \kappa \in \mathbb{Z}.$$

• If $X(n) = Y(n)$, the cross-SOMF becomes the SOMF.

• If the random processes $X(n)$ and $Y(n)$ are jointly ergodic, the time average may be used to replace the ensemble average:
$$r_{XY}(\kappa) = E[X(n+\kappa)Y(n)] = E[X(n)Y(n-\kappa)] \equiv \lim_{N \to \infty}\frac{1}{2N+1}\sum_{n=-N}^{N} X(n)Y(n-\kappa). \tag{7.4}$$
In this case, cross-SOMFs and auto-SOMFs may be measured using a system composed of a delay element, a multiplier and an accumulator. This is illustrated in Figure 7.8, where the block $z^{-\kappa}$ is a delay line of order κ.

Figure 7.8: Measurement of the SOMF: $Y(n)$ is delayed by κ samples, multiplied with $X(n)$, and averaged over $2N+1$ samples to produce $r_{XY}(\kappa)$.
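The measurement scheme of Figure 7.8 translates directly into a few lines of code. A sketch of the time-average estimator in Equation (7.4) (added for illustration; the test signals and lags are arbitrary choices):

```python
import numpy as np

def cross_somf(x, y, kappa):
    """Time-average estimate of r_XY(kappa) = E[X(n) Y(n - kappa)]."""
    if kappa > 0:
        return np.mean(x[kappa:] * y[:-kappa])   # delay y by kappa samples
    if kappa < 0:
        return np.mean(x[:kappa] * y[-kappa:])
    return np.mean(x * y)

rng = np.random.default_rng(0)
n = np.arange(200_000)
phi = rng.uniform(0, 2 * np.pi)
x = np.cos(0.2 * n + phi)
y = np.sin(0.2 * n + phi)   # a phase-shifted version of x

for k in [0, 5, 10]:
    print(f"r_XY({k}) ~ {cross_somf(x, y, k):.3f}")
```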


Central Second-Order Moment Function (Covariance Function)

The central second-order moment function is called the covariance function and is defined by
$$c_{XX}(n+\kappa, n) = E[(X(n+\kappa) - E[X(n+\kappa)])(X(n) - E[X(n)])],$$
which can be re-written in the form
$$c_{XX}(n+\kappa, n) = r_{XX}(n+\kappa, n) - E[X(n+\kappa)]E[X(n)].$$
The covariance function for two processes $X(n)$ and $Y(n)$ is defined by
$$c_{XY}(n+\kappa, n) = E[(X(n+\kappa) - E[X(n+\kappa)])(Y(n) - E[Y(n)])],$$
or equivalently
$$c_{XY}(n+\kappa, n) = r_{XY}(n+\kappa, n) - E[X(n+\kappa)]E[Y(n)].$$
For processes that are at least jointly wide-sense stationary, we have
$$c_{XX}(\kappa) = r_{XX}(\kappa) - (E[X(n)])^2$$
and
$$c_{XY}(\kappa) = r_{XY}(\kappa) - E[X(n)]E[Y(n)].$$
The variance of a random process is obtained from $c_{XX}(\kappa)$ by setting $\kappa = 0$. If the process is stationary,
$$\sigma_X^2 = E\left[(X(n) - E[X(n)])^2\right] = r_{XX}(0) - (E[X(n)])^2 = r_{XX}(0) - \mu_X^2.$$
Two random processes are called uncorrelated if $c_{XY}(n+\kappa, n) = 0$, which is equivalent to $r_{XY}(n+\kappa, n) = E[X(n+\kappa)]E[Y(n)]$.

Complex Random Processes

Definition 7.3.7 A complex random process is $Z(n) = X(n) + jY(n)$, where $X(n)$ and $Y(n)$ are real random processes.

A complex process is stationary if $X(n)$ and $Y(n)$ are jointly stationary, i.e.,
$$f_Z(x_1, \ldots, x_N, y_1, \ldots, y_N; n_1, \ldots, n_{2N}) = f_Z(x_1, \ldots, x_N, y_1, \ldots, y_N; n_1 + n_0, \ldots, n_{2N} + n_0)$$
for all $x_1, \ldots, x_N$, $y_1, \ldots, y_N$ and all $n_1, \ldots, n_{2N}$, $n_0$ and N. The mean of a complex random process is defined as
$$E[Z(n)] = E[X(n)] + jE[Y(n)].$$

Definition 7.3.8 The SOMF for a complex random process is
$$r_{ZZ}(n_1, n_2) = E[Z(n_1)Z(n_2)^*].$$

Definition 7.3.9 The complex random process is stationary in the wide sense if $E[Z(n)]$ is a complex constant and $r_{ZZ}(n_1, n_2) = r_{ZZ}(\kappa)$, where $\kappa = |n_2 - n_1|$.

The SOMF of a wide-sense stationary complex process has the symmetry property
$$r_{ZZ}(-\kappa) = r_{ZZ}(\kappa)^*,$$
as can easily be seen from the definition.

Definition 7.3.10 The cross-SOMF for two complex random processes $Z_1(n)$ and $Z_2(n)$ is
$$r_{Z_1Z_2}(n_1, n_2) = E[Z_1(n_1)Z_2(n_2)^*].$$
If the complex random processes are jointly wide-sense stationary, the cross-SOMF becomes
$$r_{Z_1Z_2}(n_1, n_2) = r_{Z_1Z_2}(\kappa), \quad \kappa = |n_2 - n_1|.$$
The covariance function of a complex random process is defined as
$$c_{ZZ}(n+\kappa, n) = E[(Z(n+\kappa) - E[Z(n+\kappa)])(Z(n) - E[Z(n)])^*]$$
and is a function of κ only if $Z(n)$ is wide-sense stationary. The cross-covariance function of $Z_1(n)$ and $Z_2(n)$ is given as
$$c_{Z_1Z_2}(n+\kappa, n) = E[(Z_1(n+\kappa) - E[Z_1(n+\kappa)])(Z_2(n) - E[Z_2(n)])^*].$$
$Z_1(n)$ and $Z_2(n)$ are called uncorrelated if $c_{Z_1Z_2}(n+\kappa, n) = 0$. They are orthogonal if $r_{Z_1Z_2}(n+\kappa, n) = 0$.

Example 7.3.4 A complex process $Z(n)$ is comprised of a sum of M complex signals,
$$Z(n) = \sum_{m=1}^{M} A_m e^{j(\omega_0 n + \Phi_m)}.$$
Here $\omega_0$ is constant, $A_m$ is a zero-mean random amplitude of the mth signal and $\Phi_m$ is its random phase. We assume all variables $A_m$ and $\Phi_m$ for $m = 1, 2, \ldots, M$ to be statistically independent and the random phases uniformly distributed on $[0, 2\pi)$. Find the mean and SOMF. Is the process wide-sense stationary?
$$E[Z(n)] = \sum_{m=1}^{M} E[A_m]\, e^{j\omega_0 n}\, E\left[e^{j\Phi_m}\right] = \sum_{m=1}^{M} E[A_m]\, e^{j\omega_0 n}\left(E[\cos\Phi_m] + jE[\sin\Phi_m]\right) = 0$$
$$\begin{aligned}
r_{ZZ}(n+\kappa, n) &= E[Z(n+\kappa)Z(n)^*] = E\left[\sum_{m=1}^{M} A_m e^{j\omega_0(n+\kappa)} e^{j\Phi_m}\cdot\sum_{l=1}^{M} A_l e^{-j\omega_0 n} e^{-j\Phi_l}\right] \\
&= \sum_{m=1}^{M}\sum_{l=1}^{M} E[A_mA_l]\, e^{j\omega_0\kappa}\, E\left[e^{j(\Phi_m - \Phi_l)}\right].
\end{aligned}$$
Now
$$E\left[e^{j(\Phi_m - \Phi_l)}\right] = \begin{cases} 1, & m = l \\ 0, & m \ne l, \end{cases}$$
where the latter holds because, for $m \ne l$,
$$E\left[e^{j(\Phi_m - \Phi_l)}\right] = E\left[e^{j\Phi_m}\right]E\left[e^{-j\Phi_l}\right] = 0.$$
Thus only the terms with $m = l$ remain:
$$r_{ZZ}(n+\kappa, n) = \sum_{m=1}^{M} E[A_m^2]\, e^{j\omega_0\kappa} = e^{j\omega_0\kappa}\sum_{m=1}^{M} E[A_m^2].$$
Since the mean is constant and the SOMF depends only on κ, the process is wide-sense stationary.

7.4 Power Spectral Density

7.4.1 Definition of Power Spectral Density

Let $X(n, \zeta_i)$, $n \in \mathbb{Z}$, represent a sample function of a real random process $X(n, \zeta) \equiv X(n)$. We define a truncated version of this sample function as follows:
$$x_N(n) = \begin{cases} X(n, \zeta_i), & |n| \le M, \quad M \in \mathbb{Z}^+ \\ 0, & \text{elsewhere,} \end{cases}$$
where $N = 2M+1$ is the length of the truncation window. As long as M is finite, we presume that $x_N(n)$ will satisfy
$$\sum_{n=-M}^{M} |x_N(n)| < \infty.$$
Then the Fourier transform of $x_N(n)$ becomes
$$X_N(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x_N(n)\, e^{-j\omega n} = \sum_{n=-M}^{M} X(n, \zeta_i)\, e^{-j\omega n}.$$
The normalised energy in the time interval $[-M, M]$ is
$$E_N = \sum_{n=-M}^{M} x_N(n)^2 = \int_{-\pi}^{\pi}\left|X_N(e^{j\omega})\right|^2\frac{d\omega}{2\pi},$$
where Parseval’s theorem has been used to obtain the second expression. By dividing $E_N$ by N, we obtain the average power $P_N$ in $x_N(n)$ over the interval $[-M, M]$:
$$P_N = \frac{1}{2M+1}\sum_{n=-M}^{M} x_N(n)^2 = \int_{-\pi}^{\pi}\frac{|X_N(e^{j\omega})|^2}{2M+1}\frac{d\omega}{2\pi}. \tag{7.5}$$
From Equation (7.5), we see that the average power $P_N$ in the truncated ensemble member $x_N(n)$ can be obtained by integrating $|X_N(e^{j\omega})|^2/N$. To obtain the average power for the entire sample function, we must consider the limit as $M \to \infty$. $P_N$ provides a measure of power for a single sample function. If we wish to determine the average power of the random process, we must also take the average (expected value) over the entire ensemble of sample functions. The average power of the process $X(n, \zeta)$ is defined as
$$P_{XX} = \lim_{M \to \infty}\frac{1}{2M+1}\sum_{n=-M}^{M} E\left[X(n, \zeta)^2\right] = \lim_{M \to \infty}\int_{-\pi}^{\pi}\frac{E\left[|X_N(e^{j\omega}, \zeta)|^2\right]}{2M+1}\frac{d\omega}{2\pi}.$$
Remark 1: The average power $P_{XX}$ in a random process $X(n)$ is given by the time average of its second moment.
Remark 2: If $X(n)$ is wide-sense stationary, $E[X(n)^2]$ is a constant and $P_{XX} = E[X(n)^2]$.
$P_{XX}$ can be obtained by a frequency-domain integration. If we define the power spectral density (PSD) of $X(n)$ by
$$S_{XX}(e^{j\omega}) = \lim_{M \to \infty}\frac{E\left[\left|X_N(e^{j\omega}, \zeta)\right|^2\right]}{2M+1},$$
where
$$X_N(e^{j\omega}, \zeta) = \sum_{n=-M}^{M} X(n, \zeta)\, e^{-j\omega n},$$
then the normalised power is given by
$$P_{XX} = \int_{-\pi}^{\pi} S_{XX}(e^{j\omega})\frac{d\omega}{2\pi} = r_{XX}(0).$$
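The defining limit suggests a direct estimator: average $|X_N(e^{j\omega})|^2/N$ over many realisations. The sketch below (an added illustration with arbitrary parameters) does this for white Gaussian noise, whose PSD should be flat at the level $\sigma_X^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, N, realisations = 2.0, 1024, 500

# Average |X_N(e^{jw})|^2 / N over an ensemble of sample functions
acc = np.zeros(N)
for _ in range(realisations):
    x = rng.normal(0, np.sqrt(sigma2), N)
    acc += np.abs(np.fft.fft(x)) ** 2 / N
S_hat = acc / realisations

print(f"estimated PSD level ~ {S_hat.mean():.3f} (theory {sigma2})")
```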

Example 7.4.1 Consider the random process
$$X(n) = A\cos(\omega_0 n + \Phi),$$
where A and $\omega_0$ are real constants and Φ is a random variable uniformly distributed on $[0, \pi/2)$. We shall find $P_{XX}$ for $X(n)$ using
$$P_{XX} = \lim_{M \to \infty}\sum_{n=-M}^{M}\frac{E[X(n)^2]}{2M+1}.$$
The mean-squared value of $X(n)$ is
$$\begin{aligned}
E[X(n)^2] &= E\left[A^2\cos(\omega_0 n + \Phi)^2\right] = E\left[\frac{A^2}{2} + \frac{A^2}{2}\cos(2\omega_0 n + 2\Phi)\right] \\
&= \frac{A^2}{2} + \frac{A^2}{2}\int_0^{\pi/2}\cos(2\omega_0 n + 2\varphi)\frac{2}{\pi}\,d\varphi \\
&= \frac{A^2}{2} + \frac{A^2}{2}\left[\frac{\sin(2\omega_0 n + 2\varphi)}{2}\right]_0^{\pi/2}\frac{2}{\pi} \\
&= \frac{A^2}{2} + \frac{A^2}{2}\cdot\frac{\sin(2\omega_0 n + \pi) - \sin(2\omega_0 n)}{2}\cdot\frac{2}{\pi} \\
&= \frac{A^2}{2} - \frac{A^2}{\pi}\sin(2\omega_0 n).
\end{aligned}$$
$X(n)$ is non-stationary, because $E[X(n)^2]$ is a function of time. The time average of the above expression is
$$\lim_{M \to \infty}\frac{1}{2M+1}\sum_{n=-M}^{M}\left[\frac{A^2}{2} - \frac{A^2}{\pi}\sin(2\omega_0 n)\right] = \frac{A^2}{2}.$$
Using
$$P_{XX} = \lim_{M \to \infty}\int_{-\pi}^{\pi}\frac{E\left[|X_N(e^{j\omega})|^2\right]}{2M+1}\frac{d\omega}{2\pi}$$
yields $P_{XX} = \frac{A^2}{2}$, which agrees with the above and should be shown by the reader as an exercise.

The results derived for the PSD can be obtained similarly for two processes $X(n)$ and $Y(n)$, say. The cross-power density spectrum is then given by
$$S_{XY}(e^{j\omega}) = \lim_{M \to \infty}\frac{E\left[X_N(e^{j\omega})Y_N(e^{j\omega})^*\right]}{2M+1}$$
and thus the total average cross power is
$$P_{XY} = \int_{-\pi}^{\pi} S_{XY}(e^{j\omega})\frac{d\omega}{2\pi}.$$

7.4.2 Relationship between the PSD and SOMF: Wiener–Khintchine Theorem

Theorem 7.4.1 (Wiener–Khintchine Theorem) When $X(n)$ is a wide-sense stationary process, the PSD can be obtained from the Fourier transform of the SOMF:
$$S_{XX}(e^{j\omega}) = \mathcal{F}\{r_{XX}(\kappa)\} = \sum_{\kappa=-\infty}^{\infty} r_{XX}(\kappa)\, e^{-j\omega\kappa},$$
and conversely,
$$r_{XX}(\kappa) = \mathcal{F}^{-1}\{S_{XX}(e^{j\omega})\} = \int_{-\pi}^{\pi} S_{XX}(e^{j\omega})\, e^{j\omega\kappa}\frac{d\omega}{2\pi}.$$
Consequence: there are two distinctly different methods that may be used to evaluate the PSD of a random process:

1. The PSD is obtained directly from the definition,
$$S_{XX}(e^{j\omega}) = \lim_{M \to \infty}\frac{E\left[\left|X_N(e^{j\omega})\right|^2\right]}{2M+1}, \qquad X_N(e^{j\omega}) = \sum_{n=-M}^{M} X(n)\, e^{-j\omega n}.$$

2. The PSD is obtained by evaluating the Fourier transform of $r_{XX}(\kappa)$, where $r_{XX}(\kappa)$ has to be obtained first.
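Both routes can be compared numerically. The sketch below (added for illustration, not from the manuscript) estimates the PSD of the assumed process $Y(n) = X(n) + 0.5X(n-1)$, with X white of unit variance, once by averaged periodograms and once as the Fourier transform of the theoretical SOMF $r_{YY}(0) = 1.25$, $r_{YY}(\pm 1) = 0.5$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, runs = 1024, 500
w = 2 * np.pi * np.arange(N) / N

# Method 1: average |Y_N|^2 / N over realisations
acc = np.zeros(N)
for _ in range(runs):
    x = rng.standard_normal(N + 1)
    y = x[1:] + 0.5 * x[:-1]
    acc += np.abs(np.fft.fft(y)) ** 2 / N
S1 = acc / runs

# Method 2: Fourier transform of the SOMF r(0) = 1.25, r(+-1) = 0.5
S2 = 1.25 + np.cos(w)

print(f"max deviation: {np.max(np.abs(S1 - S2)):.3f}")  # small for large N and many runs
```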


7.4.3 Properties of the PSD

The PSD has a number of important properties:

1. The PSD is always real, even if $X(n)$ is complex:
$$S_{XX}(e^{j\omega})^* = \sum_{\kappa=-\infty}^{\infty} r_{XX}(\kappa)^* e^{+j\omega\kappa} = \sum_{\kappa=-\infty}^{\infty} r_{XX}(-\kappa)\, e^{+j\omega\kappa} = \sum_{\kappa'=-\infty}^{\infty} r_{XX}(\kappa')\, e^{-j\omega\kappa'} = S_{XX}(e^{j\omega}).$$

2. $S_{XX}(e^{j\omega}) \ge 0$, even if $X(n)$ is complex.

3. When $X(n)$ is real, $S_{XX}(e^{-j\omega}) = S_{XX}(e^{j\omega})$.

4. $\int_{-\pi}^{\pi} S_{XX}(e^{j\omega})\frac{d\omega}{2\pi} = P_{XX}$ is the total normalised average power.

Example 7.4.2 Let
$$X(n) = A\sin(\omega_0 n + \Phi),$$
where A and $\omega_0$ are constants and Φ is uniformly distributed on $[0, 2\pi)$. The mean of $X(n)$ is given by
$$E[X(n)] = \int_0^{2\pi}\frac{1}{2\pi}A\sin(\omega_0 n + \varphi)\,d\varphi = 0,$$
and its SOMF is
$$r_{XX}(\kappa) = E[X(n+\kappa)X(n)] = \frac{A^2}{2}\cos(\omega_0\kappa).$$
By taking the Fourier transform of $r_{XX}(\kappa)$ we obtain the PSD
$$S_{XX}(e^{j\omega}) = \mathcal{F}\{r_{XX}(\kappa)\} = \frac{A^2}{2}\,\pi\left[\eta(\omega - \omega_0) + \eta(\omega + \omega_0)\right],$$
where
$$\eta(\omega) = \sum_{k=-\infty}^{\infty}\delta(\omega + k2\pi).$$

Example 7.4.3 Suppose that the random variables $A_i$ are uncorrelated with zero mean and variance $\sigma_i^2$, $i = 1, \ldots, P$. Define
$$Z(n) = \sum_{i=1}^{P} A_i\exp\{j\omega_i n\}.$$
It follows that
$$r_{ZZ}(\kappa) = \sum_{i=1}^{P}\sigma_i^2\exp\{j\omega_i\kappa\}$$
and
$$S_{ZZ}(e^{j\omega}) = 2\pi\sum_{i=1}^{P}\sigma_i^2\,\eta(\omega - \omega_i).$$


Example 7.4.4 Suppose now that the random variables $A_i$ and $B_i$, $i = 1, \ldots, P$ are pairwise uncorrelated with zero mean and common variance
$$E[A_i^2] = E[B_i^2] = \sigma_i^2, \quad i = 1, \ldots, P.$$
Define
$$X(n) = \sum_{i=1}^{P}\left[A_i\cos(\omega_i n) + B_i\sin(\omega_i n)\right].$$
We find
$$r_{XX}(\kappa) = \sum_{i=1}^{P}\sigma_i^2\cos(\omega_i\kappa)$$
and therefore
$$S_{XX}(e^{j\omega}) = \pi\sum_{i=1}^{P}\sigma_i^2\left[\eta(\omega - \omega_i) + \eta(\omega + \omega_i)\right].$$

Definition 7.4.1 A random process $X(n)$ is said to be a “white noise process” if the PSD is constant over all frequencies:
$$S_{XX}(e^{j\omega}) = \frac{N_0}{2},$$
where $N_0$ is a positive constant.

The SOMF of a white noise process is obtained by taking the inverse Fourier transform of $S_{XX}(e^{j\omega})$:
$$r_{XX}(\kappa) = \frac{N_0}{2}\delta(\kappa),$$
where $\delta(\kappa)$ is the Kronecker delta function given by
$$\delta(\kappa) = \begin{cases} 1, & \kappa = 0 \\ 0, & \kappa \ne 0. \end{cases}$$

7.4.4 Cross-Power Spectral Density

Suppose we have observations $x_N(n)$ and $y_N(n)$ of random processes $X(n)$ and $Y(n)$, respectively. Let $X_N(e^{j\omega})$ and $Y_N(e^{j\omega})$ be their respective Fourier transforms, i.e.,
$$X_N(e^{j\omega}) = \sum_{n=-M}^{M} x_N(n)\, e^{-j\omega n}, \qquad Y_N(e^{j\omega}) = \sum_{n=-M}^{M} y_N(n)\, e^{-j\omega n}.$$
The cross-spectral density function is given as
$$S_{XY}(e^{j\omega}) = \lim_{M \to \infty}\frac{E\left[X_N(e^{j\omega})Y_N(e^{j\omega})^*\right]}{2M+1}.$$


Substituting $X_N(e^{j\omega})$ and $Y_N(e^{j\omega})$:
$$S_{XY}(e^{j\omega}) = \lim_{M \to \infty}\frac{1}{2M+1}\sum_{n_1=-M}^{M}\sum_{n_2=-M}^{M} r_{XY}(n_1, n_2)\, e^{-j\omega(n_1-n_2)}.$$
Taking the inverse Fourier transform of both sides,
$$\begin{aligned}
\int_{-\pi}^{\pi} S_{XY}(e^{j\omega})\, e^{j\omega\kappa}\frac{d\omega}{2\pi} &= \int_{-\pi}^{\pi}\lim_{M \to \infty}\frac{1}{2M+1}\sum_{n_1=-M}^{M}\sum_{n_2=-M}^{M} r_{XY}(n_1, n_2)\, e^{-j\omega(n_1-n_2)}\, e^{j\omega\kappa}\frac{d\omega}{2\pi} \\
&= \lim_{M \to \infty}\frac{1}{2M+1}\sum_{n_1=-M}^{M}\sum_{n_2=-M}^{M} r_{XY}(n_1, n_2)\int_{-\pi}^{\pi} e^{j\omega(n_2+\kappa-n_1)}\frac{d\omega}{2\pi} \\
&= \lim_{M \to \infty}\frac{1}{2M+1}\sum_{n_1=-M}^{M}\sum_{n_2=-M}^{M} r_{XY}(n_1, n_2)\,\delta(n_2+\kappa-n_1) \\
&= \lim_{M \to \infty}\frac{1}{2M+1}\sum_{n_2=-M}^{M} r_{XY}(n_2+\kappa, n_2), 
\end{aligned} \tag{7.6}$$
provided $-M \le n_2 + \kappa \le M$, so that the delta function is in the range of the summation. This condition can be relaxed in the limit as $M \to \infty$. The above result shows that the time average of the SOMF and the cross-PSD form a Fourier transform pair. Clearly, if $X(n)$ and $Y(n)$ are jointly stationary, then this result also implies
$$S_{XY}(e^{j\omega}) = \sum_{\kappa=-\infty}^{\infty} r_{XY}(\kappa)\, e^{-j\omega\kappa}.$$

Definition 7.4.2 For two jointly stationary processes $X(n)$ and $Y(n)$ the cross-power spectral density is given by
$$S_{XY}(e^{j\omega}) = \mathcal{F}\{r_{XY}(\kappa)\} = \sum_{\kappa=-\infty}^{\infty} r_{XY}(\kappa)\, e^{-j\omega\kappa}$$
and is in general complex.

Properties:

1. $S_{XY}(e^{j\omega}) = S_{YX}(e^{j\omega})^*$. In addition, if $X(n)$ and $Y(n)$ are real, then $S_{XY}(e^{j\omega}) = S_{YX}(e^{-j\omega})$.

2. $\mathrm{Re}\{S_{XY}(e^{j\omega})\}$ and $\mathrm{Re}\{S_{YX}(e^{j\omega})\}$ are even functions of ω if $X(n)$ and $Y(n)$ are real.

3. $\mathrm{Im}\{S_{XY}(e^{j\omega})\}$ and $\mathrm{Im}\{S_{YX}(e^{j\omega})\}$ are odd functions of ω if $X(n)$ and $Y(n)$ are real.

4. $S_{XY}(e^{j\omega}) = 0$ and $S_{YX}(e^{j\omega}) = 0$ if $X(n)$ and $Y(n)$ are orthogonal.


Property 1 results from the fact that
$$S_{YX}(e^{j\omega}) = \sum_{\kappa=-\infty}^{\infty} r_{YX}(\kappa)\, e^{-j\omega\kappa} = \sum_{\kappa=-\infty}^{\infty} r_{XY}(-\kappa)^* e^{-j\omega\kappa} = \left[\sum_{\kappa=-\infty}^{\infty} r_{XY}(\kappa)\, e^{-j\omega\kappa}\right]^* = S_{XY}(e^{j\omega})^*.$$

Example 7.4.5 Let the SOMF of two processes $X(n)$ and $Y(n)$ be
$$r_{XY}(n+\kappa, n) = \frac{AB}{2}\left[\sin(\omega_0\kappa) + \cos(\omega_0[2n+\kappa])\right],$$
where A, B and $\omega_0$ are constants. We find
$$\lim_{M \to \infty}\frac{1}{2M+1}\sum_{n=-M}^{M} r_{XY}(n+\kappa, n) = \frac{AB}{2}\sin(\omega_0\kappa) + \frac{AB}{2}\underbrace{\lim_{M \to \infty}\frac{1}{2M+1}\sum_{n=-M}^{M}\cos(\omega_0[2n+\kappa])}_{=0},$$
so, using the result of Equation (7.6),
$$S_{XY}(e^{j\omega}) = \mathcal{F}\left\{\frac{AB}{2}\sin(\omega_0\kappa)\right\} = j\pi\frac{AB}{2}\left[\eta(\omega + \omega_0) - \eta(\omega - \omega_0)\right].$$

7.5 Definition of the Spectrum

The PSD of a stationary random process has been defined as the Fourier transform of the SOMF. We now define the ‘spectrum’ of a stationary random process $X(n)$ as the Fourier transform of the covariance function:
$$C_{XX}(e^{j\omega}) := \sum_{n=-\infty}^{\infty} c_{XX}(n)\, e^{-j\omega n}.$$
If $\sum_n |c_{XX}(n)| < \infty$, $C_{XX}(e^{j\omega})$ exists and is bounded and uniformly continuous. The inversion is given by
$$c_{XX}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} C_{XX}(e^{j\omega})\, e^{j\omega n}\,d\omega.$$
$C_{XX}(e^{j\omega})$ is real, 2π-periodic and non-negative.

The cross-spectrum of two jointly stationary random processes $X(n)$ and $Y(n)$ is defined by
$$C_{XY}(e^{j\omega}) := \sum_{n=-\infty}^{\infty} c_{XY}(n)\, e^{-j\omega n}$$
with the property
$$C_{XY}(e^{j\omega}) = C_{YX}(e^{j\omega})^*.$$
The inversion is given by
$$c_{XY}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} C_{XY}(e^{j\omega})\, e^{j\omega n}\,d\omega.$$
If $X(n)$ and $Y(n)$ are real,
$$C_{XX}(e^{j\omega}) = C_{XX}(e^{-j\omega})$$
$$C_{XY}(e^{j\omega}) = C_{XY}(e^{-j\omega})^* = C_{YX}(e^{-j\omega}) = C_{YX}(e^{j\omega})^*.$$
The spectrum of a real process is completely specified on the interval $[0, \pi]$.

7.6 Random Signals and Linear Systems

7.6.1 Convergence of Random Variables

In the following we consider definitions of convergence for a sequence of random variables $X_k$, $k = 0, 1, 2, \ldots$. They are listed roughly in decreasing order of strength.

1. Convergence with probability one (almost sure convergence): $X_k$ converges to X with probability 1 if
$$P\left(\lim_{k \to \infty}|X_k - X| = 0\right) = 1.$$

2. Convergence in the mean square sense (MSS): $X_k$ converges to X in the MSS if
$$\lim_{k \to \infty} E\left[|X_k - X|^2\right] = 0.$$

3. Convergence in probability: $X_k$ converges in probability to X if, for all $\epsilon > 0$,
$$\lim_{k \to \infty} P(|X_k - X| > \epsilon) = 0.$$

4. Convergence in distribution: $X_k$ converges in distribution to X (or $X_k$ is asymptotically $F_X$ distributed) if
$$\lim_{k \to \infty} F_{X_k}(x) = F_X(x)$$
for all continuity points of $F_X$.

The convergences 1 to 4 are not equally strong:

i) Convergence with probability 1 implies convergence in probability.

ii) Convergence with probability 1 implies convergence in the MSS, provided second order moments exist.

iii) Convergence in the MSS implies convergence in probability.

iv) Convergence in probability implies convergence in distribution.

With these concepts of convergence, one may define continuity, differentiability and so on.


7.6.2 Linear filtering

We are given two stationary processes $X(n)$ and $Y(n)$ and a linear time-invariant (LTI) system with impulse response $h(n)$, where the filter is stable ($\sum_n |h(n)| < \infty$), as in Figure 7.9. Then for such a system,
$$Y(n) = \sum_k h(k)X(n-k) = \sum_k h(n-k)X(k)$$
exists with probability one, i.e., $\sum_{k=-N}^{M} h(k)X(n-k)$ converges with probability one to $Y(n)$ for $N, M \to \infty$.

Figure 7.9: A linear filter.

If $X(n)$ is stationary and $E[|X(n)|] < \infty$, then $Y(n)$ is stationary. If $X(n)$ is white noise, i.e. $E[X(n)] = 0$, $E[X(n)X(k)] = \sigma_X^2\delta(n-k)$, with $\sigma_X^2$ the power of the noise and $\delta(n)$ the Kronecker delta function
$$\delta(n) = \begin{cases} 1, & n = 0 \\ 0, & \text{else,} \end{cases}$$
then $Y(n)$ is called a linear process. If we do not require the filter to be stable, but instead
$$\int |H(e^{j\omega})|^2\,d\omega < \infty,$$
where
$$H(e^{j\omega}) = \sum_n h(n)\, e^{-j\omega n}$$
is the transfer function of the system, and if for $X(n)$, $\sum_n |c_{XX}(n)| < \infty$, then
$$Y(n) = \sum_k h(k)X(n-k) = \sum_k h(n-k)X(k)$$
exists in the MSS, i.e.,
$$\sum_{k=-N}^{M} h(k)X(n-k)$$
converges in the MSS for $N, M \to \infty$, and $Y(n)$ is stationary in the wide sense with
$$\mu_Y \triangleq E[Y(n)] = \sum_k h(k)E[X(n-k)] = \mu_X H(e^{j0}).$$

Example 7.6.1 Hilbert Transform
$$H(e^{j\omega}) = \begin{cases} -j, & 0 < \omega < \pi \\ 0, & \omega = 0, -\pi \\ j, & -\pi < \omega < 0 \end{cases}$$
$$h(n) = \begin{cases} 0, & n = 0 \\ (1 - \cos(\pi n))/(\pi n), & \text{else.} \end{cases}$$
The filter is not stable, but $\int |H(e^{j\omega})|^2\,d\omega < \infty$, so $Y(n)$ is stationary in the wide sense. For example, if $X(n) = a\cos(\omega_0 n + \phi)$, then $Y(n) = a\sin(\omega_0 n + \phi)$.

Output covariance function and spectrum

We now determine the relationship between the covariance function and the spectrum at the input and at the output. First, let us define
$$\tilde{Y}(n) \triangleq Y(n) - \mu_Y = \sum_k h(k)X(n-k) - \sum_k h(k)E[X(n-k)] = \sum_k h(k)\left(X(n-k) - \mu_X\right) = \sum_k h(k)\tilde{X}(n-k),$$
where $\tilde{X}(n) \triangleq X(n) - \mu_X$. In the general case, the system impulse response and the input process may be complex.

The covariance function of the output is determined according to
$$\begin{aligned}
c_{YY}(\kappa) &= E\left[\tilde{Y}(n+\kappa)\tilde{Y}(n)^*\right] = E\left[\left(\sum_k h(k)\tilde{X}(n+\kappa-k)\right)\left(\sum_l h(l)\tilde{X}(n-l)\right)^{\!*}\right] \\
&= \sum_k\sum_l h(k)h(l)^* E\left[\tilde{X}(n+\kappa-k)\tilde{X}(n-l)^*\right] = \sum_k\sum_l h(k)h(l)^* c_{XX}(\kappa - k + l).
\end{aligned}$$
Taking the Fourier transform of both sides leads to
$$C_{YY}(e^{j\omega}) = |H(e^{j\omega})|^2 C_{XX}(e^{j\omega}) \tag{7.7}$$
and applying the inverse Fourier transform formula yields²
$$c_{YY}(\kappa) = \frac{1}{2\pi}\int_{-\pi}^{\pi} |H(e^{j\omega})|^2 C_{XX}(e^{j\omega})\, e^{j\omega\kappa}\,d\omega.$$

²The integral exists because $C_{XX}(e^{j\omega})$ is bounded and $\int |H(e^{j\omega})|^2\,d\omega < \infty$.

Example 7.6.2 FIR filter
Let $X(n)$ be white noise with power $\sigma^2$ and
$$Y(n) = X(n) - X(n-1).$$
From the above, one can see that
$$h(n) = \begin{cases} 1, & n = 0 \\ -1, & n = 1 \\ 0, & \text{else.} \end{cases}$$
Taking the Fourier transform leads to
$$H(e^{j\omega}) = \sum_{n=-\infty}^{\infty} h(n)\, e^{-j\omega n} = 1 - e^{-j\omega}$$
and thus,
$$C_{YY}(e^{j\omega}) = |H(e^{j\omega})|^2 C_{XX}(e^{j\omega}) = |1 - e^{-j\omega}|^2\sigma^2 = |2je^{-j\omega/2}\sin(\omega/2)|^2\sigma^2 = 4\sin(\omega/2)^2\sigma^2.$$
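The result of Example 7.6.2 can be verified by comparing an averaged periodogram of the differenced noise with the theoretical spectrum $4\sigma^2\sin^2(\omega/2)$. A short sketch (added for illustration; the parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, N, runs = 1.0, 2048, 300
w = 2 * np.pi * np.arange(N) / N

acc = np.zeros(N)
for _ in range(runs):
    x = rng.normal(0, np.sqrt(sigma2), N + 1)
    y = x[1:] - x[:-1]                   # Y(n) = X(n) - X(n-1)
    acc += np.abs(np.fft.fft(y)) ** 2 / N
C_hat = acc / runs

C_theory = 4 * sigma2 * np.sin(w / 2) ** 2
print(f"mean abs error: {np.mean(np.abs(C_hat - C_theory)):.3f}")
```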

Cross-covariance function and cross-spectrum

The cross-covariance function of the output is determined according to
$$c_{YX}(\kappa) = E\left[\tilde{Y}(n+\kappa)\tilde{X}(n)^*\right] = E\left[\left(\sum_k h(k)\tilde{X}(n+\kappa-k)\right)\tilde{X}(n)^*\right] = \sum_k h(k)E\left[\tilde{X}(n+\kappa-k)\tilde{X}(n)^*\right] = \sum_k h(k)\, c_{XX}(\kappa - k).$$
Taking the Fourier transform of both sides leads to
$$C_{YX}(e^{j\omega}) = H(e^{j\omega})\, C_{XX}(e^{j\omega})$$
and applying the inverse Fourier transform formula yields
$$c_{YX}(\kappa) = \int_{-\pi}^{\pi} H(e^{j\omega})\, C_{XX}(e^{j\omega})\, e^{j\omega\kappa}\frac{d\omega}{2\pi}.$$

Example 7.6.3 Consider a system as depicted in Figure 7.9, where $X(n)$ is a stationary, zero-mean white noise process with variance $\sigma_X^2$. The impulse response of the system is given by
$$h(n) = \left[\frac{1}{3}\left(\frac{1}{4}\right)^{\!n} + \frac{2}{3}\left(-\frac{1}{2}\right)^{\!n}\right] u(n),$$
where $u(n)$ is the unit step function
$$u(n) = \begin{cases} 1, & n \ge 0 \\ 0, & n < 0, \end{cases}$$
and we wish to determine $C_{YY}(e^{j\omega})$ and $C_{YX}(e^{j\omega})$.

The system transfer function is given by
$$H(e^{j\omega}) = \mathcal{F}\{h(n)\} = \frac{1/3}{1 - \frac{1}{4}e^{-j\omega}} + \frac{2/3}{1 + \frac{1}{2}e^{-j\omega}} = \frac{1}{1 + \frac{1}{4}e^{-j\omega} - \frac{1}{8}e^{-j2\omega}}.$$
The spectrum and cross-spectrum are given respectively by
$$C_{YY}(e^{j\omega}) = |H(e^{j\omega})|^2 C_{XX}(e^{j\omega}) = \frac{\sigma_X^2}{\left|1 + \frac{1}{4}e^{-j\omega} - \frac{1}{8}e^{-j2\omega}\right|^2}$$
$$C_{YX}(e^{j\omega}) = H(e^{j\omega})\, C_{XX}(e^{j\omega}) = \frac{\sigma_X^2}{1 + \frac{1}{4}e^{-j\omega} - \frac{1}{8}e^{-j2\omega}}.$$

Interaction of Two Linear Systems

The previous result may be generalised to obtain the cross-covariance or cross-spectrum of two linear systems. Let $X_1(n)$ and $Y_1(n)$ be the input and output processes of the first system, with impulse response $h_1(n)$, and $X_2(n)$ and $Y_2(n)$ be the input and output processes of the second system, with impulse response $h_2(n)$ (see Figure 7.10). Suppose that $X_1(n)$ and $X_2(n)$ are wide-sense stationary and the systems are time-invariant and linear. Then the output cross-covariance function is
$$c_{Y_1Y_2}(\kappa) = \sum_k\sum_l h_1(k)h_2(l)^* c_{X_1X_2}(\kappa - k + l). \tag{7.8}$$
It can be seen from (7.8) that
$$c_{Y_1Y_2}(\kappa) = h_1(\kappa) \star h_2(-\kappa)^* \star c_{X_1X_2}(\kappa),$$
where ⋆ denotes convolution. Taking the Fourier transform leads to
$$C_{Y_1Y_2}(e^{j\omega}) = H_1(e^{j\omega})H_2(e^{j\omega})^* C_{X_1X_2}(e^{j\omega}). \tag{7.9}$$

Figure 7.10: Interaction of two linear systems.


Example 7.6.4 Consider the system shown in Figure 7.11, where $Z(n)$ is stationary, zero-mean white noise with variance $\sigma_Z^2$. Let $h(n)$ be the same system as discussed in Example 7.6.3, and let $g(n)$ be given by
$$g(n) = \delta(n) + \frac{1}{2}\delta(n-1) - \frac{1}{2}\delta(n-2),$$
where $\delta(n)$ is the Kronecker delta function. We wish to find the cross-spectrum $C_{XY}(e^{j\omega})$.
$$G(e^{j\omega}) = \sum_n g(n)\, e^{-j\omega n} = 1 + \frac{1}{2}e^{-j\omega} - \frac{1}{2}e^{-j2\omega}$$
$$C_{XY}(e^{j\omega}) = H(e^{j\omega})G(e^{j\omega})^* C_{ZZ}(e^{j\omega}) = \left[\frac{1 + \frac{1}{2}e^{j\omega} - \frac{1}{2}e^{j2\omega}}{1 + \frac{1}{4}e^{-j\omega} - \frac{1}{8}e^{-j2\omega}}\right]\sigma_Z^2$$

Figure 7.11: Interaction of two linear systems: $Z(n)$ drives $h(n)$ to produce $X(n)$, and drives $g(n)$ to produce $Y(n)$.

Cascade of linear systems

We consider a cascade of L linear systems in serial connection, as illustrated by Figure 7.12. From linear system theory, it is known that a cascade of linear systems can be equivalently represented by a single system whose transfer function $H(e^{j\omega})$ is the product of the L individual system functions:
$$H(e^{j\omega}) = H_1(e^{j\omega})H_2(e^{j\omega})\cdots H_L(e^{j\omega}).$$
When a random process is the input to such a system, the spectrum of the output and the cross-spectrum of the output with the input are given respectively by
$$C_{YY}(e^{j\omega}) = C_{XX}(e^{j\omega})\left|\prod_{i=1}^{L} H_i(e^{j\omega})\right|^2 = C_{XX}(e^{j\omega})\prod_{i=1}^{L}\left|H_i(e^{j\omega})\right|^2$$
$$C_{YX}(e^{j\omega}) = C_{XX}(e^{j\omega})\prod_{i=1}^{L} H_i(e^{j\omega}).$$

Example 7.6.5 We consider again the system shown in Figure 7.11, where we wish to find the cross-spectrum $C_{XY}(e^{j\omega})$. Instead of using Equation (7.9) with $X_1(n) = X_2(n) = Z(n)$ as done in Example 7.6.4, we reformulate the problem as a cascade of two systems, as shown in Figure 7.13.

Figure 7.12: Cascade of L linear systems.

Figure 7.13: Cascade of two linear systems: $Y(n)$ passed through $1/G(e^{j\omega})$ and then $H(e^{j\omega})$ yields $X(n)$.

Remembering that $C_{YY}(e^{j\omega}) = |G(e^{j\omega})|^2 C_{ZZ}(e^{j\omega})$ (from Figure 7.11 and Equation (7.7)), we obtain
$$C_{XY}(e^{j\omega}) = \frac{H(e^{j\omega})}{G(e^{j\omega})}C_{YY}(e^{j\omega}) = \frac{H(e^{j\omega})}{G(e^{j\omega})}|G(e^{j\omega})|^2 C_{ZZ}(e^{j\omega}) = H(e^{j\omega})G(e^{j\omega})^* C_{ZZ}(e^{j\omega}).$$

7.6.3 Linear filtered process plus noise

Consider the system shown in Figure 7.14. It is assumed that $X(n)$ is stationary with $E[X(n)] = 0$, $V(n)$ is stationary with $E[V(n)] = 0$, and $c_{VX}(n) = 0$ ($X(n)$ and $V(n)$ uncorrelated). It is also assumed that $h(n)$, $X(n)$ and $V(n)$ are real.

Figure 7.14: A linear filter plus noise.

The expected value of the output is derived below:
$$Y(n) = \sum_{k=-\infty}^{\infty} h(k)X(n-k) + V(n)$$
$$E[Y(n)] = \sum_{k=-\infty}^{\infty} h(k)E[X(n-k)] + E[V(n)] = 0.$$

Output covariance function and spectrum

The covariance function of the output is determined according to
$$\begin{aligned}
c_{YY}(\kappa) &= E[Y(n+\kappa)Y(n)] \\
&= E\left[\left(\sum_k h(k)X(n+\kappa-k) + V(n+\kappa)\right)\left(\sum_l h(l)X(n-l) + V(n)\right)\right] \\
&= \sum_k\sum_l h(k)h(l)E[X(n+\kappa-k)X(n-l)] + \sum_k h(k)E[X(n+\kappa-k)V(n)] \\
&\quad + \sum_l h(l)E[V(n+\kappa)X(n-l)] + E[V(n+\kappa)V(n)] \\
&= \sum_k\sum_l h(k)h(l)\, c_{XX}(\kappa - k + l) + c_{VV}(\kappa).
\end{aligned}$$
Applying the Fourier transform to both sides of the above expression yields
$$C_{YY}(e^{j\omega}) = \mathcal{F}\{c_{YY}(\kappa)\} = |H(e^{j\omega})|^2 C_{XX}(e^{j\omega}) + C_{VV}(e^{j\omega}).$$

Cross-covariance function and cross-spectrum

The cross-covariance function of the output is determined according to
$$c_{YX}(\kappa) = E[Y(n+\kappa)X(n)] = E\left[\left(\sum_k h(k)X(n+\kappa-k) + V(n+\kappa)\right)X(n)\right] = \sum_k h(k)E[X(n+\kappa-k)X(n)] = \sum_k h(k)\, c_{XX}(\kappa - k).$$
Applying the Fourier transform to both sides of the above expression yields
$$C_{YX}(e^{j\omega}) = H(e^{j\omega})\, C_{XX}(e^{j\omega}).$$
Note that when the additive noise at the output, $V(n)$, is uncorrelated with the input process $X(n)$, it has no effect on the cross-spectrum of the input and output.
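This last observation is the basis of a classical system-identification trick: since $C_{YX} = H\,C_{XX}$ regardless of the output noise, the ratio $\hat{H}(e^{j\omega}) = \hat{C}_{YX}(e^{j\omega})/\hat{C}_{XX}(e^{j\omega})$ estimates the transfer function unaffected by $V(n)$. A minimal sketch (an added illustration; the filter taps and noise level are arbitrary assumptions):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
h = np.array([1.0, 0.5, -0.25])           # assumed (unknown) impulse response
N, runs = 4096, 200

Sxx = np.zeros(N)
Syx = np.zeros(N, dtype=complex)
for _ in range(runs):
    x = rng.standard_normal(N)
    y = lfilter(h, [1.0], x) + 0.5 * rng.standard_normal(N)  # filtered input plus noise V(n)
    X, Y = np.fft.fft(x), np.fft.fft(y)
    Sxx += np.abs(X) ** 2 / N
    Syx += Y * np.conj(X) / N

H_hat = Syx / Sxx                          # C_YX / C_XX is insensitive to V(n)
H_true = np.fft.fft(h, N)
print(f"max |H_hat - H_true| = {np.max(np.abs(H_hat - H_true)):.3f}")
```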

Example 7.6.6 A complex-valued process $X(n)$ is comprised of a sum of K complex signals,
$$X(n) = \sum_{k=1}^{K} A_k e^{j(\omega_0 n + \phi_k)},$$
where $\omega_0$ is a constant and $A_k$ is a zero-mean random amplitude of the kth signal with variance $\sigma_k^2$, $k = 1, 2, \ldots, K$. The phase $\phi_k$ is uniformly distributed on $[-\pi, \pi)$. The random variables $A_k$ and $\phi_k$ are statistically pairwise independent for all $k = 1, \ldots, K$.

$X(n)$ is the input to a linear, stable, time-invariant system with known impulse response $h(n)$. The output of the system is buried in zero-mean noise $V(n)$ of known covariance $c_{VV}(n)$, independent of $X(n)$ (refer to Figure 7.14). We wish to determine the cross-spectrum $C_{YX}(e^{j\omega})$ and the auto-spectrum $C_{YY}(e^{j\omega})$.

We first calculate the covariance function of the input process $X(n)$. Since $A_k$ and $\phi_k$ are independent for all $k = 1, \ldots, K$,
$$E[X(n)] = E\left[\sum_{k=1}^{K} A_k e^{j(\omega_0 n + \phi_k)}\right] = \sum_{k=1}^{K} E[A_k]\, E\left[e^{j(\omega_0 n + \phi_k)}\right] = 0,$$
$$c_{XX}(n+\kappa, n) = E[X(n+\kappa)X(n)^*] = E\left[\sum_{k=1}^{K} A_k e^{j(\omega_0(n+\kappa)+\phi_k)}\sum_{l=1}^{K} A_l e^{-j(\omega_0 n + \phi_l)}\right] = \sum_{k=1}^{K}\sum_{l=1}^{K} E[A_kA_l]\, e^{j\omega_0\kappa}\, E\left[e^{j(\phi_k - \phi_l)}\right].$$
Since
$$E\left[e^{j(\phi_k - \phi_l)}\right] = \begin{cases} 1, & k = l \\ 0, & k \ne l, \end{cases}$$
we obtain
$$c_{XX}(n+\kappa, n) = \sum_{k=1}^{K} E[A_k^2]\, e^{j\omega_0\kappa} = e^{j\omega_0\kappa}\sum_{k=1}^{K}\sigma_k^2 = c_{XX}(\kappa).$$
With $\eta(\omega) = \sum_l \delta(\omega + 2\pi l)$ as before,
$$C_{YX}(e^{j\omega}) = H(e^{j\omega})\, C_{XX}(e^{j\omega}) = H(e^{j\omega})\,\mathcal{F}\{c_{XX}(\kappa)\} = H(e^{j\omega})\left[\sum_{k=1}^{K}\sigma_k^2\right] 2\pi\,\eta(\omega - \omega_0)$$
$$C_{YY}(e^{j\omega}) = |H(e^{j\omega})|^2 C_{XX}(e^{j\omega}) + C_{VV}(e^{j\omega}) = \left[\sum_{k=1}^{K}\sigma_k^2\right] 2\pi\,|H(e^{j\omega})|^2\,\eta(\omega - \omega_0) + C_{VV}(e^{j\omega}).$$


Chapter 8

The Finite Fourier Transform

Let $X(0), \ldots, X(N-1)$ be a model for the observations $x(0), \ldots, x(N-1)$ of a stationary random process $X(n)$, $n = 0, \pm 1, \pm 2, \ldots$. The function
$$X_N(e^{j\omega}) = \sum_{n=0}^{N-1} X(n)\, e^{-j\omega n}, \quad -\infty < \omega < \infty,$$
is called the finite Fourier transform.

Properties

The finite Fourier transform has the following properties:

1. $X_N(e^{j(\omega + 2\pi)}) = X_N(e^{j\omega})$

2. $X_N(e^{-j\omega}) = X_N(e^{j\omega})^*$ for all $X(n) \in \mathbb{R}$

3. The finite Fourier transform of $aX(n) + bY(n)$ is $aX_N(e^{j\omega}) + bY_N(e^{j\omega})$, where a and b are constants.

4. If $y(n) = \sum_m h(m)x(n-m)$, $\sum_n (1+|n|)|h(n)| < \infty$ and $|X(n)| < M$, then there exists a constant $K > 0$ such that $|Y_N(e^{j\omega}) - H(e^{j\omega})X_N(e^{j\omega})| < K$ for all ω.

5. $X(n) = (1/2\pi)\int_{-\pi}^{\pi} X_N(e^{j\omega})\, e^{j\omega n}\,d\omega$ for $n = 0, 1, \ldots, N-1$, and 0 otherwise.

The finite Fourier transform can be taken at the discrete frequencies $\omega_k = 2\pi k/N$:
$$X_N(e^{j2\pi k/N}) = \sum_{n=0}^{N-1} X(n)\, e^{-j2\pi kn/N}, \quad k = 0, 1, \ldots, N-1,$$
which can be computed by means of the Fast Fourier Transform (FFT).
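In NumPy, evaluating the finite Fourier transform at the frequencies $\omega_k = 2\pi k/N$ is a one-liner; the sketch below (added for illustration) also confirms it against the defining sum:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)

# Finite Fourier transform at omega_k = 2*pi*k/N via the FFT
X_fft = np.fft.fft(x)

# Direct evaluation of the defining sum for comparison
n = np.arange(N)
X_direct = np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N)) for k in range(N)])

print(np.allclose(X_fft, X_direct))   # True
```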

8.1 Statistical Properties of the Finite Fourier Transform

8.1.1 Preliminaries

Real normal distribution: $X = (X(0), \ldots, X(N-1))^T$ is said to be normally distributed ($X \sim \mathcal{N}_N(\mu, \Sigma)$) if
$$f_X(x) = (2\pi)^{-N/2}|\Sigma|^{-1/2} e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)},$$
given that $\mu = E[(X(0), \ldots, X(N-1))^T]$, $\Sigma = (\mathrm{Cov}[X_i, X_k])_{i,k=0,\ldots,N-1} = E[(X-\mu)(X-\mu)^T]$ and $|\Sigma| > 0$.

Complex normal distribution: A vector X is called complex normally distributed with mean μ and covariance Σ ($X \sim \mathcal{N}_N^C(\mu, \Sigma)$) if the 2N vector of real-valued random variables
$$\begin{pmatrix} \Re X \\ \Im X \end{pmatrix}$$
is a normal vector with mean
$$\begin{pmatrix} \Re\mu \\ \Im\mu \end{pmatrix}$$
and covariance matrix
$$\frac{1}{2}\begin{pmatrix} \Re\Sigma & -\Im\Sigma \\ \Im\Sigma & \Re\Sigma \end{pmatrix}.$$
One should note that Σ is Hermitian, so $(\Im\Sigma)^T = -\Im\Sigma$, and
$$E\left[(X-\mu)(X-\mu)^T\right] = 0.$$
If Σ is nonsingular, then the probability density function is
$$f_X^c(x) = \pi^{-N}|\Sigma|^{-1} e^{-(x-\mu)^H\Sigma^{-1}(x-\mu)}.$$
For $N = 1$, $\Sigma = E[(X-\mu)(X-\mu)^*]$ is real and identical to $\sigma_X^2$. Then $\mathrm{Cov}[\Re X, \Im X] = 0$, i.e., $\Re X$ and $\Im X$ are independent, respectively $\mathcal{N}_1(\Re\mu, \sigma_X^2/2)$ and $\mathcal{N}_1(\Im\mu, \sigma_X^2/2)$ distributed.

Asymptotic normal distribution: We say that a sequence $X^{(n)}$, $n = 1, 2, \ldots$ of N-dimensional random vectors is asymptotically $\mathcal{N}_N^C(\mu^{(n)}, \Sigma^{(n)})$ distributed if the distribution function of $[G^{(n)}]^{-1}(X^{(n)} - \mu^{(n)})$ converges to the distribution $\mathcal{N}_N^C(0, I)$ as $n \to \infty$. Herein $G^{(n)}$ is a matrix with $G^{(n)}[G^{(n)}]^H = \Sigma^{(n)}$.

In the following, we assume that X(n) is real and stationary with finite moments of allorders and absolutely summable cumulants, that is, the statistical dependence between X(n)and X(m) decreases sufficiently fast as |n−m| increases, for example, if X(n) is a linear process.Then we have the following properties for the finite Fourier transform.

1. (a) For large N , XN (ejω) is distributed as

XN (ejω) ∼

NC1 (0, NCXX(ejω)), ω 6= kπ

N1(0, NCXX(ejπ)), ω = (2k + 1)πN1(NµX , NCXX(0)), ω = 2kπ

where k is an integer.

(b) For pairwise distinct frequencies ω_i, i = 1, . . . , M with 0 ≤ ω_i ≤ π, the XN(e^{jω_i}) are asymptotically independent variables. This is also valid when the ω_i are approximated by 2πk_i/N → ω_i as N → ∞.

This states that the discrete Fourier transform provides N independent variables at the frequencies ω_i which are normally distributed with mean 0 or NµX and variance N CXX(e^{jω_i}).


2. If we use a window w(t), t ∈ R, with w(t) = 0 for t < 0 and t > 1, that is bounded and satisfies ∫₀¹ w(t)² dt < ∞, then the windowed transforms

X_{W,N}(e^{jω_i}) = Σ_{n=0}^{N−1} w(n/N) X(n) e^{−jω_i n},  i = 0, . . . , N − 1,

are asymptotically independently distributed as

X_{W,N}(e^{jω_i}) ∼ N^C_1(0, N ∫₀¹ w(t)² dt · CXX(e^{jω_i})),  ω_i ≠ kπ
X_{W,N}(e^{jω_i}) ∼ N₁(0, N ∫₀¹ w(t)² dt · CXX(e^{jπ})),  ω_i = (2k + 1)π
X_{W,N}(e^{jω_i}) ∼ N₁(N µX ∫₀¹ w(t) dt, N ∫₀¹ w(t)² dt · CXX(e^{j0})),  ω_i = 2kπ

The above is also valid for ω_i ≃ 2πk_i/N. This states that for large N, the Fourier transforms of windowed stationary processes are distributed similarly to those of non-windowed processes, except that the variance is multiplied by the factor ∫₀¹ w(t)² dt and the mean at ω_i = 2kπ is multiplied by the factor ∫₀¹ w(t) dt.

3. If N = ML and X(0), . . . , X(N − 1) are divided into L segments each of length M, then

X_{W,M}(e^{jω}, l) = Σ_{m=0}^{M−1} w(m/M) X(lM + m) e^{−jω(lM+m)}

for l = 0, . . . , L − 1 are asymptotically independently distributed as

X_{W,M}(e^{jω_i}, l) ∼ N^C_1(0, M CXX(e^{jω_i}) ∫₀¹ w(t)² dt),  ω_i ≠ kπ
X_{W,M}(e^{jω_i}, l) ∼ N₁(0, M CXX(e^{jπ}) ∫₀¹ w(t)² dt),  ω_i = (2k + 1)π
X_{W,M}(e^{jω_i}, l) ∼ N₁(M µX ∫₀¹ w(t) dt, M CXX(e^{j0}) ∫₀¹ w(t)² dt),  ω_i = 2kπ

This states that the finite Fourier transforms of the L data segments of size M are, at each ω_i, independently normally distributed random variables with variance M · CXX(e^{jω_i}) ∫₀¹ w(t)² dt.

4. If X(n) is Gaussian then the distributions are exact.

5. Under stronger conditions with respect to the cumulants, one can show that the finite Fourier transform has a rate of growth at most of order √(N log N). If X(n) is bounded by a constant, M say, then

sup_ω |XN(e^{jω})| ≤ MN,

giving a growth rate of order N.

6. If Y(n) = Σ_{k=−∞}^{∞} h(k) X(n − k), E[X(n)] = 0 and Σ_{n=−∞}^{∞} (1 + |n|)|h(n)| < ∞, then with probability 1

YN(e^{jω}) = H(e^{jω}) XN(e^{jω}) + O(√(log N)),   (8.1)

Y_{W,N}(e^{jω}) = H(e^{jω}) X_{W,N}(e^{jω}) + O(√(log N / N))   (8.2)

as N → ∞.


Interpretation

To 1.(a): Take a broad-band noise with a normally distributed amplitude and pass it through an ideal narrow bandpass filter,

H(e^{jω}) = 1 for |ω ± ω₀| < π/N, and 0 else.

Figure 8.1: Transfer function of an ideal bandpass filter: two passbands of width 2π/N centered at ±ω₀.

If X(n) is filtered, then Y(n) = Σ_m h(m) X(n − m). For large N,

Y(n) ≈ (1/N) XN(2πs/N) e^{j2πsn/N} + (1/N) XN(−2πs/N) e^{−j2πsn/N}

where 2πs/N ≈ ω₀, so Y(n) is approximately N₁(0, (1/N) CXX(e^{jω₀})) distributed.

To 1.(b): [Figure: the process X(n) is passed through two band-pass filters H₀(e^{jω}) and H₁(e^{jω}) with non-overlapping frequency ranges centered at ω₀ and ω₁; the two outputs Y(n) and U(n) are approximately independent.]


Chapter 9

Spectrum Estimation

9.1 Use of bandpass filters

X(n) is a stationary stochastic process with covariance function cXX(n) and spectrum CXX(e^{jω}). Assume E[X(n)] = 0. If we transform X(n) by

Y(n) = X(n)²

then

E[Y(n)] = E[X(n)²] = cXX(0) = ∫_{−π}^{π} CXX(e^{jω}) dω/2π .

This motivates the following construction to estimate the spectrum. The band-pass filter has the transfer function

H_B(e^{jω}) = H⁰_B for |ω ± ω₀| ≤ B/2 (with |ω| ≤ π), and 0 else,

where H⁰_B will be specified later and ω₀ is the center frequency of the filter. The low-pass filter

is an FIR filter with impulse response

h_l(n) = 1/N for n = 0, 1, 2, . . . , N − 1, and 0 otherwise.

Figure 9.1: Block diagram of the estimator: X(n) is band-pass filtered to give V(n), squared to give U(n) = V(n)², and low-pass filtered to give Y(n).


Figure 9.2: Transfer function of an ideal band-pass filter: height H⁰_B over ω₀ − B/2 ≤ |ω| ≤ ω₀ + B/2.

The low-pass filter is used to calculate the mean. We have

H_L(e^{jω}) = [sin(ωN/2) / (N sin(ω/2))] e^{−jω(N−1)/2}

The spectrum of V(n) is given by

C_VV(e^{jω}) = |H_B(e^{jω})|² CXX(e^{jω})

The output process is given by

Y(n) = Σ_{m=0}^{N−1} (1/N) U(n − m) = (1/N) Σ_{m=0}^{N−1} V(n − m)²

The mean of Y(n) is given by

µ_Y = E[Y(n)] = (1/N) Σ_{m=0}^{N−1} E[V(n − m)²] = c_VV(0)
= ∫_{−π}^{π} C_VV(e^{jω}) dω/2π
= ∫_{−π}^{π} |H_B(e^{jω})|² CXX(e^{jω}) dω/2π
= 2 (H⁰_B)² ∫_{ω₀−B/2}^{ω₀+B/2} CXX(e^{jω}) dω/2π
≈ (H⁰_B)² (B/π) CXX(e^{jω₀})

If we choose H⁰_B = 1, µ_Y = E[Y(n)] measures the power of X(n) in the frequency band |ω − ω₀| < B/2. The spectral value CXX(e^{jω₀}) will be measured if H⁰_B = √(π/B), provided B ≪ ω₀ and CXX(e^{jω}) is continuous at ω = ω₀.

Under the assumption that X(n) is Gaussian, one can show that

σ²_Y ≈ (2π/(BN)) CXX(e^{jω₀})² .


To measure CXX(e^{jω₀}), we choose H⁰_B so that µ_Y = CXX(e^{jω₀}). However, the power of the output process is then proportional to CXX(e^{jω₀})².

To measure CXX(e^{jω}) in a whole frequency band, we would need a filter bank with a cascade of squarers and integrators. In acoustics, one uses such a circuit, where the center frequencies are taken as ω_{k+1} = 2^{1/3} ω_k, k = 1, . . . , K − 1. This is the principle of a spectral analyser, especially when used for analysing analog signals. Such a spectral analyser yields the spectrum at the frequencies ω₁, . . . , ω_K simultaneously. In measurement technology, one uses simpler systems that work with a band-pass filter having a variable center frequency. The signal at the system output gives an estimate of the spectrum in a certain frequency band. The accuracy of the measurement depends on how slowly the center frequency is changed and on the smoothness of the spectrum of X(n). A sketch of this measurement principle is given below.
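The following Python sketch (an added illustration; the AR(1) test spectrum and all filter-design parameters are assumptions, not taken from the manuscript) implements the chain of Figure 9.1 — band-pass filter with gain H⁰_B = √(π/B), squarer, averager — and compares µ_Y with CXX(e^{jω₀}).

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
N_data, w0, B = 200_000, 0.3 * np.pi, 0.05 * np.pi   # assumed test values

# X(n): white noise through an AR(1) filter -> C_XX(e^{jw}) = 1/|1 - 0.5 e^{-jw}|^2
x = signal.lfilter([1.0], [1.0, -0.5], rng.standard_normal(N_data))

# Band-pass filter around w0 with bandwidth B; gain H0_B = sqrt(pi/B)
h_bp = signal.firwin(801, [(w0 - B/2)/np.pi, (w0 + B/2)/np.pi], pass_zero=False)
v = np.sqrt(np.pi / B) * signal.lfilter(h_bp, 1.0, x)

y = np.mean(v[2000:]**2)             # squarer + averager (skip filter transient)
c_true = 1.0 / np.abs(1 - 0.5 * np.exp(-1j * w0))**2
print(f"estimate {y:.3f}  vs  C_XX(e^jw0) {c_true:.3f}")
```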


Chapter 10

Non-Parametric Spectrum Estimation

10.1 Consistency of Spectral Estimates

An estimate ĈXX(e^{jω}) of the spectrum CXX(e^{jω}) is a random variable. If we could design an ideal estimator for the spectrum CXX(e^{jω}), the pdf f_{ĈXX(e^{jω})}(α) would be a delta function at CXX(e^{jω}), i.e. the mean E[ĈXX(e^{jω})] would be CXX(e^{jω}) and the variance Var[ĈXX(e^{jω})] would be zero.

Because it is not possible to create an ideal estimator, we try to reach this goal asymptotically as the number N of observed samples X(n), n = 0, 1, . . . , N − 1, tends to infinity. We require that ĈXX(e^{jω}) is a consistent estimator for CXX(e^{jω}), i.e., ĈXX(e^{jω}) → CXX(e^{jω}) in probability as N → ∞. A sufficient condition for consistency is that the mean square error (MSE) of ĈXX(e^{jω}) converges to zero as N → ∞ (see 7.6.1), i.e.,

MSE(ĈXX(e^{jω})) = E[(ĈXX(e^{jω}) − CXX(e^{jω}))²] → 0 as N → ∞ .

Equivalently, we may say that a sufficient condition for ĈXX(e^{jω}) to be consistent is that

Var[ĈXX(e^{jω})] → 0 as N → ∞

and

Bias(ĈXX(e^{jω})) = E[ĈXX(e^{jω})] − CXX(e^{jω}) → 0 as N → ∞

because

MSE(ĈXX(e^{jω})) = E[(ĈXX(e^{jω}) − CXX(e^{jω}))²]
= E[ĈXX(e^{jω})²] − 2 E[ĈXX(e^{jω})] · CXX(e^{jω}) + CXX(e^{jω})² − (E[ĈXX(e^{jω})])² + (E[ĈXX(e^{jω})])²
= E[ĈXX(e^{jω})²] − (E[ĈXX(e^{jω})])² + (E[ĈXX(e^{jω})] − CXX(e^{jω}))²
= Var[ĈXX(e^{jω})] + [Bias(ĈXX(e^{jω}))]² .


If E[ĈXX(e^{jω})] ≠ CXX(e^{jω}), the estimator has a bias which results in a systematic error of the method. On the other hand, if Var[ĈXX(e^{jω})] ≠ 0, we get a random error in the estimator, i.e., the estimator varies from data set to data set.

10.2 The Periodogram

Suppose X(n), n = 0, ±1, . . . is stationary with mean µX and cXX(k) = Cov[X(n + k), X(n)], and that the spectrum CXX(e^{jω}) = Σ_{k=−∞}^{∞} cXX(k) e^{−jωk}, −π < ω ≤ π, is uniformly continuous and bounded.

As seen in Chapter 8, the finite Fourier transform satisfies

XN(e^{jω}) ∼ N^C_1(0, N CXX(e^{jω})),  ω ≠ kπ
XN(e^{jω}) ∼ N₁(0, N CXX(e^{jπ})),  ω = (2k + 1)π
XN(e^{jω}) ∼ N₁(N µX, N CXX(e^{j0})),  ω = 2kπ

which suggests¹ the so-called periodogram I^N_XX(e^{jω}) as an estimator for the spectrum CXX(e^{jω}), i.e.,

I^N_XX(e^{jω}) = (1/N) |XN(e^{jω})|²   (10.1)
= (1/N) |Σ_{n=0}^{N−1} X(n) e^{−jωn}|² .

Note that I^N_XX(e^{jω}) has the same symmetry, non-negativity and periodicity properties as CXX(e^{jω}).

The periodogram was introduced by Schuster (1898) as a tool for the identification of hidden periodicities. Indeed, in the case of

X(n) = Σ_{j=1}^{J} a_j cos(ω_j n + φ_j),  n = 0, . . . , N − 1,

I^N_XX(e^{jω}) has peaks at the frequencies ω = ±ω_j (mod 2π), j = 1, . . . , J, provided N is large.
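A short Python sketch (an added illustration; the signal parameters are arbitrary test values) computes the periodogram (10.1) of two sinusoids in noise; the two largest ordinates lie near the two sinusoid frequencies.

```python
import numpy as np

def periodogram(x):
    """I_N(e^{jw}) = |FFT(x)|^2 / N at the frequencies w_k = 2*pi*k/N."""
    N = len(x)
    return np.abs(np.fft.fft(x))**2 / N

rng = np.random.default_rng(0)
N = 512
n = np.arange(N)
x = (np.cos(0.2 * 2*np.pi * n + 1.0)        # omega_1/(2*pi) = 0.2
     + 0.5 * np.cos(0.33 * 2*np.pi * n)     # omega_2/(2*pi) = 0.33
     + 0.3 * rng.standard_normal(N))
I = periodogram(x)
freqs = np.arange(N) / N                    # omega/(2*pi) axis
print(freqs[np.argsort(I[:N//2])[-2:]])     # peaks near 0.33 and 0.2
```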

10.2.1 Mean of the Periodogram

The mean of the periodogram can be calculated as follows:

¹From the asymptotic distribution of XN(e^{jω}) we note that Var[XN(e^{jω})] = N CXX(e^{jω}). Thus (1/N) Var[XN(e^{jω})] = (1/N) E[|XN(e^{jω})|²] = E[(1/N)|XN(e^{jω})|²].


E[I^N_XX(e^{jω})] = (1/N) · E[XN(e^{jω}) · XN(e^{jω})*]   (10.2)
= (1/N) · E[Σ_{n=0}^{N−1} X(n) e^{−jωn} Σ_{m=0}^{N−1} X(m)* e^{jωm}]
= (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} E[X(n) X(m)*] e^{−jω(n−m)}
= (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} (cXX(n − m) + µX²) e^{−jω(n−m)}
= (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} cXX(n − m) e^{−jω(n−m)} + (1/N) |ΔN(e^{jω})|² µX²
= (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} e^{−jω(n−m)} ∫_{−π}^{π} CXX(e^{jλ}) e^{jλ(n−m)} dλ/2π + (1/N) |ΔN(e^{jω})|² µX²
= (1/N) ∫_{−π}^{π} |ΔN(e^{j(ω−λ)})|² CXX(e^{jλ}) dλ/2π + (1/N) |ΔN(e^{jω})|² µX²

where

ΔN(e^{jω}) = Σ_{n=0}^{N−1} e^{−jωn} = e^{−jω(N−1)/2} · sin(ωN/2) / sin(ω/2)

is the Fourier transform of a rectangular window. Thus, for a zero-mean stationary process X(n), the expected value of the periodogram, E[I^N_XX(e^{jω})], is equal to the spectrum CXX(e^{jω}) convolved with the magnitude squared of the Fourier transform of the rectangular window, |ΔN(e^{jω})|². This suggests that lim_{N→∞} E[I^N_XX(e^{jω})] = CXX(e^{jω}) + µX² · 2π Σ_k δ(ω + 2πk), because lim_{N→∞} (1/N)|ΔN(e^{jω})|² = 2π Σ_k δ(ω + 2πk), as illustrated in Fig. 10.1.

Therefore the periodogram estimate is asymptotically unbiased for ω ≠ 2πk, k ∈ Z. We may obtain an asymptotically unbiased estimator for CXX(e^{jω}) for all ω by subtracting the sample mean X̄ of the data before computing the periodogram, i.e., I^N_{X−X̄,X−X̄}(e^{jω}) = (1/N)|Σ_{n=0}^{N−1} (X(n) − X̄) e^{−jωn}|² is asymptotically unbiased.

In (10.2) we calculated the mean of the periodogram of a rectangular-windowed data sequence X(n), n = 0, . . . , N − 1. Advantages of using different window types were stated in the filter design part (Chapter 4). We now consider periodograms of sequences X(n), n = 0, . . . , N − 1, multiplied by a real-valued window (also called a taper) wN(n) of length N, having a Fourier transform WN(e^{jω}). Then,

E[I^N_{WX,WX}(e^{jω})] = (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} wN(n) wN(m) E[X(n) X(m)*] e^{−jω(n−m)}   (10.3)
= (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} wN(n) wN(m) cXX(n − m) e^{−jω(n−m)} + (1/N) |WN(e^{jω})|² µX²
= (1/N) ∫_{−π}^{π} |WN(e^{j(ω−λ)})|² CXX(e^{jλ}) dλ/2π + (1/N) |WN(e^{jω})|² µX²


Figure 10.1: The window function (1/N)|ΔN(e^{jω})|² for N = 20, N = 10 and N = 5. The abscissa displays the frequency ω/2π.

Comparing this result with the one for rectangular-windowed data, we observe the following: if CXX(e^{jα}) has a significant peak at an α in the neighborhood of ω, the expected values of I^N_XX(e^{jω}) and I^N_{WX,WX}(e^{jω}) can differ substantially from CXX(e^{jω}). The advantage of employing a taper is now apparent: it can be considered as a means of reducing the effect of neighboring peaks.

10.2.2 Variance of the Periodogram

To find the variance of the periodogram we consider the covariance function of the periodogram. For simplicity, let us assume that X(n) is a real white Gaussian process with variance σ² and absolutely summable covariance function cXX(n), and let ω, λ ≠ 2kπ with k ∈ Z. Then,

Cov[I^N_XX(e^{jω}), I^N_XX(e^{jλ})] = E[I^N_XX(e^{jω}) I^N_XX(e^{jλ})] − E[I^N_XX(e^{jω})] E[I^N_XX(e^{jλ})]
= (1/N²) (|ΔN(e^{j(ω+λ)})|² + |ΔN(e^{j(ω−λ)})|²) σ⁴ .

This follows from the factorization of fourth-order moments of jointly Gaussian random variables, which can be proven using the characteristic function:

E[X₁X₂X₃X₄] = E[X₁X₂] E[X₃X₄] + E[X₁X₃] E[X₂X₄] + E[X₁X₄] E[X₂X₃] .


If the process X(n) is non-white and non-Gaussian, the covariance function of the periodogram can be found in [Brillinger 1981] as

Cov[I^N_XX(e^{jω}), I^N_XX(e^{jλ})] ≈ CXX(e^{jω})² · (1/N²) (|ΔN(e^{j(ω+λ)})|² + |ΔN(e^{j(ω−λ)})|²) ,   (10.4)

which for λ = ω reduces to the variance of the periodogram,

Var[I^N_XX(e^{jω})] ≈ CXX(e^{jω})² · (1/N²) (|ΔN(e^{j2ω})|² + N²) .

This suggests that no matter how large N is, the variance of I^N_XX(e^{jω}) will tend to remain at the level CXX(e^{jω})². Because the variance does not tend to zero for N → ∞, according to Section 10.1 the periodogram is not a consistent estimator. If an estimator with a smaller variance is desired, it cannot be obtained by simply increasing the sample length N.

The properties of the finite Fourier transform suggest that the periodogram ordinates are asymptotically (for large N) independent random variates, which explains the irregularity of the periodogram in Figure 10.2. From the asymptotic distribution of the finite Fourier transform,

Figure 10.2: The periodogram estimate of the spectrum (solid line) and the true spectrum (dotted line). The abscissa displays the frequency ω/2π while the ordinate shows the estimate.

we can deduce the asymptotic distribution of periodogram ordinates as

I^N_XX(e^{jω}) ∼ (CXX(e^{jω})/2) χ²₂,  ω ≠ πk
I^N_XX(e^{jω}) ∼ CXX(e^{jω}) χ²₁,  ω = πk,  k ∈ Z .


10.3 Averaging Periodograms - Bartlett Estimator

One way to reduce the variance of an estimator is to average estimators obtained from independent measurements.

If we assume that the covariance function cXX(k) of the process X(n) is of finite length, or can be viewed as such with a small error, and that the length of cXX(k) is much smaller than the length N of the data stretch, then we can divide the process X(n) of length N into L parts of length M each, where M ≪ N. With this assumption we can calculate L nearly independent periodograms I^M_XX(e^{jω}, l), l = 0, . . . , L − 1, of the sequences

X_l(n) = X(n + l · M) ,  n = 0, . . . , M − 1 .

The periodograms are written as

I^M_XX(e^{jω}, l) = (1/M) |Σ_{n=0}^{M−1} X_l(n) e^{−jωn}|²   (10.5)

and the estimator as

ĈB_XX(e^{jω}) = (1/L) Σ_{l=0}^{L−1} I^M_XX(e^{jω}, l) .

This method of estimating the spectral density function CXX(e^{jω}) is called the Bartlett method, after its introduction by Bartlett in 1953.
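A compact Python sketch of the Bartlett estimator (an added illustration; the segment count and MA(1) test data are arbitrary assumptions):

```python
import numpy as np

def bartlett_estimate(x, L):
    """Average the periodograms of L non-overlapping segments (Bartlett, 1953)."""
    M = len(x) // L
    segments = x[:L * M].reshape(L, M)
    # periodogram of each segment at the frequencies 2*pi*k/M
    I = np.abs(np.fft.fft(segments, axis=1))**2 / M
    return I.mean(axis=0)

rng = np.random.default_rng(0)
x = np.convolve(rng.standard_normal(4096), [1.0, 0.9], mode="same")  # MA(1) data
C_hat = bartlett_estimate(x, L=16)   # variance reduced roughly by a factor 16
```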

The mean of ĈB_XX(e^{jω}) can be easily obtained as

E[ĈB_XX(e^{jω})] = E[(1/L) Σ_{l=0}^{L−1} I^M_XX(e^{jω}, l)]
= (1/L) Σ_{l=0}^{L−1} E[I^M_XX(e^{jω}, l)]
= E[I^M_XX(e^{jω})]

which is the same as the mean of the periodogram of the process X(n) for n = 0, . . . , M − 1. From Section 10.2.1 it can be seen that Bartlett's estimator is an unbiased estimator of the spectrum for M → ∞, because the periodogram is asymptotically unbiased.

To check the performance of the estimator we also have to derive the variance of the estimator.


Figure 10.3: The periodogram as an estimate of the spectrum using 4 non-overlapping segments. The abscissa displays the frequency ω/2π while the ordinate shows the estimate.

The variance of Bartlett's estimator can be calculated as

Var[ĈB_XX(e^{jω})] = E[(1/L²) Σ_{k=0}^{L−1} Σ_{m=0}^{L−1} I^M_XX(e^{jω}, k) I^M_XX(e^{jω}, m)] − (E[(1/L) Σ_{l=0}^{L−1} I^M_XX(e^{jω}, l)])²
= (1/L²) Σ_{p=0}^{L−1} E[I^M_XX(e^{jω}, p)²] + (1/L²) Σ_{k=0}^{L−1} Σ_{m=0, m≠k}^{L−1} E[I^M_XX(e^{jω}, k)] E[I^M_XX(e^{jω}, m)] − (E[I^M_XX(e^{jω})])²
= (1/L²) Σ_{p=0}^{L−1} E[I^M_XX(e^{jω}, p)²] + (L(L − 1)/L²) (E[I^M_XX(e^{jω})])² − (E[I^M_XX(e^{jω})])²
= (1/L) [E[I^M_XX(e^{jω})²] − (E[I^M_XX(e^{jω})])²]
= (1/L) Var[I^M_XX(e^{jω})] .   (10.6)

Note that we can proceed similarly as above with a windowed periodogram. We would then consider

ĈB_{WX,WX}(e^{jω}) = (1/L) Σ_{l=0}^{L−1} I^M_{WX,WX}(e^{jω}, l)

which has similar properties to the above.

The variance of the Bartlett estimator is the variance of the periodogram divided by the number of data segments L, so the variance of the estimator ĈB_XX(e^{jω}) decreases with increasing L.

With this result, and according to the distribution of the periodogram, Bartlett's estimator is asymptotically distributed as

ĈB_XX(e^{jω}) ∼ (CXX(e^{jω})/(2L)) χ²_{2L},  ω ≠ πk
ĈB_XX(e^{jω}) ∼ (CXX(e^{jω})/L) χ²_L,  ω = πk,  k ∈ Z .

This is due to the fact that the sum of L independent χ²₂-distributed variables is χ²_{2L}-distributed.

10.4 Windowing Periodograms - Welch Estimator

Another method of averaging periodograms was developed by Welch in 1967. Welch derived a modified version of the Bartlett method, called Welch's method, which allows the data segments to overlap. Additionally, the segments are multiplied by a window function, which affects the bias of the estimator. For each periodogram l, l = 0, . . . , L − 1, the data sequence is now

X_l(n) = X(n + l · D) ,

where l · D is the starting point of the l-th sequence. According to this, the periodograms are defined similarly to Equation (10.5), except for a multiplication by the window wM(n), i.e.,

I^M_{WX,WX}(e^{jω}, l) = (1/M) |Σ_{n=0}^{M−1} X_l(n) · wM(n) · e^{−jωn}|² .

Welch defined his estimator as

ĈW_XX(e^{jω}) = (1/(L · A)) Σ_{l=0}^{L−1} I^M_{WX,WX}(e^{jω}, l)

with a normalization factor A chosen to obtain an unbiased estimate. If we set A = 1, take wM(n) to be a rectangular window of length M and set D = M, the Welch method is equivalent to the Bartlett method. Thus, the Bartlett method is a special case of the Welch method.

To calculate the normalization factor A we have to derive the mean of the estimator.

E[ĈW_XX(e^{jω})] = E[(1/(L · A)) Σ_{l=0}^{L−1} I^M_{WX,WX}(e^{jω}, l)]
= (1/(L · A)) Σ_{l=0}^{L−1} E[I^M_{WX,WX}(e^{jω}, l)]
= (1/A) E[I^M_{WX,WX}(e^{jω})]

From (10.3) we get

E[ĈW_XX(e^{jω})] = (1/A) · (1/M) · [ (1/2π) ∫_{−π}^{π} |WM(e^{j(ω−λ)})|² CXX(e^{jλ}) dλ + µX² · |WM(e^{jω})|² ] .


If we assume that M is large, the energy of WM(e^{jω}) is concentrated around ω = 0. Thus, we can state that approximately

E[ĈW_XX(e^{jω})] ≈ (1/A) · (1/M) · [ (1/2π) CXX(e^{jω}) ∫_{−π}^{π} |WM(e^{jλ})|² dλ + µX² · |WM(e^{jω})|² ] .

Using Parseval's theorem,

(1/2π) ∫_{−π}^{π} |WM(e^{jλ})|² dλ = Σ_{n=0}^{M−1} |wM(n)|² ,

we rewrite the above as

E[ĈW_XX(e^{jω})] ≈ (1/A) · (1/M) · [ CXX(e^{jω}) Σ_{n=0}^{M−1} |wM(n)|² + µX² · |WM(e^{jω})|² ] .

To get an asymptotically unbiased estimator for ω ≠ 2kπ, k ∈ Z, A has to be chosen as

A = (1/M) Σ_{n=0}^{M−1} |wM(n)|² .

With this and M → ∞, the Welch estimator is unbiased for ω ≠ 2kπ.

For M → ∞, the variance of the Welch method for non-overlapping data segments with D = M is asymptotically the same as for the Bartlett method (unwindowed data),

Var[ĈW_XX(e^{jω})] ≈ Var[ĈB_XX(e^{jω})] ≈ (1/L) CXX(e^{jω})²   (M → ∞, D = M).

In the case of 50% overlap (D = M/2), Welch calculated the asymptotic variance of the estimator to be

Var[ĈW_XX(e^{jω})] ≈ (9/8) · (1/L) · CXX(e^{jω})²   (M → ∞, D = M/2),

which is higher than the variance of Bartlett's estimator for the same number of segments L. The advantage of using Welch's estimator with overlap is that one can use more segments for the same amount of data. This reduces the variance of the estimate while keeping a bias comparable to the one obtained with the Bartlett estimator. For example, a 50% overlap doubles the number of segments and thus reduces the variance by a factor of about 2. Additionally, the shape of the window only affects the bias of the estimate, not the variance. So even for non-overlapping segments, the selection of a suitable window function can result in an estimator of reduced bias. These advantages make Welch's estimator one of the most used non-parametric spectrum estimators. The method is also implemented in the "pwelch" function of ©MATLAB.
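For completeness, a Python equivalent (an added usage sketch; the AR(1) test data and parameter values are arbitrary assumptions) using scipy.signal.welch, which performs the segmentation, windowing, normalization and averaging described above:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = signal.lfilter([1.0], [1.0, -0.7], rng.standard_normal(8192))  # AR(1) data

# Hann window, segments of length M = 256 with 50% overlap (D = M/2)
f, C_w = signal.welch(x, fs=1.0, window="hann", nperseg=256, noverlap=128)
# f is the frequency axis omega/(2*pi); C_w is the Welch spectrum estimate
```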

10.5 Smoothing the Periodogram

The asymptotic independence of the finite Fourier transform at discrete frequencies (see property 1(b) in Section 8.1) suggests that the periodogram ordinates I^N_XX(e^{jω₁}), . . . , I^N_XX(e^{jω_L}) for 0 ≤ ω₁ < ω₂ < . . . < ω_L ≤ π are asymptotically independent. These frequencies may be the discrete


frequencies ω_k = 2πk/N, k ∈ Z, in the neighborhood of the value ω at which we wish to estimate CXX(e^{jω}). This motivates the estimate

ĈS_XX(e^{jω}) = (1/(2m + 1)) Σ_{k=−m}^{m} I^N_XX(e^{j(2π/N)[k(ω)+k]})

where (2π/N) · k(ω) is the frequency nearest to the ω at which the spectrum CXX(e^{jω}) is to be estimated.

Given the above property, we can expect to reduce the variance of the estimate by averaging over adjacent periodogram ordinates. While I^N_XX(e^{jω}) is (CXX(e^{jω})/2) χ²₂ distributed for ω ≠ kπ, k ∈ Z, for ω = πk it is not (CXX(e^{jω})/2) χ²₂ but CXX(e^{jω}) χ²₁ distributed. Therefore we exclude these values from the smoothing, which means

k(ω) + k ≠ l · N,  l ∈ Z .

Because the spectrum CXX(e^{jω}) of a real process X(n) is of even symmetry and periodic in 2π, it is natural to calculate values of the periodogram I^N_XX(e^{jω}) only for 0 ≤ ω < π. This results in the estimation of CXX(e^{jω}) for ω = 2πl, l ∈ Z, by

ĈS_XX(e^{jω}) = (1/2m) [ Σ_{k=−m}^{−1} I^N_XX(e^{j(ω+2πk/N)}) + Σ_{k=1}^{m} I^N_XX(e^{j(ω+2πk/N)}) ]
= (1/m) Σ_{k=1}^{m} I^N_XX(e^{j(ω+2πk/N)})   [ω = 2lπ] or [ω = (2l + 1)π and N even]

which is the same as for the estimation at ω = (2l + 1)π when N is even. For the estimation at the frequency ω = (2l + 1)π when N is odd we use

ĈS_XX(e^{jω}) = (1/2m) Σ_{k=1}^{m} [ I^N_XX(e^{j(ω−π/N+2kπ/N)}) + I^N_XX(e^{j(ω+π/N−2kπ/N)}) ]
= (1/m) Σ_{k=1}^{m} I^N_XX(e^{j(ω−π/N+2kπ/N)})   ω = (2l + 1)π and N odd

which implies that the variance of the estimator at the frequencies ω = lπ, l ∈ Z, will be higher than at other frequencies. In the sequel we will only consider frequencies ω ≠ lπ; the method is similar for the frequencies ω = lπ.

The smoothing method is nothing else but sampling the periodogram at discrete frequencies with 2π/N spacing and averaging. Thus, the method can easily be implemented using the FFT, which makes it computationally efficient, as in the sketch below.
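A Python sketch of the smoothed periodogram (an added illustration; the averaging is a circular moving average over 2m + 1 FFT ordinates, and the special handling of ω = lπ discussed above is omitted for brevity):

```python
import numpy as np

def smoothed_periodogram(x, m):
    """Average 2m+1 adjacent periodogram ordinates I_N(e^{j 2*pi*k/N})."""
    N = len(x)
    I = np.abs(np.fft.fft(x))**2 / N
    # circular moving average over the ordinates k-m, ..., k+m
    return sum(np.roll(I, s) for s in range(-m, m + 1)) / (2 * m + 1)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
C_s = smoothed_periodogram(x, m=5)   # variance reduced by about 1/(2m+1)
```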

The mean of the smoothed periodogram is given by

E[ĈS_XX(e^{jω})] = (1/(2m + 1)) Σ_{k=−m}^{m} E[I^N_XX(e^{j(2π/N)(k(ω)+k)})] .

In the following calculation we will assume that 2πk(ω)/N = ω. Using the formula for the mean of the periodogram (10.2), we obtain

E[ĈS_XX(e^{jω})] = (1/(2m + 1)) Σ_{k=−m}^{m} (1/N) ∫_{−π}^{π} |ΔN(e^{j(ω+2πk/N−λ)})|² CXX(e^{jλ}) dλ/2π + (1/N) |ΔN(e^{jω})|² µX²   (10.7)
= ∫_{−π}^{π} [ (1/(2m + 1)) · (1/N) Σ_{k=−m}^{m} |ΔN(e^{j(ω+2πk/N−λ)})|² ] CXX(e^{jλ}) dλ/2π + (1/N) |ΔN(e^{jω})|² µX²
= ∫_{−π}^{π} A_m(e^{j(ω−λ)}) CXX(e^{jλ}) dλ/2π + (1/N) |ΔN(e^{jω})|² µX² ,

where the bracketed term defines A_m(e^{j(ω−λ)}).

Examining the shape of A_m(e^{jω}) in Figure 10.4 for a fixed value of N and different values of m, we can describe A_m(e^{jω}) as having approximately a rectangular shape. A comparison of

Figure 10.4: Function A_m(e^{jω}) for N = 15 and m = 0, 1, 3, 5. The abscissa displays ω/π.

the mean of the periodogram (10.2) and the mean of the smoothed periodogram (10.7) suggests that the bias of ĈS_XX(e^{jω}) will most often be greater than the bias of I^N_XX(e^{jω}), as the integral extends over a greater range. Both will be the same if the spectrum CXX(e^{jω}) is flat; however, ĈS_XX(e^{jω}) is unbiased as N → ∞.


The calculation of the variance of ĈS_XX(e^{jω}) is similar to the calculation in (10.6) and results in

Var[ĈS_XX(e^{jω})] = (1/(2m + 1)) Var[I^N_XX(e^{jω})] .

This can be confirmed from Figure 10.5, where the estimate ĈS_XX(e^{jω}) is plotted for different values of m.

Figure 10.5: The smoothed periodogram for values of m = 2, 5, 10 and 20 (from top left to bottom right). The abscissa displays the frequency ω/2π while the ordinate shows the estimate.

With this result and the asymptotic behavior of the periodogram ordinates, we find the asymptotic distribution of ĈS_XX(e^{jω}) as

ĈS_XX(e^{jω}) ∼ (CXX(e^{jω})/(4m + 2)) χ²_{4m+2},  ω ≠ πk
ĈS_XX(e^{jω}) ∼ (CXX(e^{jω})/(2m)) χ²_{2m},  ω = πk,  k ∈ Z .

10.6 A General Class of Spectral Estimates

In Section 10.5 we considered an estimator which utilised the independence property of the periodogram ordinates. The smoothed periodogram weighted all periodogram ordinates near ω equally, which is reasonable for a nearly constant spectrum. However, if the spectrum CXX(e^{jω}) varies, this method creates a larger bias. In this case it is more appropriate to weight periodogram ordinates near ω more than those further away. Analogously to Section 10.5, we construct the estimator as

ĈSW_XX(e^{jω}) = (1/A) Σ_{k=−m}^{m} W_k I^N_XX(e^{j(2π/N)[k(ω)+k]}),  k(ω) + k ≠ l · N,  ω ≠ l · π,  l ∈ Z ,   (10.8)

where A is a constant to be chosen for an unbiased estimator. The case ω = lπ can easily be derived from Section 10.5. For further investigation of the estimator in (10.8) we will only consider the case ω ≠ l · π, l ∈ Z.

The mean of the estimator ĈSW_XX(e^{jω}) can be calculated, with the assumption that 2πk(ω)/N = ω, by

E[ĈSW_XX(e^{jω})] = (1/A) Σ_{k=−m}^{m} W_k E[I^N_XX(e^{j[ω+2πk/N]})] .

To get an asymptotically unbiased estimator we set

A = Σ_{k=−m}^{m} W_k

because E[I^N_XX(e^{jω})] → CXX(e^{jω}) as N → ∞. The mean of the estimator can be derived as

E[ĈSW_XX(e^{jω})] = ∫_{−π}^{π} [ (1/(A·N)) Σ_{k=−m}^{m} W_k |ΔN(e^{j(ω+2πk/N−λ)})|² ] CXX(e^{jλ}) dλ/2π + (1/N) |ΔN(e^{jω})|² µX²
= ∫_{−π}^{π} A_{SW}(e^{j(ω−λ)}) CXX(e^{jλ}) dλ/2π + (1/N) |ΔN(e^{jω})|² µX²

where

A_{SW}(e^{jω}) = (1/Σ_{l=−m}^{m} W_l) · (1/N) Σ_{k=−m}^{m} W_k |ΔN(e^{j(ω+2πk/N)})|² .

This expression is very similar to the function A_m(e^{jω}) from Equation (10.7). Indeed, if we set W_k = 1/(2m + 1) for all k, it follows automatically that A_{SW}(e^{jω}) = A_m(e^{jω}), i.e. A_m(e^{jω}) is a special case of A_{SW}(e^{jω}). However, with the weights W_k we have more control over the shape of A_{SW}(e^{jω}), which also leads to better control of the bias of the estimator itself. The effect of different weightings on the shape of A_{SW}(e^{jω}) can be seen in Figure 10.6.


Figure 10.6: Function A_{SW}(e^{jω}) for N = 15 and m = 4 with weights W_k in the shape of (from the upper left to the lower right) a rectangular, Bartlett, Hamming and Blackman window. The abscissa displays ω/π.

The variance of the estimator ĈSW_XX(e^{jω}) is given by

Var[ĈSW_XX(e^{jω})] = E[ĈSW_XX(e^{jω})²] − (E[ĈSW_XX(e^{jω})])²
= (1/A²) Σ_{k=−m}^{m} Σ_{l=−m}^{m} W_k W_l E[I^N_XX(e^{j(2π/N)[k(ω)+k]}) I^N_XX(e^{j(2π/N)[k(ω)+l]})] − (1/A²) Σ_{q=−m}^{m} Σ_{µ=−m}^{m} W_q W_µ (E[I^N_XX(e^{j(2π/N)k(ω)})])²
= (1/A²) Σ_{p=−m}^{m} W_p² E[I^N_XX(e^{j(2π/N)[k(ω)+p]})²] + (1/A²) Σ_{k=−m}^{m} Σ_{l=−m, l≠k}^{m} W_k W_l (E[I^N_XX(e^{j(2π/N)k(ω)})])² − (1/A²) Σ_{q=−m}^{m} Σ_{µ=−m}^{m} W_q W_µ (E[I^N_XX(e^{j(2π/N)k(ω)})])²
= (1/A²) Σ_{p=−m}^{m} W_p² Var[I^N_XX(e^{j(2π/N)k(ω)})] .


With

Σ_{k=−m}^{m} W_k² = Σ_{k=−m}^{m} [ W_k − (Σ_{i=−m}^{m} W_i)/(2m + 1) ]² + (Σ_{k=−m}^{m} W_k)²/(2m + 1) ≥ 0   (10.9)

we are able to determine the weighting coefficients W_k giving the lowest variance of ĈSW_XX(e^{jω}). From (10.9) it is easy to see that with

W_k = (Σ_{i=−m}^{m} W_i)/(2m + 1) = A/(2m + 1),  k = 0, ±1, . . . , ±m,

we reach the lowest possible variance for the estimator ĈSW_XX(e^{jω}). Because this choice is nothing other than the smoothed periodogram (see Section 10.5), we see that ĈS_XX(e^{jω}) has the largest bias for a fixed N but the lowest variance within the general class of spectrum estimators covered in this section.

10.7 The Log-Spectrum

Instead of estimating the spectrum directly, it is possible to estimate 10 log CXX(e^{jω}) by 10 log ĈXX(e^{jω}). The log-function can be used as a variance stabilizing technique [Priestley 2001]. We then find

E[10 log ĈXX(e^{jω})] ≈ 10 log CXX(e^{jω})

Var[10 log ĈXX(e^{jω})] ≈ (10 log e)² / C

where C is a constant that controls the variance. If we use Bartlett's estimator, for example, C would be equal to L.

The above equations show that the variance is approximately a constant and does not depend on CXX(e^{jω}).

10.8 Estimation of the Spectrum using the Sample Covariance

The covariance function cXX(n) is defined in Chapter 7 as

cXX(n) = E [(X(k + n) − µX)(X(k) − µX)∗] .

Given X(0), . . . , X(N − 1) we can estimate cXX(k) by

ĉXX(n) = (1/N) Σ_{k=0}^{N−1−|n|} [X(k + |n|) − X̄] [X(k) − X̄]*  for 0 ≤ n ≤ N − 1, and 0 otherwise,   (10.10)

where X̄ = (1/N) Σ_{k=0}^{N−1} X(k) is an estimate of the mean µX = E[X(k)]. As can be seen from (10.10), the estimator of the covariance function ĉXX(n) is only defined for n ≥ 0. Estimates for n < 0 are obtained via the symmetry property of the covariance function,

ĉXX(n) = ĉXX(−n)* .


Equation (10.10) is a computationally efficient implementation of ĉXX(n) = (1/N) Σ_{k=−∞}^{∞} (X(k + n) − X̄)(X(k) − X̄)* when X(n) is taken to be zero outside the interval n = 0, . . . , N − 1. It can be proven that ĉXX(n) is an asymptotically consistent estimator of the covariance function cXX(n).
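A Python sketch of the biased sample covariance (10.10) (an added illustration; a direct implementation rather than an FFT-based one):

```python
import numpy as np

def sample_covariance(x):
    """Biased sample covariance c_hat(n), n = 0, ..., N-1, as in (10.10)."""
    N = len(x)
    xc = x - x.mean()                       # subtract the sample mean
    c = np.correlate(xc, xc, mode="full")[N - 1:] / N
    return c                                # c[n] = (1/N) sum_k xc[k+n] xc[k]

x = np.array([1.0, 2.0, 0.5, -1.0, 0.0, 1.5])
c_hat = sample_covariance(x)                # c_hat[0] is the sample variance
```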

Motivated by the fact that the spectrum CXX(e^{jω}) is the Fourier transform of the covariance function cXX(n), it is intuitive to suggest as an estimator of CXX(e^{jω})

ĈXX(e^{jω}) = Σ_{n=−∞}^{∞} ĉXX(n) e^{−jωn}
= (1/N) Σ_{n=−(N−1)}^{N−1} Σ_{l=−∞}^{∞} [X(l + n) − X̄] [X(l) − X̄]* · e^{−jωn}
= (1/N) Σ_{n=−(N−1)}^{N−1} Σ_{l=−∞}^{∞} [X(n − l) − X̄] [X(−l) − X̄]* · e^{−jωn}

which is equivalent to

ĈXX(e^{jω}) = I^N_{X−X̄,X−X̄}(e^{jω})   (10.11)

and should be shown by the reader as an exercise.

The derivation in (10.11) shows that taking the Fourier transform of the sample covariance function is equivalent to computing the periodogram as in (10.1). This is therefore not a consistent method for estimating the spectrum.

10.8.1 Blackman-Tukey Method

In 1958 Blackman and Tukey developed a spectral density estimator, here referred to as the Blackman-Tukey method, which is based on the sample covariance function. The basic idea of the method is to estimate the sample covariance function ĉXX(k) and window it with a function w_{2M−1}(k) of the following form:

w_{2M−1}(k) = w_{2M−1}(−k) for |k| ≤ M − 1 with M ≪ N, and w_{2M−1}(k) = 0 otherwise.

This condition ensures that the Fourier transform of the window is real-valued and that the estimator ĈBT_XX(e^{jω}) will not be complex. We then get

XX(ejω) will not be complex. We then get

CBTXX(ejω) =

M−1∑

n=−(M−1)

cXX(n) · w2M−1(n) · e−jωn .

Using (10.11) this is equivalent to

CBTXX(ejω) =

π∫

−π

INX−X,X−X(ejλ) · W2M−1

(

ej(ω−λ)) d λ

2π(10.12)

which is the convolution of the Periodogram INX−X,X−X

(ejω) with the Fourier transform of the

window w2M−1(n). From (10.12) we get another constraint for the window

W2M−1(ejω) ≥ 0 ∀ω ,


to ensure that the estimate ĈBT_XX(e^{jω}) is non-negative for all ω.

The mean of the estimator ĈBT_XX(e^{jω}) can be easily calculated as

E[ĈBT_XX(e^{jω})] = E[I^N_{X−X̄,X−X̄}(e^{jω})] ⊛ W_{2M−1}(e^{jω})

which, according to Section 10.2.1, is given by

E[ĈBT_XX(e^{jω})] = (1/N) |ΔN(e^{jω})|² ⊛ W_{2M−1}(e^{jω}) ⊛ CXX(e^{jω})
= (1/2π) ∫_{−π}^{π} CXX(e^{jθ}) (1/2π) ∫_{−∞}^{∞} (1/N) |ΔN(e^{jϑ})|² · W_{2M−1}(e^{j(ω−θ−ϑ)}) dϑ dθ ,

where ⊛ denotes periodic convolution. Since lim_{N→∞} (1/N)|ΔN(e^{jω})|² = 2π Σ_k δ(ω + 2πk), we find

lim_{N→∞} (1/2π) ∫_{−∞}^{∞} (1/N) |ΔN(e^{jϑ})|² · W_{2M−1}(e^{j(ω−θ−ϑ)}) dϑ = W_{2M−1}(e^{j(ω−θ)})

which reduces the mean of the estimator to a simple convolution

E[ĈBT_XX(e^{jω})] = (1/2π) ∫_{−π}^{π} CXX(e^{jθ}) · W_{2M−1}(e^{j(ω−θ)}) dθ .   (10.13)

Therefore the Blackman-Tukey estimator is unbiased for M → ∞.

The variance of the estimator follows from Eq. (10.12) as

Var[ĈBT_XX(e^{jω})] ≈ CXX(e^{jω})² · (1/N) (1/2π) ∫_{−π}^{π} W_{2M−1}(e^{jω})² dω
≈ CXX(e^{jω})² · (1/N) Σ_{n=−(M−1)}^{M−1} w_{2M−1}(n)² .

Based on this approximation, the variance of the Blackman-Tukey method tends to zero as N → ∞ if the window function w_{2M−1}(n) has finite energy.
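A Python sketch of the Blackman-Tukey estimator (an added illustration; the triangular (Bartlett) lag window and the cosine-series evaluation are assumptions chosen so that W_{2M−1}(e^{jω}) ≥ 0 holds):

```python
import numpy as np

def blackman_tukey(x, M, n_freq=512):
    """C_BT(e^{jw}): Fourier transform of the windowed sample covariance."""
    N = len(x)
    xc = x - x.mean()
    c = np.correlate(xc, xc, mode="full")[N - 1:] / N   # c_hat(0), ..., c_hat(N-1)
    lags = np.arange(M)
    w = 1.0 - lags / M                                  # triangular lag window
    omega = 2 * np.pi * np.arange(n_freq) / n_freq
    # c_hat(-n) = c_hat(n)* for real data -> cosine series
    C = c[0] * w[0] + 2 * np.sum(
        (w[1:] * c[1:M])[None, :] * np.cos(np.outer(omega, lags[1:])), axis=1)
    return omega, C
```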

10.9 Cross-Spectrum Estimation

If X(n) and Y(n) are jointly stationary and real, the cross-covariance function is defined as

c_YX(n) = E[(Y(k + n) − E[Y(k + n)])(X(k) − E[X(k)])] = E[Y(k + n) X(k)] − E[Y(n)] E[X(n)] .

The cross-spectrum is defined by

C_YX(e^{jω}) = Σ_{n=−∞}^{∞} c_YX(n) e^{−jωn} = C_XY(e^{jω})* = C_XY(e^{−jω}) .


Given X(0), . . . , X(N − 1) and Y(0), . . . , Y(N − 1), the objective is to estimate the cross-spectrum C_YX(e^{jω}) for −π < ω < π. The finite Fourier transforms are given by

YN(e^{jω}) = Σ_{n=0}^{N−1} Y(n) e^{−jωn}

XN(e^{jω}) = Σ_{n=0}^{N−1} X(n) e^{−jωn}

XN(e^{jω}) and YN(e^{jω}) are, for large N, normally distributed random variables as discussed above. At distinct frequencies 0 < ω₁ < ω₂ < . . . < ω_N < π the random variables XN(e^{jω_i}) and YN(e^{jω_i}) are independent.

Considering L data stretches of the series X(0), . . . , X(N − 1) and Y(0), . . . , Y(N − 1), so that ML = N,

Y_M(e^{jω}, l) = Σ_{m=0}^{M−1} Y([l − 1]M + m) e^{−jωm},  l = 1, . . . , L, and

X_M(e^{jω}, l) = Σ_{m=0}^{M−1} X([l − 1]M + m) e^{−jωm},  l = 1, . . . , L,

are for large M stochastically independent for 0 < ω ≤ π.

Defining V(e^{jω}) = (ℜXN(e^{jω}), ℜYN(e^{jω}), ℑXN(e^{jω}), ℑYN(e^{jω}))^T, we have

E[V(e^{jω}) V(e^{jω})^H] = (N/2) ·
[  CXX(e^{jω})     ℜC_YX(e^{jω})    0                ℑC_YX(e^{jω}) ]
[  ℜC_YX(e^{jω})   C_YY(e^{jω})    −ℑC_YX(e^{jω})    0             ]
[  0               ℑC_YX(e^{jω})    CXX(e^{jω})      ℜC_YX(e^{jω}) ]
[ −ℑC_YX(e^{jω})   0                ℜC_YX(e^{jω})    C_YY(e^{jω})  ] .

10.9.1 The Cross-Periodogram

The cross-periodogram is defined as

I^N_YX(e^{jω}) = (1/N) YN(e^{jω}) XN(e^{jω})* .

It can be shown that

E[I^N_YX(e^{jω})] = C_YX(e^{jω}) .

Also, we have

E[YN(e^{jω}) XN(e^{jω})] = 0

and

Var[I^N_YX(e^{jω})] = C_YY(e^{jω}) CXX(e^{jω}) .


Because

|C_YX(e^{jω})|² ≤ C_YY(e^{jω}) CXX(e^{jω}) ,

the variance of I^N_YX(e^{jω}) is never smaller than |C_YX(e^{jω})|², for any N.

To construct estimates of C_YX(e^{jω}) with lower variance, we use the properties given above. We calculate

Ĉ_YX(e^{jω}) = (1/L) Σ_{l=1}^{L} (1/M) Y_M(e^{jω}, l) X_M(e^{jω}, l)*

and we have (for large M)

E[Ĉ_YX(e^{jω})] ≈ C_YX(e^{jω})

Var[Ĉ_YX(e^{jω})] ≈ (1/L) C_YY(e^{jω}) CXX(e^{jω}) .

Also we can smooth the cross-periodogram over neighboring frequencies and get similar results.


Chapter 11

Parametric Spectrum Estimation

11.1 Spectrum Estimation using an Autoregressive Process

Figure 11.1: Linear and time-invariant filtering of white noise: Z(n) → h(n) → X(n).

Consider the system of Figure 11.1 with output X(n) such that

X(n) + Σ_{k=1}^{p} a_k X(n − k) = Z(n)

where E[Z(n)] = 0, E[Z(n)Z(k)] = σ²_Z δ(n − k) and a_p ≠ 0. It is assumed that the filter with unit impulse response h(n) is stable. The process X(n) is then called an autoregressive process of order p, or in short notation AR(p). The recursive filter possesses the transfer function

H(e^{jω}) = 1 / (1 + Σ_{k=1}^{p} a_k e^{−jωk})

Because we have assumed the filter to be stable, the roots z_l of the characteristic equation

z^p + Σ_{k=1}^{p} a_k z^{p−k} = 0

lie in the unit disc, i.e. |z_l| < 1 for all roots of the above equation. The spectrum is given as

CXX(e^{jω}) = |H(e^{jω})|² C_ZZ(e^{jω})

thus,

CXX(e^{jω}) = σ²_Z / |1 + Σ_{k=1}^{p} a_k e^{−jωk}|²

The objective is to estimate CXX(e^{jω}) based on observations of the process X(n) for n = 0, . . . , N − 1.

Multiplying

X(n) + Σ_{k=1}^{p} a_k X(n − k) = Z(n)

by X(n − l), l ≥ 0, and taking expectations leads to

E[X(n) X(n − l)] + Σ_{k=1}^{p} a_k E[X(n − k) X(n − l)] = E[Z(n) X(n − l)]

Thus,

cXX(l) + Σ_{k=1}^{p} a_k cXX(l − k) = E[Z(n) X(n − l)]

Assuming the recursive filter to be causal, i.e. h(n) = 0 for n < 0, leads to the relationship

X(n) = Σ_{k=0}^{∞} h(k) Z(n − k) = h(0) Z(n) + Σ_{k=1}^{∞} h(k) Z(n − k)

Thus, shifting X(n) by l gives

X(n − l) = h(0) Z(n − l) + Σ_{k=1}^{∞} h(k) Z(n − l − k)

Taking the expected value of the product Z(n) X(n − l) leads to

E[Z(n) X(n − l)] = h(0) c_ZZ(l) + Σ_{k=1}^{∞} h(k) c_ZZ(l + k) ,

where the sum vanishes for l ≥ 0, so that

E[Z(n) X(n − l)] = h(0) c_ZZ(l) = h(0) σ²_Z δ(l)

Assuming h(0) = 1 leads to the so-called Yule-Walker equations

cXX(l) + Σ_{k=1}^{p} a_k cXX(l − k) = σ²_Z for l = 0, and 0 for l = 1, 2, . . .

The difference equation in cXX(l) has a form similar to the difference equation describing the system in X(n).

Having observed X(n), one can estimate cXX(n). Using ĉXX(0), . . . , ĉXX(p), one obtains a solution for â₁, . . . , â_p. The solution is unique if the inverse of the matrix with the elements ĉXX(l − k) exists, which is guaranteed if the filter is stable. Given estimates â₁, . . . , â_p of a₁, . . . , a_p, one can use the Yule-Walker equation for l = 0 to get σ̂²_Z. Then it is straightforward to find an estimate of CXX(e^{jω}) given by

ĈXX(e^{jω}) = σ̂²_Z / |1 + Σ_{k=1}^{p} â_k e^{−jωk}|²


11.2 Spectrum Estimation using a Moving Average Process

Consider a transversal filter with impulse response h(n), input Z(n) and output X(n) ∈ R as shown in Fig. 11.2:

X(n) = Σ_{k=0}^{q} h(k) Z(n − k) = Z(n) + Σ_{k=1}^{q} h(k) Z(n − k) .

If Z(n) is a white noise process, i.e., E[Z(n)] = 0 and c_ZZ(k) = σ²_Z δ(k), then X(n) is a moving average (MA) process of order q, MA(q).

Figure 11.2: Filter model for a moving average process (tapped delay line with coefficients h(1), . . . , h(q)).

The spectrum of X(n) can be calculated as

CXX(e^{jω}) = |H(e^{jω})|² σ²_Z .

The covariance function of X(n) is

cXX(n) = E[X(n + k) X(k)]
= E[(Σ_{r=0}^{q} h(r) Z(n + k − r))(Σ_{m=0}^{q} h(m) Z(k − m))]
= Σ_{r=0}^{q} Σ_{m=0}^{q} h(r) h(m) σ²_Z δ(n + m − r)

cXX(n) = σ²_Z Σ_{m=0}^{q−|n|} h(m) h(m + |n|) for 0 ≤ |n| ≤ q, and 0 else.

If we have cXX(0), . . . , cXX(q), we would need to solve non-linear equations in h(0), . . . , h(q) and σ²_Z, which makes it impossible to find a unique solution without further constraints.

Let b(m) = σ_Z h(m) for m = 0, . . . , q; then

cXX(n) = Σ_{m=0}^{q−|n|} b(m) b(m + |n|) for 0 ≤ |n| ≤ q, and 0 else.


Let ĉXX(0), . . . , ĉXX(q) be available (consistent) estimates of cXX(0), . . . , cXX(q). Then

ĉXX(n) = Σ_{m=0}^{q−n} b̂(m) b̂(m + n),  n = 0, . . . , q .

The estimates b̂(0), . . . , b̂(q) are consistent because of the consistency of ĉXX(k), k = 0, . . . , q. Then ĥ(1) = b̂(1)/b̂(0), . . . , ĥ(q) = b̂(q)/b̂(0), and σ̂²_Z = b̂(0)².

The implementation of the solution is as follows. With

B(z) = Σ_{n=0}^{q} b(n) z^{−n}

ĈXX(z) = Σ_{n=−q}^{q} ĉXX(n) z^{−n} = B(z) B(1/z)

1. Calculate P(z) = z^q ĈXX(z).

2. Assign all roots z_i of P(z) inside the unit circle to B(z), i.e. B(z) = b(0) Π_{i=1}^{q} (1 − z_i z^{−1}) with b(0) > 0 and |z_i| ≤ 1.

3. Calculate ĥ(m) = b̂(m)/b̂(0) for m = 1, . . . , q.

As an example,

b̂(1)/b̂(0) = ĥ(1) = − Σ_{i=1}^{q} z_i .

To find σ̂²_Z = b̂(0)² we use

ĉXX(n) = Σ_{m=0}^{q−n} b̂(m) b̂(m + n) ,  n = 0, 1, . . . , q

and find for n = 0

b̂(0)² = ĉXX(0) / (1 + Σ_{m=1}^{q} ĥ(m)²)

as an estimate of the power σ²_Z of Z(n).

An alternative estimate of the spectrum CXX(e^{jω}) of X(n) is

ĈXX(e^{jω}) = Σ_{n=−q}^{q} ĉXX(n) e^{−jωn} = ĉXX(0) + 2 Σ_{n=1}^{q} ĉXX(n) cos(ωn) .   (11.1)

Example 11.2.1 Consider a receiver input signal X(n) from a wireless transmission. To find the moving-average estimate of the spectrum CXX(e^{jω}), we first calculate the sample covariance ĉXX(n) and take its z-transform ĈXX(z). By taking P(z) = z^{N−1} ĈXX(z) we find N − 1 roots z_i, i = 1, . . . , N − 1, inside the unit circle and assign them to B(z). The estimator ĈXX(e^{jω}) can then be found according to (11.1).
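A Python sketch of the spectral-factorization steps 1–3 above (an added illustration; it assumes no roots lie exactly on the unit circle):

```python
import numpy as np

def ma_spectral_factorization(c_hat):
    """Recover b(0..q) from c_hat(0..q) via the roots of P(z) = z^q * C(z)."""
    # coefficients of z^{2q} * B(z) B(1/z), ordered by decreasing power of z:
    # [c(q), ..., c(1), c(0), c(1), ..., c(q)] for a real process
    p_coeffs = np.concatenate([c_hat[::-1], c_hat[1:]])
    roots = np.roots(p_coeffs)
    inside = roots[np.abs(roots) < 1]          # the q roots inside the unit circle
    b_monic = np.real(np.poly(inside))         # (1 - z_1 z^-1) ... (1 - z_q z^-1)
    b0 = np.sqrt(c_hat[0] / np.sum(b_monic**2))  # from c(0) = sum_m b(m)^2
    return b0 * b_monic                        # b(0), ..., b(q)
```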


Figure 11.3: True and estimated spectrum, CXX(e^{jω}) and ĈXX(e^{jω}), of an MA process with N = 100 and q = 5. The abscissa displays ω/π.

11.3 Model order selection

A problem arising with the parametric approach is that of estimating the number of coefficients required, i.e. p or q.

Having observed X(0), . . . , X(N − 1), there are several techniques for doing this. Akaike's information criterion (AIC) suggests minimising the following function with respect to m:

AIC(m) = log σ̂²_{Z,m} + m · 2/N ,  m = 1, . . . , M .

The order estimate is the smallest m minimising AIC(m). Another, more accurate, estimate is given by the minimum description length (MDL) criterion, obtained by taking the smallest m minimising

MDL(m) = log σ̂²_{Z,m} + m · (log N)/N ,  m = 1, . . . , M .
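A Python sketch of AIC/MDL order selection (an added illustration; the helper ar_noise_variance is a hypothetical Yule-Walker fit used only to obtain σ̂²_{Z,m} for each candidate order m):

```python
import numpy as np
from scipy.linalg import toeplitz

def ar_noise_variance(x, m):
    """Yule-Walker estimate of the innovation variance for an AR(m) fit."""
    N = len(x)
    xc = x - x.mean()
    c = np.correlate(xc, xc, mode="full")[N - 1:N + m] / N
    a = np.linalg.solve(toeplitz(c[:m]), -c[1:m + 1])
    return c[0] + a @ c[1:m + 1]

def select_order(x, M):
    """Return the AIC- and MDL-selected orders among m = 1, ..., M."""
    N = len(x)
    aic = [np.log(ar_noise_variance(x, m)) + m * 2 / N for m in range(1, M + 1)]
    mdl = [np.log(ar_noise_variance(x, m)) + m * np.log(N) / N
           for m in range(1, M + 1)]
    # argmin returns the first (smallest) minimising m
    return 1 + int(np.argmin(aic)), 1 + int(np.argmin(mdl))
```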

Examples: We generate realisations from the N(0, 1) distribution and filter the data recursively to get 1024 values. The filter has the poles

z_{1,2} = 0.7 e^{±j0.4},  z_{3,4} = 0.75 e^{±j1.3},  z_{5,6} = 0.9 e^{±j1.9} .

It is a recursive filter of order 6, so the process is an AR(6) process.


Figure 11.4: True and estimated AR(6) spectrum. The abscissa displays the frequency ω/2π while the ordinate shows the estimate.

Figure 11.5: Akaike and MDL information criteria for the AR(6) process for orders 1 through 10.


Chapter 12

Discrete Kalman Filter

The Kalman filter has become an indispensable tool in modern control and signal processing. Its simplest use is as an estimation tool. However, it is capable of revealing more than just a standard estimate; it provides a complete statistical description of the degree of knowledge available. This can be highly useful in system analysis. In contrast to many other estimation techniques, the Kalman filter operates in the state space domain. This intuitive structure has given it physical motivation in a variety of applications. In addition, its implementation can be computationally efficient.

As a result, it has found application in a wide variety of fields, especially process control and target tracking, but also many other signal processing areas.

12.1 State Estimation

The purpose of a Kalman filter is to estimate the state of a process. In many (or most) physical applications, this state will consist of multiple random variables. For example, in a remote sensing scenario, the task may be to estimate the position and velocity of a vehicle. In this case, information may be available in the form of measurements such as acoustic signals, radar returns or visual images.

Once an appropriate model of the system is constructed, it may be possible to derive an algorithm for the estimation of the state, based on a finite set of observations.

12.2 Derivation of Kalman Filter Equations

Consider a multi-dimensional system where the time evolution of the (unobservable) process state can be described by

(X₁(n), . . . , X_N(n))^T = A (X₁(n − 1), . . . , X_N(n − 1))^T + B (U₁(n − 1), . . . , U_L(n − 1))^T + (W₁(n − 1), . . . , W_N(n − 1))^T ,

where A = (a_ik) is an N × N matrix and B = (b_il) is an N × L matrix, or in vector notation

X(n) = A X(n − 1) + B U(n − 1) + W(n − 1)   (12.1)

where

• X(n) ∈ R^{N×1} is the process state at time n
• A is the state transition matrix
• U(n) ∈ R^{L×1} is the control input
• B governs the effect of the control input on the state
• W(n) ∈ R^{N×1} is the process noise.

The measurements of the process are determined by

(Z₁(n), . . . , Z_M(n))^T = H (X₁(n), . . . , X_N(n))^T + (V₁(n), . . . , V_M(n))^T ,

with H = (h_ij) an M × N matrix, i.e.

Z(n) = H X(n) + V(n)   (12.2)

where

• Z(n) ∈ R^{M×1} is the measurement
• H maps the process state to a measurement
• V(n) ∈ R^{M×1} is the measurement noise.

The relationships between the system components can be seen in Figure 12.1.

Figure 12.1: Relationship between system state and measurements assumed for the Kalman filter: U(n) enters through B, the state propagates through A and a unit delay, W(n) adds to the state, and H maps the state X(n) to the measurement Z(n), to which V(n) is added.

In addition, we assume that the noise quantities W(n) and V(n) are independent of each other and of U(n), white, and Gaussian distributed:

W(n) ∼ N(0, Q)
V(n) ∼ N(0, R) .

The matrices A, B and H are assumed known from the physics of the system. The control input U(n) is the intervention or "steering" of the process by an outside source (e.g. a human operator). It is assumed that U(n) is known precisely. Therefore, its effect on the process through (12.1) is also known.

Now, in order to derive the Kalman filter equations, we need to consider what is the "best" estimate X̂(n) of the new state X(n) based on the previous state estimate X̂(n − 1), the previous control input U(n − 1) and the current observation Z(n). This estimate of X(n) may be calculated in two steps:


1. Predict forward in time from the previous estimate:

X̂⁻(n) = A X̂(n − 1) + B U(n − 1) .   (12.3)

From (12.1) it can be seen that the new state is a linear function of the previous state with the addition of the (known) control input and (unknown) zero-mean noise. To predict the new state it is reasonable to use the previous state estimate in place of the true state and to ignore the additive noise component, since its expected value is zero. The quantity X̂⁻(n) is referred to as the a priori state estimate, since it is determined from the knowledge available before time instant n. It is a projection of the previous state estimate forward in time.

2. Predict the new measurement value

Ẑ⁻(n) = H X̂⁻(n)   (12.4)

and correct the state estimate based on the residual between predicted and actual measurements:

X̂(n) = X̂⁻(n) + K(n)(Z(n) − Ẑ⁻(n)) .   (12.5)

The predicted measurement, Ẑ⁻(n), is calculated from the a priori state estimate. It can be seen that (12.4) is similar to (12.2) without the zero-mean measurement noise and using the a priori state estimate rather than the true, unknown value. Since the true measurement, Z(n), is available, the measurement residual Z(n) − Ẑ⁻(n) is a measure of how correct the state estimate is – a small error is more likely when the estimate is good, whereas a poor estimate will be more likely to produce a large difference between predicted and actual measurements. The state estimate is now corrected by the weighted residual K(n)(Z(n) − Ẑ⁻(n)).

The N × M matrix K(n) is referred to as the Kalman gain. It controls how the measurement residual affects the state estimate. In order to optimise the choice of K(n), a criterion must be specified. A logical choice would be to minimise the mean squared error between the a posteriori estimate X̂(n) and the true state vector X(n), i.e.

min_{K(n)} E[‖X(n) − X̂(n)‖²] = min_{K(n)} tr P(n)   (12.6)

where

P(n) = E[E(n) E(n)^T]   (12.7)

with

E(n) = X(n) − X̂(n)   (12.8)

is the (state) estimation error, i.e. the difference between the true state vector and our a posteriori state estimate.

To find the K(n) that minimises (12.6), start by writing the a posteriori error in terms of the a priori error, E⁻(n) = X(n) − X̂⁻(n), and its covariance, P⁻(n) = E[E⁻(n) E⁻(n)^T]:

E(n) = X(n) − X̂(n)
= E⁻(n) − (X̂(n) − X̂⁻(n))
= E⁻(n) − K(n)(Z(n) − H X̂⁻(n))
= E⁻(n) − K(n)(H(X(n) − X̂⁻(n)) + V(n))
= E⁻(n) − K(n)(H E⁻(n) + V(n))   (12.9)


Now substitute into (12.7):

P(n) = E[E(n) E(n)^T]
= E[[E⁻(n) − K(n)(H E⁻(n) + V(n))][E⁻(n) − K(n)(H E⁻(n) + V(n))]^T]
= E[E⁻(n) E⁻(n)^T] − E[K(n)(H E⁻(n) + V(n)) E⁻(n)^T]
  − E[E⁻(n)(H E⁻(n) + V(n))^T K(n)^T]
  + E[K(n)(H E⁻(n) + V(n))(H E⁻(n) + V(n))^T K(n)^T]

Differentiate¹ tr P(n) with respect to K(n) and set the result to 0:

0 = −P⁻(n) H^T − P⁻(n) H^T + 2 K(n)(H P⁻(n) H^T + R)
P⁻(n) H^T = K(n)(H P⁻(n) H^T + R)
K(n) = P⁻(n) H^T (H P⁻(n) H^T + R)^{−1} .   (12.10)

It is easy to see that as R → 0, i.e. as the measurement noise becomes less influential, K(n) → H^{−1}. Thus, the residual between actual and predicted measurements is trusted more and is given a heavy weighting in correcting the state estimate through (12.5).

Alternatively, if it is known that the a priori error covariance approaches 0, i.e. P⁻(n) → 0, or if R is large, then K(n) → 0. This means that the residual is given less weight in updating the state estimate – the a priori estimate is given priority.

Recall that

X(n) = A X(n − 1) + B U(n − 1) + W(n − 1)
X̂⁻(n) = A X̂(n − 1) + B U(n − 1)

then

P⁻(n) = E[E⁻(n) E⁻(n)^T]
= E[(X(n) − X̂⁻(n))(X(n) − X̂⁻(n))^T]
= E[(A(X(n − 1) − X̂(n − 1)) + W(n − 1))(A(X(n − 1) − X̂(n − 1)) + W(n − 1))^T]
= A E[(X(n − 1) − X̂(n − 1))(X(n − 1) − X̂(n − 1))^T] A^T + E[W(n − 1) W(n − 1)^T]
= A E[E(n − 1) E(n − 1)^T] A^T + E[W(n − 1) W(n − 1)^T]
= A P(n − 1) A^T + Q   (12.11)

¹The following lemmas are used in the differentiation:

∂ tr(AX)/∂X = A^T,  ∂ tr(A X^T)/∂X = A,  ∂ tr(X A X^T)/∂X = X(A + A^T)


Starting from (12.9),

E(n) = E⁻(n) − K(n)(H E⁻(n) + V(n)) = (I − K(n) H) E⁻(n) − K(n) V(n)

so

E[E(n) E(n)^T] = E[((I − K(n) H) E⁻(n) − K(n) V(n))((I − K(n) H) E⁻(n) − K(n) V(n))^T]

P(n) = E[(I − K(n) H) E⁻(n) E⁻(n)^T (I − K(n) H)^T + K(n) V(n) V(n)^T K(n)^T]
= (I − K(n) H) P⁻(n) (I − K(n) H)^T + K(n) R K(n)^T
= P⁻(n) − K(n) H P⁻(n) − P⁻(n) H^T K(n)^T + K(n) H P⁻(n) H^T K(n)^T + K(n) R K(n)^T
= (I − K(n) H) P⁻(n) − P⁻(n) H^T K(n)^T + K(n)(H P⁻(n) H^T + R) K(n)^T
= (I − K(n) H) P⁻(n) − P⁻(n) H^T K(n)^T + P⁻(n) H^T K(n)^T
= (I − K(n) H) P⁻(n)   (12.12)

Note that we have used the solution for K(n) in (12.10) to simplify the above equation. It is now possible to draw all the equations derived above together to complete the description of the Kalman filter:

1. Predict the next state and covariance using the previous estimate.

(a) X̂⁻(n) = A X̂(n − 1) + B U(n − 1)
(b) P⁻(n) = A P(n − 1) A^T + Q

2. Correct the predictions using the measurement.

(a) K(n) = P⁻(n) H^T (H P⁻(n) H^T + R)^{−1}
(b) X̂(n) = X̂⁻(n) + K(n)(Z(n) − H X̂⁻(n))
(c) P(n) = (I − K(n) H) P⁻(n)

The distinction between the two main steps is quite clear. The prediction is based on the a priori information available, i.e. the estimates from the previous time instant, which were based on all measurements up to, but not including, the current time instant n, while the correction is made using the current measurement Z(n) to yield an a posteriori estimate.

In fact, the suitability of the Kalman filter for computer implementation is, in large part, due to this feature. The procedure is recursive – it is run after each measurement is made, meaning intermediate estimates are always available. This is in contrast to some techniques that must process the entire measurement sequence to finally yield an estimate. It also compresses all the information from previous measurements into the current state estimate and its corresponding covariance matrix estimate – there is no need to retain a record of previous measurements. This can result in significant memory savings. A direct implementation of the recursion is sketched below.
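A Python sketch of the predict/correct recursion in steps 1–2 above (an added illustration; the scalar constant-velocity tracking model used to exercise it is an arbitrary assumption):

```python
import numpy as np

def kalman_step(x_hat, P, z, u, A, B, H, Q, R):
    """One Kalman iteration: steps 1(a)-(b) and 2(a)-(c)."""
    # 1. Predict
    x_pred = A @ x_hat + B @ u                       # a priori state estimate
    P_pred = A @ P @ A.T + Q                         # a priori error covariance
    # 2. Correct
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)              # Kalman gain (12.10)
    x_new = x_pred + K @ (z - H @ x_pred)            # a posteriori estimate
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred    # covariance update (12.12)
    return x_new, P_new

# Example: position/velocity state, position-only measurements, no control
A = np.array([[1.0, 1.0], [0.0, 1.0]]); B = np.zeros((2, 1))
H = np.array([[1.0, 0.0]]); Q = 0.01 * np.eye(2); R = np.array([[1.0]])
x_hat, P = np.zeros(2), np.eye(2)
rng = np.random.default_rng(0)
for n in range(50):
    z = np.array([0.5 * n + rng.standard_normal()])  # noisy position measurement
    x_hat, P = kalman_step(x_hat, P, z, np.zeros(1), A, B, H, Q, R)
```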

12.3 Extended Kalman Filter

The Kalman filter makes very strict assumptions about the system model. In particular, it uses linear stochastic difference equations to describe the state transition and the state-to-measurement mapping. This linearity assumption is violated by many or, by strict interpretation, all real-life applications. In these cases it becomes necessary to use the Extended Kalman Filter (EKF). As the name suggests, this is a generalisation or extension of the standard Kalman filter framework whereby the equations are "linearised" around the estimates in a way analogous to the Taylor series.

Let the state transition now be defined as (cf. (12.1))

X(n) = f(X(n − 1), U(n − 1), W(n − 1))   (12.13)

and the measurement as (cf. (12.2))

Z(n) = h(X(n), V(n)) .   (12.14)

As in the Kalman filter case, the noise processes W(n − 1) and V(n − 1) are unobservable, and only an estimate X̂(n − 1) of the previous state is available. However, the control input U(n − 1) is known. Therefore, estimates of the state and measurement vectors can be found by

X̂⁻(n) = f(X̂(n − 1), U(n − 1), 0)   (12.15)
Ẑ⁻(n) = h(X̂⁻(n), 0) .   (12.16)

This means that (12.13) and (12.14) can be linearised around these initial estimates:

X(n) = f(X(n − 1), U(n − 1), W(n − 1)) ≈ X̂⁻(n) + A(n)(X(n − 1) − X̂(n − 1)) + C(n) W(n − 1)

and

Z(n) ≈ Ẑ⁻(n) + H(n)(X(n) − X̂⁻(n)) + D(n) V(n)

where A(n), C(n), H(n) and D(n) are Jacobian matrices of the partial derivatives of f and h, i.e.

A[i,j] = ∂f[i]/∂x[j] (X̂(n − 1), U(n − 1), 0)
C[i,j] = ∂f[i]/∂w[j] (X̂(n − 1), U(n − 1), 0)
H[i,j] = ∂h[i]/∂x[j] (X̂⁻(n), 0)
D[i,j] = ∂h[i]/∂v[j] (X̂⁻(n), 0) .

Hopefully the interpretation of these equations is clear. The nonlinear state and measurement equations are linearised about an initial, a priori estimate. The four Jacobian matrices describe the rate of change of the nonlinear functions at these estimates.

Due to the non-linearities, the random variables in the EKF cannot be guaranteed to be Gaussian. Further, no general optimality criterion can be associated with the resulting estimate. One way to interpret the EKF is that it is simply an adaptation of the Kalman filter to the non-linear case through the use of a first-order Taylor series expansion about the a priori estimate.

By using the equations above and following the methodology of the Kalman filter derivation, the complete algorithm for implementing an EKF is:


1. Predict the next state and covariance using the previous estimate.

(a) X̂⁻(n) = f(X̂(n − 1), U(n − 1), 0)
(b) P⁻(n) = A(n) P(n − 1) A(n)^T + C(n) Q(n − 1) C(n)^T

2. Correct the predictions using the measurement.

(a) K(n) = P⁻(n) H(n)^T (H(n) P⁻(n) H(n)^T + D(n) R D(n)^T)^{−1}
(b) X̂(n) = X̂⁻(n) + K(n)(Z(n) − h(X̂⁻(n), 0))
(c) P(n) = (I − K(n) H(n)) P⁻(n)


Appendix A

Discrete-Time Fourier Transform

Sequence → Fourier Transform

ax(n) + by(n) → aX(e^{jω}) + bY(e^{jω})
x(n − n_d) (n_d integer) → e^{−jωn_d} X(e^{jω})
e^{−jω₀n} x(n) → X(e^{j(ω+ω₀)})
x(−n) → X(e^{−jω}); equals X(e^{jω})* if x(n) real
nx(n) → j dX(e^{jω})/dω
x(n) ⋆ y(n) → X(e^{jω}) Y(e^{jω})
x(n) y(n) → (1/2π) ∫_{−π}^{π} X(e^{jθ}) Y(e^{j(ω−θ)}) dθ
−jn x(n) → dX(e^{jω})/dω

Symmetry properties:

x(n)* → X(e^{−jω})*
x(n) real → X(e^{jω}) = X(e^{−jω})*
x(n) real → X_R(e^{jω}) and |X(e^{jω})| are even; X_I(e^{jω}) and arg X(e^{jω}) are odd
x(−n)* → X(e^{jω})*
x(n) real and even → X(e^{jω}) real and even
x(n) real and odd → X(e^{jω}) purely imaginary and odd

Parseval's Theorem:

Σ_{n=−∞}^{∞} |x(n)|² = (1/2π) ∫_{−π}^{π} |X(e^{jω})|² dω

Σ_{n=−∞}^{∞} x(n) y(n)* = (1/2π) ∫_{−π}^{π} X(e^{jω}) Y(e^{jω})* dω


Sequence → Fourier Transform

δ(n) → 1
δ(n − n₀) → e^{−jωn₀}
1 (−∞ < n < ∞) → Σ_{k=−∞}^{∞} 2π δ(ω + 2πk)
a^n u(n) (|a| < 1) → 1/(1 − a e^{−jω})
u(n) → 1/(1 − e^{−jω}) + Σ_{k=−∞}^{∞} π δ(ω + 2πk)
(n + 1) a^n u(n) (|a| < 1) → 1/(1 − a e^{−jω})²
r^n [sin(ω_p(n + 1))/sin(ω_p)] u(n) (|r| < 1) → 1/(1 − 2r cos(ω_p) e^{−jω} + r² e^{−j2ω})
sin(ω_c n)/(πn) → X(e^{jω}) = 1 for |ω| < ω_c, and 0 for ω_c < |ω| ≤ π
x(n) = 1 for 0 ≤ n ≤ M, 0 otherwise → [sin(ω(M + 1)/2)/sin(ω/2)] e^{−jωM/2}
e^{jω₀n} → Σ_{k=−∞}^{∞} 2π δ(ω − ω₀ + 2πk)
cos(ω₀n + φ) → π Σ_{k=−∞}^{∞} [e^{jφ} δ(ω − ω₀ + 2πk) + e^{−jφ} δ(ω + ω₀ + 2πk)]