
CHEE825/436 - Module 4

J. McLellan - Fall 2005 1

Process and Disturbance Models

CHEE825/436 - Module 4

J. McLellan - Fall 2005 2

Outline

• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics

CHEE825/436 - Module 4

J. McLellan - Fall 2005 3

The Task of Dynamic Model Building

partitioning process data into a deterministic component (the process) and a stochastic component (the disturbance)

(diagram: the process data are split into a process part, described by a transfer function model, and a disturbance part, described by a time series model)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 4

Process Model Types

• non-parametric
  – impulse response
  – step response
  – spectrum
  (technically "parametric" when in finite form, e.g., FIR)
• parametric
  – transfer function models
    » numerator
    » denominator
  – difference equation models
    » equivalent to transfer function models with the backshift operator

CHEE825/436 - Module 4

J. McLellan - Fall 2005 5

Impulse and Step Process Models

described as a set of weights:

impulse model:   $y(t) = \sum_{i=0}^{N} h(i)\, u(t-i)$

step model:      $y(t) = \sum_{i=0}^{N} s(i)\, \Delta u(t-i)$

Note - typically treat Δu(t-N) as a step from 0, i.e., Δu(t-N) = u(t-N)
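For illustration (not part of the original slides), here is a minimal Python/NumPy sketch of the finite impulse response form above; the weights h(i), gain K, and pole a are hypothetical values chosen only to make the example run.

```python
import numpy as np

def fir_response(h, u):
    """Simulate y(t) = sum_{i=0}^{N} h[i]*u[t-i], treating inputs before t=0 as zero."""
    y = np.zeros(len(u))
    for t in range(len(u)):
        for i in range(len(h)):
            if t - i >= 0:
                y[t] += h[i] * u[t - i]
    return y

# Hypothetical first-order-like impulse weights h(i) = K*(1-a)*a**i
a, K = 0.8, 2.0
h = K * (1 - a) * a ** np.arange(30)
u = np.ones(50)            # unit step input
y = fir_response(h, u)     # y approaches roughly the steady-state gain K as t grows
```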

CHEE825/436 - Module 4

J. McLellan - Fall 2005 6

Process Spectrum Model

represented as a set of frequency response values, or graphically

(plot: process spectrum - amplitude ratio vs. frequency (rad/s))

CHEE825/436 - Module 4

J. McLellan - Fall 2005 7

Process Transfer Function Models

numerator, denominator dynamics and time delay

$G_p(q^{-1}) = \dfrac{B(q^{-1})\, q^{-(f+1)}}{F(q^{-1})}$

  – poles: roots of the denominator F(q^{-1})
  – zeros: roots of the numerator B(q^{-1})
  – time delay: q^{-(f+1)}; f is the pure time delay, and the extra 1-step delay is introduced by the zero-order hold and sampling
  – q^{-1} is the backwards shift operator: q^{-1} y(t) = y(t-1)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 8

Model Types for Disturbances

• non-parametric
  – "impulse response" - infinite moving average
  – spectrum
• parametric
  – "transfer function" form
    » autoregressive (denominator)
    » moving average (numerator)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 9

ARIMA Models for Disturbances

$d(t) = \dfrac{C(q^{-1})}{D(q^{-1})\,(1 - q^{-1})^{d}}\; a(t)$

  – D(q^{-1}): autoregressive component
  – C(q^{-1}): moving average component
  – a(t): random shock

AutoRegressive Integrated Moving Average model. Time series notation - an ARIMA(p,d,q) model has:
• pth-order denominator - AR
• qth-order numerator - MA
• d integrating poles (on the unit circle)
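As an illustrative sketch (not from the slides), an ARIMA-type disturbance can be simulated by filtering white noise through C(q^{-1}) / [D(q^{-1})(1-q^{-1})^d]; the polynomial coefficients below are hypothetical, and scipy.signal.lfilter does the filtering.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 500)      # random shocks a(t)

# Hypothetical ARIMA(1,1,1): C(q^-1) = 1 + 0.4 q^-1, D(q^-1) = 1 - 0.7 q^-1, d = 1
C = [1.0, 0.4]                     # MA (numerator) coefficients
D = [1.0, -0.7]                    # AR (denominator) coefficients
diff = [1.0, -1.0]                 # (1 - q^-1): one integrating pole

den = np.polymul(D, diff)          # D(q^-1) * (1 - q^-1)
d_t = lfilter(C, den, a)           # d(t): a wandering (non-stationary) disturbance
```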

CHEE825/436 - Module 4

J. McLellan - Fall 2005 10

ARMA Models for Disturbances

$d(t) = \dfrac{C(q^{-1})}{D(q^{-1})}\; a(t)$

  – D(q^{-1}): autoregressive component
  – C(q^{-1}): moving average component
  – a(t): random shock

Simply an ARIMA model with no integrating component.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 11

Typical Model Combinations

• model predictive control
  – impulse/step process model + ARMA disturbance model
    » typically a step disturbance model, which can be considered as a pure integrator driven by a single pulse
• single-loop control
  – transfer function process model + ARMA disturbance model

CHEE825/436 - Module 4

J. McLellan - Fall 2005 12

Classification of Models in Identification

• AutoRegressive with eXogenous inputs (ARX)
• Output Error (OE)
• AutoRegressive Moving Average with eXogenous inputs (ARMAX)
• Box-Jenkins (BJ)
(per Ljung's terminology)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 13

ARX Models

– u(t) is the exogenous input
– same autoregressive component for process and disturbance
– numerator term for the process, no moving average in the disturbance
– physical interpretation - the disturbance passes through the entire process dynamics
  » e.g., a feed disturbance

$A(q^{-1})\, y(t) = B(q^{-1})\, q^{-(f+1)}\, u(t) + a(t)$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 14

Output Error Models

– no disturbance dynamics
– numerator and denominator process dynamics
– physical interpretation - process subject to a white noise disturbance (is this ever true?)

$y(t) = \dfrac{B(q^{-1})}{A(q^{-1})}\, q^{-(f+1)}\, u(t) + a(t)$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 15

ARMAX Models

– process and disturbance have the same denominator dynamics
– disturbance has moving average dynamics
– physical interpretation - a disturbance passing through the process which enters at a point away from the input
  » except if C(q^{-1}) = B(q^{-1})

$A(q^{-1})\, y(t) = B(q^{-1})\, q^{-(f+1)}\, u(t) + C(q^{-1})\, a(t)$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 16

Box-Jenkins Model

– autoregressive component plus input; the disturbance can have different dynamics
– the AR component A(q^{-1}) represents dynamic elements common to both process and disturbance
– physical interpretation - the disturbance passes through other dynamic elements before entering the process

$A(q^{-1})\, y(t) = \dfrac{B(q^{-1})}{F(q^{-1})}\, q^{-(f+1)}\, u(t) + \dfrac{C(q^{-1})}{D(q^{-1})}\, a(t)$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 17

Range of Model Types

Output Error → ARX → ARMAX → Box-Jenkins
(least general → most general)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 18

Outline

• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics

CHEE825/436 - Module 4

J. McLellan - Fall 2005 19

Model Estimation - General Philosophy

Form a “loss function” which is to be minimized to obtain the “best” parameter estimates

Loss function
  » "loss" can be considered as missed trend or information
  » e.g., linear regression:
    • loss represents left-over trends in the residuals that could still be explained by a model
    • if we picked up all the trend, only the random noise e(t) would be left
    • additional trends drive up the variation of the residuals
    • the loss function is the sum of squares of the residuals (related to the variance of the residuals)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 20

Linear Regression - Types of Loss Functions

First, consider the linear regression model:

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + e, \qquad e \sim N(0, \sigma^2)$

Least Squares estimation criterion:

$\min_{\{\beta_0, \beta_1, \ldots, \beta_p\}} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \;=\; \min_{\{\beta_0, \ldots, \beta_p\}} \sum_{i=1}^{n} \big(y_i - \{\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi}\}\big)^2 \;=\; \min_{\{\beta_0, \ldots, \beta_p\}} \sum_{i=1}^{n} e_i^2$

where $(y_i - \hat{y}_i)^2$ is the squared prediction error at point "i".
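A minimal numerical sketch (not from the slides) of solving this least squares criterion with Python/NumPy; the data and "true" coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2
x = rng.normal(size=(n, p))
beta_true = np.array([1.0, 2.0, -0.5])           # [beta0, beta1, beta2] (hypothetical)
y = beta_true[0] + x @ beta_true[1:] + rng.normal(0.0, 0.3, n)

X = np.column_stack([np.ones(n), x])             # design matrix with intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # minimizes the sum of squared prediction errors
residuals = y - X @ beta_hat
loss = np.sum(residuals ** 2)                    # the least squares loss function
```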

CHEE825/436 - Module 4

J. McLellan - Fall 2005 21

Linear Regression - Types of Loss Functions

The model describes how the mean of Y varies:

and the variance of Y is because the random component in Y comes from the additive noise “e”. The probability density function at point “i” is

where ei is the noise at point “i”

E Y x x xp p{ } = + + + +β β β β0 1 1 2 2 L

σ 2

fy x x

e

Yi p p

i

ii i=

− − + + +⎛

⎜⎜

⎟⎟

= −⎛

⎝⎜⎜

⎠⎟⎟

12 2

12 2

0 1 12

2

2

2

πσβ β β

σ

πσ σ

exp{ ( )}

exp{ }

L

CHEE825/436 - Module 4

J. McLellan - Fall 2005 22

Linear Regression - Types of Loss Functions

We can write the joint probability density function for all observations in the data set:

$f_{Y_1 \cdots Y_n} = \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left\{ -\dfrac{\sum_{i=1}^{n}\big(y_i - (\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi})\big)^2}{2\sigma^2} \right\} = \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left\{ -\dfrac{\sum_{i=1}^{n} e_i^2}{2\sigma^2} \right\}$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 23

Linear Regression - Types of Loss Functions

Given the parameters, we can use $f_{Y_1 \cdots Y_n}$ to determine the probability that a given range of observations will occur.

What if we have observations but don't know the parameters?
  » assume that we have the most common, or "likely", observations - i.e., observations that have the greatest probability of occurrence
  » find the parameter values that maximize the probability of the observed values occurring
  » the joint density function becomes a "likelihood function"
  » the parameter estimates are "maximum likelihood estimates"

CHEE825/436 - Module 4

J. McLellan - Fall 2005 24

Linear Regression - Types of Loss Functions

Maximum Likelihood Parameter Estimation Criterion -

$\max_{\{\beta_0, \beta_1, \ldots, \beta_p\}} L(\beta) \;=\; \max_{\{\beta_0, \ldots, \beta_p\}} \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left\{ -\dfrac{\sum_{i=1}^{n}\big(y_i - (\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi})\big)^2}{2\sigma^2} \right\}$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 25

Linear Regression - Types of Loss Functions

Given the form of the likelihood function, maximizing it is equivalent to minimizing the sum of squares in the argument of the exponential, i.e.,

$\min_{\{\beta_0, \ldots, \beta_p\}} \sum_{i=1}^{n} \big(y_i - \{\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi}\}\big)^2 \;=\; \min_{\{\beta_0, \ldots, \beta_p\}} \sum_{i=1}^{n} e_i^2$

For the linear regression case, the maximum likelihood parameter estimates are equivalent to the least squares parameter estimates.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 26

Linear Regression - Types of Loss Functions

Least Squares Estimation
  » loss function is the sum of squared residuals = sum of squared prediction errors

Maximum Likelihood
  » loss function is the likelihood function, which in the linear regression case is equivalent to the sum of squared prediction errors

Prediction Error = observation - predicted value:

$y_i - \hat{y}_i = y_i - \{\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi}\}$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 27

Loss Functions for Identification

Least Squares

“minimize the sum of squared prediction errors”

The loss function is

$\sum_{t=1}^{N} \big(y(t) - \hat{y}(t)\big)^2$

where N is the number of points in the data record.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 28

Least Squares Identification Example

Given an ARX(1) process + disturbance model:

$y(t) = a_1\, y(t-1) + b_1\, u(t-1) + e(t)$

the loss function can be written as

$\sum_{t=2}^{N} \big(y(t) - \hat{y}(t)\big)^2 = \sum_{t=2}^{N} \big(y(t) - \{a_1\, y(t-1) + b_1\, u(t-1)\}\big)^2$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 29

Least Squares Identification Example

In matrix form,

$e = \begin{bmatrix} y(2) \\ y(3) \\ \vdots \\ y(N) \end{bmatrix} - \begin{bmatrix} y(1) & u(1) \\ y(2) & u(2) \\ \vdots & \vdots \\ y(N-1) & u(N-1) \end{bmatrix} \begin{bmatrix} a_1 \\ b_1 \end{bmatrix}$

and the sum of squared prediction errors is $e^{T} e$.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 30

Least Squares Identification Example

The least squares parameter estimates are:

$\begin{bmatrix} \hat{a}_1 \\ \hat{b}_1 \end{bmatrix} = \big(\Phi^{T}\Phi\big)^{-1}\,\Phi^{T} \begin{bmatrix} y(2) \\ y(3) \\ \vdots \\ y(N) \end{bmatrix}$

where $\Phi$ is the matrix of lagged outputs and inputs from the previous slide.

Note that the disturbance structure in the ARX model is such that the disturbance contribution appears in the formulation as a white noise additive error --> it satisfies the assumptions for this formulation.
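A minimal sketch (not from the slides) of the same calculation in Python/NumPy for the ARX(1) example, assuming recorded sequences y and u; the simulated "true" values a1 = 0.7 and b1 = 1.5 are hypothetical.

```python
import numpy as np

def arx1_least_squares(y, u):
    """Estimate a1, b1 in y(t) = a1*y(t-1) + b1*u(t-1) + e(t) by least squares."""
    Y = y[1:]                                        # left-hand side: y(2)..y(N)
    Phi = np.column_stack([y[:-1], u[:-1]])          # regressors: [y(t-1), u(t-1)]
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # same as (Phi'Phi)^-1 Phi' Y
    return theta                                     # [a1_hat, b1_hat]

# Simulated example with hypothetical true values a1 = 0.7, b1 = 1.5
rng = np.random.default_rng(2)
N = 500
u = rng.normal(size=N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.7 * y[t-1] + 1.5 * u[t-1] + rng.normal(0.0, 0.1)

print(arx1_least_squares(y, u))   # should be close to [0.7, 1.5]
```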

CHEE825/436 - Module 4

J. McLellan - Fall 2005 31

Least Squares Identification

• ARX models fit into this framework
• Output Error models:

$y(t) = \dfrac{B(q^{-1})}{A(q^{-1})}\, q^{-(f+1)}\, u(t) + e(t)$

or in difference equation form:

$A(q^{-1})\, y(t) = B(q^{-1})\, q^{-(f+1)}\, u(t) + A(q^{-1})\, e(t)$

or

$y(t) = a_1\, y(t-1) + \cdots + a_p\, y(t-p) + B(q^{-1})\, q^{-(f+1)}\, u(t) + A(q^{-1})\, e(t)$

The correlated error term $A(q^{-1})\, e(t)$ violates the least squares assumption of independent errors.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 32

Least Squares Identification

Any process+disturbance model other than the ARX model will not satisfy the structural requirements.

Implications?
  » estimators are not consistent - they don't asymptotically tend to the true values of the parameters
  » potential for bias

CHEE825/436 - Module 4

J. McLellan - Fall 2005 33

Prediction Error Methods

Choose parameter estimates to minimize some function of the prediction errors.

For example, for the Output Error model, the prediction error is

$\varepsilon(t) = y(t) - \dfrac{B(q^{-1})}{A(q^{-1})}\, q^{-(f+1)}\, u(t)$

(the second term is the prediction).

Use a numerical optimization routine to obtain the "best" estimates.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 34

Prediction Error Methods

ARX(1) example:

$y(t) = a_1\, y(t-1) + b_1\, u(t-1) + e(t)$

Use the model to predict one step ahead given past values:

$\hat{y}(t) = a_1\, y(t-1) + b_1\, u(t-1)$        ("one step ahead predictor")

This is an optimal predictor when e(t) is normally distributed, and can be obtained by taking the "conditional expectation" of y(t) given information up to and including time t-1. e(t) disappears because it has zero mean and adds no information on average.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 35

Prediction Error Methods

Prediction error for the one step ahead predictor:

$\varepsilon(t) = y(t) - \hat{y}(t) = y(t) - \{a_1\, y(t-1) + b_1\, u(t-1)\}$

We could obtain parameter estimates that minimize the sum of squared prediction errors:

$\sum_{t=2}^{N} \varepsilon(t)^2 = \sum_{t=2}^{N} \big(y(t) - \hat{y}(t)\big)^2$

- same as the Least Squares estimates for this ARX example

CHEE825/436 - Module 4

J. McLellan - Fall 2005 36

Prediction Error Methods

What happens if we have an ARMAX(1,1) model?

$y(t) = a_1\, y(t-1) + b_1\, u(t-1) + e(t) + c_1\, e(t-1)$

The one step ahead predictor is:

$\hat{y}(t) = a_1\, y(t-1) + b_1\, u(t-1) + c_1\, e(t-1)$

But what is e(t-1)?
  » estimate it using the measured y(t-1) and the prediction of y(t-1):

$\hat{e}(t-1) = y(t-1) - \hat{y}(t-1) = y(t-1) - \{a_1\, y(t-2) + b_1\, u(t-2) + c_1\, e(t-2)\}$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 37

Prediction Error Methods

Note that the estimate of e(t-1) depends on e(t-2), which depends on e(t-3), and so forth
  » eventually end up with a dependence on e(0), which is typically assumed to be zero
  » "conditional" estimates - conditional on the assumed initial values
  » can also formulate in a way that avoids conditional estimates
  » the impact is typically negligible for large data sets
    • during computation, it isn't necessary to solve recursively all the way back to the initial condition
  » use the previous prediction to estimate the previous prediction error:

$\hat{e}(t-1) = y(t-1) - \hat{y}(t-1)$
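A minimal sketch (not from the slides) of this recursion: for candidate ARMAX(1,1) parameter values, the prediction errors are built up sequentially with e(0) assumed zero, and a numerical optimizer (here scipy.optimize.minimize, one possible choice) adjusts the parameters to minimize the sum of squared prediction errors.

```python
import numpy as np
from scipy.optimize import minimize

def armax11_prediction_errors(theta, y, u):
    """Prediction errors for y(t) = a1*y(t-1) + b1*u(t-1) + e(t) + c1*e(t-1), with e(0) = 0 assumed."""
    a1, b1, c1 = theta
    e = np.zeros(len(y))
    for t in range(1, len(y)):
        y_hat = a1 * y[t-1] + b1 * u[t-1] + c1 * e[t-1]   # one-step-ahead prediction
        e[t] = y[t] - y_hat                               # prediction error, reused at t+1
    return e

def loss(theta, y, u):
    e = armax11_prediction_errors(theta, y, u)
    return np.sum(e[1:] ** 2)                             # sum of squared prediction errors

# Hypothetical usage with recorded data arrays y, u:
# result = minimize(loss, x0=[0.5, 1.0, 0.0], args=(y, u), method="Nelder-Mead")
# a1_hat, b1_hat, c1_hat = result.x
```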

CHEE825/436 - Module 4

J. McLellan - Fall 2005 38

Prediction Error Methods

Formulation for the general case - given a process plus disturbance model:

$y(t) = G(q^{-1})\, u(t) + H(q^{-1})\, e(t)$

we can write

$y(t) = G(q^{-1})\, u(t) + \big(H(q^{-1}) - 1\big)\, e(t) + e(t)$

so that the prediction is:

$\hat{y}(t) = G(q^{-1})\, u(t) + \big(H(q^{-1}) - 1\big)\, e(t)$

The random shocks are estimated as

$e(t) = H^{-1}(q^{-1})\,\big\{ y(t) - G(q^{-1})\, u(t) \big\}$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 39

Prediction Error Methods

Putting these expressions together yields

$\hat{y}(t) = H^{-1}(q^{-1})\,\big\{ \big(H(q^{-1}) - 1\big)\, y(t) + G(q^{-1})\, u(t) \big\}$

which is of the form

$\hat{y}(t) = L_1(q^{-1}, \theta)\, y(t) + L_2(q^{-1}, \theta)\, u(t)$

The prediction error for use in the estimation loss function is

$\varepsilon(t, \theta) = y(t) - \hat{y}(t) = y(t) - \big\{ L_1(q^{-1}, \theta)\, y(t) + L_2(q^{-1}, \theta)\, u(t) \big\}$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 40

Prediction Error Methods

How does this look for a general ARMAX model?

$A(q^{-1})\, y(t) = B(q^{-1})\, u(t) + C(q^{-1})\, e(t)$

Getting ready for the prediction,

$y(t) = \big(1 - A(q^{-1})\big)\, y(t) + B(q^{-1})\, u(t) + \big(C(q^{-1}) - 1\big)\, e(t) + e(t)$

we obtain

$\hat{y}(t) = \big(1 - A(q^{-1})\big)\, y(t) + B(q^{-1})\, u(t) + \big(C(q^{-1}) - 1\big)\, e(t)$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 41

Prediction Error Methods

Note that the ability to estimate the random shocks depends on the ability to invert C(q^{-1})
  » invertibility was discussed for moving average disturbances
  » ability to express the shocks in terms of present and past outputs - convert to an infinite autoregressive sum

Note that the moving average parameters appear in the denominator of the prediction
  » the model is nonlinear in the moving average parameters, and conditionally linear in the others

CHEE825/436 - Module 4

J. McLellan - Fall 2005 42

Likelihood Function Methods

Conditional Likelihood Function
  » assume initial conditions for the outputs and random shocks
  » e.g., for ARX(1), a value for y(0)
  » e.g., for ARMAX(1,1), values for y(0), e(0)

General argument:

$y(t) - G(q^{-1})\, u(t) - \big(H(q^{-1}) - 1\big)\, e(t) = e(t)$

where e(t) is normally distributed, zero mean, with known variance.

  • form the joint distribution of this expression over all times
  • find the parameter values that maximize the likelihood

CHEE825/436 - Module 4

J. McLellan - Fall 2005 43

Likelihood Function Methods

Exact Likelihood Function

Note that we can also form an exact likelihood function which includes the initial conditions
  » the maximum likelihood estimation procedure then estimates the parameters AND the initial conditions
  » the exact likelihood function is more complex

In either case, we use a numerical optimization procedure to solve for the maximum likelihood estimates.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 44

Likelihood Function Methods

Final comment:
  » derivation of the likelihood function requires convergence of the moving average and autoregressive elements
  » moving average --> invertibility
  » autoregressive --> stability

Example - Box-Jenkins model:

$A(q^{-1})\, y(t) = \dfrac{B(q^{-1})}{F(q^{-1})}\, u(t) + \dfrac{C(q^{-1})}{D(q^{-1})}\, e(t)$

can be re-arranged to yield the random shock:

$e(t) = \dfrac{D(q^{-1})}{C(q^{-1})}\,\Big\{ A(q^{-1})\, y(t) - \dfrac{B(q^{-1})}{F(q^{-1})}\, u(t) \Big\}$

(the factor D(q^{-1})/C(q^{-1}) contains the inverted MA component 1/C(q^{-1}); the factor 1/F(q^{-1}) is an inverted AR-type component - hence the invertibility and stability requirements above)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 45

Outline

• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics

CHEE825/436 - Module 4

J. McLellan - Fall 2005 46

Model-Building Strategy

• graphical pre-screening
• select initial model structure
• estimate parameters
• examine model diagnostics
• examine structural diagnostics
• validate model using an additional data set

(modify the model and re-estimate as required)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 47

Example - Debutanizer

Objective - fit a transfer function + disturbance model describing changes in bottoms RVP in response to changes in internal reflux

Data
  – step data
  – slow PRBS (switch down, switch up, switch down)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 48

Graphical Pre-Screening

• examine time traces of outputs, inputs, secondary variables
  – are there any outliers or major shifts in operation?
• could there be a model in this data?
• engineering assessment
  – should there be a model in this data?

CHEE825/436 - Module 4

J. McLellan - Fall 2005 49

Selecting Initial Model Structure

• examine auto- and cross-correlations of output and input
  – look for autoregressive, moving average components
• examine the spectrum of the output
  – indication of the order of the process
    » first-order
    » second-order underdamped - resonance
    » second or higher order overdamped

CHEE825/436 - Module 4

J. McLellan - Fall 2005 50

Selecting Initial Model Structure...

• examine the correlation estimate of the impulse or step response
  – available if the input is not a step
  – what order is the process?
    » 1st order, 2nd order over/underdamped
  – size of the time delay

CHEE825/436 - Module 4

J. McLellan - Fall 2005 51

Selecting Initial Model Structure

Time Delays

For low frequency input signal (e.g., few steps or filtered PRBS), examine transient response for delay

For pre-filtered data, examine cross-correlation plots - where is first non-zero cross-correlation?

CHEE825/436 - Module 4

J. McLellan - Fall 2005 52

Debutanizer Example

• step response
  – indicates settling time ~100 min
  – potentially some time delay
  – positive gain
  – 1st order or overdamped higher-order
• correlation estimate of step response
  – indicates time delay of ~4-5 min
  – overdamped higher-order

CHEE825/436 - Module 4

J. McLellan - Fall 2005 53

Debutanizer Example - PRBS Test

(plot: input and output signals from the PRBS test - output #1 and input #1 vs. time)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 54

Debutanizer Example - Step Response Test

(plot: input and output signals from the step test - output #1 and input #1 vs. time)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 55

Debutanizer Example - Correlation Step Response Estimate

(plot: correlation estimate of the step response vs. time)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 56

Debutanizer Example

• process spectrum
  – suggests higher-order
• disturbance spectrum
  – cut-off behaviour suggests an AR type of disturbance
• initial model
  – ARX with delay of 4 or 5
  – ARMAX
  – Box-Jenkins
  – NOT output error - the disturbance isn't white

CHEE825/436 - Module 4

J. McLellan - Fall 2005 57

Debutanizer Example - Process Spectrum Plot

(plot: process frequency response - amplitude and phase (deg) vs. frequency (rad/s))

CHEE825/436 - Module 4

J. McLellan - Fall 2005 58

Debutanizer Example - Disturbance Spectrum

(plot: disturbance power spectrum vs. frequency (rad/s))

CHEE825/436 - Module 4

J. McLellan - Fall 2005 59

Additional Initial Selection Tests

CHEE825/436 - Module 4

J. McLellan - Fall 2005 60

Singularity Test

Form the data vector

$\varphi(t) = \big[\; y(t-1)\;\cdots\; y(t-s)\;\; u(t-1)\;\cdots\; u(t-s) \;\big]$

The covariance matrix for this vector,

$\mathrm{Cov}(\varphi) = \dfrac{1}{N} \sum_{t=1}^{N} \varphi(t)\,\varphi(t)^{T}$

will be singular if s > model order, and non-singular if s ≤ model order.

Notes:

1. The test was developed for a deterministic model - the results are exact for this case.

2. The test is approximate when random shocks enter the process - the results will depend on the signal-to-noise ratio.
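A minimal numerical sketch of this test (not from the slides), assuming recorded arrays y and u; in practice the smallest eigenvalue (or the condition number) of the covariance matrix is monitored as a measure of near-singularity.

```python
import numpy as np

def lag_vector_covariance(y, u, s):
    """Covariance of phi(t) = [y(t-1)..y(t-s), u(t-1)..u(t-s)] over the data record."""
    rows = []
    for t in range(s, len(y)):
        phi = np.array([y[t - i] for i in range(1, s + 1)] +
                       [u[t - i] for i in range(1, s + 1)])
        rows.append(phi)
    Phi = np.array(rows)
    return Phi.T @ Phi / len(rows)

# For increasing s, watch the smallest eigenvalue: it drops toward zero
# (near-singularity) once s exceeds the model order; it is exactly zero only
# in the noise-free, deterministic case.
# for s in range(1, 6):
#     C = lag_vector_covariance(y, u, s)
#     print(s, np.linalg.eigvalsh(C).min())
```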

CHEE825/436 - Module 4

J. McLellan - Fall 2005 61

Pre-Filtering

If the input is not white noise, the cross-correlation does not show the process structure clearly
  » autocorrelation in u(t) complicates the structure

Solution - estimate a time series model for the input, and pre-filter using the inverse of this model
  – pre-filter both the input and the output to ensure consistency

Now estimate the cross-correlations between the filtered input and filtered output
  – look for a sharp cut-off - negligible denominator dynamics
  – gradual decline - denominator dynamics
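A minimal prewhitening sketch (not from the slides), assuming data arrays u and y: fit an AR model to the input, filter both signals with the inverse (AR) filter, and examine the cross-correlation of the filtered signals. The AR order of 4 in the usage comment is an arbitrary choice.

```python
import numpy as np
from scipy.signal import lfilter

def fit_ar(x, p):
    """Least-squares AR(p) fit: x(t) = a1*x(t-1) + ... + ap*x(t-p) + e(t)."""
    Y = x[p:]
    Phi = np.column_stack([x[p - k: len(x) - k] for k in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return a

def prewhiten(a, x):
    """Apply the inverse AR filter 1 - a1*q^-1 - ... - ap*q^-p to a signal."""
    return lfilter(np.concatenate([[1.0], -a]), [1.0], x)

def cross_corr(uf, yf, max_lag=20):
    """Sample cross-correlation r_uy(k) between filtered input and output, k = 0..max_lag."""
    uf = uf - uf.mean()
    yf = yf - yf.mean()
    denom = len(uf) * uf.std() * yf.std()
    return np.array([np.sum(uf[: len(uf) - k] * yf[k:]) / denom for k in range(max_lag + 1)])

# Hypothetical usage with recorded arrays u, y:
# a = fit_ar(u, p=4)                               # time series model of the input
# r = cross_corr(prewhiten(a, u), prewhiten(a, y))
# The first clearly non-zero lag in r suggests the time delay; a sharp cut-off vs.
# gradual decline suggests the denominator structure.
```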

CHEE825/436 - Module 4

J. McLellan - Fall 2005 62

Pre-Filtering

• can also examine the cross-correlation plots for an indication of the time delay
  – first non-zero lag in the cross-correlation function

Note that differencing, which is used to treat non-stationary disturbances, is a form of pre-filtering
  – more on this later...

CHEE825/436 - Module 4

J. McLellan - Fall 2005 63

Outline

• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics

CHEE825/436 - Module 4

J. McLellan - Fall 2005 64

Model Diagnostics

Analyze the residuals:
  – look for unmodelled trends
    » auto-correlation
    » cross-correlation with the inputs
    » spectrum - should be flat
  – assess the size of the residual standard error

Wet towel analogy - wring out all the moisture (information) until there is nothing left.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 65

Unmodelled Trends in Residuals

• autocorrelations
  – should be statistically zero
• cross-correlations
  – between the residuals and the inputs, should be zero for lags greater than the numerator order
    » i.e., at long lags
  – if the cross-correlation between the inputs and past residuals is non-zero, this indicates feedback is present in the data (the inputs depend on past errors)
    » i.e., at negative lags
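A minimal residual-diagnostics sketch (not from the slides), assuming arrays of residuals e and inputs u; the correlations are compared against approximate ±2/√N significance bounds.

```python
import numpy as np

def autocorr(e, max_lag=20):
    """Sample autocorrelation of residuals e at lags 0..max_lag."""
    e = e - e.mean()
    c0 = np.sum(e * e)
    return np.array([np.sum(e[: len(e) - k] * e[k:]) / c0 for k in range(max_lag + 1)])

def cross_corr(u, e, lags=range(-20, 21)):
    """Sample cross-correlation between input u(t) and residual e(t+k) for each lag k."""
    u = u - u.mean()
    e = e - e.mean()
    denom = len(u) * u.std() * e.std()
    out = []
    for k in lags:
        if k >= 0:
            out.append(np.sum(u[: len(u) - k] * e[k:]) / denom)
        else:
            out.append(np.sum(u[-k:] * e[: len(e) + k]) / denom)
    return np.array(out)

# Approximate 95% significance bounds for a record of N points:
# bound = 2.0 / np.sqrt(len(e))
# Values of autocorr(e)[1:] or cross_corr(u, e) outside +/- bound suggest unmodelled
# trends; non-zero values at negative lags suggest feedback in the data.
```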

CHEE825/436 - Module 4

J. McLellan - Fall 2005 66

Debutanizer Example

Consider an ARX(2,2,5) model
  – 2 poles, 1 zero, delay of 5

Autocorrelation plots
  – no systematic trend in the residuals

Cross-correlation plots
  – no systematic relationship between the residuals and the input

CHEE825/436 - Module 4

J. McLellan - Fall 2005 67

Debutanizer Example - Residual Correlation Plots

(plots: autocorrelation of residuals for output 1, and cross-correlation between input 1 and output 1 residuals, lags -20 to 20)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 68

Debutanizer Example - Predicted vs. Response

(plot: measured and simulated model output vs. time)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 69

Detecting Incorrect Time Delays

If the cross-correlation between the residuals and the input is non-zero at small lags, the time delay is possibly too large
  – the additional early transients aren't being modelled because the model assumes nothing is happening yet

CHEE825/436 - Module 4

J. McLellan - Fall 2005 70

Debutanizer Example

Let's choose a delay of 7.

Cross-correlation plot
  – indicates significant cross-correlation between the input and the residuals at positive lags
  – the estimate of the time delay is too large

CHEE825/436 - Module 4

J. McLellan - Fall 2005 71

Model Diagnostics

Quantitative Tests
  – significance of the parameter estimates
  – ratio tests of explained variation

Debutanizer example
  – the parameters are all significant

CHEE825/436 - Module 4

J. McLellan - Fall 2005 72

Debutanizer Example - Parameter Estimates

This matrix was created by the command ARX on 11/16 1996 at 11:36
Loss fcn: 5.805e-006    Akaike's FPE: 6.123e-006    Sampling interval: 1

The polynomial coefficients and their standard deviations are:

B (numerator parameters) = 1.0e-003 * [ 0  0  0  0  0  0.1428  -0.0605 ]
    standard errors:       1.0e-003 * [ 0  0  0  0  0  0.0243   0.0272 ]

A (AR parameters) = [ 1.0000  -1.3924  0.4303 ]
    standard errors:  [ 0       0.0747  0.0697 ]

CHEE825/436 - Module 4

J. McLellan - Fall 2005 73

Model Diagnostics

Cross-Validation

Use model to predict behaviour of a new data set collected under similar circumstances

Reject model if prediction error is large

CHEE825/436 - Module 4

J. McLellan - Fall 2005 74

Debutanizer Example

Use initial step test data as a cross-validation data set.

Prediction errors are small, and trend is predicted quite well

Conclusion - acceptable model

CHEE825/436 - Module 4

J. McLellan - Fall 2005 75

Debutanizer Example - Prediction for Validation Data

(plot: measured and simulated model output vs. time for the validation data)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 76

Debutanizer Example - Residual Correlation Plots for Validation Data

(plots: autocorrelation of residuals for output 1, and cross-correlation between input 1 and output 1 residuals, for the validation data)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 77

Outline

• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics

CHEE825/436 - Module 4

J. McLellan - Fall 2005 78

Initially...

Use the structure selection methods described earlier.

Once you have estimated several candidate models...

CHEE825/436 - Module 4

J. McLellan - Fall 2005 79

Model Structure Diagnostics

Akaike’s Information Criterion (AIC)

– weighted estimation error
  » unexplained variation with a term penalizing excess parameters
  » analogous to adjusted R² for regression
– find the model structure that minimizes the AIC

CHEE825/436 - Module 4

J. McLellan - Fall 2005 80

Akaike’s Information Criterion

Definition

$\mathrm{AIC} = \log\!\big(V_N(\hat{\theta})\big) + \dfrac{2p}{N}$

where N is the number of data points in the sample, $V_N(\hat{\theta})$ is the loss function (related to the prediction error / residual sum of squares), and p is the number of parameters.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 81

Akaike’s Information Criterion

(sketch: AIC vs. number of parameters - the best model is at the minimum)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 82

Akaike’s Final Prediction Error

An attempt to estimate prediction error when model is used to predict new outputs

Goal - choose model that minimizes FPE (balance between number of parameters and explained variation)

$\mathrm{FPE} = \dfrac{1 + p/N}{1 - p/N}\left(\dfrac{\text{residual sum of squares}}{N}\right)$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 83

Minimum Description Length (MDL)

• Another approach - find the "minimum length description" of the data; the measure is based on the loss function plus a penalty for the number of terms
• find the description (model) that minimizes this criterion

$\mathrm{MDL} = \log\!\big(V_N(\hat{\theta})\big) + \dfrac{\dim(\theta)\,\log(N)}{N}$
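A minimal sketch (not from the slides) of comparing candidate structures with these criteria, given each candidate's residual sum of squares; the exact constants differ between texts and software packages, so these forms are illustrative only.

```python
import numpy as np

def selection_criteria(rss, n_params, n_data):
    """Illustrative AIC / FPE / MDL-style criteria for one candidate model."""
    V = rss / n_data                                        # loss function (mean squared error)
    aic = np.log(V) + 2.0 * n_params / n_data
    fpe = V * (1.0 + n_params / n_data) / (1.0 - n_params / n_data)
    mdl = np.log(V) + n_params * np.log(n_data) / n_data
    return aic, fpe, mdl

# Hypothetical comparison over candidate structures (rss values from fitted models):
# candidates = {"ARX(2,2,5)": (rss_225, 4), "ARX(3,2,5)": (rss_325, 5)}
# for name, (rss, p) in candidates.items():
#     print(name, selection_criteria(rss, p, N))
# Pick the structure with the smallest criterion value.
```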

CHEE825/436 - Module 4

J. McLellan - Fall 2005 84

Cross-Validation

Collect additional data, or partition your data set, and predict the output(s) for the additional input sequence
  – poor predictions - modify the model accordingly, re-estimate with the old data and re-validate
  – good predictions - use your model!

Note - the cross-validation set should be collected under similar conditions
  – same operating point, no known disturbances (e.g., feed changes)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 85

Debutanizer Example

Search over a range of ARX model orders and time delays:
  – poles: 1-4
  – zeros: 1-4
  – time delay: 1-6

Examine the mean square error, MDL, AIC and/or FPE.

MATLAB-generated search -> the ARX(2,2,5) model is best.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 86

Debutanizer Example

(plot: model fit - % unexplained output variance vs. number of parameters; AIC optimal: ARX(3,2,5), MDL optimal: ARX(2,2,5))

CHEE825/436 - Module 4

J. McLellan - Fall 2005 87

Other methods...

Look for Singularity of the “Information Matrix”

CHEE825/436 - Module 4

J. McLellan - Fall 2005 88

Outline

• The Modeling Task
• Types of Models
• Model Building Strategy
• Model Diagnostics
• Identifying Model Structure
• Modeling Non-Stationary Data
• MISO vs. SISO Model Fitting
• Closed-Loop Identification

CHEE825/436 - Module 4

J. McLellan - Fall 2005 89

What is Non-Stationary Data?

Non-stationary disturbances
  – exhibit meandering or wandering behaviour
  – the mean may appear to be non-zero for periods of time
  – stochastic analogue of an integrating disturbance

Non-stationarity is associated with poles on the unit circle in the disturbance transfer function
  » the AR component has one or more roots at 1

CHEE825/436 - Module 4

J. McLellan - Fall 2005 90

Non-Stationary Data

(plots: simulated output vs. time for AR parameters of 0.3, 0.6, and 0.9, and for a non-stationary case)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 91

How can you detect non-stationary data?

Visual
  – meandering behaviour

Quantitative
  – slowly decaying autocorrelation behaviour
  – difference the data
  – examine the autocorrelation and partial autocorrelation functions of the differenced data
  – evidence of MA or AR behaviour indicates a non-stationary, or integrated, MA or AR disturbance

CHEE825/436 - Module 4

J. McLellan - Fall 2005 92

Differencing Data

… is the procedure of putting the data in "delta form".

Start with y(t) and convert to

$\Delta y(t) = y(t) - y(t-1)$

  – explicitly accounting for the pole on the unit circle
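A minimal sketch (not from the slides) of differencing and the accompanying autocorrelation check, assuming a recorded disturbance or output series d.

```python
import numpy as np

def autocorr(x, max_lag=12):
    """Sample autocorrelation at lags 0..max_lag."""
    x = x - x.mean()
    c0 = np.sum(x * x)
    return np.array([np.sum(x[: len(x) - k] * x[k:]) / c0 for k in range(max_lag + 1)])

# d: recorded disturbance/output series (hypothetical)
# A slowly decaying autocorr(d) suggests non-stationarity; after differencing with
# np.diff(d), the autocorrelation should die out quickly if one difference is enough.
# print(autocorr(d))
# print(autocorr(np.diff(d)))
```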

CHEE825/436 - Module 4

J. McLellan - Fall 2005 93

Detecting Non-Stationarity

(plots: autocorrelation function for the non-stationary disturbance, and for the differenced disturbance, vs. lag)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 94

Impact of Over-Differencing

Over-differencing can introduce extra meandering and local trends into data

Differencing - “cancels” pole on unit circle

Over-differencing - introduces artificial unit pole into data

CHEE825/436 - Module 4

J. McLellan - Fall 2005 95

Recognizing Over-Differencing

Visual
  – more local trends and meandering in the data

Quantitative
  – autocorrelation behaviour decays more slowly than for the initial undifferenced data

CHEE825/436 - Module 4

J. McLellan - Fall 2005 96

Estimating Models for Non-Stationary Data

Approaches

Estimate the model using the differenced data

Explicitly incorporate the pole on the unit circle in the disturbance transfer function specification

CHEE825/436 - Module 4

J. McLellan - Fall 2005 97

Estimating Models from Differenced Data

• Prepare the data by differencing BOTH the input and the output
• Specify an initial model structure after using the graphical and quantitative tools
• Estimate and diagnose the model for the differenced data
• Convert the model to undifferenced form by multiplying through by (1-q^{-1})
• Assess predictions on undifferenced data for the fitting and validation data sets

CHEE825/436 - Module 4

J. McLellan - Fall 2005 98

Differenced Form of Box-Jenkins Model

$A(q^{-1})\,\Delta y(t) = \dfrac{B(q^{-1})}{F(q^{-1})}\, q^{-(f+1)}\,\Delta u(t) + \dfrac{C(q^{-1})}{D(q^{-1})}\, a(t)$

Note - in the time series literature, $\nabla = 1 - q^{-1} = \Delta$ is used to denote differencing.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 99

Outline

• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics
• Estimating MIMO models

CHEE825/436 - Module 4

J. McLellan - Fall 2005 100

SISO Approach

Estimate models individually

Advantage
  – simplicity

Disadvantage
  – need to reconcile the disturbance models from each input-output channel in order to obtain one disturbance model for the output
  – can't assess directionality with respect to the inputs

CHEE825/436 - Module 4

J. McLellan - Fall 2005 101

MISO Approach

Estimate the transfer function models + disturbance model for a single output and all inputs simultaneously

Advantage
  – consistency - obtain one disturbance model directly
  – potential to assess directionality

Disadvantage
  – complexity - recognizing model structures is more difficult

CHEE825/436 - Module 4

J. McLellan - Fall 2005 102

A Hybrid Approach

• conduct preliminary analysis using the SISO approach
  – model structures
  – apparent disturbance structure
• estimate the final model using the MISO approach
  – must decide on a common disturbance structure
• feasible if the input sequences are independent

CHEE825/436 - Module 4

J. McLellan - Fall 2005 103

Outline

• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics
• Closed-loop vs. open-loop estimation

CHEE825/436 - Module 4

J. McLellan - Fall 2005 104

The Closed-Loop Identification Problem

(block diagram: the setpoint SPt and the measured output Yt enter the controller Gc; the controller output plus a dither signal Wt form the process input Ut; the process Gp produces Yt)

CHEE825/436 - Module 4

J. McLellan - Fall 2005 105

Where should the input signal be introduced?

Options:

Dither at the controller output
  – clearer indication of the process dynamics
  – preferred approach

Perturbations in the setpoint
  – additional controller dynamics will be included in the estimated model

CHEE825/436 - Module 4

J. McLellan - Fall 2005 106

What do the closed-loop data represent?

• dither signal case, without disturbances

• open-loop - the input-output data represent

$Y_t = G_p\, W_t$

• closed-loop - the input-output data represent

$Y_t = \dfrac{G_p}{1 + G_p G_c}\, W_t$

CHEE825/436 - Module 4

J. McLellan - Fall 2005 107

Estimating Models from Closed-Loop Data

Approach #1:

Working with the W-Y data, estimate

$\dfrac{G_p}{1 + G_p G_c}$

and back out the controller to obtain the process transfer function.
  – we already know the controller transfer function

CHEE825/436 - Module 4

J. McLellan - Fall 2005 108

Estimating Models from Closed-Loop Data

Approach #2:

Estimate transfer functions for the process (U -> Y) and for the controller (Y -> U) simultaneously.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 109

Estimating Models from Closed-Loop Data

Approach #3:

Fit the model as in the open-loop case (U -> Y).

Note that

$U = \dfrac{1}{1 + G_p G_c}\, W$

so that we are effectively using a filtered input signal.

CHEE825/436 - Module 4

J. McLellan - Fall 2005 110

Some Useful References

Identification case study - paper by Shirt, Harris and Bacon (1994).

Closed-loop identification issues - paper by MacGregor and Fogal.

System identification workshop - paper edited by Barry Cott.