CHEE825/436 - Module 4
J. McLellan - Fall 2005

Process and Disturbance Models

Outline
• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics

The Task of Dynamic Model Building

Partitioning process data into a deterministic component (the process) and a stochastic component (the disturbance):
• process --> transfer function model
• disturbance --> time series model

Process Model Types

• non-parametric
– impulse response
– step response
– spectrum
• parametric
– transfer function models
» numerator
» denominator
– difference equation models - equivalent to transfer function models with the backshift operator

Note - impulse and step models are technically "parametric" when in finite form (e.g., FIR).
Impulse and Step Process Models

Described as a set of weights:

impulse model:  y(t) = Σ_{i=0}^{N} h(i) u(t-i)

step model:  y(t) = Σ_{i=0}^{N} s(i) Δu(t-i)

Note - typically treat Δu(t-N) as a step from 0, i.e., Δu(t-N) = u(t-N).
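A finite impulse response model is just a convolution of the input with the weights h(i); a minimal sketch with illustrative (hypothetical) weights, not values from the course data:

```python
import numpy as np

# Illustrative FIR weights h(i), i = 0..N-1 (hypothetical first-order-like decay)
h = 0.5 * 0.8 ** np.arange(10)

# Unit step input
u = np.ones(30)

# y(t) = sum_i h(i) u(t-i): truncated convolution of the weights with the input
y = np.convolve(u, h)[: len(u)]

# The step response weights s(i) are the cumulative sum of the impulse weights
s = np.cumsum(h)
print(y[:3])
```

Once t exceeds the number of weights, the step response settles at the sum of the impulse weights, i.e. the process gain.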
Process Spectrum Model

Represented as a set of frequency response values, or graphically as a plot of amplitude ratio versus frequency (rad/s).

Process Transfer Function Models

Numerator and denominator dynamics plus time delay:

Gp(q^-1) = B(q^-1) q^-(f+1) / F(q^-1)

• zeros - roots of the numerator B(q^-1)
• poles - roots of the denominator F(q^-1)
• time delay - f is the pure time delay; an extra 1-step delay is introduced by the zero-order hold and sampling
• q^-1 is the backwards shift operator: q^-1 y(t) = y(t-1)
Model Types for Disturbances

• non-parametric
– "impulse response" - infinite moving average
– spectrum
• parametric
– "transfer function" form
» autoregressive (denominator)
» moving average (numerator)

ARIMA Models for Disturbances

d(t) = [C(q^-1) / (D(q^-1) (1-q^-1)^d)] a(t)

• C(q^-1) - moving average component
• D(q^-1) - autoregressive component
• a(t) - random shock

AutoRegressive Integrated Moving Average model. Time series notation - an ARIMA(p,d,q) model has:
• pth-order denominator - AR
• qth-order numerator - MA
• d integrating poles (on the unit circle)
ARMA Models for Disturbances

d(t) = [C(q^-1) / D(q^-1)] a(t)

• C(q^-1) - moving average component
• D(q^-1) - autoregressive component
• a(t) - random shock

Simply an ARIMA model with no integrating component.

Typical Model Combinations

• model predictive control
– impulse/step process model + ARMA disturbance model
» typically a step disturbance model, which can be considered as a pure integrator driven by a single pulse
• single-loop control
– transfer function process model + ARMA disturbance model

Classification of Models in Identification

• AutoRegressive with eXogenous inputs (ARX)
• Output Error (OE)
• AutoRegressive Moving Average with eXogenous inputs (ARMAX)
• Box-Jenkins (BJ)
• per Ljung's terminology
ARX Models

A(q^-1) y(t) = B(q^-1) q^-(f+1) u(t) + a(t)

– u(t) is the exogenous input
– same autoregressive component for process and disturbance
– numerator term for process, no moving average in disturbance
– physical interpretation - disturbance passes through the entire process dynamics
» e.g., feed disturbance

Output Error Models

y(t) = [B(q^-1) / A(q^-1)] q^-(f+1) u(t) + a(t)

– no disturbance dynamics
– numerator and denominator process dynamics
– physical interpretation - process subject to a white noise disturbance (is this ever true?)

ARMAX Models

A(q^-1) y(t) = B(q^-1) q^-(f+1) u(t) + C(q^-1) a(t)

– process and disturbance have the same denominator dynamics
– disturbance has moving average dynamics
– physical interpretation - disturbance passes through the process, entering at a point away from the input
» except if C(q^-1) = B(q^-1)
Box-Jenkins Model

A(q^-1) y(t) = [B(q^-1) / F(q^-1)] q^-(f+1) u(t) + [C(q^-1) / D(q^-1)] a(t)

– autoregressive component plus input; process and disturbance can have different dynamics
– the AR component A(q^-1) represents dynamic elements common to both process and disturbance
– physical interpretation - disturbance passes through other dynamic elements before entering the process

Range of Model Types

Output Error --> ARX --> ARMAX --> Box-Jenkins
(least general)                    (most general)
Outline
• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics

Model Estimation - General Philosophy

Form a "loss function" which is minimized to obtain the "best" parameter estimates.

Loss function
» "loss" can be considered as missed trend or information
» e.g., linear regression:
• loss represents left-over trends in the residuals which could be explained by a model
• if we picked up all of the trend, only the random noise e(t) would be left
• additional trends drive up the variation of the residuals
• the loss function is the sum of squares of the residuals (related to the variance of the residuals)
Linear Regression - Types of Loss Functions

First, consider the linear regression model:

Y = β0 + β1 x1 + β2 x2 + ... + βp xp + e,   e ~ N(0, σ^2)

Least Squares estimation criterion -

min_{β0,...,βp} Σ_{i=1}^n (yi - ŷi)^2
  = min_{β0,...,βp} Σ_{i=1}^n (yi - {β0 + β1 x1i + β2 x2i + ... + βp xpi})^2
  = min_{β0,...,βp} Σ_{i=1}^n ei^2

where (yi - ŷi)^2 is the squared prediction error at point "i".
Linear Regression - Types of Loss Functions

The model describes how the mean of Y varies:

E{Y} = β0 + β1 x1 + β2 x2 + ... + βp xp

and the variance of Y is σ^2, because the random component in Y comes from the additive noise "e". The probability density function at point "i" is

f_Yi = [1/sqrt(2πσ^2)] exp{-(yi - (β0 + β1 x1i + ... + βp xpi))^2 / (2σ^2)}
     = [1/sqrt(2πσ^2)] exp{-ei^2 / (2σ^2)}

where ei is the noise at point "i".
Linear Regression - Types of Loss Functions

We can write the joint probability density function for all observations in the data set:

f_{Y1...Yn} = [1/(2πσ^2)^(n/2)] exp{-Σ_{i=1}^n (yi - (β0 + β1 x1i + ... + βp xpi))^2 / (2σ^2)}
            = [1/(2πσ^2)^(n/2)] exp{-Σ_{i=1}^n ei^2 / (2σ^2)}
Linear Regression - Types of Loss Functions

Given the parameters, we can use f_{Y1...Yn} to determine the probability that a given range of observations will occur.

What if we have observations but don't know the parameters?
» assume that we have the most common, or "likely", observations - i.e., observations that have the greatest probability of occurrence
» find the parameter values that maximize the probability of the observed values occurring
» the joint density function becomes a "likelihood function"
» the parameter estimates are "maximum likelihood estimates"
Linear Regression - Types of Loss Functions

Maximum Likelihood parameter estimation criterion -

max_{βi, i=0,...,p} L(β | yi)
  = max_{βi, i=0,...,p} [1/(2πσ^2)^(n/2)] exp{-Σ_{i=1}^n (yi - (β0 + β1 x1i + ... + βp xpi))^2 / (2σ^2)}
Linear Regression - Types of Loss Functions

Given the form of the likelihood function, maximizing it is equivalent to minimizing the argument of the exponential, i.e.,

min_{β0,...,βp} Σ_{i=1}^n (yi - {β0 + β1 x1i + β2 x2i + ... + βp xpi})^2 = min_{β0,...,βp} Σ_{i=1}^n ei^2

For the linear regression case, the maximum likelihood parameter estimates are equivalent to the least squares parameter estimates.
Linear Regression - Types of Loss Functions

Least Squares Estimation
» loss function is the sum of squared residuals = sum of squared prediction errors

Maximum Likelihood
» loss function is the likelihood function, which in the linear regression case is equivalent to the sum of squared prediction errors

Prediction error = observation - predicted value:

yi - ŷi = yi - {β0 + β1 x1i + β2 x2i + ... + βp xpi}
Loss Functions for Identification

Least Squares - "minimize the sum of squared prediction errors"

The loss function is

Σ_{t=1}^N (y(t) - ŷ(t))^2

where N is the number of points in the data record.
Least Squares Identification Example

Given an ARX(1) process+disturbance model:

y(t) = a1 y(t-1) + b1 u(t-1) + e(t)

the loss function can be written as

Σ_{t=2}^N (y(t) - ŷ(t))^2 = Σ_{t=2}^N (y(t) - {a1 y(t-1) + b1 u(t-1)})^2
Least Squares Identification Example

In matrix form, e = y - Φθ with

y = [y(2); y(3); ...; y(N)],   Φ = [y(1) u(1); y(2) u(2); ...; y(N-1) u(N-1)],   θ = [a1; b1]

and the sum of squared prediction errors is e^T e.
Least Squares Identification Example

The least squares parameter estimates are:

[â1; b̂1] = (Φ^T Φ)^-1 Φ^T [y(2); y(3); ...; y(N)]

Note that the disturbance structure in the ARX model is such that the disturbance contribution appears in the formulation as a white noise additive error --> satisfies the assumptions for this formulation.
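The normal-equations solution can be sketched in a few lines; here simulated ARX(1) data (hypothetical parameters a1 = 0.7, b1 = 0.5, chosen for illustration) are fit with numpy's least squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000
a1, b1 = 0.7, 0.5          # "true" parameters of the simulated plant (illustrative)

u = rng.choice([-1.0, 1.0], size=N)   # PRBS-like input
e = 0.1 * rng.standard_normal(N)      # white-noise disturbance
y = np.zeros(N)
for t in range(1, N):
    y[t] = a1 * y[t-1] + b1 * u[t-1] + e[t]

# Regressor matrix Phi with rows [y(t-1), u(t-1)], target y(t), t = 2..N
Phi = np.column_stack([y[:-1], u[:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta)   # estimates of [a1, b1]
```

Because the ARX error is white, these estimates converge to the true values as N grows, which is exactly the consistency property discussed on the following slides.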
Least Squares Identification

• ARX models fit into this framework
• Output Error models -

y(t) = [B(q^-1) / A(q^-1)] q^-(f+1) u(t) + e(t)

or in difference equation form:

A(q^-1) y(t) = B(q^-1) q^-(f+1) u(t) + A(q^-1) e(t)

i.e.,

y(t) = a1 y(t-1) + ... + ap y(t-p) + B(q^-1) q^-(f+1) u(t) + A(q^-1) e(t)

The error term A(q^-1) e(t) violates the least squares assumption of independent errors.

Least Squares Identification

Any process+disturbance model other than the ARX model will not satisfy the structural requirements.

Implications?
» estimators are not consistent - they don't asymptotically tend to the true values of the parameters
» potential for bias
Prediction Error Methods

Choose the parameter estimates to minimize some function of the prediction errors.

For example, for the Output Error model, the prediction error is

ε(t) = y(t) - [B(q^-1) / A(q^-1)] q^-(f+1) u(t)

where the second term is the prediction. Use a numerical optimization routine to obtain the "best" estimates.

Prediction Error Methods

ARX(1) Example -

y(t) = a1 y(t-1) + b1 u(t-1) + e(t)

Use the model to predict one step ahead given past values:

ŷ(t) = a1 y(t-1) + b1 u(t-1)

This "one step ahead predictor" is optimal when e(t) is normally distributed, and can be obtained by taking the conditional expectation of y(t) given information up to and including time t-1. e(t) disappears because it has zero mean and adds no information on average.
Prediction Error Methods

Prediction error for the one step ahead predictor:

ε(t) = y(t) - ŷ(t) = y(t) - {a1 y(t-1) + b1 u(t-1)}

We could obtain parameter estimates to minimize the sum of squared prediction errors:

Σ_{t=2}^N ε(t)^2 = Σ_{t=2}^N (y(t) - ŷ(t))^2

- the same as the Least Squares estimates for this ARX example.

Prediction Error Methods

What happens if we have an ARMAX(1,1) model?

y(t) = a1 y(t-1) + b1 u(t-1) + e(t) + c1 e(t-1)

The one step ahead predictor is:

ŷ(t) = a1 y(t-1) + b1 u(t-1) + c1 e(t-1)

But what is e(t-1)?
» estimate it using the measured y(t-1) and the prediction of y(t-1):

ê(t-1) = y(t-1) - ŷ(t-1) = y(t-1) - {a1 y(t-2) + b1 u(t-2) + c1 ê(t-2)}

Prediction Error Methods

Note that the estimate of e(t-1) depends on e(t-2), which depends on e(t-3), and so forth
» eventually we end up with a dependence on e(0), which is typically assumed to be zero
» "conditional" estimates - conditional on assumed initial values
» can also formulate in a way that avoids conditional estimates
» the impact is typically negligible for large data sets
• during computation, it isn't necessary to solve recursively all the way back to the initial condition
» use the previous prediction to estimate the previous prediction error: ê(t-1) = y(t-1) - ŷ(t-1)
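The conditional recursion above, with ê(0) = 0, can be sketched directly (parameter values are hypothetical, for illustration):

```python
import numpy as np

def armax11_prediction_errors(y, u, a1, b1, c1):
    """One-step prediction errors for an ARMAX(1,1) model,
    conditional on the initial shock estimate e(0) = 0."""
    N = len(y)
    eps = np.zeros(N)      # eps[t] = y(t) - yhat(t); eps[0] stays 0 (initial condition)
    for t in range(1, N):
        # previous prediction error serves as the estimate of e(t-1)
        yhat = a1 * y[t-1] + b1 * u[t-1] + c1 * eps[t-1]
        eps[t] = y[t] - yhat
    return eps

# Simulate data from the same hypothetical model and evaluate the errors
# at the true parameters: eps(t) should then look like the white shocks e(t).
rng = np.random.default_rng(1)
N = 500
a1, b1, c1 = 0.6, 1.0, 0.4
e = 0.1 * rng.standard_normal(N)
u = rng.choice([-1.0, 1.0], size=N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = a1 * y[t-1] + b1 * u[t-1] + e[t] + c1 * e[t-1]

eps = armax11_prediction_errors(y, u, a1, b1, c1)
print(np.mean(eps**2))   # close to the shock variance 0.01
```

The effect of the assumed initial condition decays geometrically (here at rate c1 = 0.4), which is why it is negligible for large data sets.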
Prediction Error Methods

Formulation for the general case - given a process plus disturbance model:

y(t) = G(q^-1) u(t) + H(q^-1) e(t)

we can write

y(t) = G(q^-1) u(t) + (H(q^-1) - 1) e(t) + e(t)

so that the prediction is:

ŷ(t) = G(q^-1) u(t) + (H(q^-1) - 1) e(t)

The random shocks are estimated as

e(t) = H(q^-1)^-1 {y(t) - G(q^-1) u(t)}

Prediction Error Methods

Putting these expressions together yields

ŷ(t) = H(q^-1)^-1 {(H(q^-1) - 1) y(t) + G(q^-1) u(t)}

which is of the form

ŷ(t) = L1(q^-1, θ) y(t) + L2(q^-1, θ) u(t)

The prediction error for use in the estimation loss function is

ε(t) = y(t) - ŷ(t) = y(t) - {L1(q^-1, θ) y(t) + L2(q^-1, θ) u(t)}

Prediction Error Methods

How does this look for a general ARMAX model?

A(q^-1) y(t) = B(q^-1) u(t) + C(q^-1) e(t)

Getting ready for the prediction,

y(t) = (1 - A(q^-1)) y(t) + B(q^-1) u(t) + (C(q^-1) - 1) e(t) + e(t)

we obtain

ŷ(t) = (1 - A(q^-1)) y(t) + B(q^-1) u(t) + (C(q^-1) - 1) e(t)

Prediction Error Methods

Note that the ability to estimate the random shocks depends on the ability to invert C(q^-1)
» invertibility was discussed for moving average disturbances
» ability to express the shocks in terms of present and past outputs - convert to an infinite autoregressive sum

Note that the moving average parameters appear in the denominator of the prediction
» the model is nonlinear in the moving average parameters, and conditionally linear in the others
Likelihood Function Methods

Conditional Likelihood Function
» assume initial conditions for outputs and random shocks
» e.g., for ARX(1), a value for y(0)
» e.g., for ARMAX(1,1), values for y(0), e(0)

General argument - the quantity

y(t) - G(q^-1) u(t) - (H(q^-1) - 1) e(t) = e(t)

is normally distributed with zero mean and known variance, so:
• form the joint distribution for this expression over all times
• find the parameter values that maximize the likelihood

Likelihood Function Methods

Exact Likelihood Function

Note that we can also form an exact likelihood function which includes the initial conditions
» the maximum likelihood estimation procedure estimates the parameters AND the initial conditions
» the exact likelihood function is more complex

In either case, we use a numerical optimization procedure to solve for the maximum likelihood estimates.

Likelihood Function Methods

Final comment -
» derivation of the likelihood function requires convergence of the moving average and autoregressive elements
» moving average --> invertibility
» autoregressive --> stability

Example - the Box-Jenkins model

A(q^-1) y(t) = [B(q^-1) / F(q^-1)] u(t) + [C(q^-1) / D(q^-1)] e(t)

can be re-arranged to yield the random shock:

e(t) = [D(q^-1) / C(q^-1)] {A(q^-1) y(t) - [B(q^-1) / F(q^-1)] u(t)}

- note the inverted MA component C(q^-1) and the inverted AR component F(q^-1).
Outline
• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics

Model-Building Strategy

• graphical pre-screening
• select initial model structure
• estimate parameters
• examine model diagnostics
• examine structural diagnostics
• validate model using an additional data set
(modify the model and re-estimate as required)

Example - Debutanizer

Objective - fit a transfer function + disturbance model describing changes in bottoms RVP in response to changes in internal reflux.

Data
– step data
– slow PRBS (switch down, switch up, switch down)
Graphical Pre-Screening

• examine time traces of outputs, inputs, secondary variables
– are there any outliers or major shifts in operation?
• could there be a model in this data?
• engineering assessment
– should there be a model in this data?

Selecting Initial Model Structure

• examine auto- and cross-correlations of output, input
– look for autoregressive, moving average components
• examine spectrum of output
– indication of the order of the process
» first-order
» second-order underdamped - resonance
» second or higher order overdamped

Selecting Initial Model Structure...

• examine correlation estimate of impulse or step response
– available if the input is not a step
– what order is the process?
» 1st order, 2nd order over/underdamped
– size of the time delay

Selecting Initial Model Structure

Time Delays

For a low frequency input signal (e.g., a few steps or a filtered PRBS), examine the transient response for the delay.

For pre-filtered data, examine the cross-correlation plots - where is the first non-zero cross-correlation?

Debutanizer Example

• step response
– indicates settling time ~100 min
– potentially some time delay
– positive gain
– 1st order or overdamped higher-order
• correlation estimate of step response
– indicates time delay of ~4-5 min
– overdamped higher-order
Debutanizer Example - PRBS Test
[Figure: input and output signals over 0-150 min; output #1 spans roughly -0.2 to 0.2, input #1 spans roughly -50 to 50.]

Debutanizer Example - Step Response Test
[Figure: input and output signals over 0-150 min; output #1 rises from 0 to ~0.2 as input #1 steps from ~49 to ~51.]

Debutanizer Example - Correlation Step Response Estimate
[Figure: correlation estimate of the step response over 0-40 min, rising to ~2e-3.]
Debutanizer Example

• process spectrum
– suggests higher-order
• disturbance spectrum
– cut-off behaviour suggests an AR type of disturbance
• initial model
– ARX with delay of 4 or 5
– ARMAX
– Box-Jenkins
– NOT output error - the disturbance isn't white

Debutanizer Example - Process Spectrum Plot
[Figure: frequency response - amplitude and phase (deg) versus frequency (rad/s), 10^-2 to 10^1 rad/s.]

Debutanizer Example - Disturbance Spectrum
[Figure: power spectrum versus frequency (rad/s), 10^-2 to 10^1 rad/s.]
Additional Initial Selection Tests

Singularity Test

Form the data vector

φ(t) = [y(t-1) ... y(t-s) u(t-1) ... u(t-s)]

and its covariance matrix

Cov(φ) = (1/N) Σ_{t=1}^N φ(t) φ(t)^T

The covariance matrix for this vector will be singular if s > model order, and non-singular if s ≤ model order.

Notes:
1. The test was developed for the deterministic model - the results are exact for this case.
2. The test is approximate when random shocks enter the process - the results will depend on the signal-to-noise ratio.
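A sketch of the test in its exact (noise-free) setting: for a hypothetical first-order process, Cov(φ) built with s = 1 lags has full rank, while with s = 2 it becomes rank-deficient.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 400
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.8 * y[t-1] + 0.5 * u[t-1]    # noise-free first-order process (illustrative)

def phi_matrix(y, u, s):
    # rows phi(t) = [y(t-1),...,y(t-s), u(t-1),...,u(t-s)]
    rows = []
    for t in range(s, len(y)):
        rows.append(np.concatenate([y[t-s:t][::-1], u[t-s:t][::-1]]))
    return np.array(rows)

ranks = {}
for s in (1, 2):
    Phi = phi_matrix(y, u, s)
    C = Phi.T @ Phi / len(Phi)           # sample covariance of phi(t)
    ranks[s] = np.linalg.matrix_rank(C, tol=1e-8)
    print(s, ranks[s])
```

With s = 2 the relation y(t-1) = 0.8 y(t-2) + 0.5 u(t-2) makes one column an exact linear combination of the others, so the 4x4 covariance has rank 3; with noise present the smallest eigenvalue is only "small", not zero, hence the signal-to-noise caveat in note 2.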
Pre-Filtering

If the input is not white noise, the cross-correlation does not show the process structure clearly
» autocorrelation in u(t) complicates the structure

Solution - estimate a time series model for the input, and pre-filter using the inverse of this model
– prefilter both the input and the output to ensure consistency

Now estimate the cross-correlations between the filtered input and filtered output
– sharp cut-off --> negligible denominator
– gradual decline --> denominator dynamics

Pre-Filtering

• can also examine cross-correlation plots for an indication of the time delay
– first non-zero lag in the cross-correlation function

Note that differencing, which is used to treat non-stationary disturbances, is a form of pre-filtering
– more on this later...
Outline
• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics

Model Diagnostics

Analyze the residuals:
– look for unmodelled trends
» auto-correlation
» cross-correlation with inputs
» spectrum - should be flat
– assess the size of the residual standard error

Wet towel analogy - wring out all the moisture (information) until there is nothing left.

Unmodelled Trends in Residuals

• autocorrelations
– should be statistically zero
• cross-correlations
– between the residuals and the inputs should be zero for lags greater than the numerator order
» i.e., at long lags
– if the cross-correlation between the inputs and past residuals is non-zero, this indicates feedback present in the data (inputs depend on past errors)
» i.e., at negative lags
Debutanizer Example

Consider an ARX(2,2,5) model
– 2 poles, 1 zero, delay of 5

Autocorrelation plots
– no systematic trend in the residuals

Cross-correlation plots
– no systematic relationship between the residuals and the input

Debutanizer Example - Residual Correlation Plots
[Figure: autocorrelation of residuals for output 1, and cross-correlation between input 1 and output 1 residuals, lags -20 to 20.]

Debutanizer Example - Predicted vs. Response
[Figure: measured and simulated model output versus time, 0-150 min.]
Detecting Incorrect Time Delays

If the cross-correlation between the residuals and the input is non-zero for small lags, the time delay is possibly too large
– the additional early transients aren't being modeled because the model assumes nothing is happening yet

Debutanizer Example

Let's choose a delay of 7.

Cross-correlation plot
– indicates significant cross-correlation between the input and the residuals at small positive lags
– the estimate of the time delay is too large

Model Diagnostics

Quantitative Tests
– significance of parameter estimates
– ratio tests - of explained variation

Debutanizer Example
– parameters are all significant
Debutanizer Example - Parameter Estimates

This matrix was created by the command ARX on 11/16 1996 at 11:36
Loss fcn: 5.805e-006   Akaike's FPE: 6.123e-006   Sampling interval: 1

The polynomial coefficients and their standard deviations are:

B (numerator parameters) = 1.0e-003 * [0 0 0 0 0 0.1428 -0.0605]
  standard errors:         1.0e-003 * [0 0 0 0 0 0.0243  0.0272]

A (AR parameters) = [1.0000 -1.3924 0.4303]
  standard errors:  [0      0.0747  0.0697]
Model Diagnostics

Cross-Validation

Use the model to predict the behaviour of a new data set collected under similar circumstances.

Reject the model if the prediction error is large.

Debutanizer Example

Use the initial step test data as a cross-validation data set.

The prediction errors are small, and the trend is predicted quite well.

Conclusion - acceptable model.

Debutanizer Example - Prediction for Validation Data
[Figure: measured and simulated model output versus time for the validation data, 0-150 min, output spanning 0 to ~0.14.]
Debutanizer Example - Residual Correlation Plots for Validation Data
[Figure: autocorrelation of residuals for output 1, and cross-correlation between input 1 and output 1 residuals, lags -20 to 20.]

Outline
• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics

Initially...

Use the structure selection methods described earlier.

Once you have estimated several candidate models...
Model Structure Diagnostics

Akaike's Information Criterion (AIC)
– weighted estimation error
» unexplained variation with a term penalizing excess parameters
» analogous to adjusted R^2 for regression
– find the model structure that minimizes the AIC

Akaike's Information Criterion

Definition:

AIC = N log(V(θ̂)) + 2p

where N is the number of data points in the sample, V(θ̂) is the loss function (related to the prediction error / residual sum of squares), and p is the number of parameters.
Akaike's Information Criterion
[Figure: AIC versus number of parameters - the best model is at the minimum.]

Akaike's Final Prediction Error

An attempt to estimate the prediction error when the model is used to predict new outputs:

FPE = [(1 + p/N) / (1 - p/N)] × (residual sum of squares) / N

Goal - choose the model that minimizes the FPE (a balance between the number of parameters and the explained variation).
Minimum Description Length (MDL)

• Another approach - find the "minimum description length" of the data; the measure is based on the loss function plus a penalty for the number of terms:

MDL = V_N (1 + dim(θ) log(N) / N)

• find the description that minimizes this criterion

Cross-Validation

Collect additional data, or partition your data set, and predict the output(s) for the additional input sequence
– poor predictions - modify the model accordingly, re-estimate with the old data and re-validate
– good predictions - use your model!

Note - the cross-validation set should be collected under similar conditions
– operating point, no known disturbances (e.g., feed changes)
Debutanizer Example

Search over a range of ARX model orders and time delays:
– poles: 1-4
– zeros: 1-4
– time delay: 1-6

Examine the mean square error, MDL, AIC and/or FPE
– Matlab generated --> the ARX(2,2,5) model is best

Debutanizer Example
[Figure: % unexplained output variance versus number of parameters; AIC optimal = ARX(3,2,5), MDL optimal = ARX(2,2,5).]
Other methods...

Look for singularity of the "Information Matrix".

Outline
• The Modeling Task
• Types of Models
• Model Building Strategy
• Model Diagnostics
• Identifying Model Structure
• Modeling Non-Stationary Data
• MISO vs. SISO Model Fitting
• Closed-Loop Identification

What is Non-Stationary Data?

Non-stationary disturbances
– exhibit meandering or wandering behaviour
– the mean may appear to be non-zero for periods of time
– the stochastic analogue of an integrating disturbance

Non-stationarity is associated with poles on the unit circle in the disturbance transfer function
» the AR component has one or more roots at 1
Non-Stationary Data
[Figure: four simulated disturbance series, 0-300 samples, with AR parameters 0.3, 0.6, 0.9, and a non-stationary series; the output range grows as the AR parameter approaches 1.]

How can you detect non-stationary data?

Visual
– meandering behaviour

Quantitative
– slowly decaying autocorrelation behaviour
– difference the data
– examine the autocorrelation and partial autocorrelation functions for the differenced data
– evidence of MA or AR indicates a non-stationary, or integrated, MA or AR disturbance
Differencing Data

... is the procedure of putting the data in "delta form".

Start with y(t) and convert to

Δy(t) = y(t) - y(t-1)

– explicitly accounting for the pole on the unit circle
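Differencing and the slow-decay diagnostic can be sketched together: a random walk (one pole on the unit circle, simulated here as a hypothetical example) has a very slowly decaying autocorrelation, while its first difference recovers the white shocks.

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.standard_normal(1000)
y = np.cumsum(a)           # random walk: integrated white noise
dy = np.diff(y)            # delta form: dy(t) = y(t) - y(t-1)

def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

print(acf(y, 10))    # near 1: slowly decaying -> non-stationary
print(acf(dy, 10))   # near 0: the differenced series is white
```

Differencing once more (over-differencing) would introduce an artificial unit root, which is the pitfall discussed on the next slides.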
Detecting Non-Stationarity
[Figure: autocorrelation for a non-stationary disturbance (decaying very slowly) and for the differenced disturbance (cutting off quickly), lags 0-12.]

Impact of Over-Differencing

Over-differencing can introduce extra meandering and local trends into the data.

Differencing - "cancels" a pole on the unit circle.

Over-differencing - introduces an artificial unit pole into the data.

Recognizing Over-Differencing

Visual
– more local trends, meandering in the data

Quantitative
– autocorrelation behaviour decays more slowly than for the initial undifferenced data

Estimating Models for Non-Stationary Data

Approaches:

1. Estimate the model using the differenced data.
2. Explicitly incorporate the pole on the unit circle in the disturbance transfer function specification.

Estimating Models from Differenced Data

• prepare the data by differencing BOTH the input and the output
• specify the initial model structure after using graphical, quantitative tools
• estimate and diagnose the model for the differenced data
• convert the model to undifferenced form by multiplying through by (1-q^-1)
• assess predictions on the undifferenced data for the fitting and validation data sets
Differenced Form of Box-Jenkins Model

A(q^-1) Δy(t) = [B(q^-1) / F(q^-1)] q^-(f+1) Δu(t) + [C(q^-1) / D(q^-1)] a(t)

Note - in the time series literature,

∇ = (1 - q^-1) = Δ

is used to denote differencing.
Outline
• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics
• Estimating MIMO models

SISO Approach

Estimate the models individually.

Advantage
– simplicity

Disadvantage
– need to reconcile the disturbance models for each input-output channel in order to obtain one disturbance model for the output
– can't assess directionality with respect to the inputs

MISO Approach

Estimate the transfer function models + disturbance model for a single output and all inputs simultaneously.

Advantage
– consistency - obtain one disturbance model directly
– potential to assess directionality

Disadvantage
– complexity - recognizing model structures is more difficult

A Hybrid Approach

• conduct preliminary analysis using the SISO approach
– model structures
– apparent disturbance structure
• estimate the final model using the MISO approach
– must decide on a common disturbance structure
• feasible if the input sequences are independent
Outline
• Types of Models
• Model Estimation Methods
• Identifying Model Structure
• Model Diagnostics
• Closed-loop vs. open-loop estimation

The Closed-Loop Identification Problem
[Block diagram: setpoint SPt enters a comparator (+/-) with output Yt fed back; controller Gc produces Ut; a dither signal Wt is added at the controller output; process Gp produces Yt.]

Where should the input signal be introduced?

Options:

Dither at the controller output
– clearer indication of the process dynamics
– preferred approach

Perturbations in the setpoint
– additional controller dynamics will be included in the estimated model
What do the closed-loop data represent?

Dither signal case, without disturbances:

• open-loop - the input-output data represent

Yt = Gp Wt

• closed-loop - the input-output data represent

Yt = [Gp / (1 + Gp Gc)] Wt

Estimating Models from Closed-Loop Data

Approach #1:

Working with W-Y data, estimate Gp / (1 + Gp Gc) and back out the controller to obtain the process transfer function.
– we already know the controller transfer function

Estimating Models from Closed-Loop Data

Approach #2:

Estimate transfer functions for the process (U -> Y) and for the controller (Y -> U) simultaneously.

Estimating Models from Closed-Loop Data

Approach #3:

Fit the model as in the open-loop case (U -> Y).

Note that

U = [1 / (1 + Gp Gc)] W

so that we are effectively using a filtered input signal.
Some Useful References

Identification case study - paper by Shirt, Harris and Bacon (1994).

Closed-loop identification issues - paper by MacGregor and Fogal.

System identification workshop - paper edited by Barry Cott.