IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in...

15
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, MAY 2011 983 A Geometric Approach to Variance Analysis in System Identification Håkan Hjalmarsson, Member, IEEE, and Jonas Mårtensson Abstract—This paper addresses the problem of quantifying the model error (“variance-error”) in estimates of dynamic systems. It is shown that, under very general conditions, the asymptotic (in data length) covariance of an estimated system property (repre- sented by a smooth function of estimated system parameters) can be interpreted in terms of an orthogonal projection of a certain function, associated with the property of interest, onto a subspace determined by the model structure and experimental conditions. The presented geometric approach simplifies structural analysis of the model variance and this is illustrated by analyzing the influence of inputs and sensors on the model accuracy. Index Terms—Asymptotic covariance, model accuracy, sto- chastic systems, system identification. I. INTRODUCTION Q UANTIFICATION of the model error is a core issue in system identification and much research effort has been devoted to this issue; the following list of contributions is by no means complete: [1]–[21]. In the stochastic setting [22] that we will consider in this contribution, unknown system pa- rameters are estimated using a data set consisting of measured inputs and outputs resulting in the pa- rameter estimate . We assume that a true parameter value exists, denoted by , and we assume that the (normal- ized) model error becomes normally distributed as the sample size of the data set grows to infinity As (1) The asymptotic covariance matrix of the limit distri- bution is a measure of the model accuracy. This is reinforced by that, under mild conditions [22] The asymptotic covariance matrix was early used for optimal experiment design [23]–[26] and Ljung’s [27] model-order Manuscript received March 25, 2009; revised December 11, 2009, June 01, 2010, and August 04, 2010; accepted August 04, 2010. Date of publication September 13, 2010; date of current version May 11, 2011. This work was sup- ported in part by the Swedish Research Council under Contract 621-2007-6271 and Contract 621-2009-4017. Recommended by T. Zhou. H. Hjalmarsson and J. Mårtensson are with the ACCESS Linnaeus Center, KTH School of Electrical Engineering, KTH—Royal Institute of Tech- nology, S-100 44 Stockholm, Sweden (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TAC.2010.2076213 asymptotic variance expression led to experiment designs for high-order models [28]–[31]. Optimal experiment design prob- lems have received a renewed interest in recent years [32]–[37]. A. The Importance of Asymptotic Covariance Theory As a motivating example we would like to mention Zhu, [38], [39], who has developed an experiment design procedure, en- tirely based on the asymptotic covariance theory, that has signif- icantly reduced the time and cost related to modeling in several industrial plants. We quote [39]: Model identification plays a crucial role in MPC tech- nology and it is also the most time consuming and difficult task in MPC projects and maintenance. 
Considerable benefits are obtained using the new identifi- cation technology: 1) reduction of identification test time and model building time by over 70%; 2) higher model quality for control; One specific example is given in the case study [40], which deals with the re-tuning of an MPC controller at the Hovensa re- finery in Virgin Islands, United States, which is one of the largest refineries in the world, with a crude oil processing capacity of 495.000 barrels per day. The MPC application at hand had 34 inputs and 90 outputs, and the last time the model was tuned it took one month to gather the data and one month to identify the model. With the new method, the total modeling time was re- duced to just five days; a reduction of over 90%. This example more than well justifies the use of the asymp- totic covariance theory for assessing the model quality, and we will now return to the study of the asymptotic covariance matrix. B. The Asymptotic Covariance Matrix Often the asymptotic covariance matrix can be written as (2) where , for some integer depending on the model structure, and whose elements belong to and where denotes the integral (superscript denotes complex conjugate transpose). The rela- tion (2) will be our standing assumption throughout this paper. 1) Example 1: Consider a system described by the linear regression (3) where , where are elements of a stationary stochastic process with bounded moments and , and where is white noise with bounded 0018-9286/$26.00 © 2010 IEEE

Transcript of IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in...

Page 1: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, MAY 2011 983

A Geometric Approach to VarianceAnalysis in System Identification

Håkan Hjalmarsson, Member, IEEE, and Jonas Mårtensson

Abstract—This paper addresses the problem of quantifying themodel error (“variance-error”) in estimates of dynamic systems.It is shown that, under very general conditions, the asymptotic (indata length) covariance of an estimated system property (repre-sented by a smooth function of estimated system parameters) canbe interpreted in terms of an orthogonal projection of a certainfunction, associated with the property of interest, onto a subspacedetermined by the model structure and experimental conditions.The presented geometric approach simplifies structural analysis ofthe model variance and this is illustrated by analyzing the influenceof inputs and sensors on the model accuracy.

Index Terms—Asymptotic covariance, model accuracy, sto-chastic systems, system identification.

I. INTRODUCTION

Q UANTIFICATION of the model error is a core issue insystem identification and much research effort has beendevoted to this issue; the following list of contributions

is by no means complete: [1]–[21]. In the stochastic setting [22]that we will consider in this contribution, unknown system pa-rameters are estimated using a dataset consisting of measured inputs and outputs resulting in the pa-rameter estimate . We assume that a true parametervalue exists, denoted by , and we assume that the (normal-ized) model error becomes normally distributedas the sample size of the data set grows to infinity

As (1)

The asymptotic covariance matrix of the limit distri-bution is a measure of the model accuracy. This is reinforced bythat, under mild conditions [22]

The asymptotic covariance matrix was early used for optimalexperiment design [23]–[26] and Ljung’s [27] model-order

Manuscript received March 25, 2009; revised December 11, 2009, June 01,2010, and August 04, 2010; accepted August 04, 2010. Date of publicationSeptember 13, 2010; date of current version May 11, 2011. This work was sup-ported in part by the Swedish Research Council under Contract 621-2007-6271and Contract 621-2009-4017. Recommended by T. Zhou.

H. Hjalmarsson and J. Mårtensson are with the ACCESS Linnaeus Center,KTH School of Electrical Engineering, KTH—Royal Institute of Tech-nology, S-100 44 Stockholm, Sweden (e-mail: [email protected];[email protected]).

Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TAC.2010.2076213

asymptotic variance expression led to experiment designs forhigh-order models [28]–[31]. Optimal experiment design prob-lems have received a renewed interest in recent years [32]–[37].

A. The Importance of Asymptotic Covariance Theory

As a motivating example we would like to mention Zhu, [38],[39], who has developed an experiment design procedure, en-tirely based on the asymptotic covariance theory, that has signif-icantly reduced the time and cost related to modeling in severalindustrial plants. We quote [39]:

Model identification plays a crucial role in MPC tech-nology and it is also the most time consuming and difficulttask in MPC projects and maintenance.Considerable benefits are obtained using the new identifi-cation technology: 1) reduction of identification test timeand model building time by over 70%; 2) higher modelquality for control;

One specific example is given in the case study [40], whichdeals with the re-tuning of an MPC controller at the Hovensa re-finery in Virgin Islands, United States, which is one of the largestrefineries in the world, with a crude oil processing capacity of495.000 barrels per day. The MPC application at hand had 34inputs and 90 outputs, and the last time the model was tuned ittook one month to gather the data and one month to identify themodel. With the new method, the total modeling time was re-duced to just five days; a reduction of over 90%.

This example more than well justifies the use of the asymp-totic covariance theory for assessing the model quality, and wewill now return to the study of the asymptotic covariance matrix.

B. The Asymptotic Covariance Matrix

Often the asymptotic covariance matrix can be written as

(2)

where , for some integer dependingon the model structure, and whose elements belong to andwhere denotes the integral(superscript denotes complex conjugate transpose). The rela-tion (2) will be our standing assumption throughout this paper.

1) Example 1: Consider a system described by the linearregression

(3)

where , where are elements of astationary stochastic process with bounded moments and

, and where is white noise with bounded

0018-9286/$26.00 © 2010 IEEE

Page 2: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

984 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, MAY 2011

moments, variance and independent of . The least-squaresestimate of has asymptotic covariance matrix

(4)

see [22]. If we take to be a Cholesky factor of ,(2) holds for this .

When the regressor vector in Example 1 has some struc-ture, structure can be provided to as well.

2) Example 2: The th order finite impulse response (FIR)model

(5)

with being white noise with bounded moments and vari-ance , corresponds to (3) with

where . Here is thedelay operator: . Now, if is stationary withspectrum , it follows from the multivariable counterpart to[22, Theorem 2.2] that the spectrum for is given by

and, from the definition of a spectrum as the Fourier transformof the correlation sequence, it thus follows that:

(6)

where denotes a stable minimum phase spectral factor ofthe spectrum (so that ). Thus, for an FIR modelstructure with the true system in the model set and with a sta-tionary input sequence, (2) holds with

(7)

Example 2 can be generalized to prediction error modelswhere the true system belongs to the model structure [22]. Theregressor is then replaced by the prediction error gradient.

The primary interest is often not the model parametersthemselves but some “system theoretic” quantity such as

the frequency response, impulse response coefficients, polesand zeros, some norm of the system, or the performance of aclosed loop system where the controller is designed using theidentified model. We will let such a quantity be represented bya differentiable function . Given an estimate

of the true parameter , which we assume belongs to theset of model parameters, a natural estimate of is .This estimator is motivated by that if is asymptoticallyefficient, i.e., it is consistent and its asymptotic covariancematrix equals the Cramér-Rao lower bound, then isalso asymptotically efficient. This follows from the invarianceprinciple, see [41].

It follows from (1) (and some additional mild conditions, see[22]) that

As

where, using Gauss’ approximation formula and (2), it can beshown that

where is the derivative . We shall beslightly more general and allow cases where is singular.Then the parameter estimate is non-unique but there may beother properties that are identifiable from the data. In thatcase, provided that belongs to the column space of , thecorrect variance is given by

(8)

We refer to [42], [43] for details. In the following, we will as-sume that belongs to the column space of and consider(8) as a definition of .

From knowledge of the model structure, the true system andexperimental conditions, the asymptotic covariance ofcan be computed, at least, numerically from (8). However, it isnot straightforward to obtain structural insights into how quanti-ties such as model structure, model order and experimental con-ditions influence the asymptotic covariance matrix from (8). Un-derstanding the nature of this expression for different problemsettings has been subject of rather intense research. An exampleof a well known structural result is that the variance of themodel parameters increases with the model order [44], [45].Another, more recent, result in the same vein has been to de-termine when estimating a transfer function in a multiple inputsingle output system may help improve the quality of estimatesof other transfer functions in the model [46]. Also related to ourwork is the very interesting paper [47] where conditions for theasymptotic covariance matrix to be non-singular are studied.

Most of what is known regarding the structure of (8) pertainsto models of linear time-invariant systems. An exception is[48] where the asymptotic variance for the linear part of aHammerstein model is characterized asymptotically as themodel order tends to infinity and when the linear part has afixed denominator.

Recently, there have been several interesting contributionsto non-asymptotic assessment of confidence regions [3],[49]–[51]. In [52] it is discussed under which conditions theasymptotic theory is valid.

3) Contributions and Outline: This contribution has itsorigin in [53], where exact expressions for the asymptoticvariance of frequency function estimates for LTI models werederived using the theory of reproducing kernels. It was in thiswork that the importance of the subspace generated by the rowsof was recognized. Our work can be seen as an extension ofthe work in [53] to general system properties (represented bythe function ) and a more general class of model structures

Page 3: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

HJALMARSSON AND MÅRTENSSON: A GEOMETRIC APPROACH TO VARIANCE ANALYSIS IN SYSTEM IDENTIFICATION 985

using new techniques which deepens the geometric interpreta-tion of (8) initiated in [53]. Related to our work is [54], wherea geometric interpretation is given to the Cramér-Rao lowerbound for the special case where the estimate of the parametervector is based on an observation of a normal distributedvariable with mean and known covariance. An approachsimilar to [54] is taken in [55]. A preliminary version of thispaper has appeared as [56].

More precisely, the contributions of this paper are:i) Section II: A geometric interpretation of the asymptotic

variance (8). The results referred to above are by nomeans obvious by studying the initial expression (8), in-stead many of the results are proved using rather intricatematrix manipulations. A new geometric interpretationis presented which facilitates the understanding of howdifferent quantities such as model structure, model order,input spectra and the quantity of interest, influence theasymptotic accuracy.

ii) Section III: General structural results. Here we providesome general results concerning what happens with (8)when the structure of changes.

iii) Section IV: Structural results for LTI-systems. In this sec-tion we apply the results of Section III to some structuralproblems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining howsystem complexity impacts the model accuracy. The par-ticular example of non-minimum phase zero identifica-tion is studied and both matrix algebra and the geometricapproach are used so that the reader may compare thesetwo methods. Secondly, we examine the impact differentinputs have whereby we extend results in [46]. Finally,we examine the impact different number of sensors haveon the estimation accuracy.

NOTATION

We will consider vector valued complex functions as row vec-tors and the inner product of two such functions

is defined as wheredenotes the complex conjugate transpose of . When and

are matrix-valued functions, we will still use the notationto denote whenever the dimensionsof and are compatible. The elements of will consist ofthe inner products between the functions corresponding to rowsof and rows of . In particular, is the Gramian matrixof the rows of . The -norm of is given by

. The space consists of all functionssuch that and when , the nota-

tion is simplified to . For ,denotes the th row of . When is partitioned into blocks, with

th block , denotes the function . Sim-ilarly, denotes . If for some pos-itive integers and , then denotes the subspace ofgenerated by the rows of .

For a differentiable function , isa matrix with as th entry, the partialderivative is defined analogously.

With being closed subspaces of a Hilbert space, we useto denote the subspace ,

to denote that is the direct sum of and ,i.e., that is the orthogonal complement of in . We use

to denote the orthogonal complement of with respectto the entire Hilbert space. Furthermore, we define tobe the orthogonal complement of in , i.e.,

.By we mean that is a normal distributed

random vector with mean and covariance matrix . TheMoore-Penrose pseudo-inverse of a matrix is denoted .

II. GENERAL RESULTS

The purpose of this paper is to provide a geometric interpreta-tion of (8). After recalling some standard results in Section II-A,we will in Section II-B show how an isometric isomorphism,between the stochastic model error and a deterministicobject related to , leads to the paper’s main technical result,Theorem II.5. This result will serve as foundation for the se-quel. In Sections II-C–II-E follow some results that can be usedas building blocks when analyzing (8).

Technically, the results concern orthogonal projections in theHilbert space and several results are well known; the maincontribution of this section is the linking of these results to theasymptotic covariance expression (8).

A. Technical Preliminaries

Lemma II.1: Let for some finite positive integersand . Then the number of linearly independent rows of ,

or, equivalently, the dimension of , the subspace of gen-erated by the rows of , is given by the rank of .

Proof: Straightforward.Next, we have the following standard result from Hilbert

space theory.Lemma II.2: Let and let be a closed subspace of

with orthonormal basis , . Then

is said to be the orthogonal projection of on and is the uniquesolution to

Proof: See, e.g., Lemma 6.2.1, Theorem 6.4.2 and The-orem 6.4.4 in [57].

Remark: Notice that with and as in Lemma II.2but with for some positive integer

so we write also in this case .Lemma II.3: Let and . Then the or-

thogonal projection of the rows of on (the subspace ofspanned by the rows of ) is given by

(9)

Page 4: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

986 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, MAY 2011

Furthermore

(10)

Finally, it holds that

(11)

where , for some , is any orthonormal basis for.

Proof: This is a standard result and we only point out thatby taking the inner product of the right-hand side of (9) with it-self, (10) readily follows. Furthermore, the equality in (11) fol-lows using Lemma II.2 when evaluating the left hand side.

Lemma II.4: With and as in Lemma II.3, the rows ofspan a subspace of with dimension equal to the

rank of

(12)

In particular, if and only if .Proof: By writing where has

full (column) rank and with being orthonormal, i.e.,, the result follows straightforwardly from Lemma II.1 and

Lemma II.3.

B. Basic Result

Estimation is closely linked to (orthogonal) projection,cf. optimal filtering and parameter estimation. Considerfor example the least-squares problem in Example 1. With

and , the problembecomes that of finding the best predictor of

, which effectively means that the predictor should be theorthogonal projection of onto the subspace spanned by therows of . The least-squares estimate is given by

(13)

where , and if we introduce the innerproduct in the linear vector space of real valuedrow vectors of dimension , we have from (13) that 1

(14)

which is exactly the right-hand side of (9) so that, i.e., the optimal predictor is indeed the orthogonal

projection of onto the linear subspace spanned by the rowsof . One may wonder whether the parameter estimate it-self can be related to some projection in a meaningful way. Asit turns out, this can be done for the estimation error .For simplicity of argument we will assume that is determin-istic and that . Under these assumptions and thosemade on the noise in Example 1, the elements of

1Notice that for� � and� � , ����� � �� is a matrixconsisting of the inner products between the rows of � and the rows of � .

are random variables with zero mean and finite second ordermoments so we can consider them as elements of the Hilbertspace with inner product2 . Notice that

(15)

i.e., the covariance matrix consists of all inner products betweenthe elements of . Furthermore, from (13) it follows thatthis covariance matrix can be expressed as

(16)

We will now argue that we can identify, through an isometricisomorphism, with an element, which itself is a pro-jection, in another Hilbert space which consists of deterministicelements. To this end notice first that the projection (14) gives

This calculation suggests that if we instead of projecting ,project the rows of some with the property that

, we will obtain a matrix3 whoserows are the projection of the rows of onto and for whichit holds that

(17)

Combining (15)–(17) we have shown that

(18)

which means that the elements of (as seen as elementsof the Hilbert space consisting of zero mean random variableswith finite variance and inner product ) canbe related to the rows of (which are seen as elements ofthe Hilbert space of vectors with inner product

) through an isometric isomorphism. This means that thesecond order statistical properties of can be deducedfrom . In particular (15) and (18) give

Furthermore, if we are interested in the variance of a linear com-bination of the parameter estimates, we obtain

for any for which .This leads us to our main technical result which generalizes

the idea above to asymptotic covariance matrices.Theorem II.5: Suppose that is differen-

tiable and let the asymptotic covariance matrix be

2The definition includes a transpose of the first argument so that when � and� are random row vectors with elements of this Hilbert space, then ��� �� �� � � � � is a matrix with the inner products between all elements of � and �.

3Notice that the notation is such that the same formula � ��� ���� ������� � as before gives as result a matrix whose rows are the rowsof � projected on � .

Page 5: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

HJALMARSSON AND MÅRTENSSON: A GEOMETRIC APPROACH TO VARIANCE ANALYSIS IN SYSTEM IDENTIFICATION 987

defined by (8) where . Suppose that issuch that

(19)

then

(20)

where is the subspace of spanned by the rows of .In particular, when is scalar

(21)

Proof: Follows directly from (8) and Lemma II.3.The relations (19)–(21) provide a geometric interpretation of

the asymptotic covariance matrix. Expression (21) shows thatthe asymptotic variance of is given by the squared normof the projection of onto the subspace spanned by the rows of

in (8). Notice that typically depends on the model struc-ture, operating conditions (such as input and noise excitationsand feedback configuration) but is independent of the quan-tity of interest . The link between and is given bythrough (19). The geometric interpretation together with thesetwo observations can provide useful insights, especially whencomparing asymptotic covariances for two different scenarios.For example when is changed, the function which is pro-jected will change, whereas the subspace onto which the pro-jection takes place will remain the same. Furthermore there areseveral situations when structural changes in the identificationproblem result in a change of the subspace whereas the func-tion to be projected can be kept the same, see Section II-Dfor further discussion. Qualitative information such as whichstructure has higher asymptotic covariance can then sometimesbe obtained by studying the corresponding subspaces withoutever computing the function to be projected. The remainingsections of the paper explore this possibility. For example, inSection IV-A we encounter a case where a whole range of modelstructures can be compared. However, before embarking on thistrack we provide some basic results that will be useful in thesequel.

C. Bounds

It may be cumbersome to compute the projection in (20).Lower and upper bounds can then be computed using the fol-lowing lemma.

Lemma II.6: Let and be two closed subspaces ofsuch that and let .

It holds that

(22)

Proof: By definition it follows thatand since we have . Thus,

and by taking the inner product of eachside of the equation with itself, the result (22) follows.

Upper bounds for (20) are obtained by taking andsuch that the projection is easy to compute, for ex-

ample is one possibility. Lower bounds can be ob-tained by taking and by choosing suitably.

The next lemma is useful for characterizing whenand are the spans of the rows of different -functions.

Lemma II.7: Let , and forsome positive integers , and . Then

(23)

and

(24)

where .Proof: is the set of -functions in

that are orthogonal to . This must thus be a subset of. But and

and (23) follows.Since

(25)

Now we make the orthogonal decomposition of intowhere the first term belongs to

and where the second term is orthogonal to . Bycombining these observations with (25) the result (24) follows.

D. Existence of Suitable Functions

The applicability of Theorem II.5 for rewriting the asymptoticcovariance matrix as (20) hinges on the existence of an -func-tion such that (19) holds. The next lemma shows that there isa whole family of such functions.

Lemma II.8: Let and let be such thatits columns are in the column space of .

Then all solutions to the equation aregiven by , where is any -functionorthogonal to .

Proof: Straightforward.The possibility to include an arbitrary term (as long as it

is orthogonal to ) in opens up the possibility of using thesame for different model structures. When this is possible, itwill only be the subspaces onto which the projections take placethat differ. This can facilitate comparisons significantly. For ex-ample, when two model structures are nested in the sense thatone subspace is a subset of the other, then Lemma II.6 immedi-ately gives that the asymptotic covariance for the structure cor-responding to the larger subspace will be no less than the otherone. We refer to Section III-B for more details on this. Anotheradvantage of this degree of freedom in the choice of is that itmay be used so that an upper bound (obtained using Lemma II.6)becomes tractable to compute. We will not pursue this idea here(due to space limitations), instead we refer to [58], [59] wherethis idea is used to obtain model structure independent upperbounds when the underlying system is linear time invariant.

E. Explicit Expression for the Projection

In some cases it is easy to compute the projectionexplicitly.

Page 6: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

988 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, MAY 2011

Lemma II.9: Let , and be as in Theorem II.5 andsuppose that for some and .Let , , be an orthonormal basis for . Then(20) can be expressed as

Proof: Let . Then there exists a full(column) rank such that . Inserting thisin and some straightforward algebra gives theresult.

III. STRUCTURAL RESULTS

In this section we will consider structural properties of theasymptotic covariance (8). We will derive expressions for how(8) changes when (and ) changes.

A. Introduction

We will use some simple examples to illustrate how structuralchanges affect the estimation accuracy. The purpose is to illu-minate the basic principles rather than the final expressions forthe asymptotic variances (which in these simple examples canbe derived in simpler ways).

Consider again the simple FIR model (5) in Example 2, wherewe for simplicity assume the noise variance . One ex-ample of a structural property of the asymptotic covariance ishow the variance of a particular parameter (or set of parameters)changes when the system order changes. Let us first assume thatthe true system order is , i.e.,

Following (7) and taking

(26)

we have from (4) and (6) that the asymptotic covariance of theestimate of is given by where

. If now the system instead is of order ,changes to

(27)

with associated asymptotic covariance

Since we see that thevariance of the -estimate increases when two parameters haveto be estimated (unless in which the variances becomeequal). Using Schur-complements, this well known fact can begeneralized to higher dimensions, see, e.g., Example 9.1 in [60].

Now, if we take a look at what happens structurally when thenumber of estimated parameters changes we see that the onlychange is that rows are added to (compare (26) to (27)). Weconclude that when rows are added to there is an informationdecrease regarding the original parameters.

Now we will examine how the information can be increased.One obvious method is to increase the excitation. Suppose thata second input (independent of ) can be used to excite thesystem, leading to the model

The asymptotic variance is now determined by

(28)

resulting in

(29)

where . Thus, we have shown the obvious factthat the second input will increase the information in the dataand thus improve the estimate.

Another way to increase the information contents is to usemore sensors. Suppose that we can use also measurements from

(30)

where is also zero mean white noise with unit variance (anduncorrelated with ) and where is a measured input, e.g.,

. Now the asymptotic variance of the least-squares esti-mate is again determined by (28) (with exchanged for ),so that the variance is given by (29). From the last two simple ex-amples we conclude that there is an information increase whencolumns are added to .

When both rows and columns are added there will be a jointeffect of information decrease and increase and one objectiveof this section is to provide tools for analyzing this situation.Another objective will be to derive general conditions for whenthe asymptotic variance remains constant to structural changes.This is instrumental when understanding the limitations thatneed to be considered when complex systems are identified aswell as how such systems should be identified. This is also im-portant in order to understand when adding new actuators andsensors to a system is beneficial from a system identificationpoint of view.

B. A Comparison Theorem

The following theorem is a technical result for comparisonof asymptotic covariances for two different structures. We willillustrate its use in ensuing subsections. In Section IV, these re-sults will be used for structural analysis when identifying linearsystems.

Theorem III.1: Let , andfor some positive integers , , , and . Define

(31)

Page 7: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

HJALMARSSON AND MÅRTENSSON: A GEOMETRIC APPROACH TO VARIANCE ANALYSIS IN SYSTEM IDENTIFICATION 989

and

(32)

Suppose .Then

(33)

and

(34)

Furthermore,

(35)

Proof: The result (33) is an almost immediate consequenceof Lemma II.6. We have

, but since we get

(36)

which gives (34) upon observing that (thisfollows from Lemma II.7). Taking the inner product of (36) withitself gives (33) due to that . Finally, (35) follows fromLemma II.1 and Lemma II.4.

Theorem III.1 explores the possibility of using the same func-tion to be projected for two different structures that was dis-cussed below Theorem II.5 and in Section II-D. Suppose that weare to estimate some quantity of a system and that we are giventwo identification settings (including the experimental condi-tions, model structure, etc) which correspond to ,

, 2, respectively, in the asymptotic covariance expression(8). When one can find such that , ,2 are the derivatives of , the quantity of interest, correspondingto , , 2 then, since , Theorem III.1 shows that theasymptotic covariance for is no larger than that for .

In the proof of Theorem III.1 it was noticed that. Thus, (33) and (34) show that how much differs fromis important for how big the difference in asymptotic

covariance will be. For example, when thenand regardless of . Furthermore, (35) gives

that the rank of is upper bounded by .One way to explicitly compute the right-hand side of (33) is

to express as using LemmaII.3 and then to use (10) in the same lemma.

C. Structure Extension

In Section III-A we saw that an information decrease occurswhen rows are added to and an information increase occurswhen columns are added. For given ,we are therefore interested in comparing the asymptotic covari-ance when and when

(37)

In order to work in (row) spaces of the same dimension, weextend the single blocks of with zeros into

(38)

Notice that so whether we use oris immaterial when we are computing asymptotic covari-

ances. We will thus compare

(39)

where we will comment on the function inSection III-D. By choosing either and or

and in Theorem III.1 we obtain results forwhen there will be an information decrease or an information in-crease, respectively, when is augmented from to . Byusing the structure that is an extension of we can pro-vide more detailed expressions for the projections that appear inTheorem III.1. This we do next.

1) Information Decrease: We start with the case whereand . Since in Theorem III.1 is positive (semi-

)definite, this corresponds to the case where augmenting fromto will result in a net decrease of information.

Theorem III.2: This is an application of Theorem III.1 forand according to (37)–(38). Let

and let for some inte-gers , , , and . Suppose that

. Defineand let be as in (32).

Then(40)

and

(41)

where .Furthermore,

(42)

and, finally, defined in (31) satisfies

(43)

Proof: First (40) and (42) follow from the orthogonal de-composition

. Furthermore, it follows from the decomposition above that, which, combined with Lemma

II.4, gives (43). Finally, (41) follows from

where the last equality is due to Lemma II.3.One case of particular interest is when and ,

i.e., when only rows are added to when is formed. We

Page 8: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

990 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, MAY 2011

will return to this setup in Section IV-A; here we only noticethat (41) simplifies to

so that (35) and (43) give the conditionfor the difference in (31) to

have zero rank.2) Information Increase: Taking and

in Theorem III.1 corresponds to the case where there is an in-formation increase when the structure is extended. From The-orem III.1 it follows that in this case it is the properties of thespace that will be important. To this end, for any

function let denote the projection of on , e.g.,

.

Theorem III.3: This is an application of Theorem III.1 forand according to (37)–(38). Letand let for some in-

tegers , , , and . Supposethat and let be defined by (32). Suppose inaddition that . Then

(44)

Furthermore

(45)

and

(46)

where .Furthermore,

(47)

and, finally defined in (31) satisfies

(48)

Proof: From the condition it followsthat and from the orthogonality

between and we get which

gives (44). The result (45) follows from (34) if we make thedecomposition and use the orthogonalitybetween and . From the same decomposition and the factthat we have

(49)

Now we observe that so that. Furthermore, both and

since they are orthogonal to both and . Insertingthis in (49) gives (46) and then (48) follows from a standard rankinequality .

Finally, (47) follows from the following decompositions:

which gives that .The expressions (44)–(48) can be used in Theorem III.1 to

quantify the decrease in the asymptotic variance when is ex-tended from to . When , a simple inter-pretation of the rank result (47) in Theorem III.3 can be given.Then in (47) and (48) we get

However, so

and we arrive at the interpretation that

equals the degrees of freedom offered by adding

to reduced by the degrees of freedom thathave to be used for the new rows that have been added. Inparticular, . Furthermore, since

and (recall that and arethe number of rows of and , respectively), it follows that

when .We summarize this.

Lemma III.4: Let the notation and assumptions in TheoremIII.3 be in force as well as the condition .

Theni)

ii) when.

Notice that i) in Lemma III.4 is a simple condition for whenthere will be no information increase no matter what is.

D. How to Use the Results to Compare Asymptotic CovarianceMatrices

The intended use of the results above is to compare covari-ances given by . Of course, a crucialaspect for the usefulness of Theorem III.1 is how easy it is toestablish the existence of a so that is satisfied forpre-specified and . Following the discussionaround the expression (8) in the Introduction, we require thatand belong to the column spaces of and ,respectively, which also ensures the existence of functionsand such that and . Lemma II.8gives the general solutions

(50)

When applying Theorem III.1 it is required thatand also that and that limits the possible

Page 9: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

HJALMARSSON AND MÅRTENSSON: A GEOMETRIC APPROACH TO VARIANCE ANALYSIS IN SYSTEM IDENTIFICATION 991

combinations of and . The latter condition is equivalentto

which implies

(51)

and hence while can be chosen freely, is directly deter-mined by . The existence of a solution to (50) for

follows from (51) since we then have

and the general solution to the joint problemis given by

(52)

That cannot be found so that is satisfied forarbitrary and may seem as a severe limitation. However,it is quite natural since for given and

does not hold for arbitrary and .One special case which sometimes is studied is when

for some , in which case it is required that. Condition (51) is then auto-

matically fulfilled

An example of this is in Theorem III.2 in the case when

, , and i.e., when

. Thus, when only rows are added to , TheoremIII.1 and Theorem III.2 can be used to compare the covarianceswhen is an extension of .

Another example of how the relations between and arerestricted is in Theorem III.3 where we have the condition (44)

which implies .

IV. STRUCTURAL RESULTS FOR LTI SYSTEMS

We will now use the results in the preceding sections to an-alyze some structural issues that arise when identifying LTIsystems.

A. Influence of Model Complexity

For simplicity we will assume that the true system is of FIRtype, i.e., the system is described by (5), with noise variance

, and that it is operated in open loop. We recall that

the asymptotic covariance matrix of the parameter estimates isgiven by (2) with given by (7).

Consider the problem of estimating a real-valued non-min-imum phase (NMP) zero located at . We are interestedin understanding how the asymptotic variance of the corre-sponding zero estimate, obtained from an FIR model, dependson the model complexity, as represented by the order whichhere is assumed to be larger than or equal to the system order

, (meaning that the true system always is in the model set).It is well known that the asymptotic variances of individualparameter estimates do not decrease when the model order isincreased, compare with the discussion in Section III-A. Thesame actually holds for the variance of a zero estimate, butthis fact is not immediately evident, (at least not to us), sincea zero depends on all parameters in a rather complicated way.The usual way to show this is by matrix algebra using Schurcomplements, a routine that will be explained in Section IV-A-I.

We will also ask the question whether there may be exper-imental conditions such that the asymptotic variance becomesindependent of the model order . This issue is of independentinterest since the system order is typically not known before-hand. We will analyze these two questions using both matrixalgebra and the geometric approach presented in this paper.

It is straightforward to show that the asymptotic variance ofthe estimate of a zero at is given by (8) with(recall that is defined in Example 2) where the constant ,which will be of no concern for us, depends on the true systembut not on the model order , see [61], [62] for details. Theasymptotic variance of the estimate of is thus given by

(53)

where is a stable minimum-phase spectral factor of the input.Introducing , the asymp-totic variances for model orders and are given by

(54)

and

(55)

respectively, with obvious definitions of the matrices , and.1) Analysis Using Matrix Algebra: Using a standard expres-

sion for the inverse of a block matrix (63), we can write

(56)

where is a Schur complement. However,the block matrix in the last term of (56) can be factorized. This

Page 10: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

992 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, MAY 2011

gives the following expression for the variance when the modelorder is

(57)

Since the first term in (57) corresponds to the asymptotic vari-ance when the model order is and the second term is non-nega-tive definite, we have shown that the variance is non-decreasingwith the model order.

We now turn to the question of whether the variance can beindependent of the model order . For this to hold, the secondterm in (57) has to be identically zero, and this has to hold re-gardless of . This implies the condition

(58)

which can be expressed as

(59)

where and are Toeplitz matrices

.... . .

......

. . ....

where . Thus, if the second order properties ofthe input are such that (59) is satisfied for any then theasymptotic variance is independent of the model order .

We now outline how one can determine if there existsuch that (59) is satisfied. First we determine all

possible such that

(60)

Note that (60) defines constraints on the pa-rameters in . The set of allowable parameters of :s can thenbe parametrized by an dimensional sub-space of . Using this parametrization and solving thelinear system of equations

(61)

for all in this subspace with respect to willproduce all possible (if any) solutions to (59). Fora fixed , the relation (61) defines constraints on the

parameters of the matrices and .Compounding factors when solving this problem are that only

those solutions which correspond to bona fide autocorrelationsequences are allowed and, furthermore, that we are looking forsolutions which are valid regardless of the particular value of

. All in all, this gives rise to a rather complex problem andeven though the solution can be obtained we will not make thecomplete derivation. Instead we will analyze the problem usingthe geometric tools that have been presented.

2) Geometric Analysis: Lemma II.8 gives that there existssuch that

(62)

But with and , then (33) in TheoremIII.1 gives that the asymptotic variance for the zero estimateis non-decreasing with model order and that the difference be-tween (55) and (54) is given by

(63)

Notice that we did not have to compute explicitly in order toarrive at this result and that the result holds for all and ,

.We now turn to the question of whether the variance can be

independent of the model order . We notice that if can betaken to belong to then the projection of on willbe zero. For example we can examine if (thefirst element of , which clearly belongs to ) is possible.

The conditions (62) that has to satisfy can be expressed as

Notice that these are constraints on the first coefficients in theseries expansion of the input spectrum. For example, the inputspectrum

(64)

satisfies (62). In fact, this spectrum satisfies (62) regardless ofthe order .

Thus, using the geometric approach, we have with little ef-fort shown that the asymptotic variance of an estimate of a NMPzero does not depend on the model complexity if the input spec-trum has a single pole at the zero itself (and at its mirror imagein the unit circle). The intuition is that this input spectrum pro-vides an infinite signal to noise ratio at the zero in question,since , which makes the estimate independent ofthe system complexity. Compare this with the well known factthat a sinusoid with frequency results in the asymptotic vari-ance of the frequency function estimate at being independentof the system complexity, see, e.g., [8]. The analysis presentedhere can be generalized and we refer the reader to [64]–[66] fordetails. The result is illustrated with a simulation example.

3) Example 3: In this example the identification of the zeroof the system is considered. The

model is where , 2, 3,10. White noise with variance 0.01 is added to the output and theinput is a unit variance white noise sequence, filtered by

. The pole of the filter is varied between

Page 11: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

HJALMARSSON AND MÅRTENSSON: A GEOMETRIC APPROACH TO VARIANCE ANALYSIS IN SYSTEM IDENTIFICATION 993

Fig. 1. Normalized sample variance from the numerical simulations ofExample 3.

and 0.9 in steps of 0.1 and the value corresponding to (64)is . Sequences of length of the two whiteinputs are drawn from a Gaussian distribution and the outputis simulated for the different pole locations . For each data setthe four models with different are estimated. The simulation isrepeated 25.000 times with different noise realizations. In Fig. 1the sample variance of the estimated zero, normalized by , isplotted against the pole location of the input filter. The gain

of the input filter is adjusted so that the asymptotic varianceof the zero, see (53), is equal for all when . Whenincreases to 10, the normalized sample variance increases withas much as 80% (for ), but when the increaseis only 20%. The asymptotic theory says that there should be noincrease at all for . This is not true when the sample sizeis small (as in this example), but note that the minimum varianceis achieved for . This example shows that the asymptoticresults also are useful when the sample size is small.

B. Influence of Inputs

Next we will consider the problem of when adding inputs mayincrease the information. The following result shows that thereare some limitations.

Theorem IV.1: Let

where . Suppose thatand . Let . Then

(65)

and the rank of the left-hand side is given by

(66)Proof: See Appendix.

We illustrate the use of Theorem IV.1 with an example.

1) Example 4: Consider the linear regression model

(67)

where is stationary. The question now is if the use of a secondinput , which we assume also to be stationary and uncorrelatedwith , will reduce the asymptotic variance of the estimate of .

Denoting by and , the stable minimum phase spectralfactors of the spectra of and , we have that the asymptoticcovariance matrix of the estimate of is given by(2) with

when only is used, and by

when also is used. Identifying and, Theorem IV.1 gives that the asymptotic

variance of the estimate of will decrease when is used onlyif

(68)For example, if is white noise, there will be no improvement.

For the readers’ benefit we provide the expression for here.The comparison that is made is between and

for a such that. From (52) we get where

is orthogonal to . If we letwe can express as

In [46] it is analyzed, for linear time-invariant MISO sys-tems, when a particular input improves the accuracy of a cer-tain parameter estimate. The rank result above gives more pre-cise information regarding the possible improvements than canbe obtained as compared with Theorem 4 in [46]. In particular,the example above illustrates that not only the connectedness ofthe transfer functions (see [46] for a definition) are importantfor when an added input will improve the estimate of a param-eter not directly connected to it, also the input properties areimportant.

C. Influence of Sensors

We will now turn to a problem that in a sense is dual to theproblem studied in Section IV-B. Consider a system that canbe modelled by an output error model

(69)

where and where is white noise with. The issue is now whether the accuracy of the es-

timated parameters can be improved by adding a second

Page 12: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

994 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, MAY 2011

sensor to the system. The new sensor will give measurementsaccording to

(70)

where is white noise and independent of with, and where represents unknown sensor dynamics

that also have to be estimated. The sensor dynamics is modeledby

(71)

where . As the signal contains informationabout it may seem obvious that a second sensor would im-prove the accuracy but let us analyze the problem using the toolswe have developed in this paper.

We first study the case where only is used. Then the outputerror model (69) will have asymptotic parameter covariancegiven by (2) with

(72)

where is the spectral factor of the input.When a second sensor is added and is estimated together

with , the maximum likelihood criterion is given by

(73)and it follows, see [67], that the asymptotic parameter covari-ance is given by (2) with

(74)

It is easy to verify that the condition (51) holds for, with as in

(72) and with as in (74). This ensures that thereexists a such that and

. The expression for is given by (52)

Now taking and in Theorem III.1 (asin Section III-C-II) and then referring to i) in Lemma III.4 nowgives the following result.

Theorem IV.2: Consider the system

where is white noise with .Let the system be modelled by the output error model (69)

which we assume to be globally identifiable. Assume also thatthe input signal is persistently exciting of order at least equal tothe number of estimated parameters.

Adding the sensor (70), where is white noise withwith variance and independent of and

where is unknown but can be modelled by (71), and esti-mating all unknown parameter using maximum likelihood es-timation does not improve the asymptotic covariance for theestimate of as compared to only using measurements ofif the elements of are spanned by the elements of

.Remark IV.3: With some additional effort, the result of The-

orem IV.2 can be extended to the singular case when .The condition in Theorem IV.2 holds, for example, when

has the same parametrization as and is propor-tional to . The perhaps surprising result from Theorem IV.2is thus that in this situation the extra sensor does not provide anynew information about . Perhaps even more surprising is thatthis holds regardless of the noise variance for the second sensor,i.e., even very high quality measurements from the extra sensorare useless (as measured by the asymptotic covariance matrix)for estimating if also is used.

When the condition in Theorem IV.2 does not hold, there maystill be estimates of certain properties of that does not im-prove when a second sensor is added. Drawing on the resultsabove, variance analysis for cascade systems is presented in[68].

V. CONCLUSIONS

We have in Theorem II.5 presented a geometric interpretationof the asymptotic covariance expression

While all quantitative results concerning this expression, e.g.,those presented in this paper, can equally well be obtained usingmatrix algebra such as Schur complements and the matrix inver-sion lemma, we believe that the geometrical interpretation canbe useful for unveiling structural relationships. In Section IV-Athis was illustrated by showing that a specific choice of inputspectrum makes the asymptotic variance of a NMP zero esti-mate independent of the system complexity. For comparisonpurposes, we also presented a derivation based entirely on ma-trix algebra. It is up to the reader to decide which approach suitshim/her. Furthermore, in Section IV-B we re-visited the problemof determining when new inputs can decrease the asymptoticcovariance. Again using geometric arguments, we were ableto show that the problem is more intricate than previously be-lieved in [46]. We have also analyzed the dual problem of whenadding sensors may help decrease the asymptotic covariancematrix (Section IV-C) and it was shown that there are situationsin which high quality measurements may be useless. We believethat this problem and its generalizations will turn out to be im-portant in distributed sensor networks. Our insights have alreadypaved the way for results concerning cascade systems [68].

We point to two properties that we believe are of key impor-tance in the geometric approach to variance analysis:

i) For qualitative structural analysis, it is often not necessaryto compute such that , it is sufficient toknow that such a exists. Notice that the only occasion inentire Section IV when we needed was in Section IV-A

Page 13: IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 56, NO. 5, … · 2013-10-16 · problems in estimation of linear time invariant (LTI) sys-tems. First we illustrate the method by examining

HJALMARSSON AND MÅRTENSSON: A GEOMETRIC APPROACH TO VARIANCE ANALYSIS IN SYSTEM IDENTIFICATION 995

and then only when we wanted to explicitly determine theinput spectrum.

ii) The freedom in the choice of represented by inLemma II.8. For some problems, where asymptotic co-variance matrices for two (or more) identification setups(or structures) are to be compared, this allows the same

(which still may not have to be computed explicitly) tobe used. Hence, it will only be the subspaces on which

is projected that differ. This will simplify the analysis.All results in Section III are based on this. For example,(62) shows that the same can be used for analyzing thevariance of a NMP zero up to some finite, but arbitraryhigh, system order.Another use of the freedom represented by is that itcan be used together with Lemma II.6 to obtain tractableexplicit upper bounds. We have not explored this possi-bility in this contribution but refer to [58], [59] wheremodel structure independent upper bounds are derived fora number of different estimated system properties whenthe underlying system is linear time invariant.

Our work has its origin in [53], where the importance of thesubspace spanned by the prediction error gradient was recog-nized. Perhaps the most important contribution is the analysistechnique as such. Hopefully the methodology will be of usein future research. It has already been used in [58] for derivingvariance expressions for estimates obtained from single inputsingle output causal finite dimensional LTI systems. We see asinteresting applications to LTI multi-input and multi-output sys-tems and non-linear systems, for which at present the availableresults on the structural properties of the asymptotic covariancematrix are limited. Experiment design is another area where theapproach has potential, e.g., the result in Section IV-A has beengeneralized in [64]–[66].

APPENDIX

PROOF OF THEOREM IV.1

It is easily verified that for any , and Theorem III.3 shows that there exists a such that , where .

Theorem III.1 gives that (65) is positive and that the rank of the left-hand side of (65) is given by the rank of , which in turn, by (46) in Theorem III.3, equals the rank of , where .

Introduce , and , and the notation .

Next, observe that since , with these two terms being orthogonal, and where, by assumption, . Decomposing and observing that gives that

where we have used the same decomposition of as above. Now, since and by assumption, we obtain

Since and by assumption, the rank of this expression equals the rank of , but since by assumption, the rank of this quantity equals the rank of , which according to Lemmas II.1 and II.4 equals the right-most expression in (66).
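The rank bookkeeping in the proof relies on the fact that a covariance-type matrix of the form Lam^T M^{-1} Lam can be written as the Gram matrix of a projection, so the two have equal rank. The following sketch checks this in the same illustrative finite-dimensional setting (names are stand-ins, not the paper's notation), using a deliberately rank-deficient Lam so that the rank question is non-trivial.

import numpy as np

rng = np.random.default_rng(2)

n, m, p = 5, 12, 3
Psi = rng.standard_normal((n, m))
M = Psi @ Psi.T
# Matrix-valued gradient of rank 2 < p, so the covariance matrix is singular.
Lam = rng.standard_normal((n, 2)) @ rng.standard_normal((2, p))

Gamma = np.linalg.lstsq(Psi, Lam, rcond=None)[0]   # satisfies Psi @ Gamma = Lam
P = Psi.T @ np.linalg.solve(M, Psi)                # projector onto row space
proj = P @ Gamma                                   # columns are projections

cov = Lam.T @ np.linalg.solve(M, Lam)              # p x p, singular here
assert np.allclose(cov, proj.T @ proj)             # Gram matrix of proj
assert np.linalg.matrix_rank(cov) == np.linalg.matrix_rank(proj)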

ACKNOWLEDGMENT

The authors would like to thank the six anonymous referees for many constructive comments. In particular, the interpretation at the beginning of Section II-B is due to one of the reviewers. The first author wishes to thank Prof. Brett Ninness, colleague and friend, for a very rewarding collaboration throughout the last decade; it has been instrumental in arriving at the results in this paper.

REFERENCES

[1] J. Antoni and J. Schoukens, “A comprehensive study of the bias and variance of frequency-response-function measurements: Optimal window selection and overlapping strategies,” Automatica, vol. 43, no. 10, pp. 1723–1736, 2007.

[2] X. Bombois, B. Anderson, and M. Gevers, “Mapping parametric confidence ellipsoids to Nyquist plane for linearly parametrized transfer functions,” in Model Identification and Adaptive Control, G. Goodwin, Ed. New York: Springer Verlag, 2000, pp. 53–71.

[3] M. Campi and E. Weyer, “Finite sample properties of system identification methods,” IEEE Trans. Autom. Contr., vol. 47, no. 8, pp. 1329–1334, Aug. 2002.

[4] S. Douma and P. Van den Hof, “Relations between uncertainty structures in identification for robust control,” Automatica, vol. 41, no. 3, pp. 439–457, 2005.

[5] M. Gevers, L. Ljung, and P. Van den Hof, “Asymptotic variance expressions for closed-loop identification,” Automatica, vol. 37, no. 5, pp. 781–786, 2001.

[6] R. Hakvoort and P. Van den Hof, “Identification of probabilistic system uncertainty regions by explicit evaluation of bias and variance errors,” IEEE Trans. Autom. Contr., vol. 42, no. 11, pp. 1516–1528, Nov. 1997.

[7] R. Hildebrand and M. Gevers, “Quantification of the variance of estimated transfer functions in the presence of undermodelling,” in Proc. 13th IFAC Symp. System Identification, Rotterdam, The Netherlands, 2003, pp. 1850–1855.

[8] H. Hjalmarsson, “From experiment design to closed loop control,” Automatica, vol. 41, no. 3, pp. 393–438, Mar. 2005.

[9] L. Ljung, “Perspectives on system identification,” in Proc. 17th IFAC World Congress, Seoul, Korea, Jul. 2008, pp. 7172–7184.

[10] L. Ljung and L. Guo, “The role of model validation for assessing the size of the unmodeled dynamics,” IEEE Trans. Autom. Contr., vol. 42, no. 9, pp. 1230–1239, 1997.

[11] M. Milanese and C. Novara, “Model quality in nonlinear SM identification,” in Proc. 42nd IEEE Conf. Decision Control, Maui, HI, Dec. 2003, pp. 6021–6026.

[12] M. Milanese and A. Vicino, “Optimal estimation theory for dynamic systems with set membership uncertainty: An overview,” Automatica, vol. 27, no. 6, pp. 997–1009, 1991.

[13] M. Milanese and A. Vicino, “H∞ set membership identification: A survey,” Automatica, vol. 41, no. 12, pp. 2019–2032, 2005.

[14] B. Ninness, “On the CRLB for combined model and model order estimation of stationary stochastic processes,” IEEE Signal Process. Lett., vol. 11, pp. 293–296, Feb. 2004.

[15] B. Ninness and G. Goodwin, “Estimation of model quality,” Automatica, vol. 31, no. 12, pp. 32–74, 1995.


[16] B. Ninness and S. Henriksen, “System identification via a computational Bayesian approach,” in Proc. 13th IFAC Symp. System Identification, Rotterdam, The Netherlands, 2003, pp. 785–790.

[17] B. Ninness and H. Hjalmarsson, “On the frequency domain accuracy of closed-loop estimates,” Automatica, vol. 41, no. 7, pp. 1109–1122, 2005.

[18] R. Pintelon, P. Guillaume, and J. Schoukens, “Uncertainty calculation in (operational) modal analysis,” Mech. Syst. Signal Process., vol. 21, no. 6, pp. 2359–2373, 2007.

[19] R. Pintelon, J. Schoukens, and Y. Rolain, “Uncertainty of transfer function modelling using prior estimated noise models,” Automatica, vol. 39, no. 10, pp. 1721–1733, 2003.

[20] W. Reinelt, A. Garulli, and L. Ljung, “Comparing different approaches to model error modeling in robust identification,” Automatica, vol. 38, no. 5, pp. 787–803, May 2002.

[21] B. Wahlberg and L. Ljung, “Hard frequency-domain model error bounds from least-squares like identification techniques,” IEEE Trans. Autom. Contr., vol. 37, no. 7, pp. 900–912, Jul. 1992.

[22] L. Ljung, System Identification: Theory for the User, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1999.

[23] V. Fedorov, Theory of Optimal Experiments, ser. Probability and Mathematical Statistics. New York: Academic, 1972, vol. 12.

[24] G. Goodwin and R. Payne, Dynamic System Identification: Experiment Design and Data Analysis, ser. Mathematics in Science and Engineering. New York: Academic, 1977, vol. 136.

[25] R. Mehra, “Optimal input signals for parameter estimation in dynamic systems—Survey and new results,” IEEE Trans. Autom. Contr., vol. 19, no. 12, pp. 753–768, Dec. 1974.

[26] M. Zarrop, Optimal Experiment Design for Dynamic System Identification, ser. Lecture Notes in Control and Information Sciences, vol. 21. Berlin, Germany: Springer Verlag, 1979.

[27] L. Ljung, “Asymptotic variance expressions for identified black-box transfer function models,” IEEE Trans. Autom. Contr., vol. 30, no. 9, pp. 834–844, Sep. 1985.

[28] U. Forssell and L. Ljung, “Some results on optimal experiment design,” Automatica, vol. 36, no. 5, pp. 749–756, May 2000.

[29] M. Gevers and L. Ljung, “Optimal experiment designs with respect to the intended model application,” Automatica, vol. 22, pp. 543–554, 1986.

[30] H. Hjalmarsson, M. Gevers, and F. D. Bruyne, “For model-based control design, closed-loop identification gives better performance,” Automatica, vol. 32, no. 12, pp. 1659–1673, 1996.

[31] Z. Yuan and L. Ljung, “Unprejudiced optimal open loop input design for identification of transfer functions,” Automatica, vol. 21, no. 6, pp. 697–708, 1985.

[32] G. Belforte and P. Gay, “Optimal experiment design for regression polynomial models identification,” Int. J. Contr., vol. 75, no. 15, pp. 1178–1189, Oct. 2002.

[33] G. Franceschini and S. Macchietto, “Model-based design of experiments for parameter precision: State of the art,” Chem. Eng. Sci., vol. 63, no. 19, pp. 4846–4872, 2008.

[34] C. Jauberthie, L. Denis-Vidal, P. Coton, and G. Joly-Blanchard, “An optimal input design procedure,” Automatica, vol. 42, no. 5, pp. 881–884, 2006.

[35] L. Pronzato, “Adaptive optimization and D-optimum experimental design,” Ann. Statist., vol. 28, no. 6, pp. 1743–1761, 2000.

[36] L. Pronzato, “Optimal experimental design and some related control problems,” Automatica, vol. 44, no. 2, pp. 303–325, 2008.

[37] C. R. Rojas, J. S. Welsh, G. C. Goodwin, and A. Feuer, “Robust optimal experiment design for system identification,” Automatica, vol. 43, no. 6, pp. 993–1008, 2007.

[38] Y. Zhu, Multivariable System Identification for Process Control. Oxford, U.K.: Elsevier Science, 2001.

[39] Y. Zhu, “System identification for process control: Recent experience and outlook,” Int. J. Modelling, Identification Contr., vol. 6, no. 2, pp. 89–103, 2009.

[40] P. Celaya, R. Tkatch, Y. Zhu, and R. Patwardhan, “Closed-loop identification at the Hovensa refinery,” presented at the NPRA Plant Autom. Decision Support Conf., San Antonio, TX, 2004.

[41] P. Zehna, “Invariance of maximum likelihood estimators,” Ann. Math. Statist., vol. 37, p. 744, 1966.

[42] H. Hjalmarsson, “System identification of complex and structured systems,” Eur. J. Control/Plenary Address Eur. Contr. Conf., vol. 15, no. 3–4, pp. 275–310, May–Aug. 2009.

[43] P. Stoica and T. Marzetta, “Parameter estimation problems with singular information matrices,” IEEE Trans. Signal Process., vol. 49, pp. 87–90, Jan. 2001.

[44] G. Box and G. Jenkins, Time Series Analysis, Forecasting and Control. Oakland, CA: Holden-Day, 1976.

[45] I. Gustavsson, L. Ljung, and T. Söderström, “Identification of processes in closed-loop—Identifiability and accuracy aspects,” Automatica, vol. 13, no. 1, pp. 59–75, 1977.

[46] M. Gevers, L. Miskovic, D. Bonvin, and A. Karimi, “Identification of multi-input systems: Variance analysis and input design issues,” Automatica, vol. 42, no. 4, pp. 559–572, 2006.

[47] M. Gevers, A. Bazanella, X. Bombois, and L. Miskovic, “Identification and the information matrix: How to get just sufficiently rich?,” IEEE Trans. Autom. Contr., vol. 54, no. 12, pp. 2828–2840, Dec. 2009.

[48] B. Ninness and S. Gibson, “Quantifying the accuracy of Hammerstein model estimation,” Automatica, vol. 38, no. 12, pp. 2037–2051, 2002.

[49] M. Campi, S. Ooi, and E. Weyer, “Non-asymptotic quality assessment of generalised FIR models with periodic inputs,” Automatica, vol. 40, no. 12, pp. 2029–2041, 2004.

[50] M. Campi and E. Weyer, “Guaranteed non-asymptotic confidence regions in system identification,” Automatica, vol. 41, no. 10, pp. 1751–1764, 2005.

[51] S. Douma and P. Van den Hof, “Probabilistic model uncertainty bounding: An approach with finite-time perspectives,” in Proc. 14th IFAC Symp. System Identification, Newcastle, Australia, Mar. 27–29, 2006, pp. 1021–1026.

[52] S. Garatti, M. Campi, and S. Bittanti, “Assessing the quality of identified models through the asymptotic theory—When is the result reliable?,” Automatica, vol. 40, no. 8, pp. 1319–1332, 2004.

[53] B. Ninness and H. Hjalmarsson, “Variance error quantifications that are exact for finite model order,” IEEE Trans. Autom. Contr., vol. 49, pp. 1275–1291, 2004.

[54] L. Scharf and L. McWhorter, “Geometry of the Cramer-Rao bound,” Signal Process., vol. 31, no. 3, pp. 301–311, 1993.

[55] E. Dilaveroglu, “Simple general expression for Cramér-Rao bound in presence of nuisance parameters,” Electron. Lett., vol. 38, no. 25, pp. 1750–1751, 2002.

[56] H. Hjalmarsson and J. Mårtensson, “A geometric approach to variance analysis in system identification: Theory and nonlinear systems,” in Proc. 46th IEEE Conf. Decision Contr., New Orleans, LA, Dec. 2007, pp. 5092–5097.

[57] A. Friedman, Foundations of Modern Analysis. New York: Dover, 1982.

[58] J. Mårtensson and H. Hjalmarsson, “Variance analysis in SISO linear systems identification,” IEEE Trans. Autom. Contr., 2011, provisionally accepted for publication.

[59] J. Mårtensson and H. Hjalmarsson, “A geometric approach to variance analysis in system identification: Linear time-invariant systems,” in Proc. 46th IEEE Conf. Decision Contr., Dec. 2007, pp. 4269–4274.

[60] H. Hjalmarsson, “Aspects on incomplete modeling in system identification,” Ph.D. dissertation, Linköping University, Linköping, Sweden, 1993.

[61] K. Lindqvist and H. Hjalmarsson, “Identification of performance limitations in control,” in Proc. Eur. Contr. Conf., Porto, Portugal, 2001, pp. 1446–1451.

[62] J. Mårtensson and H. Hjalmarsson, “Variance error quantification for identified poles and zeros,” Automatica, vol. 45, no. 11, pp. 2512–2525, Nov. 2009.

[63] T. Kailath, Linear Systems. Englewood Cliffs, NJ: Prentice-Hall, 1980.

[64] H. Hjalmarsson, J. Mårtensson, and B. Wahlberg, “On some robustness issues in input design,” in Proc. 14th IFAC Symp. Syst. Identification, Mar. 2006, pp. 511–516.

[65] J. Mårtensson and H. Hjalmarsson, “How to make bias and variance errors insensitive to system and model complexity in identification,” IEEE Trans. Autom. Contr., vol. 56, no. 1, pp. 100–112, 2011.

[66] J. Mårtensson, “Geometric Analysis of Stochastic Model Errors in System Identification,” Ph.D. dissertation, KTH, Stockholm, Sweden, 2007 [Online]. Available: http://www.ee.kth.se/~jonas1/thesis.pdf

[67] T. Söderström and P. Stoica, System Identification, ser. Prentice Hall International Series in Systems and Control Engineering, 1989.

[68] B. Wahlberg, H. Hjalmarsson, and J. Mårtensson, “Variance results for identification of cascade systems,” Automatica, vol. 45, no. 6, pp. 1443–1448, Jun. 2009.


Håkan Hjalmarsson (M’98) was born in 1962. He received the M.Sc. degree in electrical engineering, the Licentiate degree, and the Ph.D. degree in automatic control from Linköping University, Sweden, in 1988, 1990, and 1993, respectively.

He has held visiting research positions at the California Institute of Technology, Louvain University, and the University of Newcastle, Australia. His research interests include system identification, signal processing, control and estimation in communication networks, and automated tuning of controllers.

Dr. Hjalmarsson has served as an Associate Editor for Automatica (1996–2001) and the IEEE TRANS. AUTOM. CONTR. (2005–2007), and as Guest Editor for the European Journal of Control and Control Engineering Practice. He is a Professor at the School of Electrical Engineering, KTH, Stockholm, Sweden, and Chair of the IFAC Technical Committee on Modeling, Identification and Signal Processing. In 2001 he received the KTH award for outstanding contribution to undergraduate education.

Jonas Mårtensson was born in 1976. He received the M.Sc. degree in vehicle engineering and the Ph.D. degree in automatic control from KTH Royal Institute of Technology, Stockholm, Sweden, in 2002 and 2007, respectively.

Since 2007 he has been a Postdoctoral research associate at the School of Electrical Engineering at KTH. His research interests include system identification, experiment design, and modeling and control of vehicular systems, with a present focus on internal combustion engines.