
A multi-variate Hammerstein model for processes with input directionality

Gerrit Harnischmacher, Wolfgang Marquardt *

Lehrstuhl für Prozesstechnik, RWTH-Aachen, D-52064 Aachen, Germany

Received 4 August 2006; received in revised form 4 December 2006; accepted 4 December 2006

Abstract

A new formulation of a block-structured model based on the Hammerstein operator is presented for the identification of multi-variate systems with input directionality. In contrast to the existing formulations for multi-variate Hammerstein models, the proposed structure offers the possibility to independently model the dynamic and nonlinear characteristics of the system and at the same time preserves the possibility to use the new efficient algorithms developed for the identification of single input Hammerstein models. Further, the formulation allows for a representation of arbitrary static nonlinear coupling of input variables with a considerably lower number of parameters compared to existing formulations. The new model structure is applied to the identification of a fluid catalytic cracking (FCC) unit and significantly outperforms all previous multi-variate Hammerstein model structures by reducing the prediction error by over 50%.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Hammerstein model; Multi-variable block-oriented model; Nonlinear identification; Block-structured model

1. Introduction

Hammerstein models have been used in systems identification since the 1960s [22] and have been successfully applied to control in fields as different as neuroprosthesis [17] and chemical engineering [30]. They may be viewed as the prototype of the class of block-structured models [24] and were the first for which an efficient identification algorithm existed [22]. Because of the independence of the nonlinear static and linear dynamic elements, the identification problem can be greatly simplified for single-input single-output (SISO) models if the nonlinearity is known [23] or if the characteristics of different excitation signals are exploited [3]. This model structure is particularly attractive for process systems modeling because steady-state information is often available from process design or from historical data [23,27]. In that case, plant tests are only needed to generate data for a standard linear identification experiment. The proposed formulation extends the possibility to use these sources of information to the multi-variate case.

While there is a wealth of papers treating multi-input multi-output (MIMO) Hammerstein models (see Section 2.2), applications of these models are rare. At the same time, the published control applications based on single-input Hammerstein models are inherently multi-variable. This apparent inconsistency may be attributed to the limitations of existing MIMO Hammerstein model formulations. As will be shown in Section 2, existing formulations either do not sufficiently capture the system nonlinearity or lead to very demanding parameter identification problems. A new formulation accounting for these problems as well as the question of the under-modeling of real processes is developed and analyzed in Section 3. Strictly speaking, the new formulation results in a complexity beyond the Hammerstein operator, as most existing MIMO Hammerstein models do. However, it maintains the independence of the linear and nonlinear elements known from SISO Hammerstein models, which can be exploited for identification and control applications, for example, in [17,30]. Section 4 demonstrates the performance of the proposed model structure in a simulation study, before the paper is concluded in Section 5.

* Corresponding author. Tel.: +49 241 8094668; fax: +49 241 8092326. E-mail address: [email protected] (W. Marquardt).

Journal of Process Control 17 (2007) 539-550
www.elsevier.com/locate/jprocont
0959-1524/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jprocont.2006.12.001

2. Background

2.1. The Hammerstein operator

The Hammerstein operator is at the core of Hammerstein modeling. It refers to an integral equation studied by Hammerstein [15] and has been used by Narendra and Gallman [22] in their seminal work on Hammerstein model identification. Narendra and Gallman considered systems that can be expressed as

$$y = H[u(t)] = \int_X K(s)\, f[u(t - s)]\, \mathrm{d}s. \tag{1}$$

This operator defines a single-input single-output, time-invariant, nonlinear impulse response model defined on an input space $X$. Narendra and Gallman [22] also introduced the very intuitive block-diagram notation for the Hammerstein model depicted in Fig. 1, which has since been used extensively in the discussion of block-structured models. Most of the literature on Hammerstein models is concerned with discrete-time models. Using the block-diagram notation of Fig. 1, the scalar discrete-time Hammerstein model is generally defined as

$$y_k = \sum_{i=1}^{n_a} a_i\, y_{k-i} + \sum_{i=0}^{n_b} b_i\, v_{k-i}, \tag{2a}$$

$$v_k = N(u_k), \tag{2b}$$

where $u_k$ and $y_k$ are the input and output sampled at times $t = t_k$, $k = 0, 1, \ldots, K$. $N(u_k): \mathbb{R}^1 \to \mathbb{R}^1$ is an arbitrary, nonlinear, memoryless function mapping the scalar input $u_k$ into the nonmeasurable scalar intermediate variable $v_k$. The parameters $a$ and $b$ in Eq. (2a) define a linear time-invariant dynamic system. The summation index $i$ runs from 1 to $n_a = \dim(a)$ for the delayed outputs and accordingly from 0 to $n_b = \dim(b) - 1$ for the current and delayed inputs. Analogous definitions of the number of summands are used in all subsequent models. An additional term accounting for disturbances at the output may be added [20], but will be omitted here for the sake of simplicity.
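As an illustration of the scalar model (2), the following minimal sketch simulates the recursion in plain Python. It assumes the two sums in (2a) enter with a positive sign and that all samples before $k = 0$ are zero; the function name `simulate_hammerstein` and the example coefficients are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence


def simulate_hammerstein(
    u: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    nonlinearity: Callable[[float], float],
) -> List[float]:
    """Simulate y_k = sum_i a_i*y_{k-i} + sum_i b_i*v_{k-i}, with v_k = N(u_k).

    a holds a_1..a_na, b holds b_0..b_nb; samples before k = 0 are taken as zero.
    """
    v = [nonlinearity(u_k) for u_k in u]  # static nonlinear element, Eq. (2b)
    y: List[float] = []
    for k in range(len(u)):
        y_k = 0.0
        for i, a_i in enumerate(a, start=1):  # delayed outputs
            if k - i >= 0:
                y_k += a_i * y[k - i]
        for i, b_i in enumerate(b):  # current and delayed intermediate values
            if k - i >= 0:
                y_k += b_i * v[k - i]
        y.append(y_k)
    return y


# Unit-gain first-order lag behind a quadratic static map, driven by a step to u = 2.
y = simulate_hammerstein([2.0] * 50, a=[0.5], b=[0.5], nonlinearity=lambda x: x * x)
print(round(y[0], 3), round(y[-1], 3))
```

Because the linear element in this example has unit steady-state gain, the output settles at $N(2) = 4$, which is the sense in which the intermediate variable $v_k$ carries the steady-state information.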

2.2. Multi-variate Hammerstein models

Multi-output Hammerstein models can be designed by simply using one Hammerstein model for each output. In contrast, the design of a multi-input Hammerstein model is not straightforward, as will be shown below. This paper therefore only deals with the multi-input single-output (MISO) case for the sake of simplicity, as it straightforwardly generalizes to the MIMO case. The following considerations therefore deal with identifying multi-variate processes using an appropriate block-structured model.

It follows from (1) that multi-variate formulations of Hammerstein models can consider vector-valued input variables $u$ as well as matrix kernels $K(s)$ and vector functions $f[\cdot]$. Models with matrix kernels $K(s)$ and vector functions $f[\cdot]$ have been introduced for scalar inputs $u$ as Uryson models [12], referring to a corresponding integral equation studied by Uryson, or as generalized Hammerstein models, where each function $f_j[\cdot]$ is a basis function. A multi-input Hammerstein model is defined by an operator of type (1) with a vector-valued input signal $u$. When considering vector-valued input variables, matrix kernels $K(s)$, and vector functions $f[\cdot]$, their respective dimensions are independent degrees of freedom of the model structure.

The SISO Hammerstein model (2) straightforwardly extends to the MISO Hammerstein model

$$y_k = \sum_{i=1}^{n_a} a_i\, y_{k-i} + \sum_{i=0}^{n_b} b_i\, N(u_{k-i}) \tag{RB}$$

considered by Rollins et al. [25,26] by replacing $N(u_k)$ by $N(u_k): \mathbb{R}^{\dim(u)} \to \mathbb{R}^1$. The block-diagram structure is depicted in Fig. 1 (right); we will term it RB model for further reference. However, only a very limited variety of dynamic behavior can be represented, as the dynamic element is of scalar type (see, e.g., the chemical reactor example in [26]). In fact, a linear multi-input model is capable of representing more complex dynamic behavior, namely dynamics which depend on the direction of the input vector. It is well known that such input directionality is a defining characteristic of multi-variate systems. As an example, consider the response of the fluid catalytic cracking (FCC) unit we treat as a sample process in Section 4. Fig. 2 (right) shows the response to subsequent steps in the two input variables. Clearly, the dynamic response to the step at $t_0$ changes qualitatively with the direction of the step in the input space. To show that this is indeed a multi-variate effect and not a nonlinear effect, the other input is stepped at $t_{150}$, showing that this qualitative difference in the response is independent of the starting point.

To address the problem of input directionality, a multi-input Hammerstein model based on separate nonlinearities,

$$y_k = \sum_{j=1}^{n_u} y_{j,k}, \tag{KUa}$$

$$y_{j,k} = \sum_{i=1}^{n_{a_j}} a_{j,i}\, y_{j,k-i} + \sum_{i=0}^{n_{b_j}} b_{j,i}\, N_j(u_{j,k-i}) \tag{KUb}$$

has been introduced by Kortmann and Unbehauen [20]. Its block-diagram structure is depicted in Fig. 3 and we will term it KU model for further reference. It consists of a set of $n_u = \dim(u)$ scalar Hammerstein models containing scalar nonlinear maps $N_j(u_j): \mathbb{R}^1 \to \mathbb{R}^1$ driven by scalar inputs $u_j$, where the output is calculated as the sum of the outputs $y_j$ of the model in channel $j$. The KU model can represent input-directional dynamics via the linear systems differing for each channel, but it cannot represent any nonlinear coupling among the input variables. Such nonlinear coupling is another defining characteristic of nonlinear multi-variable processes. Again, consider the FCC unit example, the steady-state gain surface of which is also depicted in Fig. 2 (left). With nonlinear gains modeled independently in each direction, slope and curvature of the gain functions in one direction would be independent of the other input. This is clearly not the case in the real process. Consequently, Eskinat et al. [11] found the capabilities of the KU model to be limited when applied to a real process. The strength of the model, however, lies in the ease of identification. Each channel of the model contains a scalar Hammerstein model. Hence, all techniques for scalar Hammerstein model identification remain applicable. The KU model has been extended to parallel channels of Wiener-Hammerstein systems [4] and to networks of scalar block-structured models [8]. State-space rather than input-output representations have also been used for the linear element [21].

Fig. 1. Discrete-time SISO Hammerstein model (left, cf. (2)) and MISO Hammerstein model (right, cf. (RB)) in block-diagram notation: $N(\cdot)$ static nonlinear element, $L$ linear dynamic element.
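The limitation of the KU structure can be made concrete with a small numerical check. Assuming each linear channel has unit steady-state gain, the static map of a KU model is a sum of scalar functions, and the mixed second difference of any such sum vanishes. The sketch below compares this with a coupled map; both example maps and all function names are hypothetical and chosen only for illustration.

```python
from typing import Callable


def mixed_difference(
    f: Callable[[float, float], float],
    u1: float, u2: float, h1: float, h2: float,
) -> float:
    """Second-order mixed difference of a two-input static map.

    It vanishes for every additive map f(u1, u2) = N1(u1) + N2(u2).
    """
    return f(u1 + h1, u2 + h2) - f(u1 + h1, u2) - f(u1, u2 + h2) + f(u1, u2)


def ku_static_map(u1: float, u2: float) -> float:
    # Hypothetical KU-type steady state: one scalar nonlinearity per input.
    return u1 ** 2 + 3.0 * u2 ** 3


def coupled_static_map(u1: float, u2: float) -> float:
    # Hypothetical coupled steady state that no additive map can reproduce.
    return u1 * u2


ku_coupling = mixed_difference(ku_static_map, 1.0, 2.0, 0.5, 0.25)
true_coupling = mixed_difference(coupled_static_map, 1.0, 2.0, 0.5, 0.25)
print(ku_coupling, true_coupling)
```

For the coupled map the mixed difference equals $h_1 h_2$, so the slope in one input direction depends on the other input; an additive map cannot reproduce this regardless of how the scalar nonlinearities are chosen.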

A MISO Hammerstein model based on a combined nonlinearity,

$$y_k = \sum_{j=1}^{n_u} y_{j,k}, \tag{EJLa}$$

$$y_{j,k} = \sum_{i=1}^{n_{a_j}} a_{j,i}\, y_{j,k-i} + \sum_{i=0}^{n_{b_j}} b_{j,i}\, N_j(u_{k-i}) \tag{EJLb}$$

has been introduced by Eskinat et al. [11]. Its block-diagram structure is depicted in Fig. 3 (right); we will term it EJL model for further reference. While the authors consider $\dim(u)$ nonlinear functions $N_j(u): \mathbb{R}^{\dim(u)} \to \mathbb{R}^1$, the model structure has later been extended to an arbitrary number of nonlinear elements, independent of $\dim(u)$ [21,28], making the structure even more similar to the Uryson model discussed below. Arbitrary basis functions [14], neural networks [27], and support vector machines [13] were proposed for the nonlinear element. State-space models have also been employed for the linear element [21]. In contrast to the KU model, the EJL model can represent the nonlinear coupling of input variables. Eskinat et al. [11] found it to be a much better representation of the multi-variable distillation column they studied.

Fig. 2. FCC unit steady-state gain function (left) and step responses (right, solid line: $R_{ai} \to R_{rc}$, dotted line: $R_{rc} \to R_{ai}$). Axes: $R_{rc}$ [ton/min], $R_{ai}$ [Mlb/hr], $T_{ra}$ [°F].

Fig. 3. Block diagrams of KU model (left, cf. (KU)) and EJL model (right, cf. (EJL)).

Based on the model structure, the EJL model should more accurately be termed a multi-input Uryson model [12]. The Uryson model is defined as a set of $n_N$ parallel Hammerstein models, with different nonlinear and linear elements in each channel but driven by the same input. These properties hold for the EJL model, except that $n_N$ was originally restricted to $\dim(u)$. However, it is a straightforward exercise to transform the EJL model into a Uryson model by developing the nonlinear element $N(\cdot)$ into a basis function expansion. Then, a Uryson model containing one of the $n_N$ basis functions $f_j$ in each channel

$$y_k = \sum_{j=1}^{n_N} y_{j,k}, \tag{5a}$$

$$y_{j,k} = \sum_{i=1}^{n_{a_j}} a_{j,i}\, y_{j,k-i} + \sum_{i=0}^{n_{b_j}} b_{j,i}\, f_j(u_{k-i}) \tag{5b}$$

is straightforwardly obtained. Note that the parameters $a_j$ and $b_j$ in (5b) are fundamentally different from the respective parameters in (EJLb).

The structure of the EJL model implies that identification concepts based on an independent identification of the static and dynamic elements (e.g., [3,23,25]) are not applicable. To identify the linear element, an input signal would be required which excites only one $f_j(\cdot)$ in (5b) or one $N_j(\cdot)$ in (EJLb). Since all nonlinear elements are driven by the same input, this is generally not possible. Instead, if a suitable set of basis functions is known, the overparameterization method of Chang and Luus [7] straightforwardly extends to the multi-variate case: if $f_j(\cdot)$ are known basis functions, the Uryson model (5) is linear in its identifiable parameters and hence can be obtained by linear identification. This method has been used in combination with different sets of basis functions, for example [13,27]. However, for large sets of basis functions, for example, neural networks, this leads to challenging identification problems. In summary, while offering the greatest modeling flexibility of the three existing multi-variate Hammerstein structures, the EJL model loses the main advantage of Hammerstein modeling, namely the ease of identification.

3. Hammerstein model for input-directional dynamics

In this section we develop a new multi-variate Hammerstein model structure. It is designed such that the nonlinear element $N(\cdot)$ can be an arbitrary nonlinear function representing the steady-state behavior of the process, which can then be identified independently using the method of Rollins et al. [25]. For this method to be applicable, we assume the steady-state gain of the system to be different from zero, which is the case for most physical systems, but may exclude some cases, such as certain electrical circuits. We decouple the dynamic response of the model with respect to all inputs, such that the linear elements can be identified independently using the method of Bai [3].

3.1. Model derivation

We start our model derivation from a MISO Hammerstein model of type (RB). We reformulate the linear model of (RB) to the parallel structure

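$$y_k = y_{S,k} + y_{D,k}, \tag{6a}$$

$$y_{S,k} = N(u_k), \tag{6b}$$

$$y_{D,k} = \sum_{i=1}^{n_{aD}} a_{D,i}\, y_{D,k-i} + \sum_{i=0}^{n_{bD}} b_{D,i}\, N(u_{k-i}) \tag{6c}$$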

comprising a static channel $y_S$ and a dynamic channel $y_D$, where $a_D$ and $b_D$ can be determined analytically to match the linear element of (RB). The static channel (6b) completely defines the static behavior of the model including the nonlinear coupling of input variables, while (6c) defines the dynamic deviation thereof. As a consequence of this reformulation

$$\sum_{i=0}^{n_{bD}} b_{D,i} = 0 \tag{7}$$

holds for arbitrary values of the parameters $a$ and $b$.

To decouple the response of the dynamic channel (6c) with respect to its scalar inputs $u_j$, we represent the delayed inputs $N(u_{k-i})$ to the linear element (6c) by the Taylor-series expansion of $N(\cdot)$ at $u_k$:

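$$N(u_k + \Delta u_{k-i}) = N(u_k) + \sum_{l=1}^{\infty} \frac{1}{l!} \left( \sum_{j=1}^{n_u} \Delta u_{j,k-i}\, \frac{\partial}{\partial u_j} \right)^{l} N(u) \,\Bigg|_{u=u_k} \tag{8}$$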

with $\Delta u_{j,k-i}$ being the $j$th element of

$$\Delta u_{k-i} = u_{k-i} - u_k. \tag{9}$$

The products of the differential operators in (8) are to be interpreted as higher order differentiation operators, for example,

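$$\frac{\partial}{\partial u_{j_1}} \frac{\partial}{\partial u_{j_2}} N(u) \,\Bigg|_{u=u_k} = \frac{\partial^2 N(u)}{\partial u_{j_1}\, \partial u_{j_2}} \,\Bigg|_{u=u_k}. \tag{10}$$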

Next to the term $N(u_k)$, (8) contains terms depending only on one input $\Delta u_{j,k-i}$ as well as coupling terms, that is, terms with $j_1 \neq j_2$. We pragmatically neglect the latter and discuss the consequences below. This reduces the dynamic channel to

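$$y_{D,k} = \sum_{i=1}^{n_{aD}} a_{D,i}\, y_{D,k-i} + \sum_{i=0}^{n_{bD}} b_{D,i}\, v_{D,k-i}, \tag{11a}$$

$$v_{D,k-i} = N(u_k) + \sum_{j=1}^{n_u} \sum_{l=1}^{\infty} \frac{1}{l!} \left( \Delta u_{j,k-i}\, \frac{\partial}{\partial u_j} \right)^{l} N(u) \,\Bigg|_{u=u_k}. \tag{11b}$$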

We can now split the sum over $j$ into $n_u$ independent channels each containing the terms which depend only on $\Delta u_{j,k-i}$. Since (7) holds, we can add $N(u_k)$ to every channel obtaining

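$$y_{D,k} = \sum_{j=1}^{n_u} y_{D,j,k}, \tag{12a}$$

$$y_{D,j,k} = \sum_{i=1}^{n_{aD}} a_{D,i}\, y_{D,j,k-i} + \sum_{i=0}^{n_{bD}} b_{D,i}\, v_{j,k-i}, \tag{12b}$$

$$v_{j,k-i} = N(u_k) + \sum_{l=1}^{\infty} \frac{1}{l!} \left( \Delta u_{j,k-i}\, \frac{\partial}{\partial u_j} \right)^{l} N(u) \,\Bigg|_{u=u_k}. \tag{12c}$$

The input $v_{j,k-i}$ to each linear element $j$ is now equal to the Taylor series expansion of $N(u)$ in the direction of $u_{j,k-i}$ at $u_k$. Hence, we can simplify the expression for $v_{j,k-i}$ to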

$$v_{j,k-i} = N(u_k + \Delta u_{j,k-i}\, e_j) \tag{13}$$

with $e_j = (0, \ldots, 1, \ldots, 0)^T$ being the unit vector in direction $j$. In (12b) we can allow for different dynamic elements in each channel. This yields a model where $N(\cdot)$ can be an arbitrary function, yet input directionality can be modeled and the linear and nonlinear elements can be identified independently:

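$$y_k = N(u_k) + \sum_{j=1}^{n_u} y_{D,j,k}, \tag{14a}$$

$$y_{D,j,k} = \sum_{i=1}^{n_{aD,j}} a_{D,j,i}\, y_{D,j,k-i} + \sum_{i=0}^{n_{bD,j}} b_{D,j,i}\, N(u_k + \Delta u_{j,k-i}\, e_j). \tag{14b}$$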

However, this model suffers from a decoupling error incurred by neglecting the coupling terms in (11). This error can become large for strongly coupled systems. The coupling terms neglected in (11) are indeed all equal to zero if only one input $j^*$ is excited, that is, if

$$\Delta u_{j,k-i} = 0 \quad \forall j : j \neq j^* \tag{15}$$

is introduced in (8). With (7) the decoupling error of (14) is zero if (15) holds for all $i \le n_{bD,j}$. For arbitrary inputs, the decoupling error can be eliminated by a second reformulation of the model. We define a new sampling time

$$\Delta t_m = \frac{\Delta t_k}{\sum_{j=1}^{n_u} n_{bD,j}} \tag{16}$$

and refer to the discretization grids with sampling times $\Delta t_m$ and $\Delta t_k$ as m-grid and k-grid, respectively. We reformulate the model (14) to the new m-grid. The parameters $a_{D,j}$ and $b_{D,j}$ are determined such that the original time constants remain unchanged. On the m-grid, we define the reformulated inputs

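$$u_{j,m} = \begin{cases} u_{j,k-1} & \forall t_m : t_k \le t_m < t_k + \sum_{i=1}^{j} n_{bD,i}\, \Delta t_m, \\ u_{j,k} & \forall t_m : t_k + \sum_{i=1}^{j} n_{bD,i}\, \Delta t_m \le t_m < t_{k+1}. \end{cases} \tag{17}$$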

An example of this input reformulation with $n_u = 3$ and $n_{bD,j} = 2$, $j = 1, \ldots, 3$ is depicted in Fig. 4. For the reformulated inputs, (15) holds for all $j$ and $i \le n_{bD,j}$. Hence, all coupling terms in (8) are zero. The remaining error is the delay incurred by the definition of the inputs in (17). It is bounded above by $\Delta t_k$ and well acceptable for a properly chosen sampling interval $\Delta t_k$. Finally, determine the output $y_k$ from the output of the reformulated model by

$$y_k = y_m \quad \text{with} \quad t_m = t_{k+1} - \Delta t_m. \tag{18}$$

Note that the sampling interval $\Delta t_k$ of the input $u_k$ and output $y_k$ from the process remains unchanged by this reformulation. We can then state the Hammerstein model based on deviation dynamics for the measurements on the k-grid as

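$$y_k = y_m \quad \text{with} \quad t_m = t_{k+1} - \Delta t_m, \tag{HMa}$$

$$y_m = N(u_m) + \sum_{j=1}^{n_u} y_{D,j,m}, \tag{HMb}$$

$$y_{D,j,m} = \sum_{i=1}^{n_{aD,j}} a_{D,j,i}\, y_{D,j,m-i} + \sum_{i=0}^{n_{bD,j}} b_{D,j,i}\, v_{j,m-i}, \tag{HMc}$$

$$v_{j,m-i} = N(u_m + \Delta u_{j,m-i}\, e_j), \tag{HMd}$$

$$u_{j,m} = \begin{cases} u_{j,k-1} & \forall t_m : t_k \le t_m < t_k + \sum_{i=1}^{j} n_{bD,i}\, \Delta t_m, \\ u_{j,k} & \forall t_m : t_k + \sum_{i=1}^{j} n_{bD,i}\, \Delta t_m \le t_m < t_{k+1}. \end{cases} \tag{HMe}$$

We will refer to this model as HM model in the sequel. The block-diagram structure of this model, depicted in Fig. 5, is much more complex than those of the other MISO Hammerstein models, but as we will see below, the properties of the model are much more similar to the simple SISO Hammerstein model than to those of existing MISO formulations.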
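The staggered input definition in (17) and (HMe) can be sketched in a few lines of Python. The function `reformulate_inputs` is a hypothetical helper, not part of the paper; it assumes that the input before the first sample equals the first sample, and it counts time in m-steps within each k-interval rather than in physical time.

```python
from typing import List, Sequence


def reformulate_inputs(u_k: Sequence[Sequence[float]], n_b: Sequence[int]) -> List[List[float]]:
    """Map inputs from the k-grid to the m-grid following the case split in (17).

    u_k[k][j] is input j at sample k; n_b[j] is n_{bD,j} for channel j.
    Each k-interval holds sum(n_b) m-steps; input j keeps its previous value
    for the first n_b[0] + ... + n_b[j] m-steps and takes the new one afterwards.
    """
    steps_per_interval = sum(n_b)
    switch_at: List[int] = []
    total = 0
    for n in n_b:
        total += n
        switch_at.append(total)
    u_m: List[List[float]] = []
    for k, u_now in enumerate(u_k):
        # Assumption (not from the paper): before the first sample, u_{-1} = u_0.
        u_prev = u_k[k - 1] if k > 0 else u_now
        for step in range(steps_per_interval):
            u_m.append([
                u_now[j] if step >= switch_at[j] else u_prev[j]
                for j in range(len(n_b))
            ])
    return u_m


# Setting of Fig. 4: three inputs, n_{bD,j} = 2, all inputs stepped at sample k = 1.
u_m = reformulate_inputs([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], n_b=[2, 2, 2])
for row in u_m[6:]:
    print(row)
```

In this example at most one input changes between consecutive m-steps, and successive changes are separated by $n_{bD,j}$ m-steps, which is the condition under which the coupling terms in (8) vanish.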

3.2. Model properties

The structure of the HM model allows the identification of input-directional dynamics in multi-variate processes. This property is combined for the first time with the possibility to identify arbitrary nonlinear static models. At the same time the independence of the nonlinear and linear elements is preserved. This allows for simplified identification of the model, increases interpretability and facilitates the incorporation of application-specific nonlinear maps.

3.2.1. Input-directional dynamics

The HM model accounts for linear dynamic coupling of input variables. This property is shared by the KU model, but not by the RB model. The EJL model can exhibit input-directional dynamics to the extent of the corresponding Uryson model, which includes the linear case discussed below. However, for the linear case the EJL model is equal to the KU model, because no nonlinear couplings of input variables exist. The example process in Section 4 also shows dynamic behavior which may not be represented as a linear combination of the responses to changes in each input. Such behavior may be modeled in block-structured form only by adding additional linear and nonlinear elements in sequence [8], which is beyond the complexity of the Hammerstein model and poses challenging identification problems.

Fig. 4. Example of inputs reformulated to the time discretization $\Delta t_m$ according to (17).

Theorem 1. The HM model contains a linear dynamic model with $n_u$ inputs as a special case and can account for the same linear input-directional dynamics.

Proof. For the linear case, with $\varphi$ a vector of constants,

$$N(u) = \varphi^T u \tag{20}$$

holds. The dynamic channels of (HMc) reduce to

$$y_{D,j,m} = \sum_{i=1}^{n_{aD,j}} a_{D,j,i}\, y_{D,j,m-i} + \sum_{i=0}^{n_{bD,j}} b_{D,j,i} \left( \varphi^T u_m + \varphi_j\, \Delta u_{j,m-i} \right). \tag{21}$$

With (7) and (9) this straightforwardly reduces to

$$y_{D,j,m} = \sum_{i=1}^{n_{aD,j}} a_{D,j,i}\, y_{D,j,m-i} + \sum_{i=0}^{n_{bD,j}} b_{D,j,i}\, \varphi_j\, u_{j,m-i}. \tag{22}$$

With the static channel of (HM) being

$$y_{S,m} = \varphi^T u_m \tag{23}$$

for the linear case and the equivalence of (RB) and (6), (HMb)-(HMd) reduce to

$$y_m = \sum_{j=1}^{\dim(u)} y_{j,m}, \tag{24a}$$

$$y_{j,m} = \sum_{i=1}^{n_{a_j}} a_{j,i}\, y_{j,m-i} + \sum_{i=0}^{n_{b_j}} b_{j,i}\, \varphi_j\, u_{j,m-i}, \tag{24b}$$

which is a linear multi-input model. The definitions of $u_m$ in (HMe) and $y_k$ in (HMa) are linear functions. Since in (24) all channels are independent, these reformulations amount to simple delay operators. The modeling of these by an appropriate choice of $b_j$ is standard in linear discrete-time modeling. □

3.2.2. Arbitrary nonlinear static models

For the HM model the independence of the dynamic channels is preserved regardless of the structure of the nonlinear element. This property is unique to the new HM model, as the KU model explicitly restricts the structure of the nonlinear element and the dynamic channels of the EJL model are only independent when it degenerates to the KU model.

Theorem 2. For the excitation of one single input variable $u_{j^*}$, the HM model reduces to a single-input single-output Hammerstein model containing $L_{j^*}$ as the linear element regardless of the structure of the nonlinear element.

Fig. 5. Block diagram of new HM model: $N(\cdot)$, nonlinear static map; $L_j(\cdot)$, linear transfer function in channel $j$; $H_{j,i}(\cdot)$, reformulation of inputs according to (17) including zero order hold of $i$ sampling steps $\Delta t_m$; $R_j$, calculation of output on the k-grid from output on the m-grid according to (18).

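Proof. Since (7) holds for all linear elements,

$$\Delta u_{j,m-i} = 0 \;\; \forall m \;\Rightarrow\; y_{D,j,m} = 0 \;\; \forall m. \tag{25}$$

Hence, if $\Delta u_{j,m-i} = 0 \;\forall m$ holds for all $j \neq j^*$, i.e. all $u_{j,k}$ but $u_{j^*,k}$ are constant for all $k$, (HMb)-(HMd) reduce to

$$y_m = N(u_m) + y_{D,j^*,m}, \tag{26a}$$

$$y_{D,j^*,m} = \sum_{i=1}^{n_{aD,j^*}} a_{D,j^*,i}\, y_{D,j^*,m-i} + \sum_{i=0}^{n_{bD,j^*}} b_{D,j^*,i}\, v_{j^*,m-i}, \tag{26b}$$

$$v_{j^*,m-i} = N(u_m + \Delta u_{j^*,m-i}\, e_{j^*}). \tag{26c}$$

Again with the equivalence of (RB) and (6), (26) is equivalent to a SISO Hammerstein model. With $u_{j,m} = \text{const.} \;\forall j : j \neq j^*$, (HMa) and (HMe) again introduce only simple delay operators, the modeling of which by suitable choice of $b_{j^*}$ is standard in discrete-time Hammerstein modeling. □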

3.2.3. Independence of nonlinear and linear elements

Just as in the scalar Hammerstein model, the structures of the nonlinear and linear elements of the HM model are independent. This property is shared by the KU model and the RB model, but not by the EJL model, unless it degenerates to the KU model.

Lemma 1. The static response of the HM model is determined entirely by the nonlinear element, regardless of the structure of the linear elements. The linear elements of the HM model can be determined by identifying a linear model of the system and replacing its gain $\varphi$ by 1.

Proof. The first part is a direct consequence of the equivalence of (RB) and (6). The second part is a direct consequence of Theorem 1, because replacing $N(u)$ by a constant gain $\varphi$ directly leads to a linear model with gain $\varphi$. □
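The equivalence of (RB) and (6) invoked in the proof can be checked numerically. The sketch below derives $a_D$ and $b_D$ by substituting $y_k = N(u_k) + y_{D,k}$ into (RB), which gives $a_{D,i} = a_i$, $b_{D,0} = b_0 - 1$ and $b_{D,i} = a_i + b_i$ for $i \ge 1$. It assumes, as a labeled assumption of this example, that the linear element of (RB) is normalized to unit steady-state gain so that $N(\cdot)$ carries the whole static behavior; under that assumption the coefficients $b_D$ sum to zero as in (7). The helper names and numerical values are hypothetical.

```python
from typing import List, Sequence, Tuple


def lin_filter(a: Sequence[float], b: Sequence[float], v: Sequence[float]) -> List[float]:
    """y_k = sum_{i>=1} a_i*y_{k-i} + sum_{i>=0} b_i*v_{k-i}, zero initial history."""
    y: List[float] = []
    for k in range(len(v)):
        y_k = sum(a_i * y[k - i] for i, a_i in enumerate(a, start=1) if k - i >= 0)
        y_k += sum(b_i * v[k - i] for i, b_i in enumerate(b) if k - i >= 0)
        y.append(y_k)
    return y


def rb_to_parallel(a: Sequence[float], b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Coefficients (a_D, b_D) of the dynamic channel (6c) from those of (RB)."""
    n = max(len(a), len(b) - 1)
    a_pad = list(a) + [0.0] * (n - len(a))
    b_pad = list(b) + [0.0] * (n + 1 - len(b))
    # Substituting y = N + y_D into (RB) gives b_D0 = b_0 - 1 and b_Di = a_i + b_i.
    b_d = [b_pad[0] - 1.0] + [a_pad[i - 1] + b_pad[i] for i in range(1, n + 1)]
    return a_pad, b_d


a, b = [0.6], [0.1, 0.3]  # unit gain: sum(b) / (1 - sum(a)) = 1
a_d, b_d = rb_to_parallel(a, b)
static = [u * u for u in [0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 0.5, 0.5]]  # N(u_k), hypothetical
y_rb = lin_filter(a, b, static)
y_par = [n_k + d_k for n_k, d_k in zip(static, lin_filter(a_d, b_d, static))]
print(sum(b_d), max(abs(p - q) for p, q in zip(y_rb, y_par)))
```

Both printed values are zero up to rounding: the dynamic channel contributes nothing at steady state, and the parallel form reproduces the output of (RB) sample by sample.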

3.2.4. Complexity and interpretability

Single-input Hammerstein models comprise a very intuitive yet narrow generalization of a linear model. In contrast to Wiener models, for example, the dynamic behavior of the Hammerstein model is completely defined by its linear element. The nonlinear element provides the steady-state gain, and hence the intermediate variable of the SISO Hammerstein model has a physical meaning: it is the future steady-state value of the output corresponding to the current input. This interpretability is fully lost by the EJL model. It is preserved by the new model as well as the RB and KU models. The KU model as well as the new HM model also preserve the close ties to the linear model, such that step response experiments can be used for the identification of their linear elements. Hence, much more easily interpretable data can be used for the identification in contrast to the more complex excitation signals required for the identification of the RB model.

To discuss model complexity, we assume that the nonlinear element can be represented by an expansion in $n_N$ basis functions. The linear elements contain $\dim(\theta_L) = \dim([a^T, b^T]^T)$ parameters. The EJL model contains the most parameters with $\dim(\theta_{EJL}) = n_u\, n_N + \sum_{j=1}^{n_u} \dim(\theta_{L,j})$. The HM model contains only $\dim(\theta_{HM}) = n_N + \sum_{j=1}^{n_u} \dim(\theta_{L,j})$ parameters. For an example with three inputs, 15 basis functions in the nonlinearity and second order linear elements, the HM model contains 27 parameters, which is less than 50% of the 57 parameters of the corresponding EJL model. The RB model is the least complex, containing $\dim(\theta_{RB}) = n_N + \dim(\theta_L)$ parameters, or 19 in the example stated above, which is 30% less than the number of parameters of the HM model. However, this reduction comes at the cost of greatly reduced modeling flexibility. The basis function expansion of the KU model contains only those basis functions which are $\mathbb{R}^1 \to \mathbb{R}^1$. Its complexity is therefore not directly comparable to the other three models.
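The parameter counts quoted for the three-input example can be reproduced with a short calculation. The sketch assumes that each second order linear element carries four parameters, which is the value implied by the totals of 57, 27 and 19 stated in the text; the function name is hypothetical.

```python
from typing import Dict


def parameter_counts(n_u: int, n_basis: int, dim_theta_l: int) -> Dict[str, int]:
    """Parameter counts of the EJL, HM and RB models for equal linear elements."""
    linear_total = n_u * dim_theta_l  # sum over channels of dim(theta_L,j)
    return {
        "EJL": n_u * n_basis + linear_total,
        "HM": n_basis + linear_total,
        "RB": n_basis + dim_theta_l,
    }


# Three inputs, 15 basis functions, four parameters per linear element (assumed).
counts = parameter_counts(n_u=3, n_basis=15, dim_theta_l=4)
print(counts)
print(round(100 * counts["HM"] / counts["EJL"], 1))                  # HM as % of EJL
print(round(100 * (counts["HM"] - counts["RB"]) / counts["HM"], 1))  # RB saving in %
```

The difference between the EJL and HM counts is $(n_u - 1)\, n_N$, so the saving grows with both the number of inputs and the size of the basis.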

3.3. Identification method for the new HM model

As stated in (13), the nonlinearity $N(\cdot)$ is the same in all channels. Theorem 1 states that it may be of arbitrary structure. We may therefore identify a nonlinear estimator $N(u): \mathbb{R}^{\dim(u)} \to \mathbb{R}^1$ independently from the linear elements using the method of Rollins et al. [25]. Therefore, for a suitable choice of steady-state experiments with inputs $u_l$ and process outputs $\tilde{y}_l$ we minimize

$$J_N = \sum_{l} \left\| \tilde{y}_l - N(u_l) \right\|. \tag{27}$$

$N(u)$ can be any nonlinear function parameterized by some $\theta$ including, for example, polynomials or neural networks. Hence, identification of $N(u)$ comprises the choice of an appropriate estimator $N(u)$ and determination of its parameters via (27).

As stated in Theorem 2, the HM model reduces to a single-input Hammerstein model for excitation of one single input. Hence, the method of Bai [3] directly applies to each dynamic channel of the HM model. Since the nonlinearity of the model is not excited by the step or the pseudo random binary sequence (PRBS) used for identification, the nonlinearity may be replaced by its secant approximation between the lower and upper values of the step or PRBS. First, we determine an estimate of the dynamics in channel $j$ after excitation with $u_k = u_{k_0} + \Delta u_{j,k}\, e_j$, with $\Delta u_{j,k}$ a PRBS or step signal, by solving

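$$\min_{a_j, b_j} J_{L_j} = \sum_{k=1}^{K} \left\| \tilde{y}_k - y_k \right\| \tag{28a}$$

$$\text{s.t.} \quad y_k = \sum_{i=1}^{n_{a_j}} a_{j,i}\, y_{k-i} + \sum_{i=0}^{n_{b_j}} b_{j,i} \left( \varphi_j\, u_{j,k-i} + c_{0,j} \right), \tag{28b}$$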

where $\varphi_j$ and $c_{0,j}$ are the parameters of the secant approximation of $N(u)$ between the lower and upper bounds of the step or PRBS. The linear system in channel $j$ of the HM model is then obtained by setting $\varphi_j = 1$, $c_{0,j} = 0$ according to Lemma 1. Then, we perform the reformulations discussed in Section 3.1 to eliminate the decoupling error and to obtain the parallel structure of the HM model.
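The gain normalization of Lemma 1 can be illustrated on a first-order channel. The following is a minimal least-squares sketch in the spirit of (28), not the algorithm of [3]: it fits an equation-error model to noise-free, hypothetical data in deviation variables (so that $c_{0,j} = 0$), reads off the identified gain, and rescales the input coefficient to unit gain. The binary test signal is a simple deterministic stand-in for a PRBS, and all names and values are assumptions.

```python
from typing import List, Tuple


def fit_first_order(u: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y_k = a*y_{k-1} + beta*u_k via 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in range(1, len(y)):
        x1, x2 = y[k - 1], u[k]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y[k]
        r2 += x2 * y[k]
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det, (s11 * r2 - s12 * r1) / det


# Hypothetical noise-free channel data: a = 0.8, secant gain phi = 2.5.
u = [1.0 if (k // 3 + k // 7) % 2 else 0.0 for k in range(60)]
y = [0.0]
for k in range(1, 60):
    y.append(0.8 * y[k - 1] + 0.5 * u[k])

a_hat, beta_hat = fit_first_order(u, y)
phi_hat = beta_hat / (1.0 - a_hat)  # identified steady-state gain
b_unit = beta_hat / phi_hat         # Lemma 1: replace the gain phi by 1
print(round(a_hat, 6), round(phi_hat, 6), round(b_unit, 6))
```

After normalization the channel has unit steady-state gain, so the static behavior is left entirely to $N(\cdot)$. With measurement noise an equation-error fit of this kind is generally biased, which is one reason the sketch should not be read as a substitute for the cited method.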

The proposed identification method identifies both the linear and nonlinear elements without knowledge of the other. Hence, errors in the identification of one element, which are inevitable due to the structural under-modeling of real processes, will not carry over into the identification of the other element.

This two-stage identification method is especially suitable for applications in the chemical industry, where the steady-state behavior of the process is either known [23] from first principles modeling or can be identified much more easily than the dynamic behavior [25,27]. In principle, the HM model can also be identified using the modified algorithm of Narendra and Gallman, which is suitable for identification of the EJL model [11]. However, in this case the structure of the model is not exploited.

3.4. Application specific nonlinear maps

Originally, polynomial nonlinear maps were used in Hammerstein systems identification [22] because of their simplicity. However, these are limited in the nonlinear behavior they can identify. Precisely, they tend to oscillate when saturation is to be represented, that is, the gain of an actuator approaching zero as the unit approaches a physical limitation. This can be overcome by means of other sets of basis functions or rigorous models. If the identification of the linear and nonlinear elements is not independent, as in the case of the EJL model, identification algorithms need to be tailored to the respective nonlinear maps. In the case of the HM model, the identification of the two elements is independent. Therefore, rigorous steady-state process models can be incorporated in the block-structured HM model without further modification. Furthermore, any other nonlinear function approximation such as artificial neural networks (ANN) can be used in connection with the new HM model structure together with existing identification algorithms without the need for any further modification.

In this section we will consider three nonlinear maps which are more complex than polynomial representations: artificial neural networks, sparse grid representations, and rigorous models. They will also be benchmarked on the simulation example in Section 4.

A popular choice for a set of basis functions to approximate a nonlinear function is the use of an ANN with one hidden layer, which may be represented as

$$N(u) = \sum_{j=1}^{S} a_j\, g_j\!\left( b_j^T u + c_j \right) + d, \tag{29}$$

where $a$ and $d$ are the weights and bias of the output layer and $b_j$, $c_j$, and $g_j(\cdot)$ are the weights, biases, and transfer functions of the $S$ neurons $j$ of the hidden layer. Such networks are known to be able to represent any continuous nonlinear function to arbitrary precision [9]. The function approximation is continuous with continuous first-order derivatives. The latter allows these models to be used as constraints in dynamic optimization problems together with standard gradient based solvers (see [16] for an application with the new HM model). The main drawback of the ANN representation is that the identification is nonlinear in the parameters and can therefore converge to a local minimum. The popularity of ANN in nonlinear function representation has motivated the development of tailored identification algorithms for their incorporation in SISO Hammerstein models [1,27].
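The one-hidden-layer map (29) is compact enough to evaluate directly. The sketch below is only an illustration: it uses the same transfer function for every hidden neuron (here `math.tanh`, an assumption, since the text allows a separate $g_j$ per neuron), and the weights in the example are hypothetical.

```python
import math
from typing import Callable, Sequence


def ann_static_map(
    u: Sequence[float],
    a: Sequence[float],
    b: Sequence[Sequence[float]],
    c: Sequence[float],
    d: float,
    g: Callable[[float], float] = math.tanh,
) -> float:
    """N(u) = sum_j a_j * g(b_j^T u + c_j) + d for S = len(a) hidden neurons."""
    hidden = (
        a_j * g(sum(w * x for w, x in zip(b_j, u)) + c_j)
        for a_j, b_j, c_j in zip(a, b, c)
    )
    return sum(hidden) + d


# Hypothetical weights for a two-input map with S = 2 hidden neurons.
value = ann_static_map(
    u=[1.0, 0.0], a=[1.0, -2.0], b=[[1.0, 0.0], [0.0, 1.0]], c=[0.0, 0.0], d=0.5
)
print(value)
```

The output is linear in $a$ and $d$ but nonlinear in $b_j$ and $c_j$, which is why fitting all parameters at once leads to the nonconvex estimation problem mentioned above.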

Multi-dimensional spline models feature the same universal approximation capabilities as ANN, but result in a linear parameter estimation problem [29]. However, the number of basis functions increases exponentially with the number of input variables, because the basis functions are defined on a full discretization grid of the input space. We therefore propose to use a much more efficient representation based on sparse grid approximation [5,6]. The sparse grid representation of the nonlinear map

$$N(u) = \sum_{s \in I} w_s f_s(u) \qquad (30)$$

is the weighted sum of the approximations on a minimal set $I$ of subgrids

$$f_s(u) = \sum_{j \in K} \theta_{s,j}\, \phi_{s,j}(u) \qquad (31)$$

with weights $w_s$ and parameters of the subgrids $\theta_{s,j}$. The sparse-grid approximation uses local basis functions $\phi_{s,j}(\cdot)$, which are derived from a one-dimensional piecewise linear basis by a tensor product construction. Details on discretization and regularization of the sparse grid can be found in [5,6,19], for example. While the sparse-grid representation is favorable with respect to identification, its first-order derivatives are not continuous. Therefore, it cannot easily be used with gradient-based solvers, which require continuous first-order derivatives. No tailored identification algorithm for Hammerstein models incorporating sparse grids exists. The new model structure therefore allows, for the first time, the easy incorporation of sparse-grid representations into a multi-variate Hammerstein model.
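The tensor product construction of the local basis functions in Eq. (31) can be sketched as follows. This is only an illustration of a single subgrid approximation $f_s(u)$ on a small regular grid. The selection of the subgrid set $I$ and the weights $w_s$ in Eq. (30) are not reproduced, and the helper names `hat`, `tensor_hat`, and `subgrid_eval` are hypothetical.

```python
import itertools
import numpy as np

def hat(x, center, width):
    """1-D piecewise linear hat: 1 at `center`, 0 for |x - center| >= width."""
    return np.maximum(0.0, 1.0 - np.abs(x - center) / width)

def tensor_hat(u, centers, widths):
    """Multi-dimensional basis function as the product of 1-D hats."""
    u = np.asarray(u, dtype=float)
    return float(np.prod(hat(u, np.asarray(centers), np.asarray(widths))))

def subgrid_eval(u, theta, nodes, widths):
    """f_s(u) = sum_j theta_j * phi_j(u) for one subgrid."""
    return sum(t * tensor_hat(u, node, widths) for t, node in zip(theta, nodes))

# Assumed example: 3 x 3 nodes on [0, 1]^2 with mesh width 0.5 per direction
axis = [0.0, 0.5, 1.0]
nodes = list(itertools.product(axis, axis))
widths = [0.5, 0.5]

# Nodal values of the test function f(u) = u1 + u2 serve as parameters theta
theta = [n[0] + n[1] for n in nodes]

value = subgrid_eval([0.25, 0.75], theta, nodes, widths)
```

The parameters enter linearly, so fitting `theta` to data is a linear least-squares problem. The kinks of the hat functions at the grid nodes are the source of the discontinuous first-order derivatives mentioned above.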

For SISO Hammerstein models, Pearson and Pottmann [23] have proposed to use rigorous steady-state models for the nonlinear element. These models are developed by defining a set of equations modeling the physical principles underlying the process, such as mass and energy balances. In the chemical industry such models are often available from process design. Their reuse greatly reduces the identification problem, as only a linear system remains to be identified. Rigorous models cannot be used for the nonlinear elements of the KU model, because the latter are restricted to $\mathbb{R}^1 \to \mathbb{R}^1$. In the EJL model the use of rigorous models for $N_j(\cdot)$ would lead to all $y_j$ being equal. In that case the EJL model could easily be transformed into the RB model. The proposed HM model therefore allows, for the first time, the use of these models in identifying input-directional multi-variate systems using Hammerstein models. While the reuse of rigorous models reduces the identification effort, the computational load of these models can be considerable. For use in nonlinear model predictive control, this increased computational load can become prohibitive.

The three nonlinear maps discussed above can be considered as application-specific examples: ANN for use in optimization problems, sparse-grid representations for ease of identification, and rigorous models to reuse existing process knowledge. Beyond these, a wealth of other nonlinear function approximation methods exists, which we will not treat in detail in this paper. The three approximations discussed above will be compared on the simulation example in Section 4, but any nonlinear approximation that can be identified from input-output data can be used with the HM model structure.

4. Application to a simulated FCC unit

In this section we benchmark the new model against the existing structures on a simulated fluid catalytic cracking (FCC) unit depicted in Fig. 6, which consists of four coupled units. The FCC is an industrially relevant process, and several rigorous models exist in the open literature. We use the model originally developed by Kurihara in an unpublished dissertation and comprehensively discussed by Denn [10]. This model has been validated and used for control of a real unit by Ansari and Tade [2]. We will not restate the equations here for brevity. The nomenclature and units used in the sequel are the same as those of Denn [10], where the complete model may be found. Ansari and Tade [2] also state the complete model, but with some typographical errors and a slightly different notation. Detailed process descriptions can be found in both references.

4.1. Simulated FCC unit

The main manipulated variables of the process are the air flowrate $R_{ai}$ and the catalyst circulation rate $R_{rc}$, while the feed rate $R_{tf}$ and feed temperature $T_{fp}$ are treated as disturbances. For simplicity we restrict our discussion to two-input single-output models, as we can visualize the nonlinear static maps for these models. We therefore treat $R_{rc}$ and $R_{ai}$ as inputs to the system. To control the main quality variable, the cracking severity, several controlled variables have been explored due to the complex dynamics of the system. However, the riser outlet temperature $T_{ra}$ is directly related to the cracking severity and has recently been used for control [18]. We will therefore treat it as the output variable.

The reactor-regenerator section of the FCC unit (cf. Fig. 6) shows complex behavior. The static response is coupled and highly nonlinear. We consider a range of inputs $R_{ai} \in [390, 420]$ Mlb/h and $R_{rc} \in [40, 42]$ ton/min, for which the nonlinear static map of the process is shown in Fig. 2 (left). While the process exhibits nearly linear behavior at low $R_{ai}$ or high $R_{rc}$, it becomes highly nonlinear at high $R_{ai}$ and low $R_{rc}$. In fact, it even becomes unstable (the reaction extinguishes) at higher $R_{ai}$ and lower $R_{rc}$ than considered here. The dynamic response of the process shows strong input directionality. Fig. 2 (right) shows the response to subsequent steps in both inputs. Clearly, the dynamic behavior changes qualitatively with the direction of the step in the input space regardless of the starting point. Repeating this experiment with different starting points yields the same qualitative difference in the responses to $R_{ai}$ and $R_{rc}$.

4.2. Hammerstein models

The process is identified using the new HM model structure as well as the previously developed structures RB, KU and EJL discussed in Section 2.2 with polynomial nonlinear maps to assess their suitability for identifying the dynamic behavior of the system. Subsequently, the capability of the HM model to incorporate independently identified, arbitrary nonlinear functions is exploited, and the polynomial representation is replaced by the three representations discussed in Section 3.4.

4.2.1. Identification

The new model is identified as described in Section 3.3. The nonlinear models are identified from a measured steady-state data set $\tilde{u}_{1,s}$, $\tilde{u}_{2,s}$, $\tilde{y}_s$ by solving

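$$\min_{\theta_N}\; (\tilde{y}_s - y_s)^T (\tilde{y}_s - y_s) \qquad (32a)$$

$$\text{s.t.}\quad y_{s,i} = N(\theta_N, \tilde{u}_{1,s,i}, \tilde{u}_{2,s,i}), \quad i = 1 \ldots \dim(y_s) \qquad (32b)$$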

for the parameters $\theta_N$ of the nonlinear map. Terms of up to second order in each input are considered for the polynomial representation. The ANN is identified using a backpropagation algorithm based on the Levenberg-Marquardt method. The sparse grid is identified using the algorithm of Kahrs et al. [19]. Both of these algorithms use the same error criterion (32a). The data points used for identification of the nonlinear maps are taken on a uniform grid of $\Delta R_{rc} = 0.05$ ton/min and $\Delta R_{ai} = 0.5$ Mlb/h.

Fig. 6. Flowsheet of the FCC process example (labels: product, flue gas, reactor, regenerator, catalyst recycle, air riser, oil riser, air feed; variables $R_{rc}$, $R_{ai}$, $R_{tf}$, $T_{fp}$, $T_{ra}$).

The linear models are identified independently by solving

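$$\min_{a'_j,\, b'_j}\; (\tilde{y} - y)^T (\tilde{y} - y) \qquad (33a)$$

$$\text{s.t.}\quad y_k = \sum_{i=1}^{n_{a'_j}} a'_{j,i}\, y_{k-i} + \sum_{i=0}^{n_{b'_j}} b'_{j,i}\, \tilde{u}_{j,k-i} + c_{j,0}, \qquad (33b)$$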

where the vector $y$ contains the system response $\{y_k\}$ after excitation with a PRBS signal in $\tilde{u}_j$. The parameters $a_j$ and $b_j$ are determined from $a'_j$ and $b'_j$ by normalizing the gain of the linear elements to one and performing the reformulations discussed in Section 3.1. The linear elements are of second order for $R_{rc}$ and of fourth order for $R_{ai}$. In this example, we use noise-free data for the sake of simplicity. As can be seen from the discussion above, the determination of the model parameters is performed by standard identification experiments. A broad body of literature treating the effects of noise on these experiments exists.

The RB model can incorporate arbitrary, independently identified nonlinear maps. Therefore, the same nonlinear map as in the HM model is used. The linear model needs to represent an average dynamic response of the system. The data set used for identification, therefore, has to contain an excitation such that the user deems the process output to be representative of the average dynamic response of the system. With this requirement, identifying the FCC process with the RB model obviously is more of an art than a science. In our case, we use uniformly distributed random signals with a switching probability of 0.5 and calculate the input sequence $\{v_{j,k}\}$ to the linear element using $N(\cdot)$. The linear element is of second order.
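One possible reading of such an excitation signal is sketched below: at every time step the input either holds its previous value or, with probability 0.5, jumps to a new value drawn uniformly from the considered input range. This interpretation, the function name `random_switching_signal`, the sequence length, and the seeds are assumptions for illustration. The exact signal generation is not specified here.

```python
import numpy as np

def random_switching_signal(n_steps, low, high, p_switch=0.5, seed=0):
    """Piecewise-constant signal: at each step, switch with probability
    p_switch to a new level drawn uniformly from [low, high)."""
    rng = np.random.default_rng(seed)
    u = np.empty(n_steps)
    u[0] = rng.uniform(low, high)
    for k in range(1, n_steps):
        if rng.random() < p_switch:
            u[k] = rng.uniform(low, high)   # jump to a new random level
        else:
            u[k] = u[k - 1]                 # hold the previous level
    return u

# Input ranges as considered in Section 4.1
R_ai = random_switching_signal(200, 390.0, 420.0, seed=1)  # Mlb/h
R_rc = random_switching_signal(200, 40.0, 42.0, seed=2)    # ton/min
```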

The KU model uses the same linear dynamic model as the HM model. The nonlinear element is identified independently using (32) by replacing $N(\cdot)$ by $N_1(\theta_{N_1}, u_1) + N_2(\theta_{N_2}, u_2)$ in (32b).
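The consequence of this additive restriction can be illustrated with a small least-squares experiment on the steady-state grid described above. The sketch assumes a synthetic static map with a bilinear coupling term. It is not the FCC model, and the coefficients and variable names are hypothetical. A polynomial with cross terms reproduces the coupling, whereas the additive form $N_1(u_1) + N_2(u_2)$ cannot.

```python
import numpy as np

# Steady-state grid with the spacing given in the text
R_rc = np.linspace(40.0, 42.0, 41)     # step 0.05 ton/min
R_ai = np.linspace(390.0, 420.0, 61)   # step 0.5 Mlb/h
G_rc, G_ai = np.meshgrid(R_rc, R_ai, indexing="ij")

# Scale both inputs to [-1, 1] for a well-conditioned regression
x1 = (G_rc.ravel() - 41.0) / 1.0
x2 = (G_ai.ravel() - 405.0) / 15.0

# Hypothetical static map with input coupling (NOT the FCC model)
y = 960.0 + 2.0 * x1 - 3.0 * x2 + 4.0 * x1 * x2

def fit_rmse(Phi, y):
    """Solve the linear least-squares problem (32) and return the fit RMSE."""
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    residual = y - Phi @ theta
    return np.sqrt(np.mean(residual ** 2))

# Full polynomial: all products x1^p * x2^q with p, q <= 2 (includes cross terms)
Phi_full = np.column_stack([x1**p * x2**q for p in range(3) for q in range(3)])
# Additive (KU-type) polynomial: N1(x1) + N2(x2), no cross terms
Phi_add = np.column_stack([np.ones_like(x1), x1, x1**2, x2, x2**2])

rmse_full = fit_rmse(Phi_full, y)
rmse_add = fit_rmse(Phi_add, y)
```

In this synthetic case the full polynomial fits the data to numerical precision, while the additive form leaves the whole coupling term in the residual.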

For the EJL model an iterative algorithm has been proposed by Eskinat et al. [11], which we applied to our example. We chose different model parameterizations of up to third-order linear elements and nonlinear maps of up to second order in each input. However, the algorithm did not converge for any of the chosen model orders, and the model could not be identified. This may be due to two reasons. First, our example process is more complex than the distillation column studied by Eskinat et al., which, for example, shows less severe input directionality. Second, the EJL model is only unique up to a similarity transformation: since all channels contain the same terms, they are interchangeable. During identification with an iterative procedure such as the one proposed by Eskinat et al., this may lead to convergence problems.

The identification algorithms for the RB model, the KU model and the new model are all noniterative with respect to the elements of the Hammerstein model, and computation times for identification are well below 5 s using MATLAB on a 1.5 GHz PC.
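The noniterative character of the linear identification step can be illustrated with an ordinary least-squares fit of a difference-equation model, followed by normalization of the steady-state gain to one. The sketch assumes the sign convention $y_k = \sum_i a_i y_{k-i} + \sum_i b_i u_{k-i} + c$, a synthetic second-order system, and a random binary input as a crude stand-in for a PRBS. The function name `fit_arx` is hypothetical, and the reformulations of Section 3.1 are not reproduced.

```python
import numpy as np

def fit_arx(u, y, na, nb):
    """Least-squares fit of
    y_k = sum_{i=1..na} a_i y_{k-i} + sum_{i=0..nb} b_i u_{k-i} + c."""
    n0 = max(na, nb)
    rows = []
    for k in range(n0, len(y)):
        past_y = [y[k - i] for i in range(1, na + 1)]
        past_u = [u[k - i] for i in range(0, nb + 1)]
        rows.append(past_y + past_u + [1.0])
    Phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(Phi, y[n0:], rcond=None)
    return theta[:na], theta[na:na + nb + 1], theta[-1]

# Synthetic, noise-free data from a stable second-order system (poles 0.5, 0.7)
rng = np.random.default_rng(0)
u = rng.choice([-1.0, 1.0], size=500)        # random binary excitation
a_true = np.array([1.2, -0.35])
b_true = np.array([0.0, 0.3, 0.15])
y = np.zeros(500)
for k in range(2, 500):
    y[k] = a_true @ y[k - 2:k][::-1] + b_true @ u[k - 2:k + 1][::-1]

a_hat, b_hat, c_hat = fit_arx(u, y, na=2, nb=2)

# Steady-state gain of the identified model and normalization to unit gain
gain = b_hat.sum() / (1.0 - a_hat.sum())
b_unit = b_hat / gain
```

With the gain of the linear element fixed to one, the steady-state behavior of the overall model is determined by the nonlinear map alone, which is what allows the two elements to be identified independently.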

4.2.2. Prediction

To assess model accuracy, we use a test sequence of 5000 time intervals with 90 random steps in the input space. Figs. 7 and 8 each show system responses for part of this sequence. The model prediction errors are defined by the root mean square error

$$\mathrm{RMSE} = \frac{1}{\sqrt{\dim(y)}}\, \lVert \tilde{y} - y \rVert_2, \qquad (34)$$

where $\tilde{y}$ is the vector of measurement data and $y$ the corresponding model prediction.
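For concreteness, Eq. (34) can be evaluated as in the following minimal sketch. The function name `rmse` and the four sample values are chosen only for illustration and are not data from the test sequence.

```python
import numpy as np

def rmse(y_meas, y_pred):
    """Root mean square error: ||y_meas - y_pred||_2 / sqrt(dim(y))."""
    y_meas = np.asarray(y_meas, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.linalg.norm(y_meas - y_pred, 2) / np.sqrt(y_meas.size)

# Illustrative values: every prediction is off by exactly one degree
error = rmse([960.0, 962.0, 964.0, 966.0], [961.0, 961.0, 965.0, 965.0])
```

Since the error is normalized by the sequence length, values computed on sequences of different length remain comparable and carry the unit of the output.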

We first compare the different block-structured models. Since the KU model cannot incorporate rigorous models, we compare the block-structured models using polynomial nonlinear maps in each case. The polynomials contain terms of up to the second power in each input. As can be seen in Fig. 7, the RB model does not sufficiently capture the dynamic response of the process. The prediction error over the entire test sequence is 1.29 °F. The new HM model, in contrast, is much more suitable for identifying the process dynamics. Due to inaccurate modeling of the gain of the system, a prediction error of 0.91 °F remains. To demonstrate the difference between the two models more explicitly, we use the rigorous steady-state process model as the nonlinear element for both models. In this case, there is no plant-model mismatch at steady state. The resulting errors for the dynamic test sequence are $\mathrm{RMSE}_{HM} = 0.47$ °F and $\mathrm{RMSE}_{RB} = 1.07$ °F for the HM and RB model, respectively. In this case the new model reduces the prediction error by more than 50% compared to the RB model.

Fig. 7. Comparison of RB model, KU model, and new HM model structure with polynomial nonlinear maps containing up to the second power of each input. (Plot of $T_{ra}$ [°F], 956 to 974, over the test sequence section 4250 to 4550; curves: process, RB model, KU model, new HM model.)

Fig. 8. Comparison of four nonlinear maps for the new HM model. (Plot of $T_{ra}$ [°F], 956 to 974, over the test sequence section 4250 to 4550; curves: process, polynomial, rigorous, sparse grid, neural network.)

While the KU model is capable of identifying the process dynamics, as can be seen in Fig. 7, the nonlinear coupling of inputs cannot be represented. This results in the large steady-state error visible in Fig. 7 between 4400 and 4500. In fact, the prediction error of 1.44 °F is even larger than that of the RB model.

For the new HM model we now compare the different nonlinear maps discussed above. Fig. 8 shows the same section of the test sequence as Fig. 7. For comparison, the prediction of the polynomial model is redrawn, and it becomes obvious that it is far inferior to the rigorous, sparse grid, or ANN representations. All three of these are, however, very close, and in fact the error incurred by the sparse-grid or neural-network approximations at steady state is small compared to the error incurred during transients by the block-structured approximation of the system dynamics (e.g., between 4500 and 4550). The respective errors are RMSE = 0.47 °F for the rigorous model and RMSE = 0.52 °F for both the sparse-grid and neural-network representations. The errors of all three models are about 50% lower than that of the polynomial representation. The errors are due to the block-structured model as well as to the representation of the nonlinear map. The different nonlinear maps may be compared using the RMSE incurred in predicting the nonlinear gain shown in Fig. 2. The sparse grid and ANN result in errors of RMSE = 0.21 °F and RMSE = 0.31 °F, respectively, while the error increases to RMSE = 0.74 °F for the polynomial representation. Since the KU model further restricts the polynomial as discussed in Section 2.2, the error increases further to RMSE = 1.49 °F, which is seven times the error of the sparse grid.
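The relative figures quoted in this section follow directly from the reported RMSE values. As a worked check, using only the numbers stated above:

$$
\begin{aligned}
\frac{1.07 - 0.47}{1.07} &\approx 0.56 && \text{(HM vs. RB, rigorous nonlinear map)}\\
\frac{0.91 - 0.47}{0.91} &\approx 0.48 && \text{(rigorous vs. polynomial map, HM model)}\\
\frac{0.91 - 0.52}{0.91} &\approx 0.43 && \text{(sparse grid or ANN vs. polynomial map, HM model)}\\
\frac{1.49}{0.21} &\approx 7.1 && \text{(static-map error, KU polynomial vs. sparse grid)}
\end{aligned}
$$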

4.2.3. Computation time

For control applications, computation times are of major interest, and we therefore compare the computational cost for the simulation of the test sequence used in the previous section. As expected, the models based on polynomial nonlinearities perform best with respect to computational performance. Regardless of the different block structure, they could all be solved in less than 0.6 s using MATLAB on a 1.5 GHz PC. The models based on rigorous nonlinear maps in turn are expected to perform worst and in fact consume 475 s of CPU time. The models based on sparse grid or neural network approximations perform much better, as they could be solved in 4 and 2 s, respectively, with no significant difference in model fidelity. The competitive computation times of the latter two are preserved regardless of the internal complexity of the process, as they only depend on its input-output dimensionality.

5. Conclusions

A new Hammerstein model structure has been developed for multi-variate nonlinear processes with input directionality. For the first time, this model allows the use of arbitrary, independently identified nonlinear maps with a linear model independently identified by standard SISO step or PRBS response experiments. The model structure is applied to the identification of a simulated chemical process exhibiting input-directional dynamics and a nonlinear coupling of input variables at the same time. For the case study, the new model formulation is shown to be superior to all previously developed Hammerstein model structures. Input-directional dynamics are represented via identification of a linear multi-input model. The representation of the nonlinear steady-state behavior can be tailored to the desired application of the model. While a polynomial representation suffers from poor prediction capability and a rigorous model from poor computational performance, the ANN and the sparse grid representations prove to efficiently combine high model fidelity with low computational cost. In combination with a suitable representation of the nonlinearity, the use of the new model structure results in a reduction in prediction error of more than 50% compared to block-structured models reported in previous literature.

References

[1] H. Al-Duwaish, M.N. Karim, A new method for the identification of Hammerstein model, Automatica 33 (10) (1997) 1871-1875.

[2] R.M. Ansari, M.O. Tade, Constrained nonlinear multivariable control of a fluid catalytic cracking process, J. Proc. Contr. 10 (6) (2000) 539-555.

[3] E.-W. Bai, Decoupling the linear and nonlinear parts in Hammerstein model identification, Automatica 40 (4) (2004) 671-676.

[4] M. Boutayeb, M. Darouach, Recursive identification method for MISO Wiener-Hammerstein model, IEEE Trans. Autom. Contr. 40 (2) (1995) 287-297.

[5] M. Brendel, W. Marquardt, An algorithm for multivariate function estimation based on hierarchically refined sparse grids, Comput. Visual Sci., accepted for publication.

[6] H.-J. Bungartz, M. Griebel, Sparse grids, Acta Numer. 13 (2004) 147-269.

[7] F.H.I. Chang, R. Luus, A noniterative method for identification using Hammerstein model, IEEE Trans. Autom. Contr. 16 (5) (1971) 464-468.

[8] H.-W. Chen, Modeling and identification of parallel nonlinear systems: structural classification and parameter estimation methods, Proc. IEEE 83 (1) (1995) 39-66.

[9] G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Contr. Sign. Syst. 2 (1989) 303-314.




[10] M.M. Denn, Process Modeling, Pitman Publishing, Marshfield, MA, 1986.

[11] E. Eskinat, S.H. Johnson, W.L. Luyben, Use of Hammerstein models in identification of nonlinear systems, AIChE J. 37 (2) (1991) 255-268.

[12] P.G. Gallman, An iterative method for the identification of nonlinear systems using a Uryson model, IEEE Trans. Autom. Contr. 20 (6) (1975) 771-775.

[13] I. Goethals, K. Pelckmans, J.A.K. Suykens, B. De Moor, Subspace identification of Hammerstein systems using least squares support vector machines, IEEE Trans. Autom. Contr. 50 (10) (2005) 1509-1519.

[14] J.C. Gomez, E. Baeyens, Identification of block-oriented nonlinear systems using orthonormal basis, J. Proc. Contr. 14 (6) (2004) 685-697.

[15] A. Hammerstein, Nichtlineare Integralgleichungen nebst Anwendungen, Acta Math. 54 (1930) 117-176.

[16] G. Harnischmacher, W. Marquardt, Nonlinear model predictive control of multivariable processes using block-structured models, Contr. Eng. Pract., in press, doi: 10.1016/j.conengprac.2006.10.016.

[17] K.J. Hunt, M. Munih, N. Donaldson, M.D. Barr, Optimal control of ankle joint moment: toward unsupported standing in paraplegia, IEEE Trans. Autom. Contr. 43 (6) (1998) 819-832.

[18] C. Jia, S. Rohani, A. Jutan, FCC unit modeling, identification and model predictive control, a simulation study, Chem. Eng. Process. 42 (4) (2003) 311-325.

[19] O. Kahrs, M. Brendel, W. Marquardt, Incremental identification of NARX models by sparse grid approximation, in: Proceedings of the 16th IFAC World Congress 2005, Prague, 2005.

[20] M. Kortmann, H. Unbehauen, Identification methods for nonlinear MISO systems, in: Proceedings of the IFAC World Congress 1987, Munich, Germany, 1987, pp. 225-230.

[21] S. Lakshminarayanan, S.L. Shah, K. Nandakumar, Identification of Hammerstein models using multivariate statistical tools, Chem. Eng. Sci. 50 (22) (1995) 3599-3613.

[22] K.S. Narendra, P.G. Gallman, An iterative method for the identification of nonlinear systems using a Hammerstein model, IEEE Trans. Autom. Contr. 11 (3) (1966) 546-550.

[23] R.K. Pearson, M. Pottmann, Gray-box identification of block-oriented nonlinear models, J. Proc. Contr. 10 (4) (2000) 301-315.

[24] R.K. Pearson, Selecting nonlinear model structures for computer control, J. Proc. Contr. 13 (1) (2003) 1-26.

[25] D.K. Rollins, N. Bhandari, A.M. Bassili, G.M. Colver, S.-T. Chin, A continuous-time nonlinear dynamic predictive modelling method for Hammerstein processes, Ind. Eng. Chem. Res. 42 (4) (2003) 860-872.

[26] D.K. Rollins, N. Bhandari, Constrained MIMO dynamic discrete-time modeling exploiting optimal experimental design, J. Proc. Contr. 14 (6) (2004) 671-683.

[27] H.-T. Su, T.J. McAvoy, Integration of multilayer perceptron networks and linear dynamic models: A Hammerstein modeling approach, Ind. Eng. Chem. Res. 32 (9) (1993) 1927-1936.

[28] M. Verhaegen, D. Westwick, Identifying MIMO Hammerstein systems in the context of subspace model identification methods, Int. J. Contr. 63 (2) (1996) 331-349.

[29] G. Wahba, Spline Models for Observational Data, SIAM, 1990.

[30] X. Zhu, D.E. Seborg, Nonlinear predictive control based on Hammerstein models, in: Proceedings: Process Systems Engineering 1994, 1994, pp. 995-1000.

