

Page 1: Decoupling multivariate polynomials for nonlinear state ...homepages.vub.ac.be/~pdreesen/pub/decuyper2019cdc_lcss.pdf · The model structure is split in a linear and a nonlinear part

Decoupling multivariate polynomials for nonlinear state-space models

Jan Decuyper1 and Philippe Dreesen2 and Johan Schoukens1,3 and Mark C. Runacres1 and Koen Tiels4

Abstract— Multivariate polynomials are omnipresent in black-box modelling. They are praised for their flexibility and ease of manipulation, yet typically fall short in terms of insight and interpretability. An alternative representation is therefore often desired. Translating the coupled polynomials into a decoupled form, containing only univariate polynomials, has hence become a popular option. In this paper two new polynomial decoupling techniques are introduced. The features and performance of both methods are illustrated on a nonlinear state-space model identified from data of the forced Duffing oscillator.

I. INTRODUCTION

In many cases (nonlinear) black-box identification is the most suitable modelling technique, e.g. when white-box models are too expensive or too complex to be derived from first principles. It is, however, often desirable to be able to interpret the model. Given the generic nature of black-box structures, complex models described by a large number of parameters are generally obtained. In this paper the complexity of such models is reduced while maintaining high accuracy.

We will focus on the polynomial nonlinear state-space (PNLSS) model class [1] since it is a very generic nonlinear model class. Such models are typically built making use of coupled (i.e. containing cross-terms) multivariate polynomials. Note, however, that multivariate polynomials are also common in parallel Wiener models [2], parallel Wiener-Hammerstein models [3], [4] and NARX models [5], [6]. Inherent to using generic multivariate polynomials is that a large number of degrees of freedom are used during the identification, even when the underlying nonlinear relationship can in fact be grasped by a considerably less complex function. The objective is then to retrieve such a reduced, simplified form.

*This work was supported by the Fund for Scientific Research (FWO-Vlaanderen) under projects G.0280.15N and G.0901.17N, EOS Project no. 30468160, the Swedish Research Council (VR) via the project NewLEADS – New Directions in Learning Dynamical Systems (contract number: 621-2016-06079), and by the Swedish Foundation for Strategic Research (SSF) via the project ASSEMBLE (contract number: RIT15-0012).

1Jan Decuyper, Johan Schoukens and Mark C. Runacres are with the Faculty of Engineering Technology (INDI), Vrije Universiteit Brussel, 1050 Brussel, Belgium [email protected], [email protected]

2Philippe Dreesen is with the Department of Fundamental Electricity and Instrumentation (ELEC), Vrije Universiteit Brussel, 1050 Brussel, Belgium [email protected]

3Johan Schoukens is with the Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands [email protected]

4Koen Tiels is with the Department of Information Technology, Uppsala University, SE-75105 Uppsala, Sweden [email protected]

What is preferred is a description in which only univariate (polynomial) functions are used. Given their single-input single-output nature, they are much more tractable and easily visualised. A univariate description can additionally result in a reduction of the number of parameters, especially for functions with a large number of inputs and high polynomial degrees. A number of methods able to decouple multivariate polynomials into a set of univariate polynomials have been proposed. In [7], [2] it was shown that the cross-terms of a multivariate polynomial can be eliminated by diagonalising the matrix holding the coefficients of the polynomial. The method, however, requires a tensor decomposition of which the order grows with the degree of the polynomial. An elegant solution was proposed in [8], showing that a decoupled form can be inferred from the first-order derivative information of the coupled function. The procedure limits the order of the Jacobian tensor which is to be decomposed to 3, irrespective of the degree of the polynomial. Recent results of applying polynomial decoupling to PNLSS models were presented for the Bouc-Wen hysteresis system in [9], [10]. The number of nonlinear parameters was reduced from 90 for the full PNLSS model to 51 for the decoupled representation.

Direct usage of the method of [8] is limited to those cases in which the decomposition of the three-way tensor of derivative information is unique. We will show that this uniqueness condition is not easily met. Two alternative methods which are not constrained by this uniqueness condition are presented:

1) The first method is a regularisation-based approach that imposes smoothness on the derivatives of the obtained univariate functions.

2) The second method is a geometric approach in which a decoupled representation is obtained from a clever choice of operating points on which the function is evaluated.

II. NOTATION

Vectors are denoted by lower-case bold-faced letters, e.g. v ∈ R^p. Matrices are denoted by bold-faced upper-case letters, e.g. A. The columns of a matrix are denoted by bold-faced lower-case letters, e.g. we have A = (a_1, . . . , a_r) ∈ R^{p×r}. Tensors are denoted by calligraphic letters, e.g. T ∈ R^{n×m×N}. The elements of a matrix A or a (third-order) tensor T may be denoted as A = (A)_{ij} or T = (T)_{ijk}, respectively. The trace of the matrix A ∈ R^{p×p} is denoted by tr A = ∑_{i=1}^p a_{ii}. The pseudo-inverse of a matrix A is denoted by A†. The outer product of the vectors a ∈ R^p and b ∈ R^q is denoted by a ◦ b = C ∈ R^{p×q}. The Kronecker product


of two matrices A and B is denoted A ⊗ B. The Khatri-Rao product of two matrices is denoted by A ⊙ B [11]. The Hadamard (element-wise) product of A and B is denoted by A ∗ B.

The vectorization operator that converts a matrix or a tensor to a vector by concatenating its columns is denoted by vec(A) = a. The inverse operation is denoted by A = unvec(a), where it is assumed that the dimensions of the resulting matrix or tensor are clear from the context. The mode-n matricisation of a tensor T is denoted by T_(n). The columns of T_(n) are the mode-n fibers (columns, rows, tubes) of T.

As a quality metric for models, the relative root-mean-square output error is used,

e_rms = rms(y_meas − y) / rms(y),    (1)

with y_meas being the true output and y representing the modelled output.
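The metric of Eq. (1) is straightforward to compute; a minimal sketch in Python (the function name and NumPy usage are illustrative, not from the paper):

```python
import numpy as np

def relative_rms_error(y_meas, y_mod):
    """Relative root-mean-square output error, as in Eq. (1):
    rms(y_meas - y) divided by rms(y), with y the modelled output."""
    e = y_meas - y_mod
    return np.sqrt(np.mean(e ** 2)) / np.sqrt(np.mean(y_mod ** 2))
```

A perfect model gives an error of 0; a model that is off by a factor of two on every sample gives an error of 1.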

III. PROBLEM STATEMENT

The context is that of polynomial nonlinear state-space models,

x(k + 1) = Ax(k) + Bu(k) + f(x(k), u(k)),    (2a)
y(k) = Cx(k) + Du(k),    (2b)

where k = t/T_s is the time index with T_s the sampling period, and the matrices have the following dimensions: A ∈ R^{n×n} with n the number of state variables, B ∈ R^{n×m} with m the number of input variables, C ∈ R^{p×n} with p the number of outputs, and D ∈ R^{p×m}. The state equation contains a multivariate polynomial f : R^{n+m} → R^n. A similar function could be present in the output equation. Both functions may contain all possible cross-products between state and input variables raised to a user-defined total power. The model structure is split in a linear and a nonlinear part to be able to initiate the identification procedure from a linear model [1].

The goal is to decouple f into a set of univariate polynomial functions g (also called branches), given a linear transformation of the function inputs (V) and a linear transformation of the function outputs (W),

x(k + 1) = Ax(k) + Bu(k) + W g(V^T [x(k); u(k)]),    (3a)
y(k) = Cx(k) + Du(k),    (3b)

with W ∈ R^{n×r}, r the number of branches in g, and V ∈ R^{(n+m)×r}.

The decoupled form of a general multivariate vector function f is represented graphically in Fig. 1.

A. Polynomial decoupling via tensor decomposition

We would like to reduce the complexity of a general vector function f(x(k), u(k)) by decoupling it into a set of univariate functions, i.e. f(k) = W g(V^T [x(k); u(k)]). For the decoupled form, the Jacobian of f(k) is parameterised as

J_f(k) = W diag(g′_i(v_i^T [x(k); u(k)])) V^T.    (4)

Fig. 1. Graphical illustration of the coupled multivariate nonlinear function f on the left hand side and a decoupled formulation with nonlinear univariate branches g on the right hand side. q denotes the outputs of the multivariate polynomial, i.e. the left hand side in either (3a) or (3b) minus the linear part in the right hand side.

The decoupling method presented in [8] proceeds by collecting the Jacobian matrices for a set of operating points into a three-way tensor. It can be shown that W and V can then be determined from a simultaneous matrix diagonalisation problem. This can be solved using the Canonical Polyadic Decomposition (CPD) [12], [13], [11]. Since the CPD decomposes the tensor into a sum of rank-one terms, we have that

T = (J_f^{(1)}, . . . , J_f^{(N)}) = ∑_{j=1}^r w_j ◦ v_j ◦ h_j,    (5)

where N is the number of operating points. In shorthand notation T = ⟦W, V, H⟧, with

H_{ij} = g′_j(v_j^T [x(i); u(i)]).    (6)

The vectors h_j contain nonparametric estimates of g′_j. The univariate mapping functions g_j then follow from polynomial basis function expansions and the coefficients are estimated from a linear regression on h_j.
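To make the construction concrete, the stacking of Jacobian slices into the tensor T of Eq. (5) can be sketched as follows, using the minimal example f = x_1^2 x_2 of Section III-C (the helper names are ours, not from the paper):

```python
import numpy as np

def jacobian_tensor(jac, points):
    """Stack the Jacobian matrices J_f^(i), evaluated at N operating
    points, into a third-order tensor of size n x (n+m) x N."""
    return np.stack([jac(p) for p in points], axis=2)

# Single-output example f = x1^2 * x2: the Jacobian is the row vector
# [2*x1*x2, x1^2], so each slice is a 1 x 2 matrix.
jac = lambda x: np.array([[2 * x[0] * x[1], x[0] ** 2]])

rng = np.random.default_rng(0)
points = rng.standard_normal((50, 2))   # 50 random operating points
T = jacobian_tensor(jac, points)
print(T.shape)  # (1, 2, 50)
```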

The success of the decoupling is linked to the uniqueness properties of the CPD. If a unique decomposition into a certain number r of rank-one terms can be obtained, the underlying (unique) decoupled function is found. This implies that the function f admits an exact decoupled form of r branches.

B. The issue of non-unique CPD

When the number of branches required in the exact decoupled form of f (notice that all coupled polynomials can be written in decoupled form) surpasses a critical value, the solution of the CPD becomes non-unique. This is observed in practice by retrieving non-correlated (non-polynomial) entries in h_j. Also in the case of single-output functions, e.g. f : R^n → R, as they appear in NARX models, one cannot rely on the uniqueness of the decomposition, since in that case the Jacobian results in a matrix rather than a tensor. A sufficient condition for uniqueness was provided by Kruskal [14], [15].

The requirement of uniqueness of the CPD constitutes the principal limitation of the method presented in [8].

C. Minimal example: f = x_1^2 x_2

As a minimal example we consider the single-output coupled multivariate monomial

f = x_1^2 x_2.    (7)


Fig. 2. Visualisation of the vectors h_j which contain the nonparametric estimates (black markers) of the mapping functions g′_j obtained from the CPD of the Jacobian of Eq. (7). In red are the non-meaningful polynomial parameterisations of h_j.

It can easily be shown that a 3-branch decoupled form exists:

x_1^2 x_2 = W g(z),  W = [1/6  1/6  −1/3],  g(z) = [z_1^3; z_2^3; z_3^3],  z = V^T [x_1; x_2],  V^T = [1 1; −1 1; 0 1].    (8)

Nevertheless, since the uniqueness properties of the CPD are not satisfied for single-output functions, no meaningful decomposition is obtained from the Jacobian information sampled in a number of operating points. The results are shown in Fig. 2. Similar results are obtained irrespective of the value of r.
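The exact decoupling of Eq. (8) is easy to check numerically; a small sketch (variable names are ours):

```python
import numpy as np

# Eq. (8): z = V^T [x1; x2], branches g_j(z_j) = z_j^3,
# output weights W = [1/6, 1/6, -1/3].
Vt = np.array([[1.0, 1.0],
               [-1.0, 1.0],
               [0.0, 1.0]])
w = np.array([1 / 6, 1 / 6, -1 / 3])

def f_decoupled(x):
    z = Vt @ x              # three univariate branch inputs z_1, z_2, z_3
    return w @ z ** 3       # weighted sum of cubed branches

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
print(abs(f_decoupled(x) - x[0] ** 2 * x[1]))  # residual at machine-precision level
```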

The decoupling was unsuccessful, resulting in a relative function output error of 78%,

e_f = rms(f(x_1, x_2) − w g(V^T [x_1; x_2])) / rms(f(x_1, x_2)) = 0.78.    (9)

IV. POLYNOMIAL DECOUPLING VIA REGULARISED TENSOR DECOMPOSITION

This section discusses a regularisation-based approach for imposing smoothness on the obtained h_j (black markers in Fig. 2) during the iterations of an alternating least squares (ALS) routine. In this way, it is possible to again retrieve a meaningful decoupled representation of a given function f where the plain CPD approach fails.

A. Computing the CPD: Alternating Least Squares

The first question that arises when computing the CPD is determining the rank (or, relatedly, choosing a proper number of terms) r. In practice, a proper estimate for the rank is obtained by assessing the approximation error for a set of candidate ranks r = 1, 2, . . . until a sufficient level of accuracy is reached. Note that, unlike the matrix rank, the tensor rank may exceed the largest dimension of the tensor.

Once r is fixed, a typical algorithm for computing the CPD uses an alternating approach. The optimisation problem underlying the CPD can be written as

minimise_{T̂} ‖T − T̂‖,    (10)

where T̂ = ⟦W, V, H⟧. By fixing two of the factors, the third factor can be found from a linear least-squares problem. For the first factor W, we have the minimisation problem

minimise_W ‖T_(1) − W(V ⊙ H)^T‖_F^2,    (11)

leading to the update formula

W^{(k+1)} = T_(1) ((V ⊙ H)^T)†,    (12)

which is often rephrased as

W^{(k+1)} = T_(1) (V ⊙ H)(V^T V ∗ H^T H)†,    (13)

which is computationally simpler. Similar update formulas can be found for the factors V and H, and ultimately the following updating scheme is obtained:

W^{(k+1)} = T_(1) (H ⊙ V)(H^T H ∗ V^T V)†,    (14)
V^{(k+1)} = T_(2) (H ⊙ W)(H^T H ∗ W^T W)†,    (15)
H^{(k+1)} = T_(3) (V ⊙ W)(V^T V ∗ W^T W)†,    (16)

where in the right hand side of Eq. (14), H and V are H^{(k)} and V^{(k)}; accordingly, in the right hand side of Eq. (15), H and W are H^{(k)} and W^{(k+1)}; and in the right hand side of Eq. (16), V and W are V^{(k+1)} and W^{(k+1)}. Next, k is updated, k = k + 1. This updating scheme is repeated until some convergence criterion is met.
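The updating scheme above can be sketched in a few lines of NumPy. This is a plain ALS iteration under our own (self-consistent) unfolding and Khatri-Rao conventions; helper names are ours, not from the paper:

```python
import numpy as np

def khatri_rao(A, B):
    # Column-wise Kronecker product: column j is kron(A[:, j], B[:, j]).
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def unfold(T, mode):
    # Mode-n matricisation, consistent with the khatri_rao above.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cpd_als(T, r, iters=500, seed=0):
    """Plain ALS for a rank-r CPD, following the pattern of Eqs. (14)-(16)."""
    rng = np.random.default_rng(seed)
    W, V, H = (rng.standard_normal((s, r)) for s in T.shape)
    for _ in range(iters):
        W = unfold(T, 0) @ khatri_rao(V, H) @ np.linalg.pinv((V.T @ V) * (H.T @ H))
        V = unfold(T, 1) @ khatri_rao(W, H) @ np.linalg.pinv((W.T @ W) * (H.T @ H))
        H = unfold(T, 2) @ khatri_rao(W, V) @ np.linalg.pinv((W.T @ W) * (V.T @ V))
    return W, V, H

# Sanity check on an exactly rank-2 tensor:
rng = np.random.default_rng(1)
W0, V0, H0 = (rng.standard_normal((s, 2)) for s in (4, 5, 6))
T = np.einsum('ir,jr,kr->ijk', W0, V0, H0)
W, V, H = cpd_als(T, 2)
T_hat = np.einsum('ir,jr,kr->ijk', W, V, H)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))  # small residual
```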

B. Smoothness-Promoting Regularisation

From the parameterisation of the Jacobian slices (Eq. (4)) it can be seen that the elements of the j-th column of H are the evaluations of g′_j(v_j^T [x; u]). This allows for a smoothness-promoting regularised cost criterion to compute the updates of the factor H in the ALS iterations. For notational convenience we introduce z_k = V^T [x(k); u(k)].

We consider the regularised cost function

Q_H = ‖T_(3) − H(V ⊙ W)^T‖_F^2 + ∑_{j=1}^r λ_j ‖D S_j H e_j‖_2^2 / ‖D S_j Z^T e_j‖_2^2,    (17)

where
• λ_j is the regularisation parameter that balances the least-squares fit of the tensor against the smoothness-promoting term for branch j. The smoothness-promoting term has the interpretation of minimising the local Lipschitz constant of h_j. The values of λ_j are increased in a stepwise manner until sufficient smoothness is observed. If imposing smoothness results in too high a cost, r can be increased.
• e_j is the j-th canonical vector (i.e. the j-th column of the r × r identity matrix);
• D is a smoothing operator matrix containing finite-difference approximations of a derivative, e.g.

D = [−1 1 0 · · · 0; 0 −1 1 · · · 0; · · · ; 0 · · · 0 −1 1];

Page 4: Decoupling multivariate polynomials for nonlinear state ...homepages.vub.ac.be/~pdreesen/pub/decuyper2019cdc_lcss.pdf · The model structure is split in a linear and a nonlinear part

• S_j is a row-permutation matrix that is necessary to reorder the rows of the j-th column of H such that each row in Z is sorted in ascending order; this order may change in each iteration, as it depends on v_j, which is updated in every loop.
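The ingredients of the penalty term in Eq. (17) are easy to construct explicitly. A sketch for a single branch, where the sorting permutation plays the role of S_j (function names are ours):

```python
import numpy as np

def first_difference_matrix(N):
    """(N-1) x N finite-difference matrix D of Eq. (17)."""
    D = np.zeros((N - 1, N))
    D[np.arange(N - 1), np.arange(N - 1)] = -1.0
    D[np.arange(N - 1), np.arange(1, N)] = 1.0
    return D

def smoothness_penalty(h_j, z_j):
    """Penalty term of Eq. (17) for one branch: squared first differences
    of h_j, taken with the operating points z_j sorted in ascending order
    (the role of S_j), normalised by the corresponding differences in z_j."""
    order = np.argsort(z_j)            # S_j as an index permutation
    D = first_difference_matrix(len(z_j))
    num = np.sum((D @ h_j[order]) ** 2)
    den = np.sum((D @ z_j[order]) ** 2)
    return num / den

# A linear (perfectly smooth) h_j along z_j yields the squared slope:
z = np.linspace(-1.0, 1.0, 11)
print(smoothness_penalty(2.0 * z, z))  # 4.0
```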

The objective Q_H can be rewritten as

Q_H = tr(T_(3)^T T_(3) − T_(3)^T H(V ⊙ W)^T − (V ⊙ W)H^T T_(3) + (V ⊙ W)H^T H(V ⊙ W)^T) + ∑_{j=1}^r λ_j (e_j^T H^T S_j^T D^T D S_j H e_j) / (e_j^T Z S_j^T D^T D S_j Z^T e_j).    (18)

Setting the partial derivative of Q_H with respect to H equal to zero leads to the expression

−(V ⊙ W)^T T_(3)^T + (V ⊙ W)^T (V ⊙ W) H^T + ∑_{j=1}^r λ_j (e_j e_j^T H^T S_j^T D^T D S_j) / (e_j^T Z S_j^T D^T D S_j Z^T e_j) = 0.    (19)

Solving for vec(H) using the identity vec(AXB) = (B^T ⊗ A) vec(X) results in

vec(H) = ((V ⊙ W)^T ⊗ (V ⊙ W) + ∑_{j=1}^r λ_j ((e_j e_j^T) ⊗ S_j^T D^T D S_j) / (e_j^T Z S_j^T D^T D S_j Z^T e_j))† vec(T_(3) (V ⊙ W)).    (20)

Ultimately H is reconstructed as

H = unvec(((V ⊙ W)^T ⊗ (V ⊙ W) + ∑_{j=1}^r λ_j ((e_j e_j^T) ⊗ S_j^T D^T D S_j) / (e_j^T Z S_j^T D^T D S_j Z^T e_j))† vec(T_(3) (V ⊙ W))).    (21)

From H one then obtains smooth functions g_j (Section III-A). The value of λ_j is tuned considering both the level of smoothness that is obtained and the approximation error of the decomposition.
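The vectorisation identity used to arrive at Eq. (20) is easy to verify numerically; a minimal check, where NumPy's column-major `order='F'` flattening plays the role of the column-stacking vec operator:

```python
import numpy as np

# Numerical check of vec(AXB) = (B^T ⊗ A) vec(X).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))

lhs = (A @ X @ B).flatten(order='F')            # vec(AXB)
rhs = np.kron(B.T, A) @ X.flatten(order='F')    # (B^T ⊗ A) vec(X)
assert np.allclose(lhs, rhs)
```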

C. Minimal example: f = x_1^2 x_2

Decomposing the Jacobian information of the minimal example function f = x_1^2 x_2 using the regularised CPD returns smooth univariate functions. The results are shown in Fig. 3. The relative function output error is at the 1%-level for λ_j = 1,

e_f = rms(f(x_1, x_2) − w g(V^T [x_1; x_2])) / rms(f(x_1, x_2)) = 0.01.    (22)

Notice that the decoupled form consists of 4 branches where 3 branches should suffice (Eq. (8)). An additional branch is required in order to obtain a sufficiently low minimum of the regularised cost. Given the non-unique solution of the CPD, the results depend heavily on the initialisation. Regularisation helps to retrieve a solution in which h_j is smooth.

Fig. 3. Results of the regularised CP decomposition of the Jacobian information of f = x_1^2 x_2, sampled in a number of operating points (black). In red are polynomial fits.

V. GEOMETRIC POLYNOMIAL DECOUPLING

It can be insightful to think of the decoupling problem from a geometric perspective. Loosely speaking, we are looking for a minimal number r of directions v_j^T along which the multivariate function f can be represented by a sum of univariate polynomials g_j. It can be understood that any multivariate polynomial boils down to a univariate polynomial when it is evaluated along a straight line in its input space. This suggests that a particular choice of operating points, along lines, effectively decouples the polynomial for the considered input points. The latter is exploited in what we will call geometric polynomial decoupling.

The procedure tackles the trilinear problem of finding V, g and W by fixing V and solving for the two other terms. The directions in V are tuned in a later step, when the function is plugged back into the model and the model is optimised in terms of output error.

A. Selecting V

The directions v_j^T along which to select the operating points, and hence decouple the function, are chosen as random unit vectors, uniformly distributed in the input space. In order to grasp local features of the multivariate polynomial, a large number of v_j^T, i.e. much larger than the actually required number of branches, is used. Redundant branches will be removed during the estimation of W.

B. Obtaining g

An expression for g_j can be obtained analytically by implementing the linear conditions on the inputs, given by v_j^T, into the coupled function,

g_j(x, u) = f(v_j^T [x; u]),    (23)

where f represents one component of f should it be a vector function. In that case the components of f are decoupled independently.

Remark: Since constructing the regression matrix is often easier, g_j may be obtained from a polynomial fit of a number of evaluations of f along the directions v_j^T. Notice that there is no need for the Jacobian when decoupling the polynomial geometrically.

C. Estimating W

Having obtained V and the expressions for g, an appropriate output transformation W is found from the linear least-squares problem

minimise_W ∑_{k=1}^N ‖f(k) − W g(V^T [x(k); u(k)])‖_2^2,    (24)

where x(k) and u(k) are sampled from the training data so as to cover the required region of the input space.

At this point the redundant branches are removed. This is done on the basis of a singular value decomposition (SVD), which is used at the core of the pseudo-inverse.
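Steps A–C can be sketched end to end on the single-output example f = x_1^2 x_2 (no input u, so the directions live in R^2). This is our own minimal implementation: for this homogeneous cubic, restricting f to the line x = v_j z makes each branch the univariate cubic g_j(z) = f(v_j z), and W follows from the least-squares problem of Eq. (24):

```python
import numpy as np

f = lambda x: x[0] ** 2 * x[1]

rng = np.random.default_rng(0)
r = 15
V = rng.standard_normal((2, r))
V /= np.linalg.norm(V, axis=0)           # r random unit directions v_j

# Training points covering the region of interest:
X = rng.uniform(-1.0, 1.0, size=(2, 200))
F = f(X)                                  # targets f(x(k))

Z = V.T @ X                               # z_j(k) = v_j^T x(k)
# Branch evaluations g_j(z_j(k)) = f(v_j * z_j(k)):
G = np.array([f(V[:, j][:, None] * Z[j]) for j in range(r)]).T

# Eq. (24) as an SVD-based linear least-squares fit for the weights:
w, *_ = np.linalg.lstsq(G, F, rcond=None)
print(np.linalg.norm(G @ w - F) / np.linalg.norm(F))  # small residual
```

Branches whose fitted weight is (numerically) zero correspond to the redundant directions that the paper removes via the SVD inside the pseudo-inverse.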

D. Minimal example: f = x_1^2 x_2

Solving the decoupling of the minimal example f = x_1^2 x_2 starts by selecting a number of random unit vectors v_j^T. An illustration of the function f with highlighted evaluations along v_j^T is provided in Fig. 4. In this example r was initially chosen equal to 15 (in other words, the function is decoupled along 15 directions). After estimating W it is found that 5 of the 15 branches remain unused and can be removed.

The function is however very accurately decoupled, yielding a relative function output error close to machine precision,

e_f = rms(f(x_1, x_2) − w g(V^T [x_1; x_2])) / rms(f(x_1, x_2)) = 3.3 × 10^{−15}.    (25)

In comparison, a regularisation-based decoupling (Section IV) with r = 10 branches yields an error of e_f = 0.03. The higher error is mainly due to the increased difficulty in tuning the hyperparameters λ_j for larger values of r.

In [16] it was shown that the number of branches of decoupled functions can be reduced iteratively when plugged back into the nonlinear state-space model. Combining the geometric decoupling with the model reduction presented in [16] results in accurate low-complexity models.

Remark: The major challenge in arriving at a minimal number of branches is the selection of the directions v_j^T. This is the topic of further study. In this article they are uniformly distributed (see Section V-A).

VI. MODELLING THE FORCED DUFFING OSCILLATOR

In this section the decoupling of multivariate polynomials is illustrated for nonlinear state-space models. As a numerical example, a model for the forced Duffing equation is used. Consider the true system to be described by

ÿ(t) + c ẏ(t) + k y(t) + k_NL y^3(t) = u(t),    (26)

with parameters: c = 0.1 the viscous damping coefficient, k = 1 the linear stiffness coefficient and k_NL = 0.5 the nonlinear stiffness coefficient. The displacement y(t) is considered as the output and the force u(t) is the input.
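As a reference for the data-generation step, the system of Eq. (26) can be simulated with a basic fixed-step Runge-Kutta (RK4) scheme. A sketch with the parameter values from the text; the single sinusoidal forcing is our stand-in for the multisine excitation used in the paper:

```python
import numpy as np

c, k, k_nl = 0.1, 1.0, 0.5
u = lambda t: np.sin(1.2 * t)   # illustrative forcing (not the paper's multisine)

def rhs(t, s):
    # First-order form of Eq. (26): s = [y, y'].
    y, ydot = s
    return np.array([ydot, u(t) - c * ydot - k * y - k_nl * y ** 3])

def rk4(rhs, s0, t0, t1, n_steps):
    """Classic fixed-step fourth-order Runge-Kutta integrator."""
    h = (t1 - t0) / n_steps
    t, s = t0, np.asarray(s0, dtype=float)
    out = [s.copy()]
    for _ in range(n_steps):
        k1 = rhs(t, s)
        k2 = rhs(t + h / 2, s + h / 2 * k1)
        k3 = rhs(t + h / 2, s + h / 2 * k2)
        k4 = rhs(t + h, s + h * k3)
        s = s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
        out.append(s.copy())
    return np.array(out)

states = rk4(rhs, [0.0, 0.0], 0.0, 100.0, 10000)
y = states[:, 0]   # displacement output
print(y.shape)     # (10001,)
```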

Fig. 4. Surface of the function f = x_1^2 x_2 in black. Operating points along a number of random unit vectors v_j^T in red.

Fig. 5. Visualisation of the vectors h_j obtained from the CPD of the Jacobian tensor of the PNLSS Duffing model (Eq. (27)).

Using time integration (Runge-Kutta), a training data set is generated from the system. As input signal u(t), 3 random-phase multisine realisations are used. A fourth realisation is used only for validation purposes. The PNLSS model is initialised using the best linear approximation (BLA) [17]. Obtaining the BLA and training the PNLSS model on the input-output data is done using the PNLSS toolbox [18]. An accurate model yielding a relative validation error of e_rms = 8 × 10^{−6} is obtained (definition in Section II). The model is of the following form,

x(k + 1) = Ax(k) + b u(k) + f(x(k), u(k)),    (27a)
y(k) = c x(k) + d u(k),    (27b)

with n = 2, m = 1 and p = 1. The multivariate polynomial is described by f(k) = E ζ(x(k), u(k)), where E is a matrix of coefficients and ζ is a vector of monomial basis functions. Since the structure of the underlying system is assumed to be unknown (black-box setting), an a priori choice for ζ is required. In this case all 10 possible cross-product monomials with a total degree equal to 3 were chosen,

ζ(k) = [x_1^3(k)  x_1^2(k)x_2(k)  x_1^2(k)u(k)  · · ·]^T.    (28)

Notice that an overly complex description of the nonlinearity is obtained, since all entries of E are free to take non-zero values, even though the true system contains only a single cubic term (Eq. (26)).
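The count of 10 monomials follows from choosing 3 factors with repetition out of the 3 variables (x_1, x_2, u). A quick enumeration sketch, in the same ordering as Eq. (28) (function name is ours):

```python
from itertools import combinations_with_replacement
from math import prod

def monomials_degree3(x1, x2, u):
    """All cross-product monomials of total degree 3 in (x1, x2, u),
    i.e. the entries of the basis vector zeta of Eq. (28)."""
    return [prod(c) for c in combinations_with_replacement((x1, x2, u), 3)]

print(len(monomials_degree3(1.0, 1.0, 1.0)))  # 10
```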

Attempting to decouple f using the plain CPD method returns the results depicted in Fig. 5. A similar figure is obtained independent of the number of terms in the decomposition (r). It is clear that the CPD returns non-polynomial h_j, which are the nonparametric estimates of g′_j.

A. Regularised decoupling results

Applying the regularised decoupling method to the PNLSS model returns smooth decoupled functions. Since the uniqueness conditions of the CPD need not be met, the number of branches can be chosen freely. Table I shows the results for a scan over r. The relative function errors are reported for both components of f, together with the relative model output error e_rms. The decoupled function is an intermediate result. It is plugged back into the model and subjected to a global nonlinear optimisation from input to output. The resulting error is provided in the bottom row. All errors are obtained on the validation data.

The results show that a much simpler nonlinear model, containing a single univariate polynomial branch, can be retrieved via decoupling. The obtained model moreover outperforms the originally estimated coupled PNLSS model.

TABLE I
REGULARISED DECOUPLING RESULTS OF THE DUFFING PNLSS MODEL.

                        r = 1        r = 2        r = 3        r = 4
e_f1 rel.               0.03         0.006        0.005        0.005
e_f2 rel.               0.02         0.02         0.02         0.009
e_rms rel.              0.01         0.005        0.002        0.005
e_rms rel. optimised    2 × 10^{−6}  9 × 10^{−7}  3 × 10^{−6}  2 × 10^{−4}

B. Geometric decoupling results

Using the geometric polynomial decoupling, a nearly exact decoupling is obtained. The results are presented in Table II. The nonlinear function is originally decoupled into r = 20 branches. Using the reduction method of [16], this number can be reduced down to r = 1. Removing branches may lead to increased errors. The reduced models are however good initialisation points for further optimisation (bottom-row results).

TABLE II
GEOMETRIC DECOUPLING RESULTS OF THE DUFFING PNLSS MODEL.

                        r = 1        r = 2        r = 3        r = 4        r = 20
e_f1 rel.               0.40         0.12         0.09         0.20         6 × 10^{−16}
e_f2 rel.               0.40         0.14         0.10         0.20         1 × 10^{−15}
e_rms rel.              0.16         0.04         0.04         0.05         2 × 10^{−15}
e_rms rel. optimised    2 × 10^{−6}  2 × 10^{−5}  1 × 10^{−5}  2 × 10^{−5}  1 × 10^{−5}

Being able to initialise the decoupled PNLSS model with a nearly exact decoupled function can be crucial for models which are sensitive to instability [16].

VII. DISCUSSION

One could argue that there would be no need for decoupling and reduction of the model should a simpler model structure be identified from the start. Estimating a coupled PNLSS model, rather than estimating the decoupled (e.g. single-branch) model directly from the data, is however often justified, since the latter suffers from an initialisation problem, leading to poor local minima. Direct estimation of the r = 1 structure of the Duffing model (Section VI) results in a relative validation error of e_rms = 0.27, while following the methods presented in this work an e_rms = 2 × 10^{−6} is found.

VIII. CONCLUSIONS

Two methods able to decouple multivariate polynomials were presented. One is based on regularised tensor decomposition and starts from the Jacobian information of the function. The other decouples the function from a geometric perspective by using a particular set of operating points. Both methods reduce the complexity of nonlinear state-space models while maintaining high accuracy. Results were presented for a numerical test case of the forced Duffing oscillator.

REFERENCES

[1] J. Paduart, L. Lauwers, J. Swevers, K. Smolders, J. Schoukens, and R. Pintelon, “Identification of nonlinear systems using polynomial nonlinear state space models,” Automatica, vol. 46, pp. 647–657, 2010.

[2] M. Schoukens and Y. Rolain, “Cross-term elimination in parallel Wiener systems using a linear input transformation,” IEEE Trans. Instrum. Meas., vol. 61, no. 3, pp. 845–847, 2012.

[3] M. Schoukens, K. Tiels, M. Ishteva, and J. Schoukens, “Identification of parallel Wiener-Hammerstein systems with a decoupled static nonlinearity,” IFAC Proc. Vol., vol. 47, no. 3, pp. 505–510, 2014.

[4] P. Dreesen, M. Schoukens, K. Tiels, and J. Schoukens, “Decoupling static nonlinearities in a parallel Wiener-Hammerstein system: A first-order approach,” in IEEE Int. Instrum. and Meas. Tech. Conf. (I2MTC) Proc., May 2015, pp. 987–992.

[5] S. Billings, Nonlinear System Identification: NARMAX Methods in the Time, Frequency and Spatio-Temporal Domains. Wiley, 2013.

[6] D. Westwick, G. Hollander, K. Karami, and J. Schoukens, “Using decoupling methods to reduce polynomial NARX models,” in 18th IFAC Symposium on System Identification, Stockholm, Sweden, 2018.

[7] K. Usevich, “Decomposing multivariate polynomials with structured low-rank matrix completion,” in 21st Int. Symp. on Math. Theory of Networks and Systems (MTNS), 2014, pp. 1826–1833.

[8] P. Dreesen, M. Ishteva, and J. Schoukens, “Decoupling multivariate polynomials using first-order information,” SIAM Journal on Matrix Analysis and Applications, vol. 36, no. 2, pp. 864–879, 2015.

[9] A. Fakhrizadeh Esfahani, P. Dreesen, J. P. Noël, K. Tiels, and J. Schoukens, “Parameter reduction in nonlinear state-space identification of hysteresis,” Mech. Syst. Signal Process., vol. 104, pp. 884–895, 2018.

[10] P. Dreesen, A. Fakhrizadeh Esfahani, J. Stoev, K. Tiels, and J. Schoukens, “Decoupling nonlinear state-space models: case studies,” in Proceedings of the International Conference on Noise and Vibration Engineering (ISMA), 2016, pp. 2639–2646.

[11] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Rev., vol. 51, no. 3, pp. 455–500, 2009.

[12] J. D. Carroll and J. J. Chang, “Analysis of individual differences in multidimensional scaling via an N-way generalization of ‘Eckart-Young’ decomposition,” Psychometrika, vol. 35, pp. 283–319, 1970.

[13] R. Harshman, “Foundations of the PARAFAC procedure: Models and conditions for an ‘explanatory’ multimodal factor analysis,” UCLA Working Papers in Phonetics, vol. 16, pp. 1–84, 1970.

[14] J. B. Kruskal, “Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics,” Lin. Algebra Appl., vol. 18, pp. 95–138, 1977.

[15] ——, Rank, decomposition, and uniqueness for 3-way and N-way arrays. Elsevier Science Publishers B.V., 1989.

[16] J. Decuyper, K. Tiels, and J. Schoukens, “Retrieving highly structured models starting from black-box nonlinear state-space models using polynomial decoupling,” February 2019, unpublished internal note.

[17] R. Pintelon and J. Schoukens, System Identification: A Frequency Domain Approach, 2nd Edition. Wiley-IEEE Press, 2012.

[18] K. Tiels, PNLSS 1.0: A polynomial nonlinear state-space toolbox for MATLAB, 1st ed., Vrije Universiteit Brussel, http://homepages.vub.ac.be/jschouk/, 2016.