Identification and Neural Networks


Transcript of Identification and Neural Networks

Page 1: Identification and Neural Networks

Identification and Neural Networks
G. Horváth
ISRG, Department of Measurement and Information Systems

Page 2: Identification and Neural Networks

Identification and Neural Networks

Part III

Industrial application

http://www.mit.bme.hu/~horvath/nimia

Page 3: Identification and Neural Networks

Overview

- Introduction
- Modeling approaches
- Building neural models
- Database construction
- Model selection
- Modular approach
- Hybrid approach
- Information system
- Experiences with the advisory system
- Conclusions

Page 4: Identification and Neural Networks

Introduction to the problem

Task:
- to develop an advisory system for the operation of a Linz-Donawitz steel converter
- to propose component composition to support the factory staff in supervising the steelmaking process

A model of the process is required.

Page 5: Identification and Neural Networks

LD converter modeling

Page 6: Identification and Neural Networks

Linz-Donawitz converter

Phases of steelmaking:
1. Filling of waste iron
2. Filling of pig iron
3. Blasting with pure oxygen
4. Supplement additives
5. Sampling for quality testing
6. Tapping of steel and slag

Page 12: Identification and Neural Networks

Main parameters of the process

Nonlinear input-output relation between many inputs and two outputs:
- input parameters (~50 different parameters); certain features "measured" during the process
- main output parameters: temperature (1640-1700 °C; -10 … +15 °C) and carbon content (0.03-0.70%)
- more than 5000 records of data

Page 13: Identification and Neural Networks

Modeling task

The difficulties of model building:
- highly complex, nonlinear input-output relationship
- no (or unsatisfactory) physical insight
- relatively few measurement data
- unmeasurable parameters
- noisy, imprecise, unreliable data
- the classical approach (heat balance, mass balance) gives no acceptable results

Page 14: Identification and Neural Networks

Modeling approaches

- Theoretical model: based on chemical and physical equations
- Input-output behavioral model:
  - Neural model: based on the measured process data
  - Rule-based system: based on the experience-based knowledge of the factory staff
  - Combined neural/rule-based system

Page 15: Identification and Neural Networks

The modeling task

[Block diagram: a neural model receives the components (parameters) and the oxygen, and its predicted temperature is compared with the measured system temperature; the error trains the model. A copy of the trained model is then embedded in an inverse-model arrangement: given the components (parameters) and the desired model-output temperature, the inverse model predicts the required oxygen.]

Page 16: Identification and Neural Networks

"Neural" solution

The steps of solving a practical problem:

Raw input data → Preprocessing → Neural network → Postprocessing → Results

Page 17: Identification and Neural Networks

Building neural models

- Creating a reliable database: the problems of noisy data, missing data, and uneven data distribution
- Selecting a proper neural architecture: static network or dynamic network; regressor selection
- Training and validating the model

Page 18: Identification and Neural Networks

Creating a reliable database

- Input components: measure of importance, physical insight, sensitivity analysis, principal components
- Normalization: input normalization, output normalization
- Missing data: artificially generated data
- Noisy data: preprocessing, filtering

Page 19: Identification and Neural Networks

Building the database

Selecting input components, dimension reduction:

[Flowchart: initial database → neural network training → sensitivity analysis → "Does the input parameter have only a small effect on the output?" If yes, cancel the input parameter, build a new database, and repeat from training; if no, stop.]
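This pruning loop is easy to express in code. A minimal sketch, assuming a caller-supplied `train(X, y)` function and a finite-difference sensitivity measure; the names, the threshold, and the stopping rule are illustrative, not those of the original system:

```python
import numpy as np

def sensitivity(model, X, eps=1e-3):
    """Mean absolute output change per unit change of each input,
    estimated by finite differences around the training samples."""
    base = model(X)
    sens = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, i] += eps
        sens[i] = np.mean(np.abs(model(Xp) - base)) / eps
    return sens

def prune_inputs(train, X, y, threshold):
    """Iteratively cancel the input with the smallest sensitivity
    while that sensitivity stays below the threshold."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        model = train(X[:, keep], y)      # retrain on the current inputs
        s = sensitivity(model, X[:, keep])
        weakest = int(np.argmin(s))
        if s[weakest] >= threshold:       # every remaining input matters
            break
        del keep[weakest]                 # cancel the weak input, rebuild DB
    return keep                           # indices of the retained inputs
```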

Page 20: Identification and Neural Networks

Building the database

Dimension reduction: mathematical methods
- PCA
- Nonlinear PCA
- ICA
- Combined methods

Page 21: Identification and Neural Networks

Data compression, PCA networks

Principal component analysis (Karhunen-Loève transformation)

[Figure: a 2-D data cloud in the (x1, x2) plane with rotated principal axes (y1, y2); y1 points along the direction of maximal variance.]

Page 22: Identification and Neural Networks

Oja network

Linear feed-forward network: inputs x_1, x_2, x_3, \dots, x_N, feed-forward weight vector \mathbf{w}, single output

y = \mathbf{w}^T \mathbf{x}

Page 23: Identification and Neural Networks

Oja network

Learning rule: normalized Hebbian learning

\Delta \mathbf{w} = \mu\, y\, (\mathbf{x} - y\, \mathbf{w}), \qquad \Delta w_i = \mu\, y\, (x_i - y\, w_i)
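A small numpy sketch of this rule (data and learning rate are illustrative); trained on zero-mean data, the weight vector converges toward the first principal direction, here ±(0.707, 0.707):

```python
import numpy as np

rng = np.random.default_rng(0)
# zero-mean 2-D data, strongly correlated along the direction (1, 1)
A = np.array([[2.0, 1.5], [1.5, 2.0]])
X = rng.normal(size=(5000, 2)) @ A

w = rng.normal(size=2)
mu = 0.005
for x in X:
    y = w @ x                     # linear neuron: y = w^T x
    w += mu * y * (x - y * w)     # Oja rule: Hebbian term + normalization

print(w / np.linalg.norm(w))      # approx. +/-(0.707, 0.707)
```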

Page 24: Identification and Neural Networks

Oja subspace network

Multi-output extension: a linear feed-forward network with N inputs, M outputs \mathbf{y} = W\mathbf{x}, and the weights W modified by the Oja rule

\Delta W = \mu \left( \mathbf{y}\mathbf{x}^T - \mathbf{y}\mathbf{y}^T W \right)

Page 25: Identification and Neural Networks

GHA, Sanger network

Multi-output extension: Oja rule + Gram-Schmidt orthogonalization. Unit by unit,

\Delta \mathbf{w}_1 = \mu\, y_1 (\mathbf{x} - y_1 \mathbf{w}_1)

\Delta \mathbf{w}_2 = \mu\, y_2 (\mathbf{x} - y_1 \mathbf{w}_1 - y_2 \mathbf{w}_2)

\Delta \mathbf{w}_i = \mu\, y_i \left( \mathbf{x} - \sum_{j=1}^{i} y_j \mathbf{w}_j \right)

or in matrix form

\Delta W = \mu \left( \mathbf{y}\mathbf{x}^T - \mathrm{LT}\!\left[\mathbf{y}\mathbf{y}^T\right] W \right)

where \mathrm{LT}[\cdot] keeps the lower-triangular part.
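The single-neuron loop above extends directly to the matrix form; a sketch on illustrative data, extracting the first two principal directions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3)) @ np.diag([3.0, 2.0, 0.5])  # zero-mean data

M, N = 2, 3                           # extract 2 components from 3-D data
W = 0.1 * rng.normal(size=(M, N))
mu = 0.001
for _ in range(20):                   # a few passes over the data
    for x in X:
        y = W @ x
        # Sanger / GHA: Oja term minus coupling to the earlier units
        W += mu * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

print(np.round(W, 2))                 # rows approx. +/- e1 and +/- e2
```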

Page 26: Identification and Neural Networks

Nonlinear data compression

Nonlinear principal components

[Figure: data in the (x1, x2) plane lying along a curved arc; the nonlinear principal component y1 follows the curve instead of a straight axis.]

Page 27: Identification and Neural Networks

Independent component analysis

- A method of finding a transformation whose transformed components are statistically independent
- Applies higher-order statistics
- Based on the different definitions of statistical independence
- Can be implemented using a neural architecture

The typical task: the observations are mixtures of independent sources,

\mathbf{x} = A\mathbf{s} + \mathbf{n} \qquad (\text{or simply } \mathbf{x} = A\mathbf{s})

and the sources are recovered as \hat{\mathbf{s}} = B\mathbf{x} with B = A^{-1}.

Page 28: Identification and Neural Networks

Normalizing data

Typical data distributions

[Figure: two histograms of raw process variables. One variable is concentrated near zero over a 0-60 range (counts up to ~1000); the temperature is roughly bell-shaped over 1600-1740 °C (counts up to ~70). The very different ranges motivate normalization.]

Page 29: Identification and Neural Networks

Normalization

Zero mean, unit standard deviation:

\bar{x}_i = \frac{1}{P}\sum_{p=1}^{P} x_i^{(p)}, \qquad
\sigma_i^2 = \frac{1}{P-1}\sum_{p=1}^{P}\left(x_i^{(p)} - \bar{x}_i\right)^2, \qquad
\tilde{x}_i = \frac{x_i - \bar{x}_i}{\sigma_i}

Normalization into [0,1]:

\tilde{x}_i = \frac{x_i - \min\{x_i\}}{\max\{x_i\} - \min\{x_i\}}

Decorrelation + normalization:

\Sigma = \frac{1}{P-1}\sum_{p=1}^{P}\left(\mathbf{x}^{(p)} - \bar{\mathbf{x}}\right)\left(\mathbf{x}^{(p)} - \bar{\mathbf{x}}\right)^T, \qquad
\Sigma\,\boldsymbol{\varphi}_j = \lambda_j\,\boldsymbol{\varphi}_j

\Phi = \left[\boldsymbol{\varphi}_1\ \boldsymbol{\varphi}_2\ \dots\ \boldsymbol{\varphi}_N\right]^T, \qquad
\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_N), \qquad
\tilde{\mathbf{x}}^{(p)} = \Lambda^{-1/2}\,\Phi\left(\mathbf{x}^{(p)} - \bar{\mathbf{x}}\right)

Page 30: Identification and Neural Networks

Normalization

Decorrelation + normalization = whitening transformation

[Figure: the original data cloud is elongated and correlated; the whitened data cloud is spherical, with unit variance in every direction.]
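A compact sketch of the whitening transformation defined above (eigen-decomposition of the sample covariance, then scaling by Λ^(-1/2)); the data are illustrative:

```python
import numpy as np

def whiten(X):
    """Decorrelate and normalize: the result has identity covariance."""
    Xc = X - X.mean(axis=0)               # remove the mean
    Sigma = np.cov(Xc, rowvar=False)      # sample covariance, 1/(P-1)
    lam, Phi = np.linalg.eigh(Sigma)      # Sigma = Phi diag(lam) Phi^T
    return Xc @ Phi / np.sqrt(lam)        # rotate, then scale by lam^-1/2

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.3]])
print(np.cov(whiten(X), rowvar=False).round(3))   # approx. identity
```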

Page 31: Identification and Neural Networks

Missing or few data

Filling in the missing values; artificially generated data:
- using trends, e.g. the autocorrelation R_i(t, \tau) = E\{x_i(t)\, x_i(\tau)\}
- using correlation: a missing \hat{x}_i is estimated from a correlated variable, weighted by the normalized covariance C(i,j) / \sqrt{C(i,i)\, C(j,j)}
- using realistic transformations: \hat{x}_i^{(h)} = f\left(\hat{x}_j^{(h)}\right)

Page 32: Identification and Neural Networks

Few data

Artificial data generation:
- using realistic transformations
- using sensitivity values: data generation around various working points (a good example: ALVINN)

Page 33: Identification and Neural Networks

Noisy data

- EIV: input and output noise are taken into consideration; modified criterion function
- SVM: ε-insensitive criterion function
- Inherent noise suppression: classical neural nets have a noise-suppression property (inherent regularization); averaging (modular approach)

Page 34: Identification and Neural Networks

Errors in variables (EIV)

Handling of noisy data: both the input and the output of the system are observed through noise. For every operating point k the true values x_k^*, y_k^* are measured M times:

x_k^{[i]} = x_k^* + n_{x,k}^{[i]}, \qquad y_k^{[i]} = y_k^* + n_{y,k}^{[i]}, \qquad i = 1, \dots, M

Sample means:

\bar{x}_k = \frac{1}{M}\sum_{i=1}^{M} x_k^{[i]}, \qquad
\bar{y}_k = \frac{1}{M}\sum_{i=1}^{M} y_k^{[i]}

Sample variances and covariance:

\hat{\sigma}_{x,k}^2 = \frac{1}{M-1}\sum_{i=1}^{M}\left(x_k^{[i]} - \bar{x}_k\right)^2, \qquad
\hat{\sigma}_{y,k}^2 = \frac{1}{M-1}\sum_{i=1}^{M}\left(y_k^{[i]} - \bar{y}_k\right)^2

\hat{\sigma}_{xy,k} = \frac{1}{M-1}\sum_{i=1}^{M}\left(x_k^{[i]} - \bar{x}_k\right)\left(y_k^{[i]} - \bar{y}_k\right)

Page 35: Identification and Neural Networks

EIV

LS vs. EIV criterion function:

C_{LS} = \frac{1}{N}\sum_{k=1}^{N}\left(y_k^* - f_{NN}(x_k^*, \mathbf{W})\right)^2

C_{EIV} = \frac{1}{N}\sum_{k=1}^{N}\left[\frac{\left(\bar{y}_k - f_{NN}(\hat{x}_k, \mathbf{W})\right)^2}{\hat{\sigma}_{y,k}^2} + \frac{\left(\bar{x}_k - \hat{x}_k\right)^2}{\hat{\sigma}_{x,k}^2}\right]

EIV training: with the output error e_{f,k} = \bar{y}_k - f_{NN}(\hat{x}_k, \mathbf{W}), both the weights \mathbf{W} and the input estimates \hat{x}_k are adapted by gradient descent:

\frac{\partial C_{EIV}}{\partial \mathbf{W}} = -\frac{2}{N}\sum_{k=1}^{N}\frac{e_{f,k}}{\hat{\sigma}_{y,k}^2}\,\frac{\partial f_{NN}(\hat{x}_k, \mathbf{W})}{\partial \mathbf{W}}, \qquad
\frac{\partial C_{EIV}}{\partial \hat{x}_k} = \frac{2}{N}\left[\frac{\hat{x}_k - \bar{x}_k}{\hat{\sigma}_{x,k}^2} - \frac{e_{f,k}}{\hat{\sigma}_{y,k}^2}\,\frac{\partial f_{NN}(\hat{x}_k, \mathbf{W})}{\partial \hat{x}_k}\right]
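A sketch of the EIV criterion for a scalar model, assuming the repeated-measurement means and variances have already been computed; the tiny network f is only an illustration. In EIV training, both W and the input estimates x̂ would be adjusted to decrease this cost:

```python
import numpy as np

def eiv_cost(f, W, x_hat, x_bar, y_bar, var_x, var_y):
    """EIV criterion: output residual plus input residual, each one
    normalized by the corresponding measurement-noise variance."""
    e_f = y_bar - f(x_hat, W)                       # output error at x_hat
    return np.mean(e_f**2 / var_y + (x_bar - x_hat)**2 / var_x)

def f(x, W):
    """Illustrative scalar model: a one-hidden-layer network."""
    w1, b1, w2, b2 = W
    return np.tanh(np.outer(x, w1) + b1) @ w2 + b2
```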

Page 36: Identification and Neural Networks

EIV: example

Page 37: Identification and Neural Networks

EIV: example

[Figure: the example function and the EIV fit, plotted over x ∈ [-1, 1.5] and y ∈ [-1, 1].]

Page 38: Identification and Neural Networks

SVM

Why SVM?

"Classical" neural networks (MLP):
- "overfitting"
- model, structure, and parameter selection difficulties

Support Vector Machine (SVM):
+ better generalization (upper bounds)
+ selects the more important input samples
+ handles noise
+ (almost) automatic structure and parameter selection

Page 39: Identification and Neural Networks

SVM

Special problems of SVM:
- selecting the hyperparameters (for an ε-insensitive, RBF-type SVM: σ, ε, C)
- slow "training", complex computations (SVM-Light; a smaller, reduced training set)
- difficulty of real-time adaptation
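scikit-learn's SVR exposes exactly these hyperparameters (C, epsilon, and the RBF width through gamma = 1/(2σ²)), so a cross-validated grid search is one straightforward way to select them. A sketch on illustrative data, not the procedure of the original work:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.normal(size=200)   # noisy 1-D target

sigmas = np.array([0.3, 0.9, 1.9])
grid = {
    "C": [1, 10, 100],
    "epsilon": [0.01, 0.05, 0.1],
    "gamma": list(1.0 / (2 * sigmas**2)),   # kernel width sigma -> gamma
}
search = GridSearchCV(SVR(kernel="rbf"), grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```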

Page 40: Identification and Neural Networks

Selecting the optimal parameters

[Figure: two SVM regression fits of a noisy 1-D function over x ∈ [0, 5], y ∈ [-1.5, 1.5]; left: C = 1, ε = 0.05, σ = 0.9; right: C = 1, ε = 0.05, σ = 1.9.]

Page 41: Identification and Neural Networks

Selecting the optimal parameters

[Figure: the fitted curves for a range of kernel widths σ.]

Page 42: Identification and Neural Networks

Selecting the optimal parameters

[Figure: mean square error as a function of the kernel width σ.]

Page 43: Identification and Neural Networks

Comparison of SVM, EIV and NN

[Figure: EIV-SVM comparison on f(x) = sin(x)/x over x ∈ [-10, 10], showing the training points, the support vectors, and the training results of the SVM, of EIV training, and of an MLP.]

Page 44: Identification and Neural Networks

Model selection

- Static or dynamic model
- Dynamic model class: regressor selection, basis function selection
- Size of the network: number of layers, number of hidden neurons, model order

Page 45: Identification and Neural Networks

Model selection

NARX model, NOE model:

y(k) = f\left(\mathbf{x}(k), \mathbf{x}(k-1), \dots, \mathbf{x}(k-n),\ y(k-1), y(k-2), \dots, y(k-m)\right)

Lipschitz quotient and Lipschitz number:

q_{ij} = \frac{\left|y_i - y_j\right|}{\left\|\mathbf{x}_i - \mathbf{x}_j\right\|}, \quad i \neq j, \qquad
\bar{q}^{(n)} = \left(\prod_{k=1}^{p} \sqrt{n}\, q^{(n)}(k)\right)^{1/p}

[Figure: the Lipschitz number plotted against the assumed model order; the curve drops steeply and then flattens once the order is large enough.]

Page 46: Identification and Neural Networks

Model selection

Lipschitz quotient: assume a general nonlinear input-output relation

y = f(x_1, x_2, \dots, x_n)

where f(\cdot) is a continuous, smooth multivariable function with bounded derivatives:

\left|f_i'\right| = \left|\frac{\partial f}{\partial x_i}\right| \le M, \qquad i = 1, 2, \dots, n

Lipschitz quotient:

q_{ij} = \frac{\left|y_i - y_j\right|}{\left\|\mathbf{x}_i - \mathbf{x}_j\right\|}, \quad i \neq j, \qquad 0 < q_{ij} \le L

Sensitivity analysis:

\delta y = f_1'\,\delta x_1 + f_2'\,\delta x_2 + \dots + f_n'\,\delta x_n

Page 47: Identification and Neural Networks

Model selection

Lipschitz number. With the output reconstructed from n regressors,

q_{ij}^{(n)} = \frac{\left|\delta y\right|}{\sqrt{\delta x_1^2 + \delta x_2^2 + \dots + \delta x_n^2}}

q_{ij}^{(n+1)} = \frac{\left|\delta y\right|}{\sqrt{\delta x_1^2 + \dots + \delta x_n^2 + \delta x_{n+1}^2}}, \qquad
q_{ij}^{(n-1)} = \frac{\left|\delta y\right|}{\sqrt{\delta x_1^2 + \dots + \delta x_{n-1}^2}}

The Lipschitz number:

\bar{q}^{(n)} = \left(\prod_{k=1}^{p} \sqrt{n}\, q^{(n)}(k)\right)^{1/p}, \qquad p = 0.01N \dots 0.02N

where q^{(n)}(k) is the k-th largest Lipschitz quotient among all q_{ij}^{(n)}\ (i \neq j;\ i, j = 1, 2, \dots, N). For the optimal order n:

\bar{q}^{(n+1)} \approx \bar{q}^{(n)}, \qquad \bar{q}^{(n-1)} \gg \bar{q}^{(n)}
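A sketch of this order test for a single-output series with regressors [y(k-1), ..., y(k-n)] (input regressors would be appended the same way); the data and constants are illustrative. The printed values should drop sharply up to the true order, here n = 2, and flatten afterwards:

```python
import numpy as np

def lipschitz_number(y, n, frac=0.02):
    """Lipschitz number of the sequence y for assumed model order n."""
    # regressor rows [y(k-1), ..., y(k-n)] with target y(k)
    X = np.column_stack([y[n - i - 1 : len(y) - i - 1] for i in range(n)])
    t = y[n:]
    N = len(t)
    dy = np.abs(t[:, None] - t[None, :])                 # |y_i - y_j|
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    iu = np.triu_indices(N, k=1)                         # all pairs i < j
    q = dy[iu] / dx[iu]                                  # Lipschitz quotients
    p = max(1, int(frac * N))                            # p = 1-2% of N
    largest = np.sort(q)[-p:]                            # p largest quotients
    return np.exp(np.mean(np.log(np.sqrt(n) * largest))) # geometric mean

rng = np.random.default_rng(4)
y = np.zeros(300)
for k in range(2, 300):                # second-order system, true n = 2
    y[k] = 0.6 * y[k - 1] - 0.2 * y[k - 2] + 0.1 * rng.normal()
for n in range(1, 5):
    print(n, lipschitz_number(y[50:], n))
```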

Page 48: Identification and Neural Networks

Modular solution

- Ensemble of networks: linear combination of networks
- Mixture of experts: using the same paradigm (e.g. neural networks) or different paradigms (e.g. neural networks + symbolic systems)
- Hybrid solution: expert systems, neural networks, physical (mathematical) models

Page 49: Identification and Neural Networks

Cooperative networks

Ensemble of cooperating networks (classification/regression)

The motivation:
- Heuristic explanation: different experts together can solve a problem better; complementary knowledge
- Mathematical justification: accurate and diverse modules

Page 50: Identification and Neural Networks

Ensemble of networks

Mathematical justification.

Ensemble output:

\bar{y}(\mathbf{x}) = \sum_{j=1}^{M} \alpha_j\, y_j(\mathbf{x})

Ambiguity (diversity):

a_j(\mathbf{x}) = \left(y_j(\mathbf{x}) - \bar{y}(\mathbf{x})\right)^2

Individual error:

\varepsilon_j(\mathbf{x}) = \left(d(\mathbf{x}) - y_j(\mathbf{x})\right)^2

Ensemble error:

\varepsilon(\mathbf{x}) = \left(d(\mathbf{x}) - \bar{y}(\mathbf{x})\right)^2

Constraint:

\sum_{j} \alpha_j = 1, \qquad \alpha_j \ge 0

Page 51: Identification and Neural Networks

Ensemble of networks

Mathematical justification (cont'd).

Weighted error:

\bar{\varepsilon}(\mathbf{x}) = \sum_{j=1}^{M} \alpha_j\, \varepsilon_j(\mathbf{x})

Weighted diversity:

\bar{a}(\mathbf{x}) = \sum_{j=1}^{M} \alpha_j\, a_j(\mathbf{x})

Ensemble error:

\varepsilon(\mathbf{x}) = \bar{\varepsilon}(\mathbf{x}) - \bar{a}(\mathbf{x})

Averaging over the input distribution f(\mathbf{x}):

E = \int \varepsilon(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}, \qquad
\bar{E} = \int \bar{\varepsilon}(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}, \qquad
\bar{A} = \int \bar{a}(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}

E = \bar{E} - \bar{A}

Solution: an ensemble of accurate (small \bar{E}) and diverse (large \bar{A}) networks.
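The identity E = Ē − Ā holds pointwise whenever the weights sum to one, which is easy to confirm numerically; everything below is illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(5)
d = rng.normal(size=1000)                    # target d(x) over sampled inputs
Y = d + rng.normal(size=(7, 1000))           # outputs of 7 member networks
alpha = rng.dirichlet(np.ones(7))            # weights: nonnegative, sum to 1

ybar = alpha @ Y                             # ensemble output
E  = np.mean((d - ybar) ** 2)                # ensemble error
Eb = np.mean(alpha @ (d - Y) ** 2)           # weighted individual error
Ab = np.mean(alpha @ (Y - ybar) ** 2)        # weighted ambiguity
print(E, Eb - Ab)                            # equal up to round-off
```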

Page 52: Identification and Neural Networks

Ensemble of networks

How to get accurate and diverse networks:
- different structures: more than one network structure (e.g. MLP, RBF, CCN, etc.)
- different size, different complexity (number of hidden units, number of layers, nonlinear function, etc.)
- different learning strategies (BP, CG, random search, etc.); batch learning, sequential learning
- different training algorithms, sample order, learning samples
- different training parameters
- different initial parameter values
- different stopping criteria

Page 53: Identification and Neural Networks

Linear combination of networks

Fixed weights:

[Figure: the input x feeds the networks NN1, NN2, ..., NNM with outputs y1, y2, ..., yM; the outputs are weighted by the coefficients α1, α2, ..., αM (plus a bias coefficient α0 acting on y0 = 1) and summed:]

\bar{y}(\mathbf{x}) = \sum_{j=0}^{M} \alpha_j\, y_j(\mathbf{x}), \qquad y_0 = 1

Page 54: Identification and Neural Networks

Linear combination of networks

Computation of the optimal coefficients:
- simple average: \alpha_k = \frac{1}{M},\ k = 1, \dots, M
- input-dependent selection: \alpha_k = 1,\ \alpha_j = 0\ (j \neq k), where k depends on the input; for different input domains a different network alone gives the output
- optimal values using the constraint \sum_j \alpha_j = 1
- optimal values without any constraint: the Wiener-Hopf equation

\boldsymbol{\alpha}^* = \mathbf{R}_y^{-1}\mathbf{P}, \qquad
\mathbf{R}_y = E\left\{\mathbf{y}(\mathbf{x})\,\mathbf{y}(\mathbf{x})^T\right\}, \qquad
\mathbf{P} = E\left\{d\,\mathbf{y}(\mathbf{x})\right\}
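Estimating R_y and P from recorded member outputs turns the unconstrained case into ordinary least squares. A sketch (the constant column carries the bias term y0 = 1; the names are illustrative):

```python
import numpy as np

def optimal_weights(Y, d):
    """Y: (L, M) member outputs per sample, d: (L,) targets.
    Returns alpha (M+1,) minimizing the mean squared ensemble error."""
    A = np.column_stack([np.ones(len(d)), Y])   # prepend y0 = 1
    R = A.T @ A / len(d)                        # sample estimate of R_y
    P = A.T @ d / len(d)                        # sample estimate of P
    return np.linalg.solve(R, P)                # alpha* = R^-1 P
```

When R is ill-conditioned (members that are nearly identical), `np.linalg.lstsq(A, d)` gives the same answer more robustly.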

Page 55: Identification and Neural Networks

Mixture of Experts (MOE)

[Figure: the input x feeds the expert networks Expert 1, Expert 2, ..., Expert M with outputs μ1, ..., μM, and a gating network with outputs g1, g2, ..., gM; the overall output μ is the gate-weighted sum of the expert outputs.]

Page 56: Identification and Neural Networks

Mixture of Experts (MOE)

The output is the weighted sum of the outputs of the experts:

\boldsymbol{\mu} = \sum_{i=1}^{M} g_i\, \boldsymbol{\mu}_i, \qquad \boldsymbol{\mu}_i = f_i(\mathbf{x}, \Theta_i)

where \Theta_i is the parameter of the i-th expert.

The output of the gating network is the "softmax" function:

g_i = \frac{e^{\xi_i}}{\sum_{j=1}^{M} e^{\xi_j}}, \qquad \xi_i = \mathbf{v}_i^T\mathbf{x}, \qquad
\sum_{i=1}^{M} g_i = 1, \quad g_i \ge 0

where \mathbf{v}_i is the parameter of the gating network.
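A sketch of this forward pass with a linear-softmax gate; the expert functions and shapes are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # numerically stable softmax
    return e / e.sum()

def moe_output(x, experts, V):
    """experts: list of callables f_i(x); V: (M, n) gating parameters."""
    g = softmax(V @ x)               # g_i = exp(v_i^T x) / sum_j exp(v_j^T x)
    mu = np.array([f(x) for f in experts])
    return g @ mu, g                 # weighted sum of the expert outputs
```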

Page 57: Identification and Neural Networks

Mixture of Experts (MOE)

Probabilistic interpretation:

- expert output: \boldsymbol{\mu}_i = E[\mathbf{y} \mid \mathbf{x}, \Theta_i]
- a priori probability: g_i = g_i(\mathbf{x}, \mathbf{v}_i) = P(i \mid \mathbf{x}, \mathbf{v}_i)
- the probabilistic model with true parameters \Theta^0, \mathbf{v}_i^0:

P(\mathbf{y} \mid \mathbf{x}, \Theta^0) = \sum_i g_i(\mathbf{x}, \mathbf{v}_i^0)\, P(\mathbf{y} \mid \mathbf{x}, \Theta_i^0)

Page 58: Identification and Neural Networks

Mixture of Experts (MOE)

Training. Training data:

X = \left\{\left(\mathbf{x}^{(l)}, \mathbf{y}^{(l)}\right)\right\}_{l=1}^{L}

Probability of generating the output from the input:

P\left(\mathbf{y}^{(l)} \mid \mathbf{x}^{(l)}, \Theta\right) = \sum_i P\left(i \mid \mathbf{x}^{(l)}, \mathbf{v}_i\right) P\left(\mathbf{y}^{(l)} \mid \mathbf{x}^{(l)}, \Theta_i\right)

The log-likelihood function (maximum-likelihood estimation):

L(X, \Theta) = \log \prod_{l=1}^{L} P\left(\mathbf{y}^{(l)} \mid \mathbf{x}^{(l)}, \Theta\right)
= \sum_l \log \sum_i P\left(i \mid \mathbf{x}^{(l)}, \mathbf{v}_i\right) P\left(\mathbf{y}^{(l)} \mid \mathbf{x}^{(l)}, \Theta_i\right)

Page 59: Identification and Neural Networks

Mixture of Experts (MOE)

Training (cont'd). Gradient method:

The parameter of the i-th expert network:

\Theta_i(k+1) = \Theta_i(k) + \eta \sum_{l=1}^{L} h_i^{(l)}\left(\mathbf{y}^{(l)} - \boldsymbol{\mu}_i^{(l)}\right)

The parameter of the gating network:

\mathbf{v}_i(k+1) = \mathbf{v}_i(k) + \eta \sum_{l=1}^{L} \left(h_i^{(l)} - g_i^{(l)}\right)\mathbf{x}^{(l)}

where h_i^{(l)} is the a posteriori probability defined on the next slide.

Page 60: Identification and Neural Networks

Mixture of Experts (MOE)

Training (cont'd):

A priori probability:

g_i^{(l)} = g_i\left(\mathbf{x}^{(l)}, \mathbf{v}_i\right) = P\left(i \mid \mathbf{x}^{(l)}, \mathbf{v}_i\right)

A posteriori probability:

h_i^{(l)} = \frac{g_i^{(l)}\, P\left(\mathbf{y}^{(l)} \mid \mathbf{x}^{(l)}, \Theta_i\right)}{\sum_j g_j^{(l)}\, P\left(\mathbf{y}^{(l)} \mid \mathbf{x}^{(l)}, \Theta_j\right)}
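Putting the last three slides together for linear experts with a scalar Gaussian output model (fixed variance): compute the priors g, the posteriors h, then apply the two gradient updates. A sketch under those assumptions:

```python
import numpy as np

def moe_batch_step(X, y, Theta, V, eta=0.01, var=1.0):
    """One gradient step. X: (L, n) inputs, y: (L,) targets,
    Theta: (M, n) linear expert weights, V: (M, n) gating weights."""
    Z = V @ X.T                                # gate activations (M, L)
    G = np.exp(Z - Z.max(axis=0))
    G /= G.sum(axis=0)                         # a priori g_i^(l)
    Mu = Theta @ X.T                           # expert outputs (M, L)
    lik = np.exp(-(y - Mu) ** 2 / (2 * var))   # P(y | x, Theta_i)
    H = G * lik
    H /= H.sum(axis=0)                         # a posteriori h_i^(l)
    Theta += eta * (H * (y - Mu)) @ X          # expert update
    V += eta * (H - G) @ X                     # gating update
    return Theta, V
```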

Page 61: Identification and Neural Networks

Mixture of Experts (MOE)

Training (cont'd): the EM (Expectation Maximization) algorithm

A general iterative technique for maximum-likelihood estimation:
- introducing hidden variables
- defining a log-likelihood function

Two steps:
- Expectation of the hidden variables
- Maximization of the log-likelihood function

Page 62: Identification and Neural Networks

EM (Expectation Maximization)

A simple example: estimating the means of k (= 2) Gaussians

[Figure: measurements on a line, drawn from two overlapping Gaussian densities f(y|μ1) and f(y|μ2).]

Page 63: Identification and Neural Networks

EM algorithm

A simple example: estimating the means of k (= 2) Gaussians.

Hidden variables for every observation: \left(x^{(l)}, z_1^{(l)}, z_2^{(l)}\right), where

z_1^{(l)} = 1,\ z_2^{(l)} = 0 \quad \text{if } x^{(l)} \text{ was generated by component 1}

z_1^{(l)} = 0,\ z_2^{(l)} = 1 \quad \text{if } x^{(l)} \text{ was generated by component 2}

Likelihood function:

f\left(x^{(l)}, z^{(l)} \mid \mu\right) = \prod_{i=1}^{k} \left(f\left(x^{(l)} \mid \mu_i\right)\right)^{z_i^{(l)}}

Log-likelihood function:

\mathcal{L} = \log f\left(x^{(l)}, z^{(l)} \mid \mu\right) = \sum_{i=1}^{k} z_i^{(l)} \log f\left(x^{(l)} \mid \mu_i\right)

Expected value of z_i^{(l)} with \mu_1 and \mu_2 given:

E\left[z_i^{(l)}\right] = \frac{f\left(x^{(l)} \mid \mu_i\right)}{\sum_{j=1}^{2} f\left(x^{(l)} \mid \mu_j\right)}, \qquad i = 1, 2

Page 64: Identification and Neural Networks

Mixture of Experts (MOE)

A simple example: estimating the means of k (= 2) Gaussians (cont'd).

Expected log-likelihood function:

E[\mathcal{L}] = \sum_{i=1}^{k} E\left[z_i^{(l)}\right] \log f\left(x^{(l)} \mid \mu_i\right)
= \sum_{i=1}^{k} \frac{f\left(x^{(l)} \mid \mu_i\right)}{\sum_{j=1}^{2} f\left(x^{(l)} \mid \mu_j\right)}\, \log f\left(x^{(l)} \mid \mu_i\right)

where

f\left(x^{(l)} \mid \mu_i\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}\left(x^{(l)} - \mu_i\right)^2\right], \qquad
\log f\left(x^{(l)} \mid \mu_i\right) = \log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2}\left(x^{(l)} - \mu_i\right)^2

The estimate of the means:

\hat{\mu}_i = \frac{\sum_{l=1}^{L} E\left[z_i^{(l)}\right] x^{(l)}}{\sum_{l=1}^{L} E\left[z_i^{(l)}\right]}

Page 65: Identification and Neural Networks

Hybrid solution

Utilization of different forms of information:
- measurement, experimental data
- symbolic rules
- mathematical equations, physical knowledge

Page 66: Identification and Neural Networks

The hybrid information system

Solution: integration of measurement information and experience-based knowledge about the process.

Realization:
- development system: supports the design and testing of different hybrid models
- advisory system: hybrid models use the current process state and input information; experiences collected by the rule-based system can be used to update the model

Page 67: Identification and Neural Networks

The hybrid-neural system

[Block diagram: the input data first pass through an input data preparatory expert system; a mixture-of-experts system of neural networks NN1, NN2, ..., NNK produces the outputs O1, O2, ..., OK, supported by an output estimator expert system and a correction term expert system; an output expert system combines these and either issues the oxygen prediction or gives no prediction together with an explanation. Control signals coordinate the modules.]

Page 68: Identification and Neural Networks

The hybrid-neural system

Data preprocessing and correction:

[Block diagram: input data → data preprocessing → neural model.]

Page 69: Identification and Neural Networks

The hybrid-neural system

Conditional network running:

[Block diagram: an expert for selecting a neural model routes the input data to one of the networks NN 1, NN 2, ..., NN k, which produces the corresponding output O1, O2, ..., Ok.]

Page 70: Identification and Neural Networks

The hybrid-neural system

Parallel network running with postprocessing:

[Block diagram: the input data are fed in parallel to NN 1, NN 2, ..., NN k; an expert for selecting an NN model and an output expert combine the outputs O1, O2, ..., Ok into the oxygen prediction.]

Page 71: Identification and Neural Networks

The hybrid-neural system

Iterative network running:

[Flowchart: run the neural network and make a prediction; if the result is not satisfactory, modify the input parameters and run the network again; if satisfactory, stop.]

Page 72: Identification and Neural Networks

The hybrid information system

[Block diagram: a data table management module, backed by the hard disk, serves an analysis module, filters, a neural network module, and an expert system module; each module has its own user interface facing the user.]

Page 73: Identification and Neural Networks

The structure of the system

[Block diagram: the process control and database servers connect through a process control system and database system interface to data filtering and data conversion modules; the process and oxygen models (hybrid neural-expert models) produce the predictions; result verification, model maintenance, and model adaptation close the loop; a controller, a user interface controller, a real-time display system, and services (explanation, help, etc.) connect the system to the user.]


Page 75: Identification and Neural Networks

Validation

- Model selection: an iterative process; utilization of domain knowledge
- Cross-validation: fresh data, on-site testing

Page 76: Identification and Neural Networks

Experiences

- The hit rate is increased by +10%
- Most of the special cases can be handled
- Further rules for handling special cases should be obtained
- The accuracy of the measured data should be increased

Page 77: Identification and Neural Networks

Conclusions

- For complex industrial problems, all available information has to be used
- Thinking of NNs as universal modeling devices is not enough on its own: physical insight is important
- Preprocessing and post-processing are important
- Modular approach: decomposition of the problem; cooperation and competition; "experts" using different paradigms
- The hybrid approach to the problem provided better results

Page 78: Identification and Neural Networks

References and further readings

Pataki, B., Horváth, G., Strausz, Gy., Talata, Zs.: "Inverse Neural Modeling of a Linz-Donawitz Steel Converter" e&i Elektrotechnik und Informationstechnik, Vol. 117, No. 1, 2000.

Strausz, Gy., Horváth, G., Pataki, B.: "Experiences from the results of neural modelling of an industrial process" Proc. of Engineering Applications of Neural Networks, EANN'98, Gibraltar, 1998, pp. 213-220.

Strausz, Gy., Horváth, G., Pataki, B.: "Effects of database characteristics on the neural modeling of an industrial process" Proc. of the International ICSC/IFAC Symposium on Neural Computation, NC'98, Vienna, Sept. 1998, pp. 834-840.

Horváth, G., Pataki, B., Strausz, Gy.: "Neural Modeling of a Linz-Donawitz Steel Converter: Difficulties and Solutions" Proc. of EUFIT'98, 6th European Congress on Intelligent Techniques and Soft Computing, Aachen, Germany, Sept. 1998, pp. 1516-1521.

Horváth, G., Pataki, B., Strausz, Gy.: "Black box modeling of a complex industrial process" Proc. of the 1999 IEEE Conference and Workshop on Engineering of Computer Based Systems, Nashville, TN, USA, 1999, pp. 60-66.

Bishop, C. M.: "Neural Networks for Pattern Recognition" Clarendon Press, Oxford, 1995.

Berényi, P., Horváth, G., Pataki, B., Strausz, Gy.: "Hybrid-Neural Modeling of a Complex Industrial Process" Proc. of the IEEE Instrumentation and Measurement Technology Conference, IMTC'2001, Budapest, May 21-23, 2001, Vol. III, pp. 1424-1429.

Berényi, P., Valyon, J., Horváth, G.: "Neural Modeling of an Industrial Process with Noisy Data" IEA/AIE-2001, 14th International Conference on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems, Budapest, June 4-7, 2001; Lecture Notes in Computer Science, Springer, 2001, pp. 269-280.

Jordan, M. I., Jacobs, R. A.: "Hierarchical Mixtures of Experts and the EM Algorithm" Neural Computation, Vol. 6, pp. 181-214, 1994.

Hashem, S.: "Optimal Linear Combinations of Neural Networks" Neural Networks, Vol. 10, No. 4, pp. 599-614, 1997.

Krogh, A., Vedelsby, J.: "Neural Network Ensembles, Cross Validation, and Active Learning" In: Tesauro, G., Touretzky, D., Leen, T. (eds.), Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, pp. 231-238.