
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 39, NO. 6, DECEMBER 1992, 511

Process Control by On-Line Trained Neural Controllers

Julio Tanomaru, Student Member, IEEE, and Sigeru Omatu, Member, IEEE

Abstract—Although neural controllers based on multilayer neural networks have been demonstrating high potential in the nonconventional branch of adaptive process control called neurocontrol, practical applications are severely limited by the long training time that they require. This paper addresses the question of how to perform on-line training of multilayer neural controllers in an efficient way in order to reduce the training time. At first, based on multilayer neural networks, structures for a plant emulator and a controller are described. Only a little qualitative knowledge about the process to be controlled is required. The controller must learn the inverse dynamics of the plant from randomly chosen initial weights. Basic control configurations are briefly presented, and new on-line training methods, based on performing multiple updating operations during each sampling period, are proposed and described in algorithmic form. One method, the direct inverse control error approach, is effective for small adjustments of the neural controller when it is already reasonably trained; another, the predicted output error approach, directly minimizes the control error and greatly improves convergence of the controller. Simulation and experimental results using a simple plant show the effectiveness of the proposed neuromorphic control structures and training methods.

I. INTRODUCTION

ARTIFICIAL neural networks (ANNs) are mathematical systems designed to deliberately employ principles on which biological nervous systems are believed to be based. By embodying such principles, ANN modelers expect to be able to emulate the information processing capabilities of biological neural systems to some extent [1], [2]. The recent efforts in applying ANNs to control of dynamical processes resulted in the fledgling, but very promising, field of neurocontrol [3], [4], which can be thought of as a nonconventional (connectionist) branch of adaptive control theory. The appeal of neurocontrol for control engineers can be primarily explained by three reasons: 1) biological nervous systems are living examples of intelligent adaptive controllers, 2) ANNs are essentially adaptive systems able to learn how to perform complex tasks, and 3) neurocontrol techniques are believed to be able to overcome many of the difficulties that

Manuscript received April 17, 1992; revised July 31 and August 22, 1992. The authors are with the Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770 Japan. IEEE Log Number 9204064.

conventional adaptive control techniques [5] suffer when dealing with nonlinear plants or plants with unknown structure.

Generally, the training of ANNs involved in a control system can be performed on- or off-line, depending on whether or not they execute useful work while learning is taking place. Although off-line training is usually straightforward, conditions for assuring good generalization of the ANNs through the control space are difficult to attain, making on-line training always necessary in control applications. In fact, the training should ideally occur exclusively on-line, with ANNs learning at high speed from any initial set of weights.

Neural or neuromorphic controllers (NCs), i.e., controllers based on an ANN structure, have been proposed to learn the inverse dynamics of the control plant from the observation of the plant input-output relationship through time [6], [7]. Although such NCs have proven to be effective in controlling complex systems, the usually very long time required for training makes controllers obtained via conventional techniques preferable in many practical applications. The slow convergence of NCs implies poor control performance and robustness, especially during the first stages of training. In order to make neurocontrol a viable alternative for industrial control of processes, there is a pressing need for efficient on-line training algorithms for NCs.

This paper first presents general neuromorphic control structures based on multilayer ANNs. Only a little qualitative knowledge about the plant is necessary. Basic control configurations are briefly presented, but, although they perform well after enough learning, a long training time is necessary until good control performance is achieved. In conventional adaptive systems, in which one updating operation takes place for each sampling period, increasing the sampling rate seems to be a natural step toward improving performance. However, in many practical cases the sampling rate cannot exceed some limit, owing to either physical or practical (technical) constraints. Considering such a limitation, new on-line training methods, based on efficient use of plant input-output data and on the distinction between sampling and learning frequencies, are proposed in order to reduce NC training time and are described in algorithmic form. Simulation and experimental results attest to the effectiveness of the proposed neuromorphic control structures and training methods.

0278-0046/92$03.00 © 1992 IEEE


II. NEUROMORPHIC CONTROL STRUCTURES

A. Problem Statement

Consider the discrete-time single-input-single-output (SISO) process

y(k+1) = f[y(k), y(k−1), ..., y(k−P+1), u(k), u(k−1), ..., u(k−Q)]   (1)

where y denotes the output, u is the input, k is the discrete-time index, P and Q are nonnegative integers, and f(·) is some function. In many practical cases, the plant input is limited in amplitude, i.e., there exist u_m and u_M such that, for any k,

u_m ≤ u(k) ≤ u_M.   (2)

In this paper, the task is to learn how to control the plant described in (1) in order to follow a specified reference r(k), minimizing some norm of the error e(k) = r(k) − y(k) through time. It is assumed that the only available qualitative a priori knowledge about the plant is that p and q, which are, respectively, estimates for P and Q in (1), are known. In other words, it is assumed that a rough estimate of the order of the plant to be controlled is given.

B. Multilayer Neural Networks for Control

Although several ANN architectures have been applied to process control, most of the incipient neurocontrol literature concentrates on multilayer neural networks (MNNs). MNNs are particularly attractive to control engineers due to the following basic reasons:

1) MNNs are essentially feedforward structures in which the information flows forward, from the inputs to the outputs, through hidden layers. This characteristic is very convenient for control engineers, used to working with systems represented by blocks with inputs and outputs clearly defined and separated, and is not present in recurrent networks (e.g., Hopfield's model [8]), in which bidirectional nodes appear and there are no true inputs and outputs.

2) MNNs with as few as one hidden layer using arbitrary sigmoidal activation functions are able to perform any nonlinear mapping between two finite-dimensional spaces to any desired degree of accuracy, provided there is a sufficient number of hidden units (neurons) [9], [10]. In other words, MNNs are very versatile mappings of arbitrary precision. In control, many of the blocks involved in a control system can usually be viewed as mappings and, therefore, can be emulated by MNNs with inputs and outputs properly defined.

3) The basic algorithm for learning in MNNs, the backpropagation (BP) algorithm [11], [12], belongs to the broad class of gradient methods largely applied in optimal control and is, therefore, familiar to control engineers.

Points 1), 2), and 3) indicate that MNNs can be thought of as blocks with mapping-learning ability. Based on such mapping ability, two general neural control structures can be proposed: a plant emulator and a controller [13], [14].

C. Neural Plant Emulator (PE)

Given the estimates p and q, an MNN with m = p + q + 1 inputs and a single output can be used for emulating f(·) in (1). Denoting the mapping performed by the plant emulator by φ_E(·), and its output by y_1, we have

y_1 = φ_E(x_E)   (3)

where x_E is an m-dimensional vector. For x_E(k) = [y(k), ..., y(k−p+1), u(k), ..., u(k−q)]^T, the emulator is trained in order to minimize a norm of the emulation error y(k+1) − y_1. The PE is illustrated in Fig. 1(a), where z^{−1} stands for the time-delay operator.

D. Neural Controller (NC)

Assume that the plant in (1) is invertible, i.e., there exists a function g(·) such that

u(k) = g[y(k+1), y(k), ..., y(k−p+1), u(k−1), u(k−2), ..., u(k−q)].   (4)

Consider again an MNN with m-dimensional input vector x_C, single output u_1, and an input-output relationship briefly represented by

u_1 = φ_C(x_C)   (5)

where φ_C(·) denotes the input-output mapping of the MNN. If the output of φ_C(·) approximates the output of g(·) for corresponding inputs, the MNN can be thought of as a controller in the feedforward control path. At instant k, the input to the plant can be obtained from (5) by setting

x_C(k) = [r(k+1), y(k), ..., y(k−p+1), u(k−1), ..., u(k−q)]^T   (6)

where the reference r(k+1) was used instead of the unknown y(k+1). After enough training of the NC, if the output error e(k) is kept small, it is possible to have

x_C(k) = [r(k+1), r(k), ..., r(k−p+1), u(k−1), ..., u(k−q)]^T   (7)

emphasizing the feedforward nature of the NC. The basic NC configuration is illustrated in Fig. 1(b), where the two alternatives for the input vector, as given by (6) and (7), are briefly indicated.

III. TRAINING CONFIGURATIONS

The controller's training signal in Fig. 1(b) provides the information necessary for the NC to learn the inverse dynamics of the plant in such a way that an error function J defined on the plant output error e(k) = r(k) − y(k) is minimized. In order to enable learning based on a gradient method, it is necessary to compute the derivative of the error function J with respect to the output of the NC,


Fig. 1. General neuromorphic control structures. (a) Plant emulator (PE). (b) Neural controller (NC).

i.e., δ = −∂J/∂u_1 [13], [14]. Knowledge of δ suffices for updating the weights of the NC via backpropagation (BP). Three controller training schemes are briefly described here [13]-[16].

A. Direct Inverse Control

This configuration, depicted in Fig. 2(a), can be employed for on- or off-line training. However, since this control scheme relies on the NC's generalization ability, it should not be used alone as the only training scheme. On the other hand, training data for supervised learning can be obtained in a straightforward way. At time k+1, for x'_C(k) = [y(k+1), ..., y(k+1−p), u(k−1), ..., u(k−q)]^T, the NC can be updated in order to minimize an error function J defined as a function of the difference u(k) − u_1(k), where u_1(k) = φ_C[x'_C(k)]; therefore, the term δ = −∂J/∂u_1 can be easily calculated, enabling BP. For example, defining

J(k) = 0.5[u(k) − u_1(k)]^2   (8)

the expression for δ (with subindex k) becomes merely

δ_k = u(k) − u_1(k).   (9)
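As a concrete illustration of (8)-(9), the sketch below (our own illustrative code, not the authors') implements a one-hidden-layer MNN with a linear output unit and updates it by BP on the direct inverse control error; the layer sizes, learning rate, and sample pattern (p = 2, q = 1) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MNN:
    """x -> h = sigmoid(W1 x + b1) -> u1 = W2 . h + b2 (linear output)."""
    def __init__(self, n_in, n_hidden, eta=0.1):
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0
        self.eta = eta

    def forward(self, x):
        self.h = sigmoid(self.W1 @ np.asarray(x, float) + self.b1)
        return float(self.W2 @ self.h + self.b2)

    def bp(self, x, target, output):
        # delta = -dJ/du1 = target - output for J = 0.5 (target - output)^2, cf. (9)
        x = np.asarray(x, float)
        delta = target - output
        dh = delta * self.W2 * self.h * (1.0 - self.h)  # backpropagated to hidden layer
        self.W2 += self.eta * delta * self.h
        self.b2 += self.eta * delta
        self.W1 += self.eta * np.outer(dh, x)
        self.b1 += self.eta * dh

# One observed pattern (x'_C(k), u(k)) with p = 2, q = 1 (m = 4 inputs):
net = MNN(n_in=4, n_hidden=8)
x_c = [0.3, 0.2, 0.1, 0.4]   # [y(k+1), y(k), y(k-1), u(k-1)], illustrative values
u_k = 0.7                    # input actually applied to the plant
for _ in range(200):
    net.bp(x_c, u_k, net.forward(x_c))
```

Repeating the update drives u_1 = φ_C(x'_C) toward the recorded plant input u(k), which is precisely the minimization of (8).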

B. Direct Adaptive Control

This configuration is shown in Fig. 2(b). Learning is performed essentially on-line, and the error function J is defined on the plant output error, as error functions in control systems usually are. The problem is that the exact calculation of δ (the sensitivity of J with respect to the output of the NC) requires knowledge of the Jacobian of the plant. For example, a straightforward error function and corresponding δ term can be given by

J(k) = 0.5 e(k)^2   (10)

δ_k = ξ_k e(k)[∂y(k)/∂u(k)]   (11)


Fig. 2. Control configurations. (a) Direct inverse control. (b) Direct adaptive control. (c) Indirect adaptive control.

where e(k) = r(k) − y(k), and the binary factor ξ_k was introduced to account for the constraints on the input u(k). Defining ε(k) = e(k)[∂y(k)/∂u(k)], for the system described in (1) and (2), at instant k the factor ξ_k is expressed as

ξ_k = 0, if ε(k) > 0 and u(k−1) = u_M, or ε(k) < 0 and u(k−1) = u_m; ξ_k = 1, otherwise.   (12)

The role of ξ_k is to avoid mistraining the NC for references that cannot be tracked. The inclusion of ξ_k is equivalent to considering the existence of a limiter between the output of the NC and the input of the plant. When the output error results from the physical limitations expressed in (2), the limiter saturates and a zero derivative (ξ_k = 0) is included in (11), inhibiting learning.
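The gating factor of (12) can be sketched as follows (our own minimal illustration; variable names are ours, and the saturation tests use inequalities rather than strict equality for numerical robustness):

```python
# Gate of (12): eps_k = e(k) * dy(k)/du(k), u_prev = u(k-1).
# Returns 0 exactly when the tracking error could only be reduced by
# driving u(k) beyond its limits, so the saturated limiter would make
# the weight update meaningless.
def xi(eps_k, u_prev, u_min, u_max):
    if eps_k > 0 and u_prev >= u_max:   # error only reducible by u > u_M
        return 0
    if eps_k < 0 and u_prev <= u_min:   # error only reducible by u < u_m
        return 0
    return 1
```

The δ_k of (11) is then multiplied by this factor, so a saturated actuator inhibits the weight update.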

On the other hand, when the reference can be physically followed, learning is performed in the usual way.

C. Indirect Adaptive Control

Although in many practical cases the derivative in (11) can be easily estimated, or replaced by +1 or −1, that is, by the signum of the derivative (since the algebraic sign of δ is enough to specify the direction of the gradient of J), this


is not the general case. In the indirect adaptive control scheme shown in Fig. 2(c), a PE is used to compute the sensitivity of the error function J with respect to the controller's output [7], [13]-[16]. Since the PE is an MNN, the desired sensitivity can be easily calculated by using BP. Furthermore, the configuration in Fig. 2(c) is particularly useful when the inverse of the plant is ill defined, i.e., the function f(·) in (1) does not admit a true inverse. For updating the controller, the NC and PE are viewed as the variable and fixed parts of an m-input-single-output MNN, respectively. At the outset, the PE should be trained off-line with a data set sufficiently rich to allow plant identification, and then both the NC and the PE are trained on-line. In a sense, the PE performs system identification and, therefore, for rapidly changing systems, it is reasonable to update the PE more often than the NC due to robustness concerns.

IV. EFFICIENT ON-LINE TRAINING

MNNs and the BP algorithm were originally developed for pattern classification problems, where the training patterns are static, the training procedure and the error function are straightforward, and real-time learning is not required. In control, training patterns for the ANNs change with time, several training algorithms and error functions can be defined, and real-time learning is necessary.

Slow convergence is the most severe drawback of MNNs and seriously restricts practical applications of neurocontrol. Several approaches have been proposed toward convergence speedup in neurocontrol [13]. Among them, some common ideas are:

1) Development of efficient BP algorithms.
2) Embodiment of structural knowledge about the plant in the structure of the MNNs [16].
3) Hybrid systems in which ANNs are associated with control structures derived from nonneural techniques.
4) Prelearning and efficient initialization procedures.

Based on the distinction between the sampling frequency and the frequency at which learning iterations are performed (the learning frequency), new on-line training algorithms to reduce the NC's training time are introduced here. In discrete-time control systems, the sampling period T_s is usually chosen via rules of thumb in order to have 2π/T_s much larger than the largest frequency involved in the continuous-time system. It is normally true that increasing the sampling frequency improves the system's performance, but the noticeable improvement rapidly reaches a plateau. In usual adaptive control systems, the adaptive elements are normally updated once in each sampling period, in such a way that the sampling frequency and the updating (or adaptation, or learning) frequency may be used interchangeably. Ignoring processing-time constraints, it seems that the actual training time can be reduced by increasing the sampling frequency.

However, in many practical applications the sampling frequency cannot or should not exceed some limit. For instance, in common industrial chemical plants, one is usually interested in processes involving large time constants. There is not much sense in using high sampling rates, since information redundancy would occur. In fact, very high sampling frequencies can modify the control system completely and increase its complexity. It may be necessary to consider fast subprocesses and transients that could be ignored otherwise. Another case in which the sampling frequency cannot be made arbitrarily high occurs in distributed control systems, in which information is sent to and received from a control unit at intervals that are out of the control of the unit.

Although the sampling period T_s sets the basic pace of the control system, in systems with iterative learning the frequency at which learning occurs can be thought of as a different time basis. For most practical cases, T_s is much larger than T_L, the time spent in one learning iteration (updating of all network weights), and the ratio T_s/T_L tends to higher values as faster implementations of MNNs become available in hardware or software. Therefore, if appropriate plant input-output data are available and the only concern is time, many learning iterations can be performed during a sampling period, and the normal and simplest approach of a single updating per period implies a waste of processing time. The problems are how to select appropriate training data and how to use such data and the available time to perform meaningful training of the neuromorphic structures, that is, training that is likely to improve control performance. Novel training methods in which several learning iterations take place during each single sampling period are proposed as follows.

A. Emulator's Training

Consider that at instant k+1 the current plant output y(k+1), the p+t−1 previous values of y, and the q+t previous values of u are available in memory. Then the t pairs (x_E(k−i), y(k+1−i)), for i = 0, 1, ..., t−1, can be used as patterns for training the PE at time k+1. For y_1(k+1−i) = φ_E[x_E(k−i)], one possibility is to minimize the error function

J_E(k) = 0.5 Σ_{i=0}^{t−1} λ_i [y(k+1−i) − y_1(k+1−i)]^2   (13)

where {λ_i} (1 ≥ λ_0 ≥ λ_1 ≥ ... ≥ λ_{t−1} ≥ 0) is a nonincreasing positive sequence whose role is to implement some forgetting, emphasizing the most recent patterns.

Example 1: Assume that y(10) was just read (and, therefore, y(11) is not available), p = 3, q = 2, and t = 3. Also assume that y(9), y(8), ..., y(5) and u(9), u(8), ..., u(5) are available in memory. Then the PE input vectors can be arranged as follows:

x_E(7) = [y(7), y(6), y(5), u(7), u(6), u(5)]^T
x_E(8) = [y(8), y(7), y(6), u(8), u(7), u(6)]^T
x_E(9) = [y(9), y(8), y(7), u(9), u(8), u(7)]^T.


These input vectors and the values y(8), y(9), and y(10) constitute the training patterns for training at time k+1 = 10. This procedure is illustrated in Fig. 3(a), where the notation PE^{k,i} indicates the PE's state during the kth sampling interval, after the ith learning iteration, i = 0, 1, ..., t−1. Equivalently, φ_E^{k,i}(·) denotes the input-output mapping performed by PE^{k,i}. Clearly, φ_E^{k+1,0}(·) = φ_E^{k,t}(·).

B. Controller's Training: The Direct Inverse Control Error Approach

The approach described above has an equivalent for the NC. Consider the direct inverse control configuration (Fig. 2(a)) and assume that at instant k+1 the current output y(k+1), the p+t−1 previous values of y, and the q+t previous values of u are all stored in memory. Then the t pairs (x'_C(k−i), u(k−i)), i = 0, ..., t−1, for x'_C(k) = [y(k+1), ..., y(k+1−p), u(k−1), ..., u(k−q)]^T, can be used as patterns for training the NC at time k+1. Writing u_1(k−i) = φ_C[x'_C(k−i)], for the error function

J_C(k) = 0.5 Σ_{i=0}^{t−1} λ_i [u(k−i) − u_1(k−i)]^2   (14)

the corresponding δ term for the ith pattern becomes simply

δ_{k,i} = λ_i [u(k−i) − u_1(k−i)].   (15)

Notice that the error function in (14) is not directly based on the plant output error. Hence, training of the neural controller does not improve control performance directly, unless learning has been carried out in such a way that good generalization through the control space can be expected. In fact, controller training based exclusively on the direct inverse control error approach generally leads to bad results, and in practice it has been observed that the output of the NC tends to stick at some constant value, resulting in zero training error but obviously poor control performance. This drawback, common to training methods based on the minimization of the inverse control error, can be overcome by combining such training methods with others that directly minimize the plant output error, as illustrated in the following example.

Example 2: Assume that y(9) was just read, p = 2, and q = 3. Also assume that the values y(8), y(7), ..., y(5) and u(8), u(7), ..., u(3) are available in memory. Denoting the current mapping performed by the NC by φ_C^{9,0} (i.e., k+1 = 9, no learning yet), the plant input u(9) can be calculated from the relation u(9) = φ_C^{9,0}[x_C(9)], where

x_C(9) = [r(10), y(9), y(8), u(8), u(7), u(6)]^T.

For learning based on the direct inverse control error approach, the following vectors are then available:

x'_C(6) = [y(7), y(6), y(5), u(5), u(4), u(3)]^T
x'_C(7) = [y(8), y(7), y(6), u(6), u(5), u(4)]^T
x'_C(8) = [y(9), y(8), y(7), u(7), u(6), u(5)]^T.


Fig. 3. Simple multiple training schemes. (a) PE (Example 1). (b) NC, by combining the direct inverse control error and indirect adaptive control approaches (Example 2).

These vectors and the input values u(6), u(7), and u(8) constitute three training patterns (input vector and desired output) available for training the NC at time k+1 = 9. However, since this kind of training does not minimize the control error directly, in practice it is necessary to combine this approach with one of the methods described in Sections III-B and III-C. Fig. 3(b) shows the situation in which multiple learning based on the direct inverse control error and simple learning based on the indirect adaptive control configuration are combined, resulting in four learning iterations per sampling period. The vector x_C(8) is given by [r(9), y(8), y(7), u(7), u(6), u(5)]^T, the notation NC^{k,i} denotes the NC's state during the kth sampling interval after the ith learning iteration (the corresponding mapping is given by φ_C^{k,i}), and the PE is considered perfectly trained for the sake of simplicity.

C. Controller's Training: The Predicted Output Error Approach

A more complex approach for multiple training of the NC can be derived from the indirect adaptive control configuration (Fig. 2(c)). Assume that t reference values r(k+1−i), i = 0, 1, ..., t−1, are also available at instant k+1, in addition to t+p values of y, including y(k+1), and t+q previous values of u. From (6) and (7), this is equivalent to having t input vectors x_C(k−i) in memory. According to (5), at instant k−i, the control input u(k−i) was generated by

u(k−i) = φ_C^{k−i,0}[x_C(k−i)].   (16)


However, the NC has been updated several times since the time the vector x_C(k−i) was stored until the present instant k+1. Hence, at time k+1, the stored input vector x_C(k−i) would yield the virtual plant input

u*(k−i) = φ_C^{k+1,0}[x_C(k−i)].   (17)

This means that, although the most recent data x_C(k), with corresponding y(k+1) and r(k+1), can be directly used for training by one of the adaptive control configurations, the same does not happen with the past vectors x_C(k−i), for i = 1, ..., t−1. For those cases, however, the corresponding plant responses can be predicted from the emulator, as shown in Fig. 4(a), by

y*(k+1−i) = φ_E^{k+1}[x*_E(k−i)]   (18)

where the second superscript index of φ_E(·) was omitted for simplicity, and the vector x*_E(k−i) is given by

x*_E(k−i) = [y(k−i), ..., y(k−p+1−i), u*(k−i), ..., u*(k−q−i)]^T.   (19)

The NC training can be understood by considering the NC and the PE as a single MNN: at time k+1, for each input vector x_C(k−i), i = 1, ..., t−1, there corresponds a predicted error r(k+1−i) − y*(k+1−i), and training is performed in order to minimize some norm of the predicted output errors. A possible error function would be

J_C(k) = 0.5 Σ_{i=0}^{t−1} λ_i [r(k+1−i) − y*(k+1−i)]^2.   (20)

In the training configuration shown in Fig. 4(a), the NC is trained based on the error between the reference and the output of the PE, and not on the error between the reference and the plant output, as in the direct adaptive control configuration in Fig. 2(b). The previous values of the plant output y(k) are still necessary for the input vectors to the NC and PE.

Example 3: Assume that y(9) was just read, p = 2, and q = 3. Also assume that y(8), y(7), ..., y(5) and u(8), u(7), ..., u(3) are available in memory, as well as r(9), r(8), and r(7). Therefore, the NC input vectors used to compute the three most recent control input values are available as follows:

x_C(6) = [r(7), y(6), y(5), u(5), u(4), u(3)]^T
x_C(7) = [r(8), y(7), y(6), u(6), u(5), u(4)]^T
x_C(8) = [r(9), y(8), y(7), u(7), u(6), u(5)]^T.

At time k+1 = 9, a possible training procedure is shown in Fig. 4(b), where three learning iterations take place during a sampling period. The predicted plant output values are computed following the scheme in Fig. 4(a), enabling the utilization of x_C(6) and x_C(7) at time k+1 = 9, whereas the most recent vector x_C(8) is used in the conventional way.
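The composition of NC and PE in (17)-(20) can be sketched as follows for the perfect-matching case p = 1, q = 0 used later in Section VI. This is our own toy illustration: linear maps stand in for the NC and the (already trained) PE, and the gradient of the predicted squared error is taken numerically rather than by BP through the composed network.

```python
import numpy as np

def nc(w, x_c):                 # toy controller: u = w . x_c, x_c = [r(k+1), y(k)]
    return float(np.dot(w, x_c))

def pe(x_e):                    # toy "perfectly trained" emulator of an assumed
    y_k, u_k = x_e              # plant y(k+1) = 0.9 y(k) + 0.5 u(k)
    return 0.9 * y_k + 0.5 * u_k

def predicted_error_step(w, x_c, r_next, eta=0.2, h=1e-6):
    """One gradient step on J = 0.5 [r - y*]^2, cf. (20)."""
    def J(wv):
        u_star = nc(wv, x_c)            # virtual plant input, cf. (17)
        y_star = pe([x_c[1], u_star])   # PE-predicted output, cf. (18)-(19)
        return 0.5 * (r_next - y_star) ** 2
    # central-difference gradient (stand-in for BP through NC + PE)
    g = np.array([(J(w + dv) - J(w - dv)) / (2 * h)
                  for dv in np.eye(len(w)) * h])
    return w - eta * g

w = np.zeros(2)                 # naive controller weights
x_c = np.array([1.0, 0.4])      # stored [r(k+1), y(k)]
for _ in range(200):
    w = predicted_error_step(w, x_c, r_next=1.0)
```

Each step moves the NC weights so that the PE-predicted output y* approaches the stored reference, which is exactly the role of (20); in the paper the same gradient is obtained by backpropagating through the NC-PE cascade.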


Fig. 4. Predicted output error approach. (a) Training scheme using errors between the reference and the output of the PE. (b) Multiple training of the NC by the predicted output error approach (Example 3).

V. TRAINING ALGORITHMS

In the following, the on-line training methods proposed in the previous section are written as algorithms in a sort of pseudocode, making real implementation somewhat straightforward. The BP algorithm is assumed to be known and is briefly represented by a routine call of the type

BP(φ, x, t, o)

where BP stands for the routine's name, φ denotes the MNN to be updated, x is the input vector, t is the corresponding target (desired) output, and o is the actual output.

A. Multilearning Emulator

Assume that at instant k the vectors x_{E,i} = [y(k−1−i), ..., y(k−p−i), u(k−1−i), ..., u(k−1−q−i)]^T, for i = 0, ..., t−1, are stored in memory. Clearly, the most recent data correspond to i = 0, whereas higher values of i correspond to older data. Starting from instant k, the algorithm for multiple training of the emulator can be summarized by:

Step 1) READ y(k)
Step 2) { emulator's training }
  i ← t − 1
  REPEAT
    y_{1,i} ← φ_E(x_{E,i})
    BP(φ_E, x_{E,i}, λ_i y(k−i), λ_i y_{1,i})
    i ← i − 1
  UNTIL (i < 0)
Step 3) { control input generation }
  x_C ← [r(k+1), y(k), ..., y(k+1−p), u(k−1), ..., u(k−q)]^T or
        [r(k+1), r(k), ..., r(k+1−p), u(k−1), ..., u(k−q)]^T
  u(k) ← φ_C(x_C)
Step 4) APPLY u(k) to the plant and WAIT a T_s
Step 5) { data shifting }
  i ← t − 1
  REPEAT
    x_{E,i} ← x_{E,i−1}
    i ← i − 1
  UNTIL (i = 0)
Step 6) { most recent data vector }
  x_{E,0} ← [y(k), y(k−1), ..., y(k+1−p), u(k), ..., u(k−q)]^T
Step 7) k ← k + 1
Step 8) GOTO Step 1
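The loop above can be made concrete with a toy first-order plant and a linear emulator standing in for the MNN (our own illustrative sketch: p = 1, q = 0, t = 3, and all rates are assumptions). The inner loop performs the t weighted updates of Step 2, and the list operations implement the data shifting of Steps 5) and 6):

```python
import numpy as np

rng = np.random.default_rng(1)
t, eta = 3, 0.2
lam = [1.0, 0.7, 0.5]                 # nonincreasing forgetting sequence
w_E = np.zeros(2)                     # linear emulator: y1 = w_E . [y(k), u(k)]

def plant(y_k, u_k):                  # "unknown" plant: y(k+1) = 0.8 y(k) + 0.3 u(k)
    return 0.8 * y_k + 0.3 * u_k

xs = [np.zeros(2) for _ in range(t)]  # x_{E,i}; i = 0 is the most recent
ys = [0.0] * t                        # matching targets y(k - i)
y = 0.0
for k in range(200):
    # Step 2) emulator's training: t updates per period, oldest pattern first
    for i in range(t - 1, -1, -1):
        y1 = w_E @ xs[i]
        w_E += eta * lam[i] * (ys[i] - y1) * xs[i]   # BP stand-in, cf. (13)
    # Steps 3-4) the NC would supply u(k); random excitation is used instead
    u = rng.uniform(-1.0, 1.0)
    y_next = plant(y, u)
    # Steps 5-6) data shifting and storage of the newest pair (x_E(k), y(k+1))
    xs = [np.array([y, u])] + xs[:-1]
    ys = [y_next] + ys[:-1]
    y = y_next
```

With persistent random excitation the emulator weights converge to the true plant coefficients, i.e., the norm in (13) is driven toward zero.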

B. Multilearning Controller: The Direct Inverse Control Error Approach

Assume that at instant k the vectors x'_{C,i} = [y(k−i), ..., y(k−p−i), u(k−2−i), ..., u(k−1−q−i)]^T, for i = 1, ..., t−1, are available in memory. Starting from instant k, the algorithm for the multilearning controller becomes:

Step 1) READ y(k)
Step 2) { most recent data vector }
  x'_{C,0} ← [y(k), ..., y(k−p), u(k−2), ..., u(k−q−1)]^T
Step 3) { controller's training }
  i ← t − 1
  REPEAT
    u_{1,i} ← φ_C(x'_{C,i})
    BP(φ_C, x'_{C,i}, λ_i u(k−1−i), λ_i u_{1,i})
    i ← i − 1
  UNTIL (i < 0)
Step 4) { control input generation }
  x_C ← [r(k+1), y(k), ..., y(k+1−p), u(k−1), ..., u(k−q)]^T or
        [r(k+1), r(k), ..., r(k+1−p), u(k−1), ..., u(k−q)]^T
  u(k) ← φ_C(x_C)
Step 5) APPLY u(k) to the plant and WAIT a T_s
Step 6) { data shifting }
  i ← t − 1
  REPEAT
    x'_{C,i} ← x'_{C,i−1}
    i ← i − 1
  UNTIL (i = 0)
Step 7) k ← k + 1
Step 8) GOTO Step 1

C. Multilearning Controller: The Predicted Output Error Approach

Assume that the data corresponding to the t+q+1 vectors x_{C,i} = [r(k−i), y(k−1−i), ..., y(k−p−i), u(k−2−i), ..., u(k−q−1−i)]^T or, alternatively, x_{C,i} = [r(k−i), r(k−1−i), ..., r(k−p−i), u(k−2−i), ..., u(k−q−1−i)]^T, are available in memory at instant k. Denoting by (φ_C + φ_E) the MNN formed by the association between the NC and the PE, starting from instant k the algorithm for multiple learning of the controller becomes:

Step 1) READ y(k)
Step 2) { controller's training via the predicted error approach }
  i ← t − 1
  REPEAT
    j ← 0
    REPEAT
      u*_j ← φ_C(x_{C,i+j})
      j ← j + 1
    UNTIL (j > q)
    { virtual input vector for the emulator }
    x_E ← [y(k−1−i), ..., y(k−p−i), u*_0, u*_1, ..., u*_q]^T
    { virtual output }
    y* ← φ_E(x_E)
    BP(φ_C + φ_E, x_{C,i}, λ_i r(k−i), λ_i y*)
    i ← i − 1
  UNTIL (i = 0)
Step 3) { conventional training using most recent data }
  BP(φ_C + φ_E, x_{C,0}, λ_0 r(k), λ_0 y(k))
Step 4) { data shifting }
  i ← t + q
  REPEAT
    x_{C,i} ← x_{C,i−1}
    i ← i − 1
  UNTIL (i = 0)
Step 5) { control input generation }
  x_C ← [r(k+1), y(k), ..., y(k+1−p), u(k−1), ..., u(k−q)]^T or
        [r(k+1), r(k), ..., r(k+1−p), u(k−1), ..., u(k−q)]^T
  u(k) ← φ_C(x_C)
Step 6) APPLY u(k) to the plant and WAIT a T_s
Step 7) k ← k + 1
Step 8) GOTO Step 1

VI. EVALUATION

A. Simulation Study

In order to evaluate the performance of the proposed control system and training methods, results from the simulation of a simple plant model are presented. Computer simulation is especially appealing when dealing with on-line training of neural-network-based control systems, since very time-consuming experiments are often necessary. Consider the continuous-time temperature control system

dy(t)/dt = f(t)/C + [Y_0 − y(t)]/(RC)   (21)

where t denotes time, y(t) is the output temperature in °C, f(t) is the heat flowing into the system, Y_0 is the room temperature (constant, for simplicity), C denotes the system thermal capacity, and R is the thermal resistance


between the system borders and its surroundings. Assuming that R and C are essentially constant, obtaining the pulse transfer function for the system in (21) by the step-response criterion results in the discrete-time system

y(k+1) = a(T_s) y(k) + b(T_s) u(k)   (22)

where k is the discrete-time index, u(k) and y(k) denote the system input and output, respectively, and T_s is the sampling period. Denoting by α and β some constant values depending on R and C, the remaining parameters can be expressed by

a(T_s) = e^{−αT_s} and b(T_s) = (β/α)(1 − e^{−αT_s}).   (23)

For the simulation results presented below, the system described in (21)-(23) was modified to include a saturating nonlinearity so that the output temperature cannot exceed some limit. The simulated control plant is described by

y(k+1) = a(T_s) y(k) + [b(T_s)/(1 + exp(0.5 y(k) − γ))] u(k) + [1 − a(T_s)] Y_0   (24)

where a(T_s) and b(T_s) are given by (23). The parameters for simulation are α = 1.00151 × 10^{−4}, β = 8.67973 × 10^{−3}, γ = 40.0, and Y_0 = 25.0 (°C), which were obtained from a real water bath plant. The plant input u(k) was limited between 0 and 5, and it is also assumed that the sampling period is limited by

    T, 2 10 s . (25)With the chosen parameters, the simulated system isequivalent to a SISO temperature control system of a

water bath that exhibits linear behavior up to about 70°C and then becomes nonlinear and saturates at about 80°C. Comparing (24) with (1), it is clear that P = 1 and Q = 0. Simple three-layer MNN's with 10 to 20 hidden sigmoidal neurons were chosen for the PE and NC, and convergence was obtained for several pairs (p, q), for p ranging from 1 to 3, and q from 0 to 2. The graphs presented in the following correspond to p = 1 and q = 0 (perfect matching). The networks were updated according to the general rule

    Δw^n = w^(n+1) − w^n = −η ∂J/∂w + α Δw^(n−1)

where the superscript indices denote learning iterations, w is a generic weight, J the error function to be minimized, η > 0 the learning rate, and α ≥ 0 the momentum term. Before the outset, the PE is roughly off-line trained, whereas the weights of the NC are randomly initialized. From the initial condition y(0) = Y0, the target is to follow a control reference set to 35.0°C for 0 ≤ t ≤ 30 min, 55.0°C for 30 min < t ≤ 60 min, and 75.0°C for 60 min < t ≤ 90 min. Each simulation cycle, for t ranging


    Fig. 5. Performance of the simulated system after learning.

from 0 to 90 min, is called a trial. After a trial, the weights are conserved and a new trial starts. Fig. 5 shows the reference, and the plant input and output after good convergence was achieved for T_s = 30 s.

The performance of NC's carrying out a single learning iteration per sampling period is compared with the multiple-learning-iteration case in Figs. 6-8. The graphs show the total squared error per trial as a function of the number of trials. In Fig. 6 the NC is initialized in such a way as to result in small error from the beginning. In the upper graph, training is performed once a period using the indirect adaptive control configuration. In the lower graph, the NC is updated 10 additional times per period by using the direct inverse control error approach. In both cases the sampling period is T_s = 30 s. The proposed method improved performance as expected, since the NC is assumed to be relatively well trained from the outset and generalization is, thus, reliable. This fact suggests that this training method can be used for small adjustments of the NC near a good operating point, i.e., for fine tuning. The same is not true when the NC is in a naive state. In fact, experience indicates that when the weights of the NC are such that the plant output error is large, training exclusively based on the inverse control approach often leads to situations in which the output of the NC sticks at some value, resulting in poor performance.

The performance of a randomly initialized NC carrying out one learning iteration per period via the indirect adaptive control configuration is shown in Figs. 7(a) and 8(a) for T_s = 30 s. Since each simulation trial is equivalent to 90 min of real time, 50 trials would require 75 h or more than three days, and that is one of the reasons why most of the results presented here involve simulation rather than real experiments.
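For reference, the simulated plant of (22)-(24) is inexpensive to reproduce. The sketch below implements one step of the model under the parameter values quoted above; the exact forms of a(T_s) and b(T_s) and the placement of the saturating nonlinearity are our assumed reading of the reconstructed equations, and the function names are illustrative:

```python
import math

# Plant parameters from the simulation study (Section VI-A)
ALPHA = 1.00151e-4   # alpha
BETA  = 8.67973e-3   # beta
GAMMA = 40.0         # gamma, shapes the saturating nonlinearity
Y0    = 25.0         # room temperature (deg C)

def plant_step(y, u, Ts=30.0):
    """One step of the simulated water bath, following (22)-(24).

    Assumes a(Ts) = exp(-alpha*Ts) and
    b(Ts) = (beta/alpha)*(1 - exp(-alpha*Ts)), as read from (23).
    """
    a = math.exp(-ALPHA * Ts)
    b = (BETA / ALPHA) * (1.0 - a)
    u = min(max(u, 0.0), 5.0)   # plant input limited between 0 and 5
    return a * y + b * u / (1.0 + math.exp(0.5 * y - GAMMA)) + (1.0 - a) * Y0

# With zero input the bath stays at room temperature; with full input
# it heats up and then saturates (the text reports about 80 deg C).
y = Y0
for _ in range(2000):
    y = plant_step(y, 5.0)
print(round(y, 1))
```

Stepping the model this way makes the saturation visible without running the full neurocontrol loop.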

In the lower graphs in Fig. 7 the sampling period was changed to 15 s (Fig. 7(b)) and 10 s (Fig. 7(c)), and an expected performance improvement was observed. On the other hand, in the remaining graphs in Fig. 8, the sampling period was fixed at 30 s, and 5 or 10 additional learning iterations per period were included, based on the proposed predicted output error approach (Figs. 8(b) and 8(c), respectively). The inclusion of only a few learning


Fig. 7. Neural controller randomly initialized performing one learning iteration per sampling period. (a) T_s = 30 s. (b) T_s = 15 s. (c) T_s = 10 s.

Fig. 8. Neural controller randomly initialized for T_s = 30 s. (a) Only conventional on-line training. (b) Conventional training plus 5 learning iterations/period based on the predicted output error approach. (c) Conventional training plus 10 learning iterations/period.

iterations per period resulted in sharp convergence speedup, greatly reducing the total error.

The time spent for updating the neuromorphic structures is roughly proportional to the number of learning iterations, and basically depends on the structure and number of weights of the MNN being considered. In the simulation results presented here, 11 learning iterations of a three-layer NC with 80 weights took about 11 × T_L = 390 ms on a personal computer, far less than the sampling period in many practical control applications.

As mentioned above, the results in Figs. 5-8 were


obtained for p = P = 1 and q = Q = 0. What happens when one does not have accurate estimates of the order of the plant is a question that arises naturally. In the simulations carried out, training has succeeded for several combinations of different p and q values, but much caution is recommended when generalizing results. Small changes in the initial conditions, references, structures of the neural nets, neuron activation functions, and so forth too often produce dramatic consequences in the performance results. Various combinations for p and q were tested under different conditions, and some results are illustrated in Fig. 9. The perfect matching case is shown in Fig. 9(a), whereas Figs. 9(b) and 9(c) correspond to mismatching cases for p = 3, q = 0, and p = 2, q = 2, respectively. Although in some cases, as in Fig. 9(c), convergence could not be obtained, good results were achieved for several pairs (p, q) near the true values of P and Q.

B. Experimental Results

Experiments using a real water bath plant on which the simulation model was based were carried out. The plant consists of a laboratory 7-L water bath as depicted in Fig. 10(a). A personal computer reads the temperature of the water bath through a link consisting of a diode-based temperature sensor module (SM) and an 8-b A/D converter. The plant input produced by the computer is limited between 0 and 5, and controls the duty cycle for a 1.3-kW heater via a pulse-width-modulation (PWM) scheme.

For plant order estimates given by p = 1 and q = 0, the PE and NC were designed as four-layered MNN's with m = 2 inputs normalized between −1 and +1, 6 sigmoidal neurons in each of the hidden layers, and linear output units. Before starting the real control operation, a train of pulses was applied to the plant, and the corresponding input-output pairs were recorded. The PE was then roughly trained with 10 sets of data chosen in order to span a considerable region of the control space. The NC was randomly initialized with small weights in order to avoid saturation of the sigmoidal neurons.

The sampling period T_s was fixed at 30 s, and the control reference for each trial was set the same as in the simulation study. During each sampling period, the PE was updated 15 times, whereas the NC performed 11 learning iterations (10 iterations based on one of the proposed multiple learning approaches, and 1 iteration corresponding to the indirect adaptive control scheme). During the first 10 trials, only the predicted output error approach was used for the NC's multiple learning. For the subsequent trials, we used a combination of 5 iterations based on the direct inverse control error approach, followed by 5 iterations based on the predicted output error approach. The learning parameters were chosen as η = 0.1 and α = 0.2 at the outset, and were reduced heuristically as the total squared error per trial decreased, in order to improve convergence.
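The per-sampling-period update schedule just described can be sketched as a plain loop skeleton. The callables below are placeholders standing in for the backpropagation routines of the PE and NC; the names and structure are illustrative assumptions, not the authors' code (trials are counted from 0 here):

```python
def sampling_period_updates(update_pe, update_nc, trial,
                            pe_iters=15, nc_multiple_iters=10):
    """Run the updates performed within one sampling period.

    During the first 10 trials only the predicted output error
    approach is used for the NC's multiple learning; afterwards,
    5 direct inverse control error iterations are followed by
    5 predicted output error iterations. One extra NC iteration
    implements the indirect adaptive control scheme.
    """
    for _ in range(pe_iters):
        update_pe()                       # 15 PE updates per period
    if trial < 10:
        methods = ["predicted"] * nc_multiple_iters
    else:
        half = nc_multiple_iters // 2
        methods = ["inverse"] * half + ["predicted"] * (nc_multiple_iters - half)
    for m in methods:
        update_nc(m)                      # 10 multiple-learning iterations
    update_nc("indirect")                 # conventional indirect iteration

# Example: log what happens in one period of trial 12.
log = []
sampling_period_updates(lambda: log.append("pe"),
                        lambda m: log.append(m), trial=12)
print(log.count("pe"), log.count("inverse"), log.count("predicted"))
# → 15 5 5
```

The point of the skeleton is only the bookkeeping: all the extra iterations fit inside one sampling period, which is what makes the method practical.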
The resulting performance after 34 trials is shown in Fig. 10(b). Good results were also obtained for several different initial conditions and refer-


Fig. 9. Effects of mismatching between the estimates and the optimal order of the plant model. (a) Perfect matching for p = P = 1, q = Q = 0. (b) Mismatching with convergence for p = 3, q = 0. (c) Mismatching without convergence for p = q = 2.
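The order estimates (p, q) enter the controller design chiefly through the size of its input vector (Step 5 of the algorithm). A small helper makes the matching explicit; this is our illustrative reading of the input-vector definition, with hypothetical names:

```python
def nc_input_vector(r_next, y_hist, u_hist, p, q):
    """Build x_C = [r(k+1), y(k), ..., y(k+1-p), u(k-1), ..., u(k-q)]^T.

    y_hist[0] = y(k), y_hist[1] = y(k-1), ...
    u_hist[0] = u(k-1), u_hist[1] = u(k-2), ...
    The vector has 1 + p + q entries, so p = 1, q = 0 gives the
    m = 2 inputs used in the experimental study.
    """
    return [r_next] + y_hist[:p] + u_hist[:q]

# Perfect matching for the water bath: p = 1, q = 0.
x = nc_input_vector(35.0, [25.0, 24.8], [2.1, 2.0], p=1, q=0)
print(x)   # → [35.0, 25.0]
```

Choosing larger p or q only lengthens this vector; as Fig. 9 suggests, moderate over-estimation can still converge, while a poor combination may not.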

Fig. 10. Experimental results. (a) Diagram of the water bath control system. (b) System performance after enough learning. (c) System subjected to spikelike measurement errors.

ences, indicating good generalization of the NC and PE over the control space. In Fig. 10(c), the initial temperature was set to Y0 = 20.0°C, and the control reference to 40.0°C for 0 ≤ k ≤ 60, [40.0 + (k − 60)0.5]°C for 60 < k ≤ 120, and 70.0°C for 120 < k ≤ 180. Artificial disturbances corresponding to measurement errors in the plant output were added for k = 50 (+5.0°C) and k = 150 (−5.0°C), and fast adaptation was observed.
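The reference profile and spike disturbances used in Fig. 10(c) are easy to reproduce; the sketch below is a direct transcription of the values quoted above (the function names are ours):

```python
def reference(k):
    """Control reference for Fig. 10(c): 40 deg C, then a 0.5 deg C/step
    ramp between k = 60 and k = 120, then 70 deg C."""
    if k <= 60:
        return 40.0
    if k <= 120:
        return 40.0 + (k - 60) * 0.5
    return 70.0

def measured(y, k):
    """Plant output with the artificial spikelike measurement errors."""
    if k == 50:
        return y + 5.0
    if k == 150:
        return y - 5.0
    return y

print(reference(60), reference(90), reference(120), reference(150))
# → 40.0 55.0 70.0 70.0
```

Note that the ramp joins the two constant segments continuously, so the only abrupt events the controller must reject are the two measurement spikes.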

VII. CONCLUSION

Multilayer neural controllers are able to implement effective nonlinear adaptive control. However, the usually long training time they require discourages practical industrial applications. In this paper, in order to reduce the training time significantly, new on-line training methods for multilayer NC's were proposed and described in algorithmic form. The basic idea is to perform several training iterations during a single sampling period, in such a way as to improve control performance effectively. The direct inverse control error approach relies on generalization ability, and thus performs well when the NC has already been reasonably trained. Such an approach is, thus, useful for fine tuning. On the other hand, the predicted output error approach directly minimizes the control error and results in great convergence speedup even when the NC is randomly initialized. The proposed training methods provide an effective way to reduce the number of sampling periods until good convergence is achieved.

Hybrid training methods combining the proposed approaches can be promptly devised, as well as extensions to multiple-input-multiple-output systems. Significant performance improvement can be achieved with only a few additional training iterations per period. In a reported simulation example, the mere inclusion of 10 learning iterations per sampling period resulted in more than fivefold convergence speedup. The faster the MNN implementation, the larger the number of possible training iterations in a sampling interval; the larger the number of learning iterations and the number of training data pairs, the better the local generalization and the expected performance of the neurocontrol system.

During the design of the control system, it was assumed that good estimates for the order of the plant were available. Simulation results seem to indicate that the proposed NC and PE are relatively robust to small mismatching between the estimates p and q and the optimal values. Although theoretical results are still to be developed, the possibility of having neural structures robust to order mismatching is encouraging, since traditional adaptive control techniques are severely affected by model mismatching. When devising the NC, the plant was assumed to be essentially invertible. Such a strong hypothesis is greatly relaxed by the use of the neuromorphic PE, which allows simple computation of the sensitivity of the error function with respect to each weight of the NC. Many of the open problems are common to virtually all practical applications of MNN's, and include the nonexistence of systematic methods for quasi-optimal specification of the internal structure of the MNN's and learning rates, and suitable initialization procedures.


Julio Tanomaru (S'90) was born in 1964. He received the B.E. degree in electronics engineering in 1986 from the Instituto Tecnológico de Aeronáutica, Brazil, and the M.E. degree in information science from the University of Tokushima, Japan, in 1992.

From 1987 to 1989 he worked on the design of hardware and software for digital communication systems. From 1989 to 1992 he was sponsored by a scholarship from the Japanese government. He is currently a doctoral candidate and a Research Associate in the Department of Information Science and Intelligent Systems at the University of Tokushima. His research interests include neural networks and applications to adaptive control and optimization problems, cognitive science, genetic algorithms, and parallel computing.

Sigeru Omatu (M'76) was born in Ehime, Japan, on December 16, 1946. He received the B.E. degree in electrical engineering from the University of Ehime, Japan, in 1969, and the M.E. and Ph.D. degrees in electronics engineering from the University of Osaka Prefecture, Japan, in 1971 and 1974, respectively.

From 1974 to 1975 he was a Research Associate, from 1975 to 1980 a Lecturer, from 1980 to 1988 an Associate Professor, and since 1988 a Professor at the University of Tokushima, Japan. From November 1980 to February 1981 and from June to September 1986, he was a Visiting Associate in Chemical Engineering at the California Institute of Technology, Pasadena. From November 1984 to October 1985 he was a Visiting Researcher at the International Institute for Applied Systems Analysis, Austria. His current interests center on neural networks and distributed parameter system theory.

Dr. Omatu received the Excellent Young Researcher Award from the Society of Instrument and Control Engineers of Japan in 1972 and a Best Paper Award from the Institute of Electrical Engineers of Japan in 1991. He is an Associate Editor of the International Journal of Modeling and Simulation (U.S.) and of the IMA Journal of Mathematical Control and Information (U.K.). He is a coauthor of Distributed Parameter Systems: Theory and Applications (Oxford University Press).