Implementation of the web-based flow-oriented approach
to the process control and optimization
Yury V. Zontov¹, Alexey L. Pomerantsev²,³, Oxana Ye. Rodionova²
¹Moscow State Institute of Electronics and Mathematics, Moscow
²Semenov Institute of Chemical Physics, Russian Academy of Sciences, Moscow
³State South Research & Testing Site RAS, Sochi
Projection methods + SIC approach
For real-world data, the SIC method is often used together with a multivariate
calibration method. The most useful outcomes are obtained when the regression
results are supplemented with the SIC results.
Fig. 9 Procedure flow-chart (initial data set {X, Y}; PLS/PCR model with a fixed number of PCs; RMSEC and RMSEP; SIC modeling; $\beta$ estimates $b_{\min}$ and $b_{sic}$; prediction interval $[v^-, v^+]$ for y)
SIC prediction
Using the obtained RPV, we can solve the prediction problem for any given predictor vector x
(e.g. a spectrum). The result of prediction is presented as an interval for the response y,
$V = [v^-, v^+]$,
where $v^+ = \max_{a \in A} x^t a$ and $v^- = \min_{a \in A} x^t a$.
This is a typical linear programming problem, so the interval V can be found for any new object
x (see Fig. 6), and there is no need to construct A explicitly.
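As a minimal sketch of this linear programming step, assume a model with only two parameters. A linear function reaches its extremes over a bounded polyhedron at a vertex, so for such a tiny case it is enough to intersect the constraint boundary lines pairwise and keep the feasible points. The function names, the toy data, and the error bound `beta` are illustrative assumptions, and the boundary is treated as closed. A practical implementation would rather call an LP solver, which never builds A.

```python
from itertools import combinations


def rpv_vertices(X, y, beta, tol=1e-9):
    """Brute-force vertices of A = {a in R^2 : |X a - y| <= beta} (2 parameters only)."""
    # Every sample gives two boundary lines: x_i . a = y_i - beta and x_i . a = y_i + beta.
    lines = []
    for xi, yi in zip(X, y):
        lines.append((xi, yi - beta))
        lines.append((xi, yi + beta))

    vertices = []
    for (p, c1), (q, c2) in combinations(lines, 2):
        det = p[0] * q[1] - p[1] * q[0]
        if abs(det) < tol:
            continue  # parallel lines never meet in a single point
        # Cramer's rule for the intersection of the two lines.
        a = ((c1 * q[1] - c2 * p[1]) / det, (p[0] * c2 - q[0] * c1) / det)
        # Keep the point only if it satisfies all interval constraints.
        if all(abs(xi[0] * a[0] + xi[1] * a[1] - yi) <= beta + tol
               for xi, yi in zip(X, y)):
            vertices.append(a)
    return vertices


def sic_interval(x_new, vertices):
    """Return (v_minus, v_plus): the extremes of x_new . a over the vertices of A."""
    if not vertices:
        raise ValueError("the RPV is empty or unbounded for this beta")
    values = [x_new[0] * a[0] + x_new[1] * a[1] for a in vertices]
    return min(values), max(values)


# Toy calibration set (made-up numbers): three samples, two parameters.
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
y = [1.0, 2.0, 3.1]
vertices = rpv_vertices(X, y, beta=0.2)
v_minus, v_plus = sic_interval((1.0, 1.0), vertices)
```

The brute-force search grows quadratically with the number of samples and does not generalize beyond two parameters without more work, which is why the LP formulation is the practical route.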
SIC Status Classification
The interval approach helps to build the explicit classification of calibration or new objects
(test set samples, or new x-data alone) in relation to the obtained calibration model, which is
presented by the RPV. This classification is constructed using the following measures of
prediction quality (Fig. 7):
SIC-residual r is defined as
$r(x, y) = [\, y - 0.5\,(v^+ + v^-) \,] / \beta$
SIC-leverage h is defined as
$h(x, y) = [\, 0.5\,(v^+ - v^-) \,] / \beta$
Two fundamental equations
$|r(x, y)| = 1 - h(x, y)$ and $|r(x, y)| = 1 + h(x, y)$
divide the SIC-residual (r) vs. SIC-leverage (h) plane into three categories: insiders,
outsiders and outliers. This can be represented as the object status plot (OSP, Fig. 8) for any
dimensionality of initial data set and for any number of estimated model parameters.
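The classification can be sketched in a few lines. The assignment of the three regions used here is an assumption based on the usual reading of the OSP: insiders lie on or below the line |r| = 1 - h, outliers lie above the line |r| = 1 + h, and outsiders fall in between. The function name and the sample numbers are illustrative, and `beta` stands for the maximum error deviation.

```python
def sic_status(y, v_minus, v_plus, beta):
    """Return (r, h, status) for one object with a known response y."""
    r = (y - 0.5 * (v_plus + v_minus)) / beta   # SIC-residual
    h = 0.5 * (v_plus - v_minus) / beta          # SIC-leverage
    if abs(r) <= 1 - h:
        status = "insider"
    elif abs(r) > 1 + h:
        status = "outlier"
    else:
        status = "outsider"
    return r, h, status


# Made-up prediction interval [1.0, 2.0] with beta = 1.0, so h = 0.5 for every object.
statuses = [sic_status(y_obs, 1.0, 2.0, 1.0)[2] for y_obs in (1.5, 2.5, 4.0)]
```

For a new object without a measured y, only the leverage h can be computed, so the residual part of the rule does not apply.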
Fig. 5 Typical shape of the RPV: a polyhedron in the 2-dimensional parameter space (axes $a_1$, $a_2$)
Fig. 6 SIC interval prediction (interval $V = [v^-, v^+]$ for $y = x^t a$)
Fig. 7 Prediction & calibration intervals
Fig. 8 SIC Object Status Plot
Region of possible parameter values (RPV)
The Region of Possible (parameter) Values (RPV, see Fig. 5) is a set in the parameter space
defined as
$A = \{ a \in R^p : |Xa - y| < \beta \}$,
where $\beta$ is the maximum error deviation.
Region A is a closed convex polyhedron. It is a volumetric analogue of the conventional
parameter point estimates, which are calculated by some traditional regression method, e.g.
PLS.
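The definition translates directly into a membership test: a parameter vector belongs to the RPV when it reproduces every calibration response within the error bound. The sketch below follows the inequality as written above. The function name and the toy numbers are assumptions made for illustration.

```python
def in_rpv(a, X, y, beta):
    """Check whether parameter vector a satisfies |x_i . a - y_i| < beta for all samples."""
    for xi, yi in zip(X, y):
        prediction = sum(xj * aj for xj, aj in zip(xi, a))
        if not abs(prediction - yi) < beta:
            return False
    return True


# Toy calibration set (made-up numbers): three samples, two parameters.
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
y = [1.0, 2.0, 3.1]
inside = in_rpv((1.0, 2.0), X, y, beta=0.2)    # residuals 0.0, 0.0, 0.1
outside = in_rpv((1.5, 2.0), X, y, beta=0.2)   # first residual is 0.5
```

A point estimate obtained by a regression method can be checked in the same way to see whether it is compatible with the bounded-error assumption.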
Introduction
We present CSDesign, a modular web-based system that provides a flow-oriented
environment for the synthesis and analysis of various processes. Two mathematical
methods are implemented for process control and optimization: PLS
regression and the SIC method.
Fig. 2 Multi-stage production process (stages I to VII with the cumulative number of variables: I, 6; II, 8; III, 11; IV, 14; V, 16; VI, 19; VII, 25)
Fig. 3 Process variables set (X blocks $X_I$ to $X_{VII}$; Y training set, 102 samples; Y test set, 52 samples)
Multistage process
The functionality of the system is illustrated with a real-world example of a multi-stage
continuous technological process. It is represented by 25 key variables X and by one
output variable y, which is the final quality of the product. The whole cycle is divided into
seven stages numbered with Roman numerals. The first stage (I) is represented by six
input variables (W1, W2, W3 and S1, S2, S3) that stand for the properties of the raw
components S and W.
At the second stage (II), component W is refined, and variables WR1 and WR2 characterize
this process. Variables CW1, CW2, and CW3 (Stage III) represent the properties of the
resulting product CW. The next stage (IV) is the mixing of the raw component S and the refined
component CW. The result M is characterized by variables M1, M2, and M3. Afterward,
blend M is also refined (Stage V) with the process characteristics MR1 and MR2, and the
properties of the outcome CM are represented by variables CM1, CM2, and CM3 (Stage VI). The
last stage (VII) stands for the final adjustments, which are made with additives A1, ...,
A6. The output variable (P = y) is the final product quality.
Data set description
We have a collection of historical data measured for 154 samples that characterize proper
process performance. Each sample corresponds to the entire production cycle shown in
Fig. 2. The whole data set is divided horizontally (by samples) into two parts: the training set
(102 objects) and the test set (52 objects). All data are also divided vertically (by variables) into
7 blocks in conformity with the technological stages (see Fig. 3).
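The block structure can be written down as a small data structure. The variable names and the sample counts come from the description above. The dictionary layout, the helper name, and the reading that the X block for stage M collects all variables known up to that stage are assumptions made for this sketch.

```python
from itertools import accumulate

# Variables measured at each technological stage, in process order.
STAGE_VARIABLES = {
    "I": ["W1", "W2", "W3", "S1", "S2", "S3"],
    "II": ["WR1", "WR2"],
    "III": ["CW1", "CW2", "CW3"],
    "IV": ["M1", "M2", "M3"],
    "V": ["MR1", "MR2"],
    "VI": ["CM1", "CM2", "CM3"],
    "VII": ["A1", "A2", "A3", "A4", "A5", "A6"],
}

N_TRAIN, N_TEST = 102, 52  # horizontal split of the 154 samples


def variables_up_to(stage):
    """List all variables available once the given stage is finished."""
    names = []
    for key, block in STAGE_VARIABLES.items():
        names.extend(block)
        if key == stage:
            return names
    raise KeyError(stage)


# Number of X columns available after each stage.
cumulative_sizes = list(accumulate(len(block) for block in STAGE_VARIABLES.values()))
```

With this layout, the columns for any stage can be selected from a full 154 by 25 data table by name.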
Fig. 10 The modeling network
Fig. 11 SIC-Results module GUI
Fig. 12 OSP Module GUI
Software implementation
CSDesign is a software system that uses the ideas of the flow-based programming
approach. In computer science, flow-based programming (FBP) is a
programming paradigm that defines applications as networks of "black box"
processes, which exchange data across predefined connections by message
passing, where the connections are specified externally to the processes.
This approach allows you to extend application functionality by adding new
modules or by making changes in their interaction patterns without having to
change their internal structure.
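The idea can be illustrated with a toy network runner. This is not CSDesign code: the runner, the process names, and the message format are hypothetical. Each process is a black box that receives a list of input messages and returns one output, and the wiring is a separate list of connections, so the same processes can be rewired without touching their code.

```python
from collections import deque


def run_network(processes, connections):
    """Run an acyclic network of black-box processes.

    processes:   dict mapping a name to a function(list of messages) -> message
    connections: list of (source name, destination name) pairs
    """
    inbox = {name: [] for name in processes}
    waiting = {name: 0 for name in processes}
    for _, dst in connections:
        waiting[dst] += 1

    # Processes without incoming connections can start immediately.
    ready = deque(name for name, count in waiting.items() if count == 0)
    results = {}
    while ready:
        name = ready.popleft()
        results[name] = processes[name](inbox[name])
        # Pass the output along every connection that starts at this process.
        for src, dst in connections:
            if src == name:
                inbox[dst].append(results[name])
                waiting[dst] -= 1
                if waiting[dst] == 0:
                    ready.append(dst)
    return results


def source(_messages):
    return [1.0, 2.0, 3.0]


def center(messages):
    data = messages[0]
    average = sum(data) / len(data)
    return [value - average for value in data]


def sum_of_squares(messages):
    return sum(value * value for value in messages[0])


# Same black boxes, two different wirings.
with_centering = run_network(
    {"input": source, "center": center, "sum_sq": sum_of_squares},
    [("input", "center"), ("center", "sum_sq")],
)
without_centering = run_network(
    {"input": source, "sum_sq": sum_of_squares},
    [("input", "sum_sq")],
)
```

Only the connection list differs between the two runs, which is the property the paragraph above describes.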
CSDesign is also web-based, which means that you don’t need to install it on every
computer in your laboratory. All you need is a modern web browser.
Process control and optimization
Using these data, we construct a series of PLS1 regression models.
Each model is denoted here by the operator XY(M), which maps the X block, X(M), to
the Y block, y. Each XY model uses the same number of PLS principal components, k.
The main purpose of these models is the prediction of the output quality variable y at
each (M-th) stage of the production process. The predicted value can then be compared
with a desired quality level. Too large a difference signals that something is wrong and
that the process demands active improvements at the next, (M+1)-th, stage. To verify these
corrections, a process engineer may try out various values of the variables that
characterize stage M+1. The corresponding model XY(M+1): X(M+1)⇒y can validate the
solution. Therefore, the system of such models serves as an “adviser” that helps the
engineer to make a decision. However, this adviser cannot predict the future outcome y
exactly. There is always some uncertainty. To represent it, the corresponding SIC models
are used. These models are built on the basis of the respective PLS models with a given
number of principal components, k.
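The "what-if" use of the adviser can be sketched as a screening loop. Everything specific here is an assumption: a plain linear model with made-up coefficients stands in for the PLS model XY(M+1), and the acceptance rule (predicted y within a tolerance of the target) is only one possible way to express "too large a difference".

```python
def predict(coefficients, x):
    """Toy linear stand-in for a stage model XY(M+1): returns the predicted y."""
    return sum(c * v for c, v in zip(coefficients, x))


def screen_candidates(coefficients, x_known, candidates, y_target, tolerance):
    """Keep the candidate settings for stage M+1 whose predicted y is close to the target."""
    accepted = []
    for candidate in candidates:
        # X(M+1) = values already measured up to stage M + trial values for stage M+1.
        y_hat = predict(coefficients, list(x_known) + list(candidate))
        if abs(y_hat - y_target) <= tolerance:
            accepted.append((candidate, y_hat))
    return accepted


# Two variables are already fixed; one variable of the next stage is still adjustable.
accepted = screen_candidates(
    coefficients=[0.5, 0.25, 2.0],
    x_known=[2.0, 4.0],
    candidates=[(0.0,), (0.5,), (1.0,)],
    y_target=3.0,
    tolerance=0.5,
)
```

In the interval setting, the point prediction would be replaced by the SIC interval for each candidate, and the engineer would compare the whole interval with the desired quality level.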
Fig. 1 Base classes and interfaces
Main features
The main features of the system are as follows:
• Extensibility
This means that the functionality and the range of tasks solved by the system may be
expanded by adding new task-specific software modules.
The programming framework developed as a part of CSDesign provides a number of
interfaces that a new software module should implement (see Fig. 1).
• Intuitiveness
This means that a specific problem is set up by drawing a flow-chart, using
such familiar actions as drag-and-drop of the solution's components and their interconnection
with links.
• Server-side calculations
Most of the complex and resource-consuming calculations are performed remotely, and the
results are transferred back to the client asynchronously using Ajax technology.
The rich user interface is used primarily for visualization and input.
SIC definition
Simple Interval Calculation (SIC) is a method for linear modeling that gives the result of
prediction directly in the interval form. The primary SIC consequence is a radically new object
classification that can be interpreted using a two-dimensional object status plot (OSP), ‘SIC
residual vs. SIC leverage’.
SIC basic assumption
All errors involved in the general calibration problem
$y = x^t a + \varepsilon$
are limited (sampling errors, measurement errors,
modeling errors, etc.), which would appear to be a
reasonable supposition in many practical applications.
This assumption means that there exists a positive value $\beta$
(initially unknown), which limits the difference between the
predicted response and the true response value y:
$\mathrm{Prob}\{ |\varepsilon| > \beta \} = 0$, and for any $0 < b < \beta$, $\mathrm{Prob}\{ |\varepsilon| > b \} > 0$.
$\beta$ is called the maximum error deviation (MED).
Fig. 4 Examples of error distributions: normal and some finite distributions considered in SIC (horizontal axis: error $\varepsilon$, with bounds $-\beta$ and $+\beta$).
Process modeling
To accomplish the process modeling task, several software components, such as the
matrix input module (ch_input), the autoscaling module, and the PLS module (ch_pls),
were implemented in the system (see Fig. 1).
The SIC method is adopted and incorporated into the SIC and the Object Status
Plot drawing modules (ch_sic, ch_sic_osp, ch_sic_out).
Each module consists of a GUI part written in HTML, a DLL library, and several
Matlab m-files.
Fig. 10 shows the “network” used in modeling one stage of the process under
consideration. You can build such a network simply by dragging the necessary
modules from the toolbox panel to CSDesign’s main workspace. The Input
modules contain the calibration and test data sets. The PLS module calculates
the PLS model based on the number of principal components and the data
preprocessing defined on its settings screen. Finally, the SIC module calculates
the SIC model, which can be interpreted using the OSP and prediction intervals.
One can see them by clicking the “magnifying glass” button on the Intervals
module (see Fig. 11) and the OSP module (see Fig. 12).
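The preprocessing step performed by an autoscaling module can be sketched as follows. This is an illustration of autoscaling in general, not the code of the CSDesign module: it assumes column-wise centering and scaling by the sample standard deviation (n - 1 in the denominator), with the parameters estimated on the calibration set and reused for the test set. The function names and numbers are made up.

```python
from statistics import mean, stdev


def fit_autoscale(X):
    """Estimate column means and sample standard deviations from calibration data."""
    columns = list(zip(*X))
    means = [mean(column) for column in columns]
    stds = [stdev(column) for column in columns]  # a constant column would give 0 here
    return means, stds


def apply_autoscale(X, means, stds):
    """Center and scale every row with previously estimated parameters."""
    return [[(value - m) / s for value, m, s in zip(row, means, stds)] for row in X]


# Made-up calibration and test matrices with two variables on different scales.
X_cal = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_test = [[4.0, 0.0]]
means, stds = fit_autoscale(X_cal)
X_cal_scaled = apply_autoscale(X_cal, means, stds)
X_test_scaled = apply_autoscale(X_test, means, stds)
```

Reusing the calibration parameters for the test set keeps both sets on the same scale before they reach the PLS and SIC steps.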
Conclusions
The use of the flow-oriented approach to user interface development allows
us to define the modeling task in an intuitive and user-friendly manner and
implement the process in the form of a network of reusable modules.