Adm graphics-2003
Uploaded by: john-b-cook-pe-ceo
Category: Engineering
Itinerary
“A Traveler's Guide”
• About ADM
– Data Mining
– Visualization
– Intelligent Software
– Real-Time Web Applications
• Technology
• Examples
Data Mining?
• “The search for valuable knowledge in massive volumes
of data” (Weiss and Indurkhya)
• Data Mining Tool Box
– signal processing, advanced statistics, machine learning, chaos
theory, advanced visualization
• Why?
– Continuously maximize yields, throughput, profit
– Continuously minimize problems
• How?
– Learn/quantify important cause-effect relationships
– Computer models developed directly from data
• are “virtual processes” that behave like the real processes
• predict future outcomes, evaluate alternatives, show the best
pathway forward
More on Data Mining
• Data have properties that must be measured for optimal
use
– uni / multivariate relationships
– periodicity / chaos / noise
– orthogonality / redundancy
– continuity / segmentation
– dynamics
• temporal: time delays, prediction horizon
• dimensions: inertia, historical uniqueness
• Data Mining
– Maximizes/Extracts “information content”
– Automates discovery
– Integrates your data with your business
About ADM
• New Company
• Data Mining & Visualization Services and Software
• Founders have 40+ years of experience
– engineering, artificial intelligence/expert systems,
complex programming, signal processing,
clustering/classification, machine learning, advanced
visualization, data mining
– Automotive, Environmental, Medical, Metals, Oil & Gas,
Polymers, Electronics
– special expertise in dynamical systems that constantly
change/evolve
• Fastest, most skilled anywhere
A View of Processes
[Diagram: inputs x1–x8 → PHYSICAL PROCESS (“deterministic dynamical system”) → outputs y1–y3]
• Signal components: multiply periodic, chaotic, stochastic
• Non-stochastic effects should be predictable, therefore controllable
Multiply Periodic
(Fourier approximations)
• people
• lab tests
• controls tuning
• raw materials
• weather
Chaos
[Figure panels: Lorenz attractor; Power Spectrum; 3D Delay Plot (“Orbitals”); Prediction from = -10]
“extreme sensitivity to changes in boundary conditions”
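The sensitivity quoted on this slide can be illustrated with a minimal sketch. It assumes the classic Lorenz parameters (sigma = 10, rho = 28, beta = 8/3), a fourth-order Runge-Kutta integrator, and two runs whose starting points differ by 1e-8; none of these choices come from the slide, and the function names are hypothetical.

```python
import math

def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system one step with classical 4th-order Runge-Kutta."""
    def f(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        return tuple(a + h * b for a, b in zip(s, k))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def separation_history(s1, s2, steps, dt=0.01):
    """Distance between two trajectories after each integration step."""
    seps = []
    for _ in range(steps):
        s1 = lorenz_step(s1, dt)
        s2 = lorenz_step(s2, dt)
        seps.append(math.dist(s1, s2))
    return seps

# Two starting points that differ by one part in 10^8 (assumed values).
seps = separation_history((1.0, 1.0, 1.0), (1.0 + 1e-8, 1.0, 1.0), steps=4000)
print(f"separation after first step: {seps[0]:.2e}")
print(f"largest separation reached:  {max(seps):.2f}")
```

The two runs start out indistinguishable and end up on different parts of the attractor, which is why long-range prediction of a chaotic signal is limited to a finite horizon.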
Role of a Process Model
[Diagram: inputs x1–x8 → process model → outputs y1–y3]
• Inputs
– control setpoints, e.g., pressure, temperature, speed
– raw material properties, e.g., density, surface area, molecular weight
– other state variables, e.g., humidity, amb. temperature
• Outputs
– quality measures, e.g., strength, clarity, thickness
• Diagram labels: Things you CAN control; Things you CAN’T control; What you want to know
Deterministic vs. Empirical
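Empirical Models (Neural Networks, Statistics), e.g.:
$$A = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$
$$B = \frac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - \left(\sum x\right)^2}$$
First Principles Models, e.g.:
$$E = m c^2$$
$$\frac{\partial u}{\partial t} - \left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) = 0$$
[Diagram labels: Production, Economic, Environment]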
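The A and B formulas on this slide are the slope and intercept of a least-squares straight line y = A x + B. A minimal sketch of that calculation follows; the function name and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return slope A and intercept B of the least-squares line y = A*x + B."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sy * sxx - sx * sxy) / denom
    return a, b

# Made-up points lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

An empirical model of this kind needs no knowledge of the physics, only example data, which is the contrast the slide draws with first-principles models.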
Interpolation / Extrapolation
[Figure: Model Space with points P1–P4 and Pw, Px, Py, Pz; labels “a good design”, “a mediocre design”, and “a bad design”; regions where the model interpolates vs. regions where the model extrapolates]
Historical Data
• noisy, small data
• designed experiments
About Neural Networks
• Inspired by the Brain
– get complicated behaviors from lots of “simple”
interconnected devices - neurons and synapses
– models are synthesized from example data
• machine learning
[Diagram: network mapping inputs x1–x5 to outputs y1 and y2]
About Neural Networks
• Non-linear Multivariate Curve Fitting
– the modeler prescribes inputs, outputs, hidden layer
neurons, and connections
– “Weights” are the unknown coefficients that are determined by the computer from examples using an error minimizing “learning algorithm”
[Diagram: input layer, hidden layer, and output layer; “weights” (wi … wi+n) control connections; input/output examples pair inputs x1–x5 with outputs y1 and y2]
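As a hedged illustration of weights being determined from examples by an error-minimizing learning algorithm, the sketch below trains a tiny one-hidden-layer network with plain gradient descent (backpropagation). The network size, learning rate, target function, and all names are assumptions chosen for the example, not the configuration used in the presentation.

```python
import math
import random

def train_ann(examples, n_hidden=4, epochs=500, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network to (inputs, target) examples."""
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    # One weight row per hidden neuron; the last entry of each row is a bias.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w_h]
        y = sum(w * v for w, v in zip(w_o, h)) + w_o[-1]
        return h, y

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in examples) / len(examples)

    history = [mse()]
    for _ in range(epochs):
        for x, t in examples:
            h, y = forward(x)
            err = y - t  # gradient of 0.5 * err**2 with respect to y
            for j in range(n_hidden):
                grad_h = err * w_o[j] * (1.0 - h[j] ** 2)
                for i in range(n_in):
                    w_h[j][i] -= lr * grad_h * x[i]
                w_h[j][-1] -= lr * grad_h
                w_o[j] -= lr * err * h[j]
            w_o[-1] -= lr * err
        history.append(mse())
    return forward, history

# Made-up non-linear target with two inputs and one output.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
examples = [((a, b), math.sin(2 * a) + 0.5 * b) for a in grid for b in grid]
forward, history = train_ann(examples)
print(f"mean squared error before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

The modeler chose the inputs, output, and number of hidden neurons; the computer found the weights.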
About Neural Networks
• Shifts Modeling Focus
– from smaller data/big deterministic modeling effort
– to bigger data/smaller modeling effort
– combine with optimization (search) methods
• real-time prediction
• resource allocation
– deterministic + error correcting ANN hybrids
Response Surfaces
Water Disinfection Trihalomethanes Formation
[Figure annotations: surface fitted by non-linear ANN model represents normal behavior; deviation from normal; no data; better conditions?]
Optimizing With Models
[Diagram: inputs x1–x8 → process model → outputs y1–y3; the inputs are varied by the search routine and the outputs are evaluated for goodness by the optimization program (search routine)]
PI = a y1 + b y2 + c y3
GOAL: determine values of inputs (within controllable
range) to optimize Performance Index while meeting
constraints.
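The goal stated above can be sketched as a search over controllable inputs. The example below assumes an exhaustive grid search, a made-up two-input stand-in for the process model, and a single output constraint; the weights a, b, c, the bounds, and every function name are hypothetical.

```python
import itertools

def process_model(x1, x2):
    """Hypothetical stand-in for a trained process model (made-up formulas)."""
    y1 = 10.0 - (x1 - 3.0) ** 2  # e.g., a yield that peaks at x1 = 3
    y2 = 5.0 - (x2 - 1.0) ** 2   # e.g., a throughput that peaks at x2 = 1
    y3 = x1 + x2                 # e.g., a consumption that must stay below a limit
    return y1, y2, y3

def performance_index(y, weights):
    """PI = a*y1 + b*y2 + c*y3."""
    return sum(w * v for w, v in zip(weights, y))

def grid_search(bounds, steps, weights, y3_max):
    """Try every grid point inside the controllable range; keep the best feasible one."""
    axes = [
        [lo + (hi - lo) * k / (n - 1) for k in range(n)]
        for (lo, hi), n in zip(bounds, steps)
    ]
    best = None
    for x in itertools.product(*axes):
        y = process_model(*x)
        if y[2] > y3_max:  # constraint violated, skip this candidate
            continue
        pi = performance_index(y, weights)
        if best is None or pi > best[0]:
            best = (pi, x, y)
    return best

best_pi, best_x, best_y = grid_search(
    bounds=[(0.0, 5.0), (0.0, 2.0)], steps=[21, 9],
    weights=(1.0, 1.0, -0.5), y3_max=3.0,
)
print(best_pi, best_x, best_y)
```

A grid search is the simplest possible search routine; with eight inputs a smarter optimizer would be needed, but the roles of model, Performance Index, and constraint stay the same.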
Control Possibilities
Polymer Packaging Film Intrinsic Viscosity
[Figure: PROCESS and WEATHER signals; ACTUAL vs. Prediction; BEFORE and AFTER]
Discrete Event Prediction
Unexplained Polymer Film Production Shutdowns
[Figure: 20 minute interval, 4 minute ramp; panels at 19 minutes before, 12 minutes before, and 1 minute before; labels: temperature related web breaks, viscosity, initial web break]
Representation
Dynamics
• can require multiple delays for same variable
• delays may vary
Different events due to different causes are detected at different times prior to occurrence
Off-Spec Production
Synthetic Textile Fiber Quality
[Figure: three panels vs. Days since July 1, 1998, each plotting Process Temp (C) together with one of Waste (lbs), Q3 (lbs), or Ambient Temp (C)]
• During period of high off-spec, process tracks ambient temperature.
Semi-Quantitative Data
Polymer Resin Solid State Polymerization
[Figure: Frequency plots at Ambient Pressure = 29.41 and Ambient Pressure = 30.26; labels: heater, dryer, air jets]
Environmental Compliance
Water Disinfection Trihalomethanes Formation
[Figure: Trihalomethanes (ppb), 0 to 160, vs. date, 7/1/00 to 3/28/01; series: FINISHED THM (ppb), Control Model, Virtual Sensor; annotation: straight predictions]
• EPA regulated carcinogen
• Different models for
– prediction
– control (gains)
• $$$ Savings by optimizing use of ClO2 vs. Cl2
[Diagram labels: Customer; Process Engineering; Tech Service; Output of your process; Input to your customer’s process; Customer Feedback; Your Continuous Improvement; Customer’s Continuous Improvement]
Customer Performance
Synthetic Textile Fiber Quality
[Figure: three time-series panels with state labels AL, SC, NC]
• Opelika MJS Red Buttons: Denier Composite (DENVMACV) and Red Buttons vs. Opelika Date, 5/20/99 to 10/1/00
• Calhoun MJS Red Buttons: H_WHIT Composite (HWHITCV) and Calhoun MJS Red Buttons vs. Date, 8/15/99 to 7/30/00
• Frontier: USE Composite 2 wk avg, Delay = 15 days (series FUSEC) and 32.5 Single Ends Break 2 wk avg (series F32_5SEB) vs. Frontier DATEF, 7/9/99 to 6/23/00
Setpoints Quality?
Synthetic Textile Fiber Quality
[Figure: four time-series panels vs. Date, June 1999 to July 2000]
• CRIMP: CRIMP_P (CRIMPPA, CRIMPPB) and U3VI3523.SP (I3523SP)
• DENIER: Denier Composite (DENVMAC) and U3PP3727.SP and U3PP3727.PV (P3727SP, P3727PV)
• ELONG: Daily Average ELONGA Composite (ELONGAC) and U3PF3291.PV and L16PT409.PV (F3291PV, PT409PV)
• FUSE: FUSE Composite (FUSEC) and S3P27845.PV (27845PV)
Contract Optimization (electricity purchasing relative to usage)
[Figure: Kosa Spartanburg Baselines; kw or kwh, 24000 to 44000, vs. Date, 1998 to F-99; series: D1+D2 LOAD TOTAL, D1+D2 USE kwh/hour, D1+D2 USE Billing Demand]
Evaluate Contract Costs and Options
[Figure: Kosa Spartanburg Load Shifting Scenarios; kw, 37,000 to 43,000, and $/kwh, 0.01 to 0.19, vs. Date 1998, 9/1 to 9/8; series: Current D1+D2 kw, Best Case D1+D2 kw, Worst Case D1+D2 kw, RTP ($/kw)]
Load Control Scenarios
Real-Time Pricing and Electricity Deregulation
Estuarine Water Quality
[Figure: Dissolved oxygen (mg/L) and Temperature (degree Celsius) vs. Date and time, 8/21/93 0:30 to 8/24/93 0:30; series: Measured, Neural Network, BRANCH/BLTM; curves: Water temperature, Dissolved oxygen]
• Mixing - Tides, Flows from 3 Rivers
• Weather (T, P, Dew Point)
• Point Discharge Wastewater Treatment Plants
• Non-Point Discharges - rainfall, 50% overbank storage
Pollution in Estuaries
[Diagram labels: High Tide; Mean Tide; wastewater discharges; non-point from tidal flooding; non-point from rain; gauging stations]
[Figure: 2 months of raw signals vs. Time (hours): Water Level (feet), Dissolved Oxygen Concentration (mg/L), Specific Conductivity (µ-siemens/cm), Water-Temperature (deg. C)]
[Figure: spectral analysis of water level, dissolved oxygen concentration, specific conductivity, and water temperature; labeled features: low frequency, broadband, 6.2 hr, 8.2 hr, 12.4 hr, 25.6 & 24 hr]
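The spectral analysis step can be sketched as a search for the strongest periodic component in a sampled signal. The example below uses a plain discrete Fourier transform on a synthetic signal that contains the 12.4 hr and 6.2 hr periods labeled in the figure; the half-hour sampling interval, the amplitudes, and the function name are assumptions, and real gauge data would also need detrending and windowing.

```python
import math

def dominant_period(signal, dt):
    """Period (same units as dt) of the strongest non-zero DFT component."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(centered))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return n * dt / best_k

dt = 0.5   # hours between samples (assumed)
n = 496    # 248 hours of record, i.e. exactly 20 cycles of 12.4 hr
signal = [
    5.0
    + 1.0 * math.sin(2 * math.pi * t / 12.4)   # strong 12.4 hr component
    + 0.3 * math.sin(2 * math.pi * t / 6.2)    # weaker 6.2 hr component
    for t in (i * dt for i in range(n))
]
period = dominant_period(signal, dt)
print(f"dominant period: {period:.1f} hr")
```

Once the dominant periods are known, digital filters can separate the periodic components from the remaining low-frequency and chaotic parts of each signal.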
[Figure: signal decomposition, chaotic component via digital filtering; panels vs. Time (hours): Water Level (feet), Dissolved Oxygen Concentration (mg/L), Water-Temperature (deg. C), Specific Conductivity (µ-siemens/cm)]
[Figure: SC (micro-siemens/cm), 0 to 50,000, and DO (mg/L), 2.5 to 6.5, vs. Date, 6/19/93 to 9/11/93; series: SC, DO]
[Figure: signal decomposition, high frequency, periodic components via digital filtering; panels vs. Time (hours): Water Level (feet), Dissolved Oxygen Concentration (mg/L), Water-Temperature (deg. C), Specific Conductivity (µ-siemens/cm)]
[Diagram labels: Chaotic and Low Frequency Data; High Frequency Data; ANN (three blocks); inputs SC’st=x and WL’st=y; outputs DOt=0, SC’sti, and DO final]
Beaufort River Estuary
[Map labels: SS, SP, PI]
BOD Effect on DO
[Figure: 24 hr Derivative of DOA (mg/L) (EDOa6611) and Southside BOD (LB/Day) (SSBOD_D) vs. Sequential but Non-Consecutive Data Point Number; N = 61 points]
• R = -0.357, R2 = 0.128
• R2siso = 0.130 = 1
• R2ANN = 0.57 with Inputs = WL, SC, TP, Rainfall, BOD
[Figure: delta DOs-DOm at TP = 20 C, 61 Points; annotations: flow towards gauge, increasing BOD, No Data]
Groundwater Modeling
Upper Floridan Aquifer
[Figure panels: Well Histories (18 years); Surface Contour; Well Locations (100x100 miles)]
Sub-Regions Behave Differently
[Figure: map of sub-regions, 25x30 miles]
Accuracy by Cluster
[Figure: Actual vs. Prediction of Normalized Water Level above Sea Level for clusters C1, C3, C6, C10; History from April 1982 to October 1998]
Problem
• Area A is a coal bed methane field
• Data for 59 wells was compiled by petroleum engineering group in CO
to determine if an artificial neural network* (ANN) could be used to
predict total gas production for the life of each well.
[Figure: Well Locations, Normalized UTMY (m) vs. Normalized UTMX (m), with North arrow]
[Figure: 3D Range Model of Area A (16x32 km, vertical scale ~ 200m), with North arrow]
* A form of “machine learning” from AI.
Variable Types
• Static Variables
– e.g., depth, permeability
– for each well, treated like they do not vary in time
• in reality some probably do
– some measurements are just estimates with large error
– limited “information content”, e.g. one value per well
• Time Series, a.k.a. Signals
– e.g., water and gas production rates
– values vary in time
– large “information content”, dozens of values per well
– variability in time
• caused by the underlying process physics
• a detailed surrogate for an explicit description of the physics
• Synthetic Variables - are computed by equations or models
Static Variables
• Potential Model Inputs
– Geometric Variables - SURFace ELEVation, COAL
ELEVation, COAL DEPTH, COAL THICKness
– COAL PERMeability, COAL POROsity - from cores,
logs, and engineering estimates
– KV CONFining - permeability in vertical direction
Water & Gas Time Series
[Figure: Monthly Gas Produced (MMCF) and Monthly Water Produced (bbl) vs. Production Month for 59 Wells, with Detail for Well 1, Well 2, and Well 3]
Wells are similar, but are different in detail
Synthetic Variables
• Potential Model Inputs
– GIP - estimated gas in place from geometric variables
• Model Outputs
– CGP 25 - cumulative gas production to 25 psi, estimated by reservoir
simulator using static variables
– CGP 50 - as for CGP 25, but to 50 psi.
[Scatter plots]
• CGP 50 vs. CGP 25: R2 = 0.99
• CGP 25 vs. GIP: R2 = 0.39, R2ANN = 0.52
“Model the Model”
• A computer model can be treated as a black box. The box “maps”
multiple inputs to an output.
• If, for all combinations of input values, the box computes a
continuously differentiable output, its map can be learned very
accurately by an ANN.
• A cause of non-differentiable output is switching by programmed logic
in the model’s computer code.
• Discontinuous maps can be segmented by clustering, then modeled
using multiple ANNs.
[Diagram: Reservoir Simulator maps inputs x1–x6 to output y; inset: continuously differentiable output y plotted over x1 and x5]
Chaotic Systems
• Modeled using Phase Space Reconstruction
• Takens Theorem, univariate systems (1980)
– x(t) = F[x(t - p - d), x(t - p - 2d),…, x(t - p - nd)]
– current state-of-the-art
– implies
• x can be predicted at time t from an optimal number n of previous
measurements (n is called the “dimension”)
• n measurements spaced at optimal time delays d, 2d, …, nd produce an
optimal prediction
• Requirements/Options
– multivariate, variable delays - much better than Takens
– completely extracts “information content”
Phase Space Reconstruction
• Select d = 6 months by inspection (optimal delays can be computed
for larger data sets).
• MWA = 6-month moving window average of water and gas
production to remove high frequency variability.
[Figure: one well’s history; WATER MWA (BBL/m) and GAS MWA (MCF/m), 0 to 25,000, vs. Month of Production, 0 to 78]
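The 6-month moving window average (MWA) can be sketched as follows. The slide does not say whether the window is trailing or centered, so this sketch assumes a trailing window and emits no value until a full window is available; the function name and the sample numbers are made up.

```python
def moving_window_average(values, window=6):
    """Trailing average of the last `window` values, starting at the first full window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            averages.append(running / window)
    return averages

# Made-up monthly production values.
monthly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
smoothed = moving_window_average(monthly, window=6)
print(smoothed)
```

Smoothing before sampling at the 6-month delay keeps month-to-month noise from leaking into the model inputs.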
Model CGP by ANN
• Develop succession of models with longer histories as inputs, checking R2ANN
as you go.
• Point count N decreases with longer histories because some well histories are
less complete.
• R2ANN at 6 months = 0.68, 0.93 at 36 months…good enough!
• RMSE = 344 MCF at 36 months relative to 6500 MCF actual full scale (5%).
[Figure: N (point count) and R2ANN x 100 (left axis, 0 to 100) and CUM GAS 25 Prediction RMSE (MMCF) (right axis, 0 to 800) vs. Month of Production, 6 to 36; series: N, R2 x 100, RMSE]
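The “(5%)” figure quoted on this slide is the RMSE expressed as a fraction of full scale. A short sketch of that arithmetic, with a generic RMSE helper added for context; the function names are hypothetical and the two numbers are taken from the slide.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def percent_of_full_scale(error, full_scale):
    """Express an error as a percentage of the full-scale value."""
    return 100.0 * error / full_scale

# Values quoted on the slide: RMSE = 344 at 36 months, full scale = 6500.
pct = percent_of_full_scale(344.0, 6500.0)
print(f"{pct:.1f}% of full scale")
```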
Results
[Figure: Actual & Predicted CGP25 using 6 months history, R2ANN = 0.68, and using 36 months history, R2ANN = 0.93]
[Scatter plots: Predicted CGP25 (BCF) vs. CGP25 (BCF); prediction using 6 months of history and prediction using 36 months of history]
Coal Gas Summary
• Time series have higher information content,
less noise than static variables.
• Phase Space Reconstruction
– leverages hidden physics manifest in time series
variability
– high accuracy
– extensible to other gas fields, collections of fields,
other problems, other domains
Conclusions
• Your process - room for improvement?
• Data Mining
– fast, powerful, decisive
– automates knowledge acquisition,
produces predictive models
– solves problems that are unsolvable by any
other means.
• Advanced visualization makes results
understandable to all.
Modeling Chaos
• Takens Theorem (1980), univariate systems
– x(t) = F[x(t - p - d), x(t - p - 2d),…, x(t - p - nd)]
• each t represents a vector of measurements
• p = the “time delay” of the most recent measurement available
– implies a “prediction horizon”
• n and d = “dynamical invariants”
– analogous to amplitudes, periods, and phases in periodic systems
• n + 1 = “embedding dimension”
– the number of previous measurements
– implies an optimal number of previous measurements
– d = characteristic “time delay” of the attractor
– implies an optimal spacing in time for the previous measurements
• F is an arbitrary mapping function, e.g., “look up”, regression, or
ANN, whatever gives the best results
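The univariate form above can be sketched by building delay vectors from a single time series and using a nearest-neighbor “look up” as the mapping F. The function names and the toy series are assumptions; n, d, and p would come from analysis of real data rather than being picked by hand.

```python
def delay_vectors(x, n, d, p):
    """Pairs (inputs, target): target x[t], inputs x[t-p-d], x[t-p-2d], ..., x[t-p-n*d]."""
    pairs = []
    for t in range(p + n * d, len(x)):
        inputs = [x[t - p - k * d] for k in range(1, n + 1)]
        pairs.append((inputs, x[t]))
    return pairs

def lookup_predict(pairs, query):
    """'Look up' mapping F: return the target stored with the closest input vector."""
    closest = min(
        pairs,
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    return closest[1]

# Toy series; n = 3 past measurements, spacing d = 2, horizon p = 1 (all assumed).
series = list(range(20))
pairs = delay_vectors(series, n=3, d=2, p=1)
print(pairs[0])
print(lookup_predict(pairs, [4, 2, 0]))
```

Swapping `lookup_predict` for a regression or an ANN changes only F; the delay vectors stay the same.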
Modeling Chaos
• ADM, multivariate “Takens”
– y(t) = F[xi(t - pi), xi(t - pi - di),…, xi(t - pi - nidi)]
• i, pi, ni, and di are dynamical invariants
• i = number of input variables
• xi = model input variables
– implies an optimal set
• pi = time delay of peak (optimal) correlation between y and
each xi
• ni + 1 = the embedding dimension of the attractor of each xi
• di = characteristic “time delay” of the attractor xi
• F is an arbitrary mapping function, generally an ANN
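One way to estimate each pi, the delay of peak correlation between y and an input xi, is to scan candidate delays and keep the one with the largest absolute Pearson correlation. This is a sketch under that assumption; the function names and the synthetic data (where y lags x by exactly 5 samples) are made up for illustration.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def best_delay(x, y, max_delay):
    """Delay p (in samples) that maximizes |corr(x(t - p), y(t))|."""
    best_p, best_r = 0, 0.0
    for p in range(max_delay + 1):
        # Pair x[t - p] with y[t] for every t where both exist.
        r = pearson(x[: len(x) - p], y[p:])
        if abs(r) > abs(best_r):
            best_p, best_r = p, r
    return best_p, best_r

# Synthetic example: y repeats x five samples later, scaled by 2.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] * 5 + [2.0 * v for v in x[:-5]]
p, r = best_delay(x, y, max_delay=20)
print(p, round(r, 3))
```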
Modeling Chaos
• ADM generalization
– y(t) = F[xi(t - pi), xi(t - pi - dij),…, xi(t - pi - din)]
• din replaces di, a variable delay
• For medium data sets (300 to 1000 vectors)
– y(t) = F[xi(t - pi), x’i(t - pi - dij),…, x’i(t - pi - din)]
• replace xi with derivatives x’i to mitigate the tendency of
aggressive regression techniques to overfit data
• also amplifies effects of small changes
• Dynamical Invariants computed by systematic
search
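The generalized form with variable delays and derivative inputs can be sketched as a routine that assembles one input row per time step. It assumes a backward difference as the stand-in for x’ and a per-variable specification of (p, list of delays); the function names, the dictionary layout, and the toy data are hypothetical.

```python
def build_row(x, t, p, delays):
    """Inputs for one variable at time t: x(t - p), then x'(t - p - d) for each delay d."""
    def derivative(idx):
        return x[idx] - x[idx - 1]  # backward difference approximates x'
    return [x[t - p]] + [derivative(t - p - d) for d in delays]

def build_dataset(inputs, y, spec):
    """inputs: name -> series; spec: name -> (p, delays). Returns (row, target) pairs."""
    # First time step at which every delayed value and its predecessor exist.
    earliest = max(p + max(delays) + 1 for p, delays in spec.values())
    rows = []
    for t in range(earliest, len(y)):
        row = []
        for name, (p, delays) in spec.items():
            row.extend(build_row(inputs[name], t, p, delays))
        rows.append((row, y[t]))
    return rows

# Toy data: one input variable with p = 1 and unequal delays 2 and 5 (assumed).
x = [k * k for k in range(12)]
y = list(range(12))
rows = build_dataset({"x": x}, y, {"x": (1, [2, 5])})
print(rows[0])
```

A systematic search over p and the delay lists would call `build_dataset` for each candidate specification, fit F, and keep the specification with the best validation accuracy.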