Zaptron, 1999 1
Industrial Diagnosis by Hyper Space Data Mining
Dr. Dongping (Daniel) Zhu
Zaptron Systems, Inc.
Mountain View, CA 94043
Tel: 650-966-8700, Fax: 650-966-8780
E-mail: [email protected]
http://www.zaptron.com
Presented at AAAI 99 Spring Symposium on Equipment Diagnosis
Stanford University
March 23, 1999
OUTLINE
Diagnosis overview: applications & technologies
Hyperspace data mining
Diagnostic examples
• product quality control (steel making)
• resolve bottleneck (gasoline production)
• improve yield (chemical plant)
Conclusions
MasterMiner™ demo
Diagnosis & Trouble-Shooting
Cost of support to products/services
Customer satisfaction
Key Issues
• how to best approach the same problem next time
• how to use history information - data mining
• how to update KB
Solutions
• on-line help
• web-based, remote diagnostics
• knowledge management tools
• data mining (history data are available)
A Web-based Diagnostic System
Data Collecting Mechanisms
Standardization
Data Management
D Mining KD(D+K) K Updating
Product Delivery Mechanisms
Training tools
Web-based diagnosis
On-line Help SW
Remote Repairs
Factor Analysis
KB manage
Call Centers Service Teams Support Teams
Rule-based Diagnostic Process
History Database
Fault Physics
Primary Cases
Cause Analysis
Fix Fault
Diagnose
Diagnostic Matrix
Self Learning
New data & Cases
Query
Update Database
Rule Base
Expert System Architecture
Web Users
Web GUI
Interviewer (fi, hj)
K Collector (aijl, bikl)
KB Builder (Mijk)
Problem Solver (Search Engine)
Self Learner ijk
DataBase
{a, b}
KB
{Mij}
Analyzer, Visualizer
Evolution of Diagnostic Techniques
• Equipment and Processes
• Sensors
• Data
• Databases
• Data Models
• Data Patterns (behavior in space)
• Data Fusion, sensor fusion
• Data Mining
• Data ……
Data Mining: Techniques
• Correlation/association analysis
• Factor analysis
• Trend prediction & forecasting
• Neural networks
• Genetic algorithms
• Fuzzy logic, expert systems
• Uncertainty reasoning (DS, rough sets)
• Bayesian networks
• Hyper space data mining
  • find data pattern first
  • no model assumption
  • provide solutions to failure isolation/recognition
Hyper Space Data Mining
Introduction
Diagnosis - An optimization problem
A Hyper Space Technology
Application Examples
SW: MasterMiner™
A General Issue
• For any system - find a model to describe
Relationships
Nonlinear
High noise
Multi-variant
(no model)
Operating data record
In situ sensor report
Raw materials composition
Design/operating process parameters
Failure & fault
Bottleneck
Energy use
Cost/risk
Quality
Yield/returns
Reliability
Productivity
Data Pattern <--?--> Data Model
A Catch-22 Problem
Questions:
• what type of data to collect
• which data to use in modeling
Solution: Hyperspace data mining
To Start - A Real Case
Aluminum Production Problem
Target: to Optimize the Leaching Rate of Al2O3
Factors:
a1 - Fe/Al in the ore
a2 - Sodium Na/(Al2O3+Fe2O3)
a3 - leaching temperature
a4 - lime (CaO)/(SiO2-TiO2)
2 Solutions:
• Principal Component Analysis (PCA) by SAS JMP or RS/1 - bad result
• Hyperspace data mining by Zaptron MasterMiner™ - good result
Can you see the pattern?
If not, do data mining to separate into subspaces
A Real Case - PCA Result: no separation
A Real Case - MasterMiner: good separation
MasterMiner 2nd step: complete separation
A Real Case - MasterMiner: build a model
Extrapolation to optimal zone for max yield
Steps in Data Mining
Propose an optimal operating condition or new materials
Separability Test
Modeling (PH, MREC, ANN, GA)
Data Mining
Feature Selection
State diagnosis by using current operation data
Equations as criteria for optimal control
Map description of cross-sections of normal op zone & failure zones
Pretreatment: local view, delete outliers
Feature reduction (entropy, voting)
History Data
Linearity, topological type, correlation, association, best matching point, NN points
Inequality, equations, PLS, sensitivity, advisory
Clustering - Data Separation
Inclusive (entropy)
Data Patterns
Data Mining
Data Base
Sandwich Exclusive One-sided
(voting)
PCA - projection in the max separable direction
Fisher: line projection with max distance between clusters
MREC: projective geometry, better than either
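To make the projection comparison concrete, here is a minimal sketch of a two-class Fisher projection in plain Python. The toy data, function names, and the use of the x axis as a stand-in for a variance-driven projection are all assumptions made for illustration; none of it comes from MasterMiner, and MREC is not shown because the slides do not describe its algorithm.

```python
# Illustrative sketch only: toy 2-D data, not taken from the slides.

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_direction(class_a, class_b):
    """Fisher direction w = Sw^-1 (mean_b - mean_a) for two 2-D classes."""
    ma, mb = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy  # within-class scatter Sw
    det = sxx * syy - sxy * sxy
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Closed-form inverse of the symmetric 2x2 matrix Sw applied to (dx, dy).
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)

def project(points, w):
    return [p[0] * w[0] + p[1] * w[1] for p in points]

class_a = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.1), (3.0, 0.3)]
class_b = [(0.0, 1.0), (1.0, 1.2), (2.0, 1.1), (3.0, 1.3)]

w = fisher_direction(class_a, class_b)
fisher_a, fisher_b = project(class_a, w), project(class_b, w)

# The x axis carries most of the variance in this toy set,
# yet projecting onto it mixes the two classes completely.
x_a, x_b = project(class_a, (1.0, 0.0)), project(class_b, (1.0, 0.0))

print("Fisher separates:", max(fisher_a) < min(fisher_b))
print("x-axis separates:", max(x_a) < min(x_b))
```

In this toy set the high-variance direction says nothing about class membership, which is the situation where a variance-based projection gives no separation while a class-aware projection does.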
Software Architecture
Genetic Algorithm
DataBase
Artificial Neural Nets
Pattern Recognition
KnowBase
GUI
MasterMiner™ Functions
MasterMiner™ Tools
• Data loading, editing, sorting, calculation
• Preprocessing: statistics, feature selection, folding
• Factor analysis: target-factor analysis, factor-factor analysis
• Projections: Fisher, LMAP, PCA, PLS, MREC
• Modeling: envelope, auto-box, Sphere, KL, ANN (train, estimation, sensitivity)
• Extrapolation: PLS vector (linear), Simplex, appending
Virtual Mining Tools for Convex and Concave Space
Virtual mining in hyper space
• Hidden projection - tunnel model
• Envelope - generate a convex polyhedron
• Use “auto-box” for concave polyhedrons of samples
• Interchange of data classes
• Folding transform (to change data pattern in space)
Virtual mining of data samples
• divide into multiple segments
• convert concave polyhedron into convex ones
• build the model for each subspace
• separability went from 31% to 96% in one case
Virtual Mining Methods
(a) Tunnel model to separate data samples in hyper space
(b) The Envelope-Boxing method
(c) Generate convex polyhedrons from a concave one
Iterative Feature Selection/Reduction
• Data pattern classified into 2 topological classes: “one-sided class” and “inclusive class”
• Hidden projections applied
• Projected factors are orthogonal in hyper space
• Feature selection method (highly effective):
  • Entropy method is used for inclusive pattern
  • Voting method is used for one-sided pattern
• Reduce features to reduce noise & complexity
  e.g., good result based on 5 features out of 500
• Reduced feature set needs to pass separation test
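The slides do not say how the entropy method scores a feature. As a hedged illustration of entropy-based feature ranking in general, the sketch below ranks features by the information gain of a median split. The criterion, the function names, and the toy data are assumptions, not MasterMiner's procedure.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h

def info_gain(values, labels):
    """Entropy reduction from splitting one feature at its upper median."""
    threshold = sorted(values)[len(values) // 2]
    left = [l for v, l in zip(values, labels) if v < threshold]
    right = [l for v, l in zip(values, labels) if v >= threshold]
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))

def rank_features(rows, labels):
    """Feature indices ordered from most to least informative."""
    scored = [(info_gain([r[j] for r in rows], labels), j)
              for j in range(len(rows[0]))]
    return [j for _, j in sorted(scored, key=lambda t: (-t[0], t[1]))]

# Toy data: feature 0 tracks the class, feature 1 is unrelated to it.
rows = [(1.0, 5.0), (1.2, 1.0), (0.9, 4.0), (1.1, 2.0),
        (3.0, 4.5), (3.2, 1.5), (2.9, 3.5), (3.1, 2.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

gains = [info_gain([r[j] for r in rows], labels) for j in range(2)]
ranked = rank_features(rows, labels)
print(ranked, gains)
```

Keeping only the top-ranked features and then re-running a separation check mirrors the iterate-and-test loop the slide describes.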
MREC - Map Recognition Method
MREC - Projection in the best direction, complete separation in 2 steps
PCA: No separation
We have Improved the Quality of
• alloy steels
• carbon fiber reinforced, resin-based composite materials
• Bi2O3-containing High Tc superconductors
• rare earth containing phosphor
• electrode materials of Ni/H batteries
• VPTC ceramic semi-conductor
• high temperature, SiC-based structural ceramics
• high-polymers: PVC, synthetic fiber & rubber, polyethylene, ...
• high energy materials
• semi-conductor devices
• MOCVD method of III-V compound film
We have applied MasterMiner™ to Industrial Optimization & Diagnosis
Petrochemical industry
• distillation
• hydro-cracking
• vapor recovery
• platinum reforming
• delayed coking
• de-waxing
• vinyl acetate
• polypropylene
• jet fuel (Union Oil recipe, yield 87% -> 94%, +6,000 ton/yr)
• increase life of catalyst in polyvinyl plant (catalyst cost $1.2MM)
• etc.
We have applied MasterMiner™ to Industrial Optimization & Diagnosis
Metallurgical Industry
• blast furnace
• casting
• alloy steels quality improving (60% -> 80%)
• energy saving in aluminum production
Automobile Industry
• electro-plating
• heat treatment
Chemical Industry
• PVC, polyformaldehyde
• butadiene rubber
Application Areas
Equipment Process Diagnosis
Petrochemical Industry
Metallurgical Industry
Semiconductor Industry
Data Mining
Process Optimization
Materials Design
GOAL: Optimal control of complex processes involving
• Heat transfer
• Mass transfer
• Fluid flow
• Chemical reactions
Pattern Recognition Methods
• Linear Regression (LS) - “forced fitting”
  LS fitting coefficients as model parameters, the “best wish”
• PCA - principal component analysis
  projection in “best” direction, select two directions, LS
• LMAP - linear mapping
• NN - neural nets
  blind learning, over-fitting, forced fitting
  origin at cluster center, covered with an ellipsoid, PCA
• MREC - map recognition (non linear)
  polyhedrons, hidden projections, separation, back-mapping
• NNREC - neural nets + MREC
Comparison of Various Methods
CONDITION                                                       METHOD TO USE
1. Mechanism known (in some cases)                              Rule-based expert systems
2. Linear w/o noise (in 20% cases)                              Linear regression, statistical method
3. Highly noisy, multi-variant, non-Gaussian (in most cases)    Hyper-space data mining
Why not Principal Component Analysis (PCA)?
Principal Component Analysis (PCA)    Data Mining by MasterMiner
Linear                                Nonlinear, hierarchical
Gaussian                              Non-Gaussian
Low noise                             High noise
Use all data in modeling              Use subset of data in modeling
20 projections                        2 projections
No separation                         Good separation
Why not Least Squares Only?
PLS applies when PRESS < 0.3 (1/4 of cases in our practice)
PROJECT                            PRESS (Error)
synthetic rubber                   0.2052 (can use PLS)
steel plate for ship building      0.6419 (cannot use PLS)
rare earth phosphor                0.3067
Baoshan Iron & Steel               0.3441
Ni/H battery                       0.7389
Ni/H materials                     0.1932
propylene recovery (noisy data)    0.7755
propylene recovery                 0.3752
solvent oil                        0.3975
VPTC                               0.1330
hydro-cracking plant               0.2055
methanol production                0.8255
casting for car                    0.9157
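The slide does not define how PRESS is normalized. Since the listed values fall between 0 and 1, the sketch below assumes a leave-one-out PRESS divided by the total sum of squares. It uses a single-variable least-squares line as a stand-in for PLS. The function names and toy data are hypothetical; only the 0.3 threshold and the two table values in the last lines come from the slide.

```python
def loo_press_ratio(xs, ys):
    """Leave-one-out PRESS of a straight-line fit, divided by the total sum of squares."""
    n = len(xs)
    press = 0.0
    for i in range(n):
        # Fit the line on every sample except sample i.
        tx = xs[:i] + xs[i + 1:]
        ty = ys[:i] + ys[i + 1:]
        mx, my = sum(tx) / len(tx), sum(ty) / len(ty)
        sxx = sum((x - mx) ** 2 for x in tx)
        sxy = sum((x - mx) * (y - my) for x, y in zip(tx, ty))
        slope = sxy / sxx
        # Predict the held-out sample and accumulate its squared error.
        predicted = my + slope * (xs[i] - mx)
        press += (ys[i] - predicted) ** 2
    mean_y = sum(ys) / n
    total = sum((y - mean_y) ** 2 for y in ys)
    return press / total

PRESS_LIMIT = 0.3  # threshold quoted on the slide

def can_use_pls(press_value, limit=PRESS_LIMIT):
    return press_value < limit

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
linear_ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]   # exactly linear toy data
zigzag_ys = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # no linear trend at all

linear_press = loo_press_ratio(xs, linear_ys)
zigzag_press = loo_press_ratio(xs, zigzag_ys)
print(can_use_pls(linear_press), can_use_pls(zigzag_press))
print(can_use_pls(0.2052), can_use_pls(0.6419))  # two rows from the table
```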
Why not Neural Networks (GA) Only ?
Over-fitting problem by NN (GA)
Industrial records are not complete
e.g. Leaching rate problem at an aluminum Co.: Leaching rate = f(a, b, c, T)
A cross-section of the optimal zone:
• by ANN: too large
• by our Yield Mater™: smaller
[Figure: cross-section in the b-c plane showing the wrong zone by ANN and the zone by MasterMiner]
Applications in Diagnosis
• Equipment setup
  • steel making (roller distance)
  • oil refinery (bottleneck in gasoline production)
  • chemical plants (cooling pipe length, inlet position)
• Process optimization
  • drug fermentation
  • environmental emission controls
  • materials manufacturing
E.g. 1 Steel Making
• German equipment, yield 10,000 tons/yr
• Problem - “deep pressing” property
• 100 = 5x20 factors in 5 stages
• 2 major factors:
  • N2 - Nitrogen content should be reduced
  • d1/d2 - distance ratio of cold rollers increased
• Benefit - wasted steel reduced by a factor of 5
Process flow: Blasting furnace -> Steel making -> Casting -> Hot rolling -> Cold rolling -> ST14 steel plate for auto body
2nd issue: QC in ST14 Steel Plate Making
O2 blower
Feed of Scrap, CaO, MgO, Iron Ore
Ladle
Problem Background
• After each batch, samples were taken in a 3-min test for QC
• Need to control the amount of O2 blown and scrap added
• Japanese case-based reasoning SW --> 65% separability
• Problem: ST14 quality is off-spec
• We used MasterMiner to build a model for QC
• Target: FC (C content in steels, 17-30% by customer spec)
• 13 Factors
• Model built and used to control product quality
• Result: 100% separability, products are on-spec
Zaptron, 1999 40
Feature Selection
Feature selected    Property
LY                  age of O2 gun (years)
PLH                 height of O2 gun
DYSLT               O2 amount (m3) before sampling
DYCD                C content at sampling time (10^-2 %)
DYTEMP              liquid iron temperature when sampling (°C)
PCAO                amount of CaO used
PMGO                amount of MgO added
PORE                amount of iron ore added
WCH                 total charge of the converter in ton
TOIRON              total liquid iron
SCAPT               amount of scrap
LDLIFE              life of ladle used to transport liquid iron
QO2                 amount of O2 blown after sampling
114 Sample Data
Target-Feature Maps
Data Separation by MasterMiner: 100%
Data Separation by PCA: 30%
Feature Selection (1) - Principal component regression
Feature Selection (2) - PLS (partial least squares)
Feature Selection (3) - KW method (linear)
Tunnel Models: 32 Inequalities
Quality Control Issue
• Solve the set of 32 equations
• or use “appending” operation
  • assign values to uncontrollable factors
  • add N random samples
  • project them onto the N-dimensional space
  • select those falling into the optimal space
• Results: The C content of ST14 products is on-spec
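The “appending” steps can be sketched as random sampling followed by a feasibility filter. This is only an illustration under stated assumptions: the optimal space is taken to be a set of linear inequalities, the factor names, bounds, and coefficients are invented, and the projection step from the slide is left out because the slides do not specify it.

```python
import random

def append_candidates(fixed, bounds, inequalities, n_samples, seed=0):
    """Draw random settings for controllable factors and keep the feasible ones.

    fixed        -- dict: uncontrollable factor name -> assigned value
    bounds       -- dict: controllable factor name -> (low, high)
    inequalities -- list of (coeffs, limit), meaning sum(coeffs[k] * x[k]) <= limit
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        x = dict(fixed)  # uncontrollable factors keep their assigned values
        for name, (low, high) in bounds.items():
            x[name] = rng.uniform(low, high)
        feasible = all(
            sum(c * x[k] for k, c in coeffs.items()) <= limit
            for coeffs, limit in inequalities
        )
        if feasible:
            kept.append(x)
    return kept

# Hypothetical factors, scaled to [0, 1] for the example.
fixed = {"temp": 0.5}
bounds = {"o2": (0.0, 1.0), "scrap": (0.0, 1.0)}
inequalities = [
    ({"o2": 1.0, "scrap": 1.0, "temp": 1.0}, 2.0),  # o2 + scrap + temp <= 2.0
    ({"o2": -1.0}, -0.1),                           # o2 >= 0.1
]

candidates = append_candidates(fixed, bounds, inequalities, n_samples=200)
print(len(candidates), "of 200 random samples fall in the feasible region")
```

Any surviving candidate is a setting of the controllable factors that satisfies every inequality for the given uncontrollable values, which is what the slide's last appending step selects.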
Add Random Samples (green)
E.g.2 Bottleneck in Gasoline Production
Problem: gasoline yield low
Diagnose thermal cracking setup
Data mining method: identify major factors
Diagnostic result: the length of cooling coils is too short
Benefit: gasoline increased by 10,000 tons/yr
Cooling coil
Distillation Tower
Crude oil inlet
Jet fuel
Gasoline
Diesel
Naphtha
Heavy oil
Asphalt
heat
E.g. 3 Ethylbenzene Synthesis
Fractionation Tower
Naphtha Inlet
Ethylbenzene
A Platinum Reforming Workshop
heat
Reactor
Platinum Catalyst
Ethylbenzene Synthesis
Problem: yield low
Data Mining
Diagnostic result: position of inlet is wrong
Action: move from layer 99 to 111
Benefit: yield raised by 35%
E.g. 4 Predictive Control of Chaotic Process
Product
Atomic collision
Materials
A
D
B
C
• Answer: No
• Reason: Chaotic noises (Dr. Leon Chao of UC-Berkeley)
• An historical story: a butterfly in Thailand caused a hurricane in Florida!
• Chaotic noises in chemical reactions: A -> B, C -> D
E.g. 4 Predictive Control of Chaotic Process
• A Real Case: quality control in PTC ceramic production
• Problem: inconsistent (average) particle size (good rate: 60%)
• Material used: ultra-fine Al2O3 powder
• Chemical reaction: NaAlO2 + 2 H2O --> Al(OH)3 + NaOH
• Process:
  • add acid or base to control the above induction process
  • or change the cooling rate
  • heated Al(OH)3 powder formed
  • distribution of the particle size - near Gaussian
  • Al2O3 powder formed
E.g. 4 Predictive Control of Chaotic Process
• Discovery: under violet light, the transparency varies from batch to batch
[Figure: violet transparency vs. time (axis mark 100), curves 1, 2, 3; setup: violet light (2800 Å) through Al2O3, with a transparency measure]
E.g. 4 Predictive Control of Chaotic Process
• Analysis by DataMaster™: chaotic noises do have patterns
• Practical Solution:
  • measure the resistance curve of an Al2O3 block being formed
  • predict the product quality 30 min before finishing
  • change the cooling rate to control the final at 60 min
• Result: quality increased from 60% to 100% in 500 experiments
[Figure: temperature (°C) vs. time (min); axis marks at 1350 °C and at 0, 30, 60 min]
Conclusion
If Linear (near linear)
• must have “one-sided” pattern
• use LS - “the best wish”
• extrapolate by accurate model-based prediction
If Nonlinear
• if one-sided pattern: use Fisher method, extrapolate by principal components
• if inclusive pattern: use MREC, extrapolation by Simplex
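The decision rules on this slide can be written as a small lookup function. This is a sketch: the function name, argument names, and return format are assumptions, and the method labels are copied from the slide.

```python
def choose_method(linear, pattern):
    """Return (method, extrapolation) following the rules on the Conclusion slide."""
    if pattern not in ("one-sided", "inclusive"):
        raise ValueError("pattern must be 'one-sided' or 'inclusive'")
    if linear:
        # The slide pairs the linear case with a one-sided pattern only.
        if pattern != "one-sided":
            raise ValueError("linear case expects a one-sided pattern")
        return ("LS", "model-based prediction")
    if pattern == "one-sided":
        return ("Fisher", "principal components")
    return ("MREC", "Simplex")

print(choose_method(True, "one-sided"))
print(choose_method(False, "one-sided"))
print(choose_method(False, "inclusive"))
```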
Conclusion: Integrated Solution
1997, L. Zadeh: “What is important about soft computing is that FL, NN, GA & PCA are synergistic rather than competitive.”
In agreement with our experience
• Data do have patterns
• Different patterns need different methods
• Several methods need to be integrated
• New data mining technologies developed
Economic Benefit Generated
Factory                 Application                                                       Benefit (USD)
A Petroleum Co.         yield increased: jet fuel, gas solvent, oil, propylene, xylene    3.5 million/2 years
A Petrochem Refinery    yield increased: gasoline, wax products                           1.2 million/year
An Iron & Steel Co.     yield increased: alloy steels for ships                           3 million/year
Total profit                                                                              7.5 million/year
Ratio of cost to profit in 5 years: 1:100
MasterMiner™ Software
• Desktop application software
• Runs on Windows 95/NT
• Software demo download: http://www.zaptron.com/masterminer
• Examples:
4-D Maps for Control
Test Samples Added
Announcement
2nd International Conference on Information Fusion -- FUSION’99
July 6-8, 1999
Sunnyvale Hilton, Silicon Valley, California, USA
abstract due: Feb 1, 1999
http://www.inforfusion.org/fusion99
Sponsored by
International Society of Information Fusion
NASA, ARO
IEEE Signal Processing Society
IEEE Robotics and Automation Society
IEEE Control Systems Society
Special Session on Diagnostic Information Fusion
Thank You !
Zaptron