Process Data Mining
BASF IT Services
Supply Chain Management - Production Systems
Monday, May 24, 2010
Geoff C Jones
Data Mining using modern techniques
The truth is out there - in our plant data!
But how can we find it, and how do we get to understand it?
– Is my process running normally?
– Is that QC result correct for my operating conditions?
– Can I predict when that column needs cleaning?
What do we mean by Data Mining?
Knowledge discovery from human activity (‘information from data’)
– Consumer surveys, loyalty cards, credit history
– Modelling behaviour - identifying relationships & trends
– Prediction of buying habits, risk, etc
How does Process Data Mining differ?
Knowledge discovery from systems behaviour
– Data capture (on-line process & QC data)
– Modelling of normal behaviour (exploratory data analysis)
– Monitoring for abnormal behaviour
– Prediction of quality, yield, fouling, catalyst deactivation, etc
How Can It Benefit My Business?
It identifies the relationships between process and performance variables that influence:
– Yields & usages
– Processing times, speed, throughput
– Product impurities
Cost is low compared with the benefits
How does it differ from Conventional Process Control & SPC?
Closed-loop control:
– deals with dynamic behaviour
– requires sampling rate much higher than system frequency response
– requires cause-effect relationships to be known
SPC:
– (generally) deals with steady-state systems
– sampling rate determined by cost of data collection
– identifies non-random system behaviour (variation due to assignable causes)
– usually configured for univariate monitoring
Process Data Mining:
– identifies relationships and abnormalities in multivariate systems for process optimisation and troubleshooting
– three stages: exploration, modelling, validation
What do I need to achieve this?
For modelling normal behaviour:
– Access to historic time-stamped process & QC data
  Example, BASF Seal Sands plant: over 2 full years of PI data for >11,000 variables, every 30 seconds
– Some knowledge of what variables to monitor
– Some expectation of what is to be achieved
– Understanding the data
For monitoring abnormal behaviour:
– A model of normal behaviour
– Real-time data access
– Software to identify causes of abnormal behaviour
Integrated network access to historic time-stamped process & QC data
Majority of process control systems (DCS & PLC) have interfaces to external systems:
– normally offer ‘firewall’ security to prevent unauthorised write-back
– normally offer network connectivity
  Ethernet, TCP/IP, browser-configurable
Major data historian packages have standard interfaces to these systems:
– client-server
– data compression comes as standard
– can import QC data
– clients run on same platform as business systems
  GUI, Excel add-in, API & web browser
Data historian systems may be plant-dedicated, shared over a business WAN, or shared over the Internet.
[Figure: network diagram of two delivery routes, Internet Service Delivery and Business Network Service Delivery. Labelled elements: Customer’s Business LAN, Service Provider’s LAN, Internet, Internet firewalls, LAN firewalls, DMZs, PI Server, Web Server, I/F Server, DCS, PC.]
Some expectation of what is to be achieved:
High Profile Targets:
– % Yield improvement
– % Give-away reduction
– % Utility usage reduction
– % Rate or batch-speed increase
Side benefits:
– Reduced ‘Grey’ material when changing grades
– Reduced downtime
– Reduced inventories awaiting analysis - sell from line
– On-line monitoring of catalyst deactivation and process fouling
[Figure: process variable against time, showing the mean relative to a constraint. Labels: Poor control; Improved control; Move closer to constraint; ££+.]
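Quantifying the improvement:
[Figure: distribution of the process variable BEFORE and AFTER, each marked with a 5% violation; the shift in the mean is labelled ∆x.]
$\Delta x = 1.65\,(\sigma_{\mathrm{old}} - \sigma_{\mathrm{new}})$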
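The factor 1.65 is close to the one-sided 95% point of the standard normal distribution (about 1.645), which matches the 5% violation level on the slide. A minimal sketch of the calculation follows, assuming normally distributed variation. The function name and the two standard deviations are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def mean_shift(sigma_old: float, sigma_new: float, violation: float = 0.05) -> float:
    """Distance the mean can move towards the constraint while keeping
    the same probability of violating it (normality assumed)."""
    z = NormalDist().inv_cdf(1.0 - violation)  # about 1.645 for 5%
    return z * (sigma_old - sigma_new)


# Hypothetical standard deviations before and after improved control
delta_x = mean_shift(sigma_old=2.0, sigma_new=1.2)
print(f"Mean can move {delta_x:.2f} units closer to the constraint")
```

The benefit scales with the reduction in standard deviation, so the same relative improvement in control is worth more on a noisier variable.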
Understanding the data:
To deal with abnormalities in the data:
– Pre-screening
  Visualisation
  Time-shift correction
  Missing values
  Outliers
Exploratory data analysis:
Projection Techniques (Chemometrics)
– Clustering: Principal Component Analysis (PCA)
– Regression: Partial Least Squares (PLS)
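Two of the pre-screening steps listed above, missing values and outliers, can be sketched as follows. This is an illustration only: linear interpolation and a robust z-score based on the median absolute deviation are assumptions, not methods named in the slides, and the sample readings are made up.

```python
import numpy as np


def fill_missing(x):
    """Linearly interpolate NaN gaps in an evenly sampled series."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(x.size)
    ok = ~np.isnan(x)
    return np.interp(idx, idx[ok], x[ok])


def flag_outliers(x, threshold=3.5):
    """Boolean mask of points whose robust z-score exceeds the threshold."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))  # median absolute deviation
    if mad == 0:
        return np.zeros(x.shape, dtype=bool)
    robust_z = 0.6745 * (x - med) / mad
    return np.abs(robust_z) > threshold


# Hypothetical readings with two gaps and one spike
raw = np.array([10.1, 10.3, np.nan, 10.2, 55.0, 10.0, np.nan, 10.4])
clean = fill_missing(raw)
mask = flag_outliers(clean)
print(clean)
print(mask)
```

The median-based score is used here because a single large spike inflates an ordinary standard deviation and can hide itself.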
Time-shift correction
[Figure: three panels. Raw data: no identifiable relationship. Cross-correlation. Time-shifted cross-correlation: relationship now obvious.]
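The idea behind time-shift correction can be sketched as a search for the lag at which two signals correlate best, followed by realigning them at that lag. The sketch below uses synthetic data; the function name, the 12-sample delay and the noise level are hypothetical.

```python
import numpy as np


def best_lag(x, y, max_lag):
    """Lag (in samples) at which y[t + lag] correlates most strongly with x[t]."""
    best, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        xs = x[: x.size - lag]  # x[t]
        ys = y[lag:]            # y[t + lag]
        r = abs(np.corrcoef(xs, ys)[0, 1])
        if r > best_r:
            best, best_r = lag, r
    return best


rng = np.random.default_rng(0)
x = rng.normal(size=500)            # upstream variable
true_delay = 12                     # hypothetical dead time, in samples
y = np.roll(x, true_delay) + 0.1 * rng.normal(size=500)  # delayed response

lag = best_lag(x, y, max_lag=50)
x_aligned, y_aligned = x[: x.size - lag], y[lag:]
print("estimated lag:", lag)
print("correlation before:", round(float(np.corrcoef(x, y)[0, 1]), 3))
print("correlation after:", round(float(np.corrcoef(x_aligned, y_aligned)[0, 1]), 3))
```

On real plant data the delay often varies with throughput, so a single fixed lag is a simplification.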
[Figure: PCA model plane in a three-variable space (axes Variable 1, Variable 2, Variable 3), spanned by the First PC and Second PC. A sample with large Q (or SPE) shows unusual variation outside the model; a sample with large T2 shows unusual variation inside the model.]
Ack to Eigenvector Research Inc.
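The two statistics in the figure can be computed from a PCA model as sketched below. This is a minimal illustration on synthetic three-variable data in which Variable 3 is roughly Variable 1 plus Variable 2. The function names, data and test samples are hypothetical, and control limits for T2 and Q are omitted.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit a PCA model on autoscaled data and return what monitoring needs."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                                # loadings
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)   # variance of each score
    return mean, std, P, score_var


def t2_and_q(x, model):
    """Hotelling T2 (inside the model plane) and Q/SPE (distance off the plane)."""
    mean, std, P, score_var = model
    z = (x - mean) / std
    t = z @ P                  # scores
    residual = z - t @ P.T     # part of the sample the model cannot explain
    t2 = float(np.sum(t ** 2 / score_var))
    q = float(residual @ residual)
    return t2, q


rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
W = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
X = latent @ W + 0.05 * rng.normal(size=(200, 3))
model = fit_pca(X, n_components=2)

# Obeys the correlation structure but lies far from the centre: large T2
t2_in, q_in = t2_and_q(np.array([5.0, 5.0, 10.0]), model)
# Breaks the correlation structure: large Q
t2_out, q_out = t2_and_q(np.array([1.0, 1.0, -2.0]), model)
print(f"inside the model:  T2={t2_in:.1f}  Q={q_in:.4f}")
print(f"outside the model: T2={t2_out:.1f}  Q={q_out:.4f}")
```

Both samples would pass or fail differently under univariate limits, which is why the two statistics are normally monitored together.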
Multivariate data analysis - correlation & clustering
[Figure: plot of highly correlated data with two outliers marked.]
Real-time Modelling
Modelling for monitoring:
– A subset of historic data is chosen which represents:
  ‘common-cause’ process behaviour (no outliers)
  the process operating within ‘control’
– The model will then be sensitive to outliers & out-of-control data
– Easier to say than it is to do:
  many processes are ‘multi-modal’
  – rate & grade changes - known as ‘co-variates’
  – co-variates must be ‘orthogonalised’ first, to give model sensitivity
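One simple way to read ‘orthogonalising’ the co-variates is to regress each process variable on the co-variate and keep only the residuals, which are then uncorrelated with it. The slides do not say which method was used, so the sketch below is an assumption. The variable names and the synthetic rate, temperature and pressure data are hypothetical.

```python
import numpy as np


def remove_covariates(X, C):
    """Return the part of X that is orthogonal to the co-variates C."""
    C1 = np.column_stack([np.ones(C.shape[0]), C])  # add an intercept column
    coef, *_ = np.linalg.lstsq(C1, X, rcond=None)
    return X - C1 @ coef


rng = np.random.default_rng(1)
rate = rng.uniform(50, 100, size=300)                   # hypothetical production rate
temp = 0.8 * rate + rng.normal(scale=0.5, size=300)     # follows the rate
press = 0.1 * rate + rng.normal(scale=0.2, size=300)    # follows the rate
X = np.column_stack([temp, press])

R = remove_covariates(X, rate.reshape(-1, 1))
print("std before:", X.std(axis=0).round(2))
print("std after: ", R.std(axis=0).round(2))
```

A monitoring model built on the residuals no longer spends its variance describing rate changes, so smaller abnormal deviations stand out.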
Real-time modelling
Modelling for prediction:
– Can include ‘richer’ process data for better prediction away from normal operation (but excluding outliers)
– Must be validated on unseen data
– Model drift is a problem, as processes change with time:
  Difference between predicted & actual used for monitoring fouling & catalyst deactivation
  Can be used to validate QC data
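The points above can be sketched with a linear model fitted on one block of data, validated on a later unseen block, and then used on-line so that a sustained difference between predicted and actual values raises an alarm. Ordinary least squares stands in here for PLS. The data, the simulated fouling drift, the window length and the 3-sigma limit are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 600
X = rng.normal(size=(n, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n)
# Hypothetical fouling: a slow drift in the measured value from sample 400 on
y[400:] -= np.linspace(0.0, 1.0, n - 400)

train, unseen, online = slice(0, 200), slice(200, 400), slice(400, n)
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)

resid = y - A @ coef  # actual minus predicted
rmse_train = float(np.sqrt(np.mean(resid[train] ** 2)))
rmse_unseen = float(np.sqrt(np.mean(resid[unseen] ** 2)))  # validation error

# Moving average of the on-line residual against a 3-sigma limit
window = 20
smooth = np.convolve(resid[online], np.ones(window) / window, mode="valid")
limit = 3.0 * rmse_unseen / np.sqrt(window)
alarm = np.flatnonzero(np.abs(smooth) > limit)
first_alarm = int(alarm[0]) + window - 1 if alarm.size else None

print(f"RMSE train={rmse_train:.3f}  unseen={rmse_unseen:.3f}")
print("first alarm, in samples after going on-line:", first_alarm)
```

If the validation error is much larger than the training error, the model is overfitted and the limit derived from it should not be trusted.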
A selection of available software
Excel
– Data analysis add-in - Moving Average, StDev, MLR
– VBA programmable - on-line applications are possible
Matlab (statistics toolboxes including PCA & PLS)
– http://www.ncl.ac.uk/inpact/
– http://www.eigenvector.com/
  can also be programmed for on-line monitoring
MSPC+ (off & on-line - uses embedded PI database)
– http://www.mdctech.com/mspc.htm
Pirouette (exploratory data analysis) & InStep (prediction)
– http://www.infometrix.com/ (demo available to download)
SIMCA-P (modelling) & SIMCA 4000 (on-line)
– http://www.umetrics.com/ (demo available to download)
Thank you for your attention