Process Data Mining

Transcript of Process Data Mining

Page 1: Process Data Mining

BASF IT Services

Supply Chain Management - Production Systems

Monday, May 24, 2010

Process Data Mining

Geoff C Jones

Page 2: Process Data Mining

Data Mining using modern techniques

The truth is out there - in our plant data!

But how can we find it, and how do we get to understand it?
– Is my process running normally?
– Is that QC result correct for my operating conditions?
– Can I predict when that column needs cleaning?

Page 3: Process Data Mining

What do we mean by Data Mining?

Knowledge discovery from human activity (‘information from data’)
– Consumer surveys, loyalty cards, credit history
– Modelling behaviour - identifying relationships & trends
– Prediction of buying habits, risk, etc.

How does Process Data Mining differ?

Knowledge discovery from systems behaviour
– Data capture (on-line process & QC data)
– Modelling of normal behaviour (exploratory data analysis)
– Monitoring for abnormal behaviour
– Prediction of quality, yield, fouling, catalyst deactivation, etc.

Page 4: Process Data Mining

How Can It Benefit My Business?

It identifies the relationships between process and performance variables that influence:
– Yields & usages
– Processing times, speed, throughput
– Product impurities

Cost is low compared with the benefits

Page 5: Process Data Mining

How does it differ from Conventional Process Control & SPC?

Closed-loop control:
– deals with dynamic behaviour
– requires sampling rate much higher than system frequency response
– requires cause-effect relationships to be known

SPC:
– (generally) deals with steady-state systems
– sampling rate determined by cost of data collection
– identifies non-random system behaviour (variation due to assignable causes)
– usually configured for univariate monitoring

Process Data Mining:
– identifies relationships and abnormalities in multivariate systems for process optimisation and troubleshooting
– three stages: exploration, modelling, validation

Page 6: Process Data Mining

What do I need to achieve this?

For modelling normal behaviour:
– Access to historic time-stamped process & QC data
  example BASF Seal Sands plant: over 2 full years PI data for >11,000 variables, every 30 seconds
– Some knowledge of what variables to monitor
– Some expectation of what is to be achieved
– Understanding the data

For monitoring abnormal behaviour:
– A model of normal behaviour
– Real-time data access
– Software to identify causes of abnormal behaviour

Page 7: Process Data Mining

Integrated network access to historic time-stamped process & QC data

Majority of process control systems (DCS & PLC) have interfaces to external systems:
– normally offer ‘firewall’ security to prevent unauthorised write-back
– normally offer network connectivity: Ethernet, TCP/IP, browser-configurable

Major data historian packages have standard interfaces to these systems:
– client - server
– data compression comes as standard
– can import QC data
– clients run on same platform as business systems: GUI, Excel add-in, API & web browser

Data historian systems may be plant-dedicated, shared over a business WAN, or shared over the Internet.

Page 8: Process Data Mining

[Diagram: service delivery architecture. Labels: Customer’s Business LAN, Service Provider’s LAN, Internet, Internet FW (x2), LAN Firewall (x2), DMZ (x2), PI Server, Web Server, I/F Server, DCS, PC. Delivery routes shown: Internet Service Delivery and Business Network Service Delivery.]

Page 9: Process Data Mining

Some expectation of what is to be achieved:

High Profile Targets:
– % Yield Improvement
– % Give-away reduction
– % Utility usage reduction
– % Rate or batch-speed increase

Side benefits:
– Reduced ‘Grey’ material when changing grades
– Reduced downtime
– Reduced inventories awaiting analysis - sell from line
– On-line monitoring of catalyst deactivation and process fouling

Page 10: Process Data Mining

[Figure: process variable against time, showing the constraint and the mean. Annotations: ‘Poor control’; ‘Improved control’; ‘Move closer to constraint’ (££+).]

Page 11: Process Data Mining

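Quantifying the improvement:

[Figure: distributions labelled BEFORE and AFTER, with the shift ∆x and the ‘5% violation’ region marked.]

$\Delta x = 1.65\,(\sigma_{\mathrm{old}} - \sigma_{\mathrm{new}})$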
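
As a rough worked example of the formula above: 1.65 is, to two decimals, the one-sided normal quantile for a 5% violation rate, so when the standard deviation falls from σ_old to σ_new the mean can move ∆x closer to the constraint at the same violation rate. The sketch assumes normally distributed variation; the function name `mean_shift` and the numbers are illustrative and not from the slides.

```python
from statistics import NormalDist

def mean_shift(sigma_old, sigma_new, violation=0.05):
    """Distance the mean can move towards a one-sided constraint while
    keeping the same violation probability (assumes normal variation)."""
    z = NormalDist().inv_cdf(1.0 - violation)  # about 1.645 for 5%
    return z * (sigma_old - sigma_new)

# Illustrative numbers only: standard deviation halved from 2.0 to 1.0 units.
delta_x = mean_shift(sigma_old=2.0, sigma_new=1.0)
print(f"Mean can move {delta_x:.2f} units closer to the constraint")
```

Multiplying ∆x by the value of one unit of the process variable would give a first estimate of the benefit.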

Page 12: Process Data Mining

Understanding the data:

To deal with abnormalities in the data:
– Pre-Screening
  Visualisation
  Time-shift correction
  Missing values
  Outliers
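
Two of the pre-screening steps listed above (missing values and outliers) might be sketched as follows. The slides do not say which methods were used; carrying the last good value forward and a median/MAD outlier test are assumptions chosen for simplicity, and the function names and data are hypothetical.

```python
from statistics import median

def fill_missing(values):
    """Replace None with the last valid value (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def flag_outliers(values, threshold=3.5):
    """Flag points whose modified z-score (median/MAD based) exceeds threshold."""
    valid = [v for v in values if v is not None]
    med = median(valid)
    mad = median(abs(v - med) for v in valid)
    if mad == 0:
        return [False] * len(values)
    return [v is not None and abs(0.6745 * (v - med) / mad) > threshold
            for v in values]

# Hypothetical temperature readings with two gaps and one spike.
raw = [50.1, 50.3, None, 50.2, 49.9, 75.0, 50.0, None, 50.4]
filled = fill_missing(raw)
flags = flag_outliers(filled)
print([v for v, bad in zip(filled, flags) if bad])
```

A median-based test is used here because a single large spike inflates the ordinary standard deviation and can hide itself.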

Exploratory data analysis:
Projection Techniques (Chemometrics)
– Clustering
  Principal Component Analysis (PCA)
– Regression
  Partial Least Squares (PLS)

Page 13: Process Data Mining

Time-shift correction

[Figure: two panels. Raw data, cross-correlation: no identifiable relationship. Time-shifted cross-correlation: relationship now obvious.]
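
A minimal sketch of time-shift correction, assuming the delay is a whole number of samples: try each candidate lag, correlate the upstream signal with the shifted downstream signal, and keep the lag with the strongest correlation. The helper names and the synthetic signal are hypothetical.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def best_lag(x, y, max_lag):
    """Return (lag, r) maximising |corr(x[t], y[t + lag])| for 0 <= lag <= max_lag."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        r = pearson(x[:len(x) - lag], y[lag:])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Synthetic upstream signal, and a downstream copy delayed by 3 samples.
x = [math.sin(0.37 * t) + 0.5 * math.sin(1.3 * t) for t in range(200)]
delay = 3
y = [x[0]] * delay + x[:-delay]

lag, r = best_lag(x, y, max_lag=10)
print(f"best lag = {lag} samples, correlation = {r:.3f}")
```

Once the lag is known, the downstream variable can be shifted back by that many samples before any multivariate modelling.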

Page 14: Process Data Mining

Page 15: Process Data Mining

[Figure: PCA model in a three-variable space (axes Variable 1, Variable 2, Variable 3) with the First PC and Second PC marked. A sample with large Q (or SPE) shows unusual variation outside the model; a sample with large T² shows unusual variation inside the model.]

Acknowledgement: Eigenvector Research Inc.
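
The two statistics in the figure can be illustrated with a small pure-Python sketch. It assumes a two-component PCA model of three variables; power iteration with deflation stands in for the SVD routine a real package would use, and all names and data are hypothetical. T² is the sum of squared scores scaled by their variances, and Q (SPE) is the squared distance from the sample to the model plane.

```python
import math
from statistics import mean

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def covariance(rows):
    """Sample covariance matrix of rows that are already mean-centred."""
    n, m = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(m)]
            for i in range(m)]

def principal_components(cov, k, iters=500):
    """Top-k (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    m = len(cov)
    cov = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(m)]
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            norm = dot(w, w) ** 0.5
            v = [c / norm for c in w]
        lam = dot(v, [dot(row, v) for row in cov])
        pairs.append((lam, v))
        # Deflate: remove the component just found.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)]
               for i in range(m)]
    return pairs

def t2_and_q(x, means, pairs):
    """Hotelling T2 (inside the model) and Q/SPE (outside the model)."""
    xc = [a - mu for a, mu in zip(x, means)]
    scores = [dot(xc, v) for _, v in pairs]
    t2 = sum(t * t / lam for t, (lam, _) in zip(scores, pairs))
    recon = [sum(t * v[i] for t, (_, v) in zip(scores, pairs))
             for i in range(len(xc))]
    q = sum((a - b) ** 2 for a, b in zip(xc, recon))
    return t2, q

# Synthetic 'normal' data lying on the plane var3 = var1 + var2.
rows = [[math.sin(0.7 * t), math.cos(1.9 * t),
         math.sin(0.7 * t) + math.cos(1.9 * t)] for t in range(60)]
means = [mean(col) for col in zip(*rows)]
centred = [[a - mu for a, mu in zip(r, means)] for r in rows]
pairs = principal_components(covariance(centred), k=2)

t2_in, q_in = t2_and_q([3.0, 3.0, 6.0], means, pairs)    # on the plane, far out
t2_off, q_off = t2_and_q([0.0, 0.0, 2.0], means, pairs)  # breaks var3 = var1 + var2
print(f"in-plane sample:  T2={t2_in:.1f}  Q={q_in:.2e}")
print(f"off-plane sample: T2={t2_off:.1f}  Q={q_off:.2f}")
```

The first test sample keeps the normal relationship between the variables but is extreme, so only T² is large; the second breaks the relationship, so Q is large.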

Page 16: Process Data Mining

Multivariate data analysis - correlation & clustering

[Figure: plot annotated ‘Highly correlated data’, with two points marked ‘Outlier’.]

Page 17: Process Data Mining

Page 18: Process Data Mining

Real-time Modelling

Modelling for monitoring:
– A subset of historic data is chosen which represents:
  ‘common-cause’ process behaviour (no outliers)
  the process operating within ‘control’
– The model will then be sensitive to outliers & out-of-control data
– Easier to say than it is to do:
  many processes are ‘multi-modal’
  – rate & grade changes - known as ‘co-variates’
  – co-variates must be ‘orthogonalised’ first, to give model sensitivity
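
One plausible reading of ‘orthogonalising’ a co-variate is to regress each monitored variable on it and model only the residual, so that deliberate rate changes no longer dominate the variation. The slides do not state the method used; the sketch below assumes a single co-variate and a linear effect, and the names and data are hypothetical.

```python
from statistics import mean

def remove_covariate(values, covariate):
    """Residuals of `values` after least-squares regression on one covariate.

    The residuals are uncorrelated with (orthogonal to) the centred covariate.
    """
    mc, mv = mean(covariate), mean(values)
    slope = (sum((c - mc) * (v - mv) for c, v in zip(covariate, values))
             / sum((c - mc) ** 2 for c in covariate))
    return [(v - mv) - slope * (c - mc) for c, v in zip(covariate, values)]

# Hypothetical data: a temperature that tracks production rate plus a
# small disturbance that is the part worth monitoring.
rate = [10.0] * 3 + [20.0] * 3 + [30.0] * 3
disturbance = [0.1, -0.2, 0.1, 0.0, 0.3, -0.3, -0.1, 0.2, -0.1]
temperature = [2.0 * r + d for r, d in zip(rate, disturbance)]

residual = remove_covariate(temperature, rate)
print([round(r, 2) for r in residual])
```

After this step the spread of the variable reflects the disturbance rather than the three operating rates, which is what makes a monitoring model sensitive.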

Page 19: Process Data Mining

Real-time modelling

Modelling for prediction:
– Can include ‘richer’ process data for better prediction away from normal operation (but excluding outliers)
– Must be validated on unseen data
– Model drift is a problem, as processes change with time:
  Difference between predicted & actual used for monitoring fouling & catalyst deactivation
  Can be used to validate QC data
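
The points above can be sketched with a one-variable linear model: fit on clean historic data, check the residuals on unseen data, then track the difference between actual and predicted values over time. The pressure-drop example, the 3 x RMS alarm limit and all names are assumptions for illustration; a real application would more likely use a multivariate PLS model.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def residuals(model, x, y):
    """Actual minus predicted for each sample."""
    slope, intercept = model
    return [b - (slope * a + intercept) for a, b in zip(x, y)]

# Hypothetical historic data: pressure drop vs. flow for clean equipment.
train_flow = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
train_noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
train_dp = [0.5 * f + 1.0 + n for f, n in zip(train_flow, train_noise)]
model = fit_line(train_flow, train_dp)

# Validate on data the model has not seen.
valid_flow = [11.0, 15.0, 19.0, 23.0]
valid_noise = [0.05, -0.05, 0.1, -0.1]
valid_dp = [0.5 * f + 1.0 + n for f, n in zip(valid_flow, valid_noise)]
valid_res = residuals(model, valid_flow, valid_dp)
rms = (sum(r * r for r in valid_res) / len(valid_res)) ** 0.5
limit = 3.0 * rms  # assumed alarm threshold

# Later operation: fouling adds 0.1 units of pressure drop per sample.
later_flow = [14.0, 18.0, 22.0, 16.0, 20.0, 12.0, 24.0, 10.0, 18.0, 14.0]
later_dp = [0.5 * f + 1.0 + 0.1 * k for k, f in enumerate(later_flow)]
later_res = residuals(model, later_flow, later_dp)
alarms = [r > limit for r in later_res]
print(f"validation RMS = {rms:.3f}, first alarm at sample {alarms.index(True)}")
```

The same residual can be read the other way round: a QC result that disagrees sharply with the prediction while the process data look normal is a candidate for re-analysis.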

Page 20: Process Data Mining

A selection of available software

Excel
– Data analysis add-in - Moving Average, StDev, MLR
– VBA programmable - on-line applications are possible

Matlab (statistics toolboxes including PCA & PLS)
– http://www.ncl.ac.uk/inpact/
– http://www.eigenvector.com/
  can also be programmed for on-line monitoring

MSPC+ (off & on-line - uses embedded PI database)
– http://www.mdctech.com/mspc.htm

Pirouette (exploratory data analysis) & InStep (prediction)
– http://www.infometrix.com/ (demo available to download)

SIMCA-P (modelling) & SIMCA 4000 (on-line)
– http://www.umetrics.com/ (demo available to download)

Page 21: Process Data Mining

Thank you for your attention