BIG DATA NEEDS DOE
BIG DATA NEEDS DESIGN OF EXPERIMENTS
Learning Points from Today
1. Why the interest in “big data”?
2. Some realities
3. What is DOE?
4. Benefits of DOE
5. DOE case studies
6. Why does big data need DOE?
The Age of Analytics: Competing in a Data-Driven World
McKinsey found that companies embracing the data and analytics revolution can:
– reduce product development costs by 50%
– lower operating costs by 25%
– increase gross margin by up to 30%
According to McKinsey, success requires
• People with data savvy plus industry and functional expertise
– critical to generating informed, data-driven discoveries
• The ability to distil analytical insights and bring them to life through visualization
– necessary to help decision makers understand the findings of data analytics
Why the interest: recent client conversations
• Growing realisation at the executive level that a company’s current and historical data is a key, yet underutilised, asset.
• Desire to make better use of data assets to drive better-informed development and manufacturing decisions.
• Wariness of big infrastructure projects as a prerequisite for getting started.
• Need for an approach that augments current scientific and engineering approaches.
Data-augmented intelligence is the goal.
Big Data
Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications. (Wikipedia)
Is it big data or big data methods we should be talking about?
Which path to take?
• Big data, predictive analytics, machine
learning, artificial intelligence, internet of
things, industry 4.0
• These terms are being lumped together as if they were one thing
• They offer the promise of automated learning
• Some expectations are unrealistic
Fayyad Axiom
“Your organization’s data warehouse will not
have the data you need to build your
predictive model”
Paraphrased, Usama Fayyad, Chief Data Officer, Barclays,
https://en.wikipedia.org/wiki/Usama_Fayyad
Here is the problem with Big Data and Predictive Modeling
• Paraphrased, Usama Fayyad, Chief Data Officer, Barclays, cofounder of KDD conferences
• I call this the Fayyad axiom.
“Your organization’s data warehouse will not have the information you need to build your predictive model.”
Why could that be?
Smaller People More At Risk of Heart Attack
The Lurking Variable Problem: Correlated and Unmeasured Variables
Lurking Variables
• Just because a variable predicts an
outcome, it does not mean it causes the
outcome
• The cause could be some other
predictor(s) that are measured or
unmeasured and are correlated with the
selected predictor
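The lurking variable problem can be illustrated with a small simulation. The sketch below is only an illustration under invented assumptions: the variable names, coefficients and noise levels are hypothetical and do not come from any real data set. An unmeasured variable z drives both the predictor x and the outcome y, while x has no causal effect on y at all. In the "observational" data x still predicts y well; when x is set independently by the experimenter, the association disappears.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, seed=0):
    """Return (r_observational, r_experimental) for the hypothetical process."""
    rng = random.Random(seed)

    # Observational data: the lurking variable z moves both x and y.
    z = [rng.gauss(0, 1) for _ in range(n)]
    x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
    y_obs = [2 * zi + rng.gauss(0, 0.5) for zi in z]  # y depends on z only

    # Experimental data: x is set at random by the experimenter,
    # so it can no longer be correlated with z.
    z_new = [rng.gauss(0, 1) for _ in range(n)]
    x_exp = [rng.choice([-1.0, 1.0]) for _ in range(n)]
    y_exp = [2 * zi + rng.gauss(0, 0.5) for zi in z_new]

    return pearson(x_obs, y_obs), pearson(x_exp, y_exp)


r_obs, r_exp = simulate()
print(f"observational r(x, y): {r_obs:.2f}")
print(f"experimental  r(x, y): {r_exp:.2f}")
```

In this toy setting a model fitted to the observational data would "predict" y from x, yet changing x would do nothing to y.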
Restricted Factor Ranges
“If you want to know what happens if a factor
changes, you must actually change it.”
George Box, Early Pioneer of DOE
Prediction vs Causality?
• Are the relevant predictors measured?
• Are the predictors varied in a way that
enables us to assess the impact they have
on our responses independently of the
effects of other predictors?
• What information can we realistically expect to obtain from data mining or machine learning approaches if the answer is no or uncertain?
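The second question above can be checked directly on historical data before any modelling starts. The following is a minimal sketch, assuming the data is available as named columns. The column names, the values and the 0.9 correlation limit are all hypothetical choices made for illustration. It flags predictors that were never varied and pairs of predictors that moved together so closely that their effects cannot be separated.

```python
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def screen_predictors(data, r_limit=0.9):
    """Return (constant, confounded) for a dict of column name -> values.

    constant:   predictors that never changed (restricted factor range)
    confounded: (a, b, r) for pairs with |r| >= r_limit (varied together)
    r_limit is an arbitrary illustrative threshold, not a standard.
    """
    constant = [name for name, v in data.items() if max(v) == min(v)]
    varied = [name for name in data if name not in constant]
    confounded = []
    for a, b in combinations(varied, 2):
        r = pearson(data[a], data[b])
        if abs(r) >= r_limit:
            confounded.append((a, b, round(r, 3)))
    return constant, confounded


# Hypothetical production history, invented for this example.
history = {
    "temperature": [180, 182, 184, 186, 188, 190],
    "pressure": [2.0, 2.1, 2.2, 2.3, 2.4, 2.5],  # always raised with temperature
    "mill_speed": [1200] * 6,                    # never changed
    "humidity": [41, 38, 45, 40, 44, 39],
}

constant, confounded = screen_predictors(history)
print("never varied:   ", constant)
print("varied together:", confounded)
```

If either list is non-empty, the existing data cannot say what those factors do on their own, however large the data set is.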
Big Data Summary
• Can deliver useful models that predict
outcomes
• But your data may have limitations
– Lurking (potentially important) variables are
not measured
– Variables are varied together
– Other variables may not be varied at all
• To understand what is truly driving a
problem you may need DOE
Use DOE when the answer is not in your data
• In a recent interview Per Vase, Managing Partner at NNE, stated: “poor understanding leads to poor predictability which leads to missed milestones in R&D and results in yield, quality, or compliance issues in manufacturing.”
• He further stated: “DOE is the solution to breaking this cycle.”
• See his interview at:
https://www.jmp.com/en_us/events/ondemand/analytically-speaking/operational-excellence-in-pharmaceutical-manufacturing.html
What is DOE?
A plan for collecting new data that maximises information by efficiently exploring the space of possibilities
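The simplest such plan is a two-level full factorial design, which runs every combination of low and high settings. The sketch below is a generic illustration, not the output of JMP or any other tool. The factor names are hypothetical, and -1 / +1 are the usual coded values for the low and high setting of each factor.

```python
from itertools import product


def full_factorial(factors):
    """Return one run (a dict of factor -> level) per combination of levels."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical factors, each at a coded low (-1) and high (+1) level.
factors = {
    "temperature": [-1, 1],
    "pressure": [-1, 1],
    "mill_speed": [-1, 1],
}

design = full_factorial(factors)
for run_number, run in enumerate(design, start=1):
    print(run_number, run)

# Every pair of factor columns is balanced: the products of their coded
# levels sum to zero, so each effect can be estimated independently.
balance = sum(run["temperature"] * run["pressure"] for run in design)
print("temperature x pressure column sum:", balance)
```

Three factors at two levels give 2 x 2 x 2 = 8 runs. Unlike historical data, every factor is deliberately changed and no two factors move together.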
The Father of DOE – Sir Ronald A. Fisher
Rothamsted Experimental Station, England – early 1920s
DOE has a rich evolution
• Factorial
• Fractional Factorial
• RSM
• Mixture
• Optimal
• Taguchi
• Non-linear
• Choice
• Space Filling
• Accelerated Life Test
• Covering Arrays
• Definitive Screening
Cornell University, http://usda.mannlib.cornell.edu/MannUsda
DOE efficiently generates the new data you will need to determine cause and effect
• Highly efficient
• Increases reliability of data
• Removes blind spots in understanding
• Reduces uncertainty in decisions
• Enables more productive work to be done with the same resources
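The efficiency claim can be made concrete with a half-fraction design: three factors studied in four runs instead of eight. The sketch below is a toy example under stated assumptions. The factor labels A, B, C and the response function are hypothetical, the response has no noise, and the design sets C = A x B, which is only safe if the A-by-B interaction is assumed negligible.

```python
def half_fraction():
    """Four-run 2^(3-1) design: A and B form a full factorial, C = A * B."""
    runs = []
    for a in (-1, 1):
        for b in (-1, 1):
            runs.append({"A": a, "B": b, "C": a * b})
    return runs


def main_effect(runs, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(runs, responses) if run[factor] == 1]
    low = [y for run, y in zip(runs, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


def hypothetical_process(run):
    """Invented noise-free response: A helps, B hurts, C does nothing."""
    return 10 + 3 * run["A"] - 2 * run["B"]


runs = half_fraction()
responses = [hypothetical_process(run) for run in runs]
effects = {factor: main_effect(runs, responses, factor) for factor in "ABC"}
print(effects)  # each effect is twice the coefficient in the coded model
```

Four runs are enough here to recover all three main effects separately, because the design columns are balanced against one another. With real, noisy responses the estimates would carry uncertainty, and replication or a larger design may be needed.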
DOE CASE STUDIES
Case study: new process comes back to R&D to be fixed
BEFORE JMP
• A working manufacturing process was developed
• After launch, process time steadily got worse
• Manufacturing capacity was cut in half, and the cycle time doubled
• Came back to R&D to be fixed
• Poor understanding of what really affects process (milling) time
Determining the experiments needed for
efficient process know-how
WITH JMP
• Defined an efficient data collection plan to learn more
• Presented results graphically
• Quickly identified actionable root causes of poor milling
results
• Generated full, not partial, understanding for the future
• Solved this problem faster and more cheaply than without
JMP
Fixing the problem for good with JMP
Deteriorating performance over time
Potential causes identified with data mining
Verification runs confirmed gain
Efficient experimentation with top potential causes gave new
process know-how at minimum cost
JMP eliminated future waste by quickly identifying possible root causes and
providing a way to efficiently identify and verify a lasting solution
Doubled production capacity with significantly lower R&D effort and time,
compared to not using JMP
A Way Forward
• Be realistic
• Exploit what data you have
• Be aware of limitations of current data
• Be prepared to experiment or collect new
data
• Use common sense
Integrated approach is best
• Acceptance Sampling
• SQC
• SPC
• APC
• Statistics
• Predictive Modelling
• Data Mining
• Machine Learning
• Artificial Intelligence
• Design of Experiments
Your toolset needs more than big data methods to efficiently
augment scientific and engineering learning with data
JMP Statistical Discovery
Enables scientists and engineers to supplement their industry and functional expertise with informed, data-driven discoveries
Distils your analytical insights and brings them to life through dynamic visualizations that help decision makers understand your findings
Predictive Modeling
References
• The Age of Analytics: Competing in a Data-Driven World, McKinsey & Co
• Big Data & Industry 4.0 in the Chemistry-Using Industries, Stan Higgins OBE, Non-Executive Director at Industrial Technology Systems Ltd (ITS Ltd)
• DOE is the future optimal, Chris Nachtsheim, University of Minnesota, Carlson School of Management and School of Statistics
QUESTIONS