BIG DATA NEEDS DESIGN OF EXPERIMENTS · BIG DATA NEEDS DOE Why the interest: recent client...

Learning Points from Today

1. Why the interest in “big data”?
2. Some realities
3. What is DOE?
4. Benefits of DOE
5. DOE case studies
6. Why big data needs DOE

The Age of Analytics: Competing in a Data Driven World

According to McKinsey, companies that embrace the data and analytics revolution can:
– reduce product development costs by 50%
– lower operating costs by 25%
– increase gross margin by up to 30%

According to McKinsey, Success Requires

• People with data savvy plus industry and functional expertise
– critical to generating informed, data-driven discoveries
• Ability to distil and bring analytical insights to life through visualization
– necessary to help decision makers understand data analytic findings

Why the interest: recent client conversations

• Growing realisation at the executive level that a company’s current and historical data is a key yet underutilised asset.
• Desire to make better use of data assets to drive better informed development and manufacturing decisions.
• Wary of big infrastructure projects as a foundation for getting started.
• Need an approach that augments current scientific and engineering approaches.

Data-augmented intelligence is the goal

Big Data

Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications. (Wikipedia)

Is it big data or big data methods we should be talking about?

Which path to take?

• Big data, predictive analytics, machine learning, artificial intelligence, internet of things, industry 4.0
• These terms are being conflated into a single association
• Together they offer the promise of automated learning
• Some expectations are unrealistic

Fayyad Axiom

“Your organization’s data warehouse will not have the data you need to build your predictive model”

Paraphrased, Usama Fayyad, Chief Data Officer, Barclays, https://en.wikipedia.org/wiki/Usama_Fayyad

9/28/16

Here is the problem with Big Data and Predictive Modeling

• Paraphrased, Usama Fayyad, Chief Data Officer, Barclays, cofounder of KDD conferences
• I call this the Fayyad axiom.

“Your organization’s data warehouse will not have the information you need to build your predictive model.”

Why could that be?

Smaller People More At Risk of Heart Attack

The Lurking Variable Problem: Correlated and Unmeasured Variables

Lurking Variables

• Just because a variable predicts an outcome, it does not mean it causes the outcome
• The cause could be some other predictor(s), measured or unmeasured, that are correlated with the selected predictor
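A minimal simulation can make the lurking variable problem concrete. The sketch below is illustrative only: the variable names, coefficients, and noise levels are assumptions, not taken from any real data set. A hidden variable `z` drives both the predictor `x` and the outcome `y`, so `x` predicts `y` in observational data. When `x` is instead set independently of `z`, as a randomised experiment would do, the apparent effect disappears.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Lurking variable z: unmeasured, but it drives both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational data: x follows z, y depends on z only (not on x).
x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
y_obs = [2 * zi + rng.gauss(0, 0.5) for zi in z]

# Experiment: x is set independently of z; y still depends on z only.
x_set = [rng.gauss(0, 1) for _ in range(n)]
y_exp = [2 * zi + rng.gauss(0, 0.5) for zi in z]

slope_obs = ols_slope(x_obs, y_obs)
slope_exp = ols_slope(x_set, y_exp)
print(f"slope in observational data: {slope_obs:.2f}")
print(f"slope when x is set by experiment: {slope_exp:.2f}")
```

The observational slope is clearly positive even though changing `x` does nothing to `y`, which is exactly the situation a predictive model built on warehouse data cannot detect on its own.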

Restricted Factor Ranges

“If you want to know what happens if a factor changes, you must actually change it.”

George Box, Early Pioneer of DOE
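One way to see why restricted ranges matter is the textbook standard error of a least-squares slope, which shrinks as the factor settings spread out. The sketch below assumes simple linear regression with independent errors; the factor values, run counts, and noise level are hypothetical and chosen only for illustration.

```python
def slope_standard_error(x_values, sigma):
    """Standard error of the least-squares slope for the given factor settings.

    Textbook result for simple linear regression with independent errors of
    standard deviation sigma: SE = sigma / sqrt(sum((x - mean(x))**2)).
    """
    mean = sum(x_values) / len(x_values)
    sxx = sum((x - mean) ** 2 for x in x_values)
    return sigma / sxx ** 0.5

sigma = 1.0  # assumed noise level (illustrative)

# Hypothetical historical data: 100 runs, factor held in a tight control
# band between 49.5 and 50.5.
historical = [49.5 + i / 99 for i in range(100)]

# Hypothetical designed experiment: 8 runs, factor deliberately set to 40 or 60.
designed = [40.0] * 4 + [60.0] * 4

se_hist = slope_standard_error(historical, sigma)
se_doe = slope_standard_error(designed, sigma)
print(f"historical data: {len(historical)} runs, slope SE = {se_hist:.3f}")
print(f"designed runs:   {len(designed)} runs, slope SE = {se_doe:.3f}")
```

Under these assumptions, eight deliberately spread runs pin down the factor's effect more tightly than a hundred runs in which the factor barely moved.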

Prediction vs Causality?

• Are the relevant predictors measured?
• Are the predictors varied in a way that enables us to assess the impact they have on our responses independently of the effects of other predictors?
• What information can we realistically expect to achieve with data mining or machine learning approaches if the answer is no or uncertain?
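The second question can be checked directly on the predictor columns before any modelling. The sketch below is a rough illustration with made-up numbers: a hypothetical process log in which pressure was always raised together with temperature, compared with a small factorial plan that runs every low/high combination.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical historical log: the two factors always moved together.
temp_hist = [60, 62, 64, 66, 68, 70]
pres_hist = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]

# Hypothetical 2x2 factorial plan: every low/high combination is run once.
temp_doe = [60, 60, 70, 70]
pres_doe = [1.0, 1.5, 1.0, 1.5]

r_hist = pearson(temp_hist, pres_hist)
r_doe = pearson(temp_doe, pres_doe)
print(f"historical correlation between factors: {r_hist:.2f}")
print(f"designed correlation between factors:   {r_doe:.2f}")
```

When the correlation between two predictors is close to 1, their effects cannot be separated from that data, however many rows there are. The factorial plan breaks the link by construction.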

Big Data Summary

• Can deliver useful models that predict outcomes
• But your data may have limitations
– Lurking (potentially important) variables are not measured
– Variables are varied together
– Other variables may not be varied at all
• To understand what is truly driving a problem you may need DOE

Use DOE when the answer is not in your data

• In a recent interview Per Vase, Managing Partner at NNE, stated: “poor understanding leads to poor predictability which leads to missed milestones in R&D and results in yield, quality, or compliance issues in manufacturing.”
• He further stated: “DOE is the solution to breaking this cycle.”
• See his interview at: https://www.jmp.com/en_us/events/ondemand/analytically-speaking/operational-excellence-in-pharmaceutical-manufacturing.html

What is DOE?

A new data collection plan that maximises information by efficiently exploring all possibilities
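As a small illustration of such a plan, the sketch below builds a two-level full factorial design for three factors. The factor names A, B, and C are placeholders, and levels are coded as -1 (low) and +1 (high). It then checks two properties that make the plan informative: each factor is run equally often at both levels, and every pair of factor columns is orthogonal.

```python
from itertools import product

factors = ["A", "B", "C"]  # placeholder factor names
levels = [-1, 1]           # coded low / high settings

# Full factorial: every combination of levels, 2**3 = 8 runs.
design = list(product(levels, repeat=len(factors)))

def column(j):
    """Settings of factor j across all runs."""
    return [run[j] for run in design]

# Balance: each column sums to zero (as many lows as highs).
column_sums = {name: sum(column(j)) for j, name in enumerate(factors)}

# Orthogonality: the dot product of any two distinct columns is zero.
dot_products = {
    (factors[i], factors[j]): sum(a * b for a, b in zip(column(i), column(j)))
    for i in range(len(factors))
    for j in range(i + 1, len(factors))
}

for run in design:
    print(run)
print("column sums:", column_sums)
print("pairwise dot products:", dot_products)
```

Because the columns are orthogonal, the effect of each factor can be estimated independently of the others, which is the property historical data usually lacks.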

The Father of DOE – Sir Ronald A. Fisher

Rothamsted Experimental Station, England – early 1920s

DOE has a rich evolution

• Factorial
• Fractional Factorial
• RSM
• Mixture
• Optimal
• Taguchi
• Non-linear
• Choice
• Space Filling
• Accelerated Life Test
• Covering Arrays
• Definitive Screening

Cornell University, http://usda.mannlib.cornell.edu/MannUsda

DOE efficiently generates the new data you will need to determine cause and effect

• Highly efficient
• Increases reliability of data
• Removes blind spots in understanding
• Reduces uncertainty in decisions
• Enables more productive work to be done with the same resources
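The efficiency claim can be sketched with a half-fraction design. The example below is an illustration under stated assumptions, not a recommended plan: it studies four placeholder factors in 8 runs instead of the 16 a full factorial would need, by setting D = A·B·C. The response function is invented and noise-free so that the estimated main effects can be compared with known values. In this design each main effect is aliased with a three-factor interaction, so the estimates are only trustworthy if those interactions are negligible.

```python
from itertools import product

names = ["A", "B", "C", "D"]  # placeholder factor names

# Half fraction of a 2**4 design: full factorial in A, B, C, with D = A*B*C.
design = [(a, b, c, a * b * c) for a, b, c in product([-1, 1], repeat=3)]

def response(a, b, c, d):
    """Invented noise-free response: only A and C matter."""
    return 10 + 3 * a - 2 * c

y = [response(*run) for run in design]

# Main effect = mean response at the high level minus mean at the low level.
half = len(design) / 2
effects = {
    name: sum(run[j] * yi for run, yi in zip(design, y)) / half
    for j, name in enumerate(names)
}

print(f"runs used: {len(design)} (full factorial would need {2 ** len(names)})")
for name, effect in effects.items():
    print(f"main effect of {name}: {effect:+.1f}")
```

Each estimated effect is twice the corresponding coefficient in the invented response, because moving a factor from -1 to +1 is a change of two coded units.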

DOE CASE STUDIES

Case study: new process comes back to R&D to be fixed

BEFORE JMP

• A working manufacturing process was developed
• After launch, process time steadily got worse
• Manufacturing capacity was cut in half, and the cycle time doubled
• Came back to R&D to be fixed
• Poor understanding of what really affects process (milling) time

Determining the experiments needed for efficient process know-how

WITH JMP

• Defined an efficient data collection plan to learn more
• Presented results graphically
• Quickly identified actionable root causes of poor milling results
• Generated full, not partial, understanding for the future
• Solved this problem faster and more cheaply than without JMP

Fixing the problem for good with JMP

• Deteriorating performance over time
• Potential causes identified with data mining
• Verification runs confirmed gain
• Efficient experimentation with top potential causes gave new process know-how at minimum cost

JMP eliminated future waste by quickly identifying possible root causes and providing a way to efficiently identify and verify a lasting solution

Doubled production capacity with significantly lower R&D effort and time, compared to not using JMP

A Way Forward

• Be realistic
• Exploit what data you have
• Be aware of limitations of current data
• Be prepared to experiment or collect new data
• Use common sense

Integrated approach is best

• Acceptance Sampling
• SQC
• SPC
• APC
• Statistics
• Predictive Modelling
• Data Mining
• Machine Learning
• Artificial Intelligence
• Design of Experiments

Your toolset needs more than big data methods to efficiently augment scientific and engineering learning with data

JMP Statistical Discovery

Enables scientists and engineers to supplement their industry and functional expertise with informed, data-driven discoveries

Distils and brings your analytical insights to life through dynamic visualizations to help decision makers understand your findings

Predictive Modeling

References

• The Age of Analytics: Competing in a Data Driven World, McKinsey & Co
• Big Data & Industry 4.0 in the Chemistry using Industries, Stan Higgins OBE, Non-Executive Director at Industrial Technology Systems Ltd (ITS Ltd)
• DOE is the future optimal, Chris Nachtsheim, University of Minnesota Carlson School of Management and School of Statistics

QUESTIONS