201505 Statistical Thinking course extract

“Statistical thinking” course by Red Olive – extract for publication

Analytics and Data Management

For further details or to discuss a bespoke course for your organisation please contact Jefferson Lynch: [email protected]

Copyright © 2012 Red Olive Ltd, All Rights Reserved.

Page 2: 201505 Statistical Thinking course extract

Contents

1st day
Introduction
The research process: CRISP-DM
Analysis
    Reporting vs. modelling
    Is there an effect?
    Is there a single cause?
    Forecasting
    Could there be more than one cause?

2nd day
Working together
    The Data Academy
Sharing results
    What to show
    How to show it
Next steps
    A further project

Page 3: 201505 Statistical Thinking course extract

Introduction: Getting your data to speak

Page 4: 201505 Statistical Thinking course extract

Many business challenges

• Customer Acquisition & Retention
• Marketing Efficiency & Advertising Revenue
• Cost to Serve & Profitability
• Promotion & Pricing Optimisation
• Demand Forecasting
• Fraud Detection

Page 5: 201505 Statistical Thinking course extract

Before jumping into the data...

Do you know what you are looking for?
The business need for the analysis is formulated and confirmed. The question that the stakeholder needs answered is articulated. This steers the analysis away from giving the ‘right answer to the wrong question’.

Do you know what you will do with the answers you find?
The desired outputs from the analysis are shaped in detail so that the analysis produces outputs in a format that is fit for purpose. Actual outputs can then be easily integrated into the stakeholders’ target documents, systems or processes.

Do you have a way to evaluate success?
Can you measure the current situation in terms of money, time or units? Do you have a way of tracking the results of your work in the same units?

Page 6: 201505 Statistical Thinking course extract

Getting your data to speak

Translate business objective into analysis goal…

• Gather ideas from people in your business about the cause → effect relationships.
• Gather impressions about the different classes or types of events.
• Consider both positive and negative outcomes.
• Translate these ideas/impressions into data.

What would data have to look like to detect the effects and trends people believe in?

Page 7: 201505 Statistical Thinking course extract

The research process: CRISP-DM

Page 8: 201505 Statistical Thinking course extract

CRISP-DM process

1 Develop business understanding
2 Develop data understanding
3 Prepare data
4 Develop model
5 Evaluate results
6 Deploy live model

[Diagram: the six process stages linked by flows, drawing on a “Business data for analytics” data set. Key: data set, process stage, flow between stages]

Page 9: 201505 Statistical Thinking course extract

CRISP-DM

Business understanding
• Determine objectives
• Establish use cases
• Summarise current situation
• Determine project goals
• Map business goals to data problem
• Estimate current value so that ROI can be calculated
• Create project plan

Data understanding
• Collect initial data
• Document the real meaning of each data field
• Capture baseline SQL
• Explore the data
• Check data quality

Page 10: 201505 Statistical Thinking course extract

Analysis: Reporting vs. Modelling

Page 11: 201505 Statistical Thinking course extract

Reporting vs. Modelling

Reporting | Modelling
What was the rate of net growth? | Why did we have higher/lower rate?
Information based on user-directed queries (hypothesis testing) | Knowledge based on finding unknown relationships (hypothesis generation)
Historical Analysis | Predictive Analysis
Monitors performance measures | Determines performance measures
Reactive | Proactive

Page 12: 201505 Statistical Thinking course extract

Map analysis goal to modelling technique

Data Input | Target Output | Algorithm | Goal | Results | Find Most Important Inputs? | Easy to interpret/visualise?
Numeric and/or Symbolic | Symbolic | C5.0 | Predicting / Profiling | Decision Tree or Rule Set with prediction and confidence | Yes | Yes
Numeric and/or Symbolic | Numeric or Symbolic | C&RT | Predicting / Profiling | Decision Tree or Rule Set with prediction and confidence | Yes | Yes
Numeric | Numeric | Linear Regression | Predicting / Forecasting | Equation for prediction with coefficients | Yes | No
Numeric and/or Symbolic | Symbolic | Logistic Regression | Predicting / Probability | Equation for prediction of probability and associated coefficients | No [1] | No
Numeric and/or Symbolic | Numeric or Symbolic | Neural Network | Predicting / Probability | Prediction and relative importance of input neurons | No [2] | No
Numeric and/or Symbolic | None | Kohonen Map | Clustering / Segmentation | Cluster Membership and deviation | No | Yes
Numeric | None | K-Means | Clustering / Segmentation | Cluster Membership with cluster centers | No | Yes
Numeric and/or Symbolic | None | Two-Step | Clustering / Segmentation | Cluster Membership with cluster centers | No | Yes
Symbolic | Symbolic | Apriori [4, 5] | Association Detection | Association rules with confidence | Yes | Yes
Numeric and/or Symbolic | Time to event | Kaplan-Meier | Strategic Planning | Survivor / Hazard Curve | No | Yes
Numeric and/or Symbolic | Time to event | Cox Regression | Tactical Interventions | Survivor / Hazard Curve | No | Yes

Sometimes we put the data into the model and see what happens. Other times we manipulate the inputs (or the outputs) in some way to give the algorithm more information to work with. By combining multiple techniques, we can often gain better insight into the nature of potential solutions to a business problem and, hopefully, arrive at a more useful result. Since more than one approach may be used to address a single business problem, the same data may serve a wide range of applications. Which application it serves depends on which model you choose, how you manipulate the data in the file, and which input or target variables you choose.
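
As a rough illustration of how the table above can be used, the sketch below encodes its first three columns and lists the algorithms that fit a given kind of input and target. The `Technique` record, the lower-case labels ("any" stands for "numeric and/or symbolic" inputs or a "numeric or symbolic" target) and the `candidates` helper are assumptions made for this example, not part of the course material.

```python
from typing import NamedTuple


class Technique(NamedTuple):
    name: str
    inputs: str  # "numeric", "symbolic" or "any"
    target: str  # "numeric", "symbolic", "any", "none" or "time to event"
    goal: str


# Rows transcribed from the table; the string encoding is an assumption.
TECHNIQUES = [
    Technique("C5.0", "any", "symbolic", "Predicting / Profiling"),
    Technique("C&RT", "any", "any", "Predicting / Profiling"),
    Technique("Linear Regression", "numeric", "numeric", "Predicting / Forecasting"),
    Technique("Logistic Regression", "any", "symbolic", "Predicting / Probability"),
    Technique("Neural Network", "any", "any", "Predicting / Probability"),
    Technique("Kohonen Map", "any", "none", "Clustering / Segmentation"),
    Technique("K-Means", "numeric", "none", "Clustering / Segmentation"),
    Technique("Two-Step", "any", "none", "Clustering / Segmentation"),
    Technique("Apriori", "symbolic", "symbolic", "Association Detection"),
    Technique("Kaplan-Meier", "any", "time to event", "Strategic Planning"),
    Technique("Cox Regression", "any", "time to event", "Tactical Interventions"),
]


def _accepts(allowed, actual):
    """True if a table entry such as 'any' or 'numeric' covers the actual kind."""
    return allowed == actual or (allowed == "any" and actual in ("numeric", "symbolic"))


def candidates(input_kind, target_kind):
    """Names of the techniques whose input and target columns match."""
    return [
        t.name
        for t in TECHNIQUES
        if _accepts(t.inputs, input_kind) and _accepts(t.target, target_kind)
    ]


print(candidates("numeric", "numeric"))
print(candidates("numeric", "none"))
```

A lookup like this only narrows the field. The choice among the remaining candidates still depends on the goal and on how easy the result needs to be to interpret.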

Page 13: 201505 Statistical Thinking course extract

Analysis: Basic statistical terms

Page 14: 201505 Statistical Thinking course extract

Level of measurement

Level of measurement | Summary statistic | Visualisation
Categorical or Nominal | Mode | Bar chart, Pareto chart
Ranked or Ordinal | Median, percentile | Bar chart
Numeric or Scale | Mean or average | Histogram, line graph, bubble chart

[Example charts: values on a 0 to 50 scale, by quarter (1st Qtr to 4th Qtr)]

Page 15: 201505 Statistical Thinking course extract

Inserting functions

• Click in a cell
• Go to the Insert menu
• Choose Functions…
• Select a category
• Click on a function and look at the brief help (first letter search works)
• Click OK to paste
• Click Help on this function for more information and a worked example

Page 16: 201505 Statistical Thinking course extract

Numbers that describe a distribution

Statistic | Function | Definition
Mode | =MODE | The most common value
Percent | =COUNT | What proportion of the cases are in this group? COUNT in the group divided by total COUNT.
Percentile | =PERCENTILE, =PERCENTRANK | How far down the list of an ordered set are you?
Median | =MEDIAN | The middle value of an ordered set. The 50th percentile.
Mean | =AVERAGE | Add all the values and divide by the count
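
For readers who prefer to check these definitions outside a spreadsheet, here is a minimal sketch using Python's standard `statistics` module. The list of values is invented for illustration, and `percentile_rank` is a hypothetical helper using one simple definition (the share of values at or below a given value), which will not always match a spreadsheet's percentile-rank function.

```python
import statistics

# Hypothetical sample, made up for this example.
values = [2, 3, 3, 5, 7, 8, 8, 8, 10, 12]

mode = statistics.mode(values)      # the most common value
median = statistics.median(values)  # the middle value of the ordered set
mean = statistics.mean(values)      # sum of the values divided by the count

# Percent: count in the group divided by the total count.
share_of_eights = values.count(8) / len(values)


def percentile_rank(data, x):
    """Share of values in data that are less than or equal to x."""
    return sum(v <= x for v in data) / len(data)


print(mode, median, mean, share_of_eights, percentile_rank(values, 7))
```

Here the mean (6.6) sits below the median (7.5), a reminder that the two can differ even on a small sample.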

Page 17: 201505 Statistical Thinking course extract

Analysis: Is there an effect?

Page 18: 201505 Statistical Thinking course extract

Discussion of crosstabs

• A method to test if two variables have a non-random relationship
• Also called chi-square analysis, after the name of the statistic that is calculated: χ² (also written X²)

Page 19: 201505 Statistical Thinking course extract

Discussion of crosstabs

Is there a relationship between the section you are reading on the website and whether or not you are motivated to subscribe? Or are the numbers just due to the normal visit pattern on the site?

Data: Subscribe y/n on this visit to this section

Columns: YES, NO, TOTAL
Rows (Section): HOME, NEWS, SPORT, FINANCE, COMMENT, BLOGS, CULTURE, TRAVEL, LIFESTYLE, FASHION, TECH, OFFERS, TOTAL

Page 20: 201505 Statistical Thinking course extract

Crosstabs example

[Two tables: actual counts and calculated %]
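
The counts for this example are not included in the extract, so the sketch below uses invented numbers for three sections to show how the chi-square statistic is built from a crosstab. The function name `chi_square` and the counts are assumptions for illustration. The expected count for each cell is its row total times its column total, divided by the grand total.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if section and subscribing were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: (subscribed YES, subscribed NO) per section.
sections = ["HOME", "NEWS", "SPORT"]
counts = [
    (30, 970),
    (20, 480),
    (50, 450),
]

statistic, dof = chi_square(counts)
print(round(statistic, 2), dof)

# 5.991 is the 5% critical value of the chi-square distribution with 2 degrees of freedom.
print("Looks non-random" if statistic > 5.991 else "Could be the normal visit pattern")
```

With these made-up counts the statistic is far above the critical value, so the difference between sections would be unlikely to come from the normal visit pattern alone. A spreadsheet or statistics package would report the same statistic together with a p-value.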

Page 21: 201505 Statistical Thinking course extract

Analysis: Is there a single cause?

Page 22: 201505 Statistical Thinking course extract

Predictive modelling

When we seek to predict something, what we are really saying is that we have in mind what the cause is and we are trying to predict how likely the effect is.

Modelling techniques do not make predictions on their own. Analysts structure the data input so that the model can use it in a cause-and-effect way. Thus, it is important to make sure that all of the inputs into a model precede the output in time. You can’t put the effect before the cause.
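
One way to keep inputs ahead of the output in time is to pick a cutoff date, build every input from events before it, and take the target only from events on or after it. The sketch below assumes a simple event log; the field names, dates and the `build_row` helper are hypothetical.

```python
from datetime import date

# Hypothetical event log; field names and values are assumptions for illustration.
EVENTS = [
    {"customer": "A", "day": date(2012, 3, 1), "kind": "visit"},
    {"customer": "A", "day": date(2012, 3, 9), "kind": "visit"},
    {"customer": "A", "day": date(2012, 4, 2), "kind": "subscribe"},
    {"customer": "B", "day": date(2012, 3, 20), "kind": "visit"},
    {"customer": "B", "day": date(2012, 4, 5), "kind": "visit"},
]


def build_row(customer, events, cutoff):
    """One modelling row: inputs from before the cutoff, target from after it."""
    mine = [e for e in events if e["customer"] == customer]
    before = [e for e in mine if e["day"] < cutoff]
    after = [e for e in mine if e["day"] >= cutoff]
    return {
        "customer": customer,
        # Input (cause side): only events that precede the cutoff.
        "visits_before": sum(e["kind"] == "visit" for e in before),
        # Target (effect side): only events on or after the cutoff.
        "subscribed_after": any(e["kind"] == "subscribe" for e in after),
    }


CUTOFF = date(2012, 4, 1)
rows = [build_row(c, EVENTS, CUTOFF) for c in ("A", "B")]
for row in rows:
    print(row)
```

Customer B's visit on 5 April is deliberately left out of `visits_before`: counting it would let information from after the cutoff leak into the inputs.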

Page 23: 201505 Statistical Thinking course extract

Interpret the results, then make them better

• Try to get to models early in the process, even before you think you are ready.
• Models can tell you things about the data that you can’t see “just by looking”.
• Build lots of models. Throw away the ones that you are done with.
• Refine models based on what you learn at each iteration.
• Algorithms (within their limitations) are objective.

Page 24: 201505 Statistical Thinking course extract

Could there be more than one cause?

A topic for another course… Advanced analytics

• Structure the data into before and after
• Pick a target
• Test multiple input hypotheses at once

Forecasting: ARIMA allows for including multiple time series inputs
• Special events
• Weather
• Economic trends

Multivariate propensity
• Discover different predictive segments
• Works best with predicting Y/N actions rather than values

Page 25: 201505 Statistical Thinking course extract

Sharing results: What to show

Page 26: 201505 Statistical Thinking course extract

Inspire them

• Your job is to inspire
• Your job is not to convince or teach
• Lead with the important and interesting findings
• Explain in general terms
• Leave the details at the Data Academy

Page 27: 201505 Statistical Thinking course extract

Translating analysis results into business terms

[Diagram boxes: Business Objectives; Analysis Goals; Modelling & Evaluation (Accuracy & Significance); Analysis Results; Business Terms. Labels: Meaningful, Relevant, Actionable, Quantified]

Page 28: 201505 Statistical Thinking course extract

What to leave in the Data Academy

• How you did it
• How long it took
• Statistical methods
• Problems you had
• Caveats related to data
• Dirty data

[Diagram boxes: Analysis Goals; Modelling & Evaluation (Accuracy & Significance); Analysis Results]

Page 29: 201505 Statistical Thinking course extract

Sharing results: How to show it

Page 30: 201505 Statistical Thinking course extract

Design – Colour

Can everyone read it?
• Is someone colour blind?
• Does someone have corrective lenses?

Will it print in black and white?
• Test print
• Black text on dark colours, including red, will not be readable in print. Use white text instead.

[Example charts labelled “Wrong”, “Better”, “Better” and “Not as Nice”: East, West and North series by quarter (1st Qtr to 4th Qtr) on a 0 to 100 scale]
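
A test print is the real check, but a quick screen for the black-and-white question is to convert each chart colour to a grey level and see whether any two series end up too close to tell apart. The sketch below uses the common luma weights (0.299, 0.587, 0.114) as an approximation. The palettes, the `min_gap` threshold of 40 and the function names are assumptions for illustration, and a real printer may convert colours differently.

```python
def grey_level(rgb):
    """Approximate grey level (0 to 255) of an (R, G, B) colour."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def too_close_in_grey(palette, min_gap=40):
    """Pairs of series whose grey levels differ by less than min_gap."""
    levels = sorted((grey_level(rgb), name) for name, rgb in palette.items())
    return [
        (name_a, name_b)
        for (level_a, name_a), (level_b, name_b) in zip(levels, levels[1:])
        if level_b - level_a < min_gap
    ]


# Hypothetical palettes for the East / West / North series.
by_hue = {"East": (255, 0, 0), "West": (0, 128, 0), "North": (0, 0, 255)}
by_lightness = {"East": (40, 40, 40), "West": (128, 128, 128), "North": (220, 220, 220)}

print(too_close_in_grey(by_hue))
print(too_close_in_grey(by_lightness))
```

In the first palette the red and green series land on almost the same grey, so they would be hard to separate on a black-and-white print. The second palette varies lightness instead of hue and passes the check.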

Page 31: 201505 Statistical Thinking course extract

Design – Colour for the colourblind

• http://www.colorbrewer2.org/

Page 32: 201505 Statistical Thinking course extract

Contact information

Please direct enquiries to Jefferson Lynch: [email protected]
Office: 01256 831100
Mobile: 07860 353027