Posted on 18-Aug-2015
Copyright © 2012 Red Olive Ltd, All Rights Reserved.
“Statistical thinking” course by Red Olive – extract for publication
For further details or to discuss a bespoke course for your organisation, please contact Jefferson Lynch: Jefferson.lynch@red-olive.co.uk
Analytics and Data Management
Contents
1st day
  Introduction
  The research process: CRISP-DM
  Analysis
    Reporting vs. modelling
    Is there an effect?
    Is there a single cause?
    Forecasting
    Could there be more than one cause?
2nd day
  Working together
    The Data Academy
  Sharing results
    What to show
    How to show it
  Next steps
    A further project
Introduction: Getting your data to speak
Many business challenges
• Customer Acquisition & Retention
• Marketing Efficiency & Advertising Revenue
• Cost to Serve & Profitability
• Promotion & Pricing Optimisation
• Demand Forecasting
• Fraud Detection
Before jumping into the data...
• Do you know what you are looking for?
  Formulate and confirm the business need for the analysis. State the question the stakeholder needs answered. This stops the analysis from producing the right answer to the wrong question.
• Do you know what you will do with the answers you find?
  Specify the desired outputs in detail so that the analysis delivers them in a fit-for-purpose format. The actual outputs should fit easily into the stakeholders' target documents, systems or processes.
• Do you have a way to evaluate success?
  Can you measure the current situation in terms of money, time or units? Can you track the results of your work in the same units?
Getting your data to speak
Translate the business objective into an analysis goal…
• Gather ideas from people in your business about cause-and-effect relationships.
• Gather impressions about the different classes or types of events.
• Consider both positive and negative outcomes.
• Translate these ideas and impressions into data: what would the data have to look like to detect the effects and trends people believe in?
The research process: CRISP-DM
CRISP-DM process
Diagram: business data for analytics, with flows between six process stages:
1 Develop business understanding
2 Develop data understanding
3 Prepare data
4 Develop model
5 Evaluate results
6 Deploy live model
CRISP-DM
Business understanding
• Determine objectives
• Establish use cases
• Summarise current situation
• Determine project goals
• Map business goals to data problem
• Estimate current value so that ROI can be calculated
• Create project plan
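You estimate the current value so that ROI can be worked out later. The value after the project is compared with that baseline, in the same units. The Python sketch below assumes ROI means net gain divided by cost, and that the baseline, the post-project value and the project cost are all in the same currency. The function name `simple_roi` and the figures are hypothetical.

```python
def simple_roi(baseline_value: float, new_value: float, project_cost: float) -> float:
    """Return ROI as a fraction: (gain - cost) / cost.

    baseline_value: value of the current situation before the project
    new_value: the same measure taken after the project
    project_cost: total cost of the analysis project
    All three are assumed to be in the same units (e.g. pounds per year).
    """
    if project_cost <= 0:
        raise ValueError("project_cost must be positive")
    gain = new_value - baseline_value  # uplift attributed to the project
    return (gain - project_cost) / project_cost


# Hypothetical figures: £100k baseline, £130k afterwards, £20k project cost.
print(f"ROI: {simple_roi(100_000, 130_000, 20_000):.0%}")
```

Without a baseline measured before the work starts, the gain term cannot be computed.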
Data understanding
• Collect initial data
• Document the real meaning of each data field
• Capture baseline SQL
• Explore the data
• Check data quality
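The sketch below shows a first pass at "Check data quality". It assumes the extract has already been loaded as a list of dictionaries, one per row. The field names and validity rules are hypothetical; in practice they would come from the documented meaning of each data field.

```python
from collections import Counter

# Hypothetical extract: one dict per row.
rows = [
    {"customer_id": "C1", "age": 34, "region": "North"},
    {"customer_id": "C2", "age": None, "region": "South"},
    {"customer_id": "C3", "age": 212, "region": ""},
    {"customer_id": "C3", "age": 45, "region": "North"},
]

# Hypothetical validity rules, one per field.
rules = {
    "age": lambda v: 0 <= v <= 120,
    "region": lambda v: v in {"North", "South", "East", "West"},
}


def profile(rows, rules):
    """Count missing values and rule-breaking values per field."""
    missing, invalid = Counter(), Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                missing[field] += 1
            elif field in rules and not rules[field](value):
                invalid[field] += 1
    return missing, invalid


missing, invalid = profile(rows, rules)
ids = Counter(row["customer_id"] for row in rows)
duplicates = [key for key, n in ids.items() if n > 1]  # IDs that should be unique
print("missing:", dict(missing))
print("invalid:", dict(invalid))
print("duplicate ids:", duplicates)
```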
Analysis: Reporting vs. Modelling
Reporting vs. Modelling
Reporting | Modelling
What was the rate of net growth? | Why did we have a higher/lower rate?
Information based on user-directed queries (hypothesis testing) | Knowledge based on finding unknown relationships (hypothesis generation)
Historical analysis | Predictive analysis
Monitors performance measures | Determines performance measures
Reactive | Proactive
Map analysis goal to modelling technique
Data input | Target | Algorithm | Goal | Results | Finds most important inputs? | Easy to interpret/visualise?
Numeric and/or symbolic | Symbolic | C5.0 | Predicting / profiling | Decision tree or rule set with prediction and confidence | Yes | Yes
Numeric and/or symbolic | Numeric or symbolic | C&RT | Predicting / profiling | Decision tree or rule set with prediction and confidence | Yes | Yes
Numeric | Numeric | Linear regression | Predicting / forecasting | Equation for prediction with coefficients | Yes | No
Numeric and/or symbolic | Symbolic | Logistic regression | Predicting / probability | Equation for prediction of probability and associated coefficients | No¹ | No
Numeric and/or symbolic | Numeric or symbolic | Neural network | Predicting / probability | Prediction and relative importance of input neurons | No² | No
Numeric and/or symbolic | None | Kohonen map | Clustering / segmentation | Cluster membership and deviation | No | Yes
Numeric | None | K-Means | Clustering / segmentation | Cluster membership with cluster centres | No | Yes
Numeric and/or symbolic | None | Two-Step | Clustering / segmentation | Cluster membership with cluster centres | No | Yes
Symbolic | Symbolic | Apriori⁴,⁵ | Association detection | Association rules with confidence | Yes | Yes
Numeric and/or symbolic | Time to event | Kaplan-Meier | Strategic planning | Survival / hazard curve | No | Yes
Numeric and/or symbolic | Time to event | Cox regression | Tactical interventions | Survival / hazard curve | No | Yes

Sometimes we simply put the data into the model and see what happens. At other times we manipulate the inputs (or the outputs) to give the algorithm more information to work with. Combining several techniques often gives better insight into the nature of potential solutions to a business problem and, ideally, leads to a more useful result. More than one approach can address a single business problem, so the same data can serve a wide range of applications. Which one it serves depends on the model you choose, how you prepare the data, and which input and target variables you select.
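To make the linear regression row ("Equation for prediction with coefficients") concrete, the sketch below fits ordinary least squares for a single numeric input in plain Python. It illustrates the technique only, not the fitting procedure of any particular tool. The function name `fit_line` and the spend/sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return intercept, slope


# Hypothetical data: advertising spend (£k) against sales (£k).
spend = [1, 2, 3, 4, 5]
sales = [12, 14, 17, 18, 21]
intercept, slope = fit_line(spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * spend")
```

The two returned numbers are the coefficients. The slope is the expected change in the target for each unit of input, which is why this technique scores "Yes" for finding the most important inputs.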
Analysis: Basic statistical terms
Level of measurement
Level of measurement | Summary statistic | Visualization
Categorical or nominal | Mode | Bar chart, Pareto chart
Ranked or ordinal | Median, percentile | Bar chart
Numeric or scale | Mean or average | Histogram, line graph, bubble chart
(Example charts: values from 0 to 50 by quarter, 1st Qtr to 4th Qtr.)
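The level-of-measurement table can be read as a rule for choosing a summary statistic. The sketch below uses Python's standard `statistics` module. The function `summarise` and the sample values are hypothetical. Ordinal data is assumed to come with an explicit order of categories from lowest to highest.

```python
import statistics


def summarise(values, level, order=None):
    """Return the summary statistic suited to the level of measurement.

    level: "nominal", "ordinal" or "scale"
    order: for ordinal data, the categories from lowest to highest
    """
    if level == "nominal":
        return statistics.mode(values)  # most common category
    if level == "ordinal":
        if order is None:
            raise ValueError("ordinal data needs an explicit category order")
        ranks = [order.index(v) for v in values]
        # median_low returns an observed rank, so the answer is a real category
        return order[statistics.median_low(ranks)]
    if level == "scale":
        return statistics.mean(values)
    raise ValueError(f"unknown level: {level!r}")


print(summarise(["Sport", "News", "Sport", "Travel"], "nominal"))
print(summarise(["low", "high", "medium", "high", "high"], "ordinal",
                order=["low", "medium", "high"]))
print(summarise([12.5, 14.0, 9.5, 16.0], "scale"))
```

Averaging category codes would produce a number with no meaning, so the function only computes a mean for scale data.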
Inserting functions
1 Click in a cell.
2 Go to the Insert menu.
3 Choose Function…
4 Select a category.
5 Click on a function and read the brief help (typing a function's first letter jumps to it in the list).
6 Click OK to paste the function.
7 Click "Help on this function" for more information and a worked example.
Numbers that describe a distribution
Statistic | Function | Definition
Mode | =MODE | The most common value.
Percent | =COUNT | What proportion of the cases are in this group? COUNT in the group divided by total COUNT.
Percentile | =PERCENTILE, =PERCENTRANK | How far down an ordered list is a value? PERCENTILE returns the value at a given percentile; PERCENTRANK returns a value's rank as a percentage.
Median | =MEDIAN | The middle value of an ordered set; the 50th percentile.
Mean | =AVERAGE | Add all the values and divide by the count.
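Readers working outside Excel can use the rough Python equivalents below for percent, percentile rank, median and mean. The `percent` and `percent_rank` helpers are hypothetical. For a value that occurs in the data, `percent_rank` computes (number of values below it) / (count - 1). That should agree with Excel's PERCENTRANK for such values, before Excel shortens the result to three digits. It does not interpolate for values that are absent from the data.

```python
import statistics


def percent(group_count, total_count):
    """Share of cases in a group: COUNT in the group divided by total COUNT."""
    return group_count / total_count


def percent_rank(values, x):
    """Position of x in the ordered data, from 0 (lowest) to 1 (highest).

    Assumes x occurs in values; no interpolation for other values.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    if x not in values:
        raise ValueError("x must be one of the values")
    below = sum(1 for v in values if v < x)
    return below / (len(values) - 1)


data = [13, 12, 11, 8, 4, 3, 2, 1, 1, 1]
print(percent(3, len(data)))            # share for a group of 3 cases
print(round(percent_rank(data, 4), 3))  # how far up the ordered list 4 sits
print(statistics.median(data))          # the 50th percentile
print(statistics.mean(data))            # add the values, divide by the count
```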
Analysis: Is there an effect?
Discussion of crosstabs
A method to test whether two categorical variables have a non-random relationship. It is also called chi-square analysis, after the statistic that is calculated, written χ² or X².
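The chi-square statistic compares each observed cell count with the count expected if the two variables were unrelated: row total × column total / grand total. The plain-Python sketch below computes it. It assumes no row or column total is zero. The function name and the counts are hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a crosstab.

    observed: list of rows, each a list of counts (all rows the same length).
    Assumes no row or column total is zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the variables were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical counts: rows are two site sections, columns are subscribed YES / NO.
table = [[30, 170],
         [10, 190]]
stat, dof = chi_square(table)
print(f"X2 = {stat:.2f} with {dof} degree(s) of freedom")
```

With 1 degree of freedom, a statistic above about 3.84 is significant at the 5% level. `scipy.stats.chi2_contingency(table, correction=False)` should report the same statistic if you want a p-value; its default applies a continuity correction to 2×2 tables.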
Discussion of crosstabs
Is there a relationship between the section you are reading on the website and whether or not you are motivated to subscribe? Or are the numbers just due to the normal visit pattern on the site?
Data: a crosstab of subscriptions on a visit to each section.
Rows (section): HOME, NEWS, SPORT, FINANCE, COMMENT, BLOGS, CULTURE, TRAVEL, LIFESTYLE, FASHION, TECH, Offers, TOTAL
Columns (subscribed on this visit): YES, NO, TOTAL
Crosstabs example
(Tables: actual counts; calculated %.)
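The sketch below builds the actual counts and the calculated percentage from raw visit records. It assumes each visit has been reduced to a (section, subscribed?) pair. The `crosstab` helper and the visit log are hypothetical.

```python
from collections import Counter


def crosstab(visits):
    """Return {section: (yes, no, total, share_yes)} from (section, subscribed) pairs."""
    counts = Counter(visits)  # actual counts per (section, outcome) cell
    table = {}
    for section in sorted({s for s, _ in visits}):
        yes = counts[(section, True)]
        no = counts[(section, False)]
        total = yes + no
        table[section] = (yes, no, total, yes / total)  # calculated %
    return table


# Hypothetical visit log: (section read, subscribed on this visit?)
visits = [
    ("NEWS", False), ("NEWS", True), ("NEWS", False), ("SPORT", False),
    ("SPORT", False), ("FINANCE", True), ("FINANCE", True), ("FINANCE", False),
]

print(f"{'Section':<10}{'YES':>5}{'NO':>5}{'TOTAL':>7}{'% YES':>8}")
for section, (yes, no, total, share) in crosstab(visits).items():
    print(f"{section:<10}{yes:>5}{no:>5}{total:>7}{share:>8.0%}")
```

The YES/NO columns of this table are the observed counts that a chi-square test uses.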
Analysis: Is there a single cause?
Predictive modelling
When we seek to predict something, we are really saying that we have a cause in mind and are trying to predict how likely the effect is.
Modelling techniques do not make predictions on their own. Analysts structure the input data so that the model can use it in a cause-and-effect way. It is therefore important that all of the inputs to a model precede the output in time: you can't put the effect before the cause.
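One way to make sure every input precedes the output is to pick a cutoff date. Inputs are built only from events before the cutoff, and the target only from events in a window after it. The sketch below assumes a simple event log. The event types, dates, function name and 30-day window are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical event log: (customer, event type, date)
events = [
    ("C1", "visit", date(2015, 5, 3)),
    ("C1", "visit", date(2015, 5, 20)),
    ("C1", "subscribe", date(2015, 6, 10)),
    ("C2", "visit", date(2015, 5, 28)),
    ("C2", "subscribe", date(2015, 5, 30)),  # before the cutoff, so never the target
]


def build_rows(events, cutoff, window_days=30):
    """One row per customer: inputs from before the cutoff, target from after it."""
    window_end = cutoff + timedelta(days=window_days)
    rows = {}
    for customer, kind, when in events:
        row = rows.setdefault(customer, {"visits_before": 0, "subscribed_after": False})
        if when < cutoff:
            if kind == "visit":
                row["visits_before"] += 1  # input: history before the cutoff only
        elif when < window_end and kind == "subscribe":
            row["subscribed_after"] = True  # target: outcome after the cutoff only
    return rows


for customer, row in build_rows(events, cutoff=date(2015, 6, 1)).items():
    print(customer, row)
```

In practice a customer who had already subscribed before the cutoff, like C2, would often be excluded from the modelling population altogether.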
Try to get to models early in the process, even before you think you are ready.
Models can tell you things about the data that you can't see “just by looking”.
Build lots of models. Throw away the ones you are done with.
Refine models based on what you learn at each iteration.
Algorithms (within their limitations) are objective.
Interpret the results, then make them better.
Could there be more than one cause?
A topic for another course: advanced analytics
• Structure the data into before and after
• Pick a target
• Test multiple input hypotheses at once
Forecasting: ARIMA models with additional predictor series (sometimes called ARIMAX) can include multiple time-series inputs, such as:
• Special events
• Weather
• Economic trends
Multivariate propensity
• Discover different predictive segments
• Works best when predicting Y/N actions rather than values
Sharing results: What to show
Inspire them
• Your job is to inspire. Your job is not to convince or teach.
• Lead with the important and interesting findings.
• Explain in general terms.
• Leave the details at the Data Academy.
Translating analysis results into business terms
(Diagram linking: Business objectives; Analysis goals; Modelling & evaluation (accuracy & significance); Analysis results; Business terms. Labels: Meaningful, Relevant, Actionable, Quantified.)
What to leave in the Data Academy
• How you did it
• How long it took
• Statistical methods
• Problems you had
• Caveats related to data
• Dirty data
(Diagram: Analysis goals; Modelling & evaluation (accuracy & significance); Analysis results.)
Sharing results: How to show it
Design – Colour
• Can everyone read it? Is someone colour blind? Does someone wear corrective lenses?
• Will it print in black and white? Do a test print. Black text on dark colours, including red, will not print legibly; use white text instead.
(Example charts of East, West and North by quarter, 1st Qtr to 4th Qtr, labelled "Wrong", "Better" and "Not as Nice".)
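You can roughly anticipate a black-and-white print by converting the background colour to grey and choosing the text colour that contrasts with it. The sketch below assumes the printer's greyscale conversion resembles the common Rec. 601 luma weighting (0.299 R + 0.587 G + 0.114 B). Real printer drivers vary, and the 0.5 threshold and function names are hypothetical choices.

```python
def grey_level(rgb):
    """Approximate grey level (0 = black, 1 = white) of an 8-bit RGB colour."""
    r, g, b = (c / 255 for c in rgb)
    return 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 luma weights


def text_colour_for(background_rgb, threshold=0.5):
    """Pick black or white text so it stays readable on a greyscale print."""
    return "black" if grey_level(background_rgb) >= threshold else "white"


for name, rgb in [("red", (255, 0, 0)), ("navy", (0, 0, 128)),
                  ("yellow", (255, 255, 0)), ("light grey", (211, 211, 211))]:
    print(f"{name:<11} grey={grey_level(rgb):.2f} -> {text_colour_for(rgb)} text")
```

On this scale pure red comes out at about 0.3, a fairly dark grey. That is consistent with the advice to put white text on red.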
Design – Colour for the colourblind
• http://www.colorbrewer2.org/
Contact information
Please direct enquiries to Jefferson Lynch:
jefferson.lynch@red-olive.co.uk
Office: 01256 831100
Mobile: 07860 353027