Copyright © 2012 Red Olive Ltd, All Rights Reserved.
Analytics Academy – “Statistical thinking”
(Client: a household-name media company)
Information and Data Management
Contents
1st day
• Introduction
• The research process: CRISP-DM
• Analysis
    • Reporting vs. modelling
    • Is there an effect?
    • Is there a single cause?
    • Forecasting
    • Could there be more than one cause?

2nd day
• Working together
• The Data Academy
• Sharing results
    • What to show
    • How to show it
• Next steps
    • A further project
Introduction: Getting your data to speak
Many business challenges
• Customer Acquisition & Retention
• Marketing Efficiency & Advertising Revenue
• Cost to Serve & Profitability
• Promotion & Pricing Optimisation
• Demand Forecasting
• Fraud Detection
Before jumping into the data...

Do you know what you are looking for?
• The business need for the analysis is formulated and confirmed.
• The question that the stakeholder needs answered is articulated.
• Steer away from finding the ‘right answer to the wrong question’.

Do you know what you will do with the answers you find?
• The desired outputs from the analysis are shaped in detail so that the analysis produces outputs in a format that is fit for purpose.
• Actual outputs can be easily integrated into the stakeholders’ target documents, systems or processes.

Do you have a way to evaluate success?
• Can you measure the current situation in terms of money, time or units?
• Do you have a way of tracking the results of your work in the same units?
Getting your data to speak

• Gather ideas from people in your business about cause -> effect relationships.
• Gather impressions about the different classes or types of events.
• Consider both positive and negative outcomes.

Translate these ideas/impressions into data
• What would data have to look like to detect the effects and trends people believe in?
• Translate business objective into analysis goal…
The research process: CRISP-DM
CRISP-DM process
[Diagram: six process stages and the data set "Business data for analytics", with flows between stages]
1 Develop business understanding
2 Develop data understanding
3 Prepare data
4 Develop model
5 Evaluate results
6 Deploy live model
CRISP-DM

Business understanding
• Determine objectives
• Establish use cases
• Summarise current situation
• Determine project goals
• Map business goals to data problem
• Estimate current value so that ROI can be calculated
• Create project plan

Data understanding
• Collect initial data
• Document the real meaning of each data field
• Capture baseline SQL
• Explore the data
• Check data quality
Analysis: Reporting vs. Modelling
Reporting vs. Modelling

Reporting | Modelling
What was the rate of net growth? | Why did we have a higher/lower rate?
Information based on user-directed queries (hypothesis testing) | Knowledge based on finding unknown relationships (hypothesis generation)
Historical analysis | Predictive analysis
Monitors performance measures | Determines performance measures
Reactive | Proactive
Map analysis goal to modelling technique

Sometimes we put the data into the model and see what happens. Other times we manipulate the inputs (or the outputs) in some way so as to give the algorithm more information to work with.
By combining multiple techniques, we can often gain better insight into the nature of potential solutions to a business problem and, we hope, reach a more useful result.
More than one approach may be used to address a single business problem, and the same data may be used for a wide range of applications. The outcome depends on which model you choose, how you manipulate the data in the file, and which input or target variables you choose.
Analysis: Basic statistical terms
Level of measurement

Level of measurement | Summary statistic | Visualization
Categorical or Nominal | Mode | Bar chart, Pareto chart
Ranked or Ordinal | Median, percentile | Bar chart
Numeric or Scale | Mean or average | Histogram, line graph, bubble chart
[Figure: example charts by quarter (1st Qtr to 4th Qtr), value axis from 0 to 50]
Inserting functions
1. Click in a cell
2. Go to the Insert menu
3. Choose Functions…
4. Select a category
5. Click on a function and look at the brief help (first-letter search works)
6. Click OK to paste

Click "Help on this function" for more information and a worked example
Numbers that describe a distribution

Statistic | Function | Definition
Mode | =MODE | The most common value
Percent | =COUNT | What proportion of the cases are in this group? COUNT in the group divided by total COUNT.
Percentile | =PERCENTILE, =PERCENTRANK | How far down the list of an ordered set are you?
Median | =MEDIAN | The middle value of an ordered set. The 50th percentile.
Mean | =AVERAGE | Add all the values and divide by the count
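For readers who work outside a spreadsheet, the same summary numbers can be sketched in Python with the standard library. This is only an illustration: the `ages` list is made-up sample data, and `percent_rank` is a hypothetical helper that uses a simple "share of values below x" definition, which is not the exact formula a spreadsheet percent-rank function applies.

```python
import statistics

# Hypothetical sample: ages of ten site visitors (made up for illustration).
ages = [23, 25, 25, 31, 34, 35, 35, 35, 42, 58]

mode = statistics.mode(ages)      # most common value, like =MODE
median = statistics.median(ages)  # middle value of the ordered set, like =MEDIAN
mean = statistics.mean(ages)      # sum of values divided by the count, like =AVERAGE

# Percent: count in the group divided by the total count.
under_30 = sum(1 for a in ages if a < 30)
percent_under_30 = under_30 / len(ages)


def percent_rank(values, x):
    """Share of values strictly below x: a simple 'how far down the list' measure."""
    return sum(1 for v in values if v < x) / len(values)


print(mode, median, mean, percent_under_30, percent_rank(ages, 35))
```

Note that the median (34.5) and mean (34.3) are close here, while the mode (35) simply reports the most frequent value; on skewed data the three can differ widely.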
Analysis: Is there an effect?
Discussion of crosstabs
• A method to test if two variables have a non-random relationship
• Also called chi-square analysis, after the name of the statistic that is calculated: Χ² (also written X2)
Discussion of crosstabs

Is there a relationship between the section you are reading on the website and whether or not you are motivated to subscribe? Or are the numbers just due to the normal visit pattern on the site?

Data: Subscribe y/n on this visit to this section

Section | YES | NO | TOTAL
HOME | | |
NEWS | | |
SPORT | | |
FINANCE | | |
COMMENT | | |
BLOGS | | |
CULTURE | | |
TRAVEL | | |
LIFESTYLE | | |
FASHION | | |
TECH | | |
Offers | | |
TOTAL | | |
Crosstabs example
• Actual counts
• Calculated %
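As a rough sketch of what the chi-square calculation does with a crosstab, the Python below compares each actual count with the count expected if section and subscription were unrelated. The function name `chi_square` and the three rows of counts are hypothetical, invented purely for illustration; they are not figures from the course data.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    `table` is a list of rows; each row is a list of counts, one per column.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: [subscribed YES, subscribed NO] for three sections.
counts = [
    [30, 970],  # NEWS
    [10, 490],  # SPORT
    [20, 480],  # FINANCE
]
stat, dof = chi_square(counts)
print(round(stat, 2), dof)
```

With these made-up counts the statistic is about 3.44 on 2 degrees of freedom, below the 5% critical value of 5.99, so differences of this size could plausibly be just the normal visit pattern.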
Analysis: Is there a single cause?
Predictive modelling

When we seek to predict something, what we are really saying is that we have in mind what the cause is and we are trying to predict how likely the effect is.
Modelling techniques do not make predictions on their own. Analysts structure the data input so that the model can use it in a cause and effect way.
Thus, it is important to make sure that all of the inputs into a model precede the output in time. You can’t put the effect before the cause.
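The rule that every input must precede the output in time can be checked mechanically before modelling. The sketch below is one possible way to do it, assuming each input field has a known observation date; the helper `leaky_inputs`, the field names and the dates are all hypothetical.

```python
from datetime import date


def leaky_inputs(input_dates, outcome_date):
    """Return the names of inputs observed on or after the outcome date."""
    return [name for name, seen in input_dates.items() if seen >= outcome_date]


# Hypothetical example: predicting a subscription made on 1 June 2012.
outcome_date = date(2012, 6, 1)
input_dates = {
    "visits_in_may": date(2012, 5, 31),
    "sections_read_in_may": date(2012, 5, 31),
    "welcome_email_opened": date(2012, 6, 3),  # happens after subscribing
}

# Any field listed here is an effect, not a cause, and should be dropped.
print(leaky_inputs(input_dates, outcome_date))
```

A field such as the welcome email only exists because the subscription happened, so leaving it in would put the effect before the cause.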
• Try to get to models early in the process, even before you think you are ready.
• Models can tell you things about the data that you can’t see “just by looking”.
• Build lots of models. Throw away the ones that you are done with.
• Refine models based on what you learn at each iteration.
• Algorithms (within their limitations) are objective.
• Interpret the results, then make them better.
Could there be more than one cause?
A topic for another course…

Advanced analytics
• Structure the data into before and after
• Pick a target
• Test multiple input hypotheses at once

Forecasting
• ARIMA allows for including multiple time series inputs:
    • Special events
    • Weather
    • Economic trends

Multivariate propensity
• Discover different predictive segments
• Works best with predicting Y/N actions rather than values
Sharing results: What to show
Inspire them

Your job is to inspire.
Your job is not to convince or teach.

• Lead with the important and interesting findings
• Explain in general terms
• Leave the details at the Data Academy
Translating analysis results into business terms
[Diagram labels: Business Objectives; Analysis Goals; Modelling & Evaluation (Accuracy & Significance); Analysis Results; Business Terms (Meaningful, Relevant, Actionable, Quantified)]
What to leave in the Data Academy
• How you did it
• How long it took
• Statistical methods
• Problems you had
• Caveats related to data
• Dirty data
[Diagram labels: Analysis Goals; Modelling & Evaluation (Accuracy & Significance); Analysis Results]
Sharing results: How to show it
Design – Colour

Can everyone read it?
• Is someone colour blind?
• Does someone have corrective lenses?

Will it print in black and white?
• Test print
• Black text on dark colours, including red, will not print. Use white text instead.

[Figure: text colour samples labelled "Wrong" and "Better"; two bar charts of East, West and North by quarter (1st Qtr to 4th Qtr), one labelled "Better" and one labelled "Not as Nice"]
Design – Colour for the colourblind
• http://www.colorbrewer2.org/
Contact information
Please direct enquiries to Jefferson Lynch: [email protected]
Office: 01256 831100
Mobile: 07860 353027