Post on 09-Jul-2015
Data Analysis for Everyone
Martin Monkman
• Provincial Statistician & Director, BC Stats
• Has been paid to do data analysis in one form or another since the mid-1980s
• B.Sc. and M.A. in Geography (UVic)
• member of SABR
• bayesball.blogspot.ca
1. Start with a question
ALWAYS!
And don’t start with data!
• Five Ws
Some examples of questions
• What was the population of Victoria in 1996? And what will the population of Victoria be in 2029?
• What are the demographics of Victoria?
• What do Victoria residents think about infrastructure investment?
2. Get some data
Remember: after your research question has been asked!
Two sources:
• Third party data
• Collect your own
Sources of third party data
Open Data
• Social data: Statistics Canada
• The Census of Canada
• National Household Survey
• www.statcan.gc.ca
• DataBC
• www.data.gov.bc.ca
Collect your own data
Administrative sources
• Registration information
• Transactions
Original data collection
• Survey
Surveys
From the Twenty Questions:
• Who is your population?
• How are you going to reach them?
• What do you already know about them?
• Differences
• Distributions
• Magnitude
• Patterns
• Proportions
• Relationships
• Trends
3. Data Analysis
• MOOCs
• Google “Making Sense of Data”
• Coursera
• https://www.coursera.org/course/introstats
• https://www.coursera.org/course/dataanalysis
• https://www.class-central.com/mooc/388/coursera-computing-for-data-analysis
Data Analysis: How-to
“Graphics are instruments for reasoning about quantitative information.” (Edward R. Tufte)
Purposes
• Exploratory Data Analysis
• Narrative
4. Data Visualization
“The greatest value of a picture is when it forces us to notice what we never expected to see.” – John Tukey
Anscombe’s Quartet
Statistical measures of each of the four data sets
Mean of x = 9 (exact)
Variance of x = 11 (exact)
Mean of y = 7.50
Variance of y = 4.122 or 4.127
Correlation between x and y = 0.816
Regression equation:
y = 3.00 + 0.500x
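The summary measures above can be checked with a short script. This is a minimal sketch, not part of the original slides: it assumes the standard published values of Anscombe's quartet (Anscombe, 1973), and the helper names `mean`, `sample_variance`, and `summarize` are chosen here for illustration only.

```python
from math import sqrt

# Anscombe's quartet; sets I, II and III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    # Sample variance: divide by n - 1.
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def summarize(x, y):
    """Return the summary measures listed on the slide for one data set."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx                 # least-squares slope
    intercept = my - slope * mx       # least-squares intercept
    return {
        "mean_x": mx,
        "var_x": sample_variance(x),
        "mean_y": my,
        "var_y": sample_variance(y),
        "r": sxy / sqrt(sxx * syy),   # Pearson correlation
        "slope": slope,
        "intercept": intercept,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

for name, s in SUMMARIES.items():
    print(
        f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
        f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} r={s['r']:.3f} "
        f"y = {s['intercept']:.2f} + {s['slope']:.3f}x"
    )
```

The four printed rows are nearly identical, which is the point of the quartet: the differences between the data sets only become visible when they are plotted.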
Population pyramid
http://cran.r-project.org/
Capital Regional District, population by municipality, 2013
Data source: Statistics Canada & BC Stats
Capital Regional District, population by municipality and region, 2013
Data source: Statistics Canada & BC Stats
Capital Regional District population, 1996-2013
Data source: Statistics Canada & BC Stats
Year-over-year population change, Capital Regional District
Data source: Statistics Canada & BC Stats
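A year-over-year change chart like this one is built from a simple calculation on an annual series. The sketch below is illustrative only: the function name `year_over_year` is hypothetical, and the population figures are made up for the example; they are not Capital Regional District data.

```python
def year_over_year(population_by_year):
    """Return (year, absolute change, percent change) for each year after the first.

    Assumes the dictionary keys are consecutive years.
    """
    years = sorted(population_by_year)
    rows = []
    for prev, curr in zip(years, years[1:]):
        change = population_by_year[curr] - population_by_year[prev]
        pct = 100 * change / population_by_year[prev]
        rows.append((curr, change, pct))
    return rows


# Made-up numbers, for illustration only.
example = {2010: 1000, 2011: 1020, 2012: 1010}
ROWS = year_over_year(example)

for year, change, pct in ROWS:
    print(f"{year}: {change:+d} ({pct:+.2f}%)")
```

Plotting the change rather than the level makes small shifts in growth easier to see than they are on a chart of total population.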
Census tracts
Data source: Statistics Canada & BC Stats
Victoria CMA – median after-tax income (2005), by Census Tract
Data source: Statistics Canada & BC Stats
Data source: Statistics Canada
Source: Harvard Dialect Survey / Joshua Katz
Mapping
How can I improve my data visualizations?
• Work with data
• Experiment
• Get feedback from others
• Look for good examples
• Look for bad examples
Five Degrees of Obfuscation
[Chart with categories: Debris, Garbage, Rubbish, Trash, Waste]
Five Columns of Clarity
[Bar chart: Units (axis from 0 to 25) for Trash, Debris, Rubbish, Waste, Garbage]
Foreshortened circles
An illusion of distance and volume
No 3D. Ever.
martin.monkman@gmail.com
@monkmanmh
bayesball.blogspot.ca