Net2Vic: Effective Data Analysis for Everyone (October 23, 2014)

Post on 09-Jul-2015



Data are ubiquitous in our lives and workplaces, and more can easily be collected. Analyzed correctly, those data can provide insights that lead to better decisions. Martin Monkman will present the key ideas for effective and meaningful data analysis, including sources of existing data, things to consider when collecting new data, analyzing results, and presenting and reporting the data effectively.


Data Analysis for Everyone

Martin Monkman

• Provincial Statistician & Director, BC Stats

• been getting paid to do data analysis in one form or another since the mid-1980s

• B.Sc. and M.A. in Geography (UVic)

• member of SABR

• bayesball.blogspot.ca

1. Start with a question

ALWAYS!

And don’t start with data!

• Five Ws

Some examples of questions

• What was the population of Victoria in 1996? And what will the population of Victoria be in 2029?

• What are the demographics of Victoria?

• What do Victoria residents think about infrastructure investment?

2. Get some data

Remember: only after you have asked your research question!

Two sources:

• Third party data

• Collect your own

Sources of third party data

Open Data

• Social data: Statistics Canada

• The Census of Canada

• National Household Survey

• www.statcan.gc.ca

• DataBC

• www.data.gov.bc.ca

Collect your own data

Administrative sources

• Registration information

• Transactions

Original data collection

• Survey

Surveys

From the Twenty Questions:

• Who is your population?

• How are you going to reach them?

• What do you already know about them?

• Differences

• Distributions

• Magnitude

• Patterns

• Proportions

• Relationships

• Trends

3. Data Analysis

• MOOCs

• Google “Making Sense of Data”

• Coursera

• https://www.coursera.org/course/introstats

• https://www.coursera.org/course/dataanalysis

• https://www.class-central.com/mooc/388/coursera-computing-for-data-analysis

Data Analysis: How-to

“Graphics are instruments for reasoning about quantitative information.” (Edward R. Tufte)

Purposes

• Exploratory Data Analysis

• Narrative

4. Data Visualization

“The greatest value of a picture is when it forces us to notice what we never expected to see.” – John Tukey

Anscombe’s Quartet

STATISTICAL MEASURES OF EACH OF THE FOUR DATA SETS

Mean of x = 9 (exact)

Variance of x = 11 (exact)

Mean of y = 7.50

Variance of y = 4.122 or 4.127

Correlation between x and y = 0.816

Regression equation:

y = 3.00 + 0.500x
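The summary statistics above can be reproduced with a short script. This is a minimal sketch in Python, assuming the standard published values of Anscombe's 1973 data set (the slides show only the summaries, not the raw numbers). The names `QUARTET` and `summarize` are illustrative and not from the talk.

```python
from statistics import mean, variance

# Values from Anscombe's published quartet (1973); assumed here,
# since the slides list only the summary statistics.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the slide's summary statistics for one data set."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variance (n - 1 denominator)
    # Sample covariance, using the same n - 1 denominator.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = cov / vx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "var_x": vx,
        "mean_y": my,
        "var_y": vy,
        "corr": cov / (vx * vy) ** 0.5,
        "intercept": my - slope * mx,
        "slope": slope,
    }


for name, (x, y) in QUARTET.items():
    s = summarize(x, y)
    print(
        f"{name}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
        f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} "
        f"corr={s['corr']:.3f} "
        f"y = {s['intercept']:.2f} + {s['slope']:.3f}x"
    )
```

The four sets print nearly identical summaries, which is the point of the quartet: the differences only show up when the points are plotted.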

Population pyramid

http://cran.r-project.org/

Capital Regional District, population by municipality, 2013

Data source: Statistics Canada & BC Stats

Capital Regional District, population by municipality and region, 2013

Data source: Statistics Canada & BC Stats

Capital Regional District population, 1996-2013

Data source: Statistics Canada & BC Stats

Year-over-year population change, Capital Regional District

Data source: Statistics Canada & BC Stats
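Year-over-year change is each year's value minus the previous year's, often also expressed as a percentage of the previous year. A minimal sketch in Python follows; the function name `year_over_year_change` is illustrative, and the population figures are hypothetical placeholders, not the actual Capital Regional District numbers.

```python
def year_over_year_change(series):
    """Return (year, absolute change, percent change) for each pair of consecutive years.

    `series` maps year -> population. Years are assumed to be consecutive.
    """
    years = sorted(series)
    changes = []
    for prev, curr in zip(years, years[1:]):
        diff = series[curr] - series[prev]
        pct = 100.0 * diff / series[prev]
        changes.append((curr, diff, pct))
    return changes


# Hypothetical placeholder values, NOT real population estimates.
population = {2010: 1000, 2011: 1010, 2012: 1030}

for year, diff, pct in year_over_year_change(population):
    print(f"{year}: {diff:+d} ({pct:+.2f}%)")
```

Plotting the change rather than the level makes small shifts in growth easier to see than they are on a chart of total population.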

Census tracts

Data source: Statistics Canada & BC Stats

Victoria CMA – median after-tax income (2005), by Census Tract

Data source: Statistics Canada & BC Stats

Data source: Statistics Canada

Source: Harvard Dialect Survey / Joshua Katz

Mapping

How can I improve my data visualizations?

• Work with data

• Experiment

• Get feedback from others

• Look for good examples

• Look for bad examples

Five Degrees of Obfuscation

[Chart: Units (axis 0 to 25) for five categories: Trash, Debris, Rubbish, Waste, Garbage]

Five Columns of Clarity

Foreshortened circles

An illusion of distance and volume

No 3D. Ever.

martin.monkman@gmail.com

@monkmanmh

bayesball.blogspot.ca