Statistics 203 - Simon Fraser...


Topics for Today

What is statistics?

What is the role of statistics in Social Science research?

What is data?

Where does data come from?

Stat203 Page 1 of 29 Fall 2011 – Week1, Lecture 2

What is Statistics?

__________, ________, and ______________ of data relating to some population.

So, statistics is not simply numbers, it is the _______ of collecting and interpreting information.

It is generally broken down into

o ___________ Statistics

o ___________ Statistics


Descriptive Statistics

Exploration of data resulting in a _________ summary of information.

Often includes summary measures and _________ or tabular presentation

May indicate _______or other ________.

… gives you a general idea of what information may be in a set of data.

Does NOT result in conclusions and should not drive decisions.
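As a minimal sketch of a descriptive summary, the Python snippet below computes a few summary measures for a numeric variable and a small frequency table for a categorical one. The ages and answers are made-up values for illustration, not data from this course.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical data, invented for illustration only.
ages = [19, 21, 22, 22, 25, 31, 40]
answers = ["yes", "no", "yes", "yes", "no", "yes", "no"]

# Summary measures for a numeric variable.
summary = {
    "n": len(ages),
    "mean": mean(ages),
    "median": median(ages),
    "min": min(ages),
    "max": max(ages),
}

# Tabular presentation for a categorical variable.
table = Counter(answers)

print(summary)
for category, count in table.most_common():
    print(f"{category:>3}: {count}")
```

The output describes this particular set of values only; it says nothing yet about whether a pattern would hold in the wider population.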


Inferential Statistics

This is more complex analysis allowing informed _________ based on data.

Through Inferential Statistics we determine whether any apparent characteristics in the data are _______ or merely the result of randomness.

- How do TV networks decide when to ‘call’ an _______________?

- How many vehicle repairs does a car company perform before they decide to perform a ______?

- Does a result observed in a few subjects imply the same result in the entire __________?


Role of Statistics in Social Science

Science involves accumulating _________.

Through statistics, we decide what a study or observation ___________ to our knowledge.

In Social Science, we are focusing on a particular area of knowledge … and there is a typical way that statistics is involved.


Typical Scientific Strategy

1. Construct a __________: “Is the world different than we think it is?”

Most adults living in Quebec wish to separate from Canada.

2. Collect the ____: Run your experiment, observe your subjects, or review the source material.

A randomly chosen sample of 2000 adults living in Quebec is interviewed and asked if they would like to separate from Canada.

3. _________ your data: Descriptive Statistics, graphs, tables

??% of surveyed individuals said yes


4. Perform _________ (analysis): If most people didn’t want to separate, how likely is it that we could have gotten ??% by chance alone?

95 times out of 100, ??% is more/less than 50%.

5. Interpret the analysis and form ___________: What does #4 mean?

Based on this survey, we conclude that …

We will follow this procedure many times throughout this class.
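Step 4 of the procedure above can be sketched in Python. The notes leave the survey percentage as ??%, so the count used here (1060 "yes" answers out of 2000, or 53%) is purely invented for illustration. The sketch also assumes the boundary case in which exactly half of the population wants to separate, and asks how often a sample of 2000 would then show at least this many "yes" answers by chance alone. The function name is hypothetical.

```python
from math import comb

def upper_tail_probability(n: int, observed_yes: int) -> float:
    """P(X >= observed_yes) when X ~ Binomial(n, 0.5)."""
    favourable = sum(comb(n, k) for k in range(observed_yes, n + 1))
    return favourable / 2 ** n

# Hypothetical survey result, invented for illustration:
# 1060 of 2000 respondents answer "yes".
n = 2000
observed_yes = 1060
p_value = upper_tail_probability(n, observed_yes)
print(f"{observed_yes / n:.1%} said yes; chance probability = {p_value:.4f}")
```

A small probability suggests the observed percentage would be unusual if the population were evenly split; interpreting that result is step 5.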


What is Data?

Data is a __________ of information about a group of ___________. The information is organized in variables.

Individuals / Variables
People ______________
Households _____________________
Nations _______________
Eye ___________

Visualize a spreadsheet; each row is a different individual and each column contains a variable observed (recorded) for that individual.
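The spreadsheet picture can be sketched in Python as a list of records. The variable names and values below are invented for illustration; the helper `column` is a hypothetical name, not part of any library.

```python
# Each dict is one individual (a row); each key is a variable (a column).
# All values are invented for illustration.
people = [
    {"id": 1, "eye_colour": "Brown", "age": 23, "arrests": 0},
    {"id": 2, "eye_colour": "Green", "age": 35, "arrests": 2},
    {"id": 3, "eye_colour": "Brown", "age": 41, "arrests": 1},
]

def column(rows, variable):
    """Return one variable's values, in row order."""
    return [row[variable] for row in rows]

print(people[1])              # one row: everything recorded for individual 2
print(column(people, "age"))  # one column: the age of every individual
```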


Types of Data:

To facilitate our understanding of methods of analyzing data, we’ll categorize data into one of the following two categories:

________ Variables: these variables take on only __________ sets of values and are further broken down into the following types:
o ________
o ________
o ________

Nominal and Ordinal variables are also sometimes called ___________ variables

__________ Variables: these are typically ____________ with the number of decimal places restricted only by our tools of measurement.

Continuous variables are also sometimes called ____________ variables


Why do we care about types of data?

Knowing each variable’s type will allow us to quickly and easily select the most appropriate statistical analyses and graphical presentations to use.


_______ Variables

For each individual, a _______ variable represents an arbitrarily named category to which the individual belongs, where the ordering of the categories is __________ .

Examples:
o _______ = Male / Female
o _________ = Brown / Red / Green / …

No category is “greater” or “less” than any other category, and there is no natural ________.

_______ variables are effectively displayed in pie charts, bar graphs, histograms, or mosaic plots.


[source: http://www.engenderhealth.org/mdg5/images/pie-chart.gif]


[source: http://www.analyticsworld.net/2010/04/22/data-visualization-example-1-mosaic-plot/ ]


_______ Variables

For each individual, an _______ variable represents the _______ category to which the individual belongs. One category is clearly ‘greater’ or ‘less’ than another, but the ______ ‘greater’ or ‘less’ is not clear

Examples:
o ________________:
   1 = All the time
   2 = Most of the time
   3 = Some of the time
   4 = Seldom
   5 = Never

o Multiple Sclerosis Status (EDSS)


Expanded Disability Status Scale (EDSS)
Some categories removed!

0.0 - Normal neurological exam (all grade 0 in all FS scores).

1.0 - No disability, minimal signs in one FS (i.e., grade 1).

1.5 - No disability, minimal signs in more than one FS* (more than 1 FS grade 1).

2.0 - Minimal disability in one FS (one FS grade 2, others 0 or 1).

2.5 - Minimal disability in two FS (two FS grade 2, others 0 or 1).

3.0 - Moderate disability in one FS (one FS grade 3, others 0 or 1) or mild disability in three or four FS (three or four FS grade 2, others 0 or 1) though fully ambulatory.

4.0 - Fully ambulatory without aid, self-sufficient, up and about some 12 hours a day despite relatively severe disability consisting of one FS grade 4 (others 0 or 1), or combination of lesser grades exceeding limits of previous steps; able to walk without aid or rest some 500 meters.

5.0 - Ambulatory without aid or rest for about 200 meters; disability severe enough to impair full daily activities (e.g., to work a full day without special provisions); (Usual FS equivalents are one grade 5 alone, others 0 or 1; or combinations of lesser grades usually exceeding specifications for step 4.0).  

6.0 - Intermittent or unilateral constant assistance (cane, crutch, brace) required to walk about 100 meters with or without resting; (Usual FS equivalents are combinations with more than two FS grade 3+). 

7.0 - Unable to walk beyond approximately 5 meters even with aid, essentially restricted to wheelchair; wheels self in standard wheelchair and transfers alone; up and about in wheelchair some 12 hours a day; (Usual FS equivalents are combinations with more than one FS grade 4+; very rarely pyramidal grade 5 alone). 

8.0 - Essentially restricted to bed or chair or perambulated in wheelchair, but may be out of bed itself much of the day; retains many self-care functions; generally has effective use of arms; (Usual FS equivalents are combinations, generally grade 4+ in several systems). 

9.0 - Helpless bed patient; can communicate and eat; (Usual FS equivalents are combinations, mostly grade 4+). 

10.0 - Death due to MS. 


_________ variables are most commonly presented in bar charts or histograms with categories presented in their natural __________ or ____________ order.

(http://www.msrc.co.uk/images/gallery/Slide4.JPG)


________ Variables

For each individual, an interval-scaled variable represents an _______ category for which the _________ of the category is also important. Categories are clearly ‘greater’ or ‘less’ than one another, and the amount is meaningful.

For our purposes, the most common occurrence of this type of variable is _________.

Examples:
o Number of _______________
o Number of _________
o Number of ____________
o _______


Difference between _______ and ________

_________ variable:

One person experiences 10 arrests compared to another person who experiences 5 arrests

… one is _____ as many as the other.

_______ variable:

One patient has an MS status of EDSS = 10 compared to another with EDSS = 5

… (referring back to the EDSS scale shown earlier) is ‘Death’ twice as bad as ‘ambulatory for 200 meters’?
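The contrast can be sketched in a few lines of Python. The dictionary keys are hypothetical labels; the numbers are the ones used in the two comparisons above. For the counts, the ratio has a direct meaning. For the EDSS codes, only the ordering is used.

```python
# Count variable: differences and ratios both have a meaning.
arrests = {"person_a": 10, "person_b": 5}
ratio = arrests["person_a"] / arrests["person_b"]
print(f"Person A has {ratio:g} times as many arrests as person B.")

# Ordinal variable: the codes only rank the categories.
edss = {"patient_a": 10.0, "patient_b": 5.0}
more_severe = max(edss, key=edss.get)
print(f"{more_severe} has the more severe status; the ratio of codes is not interpreted.")
```

Dividing the two EDSS codes would run without error, which is exactly why the variable's type has to be known before choosing an analysis.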


_____ Variables

The variable can take ___________ many _______ , and is restricted only by our ability to measure them. Think of these as ‘measured’ variables or something that may have been rounded off.

Examples:

_____ Was the couple married for exactly 9 ______?

_______ was the person exactly _____?

In Social Science, and the textbook, ________ variables are treated the same as _____ variables … this is not always possible, however, as we’ll see later on.


_____ variables are most commonly plotted using boxplots and the relationships between them by scatterplots. Histograms are also common, but continuous data is converted to ordinal data to construct these.
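The conversion behind a histogram can be sketched in Python: each continuous measurement is placed into an ordered class, and the classes are then counted. The ages and the class width of 10 are assumptions made for illustration, and `to_class` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical ages in years, measured to one decimal place.
ages = [18.4, 19.9, 22.1, 24.7, 25.0, 31.3, 38.6, 44.2]

def to_class(value, width=10):
    """Map a measurement to the left-closed interval [lower, lower + width)."""
    lower = int(value // width) * width
    return (lower, lower + width)

counts = Counter(to_class(age) for age in ages)

# Print the classes in their natural order with a text bar for each count.
for lower, upper in sorted(counts):
    print(f"[{lower}, {upper}): {'#' * counts[(lower, upper)]}")
```

Choosing a different class width changes the shape of the histogram, so the width is a presentation decision rather than a property of the data.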


But where did the data come from?

There are many ways to collect data. What are some you may have participated in?

- _________ or computer surveys.
- Experiments / Clinical Trials
- Food store point systems.
- Air miles cards, etc.
- “Tell Microsoft about this problem.”

-

But what is good data?


Garbage In = Garbage Out

Good data must:
o be ______________ of the population you are interested in
o have limited (or at least well-understood) ____
o be as complete as possible

(bias: http://youtu.be/mKKj35xhkNE )

If your data has some of these problems, the best analysis in the world can’t save you.


Collecting Data: Surveys

Samples are drawn from a population and the ______________ (variables) of interest are measured.

e.g. Political polls, consumer polls, etc.
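One common way to draw a sample is simple random sampling, where every individual has the same chance of being chosen. The Python sketch below assumes a hypothetical sampling frame of 10,000 numbered adults and draws 20 of them without replacement; the seed value is arbitrary and only makes the sketch reproducible.

```python
import random

rng = random.Random(203)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: ID numbers for 10,000 adults.
population_ids = range(1, 10_001)

# Simple random sample without replacement.
sample_ids = rng.sample(population_ids, k=20)
print(sorted(sample_ids))
```

The variables of interest would then be measured only for the individuals whose IDs were drawn.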


Collecting Data: Experiments

Experimental units (or individuals) are ________ assigned to two or more ‘treatment groups’. Each group receives a different treatment. The response of each subject in each group is recorded and the group results are compared.

Example:

For auto theft, first time offenders are randomly given one of two types of punishment (jail/probation). The response to these treatments for each subject may be whether they re-offend or not, within five years.
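The assignment step in this example can be sketched in Python. The offender IDs are invented, and `assign_treatments` is a hypothetical helper: it shuffles the subjects and deals them out to the treatments in turn, which is one simple way (among several) to form groups of equal size.

```python
import random

def assign_treatments(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical offender IDs, invented for illustration.
offenders = [f"offender_{i:02d}" for i in range(1, 13)]
groups = assign_treatments(offenders, ["jail", "probation"], seed=1)
for treatment, members in groups.items():
    print(treatment, members)
```

Because chance alone decides who lands in which group, differences in re-offending between the groups are easier to attribute to the treatment.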


Collecting Data: Content Analysis

Books, papers, TV shows, etc. (i.e., source material) are analyzed ______________ for their content.

Example:

A political party may feel that they are not getting their fair share of broadcast news time on a particular network. A communications analyst may examine all news broadcasts each day on four channels for a month and record the number of minutes spent each day by each channel discussing the platform of each political party.
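The analyst's tally in this example can be sketched in Python. The channel names, party names, and minutes below are invented placeholders; each record stands for one coded observation of the form (day, channel, party, minutes).

```python
from collections import defaultdict

# Hypothetical coding sheet: (day, channel, party, minutes). Invented values.
records = [
    (1, "Channel A", "Party X", 4.0),
    (1, "Channel A", "Party Y", 6.5),
    (1, "Channel B", "Party X", 3.0),
    (2, "Channel A", "Party X", 2.5),
    (2, "Channel B", "Party Y", 5.0),
]

# Total minutes for each (channel, party) pair across all days.
totals = defaultdict(float)
for _day, channel, party, minutes in records:
    totals[(channel, party)] += minutes

for (channel, party), minutes in sorted(totals.items()):
    print(f"{channel} | {party}: {minutes:.1f} min")
```

The resulting table of totals is the data set; the comparison between parties is made on it afterwards.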


Collecting Data: Participant Observation

The researcher joins the group being studied and records his/her ____________.

Example:

A researcher ‘joins’ the National Rifle Association, the Hell’s Angels, or the mafia to study the characteristics of the group.


Collecting Data: Secondary Analysis

Data obtained by _____ researchers is studied.

There are many companies whose whole business is devoted to obtaining data that could be sold to third parties.

Example:

Data on the general health of Canadians is purchased from Health Canada and analyzed.


Today’s Topics

What is Statistics?
- Process of collection, analysis and interpretation

What is Data?
- collection of information on individuals (rows)
- organized in variables (columns)

Types of Data
- Discrete (Nominal, Ordinal, Interval)
- Continuous (Interval and Ratio)

Collecting Data
- Surveys
- Experiments
- Content Analysis
- Participant Observation
- Secondary Analysis


Reading for next lecture

Chapter 2
