Stat 1510 Statistical Thinking & Concepts. Producing Data: Sampling

Stat 1510: Statistical Thinking & Concepts

Producing Data: Sampling

Data

Primary data are data collected by the investigator conducting the research or study, for a specific purpose.

Secondary data are data collected by someone other than the user, for the same or a different purpose.

Population

Researchers often want to answer questions about some large group of individuals; this group is called the population.

A population is a set of units. It may be infinite or even hypothetical.

If the time at which the units are measured is important, the population is often called a process.

In any analysis, it is therefore important to be clear about how the population is defined.

Population & Sample

We consider three sets of units.

The target population is the set of units that the investigators set out to study, as stated in the definition of the problem.

The study population is the set of units that could have been in the sample.

The sample is the set of units actually selected for the investigation. The total number of units in the sample is called the sample size, and the way the sample is selected is called the sampling protocol or sampling design.

Example

The Faculty of Science of Memorial University wants to know the opinion of students on the university facilities. For this purpose, it conducted a survey of a random sample of 200 students registered for Winter 2013.

Identify the target population, study population, sample, and sampling unit.

Example

Target population: Faculty of Science (FoS) students at MUN

Study population: all students registered for Winter 2013 in the FoS of MUN

Sample: the 200 students selected for this survey

Sampling unit: each selected student

Bad Sampling Designs

Voluntary response sampling
– allowing individuals to choose to be in the sample

Convenience sampling
– selecting individuals that are easiest to reach

Both of these techniques are biased
– they systematically favor certain outcomes

Voluntary Response

To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.
– 4.5% responded
– Hite used those responses to write her book

Moore (Statistics: Concepts and Controversies, 1997) noted:
– respondents “were fed up with men and eager to fight them…”
– “the anger became the theme of the book…”
– “but angry women are more likely” to respond
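A response rate of 4.5% on 100,000 questionnaires means about 4,500 replies. The following is a minimal simulation sketch of why self-selected replies can misrepresent the group that was mailed. The function name and all four rates (the share of "angry" recipients and the two response probabilities) are made-up illustration values, not figures from Hite's survey.

```python
import random

def simulate_voluntary_response(n_mailed=100_000, share_angry=0.20,
                                p_respond_angry=0.15, p_respond_other=0.02,
                                seed=1):
    """Mail n_mailed questionnaires; each recipient decides whether to reply.

    All rates are hypothetical values chosen only for illustration.
    """
    rng = random.Random(seed)
    angry_total = responders = angry_responders = 0
    for _ in range(n_mailed):
        angry = rng.random() < share_angry
        # Assumption: angry recipients are more willing to reply.
        p_reply = p_respond_angry if angry else p_respond_other
        angry_total += angry
        if rng.random() < p_reply:
            responders += 1
            angry_responders += angry
    return {
        "response_rate": responders / n_mailed,
        "angry_share_mailed": angry_total / n_mailed,
        "angry_share_responders": angry_responders / responders,
    }

result = simulate_voluntary_response()
print(result)
```

Under these assumed rates the overall response rate is a few percent, yet the angry group makes up a much larger share of the respondents than of the people who were mailed. That gap is the bias.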

Convenience Sampling

Sampling mice from a large cage to study how a drug affects physical activity
– lab assistant reaches into the cage to select the mice one at a time until 10 are chosen

Which mice will likely be chosen?
– could this sample yield biased results?

Purposive Sampling

Consider the selection of a football or soccer team.

Consider the selection of students for a math skills competition.

In these sampling schemes, the sampling units are selected with a well-defined purpose; the sample is not picked at random.

Simple Random Sampling

Each individual in the population has the same chance of being chosen for the sample

Each group of individuals (in the population) of the required size (n) has the same chance of being the sample actually selected

Random selection:
– “drawing names out of a hat”
– table of random digits
– computer software
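As an illustration of the "computer software" option, here is a minimal sketch that uses Python's standard `random` module. The helper name `simple_random_sample` and the roster of 30 student labels are hypothetical. `random.sample` draws without replacement, so every group of n distinct units is equally likely to be the sample.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of n distinct units from a list (hypothetical helper)."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # draws without replacement

# Hypothetical roster of 30 students.
students = [f"student_{i:02d}" for i in range(1, 31)]
sample = simple_random_sample(students, 5, seed=42)
print(sample)
```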

Table of Random Digits

Table B on pg. 692 of text

– each entry is equally likely to be any of the 10 digits 0 through 9

– entries are independent of each other (knowledge of one entry gives no information about any other entries)

– each pair of entries is equally likely to be any of the 100 pairs 00, 01,…, 99

– each triple of entries is equally likely to be any of the 1000 values 000, 001, …, 999
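These properties can be checked empirically on digits produced by software. The sketch below generates pseudo-random digits with Python's `random` module as a stand-in for Table B (it is not Table B itself) and tabulates how often each digit and each non-overlapping pair occurs. The seed and the number of digits are arbitrary choices.

```python
import random
from collections import Counter

rng = random.Random(692)                       # arbitrary seed
digits = [rng.randrange(10) for _ in range(100_000)]

# Share of each single digit: each should be close to 1/10.
digit_share = {d: c / len(digits) for d, c in sorted(Counter(digits).items())}

# Share of each non-overlapping pair: each should be close to 1/100.
pairs = list(zip(digits[0::2], digits[1::2]))
pair_share = {p: c / len(pairs) for p, c in Counter(pairs).items()}

print(digit_share)
print(min(pair_share.values()), max(pair_share.values()))
```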

Choosing a Simple Random Sample (SRS)

STEP 1: Label each individual in the population

STEP 2: Use Table B to select labels at random
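The two steps can be sketched in code. The function name `srs_from_digits`, the 30 unit names, and the digit string are all illustrative assumptions; the digit string is an arbitrary run of digits typed for this example, standing in for a row of the table. The sketch also assumes the usual convention that labels have equal width, and that groups which match no label or repeat an earlier choice are skipped.

```python
def srs_from_digits(units, digits, n):
    """Select n units by reading fixed-width labels from a string of digits."""
    width = len(str(len(units)))   # digits per label, e.g. 2 for 30 units
    # STEP 1: label each unit 01, 02, ... using the same number of digits.
    labels = {f"{i:0{width}d}": unit for i, unit in enumerate(units, start=1)}
    # STEP 2: read the digits in groups of `width`; skip groups that are
    # not labels and skip labels that were already chosen.
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        group = digits[start:start + width]
        if group in labels and group not in chosen:
            chosen.append(group)
            if len(chosen) == n:
                return [labels[g] for g in chosen]
    raise ValueError("ran out of digits before n units were selected")

units = [f"unit_{i:02d}" for i in range(1, 31)]   # hypothetical population of 30
row = "19223950340575628713"                      # arbitrary digits for the example
picked = srs_from_digits(units, row, 4)
print(picked)  # ['unit_19', 'unit_22', 'unit_05', 'unit_13']
```

Reading the row in pairs gives 19, 22, 39, 50, 34, 05, 75, 62, 87, 13. Only 19, 22, 05, and 13 are valid labels for a population of 30, so those four units form the sample.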

Simple Random Sample with and without Replacement

Case 1: In sampling without replacement, a selected sampling unit is not returned to the population, so it cannot be selected again.

Case 2: In sampling with replacement, each sampled unit is returned to the population, so it may be selected more than once.
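A minimal sketch of the two cases with Python's standard library, assuming a small population of ten numbered units: `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random

rng = random.Random(7)                  # arbitrary seed
population = list(range(1, 11))         # ten numbered units (illustrative)

# Case 1: without replacement, so no unit can appear twice.
without_replacement = rng.sample(population, 5)

# Case 2: with replacement, so a unit may appear more than once.
with_replacement = rng.choices(population, k=5)

# With replacement the sample may even exceed the population size;
# without replacement that request is an error.
oversized = rng.choices(population, k=15)
try:
    rng.sample(population, 15)
    sample_error = None
except ValueError as err:
    sample_error = str(err)

print(without_replacement, with_replacement, sample_error)
```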

Probability Sample

A probability sample is a sample chosen by chance.

We must know what samples are possible and what chance, or probability, each possible sample has of being selected.

An SRS gives each member of the population an equal chance to be selected.
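For a very small population, the possible samples and their probabilities can be listed explicitly. The sketch below assumes a hypothetical population of five units and an SRS of size 2 drawn without replacement. It enumerates every possible sample, assigns each the same probability, and then computes each unit's chance of being included.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E"]   # hypothetical units
n = 2

# Every possible sample of size n; under an SRS each is equally likely.
possible_samples = list(combinations(population, n))
p_each_sample = Fraction(1, len(possible_samples))

# Chance that a given unit ends up in the sample: add the probabilities
# of all samples that contain it.
inclusion = {
    unit: sum(p_each_sample for s in possible_samples if unit in s)
    for unit in population
}

print(len(possible_samples), comb(len(population), n))  # 10 10
print(inclusion["A"])                                   # 2/5
```

Each of the 10 possible samples has probability 1/10, and every unit has the same inclusion probability n/N = 2/5.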

Stratified Random Sample

first divide the population into groups of similar individuals, called strata

second, choose a separate SRS in each stratum

third, combine these SRSs to form the full sample

Stratified Random Sample: Example

Suppose a university has the following student demographics:

Undergraduate    Graduate    First Professional    Special
55%              20%         5%                    20%

A stratified random sample of 100 students could be chosen as follows: select an SRS of 55 undergraduates, an SRS of 20 graduates, an SRS of 5 first professional students, and an SRS of 20 special students; combine these 100 students.
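The procedure can be sketched as follows. The stratum sample sizes (55, 20, 5, 20) come from the example; the helper name `stratified_sample`, the enrolment of 2,000 students, and the student labels are assumptions made only so the code has something to draw from.

```python
import random

def stratified_sample(strata, allocation, seed=None):
    """Draw a separate SRS in each stratum and combine them (hypothetical helper)."""
    rng = random.Random(seed)
    sample = []
    for name, units in strata.items():
        sample.extend(rng.sample(units, allocation[name]))  # SRS within stratum
    return sample

# Assumed enrolment of 2,000 students matching the 55/20/5/20 percentages.
sizes = {"Undergraduate": 1100, "Graduate": 400,
         "First Professional": 100, "Special": 400}
strata = {name: [f"{name}-{i}" for i in range(size)]
          for name, size in sizes.items()}

allocation = {"Undergraduate": 55, "Graduate": 20,
              "First Professional": 5, "Special": 20}
sample = stratified_sample(strata, allocation, seed=3)
print(len(sample))  # 100
```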

Stratified Random Sample: Example

We would like to take a sample that represents the Canadian population. Canada has several provinces, and we want every province to be represented in the sample.

A stratified random sample of 1000 people could be chosen as follows: select a random sample from each province. Because the provinces differ greatly in population, the sample size for each province should be proportional to its population.
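Proportional allocation rarely produces whole numbers, so some rounding rule is needed. The sketch below is one reasonable choice rather than a prescribed method: it rounds down and then hands the leftover units to the strata with the largest fractional parts. The four stratum names and their population counts are made up for illustration and are not actual provincial populations.

```python
def proportional_allocation(stratum_sizes, total_n):
    """Split total_n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {k: total_n * v / total for k, v in stratum_sizes.items()}
    alloc = {k: int(x) for k, x in exact.items()}          # round down first
    shortfall = total_n - sum(alloc.values())
    # Give the remaining units to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:shortfall]:
        alloc[k] += 1
    return alloc

# Hypothetical stratum populations (not real census figures).
sizes = {"Province A": 5_200_000, "Province B": 1_300_000,
         "Province C": 950_000, "Province D": 150_000}
alloc = proportional_allocation(sizes, 1000)
print(alloc)
# {'Province A': 684, 'Province B': 171, 'Province C': 125, 'Province D': 20}
```

Each stratum's allocation would then be drawn as a separate random sample within that stratum.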

Multistage Sample

several stages of sampling are carried out

useful for large-scale sample surveys

samples at each stage may be SRSs, but are often stratified

stages may involve other random sampling techniques as well (cluster, systematic, random digit dialing, …)
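A minimal two-stage sketch, assuming an SRS at both stages: first a sample of clusters, then a sample of units within each chosen cluster. The function name, the 20 districts, and the 50 households per district are hypothetical; real surveys often stratify or use other designs at each stage.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of units inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)                    # stage 1
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}   # stage 2

# Hypothetical frame: 20 districts with 50 households each.
clusters = {
    f"district_{d}": [f"d{d}_household_{h}" for h in range(50)]
    for d in range(20)
}
sample = two_stage_sample(clusters, n_clusters=4, n_per_cluster=5, seed=11)
for district, households in sample.items():
    print(district, households)
```

Only the selected districts need a full household list, which is one reason multistage designs are practical for large-scale surveys.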

Cautions about Sample Surveys

Undercoverage
– some individuals or groups in the population are left out of the process of choosing the sample

Nonresponse
– individuals chosen for the sample cannot be contacted or refuse to cooperate/respond

Response bias
– behavior of respondent or interviewer may lead to inaccurate answers or measurements

Wording of questions
– confusing or leading (biased) questions; words with different meanings

Nonresponse

To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.
– 4.5% responded
– Hite used those responses to write her book
– angry women are more likely to respond

Response Bias

A door-to-door survey is being conducted to determine drug use (past or present) of members of the community. Respondents may give socially acceptable answers (maybe not the truth!)

For this survey on drug use, would it matter if a police officer is conducting the interview? (bias from interviewer)

Response Bias: Asking the Uninformed

Washington Post National Weekly Edition (April 10-16, 1995, p. 36)

A 1978 poll done in Cincinnati asked people whether they “favored or opposed repealing the 1975 Public Affairs Act.”
– There was no such act!
– About one third of those asked expressed an opinion about it.

Wording of Questions

A newsletter distributed by a politician to his constituents gave the results of a “nationwide survey on Americans’ attitudes about a variety of educational issues.” One of the questions asked was, “Should your legislature adopt a policy to assist children in failing schools to opt out of that school and attend an alternative school--public, private, or parochial--of the parents’ choosing?” From the wording of this question, can you speculate on what answer was desired? Explain.

Wording: Deliberate Bias

“If you found a wallet with $20 in it, would you return the money?”

“If you found a wallet with $20 in it, would you do the right thing of returning the money?”

Wording: Unintentional Bias

“I have taught several students over the past few years.”
– How many students do you think I have taught?
– How many years am I referring to?

“Over the past few days, how many servings of fruit have you eaten?”
– How many days are you considering?
– What constitutes a serving?

Wording: Unnecessary Complexity

“Do you sometimes find that you have arguments with your family members and co-workers?”
– Arguments with family members
– Arguments with co-workers

Wording: Ordering of Questions

“How often do you normally go out on a date? about ___ times a month.”

“How happy are you with life in general?”
– Strong association between these questions.
– If the ordering is reversed, then there would be no strong association between these questions.

Inferences about the Population

Values calculated from samples are used to make conclusions (inferences) about unknown values in the population.

Variability
– different samples from the same population may yield different results for a particular value of interest

– estimates from random samples will be closer to the true values in the population if the samples are larger

– how close the estimates will likely be to the true values can be calculated -- this is called the margin of error
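Both points can be illustrated with a small simulation. The sketch assumes a hypothetical population in which 60% of units have some characteristic, and it draws each sample as independent yes/no draws, which approximates an SRS from a very large population. The `margin_of_error` helper uses the common large-sample 95% approximation 1.96 * sqrt(p(1 - p) / n) for a sample proportion; the slides do not state a formula, so this choice is an assumption of the sketch.

```python
import random
from math import sqrt
from statistics import pstdev

def sample_proportions(p_true, n, repeats, rng):
    """Proportion observed in each of `repeats` independent samples of size n."""
    return [sum(rng.random() < p_true for _ in range(n)) / n
            for _ in range(repeats)]

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (assumed formula)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

rng = random.Random(2013)      # arbitrary seed
p_true = 0.6                   # hypothetical population proportion

small = sample_proportions(p_true, n=100, repeats=500, rng=rng)
large = sample_proportions(p_true, n=1600, repeats=500, rng=rng)

spread_small = pstdev(small)   # how much estimates vary when n = 100
spread_large = pstdev(large)   # how much estimates vary when n = 1600

print(round(spread_small, 4), round(spread_large, 4))
print(round(margin_of_error(0.6, 100), 3), round(margin_of_error(0.6, 1600), 3))
```

The estimates from the larger samples cluster more tightly around 0.6. Under the assumed formula, a sample 16 times larger has a margin of error one quarter as wide (about 0.096 versus about 0.024).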