Sampling distributions and the Central Limit Theorem

Johan A. Elkink

University College Dublin

9 October 2014

Johan A. Elkink (UCD) Central Limit Theorem 9 October 2014 1 / 28

Outline

1 Sampling

2 Statistical inference

3 Central Limit Theorem

Sampling

Sampling

Statistical inference (or inductive statistics) concerns drawing conclusions about a population of cases on the basis of a sample, a subset of that population.

Sampling refers to the selection of an appropriate subset of the population.

Sampling frame

The sampling frame refers to the identifiable list of members of the population, from which the sample can be selected.

Simple random sampling

Each subject in the population has exactly the same chance of being selected into the sample, i.e. the sampling probability is the same for every subject.
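A minimal Python sketch of simple random sampling, assuming the sampling frame is available as a list in memory; the frame contents and the helper `simple_random_sample` are hypothetical. The standard-library `random.Random.sample` draws without replacement, so every member of the frame has the same inclusion probability, n / N.

```python
import random

# Hypothetical sampling frame: an identifiable list of 10,000 population members.
sampling_frame = [f"person_{i}" for i in range(1, 10_001)]

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members without replacement; each has inclusion probability n / len(frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

sample = simple_random_sample(sampling_frame, 100, seed=42)
print(len(sample), sample[:3])
```

Fixing the seed makes the draw reproducible, which is useful for documenting how a sample was selected.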

Sampling bias

When the sampling probability correlates with a variable of interest, we are likely to get biased results.

Other causes of bias:

Misreporting by respondents

Characteristics of interviewer

Question-ordering effects
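To illustrate the first point, here is a small simulation sketch with a hypothetical population: one sample is drawn by simple random sampling, the other with an inclusion probability that is four times higher for units with a value above 50 (roughly the upper half of the population). Only the second sample mean is systematically off.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population: 100,000 values of a variable of interest.
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Simple random sample: every unit has the same inclusion probability.
srs = rng.sample(population, 1_000)

# Biased sample: units with a value above 50 are four times as likely to be included,
# so the sampling probability correlates with the variable of interest.
biased = [y for y in population if rng.random() < (0.02 if y > 50 else 0.005)]

print("population mean:", round(statistics.fmean(population), 2))
print("SRS mean:       ", round(statistics.fmean(srs), 2))
print("biased mean:    ", round(statistics.fmean(biased), 2))
```

Drawing a larger biased sample would not help: the bias stays, only the noise around the wrong value shrinks.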

Exercise

What is wrong with the following scenarios?

Students in a class are asked to raise their hands if they have cheated on an exam one or more times within the past year.

To get information on opinions among students, 100 students are surveyed at the start of a 9 am class.

To get information on public opinion, you stand at the entrance of the Apple Store in a shopping street and interview passers-by randomly.

Weighting

Other types of sampling procedures exist, such as stratified or cluster sampling, for which subsequent weighting of the data can recover the unbiasedness needed for statistical inference.

Generally, the weight is the inverse of the probability of inclusion in the sample.
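A sketch of inverse-probability weighting under a stratified design, using a hypothetical population with a large urban and a small rural stratum (names, sizes and means are made up for illustration). The sample takes 500 units per stratum, so rural units are over-represented; weighting each unit by 1 / (inclusion probability) and computing the weighted mean (sum of weight times value, divided by the sum of weights) recovers the population mean, while the unweighted mean does not.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical population with two strata of very different sizes and means.
strata = {
    "urban": [rng.gauss(40, 8) for _ in range(90_000)],
    "rural": [rng.gauss(70, 8) for _ in range(10_000)],
}
population_mean = statistics.fmean(y for values in strata.values() for y in values)

# Stratified sample: 500 units per stratum, so rural units are over-represented.
n_per_stratum = 500
values, weights = [], []
for stratum in strata.values():
    inclusion_prob = n_per_stratum / len(stratum)
    for y in rng.sample(stratum, n_per_stratum):
        values.append(y)
        weights.append(1 / inclusion_prob)  # weight = inverse of the inclusion probability

unweighted_mean = statistics.fmean(values)
weighted_mean = sum(w * y for w, y in zip(weights, values)) / sum(weights)
print(round(population_mean, 2), round(unweighted_mean, 2), round(weighted_mean, 2))
```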

Statistical inference

Parameters

A parameter is a number that describes a feature of the population. A parameter is generally fixed and not observable.

A statistic is a number that describes a feature of a sample; it is fixed for a given sample, but varies across samples.

We can use statistics to estimate parameters. (Moore, McCabe & Craig 2012: 198)
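A small illustration with a hypothetical population: the population mean is a parameter and does not change, whereas the mean of each random sample is a statistic that differs from one sample to the next.

```python
import random
import statistics

rng = random.Random(3)
# Hypothetical finite population of 50,000 scores between 0 and 100.
population = [rng.randint(0, 100) for _ in range(50_000)]

# Parameter: describes the population; fixed, and in practice unobserved.
mu = statistics.fmean(population)

# Statistics: one sample mean per random sample of size 50; they vary across samples.
sample_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(5)]

print("parameter:", round(mu, 2))
print("statistics:", [round(m, 2) for m in sample_means])
```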

From probability to statistics

Using probability theory, we can understand how samples behave on average, given some assumptions.

By comparing the sample at hand to samples on average, we can draw probabilistic conclusions about the population parameters.

Sampling distribution

“The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.” (Moore, McCabe & Craig 2012: 201)
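Enumerating all possible samples is rarely feasible, but the sampling distribution can be approximated by simulation: draw many random samples of the same size from the same (here hypothetical, right-skewed) population and record the statistic for each. This Monte Carlo approach is an approximation, not the exact distribution defined above.

```python
import random
import statistics

rng = random.Random(11)
# Hypothetical right-skewed population (e.g. incomes in thousands).
population = [rng.expovariate(1 / 30) for _ in range(100_000)]

def approx_sampling_distribution(pop, n, reps, rng):
    """Monte Carlo approximation: the statistic (here the mean) for many random samples of size n."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(reps)]

means = approx_sampling_distribution(population, n=25, reps=5_000, rng=rng)
print("parameter:", round(statistics.fmean(population), 2))
print("centre of sampling distribution:", round(statistics.fmean(means), 2))
print("spread (standard deviation):", round(statistics.stdev(means), 2))
print("quartiles:", [round(q, 2) for q in statistics.quantiles(means, n=4)])
```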

Sampling error

Sampling error is the error that arises when a population parameter is estimated or predicted by a sample estimate.

The bigger the sample, the smaller the sampling error tends to be.
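A simulation sketch of the second point, assuming a hypothetical normal population: the average absolute difference between the sample mean and the population mean shrinks as the sample size grows (by roughly a factor of $\sqrt{10}$ for every tenfold increase in $n$).

```python
import random
import statistics

rng = random.Random(5)
population = [rng.gauss(100, 15) for _ in range(100_000)]  # hypothetical population
mu = statistics.fmean(population)

def mean_abs_sampling_error(n, reps=2_000):
    """Average |sample mean - population mean| over many simple random samples of size n."""
    return statistics.fmean(
        abs(statistics.fmean(rng.sample(population, n)) - mu) for _ in range(reps)
    )

errors = {n: mean_abs_sampling_error(n) for n in (10, 100, 1_000)}
for n, err in errors.items():
    print(n, round(err, 3))
```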

Estimates and uncertainty

When we estimate a parameter, we are uncertain about its true value.

Besides an estimate of the parameter, we therefore also need an estimate of how certain we are of this estimate.

The typical indicator of this uncertainty is the standard error.

Central Limit Theorem

i.i.d.

We make three assumptions about our data in order to proceed:

The observations are independent

The observations are identically distributed

The population has a finite mean and a finite variance

A variable for which the first two assumptions hold is called i.i.d. (independent and identically distributed).

Independent observations

Intuitively: the value for one case does not affect the value for another case on the same variable.

More formally: $P(x_1 \cap x_2) = P(x_1)P(x_2)$.

Examples of dependent observations:

grades of students in different classes;

stock values over time;

economic growth in neighbouring countries.
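One informal way to spot dependence in practice is the correlation between consecutive observations. The sketch below contrasts independent draws with a hypothetical random-walk series, a simple stand-in for stock values over time. A near-zero correlation is consistent with independence but does not prove it; the product rule above is the actual definition.

```python
import random

def lag1_correlation(xs):
    """Pearson correlation between consecutive observations x[t] and x[t+1]."""
    a, b = xs[:-1], xs[1:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(2)
# Independent draws: one value carries no information about the next.
iid = [rng.gauss(0, 1) for _ in range(10_000)]
# Hypothetical "stock value" series (random walk): today's value = yesterday's value + a shock.
walk = [0.0]
for _ in range(9_999):
    walk.append(walk[-1] + rng.gauss(0, 1))

print(round(lag1_correlation(iid), 3), round(lag1_correlation(walk), 3))
```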

Identically distributed

All the observations are drawn from the same random variable, with the same probability distribution.

A typical example where this does not hold is panel data. For example, larger firms will have larger variations in profits, so their variance differs and these are not observations from the same probability distribution.

Random sample

A proper random sample is i.i.d.

The law of large numbers and the Central Limit Theorem help us to predict the behaviour of our sample data.

Law of large numbers

The law of large numbers (LLN) states that, if these three assumptions are satisfied, the sample mean converges to the population mean with probability one as the sample size grows infinitely large.
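A sketch of the LLN with a hypothetical fair die (population mean 3.5): the running sample mean is recorded at a few sample sizes and drifts towards 3.5, although not necessarily monotonically.

```python
import random

rng = random.Random(8)
checkpoints = {10, 100, 1_000, 10_000, 100_000}
running_total = 0
running_means = {}
for i in range(1, 100_001):
    running_total += rng.randint(1, 6)  # one i.i.d. roll of a fair die
    if i in checkpoints:
        running_means[i] = running_total / i

for n, m in sorted(running_means.items()):
    print(n, round(m, 4), "distance from 3.5:", round(abs(m - 3.5), 4))
```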

Central Limit Theorem

If these three assumptions are satisfied:

The sample mean is approximately normally distributed for large $n$, regardless of the distribution of the original variable.

The sample mean has the same expected value as the population mean (and, by the LLN, converges to it).

The standard deviation (standard error) of the sample mean is $\mathrm{S.E.}(\bar{x}) = \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$.
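A simulation sketch of the three statements, assuming a strongly skewed population (an exponential distribution with mean 1 and $\sigma_x = 1$, skewness 2). With $n = 50$, the sample means centre on 1, their standard deviation is close to $\sigma_x / \sqrt{n}$, and their skewness is much closer to 0 (the normal value) than that of the original variable.

```python
import random
import statistics

rng = random.Random(4)

def skewness(xs):
    """Sample skewness (third standardized moment); 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

sigma = 1.0       # standard deviation of the exponential(1) population
n, reps = 50, 10_000
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:", round(statistics.stdev(means), 4), "vs sigma/sqrt(n):", round(sigma / n ** 0.5, 4))
print("skewness of sample means:", round(skewness(means), 2))  # theory: 2 / sqrt(n), about 0.28
```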

Sample and population size

Note that the standard error depends on the sample size (and on $\sigma_x$), not on the population size, as long as the population is much larger than the sample.
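A simulation sketch, assuming two hypothetical normal populations of very different size: with the same sample size $n = 100$, the empirical standard error of the mean is essentially the same in both, and both match $\sigma_x / \sqrt{n}$. For samples that are a large fraction of the population, a finite population correction would be needed, which this sketch does not cover.

```python
import random
import statistics

rng = random.Random(9)

def empirical_se(population, n, reps=2_000):
    """Standard deviation of the sample mean over repeated simple random samples of size n."""
    return statistics.stdev(statistics.fmean(rng.sample(population, n)) for _ in range(reps))

n = 100
results = {}
for N in (5_000, 200_000):
    population = [rng.gauss(0, 10) for _ in range(N)]  # hypothetical population, sigma about 10
    formula = statistics.pstdev(population) / n ** 0.5  # sigma_x / sqrt(n): N does not appear
    results[N] = (empirical_se(population, n), formula)
    print(N, [round(v, 3) for v in results[N]])
```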

Central Limit Theorem: unknown σ

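When the population variance, $\sigma^2$, is unknown, we can use the sample estimate:

$$\hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}_x}{\sqrt{n}} = \sqrt{\frac{\hat{\sigma}^2_x}{n}}, \qquad \hat{\sigma}^2_x = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$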
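The estimator above written out in Python, as a sketch with hypothetical data; the standard-library `statistics.stdev` also divides by $n - 1$, so it gives the same result.

```python
import math
import statistics

def standard_error_of_mean(xs):
    """Estimated standard error of the sample mean, using the (n - 1) sample variance."""
    n = len(xs)
    xbar = sum(xs) / n
    var_hat = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sigma-hat squared
    return math.sqrt(var_hat / n)

data = [4, 7, 6, 5, 8, 6, 9, 3]  # hypothetical observations
se = standard_error_of_mean(data)
assert math.isclose(se, statistics.stdev(data) / math.sqrt(len(data)))
print(round(se, 4))  # 0.7071
```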

Variance of a proportion

Note that, for a binary (0/1) variable with sample proportion $p$, the population variance can be estimated as:

$$\hat{\sigma}^2_x = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n} = p(1 - p)$$

There is no division by $n - 1$ here, because only one parameter determines both the mean and the variance.
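A quick numerical check of this identity with hypothetical 0/1 responses: the mean squared deviation (divisor $n$) equals $p(1 - p)$.

```python
import math

def variance_divisor_n(xs):
    """Mean squared deviation from the sample mean (divisor n, as in the formula above)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

# Hypothetical 0/1 responses: 30 "yes" (1) and 70 "no" (0).
responses = [1] * 30 + [0] * 70
p = sum(responses) / len(responses)
print(p, variance_divisor_n(responses), p * (1 - p))
assert math.isclose(variance_divisor_n(responses), p * (1 - p))
```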

Central Limit Theorem: example

Suppose we have a random sample of 100 individuals and ask each what their first preference vote would be if there were elections today. If 30 of them say they would vote Fianna Fail, what is the standard error of the estimate that the proportion is $\hat{p} = 0.3$?

$$\sigma_{\hat{p}} = \frac{\sqrt{\hat{p}(1 - \hat{p})}}{\sqrt{n}} = \frac{\sqrt{0.21}}{\sqrt{100}} = 0.0458$$
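The same calculation in Python, as a sketch; `se_proportion` is a hypothetical helper name.

```python
import math

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat)) / sqrt(n)."""
    return math.sqrt(p_hat * (1 - p_hat)) / math.sqrt(n)

# Example above: 30 of 100 respondents say they would vote Fianna Fail.
se = se_proportion(30 / 100, 100)
print(round(se, 4))  # 0.0458
```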

Exercises

Calculate the standard errors:

A sample of 20 students has an average grade of 60, with an estimated population variance of 10.

Out of a sample of 100 road accidents, 10 were fatal.

Of the 1300 respondents in a survey, 48% voted “Yes” on the Lisbon Treaty referendum.

The average score on a 5-point political knowledge scale in the same survey is 2.34, with an estimated population standard deviation of 0.3.

Regression

Open demdev.dta and look at the standard errors for:

The mean of laggdppc and polity2.

The regression coefficients for regressing polity2 on laggdppc.

The regression coefficients for regressing polity2 on log(laggdppc).
