STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are...

22
STAT 311: Sampling Distributions Y. Samuel Wang Summer 2016 Y. Samuel Wang STAT 311: Sampling Distributions

Transcript of STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are...

Page 1: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

STAT 311: Sampling Distributions

Y. Samuel Wang

Summer 2016

Y. Samuel Wang STAT 311: Sampling Distributions

Page 2: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Logistics

Midterm passed back at end of class

Homework posted later today, due next Friday

Y. Samuel Wang STAT 311: Sampling Distributions

Page 3: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Long run behavior

Each individual outcome of a random variable is unpredictable

Probability models define the long run behavior

How can we apply these ideas to real world data which wegather

Theorem

Law of Large Numbers: Let n ≥ 1, and let X1,X2 . . .Xn be asequence of mutually independent random variables with finitemean µ and finite variance σ2. Define

x̄n =X1 + X2 + . . .Xn

n

then for any ε, δ > 0 there exists some n such that

P(|X̄n − µ| < ε) = 1− δ

Y. Samuel Wang STAT 311: Sampling Distributions

Page 4: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Long run behavior

Each individual outcome of a random variable is unpredictable

Probability models define the long run behavior

How can we apply these ideas to real world data which wegather

Theorem

Law of Large Numbers: Let n ≥ 1, and let X1,X2 . . .Xn be asequence of mutually independent random variables with finitemean µ and finite variance σ2. Define

x̄n =X1 + X2 + . . .Xn

n

then for any ε, δ > 0 there exists some n such that

P(|X̄n − µ| < ε) = 1− δ

Y. Samuel Wang STAT 311: Sampling Distributions

Page 5: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Implications of the LLN

Given a large enough sample, we can achieve any level ofprecision (determined by ε and δ)

Exactly how large n has to be depends on σ2, which inpractice we don’t always know

Typically, taking a larger sample will give us a more preciseestimate of the true parameters

Sample statistic will “converge” to the truth

LLN cannot make up for bad study design

Y. Samuel Wang STAT 311: Sampling Distributions

Page 6: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem

Let n ≥ 1, and let X1,X2 . . .Xn be a sequence of mutuallyindependent random variables such that E (Xi ) = µi andVar(Xi ) = σ2i . Define Sn = X1 + X2 . . .Xn

Note that because we are summing random variables,E (Sn) = µ1 + µ2 . . . µn

Since the random variables are independentVar(Sn) = σ21 + σ22 + . . . σ2n

If Y = Sn − E (Sn), then E (Y ) = 0 and Var(Y ) = Var(Sn)

If Z = Y /√

Var(Sn), then E (Z ) = 0 and Var(Z ) = 1

Can we say more?

Y. Samuel Wang STAT 311: Sampling Distributions

Page 7: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem

Let n ≥ 1, and let X1,X2 . . .Xn be a sequence of mutuallyindependent random variables such that E (Xi ) = µi andVar(Xi ) = σ2i . Define Sn = X1 + X2 . . .Xn

Note that because we are summing random variables,E (Sn) = µ1 + µ2 . . . µn

Since the random variables are independentVar(Sn) = σ21 + σ22 + . . . σ2n

If Y = Sn − E (Sn), then E (Y ) = 0 and Var(Y ) = Var(Sn)

If Z = Y /√

Var(Sn), then E (Z ) = 0 and Var(Z ) = 1

Can we say more?

Y. Samuel Wang STAT 311: Sampling Distributions

Page 8: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem

Theorem

Central Limit Theorem: Let n ≥ 1, and let X1,X2 . . .Xn be asequence of mutually independent random variables such thatE (Xi ) = µi and Var(Xi ) = σ2i . Define Sn = X1 + X2 . . .Xn

Zn =Sn − E (Sn)√

Var(Sn)→d N(0, 1)

The →d symbol means converges in distribution. This means thatthe distribution of my random variable Zn, which is simply formedfrom the Xi ’s, begins to look more and more like a standardnormal distribution as n increases.

Y. Samuel Wang STAT 311: Sampling Distributions

Page 9: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem

What if all my random variables are identical?

E (Sn) = nµ

Var(Sn) = nσ2

Zn =Sn − nµ√

nσ2=

n( 1nSn − µ)√n√σ2

=√n

(x̄n − µ)

σ

The distribution of (almost) any sample mean goes to a standardnormal distribution when we account for the mean and standarddeviation of the population.

Y. Samuel Wang STAT 311: Sampling Distributions

Page 10: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem

What if all my random variables are identical?

E (Sn) = nµ

Var(Sn) = nσ2

Zn =Sn − nµ√

nσ2=

n( 1nSn − µ)√n√σ2

=√n

(x̄n − µ)

σ

The distribution of (almost) any sample mean goes to a standardnormal distribution when we account for the mean and standarddeviation of the population.

Y. Samuel Wang STAT 311: Sampling Distributions

Page 11: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem Caveats

Note that this does say anything about individual randomvariables

This only makes a statement about the x̄n as n grows large

Imagine taking many samples of size n = 100 and plot thehistogram, that will look roughly normal; then take manysamples of n = 1000, that will look even more like a normaldistribution, etc.

How large n has to be for a “roughly normal” histogram of byx̄n depends on many things, but as long as n keeps growing, itwill eventually get there

Y. Samuel Wang STAT 311: Sampling Distributions

Page 12: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem Example

For a given sample size n, take the sample average of nexponential random variables with rate = 1. Then repeat with anew sample 1000 times.

Y. Samuel Wang STAT 311: Sampling Distributions

Page 13: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem Example

For a given sample size n, take the sample average of nexponential random variables with rate = 1. Then repeat with anew sample 1000 times.

Y. Samuel Wang STAT 311: Sampling Distributions

Page 14: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem Example

For a given sample size n, take the sample average of nexponential random variables with rate = 1. Then repeat with anew sample 1000 times.

Y. Samuel Wang STAT 311: Sampling Distributions

Page 15: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem Example

For a given sample size n, take the sample average of nexponential random variables with rate = 1 and standardize thesample average. Then repeat with a new sample 1000 times.

Y. Samuel Wang STAT 311: Sampling Distributions

Page 16: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem Example

For a given sample size n, take the sample average of nexponential random variables with rate = 1 and standardize thesample average. Then repeat with a new sample 1000 times.

Y. Samuel Wang STAT 311: Sampling Distributions

Page 17: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem Example

For a given sample size n, take the sample average of nexponential random variables with rate = 1 and standardize thesample average. Then repeat with a new sample 1000 times.

Y. Samuel Wang STAT 311: Sampling Distributions

Page 18: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Central Limit Theorem Example

For a given sample size n, take the sample average of nexponential random variables with rate = 1 and standardize thesample average. Then repeat with a new sample 1000 times.

Y. Samuel Wang STAT 311: Sampling Distributions

Page 19: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Normal Distributions in Nature

Why is height (approximately) normally distributed?

How tall are you today?

That is the sum of how tall you grew each day for the Xnumber of days you have been alive so far.

So the sum of those lengths is a sample of size X

Each person is a separate sample

Y. Samuel Wang STAT 311: Sampling Distributions

Page 20: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

How else can we use the CLT?

We know a hypothetical distribution of a sample statistic x̄

In practice we don’t do many “resamples”

We can still estimate the hypothetical distribution if we didresample many times, if we know σ2

If we don’t know σ2, we can estimate it, but this changesthings slightly

Y. Samuel Wang STAT 311: Sampling Distributions

Page 21: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Sampling distributions

If the assumptions are satisfied,

x̄ ∼ N(0, σ2/n)

Var(x̄) = Var(1

n

∑i

xi ) =1

n2Var(

∑i

xi ) =1

n2nσ2 = σ2/n

Y. Samuel Wang STAT 311: Sampling Distributions

Page 22: STAT 311: Sampling Distributions · Central Limit Theorem What if all my random variables are identical? E(S n) = n Var(S n) = n˙2 Z n = S nnn p n˙2 = (1 n S ) p n p ˙2 = p n (

Sampling distributions

If the assumptions are satisfied,

x̄ ∼ N(0, σ2/n)

Var(x̄) = Var(1

n

∑i

xi ) =1

n2Var(

∑i

xi ) =1

n2nσ2 = σ2/n

Y. Samuel Wang STAT 311: Sampling Distributions