Sampling Distributions Sampling Distributions. Sampling Distribution Introduction In real life...


Sampling Distribution Introduction

• In real life, calculating population parameters is usually prohibitive because populations are very large.

• Rather than investigating the whole population, we take a sample, calculate a statistic related to the parameter of interest, and make an inference.

• The sampling distribution of the statistic is the tool that tells us how close the statistic is likely to be to the parameter.

Sample Statistics as Estimators of Population Parameters

• A population parameter is a numerical measure of a summary characteristic of a population.

• A sample statistic is a numerical measure of a summary characteristic of a sample.

• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.

• An estimate of a parameter is a particular numerical value of a sample statistic obtained through sampling.

• A point estimate is a single value used as an estimate of a population parameter.

Estimators

• The sample mean, $\bar{X}$, is the most common estimator of the population mean, $\mu$.

• The sample variance, $s^2$, is the most common estimator of the population variance, $\sigma^2$.

• The sample standard deviation, $s$, is the most common estimator of the population standard deviation, $\sigma$.

• The sample proportion, $\hat{p}$, is the most common estimator of the population proportion, $p$.

Sampling Distribution of $\bar{X}$

• The sampling distribution of $\bar{X}$ is the probability distribution of all possible values the random variable $\bar{X}$ may assume when a sample of size n is taken from a specified population.

Sampling Distribution of the Mean

• An example

– A die is thrown infinitely many times. Let X represent the number of spots showing on any throw.

– The probability distribution of X is

x      1     2     3     4     5     6
p(x)   1/6   1/6   1/6   1/6   1/6   1/6

$E(X) = 1(1/6) + 2(1/6) + 3(1/6) + \dots + 6(1/6) = 3.5$

$V(X) = (1-3.5)^2(1/6) + (2-3.5)^2(1/6) + \dots + (6-3.5)^2(1/6) = 2.92$
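The two sums for the single die can be checked with a few lines of Python. This is a minimal sketch that uses exact fractions; the variable names are illustrative and not part of the slides.

```python
from fractions import Fraction

# Fair die: each face 1..6 has probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

# E(X) = sum of x * p(x)
mean = sum(x * p for x in faces)

# V(X) = sum of (x - E(X))^2 * p(x)
variance = sum((x - mean) ** 2 * p for x in faces)

print(float(mean))                # 3.5
print(round(float(variance), 2))  # 2.92
```

The exact variance is 35/12, which the slide rounds to 2.92.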

Throwing a die twice – sampling distribution of sample mean

• Suppose we want to estimate $\mu$ from the mean $\bar{x}$ of a sample of size n = 2.

• What is the distribution of $\bar{x}$?

Throwing a die twice – sample mean

Sample  Throws  Mean    Sample  Throws  Mean    Sample  Throws  Mean
1       1,1     1       13      3,1     2       25      5,1     3
2       1,2     1.5     14      3,2     2.5     26      5,2     3.5
3       1,3     2       15      3,3     3       27      5,3     4
4       1,4     2.5     16      3,4     3.5     28      5,4     4.5
5       1,5     3       17      3,5     4       29      5,5     5
6       1,6     3.5     18      3,6     4.5     30      5,6     5.5
7       2,1     1.5     19      4,1     2.5     31      6,1     3.5
8       2,2     2       20      4,2     3       32      6,2     4
9       2,3     2.5     21      4,3     3.5     33      6,3     4.5
10      2,4     3       22      4,4     4       34      6,4     5
11      2,5     3.5     23      4,5     4.5     35      6,5     5.5
12      2,6     4       24      4,6     5       36      6,6     6

The distribution of $\bar{x}$ when n = 2

[Figure: probability histogram of $\bar{x}$ built from the 36 equally likely samples of two throws. Horizontal axis: $\bar{x}$ = 1, 1.5, 2.0, ..., 6.0. Vertical axis: probability, from 1/36 up to 6/36 at $\bar{x}$ = 3.5.]

$E(\bar{x}) = 1.0(1/36) + 1.5(2/36) + \dots + 6.0(1/36) = 3.5$

$V(\bar{x}) = (1.0-3.5)^2(1/36) + (1.5-3.5)^2(2/36) + \dots + (6.0-3.5)^2(1/36) = 1.46$

Note: $\mu_{\bar{x}} = \mu_x$ and $\sigma^2_{\bar{x}} = \sigma^2_x / 2$.
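The distribution of the mean of two throws can be rebuilt by enumerating all 36 ordered outcomes. The sketch below does that with exact fractions; the names `outcomes`, `counts` and `dist` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two throws.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes produce each sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in outcomes)
dist = {m: Fraction(c, len(outcomes)) for m, c in sorted(counts.items())}

mean_xbar = sum(m * pr for m, pr in dist.items())
var_xbar = sum((m - mean_xbar) ** 2 * pr for m, pr in dist.items())

for m in dist:
    print(float(m), f"{counts[m]}/36")
print(float(mean_xbar), round(float(var_xbar), 2))  # 3.5 1.46
```

The exact variance is 35/24, half of the single-throw variance 35/12, which matches the note about n = 2.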

Sampling Distribution of the Mean

n = 5:  $\mu_{\bar{x}} = 3.5$,  $\sigma^2_{\bar{x}} = \sigma^2_x / 5 = 0.5833$

n = 10:  $\mu_{\bar{x}} = 3.5$,  $\sigma^2_{\bar{x}} = \sigma^2_x / 10 = 0.2917$

n = 25:  $\mu_{\bar{x}} = 3.5$,  $\sigma^2_{\bar{x}} = \sigma^2_x / 25 = 0.1167$

Notice that $\sigma^2_{\bar{x}}$ is smaller than $\sigma^2_x$. The larger the sample size, the smaller $\sigma^2_{\bar{x}}$. Therefore, $\bar{x}$ tends to fall closer to $\mu$ as the sample size increases.
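The values for n = 5, 10 and 25 can be confirmed without the shortcut formula by building the exact distribution of the total of n throws and rescaling it to the mean. This is a sketch under the assumption of a fair die; `sum_distribution` and `mean_and_variance_of_xbar` are hypothetical helper names.

```python
from fractions import Fraction

def sum_distribution(n):
    """Exact distribution of the total of n fair-die throws, by repeated convolution."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for total, pr in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, Fraction(0)) + pr * Fraction(1, 6)
        dist = new
    return dist

def mean_and_variance_of_xbar(n):
    """Mean and variance of the sample mean of n throws, computed from the exact distribution."""
    dist = sum_distribution(n)
    mean = sum(Fraction(t, n) * pr for t, pr in dist.items())
    var = sum((Fraction(t, n) - mean) ** 2 * pr for t, pr in dist.items())
    return mean, var

for n in (5, 10, 25):
    mean, var = mean_and_variance_of_xbar(n)
    print(n, float(mean), round(float(var), 4))
```

In each case the mean stays at 3.5 and the variance equals 35/(12n), which is the single-throw variance divided by the sample size.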

Relationships between Population Parameters and the Sampling Distribution of the Sample Mean

The expected value of the sample mean is equal to the population mean:

$E(\bar{X}) = \mu_{\bar{X}} = \mu_X$

The variance of the sample mean is equal to the population variance divided by the sample size:

$V(\bar{X}) = \sigma^2_{\bar{X}} = \sigma^2_X / n$

The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size:

$\text{s.e.} = SD(\bar{X}) = \sigma_{\bar{X}} = \sigma_X / \sqrt{n}$

The Central Limit Theorem

When sampling from a population with mean $\mu$ and finite standard deviation $\sigma$, the sampling distribution of the sample mean will tend to be a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ as the sample size becomes large (n > 30).

For “large enough” n:

$\bar{X} \sim N(\mu, \sigma^2/n)$

[Figure: three panels showing the distribution of $\bar{X}$ for n = 5, n = 20, and large n; the shape approaches a normal curve as n grows.]

The Central Limit Theorem Applies to Sampling Distributions from Any Population

[Figure: grid of sampling distributions of $\bar{X}$. Columns: normal, uniform, skewed, and general populations. Rows: the population itself, n = 2, and n = 30.]
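A small simulation can illustrate the theorem for a skewed population. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a choice made here for illustration, not taken from the slides) and compares sample means for n = 2 and n = 30. The helper names and the seed are arbitrary.

```python
import math
import random
from statistics import fmean

def summarize(values):
    """Return (mean, standard deviation, skewness) of a list of numbers."""
    m = fmean(values)
    s = math.sqrt(fmean((v - m) ** 2 for v in values))
    skew = fmean(((v - m) / s) ** 3 for v in values)
    return m, s, skew

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population with mean 1."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (2, 30):
    results[n] = summarize(sample_means(n, 20_000, rng))
    m, s, skew = results[n]
    print(f"n={n}: mean={m:.3f} sd={s:.3f} "
          f"(sigma/sqrt(n)={1 / math.sqrt(n):.3f}) skew={skew:.2f}")
```

The simulated standard deviation should track $\sigma/\sqrt{n}$, and the skewness should be much closer to 0 for n = 30 than for n = 2, which is the sense in which the sampling distribution becomes more nearly normal.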

The Central Limit Theorem (Example)

Mercury makes a 2.4-liter V-6 engine, used in speedboats. The company’s engineers believe the engine delivers an average horsepower of 220 HP and that the standard deviation of power delivered is 15 HP. A potential buyer intends to sample 100 engines. What is the probability that the sample mean will be less than 217 HP?

$P(\bar{X} < 217) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{217 - \mu}{\sigma/\sqrt{n}}\right)$

$= P\left(Z < \frac{217 - 220}{15/\sqrt{100}}\right) = P\left(Z < \frac{217 - 220}{15/10}\right)$

$= P(Z < -2) = 0.0228$
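The engine calculation can be reproduced with Python's standard library. This is a sketch that standardizes the cutoff and looks up the standard normal probability with `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 220, 15, 100      # values from the example
se = sigma / sqrt(n)             # standard error of the mean = 1.5
z = (217 - mu) / se              # standardized cutoff = -2.0
prob = NormalDist().cdf(z)       # P(Z < -2)
print(round(prob, 4))            # 0.0228
```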

Student’s t Distribution

If the population standard deviation, $\sigma$, is unknown, replace $\sigma$ with the sample standard deviation, $s$. If the population is normal, the resulting statistic

$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

has a t distribution with (n - 1) degrees of freedom.

• The t is a family of bell-shaped and symmetric distributions, one for each number of degrees of freedom.

• The expected value of t is 0.

• The variance of t is greater than 1, but approaches 1 as the number of degrees of freedom increases.

• The t distribution approaches a standard normal as the number of degrees of freedom increases.

• When $\sigma$ is unknown and the sample size is small (n < 30), we use the t distribution.

[Figure: density curves of the standard normal and of t with df = 20 and df = 10.]
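The t statistic and the behavior of its variance can be sketched in a few lines. The sample values below are hypothetical and serve only as an illustration. The variance formula df/(df - 2), valid for df > 2, is the standard result for the t distribution and is not stated in the slides.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))

def t_variance(df):
    """Variance of a t distribution; defined only for df > 2."""
    if df <= 2:
        raise ValueError("variance requires df > 2")
    return df / (df - 2)

sample = [31.9, 32.4, 32.1, 32.3, 32.0]   # hypothetical data, not from the slides
print(len(sample) - 1, t_statistic(sample, mu=32.0))  # degrees of freedom, t value
for df in (10, 20, 100):
    print(df, round(t_variance(df), 3))    # 1.25, 1.111, 1.02
```

The printed variances stay above 1 and shrink toward 1 as the degrees of freedom grow, in line with the bullets above.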

Finite Population Correction Factor

If the sample size is more than 5% of the population size and the sampling is done without replacement, then a correction needs to be made to the standard error of the mean.

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

Sampling Distribution of $\bar{x}$

Standard Deviation of $\bar{x}$

Finite population:  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

Infinite population:  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

• $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.

• A finite population is treated as being infinite if n/N < .05.

• $\sqrt{(N - n)/(N - 1)}$ is the finite correction factor.
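The two standard-error formulas can be wrapped in one helper. This is a minimal sketch, assuming sampling without replacement and using the 5% rule as the switch; the function name `standard_error` and the numbers passed to it are hypothetical.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean; N is the population size (None means infinite)."""
    se = sigma / sqrt(n)
    # Apply the correction only when the sample exceeds 5% of the population.
    if N is not None and n / N > 0.05:
        se *= sqrt((N - n) / (N - 1))
    return se

# Hypothetical numbers, chosen only to illustrate the rule.
print(round(standard_error(15, 100), 4))           # infinite population: 1.5
print(round(standard_error(15, 100, N=1000), 4))   # n/N = 0.10, correction applied
print(round(standard_error(15, 100, N=10000), 4))  # n/N = 0.01, treated as infinite: 1.5
```

The correction factor is always below 1 when n > 1, so the corrected standard error is smaller than $\sigma/\sqrt{n}$.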

Sampling Distribution of the Sample Mean

• The amount of soda pop in each bottle is normally distributed with a mean of 32.2 ounces and a standard deviation of 0.3 ounces.

• Find the probability that a carton of four bottles will have a mean of more than 32 ounces of soda per bottle.

• Solution
– Define the random variable as the mean amount of soda per bottle.

$P(\bar{x} > 32) = P\left(\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{32 - 32.2}{0.3/\sqrt{4}}\right) = P(z > -1.33) = 0.9082$

[Figure: normal curve for $\bar{x}$ centered at $\mu_{\bar{x}} = 32.2$, with the area 0.9082 to the right of $\bar{x} = 32$ shaded.]
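The soda calculation can also be done directly on the sampling distribution of the mean, without standardizing first. This is a sketch using `statistics.NormalDist`. Because it does not round z to two decimals, the result comes out near 0.909, slightly above the table-based 0.9082.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32.2, 0.3, 4                   # values from the example
xbar_dist = NormalDist(mu, sigma / sqrt(n))   # sampling distribution of the mean
prob = 1 - xbar_dist.cdf(32)                  # upper-tail probability P(x_bar > 32)
z = (32 - mu) / (sigma / sqrt(n))             # unrounded z, about -1.333
print(round(z, 2), round(prob, 4))
```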

Sampling Distribution of the Sample Mean

• Example
– Dean’s claim: The average weekly income of M.B.A. graduates one year after graduation is $600.

– Suppose the distribution of weekly income has a standard deviation of $100. What is the probability that 25 randomly selected graduates have an average weekly income of less than $550?

– Solution

$P(\bar{x} < 550) = P\left(\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{550 - 600}{100/\sqrt{25}}\right) = P(z < -2.5) = 0.0062$
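The answer can be cross-checked by simulation. The sketch below assumes that individual weekly incomes are normally distributed, which the slide does not state, draws many samples of 25, and counts how often the sample mean falls below 550. The seed and number of trials are arbitrary.

```python
import random
from statistics import NormalDist, fmean

mu, sigma, n = 600, 100, 25                         # values from the example
exact = NormalDist(mu, sigma / n ** 0.5).cdf(550)   # P(z < -2.5)

rng = random.Random(1)      # fixed seed for repeatability
trials = 40_000
# Count the samples whose mean is below 550 (income normality is an assumption).
hits = sum(
    fmean(rng.gauss(mu, sigma) for _ in range(n)) < 550
    for _ in range(trials)
)
print(round(exact, 4), hits / trials)
```

The simulated fraction should land close to the exact value of about 0.0062, with some run-to-run noise because the event is rare.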

The Sampling Distribution of the Sample Proportion, $\hat{p}$

The sample proportion is the fraction of successes in n binomial trials. It is the number of successes, X, divided by the number of trials, n.

Sample proportion: $\hat{p} = \frac{X}{n}$

As the sample size, n, increases, the sampling distribution of $\hat{p}$ approaches a normal distribution with mean $p$ and standard deviation

$\sqrt{\frac{p(1 - p)}{n}}$

[Figure: binomial probability histograms P(X) for n = 2, n = 10, and n = 15, all with p = 0.3. The n = 15 panel also labels the horizontal axis in terms of $\hat{p}$ = X/15, from 0/15 to 15/15.]
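The three panels (n = 2, 10, 15 with p = 0.3) correspond to exact binomial distributions rescaled by n. The sketch below computes them and compares the exact standard deviation of the sample proportion with the formula above; `phat_distribution` is a hypothetical helper name.

```python
from math import comb, sqrt

def phat_distribution(n, p):
    """Exact distribution of p_hat = X/n, where X ~ Binomial(n, p)."""
    return {x / n: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

for n in (2, 10, 15):
    dist = phat_distribution(n, 0.3)
    mean = sum(v * pr for v, pr in dist.items())
    sd = sqrt(sum((v - mean) ** 2 * pr for v, pr in dist.items()))
    # Compare the exact sd with the formula sqrt(p(1-p)/n).
    print(n, round(mean, 4), round(sd, 4), round(sqrt(0.3 * 0.7 / n), 4))
```

The mean and standard deviation match $p$ and $\sqrt{p(1-p)/n}$ for every n. Only the shape of the distribution needs a large n to look normal.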

Normal approximation to the Binomial

– Normal approximation to the binomial works best when

• the number of experiments (sample size) is large, and

• the probability of success, p, is close to 0.5.

– For the approximation to provide good results two conditions should be met:

$np \ge 5; \quad n(1 - p) \ge 5$

• Example
– A state representative received 52% of the votes in the last election.

– One year later the representative wanted to study his popularity.

– If his popularity has not changed, what is the probability that more than half of a sample of 300 voters would vote for him?

• Example – Solution

• The number of respondents who prefer the representative is binomial with n = 300 and p = .52. Thus, np = 300(.52) = 156 and n(1 - p) = 300(1 - .52) = 144 (both greater than 5).

$P(\hat{p} > .50) = P\left(\frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} > \frac{.50 - .52}{\sqrt{(.52)(1 - .52)/300}}\right) = P(z > -0.69) = .7549$
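The voter example can be sketched in Python, including the rule-of-thumb check on np and n(1 - p). The helper name `normal_approx_ok` is hypothetical. Because z is not rounded to -0.69 here, the probability comes out near 0.756 rather than the table-based .7549.

```python
from math import sqrt
from statistics import NormalDist

def normal_approx_ok(n, p):
    """Rule of thumb from the slides: np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

n, p = 300, 0.52                        # values from the example
assert normal_approx_ok(n, p)           # 156 and 144, both at least 5
se = sqrt(p * (1 - p) / n)              # standard deviation of p_hat
z = (0.50 - p) / se                     # unrounded z, about -0.693
prob = 1 - NormalDist().cdf(z)          # P(p_hat > 0.50)
print(round(z, 2), round(prob, 4))
```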