SUMMARY. Z-distribution Central limit theorem Sweet demonstration of the sampling distribution of...


Page 1: SUMMARY. Z-distribution Central limit theorem Sweet demonstration of the sampling distribution of the mean.

SUMMARY

Page 2

• Z-distribution

• Central limit theorem

Page 3

Sweet demonstration of the sampling distribution of the mean

Page 4

Sweet data

𝑛=20

Page 5

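R-code – sampling distribution, exact

data.set <- c(6,4,5,3,10,3,5,3,6,5,4,8,7,2,8,5,8,5,4,0)

mean(data.set)   # population mean

sd(data.set)*sqrt(19/20)   # population standard deviation (sd() divides by n-1, so rescale)

(sd(data.set)*sqrt(19/20))/sqrt(20)   # population standard deviation divided by sqrt(20)

sample_size <- 5

samps <- combn(data.set, sample_size)   # every possible sample of size 5, one per column

xbars <- colMeans(samps)   # mean of each sample

barplot(table(xbars))   # exact sampling distribution of the mean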

Page 6

Sampling distribution – exact

$\mu_{\bar{x}} = M = ??$

$M = \mu = 5.05$
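The equality above can be checked directly in R. This is a minimal sketch that reuses `data.set` and the `combn` approach from the exact sampling distribution code; the names `pop_mean` and `n_samples` are introduced here only for illustration.

```r
# Population of 20 values from the "Sweet data" slide
data.set <- c(6,4,5,3,10,3,5,3,6,5,4,8,7,2,8,5,8,5,4,0)

pop_mean <- mean(data.set)           # population mean

# All possible samples of size 5 (one sample per column) and their means
samps <- combn(data.set, 5)
xbars <- colMeans(samps)

n_samples <- ncol(samps)             # number of possible samples, choose(20, 5)
mean(xbars)                          # mean of the sampling distribution
all.equal(mean(xbars), pop_mean)     # checks that the two means coincide
```

Because every value appears in the same number of samples, averaging over all possible samples gives back the population mean.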

Page 7

R-code (sampling distribution simulated)

data.set <- c(6,4,5,3,10,3,5,3,6,5,4,8,7,2,8,5,8,5,4,0)

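sample_size <- 3

number_of_samples <- 20

samples <- replicate(number_of_samples, sample(data.set, sample_size, replace = TRUE))   # one random sample per column

out <- colMeans(samples)   # mean of each simulated sample

mean(out)   # mean of the simulated sample means

sd(out)   # their standard deviation, an estimate of the standard error

barplot(table(out))   # simulated sampling distribution of the mean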

Page 8

Sampling distribution – simulated

Page 9

Sampling distribution – simulated

Page 10

ESTIMATION

Page 11

Statistical inference

If we can’t conduct a census, we collect data from a sample of the population.

Goal: draw conclusions about that population.

Page 12

Demonstration problem

• You sample 36 apples from your farm’s harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation).

• What is the probability that the mean weight of all 200 000 apples is between 100 and 124 grams?

Page 13

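What is the question?

• We would like to know the probability that the population mean $\mu$ is within 12 of the sample mean $\bar{x}$.

• But this is the same thing as the probability that the sample mean $\bar{x}$ is within 12 of the population mean $\mu$.

• But this is the same thing as the probability that $\bar{x}$ is within 12 of the mean of the sampling distribution $\mu_{\bar{x}}$, because $\mu_{\bar{x}} = \mu$.

• So, if I am able to say how many standard deviations away from $\mu_{\bar{x}}$ I am, I can use the Z-table to figure out the probability.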

Page 14

Slight complication

• There is one caveat, can you see it?

• We don’t know the standard deviation of the sampling distribution (the standard error). We only know it equals $\sigma/\sqrt{n}$, but $\sigma$ is unknown.

• What we’re going to do is estimate $\sigma$. The best thing we can use is the sample standard deviation $s$, which equals 40.

• $SE \approx s/\sqrt{n} = 40/\sqrt{36} = 6.67$. This is our best estimate of the standard error.

• Now you finish the example. What is the probability that the population mean lies within 12 of the sample mean if the SE equals 6.67?

• 92.82% (12 / 6.67 = 1.8 standard errors; the Z-table gives 0.9641 for $z = 1.8$, so $2 \times 0.9641 - 1 = 0.9282$)
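The calculation can be reproduced in R as a rough sketch. The variable names (`half`, `se`, `z`, `p`) are chosen here for illustration; `pnorm` is the standard normal distribution function from base R. The result differs from the Z-table value of 92.82% only in the last digit, because the table is rounded to four decimals.

```r
# Sample statistics from the apple example
n    <- 36     # sample size
s    <- 40     # sample standard deviation (grams)
half <- 12     # half-width of the interval around the sample mean (grams)

se <- s / sqrt(n)            # estimated standard error, about 6.67
z  <- half / se              # distance in standard errors, 1.8
p  <- pnorm(z) - pnorm(-z)   # probability of falling within +/- z

round(c(se = se, z = z, p = p), 4)
```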

Page 15

This is neat!

• You sample 36 apples from your farm’s harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation). What is the probability that the population mean weight of all 200 000 apples is between 100 and 124 grams?

• We started with very little information (we know just the sample statistics), but we can infer that with a probability of 92.82% the population mean lies within 12 of our sample mean!

Page 16

Point vs. interval estimate

• You sample 36 apples from your farm’s harvest of over 200 000 apples. The mean weight of the sample is 112 grams (with a 40 gram sample standard deviation).

• Goal: estimate the population mean

1. The population mean is estimated by the sample mean, i.e. we say the population mean equals 112 g. This is called a point estimate (Czech: bodový odhad).

2. However, we can do better. We can estimate that the true population mean lies, with 95% confidence, within the following interval (interval estimate):

$\bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$
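As an illustration, the interval estimate for the apple example can be computed as follows. This is only a sketch using the numbers given on the slide; the variable names are assumptions made for this example.

```r
x_bar <- 112   # sample mean (grams)
s     <- 40    # sample standard deviation (grams)
n     <- 36    # sample size

margin <- 1.96 * s / sqrt(n)   # 1.96 standard errors
ci     <- c(lower = x_bar - margin, upper = x_bar + margin)
round(ci, 2)
```

The interval is wider than the 100 to 124 gram range from the demonstration problem, which is consistent with 95% being a higher confidence level than 92.82%.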

Page 17

Confidence interval

• This type of result is called a confidence interval (Czech: interval spolehlivosti, konfidenční interval).

• The number of standard errors you want to add/subtract depends on the confidence level (e.g. 95%) (Czech: hladina spolehlivosti).

$\bar{x} \pm Z \times \frac{s}{\sqrt{n}}$

• $Z$: critical value (Czech: kritická hodnota)

• $Z \times \frac{s}{\sqrt{n}}$: margin of error (Czech: možná odchylka)

Page 18

Confidence level

• The desired level of confidence is set by the researcher (not determined by data).

• If you want to be 95% confident in your results, you add/subtract 1.96 standard errors (the empirical rule says about 2 standard errors).

• 95% confidence interval

Confidence level (%)   Z-value

80                     1.28

90                     1.64

95                     1.96

98                     2.33

99                     2.58
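The Z-values in the table can be recovered from the standard normal quantile function. A small sketch, assuming two-sided intervals (equal area left in each tail); `conf_level` and `z_value` are names chosen for this example.

```r
conf_level <- c(0.80, 0.90, 0.95, 0.98, 0.99)

# Two-sided interval: leave (1 - level) / 2 in each tail
z_value <- qnorm(1 - (1 - conf_level) / 2)

data.frame(conf_level = 100 * conf_level, z_value = round(z_value, 2))
```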

Page 19

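[Figure: four standard normal curves with the central 80%, 90%, 95% and 99% areas marked; the corresponding critical values are 1.28, 1.64, 1.96 and 2.58.]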

Page 20

Small sample size confidence intervals

• The blood pressure of 7 patients was measured after they had been given a new drug for 3 months. They had blood pressure increases of 1.5, 2.9, 0.9, 3.9, 3.2, 2.1 and 1.9. Construct a 95% confidence interval for the true expected blood pressure increase for all patients in the population.

Page 21

CLT consequence

• A change in blood pressure is a biological process. It’s going to be a sum of thousands or millions of microscopic processes.

• Generally, biological and physical processes can be viewed as being affected by a large number of random subprocesses with individually small effects.

• The sum of all these random components creates a random variable that converges to a normal distribution regardless of the underlying distribution of processes causing the small effects.

• Thus, the Central Limit Theorem explains the ubiquity of the famous "Normal distribution" in the measurements domain.
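The claim about sums of many small effects can be illustrated with a short simulation. This is only a sketch under stated assumptions: each small effect is drawn from a strongly skewed exponential distribution (shifted to mean 0), and the counts `n_effects` and `n_measure` are arbitrary illustrative choices, not values from the slides.

```r
set.seed(1)                     # fixed seed so the sketch is reproducible

n_effects <- 500                # small random effects per measurement (assumed)
n_measure <- 2000               # number of simulated measurements (assumed)

# Each effect is exponential (strongly skewed), shifted to have mean 0
effects <- matrix(rexp(n_effects * n_measure) - 1, nrow = n_effects)
total   <- colSums(effects)     # one summed "measurement" per column

# For a normal variable about 95% of values lie within 1.96 SDs of the mean
within <- mean(abs(total - mean(total)) <= 1.96 * sd(total))
within
```

Calling `hist(total)` afterwards should show a roughly bell-shaped histogram even though the individual effects are far from normal.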

Page 22

• We will assume that our population distribution is normal, with mean $\mu$ and standard deviation $\sigma$.

• We don’t know anything about this distribution, but we have a sample. Let’s figure out everything we can about this sample:

• $\bar{x} = 2.34$, $s = 1.04$

• We’ve been estimating the true population standard deviation $\sigma$ with our sample standard deviation $s$.

• However, we are estimating our standard deviation with $n$ of only 7! This is probably not going to be a very good estimate.

• In general, if $n < 30$ this is considered a bad estimate.

Page 23

William Sealy Gosset aka Student

• 1876-1937

• an employee of Guinness brewery

• 1908 papers addressed the brewer's concern with small samples

• "The probable error of a mean". Biometrika 6 (1): 1–25. March 1908.

• "Probable error of a correlation coefficient". Biometrika 6 (2/3): 302–310. September 1908.

Page 24

Student t-distribution

• Instead of assuming a sampling distribution is normal we will use a Student t-distribution.

• It gives a better estimate of your confidence interval if you have a small sample size.

• It looks very similar to a normal distribution, but it has fatter tails to indicate the higher frequency of outliers which come with a small data set.
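One way to see the fatter tails is to compare critical values. The sketch below uses base R's `qt` and `qnorm`; the particular degrees of freedom in `df` are arbitrary illustrative choices.

```r
df <- c(2, 6, 10, 29, 100)

t_crit <- qt(0.975, df)      # 95% two-sided critical values of Student's t
z_crit <- qnorm(0.975)       # corresponding normal critical value, about 1.96

data.frame(df = df, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3))
```

The t critical value is always larger than the normal one and shrinks towards it as the degrees of freedom grow, so small samples give wider intervals.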

Page 25

Student t-distribution

Page 26

Student t-distribution

df – degrees of freedom (Czech: stupeň volnosti)

Page 27

Back to our case

• Because the sample size is small, the sampling distribution of the mean won’t be normal. Instead, it will have a Student t-distribution with $df = n - 1 = 6$.

• Construct a 95% confidence interval, please.

for $n < 30$: $\bar{x} \pm t_{n-1} \times \frac{s}{\sqrt{n}}$
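A possible way to work through the exercise in R is sketched below, using the seven blood pressure increases from the small-sample slide. The variable names are assumptions for this example; `t.test` from base R is used only as a cross-check of the hand calculation.

```r
x <- c(1.5, 2.9, 0.9, 3.9, 3.2, 2.1, 1.9)   # blood pressure increases

n      <- length(x)
x_bar  <- mean(x)                  # sample mean, about 2.34
s      <- sd(x)                    # sample standard deviation, about 1.04
t_crit <- qt(0.975, df = n - 1)    # critical value for 95% confidence, df = 6

margin <- t_crit * s / sqrt(n)     # margin of error
ci     <- c(lower = x_bar - margin, upper = x_bar + margin)
round(ci, 2)

# Cross-check against R's built-in one-sample t interval
t.test(x, conf.level = 0.95)$conf.int
```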

Page 28

• Just to summarize, the margin of error depends on

1. the confidence level (95% is common)

2. the sample size

• as the sample size increases, the margin of error decreases

• for a bigger sample we have a smaller interval in which we’re pretty sure the true population mean lies

3. the variability of the data (i.e. on σ)

• more variability increases the margin of error

• The margin of error measures nothing other than chance variation.

• It doesn’t measure any bias or errors that happen during the process.

• It does not tell you anything about the correctness of your data!!!

$\text{margin of error} = \text{something} \times \frac{s}{\sqrt{n}}$
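The three dependencies listed above can be tried out numerically. The helper `margin_of_error` below is hypothetical, written only for this illustration; it uses a t critical value, and the inputs reuse the apple example (s = 40, n = 36) with arbitrary variations.

```r
# Hypothetical helper: margin of error for a mean, using a t critical value
margin_of_error <- function(s, n, conf_level = 0.95) {
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  t_crit * s / sqrt(n)
}

margin_of_error(s = 40, n = 36)                      # baseline
margin_of_error(s = 40, n = 36, conf_level = 0.99)   # higher confidence: wider
margin_of_error(s = 40, n = 144)                     # larger sample: narrower
margin_of_error(s = 80, n = 36)                      # more variability: wider
```

None of these inputs says anything about bias in how the data were collected, which is why the margin of error reflects chance variation only.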