Statistical inference: confidence intervals and hypothesis testing.
![Page 1: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/1.jpg)
Review of Statistics
![Page 2: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/2.jpg)
Estimation of the Population Mean
Hypothesis Testing
Confidence Intervals
Comparing Means from Different Populations
Scatterplots and Sample Correlation
![Page 3: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/3.jpg)
Estimation of the Population Mean
One natural way to estimate the population mean, $\mu_Y$, is simply to compute the sample average $\bar{Y}$ from a sample of n i.i.d. observations. This can also be motivated by the law of large numbers.
But $\bar{Y}$ is not the only way to estimate $\mu_Y$. For example, $Y_1$ is another estimator of $\mu_Y$.
![Page 4: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/4.jpg)
Copyright © 2003 by Pearson Education, Inc.
Key Concept 3.1
![Page 5: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/5.jpg)
In general, we want an estimator that gets as close as possible to the unknown true value, at least in some average sense.
In other words, we want the sampling distribution of an estimator to be as tightly centered around the unknown value as possible.
This leads to three specific desirable characteristics of an estimator.
![Page 6: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/6.jpg)
Three desirable characteristics of an estimator.
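Let $\hat{\mu}_Y$ denote some estimator of $\mu_Y$.
Unbiasedness: $E(\hat{\mu}_Y) = \mu_Y$.
Consistency: $\hat{\mu}_Y \xrightarrow{p} \mu_Y$.
Efficiency: Let $\tilde{\mu}_Y$ be another estimator of $\mu_Y$, and suppose both $\hat{\mu}_Y$ and $\tilde{\mu}_Y$ are unbiased. Then $\hat{\mu}_Y$ is said to be more efficient than $\tilde{\mu}_Y$ if $\mathrm{var}(\hat{\mu}_Y) < \mathrm{var}(\tilde{\mu}_Y)$.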
![Page 7: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/7.jpg)
Properties of $\bar{Y}$
It can be shown that $E(\bar{Y}) = \mu_Y$ and $\bar{Y} \xrightarrow{p} \mu_Y$ (from the law of large numbers), so $\bar{Y}$ is both unbiased and consistent.
But is $\bar{Y}$ efficient?
![Page 8: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/8.jpg)
Examples of alternative estimators.
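Example 1: The first observation, $Y_1$.
Since $E(Y_1) = \mu_Y$, $Y_1$ is an unbiased estimator of $\mu_Y$. But $\mathrm{var}(Y_1) = \sigma_Y^2$ while $\mathrm{var}(\bar{Y}) = \sigma_Y^2/n$, so if n ≥ 2, $\bar{Y}$ is more efficient than $Y_1$.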
![Page 9: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/9.jpg)
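Example 2:
$$\tilde{Y} = \frac{1}{n}\left(\tfrac{1}{2}Y_1 + \tfrac{3}{2}Y_2 + \tfrac{1}{2}Y_3 + \tfrac{3}{2}Y_4 + \cdots + \tfrac{1}{2}Y_{n-1} + \tfrac{3}{2}Y_n\right),$$
where n is assumed to be an even number. The mean of $\tilde{Y}$ is $\mu_Y$ and its variance is $\mathrm{var}(\tilde{Y}) = 1.25\,\sigma_Y^2/n$.
Thus $\tilde{Y}$ is unbiased and, because $\mathrm{var}(\tilde{Y}) \to 0$ as n→∞, $\tilde{Y}$ is consistent.
However, $\bar{Y}$ is more efficient than $\tilde{Y}$, because $\mathrm{var}(\bar{Y}) = \sigma_Y^2/n < 1.25\,\sigma_Y^2/n$.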
![Page 10: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/10.jpg)
In fact, $\bar{Y}$ is the most efficient estimator of $\mu_Y$ among all unbiased estimators that are weighted averages of $Y_1, \ldots, Y_n$. (A weighted average of i.i.d. observations is unbiased whenever its weights sum to one.)
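The efficiency ranking of the three estimators can be checked by simulation. The following is a minimal sketch, assuming a normal population with made-up parameters (`mu`, `sigma`), a hypothetical sample size `n = 10`, and the alternating-weight estimator with weights 1/2 and 3/2; none of these numbers come from the slides.

```python
import random
import statistics


def simulate_estimators(n=10, reps=20000, mu=5.0, sigma=2.0, seed=0):
    """Monte Carlo variances of three unbiased estimators of the mean.

    All parameter values are illustrative assumptions; n must be even.
    """
    rng = random.Random(seed)
    # Alternating weights 1/2, 3/2, 1/2, 3/2, ... for the weighted estimator
    weights = [0.5 if i % 2 == 0 else 1.5 for i in range(n)]
    ybar, first, tilde = [], [], []
    for _ in range(reps):
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar.append(sum(y) / n)                                   # sample average
        first.append(y[0])                                        # first observation
        tilde.append(sum(w * v for w, v in zip(weights, y)) / n)  # weighted average
    return {
        "ybar": statistics.variance(ybar),
        "first": statistics.variance(first),
        "tilde": statistics.variance(tilde),
    }


result = simulate_estimators()
for name, var in result.items():
    print(f"{name}: simulated variance = {var:.3f}")
```

Under these assumptions the theoretical variances are $\sigma_Y^2/n = 0.4$ for the sample average, $1.25\,\sigma_Y^2/n = 0.5$ for the weighted average, and $\sigma_Y^2 = 4$ for the first observation, so the simulated values should be ordered the same way.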
![Page 11: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/11.jpg)
Hypothesis Testing
The hypothesis testing problem (for the mean): make a provisional decision, based on the evidence at hand, whether a null hypothesis is true, or instead that some alternative hypothesis is true. That is, test $H_0: E(Y) = \mu_{Y,0}$ against $H_1: E(Y) \neq \mu_{Y,0}$ (two-sided alternative).
![Page 12: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/12.jpg)
Key Concept 3.5
![Page 13: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/13.jpg)
Terminology
Significance level; p-value; critical value
Confidence interval; acceptance region, rejection region
Size; power
![Page 14: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/14.jpg)
p-value = probability of drawing a statistic (e.g. $\bar{Y}$) at least as adverse to the null as the value actually computed with your data, assuming that the null hypothesis is true.
The significance level of a test is a pre-specified probability of incorrectly rejecting the null, when the null is true.
![Page 15: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/15.jpg)
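Calculating the p-value based on $\bar{Y}$:
$$p\text{-value} = \Pr_{H_0}\left[\,|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\,\right],$$
where $\bar{Y}^{act}$ is the value of $\bar{Y}$ actually observed.
To compute the p-value, you need to know the distribution of $\bar{Y}$. If n is large, we can use the large-n normal approximation.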
![Page 16: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/16.jpg)
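For large n,
$$p\text{-value} = \Pr_{H_0}\left[\left|\frac{\bar{Y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right| > \left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right] \cong 2\,\Phi\left(-\left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right),$$
where $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$ denotes the standard deviation of the distribution of $\bar{Y}$ and $\Phi$ is the standard normal c.d.f.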
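As an illustration of the large-n normal approximation, here is a minimal sketch of the two-sided p-value when the population standard deviation is treated as known. The function name and the numbers in the example call are hypothetical, chosen only to show the mechanics.

```python
from math import sqrt
from statistics import NormalDist


def p_value_known_sigma(ybar_act, mu_0, sigma_y, n):
    """Two-sided p-value for H0: E(Y) = mu_0 using the normal approximation.

    sigma_y is the (assumed known) population standard deviation.
    """
    sigma_ybar = sigma_y / sqrt(n)        # std. dev. of the sample average
    z = (ybar_act - mu_0) / sigma_ybar    # standardized distance from the null
    return 2 * NormalDist().cdf(-abs(z))  # area in both N(0,1) tails


# Hypothetical numbers, for illustration only
p = p_value_known_sigma(ybar_act=22.64, mu_0=20.0, sigma_y=18.0, n=200)
print(f"p-value = {p:.4f}")
```

The p-value shrinks as the observed sample average moves further from the hypothesized mean, measured in units of $\sigma_{\bar{Y}}$.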
![Page 17: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/17.jpg)
Calculating the p-value with $\sigma_Y$ known
![Page 18: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/18.jpg)
Type I and Type II Error
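Type I Error (red), Type II Error (blue)
[Figure: the null distribution and the distribution under the alternative hypothesis, with the critical values marked. The two tail areas of the null distribution beyond the critical values are each α/2 (Type I error); the area of the alternative distribution inside the acceptance region is β (Type II error).]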
![Page 19: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/19.jpg)
Confidence Intervals
A 95% confidence interval for $\mu_Y$ is an interval that contains the true value of $\mu_Y$ in 95% of repeated samples.
Digression: What is random here? The confidence interval; it will differ from one sample to the next. The population parameter, $\mu_Y$, is not random.
![Page 20: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/20.jpg)
In practice, $\sigma_Y^2$ is unknown, so it must be estimated.
Estimator of the variance of Y:
$$s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$$
Fact: If $(Y_1, \ldots, Y_n)$ are i.i.d. and $E(Y^4) < \infty$, then $s_Y^2 \xrightarrow{p} \sigma_Y^2$.
Why does the law of large numbers apply? Because $s_Y^2$ is a sample average.
Technical note: we assume $E(Y^4) < \infty$ because here the average is not of $Y_i$, but of its square.
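The consistency of the sample variance can be seen numerically. The sketch below, assuming a normal population with a made-up variance of 4, computes $s_Y^2$ with the n − 1 divisor for increasing sample sizes; the helper name `sample_variance` is illustrative.

```python
import random
import statistics


def sample_variance(y):
    """Sample variance with the n - 1 divisor."""
    n = len(y)
    ybar = sum(y) / n
    return sum((v - ybar) ** 2 for v in y) / (n - 1)


rng = random.Random(1)
sigma2 = 4.0  # assumed population variance (illustrative)

estimates = {}
for n in (10, 1000, 100000):
    y = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    estimates[n] = sample_variance(y)
    print(f"n = {n:>6}: s_Y^2 = {estimates[n]:.3f}")

# Cross-check against the standard library on a small sample
data = [2.0, 4.0, 4.0, 5.0, 7.0]
print(sample_variance(data), statistics.variance(data))
```

As n grows, the printed estimates should settle near the assumed population variance.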
![Page 21: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/21.jpg)
![Page 22: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/22.jpg)
![Page 23: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/23.jpg)
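Write
$$s_Y^2 = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n}(Y_i - \mu_Y)^2 - (\bar{Y} - \mu_Y)^2\right].$$
For the first term, define $W_i = (Y_i - \mu_Y)^2$; then $E(W_i) = \sigma_Y^2$, and $E(W_i^2) = E[(Y_i - \mu_Y)^4] < \infty$. Thus $W_1, \ldots, W_n$ are i.i.d. and $\mathrm{var}(W_i) < \infty$, so $\bar{W} \xrightarrow{p} E(W_i)$.
Therefore, $\frac{1}{n}\sum_{i=1}^{n}(Y_i - \mu_Y)^2 \xrightarrow{p} \sigma_Y^2$.
For the second term, because $\bar{Y} \xrightarrow{p} \mu_Y$, we have $(\bar{Y} - \mu_Y)^2 \xrightarrow{p} 0$.
Therefore, since $n/(n-1) \to 1$, $s_Y^2 \xrightarrow{p} \sigma_Y^2$.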
![Page 24: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/24.jpg)
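Computing the p-value with $\sigma_Y^2$ estimated: replace $\sigma_{\bar{Y}}$ by the standard error $SE(\bar{Y}) = s_Y/\sqrt{n}$, so that for large n, $p\text{-value} \cong 2\,\Phi(-|t|)$ with $t = (\bar{Y}^{act} - \mu_{Y,0})/(s_Y/\sqrt{n})$.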
![Page 25: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/25.jpg)
The p-value and the significance level
With a prespecified significance level (e.g. 5%):
reject if |t| > 1.96;
equivalently, reject if p ≤ 0.05.
The p-value is sometimes called the marginal significance level.
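The decision rule can be sketched in a few lines. This is a minimal illustration, assuming the large-n normal approximation; the function name `t_test_mean` and the sample data are made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def t_test_mean(y, mu_0, alpha=0.05):
    """Large-n test of H0: E(Y) = mu_0 against a two-sided alternative.

    Returns the t-statistic, the normal-approximation p-value, and whether
    the null is rejected at significance level alpha.
    """
    n = len(y)
    se = stdev(y) / sqrt(n)                         # SE(Ybar) = s_Y / sqrt(n)
    t = (mean(y) - mu_0) / se
    p = 2 * NormalDist().cdf(-abs(t))
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return t, p, abs(t) > critical


# Made-up data: the values 0..6 repeated 20 times (sample mean is 3)
y = [i % 7 for i in range(140)]
t_null, p_null, reject_null = t_test_mean(y, mu_0=3)
t_alt, p_alt, reject_alt = t_test_mean(y, mu_0=2)
print(t_null, p_null, reject_null)
print(round(t_alt, 2), p_alt, reject_alt)
```

Comparing |t| with the critical value and comparing the p-value with the significance level lead to the same decision, which is why the two rules on the slide are stated as equivalent.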
![Page 26: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/26.jpg)
What happened to the t-table and the degrees of freedom?
Digression: the Student t distribution
If $Y_i$, i = 1,…, n, are i.i.d. $N(\mu_Y, \sigma_Y^2)$, then the t-statistic has the Student t-distribution with n – 1 degrees of freedom.
The critical values of the Student t-distribution are tabulated in the back of all statistics books. Remember the recipe?
Compute the t-statistic.
Compute the degrees of freedom, which is n – 1.
Look up the 5% critical value.
If the t-statistic exceeds (in absolute value) this critical value, reject the null hypothesis.
![Page 27: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/27.jpg)
Comments on this recipe and the Student t-distribution
The theory of the t-distribution was one of the early triumphs of mathematical statistics. It is astounding, really: if Y is i.i.d. normal, then you can know the exact, finite-sample distribution of the t-statistic – it is the Student t. So, you can construct confidence intervals (using the Student t critical value) that have exactly the right coverage rate, no matter what the sample size. But….
![Page 28: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/28.jpg)
Comments on Student t distribution, ctd.
If the sample size is moderate (several dozen) or large (hundreds or more), the difference between the t-distribution and N(0,1) critical values is negligible. Here are some 5% critical values for 2-sided tests:

| Degrees of freedom (n − 1) | 5% t-distribution critical value |
|---|---|
| 10 | 2.23 |
| 20 | 2.09 |
| 30 | 2.04 |
| 60 | 2.00 |
| ∞ | 1.96 |
![Page 29: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/29.jpg)
Comments on Student t distribution, ctd.
So, the Student-t distribution is only relevant when the sample size is very small; but in that case, for it to be correct, you must be sure that the population distribution of Y is normal.
In economic data, the normality assumption is rarely credible. Here are the distributions of some economic data.
Do you think earnings are normally distributed?
Suppose you have a sample of n = 10 observations from one of these distributions. Would you feel comfortable using the Student t distribution?
![Page 30: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/30.jpg)
![Page 31: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/31.jpg)
Comments on Student t distribution, ctd.
Consider the t-statistic testing the hypothesis that two means (groups s, l) are equal:
$$t = \frac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}} = \frac{\bar{Y}_s - \bar{Y}_l}{SE(\bar{Y}_s - \bar{Y}_l)}$$
Even if the population distribution of Y in the two groups is normal, this statistic doesn’t have a Student t distribution!
There is a statistic testing this hypothesis that does have a Student t distribution when the populations are normal, the “pooled variance” t-statistic (see SW, Section 3.6). However, the pooled variance t-statistic is only valid if the variances of the normal distributions are the same in the two groups. Would you expect this to be true, say, for men’s v. women’s wages?
![Page 32: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/32.jpg)
The Student-t distribution – summary
The assumption that Y is distributed $N(\mu_Y, \sigma_Y^2)$ is rarely plausible in practice (income? number of children?)
For n > 30, the t-distribution and N(0,1) are very close (as n grows large, the $t_{n-1}$ distribution converges to N(0,1))
The t-distribution is an artifact from days when sample sizes were small and “computers” were people
For historical reasons, statistical software typically uses the t-distribution to compute p-values – but this is irrelevant when the sample size is moderate or large.
For these reasons, in this class we will focus on the large-n approximation given by the CLT
![Page 33: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/33.jpg)
Summary on The Student t-distribution
If Y is distributed $N(\mu_Y, \sigma_Y^2)$, then the t-statistic has the Student t-distribution (tabulated in the back of all stats books).
Some comments:
For n > 30, the t-distribution and N(0,1) are very close.
The assumption that Y is distributed $N(\mu_Y, \sigma_Y^2)$ is rarely plausible in practice (income? number of children?)
The t-distribution is an historical artifact from days when sample sizes were very small.
In this class, we won’t use the t distribution - we rely solely on the large-n approximation given by the CLT.
![Page 34: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/34.jpg)
Confidence Intervals
A 95% confidence interval for $\mu_Y$ is an interval that contains the true value of $\mu_Y$ in 95% of repeated samples.
Digression: What is random here? The confidence interval; it will differ from one sample to the next. The population parameter, $\mu_Y$, is not random.
![Page 35: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/35.jpg)
A 95% confidence interval can always be constructed as the set of values of $\mu_Y$ not rejected by a hypothesis test with a 5% significance level.
![Page 36: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/36.jpg)
This confidence interval, $\bar{Y} \pm 1.96\,s_Y/\sqrt{n}$, relies on the large-n results that $\bar{Y}$ is approximately normally distributed and $s_Y^2 \xrightarrow{p} \sigma_Y^2$.
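The "95% of repeated samples" interpretation can be illustrated by simulation. The following is a sketch under stated assumptions: the population is exponential (deliberately non-normal) with a made-up mean of 1, and the sample size and number of repetitions are arbitrary choices.

```python
import random
from math import sqrt


def confidence_interval_95(y):
    """Large-n 95% confidence interval: Ybar +/- 1.96 * s_Y / sqrt(n)."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)  # sample variance
    half_width = 1.96 * sqrt(s2 / n)
    return ybar - half_width, ybar + half_width


def coverage_rate(mu=1.0, n=200, reps=2000, seed=42):
    """Fraction of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Skewed population with mean mu (an assumption for illustration)
        y = [rng.expovariate(1 / mu) for _ in range(n)]
        lo, hi = confidence_interval_95(y)
        hits += lo <= mu <= hi
    return hits / reps


rate = coverage_rate()
print(f"Simulated coverage: {rate:.3f}")
```

The interval endpoints change from sample to sample while the true mean stays fixed, and the simulated coverage should come out close to 95% even though the population is not normal.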
![Page 37: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/37.jpg)
Summary:
From the assumptions of:
(1) simple random sampling of a population, that is, $\{Y_i, i = 1, \ldots, n\}$ are i.i.d.;
(2) $0 < E(Y^4) < \infty$;
we developed, for large samples (large n):
Theory of estimation (sampling distribution of $\bar{Y}$)
Theory of hypothesis testing (large-n distribution of the t-statistic and computation of the p-value)
Theory of confidence intervals (constructed by inverting the test statistic)
Are assumptions (1) & (2) plausible in practice? Yes.
![Page 38: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/38.jpg)
Tests for Difference between Two Means
Let $\mu_w$ be the mean hourly earnings in the population of women recently graduated from college and let $\mu_m$ be the population mean for recently graduated men. Consider the null hypothesis that mean earnings for these two populations differ by a certain amount d: $H_0: \mu_m - \mu_w = d$ vs. $H_1: \mu_m - \mu_w \neq d$.
![Page 39: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/39.jpg)
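Replacing the population variances by the sample variances, we have the standard error
$$SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}$$
and the t-statistic is
$$t = \frac{(\bar{Y}_m - \bar{Y}_w) - d}{SE(\bar{Y}_m - \bar{Y}_w)}.$$
If both $n_m$ and $n_w$ are large, the t-statistic has approximately a standard normal distribution under the null hypothesis.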
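The difference-in-means test can be sketched as follows. The function name and the two earnings samples are hypothetical, constructed only so that the arithmetic is easy to follow; the p-value uses the large-n normal approximation.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_in_means_test(y_m, y_w, d=0.0):
    """Large-n test of H0: mu_m - mu_w = d (two-sided), unequal variances allowed."""
    # Standard error built from the two sample variances
    se = sqrt(variance(y_m) / len(y_m) + variance(y_w) / len(y_w))
    t = (mean(y_m) - mean(y_w) - d) / se
    p = 2 * NormalDist().cdf(-abs(t))
    return t, se, p


# Made-up hourly earnings: sample means are 13 (men) and 12 (women)
y_m = [10.0, 12.0, 14.0, 16.0] * 25
y_w = [9.0, 11.0, 13.0, 15.0] * 25

t0, se0, p0 = diff_in_means_test(y_m, y_w, d=0.0)  # H0: no difference
t1, se1, p1 = diff_in_means_test(y_m, y_w, d=1.0)  # H0: difference equals 1
print(f"d = 0: t = {t0:.2f}, p = {p0:.4f}")
print(f"d = 1: t = {t1:.2f}, p = {p1:.4f}")
```

Because the two sample variances enter the standard error separately, this version does not require the two groups to have the same variance.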
![Page 40: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/40.jpg)
![Page 41: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/41.jpg)
![Page 42: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/42.jpg)
Summarize the relationship between variables
Scatterplots:
![Page 43: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/43.jpg)
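The population covariance and correlation can be estimated by the sample covariance and sample correlation.
The sample covariance is
$$s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$
The sample correlation is
$$r_{XY} = \frac{s_{XY}}{s_X s_Y}$$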
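Both quantities are straightforward to compute directly. Here is a minimal sketch using only the standard library; the function names and the toy data are illustrative.

```python
from math import sqrt


def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """Sample correlation: s_XY / (s_X * s_Y)."""
    # sample_cov(x, x) is the sample variance of x
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))


# Toy data: y is an exact increasing linear function of x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(sample_cov(x, y), sample_corr(x, y))
print(sample_corr(x, list(reversed(y))))
```

The correlation is unit-free and always lies between −1 and 1, reaching the endpoints only when the points fall exactly on a straight line.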
![Page 44: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/44.jpg)
It can be shown that under the assumptions that $(X_i, Y_i)$ are i.i.d. and that $X_i$ and $Y_i$ have finite fourth moments, $s_{XY} \xrightarrow{p} \sigma_{XY}$.
![Page 45: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/45.jpg)
![Page 46: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/46.jpg)
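Write
$$s_{XY} = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)(Y_i - \mu_Y) - (\bar{X} - \mu_X)(\bar{Y} - \mu_Y)\right].$$
It is easy to see that the second term converges in probability to zero: because $\bar{X} \xrightarrow{p} \mu_X$ and $\bar{Y} \xrightarrow{p} \mu_Y$, we have $(\bar{X} - \mu_X)(\bar{Y} - \mu_Y) \xrightarrow{p} 0$ by Slutsky’s theorem.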
![Page 47: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/47.jpg)
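By the definition of covariance, we have $E[(X_i - \mu_X)(Y_i - \mu_Y)] = \sigma_{XY}$.
To apply the law of large numbers to the first term, we need to have $\mathrm{var}[(X_i - \mu_X)(Y_i - \mu_Y)] < \infty$, which is satisfied since
$$\mathrm{var}[(X_i - \mu_X)(Y_i - \mu_Y)] \le E[(X_i - \mu_X)^2 (Y_i - \mu_Y)^2] \le \sqrt{E[(X_i - \mu_X)^4]\,E[(Y_i - \mu_Y)^4]} < \infty.$$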
![Page 48: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/48.jpg)
The second inequality follows by applying the Cauchy-Schwarz inequality, and the last inequality follows because of the finite fourth moments for $(X_i, Y_i)$.
The Cauchy-Schwarz inequality is $|E(XY)| \le \sqrt{E(X^2)\,E(Y^2)}$.
![Page 49: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/49.jpg)
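Applying the law of large numbers, we have $\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_X)(Y_i - \mu_Y) \xrightarrow{p} \sigma_{XY}$.
Also, $n/(n-1) \to 1$, therefore $s_{XY} \xrightarrow{p} \sigma_{XY}$.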
![Page 50: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/50.jpg)
Scatterplots and Sample Correlation
![Page 51: Review of Statistics. Estimation of the Population Mean Hypothesis Testing Confidence Intervals Comparing Means from Different Populations Scatterplots.](https://reader035.fdocuments.in/reader035/viewer/2022062217/5697c00e1a28abf838cc9d69/html5/thumbnails/51.jpg)