1. 2 3 4 4-1 Statistical Inference The field of statistical inference consists of those methods used...
-
date post
21-Dec-2015 -
4-1 Statistical Inference
• The field of statistical inference consists of those methods used to make decisions or draw conclusions about a population.
•These methods utilize the information contained in a sample from the population in drawing conclusions.
• Example: Estimate the average height of the class.
5
4-1 Statistical Inference
6
4-2 Point Estimation
7
• An estimator should be close in some sense to the true value of the unknown parameter.
• Θ is an unbiased estimator of θ if E(Θ) = θ.
• If the estimator Θ is not unbiased, then the difference E(Θ) − θ is called the bias of the estimator Θ.
4-2 Point Estimation
8
• Suppose that X is a random variable with mean μ and variance σ².
• Let X1, X2, ..., Xn be a random sample of size n from the population represented by X.
• Show that the sample mean X̄ and sample variance S² are unbiased estimators of μ and σ², respectively.
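4-2 Point Estimation: Example 4-1
$$
\begin{aligned}
E(S^2) &= E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2\right] \\
&= \frac{1}{n-1}E\left[\sum_{i=1}^{n}\left(X_i^2+\bar{X}^2-2\bar{X}X_i\right)\right] \\
&= \frac{1}{n-1}E\left[\sum_{i=1}^{n}X_i^2+n\bar{X}^2-2n\bar{X}^2\right] \\
&= \frac{1}{n-1}E\left[\sum_{i=1}^{n}X_i^2-n\bar{X}^2\right] \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n}E(X_i^2)-nE(\bar{X}^2)\right]
\end{aligned}
$$
Since
$$E(X_i^2)=\mu^2+\sigma^2 \quad\text{and}\quad E(\bar{X}^2)=\mu^2+\frac{\sigma^2}{n},$$
we obtain
$$
E(S^2)=\frac{1}{n-1}\left[\sum_{i=1}^{n}(\mu^2+\sigma^2)-n\left(\mu^2+\frac{\sigma^2}{n}\right)\right]
=\frac{1}{n-1}\left(n\mu^2+n\sigma^2-n\mu^2-\sigma^2\right)=\sigma^2
$$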
10
• The sample variance S² is an unbiased estimator of the population variance σ².
• The sample standard deviation S is not an unbiased estimator of the population standard deviation σ.
4-2 Point EstimationExample 4-1
11
• Sometimes there are several unbiased estimators of the same population parameter.
• Example: a random sample of size n = 10.
1. Sample mean
2. Sample median
3. The first observation X1
• All are unbiased estimators of the population mean μ (the sample median when the population is symmetric).
• We cannot rely on the property of unbiasedness alone to select the estimator.
4-2 Point EstimationDifferent Unbiased Estimators
12
• Suppose that Θ1 and Θ2 are unbiased estimators of θ.
• The variances of the distributions of the two estimators may be different.
• Suppose Θ1 has a smaller variance than Θ2 does.
• Then Θ1 is more likely to produce an estimate close to the true value θ.
4-2 Point EstimationDifferent Unbiased Estimators
13
• A logical principle of estimation is to choose the estimator that has minimum variance.
4-2 Point EstimationHow to Choose an Unbiased Estimator
14
• In practice, one must occasionally use a biased estimator.
• For example, S for σ.
• What is the criterion?
• Mean square error
4-2 Point EstimationHow to Choose a Good Estimator
15
• MSE(Θ) = E(Θ − θ)²
= E[Θ − E(Θ)]² + [θ − E(Θ)]²
= V(Θ) + (bias)²
• Unbiased estimator ⇒ bias = 0
• A good estimator is one that minimizes V(Θ) + (bias)²
4-2 Point EstimationMean Square Error
16
• Given two estimators Θ1 and Θ2,
• The relative efficiency of Θ1 to Θ2 is defined as
r = MSE(Θ1)/MSE(Θ2)
• r < 1 ⇒ Θ1 is better than Θ2.
• Example:
• Θ1 = X̄: the sample mean of a sample of size n.
• Θ2 = Xi: the i-th observation.
4-2 Point EstimationMean Square Error
17
• r = MSE(Θ1)/MSE(Θ2) = (σ²/n)/σ² = 1/n
• The sample mean is a better estimator than a single observation.
• The square root of the variance of an estimator, √V(Θ), is called the standard error of the estimator.
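The relative efficiency r = 1/n can be checked by simulation. The following is a minimal sketch, assuming a normal population; the values μ = 50, σ = 2.5, n = 10, the trial count, and the function name `mse_of_estimators` are illustrative choices, not part of the slides.

```python
import random
import statistics

def mse_of_estimators(mu=50.0, sigma=2.5, n=10, trials=20000, seed=1):
    """Monte Carlo estimate of MSE for the sample mean and for a single observation."""
    rng = random.Random(seed)
    sq_err_mean = 0.0
    sq_err_single = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sq_err_mean += (statistics.fmean(sample) - mu) ** 2
        sq_err_single += (sample[0] - mu) ** 2  # estimator: the first observation
    return sq_err_mean / trials, sq_err_single / trials

mse_mean, mse_single = mse_of_estimators()
r = mse_mean / mse_single  # relative efficiency; should be near 1/n = 0.1
print(f"MSE(mean)={mse_mean:.4f}  MSE(single)={mse_single:.4f}  r={r:.3f}")
```

Both estimators are unbiased here, so each MSE is just the variance of the estimator.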
4-2 Point EstimationMean Square Error
18
4-2 Point Estimation
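Methods of obtaining an estimator:
1. Method of Moments Estimator (MME)
Example: Let
$$X_1, X_2, \ldots, X_n \overset{iid}{\sim} f(x,\lambda)=\lambda e^{-\lambda x},\quad x>0,$$
so that
$$E(X)=\int_0^\infty x\,f(x,\lambda)\,dx=\frac{1}{\lambda}.$$
If the random sample is really from the population with pdf f(x, λ), the sample mean should resemble the population mean, and the MME of λ can be obtained by solving
$$\bar{X}=\frac{1}{\hat\lambda}.$$
Therefore, the MME of λ is
$$\hat\lambda=\frac{1}{\bar{X}}.$$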
19
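MME (cont'd)
Example 2: Let
$$X_1, X_2, \ldots, X_n \overset{iid}{\sim} N(\mu,\sigma^2),\qquad E(X)=\mu,\qquad V(X)=E(X^2)-[E(X)]^2=\sigma^2.$$
The MMEs of μ and σ² can be obtained by solving
$$\bar{X}=\hat\mu \quad\text{and}\quad \frac{1}{n}\sum X_i^2=\hat\sigma^2+\hat\mu^2.$$
Therefore, the MMEs of μ and σ² are
$$\hat\mu=\bar{X} \quad\text{and}\quad \hat\sigma^2=\frac{1}{n}\sum (X_i-\bar{X})^2.$$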
20
The MMEs of the population parameters can be obtained by:
1. Express the k population moments as functions of parameters
2. Replace the notation for population parameters by those of the MMEs
3. Equate the sample moments to the population moments
4. Solve the k equations to obtain the MMEs of the parameters
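The four steps can be sketched for a normal population, where the first two moments are E(X) = μ and E(X²) = σ² + μ². This is a minimal illustration; the function name `mme_normal` and the sample values are hypothetical.

```python
def mme_normal(xs):
    """Method-of-moments estimates of (mu, sigma^2) for a normal sample."""
    n = len(xs)
    m1 = sum(xs) / n                 # first sample moment
    m2 = sum(x * x for x in xs) / n  # second sample moment
    mu_hat = m1                      # solve m1 = mu
    sigma2_hat = m2 - m1 ** 2        # solve m2 = sigma^2 + mu^2
    return mu_hat, sigma2_hat

sample = [2.0, 4.0, 4.0, 6.0]  # illustrative data, not from the slides
mu_hat, sigma2_hat = mme_normal(sample)
print(mu_hat, sigma2_hat)
```

Note that the variance estimate divides by n, not n − 1, so it differs from the unbiased sample variance S².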
21
4-2 Point Estimation
Methods of obtaining an estimator:
2. Maximum Likelihood Estimator (MLE)
The likelihood function of θ, given the data x1, x2, ..., xn, is the joint distribution of X1, X2, ..., Xn:
$$L(\theta)=f(x_1,x_2,\ldots,x_n;\theta)=\prod_{i=1}^{n}f(x_i;\theta)$$
The idea of ML estimation is to find an estimator of θ which maximizes the likelihood of observing x1, x2, ..., xn.
22
4-2 Point Estimation
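2. Maximum Likelihood Estimator (MLE)
Example: Let
$$X_1, X_2, \ldots, X_n \overset{iid}{\sim} f(x,\lambda)=\lambda e^{-\lambda x},\quad x>0.$$
Then
$$L(\lambda)=f(x_1,x_2,\ldots,x_n;\lambda)=\prod_{i=1}^{n}f(x_i;\lambda)=\lambda^n e^{-\lambda\sum_i x_i}.$$
Maximizing L(λ) is equivalent to maximizing
$$l(\lambda)=\ln L(\lambda)=n\ln\lambda-\lambda\sum_i x_i.$$
We obtain the MLE by solving the first derivative equation
$$\frac{n}{\lambda}-\sum_i x_i=0,$$
hence the MLE of λ is
$$\hat\lambda=\frac{1}{\bar{X}}.$$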
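As a quick numerical check of the exponential MLE, the sketch below evaluates the log-likelihood l(λ) = n ln λ − λ Σ xᵢ at the closed-form estimate and at nearby values. The data and the function names are illustrative assumptions.

```python
import math

def log_likelihood(lam, xs):
    """Exponential log-likelihood: n*ln(lam) - lam*sum(x_i)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

def mle_exponential(xs):
    """Closed-form MLE: lam_hat = n / sum(x_i) = 1 / xbar."""
    return len(xs) / sum(xs)

xs = [0.5, 1.0, 1.5, 2.0]  # illustrative data, not from the slides
lam_hat = mle_exponential(xs)
for lam in (0.7, lam_hat, 0.9):
    print(f"lambda={lam:.2f}  log-likelihood={log_likelihood(lam, xs):.4f}")
```

The log-likelihood is concave in λ, so the value at `lam_hat` exceeds the values on either side.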
23
It can be shown that the MLEs of the mean μ and variance σ² of the normal distribution are:
$$\hat\mu=\bar{X},\qquad \hat\sigma^2=\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2$$
24
4-3 Hypothesis Testing
The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a parameter.
Examples
• Is there statistical evidence, in a random sample of potential customers, to support the hypothesis that more than p% of the potential customers will purchase a new product?
• Is a new drug effective in curing a certain disease? A sample of patients is randomly selected. Half of them are given the drug, while the other half are given a placebo. The improvement in the patients' conditions is then measured and compared.
25
4-3 Hypothesis Testing
We like to think of statistical hypothesis testing as the data analysis stage of a comparative experiment, in which the engineer is interested, for example, in comparing the mean of a population to a specified value (e.g. mean pull strength).
4-3.1 Statistical Hypotheses
26
4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
The critical concepts of hypothesis testing:
• There are two hypotheses (about a population parameter):
H0: the null hypothesis [for example, μ = 5]
H1: the alternative hypothesis [μ > 5]
• Assume the null hypothesis is true.
• Build a statistic related to the parameter hypothesized.
• Pose the question: how probable is it to obtain a statistic value at least as extreme as the one observed from the sample?
27
4-3 Hypothesis Testing
4-3.1 Statistical HypothesesThe critical concepts of hypothesis testing-Continued
• Make one of the following two decisions (based on the test):
Reject the null hypothesis in favor of the alternative hypothesis.
Do not reject the null hypothesis in favor of the alternative hypothesis.
• Two types of errors are possible when deciding whether to reject H0:
Type I error: reject H0 when it is true.
Type II error: do not reject H0 when it is false.
28
4-3 Hypothesis Testing
For example, suppose that we are interested in the burning rate of a solid propellant used to power aircrew escape systems.
• Now burning rate is a random variable that can be
described by a probability distribution.
• Suppose that our interest focuses on the mean burning
rate (a parameter of this distribution).
• Specifically, we are interested in deciding whether or
not the mean burning rate is 50 centimeters per second.
4-3.1 Statistical Hypotheses
29
4-3 Hypothesis Testing
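4-3.1 Statistical Hypotheses
Two-sided Alternative Hypothesis
$$H_0:\mu=50\ \text{cm/s},\qquad H_1:\mu\neq 50\ \text{cm/s}$$
One-sided Alternative Hypotheses
$$H_0:\mu=50,\ H_1:\mu<50 \qquad\text{or}\qquad H_0:\mu=50,\ H_1:\mu>50$$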
30
4-3 Hypothesis Testing
4-3.1 Statistical HypothesesTest of a Hypothesis • A procedure leading to a decision about a particular hypothesis
• Hypothesis-testing procedures rely on using the information in a random sample from the population of interest.
• If this information is consistent with the hypothesis, then we will conclude that the hypothesis is true; if this information is inconsistent with the hypothesis, we will conclude that the hypothesis is false.
31
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses Based on the hypotheses and the information in the sample, the sample space is divided into two parts: Reject Region and Accept Region.
32
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
The rejection region (RR) is the subset of the sample space that leads to the rejection of the null hypothesis. RR in Fig. 4-3:
$$\{\bar{x}<48.5 \ \text{or}\ \bar{x}>51.5\}$$
The complement of the rejection region is the acceptance region (AR):
$$\{48.5\le\bar{x}\le 51.5\}$$
33
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
Sometimes the type I error probability is called the significance level, or the α-error, or the size of the test.
•The Probabilities of committing errors are calculated to determine the performance of a test.
34
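$$
\begin{aligned}
\alpha &= P\{\bar{X}<48.5 \ \text{or}\ \bar{X}>51.5 \mid \mu=50\}\\
&= P\{\bar{X}<48.5\}+P\{\bar{X}>51.5\}\\
&= P\left\{Z<\frac{48.5-50}{2.5/\sqrt{10}}\right\}+P\left\{Z>\frac{51.5-50}{2.5/\sqrt{10}}\right\}\\
&= P\{Z<-1.9\}+P\{Z>1.9\}\\
&= 0.0288\times 2=0.0576
\end{aligned}
$$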
35
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
If the mean is 50, the probability of obtaining a sample mean less than 48.5 or greater than 51.5 is 0.0576.
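The type I error probability can be reproduced numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `type_i_error` is an illustrative choice, and σ = 2.5, n = 10 are the values used in the calculation above. Because the code does not round z to 1.9, the result differs from 0.0576 only slightly, in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def type_i_error(mu0, sigma, n, lower, upper):
    """P(Xbar < lower or Xbar > upper) when the true mean is mu0."""
    xbar = NormalDist(mu0, sigma / sqrt(n))  # sampling distribution of the mean
    return xbar.cdf(lower) + (1.0 - xbar.cdf(upper))

alpha = type_i_error(mu0=50.0, sigma=2.5, n=10, lower=48.5, upper=51.5)
print(f"alpha = {alpha:.4f}")
```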
36
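$$\beta=P\{48.5\le\bar{X}\le 51.5 \mid H_1 \text{ is true}\}$$
Since the alternative hypothesis is composite, there are many distributions in the corresponding subset of the parameter space.
Therefore, the probability of committing a type II error depends on the distribution chosen to calculate β. For μ = 52:
$$
\begin{aligned}
\beta &= P\{48.5\le\bar{X}\le 51.5 \mid \mu=52\}\\
&= P\left\{\frac{48.5-52}{2.5/\sqrt{10}}\le Z\le\frac{51.5-52}{2.5/\sqrt{10}}\right\}\\
&= P\{-4.43\le Z\le -0.63\}\\
&= P\{Z\le -0.63\}-P\{Z\le -4.43\}=0.2643
\end{aligned}
$$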
37
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
If the mean is actually 52, the probability of falsely accepting H0: μ = 50 is 0.2643.
38
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
Similarly, the probability of falsely accepting H0: μ = 50 when μ = 50.5 is 0.8923, which is much higher than in the previous case.
It is harder to detect the difference if the two distributions are close.
39
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
The probabilities of committing errors also depend on the sample size n, the amount of information.
n = 10: β = 0.2643
n = 16: β = 0.2119
β decreases as n increases.
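The dependence of β on n can be checked directly. The sketch below keeps the acceptance region 48.5 ≤ x̄ ≤ 51.5 fixed, assumes σ = 2.5 and a true mean of 52 as in the propellant example, and uses an illustrative helper named `type_ii_error`. It works with unrounded z-values, so the n = 10 result may differ from 0.2643 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu_true, sigma, n, lower, upper):
    """P(lower <= Xbar <= upper) when the true mean is mu_true."""
    xbar = NormalDist(mu_true, sigma / sqrt(n))
    return xbar.cdf(upper) - xbar.cdf(lower)

beta_n10 = type_ii_error(52.0, 2.5, 10, 48.5, 51.5)
beta_n16 = type_ii_error(52.0, 2.5, 16, 48.5, 51.5)
power_n10 = 1.0 - beta_n10  # probability of correctly rejecting H0
print(f"n=10: beta={beta_n10:.4f}, power={power_n10:.4f}")
print(f"n=16: beta={beta_n16:.4f}")
```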
40
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
1. α can be decreased by making the AR larger, at the price that β will increase.
2. The probability of type II error, β, is a function of the population mean μ.
3. The probabilities of committing both types of error can be reduced at the same time only by increasing the sample size.
41
4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
• The power is computed as 1 − β, and power can be interpreted as the probability of correctly rejecting a false null hypothesis. We often compare statistical tests by comparing their power properties.
• For example, consider the propellant burning rate problem when we are testing H0: μ = 50 centimeters per second against H1: μ ≠ 50 centimeters per second. Suppose that the true value of the mean is μ = 52. When n = 10, we found that β = 0.2643, so the power of this test is 1 − β = 1 − 0.2643 = 0.7357 when μ = 52.
42
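Example #4-19 (p157)
$$
\begin{aligned}
\text{a)}\quad \alpha &= P(\bar{X}>185 \mid \mu=175)=P\left(\frac{\bar{X}-175}{20/\sqrt{10}}>\frac{185-175}{20/\sqrt{10}}\right)\\
&= P(Z>1.58)=1-P(Z\le 1.58)=1-0.94295=0.057
\end{aligned}
$$
$$
\begin{aligned}
\text{b)}\quad \beta &= P(\bar{X}\le 185 \mid \mu=200)=P\left(\frac{\bar{X}-200}{20/\sqrt{10}}\le\frac{185-200}{20/\sqrt{10}}\right)\\
&= P(Z\le -2.37)=0.00889
\end{aligned}
$$
c) 1 − β = 1 − 0.00889 = 0.99111 (the power of the test when the mean is 200)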
43
Example #4-20 (p157)
a) Reject the null hypothesis and conclude that the mean foam height is greater than 175 mm.
b)
$$P(\bar{X}\ge 190 \mid \mu=175)=P\left(Z\ge\frac{190-175}{20/\sqrt{10}}\right)=P(Z\ge 2.4)=0.0082$$
The probability that a value of at least 190 mm would be observed (if the true mean height is 175 mm) is only 0.0082. Thus, the sample value of x̄ = 190 mm would be an unusual result.
44
4-3 Hypothesis Testing
4-3.3 P-Values in Hypothesis Testing
45
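Example #4-21 (p157). Using n = 16:
$$\text{a)}\quad \alpha=P(\bar{X}>185 \mid \mu=175)=P(Z>2)=0.0228$$
$$\text{b)}\quad \beta=P(\bar{X}\le 185 \mid \mu=200)=P(Z\le -3)=0.00135$$
$$\text{c)}\quad 1-\beta=1-0.00135=0.99865$$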
46
Example #4-22(a) (p157). n = 16:
$$\text{a)}\quad 0.0571=P(\bar{X}>c \mid \mu=175)=P\left(Z>\frac{c-175}{20/\sqrt{16}}\right)$$
$$c=175+1.58\,\frac{20}{\sqrt{16}}=182.9$$
47
4-3 Hypothesis Testing
4-3.4 One-Sided and Two-Sided Hypotheses
Two-Sided Test:
One-Sided Tests:
48
4-3 Hypothesis Testing
4-3.5 General Procedure for Hypothesis Testing
49
4-4 Inference on the Mean of a Population, Variance Known
Assumptions
50
4-4 Inference on the Mean of a Population, Variance Known
4-4.1 Hypothesis Testing on the Mean
We wish to test:
The test statistic is:
51
• The analyst selects critical values according to a preassigned α. The rejection of H0 is a “strong conclusion”.
• β depends on the sample size and the true value of the parameter. “Fail to reject H0” is a “weak conclusion”. To accept H0, we need more evidence.
• The critical values are determined by
2-sided hypothesis:
$$P\{-z_{\alpha/2}\le Z_0\le z_{\alpha/2}\}=1-\alpha$$
1-sided hypothesis:
$$P\{Z_0>z_\alpha\}=\alpha, \quad\text{or}\quad P\{Z_0<-z_\alpha\}=\alpha$$
4-4 Inference on the Mean of a Population, Variance Known
52
4-4 Inference on the Mean of a Population, Variance Known
4-4.1 Hypothesis Testing on the Mean
Reject H0 if the observed value of the test statistic z0 is either:
$$z_0>z_{\alpha/2} \quad\text{or}\quad z_0<-z_{\alpha/2}$$
Fail to reject H0 if
$$-z_{\alpha/2}<z_0<z_{\alpha/2}$$
where z0 is the observed value of Z0.
53
4-4 Inference on the Mean of a Population, Variance Known
4-4.1 Hypothesis Testing on the Mean
Reject H0 if the observed value of the test statistic z0 is either:
or
Fail to reject H0 if
54
4-4 Inference on the Mean of a Population, Variance Known
4-4.1 Hypothesis Testing on the Mean
55
4-4 Inference on the Mean of a Population, Variance Known
4-4.1 Hypothesis Testing on the Mean
56
Example #4-33 (p173)
a)
1) The parameter of interest is the true mean yield, μ.
2) H0: μ = 90
3) H1: μ ≠ 90
4) α = 0.05
5) Test statistic:
$$z_0=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$$
6) Reject H0 if z0 < −z_{α/2}, where −z_{0.025} = −1.96, or z0 > z_{α/2}, where z_{0.025} = 1.96.
7) x̄ = 90.48, σ = 3:
$$z_0=\frac{90.48-90}{3/\sqrt{5}}=0.36$$
8) Since −1.96 < 0.36 < 1.96, do not reject H0 and conclude the yield is not significantly different from 90% at α = 0.05.
57
4-4 Inference on the Mean of a Population, Variance Known
4-4.1 P-Values in Hypothesis Testing
58
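P-value for each alternative hypothesis:
2-sided, H1: μ ≠ μ0:
$$\text{P-value}=P\{Z_0>|z_0| \ \text{or}\ Z_0<-|z_0|\}=2[1-\Phi(|z_0|)]$$
1-sided, H1: μ > μ0:
$$\text{P-value}=P\{Z_0>z_0\}=1-\Phi(z_0)$$
1-sided, H1: μ < μ0:
$$\text{P-value}=P\{Z_0<z_0\}=\Phi(z_0)$$
In each case, P-value = P{reject H0 if the critical value is z0 | μ = μ0}.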
59
•The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.
•The p-value provides information about the amount of statistical evidence that supports the alternative hypothesis.
•We can conclude that the smaller the p-value the more statistical evidence exists to support the alternative hypothesis.
60
4-4 Inference on the Mean of a Population, Variance Known
4-4.2 P-Values in Hypothesis Testing
61
Example #4-33 (p173)-continued
a)
$$\text{P-value}=P\{|Z_0|>0.36\}=2[1-\Phi(0.36)]=2[1-0.64058]=0.7188$$
Interpreting the p-value
Because the probability that the absolute value of the standardized test statistic exceeds 0.36 when μ = 90 is so large (0.7188), the data provide no real evidence against the null hypothesis.
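The two-sided z-test of Example 4-33 can be sketched in a few lines. The function name `z_test_two_sided` is illustrative, and the inputs (x̄ = 90.48, μ0 = 90, σ = 3, n = 5) are taken from the example. Since the code does not round z0 to 0.36 before looking up Φ, the p-value differs from 0.7188 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(xbar, mu0, sigma, n):
    """Return (z0, p_value) for H0: mu = mu0 vs H1: mu != mu0, sigma known."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z0)))
    return z0, p_value

z0, p_value = z_test_two_sided(xbar=90.48, mu0=90.0, sigma=3.0, n=5)
reject = p_value < 0.05  # decision at alpha = 0.05
print(f"z0={z0:.3f}  p-value={p_value:.4f}  reject H0: {reject}")
```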
62
The p-value and rejection region methods
The p-value can be used when making decisions based on rejection region methods as follows:
• Define the hypotheses to test, and the required significance level α.
• Perform the sampling procedure, calculate the test statistic and the p-value associated with it.
• Compare the p-value to α. Reject the null hypothesis only if p < α; otherwise, do not reject the null hypothesis.
63
Conclusions of a test of Hypothesis
•If we reject the null hypothesis, we conclude that there is enough evidence to infer that the alternative hypothesis is true.
•If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true.
64
4-4 Inference on the Mean of a Population, Variance Known
4-4.3 Type II Error and Choice of Sample Size
Finding The Probability of Type II Error
65
4-4 Inference on the Mean of a Population, Variance Known
4-4.2 Type II Error and Choice of Sample Size
Finding The Probability of Type II Error
66
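For a 2-sided test, recall
$$Z_0=\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}.$$
When the true mean is μ = μ0 + δ,
$$
\begin{aligned}
\beta &= P\{-z_{\alpha/2}\le Z_0\le z_{\alpha/2} \mid \mu=\mu_0+\delta\}\\
&= P\left\{-z_{\alpha/2}-\frac{\delta\sqrt{n}}{\sigma}\le\frac{\bar{X}-(\mu_0+\delta)}{\sigma/\sqrt{n}}\le z_{\alpha/2}-\frac{\delta\sqrt{n}}{\sigma}\right\}\\
&= \Phi\left(z_{\alpha/2}-\frac{\delta\sqrt{n}}{\sigma}\right)-\Phi\left(-z_{\alpha/2}-\frac{\delta\sqrt{n}}{\sigma}\right)
\end{aligned}
$$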
67
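If δ > 0, the second term can be neglected:
$$\beta\approx\Phi\left(z_{\alpha/2}-\frac{\delta\sqrt{n}}{\sigma}\right),\qquad -z_\beta\approx z_{\alpha/2}-\frac{\delta\sqrt{n}}{\sigma}$$
Solving for n, we have
$$n\approx\frac{(z_{\alpha/2}+z_\beta)^2\sigma^2}{\delta^2}$$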
68
4-4 Inference on the Mean of a Population, Variance Known
4-4.2 Type II Error and Choice of Sample Size
Sample Size Formulas
69
4-4 Inference on the Mean of a Population, Variance Known
4-4.2 Type II Error and Choice of Sample Size
Sample Size Formulas
70
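For a 1-sided test with H1: μ > μ0 and true mean μ = μ0 + δ:
$$
\begin{aligned}
\beta &= P\{Z_0\le z_\alpha \mid \mu=\mu_0+\delta\}\\
&= P\left\{\frac{\bar{X}-(\mu_0+\delta)}{\sigma/\sqrt{n}}\le z_\alpha-\frac{\delta\sqrt{n}}{\sigma}\right\}\\
&= \Phi\left(z_\alpha-\frac{\delta\sqrt{n}}{\sigma}\right)
\end{aligned}
$$
$$-z_\beta=z_\alpha-\frac{\delta\sqrt{n}}{\sigma}$$
We have
$$n=\frac{(z_\alpha+z_\beta)^2\sigma^2}{\delta^2}$$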
71
4-4 Inference on the Mean of a Population, Variance Known4-4.2 Type II Error and Choice of Sample Size
72
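Example #4-33 (p173), continued: μ = 85, α = 0.05, β = 0.05
b)
$$n=\frac{(z_{\alpha/2}+z_\beta)^2\sigma^2}{\delta^2}=\frac{(z_{0.025}+z_{0.05})^2\,3^2}{(85-90)^2}=\frac{(1.96+1.65)^2(9)}{(-5)^2}=4.69,\qquad n\cong 5.$$
c) If n = 5 and μ = 92,
$$
\begin{aligned}
\beta &= \Phi\left(z_{0.025}-\frac{(92-90)\sqrt{5}}{3}\right)-\Phi\left(-z_{0.025}-\frac{(92-90)\sqrt{5}}{3}\right)\\
&= \Phi(1.96-1.49)-\Phi(-1.96-1.49)\\
&= \Phi(0.47)-\Phi(-3.45)=0.680822-0.000280=0.680542
\end{aligned}
$$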
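The sample size and type II error calculations for a two-sided z-test can be sketched as follows. The function names are illustrative; the inputs (σ = 3, α = 0.05, β = 0.05, δ = −5 for part b; δ = 2, n = 5 for part c) come from the example. The code uses unrounded normal quantiles, so intermediate values differ slightly from hand calculations with 1.96 and 1.65.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_size_two_sided(alpha, beta, sigma, delta):
    """Approximate n = (z_{alpha/2} + z_beta)^2 sigma^2 / delta^2 (before rounding up)."""
    z_half_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = STD_NORMAL.inv_cdf(1.0 - beta)
    return ((z_half_alpha + z_beta) * sigma / delta) ** 2

def beta_two_sided(alpha, sigma, n, delta):
    """Type II error of the two-sided z-test when the true mean is mu0 + delta."""
    z_half_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = delta * sqrt(n) / sigma
    return STD_NORMAL.cdf(z_half_alpha - shift) - STD_NORMAL.cdf(-z_half_alpha - shift)

n_raw = sample_size_two_sided(alpha=0.05, beta=0.05, sigma=3.0, delta=85.0 - 90.0)
n_required = ceil(n_raw)  # always round the sample size up
beta_at_92 = beta_two_sided(alpha=0.05, sigma=3.0, n=5, delta=92.0 - 90.0)
print(f"n_raw={n_raw:.2f}  n={n_required}  beta(mu=92)={beta_at_92:.4f}")
```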
73
4-4 Inference on the Mean of a Population, Variance Known4-4.2 Type II Error and Choice of Sample Size
74
4-4 Inference on the Mean of a Population, Variance Known
4-4.3 Large Sample Test
• We assume that σ2 is known
• In most practical situation, σ2 is unknown
75
4-4 Inference on the Mean of a Population, Variance Known
4-4.3 Large Sample Test
In general, if n ≥ 30, according to the Central Limit Theorem,
$$\bar{X}_n \overset{approx}{\sim} N\left(\mu,\frac{\sigma^2}{n}\right),$$
the sample variance s² will be close to σ² for most samples, and so s can be substituted for σ in the test procedures with little harmful effect.
76
4-4 Inference on the Mean of a Population, Variance Known
4-4.4 Some Practical Comments on Hypothesis Testing
The Seven-Step Procedure
Only three steps are really required:
77
4-4 Inference on the Mean of a Population, Variance Known
4-4.4 Some Practical Comments on Hypothesis Testing
Statistical versus Practical Significance
78
4-4 Inference on the Mean of a Population, Variance Known
4-4.4 Some Practical Comments on Hypothesis Testing
Statistical versus Practical Significance
If the sample size becomes very large, it is almost certain that the null hypothesis will be rejected, leading to the conclusion that the true mean is not equal to μ0, even when the difference is too small to matter in practice.
79
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
80
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Two-sided confidence interval:
One-sided confidence intervals:
Confidence coefficient:
81
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
82
4-4 Inference on the Mean of a Population, Variance Known
How is an interval estimator produced from a sampling distribution?
To estimate μ, a sample of size n is drawn from the population, and its mean X̄ is calculated. Under certain conditions, X̄ is normally distributed (or approximately normally distributed), thus
$$Z=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)$$
4-4.5 Confidence Interval on the Mean
83
4-4 Inference on the Mean of a Population, Variance Known
4-4.6 Confidence Interval on the Mean
84
– We know that
$$P\left(\mu-z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\le\bar{x}\le\mu+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)=1-\alpha$$
– This leads to the relationship
$$P\left(\bar{x}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\le\mu\le\bar{x}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)=1-\alpha$$
– A fraction 1 − α of all the values of x̄ obtained in repeated sampling from this distribution construct an interval
$$\left[\bar{x}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
that includes (covers) the expected value of the population.
85
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
86
Example
Discuss Example 4-5
87
Example #4-33(p173)–continued
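(d) Find a 95% two-sided CI on the true mean yield. For α = 0.05, z_{α/2} = z_{0.025} = 1.96 and x̄ = 90.48:
$$l=\bar{x}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}}=90.48-1.96\,\frac{3}{\sqrt{5}}=87.85$$
$$u=\bar{x}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}=90.48+1.96\,\frac{3}{\sqrt{5}}=93.11$$
Therefore, 87.85 ≤ μ ≤ 93.11 is the 95% CI on the true mean yield. With 95% confidence, we believe the true mean yield of the chemical process is between 87.85% and 93.11%.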
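The interval in part (d) can be reproduced with a short sketch. The helper name `z_confidence_interval` is illustrative; the inputs x̄ = 90.48, σ = 3, n = 5 are those of Example 4-33.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided CI on the mean when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

lower, upper = z_confidence_interval(xbar=90.48, sigma=3.0, n=5)
contains_mu0 = lower <= 90.0 <= upper  # equivalent to not rejecting H0: mu = 90
print(f"95% CI: [{lower:.2f}, {upper:.2f}]  contains 90: {contains_mu0}")
```

Checking whether the hypothesized value lies inside the interval gives the same decision as the two-sided test at the matching significance level.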
88
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Relationship between Tests of Hypotheses and Confidence Intervals
If [l, u] is a 100(1 − α) percent confidence interval for the parameter θ, then the test of significance level α of the hypothesis
$$H_0:\theta=\theta_0,\qquad H_1:\theta\neq\theta_0$$
will lead to rejection of H0 if and only if the hypothesized value θ0 is not in the 100(1 − α) percent confidence interval [l, u].
89
Interval estimators can be used to test hypotheses.
Calculate the 1 − α confidence level interval estimator, then
if the hypothesized parameter value falls within the interval, do not reject the null hypothesis,
if the hypothesized parameter value falls outside the interval, conclude that the null hypothesis can be rejected (μ is not equal to the hypothesized value).
90
Example #4.33(p.173)-continued
(e) Use the CI found in (d) to test the hypothesis. The 95% CI on the true mean yield is [87.85,93.11]. The null hypothesis cannot be rejected since the hypothesized value, 90%, lies within this interval.
91
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Confidence Level and Precision of Estimation
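The length of the two-sided 95% confidence interval is
$$2\left(1.96\,\sigma/\sqrt{n}\right)=3.92\,\sigma/\sqrt{n}$$
whereas the length of the two-sided 99% confidence interval is
$$2\left(2.58\,\sigma/\sqrt{n}\right)=5.16\,\sigma/\sqrt{n}$$
• Because the 99% confidence interval is wider, it is more likely to include the value of μ.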
92
Interpreting the interval estimate
It is wrong to state that the interval estimator is an interval for which there is a 1 − α chance that the population mean lies between l and u. This is so because μ is a parameter, not a random variable.
Note that
$$l=\bar{x}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad u=\bar{x}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
are random variables.
Thus, it is correct to state that there is a 1 − α chance that l will be less than μ and u will be greater than μ.
93
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Choice of Sample Size
The width of the interval estimate is a function of:the population standard deviation, the confidence level, and the sample size.
We can control the width of the interval estimate by changing the sample size.
Thus, we determine the interval width first, and derive the required sample size.
94
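The phrase “estimate the mean to within E units” translates to an interval estimate of the form x̄ ± E, where
$$E=z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Solving for n:
$$n=\left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$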
95
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Choice of Sample Size
96
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Choice of Sample Size
97
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Choice of Sample Size
98
4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
One-Sided Confidence Bounds
99
Example #4.36(p174)
a)
1) The parameter of interest is the true mean life, μ.
2) H0: μ = 540
3) H1: μ > 540
4) α = 0.05
5) Test statistic:
$$z_0=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$$
6) Reject H0 if z0 > z_α, where z_{0.05} = 1.65.
7) x̄ = 551.33, σ = 20:
$$z_0=\frac{551.33-540}{20/\sqrt{15}}=2.19$$
8) Since 2.19 > 1.65, reject the null hypothesis and conclude there is sufficient evidence to support the claim that the life exceeds 540 hrs at α = 0.05.
100
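b) P-value = P(Z > 2.19) = 1 − P(Z ≤ 2.19) = 1 − Φ(2.19) = 0.0143
c)
$$\beta=\Phi\left(z_{0.05}-\frac{(560-540)\sqrt{15}}{20}\right)=\Phi(1.65-3.873)=\Phi(-2.223)=0.0131$$
d)
$$n=\frac{(z_\alpha+z_\beta)^2\sigma^2}{\delta^2}=\frac{(z_{0.05}+z_{0.10})^2\sigma^2}{(560-540)^2}=\frac{(1.65+1.29)^2(20)^2}{(20)^2}=8.644,\qquad n\cong 9.$$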
101
e) 95% lower confidence bound:
$$\bar{x}-z_{0.05}\frac{\sigma}{\sqrt{n}}\le\mu$$
$$551.33-1.645\,\frac{20}{\sqrt{15}}\le\mu$$
$$542.83\le\mu$$
With 95% confidence, the true mean life is at least 542.83 hrs.
f) Since 540 does not fall within this interval, we can reject the null hypothesis in favor of the alternative.
102
4-4 Inference on the Mean of a Population, Variance Known
4-4.6 General Method for Deriving a Confidence Interval
103
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean
When σ is unknown, we will replace it by an estimator of it, S:
$$T_0=\frac{\bar{X}-\mu_0}{S/\sqrt{n}}\sim t(n-1)$$
Since S is a random variable, that increases the variation in T0.
104
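Theorem Let X1, X2, ..., Xn be a random sample of size n from N(μ, σ²). Then
(1)
$$\bar{X}\sim N\left(\mu,\frac{\sigma^2}{n}\right)$$
(2)
$$\frac{\sum_i (X_i-\bar{X})^2}{\sigma^2}=\frac{(n-1)S^2}{\sigma^2}\sim\chi^2(n-1)$$
(3) X̄ and S² are independent.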
105
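Definition Let Z ~ N(0, 1) and Y ~ χ²(k) be independent. Then the random variable defined by
$$T=\frac{Z}{\sqrt{Y/k}}$$
has the Student’s t-distribution with k degrees of freedom.
• The t probability density function is
$$f(x)=\frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\,\Gamma(k/2)}\cdot\frac{1}{\left[(x^2/k)+1\right]^{(k+1)/2}},\qquad -\infty<x<\infty$$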
106
T0 can be rewritten as
$$T_0=\frac{\bar{X}-\mu_0}{S/\sqrt{n}}=\frac{\dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}}$$
and we have the following result.
107
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean
108
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean
109
Example
P(T(1) > 0.325) = 0.4
P(T(10) > 1.812) = 0.05
P(−1.812 < T(10) < 1.812) = 0.9
P(T(4) < 2.776) = 1 − 0.025 = 0.975
P(T(4) < −2.776 or T(4) > 2.776) = 0.05
110
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean
Calculating the P-value
111
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean
112
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean
113
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean, Example 4-7
• Discuss Example 4-7
• Parameter of interest: the mean coefficient of restitution, μ
• Null hypothesis, H0: μ = 0.82
• Alternative hypothesis, H1: μ > 0.82
• Test statistic:
$$t_0=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$
114
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean
115
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean, Example 4-7
• Reject H0 if the P-value < 0.05
• Computations: x̄ = 0.83725, t0 = 2.72, P-value = P(T(14) > 2.72) = 0.008297
• Conclusion: reject H0
Example #4.50 (p.185) A particular brand of diet margarine was analyzed to determine the level of polyunsaturated fatty acid. A sample of 6 packages resulted in the following data: 16.8, 17.2, 16.9, 17.4, 16.5, 17.1.
[Figure: normal probability plot of the fatty acid data (x-axis: Fatty, y-axis: Probability). Average: 16.9833, StDev: 0.318852, N: 6. Anderson-Darling normality test: A-Squared = 0.143, P-value = 0.934.]
The normality assumption appears to be satisfied.
117
a)
1) The parameter of interest is the true mean level of polyunsaturated fatty acid, μ.
2) H0: μ = 17
3) H1: μ ≠ 17
4) α = 0.01
5) Test statistic:
$$t_0=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$
6) Reject H0 if t0 < −t_{α/2,n−1}, where −t_{0.005,5} = −4.032, or t0 > t_{α/2,n−1}, where t_{0.005,5} = 4.032.
7) x̄ = 16.98, s = 0.319, n = 6:
$$t_0=\frac{16.98-17}{0.319/\sqrt{6}}=-0.154$$
8) Since −4.032 < −0.154 < 4.032, do not reject the null hypothesis and conclude the true mean level is not significantly different from 17% at α = 0.01.
118
P-value = 2P(t > 0.154): for 5 degrees of freedom we obtain 2(0.40) < P-value, that is, 0.80 < P-value.
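The t statistic can also be computed directly from the six raw observations. This is a minimal sketch with the standard library only; because Python's standard library has no t-distribution, the critical value 4.032 is taken from the table value quoted in the example rather than computed. At full precision x̄ = 16.9833, so t0 comes out near −0.13 rather than the −0.154 obtained from the rounded x̄ = 16.98; the conclusion is unchanged.

```python
from math import sqrt
from statistics import mean, stdev

data = [16.8, 17.2, 16.9, 17.4, 16.5, 17.1]  # fatty acid levels from the example
mu0 = 17.0
t_critical = 4.032  # t_{0.005,5}, table value quoted in the example

xbar = mean(data)
s = stdev(data)  # sample standard deviation (divides by n - 1)
t0 = (xbar - mu0) / (s / sqrt(len(data)))
reject = abs(t0) > t_critical
print(f"xbar={xbar:.4f}  s={s:.6f}  t0={t0:.3f}  reject H0: {reject}")
```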
119
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean
120
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.2 Type II Error and Choice of Sample Size
When the true mean is μ = μ0 + δ, T0 has a noncentral t-distribution, so finding β requires integrating that distribution.
Fortunately, this unpleasant task has already been done, and the results are summarized in a series of graphs in Appendix A Charts Va, Vb, Vc, and Vd that plot β for the t-test against a parameter d for various sample sizes n.
121
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.2 Type II Error and Choice of Sample Size
These graphics are called operating characteristic (or OC) curves. Curves are provided for two-sided alternatives on Charts Va and Vb. The abscissa scale factor d on these charts is defined as
$$d=\frac{|\mu-\mu_0|}{\sigma}=\frac{|\delta|}{\sigma}$$
122
b) Using the OC curves on Chart Vb with d = 1.567, a sample size of n = 10 gives β ≅ 0.1. Therefore, the current sample size of 6 is inadequate.
123
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.3 Confidence Interval on the Mean
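$$P(-t_{\alpha/2,n-1}\le T\le t_{\alpha/2,n-1})=1-\alpha$$
$$P\left(-t_{\alpha/2,n-1}\le\frac{\bar{X}-\mu}{S/\sqrt{n}}\le t_{\alpha/2,n-1}\right)=1-\alpha$$
Rearrange the above equation:
$$P\left(\bar{X}-t_{\alpha/2,n-1}\,S/\sqrt{n}\le\mu\le\bar{X}+t_{\alpha/2,n-1}\,S/\sqrt{n}\right)=1-\alpha$$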
124
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.3 Confidence Interval on the Mean
125
c) For α = 0.01 and n = 6, t_{0.005,5} = 4.032:
$$16.98-4.032\,\frac{0.319}{\sqrt{6}}\le\mu\le 16.98+4.032\,\frac{0.319}{\sqrt{6}}$$
We have 16.455 ≤ μ ≤ 17.505. With 99% confidence, we believe the true mean level of polyunsaturated fatty acid is between 16.455% and 17.505%.
126
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.3 Confidence Interval on the Mean
127
4-5 Inference on the Mean of a Population, Variance Unknown
4-5.4 Confidence Interval on the Mean
128
4-6 Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population
Test Statistic:
129
4-6 Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population
130
4-6 Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population
131
4-6 Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population
Chi-squared distribution is a skewed distribution.
132
4-6 Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population
133
4-6 Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population
134
4-6 Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population
Discuss Example 4-10
135
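4-6 Inference on the Variance of a Normal Population
4-6.2 Confidence Interval on the Variance of a Normal Population
$$P\left(\chi^2_{1-\alpha/2,n-1}\le\frac{(n-1)S^2}{\sigma^2}\le\chi^2_{\alpha/2,n-1}\right)=1-\alpha$$
Rearrange the above equation:
$$P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}}\le\sigma^2\le\frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right)=1-\alpha$$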
136
4-6 Inference on the Variance of a Normal Population
4-6.2 Confidence Interval on the Variance of a Normal Population
137
4-6 Inference on the Variance of a Normal Population
4-6.2 Confidence Interval on the Variance of a Normal Population
Discuss Example 4-11
138
Example 4-59 (p.191) n = 15, s = 0.016 mm
(a) In order to use the χ² statistic in hypothesis testing and confidence interval construction, we need to assume that the underlying distribution is normal.
1) The parameter of interest is the true standard deviation of the diameter, σ.
2) H0: σ² = 0.0004 vs. H1: σ² > 0.0004
3) α = 0.05
4) Test statistic:
$$\chi_0^2=\frac{(n-1)s^2}{\sigma_0^2}=\frac{14(0.016)^2}{(0.02)^2}=8.96$$
5) Reject H0 if χ0² > χ²_{0.05,14} = 23.68.
6) Since 8.96 < 23.68, do not reject H0 and conclude there is insufficient evidence to indicate the true standard deviation of the diameter exceeds 0.02 at α = 0.05.
139
P-value = P(χ² > 8.96) for 14 degrees of freedom: 0.5 < P-value < 0.9
b) 95% lower confidence bound on σ². For α = 0.05 and n = 15, χ²_{α,n−1} = χ²_{0.05,14} = 23.68.
The 95% lower confidence bound on σ² is
$$\frac{14(0.016)^2}{23.68}\le\sigma^2,\qquad 0.00015\le\sigma^2$$
c) Based on the lower confidence bound, we cannot reject the null hypothesis.
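The arithmetic of this example can be checked with a few lines. This is a sketch only: the chi-square critical value 23.68 is the table value quoted in the example (the standard library has no chi-square quantile function), and the variable names are illustrative.

```python
n = 15
s = 0.016            # sample standard deviation (mm)
sigma0_sq = 0.0004   # hypothesized variance, i.e. sigma0 = 0.02
chi2_critical = 23.68  # chi-square_{0.05,14}, table value quoted in the example

chi2_0 = (n - 1) * s ** 2 / sigma0_sq            # test statistic
lower_bound = (n - 1) * s ** 2 / chi2_critical   # 95% lower bound on sigma^2
reject = chi2_0 > chi2_critical
print(f"chi2_0={chi2_0:.2f}  lower bound on sigma^2={lower_bound:.5f}  reject H0: {reject}")
```

The hypothesized variance 0.0004 lies above the lower bound, which agrees with the decision not to reject H0.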
140
4-7 Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
We will consider testing:
141
4-7 Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
142
4-7 Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
143
4-7 Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
144
4-7 Inference on Population Proportion
4-7.2 Type II Error and Choice of Sample Size
145
4-7 Inference on Population Proportion
4-7.2 Type II Error and Choice of Sample Size
146
4-7 Inference on Population Proportion
4-7.2 Type II Error and Choice of Sample Size
Discuss Example 4-13
147
Example #4-75 (p201) In the process of manufacturing lenses, a machine is qualified if the percentage of polished lenses that contain surface defects is less than 4%. A random sample of 300 lenses contains 11 defective lenses.
(a) Formulate and test the hypotheses at α = 0.05.
1) The parameter of interest is the true percentage of polished lenses that contain surface defects, p.
2) H0: p = 0.04 vs. H1: p < 0.04
3) α = 0.05
4) Test statistic:
$$z_0=\frac{x-np_0}{\sqrt{np_0(1-p_0)}}=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$$
5) Reject H0 if z0 < −z_α, where −z_{0.05} = −1.645.
6) Computation:
$$z_0=\frac{11-300(0.04)}{\sqrt{300(0.04)(0.96)}}=-0.295$$
7) Since −0.295 > −1.645, do not reject the null hypothesis and conclude the machine cannot be qualified at the 0.05 level of significance.
148
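b) P-value = Φ(−0.295) = 1 − 0.614 = 0.386
c)
$$
\begin{aligned}
\beta(0.02) &= P\left(Z>\frac{p_0-p-z_\alpha\sqrt{p_0(1-p_0)/n}}{\sqrt{p(1-p)/n}}\right)
=1-\Phi\left(\frac{p_0-p-z_\alpha\sqrt{p_0(1-p_0)/n}}{\sqrt{p(1-p)/n}}\right)\\
&= 1-\Phi\left(\frac{0.04-0.02-1.645\sqrt{0.04(0.96)/300}}{\sqrt{0.02(0.98)/300}}\right)\\
&= 1-\Phi(0.1648)=0.4345
\end{aligned}
$$
d) Using Equation (4-69):
$$n=\left[\frac{z_\alpha\sqrt{p_0(1-p_0)}+z_\beta\sqrt{p(1-p)}}{p-p_0}\right]^2=\left[\frac{1.645\sqrt{0.04(0.96)}+1.645\sqrt{0.02(0.98)}}{0.02-0.04}\right]^2=763.6$$
Take n = 764.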
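The test statistic, P-value, and sample size in this example can be sketched as follows. The function names are illustrative; the inputs (x = 11, n = 300, p0 = 0.04, p = 0.02, α = β = 0.05) come from the example. The code uses unrounded quantiles, so the P-value may differ from 0.386 in the third decimal place.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def proportion_z_lower_tail(x, n, p0):
    """z0 and P-value for H0: p = p0 vs H1: p < p0 (normal approximation)."""
    z0 = (x - n * p0) / sqrt(n * p0 * (1.0 - p0))
    return z0, STD_NORMAL.cdf(z0)

def proportion_sample_size_one_sided(alpha, beta, p0, p):
    """n = [(z_alpha*sqrt(p0(1-p0)) + z_beta*sqrt(p(1-p))) / (p - p0)]^2, before rounding up."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1.0 - beta)
    numerator = z_alpha * sqrt(p0 * (1.0 - p0)) + z_beta * sqrt(p * (1.0 - p))
    return (numerator / (p - p0)) ** 2

z0, p_value = proportion_z_lower_tail(x=11, n=300, p0=0.04)
n_required = ceil(proportion_sample_size_one_sided(0.05, 0.05, p0=0.04, p=0.02))
print(f"z0={z0:.3f}  p-value={p_value:.3f}  required n={n_required}")
```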
149
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
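$$P(-z_{\alpha/2}\le Z\le z_{\alpha/2})=1-\alpha$$
$$P\left(-z_{\alpha/2}\le\frac{\hat{P}-p}{\sqrt{p(1-p)/n}}\le z_{\alpha/2}\right)=1-\alpha$$
Rearrange the above equation:
$$P\left(\hat{P}-z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\le p\le\hat{P}+z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right)=1-\alpha$$
where
$$\hat{P}=\frac{\sum_i X_i}{n}$$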
150
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
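But we don’t know p! (Cannot construct the CI!)
Idea: Use P̂ instead of p!
$$P\left(\hat{P}-z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\le p\le\hat{P}+z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\right)\approx 1-\alpha$$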
151
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
152
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
153
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial ProportionChoice of Sample Size
• P̂ is the point estimator of p
• E = | P̂ − p |
154
4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial ProportionChoice of Sample Size
p(1 − p) ≤ 1/4
155
Example
$$E=|\hat{p}-p|=0.03,\qquad z_{\alpha/2}=z_{0.05/2}=1.96$$
$$n=\frac{z_{\alpha/2}^2\,p(1-p)}{E^2}\le\frac{1.96^2}{4(0.03)^2}=1067.11$$
Round up to n = 1068
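The conservative sample size, which uses the bound p(1 − p) ≤ 1/4, can be sketched as below. The function name `proportion_sample_size_conservative` is illustrative; E = 0.03 and 95% confidence are the values from the example.

```python
from math import ceil
from statistics import NormalDist

def proportion_sample_size_conservative(error, confidence=0.95):
    """Upper bound on n for estimating p to within `error`, using p(1-p) <= 1/4."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    return z ** 2 * 0.25 / error ** 2

n_raw = proportion_sample_size_conservative(error=0.03)
n_required = ceil(n_raw)  # always round the sample size up
print(f"n_raw={n_raw:.2f}  n={n_required}")
```

If a prior estimate of p is available, replacing 0.25 with p(1 − p) gives a smaller required sample.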
156
• In some situation, we are interested in predicting a future observation of a random variable
• Find a range of likely values for the variable associated with making the prediction
• Given X1,..,Xn, to predict Xn+1 [?,?]
4-8 Other Interval Estimates for a Single Sample4-8.1 Prediction Interval
157
4-8 Other Interval Estimates for a Single Sample4-8.1 Prediction Interval
To predict the value of a single future observation, X_{n+1}:
The point estimator is X̄.
The expected value of the prediction error:
$$E(X_{n+1}-\bar{X})=0$$
The variance of the prediction error:
$$V(X_{n+1}-\bar{X})=\sigma^2\left(1+\frac{1}{n}\right)$$
When σ² is unknown,
$$T=\frac{X_{n+1}-\bar{X}}{s\sqrt{1+\frac{1}{n}}}$$
has a t_{n−1} distribution.
158
4-8 Other Interval Estimates for a Single Sample4-8.1 Prediction Interval
Discuss Example 4-16
159
4-8 Other Interval Estimates for a Single Sample
4-8.2 Tolerance Intervals for a Normal Distribution
160
4-10 Testing for Goodness of Fit
• So far, we have assumed the population or probability distribution for a particular problem is known. (parametric approach).
• There are many instances where the underlying distribution is not known, and we wish to test a particular distribution (nonparametric approach).
• Use a goodness-of-fit test procedure based on the chi-square distribution.
161
4-10 Testing for Goodness of Fit
• Oi: the observed frequency in the i-th class interval
• Ei: the expected frequency in the i-th class interval from the hypothesized probability distribution
Test statistic:
$$X_0^2=\sum_{i=1}^{k}\frac{(O_i-E_i)^2}{E_i}$$
where E_i is the expected number of observations in the ith group, and O_i is the observed number of observations in the ith group.
Theorem:
$$X_0^2\sim\chi^2(k-p-1)$$
where k is the number of groups and p is the number of parameters estimated from the sample.
163
4-10 Testing for Goodness of Fit
• Discuss Example 4-18
164
Example #4-85 (p207) H0: X ~ P(2.4) (Poisson with mean 2.4)

Value                  0      1      2      3      4      5
Observed Frequency     8     25     23     21     16      7
Expected Frequency   9.1   21.8   26.2   20.9   12.5   6.02

P(X=0)=0.091, P(X=1)=0.218, P(X=2)=0.262, P(X=3)=0.209, P(X=4)=0.125, P(X=5)=0.06, P(X>5)=0.035
k = 6, p = 1, df = 4
$$X_0^2=\frac{(8-9.1)^2}{9.1}+\cdots+\frac{(7-6.02)^2}{6.02}=2.1<\chi^2_{0.05,4}=9.49$$
Do not reject H0.
P-value = P(X² > 2.1) lies between 0.5 and 0.9 (p.437).
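The goodness-of-fit statistic for this table can be recomputed in a few lines. This is a minimal sketch: the observed and expected frequencies are those listed in the example, the critical value 9.49 is the table value quoted there, and the variable names are illustrative. Summing all six terms at full precision gives a value slightly above the rounded 2.1.

```python
observed = [8, 25, 23, 21, 16, 7]
expected = [9.1, 21.8, 26.2, 20.9, 12.5, 6.02]
params_estimated = 1    # the Poisson mean was estimated from the sample
chi2_critical = 9.49    # chi-square_{0.05,4}, table value quoted in the example

# Chi-square goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i
chi2_0 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - params_estimated - 1  # k - p - 1
reject = chi2_0 > chi2_critical
print(f"X0^2={chi2_0:.3f}  df={df}  reject H0: {reject}")
```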