Chapter 3 Experiments with a Single Factor: The Analysis of Variance
3.1 An Example
• Chapter 2: a single-factor experiment with two levels of the factor
• Now consider single-factor experiments with a levels of the factor, a ≥ 2
• Example: the tensile strength of a new synthetic fiber
  – Factor: the weight percent of cotton
  – Five levels: 15%, 20%, 25%, 30%, 35%
  – a = 5 and n = 5
• Does changing the cotton weight percent change the mean tensile strength?
• Is there an optimum level for cotton content?
An Example (See pg. 61)
• An engineer is interested in investigating the relationship between the RF power setting and the etch rate for this tool. The objective of an experiment like this is to model the relationship between etch rate and RF power, and to specify the power setting that will give a desired target etch rate.
• The response variable is etch rate.
• She is interested in a particular gas (C2F6) and gap (0.80 cm), and wants to test four levels of RF power: 160W, 180W, 200W, and 220W. She decided to test five wafers at each level of RF power.
• The experiment is replicated 5 times – runs made in random order
• Does changing the power change the mean etch rate?
• Is there an optimum level for power?
• We would like to have an objective way to answer these questions
• The t-test really doesn't apply here – more than two factor levels
3.2 The Analysis of Variance
• a levels (treatments) of a factor and n replicates for each level.
• y_ij: the jth observation taken under factor level or treatment i
Models for the Data
• Means model:
  y_ij = μ_i + ε_ij,  i = 1, 2, …, a;  j = 1, 2, …, n
  – y_ij is the ijth observation,
  – μ_i is the mean of the ith factor level,
  – ε_ij is a random error with mean zero
• Let μ_i = μ + τ_i, where μ is the overall mean and τ_i is the ith treatment effect
• Effects model:
  y_ij = μ + τ_i + ε_ij,  i = 1, 2, …, a;  j = 1, 2, …, n
• Linear statistical model
• One-way or single-factor analysis of variance model
• Completely randomized design: the experiments are performed in random order so that the environment in which the treatments are applied is as uniform as possible.
• For hypothesis testing, the model errors are assumed to be normally and independently distributed random variables with mean zero and variance σ², i.e. y_ij ~ N(μ + τ_i, σ²)
• Fixed effects model: the a levels have been specifically chosen by the experimenter.
3.3 Analysis of the Fixed Effects Model
• Interested in testing the equality of the a treatment means, where E(y_ij) = μ_i = μ + τ_i, i = 1, 2, …, a
  H0: μ_1 = … = μ_a v.s. H1: μ_i ≠ μ_j for at least one pair (i, j)
• Constraint: Σ_{i=1}^{a} τ_i = 0
• Equivalently, H0: τ_1 = … = τ_a = 0 v.s. H1: τ_i ≠ 0 for at least one i
• Notations: y_i. = Σ_{j=1}^{n} y_ij, ȳ_i. = y_i./n, y.. = Σ_{i=1}^{a} Σ_{j=1}^{n} y_ij, ȳ.. = y../N, N = an

3.3.1 Decomposition of the Total Sum of Squares
• Partition the total variability into its component parts.
• The total sum of squares (a measure of overall variability in the data):
  SS_T = Σ_{i=1}^{a} Σ_{j=1}^{n} (y_ij − ȳ..)²
• Degrees of freedom: an − 1 = N − 1
• SS_Treatments: the sum of squares of the differences between the treatment averages and the grand average (sum of squares due to treatments), with a − 1 degrees of freedom
• SS_E: the sum of squares of the differences of observations within treatments from the treatment average (sum of squares due to error), with a(n − 1) = N − a degrees of freedom
The total sum of squares decomposes as
  Σ_{i=1}^{a} Σ_{j=1}^{n} (y_ij − ȳ..)² = Σ_{i=1}^{a} Σ_{j=1}^{n} [(ȳ_i. − ȳ..) + (y_ij − ȳ_i.)]²
    = n Σ_{i=1}^{a} (ȳ_i. − ȳ..)² + Σ_{i=1}^{a} Σ_{j=1}^{n} (y_ij − ȳ_i.)²
that is, SS_T = SS_Treatments + SS_E.
• A large value of SS_Treatments reflects large differences in treatment means
• A small value of SS_Treatments likely indicates no differences in treatment means
• df_Total = df_Treatments + df_Error, i.e. N − 1 = (a − 1) + (N − a)
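The decomposition above can be checked numerically. A minimal sketch in Python, using made-up data (a = 3 treatments, n = 4 replicates; the numbers are purely illustrative, not from the text):

```python
# Numerical check of the ANOVA decomposition SS_T = SS_Treatments + SS_E.
# The data are invented for illustration: a = 3 treatments, n = 4 replicates each.
data = [
    [9.0, 12.0, 10.0, 8.0],    # treatment 1
    [20.0, 21.0, 23.0, 17.0],  # treatment 2
    [6.0, 5.0, 8.0, 16.0],     # treatment 3
]
a, n = len(data), len(data[0])
N = a * n

grand_mean = sum(y for row in data for y in row) / N
treatment_means = [sum(row) / n for row in data]

# SS_T: total variability of the observations about the grand average
ss_total = sum((y - grand_mean) ** 2 for row in data for y in row)
# SS_Treatments: variability of the treatment averages about the grand average
ss_treatments = n * sum((m - grand_mean) ** 2 for m in treatment_means)
# SS_E: variability of the observations about their own treatment average
ss_error = sum((y - m) ** 2
               for row, m in zip(data, treatment_means)
               for y in row)
```

Whatever data are used, ss_total equals ss_treatments + ss_error up to rounding, which is exactly the identity derived above.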
• If there are no differences between the a treatment means, the common variance σ² can be estimated in two ways:
  – by pooling the within-treatment sample variances:
    [(n − 1)S_1² + (n − 1)S_2² + … + (n − 1)S_a²] / [(n − 1) + (n − 1) + … + (n − 1)] = SS_E / (N − a)
  – from the variability of the treatment averages:
    SS_Treatments / (a − 1) = n Σ_{i=1}^{a} (ȳ_i. − ȳ..)² / (a − 1)
• Mean squares:
  MS_Treatments = SS_Treatments / (a − 1),  MS_E = SS_E / (N − a)

3.3.2 Statistical Analysis
• Assumption: the ε_ij are normally and independently distributed with mean zero and variance σ²
• Cochran's Thm (p. 69)
• Expected mean squares:
  E(MS_E) = σ²,  E(MS_Treatments) = σ² + n Σ_{i=1}^{a} τ_i² / (a − 1)
• SS_T/σ² ~ χ²(N − 1), SS_E/σ² ~ χ²(N − a), and, under H0, SS_Treatments/σ² ~ χ²(a − 1); SS_E/σ² and SS_Treatments/σ² are independent (Theorem 3.1)
• H0: τ_1 = … = τ_a = 0 v.s. H1: τ_i ≠ 0 for at least one i
• Test statistic: F_0 = MS_Treatments / MS_E, which under H0 follows the F_{a−1, N−a} distribution
• Reject H0 if F_0 > F_{α, a−1, N−a}
• Rewrite the sums of squares (computing formulas):
  SS_T = Σ_{i=1}^{a} Σ_{j=1}^{n} y_ij² − y..²/N
  SS_Treatments = (1/n) Σ_{i=1}^{a} y_i.² − y..²/N
  SS_E = SS_T − SS_Treatments
• See page 71
• Randomization test
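As a sanity check, the computing formulas can be compared against the definitional sums of squares and used to build the F statistic. A sketch with invented data (a = 4, n = 3; all numbers are illustrative):

```python
# Compare the computing (shortcut) formulas for SS_T and SS_Treatments
# with the definitional forms, then form the F statistic.
# Invented data: a = 4 treatments, n = 3 replicates each.
data = [
    [5.0, 7.0, 6.0],
    [9.0, 8.0, 10.0],
    [4.0, 3.0, 5.0],
    [12.0, 11.0, 13.0],
]
a, n = len(data), len(data[0])
N = a * n

y_dotdot = sum(y for row in data for y in row)   # grand total y..
y_i_dot = [sum(row) for row in data]             # treatment totals y_i.

# Computing formulas
ss_t = sum(y ** 2 for row in data for y in row) - y_dotdot ** 2 / N
ss_tr = sum(t ** 2 for t in y_i_dot) / n - y_dotdot ** 2 / N
ss_e = ss_t - ss_tr

# Definitional forms, for comparison
gm = y_dotdot / N
ss_t_def = sum((y - gm) ** 2 for row in data for y in row)
ss_tr_def = n * sum((t / n - gm) ** 2 for t in y_i_dot)

# F statistic for H0: all treatment means equal
f0 = (ss_tr / (a - 1)) / (ss_e / (N - a))
```

The shortcut and definitional forms agree to rounding error; in practice F_0 would then be compared with the tabled F_{α, a−1, N−a}.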
ANOVA Table of Example 3-1
3.3.3 Estimation of the Model Parameters
• Model: y_ij = μ + τ_i + ε_ij
• Estimators:
  μ̂ = ȳ..,  τ̂_i = ȳ_i. − ȳ..,  μ̂_i = μ̂ + τ̂_i = ȳ_i.
• Since ȳ_i. ~ N(μ_i, σ²/n), confidence intervals are
  ȳ_i. − t_{α/2, N−a} √(MS_E/n) ≤ μ_i ≤ ȳ_i. + t_{α/2, N−a} √(MS_E/n)
  ȳ_i. − ȳ_j. − t_{α/2, N−a} √(2MS_E/n) ≤ μ_i − μ_j ≤ ȳ_i. − ȳ_j. + t_{α/2, N−a} √(2MS_E/n)
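A minimal sketch of the single-mean interval, using assumed summary numbers (the average, MS_E, and the tabled value t_{0.025,16} = 2.120 are illustrative, not taken from the text; here a = 4 and n = 5, so N − a = 16):

```python
import math

# Illustrative 95% C.I. for one treatment mean: ybar_i. +/- t * sqrt(MS_E / n).
# All numbers below are assumed for demonstration.
ybar_i = 551.2
ms_e = 333.7
n = 5
t_crit = 2.120  # t_{0.025, 16}, read from a t table

half_width = t_crit * math.sqrt(ms_e / n)
ci = (ybar_i - half_width, ybar_i + half_width)
```

For the difference of two means, the same pattern applies with √(2·MS_E/n) in place of √(MS_E/n).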
• Example 3.3 (page 74)
• Simultaneous Confidence Intervals (Bonferroni method): construct a set of r simultaneous confidence intervals on treatment means with joint confidence at least 100(1 − α)% by using 100(1 − α/r)% C.I.'s
3.3.4 Unbalanced Data
• Let n_i observations be taken under treatment i, i = 1, 2, …, a; N = Σ_i n_i
• Computing formulas:
  SS_T = Σ_{i=1}^{a} Σ_{j=1}^{n_i} y_ij² − y..²/N
  SS_Treatments = Σ_{i=1}^{a} y_i.²/n_i − y..²/N
1. The test statistic is relatively insensitive to small departures from the assumption of equal variance for the a treatments if the sample sizes are equal.
2. The power of the test is maximized if the samples are of equal size.
3.4 Model Adequacy Checking
• Assumptions: y_ij ~ N(μ + τ_i, σ²)
• The examination of residuals
• Definition of residual: e_ij = y_ij − ŷ_ij, where ŷ_ij = μ̂ + τ̂_i = ȳ.. + (ȳ_i. − ȳ..) = ȳ_i.
• The residuals should be structureless.
3.4.1 The Normality Assumption
• Plot a histogram of the residuals
• Plot a normal probability plot of the residuals
• See Table 3-6
26
• May be – the left tail of error is thinner than the tail part
of standard normal• Outliers• The possible causes of outliers: calculations, data
coding, copy error,….• Sometimes outliers are more informative than the
rest of the data.
27
• Detect outliers: examine the standardized residuals, d_ij = e_ij / √MS_E

3.4.2 Plot of Residuals in Time Sequence
• Plotting the residuals in time order of data collection is helpful in detecting correlation between the residuals.
• Independence assumption
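Residuals and standardized residuals follow directly from the treatment averages. A sketch with made-up data (a = 2, n = 4):

```python
import math

# Residuals e_ij = y_ij - ybar_i. and standardized residuals d_ij = e_ij / sqrt(MS_E).
# Made-up data: a = 2 treatments, n = 4 replicates each.
data = [[3.0, 5.0, 4.0, 6.0],
        [10.0, 9.0, 12.0, 11.0]]
a, n = len(data), len(data[0])
N = a * n

means = [sum(row) / n for row in data]
residuals = [[y - m for y in row] for row, m in zip(data, means)]

# MS_E is the pooled residual variability, SS_E / (N - a)
ss_e = sum(e ** 2 for row in residuals for e in row)
ms_e = ss_e / (N - a)
standardized = [[e / math.sqrt(ms_e) for e in row] for row in residuals]
```

By construction the residuals sum to zero within each treatment; standardized residuals much larger than about 3 in absolute value are potential outliers.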
• Nonconstant variance: the variance of the observations increases as the magnitude of the observations increases.
• If the factor levels having the larger variance also have small sample sizes, the actual type I error rate is larger than anticipated.
• Variance-stabilizing transformations:
  – Poisson: square root transformation, y*_ij = √y_ij
  – Lognormal: logarithmic transformation, y*_ij = log y_ij
  – Binomial: arcsin transformation, y*_ij = arcsin √y_ij
• Statistical tests for equality of variances:
  H0: σ_1² = σ_2² = … = σ_a² v.s. H1: the above is not true for at least one σ_i²
– Bartlett's test: the test statistic is
  χ_0² = 2.3026 q / c, where
  q = (N − a) log10 S_p² − Σ_{i=1}^{a} (n_i − 1) log10 S_i²
  c = 1 + [1 / (3(a − 1))] [Σ_{i=1}^{a} 1/(n_i − 1) − 1/(N − a)]
  S_p² = Σ_{i=1}^{a} (n_i − 1) S_i² / (N − a)
– Reject the null hypothesis if χ_0² > χ²_{α, a−1}
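The q, c, and 2.3026·q/c pieces above translate directly into code. A sketch (the groups are invented; 2.3026 ≈ ln 10 converts the base-10 logs):

```python
import math

def bartlett_statistic(groups):
    """Bartlett's chi-square statistic for H0: equal variances,
    following the q, c, 2.3026*q/c form given above."""
    a = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    # Sample variances S_i^2 (divisor n_i - 1)
    s2 = []
    for g in groups:
        m = sum(g) / len(g)
        s2.append(sum((y - m) ** 2 for y in g) / (len(g) - 1))
    # Pooled variance S_p^2
    sp2 = sum((ni - 1) * s2i for ni, s2i in zip(n, s2)) / (N - a)
    q = (N - a) * math.log10(sp2) - sum((ni - 1) * math.log10(s2i)
                                        for ni, s2i in zip(n, s2))
    c = 1 + (1 / (3 * (a - 1))) * (sum(1 / (ni - 1) for ni in n) - 1 / (N - a))
    return 2.3026 * q / c

# Groups with identical sample variances give a statistic of (essentially) zero;
# very different spreads give a large statistic.
equal_var = bartlett_statistic([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0], [9.0, 10.0, 11.0]])
unequal_var = bartlett_statistic([[1.0, 2.0, 3.0], [0.0, 10.0, 20.0]])
```

The statistic is then compared with the tabled χ²_{α, a−1} value.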
• Example 3.4: the test statistic is χ_0² = 0.43, and χ²_{0.05,3} = 7.81, so there is no evidence of unequal variances
• Bartlett's test is sensitive to the normality assumption
• The modified Levene test:
  – Use the absolute deviation of the observations in each treatment from the treatment median:
    d_ij = |y_ij − ỹ_i|, i = 1, 2, …, a; j = 1, 2, …, n_i
  – If the mean deviations are equal, the variances of the observations in all treatments will be the same.
  – The test statistic for Levene's test is simply the ANOVA F statistic for testing equality of means applied to the d_ij.
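Since the modified Levene test is just the ANOVA F statistic applied to the absolute deviations from the medians, it can be sketched in a few lines (data invented; the helper name is mine, not from the text):

```python
import statistics

def levene_f(groups):
    """Modified Levene test sketch: the ANOVA F statistic computed on
    the absolute deviations d_ij = |y_ij - median_i|."""
    a = len(groups)
    d = [[abs(y - statistics.median(g)) for y in g] for g in groups]
    n = [len(row) for row in d]
    N = sum(n)
    grand = sum(x for row in d for x in row) / N
    means = [sum(row) / len(row) for row in d]
    # One-way ANOVA on the deviations
    ss_tr = sum(ni * (m - grand) ** 2 for ni, m in zip(n, means))
    ss_e = sum((x - m) ** 2 for row, m in zip(d, means) for x in row)
    return (ss_tr / (a - 1)) / (ss_e / (N - a))

# A tightly clustered group next to a widely spread one yields a large F.
f0 = levene_f([[10.0, 10.1, 9.9, 10.05], [5.0, 15.0, 0.0, 20.0]])
```

f0 is then referred to the usual F_{α, a−1, N−a} critical value.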
• Example 3.5:
  – Four methods of estimating flood flow frequency (see Table 3.7)
  – ANOVA table (Table 3.8)
  – The plot of residuals v.s. fitted values (Figure 3.7)
  – Modified Levene's test: F0 = 4.55 with P-value = 0.0137. Reject the null hypothesis of equal variances.
• Let E(y) = μ and suppose the standard deviation is proportional to a power of the mean: σ_y ∝ μ^α
• Find a power transformation y* = y^λ that yields a constant variance: σ_{y*} ∝ μ^{λ+α−1}, so take λ = 1 − α.
• Variance-Stabilizing Transformations:

  Relationship       α     λ = 1 − α   Transformation
  σ_y ∝ constant     0     1           No transformation
  σ_y ∝ μ^{1/2}      1/2   1/2         Square root
  σ_y ∝ μ            1     0           Log
  σ_y ∝ μ^{3/2}      3/2   −1/2        Reciprocal square root
  σ_y ∝ μ²           2     −1          Reciprocal

http://www.stat.ufl.edu/~winner/sta6207/transform.pdf
• How to find α: from σ_{y_i} ∝ μ_i^α, log σ_{y_i} = log θ + α log μ_i, so a plot of log σ_{y_i} against log μ_i is a straight line with slope α.
• Use S_i and ȳ_i. to estimate σ_{y_i} and μ_i.
• See Figure 3.8, Table 3.10 and Figure 3.9
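The slope-fitting step can be sketched numerically. In the illustrative case below, S_i is made exactly proportional to ȳ_i. (so the true α is 1), and the least-squares fit should recover slope 1 and hence λ = 0, i.e. a log transformation:

```python
import math

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

# Illustrative treatment means and standard deviations with S_i
# exactly proportional to the mean (alpha = 1).
ybar = [2.0, 4.0, 8.0, 16.0]
s = [0.5 * m for m in ybar]

alpha_hat = slope([math.log(m) for m in ybar], [math.log(si) for si in s])
lam = 1 - alpha_hat  # suggested power for the transformation y* = y^lambda
```

With real data the fitted slope is rounded to the nearest convenient value in the table above before choosing the transformation.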
3.5 Practical Interpretation of Results
• Conduct the experiment => perform the statistical analysis => investigate the underlying assumptions => draw practical conclusions

3.5.1 A Regression Model
• Qualitative factor: compare the differences between the levels of the factor.
• Quantitative factor: develop an interpolation equation for the response variable.

The Regression Model
3.5.2 Comparisons Among Treatment Means
• If the hypothesis of equal means is rejected, we don't know which specific means are different
• Determining which specific means differ following an ANOVA is called the multiple comparisons problem

3.5.3 Graphical Comparisons of Means
3.5.4 Contrasts
• A contrast: a linear combination of the parameters of the form
  Γ = Σ_{i=1}^{a} c_i μ_i, where Σ_{i=1}^{a} c_i = 0
• H0: Γ = 0 v.s. H1: Γ ≠ 0
• Two methods for this testing.
• The first method: let C = Σ_{i=1}^{a} c_i ȳ_i.. Then Var(C) = (σ²/n) Σ_{i=1}^{a} c_i².
  Under H0, C / √((σ²/n) Σ_{i=1}^{a} c_i²) ~ N(0, 1).
  Hence the test statistic is
  t_0 = Σ_{i=1}^{a} c_i ȳ_i. / √((MS_E/n) Σ_{i=1}^{a} c_i²) ~ t_{N−a}
• The second method:
  F_0 = t_0² = (Σ_{i=1}^{a} c_i ȳ_i.)² / ((MS_E/n) Σ_{i=1}^{a} c_i²) ~ F_{1, N−a}
  Equivalently, F_0 = MS_C / MS_E = (SS_C/1) / MS_E, where the contrast sum of squares is
  SS_C = (Σ_{i=1}^{a} c_i ȳ_i.)² / ((1/n) Σ_{i=1}^{a} c_i²)
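Both methods can be computed side by side and checked against each other, since F_0 = t_0². A sketch on invented summary data (averages, MS_E, and the contrast are illustrative):

```python
import math

# Contrast test sketch on invented summary data (a = 4, n = 5):
# the contrast compares the first two levels with the last two.
ybar = [9.8, 15.4, 17.6, 21.6]  # treatment averages (illustrative)
ms_e = 8.06                     # error mean square (illustrative)
n = 5
c = [1, 1, -1, -1]              # contrast coefficients, sum to zero

contrast = sum(ci * yi for ci, yi in zip(c, ybar))

# First method: t statistic
t0 = contrast / math.sqrt((ms_e / n) * sum(ci ** 2 for ci in c))

# Second method: single-df sum of squares and F statistic
ss_c = contrast ** 2 / ((1 / n) * sum(ci ** 2 for ci in c))
f0 = ss_c / ms_e
```

The two methods always agree: f0 equals t0 squared, so either can be referred to its own table (t_{N−a} or F_{1,N−a}).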
• The C.I. for a contrast: let C = Σ_{i=1}^{a} c_i ȳ_i., with Var(C) = (σ²/n) Σ_{i=1}^{a} c_i². Hence the C.I. is
  Σ_{i=1}^{a} c_i ȳ_i. − t_{α/2, N−a} √((MS_E/n) Σ_{i=1}^{a} c_i²) ≤ Γ ≤ Σ_{i=1}^{a} c_i ȳ_i. + t_{α/2, N−a} √((MS_E/n) Σ_{i=1}^{a} c_i²)
• Unequal Sample Sizes:
  1. The contrast constraint becomes Σ_{i=1}^{a} n_i c_i = 0
  2. t_0 = Σ_{i=1}^{a} c_i ȳ_i. / √(MS_E Σ_{i=1}^{a} c_i²/n_i)
  3. SS_C = (Σ_{i=1}^{a} c_i ȳ_i.)² / (Σ_{i=1}^{a} c_i²/n_i)
3.5.5 Orthogonal Contrasts
• Two contrasts with coefficients {c_i} and {d_i} are orthogonal if Σ_{i=1}^{a} c_i d_i = 0
• For a treatments, the set of a – 1 orthogonal contrasts partition the sum of squares due to treatments into a – 1 independent single-degree-of-freedom components. Thus, tests performed on orthogonal contrasts are independent.
• See Example 3.6 (Page 90)
3.5.6 Scheffe's Method for Comparing All Contrasts
• Scheffe (1953) proposed a method for comparing any and all possible contrasts between treatment means.
• See Pages 91 and 92
• Suppose a set of m contrasts Γ_u = c_{1u} μ_1 + c_{2u} μ_2 + … + c_{au} μ_a, u = 1, 2, …, m.
• Compute C_u = Σ_{i=1}^{a} c_{iu} ȳ_i. and its standard error S_{C_u} = √(MS_E Σ_{i=1}^{a} c_{iu}²/n_i).
• The critical value: S_{α,u} = S_{C_u} √((a − 1) F_{α, a−1, N−a})
• If |C_u| > S_{α,u}, reject H0: Γ_u = 0.
3.5.7 Comparing Pairs of Treatment Means
• Compare all pairs of a treatment means
• Tukey's Test:
  – The studentized range statistic:
    q = (ȳ_max − ȳ_min) / √(MS_E/n)
    where ȳ_max and ȳ_min are the largest and smallest sample means out of a group of p sample means
  – The critical point is T_α = q_α(a, f) √(MS_E/n), or, for unequal sample sizes,
    T_α = (q_α(a, f)/√2) √(MS_E (1/n_i + 1/n_j))
  – See Example 3.7
• Sometimes overall F test from ANOVA is significant, but the pairwise comparison of mean fails to reveal any significant differences.
• The F test is simultaneously considering all possible contrasts involving the treatment means, not just pairwise comparisons.
The Fisher Least Significant Difference (LSD) Method
• For H0: μ_i = μ_j, the test statistic is
  t_0 = (ȳ_i. − ȳ_j.) / √(MS_E (1/n_i + 1/n_j))
• The least significant difference (LSD):
  LSD = t_{α/2, N−a} √(MS_E (1/n_i + 1/n_j))
• See Example 3.8

Duncan's Multiple Range Test
• The a treatment averages are arranged in ascending order, and the standard error of each average is determined as
  S_{ȳ_i.} = √(MS_E / n_h), where n_h = a / Σ_{i=1}^{a} (1/n_i) is the harmonic mean of the n_i (n_h = n for equal sample sizes)
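The LSD procedure can be sketched in a few lines. The averages, MS_E, and the tabled value t_{0.025,16} = 2.120 below are assumed for illustration (a = 4 equal-size groups with n = 5, so N − a = 16):

```python
import math

# Fisher LSD sketch with equal sample sizes; all numbers are illustrative.
ybar = [551.2, 587.4, 625.4, 707.0]  # treatment averages
ms_e = 333.7
n = 5
t_crit = 2.120  # t_{0.025, 16}, read from a t table

lsd = t_crit * math.sqrt(2 * ms_e / n)

# Declare a pair of means different if their absolute difference exceeds LSD
different_pairs = [(i, j)
                   for i in range(len(ybar))
                   for j in range(i + 1, len(ybar))
                   if abs(ybar[i] - ybar[j]) > lsd]
```

With these particular averages every pairwise difference exceeds the LSD, so all a(a − 1)/2 = 6 pairs are declared different; a more conservative procedure such as Tukey's test would use its own critical point instead.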
• Assuming equal sample sizes, the significant ranges are
  R_p = r_α(p, f) S_{ȳ_i.}, p = 2, 3, …, a
• Total a(a − 1)/2 pairs
• Example 3.9

The Newman-Keuls Test
• Similar to Duncan's multiple range test
• The critical values: K_p = q_α(p, f) S_{ȳ_i.}
3.5.8 Comparing Treatment Means with a Control
• Assume one of the treatments is a control, and the analyst is interested in comparing each of the other a − 1 treatment means with the control.
• Test H0: μ_i = μ_a v.s. H1: μ_i ≠ μ_a, i = 1, 2, …, a − 1
• Dunnett (1964)
• Compute the observed differences ȳ_i. − ȳ_a., i = 1, 2, …, a − 1
• Reject H0 if |ȳ_i. − ȳ_a.| > d_α(a − 1, f) √(MS_E (1/n_i + 1/n_a))
• Example 3.9
3.7 Determining Sample Size
• Determine the number of replicates to run
3.7.1 Operating Characteristic Curves (OC Curves)
• OC curve: a plot of the type II error probability of a statistical test,
  β = 1 − P(Reject H0 | H0 is false) = 1 − P(F_0 > F_{α, a−1, N−a} | H0 is false)
• If H0 is false, then
  F_0 = MS_Treatments / MS_E ~ noncentral F
  with degrees of freedom a − 1 and N − a and a noncentrality parameter
• Chart V of the Appendix plots β against the parameter Φ, where
  Φ² = n Σ_{i=1}^{a} τ_i² / (a σ²)
• Let μ_i be the specified treatment means. Then the τ_i are estimated by
  τ_i = μ_i − μ̄, where μ̄ = (1/a) Σ_{i=1}^{a} μ_i
• For σ²: use prior experience, a previous experiment, a preliminary test, or a judgment estimate.
• Example 3.11
• Difficulty: how to select the set of treatment means on which the sample size decision should be based.
• Another approach: select a sample size such that if the difference between any two treatment means exceeds a specified value D, the null hypothesis should be rejected. The minimum value of Φ² is
  Φ² = n D² / (2 a σ²)
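The Φ² formula is simple arithmetic, so a sample-size scan is a one-liner per candidate n. A sketch with illustrative inputs (a = 4, D = 75, σ = 25; these numbers are assumed, not from the text):

```python
# Sample-size sketch: minimum Phi^2 when a difference of at least D
# between any two treatment means should be detected.
def phi_squared(n, D, a, sigma):
    return n * D ** 2 / (2 * a * sigma ** 2)

# Scan candidate replicate counts for illustrative inputs.
values = {n: phi_squared(n, D=75, a=4, sigma=25) for n in range(2, 7)}
```

In practice one would take Φ = √Φ² for each n, read β off Chart V at the corresponding degrees of freedom, and pick the smallest n giving acceptable power 1 − β.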
3.7.2 Specifying a Standard Deviation Increase
• Let P be the percentage increase in the standard deviation of an observation. Then
  Φ = √(n [(1 + 0.01P)² − 1])
• For example (Page 110): if P = 20, then Φ = √(n (1.2² − 1)) = 0.66 √n
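The P = 20 example works out as follows (a minimal sketch of the formula above):

```python
import math

# Phi for a P percent increase in standard deviation:
# Phi = sqrt(n * [(1 + 0.01 P)^2 - 1]).
def phi_for_sd_increase(P, n):
    return math.sqrt(n * ((1 + 0.01 * P) ** 2 - 1))

# For P = 20 the factor multiplying sqrt(n) is sqrt(1.2^2 - 1), about 0.66.
factor = phi_for_sd_increase(20, 1)
```

As with the previous approach, Φ is then used with the OC curves to choose n.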
3.7.3 Confidence Interval Estimation Method
• Use a confidence interval: choose n so that the C.I. on a difference in means,
  ȳ_i. − ȳ_j. ± t_{α/2, N−a} √(2 MS_E/n),
has the desired half-width.
• For example: we want the 95% C.I. on the difference in mean tensile strength for any two cotton weight percentages to be ±5 psi when σ = 3. See Page 110.

3.8 A Real Application
3.10 The Regression Approach to the Analysis of Variance
• Model: y_ij = μ + τ_i + ε_ij
• Least squares: minimize
  L = Σ_{i=1}^{a} Σ_{j=1}^{n} ε_ij² = Σ_{i=1}^{a} Σ_{j=1}^{n} (y_ij − μ − τ_i)²
• Setting ∂L/∂μ̂ = 0 and ∂L/∂τ̂_i = 0, i = 1, 2, …, a, gives the normal equations.
• The normal equations:
  N μ̂ + n τ̂_1 + n τ̂_2 + … + n τ̂_a = y..
  n μ̂ + n τ̂_i = y_i.,  i = 1, 2, …, a
• Apply the constraint Σ_{i=1}^{a} τ̂_i = 0. Then the estimates are
  μ̂ = ȳ..,  τ̂_i = ȳ_i. − ȳ..
• Regression sum of squares (the reduction due to fitting the full model):
  R(μ, τ) = μ̂ y.. + Σ_{i=1}^{a} τ̂_i y_i. = Σ_{i=1}^{a} y_i.²/n
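These estimates can be verified against the normal equations numerically. A sketch with invented data (a = 3, n = 3):

```python
# Check that mu_hat = ybar.. and tau_hat_i = ybar_i. - ybar.. satisfy the
# normal equations under the constraint sum(tau_hat_i) = 0.  Data invented.
data = [[4.0, 6.0, 5.0],
        [10.0, 12.0, 11.0],
        [7.0, 8.0, 9.0]]
a, n = len(data), len(data[0])
N = a * n

y_i = [sum(row) for row in data]   # treatment totals y_i.
y_dd = sum(y_i)                    # grand total y..
mu_hat = y_dd / N
tau_hat = [t / n - mu_hat for t in y_i]

# Constraint and the two families of normal equations
constraint = sum(tau_hat)                       # should be 0
eq_mu = N * mu_hat + n * sum(tau_hat)           # should equal y..
eq_tau = [n * mu_hat + n * t for t in tau_hat]  # should equal each y_i.

# Regression sum of squares, both forms: R(mu, tau) = sum_i y_i.^2 / n
r_full = mu_hat * y_dd + sum(t * yi for t, yi in zip(tau_hat, y_i))
r_alt = sum(yi ** 2 for yi in y_i) / n
```

Both forms of R(μ, τ) agree, confirming the simplification Σ y_i.²/n used in the text.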
• The error sum of squares:
  SS_E = Σ_{i=1}^{a} Σ_{j=1}^{n} y_ij² − R(μ, τ)
• The sum of squares resulting from the treatment effects (full model minus reduced model):
  R(τ|μ) = R(μ, τ) − R(μ) = Σ_{i=1}^{a} y_i.²/n − y..²/N = SS_Treatments, where R(μ) = y..²/N
• The test statistic for H0: τ_1 = … = τ_a = 0:
  F_0 = [R(τ|μ)/(a − 1)] / [(Σ_{i=1}^{a} Σ_{j=1}^{n} y_ij² − R(μ, τ))/(N − a)] ~ F_{a−1, N−a}