Lecture 10.1.Key
Transcript of Lecture 10.1.Key
-
7/23/2019 Lecture 10.1.Key
1/26
Transforms Revisited
Transforms are used to
Change the mean function so that it is linear.
Adjust for non-constant variance.
Fix non-Normal residuals.
You won't always solve all three problems (or any problem, for that matter).
You've already studied
log transforms and square-root transforms.
Now
we're going to consider a more general class of transforms and discuss strategies for finding the best transform.
Strategy
First, transform Y. If that doesn't work, transform the
predictors, but not Y.
If that improves things but not perfectly, see if you can now transform Y.
There are also approaches that consider transforming ALL variables simultaneously.
Keep in mind
Don't remove outliers, influential points, etc. until the transforming is done. These points might not really be so outlying once the transform is done.
Keep in Mind
Simple is better than complicated.
If you are expected to interpret the parameters, then transformations might make this impossible.
Transform Y
Basic idea: What if
$E(Y \mid X) \neq \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$
but instead:
$E(Y \mid X) = g(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)$
so we need to discover $g()$.
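$E(Y \mid X) = g(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)$
If we knew $g()$, we could invert it:
$g^{-1}(E(Y \mid X)) = g^{-1}(g(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p))$
$Y_{\text{new}} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$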
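A minimal sketch of the inversion idea, on simulated data rather than the lecture's ozone data. Everything here is an assumption made for illustration: g() is taken to be the cube, the coefficients (1, 0.5, 0.3) are made up, and the noise is added inside g() so that the cube root of Y is exactly linear in the predictors.

```r
# Hypothetical simulated data: g(u) = u^3, so g^{-1} is the cube root.
set.seed(1)
n   <- 200
x1  <- rnorm(n, mean = 5)
x2  <- rnorm(n, mean = 5)
lin <- 1 + 0.5 * x1 + 0.3 * x2          # beta0 + beta1*x1 + beta2*x2
y   <- (lin + rnorm(n, sd = 0.1))^3     # response generated through g()

fit_raw <- lm(y ~ x1 + x2)              # mean function is curved in the predictors
fit_inv <- lm(I(y^(1/3)) ~ x1 + x2)     # Ynew = g^{-1}(Y) is linear in the predictors

coef(fit_inv)                           # should sit near the made-up coefficients
```

In practice g() is unknown, which is why the two approaches that follow try to estimate it from the data.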
Transform Y: 2 approaches
Inverse Response Plots
Box-Cox Method
Inverse Response Plots
a technique for guessing g()
If the predictors have an elliptically symmetric distribution (joint Normal is one example of this), then
plot y-hat against y.
The shape of the resulting curve gives you an idea as to the
shape of g inverse.
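One way to turn the plot into a number, sketched here in base R on simulated data: for each candidate lambda, regress y-hat on Y^lambda and keep the lambda with the smallest residual sum of squares. This is only an illustration of the idea; the data, the true power of 1/3, and the helper name rss_for are assumptions, and it is not claimed to reproduce what invResPlot() does internally.

```r
# Hypothetical simulated data: joint Normal predictors, true power 1/3.
set.seed(2)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- (5 + x1 + 0.5 * x2 + rnorm(n, sd = 0.05))^3   # Y^(1/3) is linear in x

m    <- lm(y ~ x1 + x2)
yhat <- fitted(m)
# plot(y, yhat) would draw the inverse response plot itself

# RSS from regressing y-hat on Y^lambda (log Y when lambda = 0)
rss_for <- function(lambda) {
  ty <- if (abs(lambda) < 1e-8) log(y) else y^lambda
  sum(resid(lm(yhat ~ ty))^2)
}

lambda_hat <- optimize(rss_for, interval = c(-1, 2))$minimum
lambda_hat
```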
> m1 = lm(ozone ~ temperature + pressure, data = ozonetext)
> plot(m1)
A plot of the predictors shows that their joint distribution is roughly elliptical.
> library(alr3)
> invResPlot(m1)
      lambda      RSS
1  0.3658881 1989.771
2 -1.0000000 3412.912
3  0.0000000 2082.377
4  1.0000000 2196.992
Suggests that the best transform is $Y_{\text{new}} = Y^{0.366}$.
(lambda = 0 refers to the log transform.)
Note that the log transform isn't too different from optimal.
> ozone.t1 = transform(ozonetext, ozone.t = ozone^(.37))
> m2 = lm(ozone.t ~ temperature + pressure, data = ozone.t1)
> plot(m2)
[Figure: diagnostic plots, transformed model vs. original model]
On the whole, the transformation improved the validity of the model.
But interpretation may now be quite difficult.
Still, improved validity means we can better trust p-values, confidence intervals, and prediction intervals.
> summary(m2)
Call:
lm(formula = ozone.t ~ temperature + pressure, data = ozone.t1)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.4004629  0.1774149  -2.257   0.0256 *
temperature  0.0423812  0.0027663  15.321
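Another approach:
(Useful when the distribution of the variable to be transformed is not Normal.)
Box-Cox
Choose a transform of Y, $\psi_\lambda(Y)$,
such that the distribution of the transformed Y is closer to Normal, where
$\psi_\lambda(Y) = \mathrm{gm}(Y)^{1-\lambda}\,(Y^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$
$\psi_\lambda(Y) = \mathrm{gm}(Y)\,\log(Y)$ for $\lambda = 0$
(gm is the geometric mean)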
$\psi_\lambda(Y) = \mathrm{gm}(Y)^{1-\lambda}\,(Y^{\lambda} - 1)/\lambda$
where gm(Y) is the geometric mean of Y:
$\mathrm{gm}(Y) = \prod_{i=1}^{n} Y_i^{1/n}$
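The scaled transform can be written directly in base R. This is a sketch under stated assumptions: the function names gm and box_cox are made up for illustration, the input values are invented, and Y must be strictly positive.

```r
# Geometric mean, computed on the log scale for numerical stability
gm <- function(y) exp(mean(log(y)))

# Scaled Box-Cox transform; requires y > 0
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) {
    gm(y) * log(y)                                  # the lambda = 0 case
  } else {
    gm(y)^(1 - lambda) * (y^lambda - 1) / lambda    # the lambda != 0 case
  }
}

y <- c(1, 2, 4, 8)     # made-up positive values
gm(y)                  # (1 * 2 * 4 * 8)^(1/4)
box_cox(y, 1)          # lambda = 1 only shifts the data: y - 1
box_cox(y, 0)          # gm(y) * log(y)
```

The gm(Y) factor keeps the transformed values on a comparable scale across lambdas, which is what makes residual sums of squares for different lambdas comparable.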
To find lambda, use maximum likelihood estimation.
> library(MASS)
> boxcox(m1)
or
> library(alr3)
> summary(powerTransform(y~x1+x2,data=))
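A short sketch of pulling the maximum likelihood estimate out of MASS::boxcox() instead of reading it off the plot. The data are simulated (an assumption, not the ozone data) so that log(Y) is linear in the predictors, which means the estimate should land near lambda = 0. The interval uses the usual chi-square cutoff on the profile log-likelihood.

```r
library(MASS)   # provides boxcox()

# Hypothetical simulated data: log(Y) is linear, so the true lambda is 0.
set.seed(3)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- exp(1 + 1.0 * x1 + 0.5 * x2 + rnorm(n, sd = 0.3))

m  <- lm(y ~ x1 + x2)
bc <- boxcox(m, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Grid value that maximizes the profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
in_ci <- bc$y > max(bc$y) - qchisq(0.95, 1) / 2
ci    <- range(bc$x[in_ci])
lambda_hat
ci
```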
> boxcox(m1)
[Figure: Box-Cox log-likelihood plot for lambda, annotated at 1/3]
This confirms our previous transformation using lambda = .37.
> summary(powerTransform(m1))
bcPower Transformation to Normality
   Est.Power Std.Err. Wald Lower Bound Wald Upper Bound
Y1    0.2343   0.0866           0.0646           0.4041

Likelihood ratio tests about transformation parameters
                            LRT df         pval
LR test, lambda = (0)  7.568201  1 5.940706e-03
LR test, lambda = (1) 66.558671  1 3.330669e-16
In fact, the optimal power is .23, which is smaller than the previous .37. However, .37 is within the confidence interval of 0.0646 to 0.4041.
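The interval can be checked by hand from the estimate and its standard error. This sketch assumes the printed Wald bounds are the usual estimate plus or minus a Normal quantile times the standard error; the result agrees with the printed bounds up to rounding of the inputs.

```r
est <- 0.2343            # Est.Power from the powerTransform(m1) output
se  <- 0.0866            # Std.Err. from the same output

# Assumed form of the Wald interval: est +/- z * se
wald_ci <- est + c(-1, 1) * qnorm(0.975) * se
round(wald_ci, 3)

# Is the earlier choice of .37 inside the interval?
contains_037 <- wald_ci[1] < 0.37 && 0.37 < wald_ci[2]
contains_037
```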
Likelihood ratio tests about transformation parameters
                            LRT df         pval
LR test, lambda = (0)  7.568201  1 5.940706e-03
LR test, lambda = (1) 66.558671  1 3.330669e-16
First test. Null: lambda = 0. Alt: lambda ≠ 0.
Small p-value, so we reject. Thus, it is best not to do a log transform.
Second test. Null: no transform (lambda = 1). Alt: do a transform.
Reject. We need a transform.
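The p-values follow from comparing each LRT statistic to a chi-square distribution with 1 degree of freedom. A minimal check in base R, using only the statistics printed in the output (the vector names are made up for readability):

```r
# LRT statistics copied from the output above
lrt <- c(lambda0 = 7.568201, lambda1 = 66.558671)

# Upper-tail chi-square probabilities with df = 1
pval <- pchisq(lrt, df = 1, lower.tail = FALSE)
pval

# Both nulls are rejected at the 5% level
reject <- pval < 0.05
reject
```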
Transform Predictors
You can use Box-Cox to transform the predictors when Y is NOT transformed.
Then, if necessary, use an inverse response plot to transform Y.
In this approach, we find a transformation that makes the joint distribution of all the predictors multivariate Normal
(or as close to it as we can get).
Once that's done, we try to find a transform for Y.
Then we see if it helps.
Do these predictors look like they come from a Normal distribution?
(probably not)
> library(alr3)
> summary(powerTransform(ozone~temperature+height,data=o2.mini))
box.cox Transformations to Multinormality
            Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
temperature    1.1383   0.3246        3.5070         0.426
height        18.9126   4.5176        4.1864         3.965

                                 LRT df      p.value
LR test, all lambda equal 0 25.50600  2 2.893633e-06
LR test, all lambda equal 1 17.30179  2 1.749703e-04
The best lambda could be anywhere within two standard errors of the estimate.
For temp, use a lambda between 0.5 and 1.7, rounding generously.
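The two-standard-error ranges can be computed from the printed estimates. A small sketch using only numbers from the output above:

```r
# Estimates and standard errors copied from the output above
est <- c(temperature = 1.1383, height = 18.9126)
se  <- c(temperature = 0.3246, height = 4.5176)

# Rough range for each lambda: estimate +/- 2 standard errors
ranges <- rbind(lower = est - 2 * se, upper = est + 2 * se)
round(ranges, 2)
```

The temperature range includes both 0.5 (square root) and 1 (no transform), while the height range stays far above 1.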
> summary(powerTransform(cbind(temperature, height) ~ 1, data = o2.mini))
box.cox Transformations to Multinormality
            Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
temperature    1.1383   0.3246        3.5070         0.426
height        18.9126   4.5176        4.1864         3.965

                                 LRT df      p.value
LR test, all lambda equal 0 25.50600  2 2.893633e-06
LR test, all lambda equal 1 17.30179  2 1.749703e-04
Temp: try a square-root transform or no transform.
Height: transform to a high power, which is very unusual and probably not helpful. But let's try the 20th power anyway.
> o2.minit = transform(o2.mini, temp.t = sqrt(temperature), height.t = height^20)
> plot(o2.minit)
[Figure: residual plots, no transform]
[Figure: residual plots, transformed predictors]
Not much better, so look at transforming Y.
> m.t1 = lm(ozone ~ temp.t + height.t, data = o2.minit)
> plot(m.t1)
> invResPlot(m.t1)
Once again, $Y \to Y^{1/3}$ looks best.
> o2.minit2 = transform(o2.minit, ozone.t = ozone^(1/3))
> m.t2 = lm(ozone.t ~ temp.t + height.t, data = o2.minit2)
> plot(m.t2)
A third approach is to use Box-Cox to transform the predictors and the response simultaneously.
Use Box-Cox to transform ALL at once.
> summary(powerTransform(with(o2.mini, cbind(ozone, height, temperature))))
box.cox Transformations to Multinormality
            Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
ozone          0.2503   0.0888        2.8178       -8.4416
height        18.8959   4.4542        4.2422        4.0177
temperature    1.1590   0.2661        4.3550        0.5976

                                 LRT df      p.value
LR test, all lambda equal 0 37.03313  3 4.527709e-08
LR test, all lambda equal 1 83.53574  3 0.000000e+00
This is consistent with the 1/3 power of ozone, a 20th power for height, and no change (raise to the power 1) for temp.
$2(p + 1)/n = 2 \times 3/141 \approx 0.04$ = "big" leverage
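A sketch of applying this cutoff in base R. The model here is fit to simulated data (an assumption; only p = 2 and n = 141 come from the slide), and hatvalues() supplies the leverages.

```r
p <- 2                       # number of predictors, as on the slide
n <- 141                     # number of observations, as on the slide
cutoff <- 2 * (p + 1) / n    # the "big" leverage threshold

# Hypothetical simulated data with the same p and n
set.seed(4)
d   <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + d$x1 + d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)

h <- hatvalues(fit)          # leverages; they sum to p + 1
high_leverage <- which(h > cutoff)
cutoff
high_leverage
```

Because the leverages sum to p + 1, the cutoff is simply twice the average leverage.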