![Page 1: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/1.jpg)
Transforming the data
Modified from:
Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13
![Page 2: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/2.jpg)
What is a transformation?
It is a mathematical function that is applied to all the observations of a given variable
• Y represents the original variable, Y* is the transformed variable, and f is a mathematical function that is applied to the data
$$Y^* = f(Y)$$
![Page 3: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/3.jpg)
Most are monotonic:
• Monotonic functions do not change the rank order of the data, but they do change their relative spacing, and therefore affect the variance and shape of the probability distribution
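A minimal sketch of this point in R, using made-up values (the vector `y` is illustrative only, not data from the source). It checks that a log transformation leaves the ranks unchanged while altering the spacing and the variance.

```r
# Hypothetical right-skewed observations (illustrative values only)
y <- c(1, 2, 4, 8, 16, 32, 64)
y_log <- log10(y)

# A monotonic increasing function leaves the rank order unchanged
same_ranks <- identical(rank(y), rank(y_log))

# The spacing changes: gaps double on the raw scale, are constant on the log scale
raw_gaps <- diff(y)
log_gaps <- diff(y_log)

print(same_ranks)
print(raw_gaps)
print(round(log_gaps, 3))
print(c(var_raw = var(y), var_log = var(y_log)))
```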
![Page 4: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/4.jpg)
There are two legitimate reasons to transform your data before analysis
• The patterns in the transformed data may be easier to understand and communicate than patterns in the raw data.
• They may be necessary so that the analysis is valid
![Page 5: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/5.jpg)
They are often useful for converting curves into straight lines:
The logarithmic function is very useful when two variables are related to each other by multiplicative or exponential functions
![Page 6: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/6.jpg)
Logarithmic (X):
$$Y = \beta_0 + \beta_1 \log(X)$$
[Figure: y = ln(x) plotted with x on a logarithmic axis (1 to 1,000,000) and with x on a linear axis (0 to 200,000); Y ranges from 0 to 20.]
![Page 7: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/7.jpg)
Example: Asi's growth (50% each year)

| Year | Weight (g) |
|---|---|
| 1 | 10.0 |
| 2 | 15.0 |
| 3 | 22.5 |
| 4 | 33.8 |
| 5 | 50.6 |
| 6 | 75.9 |
| 7 | 113.9 |
| 8 | 170.9 |
| 9 | 256.3 |
| 10 | 384.4 |
| 11 | 576.7 |
| 12 | 865.0 |
![Page 8: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/8.jpg)
Exponential:
$$Y = \beta_0 e^{\beta_1 X}$$
$$\ln(Y) = \ln(\beta_0) + \beta_1 X$$
[Figure: weight (g) against year on a linear axis and on a logarithmic axis, with the fitted curve y = 6.6667e^{0.4055x}.]
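The fitted curve on this slide can be reproduced, at least approximately, by regressing the log of weight on year. The sketch below regenerates the growth table from its stated rule (10 g in year 1, 50% growth per year) rather than typing in the rounded values; the variable names are illustrative.

```r
year <- 1:12
weight <- 10 * 1.5^(year - 1)   # 50% growth per year, starting at 10 g

# Linear regression on the log scale: ln(Y) = ln(beta0) + beta1 * X
fit <- lm(log(weight) ~ year)

b1 <- unname(coef(fit)["year"])               # slope on the log scale
b0 <- unname(exp(coef(fit)["(Intercept)"]))   # back-transformed intercept

print(round(c(beta0 = b0, beta1 = b1), 4))
```

The slope should come out close to ln(1.5) and the back-transformed intercept close to 10/1.5, in line with the coefficients shown on the plot.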
![Page 9: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/9.jpg)
Example: Species richness in the Galapagos Islands
![Page 10: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/10.jpg)
Power:
$$Y = \beta_0 X^{\beta_1}$$
$$\log(Y) = \log(\beta_0) + \beta_1 \log(X)$$
[Figure: richness against area on linear axes and on log-log axes; series Nspecies and Power(Nspecies).]
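A sketch of how a power relationship becomes a straight line on log-log axes. The areas and the coefficients (3 and 0.3) below are invented for illustration; they are not the Galapagos data or any published estimate.

```r
# Made-up areas and a made-up power law (NOT the Galapagos data)
area <- c(0.1, 1, 10, 100, 1000, 5000)
richness <- 3 * area^0.3

# Linear regression on log-log axes: log(Y) = log(beta0) + beta1 * log(X)
fit <- lm(log10(richness) ~ log10(area))

beta1 <- unname(coef(fit)[2])      # exponent of the power function
beta0 <- unname(10^coef(fit)[1])   # back-transformed multiplier

print(round(c(beta0 = beta0, beta1 = beta1), 4))
```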
![Page 11: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/11.jpg)
Statistics and transformation
Data to be analyzed using analysis of variance must meet two assumptions:
• The data must be homoscedastic: variances of treatment groups need to be approximately equal
• The residuals, or deviations from the group means, must be normal random variables
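One possible way to check the two assumptions in R is sketched below with simulated data. The choice of Bartlett's test and the Shapiro-Wilk test is an assumption made here for illustration; the source does not prescribe particular tests.

```r
set.seed(7)

# Simulated example: three treatment groups with equal variance (hypothetical data)
group <- factor(rep(c("A", "B", "C"), each = 20))
y <- rnorm(60, mean = c(10, 12, 15)[as.integer(group)], sd = 2)

fit <- aov(y ~ group)

# Homoscedasticity: Bartlett's test of equal variances among groups
homogeneity <- bartlett.test(y ~ group)

# Normality: Shapiro-Wilk test on the residuals (deviations from group means)
normality <- shapiro.test(residuals(fit))

print(homogeneity$p.value)
print(normality$p.value)
```

Small p-values would suggest that an assumption is violated and that a transformation may be worth trying.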
![Page 12: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/12.jpg)
Let's look at an example
• A single variate of the simplest type of ANOVA (completely randomized, single classification) decomposes as follows:
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
• In this model the components are additive, with the error term $\varepsilon_{ij}$ distributed normally
![Page 13: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/13.jpg)
However…
• We might encounter a situation in which the components are multiplicative in effect, where
$$Y_{ij} = \mu \, \alpha_i \, \varepsilon_{ij}$$
• If we fitted a standard ANOVA model, the observed deviations from the group means would lack normality and homoscedasticity
![Page 14: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/14.jpg)
The logarithmic transformation
• We can correct this situation by transforming our model into logarithms
$$Y^* = \log(Y)$$
Wherever the mean is positively correlated with the variance, the logarithmic transformation is likely to remedy the situation and make the variance independent of the mean
![Page 15: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/15.jpg)
We would obtain
$$\log(Y_{ij}) = \log(\mu) + \log(\alpha_i) + \log(\varepsilon_{ij})$$
• which is additive and homoscedastic
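The effect of the log transformation on a multiplicative model can be illustrated by simulation. Everything in the sketch below (the grand mean, the group effects, the lognormal error) is a hypothetical choice made for the example.

```r
set.seed(1)

n <- 2000                    # observations per group
mu <- 10                     # hypothetical grand mean
alpha <- c(0.5, 1, 4)        # hypothetical multiplicative group effects
group <- factor(rep(seq_along(alpha), each = n))

# Positive multiplicative error (lognormal is an assumption of this sketch)
eps <- exp(rnorm(3 * n, mean = 0, sd = 0.3))

# Multiplicative model: Y_ij = mu * alpha_i * eps_ij
y <- mu * alpha[as.integer(group)] * eps

var_raw <- tapply(y, group, var)        # grows with the group mean
var_log <- tapply(log(y), group, var)   # roughly equal across groups

print(round(var_raw, 2))
print(round(var_log, 4))
```

On the raw scale the group variances increase with the group means; after taking logs they should be nearly equal.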
![Page 16: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/16.jpg)
The square root transformation
• It is used most frequently with count data. Such data are likely to be Poisson distributed rather than normally distributed.
In the Poisson distribution the variance is the same as the mean.
Transforming the variates to square roots generally makes the variances independent of the means for this type of data.
When counts include zero values, it is desirable to code all variates by adding 0.5 before taking the square root.
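A simulation sketch of this variance-stabilizing effect, using Poisson counts with three arbitrarily chosen means. The sample size and the means are assumptions made for illustration.

```r
set.seed(42)

means <- c(10, 40, 160)   # hypothetical Poisson means
counts <- lapply(means, function(m) rpois(5000, lambda = m))

# Raw scale: the variance tracks the mean
var_raw <- sapply(counts, var)

# Square-root scale, with 0.5 added to every count
var_sqrt <- sapply(counts, function(y) var(sqrt(y + 0.5)))

print(round(var_raw, 1))
print(round(var_sqrt, 3))
```

The transformed variances should be similar to each other (near 0.25) regardless of the mean.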
![Page 17: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/17.jpg)
The Box-Cox transformation
• Often one does not have an a priori reason for selecting a specific transformation.
• Box and Cox (1964) developed a procedure for estimating the best transformation to normality within the family of power transformations:
$$Y^* = \begin{cases} (Y^{\lambda} - 1)/\lambda & \text{for } \lambda \neq 0 \\ \log(Y) & \text{for } \lambda = 0 \end{cases}$$
![Page 18: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/18.jpg)
The Box-Cox transformation
• The value of lambda which maximizes the log-likelihood function
$$L = -\frac{\nu}{2} \ln(s_T^2) + (\lambda - 1)\frac{\nu}{n} \sum \ln(Y)$$
yields the best transformation to normality within the family of transformations
$s_T^2$ is the variance of the transformed values (based on $\nu$ degrees of freedom). The second term involves the sum of the natural logarithms of the untransformed values
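The log-likelihood described on this slide can be evaluated directly over a grid of lambda values. The sketch below is a hand-rolled illustration, not the MASS implementation; it assumes a single sample, so the degrees of freedom are taken as n - 1, and the function names and the test data are hypothetical.

```r
# Box-Cox transform for a single lambda (log when lambda is zero)
boxcox_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Log-likelihood: -(v/2) * ln(s2_T) + (lambda - 1) * (v/n) * sum(ln(Y))
boxcox_loglik <- function(y, lambda) {
  n <- length(y)
  v <- n - 1                                  # assumption: single sample
  s2_t <- var(boxcox_transform(y, lambda))    # variance of transformed values
  -(v / 2) * log(s2_t) + (lambda - 1) * (v / n) * sum(log(y))
}

# Hypothetical data whose logarithms are exactly normal quantiles
y <- exp(qnorm(ppoints(50)))

lambdas <- seq(-2, 2, by = 0.1)
loglik <- sapply(lambdas, function(l) boxcox_loglik(y, l))
lambda_hat <- lambdas[which.max(loglik)]

print(lambda_hat)
```

For data constructed this way the maximum should fall at lambda = 0, that is, at the log transformation.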
![Page 19: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/19.jpg)
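Box-Cox in R (for a vector of data Y)
> library(MASS)                                       # provides boxcox()
> lamb <- seq(0, 2.5, 0.5)                            # candidate values of lambda
> bc <- boxcox(Y ~ 1, lambda = lamb, plotit = TRUE)   # profile log-likelihood plot
> lambda_hat <- bc$x[which.max(bc$y)]                 # lambda with the highest log-likelihood
> library(car)                                        # provides bcPower(), the successor of box.cox()
> transform_Y <- bcPower(Y, lambda_hat)               # transform Y with the selected lambda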
[Figure: log-likelihood plotted against lambda (from -2 to 2), with the 95% confidence interval marked.]
What do you conclude from this plot?
Read more in Sokal and Rohlf 2000 page 417
![Page 20: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/20.jpg)
The arcsine transformation
• Also known as the angular transformation
• It is especially appropriate to percentages
![Page 21: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/21.jpg)
The arcsine transformation
$$Y^* = \arcsin\left(\sqrt{Y}\right)$$
It is appropriate only for data expressed as proportions
[Figure: transformed data plotted against the original proportion (0 to 1), comparing the linear and arcsine curves.]
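A short sketch of the arcsine (angular) transformation in R, applied to a few illustrative proportions. Results are in radians; the input values are chosen only because their transforms are familiar angles.

```r
# Illustrative proportions (must lie between 0 and 1)
p <- c(0, 0.25, 0.5, 0.75, 1)

# Arcsine of the square root of each proportion, in radians
p_arcsine <- asin(sqrt(p))

print(round(p_arcsine, 4))

# Equal steps in p become larger steps near 0 and 1 than near 0.5
print(round(diff(p_arcsine), 4))
```

The transformation stretches both ends of the proportion scale relative to the middle.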
![Page 22: Transforming the data Modified from: Gotelli and Allison 2004. Chapter 8; Sokal and Rohlf 2000 Chapter 13.](https://reader036.fdocuments.in/reader036/viewer/2022062309/56649d985503460f94a8264c/html5/thumbnails/22.jpg)
Since the transformations discussed are non-linear, confidence limits computed on the transformed scale and converted back to the original scale will be asymmetrical
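This asymmetry can be seen with a log transformation. The sketch below uses a made-up sample and a t-based 95% interval, both assumptions for illustration: the limits are symmetric around the mean on the log scale but not after back-transformation.

```r
# Hypothetical right-skewed sample (illustrative values only)
y <- c(12, 15, 22, 30, 41, 58, 95, 140)

ly <- log(y)
n <- length(ly)

# 95% confidence limits on the log scale (symmetric around the mean)
half_width <- qt(0.975, df = n - 1) * sd(ly) / sqrt(n)
ci_log <- mean(ly) + c(-1, 1) * half_width

# Back-transform the mean and the limits to the original scale
center <- exp(mean(ly))
ci_back <- exp(ci_log)

lower_gap <- center - ci_back[1]
upper_gap <- ci_back[2] - center

print(round(c(lower = ci_back[1], center = center, upper = ci_back[2]), 2))
print(round(c(lower_gap = lower_gap, upper_gap = upper_gap), 2))
```

The upper limit lies farther from the back-transformed mean than the lower limit does.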