SEEC Toolbox seminars
Data Exploration
Greg Distiller
29th November
Motivation
- The availability of sophisticated statistical tools has grown. But “rubbish in - rubbish out”.
- “. . . even the well-known assumptions continue to be violated frequently”.
Motivation
Examples of statistical pitfalls:
- Collinearity → Type II errors
- Zero inflation in GLMs may bias parameter estimates
- Spatio / temporal autocorrelation → Type I errors
- Linearity
- Ability to model interactions
Motivation
- These various pitfalls can be avoided with proper data exploration.
- Exploration is separate from hypothesis testing.
- If a priori knowledge is very thin, exploration can be used as a hypothesis-generating exercise.
- Focus on graphical tools rather than testing.
Outliers
- Some techniques can be dominated by outliers.
- They should not be removed as a matter of course!
- Boxplots are typically used.
- Cleveland dotplot (row numbers vs observations) provides more information.
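A minimal sketch of the two plots in base R. The vector `y` and the planted extreme value are simulated, illustrative assumptions, not data from the seminar.

```r
# Simulated data: 50 ordinary observations plus one planted extreme value
set.seed(1)
y <- c(rnorm(50, mean = 10, sd = 2), 25)

# Boxplot: flags points beyond 1.5 * IQR from the box
boxplot(y, ylab = "Observed value")

# Cleveland dotplot: observed value (x-axis) against row number (y-axis)
dotchart(y, xlab = "Observed value", ylab = "Row number (order of the data)")

# Values the boxplot rule would flag as potential outliers
flagged <- boxplot.stats(y)$out
flagged
```

In the dotplot, a point sitting far from the rest is a candidate outlier, and the row order also shows whether extreme values cluster in one part of the data set.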
Outliers
- Extreme values by chance?
- Can simulate random observations.
- Can consider dropping if they seem to be measurement errors.
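One way to simulate random observations is sketched below. It asks how often a sample of the same size, drawn from a standard normal distribution, would produce a value at least as extreme as the one observed. The sample size `n`, the observed value `obs_max` and the choice of a normal reference distribution are all hypothetical assumptions for illustration.

```r
set.seed(42)
n <- 100          # hypothetical sample size
obs_max <- 3.2    # hypothetical largest absolute standardised value in the data

# Largest absolute value in each of 5000 simulated samples of size n
sim_max <- replicate(5000, max(abs(rnorm(n))))

# Proportion of simulated samples with a maximum at least as extreme
p_chance <- mean(sim_max >= obs_max)
p_chance

hist(sim_max, main = "Simulated sample maxima", xlab = "max |z|")
abline(v = obs_max, lty = 2)
```

If `p_chance` is not small, the extreme value could plausibly have arisen by chance and is weak grounds for removal.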
Outliers
- Outliers in X vs Y space.
  - influential observations are usually both!
- Transformations an option but really not ideal (especially for the response).
  - better to choose a more suitable model, or a more suitable metric if not modelling.
Homogeneity of variance
- Important assumption in regression-type models (incl. ANOVA) and multivariate (MV) techniques like discriminant analysis.
- Regression-type models → verify with residuals.
- Can use conditional boxplots.
Source: Milandri S.G. et al (2012)
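A minimal sketch of both checks on simulated data. The three groups and their standard deviations are invented for illustration and are unrelated to the Milandri et al. data.

```r
set.seed(7)
group <- factor(rep(c("A", "B", "C"), each = 30))
# Same mean in every group, but the spread differs by construction
y <- rnorm(90, mean = 10, sd = rep(c(1, 2, 4), each = 30))

# Conditional boxplots: compare the spread of the response across groups
boxplot(y ~ group, xlab = "Group", ylab = "Response")

# Regression-type model: inspect the residuals rather than the raw response
fit <- lm(y ~ group)
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Residual standard deviation per group as a numeric companion to the plots
sds <- tapply(resid(fit), group, sd)
sds
```

Boxes of clearly different height, or residuals that fan out across the fitted values, suggest that the equal-variance assumption does not hold.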
Normality
- Is normality required? What must be normal?
Source: Zuur et al. Analysing Ecological Data
- Implies normality of the residuals.
Normality
- There can be more to it than meets the eye:
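As one illustration (not necessarily the example shown on the slide), the raw response can look far from normal even when the residuals are normal. The sketch below uses simulated data in which a skewed covariate `x` drives the response. All names and parameter values are assumptions, and the `skew` helper is a plain moment-based skewness added only to give a numeric summary alongside the plots.

```r
set.seed(2)
n <- 300
x <- rexp(n, rate = 0.5)       # skewed covariate
y <- 1 + 2 * x + rnorm(n)      # errors are normal by construction
fit <- lm(y ~ x)

op <- par(mfrow = c(1, 2))
hist(y, main = "Raw response", xlab = "y")
qqnorm(resid(fit), main = "Residuals")
qqline(resid(fit))
par(op)

# Simple moment-based skewness (illustrative helper, not a formal test)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3
skew_raw <- skew(y)
skew_res <- skew(resid(fit))
c(raw = skew_raw, residuals = skew_res)
```

The histogram of `y` is right-skewed because `x` is, while the residual Q-Q plot should lie close to the reference line.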
Excess Zeros
- Poisson GLM often used to model count data.
Source: Abraham, E. R., & Thompson, F. N. (2011)
- Zero-inflated models (ZIPs, ZINBs, Hurdle models . . .)
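A quick way to see whether a Poisson GLM copes with the zeros is to compare the observed proportion of zeros with the proportion the fitted model expects. The sketch below does this on simulated counts. The 40% of extra zeros and the coefficients are assumptions chosen for illustration, and no zero-inflated model is fitted here.

```r
set.seed(3)
n <- 500
x <- runif(n)
mu <- exp(0.5 + 1.0 * x)

# Counts with extra zeros: about 40% of observations are forced to zero
extra_zero <- rbinom(n, size = 1, prob = 0.4) == 1
y <- ifelse(extra_zero, 0L, rpois(n, mu))

fit <- glm(y ~ x, family = poisson)

obs_zero <- mean(y == 0)                    # observed proportion of zeros
exp_zero <- mean(dpois(0, fitted(fit)))     # proportion expected under the fitted Poisson model
c(observed = obs_zero, expected = exp_zero)

# Frequency plot of the counts: a spike at zero is the visual cue
plot(table(y), xlab = "Count", ylab = "Frequency")
```

An observed proportion well above the expected one points towards the zero-inflated or hurdle models listed above.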
Excess Zeros
MV setting:
- is there useful information in joint zeros?
- corrgram can be used to assess prevalence of “joint absences”:
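The quantity behind such a display can be computed directly: for each pair of species, the proportion of sites where both are absent. The sketch below uses base R only, on a simulated site-by-species abundance matrix. The matrix `abund` and its dimensions are hypothetical, and the corrgram display named on the slide is not reproduced.

```r
set.seed(11)
# Hypothetical abundance matrix: 40 sites (rows) by 5 species (columns)
abund <- matrix(rpois(40 * 5, lambda = 0.7), nrow = 40,
                dimnames = list(NULL, paste0("sp", 1:5)))

# 1 where a species is absent from a site, 0 otherwise
absent <- (abund == 0) * 1

# Entry [i, j]: proportion of sites where species i and j are both absent
# (the diagonal holds each species' own proportion of zeros)
joint_abs <- crossprod(absent) / nrow(abund)
round(joint_abs, 2)
```

High off-diagonal values mean that many sites contribute only joint absences for that pair, which is worth knowing before choosing an association measure.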
Collinearity
- Important when the objective is to understand which covariates drive a system.
- Common examples: morphological measurements, water depth and distance to the shore . . .
- Can lead to a confusing set of results.
Collinearity
[Figure: scatterplot of baseline ejection fraction (y-axis) against ejection fraction on dobutamine (x-axis); both axes run from 30 to 90.]
Source: Krivokapich J et al (1999)
Collinearity
[Figure: two panels showing baseline ejection fraction and ejection fraction on dobutamine, each plotted by event (No event vs Cardiac event).]
Source: Krivokapich J et al (1999)
Collinearity
> m1 <- glm(event ~ baseef + dobef, family = binomial)
> summary(m1)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.66404    0.56674   2.936 0.003323 **
baseef       0.02878    0.02605   1.105 0.269212
dobef       -0.07770    0.02333  -3.331 0.000865 ***
Collinearity
- There are some diagnostic tools:
  - VIF: $\mathrm{VIF}_j = \dfrac{1}{1 - R_j^2}$
  - scatterplots: plot covariates against any temporal / spatial variables.
- High or even moderate collinearity is problematic when the signal is weak.
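The VIF can be computed by hand in base R: regress each covariate on all the others, take the resulting $R_j^2$, and apply the formula. The sketch below does this for three simulated covariates, of which `x1` and `x2` are collinear by construction. The data and variable names are assumptions for illustration.

```r
set.seed(5)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # strongly collinear with x1 by construction
x3 <- rnorm(n)                  # unrelated to the other two
X <- data.frame(x1, x2, x3)

# For each covariate j: R^2 from regressing it on the others, then 1 / (1 - R^2)
vif <- sapply(names(X), function(v) {
  f <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
round(vif, 2)

# Pairwise scatterplots of the covariates as a visual companion
pairs(X)
```

Here `x1` and `x2` receive large VIFs while `x3` stays close to 1, the value for a covariate unrelated to the rest.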
Nature of the Relationship
- Use scatterplots of the response variable vs each covariate.
- Note that the absence of two-way relationships does not necessarily mean that there are no relationships.
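The second point can be illustrated with simulated data in which the response depends only on the product of two covariates. Each scatterplot then looks like noise, yet a model with the interaction fits well. The data-generating model below is an assumption chosen to make the point.

```r
set.seed(9)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- x1 * x2 + rnorm(n, sd = 0.2)   # response depends only on the product

# Scatterplots of the response against each covariate show little structure
op <- par(mfrow = c(1, 2))
plot(x1, y)
plot(x2, y)
par(op)

# Variance explained without and with the interaction term
r2_main <- summary(lm(y ~ x1 + x2))$r.squared
r2_int  <- summary(lm(y ~ x1 * x2))$r.squared
c(main_effects = r2_main, with_interaction = r2_int)
```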
Nature of the Relationship
[Figure: September Arctic sea ice extent (1,000,000 sq km) against year; x-axis ticks from 1980 to 2010.]
Source: Gary Witt. Using data from climate science to teach introductory statistics. Journal of Statistics Education, 21(1), 2013.
Nature of the Relationship
> m1 <- lm(ice ~ year, data = dat)
> m2 <- lm(ice ~ year + I(year^2), data = dat)
> anova(m1,m2)
Analysis of Variance Table
Model 1: ice ~ year
Model 2: ice ~ year + I(year^2)
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     32 10.529
2     31  6.822  1    3.7072 16.846 0.0002731 ***
Including interactions
- A coplot is a useful plot to visualise potential interactions.
- Can reveal data sparsity.
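A minimal sketch with base R's `coplot()`, using simulated data in which the slope of `y` on `x` differs between levels of a factor `z`. The data frame `d`, its variable names and the slopes are hypothetical.

```r
set.seed(21)
lev <- c("low", "mid", "high")
d <- data.frame(
  x = runif(120),
  z = factor(rep(lev, each = 40), levels = lev)
)

# The slope of y on x depends on z, i.e. an x-by-z interaction by construction
slope <- unname(c(low = 0, mid = 1, high = 3)[as.character(d$z)])
d$y <- 1 + slope * d$x + rnorm(120, sd = 0.3)

# One panel of y against x for each level of z
coplot(y ~ x | z, data = d)

# Observations per conditioning level: small counts signal sparse panels
n_per_level <- table(d$z)
n_per_level
```

Panels with visibly different slopes suggest an interaction, and panels with few points show where the data cannot support estimating one.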
Independence
- A crucial assumption for many techniques.
- Violation increases type I errors.
- Some examples:
  - Data from different locations → are birds from locations close to each other more similar than birds from locations further away?
  - Individuals from one family may tend to be similar due to shared genes.
  - Repeated measurements.
Independence
- Plot the response against time or space; any pattern suggests dependence.
- More formally, plot auto-correlation functions (ACFs).
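A minimal sketch of an ACF plot with base R's `acf()`, comparing a simulated autocorrelated series with independent noise. The AR(1) coefficient of 0.7 and the series length are arbitrary assumptions.

```r
set.seed(13)
n <- 200
dep   <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # autocorrelated series
indep <- rnorm(n)                                              # independent observations

op <- par(mfrow = c(1, 2))
acf(dep, main = "Dependent series")
acf(indep, main = "Independent series")
par(op)

# Lag-1 autocorrelation of each series (element 1 of $acf is lag 0)
r1_dep   <- acf(dep, plot = FALSE)$acf[2]
r1_indep <- acf(indep, plot = FALSE)$acf[2]
c(dependent = r1_dep, independent = r1_indep)
```

Bars that extend beyond the dashed bounds at several successive lags indicate dependence.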
Independence
When the data are not independent:
- need a model that can account for the dependence: mixed-effects models, AR models, generalised least squares (modelling the correlation structure).
- can sometimes model the dependence with suitable covariates.
- residuals should display no dependence.
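As a sketch of the generalised least squares option, the example below fits a trend with an AR(1) correlation structure using `gls()` and `corAR1()` from the `nlme` package, which is assumed to be installed. The simulated series, the variable names and the choice of AR(1) are assumptions for illustration.

```r
library(nlme)   # assumed to be installed (a recommended package in most R distributions)

set.seed(17)
n <- 150
time_idx <- 1:n
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))   # AR(1) errors
d <- data.frame(time_idx = time_idx, y = 2 + 0.05 * time_idx + e)

# Ordinary regression ignores the dependence
fit_ols <- lm(y ~ time_idx, data = d)

# GLS with an AR(1) correlation structure along time_idx
fit_gls <- gls(y ~ time_idx, data = d, correlation = corAR1(form = ~ time_idx))
phi <- coef(fit_gls$modelStruct$corStruct, unconstrained = FALSE)
phi

# Lag-1 autocorrelation of the residuals before and after modelling the dependence
r1_ols <- acf(resid(fit_ols), plot = FALSE)$acf[2]
r1_gls <- acf(as.numeric(resid(fit_gls, type = "normalized")), plot = FALSE)$acf[2]
c(ols = r1_ols, gls_normalised = r1_gls)
```

The normalised residuals of the GLS fit are the ones to check: if the correlation structure is adequate, their ACF should show no remaining pattern.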
Final thoughts
- Not all steps are relevant for all data sets.
- Some techniques require analysis first (i.e. to produce residuals).
- Transformations are not ideal.
“...the routine use and transparent reporting of systematic data exploration would improve the quality of ecological research and any applied recommendations that it produces.”
SEEC Stats Toolbox
Want to broaden your stats knowledge? Unsure of what you can do with your data? Still developing your proposal?
Join us for the monthly SEEC Stats Toolbox seminars where we introduce you to statistical methods that are useful for ecologists, environmental and conservation scientists.
See you next year!