When trying to explain some of the patterns you have
observed in your species and community data, it
sometimes helps to have a look at relationships between
variables – both physical and biological
Correlation and linear regression
Is it possible to quantify your observations?
For example:

Altitude (m)            0      1000   2000   3000   4000
Max Temperature (°C)    27     22     14     10     3
Rainfall (mm)           3750   4625   1864   876    321
No. Species             128    86     34     20     15
Sp A                    112    59     12     2      0
Sp B                    18     10     35     20     53
Sp C                    27     27     27     26     26
Sp D                    134    165    68     29     9
Sp E                    0      0      5      25     76

What patterns can you see when you look at the above data?
Correlation and linear regression:
not the same, but are related
Correlation:
quantifies how X and Y vary together
Linear regression: line that best predicts Y from X
Use correlation when both X and Y are measured
Use linear regression when one of the variables is controlled
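As a minimal sketch of the difference, the Python below computes both quantities for the altitude and maximum temperature values from the example data. The function names `pearson_r` and `least_squares_line` are illustrative and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Correlation: how strongly X and Y vary together (-1 to +1)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Regression: slope and intercept of the line that best predicts Y from X."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

altitude_m = [0, 1000, 2000, 3000, 4000]
max_temp_c = [27, 22, 14, 10, 3]

r = pearson_r(altitude_m, max_temp_c)
slope, intercept = least_squares_line(altitude_m, max_temp_c)
print(f"r = {r:.3f}")
print(f"max temp = {intercept:.1f} + ({slope:.4f}) * altitude")
```

The correlation gives a single number for how tightly the two variables move together; the regression gives an equation that can be used to predict temperature at a chosen altitude.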
Direction
Positive: Large values of X = large values of Y, small values of X = small values of Y, e.g., height and weight
Negative: Large values of X = small values of Y, small values of X = large values of Y, e.g., speed and accuracy
[Figures: two scatterplots of C1 vs C2 illustrating the two directions]
3 characteristics of a relationship
Direction: Positive (+), Negative (-)
Degree of association: Between –1 and +1; absolute values signify strength
Form: Linear, Non-linear
Degree of association
Strong (tight cloud)
Weak (diffuse cloud)
[Figures: two scatterplots of C1 vs C2, one for each case]
Pearson’s r
Absolute value indicates strength
+/- indicates direction
A value ranging from -1.00 to 1.00 indicating the strength and direction of the linear relationship
Correlation: a statistical technique that measures and describes the degree of linear relationship between two variables
Dataset

Obs   X   Y
A     1   1
B     1   3
C     3   2
D     4   5
E     6   4
F     7   5

[Scatterplot of Y against X]
Some Examples………….
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$
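Remember…

Data (x)   x - mean   (x - mean)²
3          -1         1
4          0          0
5          1          1
6          2          4
7          3          9
2          -2         4
3          -1         1
4          0          0
5          1          1
6          2          4
3          -1         1
2          -2         4
3          -1         1
4          0          0
5          1          1
2          -2         4

Σ          64   0     36     Sum of Squares (sample)
N          16         15     (N - 1)
mean       4          2.4    Mean Sum of Squares (sample) (Variance)

Dividing the sum of squares by N = 16 instead of N - 1 = 15 gives 2.25.

Standard Deviation
s = √(Variance) = √2.4 ≈ 1.5
Square units? The variance is in squared units; taking the square root returns s to the units of the data.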
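The table arithmetic can be reproduced with a few lines of Python. This is only a sketch using the sixteen data values from the table; the variable names are illustrative.

```python
import math

data = [3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 3, 2, 3, 4, 5, 2]

n = len(data)
mean = sum(data) / n
# Sum of squared deviations from the mean
sum_of_squares = sum((x - mean) ** 2 for x in data)
# Sample variance divides by N - 1, not N
variance = sum_of_squares / (n - 1)
std_dev = math.sqrt(variance)

print(f"N = {n}, mean = {mean}")
print(f"sum of squares = {sum_of_squares}")
print(f"variance = {variance:.2f}, s = {std_dev:.2f}")
```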
How to Calculate Pearson’s r
The equation for r
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$
Means this in words………
$$r = \frac{\text{covariation of } X, Y}{\text{total variation of } X, Y}$$
NUMERATOR: For each pair of X and Y values, you take the deviation of X from its mean and the deviation of Y from its mean, and multiply them to get a feel for their joint deviation, or covariation. These products are summed across all X-Y pairs to provide an overall index of covariation.
DENOMINATOR: This is simply the total variation of X and Y (see previous slide)
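The numerator and denominator described above can be sketched directly in Python. The example uses the small dataset shown earlier (observations A to F); the function name `pearson_r` and the intermediate variable names are illustrative choices, not part of the slides.

```python
import math

def pearson_r(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: joint deviation of X and Y from their means, summed over all pairs
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: total variation of X and of Y, combined under a square root
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)

# Observations A to F
x = [1, 1, 3, 4, 6, 7]
y = [1, 3, 2, 5, 4, 5]

r_small = pearson_r(x, y)
print(f"r = {r_small:.3f}")
```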
For Example………….

Specimen   Femur L (cm)   Humerus L (cm)
           X              Y
A          38             41
B          56             63
C          59             70
D          64             72
E          74             84

Specimen   X      Y     (X - Xmean)   (Y - Ymean)   (X - Xmean)²   (Y - Ymean)²   (X - Xmean)(Y - Ymean)
A          38     41    -20.2         -25           408.04         625            505
B          56     63    -2.2          -3            4.84           9              6.6
C          59     70    0.8           4             0.64           16             3.2
D          64     72    5.8           6             33.64          36             34.8
E          74     84    15.8          18            249.64         324            284.4
Total      291    330                               696.8          1010           834
Count      5      5
Mean       58.2   66
[Scatterplot: Humerus Length (cm) against Femur Length (cm)]
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$
= 834 / √(696.8 x 1010)
= 834 / √703768
= 834 / 838.9
= 0.994
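The femur and humerus arithmetic can be checked with a short script. This is a sketch that recomputes the three column totals and then r; the variable names are illustrative.

```python
import math

femur = [38, 56, 59, 64, 74]     # X, cm
humerus = [41, 63, 70, 72, 84]   # Y, cm

mean_x = sum(femur) / len(femur)
mean_y = sum(humerus) / len(humerus)

# Column totals from the worked table
sum_dx2 = sum((x - mean_x) ** 2 for x in femur)
sum_dy2 = sum((y - mean_y) ** 2 for y in humerus)
sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(femur, humerus))

denominator = math.sqrt(sum_dx2 * sum_dy2)
r_femur = sum_dxdy / denominator

print(f"totals: {sum_dx2:.1f}, {sum_dy2:.0f}, {sum_dxdy:.0f}")
print(f"denominator = {denominator:.1f}")
print(f"r = {r_femur:.3f}")
```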
Some issues with r
Outliers have strong effects
Restriction of range can suppress or augment r
Correlation is not causation
No linear correlation does not mean no association
Outliers
Child 19 is lowering r
Child 18 is increasing r
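The size of the effect can be illustrated with a rough sketch. The code below takes the five femur and humerus specimens from the worked example and adds one hypothetical sixth specimen (femur 80 cm, humerus 40 cm) that is invented purely for illustration; it is not part of the original data.

```python
import math

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]
r_without = pearson_r(femur, humerus)

# Hypothetical outlier: a long femur paired with a short humerus
r_with = pearson_r(femur + [80], humerus + [40])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

A single point far from the main cloud is enough to pull a very strong correlation down to a weak one.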
The restricted range problem
The relationship you see between X and Y may depend on the range of X
For example, the size of a child’s vocabulary has a strong positive association with the child’s age
But if all of the children in your data set are in the same grade in school, you may not see much association
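A rough sketch of this effect, using made-up numbers rather than real vocabulary data: `age` runs from 1 to 20 in arbitrary units and `vocab` rises with age plus a fixed alternating scatter of +3 and -3. Both lists and the chosen sub-range (ages 9 to 12) are assumptions for illustration only.

```python
import math

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a rising trend with alternating scatter
age = list(range(1, 21))
vocab = [a + (3 if a % 2 else -3) for a in age]

r_full = pearson_r(age, vocab)

# Keep only a narrow slice of the age range
narrow = [(a, v) for a, v in zip(age, vocab) if 9 <= a <= 12]
r_restricted = pearson_r([a for a, _ in narrow], [v for _, v in narrow])

print(f"full range:       r = {r_full:.2f}")
print(f"restricted range: r = {r_restricted:.2f}")
```

Over the full range the trend dominates the scatter; within the narrow slice the scatter dominates the trend, so the association all but disappears.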
Common causes, confounds
Two variables might be associated because they share a common cause.
There is a positive correlation between ice cream sales and the number of drowning incidents (both tend to rise in hot weather).
Also, in many cases, there is the question of reverse causality
Non-linearity
[Figure: Proficiency plotted against Practice time]
Some variables are not linearly related, though a relationship obviously exists
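An extreme case can be sketched with a hypothetical U-shaped relationship, Y = X² over a range of X centred on zero. This is a different shape from the practice and proficiency curve and is used here only because the result is exact: Y is completely determined by X, yet r is zero.

```python
import math

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect but non-linear relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curve = pearson_r(x, y)
print(f"r = {r_curve:.3f}")
```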
The correlation coefficient, r, is a statistic
It summarises the co-variation or correlation between the two variables; its absolute value (ignoring the sign) ranges from 0 to 1
Its significance can be determined by checking it against the appropriate critical value in a table of r values, for a set significance level (alpha), degrees of freedom, and a 1- or 2-tailed test
When you check the table, ignore the sign of your value
If your value is greater than the critical value, then it is considered significant
Before checking it, however, you need to set up a null hypothesis (H0)
What would such a hypothesis be?
Critical values of r
Level of Significance: 2-Tailed

DF (N-2)   0.1     0.05    0.02     0.01
1          0.988   0.997   0.9995   0.9999
2          0.900   0.950   0.980    0.990
3          0.805   0.878   0.934    0.959
4          0.729   0.811   0.882    0.917
5          0.669   0.754   0.833    0.874
6          0.622   0.707   0.789    0.834
7          0.582   0.666   0.750    0.798
8          0.549   0.632   0.716    0.765
9          0.521   0.602   0.685    0.735
10         0.497   0.576   0.658    0.708
11         0.476   0.553   0.634    0.684
12         0.458   0.532   0.612    0.661
13         0.441   0.514   0.592    0.641
14         0.426   0.497   0.574    0.623
15         0.412   0.482   0.558    0.606
16         0.400   0.468   0.542    0.590
17         0.389   0.456   0.528    0.575
18         0.378   0.444   0.516    0.561
19         0.369   0.433   0.503    0.549
20         0.360   0.423   0.492    0.537
21         0.352   0.413   0.482    0.526
22         0.344   0.404   0.472    0.515
23         0.337   0.396   0.462    0.505
24         0.330   0.388   0.453    0.496
25         0.323   0.381   0.445    0.487
26         0.317   0.374   0.437    0.479
27         0.311   0.367   0.430    0.471
28         0.306   0.361   0.423    0.463
29         0.301   0.355   0.416    0.456
30         0.296   0.349   0.409    0.449
35         0.275   0.325   0.381    0.418
40         0.257   0.304   0.358    0.393
45         0.243   0.288   0.338    0.372
50         0.231   0.273   0.322    0.354
60         0.211   0.250   0.295    0.325
70         0.195   0.232   0.274    0.303
80         0.183   0.217   0.256    0.283
90         0.173   0.205   0.242    0.267
100        0.164   0.195   0.230    0.254
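The table lookup can be sketched as follows. Only the rows for 1 to 5 degrees of freedom and the 0.05 and 0.01 columns are copied in, and the names `CRITICAL_R` and `is_significant` are illustrative. The example checks the femur and humerus result (r = 0.994, N = 5, so DF = 3).

```python
# Two-tailed critical values of r, copied from the table for DF = 1..5
CRITICAL_R = {
    1: {0.05: 0.997, 0.01: 0.9999},
    2: {0.05: 0.950, 0.01: 0.990},
    3: {0.05: 0.878, 0.01: 0.959},
    4: {0.05: 0.811, 0.01: 0.917},
    5: {0.05: 0.754, 0.01: 0.874},
}

def is_significant(r, n, alpha=0.05):
    """Compare |r| with the critical value for DF = N - 2 (sign is ignored)."""
    df = n - 2
    return abs(r) > CRITICAL_R[df][alpha]

print(is_significant(0.994, n=5))              # DF = 3, critical value 0.878
print(is_significant(0.994, n=5, alpha=0.01))  # DF = 3, critical value 0.959
```

With so few specimens the critical values are very high, so only a very strong correlation reaches significance.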
If r is the correlation coefficient, what is r²?
The amount of covariation compared to the amount of total variation
“The percent of total variance that is shared variance”
E.g. “If r = .80, then X explains 64% of the variability in Y” (and vice versa)
MS Excel can generate r² values…………….
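The "explained variability" reading of r² can be checked numerically. As a sketch, the code below fits a least-squares line to the femur and humerus data from the worked example and compares r² with one minus the ratio of leftover (residual) variation to total variation in Y; for a straight-line fit the two agree. Variable names are illustrative.

```python
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]

mx = sum(femur) / len(femur)
my = sum(humerus) / len(humerus)
sxy = sum((x - mx) * (y - my) for x, y in zip(femur, humerus))
sxx = sum((x - mx) ** 2 for x in femur)
syy = sum((y - my) ** 2 for y in humerus)

# r squared straight from the correlation formula
r_squared = sxy ** 2 / (sxx * syy)

# Least-squares line, then the share of Y's variation it accounts for
slope = sxy / sxx
intercept = my - slope * mx
residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(femur, humerus))
explained = 1 - residual / syy

print(f"r squared = {r_squared:.4f}")
print(f"explained = {explained:.4f}")
```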
A CAUTIONARY NOTE
[Figure: Log Numbers against Temperature (Celsius), 10 to 22: r = 0.93]
[Figure: Log Numbers against Temperature (Celsius), 20 to 30: r = 0.911]
[Figure: Log Numbers against Temperature (Celsius), 10 to 30: r = 0.302]
[Figure: Frequency against Temperature (Celsius), 10 to 32]
BUT……..
THE END
Image acknowledgements – http://www.google.com
Content acknowledgements – Dr Vanessa Couldridge, UWC