When trying to explain some of the patterns you have observed in your species and community data, it...


Page 1:

When trying to explain some of the patterns you have observed in your species and community data, it sometimes helps to have a look at relationships between variables, both physical and biological

Correlation and linear regression

Is it possible to quantify your observations?

For Example………..

What patterns can you see when you look at the data below?

Altitude (m)            0      1000   2000   3000   4000
Max Temperature (°C)    27     22     14     10     3
Rainfall (mm)           3750   4625   1864   876    321
No. Species             128    86     34     20     15
Sp A                    112    59     12     2      0
Sp B                    18     10     35     20     53
Sp C                    27     27     27     26     26
Sp D                    134    165    68     29     9
Sp E                    0      0      5      25     76
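One way to start quantifying these patterns is to compute a correlation coefficient between altitude and each of the other rows. The sketch below is only an illustration: the helper name `pearson_r` and the dictionary keys are assumptions made here, and with only five sites the values should be read cautiously.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariation of X and Y divided by their total variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(ss_x * ss_y)

# Values copied from the table; the key names are illustrative only.
altitude = [0, 1000, 2000, 3000, 4000]
variables = {
    "max_temperature": [27, 22, 14, 10, 3],
    "rainfall": [3750, 4625, 1864, 876, 321],
    "no_species": [128, 86, 34, 20, 15],
    "sp_a": [112, 59, 12, 2, 0],
    "sp_b": [18, 10, 35, 20, 53],
    "sp_c": [27, 27, 27, 26, 26],
    "sp_d": [134, 165, 68, 29, 9],
    "sp_e": [0, 0, 5, 25, 76],
}

# Correlation of each variable with altitude.
correlations = {name: pearson_r(altitude, values) for name, values in variables.items()}
for name, r in correlations.items():
    print(f"{name:16s} r = {r:+.3f}")
```

A negative r means the variable falls as altitude rises; a positive r means it rises with altitude.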

Page 2:

Correlation and linear regression: not the same, but related

Correlation: quantifies how X and Y vary together

Linear regression: line that best predicts Y from X

Use correlation when both X and Y are measured

Use linear regression when one of the variables is controlled

Page 3:

3 characteristics of a relationship

Direction: Positive (+) or Negative (-)

Degree of association: Between –1 and +1. Absolute values signify strength

Form: Linear or Non-linear

Direction

[Figure: two scatterplots of C1 vs C2, one showing a positive and one a negative relationship]

Positive: Large values of X = large values of Y, small values of X = small values of Y. E.g., height and weight

Negative: Large values of X = small values of Y, small values of X = large values of Y. E.g., speed and accuracy

Page 4:

Form: Linear or Non-linear

Degree of association

[Figure: two scatterplots of C1 vs C2, one with a tight cloud of points and one with a diffuse cloud]

Strong (tight cloud)

Weak (diffuse cloud)

Page 5:

Correlation: a statistical technique that measures and describes the degree of linear relationship between two variables

Pearson’s r

A value ranging from -1.00 to 1.00 indicating the strength and direction of the linear relationship

Absolute value indicates strength

+/- indicates direction

Dataset

Obs   X   Y
A     1   1
B     1   3
C     3   2
D     4   5
E     6   4
F     7   5

[Figure: scatterplot of Y against X for the dataset]
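As a rough check on what the scatterplot suggests, Pearson's r can be computed directly for the six observations. This is a minimal sketch; the function name `pearson_r` is chosen here for illustration and is not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(ss_x * ss_y)

# Observations A to F from the dataset.
x = [1, 1, 3, 4, 6, 7]
y = [1, 3, 2, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.2f}")  # about 0.77: positive and fairly strong
```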

Page 6:

Some Examples………….

Page 7:

How to Calculate Pearson’s r

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} $$

Remember…………….

Data (x)   x - mean   (x - mean)²
3          -1         1
4          0          0
5          1          1
6          2          4
7          3          9
2          -2         4
3          -1         1
4          0          0
5          1          1
6          2          4
3          -1         1
2          -2         4
3          -1         1
4          0          0
5          1          1
2          -2         4

Σ          64   0     36     Sum of Squares (Sample)
N          16         15     (N - 1)
mean       4          2.4    Mean Sum of Squares (sample) = Variance = 36 / 15
           16         2.25   (36 / 16, dividing by N rather than N - 1)

Standard Deviation: s = √(Variance) = √2.4 ≈ 1.5

Square units?
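The sum of squares, variance and standard deviation from the table can be reproduced in a few lines. This is a sketch using the 16 data values from the slide; the variable names are illustrative.

```python
import math

# The 16 data values from the table.
data = [3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 3, 2, 3, 4, 5, 2]

n = len(data)
mean = sum(data) / n

# Sum of squares: squared deviations from the mean, added up.
sum_of_squares = sum((x - mean) ** 2 for x in data)

# Sample variance divides by N - 1; its units are the square of the data units.
variance = sum_of_squares / (n - 1)

# The standard deviation is the square root, back in the original units.
std_dev = math.sqrt(variance)

print(n, mean, sum_of_squares, variance, round(std_dev, 2))
# prints: 16 4.0 36.0 2.4 1.55
```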

Page 8:

The equation for r

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} $$

Means this in words………

$$ r = \frac{\text{covariation of } X, Y}{\text{total variation of } X, Y} $$

NUMERATOR: For each pair of X and Y values, you look at the deviation of X from its mean and the deviation of Y from its mean, and multiply the two to get a feel for their joint deviation, or covariation. This is summed across all pairs of X-Y values to provide an overall index of covariation.

DENOMINATOR: This is simply the total variation of X and Y (see previous slide): the square root of the product of the two sums of squares.

Page 9:

For Example………….

Specimen  Femur L (cm)  Humerus L (cm)  (X - Xmean)  (Y - Ymean)  (X - Xmean)²  (Y - Ymean)²  (X - Xmean)(Y - Ymean)
          X             Y
A         38            41              -20.2        -25          408.04        625           505
B         56            63              -2.2         -3           4.84          9             6.6
C         59            70              0.8          4            0.64          16            3.2
D         64            72              5.8          6            33.64         36            34.8
E         74            84              15.8         18           249.64        324           284.4

Total     291           330                                       696.8         1010          834
Count     5             5
Mean      58.2          66

[Figure: scatterplot of Humerus Length (cm) against Femur Length (cm)]

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{834}{\sqrt{696.8 \times 1010}} = \frac{834}{\sqrt{703768}} = \frac{834}{838.9} = 0.994 $$
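The worked example can be checked step by step in code. The sketch below follows the same columns as the table (deviations, squared deviations, cross products); the variable names are chosen here for illustration.

```python
import math

# Femur (X) and humerus (Y) lengths in cm for specimens A to E.
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]

n = len(femur)
mean_x = sum(femur) / n    # 58.2
mean_y = sum(humerus) / n  # 66.0

# Column totals from the table.
sum_cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(femur, humerus))
ss_x = sum((x - mean_x) ** 2 for x in femur)
ss_y = sum((y - mean_y) ** 2 for y in humerus)

# r = covariation / total variation
r = sum_cross_products / math.sqrt(ss_x * ss_y)

print(round(sum_cross_products, 1), round(ss_x, 1), round(ss_y, 1))
print(f"r = {r:.3f}")
```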

Page 10:

Some issues with r

Outliers have strong effects
Restriction of range can suppress or augment r
Correlation is not causation
No linear correlation does not mean no association

Outliers

Child 19 is lowering r
Child 18 is increasing r
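The effect of a single outlier is easy to demonstrate. The sketch below reuses the femur and humerus lengths from the worked example and adds one hypothetical sixth specimen, invented here purely for illustration (it is not the "Child 18" or "Child 19" data from the slide's figure, which is not available in this transcript).

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(ss_x * ss_y)

# Femur and humerus lengths (cm) from the worked example.
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]

# Hypothetical outlier (assumption): a long femur paired with a short humerus.
outlier_femur, outlier_humerus = 80, 40

r_without = pearson_r(femur, humerus)
r_with = pearson_r(femur + [outlier_femur], humerus + [outlier_humerus])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

With so few points, one unusual observation is enough to pull r from near 1 to a weak value, which is why a scatterplot should be inspected before r is interpreted.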

Page 11:

The restricted range problem

The relationship you see between X and Y may depend on the range of X

For example, the size of a child’s vocabulary has a strong positive association with the child’s age

But if all of the children in your data set are in the same grade in school, you may not see much association

Common causes, confounds

Two variables might be associated because they share a common cause.

There is a positive correlation between ice cream sales and the number of drowning incidents, because both rise in hot weather.

Also, in many cases, there is the question of reverse causality

Page 12:

Non-linearity

[Figure: plot of Proficiency against Practice time]

Some variables are not linearly related, though a relationship obviously exists
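An extreme case makes the point concrete. In the sketch below, Y is completely determined by X, yet Pearson's r is zero because the relationship is curved rather than linear. The data are hypothetical (a symmetric U-shape chosen for illustration) and are not the practice-time data from the figure.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(ss_x * ss_y)

# Hypothetical data: Y depends perfectly on X, but not along a straight line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

The positive and negative cross products cancel, so r measures no linear association even though the relationship is exact.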

Page 13:

The correlation coefficient, r, is a statistic

It summarises the co-variation or correlation between the two variables, and its absolute value (ignoring the sign) varies from 0 to 1

Its significance can be determined by checking it against the appropriate critical value in a table of r values, for the chosen significance level (alpha), the degrees of freedom, and a 1- or 2-tailed test

Before checking it, however, you need to set up a null hypothesis (H0)

What would such a hypothesis be?

When you check the table, ignore the sign of your value

If your value is greater than the critical value, then it is considered significant.

Page 14:

Critical values of r

             Level of Significance: 2-Tailed
DF (N-2)     0.1      0.05     0.02     0.01
1            0.988    0.997    0.9995   0.9999
2            0.900    0.950    0.980    0.990
3            0.805    0.878    0.934    0.959
4            0.729    0.811    0.882    0.917
5            0.669    0.754    0.833    0.874
6            0.622    0.707    0.789    0.834
7            0.582    0.666    0.750    0.798
8            0.549    0.632    0.716    0.765
9            0.521    0.602    0.685    0.735
10           0.497    0.576    0.658    0.708
11           0.476    0.553    0.634    0.684
12           0.458    0.532    0.612    0.661
13           0.441    0.514    0.592    0.641
14           0.426    0.497    0.574    0.623
15           0.412    0.482    0.558    0.606
16           0.400    0.468    0.542    0.590
17           0.389    0.456    0.528    0.575
18           0.378    0.444    0.516    0.561
19           0.369    0.433    0.503    0.549
20           0.360    0.423    0.492    0.537
21           0.352    0.413    0.482    0.526
22           0.344    0.404    0.472    0.515
23           0.337    0.396    0.462    0.505
24           0.330    0.388    0.453    0.496
25           0.323    0.381    0.445    0.487
26           0.317    0.374    0.437    0.479
27           0.311    0.367    0.430    0.471
28           0.306    0.361    0.423    0.463
29           0.301    0.355    0.416    0.456
30           0.296    0.349    0.409    0.449
35           0.275    0.325    0.381    0.418
40           0.257    0.304    0.358    0.393
45           0.243    0.288    0.338    0.372
50           0.231    0.273    0.322    0.354
60           0.211    0.250    0.295    0.325
70           0.195    0.232    0.274    0.303
80           0.183    0.217    0.256    0.283
90           0.173    0.205    0.242    0.267
100          0.164    0.195    0.230    0.254
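Looking up a critical value can be sketched as a small function. The example below copies only the first ten rows of the 0.05 (2-tailed) column from the table and applies them to the femur and humerus example (r = 0.994, N = 5). The names `CRITICAL_R_05_TWO_TAILED` and `is_significant` are illustrative, and the usual null hypothesis is assumed: that there is no linear correlation between the two variables.

```python
# Critical values of r at the 0.05 level (2-tailed), keyed by DF = N - 2.
# Only DF 1 to 10 are copied from the table to keep the sketch short.
CRITICAL_R_05_TWO_TAILED = {
    1: 0.997, 2: 0.950, 3: 0.878, 4: 0.811, 5: 0.754,
    6: 0.707, 7: 0.666, 8: 0.632, 9: 0.602, 10: 0.576,
}

def is_significant(r, n, table=CRITICAL_R_05_TWO_TAILED):
    """Return True if |r| exceeds the critical value for DF = n - 2."""
    df = n - 2
    if df not in table:
        raise ValueError(f"no critical value stored for DF = {df}")
    # The sign of r is ignored when checking against the table.
    return abs(r) > table[df]

# Femur and humerus example: r = 0.994 from N = 5 specimens, so DF = 3.
femur_result = is_significant(0.994, 5)
print(femur_result)  # True: 0.994 > 0.878, so H0 is rejected at the 0.05 level
```

A value such as r = -0.5 from the same five specimens would not be significant, since 0.5 is below the critical value of 0.878.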

Page 15:

If r is the correlation coefficient, what is r²?

The amount of covariation compared to the amount of total variation

“The percent of total variance that is shared variance”

E.g. “If r = .80, then X explains 64% of the variability in Y” (and vice versa)

MS Excel can generate r² values…………….
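The "explains" reading of r² can be checked against a fitted line. The sketch below, using the femur and humerus data from the worked example, computes r² two ways: by squaring r, and as the fraction of the variation in Y that a least-squares regression line accounts for. The variable names are illustrative.

```python
import math

# Femur (X) and humerus (Y) lengths in cm from the worked example.
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]

n = len(femur)
mean_x = sum(femur) / n
mean_y = sum(humerus) / n

s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(femur, humerus))
ss_x = sum((x - mean_x) ** 2 for x in femur)
ss_y = sum((y - mean_y) ** 2 for y in humerus)

# Route 1: square the correlation coefficient.
r = s_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

# Route 2: fit the least-squares line and see how much variation in Y is left over.
slope = s_xy / ss_x
intercept = mean_y - slope * mean_x
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(femur, humerus))
explained_fraction = 1 - ss_residual / ss_y

print(f"r squared          = {r_squared:.3f}")
print(f"explained fraction = {explained_fraction:.3f}")
```

Both routes give the same number, about 0.988 for these data, so roughly 99% of the variability in humerus length is shared with femur length.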

Page 16:

A CAUTIONARY NOTE

[Figure: Log Numbers against Temperature (Celsius), axis from 10 to 22: r = 0.93]

[Figure: Log Numbers against Temperature (Celsius), axis from 20 to 30: r = 0.911]

[Figure: Log Numbers against Temperature (Celsius), axis from 10 to 30: r = 0.302]

[Figure: Frequency against Temperature (Celsius), axis from 10 to 32]

BUT……..

Page 17:

THE END

Image acknowledgements – http://www.google.com

Content acknowledgements – Dr Vanessa Couldridge, UWC