[Figure: Approximate random distribution of coefficients of correlation for two random variates (x-axis: correlation; y-axis: frequency); γ = 0.03]
Under the normal approximation we can use Z-transformed scores for statistical inference.
$Z = \frac{\text{Obs} - \text{Exp}}{\text{StdDev}_{\text{Exp}}}$
$P(\mu - \sigma < X < \mu + \sigma) = 68\%$
$P(\mu - 1.65\sigma < X < \mu + 1.65\sigma) = 90\%$
$P(\mu - 1.96\sigma < X < \mu + 1.96\sigma) = 95\%$
$P(\mu - 2.58\sigma < X < \mu + 2.58\sigma) = 99\%$
$P(\mu - 3.29\sigma < X < \mu + 3.29\sigma) = 99.9\%$
The Fisherian significance levels
The standard normal distribution
Z is standard normally distributed
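The Z score and the Fisherian levels can be checked numerically. This is a minimal sketch using only the Python standard library: `math.erf` gives the share of a normal distribution lying within k standard deviations of the mean, and the Z score follows Z = (Obs - Exp) / StdDev_Exp. The function names are illustrative, and the example numbers are rounded from the permutation spreadsheet later in this lecture.

```python
import math


def z_score(observed: float, expected: float, sd_expected: float) -> float:
    """Z = (Obs - Exp) / StdDev_Exp."""
    return (observed - expected) / sd_expected


def central_probability(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))


def two_sided_p(z: float) -> float:
    """Probability that a standard normal value lies at least |z| away from 0."""
    return math.erfc(abs(z) / math.sqrt(2))


# Reproduce the Fisherian significance levels.
for k in (1, 1.65, 1.96, 2.58, 3.29):
    print(f"+/-{k} sigma covers {central_probability(k):.1%}")

# Illustrative values: observed r = 0.457 against a simulated expectation
# of 0.086 with a standard deviation of 0.165.
z = z_score(0.457, 0.086, 0.165)
print(f"Z = {z:.2f}, two-sided P = {two_sided_p(z):.3f}")
```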
[Figure: f(x) plotted against X in four panels, three of them labelled n = 20, n = 50 and n = 10]
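$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Area between $\mu - \sigma$ and $\mu + \sigma$: 0.68
Area between $\mu - 2\sigma$ and $\mu + 2\sigma$: 0.95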
Lecture 2: Randomization techniques
Country                   Area (sq. km)   DeltaT
Albania                   28748           17
Andorra                   468             15
Austria                   83871           20
Azores                    2200            7
Balearic Islands          5014            15
Belarus                   207650          23
Belgium                   30528           15
Bosnia and Herzegovina    51197           20
Bulgaria                  110971          21
Canary Islands            7270            5
Channel Is.               300             10
Corsica                   8680            13
Crete                     8259            13
Croatia                   56594           21
Cyclades Is.              2500            12
Cyprus                    9250            19
Czech Republic            78866           19
Denmark                   43093           16
Dodecanese Is.            2663            14
Estonia                   45227           21
Faroe Is.                 1399            7
Finland                   338145          23
France                    543965          15
Franz Josef Land          16134           27
Germany                   357021          19
Gibraltar                 6.5             10
Greece                    131992          17
Hungary                   93054           22
Iceland                   103000          12
Ireland                   70273           10
Italy                     301401          16
Kaliningrad Region        15000           19
Latvia                    64626           20
Liechtenstein             160             14
Lithuania                 65318           22
Luxembourg                2588            16
Macedonia                 25339           23
Madeira (Funchal)         789             5
Malta                     316             14
Moldova                   33709           23
Monaco                    1.95            12
…                         …               …
Average temperature difference in European countries/islands
[Figure: probability levels from the permutation test and from the bootstrap]
Parameters and standard errors
Consider the coefficient of correlation r. The statistical significance of r > 0 (H1) is tested against the null hypothesis H0 of r = 0. Most statistics programs do this using Fisher's Z-transformation:
$Z = \frac{1}{2}\ln\frac{1 + r}{1 - r}$
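A sketch of the test that statistics programs typically run, assuming the usual large-sample result that Fisher's Z has a standard error of 1/sqrt(n - 3) under H0. The helper names and the example (r = 0.457 computed from n = 20 pairs) are hypothetical.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's Z-transformation, Z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def fisher_test(r: float, n: int):
    """One-sided test of H1: r > 0 against H0: r = 0.

    Under H0, Z is approximately normal with standard error 1 / sqrt(n - 3).
    Returns the standardized statistic and its one-sided P value.
    """
    if n <= 3:
        raise ValueError("need more than 3 pairs of observations")
    z = fisher_z(r) * math.sqrt(n - 3)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_sided


# Hypothetical example: r = 0.457 computed from n = 20 pairs.
z, p = fisher_test(0.457, 20)
print(f"z = {z:.2f}, one-sided P = {p:.3f}")
```

The transformation is identical to `math.atanh(r)`; the explicit logarithm simply mirrors the formula on the slide.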
Reshuffling
Permutation testing
Random number   ln area     ln DeltaT   Simulated r
0.247012838     11.33704    2.833213    0.14894
0.303300878     12.65321    2.70805     0.014534
0.725633833     9.917045    2.995732    0.157997
0.258217857     0.667829    1.94591     0.031033
0.632451857     7.243513    2.70805     -0.14119
0.254528292     7.696213    3.135494    0.268839
0.980671601     13.01692    2.70805     0.117112
0.522396276     10.62825    2.995732    0.137361
0.683545674     11.08702    3.044522    0.21447
0.773648713     7.887209    1.609438    0.159525
0.359562515     10.3264     2.302585    -0.05251
0.128137778     12.68838    2.564949    -0.23382
0.573061911     11.7905     2.564949    0.072888
0.025421522     12.78555    3.044522    -0.04616
0.087309492     11.42796    2.484907    0.222467
0.20159921      9.132379    2.944439    -0.14329
0.438208554     12.40519    2.944439    -0.0572
0.575893524     13.13427    2.772589    0.449186
0.931176694     10.1401     2.639057    0.167553
0.0309793       10.67112    3.044522    0.234201
0.032472788     10.63432    1.94591
0.352239001     9.019059    3.135494

Observed r = 0.457176
Average of the simulated r values = 0.08609641 (=AVERAGE(H2:H21))
Standard deviation of the simulated r values = 0.16530152 (=STDEV(H2:H21))
t = (observed r - average r) / StdDev r * 20^0.5 = 10.0393331; P(t) = 4.9403E-09 (=TDIST(t, 19, 2))
Z = (observed r - average r) / StdDev r = 2.24486312; P(Z) = 0.03687629 (=TDIST(Z, 19, 2))
We reorder one of the variables at random (at least 1000 times) and recalculate r after each reordering.
From the simulated values we calculate the mean, the standard deviation, and the upper and lower confidence limits. This tells us how probable the observed correlation is.
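The reshuffling procedure can be sketched in Python as follows. This is an illustration, not the spreadsheet used in the lecture: the hypothetical helper `permutation_test` shuffles one variable with `random.Random.shuffle`, recomputes r each time, and reports the mean and standard deviation of the simulated values together with a one-sided P value. Only ten rows taken from the country table are used, so the numbers will differ from the lecture's.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(x, y, n_perm=1000, seed=1):
    """Reorder y at random n_perm times and recompute r each time.

    Returns the observed r, the mean and standard deviation of the simulated
    r values, the one-sided P value (share of reshuffles with r >= observed r)
    and the list of simulated values.
    """
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    y_perm = list(y)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # reorder one of the variables at random
        sims.append(pearson_r(x, y_perm))
    p = sum(r >= r_obs for r in sims) / n_perm
    return r_obs, statistics.fmean(sims), statistics.stdev(sims), p, sims


# Ten rows from the country table: area (sq. km) and DeltaT, both log-transformed.
area = [28748, 468, 83871, 2200, 207650, 30528, 110971, 7270, 338145, 316]
delta_t = [17, 15, 20, 7, 23, 15, 21, 5, 23, 14]
x = [math.log(a) for a in area]
y = [math.log(d) for d in delta_t]
r_obs, mean_r, sd_r, p, _ = permutation_test(x, y)
print(f"r = {r_obs:.3f}, mean r = {mean_r:.3f}, SD r = {sd_r:.3f}, P = {p:.3f}")
```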
[Figure: The distribution of randomized correlation coefficients (x-axis: correlation; y-axis: frequency), marking the observed value and the lower and upper two-sided 1% confidence limits]
The distribution is not symmetric. We therefore can't use Z-transformed values (the normal approximation), and we can't use a t-test.
Instead we have to use the upper and lower probability levels, which we read directly from the randomized distribution.
Probability level for r = 0.457: P = 0.0006
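Reading probability levels directly off a randomized distribution needs no symmetry assumption. A minimal sketch: sort the simulated values and cut off the same number of values in each tail (a simple order-statistic rule; statistical packages may interpolate differently). The skewed null distribution here is generated from a beta distribution purely for illustration and is not the lecture's data.

```python
import random


def empirical_limits(null_values, alpha=0.01):
    """Lower and upper two-sided (1 - alpha) limits of a null distribution.

    Simple order-statistic rule: round(n * alpha / 2) values are cut off
    in each tail of the sorted values.
    """
    ordered = sorted(null_values)
    n = len(ordered)
    k = round(n * alpha / 2)
    return ordered[k], ordered[n - 1 - k]


def upper_tail_p(null_values, observed):
    """Share of null values at least as large as the observed value."""
    return sum(v >= observed for v in null_values) / len(null_values)


# Hypothetical, right-skewed null distribution centred near zero.
rng = random.Random(7)
null = [rng.betavariate(2, 6) - 0.25 for _ in range(1000)]
lo, hi = empirical_limits(null)
print(f"1% limits: {lo:.3f} to {hi:.3f}; P(r >= 0.457) = {upper_tail_p(null, 0.457):.3f}")
```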
Jackknifing
Time               1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
Blood pressure   115  117  124  121  122  119  120  126  117  122  121  127  129  122  122  129  121  111  113  114
[Table: the 20 jackknife samples of the blood pressure data; sample i contains all values except observation i]
Original sample: Mean 121, Stddev 4.97, CV 0.04
Omitted observation   Mean   Stddev   CV      Pseudovalue   Squared difference
1                     121    4.923    0.041   0.03          0.000
2                     121    5.039    0.042   0.05          0.003
3                     121    5.049    0.042   0.05          0.003
4                     121    5.108    0.042   0.06          0.004
5                     121    5.087    0.042   0.06          0.004
6                     121    5.093    0.042   0.06          0.004
7                     121    5.103    0.042   0.06          0.004
8                     121    4.954    0.041   0.04          0.002
9                     121    5.052    0.042   0.05          0.003
10                    121    5.097    0.042   0.06          0.004
11                    121    5.102    0.042   0.06          0.004
12                    121    4.870    0.040   0.03          0.001
13                    120    4.721    0.039   0.00          0.000
14                    121    5.094    0.042   0.06          0.004
15                    121    5.101    0.042   0.06          0.004
16                    120    4.712    0.039   0.00          0.000
17                    121    5.105    0.042   0.06          0.004
18                    121    4.544    0.037   -0.03         0.001
19                    121    4.742    0.039   0.00          0.000
20                    121    4.876    0.040   0.03          0.001
Pseudovalues: mean 0.04
Squared differences: sum 0.046
Standard error: 0.01
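$p_i = X + (n-1)(X - X_i)$
$SE = \sqrt{\frac{\sum_{i=1}^{n}(p_i - \bar{p})^2}{n(n-1)}}$
where X is the parameter estimated from all n observations, $X_i$ the estimate with observation i left out, $p_i$ the pseudovalues and $\bar{p}$ their mean.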
The jackknifed standard error of the coefficient of variation
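A minimal sketch of the jackknife for the CV, assuming the CV uses the sample standard deviation (n - 1 denominator). It computes the pseudovalues and the standard error with the two formulas of this slide; the helper names are illustrative, and because the listed blood pressure values are rounded to whole numbers the results can differ from the slide.

```python
import math
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation / mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def jackknife(values, stat):
    """Jackknife estimate and standard error of stat(values).

    Pseudovalues: p_i = X + (n - 1) * (X - X_i), where X uses all data and
    X_i leaves observation i out.
    """
    values = list(values)
    n = len(values)
    full = stat(values)
    leave_one_out = [stat(values[:i] + values[i + 1:]) for i in range(n)]
    pseudo = [full + (n - 1) * (full - x_i) for x_i in leave_one_out]
    p_mean = statistics.fmean(pseudo)
    se = math.sqrt(sum((p - p_mean) ** 2 for p in pseudo) / (n * (n - 1)))
    return p_mean, se


bp = [115, 117, 124, 121, 122, 119, 120, 126, 117, 122,
      121, 127, 129, 122, 122, 129, 121, 111, 113, 114]
estimate, se = jackknife(bp, cv)
print(f"CV = {cv(bp):.3f}, jackknife estimate = {estimate:.3f}, SE = {se:.4f}")
```

A useful sanity check: for the mean, the pseudovalues equal the raw observations, so the jackknife standard error reduces to the familiar s / sqrt(n).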
Population → Sample → 1000 bootstrap samples → 1000 bootstrap parameter estimates → Bootstrap distribution → Distribution parameters as estimates of the population distribution
Bootstrapping
Take the original values and calculate the parameter you need.
Take 1000 random samples of different size from these values.
Calculate the parameter for each of the 1000 bootstrap samples.
Compare the observed value with the distribution of these parameters and calculate the confidence limits for the observed value.
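A sketch of the bootstrap standard error and percentile limits for the CV. Note one deliberate difference: this uses the standard bootstrap, which draws resamples of the original size n with replacement (`random.Random.choices`), whereas the lecture's example uses samples of different size. The helper names are hypothetical.

```python
import random
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation / mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def bootstrap(values, stat, n_boot=1000, seed=1):
    """Recompute stat on n_boot resamples drawn with replacement (same size n)."""
    rng = random.Random(seed)
    n = len(values)
    return [stat(rng.choices(values, k=n)) for _ in range(n_boot)]


def percentile_limits(estimates, alpha=0.05):
    """Two-sided (1 - alpha) percentile limits of the bootstrap distribution."""
    ordered = sorted(estimates)
    k = round(len(ordered) * alpha / 2)
    return ordered[k], ordered[len(ordered) - 1 - k]


bp = [115, 117, 124, 121, 122, 119, 120, 126, 117, 122,
      121, 127, 129, 122, 122, 129, 121, 111, 113, 114]
boot = bootstrap(bp, cv)
se = statistics.stdev(boot)  # bootstrap estimate of the standard error of the CV
lo, hi = percentile_limits(boot)
print(f"CV = {cv(bp):.3f}, bootstrap SE = {se:.4f}, 95% limits = {lo:.3f} to {hi:.3f}")
```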
The same blood pressure data (Time 1 to 20) as in the jackknife example: Mean 121, Stddev 4.97, CV 0.041
[Table: bootstrap samples of different size drawn from the 20 blood pressure values]
Bootstrap sample   Mean   Stddev   CV
1                  120    4.140    0.034
2                  120    4.617    0.038
3                  120    5.164    0.043
4                  122    5.066    0.042
5                  121    5.593    0.046
6                  122    5.288    0.043
7                  121    5.397    0.045
8                  121    5.166    0.043
9                  120    5.952    0.049
10                 122    5.528    0.045
11                 121    5.585    0.046
12                 121    4.683    0.039
13                 122    3.874    0.032
14                 120    5.045    0.042
15                 120    4.229    0.035
16                 120    5.028    0.042
17                 121    5.653    0.047
18                 122    4.291    0.035
19                 120    4.657    0.039
20                 119    5.690    0.048
CV of the bootstrap samples: mean 0.042, standard deviation 0.005
We use at least 1000 random samples and calculate the CV for each sample. The standard deviation of these CV values is an estimate of the standard error of the original CV.
The standard error of a parameter is the standard deviation of its sampling distribution; the bootstrap distribution serves as an estimate of that sampling distribution.
[Figure: Bootstrap distribution (x-axis: CV; y-axis: frequency)]
$b_i = \frac{x_i - \bar{x}}{s / \sqrt{n_i}}$
Sample     Mean   Stddev   CV      N    Studentized value
Original   121    4.97     0.041
1          120    4.140    0.034   11   -0.007
2          120    4.617    0.038   13   -0.003
3          120    5.164    0.043   17   0.001
4          122    5.066    0.042   13   0.000
5          121    5.593    0.046   11   0.005
6          122    5.288    0.043   12   0.002
7          121    5.397    0.045   15   0.003
8          121    5.166    0.043   11   0.001
9          120    5.952    0.049   11   0.008
10         122    5.528    0.045   13   0.004
11         121    5.585    0.046   11   0.005
CV of the bootstrap samples: mean 0.042, standard deviation 0.005
[Figure: frequency distribution of the studentized CV values]
The CV values are based on samples of different size, so they carry different weight.
We therefore have to use weighted averages.
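A weighted average in code. The slide does not state which weights to use; this sketch assumes each sample is weighted by its size N, using the CV and N values of the 11 bootstrap samples in the table of studentized values.

```python
def weighted_mean(values, weights):
    """Average of values, each weighted by the corresponding weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# CV and N of the 11 bootstrap samples; weighting by N is an assumption.
cvs = [0.034, 0.038, 0.043, 0.042, 0.046, 0.043, 0.045, 0.043, 0.049, 0.045, 0.046]
sizes = [11, 13, 17, 13, 11, 12, 15, 11, 11, 13, 11]
print(f"unweighted: {sum(cvs) / len(cvs):.4f}, weighted: {weighted_mean(cvs, sizes):.4f}")
```

Other weightings (for example by the inverse variance of each estimate) are also common; the choice changes only the `weights` argument.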
Monte Carlo simulation.
Null models
Darwin finch
Photo: Guardian Unlimited
Do the beak lengths of Darwin's finches, used as a measure of resource use, differ more or less than expected by chance?
The classical way to answer this question is to compare the observed variance of the differences between neighbouring (sorted) beak lengths with the variances obtained from random draws of beak lengths within the observed range (the smallest and largest beak sizes being fixed).
This is a null model approach
We test whether this null model approach is reliable
We randomly assigned beak lengths, measured in mm, to 20 species.
Original data   Sorted data   Difference
50              1
20              2             1
15              9             7
23              12            3
41              15            3
21              17            2
38              18            1
18              19            1
17              20            1
32              20            0
24              21            1
19              23            2
20              24            1
12              28            4
28              32            4
49              37            5
2               38            1
9               41            3
37              49            8
1               50            1
Variance of the differences: 4.81
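The observed statistic can be reproduced in a few lines. This sketch assumes the sample variance (n - 1 denominator, as in Excel's VAR); with that choice it returns the 4.81 shown for the original data. The function name is illustrative.

```python
import statistics


def spacing_variance(values):
    """Sample variance of the differences between neighbouring sorted values."""
    ordered = sorted(values)
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    return statistics.variance(diffs)


beaks = [50, 20, 15, 23, 41, 21, 38, 18, 17, 32, 24, 19, 20, 12, 28, 49, 2, 9, 37, 1]
print(round(spacing_variance(beaks), 2))  # 4.81
```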
Randomized data (two example runs; the smallest and largest values, 1 and 50, stay fixed)
                Run 1                                              Run 2
Randomized   Adjusted precision  Sorted  Difference    Randomized   Adjusted precision  Sorted  Difference
1            1                   1                     1            1                   1
25.6752138   26                  1       0             11.2255116   11                  7       6
48.1121149   48                  7       5             9.78997347   10                  7       0
42.1150435   42                  14      6             8.95736252   9                   8       0
22.9872128   23                  23      8             7.83346076   8                   9       1
32.1307563   32                  23      0             23.0002153   23                  10      0
38.3337144   38                  25      1             28.576216    29                  11      1
48.6675789   49                  26      0             49.8830873   50                  11      0
23.4858483   23                  29      3             39.7427775   40                  18      6
24.7944844   25                  31      1             19.0851063   19                  19      1
35.5989343   36                  32      0             47.943224    48                  23      3
14.0908314   14                  36      3             47.5814688   48                  23      0
31.2566943   31                  38      2             35.7635977   36                  26      2
29.4842605   29                  42      3             7.40672814   7                   29      2
43.5479253   44                  43      0             22.6873765   23                  36      7
1.21857593   1                   44      0             25.9080369   26                  40      3
7.12456902   7                   47      3             10.8669441   11                  48      7
42.6867606   43                  48      1             7.26532825   7                   48      0
47.0118747   47                  49      0             17.5330692   18                  50      1
50           50                  50      1             50           50                  50      0
Variance                                 5.39                                                   6.43
1000 randomizations: null model distribution
Variance   Number
1          1
2          1
3          4
4          15
5          89
6          165
7          256
8          199
9          131
10         86
11         49
12         4
Total      1000
[Figure: histogram of the randomized variances (x-axis: variance; y-axis: number), with the observed variance marked]
P(H0) = 21/1000 = 0.021. The null distribution gives us the H0 probability directly: we count how many of the 1000 randomized variances are at least as extreme as the observed one.
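The whole null model can be sketched as a Monte Carlo loop. Assumptions, not taken from the lecture: the non-fixed values are drawn uniformly between the smallest and largest observed beak length and rounded to whole millimetres (one reading of the "Adjusting precision" step), and the P value counts randomized variances less than or equal to the observed one. Function names are illustrative.

```python
import random
import statistics


def spacing_variance(values):
    """Sample variance of the differences between neighbouring sorted values."""
    ordered = sorted(values)
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    return statistics.variance(diffs)


def null_model_variances(values, n_rand=1000, seed=1):
    """Spacing variances for random draws within the observed range.

    The smallest and largest values stay fixed; the remaining values are drawn
    uniformly between them and rounded to whole millimetres.
    """
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    n_free = len(values) - 2
    sims = []
    for _ in range(n_rand):
        draw = [round(rng.uniform(lo, hi)) for _ in range(n_free)]
        sims.append(spacing_variance([lo, hi] + draw))
    return sims


beaks = [50, 20, 15, 23, 41, 21, 38, 18, 17, 32, 24, 19, 20, 12, 28, 49, 2, 9, 37, 1]
observed = spacing_variance(beaks)
sims = null_model_variances(beaks)
# Lower tail: how often is a random assemblage spaced at least as evenly?
p = sum(v <= observed for v in sims) / len(sims)
print(f"observed variance = {observed:.2f}, P(H0) = {p:.3f}")
```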
Meningitis in Europe Distribution of forests in Europe
Is the probability of Meningitis infection correlated to the distribution of forests in Europe?
We use a grid approach.
We use the coefficient of correlation between the entries of both grids: R = 0.06; P(R = 0) > 0.1.
The distance between the sites might be of importance.
[Tables: meningitis cases and forest density for each grid site]
[Table: forest density grid (longitude by latitude) with its rows and columns reshuffled]
To get the null model distribution we reshuffle only whole rows and columns of the grid, not individual cells.
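A sketch of the row-and-column reshuffle for two grids. The 4 x 4 grids are hypothetical examples (the lecture's grid values are not reproduced), and the P value is one-sided, counting reshuffles whose r is at least as large as the observed one.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flatten(grid):
    """Grid entries as one list, row by row."""
    return [v for row in grid for v in row]


def reshuffle_rows_cols(grid, rng):
    """Permute whole rows and whole columns; cells in a row or column stay together."""
    rows = list(range(len(grid)))
    cols = list(range(len(grid[0])))
    rng.shuffle(rows)
    rng.shuffle(cols)
    return [[grid[r][c] for c in cols] for r in rows]


def grid_test(a, b, n_perm=1000, seed=1):
    """Observed r between two grids and its one-sided null-model P value."""
    rng = random.Random(seed)
    fa = flatten(a)
    r_obs = pearson_r(fa, flatten(b))
    sims = [pearson_r(fa, flatten(reshuffle_rows_cols(b, rng))) for _ in range(n_perm)]
    p = sum(r >= r_obs for r in sims) / n_perm
    return r_obs, p, sims


# Hypothetical grids: cases per site and forest density per site.
cases = [[46, 33, 44, 12], [17, 23, 45, 63], [5, 9, 11, 24], [8, 62, 67, 40]]
forest = [[83, 73, 67, 19], [72, 29, 44, 47], [59, 5, 8, 39], [37, 10, 8, 66]]
r_obs, p, sims = grid_test(cases, forest)
print(f"R = {r_obs:.3f}, P(H0) = {p:.3f}")
```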
[Figure: null model distribution of the correlation coefficient (x-axis: correlation; y-axis: number)]
P(H0) = 26/1000 = 0.026
Mantel test
Species               Sequence
Carabus coriaceus     A T T T G C A T G C A
Carabus auronitens    A G T A A C A G G G A
Carabus cancellatus   A C G T G C A T C C T
Carabus auratus       A T A T G C T T G G T
Number of differing sequence positions:
                      C. coriaceus   C. auronitens   C. cancellatus   C. auratus
Carabus coriaceus     0              5               4                4
Carabus auronitens    5              0               8                7
Carabus cancellatus   4              8               0                5
Carabus auratus       4              7               5                0
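The distances in this matrix are the numbers of positions at which two aligned sequences differ (the Hamming distance). A short check that rebuilds the matrix from the sequences; the function name is illustrative.

```python
sequences = {
    "Carabus coriaceus": "ATTTGCATGCA",
    "Carabus auronitens": "AGTAACAGGGA",
    "Carabus cancellatus": "ACGTGCATCCT",
    "Carabus auratus": "ATATGCTTGGT",
}


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


names = list(sequences)
distance = [[hamming(sequences[p], sequences[q]) for q in names] for p in names]
for name, row in zip(names, distance):
    print(f"{name:<20}", *row)
```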
Prey                  Collembola   Diptera   Arachnida
Carabus coriaceus     50           20        30
Carabus auronitens    60           10        40
Carabus cancellatus   50           25        25
Carabus auratus       30           60        10
                      C. coriaceus   C. auronitens   C. cancellatus   C. auratus
Carabus coriaceus     0              0.95            0.94             -0.11
Carabus auronitens    0.95           0               0.81             -0.68
Carabus cancellatus   0.94           0.81            0                -0.11
Carabus auratus       -0.11          -0.68           -0.11            0
Coefficient of correlation between matrix entries
$r = \frac{1}{n-1}\sum_{i=1}^{n}\sum_{j=1}^{n} Z(1)_{ij}\, Z(2)_{ij}$
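For convenience we use Z-transformed (standardized) matrix entries, so that r is the ordinary coefficient of correlation between the entries of the two matrices.
The Mantel test is a test for the correlation between two distance matrices: it tests whether the distances are correlated.
Significance is assessed by reshuffling the values among the matrix entries, usually by permuting the rows and columns of one matrix together, and recalculating r each time.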
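A sketch of the Mantel test in Python, under stated assumptions: r is computed from the entries above the diagonal (each species pair once), the entries are standardized with the sample standard deviation so that the formula for r gives the ordinary coefficient of correlation, and significance comes from permuting the rows and columns of one matrix together (two-sided). With only four species there are just 4! = 24 distinct orderings, so the P value is necessarily coarse. Function names are illustrative.

```python
import random
import statistics


def upper_triangle(m):
    """Entries above the diagonal of a square matrix (each species pair once)."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def standardize(values):
    """Z-transform: subtract the mean and divide by the sample standard deviation."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def mantel_r(a, b):
    """r = 1/(n - 1) * sum of Z(1) * Z(2) over the n matrix entries used."""
    za = standardize(upper_triangle(a))
    zb = standardize(upper_triangle(b))
    return sum(x * y for x, y in zip(za, zb)) / (len(za) - 1)


def permute(m, order):
    """Reorder rows and columns of m together, so the matrix stays symmetric."""
    return [[m[i][j] for j in order] for i in order]


def mantel_test(a, b, n_perm=1000, seed=1):
    """Two-sided Mantel test: share of permutations with |r| >= |observed r|."""
    rng = random.Random(seed)
    r_obs = mantel_r(a, b)
    order = list(range(len(a)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if abs(mantel_r(a, permute(b, order))) >= abs(r_obs):
            hits += 1
    return r_obs, hits / n_perm


genetic = [[0, 5, 4, 4], [5, 0, 8, 7], [4, 8, 0, 5], [4, 7, 5, 0]]
prey = [[0, 0.95, 0.94, -0.11],
        [0.95, 0, 0.81, -0.68],
        [0.94, 0.81, 0, -0.11],
        [-0.11, -0.68, -0.11, 0]]
r, p = mantel_test(genetic, prey)
print(f"Mantel r = {r:.3f}, P = {p:.3f}")
```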