Post on 19-Dec-2015
Lecture 14: Non-parametric hypothesis testing
Raw data

High    Medium  Low     Pristine
33      51      6       25
34      43      28      27
32              75      22
38              47      19
29              60      21
                49      46
                64      31
                        30
                        25
                        31
                        25
                        34
                        93
                        24
                        28
                        57

T-TEST  Medium    Low       Pristine
High    0.145265  0.172254  0.931288
Medium            1         0.081749
Low                         0.211812

Ranked data

High    Medium  Low     Pristine
14      6       30      23
12      10      20      22
15              2       27
11              8       29
19              4       28
                7       9
                3       16
                        18
                        23
                        16
                        23
                        12
                        1
                        26
                        20
                        5

T-TEST  Medium    Low       Pristine
High    0.121074  0.414821  0.09406
Medium            0.5811    0.020992
Low                         0.103828
The ranking of data
Ranking reduces the influence of outliers and non-linearities. In most cases it also reduces within-group variances.
All parametric tests can be applied to ranked data!
Spiders on Mazurian lake islands
Disturbance
Effects of ranking
Raw data                                Ranked data
High      Medium    Low       Pristine  High      Medium    Low       Pristine
33        51        6         25        14        6         30        23
34        43        28        27        12        10        20        22
32                  75        22        15                  2         27
38                  47        19        11                  8         29
29                  60        21        19                  4         28
                    49        46                            7         9
                    64        31                            3         16
                              30                                      18
                              25                                      23
                              31                                      16
                              25                                      23
                              34                                      12
                              93                                      1
                              24                                      26
                              28                                      20
                              57                                      5

           Raw data                                Ranked data
           High      Medium    Low       Pristine  High      Medium    Low       Pristine
Mean       33.2      47        47        33.625    14.2      8         10.57143  18.625
StdDev     3.271085  5.656854  23.4094   18.5324   3.114482  2.828427  10.48582  8.317652
CVGroup    10.14954  8.308505  2.00774   1.81439   4.559345  2.828427  1.008164  2.239214

           Raw data  Ranked data
CVMean     5.123992  2.784361
CVStdDev   1.299735  1.620218
CVCVGroup  1.297749  1.800552
Ranking often reduces the within group variances.
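The ranks in the table above can be reproduced with a short Python sketch. It assumes the ranking convention the table appears to use: all 30 values are pooled, the largest value gets rank 1, and tied values share the lowest rank number. The function name `competition_ranks_desc` is made up for this illustration.

```python
import statistics

def competition_ranks_desc(values):
    """Rank 1 = largest value; tied values share the lowest rank number."""
    return [1 + sum(other > v for other in values) for v in values]

groups = {
    "High": [33, 34, 32, 38, 29],
    "Medium": [51, 43],
    "Low": [6, 28, 75, 47, 60, 49, 64],
    "Pristine": [25, 27, 22, 19, 21, 46, 31, 30,
                 25, 31, 25, 34, 93, 24, 28, 57],
}

# Rank the pooled sample, then split the ranks back into the groups.
pooled = [v for vals in groups.values() for v in vals]
pooled_ranks = competition_ranks_desc(pooled)

ranked = {}
start = 0
for name, vals in groups.items():
    ranked[name] = pooled_ranks[start:start + len(vals)]
    start += len(vals)

# Compare the within-group standard deviation before and after ranking.
for name in groups:
    print(name,
          round(statistics.stdev(groups[name]), 4),
          round(statistics.stdev(ranked[name]), 4))
```

For the Low group, the standard deviation drops from about 23.41 (raw) to about 10.49 (ranked), matching the StdDev row of the table.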
Paired comparisons of the mean; Wilcoxon’s matched pairs rank test
$$z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
z is approximately normally distributed.
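As a minimal sketch, the normal approximation can be written as a small Python function. The function name `wilcoxon_z` is chosen for this illustration. The example call uses n = 15 islands and W = 55, the smaller absolute rank sum in the island table of this lecture. Note that the usual convention gives rank 1 to the smallest absolute difference; the function only applies the formula to whatever W it is given.

```python
import math

def wilcoxon_z(w, n):
    """Normal approximation for the Wilcoxon matched pairs statistic W."""
    mean_w = n * (n + 1) / 4                          # expected W if no difference
    se_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # standard error of W
    return (w - mean_w) / se_w

z = wilcoxon_z(55, 15)
# Two-sided probability from the standard normal distribution.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(round(z, 5), p_two_sided)
```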
Island             Abundance spring   Abundance summer   Difference   Sign   Absolute value   Rank   Signed rank
Górna E            22.1               8.3                13.9         1      13.9             5      5
Koń                15.1               26.0               -10.9        -1     10.9             7      -7
Kopanka            13.4               41.4               -27.9        -1     27.9             1      -1
Królewski Ostrów   13.1               23.1               -10.0        -1     10.0             9      -9
Maleńka            19.7               10.9               8.7          1      8.7              14     14
Mała Wierzba       9.7                32.5               -22.7        -1     22.7             4      -4
Kopanka N          12.3               9.7                2.6          1      2.6              15     15
Ośrodek            9.7                20.5               -10.8        -1     10.8             8      -8
Piaseczna          34.5               43.5               -9.0         -1     9.0              11     -11
Ruciane - ląd      28.4               16.9               11.5         1      11.5             6      6
Mikołajki - ląd    22.4               13.4               9.0          1      9.0              12     12
Śluza              13.7               38.8               -25.1        -1     25.1             2      -2
Górna W            7.3                16.7               -9.3         -1     9.3              10     -10
Wierzba            26.0               17.0               8.9          1      8.9              13     13
Wygryńska          15.1               39.2               -24.1        -1     24.1             3      -3

Sorted signed ranks (negative): -11, -10, -9, -8, -7, -4, -3, -2, -1; sum = -55
Sorted signed ranks (positive): 5, 6, 12, 13, 14, 15; sum = 65
Past uses a different algorithm for the same test.
The Wilcoxon test is the non-parametric alternative to the paired t-test, that is, to a one-way repeated measures ANOVA with two measurements.
Sign test
Island             Abundance spring   Abundance summer   Difference   Sign
Górna E            22.1               8.3                13.9         1
Koń                15.1               26.0               -10.9        -1
Kopanka            13.4               41.4               -27.9        -1
Królewski Ostrów   13.1               23.1               -10.0        -1
Maleńka            19.7               10.9               8.7          1
Mała Wierzba       9.7                32.5               -22.7        -1
Kopanka N          12.3               9.7                2.6          1
Ośrodek            9.7                20.5               -10.8        -1
Piaseczna          34.5               43.5               -9.0         -1
Ruciane - ląd      28.4               16.9               11.5         1
Mikołajki - ląd    22.4               13.4               9.0          1
Śluza              13.7               38.8               -25.1        -1
Górna W            7.3                16.7               -9.3         -1
Wierzba            26.0               17.0               8.9          1
Wygryńska          15.1               39.2               -24.1        -1

Number of positive signs (1): 6
Number of negative signs (-1): 9
Bernoulli (one-sided): 0.303619
2-sided: 0.607239
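The sign test probabilities can be checked with a few lines of Python. This is a sketch under the assumption that the one-sided value is the binomial probability of observing at most as many cases as the rarer sign, with p = 0.5; the helper name `binomial_lower_tail` is made up for this illustration.

```python
from math import comb

def binomial_lower_tail(k, n, p=0.5):
    """P(X <= k) for a binomial distribution with n trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Spring minus summer differences from the island table.
differences = [13.9, -10.9, -27.9, -10.0, 8.7, -22.7, 2.6, -10.8,
               -9.0, 11.5, 9.0, -25.1, -9.3, 8.9, -24.1]

n_pos = sum(d > 0 for d in differences)
n_neg = sum(d < 0 for d in differences)

one_sided = binomial_lower_tail(min(n_pos, n_neg), n_pos + n_neg)
two_sided = min(1.0, 2 * one_sided)
print(n_pos, n_neg, round(one_sided, 6), round(two_sided, 6))
```

With 6 positive and 9 negative differences this gives about 0.3036 (one-sided) and 0.6072 (two-sided), in line with the values on the slide.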
The rank test of Mann and Whitney: U-test
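$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$$

Expected mean if no difference:

$$\mu_U = \frac{n_1 n_2}{2}$$

Expected SE if no difference:

$$SE_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

$$t = \frac{U - \mu_U}{SE_U} = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$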
The U-test is the nonparametric alternative to the t-test.
Spider abundances

Raw data            Ranked data
Low       Pristine  Low       Pristine
6         25        1         6
28        27        10        9
75        22        22        4
47        19        17        2
60        21        20        3
49        46        18        16
64        31        21        13
          30                  12
          25                  6
          31                  13
          25                  6
          34                  15
          93                  23
          24                  5
          28                  10
          57                  19
n              7          16
R              109        =SUM(EB3:EB18)
U              31         =EB19*EC19+(EB19+1)*EB19/2-EB20
Mean           56         =EB19*EC19/2
SE             14.96663   =(EB19*EC19*(EB19+EC19+1)/12)^0.5
Z              -1.67038   =(EB21-EB22)/EB23
P(Z)           0.047422   =NORMDIST(EB21,EB22,EB23,TRUE)
Double sided   0.094844
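The spreadsheet calculation above can be sketched in Python using only the standard library. The function names `mann_whitney_z` and `normal_cdf` are chosen for this illustration, and the ranks are taken as given in the ranked data table (no tie correction is applied).

```python
import math

def mann_whitney_z(n1, n2, r1):
    """Return U, its expected mean and SE, and the z score (normal approximation)."""
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean_u = n1 * n2 / 2
    se_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, mean_u, se_u, (u - mean_u) / se_u

def normal_cdf(z):
    """Cumulative standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

low_ranks = [1, 10, 22, 17, 20, 18, 21]   # ranks of the Low group (n1 = 7)
u, mean_u, se_u, z = mann_whitney_z(7, 16, sum(low_ranks))

p_one_sided = normal_cdf(z)
p_two_sided = 2 * p_one_sided
print(u, mean_u, round(se_u, 5), round(z, 5),
      round(p_one_sided, 6), round(p_two_sided, 6))
```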
As with the t-test, an ANOVA on ranked data tends to give lower p-values than an ANOVA on the raw data. Ranking evens out the within-group heterogeneity (lower within-group variance), so the test is less conservative.
Raw data Ranked data
Kruskal-Wallis test or Kruskal-Wallis one way ANOVA by ranks
$$KW = \frac{12}{N(N+1)} \sum_{i=1}^{r} \frac{R_i^2}{n_i} - 3(N+1)$$
KW is approximately χ2 distributed. Values can be taken from a χ2 table with r-1 degrees of freedom.
Raw data ANOVA
Raw data Ranked data
         High      Medium    Low       Pristine
ni       5         2         7         16
Ri       71        16        74        298
Ri^2/ni  1008.2    128       782.2857  5550.25

KW = 3.370783
Chi2(KW;3) = 0.337912
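The KW value above follows directly from the group sizes and rank sums, as this Python sketch shows. The function names are made up for this illustration. The probability uses a closed-form expression for the upper tail of the χ2 distribution that holds only for 3 degrees of freedom (r = 4 groups); for other group counts a statistics library would be needed.

```python
import math

def kruskal_wallis(rank_sums, sizes):
    """KW statistic from the rank sums R_i and the group sizes n_i."""
    n_total = sum(sizes)
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

def chi2_upper_tail_3df(x):
    """P(chi2 > x) for exactly 3 degrees of freedom (closed form)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

rank_sums = [71, 16, 74, 298]   # High, Medium, Low, Pristine
sizes = [5, 2, 7, 16]

kw = kruskal_wallis(rank_sums, sizes)
p = chi2_upper_tail_3df(kw)
print(round(kw, 6), round(p, 4))
```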
Random skewers
Diversity of ground beetles along an elevational gradient
Altitude  Number of species  Ranked altitude  Ranked number of species
100       86                 1                2
250       98                 2                1
350       75                 3                4
400       80                 4                3
450       50                 5                7
500       61                 6                5
630       55                 7                6
700       45                 8                9
950       49                 9                8
1030      38                 10               11
1100      39                 11               10
1200      36                 12               12
1300      33                 13               13
1500      27                 14               14

r = 0.969231
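The rank correlation above can be reproduced from the two rank columns. The sketch below uses the textbook shortcut formula for Spearman's coefficient, which assumes there are no tied ranks (true for these data); the function name `spearman_from_ranks` is chosen for this illustration. Altitude is ranked from lowest to highest and species number from highest to lowest, so a positive r here means that richness declines with altitude.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid without ties."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))

rank_altitude = list(range(1, 15))
rank_species = [2, 1, 4, 3, 7, 5, 6, 9, 8, 11, 10, 12, 13, 14]

r = spearman_from_ranks(rank_altitude, rank_species)
print(round(r, 6))
```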
Random samples
[Four example random subsamples of the ranked altitude and ranked number of species pairs. Their rank correlations are 0.98327, 0.95721, 0.97105 and 0.98243.]
We take 1000 random samples and calculate Spearman's rank order correlation for each one.
If there is no trend in species richness, positive and negative correlations are equally likely, so the number of positive correlations follows a binomial distribution with p = 0.5.
Of 1000 rank correlations, 623 were positive. The associated probability is

$$p(623, 1000) = \binom{1000}{623} \left(\frac{1}{2}\right)^{1000} = 1.37 \times 10^{-15}$$
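This probability is easy to check in Python, because integer arithmetic there is exact for arbitrarily large numbers. The sketch computes the binomial probability of exactly k positive outcomes in n trials with p = 0.5; the function name `binomial_pmf_half` is made up for this illustration.

```python
from math import comb

def binomial_pmf_half(k, n):
    """Probability of exactly k successes in n trials when p = 0.5."""
    return comb(n, k) / 2**n

p = binomial_pmf_half(623, 1000)
print(p)   # on the order of 1e-15
```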
It is highly probable that there is an altitudinal trend in species richness.
What kind of test should be used?

Errors are normally distributed around the mean

Comparing two means: t-test

t = effect size / overall standard error

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} \sqrt{N}$$

Comparing two variances: F-test

$$F = \frac{\sigma_1^2}{\sigma_2^2}$$

Comparing two distributions: Chi2-test
Comparing expectation and observation: Chi2-test

$$\chi^2 = \sum_{i=1}^{k} \frac{(Obs_i - Exp_i)^2}{Exp_i}$$

Errors are not normally distributed around the mean

Comparing two distributions: Kolmogorov-Smirnov test

$$KS = \max(Obs_{cum} - Exp_{cum})$$

Comparing expectation and observation: Chi2-test, G-test

$$G = 2 \sum_{i=1}^{k} O_i \ln \frac{O_i}{E_i}$$
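The three goodness-of-fit statistics listed above can be sketched in Python. The observed and expected counts are hypothetical values chosen only for illustration; they are not data from the lecture. The KS function compares cumulative relative frequencies and takes the largest absolute difference, which is one common reading of the formula.

```python
import math

def chi2_statistic(obs, exp):
    """Sum of (Obs - Exp)^2 / Exp over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def g_statistic(obs, exp):
    """G = 2 * sum(O * ln(O / E)); classes with O = 0 contribute nothing."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)

def ks_statistic(obs, exp):
    """Largest absolute difference between cumulative relative frequencies."""
    total_obs, total_exp = sum(obs), sum(exp)
    cum_obs = cum_exp = 0.0
    largest = 0.0
    for o, e in zip(obs, exp):
        cum_obs += o / total_obs
        cum_exp += e / total_exp
        largest = max(largest, abs(cum_obs - cum_exp))
    return largest

observed = [10, 20, 30]   # hypothetical counts, for illustration only
expected = [20, 20, 20]   # hypothetical expectation (equal classes)

chi2_value = chi2_statistic(observed, expected)
g_value = g_statistic(observed, expected)
ks_value = ks_statistic(observed, expected)
print(chi2_value, round(g_value, 4), round(ks_value, 4))
```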
Errors are not normally distributed around the mean (continued)

Comparing two means: U-test, Wilcoxon test, sign test
Comparing expectation and observation: sign test, Monte Carlo simulation
Analyzing dependencies between two variables: rank correlation
Studying structure: Monte Carlo simulation
Homework and literature

Refresh:
• U-test
• Wilcoxon matched pairs test
• Sign test
• Kruskal-Wallis test
• Raw and ranked data
• Tied ranks

Literature:
Łomnicki: Statystyka dla biologów
http://statsoft.com/textbook/