Computer algebra and rank statistics
Alessandro Di Bucchianico
HCM Workshop Coimbra
November 5, 1997
How to run this presentation?
• the presentation runs itself most of the time
• click the mouse if you want to continue
• type S to stop or restart the presentation
• underlined items are hyperlinks to files on the World Wide Web (usually PostScript files of technical reports)
Enjoy my presentation!
Outline
• General remarks on nonparametric methods
• What is computer algebra?
• Case study: the Mann-Whitney statistic
• Critical values of rank test statistics
• Moments of the Mann-Whitney statistic
• Conclusions
General remarks on nonparametric methods
Practical problems
• tables (limited, errors, not exact,…)
• limited availability in statistical software
• procedures in statistical software are often based only on asymptotics
Mathematical problems
• in general no closed expression for distribution function
• direct enumeration only feasible for small sample sizes
• recurrences are time-consuming
What is computer algebra?
Sample session in Mathematica 3.0
Expand[(1 + x) (1 + x^2)]
1 + x + x^2 + x^3
<< "DiscreteMath`RSolve`"
SeriesTerm[1/(1 - x^4), {x, 0, n}, Assumptions -> {n >= 0}]
1/4 + (-1)^n/4 + If[Even[n], (1/2) (-1)^(n/2), 0]
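The closed form returned by SeriesTerm can be checked against the direct expansion 1/(1 - x^4) = 1 + x^4 + x^8 + ... The following is a minimal Python sketch (not part of the original Mathematica session; the function names are illustrative) that compares the two in exact rational arithmetic.

```python
from fractions import Fraction

def coeff_direct(n):
    """Coefficient of x^n in 1/(1 - x^4) = 1 + x^4 + x^8 + ..."""
    return 1 if n % 4 == 0 else 0

def coeff_closed_form(n):
    """Closed form from the session: 1/4 + (-1)^n/4 + If[Even[n], (1/2)(-1)^(n/2), 0]."""
    extra = Fraction((-1) ** (n // 2), 2) if n % 2 == 0 else Fraction(0)
    return Fraction(1, 4) + Fraction((-1) ** n, 4) + extra

# Print both versions side by side for the first few n.
for n in range(12):
    print(n, coeff_direct(n), coeff_closed_form(n))
```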
Case study: Mann-Whitney statistic
independent samples $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$
continuous distribution functions $F$, $G$ respectively
(hence, no ties with probability one)
order the pooled sample from small to large
Mann-Whitney (continued)
Wilcoxon: $W_{m,n} = \sum_{i=1}^{m} \operatorname{rank}(X_i)$
Mann-Whitney: $M_{m,n} = \#\{(i,j) \mid Y_j < X_i\}$
$W_{m,n} = M_{m,n} + \tfrac{1}{2} m (m+1)$
What is the distribution of $M_{m,n}$ under $H_0: F = G$?
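For very small samples the question can be answered by direct enumeration: under H0 every choice of m ranks for the X-sample is equally likely. A minimal Python sketch (the function name is illustrative, not from the slides) that counts the assignments for each value of the Mann-Whitney statistic and checks the Wilcoxon relation along the way:

```python
from collections import Counter
from itertools import combinations

def mann_whitney_counts(m, n):
    """Number of rank assignments with M = k, for k = 0, ..., m*n."""
    counts = Counter()
    ranks = range(1, m + n + 1)
    for x_ranks in combinations(ranks, m):
        x_set = set(x_ranks)
        y_ranks = [r for r in ranks if r not in x_set]
        # Mann-Whitney: number of pairs (i, j) with Y_j < X_i
        m_stat = sum(1 for x in x_ranks for y in y_ranks if y < x)
        # Wilcoxon: sum of the X ranks; check W = M + m(m+1)/2
        assert sum(x_ranks) == m_stat + m * (m + 1) // 2
        counts[m_stat] += 1
    return [counts[k] for k in range(m * n + 1)]

print(mann_whitney_counts(3, 2))  # [1, 1, 2, 2, 2, 1, 1]
```

Dividing the counts by the number of assignments, binomial(m+n, m), gives the null distribution. The loop visits all binomial(m+n, m) subsets, so this is only feasible for small m and n.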
Under $H_0$, we have:
$$\sum_{k=0}^{mn} P(M_{m,n} = k)\, x^k = \frac{1}{\binom{m+n}{n}} \, \frac{\prod_{i=n+1}^{m+n} (1 - x^i)}{\prod_{j=1}^{m} (1 - x^j)}$$
Mann-Whitney generating function in Mathematica 3.0
Expand[Simplify[1/Binomial[5, 3] Product[1 - x^i, {i, 3, 5}]/Product[1 - x^j, {j, 1, 3}]]]
1/10 + x/10 + x^2/5 + x^3/5 + x^4/5 + x^5/10 + x^6/10
CoefficientList[%, x]
{1/10, 1/10, 1/5, 1/5, 1/5, 1/10, 1/10}
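The same coefficients can be reproduced without a computer algebra system. The sketch below is a Python stand-in for the Mathematica session (the helper names are hypothetical): it builds the generating function one factor (1 - x^(n+k))/(1 - x^k) at a time on plain coefficient lists, so every intermediate result stays a polynomial.

```python
from fractions import Fraction
from math import comb

def mul_one_minus_xp(poly, p):
    """Multiply a coefficient list by (1 - x^p)."""
    out = poly + [0] * p
    for i, c in enumerate(poly):
        out[i + p] -= c
    return out

def div_one_minus_xp(poly, p):
    """Divide a coefficient list by (1 - x^p), assuming the division is exact."""
    # q(x) * (1 - x^p) = poly(x)  implies  q[i] = poly[i] + q[i - p]
    q = []
    for i in range(len(poly) - p):
        q.append(poly[i] + (q[i - p] if i >= p else 0))
    return q

def mann_whitney_pmf(m, n):
    """Null probabilities P(M = k), k = 0, ..., m*n, as exact fractions."""
    poly = [1]
    for k in range(1, m + 1):
        poly = mul_one_minus_xp(poly, n + k)
        poly = div_one_minus_xp(poly, k)
    total = comb(m + n, n)
    return [Fraction(c, total) for c in poly]

print(mann_whitney_pmf(3, 2))
```

For m = 3, n = 2 this returns the fractions 1/10, 1/10, 1/5, 1/5, 1/5, 1/10, 1/10, in agreement with the CoefficientList output above.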
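(* cumulative null distribution: {0, P(M <= 0), P(M <= 1), ...} *)
MannWhitneyCumFreq[m_, n_] := Module[{x, i, j},
  FoldList[Plus, 0, CoefficientList[Expand[Factor[
    1/Binomial[m + n, n] *
      Product[1 - x^i, {i, n + 1, m + n}]/Product[1 - x^j, {j, 1, m}]]], x]]]
(* left-tail probability P(M <= k); the leading 0 shifts the index by 2 *)
MannWhitneyLeftSigValue[m_, n_, k_] := MannWhitneyCumFreq[m, n][[k + 2]]
(* largest k with P(M <= k) <= a, if such a k exists *)
MannWhitneyLeftCritValue[m_, n_, a_] := Module[
  {value = Length[Select[MannWhitneyCumFreq[m, n], # <= a &]] - 2},
  If[NonNegative[N[value]], value, "no critical value exists"]]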
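For comparison, here is a rough Python counterpart of these three procedures. It is a sketch under stated assumptions, not the author's implementation: instead of the generating function it uses the classical counting recurrence (condition on whether the largest pooled observation is an X or a Y), and the function names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import accumulate
from math import comb

@lru_cache(maxsize=None)
def count(m, n, k):
    """Number of rank assignments with M = k."""
    if k < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if k == 0 else 0
    # largest observation is an X (it exceeds all n Y's) or a Y (adds nothing)
    return count(m - 1, n, k - n) + count(m, n - 1, k)

def cum_probs(m, n):
    """[P(M <= 0), P(M <= 1), ..., P(M <= m*n)] as exact fractions."""
    total = comb(m + n, n)
    counts = [count(m, n, k) for k in range(m * n + 1)]
    return [Fraction(c, total) for c in accumulate(counts)]

def left_sig_value(m, n, k):
    """Left-tail probability P(M <= k)."""
    return cum_probs(m, n)[k]

def left_crit_value(m, n, alpha):
    """Largest k with P(M <= k) <= alpha, or None if there is none."""
    value = sum(1 for p in cum_probs(m, n) if p <= alpha) - 1
    return value if value >= 0 else None

print(left_sig_value(5, 5, 4))                   # 1/21
print(left_crit_value(5, 5, Fraction(1, 20)))    # 4
```

Unlike the Mathematica version, the cumulative list here has no leading 0, so the index is k rather than k + 2.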
Computational speed (Pentium 133 MHz)
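Exact: $P(M_{5,5} \le 4) = 1/21 \approx 0.0476$
computing time: 0.05 sec (generating function: degree 25)
Asymptotic approximation: $P(M_{5,5} \le 4) \approx 0.0384$
Exact: $P(M_{20,20} \le 138) = 0.0482$ (rounded)
computing time: 8.5 sec (generating function: degree 400)
Asymptotic approximation: $P(M_{20,20} \le 138) \approx 0.0475$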
Asymptotics and exact calculations are both useful!
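The comparison for m = n = 5 can be sketched in Python as follows. The exact tail is obtained by enumeration; the approximation uses the normal distribution with the standard null mean mn/2 and variance mn(m+n+1)/12. Which asymptotic method produced the figures on the slide is not stated, so the plain normal approximation (with an optional continuity correction, a hypothetical parameter here) is an assumption.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, erf, sqrt

def exact_left_tail(m, n, k):
    """P(M <= k) under H0 by enumerating the X-rank subsets (small m, n only)."""
    offset = m * (m + 1) // 2          # M = W - m(m+1)/2
    hits = sum(1 for xs in combinations(range(1, m + n + 1), m)
               if sum(xs) - offset <= k)
    return Fraction(hits, comb(m + n, m))

def normal_left_tail(m, n, k, continuity=0.0):
    """Normal approximation to P(M <= k); continuity=0.5 adds a correction."""
    mean = m * n / 2
    sd = sqrt(m * n * (m + n + 1) / 12)
    z = (k + continuity - mean) / sd
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal c.d.f. at z

print(exact_left_tail(5, 5, 4), float(exact_left_tail(5, 5, 4)))
print(normal_left_tail(5, 5, 4))
print(normal_left_tail(5, 5, 4, continuity=0.5))
```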
Other examples of nonparametric test statistics with a closed-form generating function include:
• Wilcoxon signed rank statistic
• Kendall rank correlation statistic
• Kolmogorov one-sample statistic
• Smirnov two-sample statistic
• Jonckheere-Terpstra statistic
Consult the combinatorial literature!
What to do if there is no generating function?
Linear rank statistics
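$Z_\ell = 1$ if the $\ell$-th order statistic in the pooled sample is an X-observation, and 0 otherwise
$$T_{m,n} = \sum_{\ell=1}^{m+n} a(\ell)\, Z_\ell$$
Streitberg & Röhmel 1986 (cf. Euler 1748):
$$\sum_{m+n=N} \sum_{k} \binom{N}{m} \Pr(T_{m,n} = k)\, x^k y^m = \left(1 + x^{a(1)} y\right) \cdots \left(1 + x^{a(N)} y\right)$$
Branch-and-bound algorithm (Van de Wiel)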
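The product formula can be expanded factor by factor. The following Python sketch illustrates that idea only; it is not the Streitberg and Röhmel implementation, nor Van de Wiel's branch-and-bound algorithm, and the function names are hypothetical. It assumes integer values a(1), ..., a(N) and stores the coefficient of x^k y^m in a nested dictionary.

```python
from fractions import Fraction
from math import comb

def rank_statistic_counts(scores):
    """Expand prod_l (1 + x^a(l) y); table[m][k] is the coefficient of x^k y^m."""
    table = {0: {0: 1}}
    for a in scores:
        # the "1" term keeps every existing coefficient
        new = {m: dict(row) for m, row in table.items()}
        # the "x^a y" term shifts k by a and m by 1
        for m, row in table.items():
            target = new.setdefault(m + 1, {})
            for k, c in row.items():
                target[k + a] = target.get(k + a, 0) + c
        table = new
    return table

def rank_statistic_pmf(scores, m):
    """Null distribution of T for an X-sample of size m, as {k: probability}."""
    row = rank_statistic_counts(scores)[m]
    total = comb(len(scores), m)
    return {k: Fraction(c, total) for k, c in sorted(row.items())}

# a(l) = l gives the Wilcoxon rank sum; here N = 5, m = 3
print(rank_statistic_pmf([1, 2, 3, 4, 5], 3))
```

With a(l) = l, N = 5 and m = 3, the result is the Mann-Whitney distribution for m = 3, n = 2 shifted by m(m+1)/2 = 6.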
Moments of Mann-Whitney statistic
Mann and Whitney (1947) calculated 4th central moment
Fix and Hodges (1955) calculated 6th central moment
Computations are based on recurrences
Can we improve?
Solution: computer algebra and generating functions
Computing moments of Mm,n
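recompute $E(M_{m,n})$ (following René Swarttouw)
$$E(X) = \left.\frac{d}{dx} \sum_{k} P(X = k)\, x^k \right|_{x=1}$$
$$G_{m,n}(x) := \frac{(1 - x^{n+1}) \cdots (1 - x^{n+m})}{(1 - x) \cdots (1 - x^{m})}$$
$$\log G_{m,n}(x) = \sum_{k=1}^{m} \left[ \log(1 - x^{n+k}) - \log(1 - x^{k}) \right]$$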
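$$\frac{d}{dx} \log G_{m,n}(x) = \sum_{k=1}^{m} \left[ \frac{k x^{k-1}}{1 - x^{k}} - \frac{(n+k) x^{n+k-1}}{1 - x^{n+k}} \right]$$
$$\frac{d}{dx} \log G_{m,n}(x) = \frac{G'_{m,n}(x)}{G_{m,n}(x)}$$
$$G'_{m,n}(x) = G_{m,n}(x) \sum_{k=1}^{m} \frac{k x^{k-1} (1 - x^{n+k}) - (n+k) x^{n+k-1} (1 - x^{k})}{(1 - x^{k})(1 - x^{n+k})}$$
$G_{m,n}$ is a polynomial, hence $G'_{m,n}(1) = \lim_{x \to 1} G'_{m,n}(x)$
Fact: $\lim_{x \to 1} G_{m,n}(x) = \binom{m+n}{n}$.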
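Hence, it remains to calculate for $1 \le k \le m$:
$$\lim_{x \to 1} \frac{k x^{k-1} (1 - x^{n+k}) - (n+k) x^{n+k-1} (1 - x^{k})}{(1 - x^{k})(1 - x^{n+k})} = \lim_{x \to 1} \frac{k (1 - x^{n}) - n x^{n} (1 - x^{k})}{(1 - x^{k})(1 - x^{n+k})}$$
After some simplifications:
$$\lim_{x \to 1} \frac{k (1 + \dots + x^{n-1}) - n x^{n} (1 + \dots + x^{k-1})}{k (n+k) (1 - x)}$$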
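L’Hôpital’s rule yields that the limit equals:
$$-\frac{1}{k (n+k)} \left[ \frac{k\, n (n-1)}{2} - n \left( n k + \frac{k (k-1)}{2} \right) \right] = \frac{n}{2}$$
Summing over $k = 1, \dots, m$ and dividing by $\binom{m+n}{n}$ gives $E(M_{m,n}) = mn/2$.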
It is tedious to perform these computations by hand.
Alternative:
compute moments using Mathematica.
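The same idea can be sketched in Python for concrete m and n (numerically rather than symbolically, so it is no substitute for the computer algebra approach). The r-th factorial moment is the r-th derivative of the probability generating function at x = 1, which for a polynomial is a simple operation on coefficient lists. Function names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def counts(m, n):
    """Coefficients c_k = number of rank assignments with M = k."""
    if m == 0 or n == 0:
        return (1,)
    a = (0,) * n + counts(m - 1, n)    # largest observation is an X: adds n
    b = counts(m, n - 1)               # largest observation is a Y: adds 0
    b = b + (0,) * (len(a) - len(b))
    return tuple(x + y for x, y in zip(a, b))

def factorial_moment(m, n, r):
    """E[M (M-1) ... (M-r+1)], the r-th derivative of the p.g.f. at x = 1."""
    coeffs = list(counts(m, n))
    for _ in range(r):
        # d/dx sum_k c_k x^k = sum_k k c_k x^(k-1)
        coeffs = [k * c for k, c in enumerate(coeffs)][1:]
    return Fraction(sum(coeffs), comb(m + n, n))

def mean_and_variance(m, n):
    f1 = factorial_moment(m, n, 1)
    f2 = factorial_moment(m, n, 2)
    return f1, f2 + f1 - f1 ** 2

print(mean_and_variance(5, 5))
```

The results can be compared with the well-known closed forms mn/2 for the mean and mn(m+n+1)/12 for the variance.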
Mathematica procedures for moments of $M_{m,n}$:
LogG[k_, n_, x_] := Log[1 - x^(n + k)] - Log[1 - x^k]
DerivativeOfLogG[r_] := Module[{j, der},
  Sum[Simplify[r! Coefficient[
    Normal[Series[LogG[k, n, x], {x, 1, r + 1}]], x - 1, r]], {k, 1, m}]]
FactorialMoments[r_] := Module[{i, j, equations},
  equations = Table[
    ReplaceAll[Simplify[Together[D[Log[G[z]], {z, j}]]], G[z] -> 1] ==
      DerivativeOfLogG[j], {j, 1, r}];
  Flatten[ReplaceAll[Table[D[G[z], {z, i}], {i, 1, r}],
    Solve[equations, Table[D[G[z], {z, i}], {i, 1, r}]]]]]
8th central moment of $M_{m,n}$
$$
\begin{aligned}
\frac{1}{34560}\, m n (1 + m + n) \bigl( & -96 m + 96 m^2 + 240 m^3 - 240 m^4 - 432 m^5 - 144 m^6 \\
& - 96 n + 192 m n + 224 m^2 n - 540 m^3 n - 100 m^4 n + 780 m^5 n + 404 m^6 n \\
& + 96 n^2 + 224 m n^2 - 600 m^2 n^2 - 200 m^3 n^2 + 900 m^4 n^2 - 48 m^5 n^2 - 420 m^6 n^2 \\
& + 240 n^3 - 540 m n^3 - 200 m^2 n^3 + 1095 m^3 n^3 - 395 m^4 n^3 - 735 m^5 n^3 + 175 m^6 n^3 \\
& - 240 n^4 - 100 m n^4 + 900 m^2 n^4 - 395 m^3 n^4 - 630 m^4 n^4 + 525 m^5 n^4 \\
& - 432 n^5 + 780 m n^5 - 48 m^2 n^5 - 735 m^3 n^5 + 525 m^4 n^5 \\
& - 144 n^6 + 404 m n^6 - 420 m^2 n^6 + 175 m^3 n^6 \bigr)
\end{aligned}
$$
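A formula of this size is easy to mistype, so a numerical spot check is worthwhile. The Python sketch below (function names are illustrative) evaluates the polynomial as transcribed from this slide and compares it with the 8th central moment obtained by brute-force enumeration for a few small sample sizes, using the null mean mn/2.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def eighth_central_moment_exact(m, n):
    """E[(M - mn/2)^8] under H0 by enumerating all X-rank subsets."""
    offset = m * (m + 1) // 2
    values = [sum(xs) - offset for xs in combinations(range(1, m + n + 1), m)]
    mean = Fraction(m * n, 2)
    return sum((v - mean) ** 8 for v in values) / comb(m + n, m)

def eighth_central_moment_formula(m, n):
    """The closed-form expression, transcribed term by term from the slide."""
    poly = (
        -96*m + 96*m**2 + 240*m**3 - 240*m**4 - 432*m**5 - 144*m**6
        - 96*n + 192*m*n + 224*m**2*n - 540*m**3*n - 100*m**4*n
        + 780*m**5*n + 404*m**6*n
        + 96*n**2 + 224*m*n**2 - 600*m**2*n**2 - 200*m**3*n**2
        + 900*m**4*n**2 - 48*m**5*n**2 - 420*m**6*n**2
        + 240*n**3 - 540*m*n**3 - 200*m**2*n**3 + 1095*m**3*n**3
        - 395*m**4*n**3 - 735*m**5*n**3 + 175*m**6*n**3
        - 240*n**4 - 100*m*n**4 + 900*m**2*n**4 - 395*m**3*n**4
        - 630*m**4*n**4 + 525*m**5*n**4
        - 432*n**5 + 780*m*n**5 - 48*m**2*n**5 - 735*m**3*n**5 + 525*m**4*n**5
        - 144*n**6 + 404*m*n**6 - 420*m**2*n**6 + 175*m**3*n**6
    )
    return Fraction(m * n * (1 + m + n) * poly, 34560)

for m, n in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    print(m, n, eighth_central_moment_exact(m, n), eighth_central_moment_formula(m, n))
```

For m = n = 1 both functions give 1/256, and for m = n = 2 both give 257/3.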
Conclusions
• generating functions are also useful in nonparametric statistics
• computer algebra is a natural tool for mathematicians
• asymptotics and exact calculations complement each other
Topics under investigation
• tests for censored data
• power calculations
• nonparametric ANOVA (Kruskal-Wallis, block designs, multiple comparisons)
• Spearman’s ρ (rank correlation)
• multimedia/ World Wide Web implementation
Click on underlined items to obtain Postscript file of technical report
References
• A. Di Bucchianico, Combinatorics, computer algebra and the Wilcoxon-Mann-Whitney test, to appear in J. Stat. Plann. Inf.
• B. Streitberg and J. Röhmel, Exact distributions for permutation and rank tests: An introduction to some recently published algorithms, Stat. Software Newsletter 12 (1986), 10-18
• M.A. van de Wiel, Exact distributions of nonparametric statistics using computer algebra, Master’s Thesis, TUE, 1996
• M.A. van de Wiel and A. Di Bucchianico, The exact distribution of Spearman’s rho, technical report
• M.A. van de Wiel, A. Di Bucchianico and P. van der Laan, Exact distributions of nonparametric test statistics using computer algebra, technical report
• M.A. van de Wiel, Edgeworth expansions with exact cumulants for two-sample linear rank statistics, technical report
• M.A. van de Wiel, Exact distributions of two-sample rank statistics and block rank statistics using computer algebra, technical report
The End