Computer algebra and rank statistics

Alessandro Di Bucchianico

HCM Workshop Coimbra

November 5, 1997

2

How to run this presentation?

• the presentation runs itself most of the time

• click the mouse if you want to continue

• type S to stop or restart the presentation

• underlined items are hyperlinks to files on the World Wide Web (usually PostScript files of technical reports)

Enjoy my presentation!

3

Outline

• General remarks on nonparametric methods

• What is computer algebra?

• Case study: the Mann-Whitney statistic

• Critical values of rank test statistics

• Moments of the Mann-Whitney statistic

• Conclusions

4

General remarks on nonparametric methods

Practical problems

• tables (limited, errors, not exact,…)

• limited availability in statistical software

• procedures in statistical software often only based on asymptotics

5

General remarks on nonparametric methods

Mathematical problems

• in general no closed expression for distribution function

• direct enumeration only feasible for small sample sizes

• recurrences are time-consuming

6

What is computer algebra?

Sample session in Mathematica 3.0

In:  Expand[(x + 1) (x^2 + 1)]
Out: 1 + x + x^2 + x^3

In:  <<"DiscreteMath`RSolve`"

In:  SeriesTerm[1/(1 - x^4), {x, 0, n}, Assumptions -> {n >= 0}]
Out: 1/4 + (-1)^n/4 + If[Even[n], 1/2 (-1)^(n/2), 0]
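The closed form returned by SeriesTerm can be checked against a direct power-series expansion. The following is a minimal Python sketch of that check; the function names are illustrative and do not come from the talk.

```python
from fractions import Fraction

def series_coefficients(n_terms):
    """First n_terms power-series coefficients of 1/(1 - x^4).

    From (1 - x^4) * C(x) = 1 we get c_0 = 1 and c_n = c_(n-4) for n >= 4.
    """
    coeffs = []
    for n in range(n_terms):
        if n == 0:
            coeffs.append(1)
        elif n < 4:
            coeffs.append(0)
        else:
            coeffs.append(coeffs[n - 4])
    return coeffs

def closed_form(n):
    """Closed form from the session: 1/4 + (-1)^n/4 + If[Even[n], (-1)^(n/2)/2, 0]."""
    value = Fraction(1, 4) + Fraction((-1) ** n, 4)
    if n % 2 == 0:
        value += Fraction((-1) ** (n // 2), 2)
    return value

coeffs = series_coefficients(12)
# True when the closed form reproduces every directly computed coefficient.
agrees = all(closed_form(n) == c for n, c in enumerate(coeffs))
```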

7

Case study: Mann-Whitney statistic

independent samples $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$

with continuous distribution functions F and G, respectively

(hence, no ties with probability one)

order the pooled sample from small to large

8

Mann-Whitney (continued)

Wilcoxon: $W_{m,n} = \sum_{i=1}^{m} \operatorname{rank}(X_i)$

Mann-Whitney: $M_{m,n} = \#\{(i,j) \mid Y_j < X_i\}$

$W_{m,n} = M_{m,n} + \tfrac{1}{2}\, m(m+1)$

What is the distribution of $M_{m,n}$ under $H_0: F = G$?
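The relation between the two statistics is easy to check numerically. A small Python sketch, assuming tie-free data; the sample values and function names are made up for illustration.

```python
def wilcoxon_rank_sum(xs, ys):
    """Sum of the ranks of the X-observations in the pooled sample (no ties assumed)."""
    pooled = sorted(xs + ys)
    rank = {value: r for r, value in enumerate(pooled, start=1)}
    return sum(rank[x] for x in xs)

def mann_whitney_count(xs, ys):
    """Number of pairs (i, j) with Y_j < X_i."""
    return sum(1 for x in xs for y in ys if y < x)

xs = [2.1, 5.3, 3.7]   # hypothetical X-sample, m = 3
ys = [1.0, 4.2]        # hypothetical Y-sample, n = 2
m = len(xs)
w = wilcoxon_rank_sum(xs, ys)
u = mann_whitney_count(xs, ys)
relation_holds = (w == u + m * (m + 1) // 2)
```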

9

Under $H_0$, we have:

$$\sum_{k=0}^{mn} P(M_{m,n} = k)\, x^k = \frac{1}{\binom{m+n}{n}} \, \frac{\prod_{i=n+1}^{m+n} (1 - x^i)}{\prod_{j=1}^{m} (1 - x^j)}$$

Mann-Whitney generating function in Mathematica 3.0

In:  Expand[Simplify[1/Binomial[5, 3] Product[1 - x^i, {i, 3, 5}]/Product[1 - x^j, {j, 1, 3}]]]
Out: 1/10 + x/10 + x^2/5 + x^3/5 + x^4/5 + x^5/10 + x^6/10

In:  CoefficientList[%, x]
Out: {1/10, 1/10, 1/5, 1/5, 1/5, 1/10, 1/10}
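The same expansion can be carried out without a computer algebra system. The Python sketch below is one possible way to do it, not the method of the talk: it treats the quotient as a power series truncated after x^(mn) and uses exact rational arithmetic. The function name is an assumption.

```python
from fractions import Fraction
from math import comb

def mann_whitney_pmf(m, n):
    """[P(M_{m,n} = k) for k = 0..m*n] under H0, from the generating function."""
    top = m * n
    c = [1] + [0] * top                    # power series truncated after x^(m*n)
    for i in range(n + 1, m + n + 1):
        for k in range(top, i - 1, -1):    # multiply by (1 - x^i)
            c[k] -= c[k - i]
    for j in range(1, m + 1):
        for k in range(j, top + 1):        # divide by (1 - x^j)
            c[k] += c[k - j]
    total = comb(m + n, n)
    return [Fraction(v, total) for v in c]

pmf_3_2 = mann_whitney_pmf(3, 2)           # same case as the Mathematica example
```

Truncation is harmless here because the quotient is a polynomial of degree mn, so every coefficient up to that degree is exact.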

10

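(* Cumulative distribution {0, P(M <= 0), P(M <= 1), ...} of M_{m,n} under H0 *)
MannWhitneyCumFreq[m_, n_] := Module[{x, i, j},
  FoldList[Plus, 0, CoefficientList[Expand[Factor[
    1/Binomial[m + n, n]*
     Product[1 - x^i, {i, n + 1, m + n}]/Product[1 - x^j, {j, 1, m}]]], x]]]

(* P(M <= k); the list starts with 0, hence the offset k + 2 *)
MannWhitneyLeftSigValue[m_, n_, k_] := MannWhitneyCumFreq[m, n][[k + 2]]

(* Largest k with P(M <= k) <= a, if such a k exists *)
MannWhitneyLeftCritValue[m_, n_, a_] := Module[
  {value = Length[Select[MannWhitneyCumFreq[m, n], # <= a &]] - 2},
  If[NonNegative[N[value]], value, "no critical value exists"]]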

11

Computational speed (Pentium 133 MHz)

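Exact: $P(M_{5,5} \le 4) = 1/21 \approx 0.0476$

computing time: 0.05 sec (generating function: degree 25)

Asymptotic approximation: $P(M_{5,5} \le 4) \approx 0.0384$

Exact: $P(M_{20,20} \le 138) = 0.0482$ (rounded)

computing time: 8.5 sec (generating function: degree 400)

Asymptotic approximation: $P(M_{20,20} \le 138) \approx 0.0475$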

Asymptotics and exact calculations are both useful!
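The exact left-tail probability for m = n = 5 can be reproduced with a few lines of Python. This is a sketch under the same generating-function approach, with hypothetical function names; it makes no attempt to reproduce the timings, which depend on the machine.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

def mann_whitney_pmf(m, n):
    """[P(M_{m,n} = k) for k = 0..m*n] under H0, from the generating function."""
    top = m * n
    c = [1] + [0] * top
    for i in range(n + 1, m + n + 1):
        for k in range(top, i - 1, -1):    # multiply by (1 - x^i)
            c[k] -= c[k - i]
    for j in range(1, m + 1):
        for k in range(j, top + 1):        # divide by (1 - x^j)
            c[k] += c[k - j]
    return [Fraction(v, comb(m + n, n)) for v in c]

def mann_whitney_cdf(m, n):
    """[P(M_{m,n} <= k) for k = 0..m*n] under H0."""
    return list(accumulate(mann_whitney_pmf(m, n)))

def left_critical_value(m, n, alpha):
    """Largest k with P(M <= k) <= alpha, or None if no such k exists."""
    below = [k for k, p in enumerate(mann_whitney_cdf(m, n)) if p <= alpha]
    return max(below) if below else None

p_left = mann_whitney_cdf(5, 5)[4]         # P(M_{5,5} <= 4) as an exact fraction
k_crit = left_critical_value(5, 5, Fraction(5, 100))
```

The same functions called with m = n = 20 and k = 138 can be compared against the rounded value quoted above.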

12

Other examples of nonparametric test statistics with closed form for generating function include:

• Wilcoxon signed rank statistic

• Kendall rank correlation statistic

• Kolmogorov one-sample statistic

• Smirnov two-sample statistic

• Jonckheere-Terpstra statistic

Consult the combinatorial literature!

What to do if there is no generating function?

13

Linear rank statistics

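$Z_\ell = 1$ if the $\ell$-th order statistic in the pooled sample is an X-observation, and 0 otherwise

$$T_{m,n} = \sum_{\ell=1}^{m+n} a(\ell)\, Z_\ell$$

Streitberg & Röhmel 1986 (cf. Euler 1748):

$$\sum_{m+n=N} \sum_{k} \Pr(T_{m,n} = k) \binom{N}{m} x^k y^m = \bigl(1 + x^{a(1)} y\bigr) \cdots \bigl(1 + x^{a(N)} y\bigr)$$

Branch-and-bound algorithm (Van de Wiel)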
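The product formula translates directly into a multiplication of bivariate polynomials. The Python sketch below illustrates this for integer scores only; it is not the branch-and-bound algorithm, and the function name and dictionary representation are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def linear_rank_pmf(scores, m):
    """Null distribution of T = sum of the scores at the X-positions.

    scores: integer scores a(1), ..., a(N); m: number of X-observations.
    Returns {k: P(T = k)}.
    """
    poly = {(0, 0): 1}          # (power of y, power of x) -> coefficient
    for a in scores:
        new = dict(poly)        # the "1" in the factor (1 + x^a y)
        for (j, k), c in poly.items():
            if j < m:           # terms with more than m factors of y are never needed
                key = (j + 1, k + a)
                new[key] = new.get(key, 0) + c
        poly = new
    total = comb(len(scores), m)
    return {k: Fraction(c, total) for (j, k), c in poly.items() if j == m}

# Wilcoxon scores a(l) = l with m = 3 and n = 2 give the rank-sum distribution.
wilcoxon_pmf = linear_rank_pmf([1, 2, 3, 4, 5], 3)
```

With Wilcoxon scores the result is the Mann-Whitney distribution for m = 3, n = 2 shifted by m(m+1)/2 = 6.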

14

Moments of Mann-Whitney statistic

Mann and Whitney (1947) calculated 4th central moment

Fix and Hodges (1955) calculated 6th central moment

Computations are based on recurrences

Can we improve?

Solution: computer algebra and generating functions

15

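Computing moments of $M_{m,n}$

recompute $E(M_{m,n})$ (following René Swarttouw)

$$E(X) = \left.\frac{d}{dx} \sum_{k} P(X = k)\, x^k \right|_{x=1}$$

$$G_{m,n}(x) := \frac{(1 - x^{n+1}) \cdots (1 - x^{n+m})}{(1 - x) \cdots (1 - x^{m})}$$

$$\log G_{m,n}(x) = \sum_{k=1}^{m} \left[ \log(1 - x^{n+k}) - \log(1 - x^{k}) \right]$$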

16

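$$\frac{d}{dx} \log G_{m,n}(x) = \sum_{k=1}^{m} \left[ \frac{k\, x^{k-1}}{1 - x^{k}} - \frac{(n+k)\, x^{n+k-1}}{1 - x^{n+k}} \right]$$

$$\frac{d}{dx} \log G_{m,n}(x) = \frac{1}{G_{m,n}(x)} \frac{d}{dx} G_{m,n}(x)$$

$$G'_{m,n}(x) = G_{m,n}(x) \sum_{k=1}^{m} \frac{k\, x^{k-1} (1 - x^{n+k}) - (n+k)\, x^{n+k-1} (1 - x^{k})}{(1 - x^{k})(1 - x^{n+k})}$$

$G_{m,n}$ is a polynomial, hence $G'_{m,n}(1) = \lim_{x \to 1} G'_{m,n}(x)$.

Fact: $\lim_{x \to 1} G_{m,n}(x) = \binom{m+n}{n}$.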

17

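Hence, it remains to calculate for $1 \le k \le m$:

$$\lim_{x \to 1} \frac{k\, x^{k-1} (1 - x^{n+k}) - (n+k)\, x^{n+k-1} (1 - x^{k})}{(1 - x^{k})(1 - x^{n+k})} = \lim_{x \to 1} \frac{k (1 - x^{n}) - n\, x^{n} (1 - x^{k})}{(1 - x^{k})(1 - x^{n+k})}$$

After some simplifications:

$$\lim_{x \to 1} \frac{k (1 + \dots + x^{n-1}) - n\, x^{n} (1 + \dots + x^{k-1})}{k\, (n+k)\, (1 - x)}$$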

18

L’Hôpital’s rule yields that the limit equals:

$$-\frac{1}{k(n+k)} \left( \tfrac{1}{2}\, k\, n (n-1) - n^{2} k - \tfrac{1}{2}\, n\, k (k-1) \right) = \frac{n}{2}$$
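Summing over k gives the familiar mean of the Mann-Whitney statistic. The step below is a short worked sketch, assuming that each of the m limits equals n/2 (the value L'Hôpital's rule gives) and using that the probability generating function of $M_{m,n}$ is $G_{m,n}(x) / \binom{m+n}{n}$:

```latex
\[
\begin{aligned}
E(M_{m,n}) &= \frac{G'_{m,n}(1)}{G_{m,n}(1)} \\
           &= \sum_{k=1}^{m} \lim_{x \to 1}
              \frac{k\,x^{k-1}(1 - x^{n+k}) - (n+k)\,x^{n+k-1}(1 - x^{k})}
                   {(1 - x^{k})(1 - x^{n+k})} \\
           &= \sum_{k=1}^{m} \frac{n}{2} \;=\; \frac{mn}{2}.
\end{aligned}
\]
```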

It is tedious to perform these computations by hand.

Alternative:

compute moments using Mathematica.

19

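Mathematica procedures for moments of $M_{m,n}$:

(* k-th term of log G_{m,n}(x) *)
LogG[k_, n_, x_] := Log[1 - x^(n + k)] - Log[1 - x^k]

(* r-th derivative of log G_{m,n} at x = 1, from the series expansion around x = 1 *)
DerivativeOfLogG[r_] := Module[{j, der},
  Sum[Simplify[r! Coefficient[
     Normal[Series[LogG[k, n, x], {x, 1, r + 1}]], x - 1, r]], {k, 1, m}]]

(* Solve for the first r derivatives of G, normalised to G = 1: the factorial moments *)
FactorialMoments[r_] := Module[{i, j, equations},
  equations = Table[
    ReplaceAll[Simplify[Together[D[Log[G[z]], {z, j}]]], G[z] -> 1] ==
     DerivativeOfLogG[j], {j, 1, r}];
  Flatten[ReplaceAll[Table[D[G[z], {z, i}], {i, 1, r}],
    Solve[equations, Table[D[G[z], {z, i}], {i, 1, r}]]]]]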
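For fixed numerical m and n, moments can also be obtained by summing over the exact distribution. The Python sketch below swaps in that direct summation; it is not the symbolic log-derivative method of the Mathematica procedures and yields numbers rather than formulas in m and n. All function names are assumptions.

```python
from fractions import Fraction
from math import comb, perm

def mann_whitney_pmf(m, n):
    """[P(M_{m,n} = k) for k = 0..m*n] under H0, from the generating function."""
    top = m * n
    c = [1] + [0] * top
    for i in range(n + 1, m + n + 1):
        for k in range(top, i - 1, -1):    # multiply by (1 - x^i)
            c[k] -= c[k - i]
    for j in range(1, m + 1):
        for k in range(j, top + 1):        # divide by (1 - x^j)
            c[k] += c[k - j]
    return [Fraction(v, comb(m + n, n)) for v in c]

def factorial_moment(pmf, r):
    """E[M (M - 1) ... (M - r + 1)]; perm(k, r) is the falling factorial."""
    return sum(perm(k, r) * p for k, p in enumerate(pmf))

def central_moment(pmf, r):
    """E[(M - E M)^r]."""
    mean = sum(k * p for k, p in enumerate(pmf))
    return sum((k - mean) ** r * p for k, p in enumerate(pmf))

pmf = mann_whitney_pmf(5, 5)
mean_5_5 = factorial_moment(pmf, 1)        # should equal m*n/2
var_5_5 = central_moment(pmf, 2)           # should equal m*n*(m+n+1)/12
```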

20

8th central moment of $M_{m,n}$:

$$
\begin{aligned}
\frac{1}{34560}\, m\, n\, (1 + m + n) \bigl(
& -96 m + 96 m^2 + 240 m^3 - 240 m^4 - 432 m^5 - 144 m^6 \\
& - 96 n + 192 m n + 224 m^2 n - 540 m^3 n - 100 m^4 n + 780 m^5 n + 404 m^6 n \\
& + 96 n^2 + 224 m n^2 - 600 m^2 n^2 - 200 m^3 n^2 + 900 m^4 n^2 - 48 m^5 n^2 - 420 m^6 n^2 \\
& + 240 n^3 - 540 m n^3 - 200 m^2 n^3 + 1095 m^3 n^3 - 395 m^4 n^3 - 735 m^5 n^3 + 175 m^6 n^3 \\
& - 240 n^4 - 100 m n^4 + 900 m^2 n^4 - 395 m^3 n^4 - 630 m^4 n^4 + 525 m^5 n^4 \\
& - 432 n^5 + 780 m n^5 - 48 m^2 n^5 - 735 m^3 n^5 + 525 m^4 n^5 \\
& - 144 n^6 + 404 m n^6 - 420 m^2 n^6 + 175 m^3 n^6 \bigr)
\end{aligned}
$$
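A formula of this size is easy to mistype, so a spot check against direct enumeration is worthwhile. The Python sketch below evaluates the polynomial from a coefficient table transcribed from the expression above and compares it with brute-force enumeration for very small m and n. The table layout and function names are assumptions of this example.

```python
from fractions import Fraction
from itertools import combinations

# COEFFS[j][i] is the coefficient of m^i n^j inside the large bracket.
COEFFS = [
    [0, -96, 96, 240, -240, -432, -144],
    [-96, 192, 224, -540, -100, 780, 404],
    [96, 224, -600, -200, 900, -48, -420],
    [240, -540, -200, 1095, -395, -735, 175],
    [-240, -100, 900, -395, -630, 525, 0],
    [-432, 780, -48, -735, 525, 0, 0],
    [-144, 404, -420, 175, 0, 0, 0],
]

def eighth_central_moment_formula(m, n):
    """Evaluate the closed-form expression for the 8th central moment."""
    bracket = sum(c * m**i * n**j
                  for j, row in enumerate(COEFFS)
                  for i, c in enumerate(row))
    return Fraction(m * n * (1 + m + n) * bracket, 34560)

def eighth_central_moment_enumerated(m, n):
    """Direct enumeration over all placements of the X-ranks (small m, n only)."""
    values = [sum(ranks) - m * (m + 1) // 2
              for ranks in combinations(range(1, m + n + 1), m)]
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 8 for v in values) / len(values)

small_cases = [(1, 1), (1, 2), (2, 1), (2, 2)]
checks = [eighth_central_moment_formula(m, n) == eighth_central_moment_enumerated(m, n)
          for m, n in small_cases]
```

Enumeration grows like the binomial coefficient of m + n over m, which is why it serves only as a check for small samples.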

21

Conclusions

• generating functions are also useful in nonparametric statistics

• computer algebra is a natural tool for mathematicians

• asymptotics and exact calculations complement each other

22

Topics under investigation

• tests for censored data

• power calculations

• nonparametric ANOVA (Kruskal-Wallis, block designs, multiple comparisons)

• Spearman’s ρ (rank correlation)

• multimedia / World Wide Web implementation

Click on underlined items to obtain the PostScript file of the technical report

23

References

• A. Di Bucchianico, Combinatorics, computer algebra and the Wilcoxon-Mann-Whitney test, to appear in J. Stat. Plann. Inf.

• B. Streitberg and J. Röhmel, Exact distributions for permutation and rank tests: An introduction to some recently published algorithms, Stat. Software Newsletter 12 (1986), 10-18

24

References (continued)

• M.A. van de Wiel, Exact distributions of nonparametric statistics using computer algebra, Master’s Thesis, TUE, 1996

• M.A. van de Wiel and A. Di Bucchianico, The exact distribution of Spearman’s rho, technical report

• M.A. van de Wiel, A. Di Bucchianico and P. van der Laan, Exact distributions of nonparametric test statistics using computer algebra, technical report

25

References (continued)

• M.A. van de Wiel, Edgeworth expansions with exact cumulants for two-sample linear rank statistics, technical report

• M.A. van de Wiel, Exact distributions of two-sample rank statistics and block rank statistics using computer algebra, technical report

26

The End