Rank correlation- some features and an application

On some interesting features and an application of rank correlation. Kushal Kr. Dey, Indian Statistical Institute. D. Basu Memorial Award Talk 2011.

description

This presentation was given by Kushal Kumar Dey, a B.Stat (undergraduate) student at the Indian Statistical Institute, Kolkata, for the D. Basu Memorial Award.

Transcript of Rank correlation- some features and an application

Page 1: Rank correlation- some features and an application

On some interesting features and an application of rank correlation

Kushal Kr. Dey

Indian Statistical Institute
D. Basu Memorial Award Talk 2011

Page 2: Rank correlation- some features and an application

List of contents

1 Historical overview of rank correlation.

2 Some properties of rank correlation.

3 A practical example of rank correlation.

Page 3: Rank correlation- some features and an application

Historical Overview—Correlation

In 1886, Sir Francis Galton coined the term correlation, quoting:

length of a human arm is said to be correlated with that of the leg, because a person with long arm has usually long legs and conversely.

Galton wanted a measure of correlation that takes the value +1 for perfect correspondence, 0 for independence, and -1 for perfect inverse correspondence.

Page 5: Rank correlation- some features and an application

Historical overview—contd.

Karl Pearson, a student of Galton, worked on his idea and formulated his "product moments" measure of correlation in 1896.

$$ r = \frac{S_{xy}}{\sqrt{S_{xx}}\,\sqrt{S_{yy}}} \tag{1} $$

Spearman observed that for characteristics not quantitatively measurable, the Pearsonian measure fails to measure the association. This motivated him to use rank-based methods for association and develop his rank correlation coefficient in 1904. ["The proof and measurement of association between two things" by C. Spearman in The American Journal of Psychology (1904)].

Page 7: Rank correlation- some features and an application

Historical overview contd

In 1938, two years after the death of Pearson, Maurice Kendall, a British scientist, while working on psychological experiments, came up with a new measure of correlation popularly known as Kendall’s τ. ["A new measure of rank correlation", M. Kendall, Biometrika (1938)].

The next few years saw extensive research in this area by Kendall, Daniels, Hoeffding and others.

In 1954, a modification to Kendall’s coefficient in the case of ties was made by Goodman and Kruskal. ["Measures of association for cross classifications", Part I, L.A. Goodman and W.H. Kruskal, J. Amer. Statist. Assoc. (1954)]

Page 9: Rank correlation- some features and an application

Daniels’ generalized correlation coefficient

H.E. Daniels of Cambridge University, a close associate of Kendall, proposed a measure in 1944 to unify Pearson’s r, Spearman’s ρ and Kendall’s τ [The relation between measures of correlation in the universe of sample permutations, H.E. Daniels, Biometrika (1944)].

Consider n data points $(X_i, Y_i)$, $i = 1, \ldots, n$. To each pair of X values $(X_i, X_j)$ we allot a score $a_{ij}$ with $a_{ij} = -a_{ji}$ and $a_{ii} = 0$; similarly, we allot a score $b_{ij}$ to the pair $(Y_i, Y_j)$. Then Daniels’ generalized coefficient D is given by

$$ D = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} b_{ij}}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^{2} \cdot \sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij}^{2}\right)^{1/2}} \tag{2} $$

Page 11: Rank correlation- some features and an application

Daniels’ generalized coefficient contd.

Special cases

Put $a_{ij} = X_j - X_i$ and $b_{ij} = Y_j - Y_i$ to get Pearson’s r.

Put $a_{ij} = \mathrm{Rank}(X_j) - \mathrm{Rank}(X_i)$ and $b_{ij} = \mathrm{Rank}(Y_j) - \mathrm{Rank}(Y_i)$ to get Spearman’s ρ.

Put $a_{ij} = \mathrm{sgn}(X_j - X_i)$ and $b_{ij} = \mathrm{sgn}(Y_j - Y_i)$ to get Kendall’s τ.
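The three special cases can be checked numerically. The following is a minimal sketch, not code from the talk: the function names and the five data points are hypothetical, and the data are assumed to contain no ties.

```python
import math


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def ranks(values):
    """Ranks 1..n of the values, assuming there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def daniels_coefficient(x, y, score):
    """Generalized coefficient (2) with a_ij = score(x_i, x_j), b_ij = score(y_i, y_j)."""
    n = len(x)
    num = sum_a2 = sum_b2 = 0.0
    for i in range(n):
        for j in range(n):
            a = score(x[i], x[j])
            b = score(y[i], y[j])
            num += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return num / math.sqrt(sum_a2 * sum_b2)


def difference(u, v):
    return v - u


def sign_of_difference(u, v):
    return sign(v - u)


def pearson_r(x, y):
    return daniels_coefficient(x, y, difference)


def spearman_rho(x, y):
    return daniels_coefficient(ranks(x), ranks(y), difference)


def kendall_tau(x, y):
    return daniels_coefficient(x, y, sign_of_difference)


# Hypothetical illustrative data (not from the talk).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
print(pearson_r(x, y), spearman_rho(x, y), kendall_tau(x, y))
```

A single routine serves all three coefficients; only the score function changes, which is the point of the unification.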

Page 12: Rank correlation- some features and an application

Alternative expression for τ and ρ

First, we define $d_{ij}$ to be +1 when the rank j (j > i) precedes the rank i in the second ranking, and zero otherwise.

We can write Kendall’s τ as

$$ \tau = 1 - \frac{4Q}{n(n-1)} \tag{3} $$

where Q is the total score, $Q = \sum_{i<j} d_{ij}$, and n is the total number of elements in the sample.

Similarly, we can write Spearman’s ρ as

$$ \rho = 1 - \frac{12V}{n(n^{2}-1)} \tag{4} $$

where $V = \sum_{i<j} (j-i)\, d_{ij}$ is the sum of inversions weighted by the numerical difference between the ranks inverted. This difference is called the weight of inversion.
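Expressions (3) and (4) can be sketched in a few lines. The code below is an illustration under stated assumptions: the second ranking is given as a list in the order of the first ranking, there are no ties, and all function names are hypothetical. It also compares (4) with the familiar form of ρ based on squared rank differences.

```python
from itertools import permutations


def inversion_scores(second_ranking):
    """Return (Q, V) for a second ranking listed in the order of the first ranking.

    d_ij = 1 when rank j (j > i) precedes rank i in the list, else 0.
    """
    position = {rank: pos for pos, rank in enumerate(second_ranking)}
    n = len(second_ranking)
    q = v = 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if position[j] < position[i]:
                q += 1        # one more inversion
                v += j - i    # weight of this inversion
    return q, v


def tau_from_inversions(second_ranking):
    n = len(second_ranking)
    q, _ = inversion_scores(second_ranking)
    return 1 - 4 * q / (n * (n - 1))


def rho_from_inversions(second_ranking):
    n = len(second_ranking)
    _, v = inversion_scores(second_ranking)
    return 1 - 12 * v / (n * (n * n - 1))


def rho_from_squared_differences(second_ranking):
    """Usual form: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(second_ranking)
    d2 = sum((k - s) ** 2 for k, s in enumerate(second_ranking, start=1))
    return 1 - 6 * d2 / (n * (n * n - 1))


ranking = [3, 1, 2, 5, 4]  # hypothetical second ranking
print(inversion_scores(ranking), tau_from_inversions(ranking), rho_from_inversions(ranking))

# The two forms of rho agree on every permutation of 5 ranks.
for perm in permutations(range(1, 6)):
    assert abs(rho_from_inversions(perm) - rho_from_squared_differences(perm)) < 1e-12
```

For the ranking [3, 1, 2, 5, 4] the inversions are (1,3), (2,3) and (4,5), so Q = 3 and V = 2 + 1 + 1 = 4.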

Page 14: Rank correlation- some features and an application

An interesting result

We simulated large samples of observations from a bivariate normal distribution and plotted the mean values of Spearman’s ρ and Kendall’s τ against Pearson’s r. We obtained the following graph.

Page 15: Rank correlation- some features and an application

The graph

Figure: Relation of Spearman’s ρ and Kendall’s τ with Pearson’s r

Page 16: Rank correlation- some features and an application

Relation of τ and ρ with r for BVN

In 1907, Pearson, in his book ["On Further Methods of Determining Correlation", Karl Pearson, Biometric series IV (1907)], established the following relation between Spearman’s ρ and his r for the bivariate normal distribution.

$$ r = 2 \sin\left(\frac{\pi}{6}\rho\right) \tag{5} $$

Cramer, in 1946, also established a relation between Kendall’s τ and Pearson’s r for the bivariate normal.

$$ r = \sin\left(\frac{\pi}{2}\tau\right) \tag{6} $$

However, it is easy to show that the above two relations hold for any elliptic distribution.
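Relations (5) and (6) can be compared with simulated data. The sketch below is an assumption-laden illustration, not the simulation used in the talk: it draws a single bivariate normal sample with Python's standard `random` module (the talk averaged over simulations), and the seed, sample size and function names are hypothetical choices.

```python
import math
import random


def tau_from_r(r):
    """Kendall's tau implied by relation (6), r = sin(pi * tau / 2)."""
    return 2 / math.pi * math.asin(r)


def rho_from_r(r):
    """Spearman's rho implied by relation (5), r = 2 sin(pi * rho / 6)."""
    return 6 / math.pi * math.asin(r / 2)


def simulate_bvn(n, r, rng):
    """Draw n pairs from BVN(0, 0, 1, 1, r) using X = W, Y = r W + sqrt(1 - r^2) V."""
    pairs = []
    for _ in range(n):
        w, v = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append((w, r * w + math.sqrt(1 - r * r) * v))
    return pairs


def sample_tau(pairs):
    """Kendall's tau as (concordant - discordant) / number of pairs."""
    n = len(pairs)
    s = 0
    for i in range(n):
        xi, yi = pairs[i]
        for j in range(i + 1, n):
            xj, yj = pairs[j]
            prod = (xi - xj) * (yi - yj)
            s += (prod > 0) - (prod < 0)
    return 2 * s / (n * (n - 1))


def ranks(values):
    """Ranks 1..n, assuming no ties (true with probability 1 for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def sample_rho(pairs):
    """Spearman's rho from squared rank differences."""
    n = len(pairs)
    rx = ranks([p[0] for p in pairs])
    ry = ranks([p[1] for p in pairs])
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


rng = random.Random(2011)          # arbitrary seed for reproducibility
pairs = simulate_bvn(1000, 0.6, rng)
print("tau:", sample_tau(pairs), "theory:", tau_from_r(0.6))
print("rho:", sample_rho(pairs), "theory:", rho_from_r(0.6))
```

With a sample of this size the simulated values should fall close to the theoretical ones, though a single run is subject to sampling error.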

Page 19: Rank correlation- some features and an application

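Relation between Kendall’s τ and r for bivariate normal

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a sample drawn from BVN(0, 0, 1, 1, r). Then Kendall’s τ computed from the data is an unbiased estimator of

$$ 2P\big((X_1 - X_2)(Y_1 - Y_2) > 0\big) - 1 = 2P(Z_1 Z_2 > 0) - 1 \tag{7} $$

where $(Z_1, Z_2) \sim$ BVN(0, 0, 2, 2, 2r), that is, bivariate normal with zero means, variances 2 and covariance 2r.

Note that $(Z_1, Z_2) \overset{d}{=} \sqrt{2}\,\big(V\sqrt{1-r^{2}} + Wr,\ W\big)$, where V and W are independent standard normal variables. Since $(Z_1, Z_2)$ is symmetric about (0, 0), $P(Z_1 Z_2 > 0) = 2P(Z_1 > 0, Z_2 > 0)$, so the quantity in (7) equals

$$ 4P(Z_1 > 0, Z_2 > 0) - 1 = 4P\big(V\sqrt{1-r^{2}} + Wr > 0,\ W > 0\big) - 1 \tag{8} $$

Use a polar transformation on (V, W) and evaluate this probability to get $\frac{2}{\pi}\sin^{-1} r$.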
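The polar step can be written out as follows. This is a sketch of the computation that the slide leaves to the reader; the angle $\alpha = \sin^{-1} r$ and the polar variables $(R, \theta)$ are notation introduced here, not taken from the talk.

```latex
\begin{align*}
&V = R\cos\theta,\quad W = R\sin\theta,\quad \theta \sim \mathrm{Uniform}(0, 2\pi),\quad
  r = \sin\alpha,\ \ \alpha \in \left[-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right], \\
&V\sqrt{1-r^{2}} + Wr = R\,(\cos\alpha\cos\theta + \sin\alpha\sin\theta) = R\cos(\theta - \alpha), \\
&\{W > 0\} = \{0 < \theta < \pi\}, \qquad
  \{V\sqrt{1-r^{2}} + Wr > 0\} = \left\{\alpha - \tfrac{\pi}{2} < \theta < \alpha + \tfrac{\pi}{2}\right\}, \\
&P\big(V\sqrt{1-r^{2}} + Wr > 0,\ W > 0\big)
  = P\left(0 < \theta < \alpha + \tfrac{\pi}{2}\right)
  = \frac{1}{2\pi}\left(\alpha + \frac{\pi}{2}\right)
  = \frac{1}{4} + \frac{\sin^{-1} r}{2\pi}, \\
&4P\big(V\sqrt{1-r^{2}} + Wr > 0,\ W > 0\big) - 1 = \frac{2}{\pi}\sin^{-1} r .
\end{align*}
```

The intersection of the two angular ranges is $(0, \alpha + \tfrac{\pi}{2})$ because $\alpha - \tfrac{\pi}{2} \le 0$ and $\alpha + \tfrac{\pi}{2} \le \pi$.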

Page 21: Rank correlation- some features and an application

Relation between Spearman’s ρ and r for bivariate normal

Now we sketch a proof of the relationship between Pearson’s r and Spearman’s ρ for the bivariate normal distribution.

Let $R(X_i)$ and $R(Y_i)$ be the ranks of $X_i$ and $Y_i$. Define $H(t) = I_{\{t>0\}}$. Then, observe that

$$ R(X_i) = \sum_{j=1}^{n} H(X_i - X_j) + 1 \tag{9} $$

Note that Spearman’s ρ is Pearson’s correlation coefficient between $R(X_i)$ and $R(Y_i)$, which is

$$ \frac{h - \frac{1}{4} n (n-1)^{2}}{\frac{1}{12} n (n^{2}-1)} $$

where $h = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} H(X_i - X_j)\, H(Y_i - Y_k)$.

Page 22: Rank correlation- some features and an application

Proof continued

Case 1

If i, j, k are distinct, then $(X_i - X_j, Y_i - Y_k)$ is distributed as BVN(0, 0, 2, 2, r/2), that is, bivariate normal with variances 2 and correlation r/2.

$E\{H(X_i - X_j) H(Y_i - Y_k)\}$ reduces to the integral of the probability density over the positive quadrant.

We can check, following a technique similar to that used for τ, that this integral is $\frac{1}{2}\left(1 - \frac{1}{\pi}\cos^{-1}\frac{r}{2}\right)$.

Page 23: Rank correlation- some features and an application

Proof continued

Case 2

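If i ≠ j = k, then $(X_i - X_j, Y_i - Y_k)$ is distributed as BVN(0, 0, 2, 2, r), that is, bivariate normal with variances 2 and correlation r, and the above expectation reduces to $\frac{1}{2}\left(1 - \frac{1}{\pi}\cos^{-1} r\right)$. Then,

$$ E\left(\frac{h - \frac{1}{4} n (n-1)^{2}}{\frac{1}{12} n (n^{2}-1)}\right) = \frac{6}{\pi}\left\{\frac{n-2}{n+1}\sin^{-1}\frac{r}{2} + \frac{1}{n+1}\sin^{-1} r\right\} \tag{10} $$

As n goes to infinity, the R.H.S. reduces to $\frac{6}{\pi}\sin^{-1}\frac{r}{2}$.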
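The convergence in (10) is easy to tabulate. The following is a small sketch with hypothetical function names; it simply evaluates the right-hand side of (10) for increasing n and compares it with the limit.

```python
import math


def expected_sample_rho(n, r):
    """Right-hand side of (10) for sample size n and BVN correlation r."""
    return 6 / math.pi * ((n - 2) / (n + 1) * math.asin(r / 2)
                          + 1 / (n + 1) * math.asin(r))


def limiting_rho(r):
    """Limit of (10) as n tends to infinity."""
    return 6 / math.pi * math.asin(r / 2)


for n in (5, 50, 500, 5000):
    print(n, expected_sample_rho(n, 0.5), limiting_rho(0.5))
```

As a sanity check, for n = 2 the formula collapses to $\frac{2}{\pi}\sin^{-1} r$, the expected value of Kendall's τ, which is what one expects because ρ and τ coincide for samples of size two.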

Page 24: Rank correlation- some features and an application

Reason for approximate linear relationship between Spearman’s ρ and Pearson’s r for BVN

As observed from the graph, Spearman’s ρ for the bivariate normal is almost linearly related to Pearson’s r. This may be attributed to the fact that

$$ \rho = \frac{6}{\pi}\sin^{-1}\frac{r}{2} = \frac{6}{\pi}\left(\frac{r}{2} + \frac{1}{6}\cdot\frac{r^{3}}{8} + \ldots\right) = \frac{3}{\pi} r + \text{terms very small compared to the first-order term} \approx \frac{3}{\pi} r $$

For Kendall’s τ, using a similar expansion, we can also show that τ is a convex function of r on the interval [0, 1].
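To get a feel for how small the higher-order terms are, one can compare each rank coefficient with its first-order term on a grid of r values. This is an illustrative sketch with hypothetical names; the first-order term for τ, $\frac{2}{\pi} r$, comes from the same arcsine expansion.

```python
import math


def rho_exact(r):
    return 6 / math.pi * math.asin(r / 2)


def rho_linear(r):
    return 3 * r / math.pi


def tau_exact(r):
    return 2 / math.pi * math.asin(r)


def tau_linear(r):
    return 2 * r / math.pi


grid = [k / 100 for k in range(101)]  # r = 0.00, 0.01, ..., 1.00
max_rho_gap = max(abs(rho_exact(r) - rho_linear(r)) for r in grid)
max_tau_gap = max(abs(tau_exact(r) - tau_linear(r)) for r in grid)
print("largest gap for rho:", max_rho_gap)
print("largest gap for tau:", max_tau_gap)
```

Both gaps are largest at r = 1, where they equal $1 - 3/\pi$ for ρ and $1 - 2/\pi$ for τ, so the departure from linearity is much more pronounced for τ.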

Page 25: Rank correlation- some features and an application

Kendall’s comparative assessment of τ and ρ

Kendall in his paper admitted that ρ can take $\frac{n^{3}-n}{6}$ values between −1 and +1, whereas τ can take only $\frac{n^{2}-n}{2}$ values in the range, but according to him, this does not seriously affect the sensitivity of τ.

Both Kendall’s τ and Spearman’s ρ computed from the sample have asymptotically normal distributions.

But Kendall showed using simulation experiments that the distribution of his correlation coefficient is surprisingly close to normal even for small values of n, which is not the case for Spearman’s correlation.

Page 26: Rank correlation- some features and an application

Bias properties of Kendall’s τ and Spearman’s ρ

Consider a finite population. Let ρ* and τ* be Spearman’s and Kendall’s rank correlation coefficients computed from the entire population.

Suppose that we have a simple random sample without replacement from that population, and we compute Spearman’s ρ and Kendall’s τ from the sample.

Then τ is an unbiased estimator of τ*, but ρ is a biased estimator of ρ*.

If the population size N tends to infinity, the expected value of Spearman’s ρ goes to $\frac{1}{n+1}\{3\tau^{*} + (n-2)\rho^{*}\}$, where n is the size of the sample.
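The unbiasedness of τ and the bias of ρ can be seen by enumerating every possible sample from a small population. The sketch below uses a hypothetical population of six units without ties; the data and names are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / number of pairs, assuming no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2 * s / (n * (n - 1))


def ranks(values):
    """Ranks 1..n within the given list, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical finite population (N = 6) and sample size n = 3.
pop_x = [1, 2, 3, 4, 5, 6]
pop_y = [2, 1, 4, 3, 6, 5]
sample_size = 3

tau_star = kendall_tau(pop_x, pop_y)
rho_star = spearman_rho(pop_x, pop_y)

# All simple random samples without replacement are equally likely.
samples = list(combinations(range(len(pop_x)), sample_size))
mean_tau = sum(kendall_tau([pop_x[i] for i in s], [pop_y[i] for i in s])
               for s in samples) / len(samples)
mean_rho = sum(spearman_rho([pop_x[i] for i in s], [pop_y[i] for i in s])
               for s in samples) / len(samples)

print("tau*:", tau_star, "mean of sample tau:", mean_tau)
print("rho*:", rho_star, "mean of sample rho:", mean_rho)
```

In this toy population the average sample τ matches τ* exactly, while the average sample ρ falls below ρ*. The limiting formula above concerns N tending to infinity, so it is not expected to reproduce the finite-population average computed here.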

Page 27: Rank correlation- some features and an application

Small sample distribution of τ, ρ and r

It is well known that for a simple random sample of size n drawn from a bivariate normal distribution, under the assumption of zero correlation, Pearson’s r satisfies

$$ \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} \sim t_{n-2} \tag{11} $$

But the distribution of r for small samples from a normal distribution with non-zero correlation, and from non-normal distributions, is not tractable.

τ and ρ are distribution-free statistics in the sense that their distributions do not depend on the distribution of the data so long as X and Y are independent. Consequently, their distributions under the hypothesis of independence of X and Y can be tabulated.
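Such a table can be produced by brute force for small n. The sketch below assumes continuous data (no ties), so that under independence every permutation of the Y ranks relative to the X ranks is equally likely; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def tau_of_permutation(perm):
    """Kendall's tau between the identity ranking and the ranking `perm`."""
    n = len(perm)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += 1 if perm[j] > perm[i] else -1
    return Fraction(2 * s, n * (n - 1))


def null_distribution_of_tau(n):
    """Exact distribution of tau under independence, by enumerating all n! permutations."""
    counts = Counter(tau_of_permutation(p) for p in permutations(range(n)))
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}


dist = null_distribution_of_tau(4)
for value, prob in dist.items():
    print(value, prob)
```

Enumeration grows as n!, so this approach is only practical for small samples, which is exactly the regime where tables are needed.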

Page 28: Rank correlation- some features and an application

Asymptotic normality of r , ρ and τ

Note that each of Pearson’s r, Spearman’s ρ and Kendall’s τ computed from bivariate data is asymptotically normally distributed.

Asymptotic normality of Pearson’s r can be derived using the Central Limit Theorem applied to various bivariate sample moments.

Asymptotic normality of Spearman’s ρ follows from asymptotic normality of linear rank statistics.

Asymptotic normality of Kendall’s τ follows from asymptotic normality of U-statistics.

Page 30: Rank correlation- some features and an application

A practical application of rank correlation

Recently, the Ministry of Human Resource Development (MHRD) considered giving weightage to the marks scored in the 10+2 Board exams for admission to engineering colleges in India.

The raw scores across the Boards are not comparable. So, they wanted help in this regard from the Indian Statistical Institute.

The use of percentile ranks of students based on their aggregate scores was recommended by the Indian Statistical Institute.

Page 32: Rank correlation- some features and an application

The Data

The Indian Statistical Institute was provided data from 4 boards (namely, ICSE, CBSE, West Bengal Board and Tamil Nadu Board) for two consecutive years, 2008 and 2009.

Though the recommendation from the Indian Statistical Institute was to use the aggregate scores of a student for computing the percentile rank of the student (and that recommendation was favorably accepted by MHRD), a statistically interesting question is what happens if we consider various subject scores separately instead of the aggregate score.

We intend to investigate this issue under some appropriate assumptions.

Page 35: Rank correlation- some features and an application

The Model

For convenience, let us consider only two subjects, namely Mathematics and Physics.

Let us denote the observed scores of a student in Mathematics and Physics by $X_M$ and $X_P$. Assume the existence of unobserved merit variables $W_M$ and $W_P$ such that the scores in the two subjects are related as

$$ X_M \approx g_M(W_M), \qquad X_P \approx g_P(W_P) \tag{12} $$

$W_M$ and $W_P$ may be treated as attributes of the student which depend on the knowledge and understanding of Maths and Physics respectively, and also on other factors like schooling, intelligence etc.

$g_M$ and $g_P$ relate to the examination procedure corresponding to the two subjects. They may vary across the boards.

Page 38: Rank correlation- some features and an application

Formulation of the model

Two students may obtain different scores in Mathematics and Physics because of the difference in their merit variables $W_M$ and $W_P$, or due to the difference in the examination procedures $g_M$ and $g_P$ across the boards.

It is time that we lay down our assumptions about $W_M$, $W_P$, $g_M$ and $g_P$.

Page 39: Rank correlation- some features and an application

Assumptions of the model

Assumption 1

The functions $g_P$ and $g_M$ are monotonically increasing. This implies that the scores of the students are expected to increase from less meritorious to more meritorious students for each of the two subjects.

Assumption 2

The joint distribution of $(W_P, W_M)$ for the students is the same in different boards.

Page 40: Rank correlation- some features and an application

How Assumptions can be checked

Imagine a common test in Mathematics and Physics taken by students of all the boards.

The Mathematics score in the common test would be a monotone function of the Mathematics score in the board examination, as both are monotone functions of the same merit variable. (The same holds for Physics scores.) This can be tested by using Spearman’s ρ and Kendall’s τ statistics.

Mathematics and Physics scores in the common test would have the same distribution in the subpopulations corresponding to different boards. This can be tested using any non-parametric test for equality of bivariate distributions.

Page 41: Rank correlation- some features and an application

Is there a way to check the validity of these assumptions using currently available data?

Page 42: Rank correlation- some features and an application

How assumptions can be checked without a common test

According to Assumption 2, the dependence between merits in Physics and Mathematics should be similar in all the boards.

Rank correlation between Physics and Mathematics scores in a particular board should not depend on the board-specific monotone functions $g_M$ and $g_P$.

Therefore, rank correlation between Physics and Mathematics scores should be the same across the boards.
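The invariance that this argument relies on can be illustrated directly. The sketch below is hypothetical throughout: the merit values, the two increasing functions standing in for $g_M$ and $g_P$, and the function names are all made up for illustration, and the scores are taken to be exactly $g(W)$ rather than approximately so.

```python
import math


def kendall_tau(x, y):
    """(concordant - discordant) / number of pairs, assuming no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2 * s / (n * (n - 1))


def ranks(values):
    """Ranks 1..n, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def g_m(w):
    """Hypothetical increasing examination function for Mathematics."""
    return 100 / (1 + math.exp(-w))


def g_p(w):
    """Hypothetical increasing examination function for Physics."""
    return 40 + 10 * w ** 3


# Hypothetical merit values for six students.
w_m = [-1.2, 0.3, 0.9, -0.4, 1.7, 0.1]
w_p = [-0.8, 0.5, 0.2, -1.1, 1.4, 0.6]

x_m = [g_m(w) for w in w_m]
x_p = [g_p(w) for w in w_p]

print("merits:", kendall_tau(w_m, w_p), spearman_rho(w_m, w_p))
print("scores:", kendall_tau(x_m, x_p), spearman_rho(x_m, x_p))
```

Because increasing functions preserve order, the ranks of the scores equal the ranks of the merits, so both coefficients are unchanged. Pearson's r would generally differ between the two rows.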

Page 43: Rank correlation- some features and an application

Rank correlation between Physics & Maths for different boards and years

Page 44: Rank correlation- some features and an application

Rank correlation Physics & Chemistry

Figure: Rank correlation between Physics and Chemistry marks over years

Page 45: Rank correlation- some features and an application

bar chart of rank correlation Chemistry & Maths

Figure: Rank correlation between Chemistry and Maths marks over years

Page 46: Rank correlation- some features and an application

Subject percentile graph WBHS 2008

Page 47: Rank correlation- some features and an application

Variation of a subject across a board same year

Page 48: Rank correlation- some features and an application

Inference from the data analysis

Between-board variation is significantly higher than within-board variation across the two years.

Visibly, there is high correlation in the Tamil Nadu Board, whereas low correlation is observed in the CBSE Board.

If we interpret the available data as a large sample from a larger hypothetical population, the rank correlation computed for a board in a particular year will have an approximately normal distribution.

So, we can use these rank correlation values to carry out an ANOVA-type statistical analysis to see whether the values differ significantly across different boards and across different years. When this is done, the differences in rank correlation across boards appear to be significant.

This essentially implies a breakdown of Assumption 2.

Study of the rank correlation brings out this fact even without scores from a common test.

Page 53: Rank correlation- some features and an application

Acknowledgement

I would like to express my gratitude to my mentors for this project, Prof. Probal Chaudhuri and Prof. Debasis Sengupta, for their immense co-operation. I would also like to thank all those who have been associated with this work in some way or the other.

Page 54: Rank correlation- some features and an application

Thank You
