Rank correlation- some features and an application
On some interesting features and an application of rank correlation
Kushal Kr. Dey
Indian Statistical Institute
D. Basu Memorial Award Talk 2011
Kushal Kr. Dey [1.5 pt] Indian Statistical Institute D.Basu Memorial Award Talk 2011On some interesting features and an application of rank correlation
List of contents
1 Historical overview of rank correlation.
2 Some properties of rank correlation.
3 A practical example of rank correlation.
Historical Overview—Correlation
In 1888, Sir Francis Galton coined the term correlation, writing:
The length of a human arm is said to be correlated with that of the leg, because a person with a long arm usually has long legs, and conversely.
Galton wanted a measure of correlation that takes the value +1 for perfect correspondence, 0 for independence, and -1 for perfect inverse correspondence.
Historical overview—contd.
Karl Pearson, a student of Galton, worked on his idea and formulated his "product moments" measure of correlation in 1896.
$$ r = \frac{S_{xy}}{\sqrt{S_{xx}}\,\sqrt{S_{yy}}} \qquad (1) $$
Spearman observed that for characteristics that are not quantitatively measurable, the Pearsonian measure fails to measure the association. This motivated him to use rank-based methods for association and to develop his rank correlation coefficient in 1904. ["The proof and measurement of association between two things", C. Spearman, The American Journal of Psychology (1904)].
Historical overview contd
In 1938, two years after the death of Pearson, Maurice Kendall, a British scientist, while working on psychological experiments, came up with a new measure of correlation popularly known as Kendall’s τ. ["A new measure of rank correlation", M. Kendall, Biometrika (1938)].
The next few years saw extensive research in this area by Kendall, Daniels, Hoeffding and others.
In 1954, a modification of Kendall’s coefficient for the case of ties was made by Goodman and Kruskal. ["Measures of association for cross classifications", Part I, L.A. Goodman and W.H. Kruskal, J. Amer. Statist. Assoc. (1954)]
Daniels’ generalized correlation coefficient
H.E. Daniels of Cambridge University, a close associate of Kendall, proposed a measure in 1944 to unify Pearson’s r, Spearman’s ρ and Kendall’s τ ["The relation between measures of correlation in the universe of sample permutations", H.E. Daniels, Biometrika (1944)].
Consider n data points $(X_i, Y_i)$, i = 1, ..., n. To each pair of X’s, $(X_i, X_j)$, we allot a score $a_{ij}$ with $a_{ij} = -a_{ji}$ and $a_{ii} = 0$; similarly, we allot a score $b_{ij}$ to the pair $(Y_i, Y_j)$. Daniels’ generalized coefficient D is then given by
$$ D = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} b_{ij}}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^{2} \cdot \sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij}^{2}\right)^{1/2}} \qquad (2) $$
Daniels’ generalized coefficient contd.
Special cases
Put $a_{ij} = X_j - X_i$ and $b_{ij} = Y_j - Y_i$ to get Pearson’s r.
Put $a_{ij} = \mathrm{Rank}(X_j) - \mathrm{Rank}(X_i)$ and $b_{ij} = \mathrm{Rank}(Y_j) - \mathrm{Rank}(Y_i)$ to get Spearman’s ρ.
Put $a_{ij} = \mathrm{sgn}(X_j - X_i)$ and $b_{ij} = \mathrm{sgn}(Y_j - Y_i)$ to get Kendall’s τ.
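The three special cases can be tried out numerically. The following is a minimal sketch in plain Python, not code from the talk; the function names (`daniels`, `ranks`, `sign`) are our own, and the data are assumed to have no ties.

```python
from math import sqrt

def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def ranks(values):
    """Ranks 1..n (1 = smallest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def daniels(x, y, score):
    """Generalized coefficient D with a_ij = score(x_i, x_j), b_ij = score(y_i, y_j)."""
    n = len(x)
    num = sum_a2 = sum_b2 = 0.0
    for i in range(n):
        for j in range(n):
            a = score(x[i], x[j])
            b = score(y[i], y[j])
            num += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return num / sqrt(sum_a2 * sum_b2)

def pearson_r(x, y):
    return daniels(x, y, lambda u, v: v - u)

def spearman_rho(x, y):
    return daniels(ranks(x), ranks(y), lambda u, v: v - u)

def kendall_tau(x, y):
    return daniels(x, y, lambda u, v: sign(v - u))

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 1, 4, 3, 5]
    print(pearson_r(x, y), spearman_rho(x, y), kendall_tau(x, y))
```

Because the example data are already ranks, Pearson’s r and Spearman’s ρ coincide here, while Kendall’s τ differs since it uses only the signs of the differences.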
Alternative expression for τ and ρ
First, we define $d_{ij}$ to be +1 when rank j (j > i) precedes rank i in the second ranking, and zero otherwise.
We can write Kendall’s τ as
$$ \tau = 1 - \frac{4Q}{n(n-1)} \qquad (3) $$
where $Q = \sum_{i<j} d_{ij}$ is the total score and n is the total number of elements in the sample.
Similarly, we can write Spearman’s ρ as
$$ \rho = 1 - \frac{12V}{n(n^2-1)} \qquad (4) $$
where $V = \sum_{i<j} (j-i)\, d_{ij}$ is the sum of inversions weighted by the numerical difference between the ranks inverted. This difference is called the weight of the inversion.
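As a small illustration of expressions (3) and (4), the sketch below counts inversions and weighted inversions directly. It assumes the items are listed in the order of the first ranking and that `second_rank[i]` (a name of our own) holds the rank of the i-th item in the second ranking, with no ties.

```python
from fractions import Fraction

def inversion_tau_rho(second_rank):
    """Return (Q, V, tau, rho) computed from inversions.

    Items are assumed to be listed in the order of the first ranking;
    second_rank[i] is the rank of item i in the second ranking.
    """
    n = len(second_rank)
    q = 0  # number of inversions, Q
    v = 0  # inversions weighted by j - i, V
    for i in range(n):
        for j in range(i + 1, n):
            # d_ij = 1 when item j comes before item i in the second ranking
            if second_rank[j] < second_rank[i]:
                q += 1
                v += j - i
    tau = 1 - Fraction(4 * q, n * (n - 1))
    rho = 1 - Fraction(12 * v, n * (n * n - 1))
    return q, v, tau, rho

if __name__ == "__main__":
    print(inversion_tau_rho([2, 1, 4, 3, 5]))
```

Exact fractions are used so that the values can be compared with the usual formula $1 - 6\sum d^2 / (n(n^2-1))$ without rounding.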
An interesting result
We simulated large samples from a bivariate normal distribution and plotted the mean values of Spearman’s ρ and Kendall’s τ against Pearson’s r. We obtained the following graph.
The graph
Figure: Relation of Spearman’s ρ and Kendall’s τ with Pearson’s r
Relation of τ and ρ with r for BVN
In 1907, Pearson, in his book ["On Further Methods of Determining Correlation", Karl Pearson, Biometric Series IV (1907)], established the following relation between Spearman’s ρ and his r for the bivariate normal distribution.
$$ r = 2\sin\left(\frac{\pi}{6}\rho\right) \qquad (5) $$
Cramér, in 1946, also established a relation between Kendall’s τ and Pearson’s r for the bivariate normal.
$$ r = \sin\left(\frac{\pi}{2}\tau\right) \qquad (6) $$
However, relation (6) for Kendall’s τ can be shown to hold for elliptical distributions in general, whereas relation (5) for Spearman’s ρ is specific to the bivariate normal.
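Relations (5) and (6) can be checked by simulation. The sketch below is our own illustration, not the simulation used for the talk; it assumes a fixed seed, a single sample of size 2000 and a true correlation of 0.6, and it draws bivariate normal pairs as $(V\sqrt{1-r^2} + Wr,\; W)$ with V and W independent standard normal.

```python
import math
import random

def ranks(values):
    """Ranks 1..n (1 = smallest); ties are not expected for continuous data."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def kendall_tau(x, y):
    """Kendall's tau from concordant minus discordant pairs (O(n^2))."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))

def spearman_rho(x, y):
    """Spearman's rho via the sum of squared rank differences."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def simulate_bvn(r, n, seed=0):
    """Draw n pairs from a standard bivariate normal with correlation r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        v, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(v * math.sqrt(1.0 - r * r) + w * r)
        ys.append(w)
    return xs, ys

if __name__ == "__main__":
    r = 0.6
    x, y = simulate_bvn(r, n=2000)
    print("true r:", r)
    print("2 sin(pi rho / 6):", 2 * math.sin(math.pi * spearman_rho(x, y) / 6))
    print("sin(pi tau / 2):  ", math.sin(math.pi * kendall_tau(x, y) / 2))
```

Both back-transformed values should land close to the true r, up to sampling error.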
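Relation between Kendall’s τ and r for bivariate normal
Let $(X_1,Y_1), (X_2,Y_2), \dots, (X_n,Y_n)$ be a sample drawn from BVN(0, 0, 1, 1, r), where the last parameter is the correlation. Then Kendall’s τ computed from the data is an unbiased estimator of
$$ 2P((X_1 - X_2)(Y_1 - Y_2) > 0) - 1 = 2P(Z_1 Z_2 > 0) - 1 \qquad (7) $$
where $(Z_1, Z_2) = (X_1 - X_2,\; Y_1 - Y_2) \sim BVN(0, 0, 2, 2, r)$, that is, each component has variance 2 and their covariance is 2r.
Note that $(Z_1, Z_2) \overset{d}{=} \sqrt{2}\,(V\sqrt{1-r^2} + Wr,\; W)$, where V and W are independent standard normal variables. Since $(Z_1, Z_2)$ is symmetric about (0, 0), we have $2P(Z_1 Z_2 > 0) = 4P(Z_1 > 0, Z_2 > 0)$, so the quantity in (7) equals
$$ 4P(Z_1 > 0, Z_2 > 0) - 1 = 4P(V\sqrt{1-r^2} + Wr > 0,\; W > 0) - 1 \qquad (8) $$
Use the polar transformation on (V, W) and evaluate this probability to see that (8) equals $\frac{2}{\pi}\sin^{-1} r$.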
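The polar step can be spelled out as follows. This is our own filling-in of the computation, assuming V and W are independent standard normal and writing $\alpha = \sin^{-1} r$, so that $\sqrt{1-r^2} = \cos\alpha$.

```latex
\begin{align*}
V &= R\cos\Theta, \quad W = R\sin\Theta,
   \qquad \Theta \sim \mathrm{Uniform}[0, 2\pi) \text{ independent of } R > 0, \\
V\sqrt{1-r^2} + Wr &= R(\cos\Theta\cos\alpha + \sin\Theta\sin\alpha) = R\cos(\Theta - \alpha), \\
\{V\sqrt{1-r^2} + Wr > 0\} &= \{\alpha - \tfrac{\pi}{2} < \Theta < \alpha + \tfrac{\pi}{2}\} \pmod{2\pi},
   \qquad \{W > 0\} = \{0 < \Theta < \pi\}, \\
P\big(V\sqrt{1-r^2} + Wr > 0,\; W > 0\big)
  &= \frac{1}{2\pi}\Big(\alpha + \frac{\pi}{2}\Big)
   = \frac{1}{4} + \frac{1}{2\pi}\sin^{-1} r, \\
4P\big(V\sqrt{1-r^2} + Wr > 0,\; W > 0\big) - 1 &= \frac{2}{\pi}\sin^{-1} r .
\end{align*}
```

The intersection of the two angular ranges is $(0, \alpha + \pi/2)$ because $\alpha$ lies in $[-\pi/2, \pi/2]$.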
Relation between Spearman’s ρ and r for bivariate normal
Now we give a sketch of a proof of the relationship between Pearson’s r and Spearman’s ρ for the bivariate normal distribution.
Let $R(X_i)$ and $R(Y_i)$ be the ranks of $X_i$ and $Y_i$. Define $H(t) = I\{t > 0\}$. Then, observe that
$$ R(X_i) = \sum_{j=1}^{n} H(X_i - X_j) + 1 \qquad (9) $$
Note that Spearman’s ρ is Pearson’s correlation coefficient between $R(X_i)$ and $R(Y_i)$, which is
$$ \frac{h - \frac{1}{4}n(n-1)^2}{\frac{1}{12}n(n^2-1)} $$
where $h = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} H(X_i - X_j)\,H(Y_i - Y_k)$.
Proof continued
Case 1
If i, j, k are distinct, then $(X_i - X_j,\; Y_i - Y_k)$ is distributed as BVN(0, 0, 2, 2, r/2), the last parameter being the correlation.
$E\{H(X_i - X_j)\,H(Y_i - Y_k)\}$ then reduces to the integral of the probability density over the positive quadrant.
We can check, following a technique similar to the one used for τ, that this integral is $\frac{1}{2}\left(1 - \frac{1}{\pi}\cos^{-1}\frac{r}{2}\right)$.
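Proof continued
Case 2
If $i \neq j = k$, then $(X_i - X_j,\; Y_i - Y_k)$ is distributed as BVN(0, 0, 2, 2, r), the last parameter being the correlation, and the above expectation reduces to $\frac{1}{2}\left(1 - \frac{1}{\pi}\cos^{-1} r\right)$.
The terms of h with i = j or i = k vanish because H(0) = 0, leaving n(n-1)(n-2) terms with distinct indices and n(n-1) terms with $i \neq j = k$. Then,
$$ E\left(\frac{h - \frac{1}{4}n(n-1)^2}{\frac{1}{12}n(n^2-1)}\right) = \frac{6}{\pi}\left\{\frac{n-2}{n+1}\sin^{-1}\frac{r}{2} + \frac{1}{n+1}\sin^{-1} r\right\} \qquad (10) $$
As n goes to infinity, the R.H.S. reduces to $\frac{6}{\pi}\sin^{-1}\frac{r}{2}$.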
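To get a feel for how quickly the finite-sample expectation (10) approaches its limit, one can simply evaluate both sides. The following is a minimal sketch with function names of our own choosing.

```python
import math

def expected_spearman(n, r):
    """Right-hand side of (10): expected Spearman's rho for sample size n."""
    return (6.0 / math.pi) * (
        (n - 2) / (n + 1) * math.asin(r / 2.0) + math.asin(r) / (n + 1)
    )

def limiting_spearman(r):
    """Limit of (10) as n goes to infinity."""
    return (6.0 / math.pi) * math.asin(r / 2.0)

if __name__ == "__main__":
    r = 0.5
    for n in (5, 20, 100, 1000):
        print(n, expected_spearman(n, r), limiting_spearman(r))
```

For r = 1 the expression equals 1 for every n, as it must, since perfectly correlated normal data always give ρ = 1.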
Reason for approximate linear relationship between Spearman’s ρ and Pearson’s r for BVN
As observed from the graph, Spearman’s ρ for the bivariate normal is almost linearly related to Pearson’s r. This may be attributed to the fact that
$$ \rho = \frac{6}{\pi}\sin^{-1}\frac{r}{2} = \frac{6}{\pi}\left(\frac{r}{2} + \frac{1}{6}\cdot\frac{r^3}{8} + \dots\right) = \frac{3}{\pi}\,r + \text{terms very small compared to the first-order term} \approx \frac{3}{\pi}\,r $$
For Kendall’s τ, using a similar expansion, we can also show that τ is a convex function of r in the interval [0, 1].
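The size of the neglected terms can be checked numerically. The sketch below is our own illustration; it compares $\rho = \frac{6}{\pi}\sin^{-1}\frac{r}{2}$ with the line $\frac{3}{\pi}r$ and evaluates $\tau = \frac{2}{\pi}\sin^{-1} r$, both taken from the relations stated for the bivariate normal.

```python
import math

def rho_of_r(r):
    """Spearman's rho as a function of r for the bivariate normal."""
    return (6.0 / math.pi) * math.asin(r / 2.0)

def tau_of_r(r):
    """Kendall's tau as a function of r for the bivariate normal."""
    return (2.0 / math.pi) * math.asin(r)

def linear_gap(r):
    """Difference between rho and its first-order approximation 3r/pi."""
    return rho_of_r(r) - 3.0 * r / math.pi

if __name__ == "__main__":
    for k in range(0, 11):
        r = k / 10.0
        print(r, rho_of_r(r), linear_gap(r), tau_of_r(r))
```

The gap from the straight line grows with r and is largest at r = 1, where it equals $1 - 3/\pi$, which is under 0.05; τ, by contrast, bends visibly away from a line as r approaches 1.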
Kendall’s comparative assessment of τ and ρ
Kendall, in his paper, admitted that ρ can take $\frac{n^3-n}{6}$ values between -1 and +1, whereas τ can take only $\frac{n^2-n}{2}$ values in that range, but according to him, this does not seriously affect the sensitivity of τ.
Both Kendall’s τ and Spearman’s ρ computed from the sample have asymptotically normal distributions.
But Kendall showed using simulation experiments that the distribution of his correlation coefficient is surprisingly close to normal even for small values of n, which is not the case for Spearman’s correlation.
Bias properties of Kendall’s τ and Spearman’s ρ
Consider a finite population. Let ρ* and τ* be Spearman’s and Kendall’s rank correlation coefficients computed from the entire population.
Suppose that we have a simple random sample without replacement from that population, and we compute Spearman’s ρ and Kendall’s τ from the sample.
Then τ is an unbiased estimator of τ*, but ρ is a biased estimator of ρ*.
If the population size N tends to infinity, the expected value of Spearman’s ρ goes to $\frac{1}{n+1}\{3\tau^* + (n-2)\rho^*\}$, where n is the size of the sample.
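For a very small population the bias statement can be verified exactly by listing every possible sample. The sketch below is our own illustration under stated assumptions: a hypothetical population of N = 4 units listed in increasing order of X, whose Y ranks are (2, 1, 3, 4), and samples of size n = 3 drawn without replacement.

```python
from fractions import Fraction
from itertools import combinations

def tau(y):
    """Kendall's tau of y against its listing order (units sorted by X, no ties)."""
    n = len(y)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            score += 1 if y[j] > y[i] else -1
    return Fraction(2 * score, n * (n - 1))

def rho(y):
    """Spearman's rho of y against its listing order (no ties)."""
    n = len(y)
    ordered = sorted(y)
    y_rank = [ordered.index(value) + 1 for value in y]
    d2 = sum((i + 1 - y_rank[i]) ** 2 for i in range(n))
    return 1 - Fraction(6 * d2, n * (n * n - 1))

def sample_means(population_y, n):
    """Average tau and rho over all simple random samples of size n."""
    samples = list(combinations(population_y, n))  # keeps the X order
    mean_tau = sum(tau(s) for s in samples) / len(samples)
    mean_rho = sum(rho(s) for s in samples) / len(samples)
    return mean_tau, mean_rho

if __name__ == "__main__":
    population = [2, 1, 3, 4]  # hypothetical Y ranks, units sorted by X
    print("population tau, rho:", tau(population), rho(population))
    print("mean sample tau, rho:", sample_means(population, 3))
```

In this example the average of the sample τ values reproduces τ* exactly, while the average of the sample ρ values differs from ρ*.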
Small sample distribution of τ, ρ and r
It is well known that for a simple random sample of size n drawn from a bivariate normal distribution, under the assumption of zero correlation, Pearson’s r satisfies
$$ \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2} \qquad (11) $$
But the distribution of r for small samples from a normal distribution with non-zero correlation, and from non-normal distributions, is not tractable.
τ and ρ are distribution-free statistics in the sense that their distributions do not depend on the distribution of the data so long as X and Y are independent. Consequently, their distributions under the hypothesis of independence of X and Y can be tabulated.
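Tabulating the null distribution is straightforward for small n, because under independence (and no ties) every permutation of the second ranking is equally likely. The sketch below is our own illustration; it enumerates all permutations to tabulate τ and also evaluates the statistic in (11).

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import sqrt

def tau_null_distribution(n):
    """Exact distribution of Kendall's tau under independence, by enumeration."""
    counts = Counter()
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[j] < perm[i]
        )
        counts[1 - Fraction(4 * inversions, n * (n - 1))] += 1
    total = sum(counts.values())
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}

def pearson_t(r, n):
    """Statistic of (11); it follows t with n - 2 degrees of freedom under zero correlation."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)

if __name__ == "__main__":
    for value, prob in tau_null_distribution(4).items():
        print(value, prob)
    print(pearson_t(0.5, 11))
```

Enumeration grows as n!, so this is only practical for small samples, which is exactly where tables are needed.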
Asymptotic normality of r , ρ and τ
Note that each of Pearson’s r, Spearman’s ρ and Kendall’s τ computed from bivariate data is asymptotically normally distributed.
Asymptotic normality of Pearson’s r can be derived using the Central Limit Theorem applied to various bivariate sample moments.
Asymptotic normality of Spearman’s ρ follows from asymptotic normality of linear rank statistics.
Asymptotic normality of Kendall’s τ follows from asymptotic normality of U-statistics.
A practical application of rank correlation
Recently, the Ministry of Human Resource Development (MHRD) considered giving weightage to the marks scored in the 10+2 Board exams for admission to engineering colleges in India.
The raw scores across the Boards are not comparable. So, they wanted help in this regard from the Indian Statistical Institute.
The use of percentile ranks of students based on their aggregate scores was recommended by the Indian Statistical Institute.
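To make the idea of percentile ranks concrete, here is a minimal sketch. The definition used (the percentage of students in the same board whose aggregate score is at most the student’s own) is an assumption on our part, as the talk does not spell out the exact formula, and the scores below are hypothetical.

```python
from bisect import bisect_right

def percentile_ranks(scores):
    """Percentile rank of each score within its own board.

    Assumed definition: 100 * (number of scores <= this score) / (board size).
    """
    ordered = sorted(scores)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, s) / n for s in scores]

if __name__ == "__main__":
    # Hypothetical aggregate scores on two different scales
    board_a = [450, 380, 410, 300]
    board_b = [92, 75, 88, 60]
    print(percentile_ranks(board_a))
    print(percentile_ranks(board_b))
```

The two hypothetical boards use very different scales, yet students in the same relative position receive the same percentile rank, which is what makes the ranks comparable across boards.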
The Data
The Indian Statistical Institute was provided data from 4 boards (namely, ICSE, CBSE, West Bengal Board and Tamil Nadu Board) for two consecutive years, 2008 and 2009.
Though the recommendation from the Indian Statistical Institute was to use the aggregate scores of a student for computing the percentile rank of the student (and that recommendation was favorably accepted by MHRD), a statistically interesting question is what happens if we consider various subject scores separately instead of the aggregate score.
We intend to investigate this issue under some appropriate assumptions.
The Model
For convenience, let us consider only two subjects, namely Mathematics and Physics.
Let us denote the observed scores of a student in Mathematics and Physics by $X_M$ and $X_P$. Assume the existence of unobserved merit variables $W_M$ and $W_P$ such that the scores in the two subjects are related as
$$ X_M \approx g_M(W_M), \qquad X_P \approx g_P(W_P) \qquad (12) $$
$W_M$ and $W_P$ may be treated as attributes of the student which depend on the knowledge and understanding of Maths and Physics respectively, and also on other factors like schooling, intelligence etc.
$g_M$ and $g_P$ relate to the examination procedures for the two subjects. They may vary across the boards.
Formulation of the model
Two students may obtain different scores in Mathematics and Physics because of the difference in their merit variables $W_M$ and $W_P$, or due to the difference in the examination procedures $g_M$ and $g_P$ across the boards.
It is time that we lay down our assumptions about $W_M$, $W_P$, $g_M$ and $g_P$.
Assumptions of the model
Assumption 1
The functions $g_P$ and $g_M$ are monotonically increasing. This implies that the scores of the students are expected to increase from less meritorious to more meritorious students for each of the two subjects.
Assumption 2
The joint distribution of $(W_P, W_M)$ for the students is the same in different boards.
How Assumptions can be checked
Imagine a common test in Mathematics and Physics taken by students of all the boards.
The Mathematics score in the common test would be a monotone function of the Mathematics score in the board examination, as both are monotone functions of the same merit variable. (The same holds for Physics scores.) This can be tested by using Spearman’s ρ and Kendall’s τ statistics.
Mathematics and Physics scores in the common test would have the same distribution in the subpopulations corresponding to different boards. This can be tested using any non-parametric test for equality of bivariate distributions.
Is there a way to check the validity of these assumptions using currently available data?
How assumptions can be checked without a common test
According to Assumption 2, the dependence between merits in Physics and Mathematics should be similar in all the boards.
Rank correlation between Physics and Mathematics scores in a particular board should not depend on the board-specific monotone functions $g_M$ and $g_P$.
Therefore, the rank correlation between Physics and Mathematics scores should be the same across the boards.
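The invariance of rank correlation under increasing transformations, on which this argument rests, is easy to demonstrate. The sketch below is our own illustration: the merit values and the two scoring functions `board_a` and `board_b` are hypothetical stand-ins for $g_M$ and $g_P$, chosen only because they are strictly increasing.

```python
import math

def kendall_tau(x, y):
    """Kendall's tau from concordant minus discordant pairs (no ties assumed)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))

def board_a(w):
    """Hypothetical linear marking scheme (strictly increasing)."""
    return 50.0 + 20.0 * w

def board_b(w):
    """Hypothetical saturating marking scheme (strictly increasing)."""
    return 100.0 / (1.0 + math.exp(-w))

# Hypothetical merit values for six students
merit_maths = [-1.2, 0.3, 0.9, -0.4, 1.7, 0.0]
merit_physics = [-0.8, 0.1, 1.4, 0.2, 0.6, -0.5]

if __name__ == "__main__":
    print(kendall_tau(merit_maths, merit_physics))
    print(kendall_tau([board_a(w) for w in merit_maths],
                      [board_b(w) for w in merit_physics]))
    print(kendall_tau([board_b(w) for w in merit_maths],
                      [board_a(w) for w in merit_physics]))
```

All three printed values agree, since only the ordering of the students enters τ; the same holds for Spearman’s ρ.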
Rank correlation between Physics & Maths for different boards and years
Rank correlation Physics & Chemistry
Figure: Rank correlation between Physics and Chemistry marks over years
Bar chart of rank correlation Chemistry & Maths
Figure: Rank correlation between Chemistry and Maths marks over years
Subject percentile graph WBHS 2008
Variation of a subject across a board same year
Inference from the data analysis
Between-board variation is significantly higher than within-board variation across the two years.
Visibly, there is high correlation in the Tamil Nadu Board, whereas low correlation is observed in the CBSE Board.
If we interpret the available data as a large sample from a larger hypothetical population, the rank correlation computed for a board in a particular year will have an approximately normal distribution.
So, we can use these rank correlation values to carry out an ANOVA-type statistical analysis to see whether the values differ significantly across boards and across years. When this is done, the differences in rank correlation across boards appear to be significant.
This essentially implies a breakdown of Assumption 2.
Study of the rank correlation brings out this fact even without scores of a common test.
Acknowledgement
I would like to express my gratitude towards my mentors for this project, Prof. Probal Chaudhuri and Prof. Debasis Sengupta, for their immense co-operation. I would also like to thank all those who have been associated with this work in some way or the other.
Thank You