THE BISERIAL AND POINT CORRELATION COEFFICIENTS


By

Robert Fleming Tate

Institute of Statistics
Mimeo. Series #14
For Limited Distribution

p. vi l. 21 "," should be added between "other" and "continuous".

p. vii l. 2 "of our" should read "of the joint distribution of our".

p. 3 l. 1

p. 7 l. 23, p. 23 l. 6, l. 10

p. 29 l. 14, p. 59 l. 2

p. 1

p. 2

l. 6

l. 8, l. 9

l. 13, l. 16

Omit "of".

"(21Crl " should read "(21C6) -1".

"-2x:sir" should read "2yrylu".

'1"; should read "J '1,""

"defined to" should read "defined from the sample to".

", and" should read ", m = the number of z_i in the sample which are 1, and".

"g'(x) 0" should read "g'(x) > 0".

"5.:t~' should read "l:t~·

"{ ;" should read "{ - -- }dxi ".

fl I " should read " ¢ ".

"for Random" should read "for Dependent Random".


THE BISERIAL AND POINT CORRELATION COEFFICIENTS

By Robert Fleming Tate

Special Report of research at the Institute of Statistics of the University of North Carolina, Chapel Hill, under Office of Naval Research Project NR 042031.

FOREWORD

By Harold Hotelling

The fact that biserial correlation was introduced and came into general use before the development of the modern statistical emphasis on exact sampling probabilities and the theory of efficient estimation and testing of hypotheses, which have not yet embraced biserial correlation in their formal treatments, leaves unanswered many questions as to the efficacy, proper place, and possible modifications of this widely used technique. A feeling that all is not well with biserial correlation has led more recently to the introduction of point biserial correlation, which in turn has led to further questions of principle and technique. In this paper Mr. Tate contrasts these techniques with each other and with the theoretically hundred per cent efficient, but computationally difficult method of maximum likelihood. A paradox is created by the fact that whereas the correlation coefficient in a population cannot exceed unity, the biserial estimate of it has no upper bound and may occasionally be far greater than unity. Mr. Tate shows how this phenomenon is associated with a gradually decreasing efficiency, approaching zero, of the biserial correlation as it increases. A point of particular interest is the variance stabilizing transformation, analogous to R. A. Fisher's transformation of the product-moment correlation coefficient, and capable of being carried out with the same tables, reached on pp. 21-22 for the case of equal frequencies in the two classes. Extension of this to the case of unequal classes is now under consideration. Psychologists and personnel workers as well as others concerned with test construction, item analysis and correlation will find in this memorandum a clarification of numerous troublesome questions that have surrounded biserial correlation.


ABSTRACT

ROBERT FLEMING TATE. The Biserial and Point Biserial Correlation Coefficients. (Under the direction of HAROLD HOTELLING.)

Two solutions to the problem of finding the correlation between a continuous random variable $X$ and a discrete, two-valued random variable $Z$ are discussed, the coefficients of biserial and of point biserial correlation.

It is shown that the biserial correlation coefficient $r^*$ has efficiency 0 when the population correlation coefficient $\rho$ tends to $\pm 1$. Also, $r^*$ has minimum variance for fixed $\rho$ when the cut value of the underlying normal distribution occurs at the population mean; a table is included to illustrate this point. A special case of the limiting distribution of $r^*$ is obtained for the cut value at the mean. Further, biserial $r^*$ is shown to be unbounded, and a diagram illustrating this point is included.

The equivalence of the use of point biserial $r$ under certain restrictions and "Student's" $t$ is displayed. The relative advantages and disadvantages of $r$ and $r^*$ are discussed, and some recommendations are given as to which of the two coefficients should be used in various cases.


INTRODUCTION

An important problem in experimental work is that of determining the correlation between a continuous variable and a discrete variable which takes on only two values. The need to measure such a quantity is present in all fields of research to some extent, but such correlations are of particular importance in psychological testing.

The fundamental problem of psychological testing is that of measuring some quantity which has a name, for example intelligence or mechanical aptitude. A measurable criterion is selected to represent the quantity under consideration. In the absence of any external criterion the total test score is sometimes used. The technique consists firstly in summoning for consideration all possible questions, or items, as they are called, which could have any bearing on the quantity to be measured, and which can be answered quickly and unambiguously by the subject who is being tested. The item is then scored 1 if the degree of association with the quantity is positive, and 0 if it is not. If the test is to be efficiently carried out, the items should have low correlations among themselves, and if it is to be valid, they should all be correlated highly with the criterion. The methods available for finding the correlation between item and criterion form the subject of this paper. The extension of the mode of approach used in psychological testing to other types of situations will in most cases be evident.


For testing the hypothesis $\rho = 0$ product moment $r$ can be used. Its distribution is independent of that of the discrete variable under the null hypothesis. For a tabulation we can use Table V A of Fisher's book [6]¹, David's tables [2], or Fisher's $z$ transformation, which is given in Table V B of Fisher's book [6]. There is no problem involved in testing this hypothesis.

1. Numbers in brackets refer to bibliography.

For the case $\rho \neq 0$ two solutions have been proposed. In 1909 Karl Pearson [12] proposed a solution to the problem in the form of the so-called biserial coefficient of correlation $r^*$, which is defined in Section 2 of Chapter I. The word biserial refers to the separate sets of values of the continuous variable which are associated with the two values, 0 and 1, of the discrete variable. Many results and techniques of the Karl Pearson school of statistics were supplanted by new and improved methods after the advent of the penetrating studies of R. A. Fisher [5] and J. Neyman [11]. The coefficient of biserial correlation, however, is still a widely used tool in statistical analysis. It is treated in many texts used in psychological statistics, but its mathematical theory has remained in substantially the same incomplete form since 1913.

The basic property of the biserial coefficient is that it presupposes an underlying normal distribution from which the discrete, two-valued, variable can be obtained. That is, it presupposes a dichotomization of the normal distribution at some fixed point $\omega$, after which all observations which fall on the right of this point will be assigned the value 1, and all of those on the left, the value 0. The meaning of this assumption will become clearer if we return for a moment to the notion of a test item. There the postulation of underlying normality is equivalent to the supposition that there exists a normal population of attitudes towards answering the given test question. Most subjects would, if possible, answer in qualified terms, or in degrees of confirmation or rejection. However, one of two answers must be given, and it is hypothesised that there exists a unique point of demarcation of attitudes, and that it occurs at $\omega$, the aforementioned point of dichotomy.

In 1934 another solution to the problem was proposed by J. Stalnaker and M. W. Richardson [16] in the form of the point biserial coefficient of correlation $r$. In connection with this coefficient no specification is made of any underlying distribution for the discrete variable.

Pearson's assumptions were that the discrete variable was obtained from an underlying normal variable by dichotomization at some fixed point, and that there exists a linear regression of the other, continuous variable upon this normal variable. His derivation of biserial $r^*$, based on these assumptions, was carried out by the Method of Consistency, so $r^*$ is a consistent estimate of $\rho$. A discussion of the derivation is given in Chapter I, Section 3. Unless we specify a more complex model than that used by Pearson, we cannot speak of the joint distribution of our discrete and continuous variables, and hence can draw no further inferences about $r^*$. The model to be considered here specifies an underlying bivariate normal distribution with three parameters, $\sigma$ the standard deviation of the continuous variable, the cut value $\omega$, and the population correlation coefficient $\rho$. The model is defined in Chapter I, Section 2.

In addition to the fact that $r^*$ is a consistent estimate of $\rho$, the asymptotic standard error of $r^*$ is known. H. E. Soper [15] obtained this result in 1913. In Chapter II the asymptotic standard error of $r^*$ will be studied more closely. It will be shown that this quantity, a function of $\omega$ and $\rho$, takes on its minimum value for any fixed $\rho$ at $\omega = 0$. A double-entry table of the asymptotic standard deviation is given in Table I.

In Chapter III, Section 1 the limiting distribution of $r^*$ will be derived, subject to the restriction that the point of dichotomy $\omega$ be taken at the mean of the underlying distribution associated with the discrete variable. The variance of this distribution offers a partial check on Soper's result. Subject to the same restriction on $\omega$, a transformation will be given analogous to Fisher's $z$ transformation, which stabilizes the variance.

Two important properties of the efficiency of $r^*$ are obtained in Chapter IV. By a consideration of the information matrix of the maximum likelihood estimates $\hat\sigma$, $\hat\omega$, $\hat\rho$ it is shown that as $\rho$ tends to $\pm 1$ the efficiency of $r^*$ tends to 0, and also that $r^*$ has efficiency 1 when $\rho = 0$.


In a recent paper J. Lev [10] considered point biserial $r$ under the assumption that the $N$ values of the discrete variable are not random but fixed. That is, if $X$ denotes the continuous variable and $Z$ the discrete variable, he considered only samples $(X_i, Z_i)$, $(i = 1, 2, \ldots, N)$, for which the $Z_i$ have a fixed partition, $N_0$ of them being 0, and $N_1$ of them 1. He further assumed the residuals of the $X_i$ determined from the $Z_i$ to be normally distributed. Lev showed that under these restrictions point biserial $r$ is a maximum likelihood estimate of $\rho$, and that using it is equivalent to using the two-sample $t$ statistic when $\rho = 0$, and the two-sample non-central $t$ when $\rho \neq 0$.

Neither of these coefficients is entirely adequate; in fact, they both leave much to be desired. A great deal of the literature available on the subject of biserial correlation displays the confusion of the early Pearson school in their failure to distinguish between population parameters and sample estimates of these parameters, and to appreciate the significance of the concept of efficiency of an estimate. An attempt has been made in this paper to put the various notions involved on a firmer mathematical footing in the light of modern statistical methods. A discussion of the advantages and disadvantages of the biserial and point biserial coefficients of correlation may be found in the summary of results given in Chapter V.

CHAPTER I

PROPERTIES OF THE BISERIAL AND POINT BISERIAL CORRELATION COEFFICIENTS

1. Notation

The following notation and conventions will be used throughout.

$\lambda(x)$ stands for $(2\pi)^{-1/2}\exp(-x^2/2)$, the probability density of a standard normal variable.

$\phi(x,y)$ stands for $(2\pi\sigma)^{-1}(1-\rho^2)^{-1/2}\exp\{-2^{-1}(1-\rho^2)^{-1}(x^2/\sigma^2 - 2xy\rho/\sigma + y^2)\}$, the probability density of a bivariate normal vector.

$N(a,b)$ denotes a variable which has a normal distribution with mean $a$ and standard deviation $b$.

$V(X)$ denotes the variance of a random variable $X$.

$f^{-1}(x)$ will stand for $1/f(x)$ for any function $f$ which appears.

$\operatorname{plim} X_N = X$ denotes a sequence of random variables $X_N$ $(N = 1, 2, \ldots)$ tending in probability to a random variable $X$ as $N$ tends to $\infty$.

$\operatorname{dlim} X_N = X$ denotes a sequence of random variables $X_N$ $(N = 1, 2, \ldots)$ tending in distribution to a random variable $X$ as $N$ tends to $\infty$. That is to say, if we define $F_N(x) = P(X_N \leq x)$ and $F(x) = P(X \leq x)$, then $F_N(x)$ tends to $F(x)$ at every continuity point of $F(x)$.

Greek letters will designate parameters.

Lemmas will be assigned Arabic numerals.

Theorems will be assigned Roman numerals.

Formulae will be assigned Arabic numerals with the number of the chapter appearing first.

References will be numbered in square brackets, and may be found in the bibliography.

2. Mathematical Model

$(X, Y)$ is a two-dimensional random vector with the frequency distribution $\phi(x,y)$.

$Y \geq \omega$, where $\omega$ is a fixed constant of the $Y$

with $z = 0$ and $z = 1$ respectively, $s = \left[N^{-1}\sum (x_i - \bar{x})^2\right]^{1/2}$, and $t$ is defined by the relation $m/N = \int_t^\infty \lambda(y)\,dy$.

3. Karl Pearson's Derivation of $r^*$

The essence of Pearson's method was that he found a relation between the population means under the assumptions of normality underlying the $Z$ distribution, and the existence of a linear regression of $X$ on $Y$. More precisely, let

$$a_1 = E(X \mid Y \geq \omega), \quad b_1 = E(Y \mid Y \geq \omega), \quad a_0 = E(X \mid Y < \omega), \quad b_0 = E(Y \mid Y < \omega).$$

4. Bounds for Biserial $r^*$

The ordinary product moment $r$ is bounded between $-1$ and $+1$. It is evident on slight investigation that $r^*$ does not share this property.

Let us consider for a moment that we are dealing with a fixed set of $z_i$: namely, $N_0$ of the $z_i$ are 0 and $N_1$ of them are 1, where $N_0 + N_1 = N$. Then $r^*$ becomes

$$r^* = s^{-1}(\bar{x}_1 - \bar{x}_0)\,N_0 N_1 N^{-2}\,\lambda^{-1}(c), \tag{1.3}$$

where $c$ is determined from $N_1/N = \int_c^\infty \lambda(y)\,dy$.

It can now be shown that, for $\rho = 0$,

(1.4)

has the two-sample $t$ distribution with $N-2$ degrees of freedom. This is a transformation of a result by Lev [10], of which more will be said in Section 6 of this chapter. For the case of fixed proportions $N_1/N$ and $N_0/N$, and fixed sample size $N$, then, $t$ allows us to compute upper and lower bounds for $r^*$, and to make probability statements about these bounds. A nomograph for the computation of $r^*$ by Dunlap [4] is given in Fig. 1 at the end of this paper in order to illustrate the wide range of values which can be obtained. Dunlap uses slightly different notation. He transforms expression (1.1) for $r^*$ into

$$r^* = s^{-1}(\bar{x}_1 - \bar{x})(m/N)\,\lambda^{-1}(t) \tag{1.5}$$

by using the relation $(m/N)\bar{x}_1 + (1 - m/N)\bar{x}_0 = \bar{x}$, and then considers $\bar{x}_1$ to be the larger of the two sample means. In this way he obtains only values of $r^* \geq 0$.
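The unboundedness of the biserial coefficient can be seen on a small artificial sample. The following is a minimal Python sketch, not part of the original report. It assumes the sample form $r^* = s^{-1}(\bar{x}_1 - \bar{x}_0)(m/N)(1 - m/N)\lambda^{-1}(t)$, which is what (1.5) reduces to under the relation $(m/N)\bar{x}_1 + (1 - m/N)\bar{x}_0 = \bar{x}$; the function name `biserial_r` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def biserial_r(x, z):
    """Biserial r* for a continuous sample x and a 0/1 sample z (assumed form)."""
    n = len(x)
    m = sum(z)                          # number of z_i equal to 1
    p = m / n
    std_normal = NormalDist()           # standard normal, density lambda
    t = std_normal.inv_cdf(1.0 - p)     # cut point whose upper-tail area is m/N
    xbar = fmean(x)
    xbar1 = fmean([xi for xi, zi in zip(x, z) if zi == 1])
    xbar0 = fmean([xi for xi, zi in zip(x, z) if zi == 0])
    s = sqrt(fmean([(xi - xbar) ** 2 for xi in x]))  # divisor N, as in the text
    return (xbar1 - xbar0) * p * (1.0 - p) / (s * std_normal.pdf(t))


# Hypothetical data: the two groups are perfectly separated.
x = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
z = [0, 0, 0, 1, 1, 1]
r_star = biserial_r(x, z)
print(r_star)  # exceeds 1 for this sample
```

For this sample the product moment correlation between x and z equals 1, while the biserial value is larger than 1 by the factor $(N_0 N_1)^{1/2} N^{-1} \lambda^{-1}(c)$ evaluated at equal group sizes.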

5. Machine Methods for the Computation of $r^*$

There are several mechanical methods available for the computation of $r^*$. Royer [14] devised a method for computing $r^*$ for each item in a test, using punched cards. His method requires for each $r^*$ the necessary steps for finding $\bar{x}$ and $s$, and then a sorting of all cards, separating those which have $z = 1$, and a tabulation of part of them. Du Bois [3] improved upon Royer's method by reducing the number of steps to a point where only one complete sorting is required and several $r^*$ may be computed simultaneously.

6. Point Biserial Correlation.

Model: Let $X$ again be a continuous variable, and $z_i$ $(i = 1, \ldots, N)$ be a fixed point in $N$-space. $N_0$ of the $z_i$ are 0, and $N_1$ of them are 1. We consider a model in which $X$ has mean $\mu$ and standard deviation $\sigma$, and the variables $x_i - \mu - a(z_i - \bar{z})$ are $N(0, \sigma(1-\rho^2)^{1/2})$, where $a = \rho\sigma N (N_0 N_1)^{-1/2}$.

Under this model the ordinary product moment $r$ between $x$ and $z$ is called the point biserial coefficient of correlation, and is defined from the sample to be

$$r = s^{-1}(\bar{x}_1 - \bar{x}_0)\,N^{-1}(N_0 N_1)^{1/2}. \tag{1.6}$$

The relation between the expressions for the biserial and point biserial coefficients is

$$r^* = r\,(N_0 N_1)^{1/2} N^{-1} \lambda^{-1}(c),$$

where $c$ is again defined by $N_1/N = \int_c^\infty \lambda(y)\,dy$ as in (1.3).

It can easily be shown that $r$ is a maximum likelihood estimate of $\rho$. Lev [10], using the model above, showed that the distribution of $r$ is equivalent to that of Student's $t$ when $\rho = 0$, and a non-central $t$ when $\rho \neq 0$. The parameter of the non-central $t$ distribution is $\delta = N^{1/2}\rho(1-\rho^2)^{-1/2}$, which is independent of $N_0$ and $N_1$. Summarizing, we can use $t = (N-2)^{1/2} r (1-r^2)^{-1/2}$ to test the hypothesis $\delta = 0$. When $\rho \neq 0$, we can compute confidence limits for $\delta$, and hence for $\rho$, or test the hypothesis $\rho = \rho_0$, by making use of Table IV given by Johnson and Welch [8] for the non-central $t$ distribution. It should be emphasized that the results hold only for the case of fixed numbers $N_0$ and $N_1$ in the two groups, the same in all of the samples with which our particular sample is compared.
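The equivalence between the point biserial coefficient and the two-sample $t$ statistic can be checked numerically. The following is a minimal Python sketch, not part of the original report. It assumes definition (1.6) with $s$ computed using the divisor $N$, and compares $t = (N-2)^{1/2} r (1-r^2)^{-1/2}$ with the usual pooled-variance two-sample $t$; the helper names and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean


def split_groups(x, z):
    """Return the x values with z = 0 and those with z = 1."""
    g0 = [xi for xi, zi in zip(x, z) if zi == 0]
    g1 = [xi for xi, zi in zip(x, z) if zi == 1]
    return g0, g1


def point_biserial_r(x, z):
    """r = s^-1 (xbar1 - xbar0) N^-1 (N0 N1)^(1/2), with s using divisor N."""
    g0, g1 = split_groups(x, z)
    n, n0, n1 = len(x), len(g0), len(g1)
    xbar = fmean(x)
    s = sqrt(fmean([(xi - xbar) ** 2 for xi in x]))
    return (fmean(g1) - fmean(g0)) * sqrt(n0 * n1) / (n * s)


def t_from_r(r, n):
    """t = (N - 2)^(1/2) r (1 - r^2)^(-1/2)."""
    return sqrt(n - 2) * r / sqrt(1.0 - r * r)


def pooled_two_sample_t(x, z):
    """Classical two-sample t with pooled variance on N - 2 degrees of freedom."""
    g0, g1 = split_groups(x, z)
    m0, m1 = fmean(g0), fmean(g1)
    within = sum((v - m0) ** 2 for v in g0) + sum((v - m1) ** 2 for v in g1)
    pooled_var = within / (len(x) - 2)
    return (m1 - m0) / sqrt(pooled_var * (1.0 / len(g0) + 1.0 / len(g1)))


# Hypothetical data with N0 = 4 and N1 = 3.
x = [2.1, 3.4, 1.9, 4.0, 5.2, 4.8, 3.9]
z = [0, 0, 0, 0, 1, 1, 1]
r = point_biserial_r(x, z)
t_via_r = t_from_r(r, len(x))
t_direct = pooled_two_sample_t(x, z)
print(r, t_via_r, t_direct)  # the two t values agree up to rounding
```

The agreement is algebraic: the total sum of squares splits into the within-group part and $(N_0 N_1 / N)(\bar{x}_1 - \bar{x}_0)^2$, so $1 - r^2$ is the within-group share.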

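CHAPTER II

THE ASYMPTOTIC VARIANCE OF $r^*$

H. E. Soper [15] gives the following expression for the asymptotic variance of $r^*$ as far as terms of order $N^{-1}$:

$$V(r^*) = N^{-1}\left\{\rho^4 + \rho^2\left[pq\,\omega^2\lambda^{-2}(\omega) + (2p-1)\,\omega\,\lambda^{-1}(\omega) - 2.5\right] + pq\,\lambda^{-2}(\omega)\right\}. \tag{2.1}$$

This chapter will contain an investigation of the critical values of $V(r^*)$ considered as a function of $p$ and $\rho$. We shall first prove two lemmas.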

LEMMA 1. If $p = \int_x^\infty \lambda(y)\,dy$, where $\lambda(y) = (2\pi)^{-1/2}\exp(-y^2/2)$, then $(1-2p)\lambda(x) > p(1-p)x$ for all $x > 0$.

Setting $g'(x) = 0$ gives $p^2 - p + 2\lambda^2(x) = 0$, which for the interval $0 < x$

$K(x) > 0$: $\lambda^2(x)(16x^2+8) < 1$ and $S(x)$ has a minimum at the point $x$.

$K(x) < 0$: $\lambda^2(x)(16x^2+8) > 1$ and $S(x)$ has a maximum at the point $x$.

Now consider the equation $K(x) = 0$, or defining $\mu(x) = \lambda^2(x)(16x^2+8)$, $\mu(x) = 1$. $\mu(x)$ is unimodal with its maximum at $x = 2^{-1/2}$ and minima at $x = 0, \infty$. Furthermore $\mu(0) = 1.27$, $\mu(\infty) = 0$. Therefore, $\mu(x) = 1$ has a single solution $x = x_1$ in the interval $0 < x$

Proof: Assume that such a point $x_3$ exists. Then substituting from (2.5) in $S(x)$, we have by hypothesis

$$S(x_3) = \lambda(x_3)\left[1 - 8\lambda^2(x_3)\right]^{1/2} - 2x_3\lambda^2(x_3) < 0, \tag{2.6}$$

or $\lambda^2(x_3)(4x_3^2+8) > 1$, which means that $(4x_3^2+8) > 2\pi\exp(x_3^2)$. The least value which $x_3$ can take is $x_1$, but $x_1 > 1.3$. So, substituting $x_3 = 1.3$ in the above, we have $14.76 > 2\pi\exp(1.69)$, which is false, and a fortiori false for all $x > 1.3$.

Therefore, there are no critical values of $g(x)$ in the range $0 < x < \infty$ except for a maximum at $x_2$. Combining this result with (2.4), we have $g(x) > 0$ for all $x > 0$, which proves the lemma.

LEMMA 2. If $p = \int_x^\infty \lambda(y)\,dy$, where $\lambda(y) = (2\pi)^{-1/2}\exp(-y^2/2)$, then $p(1-p) \geq \lambda^2(x)\,\pi/2$ for all $x$, $-\infty < x < \infty$, with equality at $x = 0, \pm\infty$.

Define $F(x) = \int_{-\infty}^{x} \lambda(y)\,dy$. Then $p(1-p) = (1-F)F$, and we wish to show $F(x)[1-F(x)] \geq [\exp(-x^2)]/4$. Let

$$h(x) = F(x) - F^2(x) - [\exp(-x^2)]/4.$$

$h(x)$ is an even function of $x$, so we need study only the case $0 \leq x < \infty$. Differentiating once produces

$$h'(x) = \lambda(x) - 2F(x)\lambda(x) + (x/2)\exp(-x^2),$$

which with a little simplification becomes

$$h'(x) = \lambda(x)\left\{1 - 2F(x) + x(\pi/2)^{1/2}\exp(-x^2/2)\right\}. \tag{2.7}$$

It is easily shown that

(2.8) $h(x)$ is continuous, $h(0) = h(\infty) = 0$, $h'(0) = 0$.

By the law of the mean for integrals $F(x) \leq (1/2) + x(2\pi)^{-1/2}$. Hence, substituting in (2.7), we obtain

$$h'(x) \geq \lambda(x)\left\{1 - 2\left[x(2\pi)^{-1/2} + 1/2\right] + x(\pi/2)^{1/2}\exp(-x^2/2)\right\} = x\lambda(x)\left\{(\pi/2)^{1/2}\exp(-x^2/2) - (2/\pi)^{1/2}\right\}. \tag{2.9}$$

From (2.9) it is clear that in some neighborhood of $x = 0$ we have $h'(x) > 0$.

Now examine the function $G(x)$.

$$G'(x) = -2\lambda(x) + (\pi/2)^{1/2}(1-x^2)\exp(-x^2/2) = \lambda(x)\left\{\pi(1-x^2) - 2\right\}.$$

In the interval $0 \leq x$

$> 0$ in some neighborhood about 0. In the light of these facts and (2.8) it is evident that $h(x) > 0$ for all $x$ in $0 < x < \infty$; otherwise, $h(x)$ would have at least two critical points in the interval. This proves the lemma.
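The inequalities of the two lemmas of this chapter can be spot-checked on a grid. The following is a minimal Python sketch, not part of the original report and not a proof. It assumes the lemmas read $(1-2p)\lambda(x) > p(1-p)x$ for $x > 0$ and $p(1-p) \geq \lambda^2(x)\pi/2$ for all $x$, with $p$ the upper-tail area beyond $x$; the function names and the grid are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def lam(x):
    """Standard normal density lambda(x)."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)


def upper_tail(x):
    """p = integral of lambda from x to infinity, via erfc for tail accuracy."""
    return 0.5 * erfc(x / sqrt(2.0))


def lemma1_gap(x):
    """(1 - 2p) lambda(x) - p(1 - p) x, claimed positive for x > 0."""
    p = upper_tail(x)
    return (1.0 - 2.0 * p) * lam(x) - p * (1.0 - p) * x


def lemma2_gap(x):
    """p(1 - p) - lambda(x)^2 pi / 2, claimed non-negative for all x."""
    p = upper_tail(x)
    return p * (1.0 - p) - lam(x) ** 2 * pi / 2.0


# Grid 0.01, 0.02, ..., 6.00; negative x follows by symmetry for Lemma 2.
grid = [k / 100.0 for k in range(1, 601)]
lemma1_holds = all(lemma1_gap(x) > 0.0 for x in grid)
lemma2_holds = all(lemma2_gap(x) > 0.0 for x in grid)
print(lemma1_holds, lemma2_holds, lemma2_gap(0.0))
```

At $x = 0$ the second gap vanishes up to rounding, which corresponds to the case of equality stated in Lemma 2.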

THEOREM I. $V(r^*)$ has an absolute minimum for any fixed $\rho$ at $p = 1/2$.

Proof in three parts:

PART 1: It is easily shown that $V(r^*)$ has a relative minimum at $p = 1/2$ for any fixed $\rho$. Indeed,

$$V(r^*) = N^{-1}\left\{\rho^4 - 2.5\rho^2 + \rho^2 A + B\right\}, \tag{2.12}$$

where $A = pq\,\omega^2\lambda^{-2}(\omega) + (2p-1)\,\omega\,\lambda^{-1}(\omega)$, $B = pq\,\lambda^{-2}(\omega)$.

Recalling that $d\omega/dp = -\lambda^{-1}(\omega)$

and hence $\left.\dfrac{d^2V}{dp^2}\right|_{\omega=0} = 2\pi N^{-1}\left\{\rho^2(\pi-4) + \pi - 2\right\}$. It is easily seen that

By Lemma 2, $pq\,\lambda^{-2}(\omega) \geq \pi/2$ with equality at $\omega = 0$ or $p = 1/2$.

Combining parts 1 to 3, we have that the absolute minimum of $V(r^*)$ for any $\rho$ occurs at $p = 1/2$, which a fortiori proves the theorem.

The points brought up in this chapter are adequately illustrated in Table I, which gives the asymptotic standard deviation of $r^*$ as a function of $p$ and $\rho$.
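The behaviour described by Theorem I can be explored numerically from expression (2.12). The following is a minimal Python sketch, not part of the original report and not a reproduction of Table I. It assumes (2.12) as written above with $q = 1 - p$ and $\omega$ the point whose upper-tail area is $p$; the function name `asymptotic_variance` and the grid of values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def asymptotic_variance(rho, p, n=1):
    """N^-1 {rho^4 - 2.5 rho^2 + rho^2 A + B}, with A and B as in (2.12)."""
    std_normal = NormalDist()
    omega = std_normal.inv_cdf(1.0 - p)   # cut value with upper-tail area p
    lam = std_normal.pdf(omega)           # lambda(omega)
    q = 1.0 - p
    a = p * q * omega ** 2 / lam ** 2 + (2.0 * p - 1.0) * omega / lam
    b = p * q / lam ** 2
    return (rho ** 4 - 2.5 * rho ** 2 + rho ** 2 * a + b) / n


# Asymptotic standard deviation (times N^(1/2)) on a small hypothetical grid.
for rho in (0.0, 0.3, 0.6, 0.9):
    row = [sqrt(asymptotic_variance(rho, p)) for p in (0.1, 0.3, 0.5, 0.7, 0.9)]
    print(rho, [round(v, 3) for v in row])
```

In each printed row the middle entry, for $p = 1/2$, is the smallest, and the rows are symmetric about it because replacing $p$ by $1 - p$ changes the sign of $\omega$ and of $2p - 1$ together.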

CHAPTER III

A SPECIAL CASE OF THE LIMITING DISTRIBUTION OF $r^*$

We shall need the following three theorems.

THEOREM A [13]: Let $(V_{1N}, \ldots, V_{kN})$ and $(V_1, \ldots, V_k)$ be random vectors, where $N = 1, 2, \ldots$ If

(a) $\operatorname{dlim}(V_{1N}, \ldots, V_{kN}) = (V_1, \ldots, V_k)$,

(b) $b_1, b_2, \ldots$ is a sequence of real positive numbers such that $b_N \to 0$ as $N \to \infty$,

(c) $H(v_1, v_2, \ldots, v_k)$ is a function of the real variables $v_1, \ldots, v_k$ which has a total differential at the point $(0, \ldots, 0)$, and

(d) $H_i = \partial H/\partial v_i$ at the point $(0, \ldots, 0)$,

then, if $W_N = b_N^{-1}\{H(b_N V_{1N}, \ldots, b_N V_{kN}) - H(0, \ldots, 0)\}$,

$$\operatorname{dlim} W_N = H_1 V_1 + \cdots + H_k V_k.$$

THEOREM B [1]: Let $Z_N = X_N + Y_N$, where $X_N$, $Y_N$, $Z_N$ are sequences of random variables. If

(a) $\operatorname{dlim} X_N = X$,

(b) $\operatorname{plim} Y_N = c$, where $c$ is a constant,

then $\operatorname{dlim} Z_N = X + c$.

THEOREM C [1]: Let $Z_N = X_N Y_N$, where $X_N$, $Y_N$, $Z_N$ are sequences of random variables. If

(a) $\operatorname{dlim} X_N = X$,

(b) $\operatorname{plim} Y_N = c$, where $c$ is a constant,

then $\operatorname{dlim} Z_N = cX$.

As a corollary to Theorem C, we have for $c = 0$ the result $\operatorname{plim} Z_N = 0$.

Note: No condition of independence is required in Theorems B and C.

LEMMA 3: If

$$m/N = \int_t^\infty \lambda(y)\,dy, \qquad p = \int_\omega^\infty \lambda(y)\,dy,$$

where $m$ is the number of successes in $N$ trials with the probability of success, $p$, a constant for each trial, then the necessary and sufficient condition that

$$\operatorname{plim} N^{1/2}\left[\lambda(\omega) - \lambda(t)\right] = 0$$

is $\omega = 0$.

Proof:

$$(m/N) - p = \int_t^\omega \lambda(y)\,dy,$$

which by the law of the mean for integrals gives

$$(m/N) - p = -(t - \omega)\lambda(\xi), \qquad t < \xi < \omega. \tag{3.1}$$

Now expand $\lambda(t)$ in a Taylor series about the point $\omega$,

$$\lambda(t) = \lambda(\omega) + \lambda'(\omega)(t - \omega) + o(t - \omega). \tag{3.2}$$

$m$ is a binomial variable with parameters $N$ and $p$, so we have

We can write (3.2) in the form

Now as $N \to \infty$, the second term on the right tends in probability to 0 by the corollary to Theorem C, since $N^{1/2}(t - \omega)$ has a limiting law. From this result and Theorem B it can be seen that the corollary to Theorem C gives a necessary and sufficient condition in this case: namely,

$$\operatorname{plim} N^{1/2}\left[\lambda(\omega) - \lambda(t)\right] = 0$$

if and only if $\lambda'(\omega) = 0$. But $\omega$ is a fixed finite number and $\lambda'(\omega) = -\omega\lambda(\omega)$, so $\omega$ must be 0. This proves the lemma.

1. Derivation of the Distribution.

We start with expression (1.2) for the biserial correlation coefficient.

$$r^* = \frac{\left(N^{-1}\sum x_i z_i - \bar{x}\,\bar{z}\right)\lambda^{-1}(t)}{\left[N^{-1}\sum (x_i - \bar{x})^2\right]^{1/2}}$$

This expression may be written in the form

$$r^* = \left(\overline{xz} - \bar{x}\,\bar{z}\right)\left(\overline{x^2} - \bar{x}^2\right)^{-1/2}\lambda^{-1}(t).$$

Thus $r^*$ is essentially a function of sample means. It is evident that $r^*$ is invariant under a transformation of the form $y = x/\sigma$. Hence, in the model for $r^*$ put $\sigma = 1$.

For our purposes we will need essentially a reformulation of Theorem A. Let $(T_{1\alpha}, T_{2\alpha}, T_{3\alpha}, T_{4\alpha})$ $(\alpha = 1, 2, \ldots, N)$ be a set of independent, identically distributed random vectors with finite covariances $\sigma_{ij}$. Now define

(a) $\bar{T}_{iN} = \bar{T}_i - E T_i$ $(i = 1, \ldots, 4)$,

(b) $b_N = N^{-1/2}$,

(c) $H(v_1, \ldots, v_4) = (v_1 - v_2 v_3)(v_4 - v_2^2)^{-1/2}$,

(d) $V_{iN} = N^{1/2}\,\bar{T}_{iN}$.

By the generalization of the Lindeberg-Levy form of the Central Limit Theorem to the case of random vectors, we have

where $(V_1, \ldots, V_4)$ is a normal random vector with mean $(0, \ldots, 0)$ and covariance matrix $(\sigma_{ij})$, $i, j = 1, \ldots, 4$. We have now satisfied the conditions of Theorem A, and so

$$\operatorname{dlim} W_N = \operatorname{dlim} N^{1/2}\left\{H(\bar{T}_{1N}, \ldots, \bar{T}_{4N}) - H(0, \ldots, 0)\right\} = N\Big(0, \Big\{\sum_i \sum_j H_i H_j \sigma_{ij}\Big\}^{1/2}\Big), \tag{3.6}$$

where

Now in order actually to find the variance of the limiting normal distribution, we adopt a mechanical technique due to P. L. Hsu which amounts to the same thing as using the form of Theorem A above, but which saves labor. What Theorem A actually tells us is to use a two-term Taylor expansion of the function of sample means about their expected values.

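Define variables of the form $T' = N^{1/2}(\bar{T} - ET)$. Then,

$$\overline{xz} = N^{-1/2}(xz)' + EXZ, \quad \bar{x} = N^{-1/2}x', \quad \bar{z} = N^{-1/2}z' + p, \quad \overline{x^2} = N^{-1/2}(x^2)' + 1. \tag{3.7}$$

Upon substituting expressions (3.7) in (3.4) we obtain

$$r^* = \left\{\left(N^{-1/2}(xz)' + EXZ\right) - \left(N^{-1/2}x'\right)\left(N^{-1/2}z' + p\right)\right\}\left\{N^{-1/2}(x^2)' + 1 - \left(N^{-1/2}x'\right)^2\right\}^{-1/2}\lambda^{-1}(t). \tag{3.8}$$

Now, the efficiency of Hsu's technique comes in combining the terms in (3.8), removing those which are $o(N^{-1/2})$. Using a two-term negative binomial expansion for the second term, we get

Now by Theorem C

$$N^{1/2}\left\{r^* - (EXZ)\lambda^{-1}(t)\right\}$$

will have the same limiting law as

$$\lambda^{-1}(\omega)\left\{(xz)' - p\,x' - (EXZ)(x^2)'/2\right\}.$$

Applying Theorem A, we have the result

$$\operatorname{dlim} N^{1/2}\left[r^* - (EXZ)\lambda^{-1}(t)\right] = N\left(0, \lambda^{-1}(\omega)\left\{V\left(XZ - pX - (EXZ)X^2/2\right)\right\}^{1/2}\right). \tag{3.11}$$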

Unfortunately, this is not quite satisfactory, since what we desire is the limiting distribution of

$$N^{1/2}(r^* - \rho).$$

It can easily be seen that the two expressions are not the same. Indeed,

(3.12) $N^{1/2}\left[r^* - (EXZ)\lambda^{-1}(\omega)\right] = N^{1/2}\big[r^* -$

Integration of $b_k$ by parts gives the recursion relationship

The $\alpha_{kn}$ may now be written down. Note that $\alpha_{kn}$ is independent of $k$ for $k \geq 1$.

Making use of the facts that $b_0 = p$, $b_1 = \lambda(\omega)$, $\omega = 0$, we compute the following constants.

We now have from (3.13),

Hence, from (3.15),

$$\operatorname{dlim} N^{1/2}(r^* - \rho) = N\left(0, \left\{\rho^4 - 2.5\rho^2 + \pi/2\right\}^{1/2}\right). \tag{3.16}$$

Among other things (3.16) provides a check for Soper's result (2.1).

2. Variance Stabilizing Transformation for the Case $\omega = 0$.

It is desirable to find a function $f(r^*)$ such that

$$\operatorname{dlim} N^{1/2}\left[f(r^*) - f(\rho)\right] = N(0, 1).$$

This function $f(r^*)$ has several advantages:

(a) Its standard error is practically independent of $\rho$;

(b) $f(r^*)$ will tend to normality faster than $r^*$ no matter what value $\rho$ takes on;

(c) The form of the distribution will be nearly the same for all $\rho$ for moderate $N$, while the distribution of $r^*$ for large $N$ will have a variance $N^{-1}(\rho^4 - 2.5\rho^2 + \pi/2)$ which is markedly changed, becoming peaked for $\rho$ near $\pm 1$, and flat for $\rho$ near 0.

In connection with this type of transformation see [6] Chapter VI.

Consider a Taylor expansion of $f(r^*)$ about $\rho$ as far as terms of order $N^{-1/2}$.

$$f(r^*) = f(\rho) + f'(\rho)(r^* - \rho) + o(N^{-1/2}). \tag{3.17}$$

$E\left[f(r^*) - f(\rho)\right]^2 = 1/N$ by hypothesis.

$E\left[f(r^*) - f(\rho)\right]^2 = \left[f'(\rho)\right]^2 E(r^* - \rho)^2 + o(N^{-1})$ by (3.17).

Then aside from terms $o(N^{-1})$ we have, equating the two previous results, a differential equation

(3.18)

which can be rewritten as
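A transformation of this kind can be tabulated numerically. The following is a minimal Python sketch, not part of the original report and not the author's closed form. It assumes that equating the two expectations above, with $E(r^* - \rho)^2$ taken from (3.16), gives $[f'(\rho)]^2(\rho^4 - 2.5\rho^2 + \pi/2) = 1$, and it further assumes the normalization $f(0) = 0$; the names `integrand` and `stabilized` are hypothetical.

```python
from math import pi


def integrand(rho):
    """Assumed f'(rho) = (rho^4 - 2.5 rho^2 + pi/2)^(-1/2)."""
    return (rho ** 4 - 2.5 * rho ** 2 + pi / 2.0) ** -0.5


def stabilized(r_star, steps=1000):
    """Composite Simpson approximation to the integral of f' from 0 to r_star.

    `steps` must be even. Negative arguments give negative values, so the
    tabulated function is odd.
    """
    h = r_star / steps
    total = integrand(0.0) + integrand(r_star)
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0
        total += weight * integrand(k * h)
    return total * h / 3.0


for value in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(value, round(stabilized(value), 4))
```

The quartic under the root stays positive for all $\rho$, so the integrand is defined even for sample values of $r^*$ beyond 1, where it grows rapidly.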

CHAPTER IV

EFFICIENCY OF THE COEFFICIENT OF BISERIAL CORRELATION

1. Preliminary Matters

From the mathematical model given in Chapter I, Section 2, we know that

$$P(X \leq x, Z = 0) = \int_{-\infty}^{x}\int_{-\infty}^{\omega} \phi(x,y)\,dy\,dx, \quad\text{and}$$

$$P(X \leq x, Z = 1) = \int_{-\infty}^{x}\int_{\omega}^{\infty} \phi(x,y)\,dy\,dx.$$

Therefore, the probability element of the sample $(x_1, z_1), \ldots, (x_N, z_N)$ may be written

The Method of Maximum Likelihood will now be invoked in order to give us estimates $\hat\rho$, $\hat\omega$, $\hat\sigma$ of the population parameters $\rho$, $\omega$, $\sigma$ respectively. These estimates have several nice properties. They are consistent, tend in distribution to normality as the sample size increases, have minimum variance in the limit at least, and provide sufficient statistics where any exist. The Method of Maximum Likelihood was first introduced by Fisher in 1921 [5], and was substantially improved upon by the same author in 1925 [7]. These papers are both described along with other pertinent material by Kendall in Chapter XVII of his book [9]. For a more rigorous treatment the reader is referred to Chapters XXXII and XXXIII of Cramér [1].

The likelihood function of a sample $(x_1, z_1), \ldots, (x_N, z_N)$ is defined to be

$$L = \prod_{i=1}^{N} f(x_i, z_i). \tag{4.2}$$

It will, as usual, be convenient to take logarithms and maximize $\log L$.
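The quantity to be maximized can be written out concretely. The following is a minimal Python sketch, not part of the original report. It assumes the model of Chapter I, Section 2 with $X$ distributed as $N(0, \sigma)$ and $Y$ as $N(0, 1)$ with correlation $\rho$, so that integrating $\phi(x,y)$ over $y$ below or above $\omega$ gives $f(x, 0) = \sigma^{-1}\lambda(x/\sigma)\,\Phi(u)$ and $f(x, 1) = \sigma^{-1}\lambda(x/\sigma)\,[1 - \Phi(u)]$ with $u = (\omega - \rho x/\sigma)(1-\rho^2)^{-1/2}$ and $\Phi$ the standard normal distribution function; the function names and the data are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def log_f(x, z, rho, omega, sigma):
    """log f(x, z) under the assumed dichotomized bivariate normal model."""
    u = x / sigma
    arg = (omega - rho * u) / sqrt(1.0 - rho * rho)
    # z = 0 corresponds to Y < omega, z = 1 to Y >= omega.
    tail = STD_NORMAL.cdf(arg) if z == 0 else STD_NORMAL.cdf(-arg)
    return log(STD_NORMAL.pdf(u) / sigma) + log(tail)


def log_likelihood(xs, zs, rho, omega, sigma):
    """log L = sum over the sample of log f(x_i, z_i)."""
    return sum(log_f(x, z, rho, omega, sigma) for x, z in zip(xs, zs))


# Hypothetical sample and parameter values.
xs = [0.3, -1.2, 2.0, 0.8, -0.5]
zs = [1, 0, 1, 1, 0]
print(log_likelihood(xs, zs, rho=0.6, omega=0.4, sigma=2.0))
```

A numerical optimizer applied to `log_likelihood` over $\rho$, $\omega$, $\sigma$ would give the maximum likelihood estimates; the sketch stops short of that step because the choice of optimizer is outside the text.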

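Because of the complexities inherent in the calculations of first and second partial derivatives of $\log L$, the work will have to be done in stages. The following notation will be used

(4.4)

(i) $\dfrac{\partial f_0}{\partial\theta_j} = \left.\dfrac{\partial \log f(x,z)}{\partial\theta_j}\right|_{z=0}$

(ii) $\dfrac{\partial^2 f_0}{\partial\theta_j\,\partial\theta_k} = \left.\dfrac{\partial^2 \log f(x,z)}{\partial\theta_j\,\partial\theta_k}\right|_{z=0}$

(iii) $\dfrac{\partial f_1}{\partial\theta_j} = \left.\dfrac{\partial \log f(x,z)}{\partial\theta_j}\right|_{z=1}$

(iv) $\dfrac{\partial^2 f_1}{\partial\theta_j\,\partial\theta_k} = \left.\dfrac{\partial^2 \log f(x,z)}{\partial\theta_j\,\partial\theta_k}\right|_{z=1}$

where $\theta_j$, $\theta_k$ run over the parameters $\rho$, $\omega$, $\sigma$ and $f(x,z)$ is defined by (4.1).

(v) $\displaystyle\int_{-\infty}^{a} \phi(x,y)\,dy = R(a), \qquad \int_{a}^{\infty} \phi(x,y)\,dy = S(a)$

(vi) $\displaystyle E_0(\partial f_0/\partial\theta_j) = q^{-1}\int_{-\infty}^{\infty}\int_{-\infty}^{\omega} (\partial f_0/\partial\theta_j)\,\phi(x,y)\,dy\,dx$, the conditional expectation of $\partial \log f/\partial\theta_j$ given $z = 0$;

(vii) $\displaystyle E_1(\partial f_1/\partial\theta_j) = p^{-1}\int_{-\infty}^{\infty}\int_{\omega}^{\infty} (\partial f_1/\partial\theta_j)\,\phi(x,y)\,dy\,dx$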

..e•

26w 00

N (l",Zt) f (a¢(Xi,y)/OO" )dY+Zi! (~(Xily)/a )" whichwill in the limit be the minimum variance among all variances

1\of estimates of .f as defined by our model. V(f ) can beobtained once we have the elements of the information matrix.

2. Determination of the Information Matrix.

By definition the information matrix is

and we have

(4.8)

where $\Lambda = (\sigma_{ij})$ is the matrix of covariances of $\hat\rho$, $\hat\omega$, $\hat\sigma$. The derivation of the elements of the information matrix will be given in two parts, each consisting of several cases.

Part I, Case 1:

USing the notation of (4.4) we have


.,?t/a f2 = fll((j» j (a2;/af 2)dy_( J(?Jt)/af )dy)1 1l-2((j»~ -00 -00

00 en2 I 2) -1 f f 2 2EO(d f

O'0 J =q ¢{x,y)('O fOldf )dy dx,

-00 -00

which becomes

00 en 00 en 2q-l J Jccl¢lof2)d:y dx_q-l J(f (-a¢I'Of )dy) R-l(w)dx.

-00 -00 ) -00 -00 ..-I'- ~ \.. ., ..

A B

The well known results

are to be used extensively in this chapter.

It is important to notice, first of all, that $A$ and $B$ are invariant under a change of scale of the $X$ variable. Accordingly, we can replace $x/\sigma$ by $x$, or in other words let $\sigma = 1$. In view of (4.9)

1\ dx.

~

Putting $\sigma = 1$ in $\phi(x,y)$, and differentiating, we obtain


(4.10)

$$\frac{\partial\phi}{\partial x} = -(x - \rho y)\,\phi(x,y)\,(1-\rho^2)^{-1}$$

$$\frac{\partial\phi}{\partial y} = -(y - \rho x)\,\phi(x,y)\,(1-\rho^2)^{-1}$$

03¢(x,y) = ..¢(x,y) {2,f Cx-.f' y) )CX- f' y)2 _ 1 1.ax2ay (1- .f2)2 (l_j'2)2 1-f2J

(y- j' X)11-J 2 J

Substituting in A from (4.10), and letting

we obtain

00 /1 2 2 -2 f 2·{ 2 2 1 2}A = .q" exp(-m /2)(1. f) . (z -1) mel. f j..z f (1- f) J.

-00

$z$ has the frequency function of a variable which is $N(0,1)$, so $A = 0$.

Now consider $B$:

)

2 2 2 2 -2= (x-fro) ¢ (x,w) (1-f) .

Case 2:


d2r/d f 2 = f(")[ (d2t1!df2)iJ¥-(1 (ar;!d f)iJ¥)JS-2(,.).

2/221 2El (d f l d f ) and Eo(O f o d f ) are the same except for thelimits of integration on y and q, which ris-· nO!!,:':lz.eplaced by p •

..With the aid of the fundamental relation (4.5), the following

ma.y be written:

00

d2l L J 2 2 '2 -2(4.11) E 0 8'2 = -N ¢ (x,O)(x- .f 0) (1- f ).d f -00

2E d log L

?m2

d'2fol?m2 = ~(0)(d¢(x,0)/?m)_¢2(x,0)} R-2(0)

Again let $\sigma = 1$.

00 00

Eo

{d'2fo l?m'2) = q-l f (d¢(x,co)/cm)dx-q-1 J'I ¢2(x,0)R-l (0)dx.-00 -00

00

From ~ ~.l.O) A::o oq.-l I (0)-J x) (1..J 2) -J !(x.,O)dx.-00

Let $z = (x - \rho\omega)(1-\rho^2)^{-1/2}$. Then, we get

..e

...-

exp(-z.2/2)dZ

== _q-l(2n)-1/2~ epx(~2/2).

Hence, 00

E ca2f /cm2 ) == _q-l(2n) -1/2~ exp( _m2/2)-q-1 f ¢2(x"m).o 0

-00

R-1(m)dx.

Again E1ca2fl/~2) is almost the same as Eo(??fo/(!iJ)2)

00

E1Ulfl/"al) :I p-l(2n)-1/2(j) exp(~2/2)-p-lf ¢2(x,~)s-1(m)dx.-00

From (4.5) we obtain

2Case %. E;; log L_......:.J:;...... diiaf .

31

We see immediately from (4.10) that the first term vanishes. In a similar way we can obtain

00 00

"lcclt 1/?t»¥) .. _p-l f (?I/J(o,f )u+p-lJ ¢2(x..a>)(x- f m)'-00 -00

(1- f 2) -lS... l(a»dxin which the first term vanishes. Rence ... by (4.5)

2 00(4.1~) E·al;aj,L;I NJ (x- f m}¢2(x..m)(1- .f2)-1

"00

Case 4: E 'd2

1og La;;a

( ~.14) ~ ~ ~(x,l') ( x2(1- f 2) -1 CT - 3_!xy(l-f 2) -1.

(1-2_ cr- l ].

[f 2x2l(1- 1'2)-2- 3x2 (1-.f2)-~

+4 f'xy(l-5'2)-10--3->20--2J00 '

The integral .f (d¢(x,co) /0 cr )dx is taken by the-QO

transformation z = (x- f (j) 0") (1- .f 2) -1/2 cy-l into the form

00

e..

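Case 5: $E\,\partial^2\log L/\partial\rho\,\partial\sigma$.

Let $r = (x-\rho\omega\sigma)\,\phi(x,\omega)\,\sigma^{-2}(1-\rho^2)^{-1}$. Then,

\[ \partial^2 f_0/\partial\rho\,\partial\sigma = \Big\{-R(\omega)(\partial r/\partial\sigma) + r\int_{-\infty}^{\omega}(\partial\phi/\partial\sigma)\,dy\Big\}\,R^{-2}(\omega), \]

\[ \partial^2 f_1/\partial\rho\,\partial\sigma = \Big\{S(\omega)(\partial r/\partial\sigma) - r\int_{\omega}^{\infty}(\partial\phi/\partial\sigma)\,dy\Big\}\,S^{-2}(\omega). \]

Using (4.5) and substituting for $r$, we obtain

(4.16) \[ E\,\frac{\partial^2\log L}{\partial\rho\,\partial\sigma} = N\int_{-\infty}^{\infty}(x-\rho\omega\sigma)\,\sigma^{-2}(1-\rho^2)^{-1}\phi(x,\omega)\Big\{R^{-1}(\omega)\int_{-\infty}^{\omega}(\partial\phi/\partial\sigma)\,dy - S^{-1}(\omega)\int_{\omega}^{\infty}(\partial\phi/\partial\sigma)\,dy\Big\}\,dx. \]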

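Case 6: $E\,\partial^2\log L/\partial\sigma^2$.

\[ \partial^2 f_0/\partial\sigma^2 = \Big\{R(\omega)\int_{-\infty}^{\omega}(\partial^2\phi/\partial\sigma^2)\,dy - \Big[\int_{-\infty}^{\omega}(\partial\phi/\partial\sigma)\,dy\Big]^2\Big\}\,R^{-2}(\omega), \]

\[ \partial^2 f_1/\partial\sigma^2 = \Big\{S(\omega)\int_{\omega}^{\infty}(\partial^2\phi/\partial\sigma^2)\,dy - \Big[\int_{\omega}^{\infty}(\partial\phi/\partial\sigma)\,dy\Big]^2\Big\}\,S^{-2}(\omega). \]

It may be shown after some tedious algebra that

\[ Nq\int_{-\infty}^{\infty}\int_{-\infty}^{\omega}q^{-1}(\partial^2\phi/\partial\sigma^2)\,dy\,dx + Np\int_{-\infty}^{\infty}\int_{\omega}^{\infty}p^{-1}(\partial^2\phi/\partial\sigma^2)\,dy\,dx = N\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(\partial^2\phi/\partial\sigma^2)\,dy\,dx = 0. \]

(4.17) \[ E\,\frac{\partial^2\log L}{\partial\sigma^2} = -N\int_{-\infty}^{\infty}\phi(x,\omega)\Big\{R^{-1}(\omega)\Big[\int_{-\infty}^{\omega}(\partial\phi/\partial\sigma)\,dy\Big]^2 + S^{-1}(\omega)\Big[\int_{\omega}^{\infty}(\partial\phi/\partial\sigma)\,dy\Big]^2\Big\}\,dx. \]

Part II.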

The integral expressions for the elements of the information matrix appear to be rather formidable. After forming the matrix, we will want its inverse; in particular, we want the element $V(\hat\rho)$ in its inverse. In order to examine the efficiency of $r^*$ as $\rho$ tends to 1, $\rho$ must be allowed to tend to 1 in the final expression for $V(\hat\rho)$. Here bad complications enter, since it can be determined that none of the expected values (4.11), (4.12), (4.13), (4.15), (4.16) exists when $\rho$ tends to 1. They all behave satisfactorily over the whole $x$ range $-\infty < x < \infty$

is expressible as a product $C(1-\rho)^{-k}H$, where $C$ is a function of $\rho$, $\omega$ which remains finite as $\rho$ tends to 1, $k$ is a positive integer, and $H$ is an integral which exists for all $\rho$, then we can invert the information matrix and see what happens to the element $V(\hat\rho)$ as $\rho$ tends to 1. Obviously, $\lim_{\rho\to 1}V(\hat\rho)$ will be a finite constant $\ge 0$, since the efficiency of a statistic must lie between 0 and 1, and we already know that $V(r^*)$ is finite from Chapter II.

Now make the following definitions.

(4.18) \[ M = \phi^2(x,\omega)\,\{R^{-1}(\omega)+S^{-1}(\omega)\}, \]

\[ G(u) = \int_{u}^{\infty}\exp(-t^2/2)\,dt. \]

Let

e"

the two exponential expressions in (4.19) to the form $\exp(-z^2/2)\cdot$(a function of $\rho$, $\omega$). The exponential can be put in the form

\[ \exp\{-(1+\rho^2)[2(1-\rho^2)]^{-1}[x-2\rho\omega(1+\rho^2)^{-1}]^2 - \omega^2(1+\rho^2)^{-1}\}; \]

whence it is seen that the desired transformation is

(4.20) $T_1$: \[ z = (1+\rho^2)^{1/2}(1-\rho^2)^{-1/2}[x-2\rho\omega(1+\rho^2)^{-1}], \]

which produces

\[ x = (1-\rho^2)^{1/2}(1+\rho^2)^{-1/2}z + 2\rho\omega(1+\rho^2)^{-1}, \]

\[ (\rho x-\omega)(1-\rho^2)^{-1/2} = \rho z(1+\rho^2)^{-1/2} - \omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1} = z_1, \]

\[ x-\rho\omega = (1-\rho^2)^{1/2}[z(1+\rho^2)^{-1/2} + \rho\omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1}], \]

and $dx = (1-\rho^2)^{1/2}(1+\rho^2)^{-1/2}\,dz$.
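The relations produced by $T_1$ can be checked numerically. The following is a minimal sketch, not part of the original derivation; the function names (`t1_forward`, `t1_inverse`, `z1`, `check_t1`) are hypothetical and chosen only for illustration. It evaluates both sides of each identity stated in (4.20) at a trial point and returns the absolute discrepancies.

```python
import math


def t1_forward(x, rho, omega):
    """z as a function of x under the transformation T1 of (4.20)."""
    return math.sqrt((1 + rho**2) / (1 - rho**2)) * (x - 2 * rho * omega / (1 + rho**2))


def t1_inverse(z, rho, omega):
    """x as a function of z under T1."""
    return math.sqrt((1 - rho**2) / (1 + rho**2)) * z + 2 * rho * omega / (1 + rho**2)


def z1(z, rho, omega):
    """The quantity z1 defined alongside (4.20)."""
    return rho * z / math.sqrt(1 + rho**2) - omega * math.sqrt(1 - rho**2) / (1 + rho**2)


def check_t1(x, rho, omega):
    """Absolute discrepancy of each identity stated with (4.20), at one trial point."""
    z = t1_forward(x, rho, omega)
    # Exponent in completed-square form, before and after substituting z.
    before = (-(1 + rho**2) / (2 * (1 - rho**2)) * (x - 2 * rho * omega / (1 + rho**2)) ** 2
              - omega**2 / (1 + rho**2))
    after = -z**2 / 2 - omega**2 / (1 + rho**2)
    shift_rhs = math.sqrt(1 - rho**2) * (
        z / math.sqrt(1 + rho**2) + rho * omega * math.sqrt(1 - rho**2) / (1 + rho**2)
    )
    return {
        "inverse": abs(t1_inverse(z, rho, omega) - x),
        "exponent": abs(before - after),
        "z1": abs((rho * x - omega) / math.sqrt(1 - rho**2) - z1(z, rho, omega)),
        "shift": abs((x - rho * omega) - shift_rhs),
    }
```

Calling `check_t1(1.3, 0.6, 0.8)` should return discrepancies at the level of floating-point rounding for every entry, for any $|\rho| < 1$.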

Upon performing these substitutions in (4.19), we obtain

(4.21) \[ M\,dx = (2\pi)^{-1}(1-\rho^2)^{-1/2}(1+\rho^2)^{-1/2}\exp\{-\omega^2/(1+\rho^2)\}\exp(-z^2/2)\,\{G^{-1}(z_1)+G^{-1}(-z_1)\}\,dz. \]

Case 1: Consider integrals of the form

\[ I_k = \int_{-\infty}^{\infty}M\,(x-\rho\omega)^k(1-\rho^2)^{-k}\,dx. \]

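This is the form of (4.11), (4.12), (4.13). Upon applying (4.21) and using the transform of $(x-\rho\omega)^k$ given in (4.20) on $I_k$, one obtains immediately

(4.22) \[ I_k = (2\pi)^{-1}[(1-\rho^2)(1+\rho^2)]^{-(k+1)/2}\exp\{-\omega^2/(1+\rho^2)\}\int_{-\infty}^{\infty}\{z+\rho\omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1/2}\}^k\exp(-z^2/2)\,\{G^{-1}(z_1)+G^{-1}(-z_1)\}\,dz. \]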

It is easy to establish the fact that the integral in $I_k$ exists for all finite $k \ge 0$, even if $\rho$ tends to $\pm 1$. For

\[ z_1 \to z\,2^{-1/2} \text{ as } \rho\to 1, \qquad z_1 \to -z\,2^{-1/2} \text{ as } \rho\to -1, \]

while the balance of the integrand tends to $z^k\exp(-z^2/2)$. It is easily shown by de l'Hospital's rule that

\[ \int_{-\infty}^{\infty}z^k\exp(-z^2/2)\,\{G^{-1}(z\beta)+G^{-1}(-z\beta)\}\,dz \]

must exist for all $|\beta| < 1$. In our case $\beta = 2^{-1/2}$, so we have existence. The factor multiplying the integral of course becomes infinite like $(1\mp\rho)^{-(k+1)/2}$ as $\rho\to\pm 1$. Note that the integrand in $I_k$ will, in the limit, be an odd function if $k$ is odd, and an even function if $k$ is even.
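The existence and parity statements for the limiting integral can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than part of the original argument: it uses the identity $G(u) = (\pi/2)^{1/2}\,\mathrm{erfc}(u/\sqrt{2})$, a composite Simpson rule on a truncated range, and the hypothetical names `G`, `integrand` and `limiting_integral`.

```python
import math


def G(u):
    """G(u) = integral from u to infinity of exp(-t^2/2) dt, written via erfc."""
    return math.sqrt(math.pi / 2.0) * math.erfc(u / math.sqrt(2.0))


def integrand(z, k, beta):
    """z^k exp(-z^2/2) {1/G(beta z) + 1/G(-beta z)}."""
    return z**k * math.exp(-z * z / 2.0) * (1.0 / G(beta * z) + 1.0 / G(-beta * z))


def limiting_integral(k, beta=2.0 ** -0.5, half_width=12.0, n=4000):
    """Composite Simpson approximation on [-half_width, half_width]; n must be even."""
    h = 2.0 * half_width / n
    total = integrand(-half_width, k, beta) + integrand(half_width, k, beta)
    for i in range(1, n):
        z = -half_width + i * h
        total += (4 if i % 2 else 2) * integrand(z, k, beta)
    return total * h / 3.0
```

With $\beta = 2^{-1/2}$ the integrand decays roughly like $z^{k+1}\exp(-z^2/4)$, so widening the range beyond the default should leave the result essentially unchanged, and the odd-$k$ values should vanish up to rounding.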

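Expression (4.22) then offers us a partial evaluation of three of the elements of the information matrix:

\[ E\,\frac{\partial^2\log L}{\partial\rho^2} = -NI_2, \qquad E\,\frac{\partial^2\log L}{\partial\rho\,\partial\omega} = NI_1, \qquad E\,\frac{\partial^2\log L}{\partial\omega^2} = -NI_0. \]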

Case 2: $E\,\partial^2\log L/\partial\omega\,\partial\sigma$.

Recall expressions (4.14) and (4.15). Upon substituting $(\partial\phi/\partial\sigma)$ from (4.14) in (4.15), we have some cancellation, and (4.15) becomes

(4.24) \[ N\sigma^{-2}\rho(1-\rho^2)^{-1}\int_{-\infty}^{\infty}x\,\phi(x,\omega)\Big[R^{-1}(\omega)\int_{-\infty}^{\omega}y\,\phi(x,y)\,dy - S^{-1}(\omega)\int_{\omega}^{\infty}y\,\phi(x,y)\,dy\Big]\,dx. \]

Now, when we replace $x/\sigma$ by $x$, we have the same expression except that $\sigma^{-2}$ is replaced by $\sigma^{-1}$, and the rest of the integrand contains no $\sigma$. Performing the transformation $t = (y-\rho x)(1-\rho^2)^{-1/2}$ on the expression in brackets, we obtain

\[ N\rho(1-\rho^2)^{-1}\sigma^{-1}\int_{-\infty}^{\infty}x\,\phi(x,\omega)\left[\frac{\int_{-\infty}^{(\omega-\rho x)(1-\rho^2)^{-1/2}}\{t(1-\rho^2)^{1/2}+\rho x\}e^{-t^2/2}\,dt}{\int_{-\infty}^{(\omega-\rho x)(1-\rho^2)^{-1/2}}e^{-t^2/2}\,dt} - \frac{\int_{(\omega-\rho x)(1-\rho^2)^{-1/2}}^{\infty}\{t(1-\rho^2)^{1/2}+\rho x\}e^{-t^2/2}\,dt}{\int_{(\omega-\rho x)(1-\rho^2)^{-1/2}}^{\infty}e^{-t^2/2}\,dt}\right]dx. \]

After we cancel the right members in the inner integrands, revise the limits on the first inner integrals, and perform the integration in the numerators of the bracketed expression, we get

(4.25) \[ -\rho N\sigma^{-1}(1-\rho^2)^{-1/2}\int_{-\infty}^{\infty}x\,\phi(x,\omega)\exp[-2^{-1}(1-\rho^2)^{-1}(\omega-\rho x)^2]\,\{G^{-1}[(\rho x-\omega)(1-\rho^2)^{-1/2}] + G^{-1}[(\omega-\rho x)(1-\rho^2)^{-1/2}]\}\,dx. \]

By comparing (4.25) with expression (4.19) for $M$, one can quickly determine that (4.25) may be written as

\[ -\rho N\sigma^{-1}\int_{-\infty}^{\infty}xM\,dx. \]

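Now, applying $T_1$ as defined in (4.20) we arrive at an expression similar to (4.22) in exactly the same way as before. The result is

(4.26) \[ E\,\frac{\partial^2\log L}{\partial\omega\,\partial\sigma} = -N\rho\,[2\pi\sigma(1+\rho^2)]^{-1}(1-\rho)^{-1/2}(1+\rho)^{-1/2}\exp\{-\omega^2/(1+\rho^2)\}\int_{-\infty}^{\infty}\{z(1-\rho^2)^{1/2}+2\rho\omega(1+\rho^2)^{-1/2}\}\exp(-z^2/2)\,\{G^{-1}(z_1)+G^{-1}(-z_1)\}\,dz. \]

(4.27) \[ E\,\frac{\partial^2\log L}{\partial\rho\,\partial\sigma} = -N\rho\,[2\pi\sigma(1-\rho)(1+\rho)]^{-1}(1+\rho^2)^{-3/2}\exp\{-\omega^2/(1+\rho^2)\}\int_{-\infty}^{\infty}\{z(1-\rho^2)^{1/2}+2\rho\omega(1+\rho^2)^{-1/2}\}\{z+\rho\omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1/2}\}\exp(-z^2/2)\,\{G^{-1}(z_1)+G^{-1}(-z_1)\}\,dz. \]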

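Case 4: $E\,\partial^2\log L/\partial\sigma^2$.

Recall expression (4.17). The simplest way to treat this integral is to write it in the form

\[ -N\int_{-\infty}^{\infty}\phi(x,\omega)\left\{R(\omega)\left[\frac{\int_{-\infty}^{\omega}(\partial\phi/\partial\sigma)\,dy}{R(\omega)}\right]^2 + S(\omega)\left[\frac{\int_{\omega}^{\infty}(\partial\phi/\partial\sigma)\,dy}{S(\omega)}\right]^2\right\}dx. \]

After substituting the value of $\partial\phi/\partial\sigma$ from (4.14) and replacing $x/\sigma$ by $x$, we are left with the following expressions, which contain $\sigma$ only as a factor.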


00f y¢(x,y)dy dx +-00

00 m(0) - .f2w0--2(1- y2) -1 f b(x,(J)) {R-1(",) [f y¢(x,y)ayr

-00 -00

Consider (A). This is of a slightly different type from those which we dealt with before. This time the correct transformation is

(4.28) $T_2$: \[ z = (2-\rho^2)^{1/2}(1-\rho^2)^{-1/2}[x-\rho\omega/(2-\rho^2)], \]

which produces

\[ x = (1-\rho^2)^{1/2}(2-\rho^2)^{-1/2}z + \rho\omega/(2-\rho^2), \]

\[ (\rho x-\omega)(1-\rho^2)^{-1/2} = \rho z(2-\rho^2)^{-1/2} - 2\omega(1-\rho^2)^{1/2}(2-\rho^2)^{-1}, \]

and $dx = (1-\rho^2)^{1/2}(2-\rho^2)^{-1/2}\,dz$. Substituting these values in (A), we obtain

j-00

..

00

Consider (B). In Jy¢(x,y)d.y make the transformation-00

00

2f 2N0--2(1_ .f 2) -1 J x2 { x2(1_ .f 2) -1_~.-00

¢(x,m)exp( _x2/2)dx.

Now apply $T_2$ given in (4.28). We obtain, in exactly the same way as before,

(B) = 2 f' 2Nf3'"' (1- f )(l+f )1 (2n)-lexp(_m2/(2_p 2) ·00 2-L[ .(1- f 2)1/2(2- f 2) -1/2+f' ID/(2-l' 2)}U.(.h- j2)(2- f 2)-1/2+ fID] 2.(1_f2q.

(2rt)-1/2exp (_z2/2)dz .

Finally, consider (C).

Since there are essentially no new ideas involved, and in view of the fact that this calculation is much longer than the others, a sketch of the method will be given, together with the final results.

(i) Make the transformation $t = (y-\rho x)(1-\rho^2)^{-1/2}$ in the numerators of the expression in brackets.

(ii) We have (C) = $C_1 + C_2 + C_3$.

(iii) $C_1$ requires transformation $T_2$.

(iv) $C_2$ can be shown to vanish, with little effort.

(v) $C_3$ requires a new transformation of the same type as $T_1$, $T_2$: namely, $T_3$,

\[ x = z(1-\rho^2)^{1/2}(2+\rho^2)^{-1/2} + 3\rho\omega/(\rho^2+2). \]

Define $z_2 = \rho(2+\rho^2)^{-1/2}z - 2\omega(1-\rho^2)^{1/2}(2+\rho^2)^{-1}$.

The value of (C) will be included in the Table of Expected Values with the others:

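Table of Expected Values.

We will define $c_{ij}$ to represent all of the expected values $E\,\partial^2\log L/\partial\theta_i\,\partial\theta_j$ except the factors $(1-\rho)^{-k}$ and $\sigma^{-\lambda}$. $i$, $j$ will run from 1 to 3; $\hat\rho$ will correspond to 1; $\hat\omega$ to 2; and $\hat\sigma$ to 3. Note that all $c_{ij}$ must exist by the same argument as that used in (4.22) etc.

\[ E\,\frac{\partial^2\log L}{\partial\rho^2} = c_{11}(1-\rho)^{-3/2} = -N(2\pi)^{-1}[(1-\rho)(1+\rho)(1+\rho^2)]^{-3/2}\exp\{-\omega^2/(1+\rho^2)\}\int_{-\infty}^{\infty}\{z+\rho\omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1/2}\}^2\exp(-z^2/2)\,\{G^{-1}(z_1)+G^{-1}(-z_1)\}\,dz. \]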

fj210g L -1 [ 2]-1E af


2E 0 log L = -le (1- 0 )-1= -N f' l(r(1- 0 ) (1+f )1 -1as., acr cr 13 J ~ J .

00

(1+f2)-3/~exp(-a>2/1+f2).f[z(l- f''2)1/2-00

00

(1+y'2)-1exp (-(l)2/1+J'2) f[ Z(1_p'2)1/2-00 .

00

(2-f 2)-1/2exp [ "(1)'2/(2_9 2>} J{[Z(1_f'2)1/'2 .-00

+2Nf'2 [0'(1-f)(1+j')] -'2(21f)-3/'2(2_.r 2)-1!2.

e~_(1)2(1_ .f 2) -1)J •

,,'

e

[{Z(1_r2)1/2(2_),2) -1/2+jW] 2 -(1- J2)J #

exp(-z2/2)dz

_Nj'4(21C)-3/2 [CY(1-f)(1+j»] -2(2_f 2 )-1/2

exp [_W2(2_jP2)-1].00 . 4f [ z(1- f2)1/2{2_f~)-1/2+fW(2-f2) -1] •

-00

-Nf 2{21C) -1 [ (j" (1-'p) (1+.P ) ] -2 (2+j> 2) -1/2exp(_3CD2(2+f 2)-1) •

00 2f [Z(1- y2)(2+,.p2)-1/2+3j>W(2+J,2)-1 ] •-00

$[G^{-1}(z_2)+G^{-1}(-z_2)]\exp(-z^2/2)\,dz.$

In the above

\[ G(u) = \int_{u}^{\infty}\exp(-t^2/2)\,dt, \]

\[ z_1 = \rho z(1+\rho^2)^{-1/2} - \omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1}, \]

and

\[ z_2 = \rho z(2+\rho^2)^{-1/2} - 2\omega(1-\rho^2)^{1/2}(2+\rho^2)^{-1}. \]

Now, recalling the definition of the information matrix, we have

It will be notationally convenient to consider $D = \Delta\,[\mathrm{cof}(c_{11})]^{-1}$ for a moment.

We know $V(r^*)$ as far as terms of order $N^{-1}$ from Chapter II. $V(\hat\rho)$ as far as terms of order $N^{-1}$ is given by $D^{-1}$, and $\hat\rho$ is an estimate of minimum variance. Hence, the efficiency of the coefficient of biserial correlation is

3. The Efficiency of $r^*$ as $\rho \to \pm 1$.

All of the work of Chapter IV goes through for $\rho \to -1$ in exactly the same manner as for $\rho \to 1$. Hence, in result (4.30) we need only replace $\rho$ by $-\rho$. Note the following facts:

(i) $\lim_{\rho\to -1}c_{12} = \lim_{\rho\to -1}c_{13} = 0$, because we are integrating an odd function of $z$ over symmetric limits.

(ii) $\lim_{\rho\to -1}z_1 = -z\,2^{-1/2}$; $\lim_{\rho\to -1}z_2 = -z\,3^{-1/2}$.

(4.31) \[ \lim_{\rho\to\pm 1}(1\mp\rho)^{-3/2}\,\mathrm{Eff}(r^*) = 16\pi\exp(\omega^2/2)\,\big[pq(\omega^2+1)\,2\pi\exp(\omega^2) + (2p-1)\,\omega\,(2\pi)^{1/2}\exp(\omega^2/2) - 3/2\big]^{-1}\Big\{\int_{-\infty}^{\infty}z^2\exp(-z^2/2)\,[G^{-1}(z\,2^{-1/2})+G^{-1}(-z\,2^{-1/2})]\,dz\Big\}^{-1}. \]

Expression (4.31) is positive for all finite $\omega$, so we may state the following theorem:

THEOREM II. The coefficient of biserial correlation has limiting efficiency 0 for estimating $\rho$ when $\rho \to \pm 1$.

4. The Efficiency of $r^*$ When $\rho = 0$.

In order to find the efficiency of $r^*$ for $\rho = 0$ it will be easier to return to (4.11), (4.12), (4.13), (4.15), (4.17). Setting $\rho = 0$ provides a great simplification, and it may be shown rather directly that

(4.32) \[ E\Big(\frac{\partial^2\log L}{\partial\rho^2}\,\Big|\,\rho = 0\Big) = -N(2\pi pq)^{-1}\exp(-\omega^2), \qquad E\Big(\frac{\partial^2\log L}{\partial\omega^2}\,\Big|\,\rho = 0\Big) = -N(2\pi pq)^{-1}\exp(-\omega^2). \]

All mixed partials vanish. Recalling the expression for $V(r^*)$ from Chapter II (2.1), we have from (4.32) and (2.1)

(4.33) \[ V(\hat\rho \mid \rho = 0) = 2\pi pq\,N^{-1}\exp(\omega^2) + O(N^{-2}). \]

Hence, we may state the following theorem:

THEOREM III. The efficiency of the coefficient of biserial correlation for estimating $\rho$ is 1 when $\rho = 0$. This result was to be expected.
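Theorem III can be illustrated with a short computation. The sketch below is offered under assumptions: the leading term of $V(\hat\rho \mid \rho = 0)$ is taken from (4.33), while the leading term of $V(r^* \mid \rho = 0)$ is assumed to be $pq/\{N\lambda^2(\omega)\}$, with $\lambda$ the standard normal ordinate, since expression (2.1) of Chapter II is not reproduced in this part of the text. It is also assumed that $p$ is the proportion above the point of dichotomy, so that $\omega$ is the upper $p$ point of the standard normal. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def variance_mle_at_zero(p, n):
    """Leading term of V(rho-hat | rho = 0) as in (4.33): 2*pi*p*q*exp(omega^2)/N."""
    q = 1.0 - p
    omega = NormalDist().inv_cdf(q)  # assumed convention: P(Y > omega) = p
    return 2.0 * math.pi * p * q * math.exp(omega**2) / n


def variance_biserial_at_zero(p, n):
    """Assumed leading term of V(r* | rho = 0): p*q / (N * lambda(omega)^2)."""
    q = 1.0 - p
    omega = NormalDist().inv_cdf(q)
    lam = NormalDist().pdf(omega)  # normal ordinate at the point of dichotomy
    return p * q / (n * lam**2)


def efficiency_at_zero(p, n=1):
    """Ratio of the two leading variance terms at rho = 0."""
    return variance_mle_at_zero(p, n) / variance_biserial_at_zero(p, n)
```

Because $\lambda^{-2}(\omega) = 2\pi\exp(\omega^2)$, the two leading terms coincide and the ratio equals 1 for every $p$, in agreement with the theorem.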

CHAPTER V

SUMMARY AND INTERPRETATION OF RESULTS

When $\rho \ne 0$ two coefficients of correlation have been proposed for estimating $\rho$ and for testing the hypothesis $\rho = \rho_0 \ne 0$, biserial $r^*$ and point biserial $r$. The assumption of an underlying bivariate normal distribution for $r^*$ is about the simplest of those which allow us to specify, and work with, the joint distribution of the continuous and discrete variables. It has been shown that the assumption of an underlying normal population with the point of dichotomy at the mean produces minimum variance in the limit for $r^*$; and that for this case a simple statistic $f(r^*)$, defined in Chapter III, Section 2, analogous to Fisher's $z$ transform of product moment $r$, can be used for moderate to large $N$. Graphical and mechanical methods are available, so that the computation of $r^*$, for each item in a test for example, can be carried out rapidly and efficiently. However, $r^*$ is not restricted to the interval $(-1, +1)$; in fact, it is unbounded, as is illustrated in Figure 1. Also, while the efficiency of $r^*$ has been shown to be 1 for $\rho = 0$, it tends to 0 as $\rho$ tends to $\pm 1$. This is a considerable defect, since we are especially interested in large correlations.

In contradistinction to biserial $r^*$ we have point biserial $r$, whose model does not specify any underlying distribution for the discrete variable. Point biserial $r$ is a maximum likelihood estimate of $\rho$ under the conditions of the model (See Chapter I, Section 6), and as such has efficiency 1 for large $N$ no matter what value $\rho$ takes. Corresponding to the property of minimum variance of biserial $r^*$ for the point of dichotomy at the mean, that is $\omega = 0$, we have a similar property for point biserial $r$. Using $r$ for the model given is equivalent to using the $t$ statistic. It is intuitively clear that if we are interested in measuring $\rho$, and hence essentially the difference between two mean values, we will get better results in the form of greater power for the $t$ test if the means are estimated from samples in which $N_0 = N_1 = N/2$.

While the assumption of normality of the residuals which is used in connection with point biserial $r$ is a restriction, there is some question as to how serious this restriction is. Indeed, the theory of Least Squares is largely based on the same assumption. There is no question, however, as to the inherent danger in making the assumption of underlying normality in connection with biserial $r^*$.

If the sample is sufficiently large we can get some indication as to whether the model for $r^*$, or that for $r$, is more applicable. If the model for $r$ is applicable, the sets of variates $(x_{01}, \ldots, x_{0N_0})$ and $(x_{11}, \ldots, x_{1N_1})$, associated with the $z_i$ which are 0 and 1 respectively, should be normally distributed. If the model for $r^*$ is applicable, these sets of variates cannot be normally distributed. In the case of $r^*$ an approximation to the moments $E(X^k \mid Y > \omega)$ can be obtained from the quantities $C_{k1}$ defined in Section 1 of Chapter III. We have from (3.14)

\[ E(X \mid Y > \omega) = \rho\lambda(\omega)/p, \qquad E(X^2 \mid Y > \omega) = (1-\rho^2) + \rho^2[p+\omega\lambda(\omega)]/p, \]

and

\[ E(X^3 \mid Y > \omega) = 3\rho(1-\rho^2)\lambda(\omega)/p + \rho^3\lambda(\omega)(\omega^2+2)/p. \]

After performing the substitutions $\rho = r^*$, $\lambda(\omega) = \lambda(t)$, $\omega = t$, $p = m/N$ (See Chapter I, Section 2), if we do not have a fairly close agreement between $E(X^k \mid Y > \omega)$ and $N_1^{-1}\sum_{i=1}^{N_1}x_{1i}^k$, then the model for $r^*$ is suspect. To test the applicability of $r$ we need only test the hypothesis that $x_{0i}$ $(i = 1, 2, \ldots, N_0)$ and $x_{1j}$ $(j = 1, 2, \ldots, N_1)$ are normal with equal variances. A variety of tests are available for this purpose.
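The moment comparison described above can be sketched in a few lines. This is an illustrative outline only: the function names are hypothetical, and it assumes that the continuous variable has been standardized to zero mean and unit variance, since the three moment formulas are stated for that scale.

```python
from statistics import NormalDist


def upper_group_moments(rho, omega):
    """First three moments E(X^k | Y > omega) under the bivariate normal model."""
    nd = NormalDist()
    lam = nd.pdf(omega)        # normal ordinate lambda(omega)
    p = 1.0 - nd.cdf(omega)    # probability that Y exceeds omega
    m1 = rho * lam / p
    m2 = (1 - rho**2) + rho**2 * (p + omega * lam) / p
    m3 = 3 * rho * (1 - rho**2) * lam / p + rho**3 * lam * (omega**2 + 2) / p
    return m1, m2, m3


def sample_moments(values):
    """Raw sample moments of order 1, 2, 3."""
    n = len(values)
    return tuple(sum(v**k for v in values) / n for k in (1, 2, 3))


def moment_gaps(upper_values, r_star, t):
    """Observed minus model moments after substituting rho = r*, omega = t."""
    expected = upper_group_moments(r_star, t)
    observed = sample_moments(upper_values)
    return tuple(o - e for o, e in zip(observed, expected))
```

Large entries in `moment_gaps` for the group with $z_i = 1$ would, in the sense of the paragraph above, cast doubt on the model for $r^*$; how large is too large is left open here, as it is in the text.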

It would seem from the evidence presented that point biserial $r$ is in most cases the better coefficient to use. While the results obtained will be valid only for samples of size $N$ with a fixed partition $(N_0, N_1)$, point biserial $r$ in the fixed sample form is on fairly good ground in view of its being a maximum likelihood estimate of $\rho$. Furthermore, the concept of two fixed numbers for the discrete variable, instead of an underlying distribution of any kind, is more satisfying to the psychologist in his efforts to objectify the testing situation.
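The equivalence between point biserial $r$ and the $t$ statistic mentioned in this chapter can be made concrete. The sketch below uses the standard identity $t = r\sqrt{N-2}/\sqrt{1-r^2}$ linking the product-moment correlation with a 0/1 indicator to the pooled two-sample $t$; the function names and the small data set in the usage note are illustrative assumptions, not taken from the text.

```python
import math


def point_biserial_r(x0, x1):
    """Product-moment correlation between x and the 0/1 group indicator."""
    n0, n1 = len(x0), len(x1)
    n = n0 + n1
    mean0 = sum(x0) / n0
    mean1 = sum(x1) / n1
    mean = (sum(x0) + sum(x1)) / n
    ss_total = sum((v - mean) ** 2 for v in x0) + sum((v - mean) ** 2 for v in x1)
    return (mean1 - mean0) * math.sqrt(n0 * n1 / n) / math.sqrt(ss_total)


def pooled_t(x0, x1):
    """Two-sample t statistic with pooled variance, N0 + N1 - 2 degrees of freedom."""
    n0, n1 = len(x0), len(x1)
    mean0 = sum(x0) / n0
    mean1 = sum(x1) / n1
    ss_within = sum((v - mean0) ** 2 for v in x0) + sum((v - mean1) ** 2 for v in x1)
    pooled_var = ss_within / (n0 + n1 - 2)
    return (mean1 - mean0) / math.sqrt(pooled_var * (1.0 / n0 + 1.0 / n1))


def t_from_r(r, n):
    """t statistic implied by a point biserial r computed from n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
```

For example, with the illustrative groups `[1, 2, 3, 4]` and `[3, 5, 6, 8]`, `t_from_r(point_biserial_r(x0, x1), 8)` and `pooled_t(x0, x1)` agree up to rounding, so a test based on $r$ and a test based on $t$ lead to the same decision.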

If preliminary information pertaining to $\omega$ and $\rho$ is available, then the following tables may be used in order to select that statistic $r^*$ or $r$ which from overall considerations of efficiency, sample size, and the magnitudes of $\omega$ and $\rho$ seems most appropriate. The statistic chosen must of course satisfy the requirement that its model is the more applicable in the sense discussed above (page 53). Two situations are considered, the estimation of $\rho$, and the test of the hypothesis $\rho = \rho_0$ or the placing of confidence limits on $\rho$.

Estimation of $\rho$

                          $|\omega|$
  $|\rho|$      Small     Moderate    Large
  Small         $r^*$     $r^*$       $r^*$
  Moderate      $r$       $r$         $r$
  Large         $r$       $r$         $r$

Test of the Hypothesis $\rho = \rho_0$ or the Placing of Confidence Limits on $\rho$

                          $|\omega|$
  $|\rho|$      Small     Moderate    Large
  Small         $f(r^*)$  $f(r^*)$    $r$
  Moderate      $r$       $r$         $r$
  Large         $r$       $r$         $r$

The recommendation is, then, use $r^*$ for estimating $\rho$ when $|\rho|$ is small, otherwise use $r$. Use $f(r^*)$ for testing $\rho = \rho_0$ or placing confidence limits on $\rho$ if

TABLE I

The Asymptotic Standard Deviation of Biserial $r^*$ as a Function of $p$ and $\rho$. All values must be divided by $\sqrt{N}$.

                                     p or 1 - p
  $\rho$    .05     .10     .15     .20     .25     .30     .35     .40     .45     .50
  0        4.466   2.922   2.345   2.041   1.857   1.737   1.658   1.608   1.580   1.571
  .10      2.104   1.699   1.521   1.419   1.353   1.308   1.278   1.258   1.247   1.243
  .20      2.077   1.668   1.491   1.389   1.323   1.279   1.248   1.228   1.217   1.213
  .30      2.033   1.616   1.440   1.339   1.273   1.229   1.198   1.179   1.167   1.163
  .40      1.971   1.543   1.370   1.269   1.203   1.159   1.128   1.109   1.097   1.093
  .50      1.893   1.449   1.279   1.179   1.114   1.069   1.038   1.019   1.008   1.004
  .60      1.799   1.333   1.167   1.069   1.004   0.960   0.930   0.910   0.898   0.894
  .70      1.691   1.194   1.034   0.939   0.875   0.831   0.801   0.781   0.769   0.766
  .80      1.569   1.031   0.881   0.789   0.727   0.683   0.653   0.632   0.620   0.616
  .90      1.438   0.842   0.705   0.619   0.559   0.517   0.486   0.465   0.453   0.449
  1.00     1.302   0.616   0.503   0.429   0.374   0.335   0.304   0.283   0.270   0.266
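Entries of this kind can be approximated with a short function. The sketch below rests on an assumption that is not stated in this part of the text: it takes the large-sample variance of biserial $r^*$ to be $N^{-1}\{\rho^4 + \rho^2[pq\omega^2/\lambda^2 + (2p-1)\omega/\lambda - 5/2] + pq/\lambda^2\}$, the classical expression associated with Soper, with $\lambda$ the normal ordinate at $\omega$ and $p$ the proportion above $\omega$. Under that assumption the tabled values for $\rho \ge .10$ and $p \ge .10$ are matched to about three decimals; the name `biserial_sd` is hypothetical.

```python
import math
from statistics import NormalDist


def biserial_sd(p, rho):
    """sqrt(N) times the assumed asymptotic standard deviation of biserial r*."""
    nd = NormalDist()
    q = 1.0 - p
    omega = nd.inv_cdf(q)   # point of dichotomy with P(Y > omega) = p (assumed convention)
    lam = nd.pdf(omega)     # normal ordinate at omega
    constant = p * q / lam**2
    bracket = p * q * omega**2 / lam**2 + (2.0 * p - 1.0) * omega / lam - 2.5
    return math.sqrt(rho**4 + bracket * rho**2 + constant)
```

For a sample of size $N$ the approximate standard deviation would then be `biserial_sd(p, rho) / math.sqrt(N)`, which is the division by $\sqrt{N}$ called for in the table heading.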

[Figure: nomograph for computing biserial $r^*$, consisting of three scales, one of which is the $r^*$ scale.]

$\bar{x}_L$ is the larger sample mean. $N_L$ is the number of values which make up $\bar{x}_L$. $r^*$ is the coefficient of biserial correlation.

$r^*$ is obtained by computing the quantities marked on the two outer scales, one of which is $(\bar{x}_L-\bar{x})/\sqrt{\sum(x-\bar{x})^2/N}$, laying a straight-edge on the three scales, and reading the result on the $r^*$ scale.

BIBLIOGRAPHY

1. Cramér, H., Mathematical Methods of Statistics, Princeton University Press, 1946.

2. David, F. N., Tables of the Correlation Coefficient, Cambridge University Press, 1938.

3. DuBois, P. H., "A Note on the Computation of Biserial r in Item Validation," Psychometrika, Vol. VII, (1942), pp. 143-146.

4. Dunlap, J. W., "A Nomograph for Computing Biserial Correlations," Psychometrika, Vol. I, (1936), pp. 59-60.

5. Fisher, R. A., "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society A, Vol. CCXXII, (1921), pp. 309-366.

6. Fisher, R. A., Statistical Methods for Research Workers, New York City, Hafner, 10th Edition, 1946.

7. Fisher, R. A., "Theory of Statistical Estimation," Proceedings of the Cambridge Philosophical Society, Vol. XXII, (1925), pp. 700-725.

8. Johnson, N. L. and Welch, B. L., "Applications of the Non-central t Distribution," Biometrika, Vol. XXXI, (1940), pp. 362-389.

9. Kendall, M. G., The Advanced Theory of Statistics, Vol. II, London, Charles Griffin, 19 .

10. Lev, J., "The Point Biserial Coefficient of Correlation," Annals of Mathematical Statistics, Vol. XX No. 1, (1949), pp. 125-126.

11. Neyman, J., "Outline of a Theory of Statistical Estimation," Philosophical Transactions of the Royal Society A, Vol. CCXXXVI, (1937), p. 333.

12. Pearson, K., "On a New Method for Determining the Correlation between a Measured Character A, and a Character B," Biometrika, Vol. VII, (1909), pp. 96-105.

13. Robbins, H. E. and Hoeffding, W., "The Central Limit Theorem for Dependent Random Variables," Duke Mathematical Journal, Vol. XV No. 3, (1948), pp. 773-780.

14. Royer, E. B., "Punched Card Methods for Determining Biserial Correlations," Psychometrika, Vol. VI No. 1, (1941), pp. 55-59.

15. Soper, H. E., "On the Probable Error for the Biserial Expression for the Correlation Coefficient," Biometrika, Vol. X, (1913), pp. 384-390.

16. Stalnaker, J. and Richardson, M. W., "A Note on the Use of Biserial r in Test Research," Journal of General Psychology, Vol. VIII, (1933), pp. 463-465.
