  • THE BISERIAL AND POINT CORRELATION COEFFICIENTS

    By

    Robert Fleming Tate

    Institute of Statistics Mimeo. Series #14
    For Limited Distribution

  • p. vi, l. 21: "," should be added between "other" and "continuous".

    p. vii, l. 2: "of our" should read "of the joint distribution of our".

    p. 1, l. 6: Omit "of".

    p. 1, l. 8: "$(2\pi)^{-1}$" should read "$(2\pi\sigma)^{-1}$".

    p. 1, l. 9: "$-2xy\rho$" should read "$-2xy\rho/\sigma$".

    p. 1, l. 13: "'1";" should read "J '1,"".

    p. 2, l. 16: "defined to" should read "defined from the sample to".

    p. 3, l. 1: ", and" should read ", m = the number of $z_i$ in the sample which are 1, and".

    p. 7, l. 23: "$g'(x)\ 0$" should read "$g'(x) > 0$".

    p. 23, l. 6: "5.:t~" should read "l:t~·".

    p. 23, l. 10: "{ ;" should read "{ - -- }$dx_i$".

    p. 29, l. 14: "fl I" should read "¢".

    p. 59, l. 2: "for Random" should read "for Dependent Random".

  • THE BISERIAL AND POINT CORRELATION COEFFICIENTS

    By Robert Fleming Tate

    Special Report of research at the Institute of Statistics of the University of North Carolina, Chapel Hill, under Office of Naval Research Project NR 042031.

    FOREWORD

    By Harold Hotelling

    The fact that biserial correlation was introduced and came into general use before the development of the modern statistical emphasis on exact sampling probabilities and the theory of efficient estimation and testing of hypotheses, which have not yet embraced biserial correlation in their formal treatments, leaves unanswered many questions as to the efficacy, proper place, and possible modifications of this widely used technique. A feeling that all is not well with biserial correlation has led more recently to the introduction of point biserial correlation, which in turn has led to further questions of principle and technique. In this paper Mr. Tate contrasts these techniques with each other and with the theoretically hundred per cent efficient, but computationally difficult, method of maximum likelihood. A paradox is created by the fact that whereas the correlation coefficient in a population cannot exceed unity, the biserial estimate of it has no upper bound and may occasionally be far greater than unity. Mr. Tate shows how this phenomenon is associated with a gradually decreasing efficiency, approaching zero, of the biserial correlation as it increases. A point of particular interest is the variance stabilizing transformation, analogous to R. A. Fisher's transformation of the product-moment correlation coefficient, and capable of being carried out with the same tables, reached on pp. 21-22 for the case of equal frequencies in the two classes. Extension of this to the case of unequal classes is now under consideration. Psychologists and personnel workers as well as others concerned with test construction, item analysis and correlation will find in this memorandum a clarification of numerous troublesome questions that have surrounded biserial correlation.

  • ABSTRACT

    ROBERT FLEMING TATE. The Biserial and Point Biserial Correlation Coefficients. (Under the direction of HAROLD HOTELLING.)

    Two solutions to the problem of finding the correlation between a continuous random variable $X$ and a discrete, two-valued random variable $Z$ are discussed, the coefficients of biserial and of point biserial correlation.

    It is shown that the biserial correlation coefficient $r^*$ has efficiency 0 when the population correlation coefficient $\rho$ tends to $\pm 1$. Also, $r^*$ has minimum variance for fixed $\rho$ when the cut value of the underlying normal distribution occurs at the population mean; a table is included to illustrate this point. A special case of the limiting distribution of $r^*$ is obtained for the cut value at the mean. Further, biserial $r^*$ is shown to be unbounded, and a diagram illustrating this point is included.

    The equivalence of the use of point biserial $r$ under certain restrictions and "Student's" $t$ is displayed. The relative advantages and disadvantages of $r$ and $r^*$ are discussed, and some recommendations are given as to which of the two coefficients should be used in various cases.

  • INTRODUCTION

    An important problem in experimental work is that of determining the correlation $\rho$ between a continuous variable and a discrete variable which takes on only two values. The need to measure such a quantity is present in all fields of research to some extent, but such correlations are of particular importance in psychological testing.

    The fundamental problem of psychological testing is that of measuring some quantity which has a name, for example intelligence or mechanical aptitude. A measurable criterion is selected to represent the quantity under consideration. In the absence of any external criterion the total test score is sometimes used. The technique consists firstly in summoning for consideration all possible questions, or items, as they are called, which could have any bearing on the quantity to be measured, and which can be answered quickly and unambiguously by the subject who is being tested. The item is then scored 1 if the degree of association with the quantity is positive, and 0 if it is not. If the test is to be efficiently carried out, the items should have low correlations among themselves, and if it is to be valid, they should all be correlated highly with the criterion. The methods available for finding the correlation between item and criterion form the subject of this paper. The extension of the mode of approach used in psychological testing to other types of situations will in most cases be evident.

    For testing the hypothesis $\rho = 0$ product moment $r$ can be used. Its distribution is independent of that of the discrete variable under the null hypothesis. For a tabulation we can use Table V A of Fisher's book [6]¹, David's tables [2], or Fisher's $z$ transformation, which is given in Table V B of Fisher's book [6]. There is no problem involved in testing this hypothesis.

    ¹ Numbers in brackets refer to the bibliography.

    For the case $\rho \ne 0$ two solutions have been proposed. In 1909 Karl Pearson [12] proposed a solution to the problem in the form of the so-called biserial coefficient of correlation $r^*$, which is defined in Section 2 of Chapter I. The word biserial refers to the separate sets of values of the continuous variable which are associated with the two values, 0 and 1, of the discrete variable. Many results and techniques of the Karl Pearson school of statistics were supplanted by new and improved methods after the advent of the penetrating studies of R. A. Fisher [5] and J. Neyman [11]. The coefficient of biserial correlation, however, is still a widely used tool in statistical analysis. It is treated in many texts used in psychological statistics, but its mathematical theory has remained in substantially the same incomplete form since 1913.

    The basic property of the biserial coefficient is that it presupposes an underlying normal distribution from which the discrete, two-valued, variable can be obtained. That is, it presupposes a dichotomization of the normal distribution at some fixed point $\omega$, after which all observations which fall on the right of this point will be assigned the value 1, and all of those on the left, the value 0. The meaning of this assumption will become clearer if we return for a moment to the notion of a test item. There the postulation of underlying normality is equivalent to the supposition that there exists a normal population of attitudes towards answering the given test question. Most subjects would, if possible, answer in qualified terms, or in degrees of confirmation or rejection. However, one of two answers must be given, and it is hypothesized that there exists a unique point of demarcation of attitudes, and that it occurs at $\omega$, the aforementioned point of dichotomy.
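    The dichotomy described above can be made concrete with a small simulation. The following Python sketch is illustrative only: the function name `dichotomize`, the seed, the sample size, and the value chosen for $\omega$ are assumptions introduced here and do not appear in the original text. It draws latent standard normal "attitudes", scores each as 1 or 0 according to its position relative to $\omega$, and compares the observed proportion of 1's with the normal tail area beyond $\omega$.

```python
import random
from statistics import NormalDist


def dichotomize(latent, omega):
    """Score 1 for latent values at or to the right of omega, 0 otherwise."""
    return [1 if y >= omega else 0 for y in latent]


rng = random.Random(0)      # fixed seed, chosen arbitrarily for repeatability
omega = 0.5                 # hypothetical point of dichotomy
latent = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
z = dichotomize(latent, omega)

observed = sum(z) / len(z)                   # sample proportion of 1's
expected = 1.0 - NormalDist().cdf(omega)     # P(Y >= omega) for a N(0, 1) variable
print(f"observed proportion {observed:.3f}, normal tail area {expected:.3f}")
```

    Only the 0 and 1 scores would be available to the experimenter; the latent values are retained in the sketch solely to show where the scores come from.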

    In 1934 another solution to the problem was proposed by J. Stalnaker and M. W. Richardson [16] in the form of the point biserial coefficient of correlation $r$. In connection with this coefficient no specification is made of any underlying distribution for the discrete variable.

    Pearson's assumptions were that the discrete variable was obtained from an underlying normal variable by dichotomization at some fixed point, and that there exists a linear regression of the other, continuous variable upon this normal variable. His derivation of biserial $r^*$, based on these assumptions, was carried out by the Method of Consistency, so $r^*$ is a consistent estimate of $\rho$. A discussion of the derivation is given in Chapter I, Section 3.

    Unless we specify a more complex model than that used by Pearson, we cannot speak of the joint distribution of our discrete and continuous variables, and hence can draw no further inferences about $r^*$. The model to be considered here specifies an underlying bivariate normal distribution with three parameters: $\sigma$, the standard deviation of the continuous variable; the cut value $\omega$; and the population correlation coefficient $\rho$. The model is defined in Chapter I, Section 2.

    In addition to the fact that $r^*$ is a consistent estimate of $\rho$, the asymptotic standard error of $r^*$ is known. H. E. Soper [15] obtained this result in 1913. In Chapter II the asymptotic standard error of $r^*$ will be studied more closely. It will be shown that this quantity, a function of $\omega$ and $\rho$, takes on its minimum value for any fixed $\rho$ at $\omega = 0$. A double-entry table of the asymptotic standard deviation is given in Table I.

    In Chapter III, Section 1 the limiting distribution of $r^*$ will be derived, subject to the restriction that the point of dichotomy $\omega$ be taken at the mean of the underlying distribution associated with the discrete variable. The variance of this distribution offers a partial check on Soper's result. Subject to the same restriction on $\omega$, a transformation will be given analogous to Fisher's $z$ transformation, which stabilizes the variance.

    Two important properties of the efficiency of $r^*$ are obtained in Chapter IV. By a consideration of the information matrix of the maximum likelihood estimates $\hat\sigma$, $\hat\omega$, $\hat\rho$ it is shown that as $\rho$ tends to $\pm 1$ the efficiency of $r^*$ tends to 0, and also that $r^*$ has efficiency 1 when $\rho = 0$.

    In a recent paper J. Lev [10] considered point biserial $r$ under the assumption that the $N$ values of the discrete variable are not random but fixed. That is, if $X$ denotes the continuous variable and $Z$ the discrete variable, he considered only samples $(X_i, Z_i)$, $(i = 1, 2, \ldots, N)$, for which the $Z_i$ have a fixed partition, $N_0$ of them being 0, and $N_1$ of them 1. He further assumed the residuals of the $X_i$ determined from the $Z_i$ to be normally distributed. Lev showed that under these restrictions point biserial $r$ is a maximum likelihood estimate of $\rho$, and that using it is equivalent to using the two-sample $t$ statistic when $\rho = 0$, and the two-sample non-central $t$ when $\rho \ne 0$.

    Neither of these coefficients is entirely adequate; in fact, they both leave much to be desired. A great deal of the literature available on the subject of biserial correlation displays the confusion of the early Pearson school in their failure to distinguish between population parameters and sample estimates of these parameters, and to appreciate the significance of the concept of efficiency of an estimate. An attempt has been made in this paper to put the various notions involved on a firmer mathematical footing in the light of modern statistical methods. A discussion of the advantages and disadvantages of the biserial and point biserial coefficients of correlation may be found in the summary of results given in Chapter V.

  • CHAPTER I

    PROPERTIES OF THE BISERIAL AND POINT BISERIAL CORRELATION COEFFICIENTS

    1. Notation

    The following notation and conventions will be used throughout.

    $\lambda(x)$ stands for $(2\pi)^{-1/2}\exp(-x^2/2)$, the probability density of a standard normal variable.

    $\phi(x,y)$ stands for $(2\pi\sigma)^{-1}(1-\rho^2)^{-1/2}\exp\{-2^{-1}(1-\rho^2)^{-1}(x^2/\sigma^2 - 2xy\rho/\sigma + y^2)\}$, the probability density of a bivariate normal vector.

    $N(a,b)$ denotes a variable which has a normal distribution with mean $a$ and standard deviation $b$.

    $\sigma_X^2$, $V(X)$ denote the variance of a random variable $X$.

    $f^{-1}(x)$ will stand for $1/f(x)$ for any function $f$ which appears.

    $\operatorname{plim} X_N = X$ denotes a sequence of random variables $X_N$ $(N = 1, 2, \ldots)$ tending in probability to a random variable $X$ as $N$ tends to $\infty$.

    $\operatorname{dlim} X_N = X$ denotes a sequence of random variables $X_N$ $(N = 1, 2, \ldots)$ tending in distribution to a random variable $X$ as $N$ tends to $\infty$. That is to say, if we define $F_N(x) = P(X_N \le x)$ and $F(x) = P(X \le x)$, then $F_N(x)$ tends to $F(x)$ at every continuity point of $F(x)$.

    Greek letters will designate parameters.

    Lemmas will be assigned Arabic numerals.

    Theorems will be assigned Roman numerals.

    Formulae will be assigned Arabic numerals with the number of the chapter appearing first.

    References will be numbered in square brackets, and may be found in the bibliography.

    2. Mathematical Model

    $(X, Y)$ is a two-dimensional random vector with the frequency distribution $\phi(x,y)$. [...] $Y \ge \omega$, where $\omega$ is a fixed constant of the $Y$ [...]

    [...] with $z = 0$ and $z = 1$ respectively, $s = \left[N^{-1}\sum (x_i - \bar{x})^2\right]^{1/2}$, $m$ = the number of $z_i$ in the sample which are 1, and $t$ is defined by the relation

    \[ m/N = \int_t^{\infty} \lambda(y)\,dy. \]

    3. Karl Pearson's Derivation of $r^*$

    The essence of Pearson's method was that he found a relation between the population means under the assumptions of normality underlying the $Z$ distribution, and the existence of a linear regression of $X$ on $Y$. More precisely, let

    \[ a_1 = E(X \mid Y \ge \omega), \quad b_1 = E(Y \mid Y \ge \omega), \quad a_0 = E(X \mid Y < \omega), \quad b_0 = E(Y \mid Y < \omega) \]

    [...]

  • 4. Bounds for Biserial $r^*$

    The ordinary product moment $r$ is bounded between $-1$ and $+1$. It is evident on slight investigation that $r^*$ does not share this property.

    Let us consider for a moment that we are dealing with a fixed set of $z_i$: namely, $N_0$ of the $z_i$ are 0 and $N_1$ of them are 1, where $N_0 + N_1 = N$. Then $r^*$ becomes

    (1.3) [...]

    where $c$ is determined from $N_1/N = \int_c^{\infty} \lambda(y)\,dy$.

    It can now be shown that, for $\rho = 0$,

    (1.4) [...]

    has the two-sample $t$ distribution with $N-2$ degrees of freedom. This is a transformation of a result by Lev [10], of which more will be said in Section 6 of this chapter. For the case of fixed proportions $N_1/N$ and $N_0/N$, and fixed sample size $N$, then, $t$ allows us to compute upper and lower bounds for $r^*$, and to make probability statements about these bounds. A nomograph for the computation of $r^*$ by Dunlap [4] is given in Fig. 1 at the end of this paper in order to illustrate the wide range of values which can be obtained. Dunlap uses slightly different notation. He transforms expression (1.1) for $r^*$ into

    \[ r^* = s^{-1}(\bar{x}_1 - \bar{x})(m/N)\,\lambda^{-1}(t) \tag{1.5} \]

    by using the relation $(m/N)\bar{x}_1 + (1 - m/N)\bar{x}_0 = \bar{x}$, and then considers $\bar{x}_1$ to be the larger of the two sample means. In this way he obtains only values of $r^* \ge 0$.
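    The unboundedness of $r^*$ is easy to exhibit numerically. The sketch below evaluates the form (1.5), with $s$ computed with divisor $N$ and $t$ obtained from $m/N$ as the upper tail area of the standard normal density. The function name `biserial_r` and the six data points are hypothetical and chosen only for illustration; they are not taken from the original paper.

```python
from statistics import NormalDist, fmean


def biserial_r(x, z):
    """Biserial r* in the form (1.5): (x1bar - xbar) * (m/N) / (s * lambda(t))."""
    n = len(x)
    m = sum(z)                                        # number of z_i equal to 1
    xbar = fmean(x)
    x1bar = fmean([xi for xi, zi in zip(x, z) if zi == 1])
    s = (sum((xi - xbar) ** 2 for xi in x) / n) ** 0.5    # divisor N, as in the text
    std = NormalDist()
    t = std.inv_cdf(1.0 - m / n)                      # m/N is the tail area beyond t
    return (x1bar - xbar) * (m / n) / (s * std.pdf(t))


# Hypothetical data in which the two groups are completely separated.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [0, 0, 0, 1, 1, 1]
print(biserial_r(x, z))   # a value greater than 1
```

    Complete separation of the two groups is the extreme case; less extreme samples can also give values above unity when the within-group spread is small relative to the difference of the group means.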

    5. Machine Methods for the Computation of $r^*$

    There are several mechanical methods available for the computation of $r^*$. Royer [14] devised a method for computing $r^*$ for each item in a test, using punched cards. His method requires for each $r^*$ the necessary steps for finding $\bar{x}$ and $s$, and then a sorting of all cards, separating those which have $z = 1$, and a tabulation of part of them. Du Bois [3] improved upon Royer's method by reducing the number of steps to a point where only one complete sorting is required and several $r^*$ may be computed simultaneously.

    6. Point Biserial Correlation.

    Model: Let $X$ again be a continuous variable, and $z_i$ $(i = 1, \ldots, N)$ be a fixed point in $N$-space. $N_0$ of the $z_i$ are 0, and $N_1$ of them are 1. We consider a model in which $X$ has mean $\mu$ and standard deviation $\sigma$, and the variables $X_i - \mu - a(z_i - \bar{z})$ are $N(0, \sigma(1-\rho^2)^{1/2})$, where $a = \rho\sigma N(N_0N_1)^{-1/2}$.

    Under this model the ordinary product moment $r$ between $x$ and $z$ is called the point biserial coefficient of correlation, and is defined to be

    \[ r = s^{-1}(\bar{x}_1 - \bar{x}_0)\,N^{-1}(N_0N_1)^{1/2}. \tag{1.6} \]

    The relation between the expressions for the biserial and point biserial coefficients is

    [...]

    where $c$ is again defined by $N_1/N = \int_c^{\infty} \lambda(y)\,dy$ as in (1.3).

    It can easily be shown that $r$ is a maximum likelihood estimate of $\rho$. Lev [10], using the model above, showed that the distribution of $r$ is equivalent to that of Student's $t$ when $\rho = 0$, and a non-central $t$ when $\rho \ne 0$. The parameter of the non-central $t$ distribution is $\delta = N^{1/2}\rho(1-\rho^2)^{-1/2}$, which is independent of $N_0$ and $N_1$. Summarizing, we can use $t = (N-2)^{1/2}\,r\,(1-r^2)^{-1/2}$ to test the hypothesis $\delta = 0$. When $\rho \ne 0$, we can compute confidence limits for $\delta$, and hence for $\rho$, or test the hypothesis $\rho = \rho_0$, by making use of Table IV given by Johnson and Welch [8] for the non-central $t$ distribution. It should be emphasized that the results hold only for the case of fixed numbers $N_0$ and $N_1$ in the two groups, the same in all of the samples with which our particular sample is compared.
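    The equivalence between point biserial $r$ and the two-sample $t$ statistic can be checked directly. The following sketch computes $r$ from (1.6), converts it with $t = (N-2)^{1/2} r (1-r^2)^{-1/2}$, and compares the result with the usual pooled-variance two-sample $t$. The function names and the data are hypothetical. The helper `biserial_from_point` uses the relation $r^* = r\,(N_0N_1)^{1/2} N^{-1} \lambda^{-1}(c)$, which is obtained here by combining (1.5) and (1.6) for fixed $N_0$, $N_1$; it is a derived form and is not quoted from the source.

```python
from statistics import NormalDist, fmean


def split(x, z):
    """Return the x values with z = 1 and those with z = 0."""
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    return x1, x0


def point_biserial_r(x, z):
    """Point biserial r as in (1.6), with s computed with divisor N."""
    x1, x0 = split(x, z)
    n, n1, n0 = len(x), len(x1), len(x0)
    xbar = fmean(x)
    s = (sum((xi - xbar) ** 2 for xi in x) / n) ** 0.5
    return (fmean(x1) - fmean(x0)) * (n0 * n1) ** 0.5 / (n * s)


def t_from_r(r, n):
    """t = (N - 2)^(1/2) r (1 - r^2)^(-1/2)."""
    return (n - 2) ** 0.5 * r / (1.0 - r * r) ** 0.5


def pooled_two_sample_t(x, z):
    """Ordinary two-sample t with pooled variance on N - 2 degrees of freedom."""
    x1, x0 = split(x, z)
    m1, m0 = fmean(x1), fmean(x0)
    ss = sum((v - m1) ** 2 for v in x1) + sum((v - m0) ** 2 for v in x0)
    sp2 = ss / (len(x1) + len(x0) - 2)
    return (m1 - m0) / (sp2 * (1 / len(x1) + 1 / len(x0))) ** 0.5


def biserial_from_point(r, n0, n1):
    """Derived relation (an assumption of this sketch): r* = r sqrt(N0 N1) / (N lambda(c))."""
    n = n0 + n1
    std = NormalDist()
    c = std.inv_cdf(1.0 - n1 / n)
    return r * (n0 * n1) ** 0.5 / (n * std.pdf(c))


x = [2.1, 3.4, 1.9, 4.8, 5.2, 3.9, 4.4]   # hypothetical scores
z = [0, 0, 0, 1, 1, 0, 1]                 # hypothetical item responses
r = point_biserial_r(x, z)
print(t_from_r(r, len(x)), pooled_two_sample_t(x, z))   # the two values agree
```

    The agreement is an algebraic identity, so it holds for any data with both groups non-empty and some within-group variation.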

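  • CHAPTER II

    THE ASYMPTOTIC VARIANCE OF $r^*$

    H. E. Soper [15] gives the following expression for the asymptotic variance of $r^*$ as far as terms of order $N^{-1}$:

    \[ V(r^*) = N^{-1}\left\{\rho^4 + \rho^2\left[pq\,\omega^2\lambda^{-2}(\omega) + (2p-1)\,\omega\,\lambda^{-1}(\omega) - 2.5\right] + pq\,\lambda^{-2}(\omega)\right\}. \tag{2.1} \]

    This chapter will contain an investigation of the critical values of $V(r^*)$ considered as a function of $p$ and $\rho$. We shall first prove two lemmas.

    LEMMA 1. If $p = \int_x^{\infty} \lambda(y)\,dy$, where $\lambda(y) = (2\pi)^{-1/2}\exp(-y^2/2)$, then $(1-2p)\lambda(x) > p(1-p)x$ for all $x > 0$.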

    [...] Setting $g'(x) = 0$ gives $p^2 - p + 2\lambda^2(x) = 0$, which for the interval $0 < x$ [...]

    $K(x) > 0$: $\lambda^2(x)(16x^2+8) < 1$ and $g(x)$ has a minimum at the point $x$.

    $K(x) < 0$: $\lambda^2(x)(16x^2+8) > 1$ and $g(x)$ has a maximum at the point $x$.

    Now consider the equation $K(x) = 0$, or defining $\mu(x) = \lambda^2(x)(16x^2+8)$, $\mu(x) = 1$. $\mu(x)$ is unimodal with its maximum at $x = 2^{-1/2}$ and minima at $x = 0, \infty$. Furthermore $\mu(0) = 1.27$, $\mu(\infty) = 0$. Therefore, $\mu(x) = 1$ has a single solution $x = x_1$ in the interval $0 < x$ [...]

    Proof: Assume that such a point $x_3$ exists. Then substituting from (2.5) in $g(x)$, we have by hypothesis

    \[ g(x_3) = \lambda(x_3)\left[1 - 8\lambda^2(x_3)\right]^{1/2} - 2x_3\lambda^2(x_3) < 0, \tag{2.6} \]

    or $\lambda^2(x_3)(4x_3^2+8) > 1$, which means that $(4x_3^2+8) > 2\pi\exp(x_3^2)$. The least value which $x_3$ can take is $x_1$, but $x_1 > 1.3$. So, substituting $x_3 = 1.3$ in the above, we have $14.76 > 2\pi\exp(1.69)$, which is false, and a fortiori false for all $x > 1.3$.

    Therefore, there are no critical values of $g(x)$ in the range $0 < x < \infty$ except for a maximum at $x_2$. Combining this result with (2.4), we have $g(x) > 0$ for all $x > 0$, which proves the lemma.
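    The conclusion of Lemma 1 and the arithmetic used in its proof can be checked numerically. The sketch below is a hedged illustration, not part of the original argument: it takes $g(x) = (1-2p)\lambda(x) - p(1-p)x$, which is the function consistent with the stated critical-point equation $p^2 - p + 2\lambda^2(x) = 0$ and with (2.6), and evaluates it on a grid. The grid, step size, and function names are arbitrary choices made here.

```python
from math import exp, pi
from statistics import NormalDist

std = NormalDist()


def g(x):
    """g(x) = (1 - 2p) lambda(x) - p (1 - p) x, with p the upper tail area beyond x."""
    p = 1.0 - std.cdf(x)
    return (1.0 - 2.0 * p) * std.pdf(x) - p * (1.0 - p) * x


def g_prime(x):
    """Closed form of g'(x) implied by the critical-point equation: p^2 - p + 2 lambda^2."""
    p = 1.0 - std.cdf(x)
    return p * p - p + 2.0 * std.pdf(x) ** 2


def mu(x):
    """mu(x) = lambda^2(x) (16 x^2 + 8)."""
    return std.pdf(x) ** 2 * (16.0 * x * x + 8.0)


grid = [k / 100 for k in range(1, 501)]           # x = 0.01, 0.02, ..., 5.00
print(all(g(x) > 0 for x in grid))                # positivity of g on the grid
print(round(mu(0.0), 2))                          # the value quoted as 1.27
print(4 * 1.3 ** 2 + 8, 2 * pi * exp(1.69))       # the two sides compared in the proof
```

    A grid check of this kind cannot replace the proof, but it guards against transcription errors in the inequality.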

    LEMMA 2. If $p = \int_x^{\infty} \lambda(y)\,dy$, where $\lambda(y) = (2\pi)^{-1/2}\exp(-y^2/2)$, then $p(1-p) \ge \lambda^2(x)\,\pi/2$ for all $x$, $-\infty < x < \infty$, with equality at $x = 0, \pm\infty$.

    Define $F(x) = \int_{-\infty}^{x} \lambda(y)\,dy$. Then $p(1-p) = (1-F)F$, and we wish to show $F(x)[1-F(x)] \ge [\exp(-x^2)]/4$. Let

    \[ h(x) = F(x) - F^2(x) - [\exp(-x^2)]/4. \]

    $h(x)$ is an even function of $x$, so we need study only the case $0 \le x < \infty$. Differentiating once produces

    \[ h'(x) = \lambda(x) - 2F(x)\lambda(x) + (x/2)\exp(-x^2), \]

    which with a little simplification becomes

    \[ h'(x) = \lambda(x)\left\{1 - 2F(x) + x(\pi/2)^{1/2}\exp(-x^2/2)\right\}. \tag{2.7} \]

    It is easily shown that

    (2.8) $h(x)$ is continuous, $h(0) = h(\infty) = 0$, $h'(0) = 0$.

    By the law of the mean for integrals $F(x) \le (1/2) + x(2\pi)^{-1/2}$. Hence, substituting in (2.7), we obtain

    \[ h'(x) \ge \lambda(x)\left\{1 - 2\left[x(2\pi)^{-1/2} + 1/2\right] + x(\pi/2)^{1/2}\exp(-x^2/2)\right\} = x\lambda(x)\left\{(\pi/2)^{1/2}\exp(-x^2/2) - (2/\pi)^{1/2}\right\}. \tag{2.9} \]

    From (2.9) it is clear that in some neighborhood of $x = 0$ we have [...]

    Now examine the function $G(x)$.

    \[ G'(x) = -2\lambda(x) + (\pi/2)^{1/2}(1-x^2)\exp(-x^2/2) = \lambda(x)\left\{\pi(1-x^2) - 2\right\}. \]

    In the interval $0 \le x$ [...] $> 0$ in some neighborhood about 0. In the light of these facts and (2.8) it is evident that $h(x) \ge 0$ for all $x$ in $0 < x < \infty$; otherwise, $h(x)$ would have at least two critical points in the interval. This proves the lemma.
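    Lemma 2 lends itself to the same kind of numerical check. The sketch below evaluates $h(x) = F(x) - F^2(x) - \exp(-x^2)/4$ on a grid and confirms that it is non-negative, with the value 0 at $x = 0$. It also confirms that $\lambda^2(x)\pi/2$ and $\exp(-x^2)/4$ are the same quantity. The grid and the tolerance for rounding error are arbitrary choices made for this illustration.

```python
from math import exp, pi
from statistics import NormalDist

std = NormalDist()


def h(x):
    """h(x) = F(x) - F(x)^2 - exp(-x^2) / 4, with F the standard normal distribution function."""
    big_f = std.cdf(x)
    return big_f - big_f * big_f - exp(-x * x) / 4.0


def bound(x):
    """Right-hand side of Lemma 2: lambda^2(x) * pi / 2."""
    return std.pdf(x) ** 2 * pi / 2.0


grid = [k / 100 for k in range(-500, 501)]        # x = -5.00, ..., 5.00
print(min(h(x) for x in grid))                    # smallest value of h on the grid
print(h(0.0))                                     # equality case at x = 0
```

    Since $p(1-p) = F(1-F)$, a non-negative $h$ on the grid is the inequality of the lemma evaluated at those points.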

    THEOREM I. $V(r^*)$ has an absolute minimum for any fixed $\rho$ at $p = 1/2$.

    Proof in three parts:

    PART 1: It is easily shown that $V(r^*)$ has a relative minimum at $p = 1/2$ for any fixed $\rho$. Indeed,

    \[ V(r^*) = N^{-1}\left\{\rho^4 - 2.5\rho^2 + \rho^2 A + B\right\}, \tag{2.12} \]

    where $A = pq\,\omega^2\lambda^{-2}(\omega) + (2p-1)\,\omega\,\lambda^{-1}(\omega)$, $B = pq\,\lambda^{-2}(\omega)$.

    Recalling that $d\omega/dp = -\lambda^{-1}(\omega)$ [...]

    and hence

    \[ \left.\frac{d^2V}{dp^2}\right|_{\omega=0} = 2\pi N^{-1}\left\{\rho^2(\pi-4) + \pi - 2\right\}. \]

    It is easily seen that [...]

    By Lemma 2, $pq\,\lambda^{-2}(\omega) \ge \pi/2$ with equality at $\omega = 0$ or $p = 1/2$.

    Combining parts 1 to 3, we have that the absolute minimum of $V(r^*)$ for any $\rho$ occurs at $p = 1/2$, which a fortiori proves the theorem.

    The points brought up in this chapter are adequately illustrated in Table I which gives the asymptotic standard deviation of $r^*$ as a function of $p$ and $\rho$.
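    Entries of the kind tabulated in Table I can be recomputed from (2.12). The sketch below is offered as an illustration under stated assumptions: $p$ is taken to be the upper tail area beyond $\omega$, as in Lemma 3 of Chapter III, and the function name `asymptotic_sd`, the sample size $N = 100$, and the grid of $p$ and $\rho$ values are hypothetical choices. Table I itself is not reproduced here.

```python
from math import pi, sqrt
from statistics import NormalDist

std = NormalDist()


def asymptotic_sd(p, rho, n):
    """Asymptotic standard deviation of r* from (2.12), for 0 < p < 1 and |rho| <= 1."""
    omega = std.inv_cdf(1.0 - p)        # cut value with upper tail area p
    lam = std.pdf(omega)
    q = 1.0 - p
    a = p * q * omega ** 2 / lam ** 2 + (2.0 * p - 1.0) * omega / lam
    b = p * q / lam ** 2
    return sqrt((rho ** 4 - 2.5 * rho ** 2 + rho ** 2 * a + b) / n)


# A small double-entry table: rows are p, columns are rho.
rhos = [0.0, 0.3, 0.6, 0.9]
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    row = "  ".join(f"{asymptotic_sd(p, rho, 100):.4f}" for rho in rhos)
    print(f"p = {p:.1f}:  {row}")
```

    At $p = 1/2$ the cut value is $\omega = 0$, so $A = 0$ and $B = \pi/2$, and the expression reduces to $N^{-1}(\rho^4 - 2.5\rho^2 + \pi/2)$; in each column of the printed table the smallest entry should therefore appear in the $p = 0.5$ row, in line with Theorem I.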

  • CHAPTER III

    A SPECIAL CASE OF THE LIMITING DISTRIBUTION OF $r^*$

    We shall need the following three theorems.

    THEOREM A [13]: Let $(V_{1N}, \ldots, V_{kN})$ and $(V_1, \ldots, V_k)$ be random vectors, where $N = 1, 2, \ldots$ If

    (a) $\operatorname{dlim}(V_{1N}, \ldots, V_{kN}) = (V_1, \ldots, V_k)$,

    (b) $b_1, b_2, \ldots$ is a sequence of real positive numbers such that $b_N \to 0$ as $N \to \infty$,

    (c) $H(v_1, v_2, \ldots, v_k)$ is a function of the real variables $v_1, \ldots, v_k$ which has a total differential at the point $(0, \ldots, 0)$, and

    (d) $H_i = \partial H/\partial v_i$ at the point $(0, \ldots, 0)$,

    then, if $W_N = b_N^{-1}\{H(b_N V_{1N}, \ldots, b_N V_{kN}) - H(0, \ldots, 0)\}$,

    \[ \operatorname{dlim} W_N = H_1V_1 + \cdots + H_kV_k. \]

    THEOREM B [1]: Let $Z_N = X_N + Y_N$, where $X_N$, $Y_N$, $Z_N$ are sequences of random variables. If

    (a) $\operatorname{dlim} X_N = X$,

    (b) $\operatorname{plim} Y_N = c$, where $c$ is a constant,

    then $\operatorname{dlim} Z_N = X + c$.

    THEOREM C [1]: Let $Z_N = X_N Y_N$, where $X_N$, $Y_N$, $Z_N$ are sequences of random variables. If

    (a) $\operatorname{dlim} X_N = X$,

    (b) $\operatorname{plim} Y_N = c$, where $c$ is a constant,

    then $\operatorname{dlim} Z_N = cX$.

    As a corollary to Theorem C, we have for $c = 0$ the result $\operatorname{plim} Z_N = 0$.

    Note: No condition of independence is required in Theorems B and C.

    LEMMA 3: If

    \[ m/N = \int_t^{\infty} \lambda(y)\,dy, \qquad p = \int_{\omega}^{\infty} \lambda(y)\,dy, \]

    where $m$ is the number of successes in $N$ trials with the probability of success, $p$, a constant for each trial, then the necessary and sufficient condition that

    \[ \operatorname{plim} N^{1/2}\left[\lambda(\omega) - \lambda(t)\right] = 0 \]

    is $\omega = 0$.

    Proof:

    \[ (m/N) - p = \int_t^{\omega} \lambda(y)\,dy, \]

    which by the law of the mean for integrals gives

    \[ (m/N) - p = -(t-\omega)\lambda(\bar{t}), \qquad t < \bar{t} < \omega. \tag{3.1} \]

    Now expand $\lambda(t)$ in a Taylor series about the point $\omega$,

    \[ \lambda(t) = \lambda(\omega) + \lambda'(\omega)(t-\omega) + o(t-\omega). \tag{3.2} \]

    $m$ is a binomial variable with parameters $N$ and $p$, so we have [...]

    We can write (3.2) in the form [...]

    Now as $N \to \infty$, the second term on the right tends in probability to 0 by the corollary to Theorem C, since $N^{1/2}(t-\omega)$ has a limiting law. From this result and Theorem B it can be seen that the corollary to Theorem C gives a necessary and sufficient condition in this case: namely,

    \[ \operatorname{plim} N^{1/2}\left[\lambda(\omega) - \lambda(t)\right] = 0 \]

    if and only if $\lambda'(\omega) = 0$. But $\omega$ is a fixed finite number and $\lambda'(\omega) = -\omega\lambda(\omega)$, so $\omega$ must be 0. This proves the lemma.

    1. Derivation of the Distribution.

    We start with expression (1.2) for the biserial correlation coefficient.

    \[ r^* = \frac{\left(N^{-1}\sum x_i z_i - \bar{x}\,\bar{z}\right)\lambda^{-1}(t)}{\left[N^{-1}\sum (x_i - \bar{x})^2\right]^{1/2}}. \]

    This expression may be written in the form

    \[ r^* = \left(\overline{xz} - \bar{x}\,\bar{z}\right)\left(\overline{x^2} - \bar{x}^2\right)^{-1/2}\lambda^{-1}(t). \tag{3.4} \]

    Thus $r^*$ is essentially a function of sample means. It is evident that $r^*$ is invariant under a transformation of the form $y = x/\sigma$. Hence, in the model for $r^*$ put $\sigma = 1$.

    For our purposes we will need essentially a reformulation of Theorem A. Let $(T_{1\alpha}, T_{2\alpha}, T_{3\alpha}, T_{4\alpha})$ $(\alpha = 1, 2, \ldots, N)$ be a set of independent, identically distributed random vectors with finite covariances $\sigma_{ij}$. Now define

    (a) $T_{iN} = \bar{T}_i - ET_i$ $(i = 1, \ldots, 4)$,

    (b) $b_N = N^{-1/2}$,

    (c) $H(v_1, \ldots, v_4) = (v_1 - v_2v_3)(v_4 - v_2^2)^{-1/2}$,

    (d) $V_{iN} = N^{1/2}\,T_{iN}$.

    By the generalization of the Lindeberg-Levy form of the Central Limit Theorem to the case of random vectors, we have

    [...]

    where $(V_1, \ldots, V_4)$ is a normal random vector with mean $(0, \ldots, 0)$ and covariance matrix $(\sigma_{ij})$, $i, j = 1, \ldots, 4$. We have now satisfied the conditions of Theorem A, and so

    \[ \operatorname{dlim} W_N = \operatorname{dlim} N^{1/2}\left\{H(T_{1N}, \ldots, T_{4N}) - H(0, \ldots, 0)\right\} = N\!\left(0, \Big\{\sum_i\sum_j H_iH_j\sigma_{ij}\Big\}^{1/2}\right), \tag{3.6} \]

    where [...]

    Now in order actually to find the variance of the limiting normal distribution, we adopt a mechanical technique due to P. L. Hsu which amounts to the same thing as using the form of Theorem A above, but which saves labor. What Theorem A actually tells us is to use a two-term Taylor expansion of the function of sample means about their expected values.

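    Define variables of the form $T' = N^{1/2}(\bar{T} - ET)$. Then,

    \[ \overline{xz} = N^{-1/2}(xz)' + EXZ, \quad \bar{x} = N^{-1/2}x', \quad \bar{z} = N^{-1/2}z' + p, \quad \overline{x^2} = N^{-1/2}(x^2)' + 1. \tag{3.7} \]

    Upon substituting expressions (3.7) in (3.4) we obtain

    \[ r^* = \left\{\left(N^{-1/2}(xz)' + EXZ\right) - \left(N^{-1/2}x'\right)\left(N^{-1/2}z' + p\right)\right\}\left\{N^{-1/2}(x^2)' + 1 - \left(N^{-1/2}x'\right)^2\right\}^{-1/2}\lambda^{-1}(t). \tag{3.8} \]

    Now, the efficiency of Hsu's technique comes in combining the terms in (3.8), removing those which are $o(N^{-1/2})$. Using a two-term negative binomial expansion for the second term, we get

    [...]

    Now by Theorem C, $N^{1/2}\left\{r^* - (EXZ)\lambda^{-1}(t)\right\}$ will have the same limiting law as

    \[ \lambda^{-1}(\omega)\left\{(xz)' - p\,x' - (EXZ)(x^2)'/2\right\}. \]

    Applying Theorem A, we have the result

    \[ \operatorname{dlim} N^{1/2}\left[r^* - (EXZ)\lambda^{-1}(t)\right] = N\!\left(0, \lambda^{-1}(\omega)\left\{V\!\left(XZ - pX - (EXZ)X^2/2\right)\right\}^{1/2}\right). \tag{3.11} \]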

    Unfortunately, this is not quite satisfactory, since what we desire is the limiting distribution of [...]

    It can easily be seen that the two expressions are not the same. Indeed,

    (3.12) $N^{1/2}\left[r^* - (EXZ)\lambda^{-1}(\omega)\right] = N^{1/2}\left[r^* - \right.$ [...]

    Integration of $b_k$ by parts gives the recursion relationship [...]

    The $\alpha_{kn}$ may now be written down. Note that $\alpha_{kn}$ is independent of $k$ for $k \ge 1$.

    Making use of the facts that $b_0 = p$, $b_1 = \lambda(\omega)$, $\omega = 0$, we compute the following constants. [...]

    We now have from (3.13), [...]

    Hence, from (3.15),

    \[ \operatorname{dlim} N^{1/2}(r^* - \rho) = N\!\left(0, \left\{\rho^4 - 2.5\rho^2 + \pi/2\right\}^{1/2}\right). \tag{3.16} \]

    Among other things (3.16) provides a check for Soper's result (2.1).

    2. Variance Stabilizing Transformation for the Case $\omega = 0$.

    It is desirable to find a function $f(r^*)$ such that

    \[ \operatorname{dlim} N^{1/2}\left[f(r^*) - f(\rho)\right] = N(0, 1). \]

    This function $f(r^*)$ has several advantages:

    (a) Its standard error is practically independent of $\rho$;

    (b) $f(r^*)$ will tend to normality faster than $r^*$ no matter what value $\rho$ takes on;

    (c) The form of the distribution will be nearly the same for all $\rho$ for moderate $N$, while the distribution of $r^*$ for large $N$ will have a variance $N^{-1}(\rho^4 - 2.5\rho^2 + \pi/2)$ which is markedly changed, becoming peaked for $\rho$ near $\pm 1$, and flat for $\rho$ near 0.

    In connection with this type of transformation see [6] Chapter VI.

    Consider a Taylor expansion of $f(r^*)$ about $\rho$ as far as terms of order $N^{-1/2}$.

    \[ f(r^*) = f(\rho) + f'(\rho)(r^* - \rho) + o(N^{-1/2}). \tag{3.17} \]

    $E\left[f(r^*) - f(\rho)\right]^2 = 1/N$ by hypothesis.

    $E\left[f(r^*) - f(\rho)\right]^2 = \left[f'(\rho)\right]^2 E(r^* - \rho)^2 + o(N^{-1})$ by (3.17).

    Then aside from terms $o(N^{-1})$ we have, equating the two previous results, a differential equation

    \[ \left[f'(\rho)\right]^2\left(\rho^4 - 2.5\rho^2 + \pi/2\right) = 1, \tag{3.18} \]

    which can be rewritten as [...]

  • CHAPTER IV

    EFFICIENCY OF THE COEFFICIENT OF BISERIAL CORRELATION

    1. Preliminary Matters

    From the mathematical model given in Chapter I, Section 2, we know that

    \[ P(X \le x, Z = 0) = \int_{-\infty}^{x}\int_{-\infty}^{\omega} \phi(x,y)\,dy\,dx, \quad\text{and} \]

    \[ P(X \le x, Z = 1) = \int_{-\infty}^{x}\int_{\omega}^{\infty} \phi(x,y)\,dy\,dx. \]

    Therefore, the probability element of the sample $(x_1, z_1), \ldots, (x_N, z_N)$ may be written

    (4.1) [...]

    The Method of Maximum Likelihood will now be invoked in order to give us estimates $\hat\rho$, $\hat\omega$, $\hat\sigma$ of the population parameters $\rho$, $\omega$, $\sigma$ respectively. These estimates have several nice properties. They are consistent, tend in distribution to normality as the sample size increases, have minimum variance in the limit at least, and provide sufficient statistics where any exist. The Method of Maximum Likelihood was first introduced by Fisher in 1921 [5], and was substantially improved upon by the same author in 1925 [7]. These papers are both described along with other pertinent material by Kendall in Chapter XVII of his book [9]. For a more rigorous treatment the reader is referred to Chapters XXXII and XXXIII of Cramér [1].

    The likelihood function of a sample $(x_1, z_1), \ldots, (x_N, z_N)$ is defined to be

    \[ L = \prod_{i=1}^{N} f(x_i, z_i). \tag{4.2} \]

    It will, as usual, be convenient to take logarithms and maximize $\log L$.
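    A log likelihood of this kind can be evaluated without numerical integration over $y$. The sketch below is a hedged illustration under assumptions that go beyond the text: it takes $f(x, 1) = \int_{\omega}^{\infty}\phi(x,y)\,dy$ and $f(x, 0) = \int_{-\infty}^{\omega}\phi(x,y)\,dy$, with zero means as in the definition of $\phi(x,y)$, and uses the standard fact that, given $X = x$, the variable $Y$ is normal with mean $\rho x/\sigma$ and variance $1-\rho^2$. The function names and the sample are hypothetical, and $|\rho| < 1$ is required.

```python
from math import log, sqrt
from statistics import NormalDist

std = NormalDist()


def density(x, z, rho, omega, sigma):
    """f(x, z): integral of phi(x, y) over y >= omega (z = 1) or y < omega (z = 0)."""
    marginal = std.pdf(x / sigma) / sigma                      # density of X
    upper = 1.0 - std.cdf((omega - rho * x / sigma) / sqrt(1.0 - rho * rho))
    return marginal * (upper if z == 1 else 1.0 - upper)


def log_likelihood(sample, rho, omega, sigma):
    """log L as the sum of log f(x_i, z_i) over the sample."""
    return sum(log(density(x, z, rho, omega, sigma)) for x, z in sample)


# Hypothetical sample of (x_i, z_i) pairs.
sample = [(-1.2, 0), (-0.4, 0), (0.1, 1), (0.3, 0), (0.9, 1), (1.7, 1)]
print(log_likelihood(sample, rho=0.5, omega=0.0, sigma=1.0))
print(log_likelihood(sample, rho=-0.5, omega=0.0, sigma=1.0))
```

    Maximizing this function over $\rho$, $\omega$, $\sigma$ would require a numerical optimizer, which is outside the scope of the sketch.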

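    Because of the complexities inherent in the calculations of first and second partial derivatives of $\log L$, the work will have to be done in stages. The following notation will be used

    (4.4)

    (i) $\dfrac{\partial f_0}{\partial\theta_j} = \left.\dfrac{\partial \log f(x,z)}{\partial\theta_j}\right|_{z=0}$,

    (ii) $\dfrac{\partial^2 f_0}{\partial\theta_j\,\partial\theta_k} = \left.\dfrac{\partial^2 \log f(x,z)}{\partial\theta_j\,\partial\theta_k}\right|_{z=0}$,

    (iii) $\dfrac{\partial f_1}{\partial\theta_j} = \left.\dfrac{\partial \log f(x,z)}{\partial\theta_j}\right|_{z=1}$,

    (iv) $\dfrac{\partial^2 f_1}{\partial\theta_j\,\partial\theta_k} = \left.\dfrac{\partial^2 \log f(x,z)}{\partial\theta_j\,\partial\theta_k}\right|_{z=1}$,

    where $\theta_j$, $\theta_k$ run over the parameters $\rho$, $\omega$, $\sigma$ and $f(x,z)$ is defined by (4.1).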

    (v) $\int_{-\infty}^{a} \phi(x,y)\,dy = R(a)$, $\int_{a}^{\infty} \phi(x,y)\,dy = S(a)$,

    (vi) $E_0(\partial f_0/\partial\theta_j) = q^{-1}\int_{-\infty}^{\infty}\int_{-\infty}^{\omega} (\partial f_0/\partial\theta_j)\,\phi(x,y)\,dy\,dx$, the conditional expectation of $\partial \log f/\partial\theta_j$ given $z = 0$;

    (vii) $E_1(\partial f_1/\partial\theta_j) = p^{-1}\int_{-\infty}^{\infty}\int_{\omega}^{\infty} (\partial f_1/\partial\theta_j)\,\phi(x,y)\,dy$ [...]

    [...] $\sum_{i=1}^{N} (1-z_i)\int_{-\infty}^{\omega} (\partial\phi(x_i,y)/\partial\sigma)\,dy + z_i\int_{\omega}^{\infty} (\partial\phi(x_i,y)/\partial$ [...]

    [...] which will in the limit be the minimum variance among all variances of estimates of $\rho$ as defined by our model. $V(\hat\rho)$ can be obtained once we have the elements of the information matrix.

    2. Determination of the Information Matrix.

    By definition the information matrix is

    [...]

    and we have

    (4.8) [...]

    where $\Lambda = (\sigma_{ij})$, the matrix of covariances of $\hat\rho$, $\hat\omega$, $\hat\sigma$. The derivation of the elements of the information matrix will be given in two parts, each consisting of several cases.

    Part I, Case 1:

    Using the notation of (4.4) we have

    27

    .,?t/a f2 = fll((j» j (a2;/af 2)dy_( J(?Jt)/af )dy)1 1l-2((j»~ -00 -00

    00 en2 I 2) -1 f f 2 2EO(d f

    O'0 J =q ¢{x,y)('O fOldf )dy dx,

    -00 -00

    which becomes

    00 en 00 en 2q-l J Jccl¢lof2)d:y dx_q-l J(f (-a¢I'Of )dy) R-l(w)dx.

    -00 -00 ) -00 -00 ..-I'- ~ \.. ., ..

    A B

    The well known results

    are to be used extensively in this chapter.

    It is important to notice, first of all, that A and B are invariant under a change of scale of the X variable. Accordingly, we can replace $x/\sigma$ by $x$, or in other words let $\sigma = 1$. In view of (4.9)

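    $A = q^{-1}\int_{-\infty}^{\infty} \left.\dfrac{\partial^3\phi(x,y)}{\partial x^2\,\partial y}\right|_{y=\omega} dx.$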

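    Putting $\sigma = 1$ in $\phi(x,y)$, and differentiating, we obtain

    (4.10)

    $\partial\phi/\partial x = -(x-\rho y)\,\phi(x,y)\,(1-\rho^2)^{-1}$

    $\partial\phi/\partial y = -(y-\rho x)\,\phi(x,y)\,(1-\rho^2)^{-1}$

    $\dfrac{\partial^3\phi(x,y)}{\partial x^2\,\partial y} = -\phi(x,y)\left\{ \dfrac{2\rho(x-\rho y)}{(1-\rho^2)^2} + \left[\dfrac{(x-\rho y)^2}{(1-\rho^2)^2} - \dfrac{1}{1-\rho^2}\right]\dfrac{(y-\rho x)}{1-\rho^2} \right\}$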
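    The derivatives in (4.10) can be checked numerically. The sketch below assumes that $\phi(x,y)$ with $\sigma = 1$ is the standard bivariate normal density with zero means, unit variances and correlation $\rho$, and compares the closed forms for $\partial\phi/\partial x$ and $\partial^3\phi/\partial x^2\partial y$ with central finite differences. All function names are hypothetical.

```python
import math


def phi(x, y, rho):
    """Standard bivariate normal density (assumed form of phi with sigma = 1)."""
    one = 1.0 - rho * rho
    quad = (x * x - 2.0 * rho * x * y + y * y) / (2.0 * one)
    return math.exp(-quad) / (2.0 * math.pi * math.sqrt(one))


def dphi_dx(x, y, rho):
    """Closed form of the first line of (4.10)."""
    return -(x - rho * y) * phi(x, y, rho) / (1.0 - rho * rho)


def d3phi_dx2dy(x, y, rho):
    """Closed form of the third line of (4.10)."""
    one = 1.0 - rho * rho
    a, b = x - rho * y, y - rho * x
    return -phi(x, y, rho) * (2.0 * rho * a / one**2
                              + (a * a / one**2 - 1.0 / one) * b / one)


def num_dx(x, y, rho, h=1e-5):
    """Central difference for the first x-derivative."""
    return (phi(x + h, y, rho) - phi(x - h, y, rho)) / (2.0 * h)


def num_dx2(x, y, rho, h=1e-3):
    """Central second difference in x."""
    return (phi(x + h, y, rho) - 2.0 * phi(x, y, rho) + phi(x - h, y, rho)) / (h * h)


def num_dx2dy(x, y, rho, h=1e-3):
    """Central difference in y of the second x-difference."""
    return (num_dx2(x, y + h, rho, h) - num_dx2(x, y - h, rho, h)) / (2.0 * h)


x, y, rho = 0.8, -0.3, 0.5
print(dphi_dx(x, y, rho), num_dx(x, y, rho))
print(d3phi_dx2dy(x, y, rho), num_dx2dy(x, y, rho))
```

    The two numbers on each printed line should agree to within the accuracy of the finite differences.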

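    Substituting in A from (4.10), and letting

    $z = (x-\rho\omega)(1-\rho^2)^{-1/2},$

    we obtain

    $A = -q^{-1}(2\pi)^{-1/2}\exp(-\omega^2/2)(1-\rho^2)^{-2}\int_{-\infty}^{\infty} \left\{ (z^2-1)\left[\omega(1-\rho^2) - z\rho(1-\rho^2)^{1/2}\right] + 2z\rho(1-\rho^2)^{1/2} \right\} h(z)\,dz.$

    $h(z)$ is the frequency function of a variable which is N(0,1), so A = 0.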

    Now consider B:

    $\left[\int_{-\infty}^{\omega} (\partial\phi/\partial\rho)\,dy\right]^2 = (x-\rho\omega)^2\,\phi^2(x,\omega)\,(1-\rho^2)^{-2}.$

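    $\partial^2 f_1/\partial\rho^2 = \left\{ S(\omega)\int_{\omega}^{\infty} (\partial^2\phi/\partial\rho^2)\,dy - \left[\int_{\omega}^{\infty} (\partial\phi/\partial\rho)\,dy\right]^2 \right\} S^{-2}(\omega).$

    $E_1(\partial^2 f_1/\partial\rho^2)$ and $E_0(\partial^2 f_0/\partial\rho^2)$ are the same except for the limits of integration on $y$ and $q$, which is now replaced by $p$. With the aid of the fundamental relation (4.5), the following may be written:

    (4.11)    $E\,\dfrac{\partial^2 \log L}{\partial\rho^2} = -N\int_{-\infty}^{\infty} \phi^2(x,\omega)\,(x-\rho\omega)^2\,(1-\rho^2)^{-2}\left\{R^{-1}(\omega)+S^{-1}(\omega)\right\}dx.$

    Case 2: $E\,\partial^2 \log L/\partial\omega^2$.

    $\partial^2 f_0/\partial\omega^2 = \left\{ R(\omega)\,(\partial\phi(x,\omega)/\partial\omega) - \phi^2(x,\omega) \right\} R^{-2}(\omega).$

    Again let $\sigma = 1$.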

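    $E_0(\partial^2 f_0/\partial\omega^2) = q^{-1}\int_{-\infty}^{\infty} (\partial\phi(x,\omega)/\partial\omega)\,dx - q^{-1}\int_{-\infty}^{\infty} \phi^2(x,\omega)\,R^{-1}(\omega)\,dx.$

    From (4.10), the first term equals $-q^{-1}\int_{-\infty}^{\infty} (\omega-\rho x)(1-\rho^2)^{-1}\phi(x,\omega)\,dx$. Let $z = (x-\rho\omega)(1-\rho^2)^{-1/2}$. Then we get

    $-q^{-1}(2\pi)^{-1}\exp(-\omega^2/2)\int_{-\infty}^{\infty} \left[\omega - \rho z(1-\rho^2)^{-1/2}\right]\exp(-z^2/2)\,dz = -q^{-1}(2\pi)^{-1/2}\,\omega\exp(-\omega^2/2).$

    Hence,

    $E_0(\partial^2 f_0/\partial\omega^2) = -q^{-1}(2\pi)^{-1/2}\,\omega\exp(-\omega^2/2) - q^{-1}\int_{-\infty}^{\infty} \phi^2(x,\omega)\,R^{-1}(\omega)\,dx.$

    Again $E_1(\partial^2 f_1/\partial\omega^2)$ is almost the same as $E_0(\partial^2 f_0/\partial\omega^2)$:

    $E_1(\partial^2 f_1/\partial\omega^2) = p^{-1}(2\pi)^{-1/2}\,\omega\exp(-\omega^2/2) - p^{-1}\int_{-\infty}^{\infty} \phi^2(x,\omega)\,S^{-1}(\omega)\,dx.$

    From (4.5) we obtain

    (4.12)    $E\,\dfrac{\partial^2 \log L}{\partial\omega^2} = -N\int_{-\infty}^{\infty} \phi^2(x,\omega)\left\{R^{-1}(\omega)+S^{-1}(\omega)\right\}dx.$

    Case 3: $E\,\partial^2 \log L/\partial\omega\,\partial\rho$.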

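    We see immediately from (4.10) that the first term vanishes. In a similar way we can obtain

    $E_1(\partial^2 f_1/\partial\omega\,\partial\rho) = -p^{-1}\int_{-\infty}^{\infty} (\partial\phi(x,\omega)/\partial\rho)\,dx + p^{-1}\int_{-\infty}^{\infty} \phi^2(x,\omega)\,(x-\rho\omega)\,(1-\rho^2)^{-1}S^{-1}(\omega)\,dx,$

    in which the first term vanishes. Hence, by (4.5),

    (4.13)    $E\,\dfrac{\partial^2 \log L}{\partial\omega\,\partial\rho} = N\int_{-\infty}^{\infty} (x-\rho\omega)\,\phi^2(x,\omega)\,(1-\rho^2)^{-1}\left\{R^{-1}(\omega)+S^{-1}(\omega)\right\}dx.$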

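    Case 4: $E\,\partial^2 \log L/\partial\omega\,\partial\sigma$.

    (4.14)    $\partial\phi/\partial\sigma = \phi(x,y)\left[\,x^2(1-\rho^2)^{-1}\sigma^{-3} - \rho xy(1-\rho^2)^{-1}\sigma^{-2} - \sigma^{-1}\right].$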

    [f 2x2l(1- 1'2)-2- 3x2 (1-.f2)-~

    +4 f'xy(l-5'2)-10--3->20--2J00 '

    The integral $\int_{-\infty}^{\infty} (\partial\phi(x,\omega)/\partial\sigma)\,dx$ is taken by the transformation $z = (x-\rho\omega\sigma)(1-\rho^2)^{-1/2}\sigma^{-1}$ into the form

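    Case 5: $E\,\partial^2 \log L/\partial\rho\,\partial\sigma$.

    Let $r = (x-\rho\omega\sigma)\,\phi(x,\omega)\,\sigma^{-2}(1-\rho^2)^{-1}$. Then,

    $\partial^2 f_0/\partial\rho\,\partial\sigma = \left[ -R(\omega)\,(\partial r/\partial\sigma) + r\int_{-\infty}^{\omega} (\partial\phi/\partial\sigma)\,dy \right] R^{-2}(\omega),$

    $\partial^2 f_1/\partial\rho\,\partial\sigma = \left[ S(\omega)\,(\partial r/\partial\sigma) - r\int_{\omega}^{\infty} (\partial\phi/\partial\sigma)\,dy \right] S^{-2}(\omega).$

    Using (4.5) and substituting for $r$, we obtain

    (4.16)    $E\,\dfrac{\partial^2 \log L}{\partial\rho\,\partial\sigma} = N\int_{-\infty}^{\infty} (x-\rho\omega\sigma)\,\sigma^{-2}(1-\rho^2)^{-1}\phi(x,\omega)\left\{ R^{-1}(\omega)\int_{-\infty}^{\omega} (\partial\phi/\partial\sigma)\,dy - S^{-1}(\omega)\int_{\omega}^{\infty} (\partial\phi/\partial\sigma)\,dy \right\}dx.$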

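    Case 6: $E\,\partial^2 \log L/\partial\sigma^2$.

    $\partial^2 f_0/\partial\sigma^2 = \left\{ R(\omega)\int_{-\infty}^{\omega} (\partial^2\phi/\partial\sigma^2)\,dy - \left[\int_{-\infty}^{\omega} (\partial\phi/\partial\sigma)\,dy\right]^2 \right\} R^{-2}(\omega),$

    $\partial^2 f_1/\partial\sigma^2 = \left\{ S(\omega)\int_{\omega}^{\infty} (\partial^2\phi/\partial\sigma^2)\,dy - \left[\int_{\omega}^{\infty} (\partial\phi/\partial\sigma)\,dy\right]^2 \right\} S^{-2}(\omega).$

    It may be shown after some tedious algebra that

    $Nq\int_{-\infty}^{\infty}\int_{-\infty}^{\omega} q^{-1}(\partial^2\phi/\partial\sigma^2)\,dy\,dx + Np\int_{-\infty}^{\infty}\int_{\omega}^{\infty} p^{-1}(\partial^2\phi/\partial\sigma^2)\,dy\,dx = N\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\partial^2\phi/\partial\sigma^2)\,dy\,dx = 0.$

    (4.17)    $E\,\dfrac{\partial^2 \log L}{\partial\sigma^2} = -N\int_{-\infty}^{\infty} \phi(x,\omega)\left\{ R^{-1}(\omega)\left[\int_{-\infty}^{\omega} (\partial\phi/\partial\sigma)\,dy\right]^2 + S^{-1}(\omega)\left[\int_{\omega}^{\infty} (\partial\phi/\partial\sigma)\,dy\right]^2 \right\}dx.$

    Part II.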

    The integral expressions for the elements of the information matrix appear to be rather formidable. After forming the matrix, we will want its inverse; in particular, we want the element $V(\hat\rho)$ in its inverse. In order to examine the efficiency of $r^*$ as $\rho$ tends to 1, $\rho$ must be allowed to tend to 1 in the final expression for $V(\hat\rho)$. Here bad complications enter, since it can be determined that none of the expected values (4.11), (4.12), (4.13), (4.15), (4.16) exists when $\rho$ tends to 1. They all behave satisfactorily over the whole $x$ range $-\infty < x$ ...

    ... is expressible as a product $C(1-\rho)^{-k}H$, where $C$ is a function of $\rho$, $\omega$ which remains finite as $\rho$ tends to 1, $k$ is a positive integer, and $H$ is an integral which exists for all $\rho$, then we can invert the information matrix and see what happens to the element $V(\hat\rho)$ as $\rho$ tends to 1. Obviously, $\lim_{\rho\to 1} V(\hat\rho)$ will be a finite constant $\ge 0$, since the efficiency of a statistic must lie between 0 and 1, and we already know that $V(r^*)$ is finite from Chapter II.

    Now make the following definitions.

    (4.18)    $M = \phi^2(x,\omega)\left\{R^{-1}(\omega) + S^{-1}(\omega)\right\},$

    $G(u) = \int_{u}^{\infty} \exp(-t^2/2)\,dt.$
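    For numerical work, $G(u)$ can be written with the complementary error function, since $\int_u^\infty \exp(-t^2/2)\,dt = \sqrt{\pi/2}\,\mathrm{erfc}(u/\sqrt{2})$. The sketch below is only an illustration; the helper `recip_sum` is a hypothetical name for the combination $G^{-1}(v) + G^{-1}(-v)$ that appears in later expressions such as (4.21), where $G^{-1}$ is read as the reciprocal $1/G$.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def G(u):
    """G(u) = integral from u to infinity of exp(-t^2/2) dt."""
    return math.sqrt(math.pi / 2.0) * math.erfc(u / math.sqrt(2.0))


def recip_sum(v):
    """1/G(v) + 1/G(-v); an even function of v."""
    return 1.0 / G(v) + 1.0 / G(-v)


print(G(0.0), SQRT_2PI / 2.0)      # G(0) is half of sqrt(2*pi)
print(G(1.0) + G(-1.0), SQRT_2PI)  # the two tails add up to sqrt(2*pi)
```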

    Let

    the two exponential expressions in (4.19) to the form $\exp(-z^2/2)\cdot$(a function of $\rho$, $\omega$). The exponential can be put in the form

    $\exp\left\{ -(1+\rho^2)\left[2(1-\rho^2)\right]^{-1}\left[x - 2\rho\omega(1+\rho^2)^{-1}\right]^2 - \omega^2(1+\rho^2)^{-1} \right\};$

    whence it is seen that the desired transformation is

    (4.20)    $T_1:\quad z = (1+\rho^2)^{1/2}(1-\rho^2)^{-1/2}\left[x - 2\rho\omega(1+\rho^2)^{-1}\right],$

    which produces

    $x = (1-\rho^2)^{1/2}(1+\rho^2)^{-1/2}z + 2\rho\omega(1+\rho^2)^{-1},$

    $(\rho x-\omega)(1-\rho^2)^{-1/2} = \rho z(1+\rho^2)^{-1/2} - \omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1} = z_1,$

    $x-\rho\omega = (1-\rho^2)^{1/2}\left[z(1+\rho^2)^{-1/2} + \rho\omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1}\right],$

    and $dx = (1-\rho^2)^{1/2}(1+\rho^2)^{-1/2}\,dz$.

    Upon performing these substitutions in (4.19), we obtain

    (4.21)    $M = (2\pi)^{-1}(1-\rho^2)^{-1/2}(1+\rho^2)^{-1/2}\exp\left(-\omega^2/(1+\rho^2)\right)\exp(-z^2/2)\left\{G^{-1}(z_1) + G^{-1}(-z_1)\right\}.$

    Case 1: Consider integrals of the form

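    $I_k = \int_{-\infty}^{\infty} M\,(x-\rho\omega)^k(1-\rho^2)^{-k}\,dx.$

    This is the form of (4.11), (4.12), (4.13). Upon applying (4.21) and using the transform of $(x-\rho\omega)$ given in (4.20) on $I_k$, one obtains immediately

    (4.22)    $I_k = (2\pi)^{-1}\left[(1-\rho^2)(1+\rho^2)\right]^{-(k+1)/2}\exp\left(-\omega^2/(1+\rho^2)\right)\int_{-\infty}^{\infty} \left\{z + \rho\omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1/2}\right\}^k \exp(-z^2/2)\left\{G^{-1}(z_1) + G^{-1}(-z_1)\right\}dz.$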

    It is easy to establish the fact that the integral in $I_k$ exists for all finite $k \ge 0$, even if $\rho$ tends to $\pm 1$. We have

    $z_1 \to z\,(2)^{-1/2}$ as $\rho \to 1$,

    $z_1 \to -z\,(2)^{-1/2}$ as $\rho \to -1$,

    while the balance of the integrand tends to $z^k\exp(-z^2/2)$. It is easily shown by de l'Hospital's rule that

    $\int_{-\infty}^{\infty} z^k\exp(-z^2/2)\left\{G^{-1}(z\mu) + G^{-1}(-z\mu)\right\}dz$

    must exist for all $|\mu| < 1$. In our case $\mu = 2^{-1/2}$, so we have existence. The factor multiplying the integral of course becomes infinite like $(1\mp\rho)^{-(k+1)/2}$ as $\rho \to \pm 1$. Note that the integrand in $I_k$ will, in the limit, be an odd function if $k$ is odd, and an even function if $k$ is even.
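    The existence claim can be illustrated numerically. The sketch below assumes that $G^{-1}$ denotes the reciprocal $1/G$ and integrates $z^k\exp(-z^2/2)\{G^{-1}(z\mu)+G^{-1}(-z\mu)\}$ with the trapezoidal rule over two finite ranges for $\mu = 2^{-1/2}$. If the integral exists, widening the range should hardly change the value, and odd $k$ should give essentially zero. The function names and the choice of ranges are hypothetical.

```python
import math


def G(u):
    """G(u) = integral from u to infinity of exp(-t^2/2) dt."""
    return math.sqrt(math.pi / 2.0) * math.erfc(u / math.sqrt(2.0))


def integrand(z, k, mu):
    """z^k exp(-z^2/2) {1/G(mu z) + 1/G(-mu z)}."""
    return z**k * math.exp(-0.5 * z * z) * (1.0 / G(mu * z) + 1.0 / G(-mu * z))


def integral(k, mu, half_width, steps=20000):
    """Trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.5 * (integrand(-half_width, k, mu) + integrand(half_width, k, mu))
    for i in range(1, steps):
        total += integrand(-half_width + i * h, k, mu)
    return total * h


mu = 2.0 ** -0.5
for k in (0, 1, 2):
    # Compare a moderate range with a wider one.
    print(k, integral(k, mu, 12.0), integral(k, mu, 20.0))
```

    For $|\mu| < 1$ the integrand behaves for large $|z|$ roughly like a power of $z$ times $\exp\{-z^2(1-\mu^2)/2\}$, which is why the truncated integrals settle down quickly.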

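    Expression (4.22) then offers us a partial evaluation of three of the elements of the information matrix:

    $E\,\dfrac{\partial^2 \log L}{\partial\rho^2} = -N I_2, \qquad E\,\dfrac{\partial^2 \log L}{\partial\rho\,\partial\omega} = N I_1, \qquad E\,\dfrac{\partial^2 \log L}{\partial\omega^2} = -N I_0.$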

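    Case 2: $E\,\partial^2 \log L/\partial\omega\,\partial\sigma$.

    Recall expressions (4.14) and (4.15). Upon substituting $(\partial\phi/\partial\sigma)$ from (4.14) in (4.15), we have some cancellation, and (4.15) becomes

    (4.24)    $N\int_{-\infty}^{\infty} \sigma^{-2}\rho(1-\rho^2)^{-1}x\,\phi(x,\omega)\left[ R^{-1}(\omega)\int_{-\infty}^{\omega} y\,\phi(x,y)\,dy - S^{-1}(\omega)\int_{\omega}^{\infty} y\,\phi(x,y)\,dy \right]dx.$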

    Now, when we replace $x/\sigma$ by $x$, we have the same expression except that $\sigma^{-2}$ is replaced by $\sigma^{-1}$, and the rest of the integrand contains no $\sigma$. Performing the transformation $t = (y-\rho x)(1-\rho^2)^{-1/2}$ on the expression in brackets, we obtain

    $N\rho(1-\rho^2)^{-1}\sigma^{-1}\int_{-\infty}^{\infty} x\,\phi(x,\omega)\left[ \dfrac{\int_{-\infty}^{(\omega-\rho x)(1-\rho^2)^{-1/2}} \left[t(1-\rho^2)^{1/2}+\rho x\right]e^{-t^2/2}\,dt}{\int_{-\infty}^{(\omega-\rho x)(1-\rho^2)^{-1/2}} e^{-t^2/2}\,dt} - \dfrac{\int_{(\omega-\rho x)(1-\rho^2)^{-1/2}}^{\infty} \left[t(1-\rho^2)^{1/2}+\rho x\right]e^{-t^2/2}\,dt}{\int_{(\omega-\rho x)(1-\rho^2)^{-1/2}}^{\infty} e^{-t^2/2}\,dt} \right]dx.$

    After we cancel the right members in the inner integrands and revise the limits on the first inner integrals, we obtain

    Performing the integration in the numerators of the bracketed expression, we get

    (4.25)    $-\rho N\sigma^{-1}(1-\rho^2)^{-1/2}\int_{-\infty}^{\infty} x\,\phi(x,\omega)\exp\left[-2^{-1}(1-\rho^2)^{-1}(\omega-\rho x)^2\right]\left\{ G^{-1}\left[(\rho x-\omega)(1-\rho^2)^{-1/2}\right] + G^{-1}\left[(\omega-\rho x)(1-\rho^2)^{-1/2}\right] \right\}dx.$

    By comparing (4.25) with expression (4.19) for $M$, one can quickly determine that (4.25) may be written as

    $-\rho N\sigma^{-1}\int_{-\infty}^{\infty} xM\,dx.$

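    Now, applying $T_1$ as defined in (4.20) we arrive at an expression similar to (4.22) in exactly the same way as before. The result is

    (4.26)    $E\,\dfrac{\partial^2 \log L}{\partial\omega\,\partial\sigma} = -N\rho\left[2\pi\sigma(1+\rho^2)\right]^{-1}(1-\rho)^{-1/2}(1+\rho)^{-1/2}\exp\left(-\omega^2/(1+\rho^2)\right)\int_{-\infty}^{\infty} \left[z(1-\rho^2)^{1/2} + 2\rho\omega(1+\rho^2)^{-1/2}\right]\exp(-z^2/2)\left[G^{-1}(z_1)+G^{-1}(-z_1)\right]dz.$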

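    (4.27)    $E\,\dfrac{\partial^2 \log L}{\partial\rho\,\partial\sigma} = -N\rho\left[2\pi\sigma(1-\rho)(1+\rho)\right]^{-1}(1+\rho^2)^{-3/2}\exp\left[-\omega^2(1+\rho^2)^{-1}\right]\int_{-\infty}^{\infty} \left[z(1-\rho^2)^{1/2} + 2\rho\omega(1+\rho^2)^{-1/2}\right]\left[z + \rho\omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1/2}\right]\exp(-z^2/2)\left[G^{-1}(z_1)+G^{-1}(-z_1)\right]dz.$

    Case 4: $E\,\partial^2 \log L/\partial\sigma^2$.

    Recall expression (4.17).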

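    The simplest way to treat this integral is to write it in the form

    $-N\int_{-\infty}^{\infty} \phi(x,\omega)\left\{ R(\omega)\left[\dfrac{\int_{-\infty}^{\omega} (\partial\phi/\partial\sigma)\,dy}{R(\omega)}\right]^2 + S(\omega)\left[\dfrac{\int_{\omega}^{\infty} (\partial\phi/\partial\sigma)\,dy}{S(\omega)}\right]^2 \right\}dx.$

    After substituting the value of $\partial\phi/\partial\sigma$ from (4.14) and replacing $x/\sigma$ by $x$, we are left with the following expressions, which contain $\sigma$ only as a factor.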

  • 42

    00f y¢(x,y)dy dx +-00

    00 m(0) - .f2w0--2(1- y2) -1 f b(x,(J)) {R-1(",) [f y¢(x,y)ayr

    -00 -00

    Consider (A). This is of a slightly different type from those which we dealt with before. This time the correct transformation is

    which produces

    $x = (1-\rho^2)^{1/2}(2-\rho^2)^{-1/2}z + \rho\omega/(2-\rho^2),$

    $(\rho x-\omega)(1-\rho^2)^{-1/2} = \rho z(2-\rho^2)^{-1/2} - 2\omega(1-\rho^2)^{1/2}(2-\rho^2)^{-1},$

    and $dx = (1-\rho^2)^{1/2}(2-\rho^2)^{-1/2}\,dz$. Substituting these values in (A), we obtain

    Consider (B). In $\int_{-\infty}^{\infty} y\,\phi(x,y)\,dy$ make the transformation

    2f 2N0--2(1_ .f 2) -1 J x2 { x2(1_ .f 2) -1_~.-00

    ¢(x,m)exp( _x2/2)dx.

    Now apply $T_2$ given in (4.28). We obtain, in exactly the same way as before,

    (B) = 2 f' 2Nf3'"' (1- f )(l+f )1 (2n)-lexp(_m2/(2_p 2) ·00 2-L[ .(1- f 2)1/2(2- f 2) -1/2+f' ID/(2-l' 2)}U.(.h- j2)(2- f 2)-1/2+ fID] 2.(1_f2q.

    (2rt)-1/2exp (_z2/2)dz .

    Finally, consider (C).

    Since there are essentially no new ideas involved, and in view of the fact that this calculation is much longer than the others, a sketch of the method will be given, together with the final results.

    (i) Make the transformation $t = (y-\rho x)(1-\rho^2)^{-1/2}$ in the numerators of the expression in brackets.

    (ii) We have $(C) = C_1 + C_2 + C_3$.

    (iii) $C_1$ requires transformation $T_2$.

    (iv) $C_2$ can be shown to vanish, with little effort.

    (v) $C_3$ requires a new transformation of the same type as $T_1$, $T_2$: namely, $T_3$,

    $x = z(1-\rho^2)^{1/2}(2+\rho^2)^{-1/2} + 3\rho\omega/(\rho^2+2).$

    Define $z_2 = \rho(2+\rho^2)^{-1/2}z - 2\omega(1-\rho^2)^{1/2}(2+\rho^2)^{-1}$.

    The value of $C$ will be included in the Table of Expected Values with the others:

    Table of Expected Values.

    We will define $c_{ij}$ to represent all of the expected values $E\,\dfrac{\partial^2 \log L}{\partial\theta_i\,\partial\theta_j}$ except the factors $(1-\rho)^{-k}$ and $\sigma^{-\lambda}$. $i$, $j$ will run from 1 to 3; $\hat\rho$ will correspond to 1; $\hat\omega$ to 2; and $\hat\sigma$ to 3. Note that all $c_{ij}$ must exist by the same argument as that used in (4.22) etc.

    2 -3~E (j(j1;l = 011(1- f )-3/2= _1I(2n)-1 (1- J')(1+ f' )(1+,2»).

    exp(-.,2;l+ f 2) j( ,,+ f",(l- f2)1/2(1+f2) -1/2r-00

    fj210g L -1 [ 2]-1E af

  • ,..

    e:j;

    46

    2E 0 log L = -le (1- 0 )-1= -N f' l(r(1- 0 ) (1+f )1 -1as., acr cr 13 J ~ J .

    00

    (1+f2)-3/~exp(-a>2/1+f2).f[z(l- f''2)1/2-00

    00

    (1+y'2)-1exp (-(l)2/1+J'2) f[ Z(1_p'2)1/2-00 .

    00

    (2-f 2)-1/2exp [ "(1)'2/(2_9 2>} J{[Z(1_f'2)1/'2 .-00

    +2Nf'2 [0'(1-f)(1+j')] -'2(21f)-3/'2(2_.r 2)-1!2.

    e~_(1)2(1_ .f 2) -1)J •

  • ,,'

    e

    [{Z(1_r2)1/2(2_),2) -1/2+jW] 2 -(1- J2)J #

    exp(-z2/2)dz

    _Nj'4(21C)-3/2 [CY(1-f)(1+j»] -2(2_f 2 )-1/2

    exp [_W2(2_jP2)-1].00 . 4f [ z(1- f2)1/2{2_f~)-1/2+fW(2-f2) -1] •

    -00

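    $-N\rho^2(2\pi)^{-1}\left[\sigma(1-\rho)(1+\rho)\right]^{-2}(2+\rho^2)^{-1/2}\exp\left(-3\omega^2(2+\rho^2)^{-1}\right)\int_{-\infty}^{\infty} \left[z(1-\rho^2)^{1/2}(2+\rho^2)^{-1/2} + 3\rho\omega(2+\rho^2)^{-1}\right]^2\left[G^{-1}(z_2)+G^{-1}(-z_2)\right]\exp(-z^2/2)\,dz.$

    In the above

    $G(u) = \int_{u}^{\infty} \exp(-t^2/2)\,dt,$

    $z_1 = \rho z(1+\rho^2)^{-1/2} - \omega(1-\rho^2)^{1/2}(1+\rho^2)^{-1},$ and

    $z_2 = \rho z(2+\rho^2)^{-1/2} - 2\omega(1-\rho^2)^{1/2}(2+\rho^2)^{-1}.$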

    Now, recalling the definition of the information matrix, we have

    It will be notationally convenient to consider $D = \Delta\,[\mathrm{cof}(c_{11})]^{-1}$ for a moment.

    We know $V(r^*)$ as far as terms of order $N^{-1}$ from Chapter II. $V(\hat\rho)$ as far as terms of order $N^{-1}$ is given by $D^{-1}$, and $\hat\rho$ is an estimate of minimum variance. Hence, the efficiency of the coefficient of biserial correlation is

    3. The Efficiency of $r^*$ as $\rho \to \pm 1$.

    All of the work of Chapter IV goes through for $\rho \to -1$ in exactly the same manner as for $\rho \to 1$. Hence, in result (4.30) we need only replace $\rho$ by $-\rho$. Note the following facts:

    (i) $\lim_{\rho\to\pm 1} c_{12} = \lim_{\rho\to\pm 1} c_{13} = 0$, because we are integrating an odd function of $z$ over symmetric limits.

    (ii) $\lim_{\rho\to\pm 1} z_1 = \pm 2^{-1/2}z$; $\lim_{\rho\to\pm 1} z_2 = \pm 3^{-1/2}z$.

    (4.31)    $\lim_{\rho\to\pm 1} (1\mp\rho)^{-3/2}\,\mathrm{Eff}(r^*) = \left\{16\pi\exp(\omega^2/2)\right\}\cdot\left[pq(\omega^2+1)\,2\pi\exp(\omega^2) + (2p-1)\,\omega\,(2\pi)^{1/2}\exp(\omega^2/2) - 3/2\right]^{-1}\cdot\left\{\int_{-\infty}^{\infty} z^2\exp(-z^2/2)\left[G^{-1}(z\,2^{-1/2}) + G^{-1}(-z\,2^{-1/2})\right]dz\right\}^{-1}.$

    Expression (4.31) is positive for all finite $\omega$, so we may state the following theorem:

    THEOREM II. The coefficient of biserial correlation has limiting efficiency 0 for estimating $\rho$ when $\rho \to \pm 1$.

    4. The Efficiency of $r^*$ When $\rho = 0$.

    In order to find the efficiency of $r^*$ for $\rho = 0$ it will be easier to return to (4.11), (4.12), (4.13), (4.15), (4.17). Setting $\rho = 0$ provides a great simplification, and it may be shown rather directly that

    (4.32)    $E\left(\dfrac{\partial^2 \log L}{\partial\rho^2}\,\Big|\,\rho = 0\right) = -N(2\pi pq)^{-1}\exp(-\omega^2),$

    $E\left(\dfrac{\partial^2 \log L}{\partial\omega^2}\,\Big|\,\rho = 0\right) = -N(2\pi pq)^{-1}\exp(-\omega^2).$

    All mixed partials vanish. Recalling the expression for $V(r^*)$ from Chapter II (2.1), we have from (4.32) and (2.1)

    (4.33)    $V(\hat\rho \mid \rho = 0) = 2\pi pq\,N^{-1}\exp(\omega^2) + O(N^{-2}).$

    Hence, we may state the following theorem:

    THEOREM III. The efficiency of the coefficient of biserial correlation for estimating $\rho$ is 1 when $\rho = 0$. This result was to be expected.

    CHAPTER V

    SUMMARY AND INTERPRETATION OF RESULTS

    When $\rho \ne 0$ two coefficients of correlation have been proposed for estimating $\rho$ and for testing the hypothesis $\rho = \rho_0 \ne 0$, biserial $r^*$ and point biserial $r$. The assumption of an underlying bivariate normal distribution for $r^*$ is about the simplest of those which allow us to specify, and work with, the joint distribution of the continuous and discrete variables. It has been shown that the assumption of an underlying normal population with the point of dichotomy at the mean produces minimum variance in the limit for $r^*$; and that for this case a simple statistic $f(r^*)$, defined in Chapter III, Section 2, analogous to Fisher's z transform of product moment $r$, can be used for moderate to large $N$. Graphical and mechanical methods are available, so that the computation of $r^*$, for each item in a test for example, can be carried out rapidly and efficiently. However, $r^*$ is not restricted to the interval $(-1, +1)$; in fact, it is unbounded, as is illustrated in Figure 1. Also, while the efficiency of $r^*$ has been shown to be 1 for $\rho = 0$, it tends to 0 as $\rho$ tends to $\pm 1$. This is a considerable defect, since we are especially interested in large correlations.
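    To make the unboundedness of $r^*$ concrete, the sketch below computes both coefficients from a small made-up sample. The defining formulas of Chapter I are not reproduced in this part of the text, so the usual textbook forms are assumed: point biserial $r = (\bar{x}_1 - \bar{x}_0)\sqrt{pq}/s$ and biserial $r^* = (\bar{x}_1 - \bar{x}_0)\,pq/(\lambda s)$, with $s$ the standard deviation of all $x$ using divisor $N$ and $\lambda$ the normal ordinate at the point of dichotomy. The function name `biserial_pair` is hypothetical.

```python
import math
from statistics import NormalDist


def biserial_pair(xs, zs):
    """Return (point biserial r, biserial r*) for scores xs and 0/1 labels zs.
    Textbook formulas, assumed to agree with the definitions of Chapter I."""
    n = len(xs)
    n1 = sum(zs)
    p = n1 / n
    q = 1.0 - p
    mean_all = sum(xs) / n
    mean1 = sum(x for x, z in zip(xs, zs) if z == 1) / n1
    mean0 = sum(x for x, z in zip(xs, zs) if z == 0) / (n - n1)
    sd = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)  # divisor N
    omega = NormalDist().inv_cdf(q)   # estimated point of dichotomy
    lam = NormalDist().pdf(omega)     # normal ordinate at that point
    r_point = (mean1 - mean0) / sd * math.sqrt(p * q)
    r_bis = (mean1 - mean0) / sd * p * q / lam
    return r_point, r_bis


# Made-up sample in which the two groups do not overlap at all.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
zs = [0, 0, 0, 1, 1, 1]
r_point, r_bis = biserial_pair(xs, zs)
print(r_point, r_bis)
```

    With these formulas $r^* = r\sqrt{pq}/\lambda$, and since $\sqrt{pq}/\lambda$ is never less than about 1.25, a large point biserial $r$ can push $r^*$ beyond 1, as happens for this sample.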

    In contradistinction to biserial $r^*$ we have point biserial $r$, whose model does not specify any underlying distribution for the discrete variable. Point biserial $r$ is a maximum likelihood estimate of $\rho$ under the conditions of the model (see Chapter I, Section 6), and as such has efficiency 1 for large $N$ no matter what value $\rho$ takes. Corresponding to the property of minimum variance of biserial $r^*$ for the point of dichotomy at the mean, that is $\omega = 0$, we have a similar property for point biserial $r$. Using $r$ for the model given is equivalent to using the t statistic. It is intuitively clear that if we are interested in measuring $\rho$, and hence essentially the difference between two mean values, we will get better results in the form of greater power for the t test if the means are estimated from samples in which $N_0 = N_1 = N/2$. While the assumption of normality of the residuals which is used in connection with point biserial $r$ is a restriction, there is some question as to how serious this restriction is. Indeed, the theory of Least Squares is largely based on the same assumption. There is no question, however, as to the inherent danger in making the assumption of underlying normality in connection with biserial $r^*$.

    If the sample is sufficiently large we can get some indication as to whether the model for $r^*$, or that for $r$, is more applicable. If the model for $r$ is applicable, the sets of variates $(x_{01}, \ldots, x_{0N_0})$ and $(x_{11}, \ldots, x_{1N_1})$, associated with the $z_i$ which are 0 and 1 respectively, should be normally distributed. If the model for $r^*$ is applicable, the sets of variates cannot be normally distributed. In the case of $r^*$ an approximation to the moments $E(X^k \mid Y > \omega)$ can be obtained from the quantities $C_{k1}$ defined in Section 1 of Chapter III. We have from (3.14)

    $E(X \mid Y > \omega) = \rho\lambda(\omega)/p,$

    $E(X^2 \mid Y > \omega) = (1-\rho^2) + \rho^2\left[p + \omega\lambda(\omega)\right]/p,$

    and $E(X^3 \mid Y > \omega) = 3\rho(1-\rho^2)\lambda(\omega)/p + \rho^3\lambda(\omega)(\omega^2+2)/p.$

    After performing the substitutions $\rho = r^*$, $\lambda(\omega) = \lambda(t)$, $\omega = t$, $p = m/N$ (see Chapter I, Section 2), if we do not have a fairly close agreement between $E(X^k \mid Y > \omega)$ and $N_1^{-1}\sum_{i=1}^{N_1} x_{1i}^k$, then the model for $r^*$ is suspect. To test the applicability of $r$ we need only test the hypothesis that $x_{0i}$ $(i = 1, 2, \ldots, N_0)$ and $x_{1j}$ $(j = 1, 2, \ldots, N_1)$ are normal with equal variances. A variety of tests are available for this purpose.
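    The moment comparison described above can be sketched as follows. The three conditional moments are coded exactly as stated, on the assumption that $X$ has been standardized (zero mean, unit variance) and that $p = P(Y > \omega)$. The names `upper_group_moments` and `sample_moments` are hypothetical, and how close an agreement counts as "fairly close" is left to the reader.

```python
import math
from statistics import NormalDist


def upper_group_moments(rho, omega):
    """E(X^k | Y > omega) for k = 1, 2, 3 under the bivariate normal model,
    with X assumed standardized."""
    std = NormalDist()
    lam = std.pdf(omega)          # normal ordinate lambda(omega)
    p = 1.0 - std.cdf(omega)      # P(Y > omega)
    m1 = rho * lam / p
    m2 = (1.0 - rho**2) + rho**2 * (p + omega * lam) / p
    m3 = 3.0 * rho * (1.0 - rho**2) * lam / p + rho**3 * lam * (omega**2 + 2.0) / p
    return m1, m2, m3


def sample_moments(xs, k_max=3):
    """Raw sample moments of order 1..k_max of the upper-group x values."""
    n = len(xs)
    return tuple(sum(x**k for x in xs) / n for k in range(1, k_max + 1))


# Model moments for illustrative values of rho and omega (made up).
print(upper_group_moments(0.6, 0.25))
# Sample moments of a made-up upper group, for comparison.
print(sample_moments([0.2, 0.9, 1.4, 0.5, 1.1]))
```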

    It would seem from the evidence presented that point biserial $r$ is in most cases the better coefficient to use. While the results obtained will be valid only for samples of size $N$ with a fixed partition $(N_0, N_1)$, point biserial $r$ in the fixed sample form is on fairly good ground in view of its being a maximum likelihood estimate of $\rho$. Furthermore, the concept of two fixed numbers for the discrete variable, instead of an underlying distribution of any kind, is more satisfying to the psychologist in his efforts to objectify the testing situation.

    If preliminary information pertaining to $\omega$ and $\rho$ is available, then the following tables may be used in order to select that statistic $r^*$ or $r$ which from overall considerations of efficiency, sample size, and the magnitudes of $\omega$ and $\rho$ seems most appropriate. The statistic chosen must of course satisfy the requirement that its model is the more applicable in the sense of page 53. Two situations are considered, the estimation of $\rho$, and the test of the hypothesis $\rho = \rho_0$ or the placing of confidence limits on $\rho$.

    Estimation of $\rho$

                             |ρ|
    |ω|          Small     Moderate    Large
    Small        r*        r           r
    Moderate     r*        r           r
    Large        r*        r           r

    Test of the Hypothesis $\rho = \rho_0$ or the Placing of Confidence Limits on $\rho$

                             |ρ|
    |ω|          Small     Moderate    Large
    Small        f(r*)     f(r*)       r
    Moderate     r         r           r
    Large        r         r           r

    The recommendation is, then, use $r^*$ for estimating $\rho$ when $|\rho|$ is small, otherwise use $r$. Use $f(r^*)$ for testing $\rho = \rho_0$ or placing confidence limits on $\rho$ if

    TABLE I

    The Asymptotic Standard Deviation of Biserial $r^*$ as a Function of $p$ and $\rho$. All values must be divided by $\sqrt{N}$.

                                            p or 1-p
    |ρ|     .05     .10     .15     .20     .25     .30     .35     .40     .45     .50
    0       4.466   2.922   2.345   2.041   1.857   1.137   1.658   1.608   1.580   1.511
    .10     2.104   1.699   1.521   1.419   1.353   1.308   1.218   1.258   1.241   1.243
    .20     2.011   1.668   1.491   1.389   1.323   1.219   1.248   1.228   1.211   1.213
    .30     2.033   1.616   1.440   1.339   1.273   1.229   1.198   1.179   1.167   1.163
    .40     1.971   1.543   1.370   1.269   1.203   1.159   1.128   1.109   1.097   1.093
    .50     1.893   1.449   1.219   1.179   1.114   1.069   1.038   1.019   1.008   1.004
    .60     1.199   1.333   1.167   1.069   1.004   0.960   0.930   0.910   0.898   0.894
    .10     1.691   1.194   1.034   0.939   0.815   0.831   0.801   0.781   0.169   0.766
    .80     1.569   1.031   0.881   0.789   0.121   0.683   0.653   0.632   0.620   0.616
    .90     1.438   0.842   0.105   0.619   0.559   0.517   0.486   0.465   0.453   0.449
    1.00    1.302   0.616   0.503   0.429   0.314   0.335   0.304   0.283   0.270   0.266
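    Entries of this kind can be recomputed from a large-sample variance formula. The sketch below assumes the form usually attributed to Soper (15), $N\,V(r^*) \approx \rho^4 + \rho^2\left[pq\omega^2/\lambda^2 + (2p-1)\omega/\lambda - 5/2\right] + pq/\lambda^2$, with $\omega$ the normal deviate cutting off an upper tail of area $p$ and $\lambda$ the ordinate there. Whether this is exactly expression (2.1) of Chapter II is an assumption, and the function name `biserial_asymptotic_sd` is hypothetical.

```python
import math
from statistics import NormalDist


def biserial_asymptotic_sd(p, rho):
    """sqrt(N * V(r*)) from the assumed Soper-type large-sample formula."""
    std = NormalDist()
    q = 1.0 - p
    omega = std.inv_cdf(q)   # P(Y >= omega) = p
    lam = std.pdf(omega)     # normal ordinate at omega
    n_var = (rho**4
             + rho**2 * (p * q * omega**2 / lam**2
                         + (2.0 * p - 1.0) * omega / lam - 2.5)
             + p * q / lam**2)
    return math.sqrt(n_var)


# A few entries of the p = .50 column.
for rho in (0.10, 0.50, 1.00):
    print(rho, round(biserial_asymptotic_sd(0.50, rho), 3))
```

    The formula is unchanged when $p$ is replaced by $1-p$, which agrees with the column heading "p or 1-p". Dividing a returned value by $\sqrt{N}$ gives the approximate standard deviation for a sample of size $N$.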

    [Figure: nomograph for computing biserial $r^*$, with three scales, one of them the $r^*$ scale.]

    $\bar{x}_1$ is the larger sample mean. $N_1$ is the number of values which make up $\bar{x}_1$. $r^*$ is the coefficient of biserial correlation.

    $r^*$ is obtained by computing $\bar{x}_1$ and $(\bar{x}_1 - \bar{x})\big/\sqrt{\tfrac{1}{N}\Sigma(x-\bar{x})^2}$, laying a straight-edge on the three scales, and reading the result on the $r^*$ scale.

    BIBLIOGRAPHY

    1. Cramér, H., Mathematical Methods of Statistics, Princeton University Press, 1946.

    2. David, F. N., Tables of the Correlation Coefficient, Cambridge University Press, 1938.

    3. DuBois, P. H., "A Note on the Computation of Biserial r in Item Validation," Psychometrika, Vol. VII, (1942), pp. 143-146.

    4. Dunlap, J. W., "A Nomograph for Computing Biserial Correlations," Psychometrika, Vol. I, (1936), pp. 59-60.

    5. Fisher, R. A., "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society A, Vol. CCXXII, (1921), pp. 309-366.

    6. Fisher, R. A., Statistical Methods for Research Workers, New York City, Hafner, 10th Edition, 1946.

    7. Fisher, R. A., "Theory of Statistical Estimation," Proceedings of the Cambridge Philosophical Society, Vol. XXII, (1925), pp. 700-725.

    8. Johnson, N. L. and Welch, B. L., "Applications of the Non-central t Distribution," Biometrika, Vol. XXXI, (1940), pp. 362-389.

    9. Kendall, M. G., The Advanced Theory of Statistics, Vol. II, London, Charles Griffin, 19...

    10. Lev, J., "The Point Biserial Coefficient of Correlation," Annals of Mathematical Statistics, Vol. XX No. 1, (1949), pp. 125-126.

    11. Neyman, J., "Outline of a Theory of Statistical Estimation," Philosophical Transactions of the Royal Society A, Vol. CCXXXVI, (1937), p. 333.

    12. Pearson, K., "On a New Method for Determining the Correlation between a Measured Character A, and a Character B," Biometrika, Vol. VII, (1909), pp. 96-105.

    13. Robbins, H. E. and Hoeffding, W., "The Central Limit Theorem for Dependent Random Variables," Duke Mathematical Journal, Vol. XV No. 3, (1948), pp. 775-780.

    14. Royer, E. B., "Punched Card Methods for Determining Biserial Correlations," Psychometrika, Vol. VI No. 1, (1941), pp. 55-59.

    15. Soper, H. E., "On the Probable Error for the Biserial Expression for the Correlation Coefficient," Biometrika, Vol. X, (1913), pp. 384-390.

    16. Stalnaker, J. and Richardson, M. W., "A Note on the Use of Biserial r in Test Research," Journal of General Psychology, Vol. VIII, (1933), pp. ...-465.