Zimmerman 1997


Sage Publications, Inc., American Statistical Association and American Educational Research Association are collaborating with JSTOR to digitize, preserve and extend access to Journal of Educational and Behavioral Statistics.

http://www.jstor.org

A Note on Interpretation of the Paired-Samples t Test
Author(s): Donald W. Zimmerman
Source: Journal of Educational and Behavioral Statistics, Vol. 22, No. 3 (Autumn, 1997), pp. 349-360
Published by: American Educational Research Association and American Statistical Association
Stable URL: http://www.jstor.org/stable/1165289
Accessed: 25-12-2015 18:58 UTC


    This content downloaded from 128.240.233.146 on Fri, 25 Dec 2015 18:58:05 UTCAll use subject to JSTOR Terms and Conditions


TEACHER'S CORNER

A Note on Interpretation of the Paired-Samples t Test

Donald W. Zimmerman
Carleton University

Keywords: correlated samples, difference scores, independent samples, matched pairs, nonindependence, paired samples, power, t test, Type I error, Type II error

Explanations of advantages and disadvantages of paired-samples experimental designs in textbooks in education and psychology frequently overlook the change in the Type I error probability which occurs when an independent-samples t test is performed on correlated observations. This alteration of the significance level can be extreme even if the correlation is small. By comparison, the loss of power of the paired-samples t test on difference scores due to reduction of degrees of freedom, which typically is emphasized, is relatively slight. Although paired-samples designs are appropriate and widely used when there is a natural correspondence or pairing of scores, researchers have not often considered the implications of undetected correlation between supposedly independent samples in the absence of explicit pairing.

Many experimental designs in education, psychology, and social sciences employ paired or matched observations. A familiar example is repeated measures on the same subjects over a period of time. Some significance tests of location, including the independent-samples Student t test, are not appropriate for these designs, because the measures usually are correlated rather than independent. Researchers typically analyze paired data using the paired-samples t test, which essentially is a one-sample Student t test performed on difference scores.
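The statement that the paired-samples t test is a one-sample t test on difference scores can be illustrated with a minimal sketch. The data and the function names below are hypothetical and chosen only for illustration; the independent-samples statistic is the usual pooled-variance form for equal group sizes.

```python
import math
import statistics


def one_sample_t(values):
    """Student t statistic for H0: mean = 0, with its degrees of freedom."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return mean / (sd / math.sqrt(n)), n - 1


def paired_t(x, y):
    """Paired-samples t: a one-sample t test on the difference scores."""
    diffs = [a - b for a, b in zip(x, y)]
    return one_sample_t(diffs)


def independent_t(x, y):
    """Pooled-variance two-sample t, written for equal group sizes only."""
    n = len(x)
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    se = math.sqrt(pooled_var * 2 / n)
    return (statistics.fmean(x) - statistics.fmean(y)) / se, 2 * n - 2


# Hypothetical before/after scores for five subjects (illustrative only).
before = [12.0, 15.0, 11.0, 18.0, 14.0]
after = [14.0, 16.0, 14.0, 19.0, 17.0]

t_paired, df_paired = paired_t(after, before)
t_indep, df_indep = independent_t(after, before)
print(f"paired:      t = {t_paired:.3f}, df = {df_paired}")
print(f"independent: t = {t_indep:.3f}, df = {df_indep}")
```

In this made-up example the scores are strongly correlated across subjects, so the paired statistic is much larger than the independent-samples statistic even though it is evaluated on half as many degrees of freedom.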

Applied statisticians generally are aware of the advantages and disadvantages of this test. First, the correlation associated with pairing or matching of observations reduces the standard error of the difference between means, so the error term differs from that of the independent-samples test. This is apparent from the equation

$$\sigma^2_{\bar{X}-\bar{Y}} = \sigma^2_{\bar{X}} + \sigma^2_{\bar{Y}} - 2\rho\,\sigma_{\bar{X}}\sigma_{\bar{Y}}.$$

The correlation term reduces the variance of the difference between means and increases the t ratio. In the context of interval estimation, the reduced standard error results in a narrower confidence interval. For this reason, an experimental design involving paired observations can more accurately detect differences in which a researcher is interested. Similar logic applies to within-subjects ANOVA as opposed to independent-groups ANOVA.

This research was supported by a Carleton University research grant. A listing of the computer program, written in Turbo BASIC, Version 1.0 (Borland, Inc.) can be obtained by writing to the author at 15078 Eagle Place, Surrey, BC V3R 4W2, Canada.

Second, this gain is partly offset by a loss of degrees of freedom. The one-sample t statistic based on n pairs is evaluated at n - 1 degrees of freedom, while the two-sample t is evaluated at 2n - 2 degrees of freedom. Therefore, authors emphasize that the paired-samples test is preferable if the two groups are highly correlated, while the independent-samples test is the better choice if they are uncorrelated or only slightly correlated. Authors usually do not advise explicit matching or pairing of subjects in an experimental design and subsequent use of a paired-samples test, unless this procedure produces a substantial correlation. For example, Kurtz (1965) summarized the thinking of many investigators as follows.

"The advantage of pairing is seen to depend on the closeness of the relationship established between the two sets of observations as a result of pairing. If a sufficiently high relationship is established, the reduction of the variance of the difference more than compensates for the degrees of freedom lost as a result of pairing; if only a low correlation is established, the gains resulting from reduction of the variance of the difference may be more than offset by the loss of degrees of freedom." (p. 213)

More recently, Hays (1988) wrote,

"Such matching may be less efficient than the comparison of unmatched random groups, unless the factor used in matching introduces a relatively strong positive relationship between the means. Although a positive relationship, reflected in a positive covariance term, does reduce the standard error of the difference, this procedure also halves the number of degrees of freedom. Dealing with a sample of N pairs gives only [...] groups of N cases each. Thus, if the factor entering into the matching is only slightly relevant to the differences between the groups or is even irrelevant to such differences, matching is not a desirable procedure." (p. 315)

And Edwards (1979) noted that "the average value of the covariance must be sufficiently large to offset the fact that for the same number of observations, MS_ST will have fewer degrees of freedom than MS_W and will thus require a larger value of F for significance" (p. 128).

See also introductory textbooks by Howell (1987, pp. 204-206), Loether and McTavish (1993, p. 554), and Pagano (1986, pp. 301-304). These recommendations are typical of many authors, although the relative emphasis placed on reduction of the standard error and reduction of degrees of freedom varies from one text to another.
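The textbook trade-off summarized above can be written out in a short worked form. This is only a sketch under an added assumption that is not stated in the text, namely that both groups have the same population variance σ² and the same size n, so that each mean has variance σ²/n:

```latex
\sigma^2_{\bar{X}-\bar{Y}}
  = \frac{\sigma^2}{n} + \frac{\sigma^2}{n}
    - 2\rho\,\frac{\sigma}{\sqrt{n}}\cdot\frac{\sigma}{\sqrt{n}}
  = \frac{2\sigma^2}{n}\,(1-\rho)
```

Under that assumption a correlation of ρ = .30, for example, shrinks the variance of the difference between means to 70% of its value for independent samples, while the degrees of freedom drop from 2n - 2 to n - 1.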

The simulations in the present study reveal that this advice must be qualified and that pairing sometimes is associated with a large difference in the efficiency of the two significance tests when the correlation is quite small. In the case of naturally paired data, even correlations of .10, .15, or .20 make the paired-samples t test mandatory in order to protect against distortion of the significance level. These conclusions are based on examination of power functions as well as degrees of freedom and Type I errors.

The present study also compares the two tests from another point of view and focuses attention on an aspect of the problem which has been overlooked. The comparison of the two procedures frequently made in textbooks fails to take account of an important effect: Nonindependence of observations depresses both Type I error probabilities and the power of the test to detect differences. In other words, a correlation between samples that are believed to be independent compromises not only the efficiency but also the validity of the significance test. Furthermore, the change that occurs is quite large.

Many years ago, Cochran (1974), Scheffé (1959), Walsh (1947), and others discovered that violation of the independence assumption underlying the t and F tests distorts Type I and Type II error probabilities. (See also a recent study by Zimmerman, Williams, & Zumbo, 1993.) However, investigators have not considered these results in the context of paired-samples experimental designs. The present note examines some implications of nonindependence of observations, as investigated in these studies, for interpretation of the paired-samples statistic.

Paired Data and Nonindependence of Observations

A simulation study consisted of performing independent-samples Student t tests and paired-samples tests on samples from a normal population. Although it is possible to calculate the power of these tests analytically, a comparison of the two tests is not possible without taking into consideration the change in Type I error probabilities discussed above. In the present study, a computer algorithm induced correlations ranging from -.50 to .50 by adding a multiple of one random variable to each of two other random variables, the multiplicative constant being chosen to produce the desired correlation coefficient. The algorithm generated N(0, 1) normal deviates by the method of Box and Muller (1958), based on the transformation $X = (-2 \log U_1)^{1/2} \cos 2\pi U_2$, where $U_1$ and $U_2$ are uniformly distributed pseudorandom numbers on the interval (0, 1). In successive replications, constants were added to all scores in one group in increments of .5σ, 1.25σ, or 1.5σ in order to determine both Type I and Type II errors. Sample sizes ranged from 10 to 80. The study performed both one-tailed and two-tailed tests at the .05 significance level. Each data point represents 10,000 replications of the sampling procedure and subsequent significance tests. The purpose of the simulations was to illustrate the arguments in the present note, and they were not intended to be an exhaustive study of properties of the t test.
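The simulation procedure described above can be sketched in a few lines of Python. This is not the author's Turbo BASIC program. The way the multiplicative constant is chosen (c = sqrt(|ρ| / (1 - |ρ|)), with the sign of the shared term flipped for negative ρ), the rescaling to unit variance, and all function names are assumptions made for illustration. The sketch is restricted to n = 10 pairs and two-tailed tests so that it can use the critical values 2.101 (18 df) and 2.262 (9 df) quoted in the article.

```python
import math
import random

N_PAIRS = 10          # fixed so the two critical values below apply
CRIT_INDEP = 2.101    # two-tailed .05 critical t, 2n - 2 = 18 df
CRIT_PAIRED = 2.262   # two-tailed .05 critical t, n - 1 = 9 df


def box_muller(rng):
    """One N(0, 1) deviate via X = sqrt(-2 log U1) * cos(2 pi U2)."""
    u1 = 1.0 - rng.random()  # maps [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def correlated_pair(rho, rng):
    """Two unit-variance normal scores with correlation rho (|rho| < 1).

    Assumed construction: X = Z1 + c*Z0, Y = Z2 + s*c*Z0, where
    c = sqrt(|rho| / (1 - |rho|)) and s is the sign of rho.
    """
    c = math.sqrt(abs(rho) / (1.0 - abs(rho)))
    sign = -1.0 if rho < 0 else 1.0
    z0, z1, z2 = box_muller(rng), box_muller(rng), box_muller(rng)
    scale = math.sqrt(1.0 + c * c)  # Var(Z1 + c*Z0) = 1 + c^2
    return (z1 + c * z0) / scale, (z2 + sign * c * z0) / scale


def mean_var(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def rejection_rates(rho, shift, reps=10_000, seed=1):
    """Estimated rejection rates (independent, paired) at the .05 level.

    `shift` is a constant, in sigma units, added to every score in one group;
    shift = 0 gives Type I error rates.
    """
    rng = random.Random(seed)
    n = N_PAIRS
    hits_indep = hits_paired = 0
    for _ in range(reps):
        pairs = [correlated_pair(rho, rng) for _ in range(n)]
        x = [a + shift for a, _ in pairs]
        y = [b for _, b in pairs]
        mx, vx = mean_var(x)
        my, vy = mean_var(y)
        md, vd = mean_var([a - b for a, b in zip(x, y)])
        t_indep = (mx - my) / math.sqrt((vx + vy) / n)  # pooled SE, equal n
        t_paired = md / math.sqrt(vd / n)
        hits_indep += abs(t_indep) > CRIT_INDEP
        hits_paired += abs(t_paired) > CRIT_PAIRED
    return hits_indep / reps, hits_paired / reps


if __name__ == "__main__":
    for rho in (0.0, 0.2, 0.4):
        indep, paired = rejection_rates(rho, shift=0.0)
        print(f"rho = {rho:.2f}: independent {indep:.3f}, paired {paired:.3f}")
```

With no shift between groups, the estimated Type I error rate of the independent-samples test should fall well below .05 as ρ increases, while the paired-samples rate should stay near .05. For the corresponding two-tailed case with n = 10 and ρ = .40, Table 1 of the article reports .016 and .051.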


Two Concomitant Effects: Failure to Maintain the Significance Level and Reduction of the Power of the Test

First, consider the two curves in the lower section of Figure 1, which plots the probability of rejecting the null hypothesis as a function of the correlation between paired observations, for both the independent-samples test and the paired-samples test, when the null hypothesis is false. There were 20 pairs of observations, and the difference between the means of the two populations was 1.5σ. It is apparent that the efficiency of the paired-samples test increases systematically, while that of the independent-samples test decreases, as the correlation increases from -.50 to .50. When the correlation is zero, the independent-samples test is slightly more powerful than the paired-samples test. This result is consistent with our previous discussion, although investigators do not usually consider negative correlations in the present context.

Examination of the upper section of Figure 1, again based on 20 pairs of observations, reveals a somewhat different pattern. In the simulations represented in this graph, there were no differences between population means, so that the curves represent the probabilities of Type I errors. The paired-samples test maintains the probability close to the .05 significance level despite the increasing correlation. The independent-samples test, however, exhibits a rather large change as the correlation increases. Even a correlation of only .10 or .20 has a substantial influence on this test. Because of this change in the Type I error probability, the values plotted for the independent-samples test in the lower section of Figure 1 cannot be interpreted as the power of the test. Consequently, the values are not comparable to those of the paired-samples test.
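One rough way to see why the Type I error probability of the independent-samples test moves with the correlation is a large-sample argument that is not taken from the article. With equal variances, the true standard deviation of the difference between means is sqrt(1 - ρ) times the value that the pooled standard error assumes, so the statistic behaves approximately like sqrt(1 - ρ) times a standard normal variable. The sketch below, with hypothetical function names, evaluates the resulting tail probability.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def approx_type1_independent(rho, two_tailed=True):
    """Approximate large-sample Type I error of the independent-samples test
    at the nominal .05 level when paired scores have correlation rho."""
    z_crit = 1.9600 if two_tailed else 1.6449  # nominal .05 normal cutoffs
    # The statistic is approximately sqrt(1 - rho) * Z, so rejecting when it
    # exceeds z_crit is the same as Z exceeding z_crit / sqrt(1 - rho).
    tail = normal_upper_tail(z_crit / math.sqrt(1.0 - rho))
    return 2.0 * tail if two_tailed else tail


for rho in (-0.4, -0.2, 0.0, 0.2, 0.4):
    one = approx_type1_independent(rho, two_tailed=False)
    two = approx_type1_independent(rho, two_tailed=True)
    print(f"rho = {rho:+.1f}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

For positive correlations this approximation gives values below .05 and for negative correlations values above .05, in the same direction as the pattern described for the upper section of Figure 1. For ρ = .20 and ρ = .40 the approximate values lie close to the n = 80 rows for a zero difference in Table 1 of the article, but the approximation ignores small-sample effects and should be read only as an illustration.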

Implications of the alteration of the significance level are further illustrated by Figure 2. The upper section of the figure shows power functions of both tests. In this graph, there are 20 pairs of scores, the correlation is zero, and the difference between means increases from 0 to 4.5σ in increments of .5σ. Apparently, the independent-samples test is slightly more powerful than the paired-samples test. The difference between the two curves is accounted for by the fact that the paired-samples test is based on 9 degrees of freedom (critical value of t of 2.262), while the independent-samples test is based on 18 degrees of freedom (critical value of t of 2.101).
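The two critical values quoted above can be reproduced numerically. The sketch below uses only the Python standard library: it integrates the Student t density with Simpson's rule and finds the two-tailed critical value by bisection. The function names and the numerical settings (step count, search interval) are illustrative choices, not part of the article.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df, alpha=0.05):
    """Two-tailed critical value c such that P(|T| > c) = alpha."""
    lo, hi = 0.0, 10.0
    for _ in range(40):  # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (10, 20, 40, 80):
    paired_crit = t_critical(n - 1)
    indep_crit = t_critical(2 * n - 2)
    print(f"n = {n:2d}: paired df = {n - 1:3d}, crit = {paired_crit:.3f}; "
          f"independent df = {2 * n - 2:3d}, crit = {indep_crit:.3f}")
```

For 9 and 18 degrees of freedom the computed values agree with the 2.262 and 2.101 given in the text, and the gap between the paired and independent critical values narrows as the number of pairs grows.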

In the data plotted in the lower section, the correlation between paired observations is .30. In this case, the paired-samples test dominates the independent-samples test. However, the Type I error probability of the independent-samples test declines to .023, while that of the paired-samples test remains close to .05. For this reason, the two power curves are not comparable. Similarly, in the lower section of Figure 1, one cannot conclude that the independent-samples test is preferable for negative correlations, because of the large difference in Type I error probabilities exhibited in the upper section.

The third curve in Figure 2, labeled adjusted, represents the paired-samples test performed at the .023 significance level. This adjustment of the significance level to allow for the change in Type I error probability makes the two functions comparable. The modified curve remains slightly below that of the independent-samples test over the entire range of differences between means. This slight disparity of the two curves apparently reflects differences in degrees of freedom.

[Figure 1: two panels, each for n = 20, plotting the probability of rejecting H0 against the correlation (-0.5 to 0.5) for the t-Independent and t-Paired curves.]

FIGURE 1. Probability of rejecting H0 by the independent-samples t test and the paired-samples t test as a function of correlation
Note. The difference between population means is zero in the upper section and 1.5σ in the lower section.

[Figure 2: two panels, n = 20 with ρ = 0 (upper) and n = 20 with ρ = .30 (lower), plotting the probability of rejecting H0 against the difference in standard units (0 to 9) for the t-Independent and t-Paired curves, with an additional t-Adjusted curve in the lower panel.]

FIGURE 2. Probability of rejecting H0 by the independent-samples t test, the paired-samples t test, and the paired-samples t test with an adjusted significance level as a function of the difference between means (increments of .5σ)
Note. The correlation is zero in the upper section and .30 in the lower section.

It is evident from these figures that even a moderate correlation between observations has a stronger influence on the probability of Type II errors and power than does reduction of degrees of freedom from 18 to 9.

Table 1 provides simulation data for sample sizes of 10, 20, 40, and 80 for both one-tailed and two-tailed tests. The difference between means increased in increments of 1.25σ. It is evident that depression of the Type I error probability of the independent-samples test occurs consistently for all sample sizes examined. Furthermore, the relative advantage of the paired-samples test for correlated samples is apparent for all sample sizes.

    Conclusions

Inspection of Figures 1 and 2 and Table 1 certainly confirms the widespread belief among researchers and applied statisticians that one should substitute the paired-samples t test for the independent-samples test whenever subjects are coupled or matched in some way in an experimental design. The magnitude of the effect produced by slight correlations probably is greater than most researchers realize. The present results disclose that even a correlation of .10 or .20 seriously distorts the significance level of the t statistic based on independent samples. When power functions are examined, it is apparent that advantages of the paired-samples test are not negligible for small correlations and are exceptional for correlations as high as .40 or .50.

We now examine the problem from another point of view. In making comparisons in the present context, one can ask two distinct questions. The first question is, What gain in efficiency results from using a matched-pairs experimental design instead of an independent-samples design, if matching induces a correlation? The answer to this question is found by comparing the curve representing the paired-samples test in the lower section of Figure 1 with the horizontal broken line. The line represents a constant probability of .308, which is the power of the independent-samples test when the correlation is zero. This comparison makes it clear that the advantage of the paired-samples design becomes greater as the correlation increases from 0 to .50, and that the advantage is quite large for higher correlations. This outcome is consistent with the usual interpretation of the two tests. Of course, the amount of gain depends on the parameters chosen for this particular example. The figure also reveals that a negative correlation results in a loss rather than a gain.

    A

    second

    question

    is,

    What

    loss

    occurs

    if

    one

    performs

    the

    independent-

    samples

    t

    test

    inappropriately

    n

    measures

    which

    are

    correlated?

    This

    question

    is

    somewhat

    more

    complicated,

    but it

    has

    significant

    practical

    applications.

    The

    answer

    can

    be

    found

    by

    inspecting

    the

    two

    curves

    (open

    circles

    and

    filled

    circles)

    in

    the

    lower

    section

    of

    Figure

    1.

    These

    curves

    reveal

    that

    the

    difference

    in

    the

    probabilities

    of

    rejecting

    the

    null

    hypothesis

    for

    the

    two

    tests

    becomes

    355

    This content downloaded from 128.240.233.146 on Fri, 25 Dec 2015 18:58:05 UTCAll use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 7/26/2019 Zimmerman 1997

    9/13

    TABLE 1
    Probability of rejecting H0 by independent-samples t test and paired-samples t test for various numbers of pairs (n) and degrees of correlation between samples (ρ): one-tailed and two-tailed tests

                       ρ = 0               ρ = .20              ρ = .40
      n    D  t indep.  t paired  t indep.  t paired  t indep.  t paired

    One-tailed tests
     10    0      .049      .050      .035      .051      .022      .051
           1      .297      .283      .284      .331      .249      .391
           2      .716      .683      .734      .766      .762      .860
           3      .952      .937      .964      .968      .980      .993
     20    0      .048      .047      .034      .052      .019      .051
           1      .295      .285      .288      .345      .261      .417
           2      .724      .711      .748      .797      .786      .891
           3      .957      .951      .977      .984      .988      .997
     40    0      .052      .051      .034      .051      .019      .050
           1      .309      .304      .288      .350      .258      .426
           2      .744      .735      .755      .807      .792      .898
           3      .962      .961      .977      .986      .988      .996
     80    0      .050      .050      .033      .048      .017      .051
           1      .317      .314      .279      .346      .267      .435
           2      .743      .736      .761      .812      .791      .899
           3      .962      .958      .978      .986      .989      .997

    Two-tailed tests
     10    0      .051      .050      .030      .048      .016      .051
           1      .199      .182      .170      .211      .149      .270
           2      .583      .531      .601      .634      .609      .758
           3      .899      .863      .926      .931      .946      .976
     20    0      .050      .048      .031      .051      .014      .049
           1      .195      .185      .175      .229      .145      .288
           2      .603      .580      .622      .684      .636      .798
           3      .921      .903      .941      .953      .961      .988
     40    0      .051      .051      .030      .050      .013      .052
           1      .218      .215      .178      .238      .143      .302
           2      .630      .620      .636      .710      .652      .820
           3      .930      .923      .944      .963      .966      .991
     80    0      .051      .051      .030      .049      .013      .051
           1      .215      .210      .185      .246      .149      .316
           2      .627      .617      .649      .719      .672      .834
           3      .926      .922      .952      .969      .972      .992

    Note. Differences are in units of 1.25u.


    larger as the correlation increases. The paired-samples test dominates when a positive correlation exceeds about .05, and the independent-samples test dominates when the correlation is negative or zero. As mentioned earlier, however, the upper section of Figure 1 discloses that the differences in the lower section cannot be interpreted as differences in the power of the test, because the Type I error probability changes as the correlation changes. The two sections of this figure together suggest that a correlation spuriously elevates or depresses the entire power function of the independent-samples test. Instead of referring to power differences, one must state simply that nonindependence compromises the validity of the test and makes the power to detect differences uninterpretable. Although explicit matching is efficient only for positive correlations, this spurious alteration of the significance level occurs for both positive and negative correlations.
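    The dependence of the Type I error probability on the correlation can be checked with a small Monte Carlo experiment. The following is a minimal sketch, not the simulation procedure used for Table 1: it assumes bivariate normal scores with unit variances, n = 10 pairs, a true H0, and two-tailed tests at the .05 level. The function name, the number of replications, and the seed are illustrative choices, and the critical values are the standard tabled values of Student's t for 18 and 9 degrees of freedom.

```python
import math
import random

# Two-tailed .05 critical values of Student's t from standard tables
# (n = 10 pairs: df = 18 for the independent test, df = 9 for the paired test).
T_CRIT_INDEP = 2.101
T_CRIT_PAIRED = 2.262


def sample_var(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def rejection_rates(rho, n=10, reps=20000, seed=1):
    """Estimate P(reject H0) for both tests when H0 is true and corr(X, Y) = rho."""
    rng = random.Random(seed)
    reject_indep = 0
    reject_paired = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # y has unit variance and population correlation rho with x
        y = [rho * xi + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for xi in x]
        d = [yi - xi for xi, yi in zip(x, y)]
        mean_diff = sum(d) / n

        # Independent-samples t: pooled variance (equal n), df = 2n - 2
        pooled_var = (sample_var(x) + sample_var(y)) / 2.0
        t_indep = mean_diff / math.sqrt(pooled_var * 2.0 / n)

        # Paired-samples t: one-sample t on the differences, df = n - 1
        t_paired = mean_diff / math.sqrt(sample_var(d) / n)

        if abs(t_indep) > T_CRIT_INDEP:
            reject_indep += 1
        if abs(t_paired) > T_CRIT_PAIRED:
            reject_paired += 1
    return reject_indep / reps, reject_paired / reps


for rho in (0.0, 0.20, 0.40):
    indep, paired = rejection_rates(rho)
    print(f"rho = {rho:.2f}: indep. {indep:.3f}, paired {paired:.3f}")
```

    Under these assumptions the estimated rejection rates should fall near the D = 0, n = 10 row of the two-tailed section of Table 1, up to simulation error: the paired-samples test stays close to .05 at every correlation, whereas the independent-samples test drops well below .05 as the positive correlation increases.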

    Authors have not often asked the second question, even though it has practical implications for research. Perhaps an experimenter is unaware of some incidental pairing which induces a correlation between measures of a dependent variable. In other words, a researcher may believe samples to be independent when in reality they are correlated, although perhaps only slightly. Violation of random assignment of subjects to experimental treatments is one possible source of such a correlation, which can invalidate the independent-samples test. Another source was identified and studied by Coren and Hakstian (1990) and by Zumbo (1996). These investigators examined designs in which each subject contributes two scores to the data pool (for example, measures of two eyes, two ears, and so on, in perceptual research). Researchers sometimes analyze this kind of data as if all measures are independent, ignoring the correlation induced by pairing. This kind of violation is sometimes difficult to detect in otherwise well-designed experiments and probably occurs more often in research studies than is generally realized. Undoubtedly, it can markedly influence the significance level and the probability of rejecting H0. For this reason, the hazards of inappropriately using an independent-samples test probably are more serious than the loss of degrees of freedom resulting from using a paired-samples test when it is not required.

    Sometimes researchers fail to identify negative correlations or overlook the fact that negative correlations in paired data have effects quite different from positive correlations (see Figure 1). It is apparent from the equation presented earlier that they result in wider confidence intervals and decreased sensitivity of the paired-samples design. A negative relationship between naturally paired subjects is conceivable in some practical research contexts. For example, Hays (1988, p. 314) suggested that measures of personality dominance of husband-wife pairs could be negatively correlated if highly dominant women are paired with men having low dominance ratings. Matching on the basis of husband-wife pairs therefore could elevate the probability of Type I errors of the independent-samples test and at the same time reduce the power of the paired-samples test, as indicated in Figure 1. One can envision other negative relationships of this sort in education, psychology, and the social sciences. Many of these relationships may be quite difficult to detect, although some can be avoided easily in experimental designs.
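    The equation referred to in this paragraph is not reproduced in this part of the text. As a sketch only, assuming it is the usual expression for the variance of a mean difference between correlated scores (the symbols below are assumptions and may differ from the article's notation), the effect of a negative correlation can be read off directly:

```latex
\sigma^2_{\bar{D}} \;=\; \frac{\sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X\sigma_Y}{n},
\qquad\text{so that, when } \sigma_X = \sigma_Y = \sigma,\qquad
\sigma^2_{\bar{D}} \;=\; \frac{2\sigma^2\,(1-\rho)}{n}.
```

    When ρ is negative, the factor 1 − ρ exceeds 1, so the standard error of the mean difference is larger than the value obtained with uncorrelated scores, and the paired-samples test also has fewer degrees of freedom. Both effects widen the confidence interval.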

    The example in Table 2 illustrates some of the practical implications of these conclusions. Suppose a researcher believes that the scores in the two left-hand columns, labeled X and Y, comprise independent samples. A Student t test fails to reject H0 at the .05 significance level. Now, assume that there exists an unknown correspondence of scores as indicated in the next three columns. The second X and Y columns are permutations of the two left-hand columns with the hidden pairing now displayed. In fact, these scores are computer-generated samples from a population in which the correlation between X and Y was .10 and the difference between population means was 4.65. The sample correlation turned out to be .139. Despite this relatively small correlation, which many investigators might consider insignificant, a paired-samples test now rejects H0 at the .05 significance level.
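    The two tests in this example can be recomputed from the paired scores in Table 2. The following is a minimal sketch using only the Python standard library. The variable and helper names are illustrative, and the two-tailed .05 critical values (2.074 for 22 degrees of freedom, 2.201 for 11) are taken from standard t tables rather than from the article.

```python
import math

# Paired scores from Table 2, listed in pair order (pairs 1-12).
# The independent-samples test uses the same values and ignores the pairing.
x = [17, 25, 16, 24, 43, 18, 34, 25, 36, 34, 32, 29]
y = [45, 35, 27, 34, 46, 30, 29, 39, 23, 43, 34, 28]
n = len(x)


def mean(values):
    return sum(values) / len(values)


def sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


mean_diff = mean(y) - mean(x)

# Independent-samples Student t test: pooled variance, df = 2n - 2 = 22
pooled_var = (sum_sq(x) + sum_sq(y)) / (2 * n - 2)
t_indep = mean_diff / math.sqrt(pooled_var * 2 / n)

# Paired-samples t test: one-sample t on D = Y - X, df = n - 1 = 11
d = [yi - xi for xi, yi in zip(x, y)]
t_paired = mean_diff / math.sqrt(sum_sq(d) / (n - 1) / n)

# Sample correlation between the paired scores
mx, my = mean(x), mean(y)
r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / math.sqrt(sum_sq(x) * sum_sq(y))

print(f"indep.: t = {t_indep:.3f}, df = 22, reject H0: {abs(t_indep) > 2.074}")
print(f"paired: t = {t_paired:.3f}, df = 11, reject H0: {abs(t_paired) > 2.201}")
print(f"sample correlation r = {r:.3f}")
# indep.: t = 2.052, df = 22, reject H0: False
# paired: t = 2.210, df = 11, reject H0: True
# sample correlation r = 0.139
```

    The mean difference is the same in both tests. Only the standard error and the degrees of freedom change, and with r = .139 the smaller standard error of the paired test is just enough to offset the larger critical value.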

    Let us now look at the same data from another point of view. Suppose an experimenter is aware of the pairing indicated in the table, but believes the small correlation to be unimportant and performs an independent-samples test in order to take advantage of more degrees of freedom. The result is failure to reject H0, although a paired-samples test would have a different outcome. If the existence of pairing or matching is known, this kind of oversight is not likely to occur and can be corrected easily. However, it is impossible to know from most

    TABLE 2
    Example of a design in which initially there is an undetected correspondence of values

      t indep.             t paired
       X     Y     Pair     X     Y   D = Y - X
      25    34        1    17    45      28
      32    39        2    25    35      10
      43    34        3    16    27      11
      16    30        4    24    34      10
      34    35        5    43    46       3
      25    46        6    18    30      12
      17    23        7    34    29      -5
      18    27        8    25    39      14
      29    43        9    36    23     -13
      24    45       10    34    43       9
      34    28       11    32    34       2
      36    29       12    29    28      -1

    Note. An independent-samples Student t test was first performed without consideration of possible pairing of scores. Then, pairing was recognized, and a one-sample Student t test (i.e., a paired-samples test) was performed on difference scores. Independent: t = 2.052, df = 22, p > .05. Paired: t = 2.210, df = 11, p