43423998 Anne Anastasi Psychological Testing I


    ANNE ANASTASI

    Professor of Psychology, Fordham University

    Psychological Testing

    MACMILLAN PUBLISHING CO., INC.

    New York

    Collier Macmillan Publishers

    London


    In a revised edition, one expects both similarities and differences. This
    edition shares with the earlier versions the objectives and basic approach
    of the book. The primary goal of this text is still to contribute toward
    the proper evaluation of psychological tests and the correct interpretation
    and use of test results. This goal calls for several kinds of information:
    (1) an understanding of the major principles of test construction, (2)
    psychological knowledge about the behavior being assessed, (3) sensitivity
    to the social and ethical implications of test use, and (4) broad
    familiarity with the types of available instruments and the sources of
    information about tests. A minor innovation in the fourth edition is the
    addition of a suggested outline for test evaluation (Appendix C).

    In successive editions, it has been necessary to exercise more and more
    restraint to keep the number of specific tests discussed in the book from
    growing with the field; it has never been my intention to provide a
    miniature Mental Measurements Yearbook! Nevertheless, I am aware that
    principles of test construction and interpretation can be better
    understood when applied to particular tests. Moreover, acquaintance with
    the major types of available tests, together with an understanding of
    their special contributions and limitations, is an essential component of
    knowledge about contemporary testing. For these reasons, specific tests
    are again examined and evaluated in Parts 3, 4, and 5. These tests have
    been chosen either because they are outstanding examples with which the
    student of testing should be familiar or because they illustrate some
    special point of test construction or interpretation. In the text itself,
    the principal focus is on types of tests rather than on specific
    instruments. At the same time, Appendix E contains a classified list of
    over 250 tests, including not only those cited in the text but also others
    added to provide a more representative sample.

    As for the differences, they loomed especially large during the
    preparation of this edition. Much that has happened in human society since
    the mid-1960's has had an impact on psychological testing. Some of these
    developments were briefly described in the last two chapters of the third
    edition. Today they have become part of the mainstream of psychological
    testing and have been accordingly incorporated in the appropriate sections
    throughout the book. Recent changes in psychological testing that are
    reflected in the present edition can be described on three levels:
    (1) general orientation toward testing, (2) substantive and methodological
    developments, and (3) "ordinary progress" such as the publication of new
    tests and revision of earlier tests.

    All rights reserved. No part of this book may be reproduced or
    transmitted in any form or by any means, electronic or mechanical,
    including photocopying, recording, or any information storage and
    retrieval system, without permission in writing from the Publisher.

    Earlier editions copyright 1954 and © 1961 by Macmillan Publishing
    Co., Inc., and copyright © 1968 by Anne Anastasi.

    MACMILLAN PUBLISHING CO., INC.
    866 Third Avenue, New York, New York 10022

    COLLIER MACMILLAN CANADA, LTD.

    Library of Congress Cataloging in Publication Data

    Anastasi, Anne, (date)
    Psychological testing.

    Bibliography: p.
    Includes indexes.
    1. Mental tests. 2. Personality tests. I. Title.
    [DNLM: 1. Psychological tests. WM145 A534P]
    BF431.A573 1976 153.9 75-2206

    ISBN O-2-30298


    Preface

    An example of changes on the first level is the increasing awareness of
    the ethical, social, and legal implications of testing. In the present
    edition, this topic has been expanded and treated in a separate chapter
    early in the book (Ch. 3) and in Appendixes A and B. A cluster of related
    developments represents a broadening of test uses. Besides the traditional
    applications of tests in selection and diagnosis, increasing attention is
    being given to administering tests for self-knowledge and
    self-development, and to training individuals in the use of their own test
    results in decision making (Chs. 3 and 4). In the same category are the
    continuing replacement of global scores with multitrait profiles and the
    application of classification strategies, whereby "everyone can be above
    average" in one or more socially valued variables (Ch. 7). From another
    angle, efforts are being made to modify traditional interpretations of
    test scores, in both cognitive and noncognitive areas, in the light of
    accumulating psychological knowledge. In this edition, Chapter 12 brings
    together psychological issues in the interpretation of intelligence test
    scores, touching on such problems as stability and change in intellectual
    level over time; the nature of intelligence; and the testing of
    intelligence in early childhood, in old age, and in different cultures.
    Another example is provided by the increasing emphasis on situational
    specificity and person-by-situation interactions in personality testing,
    stimulated in large part by the social-learning theorists (Ch. 17).

    The second level, covering substantive and methodological changes, is
    illustrated by the impact of computers on the development, administration,
    scoring, and interpretation of tests (see especially Chs. 4, 11, 13, 17,
    18, W). The use of computers in administering or managing instructional
    programs has also stimulated the development of criterion-referenced
    tests, although other conditions have contributed to the upsurge of
    interest in such tests in education. Criterion-referenced tests are
    discussed principally in Chapters 4, 5, and 14. Other types of instruments
    that have risen to prominence and have received fuller treatment in the
    present edition include: tests for identifying specific learning
    disabilities (Ch. ...), inventories and other devices for use in behavior
    modification programs (Ch. 20), instruments for assessing early childhood
    education (Ch. 14), Piagetian "ordinal" scales (Chs. 10 and 14), basic
    education and literacy tests for adults (Chs. 13 and 14), and techniques
    for the assessment of environments (Ch. 20). Problems to be considered in
    the assessment of minority groups, including the question of test bias,
    are examined from different angles in Chapters 3, 7, 8, and 12.

    On the third level, it may be noted that over 100 of the tests listed in
    this edition have been either initially published or revised since the
    publication of the preceding edition (1968). Major examples include the
    McCarthy Scales of Children's Abilities, the WISC-R, the 1972
    Stanford-Binet norms (with all the resulting readjustments in
    interpretations), Forms S and T of the DAT (including a computerized
    Career Planning Program), the Strong-Campbell Interest Inventory (merged
    form of the SVIB), and the latest revisions of the Stanford Achievement
    Test and the Metropolitan Readiness Tests.

    It is a pleasure to acknowledge the assistance received from many
    sources in the preparation of this edition. The completion of the project
    was facilitated by a one-semester Faculty Fellowship awarded by Fordham
    University and by a grant from the Fordham University Research Council
    covering principally the services of a research assistant. These services
    were performed by Stanley Friedland with an unusual combination of
    expertise, responsibility, and graciousness. I am indebted to the many
    authors and test publishers who provided reprints, unpublished
    manuscripts, specimen sets of tests, and answers to my innumerable
    inquiries by mail and telephone. For assistance extending far beyond the
    interests and responsibilities of any single publisher, I am especially
    grateful to Anna Dragositz of Educational Testing Service and Blythe
    Mitchell of Harcourt Brace Jovanovich, Inc. I want to acknowledge the
    significant contribution of John T. Cowles of the University of
    Pittsburgh, who assumed complete responsibility for the preparation of the
    Instructor's Manual to accompany this text.

    For informative discussions and critical comments on particular topics,
    I want to convey my sincere thanks to William H. Angoff of Educational
    Testing Service and to several members of the Fordham University
    Psychology Department, including David R. Chabot, Marvin Reznikoff,
    Reuben M. Schonebaum, and Warren W. Tryon. Grateful acknowledgment is
    also made of the thoughtful recommendations submitted by course
    instructors in response to the questionnaire distributed to current users
    of the third edition. Special thanks in this connection are due to Mary
    Carol Cahill for her extensive, constructive, and wide-ranging
    suggestions. I wish to express my appreciation to Victoria Overton of the
    Fordham University library staff for her efficient and courteous
    assistance in bibliographic matters. Finally, I am happy to record the
    contributions of my husband, John Porter Foley, Jr., who again
    participated in the solution of countless problems at all stages in the
    preparation of the book.

    A. A.


    CONTENTS

    PART 1
    CONTEXT OF PSYCHOLOGICAL TESTING

    1. FUNCTIONS AND ORIGINS OF PSYCHOLOGICAL TESTING
    Current uses of psychological tests
    Early interest in classification and training of the mentally retarded
    The first experimental psychologists
    Contributions of Francis Galton
    Cattell and the early "mental tests"
    Binet and the rise of intelligence tests
    Group testing
    Aptitude testing
    Standardized achievement tests
    Measurement of personality
    Sources of information about tests

    2. NATURE AND USE OF PSYCHOLOGICAL TESTS
    What is a psychological test?
    Reasons for controlling the use of psychological tests
    Test administration
    Rapport
    Test anxiety
    Examiner and situational variables
    Coaching, practice, and test sophistication

    3. SOCIAL AND ETHICAL IMPLICATIONS OF TESTING
    User qualifications
    Testing instruments and procedures
    Protection of privacy
    Confidentiality
    Communicating test results
    Testing and the civil rights of minorities


    4. NORMS AND THE INTERPRETATION OF TEST SCORES
    Statistical concepts
    Developmental norms
    Within-group norms
    Relativity of norms
    Computer utilization in the interpretation of test scores
    Criterion-referenced testing

    5. RELIABILITY
    The correlation coefficient
    Types of reliability
    Reliability of speeded tests
    Dependence of reliability coefficients on the sample tested
    Standard error of measurement
    Reliability of criterion-referenced tests

    Content validity
    Criterion-related validity
    Construct validity
    Overview

    7. VALIDITY: MEASUREMENT AND INTERPRETATION
    Validity coefficient and error of estimate
    Test validity and decision theory
    Moderator variables
    Combining information from different tests
    Use of tests for classification decisions
    Statistical analyses of test bias

    8. ITEM ANALYSIS
    Item difficulty
    Item validity
    Internal consistency
    Item analysis of speeded tests
    Cross validation
    Item-group interaction

    PART 3
    TESTS OF GENERAL INTELLECTUAL LEVEL

    9. INDIVIDUAL TESTS
    Stanford-Binet Intelligence Scale
    Wechsler Adult Intelligence Scale
    Wechsler Intelligence Scale for Children
    Wechsler Preschool and Primary Scale of Intelligence

    10. TESTS FOR SPECIAL POPULATIONS
    Infant and preschool testing
    Testing the physically handicapped
    Cross-cultural testing

    Group tests versus individual tests
    Multilevel batteries
    Tests for the college level and beyond

    12. PSYCHOLOGICAL ISSUES IN INTELLIGENCE TESTING
    Longitudinal studies of intelligence
    Intelligence in early childhood
    Problems in the testing of adult intelligence
    Problems in cross-cultural testing
    Nature of intelligence

    PART 4
    TESTS OF SEPARATE ABILITIES

    13. MEASURING MULTIPLE APTITUDES
    Factor analysis
    Theories of trait organization
    Multiple aptitude batteries
    Measurement of creativity

    14. EDUCATIONAL TESTING
    Achievement tests: their nature and uses
    General achievement batteries
    Standardized tests in separate subjects
    Teacher-made classroom tests


    Diagnostic and criterion-referenced tests
    Specialized prognostic tests
    Assessment in early childhood education

    15. OCCUPATIONAL TESTING
    Validation of industrial tests
    Short screening tests for industrial personnel
    Special aptitude tests
    Testing in the professions

    Diagnostic use of intelligence tests
    Special tests for detecting cognitive dysfunction
    Identifying specific learning disabilities
    Clinical judgment
    Report writing

    PART 5
    PERSONALITY TESTS

    17. SELF-REPORT INVENTORIES
    Content validation
    Empirical criterion keying
    Factor analysis in test development
    Personality theory in test development
    Test-taking attitudes and response sets
    Situational specificity
    Evaluation of personality inventories

    18. MEASURES OF INTERESTS, ATTITUDES, AND VALUES
    Interest inventories
    Opinion and attitude measurement
    Attitude scales
    Assessment of values and related variables

    19. PROJECTIVE TECHNIQUES
    Nature of projective techniques
    Inkblot techniques
    Thematic Apperception Test and related instruments
    Other projective techniques
    Evaluation of projective techniques

    20. OTHER ASSESSMENT TECHNIQUES
    "Objective" performance tests
    Situational tests
    Self-concepts and personal constructs
    Assessment techniques in behavior modification programs
    Observer reports
    Biographical inventories
    The assessment of environments

    B. Guidelines on Employee Selection Procedures (EEOC)
    Guidelines for Reporting Criterion-Related and Content Validity (OFCC)


    CHAPTER 1

    Functions and Origins of Psychological Testing

    Anyone reading this book today could undoubtedly illustrate what is
    meant by a psychological test. It would be easy enough to recall a test
    the reader himself has taken in school, in college, in the armed services,
    in the counseling center, or in the personnel office. Or perhaps the
    reader has served as a subject in an experiment in which standardized
    tests were employed. This would certainly not have been the case fifty
    years ago. Psychological testing is a relatively young branch of one of
    the youngest of the sciences.

    Basically, the function of psychological tests is to measure differences
    between individuals or between the reactions of the same individual on
    different occasions. One of the first problems that stimulated the
    development of psychological tests was the identification of the mentally
    retarded. To this day, the detection of intellectual deficiencies remains
    an important application of certain types of psychological tests. Related
    clinical uses of tests include the examination of the emotionally
    disturbed, the delinquent, and other types of behavioral deviants. A
    strong impetus to the early development of tests was likewise provided by
    problems arising in education. At present, schools are among the largest
    test users. The classification of children with reference to their ability
    to profit from different types of school instruction, the identification
    of the intellectually retarded on the one hand and the gifted on the
    other, the diagnosis of academic failures, the educational and vocational
    counseling of high school and college students, and the selection of
    applicants for professional and other special schools are among the many
    educational uses of tests.

    The selection and classification of industrial personnel represent
    another major application of psychological testing. From the assembly-line
    operator or filing clerk to top management, there is scarcely a type of
    job for which some kind of psychological test has not proved helpful in
    such matters as hiring, job assignment, transfer, promotion, or
    termination. To be sure, the effective employment of tests in many of
    these situations, especially in connection with high-level jobs, usually
    requires that the tests be used as an adjunct to skillful interviewing, so
    that test scores may be properly interpreted in the light of other
    background information about the individual. Nevertheless, testing
    constitutes an important part of the total personnel program. A closely
    related application of psychological testing is to be found in the
    selection and classification of military personnel. From simple beginnings
    in World War I, the scope and variety of psychological tests employed in
    military situations underwent a phenomenal increase during World War II.
    Subsequently, research on test development has been continuing on a large
    scale in all branches of the armed services.

    The use of tests in counseling has gradually broadened from a narrowly
    defined guidance regarding educational and vocational plans to an
    involvement with all aspects of the person's life. Emotional well-being
    and effective interpersonal relations have become increasingly prominent
    objectives of counseling. There is growing emphasis, too, on the use of
    tests to enhance self-understanding and personal development. Within this
    framework, test scores are part of the information given to the individual
    as aids to his own decision-making processes.

    It is clearly evident that psychological tests are currently being
    employed in the solution of a wide range of practical problems. One should
    not, however, lose sight of the fact that such tests are also serving
    important functions in basic research. Nearly all problems in differential
    psychology, for example, require testing procedures as a means of
    gathering data. As illustrations, reference may be made to studies on the
    nature and extent of individual differences, the identification of
    psychological traits, the measurement of group differences, and the
    investigation of biological and cultural factors associated with
    behavioral differences. For all such areas of research, and for many
    others, the precise measurement of individual differences made possible by
    well-constructed tests is an essential prerequisite. Similarly,
    psychological tests provide standardized tools for investigating such
    varied problems as life-span developmental changes within the individual,
    the relative effectiveness of different educational procedures, the
    outcomes of psychotherapy, the impact of community programs, and the
    influence of noise on performance.

    From the many different uses of psychological tests, it follows that
    some knowledge of such tests is needed for an adequate understanding of
    most fields of contemporary psychology. It is primarily with this end in
    view that the present book has been prepared. The book is not designed to
    make the individual either a skilled examiner and test administrator or an
    expert on test construction. It is directed, not to the test specialist,
    but to the general student of psychology. Some acquaintance with the
    leading current tests is necessary in order to understand references to
    the use of such tests in the psychological literature. And a proper
    evaluation and interpretation of test results must ultimately rest on a
    knowledge of how the tests were constructe


    deviates came a realization that some uniform criteria for identifying and
    classifying these cases were required. The establishment of many special
    institutions for the care of the mentally retarded in both Europe and
    America made the need for setting up admission standards and an objective
    system of classification especially urgent. First it was necessary to
    differentiate between the insane and the mentally retarded. The former
    manifested emotional disorders that might or might not be accompanied by
    intellectual deterioration from an initially normal level; the latter were
    characterized essentially by intellectual defect that had been present
    from birth or early infancy. What is probably the first explicit statement
    of this distinction is to be found in a two-volume work published in 1838
    by the French physician Esquirol (1838), in which over one hundred pages
    are devoted to mental retardation. Esquirol also pointed out that there
    are many degrees of mental retardation, varying along a continuum from
    normality to low-grade idiocy. In the effort to develop some system for
    classifying the different degrees and varieties of retardation, Esquirol
    tried several procedures but concluded that the individual's use of
    language provides the most dependable criterion of his intellectual level.
    It is interesting to note that current criteria of mental retardation are
    also largely linguistic and that present-day intelligence tests are
    heavily loaded with verbal content. The important part verbal ability
    plays in our concept of intelligence will be repeatedly demonstrated in
    subsequent chapters.

    Of special significance are the contributions of another French
    physician, Seguin, who pioneered in the training of the mentally retarded.
    Having rejected the prevalent notion of the incurability of mental
    retardation, Seguin (1866) experimented for many years with what he termed
    the physiological method of training; and in 1837 he established the first
    school devoted to the education of mentally retarded children. In 1848 he
    emigrated to America, where his ideas gained wide recognition. Many of the
    sense-training and muscle-training techniques currently in use in
    institutions for the mentally retarded were originated by Seguin. By these
    methods, severely retarded children are given intensive exercise in
    sensory discrimination and in the development of motor control. Some of
    the procedures developed by Seguin for this purpose were eventually
    incorporated into performance or nonverbal tests of intelligence. An
    example is the Seguin Form Board, in which the individual is required to
    insert variously shaped blocks into the corresponding recesses as quickly
    as possible.

    More than half a century after the work of Esquirol and Seguin, the
    French psychologist Alfred Binet urged that children who failed to respond
    to normal schooling be examined before dismissal and, if considered
    educable, be assigned to special classes (T. H. Wolf, 1973). With his
    fellow members of the Society for the Psychological Study of the Child,
    Binet stimulated the Ministry of Public Instruction to take steps to
    improve the condition of retarded children. A specific outcome was the


    mathematically untrained investigator who might wish to treat test results
    quantitatively. He thereby extended enormously the application of
    statistical procedures to the analysis of test data. This phase of
    Galton's work has been carried forward by many of his students, the most
    eminent of whom was Karl Pearson.

    It was the English biologist Sir Francis Galton who was primarily
    responsible for launching the testing movement. A unifying factor in
    Galton's numerous and varied research activities was his interest in human
    heredity. In the course of his investigations on heredity, Galton realized
    the need for measuring the characteristics of related and unrelated
    persons. Only in this way could he discover, for example, the exact degree
    of resemblance between parents and offspring, brothers and sisters,
    cousins, or twins. With this end in view, Galton was instrumental in
    inducing a number of educational institutions to keep systematic
    anthropometric records on their students. He al


    relation to independent estimates of intellectual level based on teachers'
    ratings (Bolton, 1891-1892; J. A. Gilbert, 1894) or academic grades
    (Wissler, 1901).

    A number of test series assembled by European psychologists of the
    period tended to cover somewhat more complex functions. Kraepelin (1895),
    who was interested primarily in the clinical examination of psychiatric
    patients, prepared a long series of tests to measure what he regarded as
    basic factors in the characterization of an individual. The tests,
    employing chiefly simple arithmetic operations, were designed to measure
    practice effects, memory, and susceptibility to fatigue and to
    distraction. A few years earlier, Oehrn (1889), a pupil of Kraepelin, had
    employed tests of perception, memory, association, and motor functions in
    an investigation on the interrelations of psychological functions. Another
    German psychologist, Ebbinghaus (1897), administered tests of arithmetic
    computation, memory span, and sentence completion to schoolchildren. The
    most complex of the three tests, sentence completion, was the only one
    that showed a clear correspondence with the children's scholastic
    achievement.

    Like Kraepelin, the Italian psychologist Ferrari and his students were
    interested primarily in the use of tests with pathological cases
    (Guicciardi & Ferrari, 1896). The test series they devised ranged from
    physiological measures and motor tests to apprehension span and the
    interpretation of pictures. In an article published in France in 1895,
    Binet and Henri criticized most of the available test series as being too
    largely sensory and as concentrating unduly on simple, specialized
    abilities. They argued further that, in the measurement of the more
    complex functions, great precision is not necessary, since individual
    differences are larger in these functions. An extensive and varied list of
    tests was proposed, covering such functions as memory, imagination,
    attention, comprehension, suggestibility, aesthetic appreciation, and many
    others. In these tests we can recognize the trends that were eventually to
    lead to the development of the famous Binet intelligence scales.


    ously cited commission to study procedures for the education of retarded
    children. It was in connection with the objectives of this commission that
    Binet, in collaboration with Simon, prepared the first Binet-Simon Scale
    (Binet & Simon, 1905).

    This scale, known as the 1905 scale, consisted of 30 problems or tests
    arranged in ascending order of difficulty. The difficulty level was
    determined empirically by administering the tests to 50 normal children
    aged 3 to 11 years, and to some mentally retarded children and adults. The
    tests were designed to cover a wide variety of functions, with special
    emphasis on judgment, comprehension, and reasoning, which Binet regarded
    as essential components of intelligence. Although sensory and perceptual
    tests were included, a much greater proportion of verbal content was found
    in this scale than in most test series of the time. The 1905 scale was
    presented as a preliminary and tentative instrument, and no precise
    objective method for arriving at a total score was formulated.

    In the second, or 1908, scale, the number of tests was increased, some
    unsatisfactory tests from the earlier scale were eliminated, and all tests
    were grouped into age levels on the basis of the performance of about 300
    normal children between the ages of 3 and 13 years. Thus, in the 3-year
    level were placed all tests passed by 80 to 90 percent of normal
    3-year-olds; in the 4-year level, all tests similarly passed by normal
    4-year-olds; and so on to age 13. The child's score on the entire test
    could then be expressed as a mental level corresponding to the age of
    normal children whose performance he equaled. In the various translations
    and adaptations of the Binet scales, the term "mental age" was commonly
    substituted for "mental level." Since mental age is such a simple concept
    to grasp, the introduction of this term undoubtedly did much to popularize
    intelligence testing.* Binet himself, however, avoided the term "mental
    age" because of its unverified developmental implications and preferred
    the more neutral term "mental level" (T. H. Wolf, 1973).

    A third revision of the Binet-Simon Scale appeared in 1911, the year of 

    Binet's untimely death. In this scale,   no fundamental changes were intro-duced. Minor revisions and relocations of specific tests were instituted .

    More tests were added at several year levels, and the scale was extended to the adult level

    Even prior to the 1908 revision, the Binet-Simon tests attracted wide

    >   Goodenough   (1949,   pp.   50-51)   notes that in   1881,   2l y~aTs befor~ the appear-ance of the 1908 Binet-Simon Scale, S. E. Chaille publi!iheq in the   New Orleans

     Medical a~d Surgical Journal   a series of tests for infan~ 11l7anged according to thea!1:eat whIch the tests are commonly passed. Partly because'   of the limited circulation

    of the journal 'nd partly, perhaps, because the scientific   ~Om!J1l1nity  was not readyfor it, the significance of this age-scale concept passed unnoticed at the time. Binet's

    own scale was in~ed by the work   o E   some   o E   ~is contemporaries, notably Blinand Damaye, who prepared a set of oral questions from which they derived a singleglobal score  E o r   eaclrdiild (T. H. Wolf, 1973).   .

    Binet and his co-workers devoted many years to active and ingenious

    research on ways of measuring intelligence. Many approaches were tried,

    including even the measurement of cranial, facial, and hand form, and 

    the analysis of handwriting. The results, however, led to a growing con-

    viction that the direct, even though crude, measurement of com lex

    1 fence   a unc ons   0   ere t e greatest promise. T en a specific situ-ation arose that brought Binet's efforts to imme(]iate practical fruition.

    In 1904, the Minister of Public Instruction appointed ~inet to the previ-


    attention among psychologists throughout the world. Translations and adaptations appeared in many languages. In America, a number of different revisions were prepared, the most famous of which is the one developed under the direction of L. M. Terman at Stanford University, and known as the Stanford-Binet (Terman, 1916). It was in this test that the intelligence quotient (IQ), or ratio between mental age and chronological age, was first used. The latest revision of this test is widely employed today and will be more fully considered in Chapter 9. Of special interest, too, is the first Kuhlmann-Binet revision, which extended the scale downward to the age level of 3 months (Kuhlmann, 1912). This scale represents one of the earliest efforts to develop preschool and infant tests of intelligence.
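    The ratio definition of the IQ can be illustrated with a short sketch. The passage states only that the IQ is the ratio of mental age to chronological age; the multiplication by 100 is the conventional scaling and is an assumption here, as are the function name and the example ages.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: mental age divided by chronological age.

    The factor of 100 is the conventional scaling (an assumption, not
    stated in the passage above). Both ages use the same unit (months).
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical child: 8 years old (96 months), performing at the
# 10-year level (120 months).
print(ratio_iq(120, 96))  # 125.0
```

    A child whose mental age equals his chronological age obtains a ratio of 1, or 100 under this scaling.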

    The Binet tests, as well as all their revisions, are individual scales in the sense that they can be administered to only one person at a time. Many of the tests in these scales require oral responses from the subject or necessitate the manipulation of materials. Some call for individual timing of responses. For these and other reasons, such tests are not adapted to group administration. Another characteristic of the Binet type of test is that it requires a highly trained examiner. Such tests are essentially clinical instruments, suited to the intensive study of individual cases.

    Group testing, like the first Binet scale, was developed to meet a pressing practical need. When the United States entered World War I in 1917, a committee was appointed by the American Psychological Association to consider ways in which psychology might assist in the conduct of the war. This committee, under the direction of Robert M. Yerkes, recognized the need for the rapid classification of the million and a half recruits with respect to general intellectual level. Such information was relevant to many administrative decisions, including rejection or discharge from military service, assignment to different types of service, or admission to officer-training camps. It was in this setting that the first group intelligence test was developed. In this task, the Army psychologists drew on all available test materials, and especially on an unpublished group intelligence test prepared by Arthur S. Otis, which he turned over to the Army. A major contribution of Otis's test, which he designed while a student in one of Terman's graduate courses, was the introduction of multiple-choice and other "objective" item types.

    The tests finally developed by the Army psychologists came to be known as the Army Alpha and the Army Beta. The former was designed for general routine testing; the latter was a nonlanguage scale employed with illiterates and with foreign-born recruits who were unable to take a test in English. Both tests were suitable for administration to large groups.

    Shortly after the termination of World War I, the Army tests were released for civilian use. Not only did the Army Alpha and Army Beta themselves pass through many revisions, the latest of which are even now in use, but they also served as models for most group intelligence tests. The testing movement underwent a tremendous spurt of growth. Soon group intelligence tests were being devised for all ages and types of persons, from preschool children to graduate students. Large-scale testing programs, previously impossible, were now being launched with zestful optimism. Because group tests were designed as mass testing instruments, they not only permitted the simultaneous examination of large groups but also simplified the instructions and administration procedures so as to demand a minimum of training on the part of the examiner. Schoolteachers began to give intelligence tests to their classes. College students were routinely examined prior to admission. Extensive studies of special adult groups, such as prisoners, were undertaken. And soon the general public became IQ-conscious.

    The application of such group intelligence tests far outran their technical improvement. That the tests were still crude instruments was often forgotten in the rush of gathering scores and drawing practical conclusions from the results. When the tests failed to meet unwarranted expectations, skepticism and hostility toward all testing often resulted. Thus, the testing boom of the twenties, based on the indiscriminate use of tests, may have done as much to retard as to advance the progress of psychological testing.

    Although intelligence tests were originally designed to sample a wide variety of functions in order to estimate the individual's general intellectual level, it soon became apparent that such tests were quite limited in their coverage. Not all important functions were represented. In fact, most intelligence tests were primarily measures of verbal ability and, to a lesser extent, of the ability to handle numerical and other abstract and symbolic relations. Gradually psychologists came to recognize that the term "intelligence test" was a misnomer, since only certain aspects of intelligence were measured by such tests.

    To be sure, the tests covered abilities that are of prime importance in our culture. But it was realized that more precise designations, in terms of the type of information these tests are able to yield, would be preferable. For example, a number of tests that would probably have been called intelligence tests during the twenties later came to be known as scholastic aptitude tests. This shift in terminology was made in recognition of the fact that many so-called intelligence tests measure that combination of abilities demanded by academic work.

    Even prior to World War I, psychologists had begun to recognize the need for tests of special aptitudes to supplement the global intelligence tests. These special aptitude tests were developed particularly for use in vocational counseling and in the selection and classification of industrial and military personnel. Among the most widely used are tests of mechanical, clerical, musical, and artistic aptitudes.

    The critical evaluation of intelligence tests that followed their widespread and indiscriminate use during the twenties also revealed another noteworthy fact: an individual's performance on different parts of the same test often showed marked variation. This was especially apparent on group tests, in which the items are commonly segregated into subtests of relatively homogeneous content. For example, a person might score relatively high on a verbal subtest and low on a numerical subtest, or vice versa. To some extent, such internal variability is also discernible on a test like the Stanford-Binet, in which, for example, all items involving words might prove difficult for a particular individual, whereas items employing pictures or geometric diagrams may place him at an advantage.

    Test users, and especially clinicians, frequently utilized such intercomparisons in order to obtain more insight into the individual's psychological make-up. Thus, not only the IQ or other global score but also scores on subtests would be examined in the evaluation of the individual case. Such a practice is not to be generally recommended, however, because intelligence tests were not designed for the purpose of differential aptitude analysis. Often the subtests being compared contain too few items to yield a stable or reliable estimate of a specific ability. As a result, the obtained difference between subtest scores might be reversed if the individual were retested on a different day or with another form of the same test. If such intraindividual comparisons are to be made, tests are needed that are specially designed to reveal differences in performance in various functions.

    While the practical application of tests demonstrated the need for differential aptitude tests, a parallel development in the study of trait organization was gradually providing the means for constructing such tests. Statistical studies on the nature of intelligence had been exploring the interrelations among scores obtained by many persons on a wide variety of different tests. Such investigations were begun by the English psychologist Charles Spearman (1904, 1927) during the first decade of the present century. Subsequent methodological developments, based on the work of such American psychologists as T. L. Kelley (1928) and L. L. Thurstone (1935, 1947), as well as on that of other American and English investigators, have come to be known as "factor analysis."

    The contributions that the methods of factor analysis have made to test construction will be more fully examined and illustrated in Chapter 13. For the present, it will suffice to note that the data gathered by such procedures have indicated the presence of a number of relatively independent factors, or traits. Some of these traits were represented, in varying proportions, in the traditional intelligence tests. Verbal comprehension and numerical reasoning are examples of this type of trait. Others, such as spatial, perceptual, and mechanical aptitudes, were found more often in special aptitude tests than in intelligence tests.

    One of the chief practical outcomes of factor analysis was the development of multiple aptitude batteries. These batteries are designed to provide a measure of the individual's standing in each of a number of traits. In place of a total score or IQ, a separate score is obtained for such traits as verbal comprehension, numerical aptitude, spatial visualization, arithmetic reasoning, and perceptual speed. Such batteries thus provide a suitable instrument for makin


    term "aptitude test" has been traditionally employed to refer to tests measuring relatively homogeneous and clearly defined segments of ability; the term "intelligence test" customarily refers to more heterogeneous tests yielding a single global score such as an IQ. Special aptitude tests typically measure a single aptitude. Multiple aptitude batteries measure a number of aptitudes but provide a profile of scores, one for each aptitude.


    of manuals that meet adequate scientific standards. An enlightened public of test users provides the firmest assurance that such standards will be maintained and improved in the future.

    A succinct but comprehensive guide for the evaluation of psychological tests is to be found in Standards for Educational and Psychological Tests (1974), published by the American Psychological Association. These standards represent a summary of recommended practices in test construction, based on the current state of knowledge in the field. They are concerned with the information about validity, reliability, norms, and other test characteristics that ought to be reported in the manual. In their latest revision, the Standards also provide a guide for the proper use of tests and for the correct interpretation and application of test results. Relevant portions of the Standards will be cited in the following chapters, in connection with the appropriate topics.

    CHAPTER 2

    Nature and Use of Psychological Tests

    THE HISTORICAL introduction in Chapter 1 has already suggested some of the many uses of psychological tests, as well as the wide diversity of available tests. Although the general public may still associate psychological tests most closely with "IQ tests" and with tests designed to detect emotional disorders, these tests represent only a small proportion of the available types of instruments. The major categories of psychological tests will be discussed and illustrated in Parts 3, 4, and 5, which cover tests of general intellectual level, traditionally called intelligence tests; tests of separate abilities, including multiple aptitude batteries, tests of special aptitudes, and achievement tests; and personality tests, concerned with measures of emotional and motivational traits, interpersonal behavior, interests, attitudes, and other noncognitive characteristics.

    In the face of such diversity in nature and purpose, what are the common differentiating characteristics of psychological tests? How do psychological tests differ from other methods of gathering information about individuals? The answer is to be found in certain fundamental features of both the construction and use of tests. It is with these features that the present chapter is concerned.

    BEHAVIOR SAMPLE. A psychological test is essentially an objective and standardized measure of a sample of behavior. Psychological tests are like tests in any other science, insofar as observations are made on a small but carefully chosen sample of an individual's behavior. In this respect, the psychologist proceeds in much the same way as the chemist who tests a patient's blood or a community's water supply by analyzing one or more samples of it. If the psychologist wishes to test the extent of a child's vocabulary, a clerk's ability to perform arithmetic computations, or a pilot's eye-hand coordination, he examines their performance with a representative set of words, arithmetic problems, or motor tests. Whether or not the test adequately covers the behavior under consideration obviously depends on the number and nature of items in the sample. For example, an arithmetic test consisting of only five problems, or one including only multiplication items, would be a poor measure of the individual's computational skill. A vocabulary test composed entirely of baseball terms would hardly provide a dependable estimate of a child's total range of vocabulary.

    The diagnostic or predictive value of a psychological test depends on the degree to which it serves as an indicator of a relatively broad and significant area of behavior. Measurement of the behavior sample directly covered by the test is rarely, if ever, the goal of psychological testing. The child's knowledge of a particular list of 50 words is not, in itself, of great interest. Nor is the job applicant's performance on a specific set of 20 arithmetic problems of much importance. If, however, it can be demonstrated that there is a close correspondence between the child's knowledge of the word list and his total mastery of vocabulary, or between the applicant's score on the arithmetic problems and his computational performance on the job, then the tests are serving their purpose.

    It should be noted in this connection that the test items need not resemble closely the behavior the test is to predict. It is only necessary that an empirical correspondence be demonstrated between the two. The degree of similarity between the test sample and the predicted behavior may vary widely. At one extreme, the test may coincide completely with a part of the behavior to be predicted. An example might be a foreign vocabulary test in which the students are examined on 20 of the 50 new words they have studied; another example is provided by the road test taken prior to obtaining a driver's license. A lesser degree of similarity is illustrated by many vocational aptitude tests administered prior to job training, in which there is only a moderate [...] such as the Rorschach inkblot test, in which an attempt is made to predict from the subject's associations to inkblots how he will react to other people, to emotionally toned stimuli, and to other complex, everyday-life situations. Despite their superficial differences, all these tests consist of samples of the individual's behavior. And each must prove its worth by an empirically demonstrated correspondence between the subject's performance on the test and in other situations.

    Whether the term "diagnosis" or the term "prediction" is employed in this connection also represents a minor distinction. Prediction commonly connotes a temporal estimate, the individual's future performance on a job, for example, being forecast from his present test performance. In a broader sense, however, even the diagnosis of present condition, such as mental retardation or emotional disorder, implies a prediction of what the individual will do in situations other than the present test. It is logically simpler to consider all tests as behavior samples from which predictions regarding other behavior can be made. Different types of tests can then be characterized as variants of this basic pattern.

    Another point that should be considered at the outset pertains to the concept of capacity. It is entirely possible, for example, to devise a test for predicting how well an individual can learn French before he has even begun the study of French. Such a test would involve a sample of the types of behavior required to learn the new language, but would in itself presuppose no knowledge of French. It could then be said that this test measures the individual's "capacity" or "potentiality" for learning French. Such terms should, however, be used with caution in reference to psychological tests. Only in the sense that a present behavior sample can be used as an indicator of other, future behavior can we speak of a test measuring "capacity." No psychological test can do more than measure behavior. Whether such behavior can serve as an effective index of other behavior can be determined only by empirical try-out.

    STANDARDIZATION. It will be recalled that in the initial definition a psychological test was described as a standardized measure. Standardization implies uniformity of procedure in administering and scoring the test. If the scores obtained by different individuals are to be comparable, testing conditions must obviously be the same for all. Such a requirement is only a special application of the need for controlled conditions in all scientific observations. In a test situation, the single independent variable is usually the individual being tested.

    In order to secure uniformity of testing conditions, the test constructor provides detailed directions for administering each newly developed test. The formulation of such directions is a major part of the standardization of a new test. Such standardization extends to the exact materials employed, time limits, oral instructions to subjects, preliminary demonstrations, ways of handling queries from subjects, and every other detail of the testing situation. Many other, more subtle factors may influence the subject's performance on certain tests. Thus, in giving instructions or presenting problems orally, consideration must be given to the rate of speaking, tone of voice, inflection, pauses, and facial expression. In a test involving the detection of absurdities, for example, the correct answer may be given away by smiling or pausing when the crucial word is read. Standardized testing procedure, from the examiner's point of view, will be discussed further in a later section of


    Another important step in the standardization of a test is the establishment of norms. Psychological tests have no predetermined standards of passing or failing; an individual's score is evaluated by comparing it with the scores obtained by others. As its name implies, a norm is the normal or average performance. Thus, if normal 8-year-old children complete 12 out of 50 problems correctly on a particular arithmetic reasoning test, then the 8-year-old norm on this test corresponds to a score of 12. The latter is known as the raw score on the test. It may be expressed as number of correct items, time required to complete a task, number of errors, or some other objective measure appropriate to the content of the test. Such a raw score is meaningless until evaluated in terms of a suitable set of norms.
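    The evaluation of a raw score against a set of norms can be sketched in a few lines. The scores below are invented for illustration, and the percentile rank is only one common way of locating a score in a normative sample; its use here, like the function name, is an assumption and not something prescribed by the passage.

```python
from statistics import mean


def percentile_rank(raw_score, norm_scores):
    """Percent of the normative sample scoring below raw_score.

    Scores tied with raw_score count as half below, a common convention.
    """
    below = sum(1 for s in norm_scores if s < raw_score)
    equal = sum(1 for s in norm_scores if s == raw_score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical raw scores (items correct out of 50) obtained by a small
# sample of 8-year-olds on an arithmetic reasoning test.
norm_sample = [8, 9, 10, 11, 12, 12, 12, 13, 14, 15, 16]

norm = mean(norm_sample)                  # average performance of the group
print(norm)                               # 12
print(percentile_rank(12, norm_sample))   # 50.0
```

    A raw score of 12 acquires meaning only by this comparison: in the invented sample it coincides with the 8-year-old norm and falls at the middle of the group.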

    In the process of standardizing a test, it is administered to a large, representative sample of the type of subjects for whom it is designed. This group, known as the standardization sample, serves to establish the norms. Such norms indicate not only the average performance but also the relative frequency of varying degrees of deviation above and below the average. It is thus possible to evaluate different degrees of superiority and inferiority. The specific ways in which such norms may be expressed will be considered in Chapter 4. All permit the designation of the individual's position with reference to the normative or standardization sample.

    It might also be noted that norms are established for personality tests in essentially the same way as for aptitude tests. The norm on a personality test is not necessarily the most desirable or "ideal" performance, any more than a perfect or errorless score is the norm on an aptitude test. On both types of tests, the norm corresponds to the performance of typical or average individuals. On dominance-submission tests, for example, the norm falls at an intermediate point representing the degree of dominance or submission manifested by the average individual. Similarly, in an emotional adjustment inventory, the norm does not ordinarily correspond to a complete absen


    when retested on Friday, it is obvious that little or no confidence can be put in either score. Similarly, if in one set of 50 words an individual identifies 40 correctly, whereas in another, supposedly equivalent set he gets a score of only 20 right, then neither score can be taken as a dependable index of his verbal comprehension. To be sure, in both illustrations it is possible that only one of the two scores is in error, but this could be demonstrated only by further retests. From the given data, we can conclude only that both scores cannot be right. Whether one or neither is an adequate estimate of the individual's ability in vocabulary cannot be established without additional information.

    Before a psychological test is released for general use, a thorough, objective check of its reliability should be carried out. The different types of test reliability, as well as methods of measuring each, will be considered in Chapter 5. Reliability can be checked with reference to temporal fluctuations, the particular selection of items or behavior sample constituting the test, the role of different examiners or scorers, and other aspects of the testing situation. It is essential to specify the type of reliability and the method employed to determine it, because the same test may vary in these different aspects. The number and nature of individuals on whom reliability was checked should likewise be reported. With such information, the test user can predict whether the test will be about equally reliable for the group with which he expects to use it, or whether it is likely to be more reliable or less reliable.

    VALIDITY. Undoubtedly the most important question to be asked about any psychological test concerns its validity, i.e., the degree to which the test actually measures what it purports to measure. Validity provides a direct check on how well the test fulfills its function. The determination of validity usually requires independent, external criteria of whatever the test is designed to measure. For example, if a medical aptitude test is to be used in selecting promising applicants for medical school, ultimate success in medical school would be a criterion. In the process of validating such a test, it would be administered to a large group of students at the time of their admission to medical school. Some measure of performance in medical school would eventually be obtained for each student on the basis of grades, ratings by instructors, success or failure in completing training, and the like. Such a composite measure constitutes the criterion with which each student's initial test score is to be correlated. A high correlation, or validity coefficient, would signify that those individuals who scored high on the test had been relatively successful in medical school, whereas those scoring low on the test had done poorly in medical school. A low correlation would indicate little correspondence between test score and criterion measure and hence poor validity for the test. The validity coefficient enables us to determine how closely the criterion performance could have been predicted from the test scores.
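    The correlation between test score and criterion measure can be made concrete with a minimal sketch. It assumes that the validity coefficient is computed as a Pearson product-moment correlation, which is the usual choice but is not specified in the passage; the function name and the six pairs of scores are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical validation group: aptitude test scores at admission and a
# composite criterion (for instance, average grade) obtained later.
test_scores = [52, 61, 47, 70, 58, 65]
criterion = [2.6, 3.1, 2.4, 3.7, 2.9, 3.3]

validity_coefficient = pearson_r(test_scores, criterion)
print(round(validity_coefficient, 2))
```

    With these invented figures the coefficient is close to 1, since students who scored high on the test also stand high on the criterion; real validity coefficients are typically far lower.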

    In a similar manner, tests designed for other purposes can be validated against appropriate criteria. A vocational aptitude test, for example, can be validated against on-the-job success of a trial group of new employees. A pilot aptitude battery can be validated against achievement in flight training. Tests designed for broader and more varied uses are validated against a number of criteria and their validity can be established only by the gradual accumulation of data from many different kinds of investigations.

    The reader may have noticed an apparent paradox in the concept of 

    test validity. If it is necessary to follow up the subjects or in other ways

    to obtain independent measures of what the test is trying to predict, why

    not dispense with the test? The answer to this riddle is to be found in the

    distinction between the validation group on the one hand and the groups

    on which the test will eventually be employed for operational purposes

    on the other. Before the test is ready for use, its validity must be established

    on a representative sample of subjects. The scores of these persons

    are not themselves employed for operational purposes but serve only in

    the process of testing the test. If the test proves valid by this method, it

    can then be used on other samples in the absence of criterion measures.

    It might still be argued that we would need only to wait for the criterion

    measure to mature, to become available, on any group in order to

    obtain the information that the test is trying to predict.   But such a pro-

    cedure would be so wasteful of time and energy as to be prohibitive in

    most instances. Thus, we could determine which applicants will succeed

    on a job or which students will satisfactorily complete college by admitting

    all who apply and waiting for subsequent developments! It is the

    very wastefulness of this procedure-and its deleterious emotional impact

    on individuals-that tests are designed to minimize. By means of

    tests, the person's present level of prerequisite skills, knowledge, and

    other relevant characteristics can be assessed with a determinable margin

    of error. The more valid and reliable the test, the smaller will be this

    margin of error.
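
    How validity and reliability bound the margin of error can be illustrated

    with two standard psychometric formulas. They are not stated in this

    passage and are brought in here as an assumption: the standard error of

    estimate, SD_y * sqrt(1 - r_xy^2), and the standard error of measurement,

    SD_x * sqrt(1 - r_xx). The function names and numbers are hypothetical.

```python
import math


def standard_error_of_estimate(criterion_sd, validity):
    """Margin of error in predicting the criterion from the test.

    Assumed formula: SD_y * sqrt(1 - r_xy ** 2), where r_xy is the
    validity coefficient.
    """
    if not -1.0 <= validity <= 1.0:
        raise ValueError("validity coefficient must lie between -1 and 1")
    return criterion_sd * math.sqrt(1.0 - validity ** 2)


def standard_error_of_measurement(test_sd, reliability):
    """Margin of error in an individual's obtained test score.

    Assumed formula: SD_x * sqrt(1 - r_xx), where r_xx is the
    reliability coefficient.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return test_sd * math.sqrt(1.0 - reliability)


# With a criterion SD of 1.0, the error of prediction shrinks as the
# (hypothetical) validity coefficient rises.
for r in (0.0, 0.3, 0.6, 0.9):
    print(r, round(standard_error_of_estimate(1.0, r), 3))
```

    Under these formulas a test with zero validity predicts no better than

    the criterion's own standard deviation, and the error falls toward zero

    only as the coefficient approaches 1.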

    The special problems encountered in determining the validity of different

    types of tests, as well as the specific criteria and statistical procedures

    employed, will be discussed in Chapters 6 and 7. One further

    point, however, should be considered at this time. Validity tells us more

    than the degree to which the test is fulfilling its function. It actually tells

    us what the test is measuring. By studying the validation data, we can

    objectively determine what the test is measuring. It would thus be more

    accurate to define validity as the extent to which we know what the test

    measures. The interpretation of test scores would undoubtedly be clearer 

    and less ambiguous if tests were regularly named in terms of the criterion


    through which they had been validated. A tendency in this direction

    can be recognized in such test labels as "scholastic aptitude test" and

    "personnel classification test" in place of the vague title "intelligence

    test."

    REASONS FOR CONTROLLING THE USE OF

    PSYCHOLOGICAL TESTS

    "May I have a Stanford-Binet blank? My nephew has to take it next week for admission to School X and I'd like to give him some practice so he can pass."

    "To improve the reading program in our school, we need a culture-free IQ

    test that measures each child's innate potential."

    "Last night I answered the questions in an intelligence test published in a magazine and I got an IQ of 80-I think psychological tests are silly."

    "My roommate is studying psych. She gave me a personality test and I came out neurotic. I've been too upset to go to class ever since."

    "Last year you gave a new personality test to our employees for research purposes. We would now like to have the scores for their personnel folders."

    The above remarks are not imaginary. Each is based on a real incident,

    and the list could easily be extended by any psychologist. Such remarks

    illustrate potential misuses or misinterpretations of psychological tests in

    such ways as to render the tests worthless or to hurt the individual. Like

    any scientific instrument or precision tool, psychological tests must be properly

    used to be effective. In the hands of either the unscrupulous or

    the well-meaning but uninformed user, such tests can cause serious

    damage.

    There are two principal reasons for controlling the use of psychological

    tests: (a) to prevent general familiarity with test content, which would

    invalidate the test and (b) to ensure that the test is used by a qualified

    examiner. Obviously, if an individual were to memorize the correct responses

    on a test of color blindness, such a test would no longer be a

    measure of color vision for him. Under these conditions, the test would

    be completely invalidated. Test content clearly has to be restricted in

    order to forestall deliberate efforts to fake scores.

    In other cases, however, the effect of familiarity may be less obvious,

    or the test may be invalidated in good faith by misinformed persons. A

    schoolteacher, for example, may give her class special practice in problems

    closely resembling those on an intelligence test, "so that the pupils

    will be well prepared to take the test." Such an attitude is simply a carryover

    from the usual procedure of preparing for a school examination.

    When applied to an intelligence test, however, it is likely that such

    specific training or coaching will raise the scores on the test without appreciably

    affecting the broader area of behavior the test tries to sample.

    Under such conditions, the validity of the test as a predictive instrument

    is reduced.

    The need for a qualified examiner is evident in each of the three major aspects of the testing situation-selection of the test, administration and 

    scoring, and interpretation of scores. Tests cannot be chosen like lawn

    mowers, from a mail-order catalogue.   They cannot be evaluated by name,

    author, or other easy marks of identification. To be sure, it requires no

     psychological training to consider such factors as cost, bulkiness and ease

    of transporting test materials, testing time required, and ease and rapidity

    of scoring. Information on these practical points can usually be obtained

    from a test catalogue and should be taken into account in planning a testing

    program. For the test to serve its function, however, an evaluation of

    its technical merits in terms of such characteristics as validity, reliability,

    difficulty level, and norms is essential. Only in such a way can the test

    user determine the appropriateness of any test for his particular purpose

    and its suitability for the type of persons with whom he plans to use it.

    The introductory discussion of test standardization earlier in this chapter

    has already suggested the importance of a trained examiner. An adequate

    realization of the need to follow instructions precisely, as well as a

    thorough familiarity with the standard instructions, is required if the test

    scores obtained by different examiners are to be comparable or if any one

    individual's score is to be evaluated in terms of the published norms.

    Careful control of testing conditions is also essential. Similarly, incorrect

    or inaccurate scoring may render the test score worthless. In the absence

    of proper checking procedures, scoring errors are far more likely to occur

    than is generally realized.

    The proper interpretation of test scores requires a thorough under-

    standing of the test, the individual, and the testing

    [...]

    manifest a number of unfavorable attitudes, such as suspicion, insecurity,

    fear, or cynical indifference. Abnormal conditions in their past experiences

    are also likely to influence their test performance adversely. As a result

    of early failures and frustrations in school, for example, they may have

    developed feelings of hostility and inferiority toward academic tasks,

    which the tests resemble. The experienced examiner makes special efforts

    to establish rapport under these conditions. In any event, he must be

    sensitive to these special difficulties and take them into account in interpreting

    and explaining test performance.

    In testing any school-age child or adult, one should bear in mind that

    every test presents an implied threat to the individual's prestige. Some

    reassurance should therefore be given at the outset. It is helpful to explain,

    for example, that no one is expected to finish or to get all the items

    correct. The individual might otherwise experience a mounting sense of

    failure as he advances to the more difficult items or finds that he is unable

    to finish any subtest within the time allowed.

    It is also desirable to eliminate the element of surprise from the test

    situation as far as possible, because the unexpected and unknown are

    likely to produce anxiety. Many group tests provide a preliminary explanatory

    statement that is read to the group by the examiner. An even

     better procedure is to announce the tests a few days in advance and to

    give each subject a printed booklet that explains the purpose and nature

    of the tests,   offers general suggestions on how to take tests,   and contains

    a few sample items. Such explanatory booklets are regularly available to

    participants in large-scale testing programs such as those conducted by

    the College Entrance Examination Board (1974a, 1974b). The United

    States Employment Service has likewise developed a booklet on how to

    take tests, as well as a more extensive pretesting orientation technique

    for use with culturally disadvantaged applicants unfamiliar with tests.

    More general orientation booklets are also available, such as Meeting

    the Test (Anderson, Katz, & Shimberg, 1965). A tape recording and two

    booklets are combined in Test Orientation Procedure (TOP), designed

    specifically for job applicants with little prior testing experience (Bennett

    & Doppelt, 1967). The first booklet, used together with the tape,

     provides general information on how to take tests; the second contains

     practice tests. In the absence of a tape recorder, the examiner may read 

    the instructions from a printed script.

    Adult testing presents some additional problems. Unlike the schoolchild,

    the adult is not so likely to work hard at a task merely because it is

    assigned to him. It therefore becomes more important to "sell" the pur-

     pose of the tests to the adult, although high school and college students

    also respond to such an appeal. Cooperation of the examinee can usually

    be secured by convincing him that it is in his own interests to obtain a

    valid score, i.e., a score correctly indicating what he can do rather than

    overestimating or underestimating his abilities. Most persons will understand

    that an incorrect decision, which might result from invalid test

    scores, would mean subsequent failure, loss of time, and frustration for 

    them. This approach can serve not only to motivate the individual to

    try his best on ability tests but also to reduce faking and encourage frank reporting on personality inventories, because the examinee realizes that

    he himself would otherwise be the loser. It is certainly not in the best

    interests of the individual to be admitted to a course of study for which

    he is not qualified or assigned to a job he cannot perform or that he

    would find uncongenial.

    Many of the practices designed to enhance rapport serve also to reduce

    test anxiety. Procedures tending to dispel surprise and strangeness from

    the testing situation and to reassure and encourage the subject should

    certainly help to lower anxiety. The examiner's own manner and a well-organized,

    smoothly running testing operation will contribute toward the

    same goal. Individual differences in test anxiety have been studied with

    both schoolchildren and college students (Gaudry & Spielberger, 1974;

    Spielberger, 1972). Much of this research was initiated by Sarason and

    his associates at Yale (Sarason, Davidson, Lighthall, Waite, & Ruebush,

    1960). The first step was to construct a questionnaire to assess the individual's

    test-taking attitudes. The children's form, for example, contains

    items such as the following:

    Do you worry a lot before taking a test?

    When the teacher says she is going to find out how much you have learned, does your heart begin to beat faster?

    While you are taking a test, do you usually think you are not doing well?

    Of primary interest is the finding that both school achievement and intelligence

    test scores yielded significant negative correlations with test anxiety.

    Similar correlations have been found among college students (I. G.

    Sarason, 1961). Longitudinal studies likewise revealed an inverse relation

    between changes in anxiety level and changes in intelligence or achievement

    test performance (Hill & Sarason, 1966; Sarason, Hill, & Zimbardo, 1964).

    Such findings, of course, do not indicate the direction of causal relationships.

    It is possible that children develop test anxiety because they perform

    poorly on tests and have thus experienced failure and frustration in

    previous test situations. In support of this interpretation is the finding

    that within subgroups of high scorers on intelligence tests, the negative

    correlation between anxiety level and test performance disappears

    (Denny, 1966; Feldhusen & Klausmeier, 1962). On the other hand, there

    is evidence suggesting that at least some of the relationship results from

    the deleterious effects of anxiety on test performance. In one study

    (Waite, Sarason, Lighthall, & Davidson, 1958), high-anxious and low-anxious

    children equated in intelligence test scores were given repeated

    trials in a learning task. Although initially equal in the learning test, the

    low-anxious group improved significantly more than the high-anxious.

    Several investigators have compared test performance under conditions

    designed to evoke "anxious" and "relaxed" states. Mandler and Sarason

    (1952), for example, found that ego-involving instructions, such as telling

    subjects that everyone is expected to finish in the time allotted, had a

    beneficial effect on the performance of low-anxious subjects, but a deleterious

    effect on that of high-anxious subjects. Other studies have likewise

    found an interaction between testing conditions and such individual characteristics

    as anxiety level and achievement motivation (Lawrence, 1962;

    Paul & Eriksen, 1964). It thus appears likely that the relation between

    anxiety and test performance is nonlinear, a slight amount of anxiety

    being beneficial while a large amount is detrimental. Individuals who are

    customarily low-anxious benefit from test conditions that arouse some

    anxiety, while those who are customarily high-anxious [...] tests may be

    unduly affected by test anxiety. In a thorough and well-controlled investigation

    of this question, French (1962) compared the performance of high

    school students on a test given as part of the regular administration of

    the SAT with performance on a parallel form of the test administered at

    a different time under "relaxed" conditions. The instructions on the latter

    occasion specified that the test was given for research purposes only and

    scores would not be sent to any college. The results showed that performance

    was no poorer during the standard administration than during

    the relaxed administration. Moreover, the concurrent validity of the test

    scores against high school course grades did not differ significantly under

    the two conditions.

    Comprehensive surveys of the effects of examiner and situational

    variables on test scores have been prepared by S. B. Sarason (1954),

    Masling (1960), Moriarty (1961, 1966), Sattler and Theye (1967),

    Palmer (1970), and Sattler (1970, 1974). Although some effects have

     been demonstrated with objective group tests, most of the data have been

    obtained with either projective techniques or individual intelligence   tests.

    These extraneous factors are more likely to operate with unstructured and 

    ambiguous stimuli, as well as with difficult and novel tasks, than with

    clearly defined and well-learned functions. In general,   children are more

    susceptible to examiner and situational influences than are adults; in the

    examination of preschool children, the role of the examiner is especially

    crucial. Emotionally disturbed and insecure persons of any age are also

    more likely to be affected by such conditions than are well-adjusted persons.

    There is considerable evidence that test results may vary systematically

    as a function of the examiner (E. Cohen, 1965; Masling, 1960). These differences

    may be related to personal characteristics of the examiner, such

    as his age, sex, race, professional or socioeconomic status, training and

    experience, personality characteristics, and appearance. Several studies of

    these examiner variables, however, have yielded misleading or inconclusive

    results because the experimental designs failed to control or isolate

    the influence of different examiner or subject characteristics. Hence

    the effects of two or more variables may be confounded.

    The examiner's behavior before and during test administration has also

    been shown to affect test results. For example, controlled investigations

    have yielded significant differences in intelligence test performance as a

    result of a "warm" versus a "cold" interpersonal relation between examiner

    and examinees, or a rigid and aloof versus a natural manner on

    the part of the examiner (Exner, 1966; Masling, 1959). Moreover, there

    may be significant interactions between examiner and examinee characteristics,

    in the sense that the same examiner characteristic or testing manner

    may have a different effect on different examinees as a function of

    the examinee's own personality characteristics. Similar interactions may

    occur with task variables, such as the nature of the test, the purpose of

    the testing, and the instructions given to the subjects. Dyer (1973) adds

    even more variables to this list, calling attention to the possible influence

    of the test givers' and the test takers' diverse perceptions of the functions

    and goals of testing.

    Still another way in which an examiner may inadvertently affect the

    examinee's responses is through his own expectations. This is simply a

    special instance of the self-fulfilling prophecy (Rosenthal, 1966; Rosenthal

    & Rosnow, 1969). An experiment conducted with the Rorschach will

    illustrate this effect (Masling, 1965). The examiners were 14 graduate

    student volunteers, 7 of whom were told, among other things, that experienced

    examiners elicit more human than animal responses from the

    subjects, while the other 7 were told that experienced examiners elicit

    more animal than human responses. Under these conditions, the two

    groups of examiners obtained significantly different ratios of animal to

    human responses from their subjects. These differences occurred despite

    the fact that neither examiners nor subjects reported awareness of any

    influence attempt. Moreover, tape recordings of all testing sessions revealed

    no evidence of verbal influence on the part of any examiner. The

    examiners' expectations apparently operated through subtle postural and

    facial cues to which the subjects responded.

    Apart from the examiner, other aspects of the testing situation may

    significantly affect test performance