Measurement Concepts & Interpretation


Scores on tests can be interpreted:
• By comparing a client to peers in the norm group to determine how different the client is from the norm group (inter-individual)
– Scores are provided in norm tables
– The score in the norm table usually indicates how the client compares with peers in the same age group or grade
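Norm tables are read straight from the test manual. As a rough stand-in, the sketch below converts a raw score into a z-score and percentile rank against a norm group, assuming the trait is approximately normally distributed in that group. The helper name and the norm mean and SD are hypothetical, not taken from any real manual.

```python
from statistics import NormalDist


def compare_to_norms(raw_score: float, norm_mean: float, norm_sd: float) -> tuple[float, float]:
    """Locate a client's raw score within a norm group (inter-individual comparison).

    Assumes scores in the norm group are roughly normal, so the percentile
    rank can be read off the standard normal curve instead of a norm table.
    """
    z = (raw_score - norm_mean) / norm_sd   # distance from the group mean in SD units
    percentile = NormalDist().cdf(z) * 100  # expected % of the norm group scoring below
    return z, percentile


# Hypothetical norm values for one age band (illustrative only).
z, pct = compare_to_norms(raw_score=65, norm_mean=50, norm_sd=10)
print(f"z = {z:.2f}, percentile rank = {pct:.1f}")
```

Here a score 1.5 SDs above the norm mean lands at roughly the 93rd percentile of the (assumed normal) norm group.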

Interpretation, cont.

• By comparing a client with his or her own performance (intra-individual)

Define:

• Mean
• Median
• Mode
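To make the three definitions concrete, here is a small sketch using Python's standard `statistics` module on a hypothetical set of raw scores (the data are invented for illustration).

```python
from statistics import mean, median, multimode

# Hypothetical raw scores for seven examinees (illustrative only).
scores = [12, 15, 15, 18, 20, 22, 45]

score_mean = mean(scores)        # arithmetic average: sum of scores / number of scores
score_median = median(scores)    # middle value once the scores are sorted
score_modes = multimode(scores)  # most frequent value(s); a list in case of ties

print(score_mean, score_median, score_modes)  # 21 18 [15]
```

The single high score of 45 pulls the mean (21) above the median (18), which is why the median is often preferred for skewed data.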

So you wanna use psychological tests…

• Um, CAREFULLY review the test manual

• Consider these aspects:
– Theoretical Orientation of test/instrument
– Practical Considerations
– Standardization
– Reliability
– Validity
(Gary Groth-Marnat, 2003)

Theoretical Orientation

• Do you adequately understand the theoretical construct the test is supposed to be measuring?
– If not, do some research.
• Do the test items correspond to the theoretical description of the construct?
– Manuals usually provide analyses of the individual items… are the items relevant?

Practical Considerations

• If the examinee is required to read, does his or her reading ability match the level required by the test?
– Tests vary in the level of education they require
• How appropriate is the length of the test?
– Some are too damn long, and who likes that?
• You can always get additional training for some tests so you become Über good at them.

Standardization (adequacy of norms)

• Is the population to be tested similar to the population the test was standardized on?
• Was the size of the standardization sample adequate?
• Have specialized subgroup norms been established?
• How adequately do the instructions permit standardized administration?

Norms!

Reliability

• The reliability of a test refers to its degree of stability, consistency, predictability, and accuracy
• Are reliability estimates sufficiently high? (correlations generally around .90 for clinical decision making and around .70 for research purposes)
• What implications do the relative stability of the trait, the method of estimating reliability, and the test format have on reliability?

You tell me…

• Test-Retest Reliability
– The reliability coefficient is calculated by correlating the scores obtained by the same people on two different administrations.
• Alternate Forms
– The trait is measured several times on the same individual using parallel/alternate forms of the test; the different measurements should produce similar results
• Split-Half Reliability
– The test is given only once (the items are split in half and the two halves are correlated)
• Interscorer Reliability
– Used when scoring is based partly on the examiner's judgment (e.g., Rorschach): two people independently score the same client's responses, and their scores are compared for agreement

All tests have a degree of error

• The inevitable, natural variation in human performance
– Measures of ability usually have less variability than measures of personality… why?
• Psychological testing methods are necessarily imprecise
– Constructs in psychology are measured indirectly

Standard Error of Measurement

• Test scores consist of both truth and error
• The SEM provides a range of scores to indicate how extensive that error is likely to be
– The higher the reliability, the narrower the range of error
• The SEM is a standard deviation score.
– An SEM of 3 on an IQ test would indicate that the individual's score has a 68% chance of being within +/-3 IQ points of the estimated true score; refer back to the normal distribution curve
– The SEM is a statistical index of how a person's repeated scores on a specific test would be normally distributed around his or her true score; the range it defines is referred to as a confidence interval
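The slide does not give the formula, but the SEM is conventionally computed from the test's standard deviation and reliability coefficient as SEM = SD × sqrt(1 - r). The sketch below assumes an IQ-style scale with SD = 15; the reliability values, observed score, and function names are hypothetical. For simplicity the band is centered on the observed score rather than on an estimated true score.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def score_band(observed: float, sem: float, z: float = 1.0) -> tuple[float, float]:
    """Return observed +/- z * SEM (z = 1 covers about 68%, z = 1.96 about 95%)."""
    return observed - z * sem, observed + z * sem


for r in (0.96, 0.90):
    sem = standard_error_of_measurement(sd=15, reliability=r)
    low, high = score_band(observed=110, sem=sem)
    print(f"r = {r:.2f}: SEM = {sem:.2f}, 68% band = {low:.1f} to {high:.1f}")
```

With r = .96 the SEM works out to 3 IQ points, matching the slide's example; dropping reliability to .90 raises the SEM to about 4.74 and widens the band accordingly.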

Validity

• Whereas reliability addresses issues of consistency, validity assesses whether the test is accurate about what it is intended to measure.
• What criteria and procedures were used to validate the test?
• Will the test produce accurate measurements in the context and for the purpose for which you would like to use it?
– A psychological test is not valid in any abstract or absolute sense. It must be valid in a particular CONTEXT and for a specific group of people.

Face validity

• Face validity is present if the test looks good to the persons taking it, to the policymakers who decide to include it in their programs, and to other untrained personnel.

Criterion validity

• Concurrent validity
– Measurements taken at the same, or approximately the same, time as the test
– Concurrent validation is preferable if an assessment of the client's current status is required
• Predictive validity
– Outside measurements taken some time after the test scores were derived. For example, predictive validity may be evaluated by correlating test scores with scores from similar measures a year after the initial testing

Construct Validity

• The extent to which the test measures a theoretical construct or trait
– First, the trait must be carefully analyzed
– Consider the ways in which the trait should relate to other variables
– Test the hypothesized relationships
• Does the test converge with variables that are theoretically similar to it?
• Does it discriminate from variables that are dissimilar to it?

Incremental validity

• For a test to be considered useful and efficient, it must be able to produce accurate results above and beyond the results that could be obtained with greater ease and less expense
• Hey, self-assessments are pretty handy!

Beck Depression Inventory II (BDI-II)

• Add up the score for each of the twenty-one questions to obtain the total. The highest score on each question is three, so the highest possible total for the whole test is sixty-three. The lowest possible total is zero. Only count one score per question (the highest rated if more than one is circled).
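The scoring rule above is simple enough to sketch directly. The function below is a hypothetical helper, not part of any published scoring software; it assumes each question's response is recorded as the list of ratings (0 to 3) the client circled, and it rejects unanswered questions because the slide does not say how to handle them.

```python
def score_bdi_ii(responses: list[list[int]]) -> int:
    """Total a BDI-II form using the rules on the slide.

    responses: 21 entries, one per question; each entry lists the rating(s)
    from 0 to 3 that the client circled for that question.
    """
    if len(responses) != 21:
        raise ValueError("expected responses to all 21 questions")
    total = 0
    for question, circled in enumerate(responses, start=1):
        if not circled:
            raise ValueError(f"question {question} has no rating circled")
        if any(rating not in (0, 1, 2, 3) for rating in circled):
            raise ValueError(f"question {question} has a rating outside 0-3")
        total += max(circled)  # only one score per question: the highest circled
    return total


# Example: 20 questions rated 1, plus one question where both 1 and 2 were circled.
form = [[1]] * 20 + [[1, 2]]
print(score_bdi_ii(form))  # 22
```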

“So what does my BDI-II score mean?”

• Below 4 = possible denial of depression, faking good
• 5-9 = these ups and downs are considered normal (i.e., suck it up)
• 10-18 = mild to moderate depression
• 19-29 = moderate to severe depression
• 30-63 = severe depression
• Over 44 = pretty damn high even for severely depressed persons; possible exaggeration of symptoms

Same for the BAI (Beck Anxiety Inventory)

• 0-21 = low anxiety
• 22-35 = moderate anxiety
• 36 and over = high anxiety, may be severe
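The BAI, like the BDI-II, has 21 items rated 0 to 3, so totals run from 0 to 63. A sketch of the slide's anxiety bands as a lookup; it treats a total of exactly 36 as high, since the listed ranges would otherwise leave 36 unassigned. The function name is hypothetical.

```python
def interpret_bai(total: int) -> str:
    """Map a BAI total (0-63) to the anxiety bands on the slide."""
    if not 0 <= total <= 63:
        raise ValueError("BAI totals range from 0 to 63")
    if total <= 21:
        return "low anxiety"
    if total <= 35:
        return "moderate anxiety"
    return "high anxiety, may be severe"  # 36 and above


for t in (10, 30, 40):
    print(t, interpret_bai(t))
```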