Transcript of Item Response Theory - University of Chicago Webinar (REL Midwest and ARC)

REL Midwest and ARC

Item Response Theory

Kimberly Maier, Ph.D., Assistant Professor, Measurement and Quantitative Methods, Michigan State University

Andrew Swanlund, Senior Research Associate and Psychometrician, Learning Point Associates

November 2009


Topics Covered

Rationale for IRT/Rasch methods

The 1PL/Rasch Dichotomous Model

The Rasch Rating Scale Model

Psychometric Analysis/Diagnostics

Advanced IRT Models


What’s a Latent Trait?

An unobservable trait (theorized to exist) that cannot be directly measured, although it can usually be easily described.

Examples include the following:
– Mathematics achievement
– Intelligence
– Attitudes
– Opinions

Tests or questionnaires are used to assess the latent trait.


Test Theory

Test theory or psychometrics is used to:
– Develop questionnaires.
  - Reliability
  - Validity
  - Determine dimensionality
– Provide measures (on a scale) for examinees.

Two “flavors” of test theory:
– Classical Test Theory
– Modern Test Theory (IRT)


Classical Test Theory

Observed score = True score + error

Focus on raw test scores

Item difficulty
– How hard is it to get the item “right”?
– Or, how hard is it to agree with a statement?
– Measured as the proportion of respondents who get the item “correct.”


Classical Test Theory

Item discrimination
– How effectively an item differentiates between examinees who are high and those who are low on the latent trait.
– Two types of measures of discrimination:
  - Index of discrimination – cannot perform statistical significance tests to determine whether it is zero.
  - Various correlation coefficients (e.g., point-biserial, biserial, tetrachoric, and phi) – measure the relationship between responses on an item and performance on the entire test.


What Classical Test Theory Can’t Do

A person’s ability level and the survey item difficulties cannot be estimated separately.
– Implications:
  - A person’s measure of the latent trait is dependent on the survey items administered.
  - Items’ means depend on the sample of people who took the survey.
– Therefore, ALL estimates of the model are sample dependent and cannot be compared across samples varying in the distribution of the underlying latent trait.


What Classical Test Theory Can’t Do

Doesn’t provide information about how examinees at different ability levels on the trait have performed on individual items.

Difficult to compare performance of examinees who have taken different tests that measure the same trait.

Difficult to apply results to another group to be tested.


Item Response Theory

Items on a test/instrument measure a single latent trait or several latent traits (multidimensional)

Allows one to compare the performance of one group taking Test A with another group taking Test B

The results of an item analysis can be applied to groups of respondents other than the original group used for the analysis


General Ideas of IRT

The item response model gives us an idea of the probability that a person with latent trait level θ will correctly answer an item of difficulty δ.

The relationship between ability (attitude) and item response is characterized by an item characteristic curve.

Each person is assumed to have a level of ability that situates him or her on the item characteristic curve.


Item Characteristic Curve (ICC)


Important Points About the ICC

Item difficulty
– The level of ability at which 50 percent of the respondents are able to correctly answer the item

Item discrimination
– The slope of the item characteristic curve
– Determines the difference in probabilities that respondents of different ability levels answer the item correctly

Difficulty and discrimination are independent of one another.


Measurement

The value of a latent trait measure θ usually varies from −3 to +3, although the limits are −∞ to +∞ (this can be rescaled to any metric if desired).

The higher one’s ability level, the higher his or her probability of correctly answering the item.


Psychometric Models

Three Main Properties/Assumptions of IRT (including Rasch)*
– Unidimensionality
– Local Independence
– Monotonicity of the Item Response Functions

*These also hold for Factor Analysis and Classical Test Theory, but we discuss them here in the IRT framework.


Psychometric Models

Unidimensionality
– The latent trait (or construct) is represented by a single number (often denoted θ)
– Examples – mathematics ability, self-efficacy
– Some constructs are multidimensional – personality – “the big five” (openness, conscientiousness, extroversion, agreeableness, neuroticism).

Models we talk about today are for unidimensional constructs only.

Can model multidimensional constructs with MIRT (Reckase, 2009)


Psychometric Models

Local Independence
– Responses to items are independent from one another after taking into account examinee/respondent ability
– Items shouldn’t cue one another – that is, knowing the answer to one item shouldn’t give away the answer to another


Psychometric Models

Monotonicity of the Item Characteristic Curve
– Higher ability examinees should have a higher probability of a successful/favorable response than lower ability examinees


The 1PL/Rasch Dichotomous Model

What kind of data can we use with the dichotomous model?
– Multiple-choice or correct/incorrect test data
– Checklist data
– Yes/no survey responses


The 1PL/Rasch Dichotomous Model

The 1-Parameter Logistic (or Rasch) Model:

$$P(X_{ij}=1 \mid \theta_i, \delta_j) = \frac{\exp(\theta_i - \delta_j)}{1 + \exp(\theta_i - \delta_j)}$$

Note: The probability of a correct response depends only on the ability of the person (θ_i) and the difficulty of the item (δ_j).
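As a quick illustration, the 1PL response function can be evaluated directly. This is a minimal sketch; the function name `p_correct_1pl` is ours, not from the webinar.

```python
import math

def p_correct_1pl(theta, delta):
    """Rasch/1PL probability of a correct response for a person with
    ability theta on an item with difficulty delta (both in logits)."""
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))

# When ability equals difficulty, the probability is exactly 0.5.
print(round(p_correct_1pl(1.0, 1.0), 2))   # 0.5
# One logit above the item's difficulty gives about 0.73.
print(round(p_correct_1pl(2.0, 1.0), 2))   # 0.73
```

Note that the probability depends only on the difference θ − δ, which is what makes the logit scale interpretable.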


1PL/Rasch Theoretical ICCs

[Figure: theoretical 1PL/Rasch ICCs for items with difficulty −1, 0, and +1, plotted over ability −3 to +3; the y-axis is the probability of a correct response (0 to 1).]


The 1PL/Rasch Dichotomous Model

Measures of ability and item parameters are reported in logits:

– The mathematical unit of ability is defined as the log odds for succeeding on items of the kind chosen to define the “zero” point on the scale.

– The mathematical unit of an item’s difficulty is defined as the log odds for eliciting failure from persons with “zero” ability.

$$\mathrm{logit}(p_i) = \ln\frac{p_i}{1-p_i}$$
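The logit (log odds) transformation is easy to compute. In the sketch below (the function name is ours), answering about 73 percent of items of a given difficulty correctly places a person roughly one logit above those items.

```python
import math

def logit(p):
    """Log odds of a proportion p (0 < p < 1), expressed in logits."""
    return math.log(p / (1.0 - p))

print(round(logit(0.5), 2))    # 0.0  (50% success = zero logits)
print(round(logit(0.73), 2))   # 0.99 (about one logit above)
```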


The 1PL/Rasch Dichotomous Model

Probabilities of correctly answering an item with difficulty of 1.0:

Ability   Probability
 −3.00    0.02
 −2.00    0.05
 −1.00    0.12
  0.00    0.27
  1.00    0.50
  2.00    0.73
  3.00    0.88
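The table can be reproduced from the 1PL response function; this is a sketch and `p_correct` is our name for the helper.

```python
import math

def p_correct(theta, delta):
    """1PL/Rasch probability of a correct response."""
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))

# Reproduce the slide's table for an item with difficulty 1.0.
for theta in [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]:
    print(f"{theta:5.2f}  {p_correct(theta, 1.0):.2f}")
```

Running this prints the same ability/probability pairs shown above (0.02 at −3.00 up to 0.88 at 3.00).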


2-PL Model

The 2-parameter logistic model:

$$P(X_{ij}=1 \mid \theta_i, a_j, b_j) = \frac{\exp\left(a_j(\theta_i - b_j)\right)}{1 + \exp\left(a_j(\theta_i - b_j)\right)}$$

Note: The probability of a correct response depends on the ability of the person (θ_i), the difficulty of the item (b_j), and the discrimination of the item (a_j).
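A sketch of the 2PL response function (names are ours) shows how the discrimination parameter steepens or flattens the curve around the item's difficulty.

```python
import math

def p_correct_2pl(theta, a, b):
    """2PL: discrimination a scales the distance between ability theta
    and difficulty b, steepening or flattening the ICC."""
    z = a * (theta - b)
    return math.exp(z) / (1.0 + math.exp(z))

# A highly discriminating item (a = 2) separates nearby abilities
# more sharply than a weakly discriminating one (a = 0.5).
print(round(p_correct_2pl(1.0, 2.0, 0.0), 2))   # 0.88
print(round(p_correct_2pl(1.0, 0.5, 0.0), 2))   # 0.62
```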


2-Parameter Logistic ICCs

[Figure: 2-parameter logistic ICCs plotted over ability −3 to +3; the y-axis is the probability of a correct response (0 to 1).]


3-PL Model

The 3-parameter logistic model:

$$P(X_{ij}=1 \mid \theta_i, a_j, b_j, c_j) = c_j + (1 - c_j)\,\frac{\exp\left(a_j(\theta_i - b_j)\right)}{1 + \exp\left(a_j(\theta_i - b_j)\right)}$$

Note: The probability of a correct response depends on the ability of the person (θ_i), the difficulty of the item (b_j), the discrimination (a_j), and the guessing parameter (c_j).
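A sketch of the 3PL response function (names are ours): the guessing parameter sets a lower asymptote, so even very low-ability examinees answer correctly with probability at least c.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL: lower asymptote c models guessing on top of a 2PL curve."""
    z = a * (theta - b)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))

# With c = 0.25 (e.g., a four-option multiple-choice item), the curve
# bottoms out near 0.25 instead of 0.
print(round(p_correct_3pl(-3.0, 1.0, 0.0, 0.25), 2))  # 0.29
```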


Difference Between Rasch and 2/3PL?

2/3PL models will likely fit the data better (more highly parameterized models do that), but they require a lot of data to obtain stable parameter estimates.

Modeling items with high/low discrimination and high guessing parameters can lead to the inclusion of lower quality items (according to some); conversely, Rasch analysis may result in omitting some items.

In 2/3PL, the pattern of responses matters; in Rasch, there is one scale score per raw score.


The Rasch Rating Scale Model

What are some uses of the rating scale model?
– Survey response data with a standard set of responses across many items (SD, D, A, SA)
– Scoring rubrics where the performance levels are defined similarly across all indicators

If the score definitions vary across items, the Partial Credit Model would be required instead.


The Rasch Rating Scale Model

Here’s the math…

$$P(X_{ni}=x \mid \theta_n, \delta_i, \boldsymbol{\tau}) = \frac{\exp\left[\sum_{k=0}^{x}\left(\theta_n - (\delta_i + \tau_k)\right)\right]}{\sum_{j=0}^{m}\exp\left[\sum_{k=0}^{j}\left(\theta_n - (\delta_i + \tau_k)\right)\right]}$$

where θ_n is the person’s ability, δ_i is the item’s location, τ_k is the kth rating scale threshold (shared across all items, with τ_0 ≡ 0), and m is the highest category.
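The category probabilities are straightforward to compute. The sketch below is ours (function name and the three threshold values are illustrative, not from the webinar), for a four-category agreement scale (SD, D, A, SA).

```python
import math

def rsm_probs(theta, delta, taus):
    """Rasch rating scale model: probability of each category 0..m for a
    person with ability theta on an item with location delta. taus are
    the m threshold parameters tau_1..tau_m, shared across items."""
    # Cumulative sums of (theta - delta - tau_k); category 0 gets sum 0.
    numerators = [1.0]  # exp(0) for category 0
    cum = 0.0
    for tau in taus:
        cum += theta - delta - tau
        numerators.append(math.exp(cum))
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical thresholds -2, 0, +2 for a 4-category agreement scale;
# a person located exactly at the item sits between D and A.
probs = rsm_probs(theta=0.0, delta=0.0, taus=[-2.0, 0.0, 2.0])
print([round(p, 2) for p in probs])  # [0.06, 0.44, 0.44, 0.06]
```

The probabilities always sum to 1, and shifting θ upward moves mass toward the higher categories.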


Rasch Probability Curve

Category Probability Curves for an Agreement Scale…

[Figure; the thresholds are labeled at the points where adjacent category probability curves cross.]


“Validation” of an Instrument

Some psychometric properties to consider
– Reliability
– Rating scale functioning
– Item and person fit
– Point-measure correlation
– Differential item functioning (DIF)
– Dimensionality


“Validation” of a Survey

Reliability
– A definition: the degree to which scores are free from measurement error, or how consistent the scores are within an administration and over time.
– In general, reliability increases with the number of items and the ability of those items to spread people out along the scoring metric.
– Rules of thumb: 0.7 = OK, 0.8 = good, 0.9 = excellent (but it depends on the use of the scores).
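One common internal-consistency reliability estimate from classical test theory is Cronbach's alpha (Rasch software typically reports an analogous person separation reliability instead). A minimal sketch with hypothetical data; the function name and numbers are ours.

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha. item_scores is a list of columns, one list of
    scores per item, all over the same respondents."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    item_var_sum = sum(var(item) for item in item_scores)
    return (k / (k - 1)) * (1.0 - item_var_sum / var(totals))

# Hypothetical data: three items rated by five respondents.
items = [[4, 3, 5, 2, 4], [5, 3, 4, 2, 4], [4, 2, 5, 1, 3]]
print(round(cronbach_alpha(items), 2))  # 0.94
```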


Reliability and Score Distribution

A scale with reliability 0.40

[Figure: histogram of TLSS total scores (x-axis roughly 20 to 80); the y-axis is the count (0 to 150).]


“Validation” of a Survey

Rating Scale Functioning
– Are the respondents using the rating scale in a consistent fashion?
– Are any categories being over- or underutilized?
– Is there a good distribution of responses across categories?
– Are the categories “disordered”?


Disordered Categories

A frequency scale with a problem…(never, daily, weekly, biweekly, monthly)

[Figure: category probability curves for the scale; the never, weekly, and daily curves are labeled, and the 3/4 and 4/5 thresholds are marked, illustrating the disordering.]


What’s wrong with this scale?

A five-point partial credit observation item…


“Validation” of a Survey

Item and Person Fit
– Are there unpredictable responses in the data (under-fit)?
  - Can indicate multidimensionality, confusing wording or multiple meanings, content not consistent with the construct, or multiple classes of respondents
– Are there responses that are too predictable (over-fit)?
  - Can indicate redundancy in the items (or response sets)
– Are those responses made by individuals at the center of the distribution or at the extremes?


“Validation” of a Survey

An example of a misfitting person…

KEY: .1.=OBSERVED, 1=EXPECTED, (1)=OBSERVED, BUT VERY UNEXPECTED.

NUMBER - NAME ------------------ MEASURE - INFIT (MNSQ) OUTFIT - S.E.

372 3457102 62.28 4.8 A 4.8 7.85

-10 10 30 50 70 90 110

|---------+---------+---------+---------+---------+---------| NUM Item

(2) 4 10* 3c

.4. 12* 3e

4 (5) 14* 3g

.3. 4 9* 3b

(3) 4 8* 3a

4 (5) 13* 3f

4 (5) 11* 3d

|---------+---------+---------+---------+---------+---------| NUM Item

-10 10 30 50 70 90 110


“Validation” of a Survey

Point-measure correlation

Extent to which the item rating correlates with the total score.

+----------------------------------------------------------------------------+
|ENTRY   RAW                 MODEL|  INFIT  | OUTFIT  |PTMEA|               |
|NUMBER SCORE COUNT MEASURE  S.E. |MNSQ ZSTD|MNSQ ZSTD|CORR.| Item      G   |
|------------------------------------+----------+----------+-----+-----------|
|   115    13    15   35.86  8.33|1.51  1.1|9.90  3.5|A-.24| @51b      D   |
|   116    29   377   90.20  2.05|1.23  1.5|4.26  6.0|B-.02| I52       C   |
|    79    46   342   83.49  1.72|1.46  3.8|4.25  8.2|C-.16| I32       E   |
|   114    15   373   97.81  2.73|1.11   .5|3.86  3.8|D .02| I51       C   |
|   120   327   369   34.48  1.75|1.31  2.5|2.98  5.3|E-.04| I54       C   |
|   117    17    29   53.96  4.51|1.72  2.8|2.83  3.7|F .05| @52b      D   |
|   124   286   368   43.80  1.38|1.20  2.6|1.63  3.6|G .21| avail57   A   |
|   113   372   378   12.33  4.16|1.02   .2|1.61   .9|H .07| I50       D   |
|   118   144   356   64.60  1.23|1.33  5.8|1.58  6.1|I .20| I53       C   |
|   121   263   324   41.06  1.55|1.19  2.1|1.53  2.5|J .21| @54b      D   |
|    75   319   376   38.22  1.55|1.10  1.1|1.32  1.5|K .26| I28       E   |
|   119   100   142   51.10  2.09|1.16  1.6|1.30  1.5|L .35| @53b      D   |
|   122   247   338   47.16  1.36|1.04   .6|1.21  1.6|M .37| I55       C   |
|   123   259   370   48.77  1.27|1.19  3.1|1.17  1.5|N .30| avail56   A   |


“Validation” of a Survey

Differential Item Functioning Analysis
– A method for detecting item bias (or differing perceptions of the items on a survey)
– Item bias is different from test bias (the cumulative effect of item bias on the total score)
– DIF analysis looks for items where similar-ability respondents from different demographic groups respond in a very different manner
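One widely used DIF screen (not named on the slide, swapped in here for illustration) is the Mantel-Haenszel procedure: respondents are stratified by ability, and a common odds ratio far from 1 flags the item. The function name and counts below are hypothetical.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across ability strata.
    Each stratum is (ref_correct, ref_wrong, focal_correct, focal_wrong).
    A value far from 1 suggests comparable-ability groups are
    responding to the item differently (possible DIF)."""
    num = sum(rc * fw / (rc + rw + fc + fw) for rc, rw, fc, fw in strata)
    den = sum(fc * rw / (rc + rw + fc + fw) for rc, rw, fc, fw in strata)
    return num / den

# Hypothetical counts for one item in three ability strata.
strata = [(30, 20, 20, 30), (40, 10, 30, 20), (45, 5, 40, 10)]
print(round(mantel_haenszel_or(strata), 2))  # 2.39
```

Here the odds of success favor the reference group at every ability level, so the item would be flagged for review.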


“Validation” of a Survey

Dimensionality (Rasch PCA)
– Similar to exploratory factor analysis
– Examines the variance structure in the model residuals after factoring out the variance explained by the scale scores
– Looks at correlations in the residual matrix to identify factors (dimensions) that may be affecting patterns of responses


Estimation of Model Parameters

The values of the respondents’ latent trait measures and the difficulties of the items are all unknown quantities.

Likelihood approaches are commonly used to estimate parameters:
– 1PL/Rasch – joint maximum likelihood (JMLE), conditional maximum likelihood (CMLE)
– All models – marginal maximum likelihood (MMLE)

Bayesian techniques are another option
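A toy JMLE sketch for the Rasch model conveys the core idea of alternating between person and item updates. This is illustrative only and entirely our own construction: it uses plain gradient ascent, whereas real software (e.g., WINSTEPS) uses Newton-Raphson updates and bias corrections, and it assumes no person or item has a perfect or zero score (those have no finite estimate).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rasch_jmle(data, n_iter=200, lr=0.1):
    """Toy joint maximum likelihood estimation for the Rasch model.
    data[p][i] is person p's 0/1 response to item i."""
    n_persons, n_items = len(data), len(data[0])
    theta = [0.0] * n_persons   # person abilities (logits)
    delta = [0.0] * n_items     # item difficulties (logits)
    for _ in range(n_iter):
        # Gradient step on each person's ability, holding items fixed.
        for p in range(n_persons):
            grad = sum(data[p][i] - sigmoid(theta[p] - delta[i])
                       for i in range(n_items))
            theta[p] += lr * grad
        # Gradient step on each item's difficulty, holding persons fixed.
        for i in range(n_items):
            grad = sum(sigmoid(theta[p] - delta[i]) - data[p][i]
                       for p in range(n_persons))
            delta[i] += lr * grad
        # Fix the scale's origin: center item difficulties at 0
        # (shifting theta by the same amount leaves theta - delta intact).
        m = sum(delta) / n_items
        delta = [d - m for d in delta]
        theta = [t - m for t in theta]
    return theta, delta

# Hypothetical 5-person x 4-item response matrix (no perfect/zero scores).
data = [[1, 1, 1, 0],
        [1, 1, 0, 0],
        [0, 1, 0, 0],
        [1, 0, 1, 1],
        [1, 1, 0, 1]]
theta, delta = rasch_jmle(data)
# Items answered correctly by more people get lower difficulty estimates,
# and the person with the lowest raw score gets the lowest ability.
print(delta[0] < delta[2], theta[2] < theta[0])  # True True
```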


Available Software

IRT (1/2/3PL, Graded Response Model, Generalized Partial Credit Model)
– BILOG-MG 3.0, PARSCALE 4.0, TESTFACT 4.0, and MULTILOG 7.0; R packages such as plink

Rasch (1PL, Rasch Rating Scale Model, Partial Credit Model)
– WINSTEPS, RUMM, BIGSTEPS, ConQuest, WINMIRA; R packages such as eRm


More Advanced Models

Multidimensional random coefficients multinomial logit model (MRCML) – Conquest

Mixture distribution Rasch models (latent class analysis) – WINMIRA

Many-faceted Rasch model (FACETS)

Multilevel Rasch models (HLM)