On the statistical meaning of the item parameters in IRT models - … · 2018-09-12 ·...

Ernesto San Martín (LIES), Statistical meaning of IRT parameters, August 2018, Juiz de Fora, Brazil

On the statistical meaning of the item parameters in IRT models
CONBRATRI VI

Ernesto San Martín
1 Faculty of Mathematics, Pontificia Universidad Católica de Chile, Chile
2 The Economics School of Louvain, Université catholique de Louvain, Belgium
3 Laboratorio Interdisciplinario de Estadística Social, LIES, Pontificia Universidad Católica de Chile, Chile

August 2018, Juiz de Fora, Brazil

What does it mean to model an educational phenomenon?

We observe the response patterns of 50 students on 7 items:

            It1  It2  It3  It4  It5  It6  It7
Student 1     1    0    0    1    1    0    1
Student 2     0    1    1    0    1    1    0
Student 3     1    1    1    1    0    1    0
Student 4     0    0    1    0    0    0    1
Student 5     0    0    0    1    1    1    1
Student 6     1    0    0    1    1    0    1
Student 7     0    1    1    0    0    1    0
Student 8     1    1    0    0    1    1    0
Student 9     0    0    1    1    0    0    0
Student 10    1    0    0    1    1    1    1


We intend to describe the behavior of items.

We intend to describe the behavior of students.

In particular, we intend to identify test fraud, such as answer copying between two examinees.

We can face these questions using psychometric models.
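Before any model is introduced, the most elementary descriptions of items and students are the proportion of correct answers per item and the total score per student. The sketch below computes both for the ten response patterns displayed above; the variable and function names are illustrative only, and counting identical answers between two students is just a raw description, not a copying index.

```python
# Response patterns of the ten students displayed above
# (rows: students 1..10, columns: items It1..It7).
responses = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1],
]

n_students = len(responses)
n_items = len(responses[0])

# Description of items: proportion of correct answers per item.
item_proportions = [
    sum(row[i] for row in responses) / n_students for i in range(n_items)
]

# Description of students: total score (number of correct answers).
total_scores = [sum(row) for row in responses]


def n_matches(p, q):
    """Number of items on which students p and q (0-based) gave the same answer."""
    return sum(int(a == b) for a, b in zip(responses[p], responses[q]))


print(item_proportions)
print(total_scores)
print(n_matches(0, 5))  # students 1 and 6
```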

Detecting answer copying and IRT models

IRT models: a student’s answer to an item depends on both an individual characteristic (ability) and item characteristics (difficulty, discrimination, guessing).

3PL model:

\[
P(Y_{pi} = 1) = c_i + (1 - c_i)\,\frac{\exp(\alpha_i \theta_p - \beta_i)}{1 + \exp(\alpha_i \theta_p - \beta_i)}.
\]

If $\alpha_i = 1$, then we obtain the 1PL-G model.
If $c_i = 0$, then we obtain the 2PL model.
If $\alpha_i = 1$ and $c_i = 0$, then we obtain the 1PL model.

There are other extensions: the 4PL Model . . .
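The nesting of the 1PL, 1PL-G and 2PL models inside the 3PL model can be sketched in a few lines. The function names are illustrative; the linear predictor is written as alpha * theta - beta, following the formula on this slide.

```python
import math


def p_3pl(theta, alpha, beta, c):
    """P(Y_pi = 1) under the 3PL model with linear predictor alpha * theta - beta."""
    z = alpha * theta - beta
    # exp(z) / (1 + exp(z)) written as 1 / (1 + exp(-z)).
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def p_1pl_g(theta, beta, c):
    """1PL-G model: 3PL with alpha = 1."""
    return p_3pl(theta, 1.0, beta, c)


def p_2pl(theta, alpha, beta):
    """2PL model: 3PL with c = 0."""
    return p_3pl(theta, alpha, beta, 0.0)


def p_1pl(theta, beta):
    """1PL model: 3PL with alpha = 1 and c = 0."""
    return p_3pl(theta, 1.0, beta, 0.0)


# When theta equals beta, the 1PL probability is one half,
# and a guessing parameter c lifts it to c + (1 - c) / 2.
print(p_1pl(0.0, 0.0))
print(p_1pl_g(0.0, 0.0, 0.2))
```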

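[Figure: item characteristic curves for beta = 0, beta = 1 and beta = −1; horizontal axis from −4 to 4.]

[Figure: plot against the total score (horizontal axis from 10 to 40); panel title: DE=0.31; CIT=0.37.]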

Zopluoglu (2016) analyzes the statistical behavior of answer-copying indexes under different IRT models (dichotomous and polytomous).

He remarks that copying indexes share the same rationale, but the probability that person p chooses alternative k of item j is computed using either an IRT model or the CTT framework.
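To make this shared rationale concrete, here is a minimal sketch for dichotomous items: the number of matching answers between a suspected copier and a source is compared with the number expected under an IRT model. This is an illustrative standardized match count, not any specific index studied by Zopluoglu (2016); it assumes that the source's answers are treated as fixed, that the copier's ability and the item parameters are known, and that the copier's answers are independent given ability. All names are hypothetical.

```python
import math


def p_correct(theta, alpha, beta, c):
    """3PL probability of a correct answer, linear predictor alpha * theta - beta."""
    z = alpha * theta - beta
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def standardized_match_count(copier, source, theta_copier, items):
    """(observed matches - expected matches) / standard deviation of matches.

    copier, source: lists of 0/1 answers; items: list of (alpha, beta, c).
    The expectation is taken under the IRT model for the copier, with the
    source's answers held fixed (an assumption of this sketch).
    """
    observed = 0
    expected = 0.0
    variance = 0.0
    for y_c, y_s, (alpha, beta, c) in zip(copier, source, items):
        p1 = p_correct(theta_copier, alpha, beta, c)
        # Probability that the copier's own answer coincides with the source's.
        p_match = p1 if y_s == 1 else 1.0 - p1
        observed += int(y_c == y_s)
        expected += p_match
        variance += p_match * (1.0 - p_match)
    return (observed - expected) / math.sqrt(variance)


# Hypothetical example: seven 1PL items with beta = 0 and a copier with theta = 0,
# whose answers coincide with the source's on every item.
items = [(1.0, 0.0, 0.0)] * 7
pattern = [1, 0, 0, 1, 1, 0, 1]
print(standardized_match_count(pattern, pattern, 0.0, items))
```

Replacing `p_correct` by a probability computed from another IRT model, or from the CTT framework, changes the expected number of matches and therefore the value of the statistic; the structure of the comparison stays the same.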

The objective of his contribution is to analyze the type I error rate and the statistical power of copying indexes.

Data: a 40-item mathematics test administered to 67896 students.

The 1PL, 2PL and 3PL models are fitted using IRTPRO (which provides MML estimators of the parameters that characterize the items, although it also provides Bayesian estimates).

[Figure: scatter plot with data from Zopluoglu (2016); horizontal axis: 1PL difficulties, from −4 to 1; legend: 1PL vs 2PL (0.81), 1PL vs 3PL (0.86).]

Study 1: manipulates sample size, amount of copying, type of copying, and level of item difficulty.

Item difficulties: easy and medium difficulties . . . “overall test difficulty was manipulated through the b parameters for the dichotomous IRT models” (p. 595).

Remark: Given the high correlations between the estimators of the difficulties, it is possible to keep this part of the design comparable across the different IRT models . . .

Or not? Why?

For easy test difficulty conditions, test difficulty was around .80 for typical simulated response data which was similar to the real dataset. For medium test difficulty conditions, the b parameters for dichotomous IRT models [. . .] were manipulated such that test difficulty was around .50 for typical simulated response data. This was accomplished by adding a constant of 1.52 for the 1PL, 1.56 for the 2PL, and 1.53 for the 3PL model to the b parameters used for the easy conditions (pp. 595-596).

Remark: Given the high correlations between the estimators of the difficulties, it is possible to keep this part of the design comparable across the different IRT models . . . Or not? Why?
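The manipulation described in the quotation, shifting every difficulty parameter by a constant to move the overall test difficulty, can be sketched numerically. The sketch assumes that "test difficulty" means the expected proportion of correct answers, that abilities follow a standard normal distribution, and that the linear predictor is alpha * theta - beta as in the 3PL formula of these slides (which need not coincide with the parametrization behind the quoted b parameters). The item values are hypothetical and are not the estimates from the real dataset.

```python
import math


def p_3pl(theta, alpha, beta, c):
    """3PL probability of a correct answer, linear predictor alpha * theta - beta."""
    z = alpha * theta - beta
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def expected_proportion_correct(items, shift=0.0, n_grid=2001, lo=-8.0, hi=8.0):
    """Average probability of a correct answer over items and over theta ~ N(0, 1).

    items: list of (alpha, beta, c); shift is added to every beta.
    The integral over theta is approximated on an equally spaced grid,
    weighting each grid point by the (unnormalized) standard normal density.
    """
    step = (hi - lo) / (n_grid - 1)
    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(n_grid):
        theta = lo + k * step
        w = math.exp(-0.5 * theta * theta)
        mean_p = sum(p_3pl(theta, a, b + shift, c) for a, b, c in items) / len(items)
        weighted_sum += w * mean_p
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical easy 1PL test: all difficulties below zero.
easy_items = [(1.0, b, 0.0) for b in (-2.5, -2.0, -1.5, -1.0, -0.5)]

print(expected_proportion_correct(easy_items))             # easy condition
print(expected_proportion_correct(easy_items, shift=1.5))  # after adding a constant
```

Running the same function with hypothetical 2PL or 3PL items shows how the effect of a given constant depends on the discrimination and guessing parameters, which is one way to explore the question raised in the remark.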

Copying types:

Random copying: it is assumed that the copying student copies responses from a source student at random, so that all items are equally likely to be copied.

Difficulty-weighted copying: the items are ordered from the easiest to the most difficult. The probability that an item is copied is proportional to its rank.
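The two copying types can be sketched as two ways of selecting which items are copied. The sketch assumes that a larger difficulty value means a more difficult item, so that rank 1 is the easiest item, and it draws the copied items one at a time without replacement; how the original simulation implemented the selection is not stated here. The function name and arguments are hypothetical.

```python
import random


def items_to_copy(difficulties, n_copied, kind, rng):
    """Return the indexes of the items copied from the source.

    kind = "random": every item has the same selection weight.
    kind = "difficulty_weighted": the weight of an item is its rank when items
    are ordered from easiest (rank 1) to most difficult (rank n).
    Items are drawn sequentially without replacement.
    """
    n_items = len(difficulties)
    if n_copied > n_items:
        raise ValueError("cannot copy more items than the test contains")
    if kind == "random":
        weights = [1.0] * n_items
    elif kind == "difficulty_weighted":
        order = sorted(range(n_items), key=lambda i: difficulties[i])
        weights = [0.0] * n_items
        for rank, item in enumerate(order, start=1):
            weights[item] = float(rank)
    else:
        raise ValueError("unknown copying type: " + kind)

    remaining = list(range(n_items))
    chosen = []
    for _ in range(n_copied):
        current = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=current, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


rng = random.Random(2018)  # fixed seed only for reproducibility of the sketch
difficulties = [-1.2, 0.3, -0.5, 1.1, 0.0, -2.0, 0.8]  # hypothetical values
print(items_to_copy(difficulties, 3, "random", rng))
print(items_to_copy(difficulties, 3, "difficulty_weighted", rng))
```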

Outcome of interest: the AUC is used as a measure of classification accuracy for how well an index separates the answer-copying and honest pairs of students.
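As a reminder of what the AUC measures, it can be computed as the proportion of (copying pair, honest pair) comparisons in which the copying pair receives the larger index value, with ties counted as one half. The sketch below uses hypothetical index values and assumes that larger values indicate stronger evidence of copying.

```python
def auc(copy_pair_scores, honest_pair_scores):
    """Probability that a copying pair scores higher than an honest pair.

    Ties contribute one half. A value of 1.0 means perfect separation;
    0.5 means the index does no better than chance.
    """
    wins = 0.0
    for s_copy in copy_pair_scores:
        for s_honest in honest_pair_scores:
            if s_copy > s_honest:
                wins += 1.0
            elif s_copy == s_honest:
                wins += 0.5
    return wins / (len(copy_pair_scores) * len(honest_pair_scores))


# Hypothetical index values for copying and honest pairs.
print(auc([2.9, 3.4, 1.1], [0.2, -0.5, 1.3, 0.7]))
```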

Results:

The amount of copying is the most important factor affecting classification performance.

The difficulty level of the items and the copying type have a negligible effect.

The performance of the indexes was expected to decrease to some degree in the presence of guessing, but it does not.

Critical review?

Initial question: What does it mean to model an educational phenomenon?

It means providing structural definitions of educational concepts

either by introducing explicit structural definitions,

or by introducing statistical models whose parameters “represent some properties of the population under analysis” (Fisher, 1922).

Critical review?
