Introduction to Item Response Theory - michigan.gov
Adam Wyse and Ji Zeng, Psychometricians
Michigan Department of Education, Office of Educational Assessment and Accountability
Introduction to Item Response Theory
Focus of this session
• The other session discussed Classical Test Theory (CTT).
• The focus of this session is on Item Response Theory (IRT) and how IRT is used at MDE.
Basic Differences between CTT and IRT
• Focus on item performance (IRT) versus Total Test performance (CTT).
• Population dependent statistics (CTT) versus population independent statistics (IRT).
• Test specific statistics (CTT) versus Test independent statistics (IRT).
• A definition that cannot be tested (CTT) versus a model that can be tested (IRT).
• Few assumptions (CTT) versus several assumptions (IRT).
What is IRT?
• Relates student ability and item characteristics to the probability of obtaining a particular score on an item.
• Many IRT models exist, including models for multiple-choice, short answer, and constructed response items.
• Models differ in how probabilities are related to student ability and item characteristics.
IRT assumptions
• Monotonicity: A more able person has a higher probability of responding correctly to an item than a less able person.
• Local independence: after controlling for ability, the response to one item neither depends on nor influences the probability of responding correctly to another item.
• Item and person parameters do not change across populations.
Unidimensionality
• Models used by MDE also assume unidimensionality: a single underlying construct is measured by the assessment (e.g. mathematics achievement, reading achievement).
Common IRT Models
• Multiple-Choice and Short Answer Items
  – Rasch Model (MEAP, MEAP-Access, MI-Access FI, ELPA)
  – 2 PL Model (NAEP)
  – 3 PL Model (MME, NAEP)
• Constructed Response Items
  – Partial Credit Model (MEAP Writing, MEAP-Access Writing, MI-Access FI Expressing Ideas, ELPA)
  – Generalized Partial Credit Model (MME Writing, NAEP)

Note: NAEP is not analyzed or administered by MDE. It is a test administered by the federal government!
The Rasch Model (sometimes called the 1 PL Model)

$$P(\theta) = \frac{e^{(\theta - b)}}{1 + e^{(\theta - b)}}$$
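The Rasch probability formula can be sketched in a few lines of Python (used here purely for illustration; the ability and difficulty values are hypothetical):

```python
import math

def rasch_prob(theta, b):
    """Rasch model probability of a correct response:
    P(theta) = exp(theta - b) / (1 + exp(theta - b))."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# A student whose ability equals the item difficulty has a 50% chance.
print(rasch_prob(0.0, 0.0))  # 0.5
```

Note that the only item parameter in the model is the difficulty b, which matches the single inflection-point annotation on the curve that follows.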
The Rasch Model
• An item characteristic curve for a sample MEAP item.

[Figure: "Simple IRT Model" — item characteristic curve; x-axis: Achievement (−3 to 3); y-axis: Probability of correct response (0.0 to 1.0). Annotations mark the item difficulty at the curve's inflection point, where there is a 50% probability of a correct response.]
The 3 PL Model

$$P(\theta) = c + (1 - c)\,\frac{e^{Da(\theta - b)}}{1 + e^{Da(\theta - b)}}$$
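The 3 PL formula adds a discrimination parameter a and a lower asymptote c to the Rasch form; a minimal Python sketch (the parameter values are hypothetical, and D = 1.7 is the conventional scaling constant):

```python
import math

def three_pl_prob(theta, a, b, c, D=1.7):
    """3 PL model: P = c + (1 - c) * exp(D*a*(theta - b)) / (1 + exp(D*a*(theta - b)))."""
    z = D * a * (theta - b)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))

# At theta = b the probability is halfway between c ("guessability") and 1.
print(three_pl_prob(0.0, a=1.0, b=0.0, c=0.2))  # 0.6
```

Even a very low-ability student never falls below probability c, which is the "guessability" annotation on the curve that follows.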
The 3 PL Model

• An item characteristic curve for a sample MME item.

[Figure: "More Complex IRT Model" — item characteristic curve; x-axis: Achievement (−3 to 3); y-axis: Probability of correct response (0.0 to 1.0). Annotations mark the item difficulty; the slope at the inflection point, which indicates how well the item discriminates between high and low achievers; the item "guessability" (lower asymptote); and the inflection-point probability, which is halfway between the item "guessability" and 1.]
Rasch vs. 3 PL
What features do the Rasch and 3PL model have in common?
What features of the Rasch and 3 PL Model are different?
Rasch vs. 3PL
• In the Rasch model, the difference between student ability and the item difficulty parameter drives the probability of a correct response. All other elements of the equation are constants.
  – Therefore, when plotted together, multiple items differ only by a constant shift in their location on the scale (shown in the diagram on the next slide).
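This shift property can be checked numerically: two Rasch items with different (hypothetical) difficulties produce identical curves once one is slid along the ability scale by the difficulty difference.

```python
import math

def rasch_prob(theta, b):
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# Two items differing only in difficulty: their curves are identical
# except for a horizontal shift equal to the difficulty difference.
b1, b2 = -0.5, 1.0
for theta in [-2.0, 0.0, 2.0]:
    assert abs(rasch_prob(theta, b1) - rasch_prob(theta - (b1 - b2), b2)) < 1e-12
print("Rasch ICCs differ only by a shift along the ability scale")
```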
10 Rasch item characteristic curves

[Figure: ten Rasch item characteristic curves; x-axis: θ (−3 to 3); y-axis: Probability of correct response (0.0 to 1.0).]
Rasch vs. 3PL

• In the 3 PL model, the difference between ability and difficulty is still the critical piece. However, the discrimination parameter changes the influence of that difference for each item, and the minimum possible result of the equation is set by the 'c' parameter.
  – If c > 0.00, the probability of a correct response is greater than 0.
  – Item characteristic curves will vary by location on the scale as well as by lower asymptote (c parameter) and slope (a parameter).
  – Knowing how difficult an item is compared to another is still relevant, but it is not the only piece of information that leads to differences between items.
10 3 PL item characteristic curves

[Figure: ten 3 PL item characteristic curves; x-axis: θ (−3 to 3); y-axis: Probability of correct response (0.0 to 1.0).]
2 PL Model
• How do you end up with the 2 PL model?
Test Characteristic Curves
• Relates achievement to the scores examinees are expected to receive on the assessment.
• The sum of the Item Characteristic Curves in IRT.
• Defined the same way for the Rasch and 3 PL models.
Example of Test Characteristic Curve for 10 Rasch Items

[Figure: test characteristic curve; x-axis: θ (−4 to 4); y-axis: Expected Number Correct Score (0 to 10).]
Partial Credit Model (PCM)

$$P_{ix}(\theta) = P(X_i = x \mid \theta) = \frac{\exp\left(\sum_{k=0}^{x}(\theta - b_{ik})\right)}{\sum_{h=0}^{m_i}\exp\left(\sum_{k=0}^{h}(\theta - b_{ik})\right)}$$
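A sketch of the PCM category probabilities for one constructed response item, assuming the usual convention that the sum for category 0 is empty (equals 0); the step difficulties below are hypothetical:

```python
import math

def pcm_probs(theta, step_difficulties):
    """Partial Credit Model category probabilities for one item.
    step_difficulties = [b_i1, ..., b_im]; score categories are 0..m.
    Convention: the cumulative sum for category 0 is 0."""
    # Cumulative sums of (theta - b_ik) form each category's exponent.
    cum = [0.0]
    for b in step_difficulties:
        cum.append(cum[-1] + (theta - b))
    exps = [math.exp(c) for c in cum]
    total = sum(exps)
    return [e / total for e in exps]

probs = pcm_probs(0.0, [-1.0, 0.0, 1.0])  # hypothetical 3-step (0-3 point) item
print(sum(probs))  # the category probabilities sum to 1
```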
Generalized Partial Credit Model (GPCM)

$$P_{ix}(\theta) = P(X_i = x \mid \theta) = \frac{\exp\left(\sum_{k=0}^{x} a_i(\theta - b_{ik})\right)}{\sum_{h=0}^{m_i}\exp\left(\sum_{k=0}^{h} a_i(\theta - b_{ik})\right)}$$
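The GPCM differs from the PCM only by the item discrimination a_i multiplying each term; a sketch with hypothetical parameters:

```python
import math

def gpcm_probs(theta, a, step_difficulties):
    """Generalized Partial Credit Model: like the PCM, but each
    exponent term is a * (theta - b_ik)."""
    cum = [0.0]
    for b in step_difficulties:
        cum.append(cum[-1] + a * (theta - b))
    exps = [math.exp(c) for c in cum]
    total = sum(exps)
    return [e / total for e in exps]

# With a = 1 the GPCM reduces to the PCM of the previous slide.
print(gpcm_probs(0.5, 1.0, [-1.0, 0.0, 1.0]))
```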
How do we get there?

• IRT models depend on item and person parameters.
• Item and person parameters have to be estimated.
• A person by item matrix is needed to begin the process.

MME Science (example person by item matrix of 0/1 item scores):
010100101111000111101110101111000101100110110011011000011101001011010111011001010011
IRT Estimation

• The person by item matrix is input into an IRT estimation program.
• The program uses an estimation algorithm (a set of mathematical rules) to come up with a solution.
• The end products are best estimates of the item parameters and person ability estimates.
  – Item parameters are the 'guessability', discrimination, and difficulty parameters.
  – Person parameters are the ability estimates we use to create a student's scale score.
Estimating Ability
• For the 3PL/GPCM, people who share the same response string (the same pattern of correct and incorrect responses / the same scores on constructed response items) will have the same ability estimate.
  – It is possible for people with the same raw score to end up with different ability estimates.
• In the Rasch/PCM, the raw score is used to derive the abilities.
  – Each person with the same raw score will have the same estimate of ability.
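The Rasch raw-score property can be demonstrated with a small maximum-likelihood sketch (Newton-Raphson, one of several algorithms an estimation program might use; the item difficulties and response patterns are hypothetical):

```python
import math

def rasch_prob(theta, b):
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

def rasch_ability_mle(responses, difficulties, iters=50):
    """Newton-Raphson maximum-likelihood ability estimate under the Rasch model.
    The score equation sets raw score = sum of item probabilities."""
    theta = 0.0
    for _ in range(iters):
        probs = [rasch_prob(theta, b) for b in difficulties]
        grad = sum(x - p for x, p in zip(responses, probs))
        hess = -sum(p * (1.0 - p) for p in probs)
        theta -= grad / hess
    return theta

diffs = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical item difficulties
# Two different patterns with the same raw score (3 correct):
t1 = rasch_ability_mle([1, 1, 1, 0, 0], diffs)
t2 = rasch_ability_mle([0, 0, 1, 1, 1], diffs)
print(abs(t1 - t2) < 1e-6)  # True: same raw score, same Rasch ability estimate
```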
Common IRT Software Packages
• Rasch/PCM:
  – WINSTEPS
  – FACETS
  – CONQUEST
• 3PL/GPCM:
  – PARSCALE
  – MULTILOG
  – BILOG-MG (cannot be used for constructed response items)
Uses of IRT
• Item/Test Information
• Conditional Standard Error of Measurement
• Creation of Scale Scores
• Standard Setting
• Equating/Linking
• Test Assembly/Test Design
• Differential Item Functioning (DIF)
Item/Test Information
• Each IRT model has an item information function.
• Item information provides an indicator of the accuracy of ability estimates at each location.
• Test information is the sum of item information over items.
Item Information (Rasch item)

[Figure: item information curve for one Rasch item; x-axis: θ (−4 to 4); y-axis: Information (0.0 to 1.0).]
Test information (10 Rasch items)

[Figure: test information curve for ten Rasch items; x-axis: θ (−4 to 4); y-axis: Information (0 to 5).]
Conditional Standard Error of Measurement
• Equal to the reciprocal of the square root of the Test Information Function.
• Provides an indicator of assessment accuracy at each ability level.
• MDE reports the conditional standard error of measurement for students' scale scores.
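The reciprocal relationship is a one-liner on top of the test information sketch (Rasch items with hypothetical difficulties):

```python
import math

def rasch_prob(theta, b):
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

def csem(theta, difficulties):
    """Conditional standard error of measurement:
    the reciprocal of the square root of the test information."""
    info = sum(p * (1.0 - p)
               for p in (rasch_prob(theta, b) for b in difficulties))
    return 1.0 / math.sqrt(info)

diffs = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical item difficulties
# Measurement is most precise (smallest CSEM) where information is largest.
print(round(csem(0.0, diffs), 3))
```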
Conditional Standard Error of Measurement

[Figure: conditional standard error of measurement as a function of ability; x-axis: θ (−4 to 4); y-axis: Conditional Standard Error of Measurement (0 to 5).]
Theta to scale score transformation
• Remember the linear equation?– y = mx + b
• MDE uses linear equations to transform θ (Ability) to scale scores.
• Different transformation for each grade, content area, and assessment.
• Performance levels are determined by the student’s scale score.
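A sketch of such a transformation (the slope and intercept below are made-up illustration values, not MDE's operational constants, which differ by grade, content area, and assessment):

```python
def theta_to_scale_score(theta, slope, intercept):
    """Linear theta-to-scale-score transformation: scale = slope * theta + intercept.
    Hypothetical slope/intercept for illustration only."""
    return round(slope * theta + intercept)

print(theta_to_scale_score(0.0, 25.0, 500.0))  # 500
print(theta_to_scale_score(1.2, 25.0, 500.0))  # 530
```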
Example of a Raw to Scale Score Table

Raw Score | Scale Score | PL | SE
0         | 385         | 4  | 44
1         | 414         | 4  | 25
2         | 432         | 4  | 18
…         | …           | …  | …
9         | 478         | 4  | 10
10        | 482         | 3  | 10
…         | …           | …  | …
14        | 497         | 3  | 9
15        | 500         | 2  | 9
…         | …           | …  | …
23        | 534         | 2  | 11
24        | 540         | 1  | 12
…         | …           | …  | …
Standard Setting

• The process of establishing cut scores on the score scale of an assessment.
• Involves groups of teachers, administrators, and content experts who make cut score recommendations.
• Recommendations are based on panelists' understanding of students and content as well as assessment characteristics.
• MDE has applied IRT-based standard setting methods (e.g. Bookmark and Body of Work).
• The State Board of Education sets final cut scores after considering panelists' cut score recommendations.
Equating/Linking

• The process of placing scores from different test administrations onto a common scale so that scores can be used interchangeably.
• Equating adjusts for differences in difficulty between test forms.
• IRT facilitates equating/linking by assuming item parameters for common items do not change over time.
• Many IRT linking methods exist for creating a common scale once this assumption is made.
• MDE uses the Stocking-Lord procedure for MME and the fixed parameter method for the other assessments.
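A toy sketch of the characteristic-curve idea behind Stocking-Lord linking, restricted to the Rasch case where only a constant shift is needed (the operational procedure also estimates a slope; all item difficulties below are hypothetical). It finds, by grid search, the shift that makes the new form's test characteristic curve match the base form's:

```python
import math

def rasch_prob(theta, b):
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

def tcc(theta, difficulties):
    return sum(rasch_prob(theta, b) for b in difficulties)

def stocking_lord_shift(base_b, new_b):
    """Choose the constant shift minimizing the squared difference between
    the base and (shifted) new test characteristic curves over a theta grid."""
    thetas = [t / 10.0 for t in range(-40, 41)]
    best_shift, best_loss = 0.0, float("inf")
    for k in range(-200, 201):
        s = k / 100.0
        shifted = [b + s for b in new_b]
        loss = sum((tcc(t, base_b) - tcc(t, shifted)) ** 2 for t in thetas)
        if loss < best_loss:
            best_shift, best_loss = s, loss
    return best_shift

base = [-1.0, 0.0, 1.0]
new = [-0.5, 0.5, 1.5]  # same common items, estimated on a scale shifted by -0.5
print(stocking_lord_shift(base, new))  # -0.5
```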
Test Design/Assembly

• MDE checks item and content characteristics when creating new test forms.
• Make test information as large as possible near the cut scores to make performance level classifications as accurate as possible.
• Make sure that the IRT test information and test characteristic curves for alternate test versions are as close to each other as possible.
• Why do we want the test information functions and test characteristic curves to be as similar as possible?
Differential Item Functioning (DIF)
• DIF refers to the situation where examinees with the same ability differ on average in their item performance depending on subgroup membership.
• MDE checks for DIF for each subgroup (e.g. males vs. females) that is tested on the assessment and has a large enough sample size.
• Items identified as exhibiting DIF are reviewed by a panel of teachers and content experts to make sure that they are fair to all subgroups of examinees being tested.
Summary
• You were introduced to IRT models and how they are used by MDE.
• Goal is that you leave with a greater understanding of how MDE assessments are scored, scaled, and interpreted.
• In addition, you now should have some ‘tools’ that can assist you in your own analyses.
Contact Information

Adam Wyse: (517) 373-2435
Ji Zeng: (517) 241-3517

Please feel free to contact us if you have any questions ☺