Field Test Analysis Report: SAS Macro and Item/Distractor/DIF Analyses
Prepared by Yi-Hsin Chen, Chunhua Cao, and Stephanie Green
College of Education at USF
Presented at the meeting of the Central Florida Assessment Collaborative (CFAC)
May 20th, 2014, Orlando, Florida
Agenda of This Presentation
SAS macro for CTT test/item analysis, IRT 2PL model, and Mantel-Haenszel differential item functioning (DIF) analysis
Introduction of statistical concepts for test/item development
Item Analyses:
  CTT and IRT
  Distractor Analysis
  DIF Analysis
SAS Macro Outputs
A SAS macro was developed for this project
There are six Excel outputs:
  Test score statistics
  Frequencies of options for each item
  Item analysis statistics
  Distractor analysis
  DIF
  2PL item parameters
Available upon request at [email protected]
Sample size
N: Sample size
  Scores: 85, 60, 70, 44, 59, 89, 99, 79, . , 100 (the "." marks a missing score)
  N = 10
USED_N: Sample size used for analysis, excluding missing data
  One score is missing, so USED_N = 9
Central Tendency
MEAN: Arithmetic average
  Most frequently reported measure of central tendency
  Sum of scores divided by number of scores
  $\bar{X} = \frac{\sum X}{N} = \frac{100 + 95 + 105 + 100 + 110}{5}$
Test Statistics: Central Tendency
MEDIAN (Q2): the score at the 50th percentile
  Half of the examinees score above the median, and half score below it
  Odd number of scores: 110, 105, 100, 95, 90
    Median = 100
  Even number of scores: 110, 105, 100, 95, 95, 90
    Median = (95 + 100) / 2 = 97.5
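As a rough illustration of N, USED_N, the mean, and the median, the following Python sketch reuses the ten scores from the sample-size slide and the two median examples. Representing the missing score as `None`, and using Python rather than the project's SAS macro, are assumptions made only for this example.

```python
import statistics

# Scores from the sample-size example; None stands in for the missing value (".").
scores = [85, 60, 70, 44, 59, 89, 99, 79, None, 100]

n = len(scores)                                # N counts every record
used = [s for s in scores if s is not None]    # drop the missing score
used_n = len(used)                             # USED_N

mean = statistics.mean(used)                   # sum of scores / USED_N
median = statistics.median(used)               # middle score after sorting

# The two median examples: an odd and an even number of scores.
median_odd = statistics.median([110, 105, 100, 95, 90])
median_even = statistics.median([110, 105, 100, 95, 95, 90])

print(n, used_n, round(mean, 2), median, median_odd, median_even)
```

With an even number of scores, `statistics.median` averages the two middle values, which matches the hand calculation (95 + 100) / 2.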
Percentiles
A percentile is the point below which a given percentage of scores fall
Percentiles are very useful for interpreting an individual student’s performance
Q1: The score at the 25th percentile
  Q1 = 10 indicates that 25 percent of the students’ scores fall below 10 points
Q3: The score at the 75th percentile
Variability
Range
  Subtract the lowest score (Minimum) from the highest score (Maximum)
  This is a rough measure of variability
  High score = 90, Low score = 50: Range = 40
  High score = 100, Low score = 50: Range = 50
  High score = 90, Low score = 30: Range = 60
Variability
Standard Deviation (SD): roughly the average number of points by which scores deviate from the mean score
  A measure of the amount of variability in examinees’ total scores
  Large SD = large variability (heterogeneity)
  Small SD = small variability (homogeneity): scores cluster closer to the mean
Variability
  Scores   Deviation         Squared
  100      100 - 92 = 8      8^2 = 64
  96       96 - 92 = 4       4^2 = 16
  94       94 - 92 = 2       2^2 = 4
  92       92 - 92 = 0       0^2 = 0
  90       90 - 92 = -2      (-2)^2 = 4
  80       80 - 92 = -12     (-12)^2 = 144
Mean = 92
$\sum (X - \text{Mean})^2 = 232$
$SD = \sqrt{\frac{\sum (X - \text{Mean})^2}{N}} = \sqrt{\frac{232}{6}} = 6.22$
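The hand calculation above can be checked with a few lines of Python. This is only a sketch of the slide's arithmetic; it divides by N as the slide does, whereas many statistical packages divide by N - 1 by default, so their output for the same six scores would be larger.

```python
import math

scores = [100, 96, 94, 92, 90, 80]

mean = sum(scores) / len(scores)                         # 552 / 6 = 92
sum_sq_dev = sum((x - mean) ** 2 for x in scores)        # sum of squared deviations
sd = math.sqrt(sum_sq_dev / len(scores))                 # divide by N, as on the slide

print(mean, sum_sq_dev, round(sd, 2))
```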
Skewness and Kurtosis
SKEWNESS: a measure of the shape of the score distribution, such as positive skewness, negative skewness, or symmetry
KURTOSIS: a measure of the "peakedness" of the score distribution
Skewness
[Figure: bar chart of a roughly negatively skewed distribution; x-axis: Score (42 to 91), y-axis: Frequency (0 to 6)]
Skewness
[Figure: bar chart of a roughly positively skewed distribution; x-axis: Score, y-axis: Frequency (0 to 20)]
Reliability: Cronbach’s Alpha
A measure of test reliability, indicating the internal consistency of the test
Sample dependent
  Different samples may yield different reliability for the same test
Ranges from 0 to 1
  0.7 and above: good internal consistency
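The slides do not spell out how alpha is computed, so the sketch below assumes the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The response matrix is hypothetical and far smaller than a real test; it is there only to make the arithmetic visible.

```python
import statistics

def cronbach_alpha(responses):
    """responses: list of examinee rows, each a list of item scores."""
    k = len(responses[0])                       # number of items
    items = list(zip(*responses))               # one tuple of scores per item
    item_var_sum = sum(statistics.pvariance(col) for col in items)
    total_var = statistics.pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 0/1 responses: 4 examinees x 3 items (illustration only).
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Because the same variance divisor is used in the numerator and the denominator, population and sample variances give the same alpha.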
Standard Error of Measurement
SEM (Standard Error of Measurement)
  $SEM = STD \times \sqrt{1 - \alpha}$, where $\alpha$ is the test reliability (Cronbach’s alpha)
  A more reliable test yields a smaller SEM
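A short Python sketch of the SEM calculation, assuming the usual formula SEM = STD * sqrt(1 - alpha). The STD and ALPHA values are the ones reported later in this deck for the three field tests; the function name `sem` is made up for the example.

```python
import math

def sem(std, alpha):
    """Standard error of measurement from the score SD and the reliability."""
    return std * math.sqrt(1 - alpha)

# (STD, ALPHA) pairs as reported in the test statistics for the three subjects.
reported = {
    "Anatomy": (3.337, 0.533),
    "Precalculus": (2.978, 0.506),
    "Phy-Sci": (4.141, 0.626),
}

for subject, (std, alpha) in reported.items():
    print(subject, round(sem(std, alpha), 2))
```

The results agree with the reported SEM values to two decimals; small differences in the third decimal come from the rounded inputs.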
Item Analysis
Why care?
  Item analysis helps you identify problems with your items (or scoring)
  These problems can be corrected, resulting in a better test and better measurement
Item Analysis
When is it useful?
  Item analysis is most useful when you are developing a bank, or pool, of items that you will continue to use
  It can be used when evaluating standardized tests
  It is also a useful tool anytime students have complained about an item
  It can be used to identify mis-keyed items
Item Difficulty (p-value)
Item difficulty (proportion correct): the proportion of examinees tested that answered the item correctly
  $p = \frac{\text{\# of students who responded correctly}}{\text{total \# of students who responded}} = \frac{N_{correct}}{N_{total}}$
Item Difficulty (p-value)
p can range from 0 to 1.0
A rough level of item difficulty (p):
  .80 and above    moderately easy to very easy (mastery)
  .30 - .80        moderate
  .30 and below    moderately difficult to very difficult
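The p-value and the rough difficulty bands can be sketched in Python as follows. The item scores are hypothetical, and the treatment of the boundary values (.80 counted as easy, .30 counted as difficult) is an assumption, since the bands on the slide touch at those points.

```python
def item_difficulty(item_scores):
    """Proportion of examinees answering correctly (1 = correct, 0 = incorrect)."""
    return sum(item_scores) / len(item_scores)

def difficulty_band(p):
    # Boundary handling (>= .80, <= .30) is an assumption made for this sketch.
    if p >= 0.80:
        return "moderately easy to very easy"
    if p <= 0.30:
        return "moderately difficult to very difficult"
    return "moderate"

# Hypothetical scored responses of 10 examinees to one item.
item_scores = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

p = item_difficulty(item_scores)
print(p, difficulty_band(p))
```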
Item Discrimination
Discrimination can be computed using correlation
  This shows the relationship between a single item and the total test
  It is expected that students with high total scores answer the item correctly
  rpb = point-biserial correlation between item score and total score
Item Discrimination
Corrected point-biserial correlation: A statistic similar to point-biserial
correlations The score of the individual item is
taken out of the total score so that the contribution of the item itself is removed from the correlation This statistic is more accurate to
represent item discrimination
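To make the difference between the two statistics concrete, the Python sketch below computes both for one item. It treats the point-biserial as a Pearson correlation between the 0/1 item score and the total score; the response matrix and the helper name `pearson` are hypothetical, chosen only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 0/1 responses: 5 examinees x 3 items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 1],
]

item = 0                                             # index of the item under study
item_scores = [row[item] for row in responses]
totals = [sum(row) for row in responses]             # total score, item included
rest = [t - s for t, s in zip(totals, item_scores)]  # total with the item removed

r_pb = pearson(item_scores, totals)
r_pb_corrected = pearson(item_scores, rest)
print(round(r_pb, 3), round(r_pb_corrected, 3))
```

In this small example the uncorrected value looks moderately positive only because the item is correlated with itself inside the total; once the item is removed, the correlation drops to about zero.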
Item Discrimination
Two ability groups (upper and lower) approach
  The median score is used to divide the students into two groups
  Discrimination coefficient (D-value) = proportion correct in the upper group minus proportion correct in the lower group
  Ranges from -1 to 1
  An item with a higher, positive D-value is a good discriminating item
  An item with a negative D-value suggests that the lower achieving group did better on the item than the higher achieving group, indicating a poor item
Item Discrimination
A rough scale of item discrimination (D)
D can range from -1 to 1
  .30 and above    moderate to high discrimination
  0 - .30          little to no discrimination
  0 and below      negative discrimination (unwanted)
Item Difficulty and Discrimination
Relationship between item difficulty and discrimination
  There can be little discrimination if nearly everyone gets the item right, or if nearly everyone gets the item wrong
  There can be maximum discrimination if about half the people got the item right, and about half got the item wrong
Item Difficulty and Discrimination
Relationship between item difficulty and potential discrimination
[Figure: maximum discrimination (y-axis, 0 to 1.0) plotted against item difficulty (x-axis, 0 to 1.0)]
Alpha If an Item Deleted
“The Alpha If Deleted” shows what would happen to the internal consistency if the item were deleted
  When the test_alpha_deleted coefficient goes up compared with the original test alpha, the test would be more reliable without that item (the item can be considered for removal from the test)
  When the test_alpha_deleted coefficient goes down, deleting the item would hurt reliability, which also indicates that the item is a good item
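A minimal Python sketch of the alpha-if-deleted idea: recompute alpha once per item with that item left out, then compare with the full-test alpha. It assumes the standard Cronbach's alpha formula and a tiny hypothetical response matrix; it is not the project's SAS macro.

```python
import statistics

def cronbach_alpha(responses):
    """Standard Cronbach's alpha for a list of examinee rows of item scores."""
    k = len(responses[0])
    items = list(zip(*responses))
    item_var_sum = sum(statistics.pvariance(col) for col in items)
    total_var = statistics.pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def alpha_if_deleted(responses):
    """Alpha recomputed with each item removed in turn."""
    k = len(responses[0])
    return [
        cronbach_alpha([[s for j, s in enumerate(row) if j != drop] for row in responses])
        for drop in range(k)
    ]

# Hypothetical data: items 1 and 2 agree with each other, item 3 is unrelated noise.
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
]

alpha = cronbach_alpha(responses)
deleted = alpha_if_deleted(responses)
for i, a in enumerate(deleted, start=1):
    verdict = "alpha goes up: review item" if a > alpha else "alpha goes down: keep item"
    print(f"item {i}: alpha if deleted = {a:.2f} ({verdict})")
```

Here the full-test alpha is 0.60; dropping the noisy third item raises it, while dropping either of the first two lowers it.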
Distractor Analysis
Used to determine which distractors students find attractive
  Consider the proportion of (total) students choosing each option, or
  Compare the number of examinees selecting each option in the High and Low groups
Example: Proportion of total examinees selecting each option
          a*    b     c     d
  Total   .78   .11   .03   .08
Selecting upper and lower groups
Upper and Lower groups are needed:
  to hand-compute D-values, and
  for distractor analysis when comparing numbers of examinees
To select Upper and Lower groups:
  arrange the tests by total score
  separate out the tests for each group
  Upper group: top half (50%) or top 33%
  Lower group: bottom half (50%) or bottom 33%
Example 1: distractor analysis
1. The capital of Switzerland is
  a) Bern.
  b) Zurich.
  c) Lucerne.
  d) Geneva.
Numbers in the High and Low groups who selected each option
          a*   b    c    d
  Upper   13   0    1    1
  Lower   1    3    2    9
Example 2: distractor analysis
2. The most important part of test planning is creating:
  a) sound instruction.
  b) a test blueprint.
  c) an item analysis plan.
  d) the grading curve.
Numbers in the High and Low groups who selected each option
          a    b*   c    d
  Upper   1    8    1    0
  Lower   2    8    0    0
Example 3: distractor analysis
3. Which type of essay item contains the most explicit instructions to students?
  a) extended response
  b) fixed response
  c) explicit response
  d) restricted response
Numbers in the High and Low groups who selected each option
          a    b    c*   d
  Upper   3    1    2    14
  Lower   4    1    7    8
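The three examples can be summarized with a short Python sketch that computes the D-value of each item from its Upper and Lower counts and flags any option that the upper group prefers over the keyed answer. The function names and the flagging rule are assumptions for illustration; a flagged option is a prompt to review the key, not proof of a miskey.

```python
# (keyed option, Upper-group counts, Lower-group counts) copied from the three examples.
examples = {
    1: ("a", {"a": 13, "b": 0, "c": 1, "d": 1}, {"a": 1, "b": 3, "c": 2, "d": 9}),
    2: ("b", {"a": 1, "b": 8, "c": 1, "d": 0}, {"a": 2, "b": 8, "c": 0, "d": 0}),
    3: ("c", {"a": 3, "b": 1, "c": 2, "d": 14}, {"a": 4, "b": 1, "c": 7, "d": 8}),
}

def d_value(keyed, upper, lower):
    """Proportion correct in the upper group minus proportion correct in the lower group."""
    return upper[keyed] / sum(upper.values()) - lower[keyed] / sum(lower.values())

def possible_miskey(keyed, upper):
    """Return the upper group's most popular option if it is not the key, else None."""
    favorite = max(upper, key=upper.get)
    return favorite if favorite != keyed else None

for number, (keyed, upper, lower) in examples.items():
    d = d_value(keyed, upper, lower)
    print(f"Example {number}: D = {d:.2f}, review option: {possible_miskey(keyed, upper)}")
```

Example 1 discriminates well, Example 2 does not discriminate at all, and Example 3 discriminates negatively, with most of the upper group choosing option d instead of the keyed option c.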
Two-Parameter Logistic Model
$P(X_{ij} = 1 \mid \theta_j, \alpha_i, \beta_i) = \frac{\exp[\alpha_i(\theta_j - \beta_i)]}{1 + \exp[\alpha_i(\theta_j - \beta_i)]}$
Alpha represents item discrimination
  The value is positive
Beta represents item difficulty, with a mean of 0 and an SD of 1
  Items with negative values = easy items
  Items with positive values = hard items
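A small Python sketch of the 2PL response function may help in reading the parameters. It assumes the standard logistic form, with theta as the examinee's ability; the parameter values below are hypothetical and do not come from the field test.

```python
import math

def p_correct(theta, alpha, beta):
    """2PL probability of a correct response.

    theta: examinee ability, alpha: item discrimination, beta: item difficulty.
    """
    z = alpha * (theta - beta)
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical items: same discrimination, one easy (beta < 0) and one hard (beta > 0).
easy = p_correct(theta=0.0, alpha=1.2, beta=-1.0)
hard = p_correct(theta=0.0, alpha=1.2, beta=1.0)
at_difficulty = p_correct(theta=0.5, alpha=1.2, beta=0.5)

print(round(easy, 3), round(hard, 3), at_difficulty)
```

When ability equals the item difficulty, the probability of a correct response is exactly 0.5; a larger alpha makes the probability change more steeply around that point.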
Differential Item Functioning
A major concern in using psychological measures is that these measures may “work differently” for, or be biased “for or against”, a particular group of examinees (e.g., by gender or ethnicity)
When a test item unfairly favors one group over another, it can be said to show differential item functioning or DIF
Mantel Haenszel chi-square
              0       1       Total
  Reference   B_t     A_t     N_Rt
  Focal       D_t     C_t     N_Ft
  Total       M_0t    M_1t    T_t
$MH = \frac{\sum_{t=1}^{L} A_t D_t / T_t}{\sum_{t=1}^{L} B_t C_t / T_t}$
subscript t = individual raw score
Mantel Haenszel chi-square
Controlling for the observed score, we want to see if the proportion correct for the focal group is equal to that for the reference group on an item
The MH statistic is computed from a series of 2x2 contingency tables
  MH = 1: No DIF
  MH < 1: DIF favoring the focal group (dummy=0) if p < .05
  MH > 1: DIF favoring the reference group (dummy=1) if p < .05
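The MH statistic defined from the 2x2 tables can be sketched in Python as below. Each stratum is one raw-score level t, given as the counts (A, B, C, D) = (reference correct, reference incorrect, focal correct, focal incorrect). The strata are hypothetical, and the sketch computes only the ratio; it does not compute the chi-square test or its p-value.

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio.

    strata: iterable of (A, B, C, D) counts, one tuple per raw-score level t.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d            # T_t: examinees at this score level
        if t == 0:
            continue                 # skip empty score levels
        numerator += a * d / t
        denominator += b * c / t
    return numerator / denominator

# Hypothetical strata: both groups perform identically at every score level.
no_dif = [(10, 5, 10, 5), (8, 2, 8, 2)]
# Hypothetical strata: the reference group does better at every score level.
favors_reference = [(10, 5, 5, 10), (8, 2, 6, 4)]

print(mh_odds_ratio(no_dif), round(mh_odds_ratio(favors_reference), 2))
```

A value near 1 suggests no DIF, while a value above 1 points toward the reference group, in line with the interpretation rules above.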
Test Statistics for Three Subjects
  STATISTIC   Anatomy   Precalculus   Phy-Sci
  N           269       210           183
  USED_N      269       210           183
  MEAN        12.364    9.748         12.852
  STD         3.337     2.978         4.141
  MIN         4         2             4
  Q1          10        8             10
  MEDIAN      12        10            13
  Q3          15        11            16
  MAX         21        20            25
  SKEWNESS    -0.102    0.378         0.088
  KURTOSIS    -0.447    0.679         -0.658
  ALPHA       0.533     0.506         0.626
  SEM         2.281     2.093         2.531
Item difficulty
Physical Science (31 items)
  0-0.10 (1 item): 22
  0.11-0.20 (2 items): 11, 28
  0.21-0.30 (5 items): 16, 27, 9, 6, 20
  0.31-0.70 (22 items): 30, 18, 12, 25, 31, 15, 2, 19, 24, 13, 26, 29, 21, 10, 23, 8, 7, 4, 3, 17, 5, 14
  0.71-0.80 (1 item): 1
  0.81-0.90 (0 items)
  0.91-1.00 (0 items)
Item difficulty
Anatomy (27 items)
  0-0.1 (0 items)
  0.1-0.2 (2 items): 13, 2
  0.2-0.3 (6 items): 16, 8, 27, 3, 10, 20
  0.3-0.7 (14 items): 17, 4, 9, 14, 11, 5, 18, 26, 25, 7, 15, 22, 21, 24
  0.7-0.8 (3 items): 12, 19, 1
  0.8-0.9 (2 items): 23, 6
  0.9-1.0 (0 items)
Item difficulty
Pre-calculus (21 items)
  0-0.1 (0 items)
  0.1-0.2 (2 items): 19, 3
  0.2-0.3 (2 items): 14, 1
  0.3-0.7 (14 items): 21, 10, 11, 18, 12, 20, 8, 16, 17, 15, 14, 6, 13, 2
  0.7-0.8 (3 items): 7, 5, 9
  0.8-0.9 (0 items)
  0.9-1.0 (0 items)
Item Discrimination (Corrected point-biserial correlation)
Physical Science (31 items)
  Negative value (6 items): 11, 22, 20, 12, 31, 1
  0-0.10 (2 items): 21, 5
  0.10-0.20 (8 items): 23, 19, 6, 28, 25, 18, 10, 16
  0.20-0.30 (6 items): 8, 3, 15, 2, 30, 27
  Above 0.30 (9 items): 13, 7, 17, 24, 29, 9, 14, 26, 4
Anatomy (27 items)
  Negative value (3 items): 3, 17, 13
  0-0.10 (6 items): 16, 9, 20, 10, 27, 2
  0.11-0.20 (9 items): 15, 4, 11, 26, 18, 12, 5, 14, 8
  0.21-0.30 (9 items): 7, 1, 25, 22, 23, 21, 24, 6, 19
  Above 0.30 (0 items)
Pre-calculus (21 items)
  Negative value (1 item): 19
  0-0.10 (2 items): 11, 3
  0.10-0.20 (13 items): 9, 10, 2, 18, 8, 5, 17, 21, 1, 13, 14, 4, 12
  0.20-0.30 (5 items): 16, 15, 7, 6, 20
  Above 0.30 (0 items)
Item Discrimination (Two-Group Approach)
Physical Science (31 items)
  Negative value (2 items): 11, 22
  0-0.10 (4 items): 20, 1, 12, 31
  0.11-0.20 (8 items): 28, 21, 23, 5, 19, 25, 6, 16
  0.21-0.30 (6 items): 10, 3, 8, 27, 18, 9
  Above 0.30 (11 items): 15, 30, 2, 17, 13, 24, 14, 7, 29, 26, 4
Anatomy (27 items)
  Negative value (0 items)
  0-0.10 (5 items): 13, 3, 16, 17, 2
  0.11-0.20 (7 items): 27, 9, 10, 20, 12, 6, 26
  0.21-0.30 (10 items): 8, 4, 11, 1, 15, 18, 23, 15, 5, 7
  Above 0.30 (5 items): 25, 24, 19, 22, 21
Pre-calculus (21 items)
  Negative value (0 items)
  0-0.10 (3 items): 19, 3, 18
  0.10-0.20 (7 items): 21, 14, 9, 17, 10, 2, 11
  0.20-0.30 (7 items): 1, 5, 12, 13, 8, 7, 16
  Above 0.30 (4 items): 15, 20, 4, 6
Alpha Difference (Alpha and Alpha When Deleted)
Physical Science (31 items)
  Negative value (8 items): 11, 20, 12, 31, 22, 1, 21, 5
  0-0.005 (6 items): 23, 19, 6, 28, 25, 18
  0.006-0.01 (3 items): 10, 16, 8
  Above 0.01 (14 items): 3, 15, 2, 27, 30, 13, 7, 17, 24, 9, 29, 14, 26, 4
Anatomy (27 items)
  Negative value (7 items): 3, 17, 9, 16, 13, 20, 10
  0-0.005 (3 items): 27, 2, 15
  0.005-0.01 (4 items): 4, 11, 26, 18
  Above 0.01 (13 items): 12, 5, 14, 8, 1, 7, 25, 23, 6, 22, 21, 19, 24
Pre-calculus (21 items)
  Negative value (2 items): 19, 11
  0-0.005 (3 items): 3, 9, 10
  0.005-0.01 (5 items): 2, 8, 18, 5, 17
  Above 0.01 (11 items): 21, 1, 13, 14, 4, 12, 16, 7, 15, 6, 20
Item Analysis Summary
A test with reliability (alpha) below .5 is a cause for concern
Items that are too hard (e.g., p-value < 0.1 or 0.2) and/or too easy (e.g., p-value close to 1) may need to be revisited
Items with a negative discrimination value warrant revisiting, especially under the two-group approach
Items with a negative alpha difference (original test alpha minus test alpha when deleted) are not good, either
Precalculus: Item 19
Table of groupB by r19 (cell contents: Frequency, with Row Pct in parentheses)
                A            B            C*           D           Total
  LOWER GROUP   14 (27.45)   21 (41.18)   11 (21.57)   5 (9.80)    51
  UPPER GROUP   47 (41.23)   32 (28.07)   27 (23.68)   8 (7.02)    114
  Total         61           53           38           13          165
Frequency Missing = 76
Precalculus: Item 3
The item is a hard item (p = 0.162)
Table of groupB by r3 (cell contents: Frequency, with Row Pct in parentheses)
                A            B            C*           D           Total
  LOWER GROUP   20 (32.26)   29 (46.77)   9 (14.52)    4 (6.45)    62
  UPPER GROUP   40 (30.30)   56 (42.42)   30 (22.73)   6 (4.55)    132
  Total         60           85           39           10          194
Frequency Missing = 47
Precalculus: Item 1
The item is a hard item (p = 0.253)
Table of groupB by r1 (cell contents: Frequency, with Row Pct in parentheses)
                A            B            C           D*           Total
  LOWER GROUP   14 (24.14)   27 (46.55)   5 (8.62)    12 (20.69)   58
  UPPER GROUP   23 (17.83)   49 (37.98)   8 (6.20)    49 (37.98)   129
  Total         37           76           13          61           187
Frequency Missing = 54
Precalculus: Item 14
Table of groupB by r14 (cell contents: Frequency, with Row Pct in parentheses)
                -          A          B          C            D*           Total
  LOWER GROUP   5 (5.56)   8 (8.89)   3 (3.33)   57 (63.33)   17 (18.89)   90
  UPPER GROUP   5 (4.55)   4 (3.64)   4 (3.64)   59 (53.64)   38 (34.55)   110
  Total         10         12         7          116          55           200
Frequency Missing = 10
The item is challenging (p = 0.26)
Option C may be the correct key
Or students may have a misconception about this item
Precalculus: Good Item
The item is challenging (p = 0.266)
Discriminating well
Table of groupB by r21 (cell contents: Frequency, with Row Pct in parentheses)
                -            A            B*           C            D            Total
  LOWER GROUP   33 (33.33)   18 (18.18)   23 (23.23)   10 (10.10)   15 (15.15)   99
  UPPER GROUP   23 (20.72)   16 (14.41)   43 (38.74)   18 (16.22)   11 (9.91)    111
  Total         56           34           66           28           26           210
Precalculus: Good Item
The item is challenging (p = 0.31)
Discriminating well
However, this item shows DIF and favors girls
Summary for Precalculus
Some items need to be revisited: Items 19, 3, 1, and 14
Develop some easy items (p = .70-.90)
Two DIF items: Items 4 and 21
Anatomy: Hard Item
The item is a hard item (p = 0.271)
Not discriminating well
Table of groupB by r3 (cell contents: Frequency, with Row Pct in parentheses)
                A            B            C*           D            Total
  LOWER GROUP   20 (18.52)   24 (22.22)   25 (23.15)   39 (36.11)   108
  UPPER GROUP   7 (4.38)     28 (17.50)   48 (30.00)   77 (48.13)   160
  Total         27           52           73           116          268
Frequency Missing = 1
Anatomy: Potential Miskey
The item may be miskeyed as Option D
The possible correct key is Option A (the majority of the upper group chose this option)
Table of groupB by r16 (cell contents: Frequency, with Row Pct in parentheses)
                A            B            C           D*           Total
  LOWER GROUP   52 (48.15)   27 (25.00)   10 (9.26)   19 (17.59)   108
  UPPER GROUP   92 (57.50)   16 (10.00)   12 (7.50)   40 (25.00)   160
  Total         144          43           22          59           268
Frequency Missing = 1
Or students may have a misconception about this item
Anatomy: Good Item
Table of groupB by r25 (cell contents: Frequency, with Row Pct in parentheses)
                A            B            C            D*           Total
  LOWER GROUP   13 (11.93)   30 (27.52)   32 (29.36)   34 (31.19)   109
  UPPER GROUP   8 (5.03)     22 (13.84)   31 (19.50)   98 (61.64)   159
  Total         21           52           63           132          268
Frequency Missing = 1
The item has a moderate difficulty level (p = 0.491)
Discriminating well
Summary for Anatomy
The p-values of the items look good, with half of the items being moderately difficult, almost one quarter being easy, and almost one quarter being difficult
No negative discrimination items using the two-group approach (a good sign)
The test alpha is low (0.533)
DIF: Items 14, 19 (favoring boys) and Items 15, 22, 26 (favoring girls)
Physical Science: Item too hard
The item is a hard item (p = 0.164)
Table of groupB by r28 (cell contents: Frequency, with Row Pct in parentheses)
                -            A            B            C            D*           Total
  LOWER GROUP   2 (2.33)     19 (22.09)   37 (43.02)   19 (22.09)   9 (10.47)    86
  UPPER GROUP   20 (20.62)   4 (4.12)     44 (45.36)   8 (8.25)     21 (21.65)   97
  Total         22           23           81           27           30           183
Physical Science: Potential Miskey
The item may be miskeyed as Option C
The possible correct key is Option A (the majority of the upper group chose this option)
Table of groupB by r11 (cell contents: Frequency, with Row Pct in parentheses)
                A            B            C*           D          Total
  LOWER GROUP   35 (40.70)   26 (30.23)   17 (19.77)   8 (9.30)   86
  UPPER GROUP   66 (68.04)   11 (11.34)   11 (11.34)   9 (9.28)   97
  Total         101          37           28           17         183
Physical Science: Good Item
The item has a moderate difficulty level (p = 0.491)
Discriminating well
Table of groupB by r27 (cell contents: Frequency, with Row Pct in parentheses)
                A            B            C            D*           Total
  LOWER GROUP   13 (15.12)   30 (34.88)   34 (39.53)   9 (10.47)    86
  UPPER GROUP   9 (9.28)     19 (19.59)   33 (34.02)   36 (37.11)   97
  Total         22           49           67           45           183