Bianca Klettke & Susie Macfarlane
Deakin University CRICOS Provider Code: 00113B
Improving the fairness and quality of MCQ assessment
Deakin University Teaching and Learning Conference, November 14, 2018
Why is it important to think about the quality of our MCQs?
1. MCQs are widely used in summative assessment
MCQs as % of unit assessment, Course 1: average 37.5%
MCQs as % of unit assessment, Course 2: average 42.5%
2. MCQs are, on average, flawed
Study:
• 2,770 items from 2001–2005
• Evaluated against 19 item-writing flaws
• Evaluated for cognitive level
Tarrant, M, Knierim, A, Hayes, S, Ware, J (2006) The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments, Nurse Education in Practice, 6(6):354-363.
Results:
• 46% of items violated item-writing guidelines
• 90% were written at lower cognitive levels
3. Poorly constructed items disadvantage some students
Study: 4 exams, 1st and 2nd year science students
Downing SM (2005) The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract, 10(2): 133–143.
Findings: 646 students (53%) passed the standard items; 575 (47%) passed the flawed items
Therefore, the flawed items are measuring something other than students' knowledge
4. There is an evidence base for how to write items and tests that are fair(er)
Good practice guidelines
Capability building
Main guidelines
1. Write a stem that can be answered without seeing the options
2. Avoid NOTA, AOTA
3. Avoid negative stems
4. Avoid overlapping options
5. Create options that are authentically plausible
Example question I
Item analysis data
Purpose of summative MCQs
To distinguish between students who know the material and those who don't
MCQ review and rewrite
MCQ quality enhancement process:
• Professional development
• Review: review items according to the guidelines and item analysis (IA) reports
• Rewrite: rewrite items according to the guidelines
• Assess: implement exams
• Evaluate: evaluate MCQ quality
Timeline: T2 2016; Christmas/New Year 2016; T1 and T2 2017; 2018
Reviewing items
Example Question IV (Question 2, before review)
Which of the following types of X are related to Y?
A. Answer
B. Answer
C. Answer
D. All of the above (A, B and C)
E. Both B and C, but not A
Item analysis (weight 1.0 = keyed answer):
Option     Difficulty   SD       Discrimination
Overall    0.8571       0.378    -0.1369
A (0.0)    0            0        NaN
B (0.0)    0            0        NaN
C (0.0)    0            0        NaN
D (1.0)    0.8571       0.378    -0.14
E (0.0)    0.1429       0.378    -0.0126
Example Question IV, revised (Question 2, after review)
One of the barriers to X for victims of Y is:
A. Answer
B. Answer
C. Answer
D. Answer
Item analysis (weight 1.0 = keyed answer):
Option     Difficulty   SD       Discrimination
Overall    0.3857       0.4885   0.4044
A (0.0)    0.3857       0.4885   -0.3425
B (0.0)    0.0857       0.2809   -0.4141
C (1.0)    0.3857       0.4885   0.4044
D (0.0)    0.1429       0.3512   -0.0722
Outcomes
Overall analysis: item discrimination before and after review (% of items)
Discrimination band            Pre review (T2)   Post review (T3)
Good (> 0.3)                   38                40
Moderate (0.1–0.3)             7                 47
Not discriminating (< 0.1)     55                13
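The discrimination bands in this chart (Good > 0.3, Moderate 0.1 to 0.3, Not discriminating < 0.1) can be applied mechanically to an item analysis report. Below is a minimal Python sketch, assuming one discrimination value per item is already available and that values of exactly 0.3 or 0.1 fall in the Moderate band (the legend does not settle the boundaries). The function names are hypothetical.

```python
from collections import Counter

BANDS = ("Good", "Moderate", "Not discriminating")


def classify_discrimination(d: float) -> str:
    """Band an item using the chart's thresholds: > 0.3, 0.1 to 0.3, < 0.1.

    Assumption: boundary values (exactly 0.3 or 0.1) count as Moderate.
    """
    if d > 0.3:
        return "Good"
    if d >= 0.1:
        return "Moderate"
    return "Not discriminating"


def band_percentages(discriminations: list[float]) -> dict[str, float]:
    """Percentage of items falling in each discrimination band."""
    if not discriminations:
        raise ValueError("no items to classify")
    counts = Counter(classify_discrimination(d) for d in discriminations)
    n = len(discriminations)
    return {band: 100 * counts[band] / n for band in BANDS}


# Keyed-answer discrimination for Example Question IV, before and after revision
print(classify_discrimination(-0.1369))  # Not discriminating
print(classify_discrimination(0.4044))   # Good
```

Applied to every item in an exam, `band_percentages` gives the kind of pre/post breakdown shown in the chart.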
[Chart: % flawed items before and after review, by flaw type (unfocused stem/disparate options, negative stem, AOTA/NOTA, overlapping options, options include other options, cover test); series: pre review/rewrite, post review/rewrite]
[Chart: flaws per item, pre vs post review (y-axis 0 to 3)]
Flaw type
1. Unfocussed stem and disparate options
2. Fails the cover-the-options test (stem cannot be answered without seeing the options)
3. Negative stem
4. NOTA, AOTA
5. Overlapping options
6. Options that include other options
7. Information repeated in the options that should be in the stem
8. Key is longer than the distractors
9. Stem includes terms that appear only in the key
10. Stem includes redundant information not required to answer the question
11. Spelling errors
12. Grammar that fits the key but not the distractors
13. Other flaws
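The "flaws per item" and "% flawed items" figures can be tallied from a review checklist. A minimal Python sketch, assuming each reviewed item is recorded as a set of flaw-type numbers from the list above; the data layout, item IDs and function name are hypothetical.

```python
from collections import Counter


def flaw_summary(review: dict[str, set[int]]) -> tuple[float, dict[int, float]]:
    """Return (mean flaws per item, % of items showing each flaw type).

    `review` maps a hypothetical item ID to the set of flaw-type numbers
    (1 to 13 in the flaw list) the reviewer recorded for that item.
    """
    if not review:
        raise ValueError("no reviewed items")
    n_items = len(review)
    flaws_per_item = sum(len(flaws) for flaws in review.values()) / n_items
    # Each flaw type is counted at most once per item, because flaws are stored as sets
    counts = Counter(flaw for flaws in review.values() for flaw in flaws)
    pct_by_type = {flaw: 100 * count / n_items for flaw, count in sorted(counts.items())}
    return flaws_per_item, pct_by_type


# Hypothetical review of four items
review = {"Q1": {1, 4}, "Q2": set(), "Q3": {3}, "Q4": {1}}
per_item, by_type = flaw_summary(review)
print(per_item)  # 1.0
print(by_type)   # {1: 50.0, 3: 25.0, 4: 25.0}
```

Running the same summary on the pre-review and post-review versions of an exam gives a before/after comparison.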
Outcomes
Fairer assessment
Exam assesses what it was intended to assess
Exam is now too easy
Next steps
Develop more challenging and higher cognitive level items
Raise awareness among colleagues (please contact us if interested)
Discussion
Questions?
Comments?
References
Azer, S (2003) Assessment in a problem-based learning course: Twelve tips for constructing multiple choice questions that test students' cognitive skills, Biochemistry and Molecular Biology Education, https://iubmb.onlinelibrary.wiley.com/doi/full/10.1002/bmb.2003.494031060288
Case S, Swanson D (2002) Constructing written test questions for basic and clinical sciences. 3rd ed. Philadelphia: National Board of Medical Examiners. https://www.nbme.org/pdf/itemwriting_2003/2003iwgwhole.pdf
Collins, J (2006) Writing Multiple-Choice Questions for Continuing Medical Education Activities and Self-Assessment Modules, Radiographics, 26(2):543-51
DiBattista, D, Sinnige-Egger, J, & Fortuna, G (2014) The “None of the Above” Option in Multiple-Choice Testing: An Experimental Study, The Journal of Experimental Education, 82(2): 168-183, DOI: 10.1080/00220973.2013.795127
Downing SM (2005) The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract, 10(2): 133–143.
Ellsworth RA, Dunnell P, Duell OK (1990) Multiple-choice test items: what are textbook authors telling teachers? Journal of Educational Research, 83(5): 289–93.
Haladyna TM, Downing SM, Rodriguez MC. (2002) A review of multiple-choice item-writing guidelines. Applied Measurement in Education, 15(3): 309-334.
Hansen JD (1997) Quality multiple-choice test questions: Item writing guidelines and an analysis of auditing test banks. J Educ Business, 73(2): 94–97.
Rodriguez, MC (2005) Three Options Are Optimal for Multiple-Choice Items: A Meta-Analysis of 80 Years of Research, Educational Measurement: Issues and Practice
Stagnaro-Green, AS, Downing, SM (2006) Use of flawed multiple-choice items by the New England Journal of Medicine for continuing medical education, Medical Teacher, 28(6): 566-8
Tarrant, M, Knierim, A, Hayes, S, Ware, J (2006) The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments, Nurse Education in Practice, 6(6):354-363 DOI: http://dx.doi.org/10.1016/j.nepr.2006.07.002
Technical guidelines http://www.tpcb.org/ptoe/TechnicalGuidelines.pdf
Image attribution
Free handcuffs by lechenie-narkomanii, CC0 Creative Commons, link
Sport train referee by 3dman_eu, CC0 Creative Commons, link
MCQ sheet by F1Digitals, CC0 Creative Commons, link
Arrow by QuinceMedia, link
Cogs by F1Digitals, CC0 Creative Commons, link
Considerations in using MCQs…
Considerations - Quality
1. Quality assurance: Do we know how valid and reliable our MCQ exams are? How do we currently assure exam quality?
2. The efficiency myth: The time required to write quality tests may not be reflected in assessment design decisions.
Tasks include the time to:
• develop item analysis skills
• develop MCQ writing skills, and
• review, rewrite and write new items (for example, Deakin policy requires EP1 and EP2 exams to be 75% different)
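One way to check the "75% different" requirement is to compare item identifiers across the two exams. A minimal Python sketch, assuming items are tracked by a stable ID and that "75% different" means at least 75% of EP2 items do not appear in EP1; the policy's exact definition may differ, and the function names are hypothetical.

```python
def percent_new(ep1_items: set[str], ep2_items: set[str]) -> float:
    """Percentage of EP2 items that were not used in EP1."""
    if not ep2_items:
        raise ValueError("EP2 contains no items")
    return 100 * len(ep2_items - ep1_items) / len(ep2_items)


def meets_difference_policy(ep1_items: set[str], ep2_items: set[str],
                            threshold: float = 75.0) -> bool:
    """True if EP2 is at least `threshold` percent different (assumed definition)."""
    return percent_new(ep1_items, ep2_items) >= threshold


# Hypothetical item IDs
ep1 = {"Q1", "Q2", "Q3", "Q4"}
ep2 = {"Q1", "Q5", "Q6", "Q7"}
print(percent_new(ep1, ep2))              # 75.0
print(meets_difference_policy(ep1, ep2))  # True
```

A rewritten item would need a new ID under this scheme; whether a substantially revised item counts as "different" is a policy question the sketch does not answer.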
Considerations - Learning
3. Standards: Do our current MCQs assess higher-level (e.g. postgraduate) learning outcomes?
4. Backwash effect: What learning practices do our MCQs encourage?
Are these the agentic and lifelong learning strategies our graduates require?
Exam blueprinting
Rules for the Stem
1. The stem is complete and can be answered without seeing the options
2. The stem is clear and specific
3. Include material in the stem that would otherwise be repeated in the options
4. Avoid negatives; state the stem in the positive form
Rules for the Options
1. Options do not overlap
2. Options are short and approximately equal in length
3. Avoid absolutes such as never, always and all
4. Avoid vague frequency terms such as rarely and usually
5. Avoid AOTA and NOTA, or combinations such as "both A and B"
6. Present options in logical order (chronological or numerical)
7. Grammar is consistent between the stem and the options
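Some of the option rules above can be screened automatically before a human review. A minimal Python sketch, assuming options are available as plain strings; the trigger words come from the rules, but matching them with regular expressions is a hypothetical heuristic that will produce false positives (for example, "all" in legitimate content), so every hit still needs a reviewer's judgment.

```python
import re

AOTA_NOTA = ("all of the above", "none of the above")
ABSOLUTES = ("never", "always", "all")
VAGUE_FREQUENCY = ("rarely", "usually")
COMBINED = re.compile(r"\bboth [a-h] and [a-h]\b")  # e.g. "Both B and C"


def has_word(text: str, word: str) -> bool:
    """True if `word` occurs as a whole word in `text` (case-insensitive)."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None


def lint_options(options: list[str]) -> list[str]:
    """Return warnings for options that may break the option rules."""
    warnings = []
    for label, text in zip("ABCDEFGH", options):
        lowered = text.lower()
        if any(phrase in lowered for phrase in AOTA_NOTA):
            warnings.append(f"{label}: AOTA/NOTA option")
            continue  # skip word checks, otherwise "all" would be flagged again
        if COMBINED.search(lowered):
            warnings.append(f"{label}: combined option (e.g. 'both A and B')")
        for word in ABSOLUTES:
            if has_word(text, word):
                warnings.append(f"{label}: absolute term '{word}'")
        for word in VAGUE_FREQUENCY:
            if has_word(text, word):
                warnings.append(f"{label}: vague frequency term '{word}'")
    return warnings


# Options from Example Question IV before review
options = ["Answer", "Answer", "Answer",
           "All of the above (A, B and C)", "Both B and C, but not A"]
for warning in lint_options(options):
    print(warning)
# D: AOTA/NOTA option
# E: combined option (e.g. 'both A and B')
```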
Rules for the Answer
1. Only one correct answer is included
2. The position of the correct answer varies
3. Avoid a correct answer that includes the elements most common in the other options
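Rule 2 for the answer can be checked across a whole test by tallying key positions. A minimal Python sketch, assuming the answer key is available as a list of option letters (the data and function names are hypothetical). A strong skew toward one position is a prompt for review rather than proof of a problem, since presenting options in a logical order also determines where each key lands.

```python
from collections import Counter


def key_position_counts(keys: list[str], n_options: int = 4) -> dict[str, int]:
    """Count how often the keyed answer sits in each option position."""
    labels = "ABCDEFGH"[:n_options]
    counts = Counter(keys)
    return {label: counts[label] for label in labels}


def largest_key_share(keys: list[str]) -> float:
    """Fraction of items whose key is in the single most-used position."""
    if not keys:
        raise ValueError("no keys supplied")
    return Counter(keys).most_common(1)[0][1] / len(keys)


keys = ["A", "C", "C", "D", "B", "C"]  # hypothetical answer key
print(key_position_counts(keys))  # {'A': 1, 'B': 1, 'C': 3, 'D': 1}
print(largest_key_share(keys))    # 0.5
```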
Rules for the Distractors
1. All distractors are plausible
2. Common student misunderstandings are incorporated in the distractors
Main guidelines
1. Avoid unfocused stems
2. Avoid NOTA, AOTA
3. Avoid negative stems
4. Avoid overlapping options
5. Create options that are authentically plausible
Advantages
MCQs
Tests examinees' knowledge across a broader range of content than other methods
Efficient to administer and mark
Disadvantages
Requires skill to write high quality MCQs and construct fair tests
Time intensive
Can test what students remember rather than their ability to engage in higher-level cognitive processing
False sense of precision
(Walsh & Seldomridge, 2006; DiBattista, 2014)
Item analysis
Difficulty Factor is the proportion of students who answer the question correctly, so a higher value means an easier question.
A good test includes questions at varied levels of difficulty, giving students with both weak and strong understanding an opportunity to demonstrate their knowledge.
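The difficulty factor is simply the mean of the 0/1 item scores. A minimal Python sketch, assuming each response is scored 1 (correct) or 0 (incorrect); the function names are hypothetical. With seven hypothetical responses, six of them correct, it reproduces the 0.8571 difficulty and 0.378 SD in the "Overall" row of the Example Question IV (before) analysis. That match is consistent with, but does not confirm, the report using a sample (n - 1) standard deviation.

```python
from statistics import stdev


def item_difficulty(scores: list[int]) -> float:
    """Difficulty factor: proportion of students scoring 1 (correct) on the item."""
    if not scores:
        raise ValueError("no responses")
    return sum(scores) / len(scores)


def item_sd(scores: list[int]) -> float:
    """Sample standard deviation (n - 1 denominator) of the 0/1 item scores."""
    return stdev(scores)


# Hypothetical: 7 students, 6 answer correctly
scores = [1, 1, 1, 1, 1, 1, 0]
print(round(item_difficulty(scores), 4))  # 0.8571
print(round(item_sd(scores), 3))          # 0.378
```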
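Discrimination Index compares the top 25% of students (ranked by total score) with the bottom 25%: it is the proportion of the top group who answer the question correctly minus the proportion of the bottom group who do, so it shows how well each question distinguishes students with stronger and weaker understanding.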
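A minimal Python sketch of this upper-lower discrimination index, assuming 0/1 item scores and one total score per student; the function name is hypothetical, and ties in total score are broken by input order, which is an arbitrary choice that real reports may handle differently.

```python
def discrimination_index(item_scores: list[int], total_scores: list[float],
                         fraction: float = 0.25) -> float:
    """Upper-lower discrimination index: p(top group) - p(bottom group).

    Students are ranked by total score; the top and bottom `fraction`
    (25% by default, as in the text) form the comparison groups.
    """
    n = len(total_scores)
    if n == 0 or len(item_scores) != n:
        raise ValueError("need one item score per student")
    k = max(1, round(n * fraction))  # size of each comparison group
    ranked = sorted(range(n), key=lambda i: total_scores[i])  # lowest total first
    lower, upper = ranked[:k], ranked[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


totals = [10, 20, 30, 40, 50, 60, 70, 80]  # hypothetical test totals
item = [0, 0, 0, 1, 1, 0, 1, 1]            # hypothetical item scores
print(discrimination_index(item, totals))  # 1.0
```

Values near 1 mean the top group gets the item right and the bottom group does not; negative values mean weaker students outperform stronger ones on that item.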
Classical test theory: point-biserial correlation
Applied to an individual question, the discrimination index shows how students with higher and lower overall scores performed on that question. In general, we would expect the top 25% of students to perform well on all questions.
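The point-biserial correlation uses every student rather than only the top and bottom 25%: it is the Pearson correlation between the 0/1 item score and the total score. A minimal Python sketch using the standard formula; the function name and the `corrected` option (removing the item from each total, often called item-rest correlation) are illustrative choices, not necessarily what any particular item analysis report computes. The statistic is undefined when everyone or no one selects a response, which is presumably why the report shows NaN for options no student chose.

```python
from math import nan, sqrt
from statistics import mean, pstdev


def point_biserial(item_scores: list[int], total_scores: list[float],
                   corrected: bool = False) -> float:
    """Point-biserial correlation between a 0/1 item score and the total score.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean totals of
    students who got the item right and wrong, s is the population SD of the
    totals, and p = 1 - q is the item difficulty.
    """
    if not item_scores or len(item_scores) != len(total_scores):
        raise ValueError("need one item score per student")
    totals = ([t - x for x, t in zip(item_scores, total_scores)]
              if corrected else list(total_scores))
    p = mean(item_scores)
    s = pstdev(totals)
    if p in (0, 1) or s == 0:
        return nan  # undefined: everyone or no one answered correctly, or totals are constant
    m1 = mean(t for x, t in zip(item_scores, totals) if x == 1)
    m0 = mean(t for x, t in zip(item_scores, totals) if x == 0)
    return (m1 - m0) / s * sqrt(p * (1 - p))


print(round(point_biserial([0, 0, 1, 1], [1, 2, 3, 4]), 4))  # 0.8944
```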