Creating Innovative Tests: Applying Universal Design to Assessment Practices
Assessment Colloquium, November 30, 2007
Manju Banerjee, Ph.D.
Assistant Professor in Residence
Special Education
Just imagine: what if there were no tests, no assessment, no accountability as we know it?
Student perspective:
Teacher perspective:
Policy maker perspective:
“Opportunities borne of new technologies, desires borne of new understandings of learning ---- a new generation of assessment beckons. To realize the vision, we must reconceive how we think about assessment, from purposes and designs to production and delivery.” (Mislevy, Steinberg, & Almond, 1999, p. 6)
The States and Online Testing
Source: Education Week survey of state technology contacts, Technology Counts, 2004
Computer-based tests (CBT) are the “next frontier” in high stakes assessment (Thompson, Johnstone, & Thurlow, 2002)
Universal design (UD) is anchored in the belief that a design that works well for examinees with disabilities improves usability for all individuals (Center for an Accessible Society, 2006)
What is universal design? (Center for Universal Design, 1997)
What makes a test universally designed?
* Seven elements of a universally designed test (Thompson, Johnstone, & Thurlow, 2002)
Opportunity to create tests that support accessibility needs of diverse test takers -- Universal Design (UD)
What is the appeal of computer-based tests?
Maximum usability
Widest range of consumers
Without design adaptations
Minimize construct-irrelevant features
Include test-taking features
Disabilities, ELL, non-traditional age
Built in from the start
Examinee choice is “flexibility to access and express in the mode or methods that best suit the individual” (Hall, 2005, p. 2) (Russell, Goldberg, & O’Conner, 2003)
EXAMINEE CHOICE
Application of Universal Design to High Stakes Tests
Inform product development of high stakes tests
I. Objective of Study
Based on current research on features that support “examinee choice” in high stakes test design
Test taking tools
On-screen item display tools
Access tools
• Goldberg & Pedula, 2002
• Peak, 2005
• Lunz & Bergstrom, 1994
• Vispoel et al., 2000
• Bridgeman, Lennon, & Jackenthal, 2002
• Mazzeo & Harvey, 1988
• Pommerich, 2004
• Pommerich & Burden, 2004
• Mandinach et al., 2005
• Sireci, Li, & Scarpati, 2003
• Tindal & Fuchs, 2000
II. Background Information
Features of Examinee Choice
Construct neutral
Construct related
• Tindal, 1998
• CTB/McGraw-Hill, 2004
Test taking tools
On-screen item display
Access tools
II. Background Information (Cont.)
• UD increases accessibility for all examinees
• Accessibility is maximized when examinees have choice over features of test design
• Research on features of test design falls into three broad categories:
(1) Test taking tools (2) Item display (3) Access tools
• Some features are construct neutral/construct irrelevant; others are construct related (including test accommodations)
• Allowing examinees to choose features of test design based on individual preferences needs to be explored for a wide range of features including features that affect test construct
• UD suggests a framework, but research on the application of UD to high stakes CBTs is still emerging.
1. What are college students’ stated preferences for features and combinations of features of test design from among test taking tools, on-screen item display, and access tools for the Passage Comprehension section of the GRE?
2. Are stated preferences for features and combinations of features from among test-taking tools, on-screen item display, and access tools different among students with and without learning disabilities (LD), Attention Deficit Hyperactivity Disorder (ADHD), or both?
Research questions
III. Methodology and Procedures
Exploratory study - Survey design
Participants responded to an online survey instrument:
(1) Student background questionnaire
(2) Demonstration of selected features of CBT
(3) Opportunity for practice
(4) Two choice exercises
* Rank-ordered choice exercise
* Voluntary top feature choice exercise
Two pilot studies
Research Design, Instrumentation, Pilot Study
III. Methodology and Procedures (cont.)
Attribute 1: Test taking tools
• Highlighting
• Tagging
• Strike-out
• Change answer
Attribute 2: On-screen item display tools
• Font size
• Note pad
• Question reorder
Attribute 3: Access tools
• Self-voicing less 20 points
• 50% extra time less 20 points
• Self-voicing less 40 points
• 50% extra time less 40 points
• Self-voicing less 60 points
• 50% extra time less 60 points
• No selection
Instrumentation – Test Features
III. Methodology and Procedures (contd.)
Instrumentation
III. Methodology and Procedures (contd.)
Introduction http://www.education.uconn.edu/jamison/highstakestesting/intro.cfm
Highlighting feature
http://www.education.uconn.edu/jamison/highstakestesting/tool1video.cfm
Strike out feature
http://www.education.uconn.edu/jamison/highstakestesting/tool3video.cfm
Instrumentation
III. Methodology and Procedures (cont.)
Choice exercise 1
http://www.education.uconn.edu/jamison/highstakestesting/choice1.cfm
Choice exercise 2
http://www.education.uconn.edu/jamison/highstakestesting/choice2.cfm
Instrumentation- Creating the 1st choice exercise
III. Methodology and Procedures (contd.)
Given 4 x 3 x 7 (features) = 84 combinations
Select a unique group of 4 from 84 combinations: $\binom{84}{4}$
Attribute                       Range of occurrence of features
Test taking tools               176 - 196
On-screen item display tools    230 - 250
Access tools                    75 - 99
“No selection” feature          185
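The counting on this slide can be checked with a short sketch. It enumerates the 4 x 3 x 7 feature combinations from the three attributes listed earlier and counts how many distinct groups of 4 could be drawn from them. The helper `draw_choice_set` is hypothetical and uses simple random sampling only for illustration; the slides do not say how the study's groups of 4 were actually selected.

```python
import random
from itertools import product
from math import comb

# Feature lists as given on the "Instrumentation - Test Features" slide.
TEST_TAKING_TOOLS = ["Highlighting", "Tagging", "Strike-out", "Change answer"]
DISPLAY_TOOLS = ["Font size", "Note pad", "Question reorder"]
ACCESS_TOOLS = [
    f"{tool} less {points} points"
    for points in (20, 40, 60)
    for tool in ("Self-voicing", "50% extra time")
] + ["No selection"]

# Every combination takes one feature from each attribute.
PROFILES = list(product(TEST_TAKING_TOOLS, DISPLAY_TOOLS, ACCESS_TOOLS))

# Number of distinct unordered groups of 4 combinations.
N_GROUPS_OF_FOUR = comb(len(PROFILES), 4)


def draw_choice_set(rng, size=4):
    """Hypothetical helper: draw `size` distinct combinations at random."""
    return rng.sample(PROFILES, size)


if __name__ == "__main__":
    print(len(PROFILES), N_GROUPS_OF_FOUR)
    for profile in draw_choice_set(random.Random(0)):
        print(profile)
```

The size of the count shows why only a subset of the possible groups can be presented to participants.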
Research Question 1
Data Analysis
Rank-ordered choice exercise data
Voluntary top feature choice exercise data
Rank-ordered logit regression
Multinomial logit regression
III. Methodology and Procedures (contd.)
Research Question 2
Data Analysis – Rank-ordered logit regression
III. Methodology and Procedures (contd.)
Dependent variable - Ranks assigned to the combinations of features
Independent variables - Features and attributes
$\hat{Y} = \hat{\beta} X$
Utility/Preference [rank is proxy for preference]
Relative utility
* Non-significant = baseline
* Non-significant ≠ zero probability of selection
* One feature is dropped from each attribute for the model to be determinate
Data Analysis – Rank-ordered logit regression
Three models were estimated:
Model 1: Three attributes as independent variables
Model 2: Three attributes as independent variables, with the “no selection” feature omitted
Model 3: Features within each attribute as independent variables
III. Methodology and Procedures (contd.)
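As a rough illustration of how a rank-ordered logit treats ranks as a proxy for preference, the sketch below computes the log-probability of one observed ranking under the standard "exploded logit" form: the top-ranked combination is a logit choice among all alternatives, the second-ranked is a logit choice among those remaining, and so on. The function name and the example utilities are assumptions made for illustration; they are not estimates from this study.

```python
import math
from itertools import permutations


def ranking_log_likelihood(utilities_in_rank_order):
    """Log-probability of a ranking under the rank-ordered (exploded) logit.

    `utilities_in_rank_order` lists the utility of each alternative from
    most preferred to least preferred.
    """
    u = list(utilities_in_rank_order)
    log_lik = 0.0
    # The last remaining alternative is chosen with probability 1.
    for i in range(len(u) - 1):
        remaining = u[i:]
        peak = max(remaining)  # subtract the max for numerical stability
        log_denominator = peak + math.log(
            sum(math.exp(v - peak) for v in remaining)
        )
        log_lik += u[i] - log_denominator
    return log_lik


if __name__ == "__main__":
    # Hypothetical utilities for four combinations, listed in ranked order.
    example = [0.5, 0.1, 0.0, -0.2]
    print(math.exp(ranking_log_likelihood(example)))
```

With equal utilities every ranking of four combinations is equally likely (probability 1/24). In estimation, each utility would be a linear function of feature indicators, with one feature per attribute dropped as the baseline.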
Data Analysis – Multinomial logit regression
Dependent variable –Top pick feature within an attribute
Independent variables – Demographic characteristics
III. Methodology and Procedures (contd.)
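A minimal sketch of the multinomial logit form may help here: the probability that a participant picks a given feature as the top choice within an attribute is the exponentiated linear score for that feature divided by the sum over all features in the attribute, with one feature's coefficients fixed at zero as the reference. The coefficient values and the two-element predictor vector (intercept, LD/ADHD indicator) below are hypothetical and chosen only to show the mechanics.

```python
import math


def multinomial_logit_probs(x, coefs_by_feature):
    """Return P(top pick = feature | x) for each feature in one attribute.

    `x` is a list of predictor values (e.g. demographic indicators) and
    `coefs_by_feature` maps each feature to a coefficient list of the same
    length. The reference feature carries all-zero coefficients.
    """
    scores = {
        feature: sum(b * xi for b, xi in zip(beta, x))
        for feature, beta in coefs_by_feature.items()
    }
    peak = max(scores.values())  # subtract the max for numerical stability
    weights = {f: math.exp(s - peak) for f, s in scores.items()}
    total = sum(weights.values())
    return {f: w / total for f, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical coefficients: [intercept, LD/ADHD indicator].
    coefs = {
        "Font size": [0.0, 0.0],  # reference feature
        "Note pad": [0.6, -0.2],
        "Question reorder": [0.3, 0.1],
    }
    print(multinomial_logit_probs([1.0, 1.0], coefs))
```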
Sample Demographics
[Bar chart: number of participants (axis 0 to 200) by profile: All participants, LD/ADHD, No LD/ADHD, Grad, Undergrad, High GPA, Low GPA, No disability, Disability, Prior experience, No prior experience, Male, Female]
IV. Results – Participant demographics
Demographic characteristics   Test-taking tools (SE)   On-screen item display tools (SE)   Access tools (SE)
All participants              .03(.06)                 .04(.09)                            .04(.03)
No LD/ADHD                    .03(.07)                 -.03(.09)                           .06(.03)**
LD/ADHD                       -.06(.17)                .51(.24)**                          -.12(.08)*
Graduate                      -.03(.08)                .02(.11)                            .11(.03)***
Undergraduate                 .10(.09)                 .07(.14)                            -.05(.04)
No disability                 .05(.07)                 -.03(.11)                           .07(.03)**
Disability                    -.02(.12)                .20(.15)                            -.02(.05)
*p<0.10, **p<0.05, ***p<0.01
IV. Results - Model 1
Demographic characteristics   Test-taking tools (SE)   On-screen item display tools (SE)   Access tools (SE)
High GPA (≥3.0)               .04(.07)                 -.04(.09)                           .04(.03)
Low GPA (<3.0)                .004(.17)                .54(.26)**                          .06(.08)
Prior CBT exp.                .05(.09)                 -.10(.12)                           .09(.03)**
No CBT exp.                   .10(.08)                 .15(.12)                            .01(.04)
Male                          .02(.11)                 -.01(.15)                           .09(.04)**
Female                        .04(.07)                 .05(.10)                            .01(.03)
*p<0.10, **p<0.05, ***p<0.01
IV. Results - Model 1 (contd.)
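To read the coefficient(SE) entries in the Model 1 tables, one common approach is a Wald z-test: divide the coefficient by its standard error and compare the result to a standard normal distribution. The sketch below assumes a two-sided test and the star thresholds from the table footnote. The slides do not state which test produced the stars, and because the reported values are rounded to two decimals, this calculation will not necessarily reproduce every star in the tables.

```python
from statistics import NormalDist


def wald_test(coef, se):
    """Return (z, two-sided p) for a coefficient and its standard error."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


def stars(p):
    """Map a p-value to the star notation used in the table footnote."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


if __name__ == "__main__":
    # Rounded values taken from the Model 1 table.
    rows = [
        ("LD/ADHD, on-screen item display", 0.51, 0.24),
        ("Graduate, access tools", 0.11, 0.03),
        ("All participants, test-taking tools", 0.03, 0.06),
    ]
    for label, coef, se in rows:
        z, p = wald_test(coef, se)
        print(f"{label}: z = {z:.2f}, p = {p:.3f} {stars(p)}")
```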
Demographic characteristics   Test-taking tools (SE)   On-screen item display tools (SE)   SV + Time (SE)
All participants              .03(.06)                 .03(.09)                            -.19(.15)
No LD/ADHD                    .03(.07)                 -.04(.09)                           -.24(.16)
LD/ADHD                       -.04(.17)                .49(.24)**                          .40(.45)
Graduate                      -.03(.08)                .02(.11)                            -.39(.19)**
Undergraduate                 .10(.09)                 .07(.14)                            .12(.24)
No disability                 .05(.07)                 -.04(.11)                           -.25(.18)
Disability                    -.02(.11)                .02(.15)                            .01(.26)
*p<0.10, **p<0.05, ***p<0.01
IV. Results - Model 2
Demographic characteristics   Test-taking tools (SE)   On-screen item display tools (SE)   SV + Time (SE)
High GPA (≥3.0)               .03(.07)                 -.04(.09)                           -.24(.16)
Low GPA (<3.0)                -.01(.17)                .50(.26)**                          -.05(.43)
Prior CBT exp.                -.05(.09)                -.10(.12)                           -.09(.04)**
No CBT exp.                   .10(.08)                 .15(.12)                            .01(.04)
Male                          .02(.11)                 -.02(.15)                           -.55(.28)**
Female                        .03(.11)                 -.02(.16)                           -.55(.31)
*p<0.10, **p<0.05, ***p<0.01
IV. Results - Model 2 (contd.)
Features above baseline preference
Features at baseline preference
Features below baseline preference
• Strike-out
• Tagging for review
• Highlighting
• Change answer
• Question reorder
• Extra time
• Font size
• SV less 20 pt.
• SV less 40 pt.
All, LD/ADHD, No LD/ADHD, Grad, Undergrad
IV. Results - Model 3 (contd.)
Features above baseline preference
Features at baseline preference
Features below baseline preference
• Tagging for review
• Highlighting
• Change answer
• Question reorder
• Extra time
• Note pad
• Strike-out
• SV less 40 pt.
• Extra time less 40 pt.
High GPA (≥3.0), Low GPA (<3.0), Prior CBT experience, No prior CBT experience, Male, Female
IV. Results - Model 3 (contd.)
Test of equality of regression coefficients for LD/ADHD status
Model     Attribute / feature            Z       p
Model 1   On-screen item display         2.107   0.02
Model 2   On-screen item display         2.07    0.02
Model 2   SV + ET                        1.34    0.09
Model 3   Tagging for later review       1.76    0.04
Model 3   Strike-out                     2.40    0.01
Model 3   Self-voicing less 40 points    2.30    0.01
IV. Results (contd.)
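The Z values for Models 1 and 2 can be recovered from the coefficient tables with a standard test for the difference between two independent regression coefficients, Z = (b1 - b2) / sqrt(SE1^2 + SE2^2), paired with a one-tailed p-value. The slides do not name the formula, so treating it as the one used is an assumption; it does, however, reproduce the reported Z and p values from the rounded LD/ADHD and No LD/ADHD entries.

```python
from math import sqrt
from statistics import NormalDist


def coef_equality_test(b1, se1, b2, se2):
    """Z-test for equality of two independent coefficients.

    Returns (z, one-tailed p). Assumes the two groups were estimated
    separately, so the coefficient estimates are independent.
    """
    z = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)
    p_one_tailed = 1.0 - NormalDist().cdf(abs(z))
    return z, p_one_tailed


if __name__ == "__main__":
    # (label, LD/ADHD coef, SE, No LD/ADHD coef, SE) from the tables.
    comparisons = [
        ("Model 1, on-screen item display", 0.51, 0.24, -0.03, 0.09),
        ("Model 2, on-screen item display", 0.49, 0.24, -0.04, 0.09),
        ("Model 2, SV + ET", 0.40, 0.45, -0.24, 0.16),
    ]
    for label, b1, se1, b2, se2 in comparisons:
        z, p = coef_equality_test(b1, se1, b2, se2)
        print(f"{label}: Z = {z:.3f}, p = {p:.2f}")
```

The Model 3 comparisons cannot be checked the same way because the slides do not report feature-level coefficients.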
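Estimated Probability of Selection in Voluntary Top Feature Choice Exercise - All Participants
Attribute                       Feature            Probability (%)
Test-taking tools               Highlighter        33.15
Test-taking tools               Tag                12.16
Test-taking tools               Strike-out         22.1
Test-taking tools               Change answer      32.6
On-screen item display tools    Font size          24.68
On-screen item display tools    Note pad           43.09
On-screen item display tools    Question reorder   32.04
Access tools                    Extra time         79.6
Access tools                    Self voicing       20.4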
IV. Results -Voluntary Top Feature Choice Exercise
College students’ preferences for combinations of features of test design varied by demographic strata
At the attribute level (rank-ordered exercise):
- Students with LD and/or ADHD and students with low GPA indicated an above-baseline level of preference for on-screen item display relative to test-taking tools and access tools
- Students without LD/ADHD, graduate students, students with no disabilities, students with prior CBT experience, and males preferred access tools when “no selection” was included, but indicated below-baseline preference when “no selection” was removed (except for those with no disabilities)
V. Summary of Results and Discussion
At the features level (rank-ordered exercise)
Strike-out, Tagging* (LD/ADHD; With disabilities; Undergrad*; GPA* < 3.0)
At the features level (voluntary top choice exercise) among Test-taking tools
Highlight (all participants*; No LD/ADHD; High GPA*; Graduates; Undergraduates; No disabilities; Male; Female*)
V. Summary of Results and Discussion (contd.)
At the features level (voluntary top choice exercise) among On-screen item display
Note pad (across all demographic strata)
At the features level (voluntary top choice exercise) among Access tools
Extra time (across all demographic strata)
V. Summary of Results and Discussion (contd.)
Further investigation of “examinee choice” in high stakes computer-based test development
Explore other features of test design
Investigate concept of examinee choice with different college populations
Expand UD in assessment to include construct irrelevant and construct related features.
Provide examinee choice for features within high stakes test preparation material
V. Implications for Future Research
Sample selection did not follow formal procedures for stratified random sampling (Levy & Lemeshow, 1991)
Participants were all from a competitive Research One university (external validity)
Focus on hypothetical choices rather than real time choices (stated vs. revealed preference)
No way to determine if all participants clearly understood the “trade-off” exercise (notion of students with LD/ADHD and penalty)
V. Limitations of Study
**********
END OF PRESENTATION