
SRI Technology Evaluation Workshop Slide 1 RJM 2/23/00

Leverage Points for Improving Educational Assessment

Robert J. Mislevy, Linda S. Steinberg,

and Russell G. Almond

Educational Testing Service

February 25, 2000

Presented at the Technology Design Workshop sponsored by the U.S. Department of Education, held at Stanford Research Institute, Menlo Park, CA, February 25-26, 2000.

The work of the first author was supported in part by the Educational Research and Development Centers Program, PR/Award Number R305B60002, as administered by the Office of Educational Research and Improvement, U.S. Department of Education. The findings and opinions expressed in this report do not reflect the positions or policies of the National Institute on Student Achievement, Curriculum, and Assessment, the Office of Educational Research and Improvement, or the U.S. Department of Education.

SRI Technology Evaluation Workshop Slide 2 RJM 2/23/00

Some opportunities...

Cognitive/educational psychology...
» how people learn,
» organize knowledge,
» put knowledge to use.

Technology to...
» create, present, and vivify “tasks”;
» evoke, capture, parse, and store data;
» evaluate, report, and use results.

SRI Technology Evaluation Workshop Slide 3 RJM 2/23/00

A Challenge

How the heck do you make sense of rich, complex data to support more ambitious inferences about students?

SRI Technology Evaluation Workshop Slide 4 RJM 2/23/00

A Response

Design assessment from generative principles...

1. Psychology
2. Purpose
3. Evidentiary reasoning

Conceptual design LEADS;
tasks, statistics & technology FOLLOW.

SRI Technology Evaluation Workshop Slide 5 RJM 2/23/00

Principled Assessment Design

The three basic models:

[Diagram: Student Model; Evidence Model(s), comprising a statistical model and evidence rules; Task Model(s)]

SRI Technology Evaluation Workshop Slide 6 RJM 2/23/00

Evidence-centered assessment design

What complex of knowledge, skills, or other attributes should be assessed, presumably because they are tied to explicit or implicit objectives of instruction or are otherwise valued by society?

(Messick, 1992)


SRI Technology Evaluation Workshop Slide 7 RJM 2/23/00

Evidence-centered assessment design

What complex of knowledge, skills, or other attributes should be assessed, presumably because they are tied to explicit or implicit objectives of instruction or are otherwise valued by society?

What behaviors or performances should reveal those constructs?

(Messick, 1992)


SRI Technology Evaluation Workshop Slide 8 RJM 2/23/00

The Evidence Model(s)

Evidence rules extract features from a work product and evaluate values of observable variables.

[Diagram: Evidence Model, in which evidence rules map the work product to values of observable variables]
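To make “evidence rules” concrete, here is a minimal sketch in Python. The names (WorkProduct, evidence_rule, the action strings) and the particular features extracted are hypothetical illustrations, not from the slides; real evidence rules would be keyed to a specific task design.

```python
# A minimal sketch of an "evidence rule" (hypothetical names throughout).
# The rule inspects a captured work product and sets the values of
# observable variables.

from dataclasses import dataclass

@dataclass
class WorkProduct:
    """What the student said, did, or produced, as captured."""
    actions: list[str]   # e.g., a log of troubleshooting steps
    solved: bool         # whether the fault was isolated

def evidence_rule(wp: WorkProduct) -> dict[str, str]:
    """Extract features from the work product and evaluate observables."""
    observables = {}
    # Observable 1: did the performance reach an adequate outcome?
    observables["outcome"] = "adequate" if wp.solved else "inadequate"
    # Observable 2: classify the strategy evident in the action sequence.
    if any(a.startswith("check_gauge") for a in wp.actions):
        observables["strategy"] = "uses_gauges"
    else:
        observables["strategy"] = "unsystematic"
    return observables

print(evidence_rule(WorkProduct(["check_gauge_A", "swap_part"], True)))
# {'outcome': 'adequate', 'strategy': 'uses_gauges'}
```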

SRI Technology Evaluation Workshop Slide 9 RJM 2/23/00


The Evidence Model(s)

The statistical component expresses how the observable variables depend, in probability, on student-model variables.

[Diagram: the statistical model links student-model variables to observable variables]
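Stated in standard notation rather than the slide's prose (a common formulation, assuming the observables are conditionally independent given the student-model variables θ):

```latex
% Statistical-model fragment: observables X_1, ..., X_n are modeled as
% conditionally independent given the student-model variable(s) theta.
P(x_1, \ldots, x_n \mid \theta) = \prod_{j=1}^{n} P(x_j \mid \theta)

% Evidence then accumulates via Bayes' theorem:
p(\theta \mid x_1, \ldots, x_n) \propto p(\theta) \prod_{j=1}^{n} P(x_j \mid \theta)
```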

SRI Technology Evaluation Workshop Slide 10 RJM 2/23/00

Evidence-centered assessment design

What complex of knowledge, skills, or other attributes should be assessed, presumably because they are tied to explicit or implicit objectives of instruction or are otherwise valued by society?

What behaviors or performances should reveal those constructs?

What tasks or situations should elicit those behaviors?

(Messick, 1992)


SRI Technology Evaluation Workshop Slide 11 RJM 2/23/00

The Task Model(s)

Task-model variables describe features of tasks.

A task model provides a framework for describing and constructing the situations in which examinees act.

[Diagram: Task Model]

SRI Technology Evaluation Workshop Slide 12 RJM 2/23/00

The Task Model(s)

Includes specifications for the stimulus material, conditions, and affordances--the environment in which the student will say, do, or produce something.


SRI Technology Evaluation Workshop Slide 13 RJM 2/23/00

The Task Model(s)

Includes specifications for the “work product”: the form in which what the student says, does, or produces will be captured.


SRI Technology Evaluation Workshop Slide 14 RJM 2/23/00

Leverage Points...

For cognitive/educational psychology
For statistics
For technology

SRI Technology Evaluation Workshop Slide 15 RJM 2/23/00

Leverage Points for Cog Psych

The character and substance of the student model.


SRI Technology Evaluation Workshop Slide 16 RJM 2/23/00

Example a: GRE Verbal Reasoning

The student model is just the IRT ability parameter: the tendency to make correct responses in the mix of items presented in a GRE-V.
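As a minimal sketch of what that single-parameter student model looks like computationally, assuming a two-parameter logistic (2PL) IRT item model (the slide does not specify which IRT model is used; the parameter values below are illustrative):

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT: probability of a correct response given ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A more able examinee has a higher chance on the same item:
print(p_correct(theta=1.0, a=1.2, b=0.0))   # ~0.77
print(p_correct(theta=-1.0, a=1.2, b=0.0))  # ~0.23
```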

SRI Technology Evaluation Workshop Slide 17 RJM 2/23/00

Example b: HYDRIVE

Student-model variables in HYDRIVE

A Bayes net fragment.

[Diagram: nodes include Overall Proficiency, Procedural Knowledge, Power System, System Knowledge, Strategic Knowledge, Use of Gauges, Space Splitting, Electrical Tests, Serial Elimination, Landing Gear Knowledge, Canopy Knowledge, Electronics Knowledge, Hydraulics Knowledge, Mechanical Knowledge]
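A toy sketch of how one such fragment can be represented and used. The node names come from the slide; the one-parent structure and all probabilities below are illustrative assumptions, not the actual HYDRIVE network:

```python
# Toy Bayes net fragment in plain Python. Structure and numbers are
# illustrative assumptions, not the real HYDRIVE model.

# P(HydraulicsKnowledge = expert)
p_hydraulics_expert = 0.6

# CPT: P(CanopyKnowledge = expert | HydraulicsKnowledge)
p_canopy_expert = {"expert": 0.9, "novice": 0.3}

# Marginal P(CanopyKnowledge = expert), summing over the parent:
p = (p_hydraulics_expert * p_canopy_expert["expert"]
     + (1 - p_hydraulics_expert) * p_canopy_expert["novice"])
print(f"P(CanopyKnowledge = expert) = {p:.2f}")  # 0.66
```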

SRI Technology Evaluation Workshop Slide 18 RJM 2/23/00

Leverage Points for Cog Psych

The character and substance of the student model.
What we can observe to give us evidence: the work product.


SRI Technology Evaluation Workshop Slide 19 RJM 2/23/00

Leverage Points for Cog Psych

The character and substance of the student model.
What we can observe to give us evidence, and how to recognize and summarize its key features.


SRI Technology Evaluation Workshop Slide 20 RJM 2/23/00

Leverage Points for Cog Psych

The character and substance of the student model.
What we can observe to give us evidence, and how to recognize and summarize its key features.
Modeling which aspects of performance depend on which aspects of knowledge, in what ways.


SRI Technology Evaluation Workshop Slide 21 RJM 2/23/00

Leverage Points for Cog Psych

The character and substance of the student model.
What we can observe to give us evidence, and how to recognize and summarize its key features.
Modeling which aspects of performance depend on which aspects of knowledge, in what ways.
Effective ways to elicit the kinds of behavior we need to see.


SRI Technology Evaluation Workshop Slide 22 RJM 2/23/00

Leverage Points for Statistics

Managing uncertainty with respect to the student model.
» Bayes nets (generalize beyond familiar test theory models; e.g., VanLehn)
» Modular construction of models
» Monte Carlo estimation
» Knowledge-based model construction with respect to the student model


SRI Technology Evaluation Workshop Slide 23 RJM 2/23/00

Leverage Points for Statistics

Managing the stochastic relationship between observations in particular tasks and the persistent, unobservable student-model variables.
» Bayes nets
» Modular construction of models (incl. psychometric building blocks)
» Monte Carlo approximation (see the sketch below)
» Knowledge-based model construction: docking with the student model
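As a minimal sketch of the Monte Carlo idea in this setting (all numbers illustrative): when exact calculation over the model is awkward, simulate many examinees and average.

```python
# Monte Carlo approximation of a marginal probability in a tiny
# assessment model. The ability classes and conditional probabilities
# are illustrative assumptions, not from the slides.

import random

random.seed(0)

def simulate_response() -> bool:
    """Draw an ability class, then a response given that class."""
    skilled = random.random() < 0.5          # P(skilled) = 0.5 (assumed)
    p_correct = 0.8 if skilled else 0.3      # assumed conditional probs
    return random.random() < p_correct

n = 100_000
estimate = sum(simulate_response() for _ in range(n)) / n
print(f"P(correct) ~ {estimate:.3f}")        # exact value: 0.55
```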


SRI Technology Evaluation Workshop Slide 24 RJM 2/23/00

Example a, continued: GRE-V

Sample Bayes net: a student-model fragment docked with an evidence-model fragment (the IRT model and parameters for this item, with observable X_j).

[Diagram: library of evidence-model Bayes net fragments X_1, X_2, ..., X_n]
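A minimal sketch of what docking accomplishes computationally: the evidence-model fragment for one item (here, the 2PL model from the earlier sketch) is joined to the student-model variable θ, and observing X_j updates belief about θ. The grid, prior, and item parameters are illustrative assumptions:

```python
# Dock one 2PL evidence-model fragment with a discretized student model,
# then update belief about theta after observing X_j = correct.

import math

thetas = [-2, -1, 0, 1, 2]            # discretized ability grid (assumed)
prior = [0.1, 0.2, 0.4, 0.2, 0.1]     # prior belief over theta (assumed)

def p_correct(theta, a=1.0, b=0.0):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Bayes rule: posterior(theta) is proportional to prior(theta) * P(X_j | theta)
unnorm = [pr * p_correct(t) for pr, t in zip(prior, thetas)]
z = sum(unnorm)
posterior = [u / z for u in unnorm]

for t, po in zip(thetas, posterior):
    print(f"theta={t:+d}: {po:.3f}")  # mass shifts toward higher theta
```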

SRI Technology Evaluation Workshop Slide 25 RJM 2/23/00

Example b, continued: HYDRIVE

Sample Bayes net fragment, from a library of fragments.

[Diagram: fragment for the “Canopy situation, no split possible”, with nodes Use of Gauges, Serial Elimination, Canopy Knowledge, Hydraulics Knowledge, Mechanical Knowledge]
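A sketch of the fragment-library idea, with the library keyed by recurring task situations. The dictionary representation and function names here are hypothetical placeholders, not HYDRIVE's actual implementation:

```python
# Hypothetical fragment library: evidence-model Bayes net fragments keyed
# by the troubleshooting situation they apply to, fetched on demand as
# the student works. A "fragment" here is reduced to the list of nodes
# it connects; a real system would store structure and probabilities.

FRAGMENT_LIBRARY = {
    "canopy_no_split": ["UseOfGauges", "SerialElimination",
                        "CanopyKnowledge", "HydraulicsKnowledge",
                        "MechanicalKnowledge"],
    # ... one entry per recurring troubleshooting situation
}

def dock(situation: str) -> list[str]:
    """Fetch the fragment for the current situation so it can be docked
    with the persistent student model."""
    return FRAGMENT_LIBRARY[situation]

print(dock("canopy_no_split"))
```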

SRI Technology Evaluation Workshop Slide 26 RJM 2/23/00

Leverage Points for Statistics

Extracting features and determining values of observable variables.
» Bayes nets (also neural networks, rule-based logic)
» Modeling human raters for training, quality control, efficiency


SRI Technology Evaluation Workshop Slide 27 RJM 2/23/00

Leverage Points for Technology

Dynamic assembly of the student model.


SRI Technology Evaluation Workshop Slide 28 RJM 2/23/00

Leverage Points for Technology

Dynamic assembly of the student model.
Complex and realistic tasks that can produce direct evidence about knowledge used for production and interaction (stimulus material; work environment).


SRI Technology Evaluation Workshop Slide 29 RJM 2/23/00

Leverage Points for Technology

Dynamic assembly of the student model.
Complex and realistic tasks that can produce direct evidence about knowledge used for production and interaction (capturing the work product).


SRI Technology Evaluation Workshop Slide 30 RJM 2/23/00

Leverage Points for Technology

Dynamic assembly of the student model.
Complex and realistic tasks that can produce direct evidence about knowledge used for production and interaction.
Automated extraction and evaluation of key features of complex work.


SRI Technology Evaluation Workshop Slide 31 RJM 2/23/00

Leverage Points for Technology

Dynamic assembly of the student model.
Complex and realistic tasks that can produce direct evidence about knowledge used for production and interaction.
Automated extraction and evaluation of key features of complex work.
Construction and calculation to guide acquisition of, and manage uncertainty about, our knowledge about the student.


SRI Technology Evaluation Workshop Slide 32 RJM 2/23/00

Leverage Points for Technology

Dynamic assembly of the student model.
Complex and realistic tasks that can produce direct evidence about knowledge used for production and interaction.
Automated extraction and evaluation of key features of complex work.
Construction and calculation to guide acquisition of, and manage uncertainty about, our knowledge about the student.
Automated/assisted task construction, presentation, management.


SRI Technology Evaluation Workshop Slide 33 RJM 2/23/00

The Cloud behind the Silver Lining

These developments will have the most impact when assessments are built for well-defined purposes, and connected with a conception of knowledge in the targeted domain.

They will have much less impact for ‘drop-in-from-the-sky’ large-scale assessments like NAEP.