MDE / OEAA 1 Un-distorting Measures of Growth: Alternatives to Traditional Vertical Scales Presentation on June 19, 2005 to 25 th Annual CCSSO Conference on Large-Scale Assessment By Joseph A. Martineau, Psychometrician Office of Educational Assessment & Accountability (OEAA) Michigan Department of Education (MDE)


Un-distorting Measures of Growth: Alternatives to Traditional Vertical Scales

Presentation on June 19, 2005, to the 25th Annual CCSSO Conference on Large-Scale Assessment

By Joseph A. Martineau, Psychometrician
Office of Educational Assessment & Accountability (OEAA)
Michigan Department of Education (MDE)


Introduction

• Measurement of growth or “progress”
  – Growth models

• Measurement of educators’ contributions to student growth or progress
  – Value Added Models (VAM)

• Both require vertical scales that
  – Measure the “same thing” along the entire scale
  – Have the same meaning along the entire scale


Distortions in studies of growth

• Using traditional vertical scales to measure growth can result in the following distortions:
  – Identification of growth trajectories with little resemblance to true growth trajectories
  – Attribution of effects on growth to effects on initial status, and vice versa
  – Identification of false effects on initial status or growth
  – Failure to detect true effects on initial status or growth
  – Identification of effective interventions as harmful, and vice versa


Graphical demonstration of one kind of distortion in growth models

Grade 5 scale mostly measures differences in number sense

Grade 6 scale mostly measures differences in algebra

[Figure, Panel A: Unequated scales. Axes: number sense and algebra. Shown: unequated grade-5 scale and unequated grade-6 scale.]


Graphical demonstration of one kind of distortion in growth models

Vertically “equated, unidimensional” scales have to bend to accommodate both the grade-5 and grade-6 content mixes

Such a scale can still appear to fit a unidimensional model if number sense and algebra scores are strongly correlated, but strong correlations do not alleviate distortions in measures of growth

[Figure, Panel B: Vertically "equated" scale. Axes: number sense and algebra. Shown: vertically "equated" grade-5/6 scale.]


Graphical demonstration of one kind of distortion in growth models

Any given student’s true achievement may not lie near the vertical scale, so the vertical scale may be incapable of accurately representing student achievement

[Figure, Panel C: True achievement. Axes: number sense and algebra. Shown: true grade-5 achievement and true grade-6 achievement.]


Graphical demonstration of one kind of distortion in growth models

Therefore, a student’s true multidimensional achievement is projected onto the “unidimensional” vertical scale

[Figure, Panel D: Projection onto vertical scale. Axes: number sense and algebra. Shown: projections of true grade-5 and grade-6 achievement onto the vertical scale.]


Graphical demonstration of one kind of distortion in growth models

The nearest point on the “unidimensional” vertical scale is the most likely estimate of “unidimensional” student ability

[Figure, Panel E: Estimated achievement. Axes: number sense and algebra. Shown: estimated grade-5 and grade-6 achievement.]


Graphical demonstration of one kind of distortion in growth models

The true measure of growth and the “unidimensional” measure of growth are remarkably different

The distortion can be overestimation of growth (as shown here) or underestimation of growth

This can substantially alter the conclusions of studies of growth

[Figure, Panel F: True and estimated growth. Axes: number sense and algebra. Shown: true growth and estimated growth.]


Distortions in studies of value added

• Using traditional vertical scales to measure educators’ contributions to student growth can result in the following distortions:
  – Mis-estimation of educator effectiveness simply because educators serve students whose growth is occurring outside the range measured well by the test
  – Attribution of prior educators’ effectiveness to later educators

• One promise of value added is to stop holding educators accountable for students’ prior experiences

• This distortion betrays that promise


Graphical demonstration of one kind of distortion in value added models

Grade 5 scale mostly measures differences in number sense

Grade 6 scale mostly measures differences in algebra

Scale has to “bend” to accommodate both tests’ content

[Figure, Panel A: Vertically "equated" scale. Axes: number sense and algebra. Shown: vertically "equated" grade-5/6 scale.]


Graphical demonstration of one kind of distortion in value added models

True average statewide scores are likely to lie close to (but not on) the vertical scale

[Figure, Panel B: Average statewide true scores. Axes: number sense and algebra. Shown: true grade-5 and grade-6 statewide average achievement.]


Graphical demonstration of one kind of distortion in value added models

Individual school (or teacher) average true scores are likely to lie farther off the vertical scale than statewide averages

Individual school (or teacher) average true scores are likely to be quite different from the statewide averages

[Figure, Panel C: Average school-X true scores. Axes: number sense and algebra. Shown: true grade-5 and grade-6 average achievement in school X.]


Graphical demonstration of one kind of distortion in value added models

In this carefully chosen scenario, both the statewide averages and the average scores of a given school project onto the vertical scale at exactly the same place

[Figure, Panel D: Projection onto vertical scale. Axes: number sense and algebra. Shown: projections of the statewide and school-X averages onto the vertical scale.]


Graphical demonstration of one kind of distortion in value added models

Even though statewide and school averages are very different in two dimensions, they are estimated to be identical on the “unidimensional” score scale.

[Figure, Panel E: Estimated achievement. Axes: number sense and algebra. Shown: estimated grade-5 and grade-6 average achievement, both statewide and for school X.]


Graphical demonstration of one kind of distortion in value added models

Average statewide growth is overestimated and average school-X growth is underestimated, so the two estimates are equal

In a vertical-scale-based value-added model, this exceptionally effective school would be identified as average

The same distortions can also cause individual school effectiveness to be overestimated

[Figure, Panel F: True and estimated growth. Axes: number sense and algebra. Shown: true average school-X growth, true average statewide growth, and estimated growth (both statewide and for school X).]
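The value-added scenario in Panels A through F can be reproduced with a small sketch. The bent scale, the statewide averages, and the school-X averages below are hypothetical numbers chosen (in the spirit of the slide's "carefully chosen scenario") so that both sets of averages project to the same scale points; nearest-point projection and arc-length scale scores are assumptions for illustration, not the estimation method used in the study.

```python
import math

# Hypothetical bent vertical scale in the (number sense, algebra) plane.
SCALE = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]


def scale_score(p, scale=SCALE):
    """Arc-length position of the point on the polyline nearest to p."""
    best_dist, best_s, start_s = math.inf, 0.0, 0.0
    for (x0, y0), (x1, y1) in zip(scale, scale[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        # Clamped position of the orthogonal foot along this segment.
        t = max(0.0, min(1.0, ((p[0] - x0) * dx + (p[1] - y0) * dy) / length**2))
        dist = math.hypot(p[0] - (x0 + t * dx), p[1] - (y0 + t * dy))
        if dist < best_dist:
            best_dist, best_s = dist, start_s + t * length
        start_s += length
    return best_s


# Hypothetical true average achievement (number sense, algebra).
averages = {
    "statewide": {"grade 5": (1.0, 0.5), "grade 6": (1.5, 1.5)},
    "school X": {"grade 5": (1.0, -0.8), "grade 6": (2.6, 1.5)},
}

for name, avg in averages.items():
    g5, g6 = avg["grade 5"], avg["grade 6"]
    true_growth = math.hypot(g6[0] - g5[0], g6[1] - g5[1])
    est_growth = scale_score(g6) - scale_score(g5)
    print(f"{name}: true growth {true_growth:.2f}, estimated growth {est_growth:.2f}")
# prints:
# statewide: true growth 1.12, estimated growth 2.50
# school X: true growth 2.80, estimated growth 2.50
```

In this constructed case school X's true growth is roughly two and a half times the statewide growth, yet both receive the same estimate, so a value-added model built on these scale scores would rate school X as average.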


Graphical Demonstration

• Table 1 on page 13 of the document

• Interpretation
  – Effect size of 0.00 is equivalent to 1 part truth, no parts distortion
  – Effect size of 0.25 is equivalent to 4 parts truth, 1 part distortion
  – Effect size of 1.00 is equivalent to VAM results being 1 part truth, 1 part distortion
  – Effect size of 2.00 is equivalent to 1 part truth, 2 parts distortion
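Read literally, the four examples imply that the effect size is the ratio of distortion to truth, so truth : distortion = 1 : effect size. That reading is an inference from the listed values rather than a definition stated on the slide. The hypothetical helper `truth_distortion_parts` below converts an effect size into whole-number parts under that assumption.

```python
from fractions import Fraction


def truth_distortion_parts(effect_size: float) -> tuple[int, int]:
    """Return (parts truth, parts distortion), assuming effect size = distortion / truth."""
    if effect_size < 0:
        raise ValueError("effect size must be non-negative")
    ratio = Fraction(effect_size).limit_denominator(1000)  # distortion / truth
    return ratio.denominator, ratio.numerator


for es in (0.00, 0.25, 1.00, 2.00):
    truth, distortion = truth_distortion_parts(es)
    share = es / (1 + es)  # fraction of the result that is distortion
    print(f"effect size {es:.2f}: {truth} part(s) truth, {distortion} part(s) distortion "
          f"({share:.0%} distortion)")
```

Under this reading, an effect size of 1.00 means half of what a VAM reports is distortion.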


Alternatives to Traditional Vertical Scales

• Given that using vertical scales in growth-based statistical models results in distorted outcomes, where do we go from here?

• Michigan has investigated several alternatives
  – Vertically Moderated Standard Setting
  – Domain-Referenced Measurement of Growth
  – Linking only adjacent grades
  – Providing stronger out-of-level content representation as vertical linking items
    • Matrix sampling
    • Large number of forms

• All of these are worth doing, but they are insufficient to resolve the distortions arising from using vertical scales in growth-based models


Alternatives to Traditional Vertical Scales

• Michigan is investigating other alternatives
  – Additional testing
    • Fall and spring
    • More than twice per year
    • Eliminates summer loss/gain problem
    • Completely eliminates distortions!


Alternatives to Traditional Vertical Scales

• Michigan is investigating other alternatives
  – Additional testing
    • Fall and spring
    • More than twice per year
    • Eliminates summer loss/gain problem
    • Completely eliminates distortions!
  – Yeah, whatever!


Alternatives to Traditional Vertical Scales

• Michigan is investigating other alternatives
  – Supplement grade-level content with substantial quantities of out-of-level items
    • Items like those on lower grade-level tests
    • Items like those on higher grade-level tests
    • Could be done with either paper-and-pencil (P&P) testing or computer-based testing (CBT)
    • Implementing with computerized adaptive testing (CAT)
      – Would require little additional testing because out-of-level items could inform the stopping rules
      – May not work with NCLB


Alternatives to Traditional Vertical Scales

• Michigan is investigating other alternatives
  – Supplement grade-level content with substantial quantities of out-of-level items
    • Yields less precise estimates of growth, but they should at least be undistorted
    • Administer items like those on lower and/or higher grade-level tests
    • Could be done with either P&P or CBT
    • Implementing with CAT
      – Would require little additional testing because out-of-level items could inform the stopping rules
      – May not work for NCLB because of on-grade-level requirements
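To make the stopping-rule point concrete, here is a minimal CAT sketch under explicit assumptions: a Rasch (1PL) item model, expected a posteriori (EAP) ability estimation on a grid with a standard normal prior, nearest-difficulty item selection, and a stopping rule of posterior SD at most 0.35 or 40 items. The item pool, difficulties, grade-level labels, and thresholds are all hypothetical, and this is not Michigan's implementation. The point it illustrates is only that responses to out-of-level items count toward the precision the stopping rule checks.

```python
import math
import random


def p_correct(theta: float, b: float) -> float:
    """Rasch (1PL) probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap(responses: list[tuple[float, bool]], grid: list[float]) -> tuple[float, float]:
    """EAP ability estimate and posterior SD, using a standard normal prior on a grid."""
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def run_cat(true_theta, pool, se_target=0.35, max_items=40, seed=0):
    """Simulate one adaptive test; return (estimate, posterior SD, levels administered)."""
    rng = random.Random(seed)
    grid = [-4.0 + 0.05 * i for i in range(161)]
    available = dict(pool)  # item id -> (difficulty, grade-level label)
    responses, levels = [], []
    theta_hat, se = 0.0, math.inf
    while available and len(responses) < max_items and se > se_target:
        # Under the Rasch model, information peaks for the item whose difficulty
        # is closest to the current ability estimate, whatever its grade level.
        item = min(available, key=lambda k: abs(available[k][0] - theta_hat))
        b, level = available.pop(item)
        responses.append((b, rng.random() < p_correct(true_theta, b)))
        levels.append(level)
        theta_hat, se = eap(responses, grid)  # every item, on- or out-of-level, counts
    return theta_hat, se, levels


# Hypothetical grade-6 pool supplemented with out-of-level (grade-5 and grade-7) items.
pool = {}
for level, center in (("grade 5", -1.5), ("grade 6", 0.0), ("grade 7", 1.5)):
    for j in range(30):
        pool[f"{level}-{j}"] = (center - 1.5 + 0.1 * j, level)

estimate, se, levels = run_cat(true_theta=1.8, pool=pool)
print(f"estimate {estimate:.2f}, SE {se:.2f}, items {len(levels)}")
print({lvl: levels.count(lvl) for lvl in sorted(set(levels))})
```

For a high-achieving student, the selection rule naturally drifts toward grade-7 items, and the test ends once the combined on- and off-level responses reach the precision target, which is why little additional testing should be needed.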


Alternatives to Traditional Vertical Scales

• Michigan is investigating other alternatives
  – More complex psychometric models

• Without changing the administration model, the only way to address the distortions is to change the psychometric model

• The psychometric model needs to acknowledge and exploit the multidimensional complexity of item response data

• Multidimensional models can be a liability as well
  – Public relations (complexity of the model)
  – Possibility for error (complexity of the model)
  – Turnaround time (intensity of the analysis)

• This area is promising as well as challenging
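As one illustration of what acknowledging multidimensionality can mean, the sketch below uses the compensatory multidimensional two-parameter logistic (M2PL) model, P(correct) = 1 / (1 + exp(-(a·θ + d))), with a two-dimensional ability θ = (number sense, algebra). The M2PL is a standard model chosen here for illustration, not necessarily the model Michigan is investigating, and the item parameters and the two students' ability profiles are hypothetical.

```python
import math


def m2pl_prob(theta, a, d):
    """Compensatory M2PL: P(correct | ability vector theta, discriminations a, intercept d)."""
    logit = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical items: (discriminations on [number sense, algebra], intercept).
items = {
    "number-sense item": ((1.5, 0.2), 0.0),
    "algebra item": ((0.2, 1.5), 0.0),
}

# Two hypothetical students with equal number sense but different algebra.
students = {"student A": (1.0, 0.5), "student B": (1.0, -0.8)}

for name, theta in students.items():
    probs = {item: round(m2pl_prob(theta, a, d), 3) for item, (a, d) in items.items()}
    print(name, probs)
```

Because the model keeps both coordinates, the students' different algebra achievement shows up directly in their predicted responses, and growth can be reported as a change in each dimension rather than as movement along a single bent scale.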


Conclusion

• Growth-based statistical models using vertically scaled student achievement data are much further along than they were several years ago

• Growth-based statistical models using vertically scaled student achievement data are still not robust enough to support high-stakes use

• Either the test administration model or the psychometric model needs to reflect the complexity of the intended analyses

• No existing method has been shown to support high-stakes use of growth-based statistical models, including Value Added Models


Contact Information

Joseph Martineau, Psychometrician
Office of Educational Assessment & Accountability
Michigan Department of Education
P.O. Box 30008
Lansing, MI 48909

(517) [email protected]