Presentation at the 12th Annual Maryland Assessment Conference, College Park, MD, October 18, 2012

Page 1

Borrowing the Strength of Unidimensional Scaling to Produce Multidimensional Educational Effectiveness Profiles

Joseph A. Martineau
Ji Zeng
Michigan Department of Education

Page 2

Background

Prior research shows that using unidimensional measures of multidimensional achievement constructs can distort value-added:

  Martineau, J. A. (2006). Distorting Value Added: The Use of Longitudinal, Vertically Scaled Student Achievement Data for Value-Added Accountability. Journal of Educational and Behavioral Statistics, 31(1), 35-62.

Construct-irrelevant variance can become considerable in value-added measures when a construct is multidimensional but is modeled in value-added as unidimensional.

A common misunderstanding is that if the multiple constructs are highly correlated, value-added should not be distorted.

The correct understanding is that if value-added on the multiple constructs is highly correlated, value-added should not be distorted.

Page 3

Background

Prior research shows that the choice of dimension/domain within a construct changes value-added significantly:

  Lockwood, J. R., et al. (2007). The Sensitivity of Value-Added Teacher Effect Estimates to Different Mathematics Achievement Measures. Journal of Educational Measurement, 44(1), 47-67.

Depending on choices made in value-added modeling, the correlation between teacher value-added on Procedures and on Problem Solving ranged from 0.01 to 0.46.

This surprisingly low correlation in value-added indicates that, at least in this situation, value-added needs to be modeled on both dimensions rather than unidimensionally.

This is the only work I am aware of to date that has inspected inter-construct value-added correlations.

Page 4

Background

Prior research shows that commonly used factor analytic techniques underestimate the number of dimensions in a multidimensional construct:

  Zeng, J. (2010). Development of a Hybrid Method for Dimensionality Identification Incorporating an Angle-Based Approach. Unpublished doctoral dissertation, University of Michigan.

Common dimensionality identification procedures make the unwarranted assumption that all shared variance among indicator variables arises because the indicator variables measure the same construct (shared variance can also arise because the indicator variables are influenced by a common exogenous variable).

Because of this unwarranted assumption, commonly used dimensionality identification techniques underestimate the number of dimensions in a data set.

Page 5

Background

Scaling constructs as multidimensional is a difficult task
  Multidimensional Item Response Theory (MIRT) is time-consuming and costly to run
  Replicating MIRT analyses can be challenging (there are multiple subjective decision points along the way)
  Identifying the number of dimensions in MIRT can be challenging
  Once the number of dimensions is identified, identifying which items load on which dimensions in MIRT can also be challenging
  The factor analysis techniques underlying MIRT are techniques for data reduction, not dimension identification

Page 6

Background

Short of resolving the considerable difficulties in analytically identifying dimensions within a construct (and replicating such analyses), can another approach be used?

We propose using/trusting content experts’ identifications of dimensions within constructs (e.g., the divisions agreed upon by the writers of content standards) as the best currently available identification of dimensions, for example…
  Within English language proficiency, producing reading, writing, listening, and speaking scales.
  Within Mathematics, producing number & operations, algebra, geometry, measurement, and data analysis/statistics scales.

Page 7

Background

However, separately scaling each dimension can also be difficult and costly compared to running a traditional unidimensional IRT calibration
  Confirmatory MIRT
  Bi-factor IRT model
  Separate unidimensional calibration and year-to-year equating of each dimension score

Another option:
  Unidimensionally calibrate the total score
  Unidimensionally equate the total score from year to year
  Use (fixed) item parameters from the unidimensional calibration to create the multiple dimension scores as specified by content experts

Use of this method needs to be investigated
  Practical necessity for Smarter Balanced Assessment Consortium
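The fixed-parameter option can be illustrated with a small sketch. This is not the scoring procedure used in the study. It assumes dichotomous items under a 3-PL model, expected a posteriori (EAP) scoring with a standard normal prior, and made-up item parameters and domain labels. The only point it shows is that item parameters come from one unidimensional calibration and are then held fixed while each domain score is computed from that domain's items alone.

```python
import math


def p_correct(theta, a, b, c):
    """3-PL probability of a correct response (logistic metric, no 1.7 scaling)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def eap_theta(responses, params, grid_points=81):
    """EAP ability estimate with item parameters held fixed and a N(0, 1) prior."""
    thetas = [-4.0 + 8.0 * i / (grid_points - 1) for i in range(grid_points)]
    numerator = 0.0
    denominator = 0.0
    for t in thetas:
        log_weight = -0.5 * t * t  # log of the unnormalized N(0, 1) density
        for x, (a, b, c) in zip(responses, params):
            p = p_correct(t, a, b, c)
            log_weight += math.log(p) if x == 1 else math.log(1.0 - p)
        weight = math.exp(log_weight)
        numerator += t * weight
        denominator += weight
    return numerator / denominator


def profile_scores(responses, params, domains):
    """Total score from all items, plus one score per expert-defined domain."""
    scores = {"total": eap_theta(responses, params)}
    for domain in sorted(set(domains)):
        idx = [i for i, d in enumerate(domains) if d == domain]
        scores[domain] = eap_theta([responses[i] for i in idx],
                                   [params[i] for i in idx])
    return scores


# Hypothetical (a, b, c) parameters from a single unidimensional calibration.
item_params = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.2),
               (1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.2)]
item_domains = ["algebra"] * 3 + ["number"] * 3  # hypothetical expert assignment
student = [1, 1, 1, 0, 0, 0]  # all algebra items right, all number items wrong

scores = profile_scores(student, item_params, item_domains)
print(scores)
```

For this response pattern the algebra score lands above the total score and the number score below it, which is the kind of profile information a single total score hides.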

Page 8

Purpose

Investigate the feasibility and validity of relying on unidimensional total score calibration as a basis for creating multidimensional profile scores…
  For reporting multidimensional student achievement scores
  For reporting multidimensional value-added measures

Investigate the impact of separate versus fixed calibration of multidimensional achievement scores in terms of impact on…
  Student achievement scores
  Value-added scores

…as compared to the impact of other common decisions in scaling, outcome selection, and value-added modeling

Page 9

Methods

Decisions Modeled in the Analyses
  Psychometric decisions
    Choice of psychometric model
      1-PL vs. 3-PL
      PCM vs. GPCM
    Estimation of sub-scores
      Separate calibration for each dimension vs. fixed calibration based on unidimensional parameters
  Choice of outcome metric
    Which sub-score is modeled
  Value-added modeling decisions
    Inclusion of demographics in models
    Number of pre-test covariates (for covariate adjustment models)

Page 10

Methods

Outcomes
  Correlations in student achievement metrics compared across each psychometric choice and outcome choice
  Correlations in value-added modeling compared across each choice
  Classification consistency in value-added compared across each choice for
    Three-category classification decisions
      Based on confidence intervals around point estimates, placing programs/schools into three categories: (1) above average, (2) statistically indistinguishable from the average, and (3) below average
    Four-category classification decisions
      Based on sorting programs’/schools’ point estimates into quartiles, representing arbitrary cut points for classification
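The two classification rules and the consistency measure can be sketched as follows. The sketch rests on several assumptions not stated on the slide: value-added estimates are centered so the average is 0, the confidence interval is a 95% interval (z = 1.96), and "consistency" means the proportion of units placed in the same category under two modeling choices. The estimates and standard errors are invented for illustration.

```python
def three_category(estimates, std_errors, z=1.96):
    """Classify by whether the confidence interval excludes the average (0)."""
    labels = []
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if lower > 0:
            labels.append("above")
        elif upper < 0:
            labels.append("below")
        else:
            labels.append("average")
    return labels


def four_category(estimates):
    """Classify by quartile of the point estimates (1 = lowest quartile)."""
    n = len(estimates)
    order = sorted(range(n), key=lambda i: estimates[i])
    quartiles = [0] * n
    for rank, i in enumerate(order):
        quartiles[i] = rank * 4 // n + 1
    return quartiles


def consistency(labels_a, labels_b):
    """Proportion of units assigned the same category under both choices."""
    agree = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return agree / len(labels_a)


# Hypothetical value-added estimates for 8 programs under two calibrations.
va_fixed = [-0.30, -0.10, -0.02, 0.01, 0.05, 0.12, 0.20, 0.40]
va_separate = [-0.28, -0.03, -0.05, 0.02, 0.06, 0.11, 0.22, 0.38]
se = [0.08] * 8

three_fixed = three_category(va_fixed, se)
three_sep = three_category(va_separate, se)
four_fixed = four_category(va_fixed)
four_sep = four_category(va_separate)

print(consistency(three_fixed, three_sep))  # 1.0
print(consistency(four_fixed, four_sep))    # 0.75
```

In this toy case two programs near a quartile boundary swap ranks. That changes their four-category labels but leaves the three-category labels untouched, because both intervals still include the average.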

Page 11

Methods

Data
  Michigan English Language Proficiency Assessment (ELPA)
    Level III (Grades 3-5)
    3,391 students, each with 3 measurement occasions (10,173 total scores)
    Measures
      Total
      Reading (domain)
      Writing (domain)
      Listening (domain)
      Speaking (domain)

Calibrated the ELPA as a unidimensional measure using both 1-PL/Partial Credit Model and 3-PL/Generalized Partial Credit Model

Created domain scores both from fixed parameters from the unidimensional calibration and in separate calibrations for each domain

Page 12

Methods

Data
  Michigan Educational Assessment Program (MEAP) Mathematics
    Grades 7 and 8 (not on a vertical scale)
    Over 110,000 students per grade
    Measures
      Total (using items from the two domains)
      Number & Operations (domain)
      Algebra (domain)

Calibrated the MEAP Math tests as unidimensional measures using both 1-PL and 3-PL models

Created domain scores both from fixed parameters from the unidimensional calibration and in separate calibrations for each domain

Page 13

Methods

Value-added modeling the ELPA
  3-level HLM nesting test occasion within student within English language learner program to obtain program value-added
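For orientation, a generic three-level linear growth model with this nesting can be written as below. This is a textbook form given only as a sketch; the slide text does not state the exact specification, so the time coding, the random effects included, and the choice of the program-level growth residual as the value-added quantity are all assumptions.

```latex
\[
\begin{aligned}
\text{Level 1 (occasion):}\quad & Y_{tij} = \pi_{0ij} + \pi_{1ij}\,\mathrm{Time}_{tij} + e_{tij} \\
\text{Level 2 (student):}\quad  & \pi_{0ij} = \beta_{00j} + r_{0ij}, \qquad \pi_{1ij} = \beta_{10j} + r_{1ij} \\
\text{Level 3 (program):}\quad  & \beta_{00j} = \gamma_{000} + u_{00j}, \qquad \beta_{10j} = \gamma_{100} + u_{10j}
\end{aligned}
\]
```

Here t indexes test occasions, i students, and j programs. Under this sketch, program value-added would be read from the program-level residual on growth, u_{10j}, and student demographics, when included, would enter as predictors at Level 2.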

Page 14

Methods

Value-added modeling the ELPA
  VAMs were run in a fully-crossed design with…
    All outcomes (R, W, L, S)
    PCM- and GPCM-calibrated outcomes
    Fixed and separately calibrated outcomes
    With and without demographics in the VAMs
  32 real-data applications across design factors

Page 15

Methods

Value-added modeling MEAP mathematics
  2-level HLM covarying grade-8 outcomes on grade-7 outcomes with students nested within schools
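A generic two-level covariate-adjustment model of this kind can be written as below. It is a sketch only. The slide text does not give the exact specification, and the covariate vector, which stands in for optional demographics and any additional pre-test score, is an assumption.

```latex
\[
\begin{aligned}
\text{Level 1 (student):}\quad & Y^{(8)}_{ij} = \beta_{0j} + \beta_{1}\,Y^{(7)}_{ij} + \boldsymbol{\beta}_{2}^{\top}\mathbf{x}_{ij} + r_{ij} \\
\text{Level 2 (school):}\quad  & \beta_{0j} = \gamma_{00} + u_{0j}
\end{aligned}
\]
```

Here i indexes students and j schools, Y^{(8)} and Y^{(7)} are the grade-8 outcome and grade-7 pre-test, and school value-added under this sketch is the school-level residual u_{0j}.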

Page 16

Methods

Value-added modeling MEAP mathematics
  VAMs were run in a fully-crossed design with…
    Both outcomes (algebra and number & operations)
    1-PL and 3-PL calibrated outcomes
    Fixed and separately calibrated outcomes
    With and without demographics
    With either one or two pre-test covariates
  32 real-data applications across design factors
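The count of 32 applications follows directly from crossing the design factors, for both the ELPA and the MEAP analyses. The short sketch below enumerates the cells; the dictionary keys and level labels are illustrative names, not identifiers from the study.

```python
from itertools import product

# Design factors as listed on the Methods slides (labels are illustrative).
elpa_factors = {
    "outcome": ["Reading", "Writing", "Listening", "Speaking"],
    "model": ["PCM", "GPCM"],
    "calibration": ["fixed", "separate"],
    "demographics": [False, True],
}
meap_factors = {
    "outcome": ["Algebra", "Number & Operations"],
    "model": ["1-PL", "3-PL"],
    "calibration": ["fixed", "separate"],
    "demographics": [False, True],
    "pretest_covariates": [1, 2],
}


def crossed_design(factors):
    """Return one dict per cell of the fully crossed design."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


elpa_runs = crossed_design(elpa_factors)  # 4 x 2 x 2 x 2
meap_runs = crossed_design(meap_factors)  # 2 x 2 x 2 x 2 x 2
print(len(elpa_runs), len(meap_runs))     # 32 32
```

The two designs reach the same count by different routes: ELPA has four outcomes and four factors, while MEAP has two outcomes but adds the pre-test covariate factor.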

Page 17

Results

ELPA

Page 18

Results: ELPA Student-Level Outcomes

Correlations across fixed vs. separate calibrations

Model choice  Content Area  Correlation
PCM           Reading       0.997
PCM           Writing       0.995
PCM           Listening     0.997
PCM           Speaking      1.000
GPCM          Reading       0.997
GPCM          Writing       0.997
GPCM          Listening     0.994
GPCM          Speaking      1.000

Page 19

Results: ELPA Student-Level Outcomes

Correlations across model choice (PCM vs. GPCM)

Calibration choice  Content Area  Correlation
Fixed               Reading       0.972
Fixed               Writing       0.983
Fixed               Listening     0.967
Fixed               Speaking      0.982
Separate            Reading       0.978
Separate            Writing       0.983
Separate            Listening     0.977
Separate            Speaking      0.982

Page 20

Results: ELPA Student-Level Outcomes

Correlations across content areas

Model choice  Calibration choice  Content Area  Reading  Writing  Listening  Speaking
PCM           Fixed               Reading       -        0.636    0.627      0.371
PCM           Fixed               Writing       -        -        0.537      0.385
PCM           Fixed               Listening     -        -        -          0.368
PCM           Fixed               Speaking      -        -        -          -
PCM           Separate            Reading       -        0.622    0.626      0.373
PCM           Separate            Writing       -        -        0.519      0.375
PCM           Separate            Listening     -        -        -          0.365
PCM           Separate            Speaking      -        -        -          -
GPCM          Fixed               Reading       -        0.655    0.662      0.402
GPCM          Fixed               Writing       -        -        0.559      0.407
GPCM          Fixed               Listening     -        -        -          0.405
GPCM          Fixed               Speaking      -        -        -          -
GPCM          Separate            Reading       -        0.639    0.648      0.395
GPCM          Separate            Writing       -        -        0.543      0.400
GPCM          Separate            Listening     -        -        -          0.394
GPCM          Separate            Speaking      -        -        -          -

Low to moderate inter-dimension correlations

However, Rasch dimensionality analysis from WINSTEPS identified the total score as a unidimensional score

Page 21

Results: ELPA Program District-Level Value-Added Outcomes

Impact of fixed versus separate calibration

Correlations
              No Demos       Demos
Content Area  PCM    GPCM    PCM    GPCM
Reading       1.000  0.987   1.000  0.992
Writing       1.000  0.997   1.000  0.997
Listening     1.000  0.987   1.000  0.987
Speaking      1.000  1.000   1.000  1.000
min 0.987, max 1.000, mean 0.997, SD 0.005

3-Category Consistency
              No Demos       Demos
Content Area  PCM    GPCM    PCM    GPCM
Reading       0.996  0.996   1.000  0.991
Writing       1.000  0.996   1.000  0.991
Listening     1.000  1.000   1.000  0.996
Speaking      1.000  1.000   1.000  1.000
min 0.991, max 1.000, mean 0.998, SD 0.003

4-Category Consistency
              No Demos       Demos
Content Area  PCM    GPCM    PCM    GPCM
Reading       0.982  0.875   0.982  0.902
Writing       0.973  0.946   0.982  0.946
Listening     0.991  0.897   0.991  0.906
Speaking      1.000  1.000   1.000  1.000
min 0.875, max 1.000, mean 0.961, SD 0.043

Page 22

Results: ELPA Program District-Level Value-Added Outcomes

Correlations between Listening and Reading VA

Min = 0.228, Max = 0.397, Mean = 0.322, SD = 0.037

Rows: Listening value-added. Columns: Reading value-added.

                          No Demos                                Demos
                          PCM                 GPCM                PCM                 GPCM
                          Fixed     Separate  Fixed     Separate  Fixed     Separate  Fixed     Separate
No Demos  PCM   Fixed     0.371     0.371     0.301     0.327     0.303     0.302     0.228     0.245
No Demos  PCM   Separate  0.372     0.371     0.303     0.328     0.304     0.303     0.230     0.247
No Demos  GPCM  Fixed     0.360     0.361     0.387     0.392     0.301     0.302     0.316     0.321
No Demos  GPCM  Separate  0.376     0.377     0.389     0.397     0.327     0.328     0.320     0.329
Demos     PCM   Fixed     0.330     0.330     0.292     0.308     0.318     0.317     0.261     0.275
Demos     PCM   Separate  0.329     0.330     0.294     0.309     0.318     0.318     0.263     0.277
Demos     GPCM  Fixed     0.304     0.305     0.341     0.342     0.307     0.309     0.329     0.332
Demos     GPCM  Separate  0.328     0.329     0.346     0.350     0.333     0.335     0.332     0.339

Page 23

Results: ELPA Program District-Level Value-Added Outcomes

Correlations between Listening and Writing VA

Min = 0.342, Max = 0.420, Mean = 0.373, SD = 0.019

Rows: Listening value-added. Columns: Writing value-added.

                          No Demos                                Demos
                          PCM                 GPCM                PCM                 GPCM
                          Fixed     Separate  Fixed     Separate  Fixed     Separate  Fixed     Separate
No Demos  PCM   Fixed     0.358     0.359     0.369     0.366     0.342     0.343     0.353     0.353
No Demos  PCM   Separate  0.359     0.360     0.370     0.367     0.343     0.344     0.354     0.354
No Demos  GPCM  Fixed     0.403     0.403     0.420     0.412     0.385     0.385     0.401     0.396
No Demos  GPCM  Separate  0.368     0.368     0.383     0.376     0.354     0.355     0.370     0.364
Demos     PCM   Fixed     0.362     0.362     0.373     0.371     0.361     0.362     0.372     0.371
Demos     PCM   Separate  0.363     0.364     0.374     0.372     0.362     0.363     0.374     0.372
Demos     GPCM  Fixed     0.395     0.395     0.410     0.405     0.397     0.397     0.412     0.407
Demos     GPCM  Separate  0.364     0.364     0.378     0.373     0.365     0.365     0.379     0.374

Page 24

Results: ELPA Program District-Level Value-Added Outcomes

Correlations between Listening and Speaking VA

Min = -0.005, Max = 0.108, Mean = 0.046, SD = 0.035

Rows: Listening value-added. Columns: Speaking value-added.

                          No Demos                                Demos
                          PCM                 GPCM                PCM                 GPCM
                          Fixed     Separate  Fixed     Separate  Fixed     Separate  Fixed     Separate
No Demos  PCM   Fixed     0.002     0.002     0.026     0.026     0.005     0.005     0.032     0.032
No Demos  PCM   Separate  0.004     0.004     0.028     0.028     0.007     0.007     0.033     0.033
No Demos  GPCM  Fixed     0.068     0.068     0.102     0.102     0.081     0.081     0.108     0.108
No Demos  GPCM  Separate  0.051     0.051     0.080     0.080     0.061     0.061     0.086     0.086
Demos     PCM   Fixed     -0.005    -0.005    0.025     0.025     0.001     0.001     0.028     0.028
Demos     PCM   Separate  -0.004    -0.004    0.027     0.027     0.002     0.002     0.029     0.029
Demos     GPCM  Fixed     0.065     0.065     0.097     0.097     0.075     0.075     0.101     0.101
Demos     GPCM  Separate  0.047     0.047     0.076     0.076     0.056     0.056     0.080     0.080

Page 25

Results: ELPA Program District-Level Value-Added Outcomes

Correlations between Reading and Writing VA

Min = 0.335, Max = 0.491, Mean = 0.412, SD = 0.047

Rows: Reading value-added. Columns: Writing value-added.

                          No Demos                                Demos
                          PCM                 GPCM                PCM                 GPCM
                          Fixed     Separate  Fixed     Separate  Fixed     Separate  Fixed     Separate
No Demos  PCM   Fixed     0.389     0.390     0.393     0.386     0.335     0.336     0.341     0.338
No Demos  PCM   Separate  0.392     0.393     0.396     0.389     0.338     0.339     0.344     0.341
No Demos  GPCM  Fixed     0.466     0.464     0.480     0.466     0.442     0.440     0.455     0.443
No Demos  GPCM  Separate  0.455     0.454     0.468     0.455     0.420     0.419     0.432     0.422
Demos     PCM   Fixed     0.365     0.365     0.370     0.365     0.374     0.374     0.379     0.372
Demos     PCM   Separate  0.369     0.369     0.374     0.369     0.379     0.379     0.384     0.377
Demos     GPCM  Fixed     0.453     0.450     0.465     0.454     0.478     0.476     0.491     0.477
Demos     GPCM  Separate  0.440     0.438     0.452     0.442     0.464     0.462     0.476     0.461

Page 26

Results: ELPA Program District-Level Value-Added Outcomes

Correlations between Reading and Speaking VA

Min = 0.121, Max = 0.205, Mean = 0.151, SD = 0.026

Rows: Reading value-added. Columns: Speaking value-added.

                          No Demos                                Demos
                          PCM                 GPCM                PCM                 GPCM
                          Fixed     Separate  Fixed     Separate  Fixed     Separate  Fixed     Separate
No Demos  PCM   Fixed     0.121     0.121     0.132     0.132     0.131     0.131     0.136     0.136
No Demos  PCM   Separate  0.122     0.122     0.134     0.134     0.132     0.132     0.138     0.138
No Demos  GPCM  Fixed     0.129     0.129     0.174     0.174     0.152     0.152     0.179     0.179
No Demos  GPCM  Separate  0.134     0.134     0.172     0.172     0.154     0.154     0.177     0.177
Demos     PCM   Fixed     0.122     0.122     0.136     0.136     0.125     0.125     0.134     0.134
Demos     PCM   Separate  0.125     0.125     0.139     0.139     0.128     0.128     0.138     0.138
Demos     GPCM  Fixed     0.163     0.163     0.205     0.205     0.171     0.171     0.203     0.203
Demos     GPCM  Separate  0.162     0.162     0.199     0.199     0.168     0.168     0.197     0.197

Page 27

Results: ELPA Program District-Level Value-Added Outcomes

Correlations between Speaking and Writing VA

Min = 0.150, Max = 0.246, Mean = 0.199, SD = 0.029

Rows: Speaking value-added. Columns: Writing value-added.

                          No Demos                                Demos
                          PCM                 GPCM                PCM                 GPCM
                          Fixed     Separate  Fixed     Separate  Fixed     Separate  Fixed     Separate
No Demos  PCM   Fixed     0.151     0.150     0.169     0.180     0.158     0.157     0.180     0.189
No Demos  PCM   Separate  0.151     0.150     0.169     0.180     0.158     0.157     0.180     0.189
No Demos  GPCM  Fixed     0.207     0.205     0.225     0.236     0.209     0.208     0.231     0.240
No Demos  GPCM  Separate  0.207     0.205     0.225     0.236     0.209     0.208     0.231     0.240
Demos     PCM   Fixed     0.173     0.172     0.192     0.202     0.167     0.165     0.189     0.197
Demos     PCM   Separate  0.173     0.172     0.192     0.202     0.167     0.165     0.189     0.197
Demos     GPCM  Fixed     0.216     0.215     0.235     0.246     0.212     0.210     0.233     0.243
Demos     GPCM  Separate  0.216     0.215     0.235     0.246     0.212     0.210     0.233     0.243

Page 28

Results: ELPA Program District-Level Value-Added Outcomes

Impact of choice of psychometric model

Correlations
              No Demos       Demos
Content Area  Fixed  Sep     Fixed  Sep
Reading       0.837  0.900   0.834  0.887
Writing       0.988  0.987   0.988  0.986
Listening     0.929  0.945   0.942  0.955
Speaking      0.975  0.975   0.980  0.980
min 0.834, max 0.988, mean 0.943, SD 0.052

3-Category Consistency
              No Demos       Demos
Content Area  Fixed  Sep     Fixed  Sep
Reading       0.973  0.982   0.978  0.987
Writing       0.996  0.991   0.996  0.996
Listening     0.987  0.987   0.982  0.987
Speaking      0.964  0.964   0.969  0.969
min 0.964, max 0.996, mean 0.982, SD 0.011

4-Category Consistency
              No Demos       Demos
Content Area  Fixed  Sep     Fixed  Sep
Reading       0.567  0.634   0.580  0.634
Writing       0.902  0.866   0.920  0.893
Listening     0.728  0.728   0.768  0.754
Speaking      0.795  0.795   0.839  0.839
min 0.567, max 0.920, mean 0.765, SD 0.113

Page 29

Results: ELPA Program District-Level Value-Added Outcomes

Impact of Including/Not Including Demographics

Correlations
              PCM            GPCM
Content Area  Fixed  Sep     Fixed  Sep
Reading       0.915  0.915   0.931  0.922
Writing       0.978  0.978   0.979  0.982
Listening     0.982  0.982   0.980  0.981
Speaking      0.993  0.993   0.997  0.997
min 0.915, max 0.997, mean 0.969, SD 0.030

3-Category Consistency
              PCM            GPCM
Content Area  Fixed  Sep     Fixed  Sep
Reading       0.991  0.987   0.987  0.982
Writing       0.987  0.987   0.987  0.973
Listening     0.991  0.991   0.987  0.982
Speaking      0.991  0.991   0.996  0.996
min 0.973, max 0.996, mean 0.988, SD 0.006

4-Category Consistency
              PCM            GPCM
Content Area  Fixed  Sep     Fixed  Sep
Reading       0.808  0.817   0.750  0.741
Writing       0.830  0.821   0.848  0.839
Listening     0.924  0.911   0.911  0.915
Speaking      0.902  0.902   0.911  0.911
min 0.741, max 0.924, mean 0.859, SD 0.060

Page 30

Results

MEAP Mathematics

Page 31

Results: MEAP Math Student-Level Outcomes

Correlations among variables based on psychometric decisions

Grade 7 above the diagonal, Grade 8 below. Row and column group labels on the slide: Algebra, Number & Operations; 3-PL, 1-PL (each with Fixed and Sep calibrations).

          Fixed  Sep    Fixed  Sep    Fixed  Sep    Fixed  Sep
Fixed     -      1.000  0.943  0.941  0.775  0.775  0.775  0.743
Sep       1.000  -      0.943  0.941  0.775  0.775  0.775  0.742
Fixed     0.900  0.901  -      0.996  0.748  0.748  0.748  0.751
Sep       0.891  0.893  0.984  -      0.746  0.745  0.746  0.748
Fixed     0.684  0.685  0.677  0.666  -      1.000  1.000  0.941
Sep       0.684  0.685  0.676  0.665  1.000  -      1.000  0.941
Fixed     0.670  0.671  0.691  0.682  0.936  0.935  -      0.941
Sep       0.667  0.668  0.688  0.679  0.935  0.934  0.998  -

Page 32

Results: MEAP Math Student-Level Outcomes

Very high correlations based on fixed versus separate calibrations (the slide repeats the correlation matrix shown on Page 31)


Page 34

Results: MEAP Math Student-Level Outcomes

Not as high correlations based on 1-PL versus 3-PL calibrations (the slide repeats the correlation matrix shown on Page 31)

Page 35

Results: MEAP Math Student-Level Outcomes

Moderate to high correlations across dimensions (the slide repeats the correlation matrix shown on Page 31)

Page 36

Results: MEAP Mathematics School-Level Value-Added Outcomes

Impact of fixed versus separate calibration

Correlations
                      1 pre-test covariate          2 pre-test covariates
                      No Demos       Demos          No Demos       Demos
Content Area          1-PL   3-PL    1-PL   3-PL    1-PL   3-PL    1-PL   3-PL
Algebra               1.000  0.995   1.000  0.992   1.000  0.985   1.000  0.985
Number & Operations   1.000  0.977   1.000  0.956   1.000  0.988   1.000  0.983

3-Category Consistency
                      1 pre-test covariate          2 pre-test covariates
                      No Demos       Demos          No Demos       Demos
Content Area          1-PL   3-PL    1-PL   3-PL    1-PL   3-PL    1-PL   3-PL
Algebra               0.989  0.968   0.987  0.973   0.987  0.935   0.989  0.960
Number & Operations   0.989  0.923   0.994  0.935   0.990  0.946   0.989  0.966

4-Category Consistency
                      1 pre-test covariate          2 pre-test covariates
                      No Demos       Demos          No Demos       Demos
Content Area          1-PL   3-PL    1-PL   3-PL    1-PL   3-PL    1-PL   3-PL
Algebra               0.995  0.926   0.993  0.883   0.992  0.856   0.986  0.848
Number & Operations   0.989  0.827   0.984  0.712   0.993  0.875   0.983  0.817

Page 37

Results: MEAP Mathematics School-Level Value-Added Outcomes

Impact of choice of outcome (Algebra vs. Number)

Correlations
                                   1 pre-test covariate          2 pre-test covariates
                                   No Demos       Demos          No Demos       Demos
Multidimensional Calibration Type  1-PL   3-PL    1-PL   3-PL    1-PL   3-PL    1-PL   3-PL
Fixed Parameter                    0.548  0.608   0.361  0.391   0.652  0.697   0.574  0.609
Separate                           0.549  0.649   0.366  0.436   0.653  0.711   0.576  0.614

3-Category Consistency
                                   1 pre-test covariate          2 pre-test covariates
                                   No Demos       Demos          No Demos       Demos
Multidimensional Calibration Type  1-PL   3-PL    1-PL   3-PL    1-PL   3-PL    1-PL   3-PL
Fixed Parameter                    0.637  0.667   0.649  0.703   0.703  0.751   0.716  0.774
Separate                           0.637  0.691   0.650  0.726   0.705  0.749   0.713  0.784

4-Category Consistency
                                   1 pre-test covariate          2 pre-test covariates
                                   No Demos       Demos          No Demos       Demos
Multidimensional Calibration Type  1-PL   3-PL    1-PL   3-PL    1-PL   3-PL    1-PL   3-PL
Fixed Parameter                    0.399  0.424   0.322  0.337   0.447  0.475   0.404  0.412
Separate                           0.397  0.429   0.322  0.350   0.444  0.484   0.405  0.436

Page 38

Results: MEAP Mathematics School-Level Value-Added Outcomes

Impact of choice of psychometric model

Correlations
                                   1 pre-test covariate          2 pre-test covariates
                                   No Demos       Demos          No Demos       Demos
Multidimensional Calibration Type  Alg    Num     Alg    Num     Alg    Num     Alg    Num
Fixed Parameter                    0.939  0.963   0.883  0.934   0.918  0.961   0.925  0.962
Separate                           0.938  0.962   0.876  0.937   0.925  0.962   0.873  0.938

3-Category Consistency
                                   1 pre-test covariate          2 pre-test covariates
                                   No Demos       Demos          No Demos       Demos
Multidimensional Calibration Type  Alg    Num     Alg    Num     Alg    Num     Alg    Num
Fixed Parameter                    0.890  0.901   0.851  0.912   0.867  0.921   0.837  0.915
Separate                           0.886  0.907   0.841  0.918   0.876  0.918   0.839  0.915

4-Category Consistency
                                   1 pre-test covariate          2 pre-test covariates
                                   No Demos       Demos          No Demos       Demos
Multidimensional Calibration Type  Alg    Num     Alg    Num     Alg    Num     Alg    Num
Fixed Parameter                    0.732  0.763   0.611  0.673   0.679  0.773   0.602  0.677
Separate                           0.717  0.775   0.604  0.685   0.701  0.770   0.610  0.670

Page 39

Results: MEAP Mathematics School-Level Value-Added Outcomes

Impact of Including/Not Including Demographics

Correlations
                                   1 pre-test covariate          2 pre-test covariates
                                   1-PL           3-PL           1-PL           3-PL
Multidimensional Calibration Type  Alg    Num     Alg    Num     Alg    Num     Alg    Num
Fixed Parameter                    0.964  0.815   0.813  0.717   0.984  0.822   0.895  0.775
Separate                           0.962  0.819   0.806  0.780   0.983  0.825   0.877  0.793

3-Category Consistency
                                   1 pre-test covariate          2 pre-test covariates
                                   1-PL           3-PL           1-PL           3-PL
Multidimensional Calibration Type  Alg    Num     Alg    Num     Alg    Num     Alg    Num
Fixed Parameter                    0.880  0.772   0.771  0.713   0.928  0.774   0.841  0.771
Separate                           0.875  0.767   0.774  0.724   0.927  0.775   0.831  0.756

4-Category Consistency
                                   1 pre-test covariate          2 pre-test covariates
                                   1-PL           3-PL           1-PL           3-PL
Multidimensional Calibration Type  Alg    Num     Alg    Num     Alg    Num     Alg    Num
Fixed Parameter                    0.775  0.551   0.572  0.464   0.864  0.557   0.646  0.508
Separate                           0.774  0.556   0.544  0.522   0.858  0.552   0.635  0.547

Page 40

Results: MEAP Mathematics School-Level Value-Added Outcomes

Impact of covarying on one vs. two pre-test scores

Correlations
                                   No Demographics               Includes Demographics
                                   1-PL           3-PL           1-PL           3-PL
Multidimensional Calibration Type  Alg    Num     Alg    Num     Alg    Num     Alg    Num
Fixed Parameter                    0.937  0.965   0.923  0.964   0.941  0.947   0.930  0.951
Separate                           0.937  0.965   0.937  0.962   0.941  0.948   0.941  0.942

3-Category Consistency
                                   No Demographics               Includes Demographics
                                   1-PL           3-PL           1-PL           3-PL
Multidimensional Calibration Type  Alg    Num     Alg    Num     Alg    Num     Alg    Num
Fixed Parameter                    0.855  0.884   0.851  0.889   0.889  0.918   0.872  0.744
Separate                           0.859  0.889   0.878  0.883   0.885  0.922   0.885  0.755

4-Category Consistency
                                   No Demographics               Includes Demographics
                                   1-PL           3-PL           1-PL           3-PL
Multidimensional Calibration Type  Alg    Num     Alg    Num     Alg    Num     Alg    Num
Fixed Parameter                    0.734  0.764   0.696  0.753   0.715  0.687   0.704  0.713
Separate                           0.729  0.768   0.727  0.754   0.716  0.693   0.714  0.698

Page 41

Conclusions

Practically important impacts on value-added metrics and value-added classifications
  Choice of psychometric model
  Including/not including demographics
  Including/not including multiple pre-test values

Prohibitive impacts on value-added metrics and value-added classifications
  Choice of outcome (i.e., domain within construct)

Practically negligible impacts on value-added metrics and value-added classifications
  Separate versus fixed calibrations of domains within construct

Page 42

Conclusions, continued…

Need to pay attention to modeling domains within constructs if constructs can reasonably be considered multidimensional
  Of the common psychometric and statistical modeling decisions one can make, the choice of which subscore to use as an outcome is the most influential

Because subscores give different profiles of both student achievement and program/school value-added, each subscore should be modeled to the degree possible

4-category (i.e., quartile) classifications on value-added are appreciably impacted by every psychometric and statistical modeling choice evaluated here, but 3-category classifications are not
  Discourage more than three categories
  RTTT requires at least four categories

Page 43

Conclusions, continued…

3- vs. 4-category distinction is actually a proxy for
  Statistical decision categories (3 categories)
  Arbitrary cut point categories (4 categories)

Can leverage unidimensional calibrations of multidimensional achievement scales to create multidimensional profiles of value-added
  Except where using four categories of classifications

Page 44

Limitations

Inductive reasoning
  Results are likely to hold in similar circumstances
  Still will need to investigate feasibility of using fixed parameters from unidimensional calibration for specific circumstances if those circumstances are high stakes

This is a proof of concept

PCM and GPCM models were run using different software (WINSTEPS vs. PARSCALE)

Page 45

Contact Information

Joseph A. Martineau, Ph.D.
Executive Director
Bureau of Assessment & Accountability
Michigan Department of Education
[email protected]

Ji Zeng, Ph.D.
Psychometrician
Bureau of Assessment & Accountability
Michigan Department of Education
[email protected]