© Pacific Metrics 2016

AUTOMATED TRAIT SCORES FOR TOEFL & GRE WRITING

Sandip Sinharay and Yigal Attali

Pacific Metrics Corporation & ETS

Few wish to assess others, fewer still wish to be assessed, but everyone wants to see the scores.

Paul W. Holland (2001)

Few wish to assess others, fewer still wish to be assessed, but everyone wants to see the subscores.

ANALYTIC SCORES

• In the context of essay scoring, subscores usually refer to analytic scores, for example, those produced by the 6+1 writing model (Education Northwest).

• Analytic scores are expensive.

AUTOMATED TRAIT SCORES

• Given the popularity of automated essay scoring, a question of increasing interest: “Is it viable to report automated subscores or trait scores?”

• Focus here will be on e-rater V.2 (Attali & Burstein, 2006).

PREVIOUS RESEARCH ON AUTOMATED TRAIT SCORES

• Attali (2007) and Attali and Powers (2008) found three factors underlying the e-rater non-content features for TOEFL CBT and a developmental writing scale:
  – word choice (W)
  – grammatical conventions (G)
  – fluency & organization (F)

GOAL OF THE STUDY

• We explore reporting of four automated trait scores for TOEFL iBT Writing and GRE Writing:
  – W
  – G
  – F
  – Content (C)

GOAL OF THE STUDY

Three important questions
• What should the trait scores be based on?
• How should the trait scores be computed?
• Do the trait scores have added value?

AUTOMATED TRAIT SCORES

• Features of e-rater V.2 used in this study: Vocabulary, Word length, Grammar, Usage, Mechanics, Col/prep, Organization, Essay length, Style, Value cosine, and Pattern cosine.

TOEFL AND GRE

• TOEFL iBT Writing: integrated task and independent task

• GRE Writing: argument task and issue task

• In some computations, the reliability was computed from the correlation between the two tasks.

THREE TYPES OF WEIGHTS ON THE FEATURES

• Weights based on regression of the human score on the features

• Weights based on cross-task reliability of the features: weight proportional to √r / (1 − r), where r is the cross-task reliability of the feature

• Weights based on factor analysis
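The reliability-based weighting can be sketched in a few lines. This is only an illustration: it reads the slide's expression as √r divided by (1 − r), and the feature names and reliability values below are hypothetical, not taken from the study.

```python
import math


def reliability_weight(r: float) -> float:
    """Unnormalized weight sqrt(r) / (1 - r) for a feature with reliability r.

    Assumption: the slide's "√r/(1-r)" means sqrt(r) divided by (1 - r).
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(r) / (1.0 - r)


# Hypothetical cross-task reliabilities (illustrative values only).
example_reliabilities = {"feature_a": 0.25, "feature_b": 0.64}

# Weights are defined only up to a proportionality constant.
weights = {name: reliability_weight(r) for name, r in example_reliabilities.items()}
```

Under this reading, a more reliable feature always receives a larger weight, and the weight grows quickly as r approaches 1.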

THREE TYPES OF WEIGHTS

• Weights based on regression were occasionally negative.

• The other two sets of weights were similar; the reliability-based weights are used henceforth.

RELIABILITY-BASED WEIGHTS FOR TOEFL IBT

• Vocabulary: 1.3, Word length: 1.3
• Grammar: 1.3, Usage: 1.3, Mechanics: 1.8, Col/prep: 0.8
• Organization: 1.6, Essay length: 2.9, Style: 0.7
• Value cosine: 1.2, Pattern cosine: 1.3
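As a rough sketch of how such weights might be turned into trait scores, the snippet below takes a weighted average of feature values within each trait. Several things here are assumptions rather than details from the slides: the mapping of the four bullets to the traits W, G, F, and C (inferred from the grouping), the use of standardized feature values, and the normalization by the sum of weights.

```python
# Reliability-based weights for TOEFL iBT as listed on the slide.
# ASSUMPTION: the four groups correspond to traits W, G, F, and C.
TOEFL_WEIGHTS = {
    "W": {"vocabulary": 1.3, "word_length": 1.3},
    "G": {"grammar": 1.3, "usage": 1.3, "mechanics": 1.8, "col_prep": 0.8},
    "F": {"organization": 1.6, "essay_length": 2.9, "style": 0.7},
    "C": {"value_cosine": 1.2, "pattern_cosine": 1.3},
}


def trait_scores(features: dict) -> dict:
    """Weighted average of feature values within each trait.

    ASSUMPTION: `features` holds standardized (z-scored) feature values;
    dividing by the total weight is an illustrative choice of scale.
    """
    scores = {}
    for trait, weights in TOEFL_WEIGHTS.items():
        total_weight = sum(weights.values())
        weighted_sum = sum(w * features[name] for name, w in weights.items())
        scores[trait] = weighted_sum / total_weight
    return scores
```

With these weights, essay length dominates the F score, since its weight (2.9) exceeds the combined weight of Organization and Style (2.3).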

RELIABILITY OF THE TRAIT SCORES

• Reliabilities (based on cross-task correlation):
  – 0.29-0.71 for TOEFL (0.29 for C, ≥ 0.62 for the other three)
  – 0.51-0.72 for GRE

ADDED VALUE OF THE TRAIT SCORES

• Haberman’s criterion for added value of subscores:

[Diagram: Test Form 1 and Test Form 2, each listing Subscore 1, Subscore 2, Subscore 3, … and Total score]
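One common way to operationalize this criterion with two parallel forms is to ask whether a subscore on one form predicts the same subscore on the other form better than the total score does. The sketch below follows that reading; it is an illustration rather than the authors' procedure, and the function names are hypothetical. In a setting like this study, the two writing tasks of a test would presumably play the role of the two forms.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def has_added_value(sub_form1, total_form1, sub_form2):
    """True if the form-1 subscore predicts the form-2 subscore better
    than the form-1 total score does (parallel-forms reading)."""
    return pearson(sub_form1, sub_form2) > pearson(total_form1, sub_form2)
```

The correlation of a subscore across the two forms is also the cross-task reliability estimate used elsewhere in the deck, so the same helper covers both computations.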

VALUE OF THE TRAIT SCORES

• The three trait scores other than C (W, G, and F) had added value.

[Diagram: TOEFL Ind (W, G, F, C, E-rater score) and TOEFL Int (W, G, F, C, E-rater score); values shown: .62, .08, .35, .15]

AUGMENTED SUBSCORES

• Wainer et al. (2001) and Haberman (2008) suggested “augmented subscores,” where a subscore is improved by borrowing strength from other subscores.

AUGMENTED TRAIT SCORES

For example, for the TOEFL independent task:

Augmented W = 0.60 W – 0.01 G + 0.02 F + 0.20 C
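The augmented W score is a linear combination of the four trait scores and can be computed as in this minimal sketch. The coefficients are those shown on the slide; the absence of an intercept and the assumption that all four trait scores are on a common (for example, standardized) scale are not stated in the source.

```python
# Coefficients for the TOEFL independent task, as given on the slide.
AUGMENTED_W_COEFFS = {"W": 0.60, "G": -0.01, "F": 0.02, "C": 0.20}


def augmented_w(traits: dict) -> float:
    """Augmented W as a weighted sum of the four trait scores.

    ASSUMPTION: no intercept term, and trait scores share a common scale.
    """
    return sum(coef * traits[name] for name, coef in AUGMENTED_W_COEFFS.items())
```

Most of the weight stays on W itself, with C contributing the largest share borrowed from the other traits.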

RELIABILITY OF THE TRAIT SCORES

• Reliabilities (based on cross-task correlation) of the augmented trait scores are 0.58-0.72 for TOEFL and 0.55-0.76 for GRE.

• The reliability of the augmented C score is much larger than that of the unaugmented C score.

ADDED VALUE OF AUGMENTED TRAIT SCORES

• Haberman (2008) suggested that an augmented subscore has added value if its PRMSE (proportional reduction in mean squared error) is substantially larger than that of the corresponding unaugmented subscore.

• All augmented trait scores have added value under that criterion.

CONCLUSIONS

• We explored the reporting of four trait scores for TOEFL iBT Writing and GRE Writing.

• Three of them have added value.

• The fourth has added value after augmentation.

CONCLUSIONS

• The added value of three trait scores provides further evidence of the validity of e-rater scores for measuring writing skill.

• Whether the trait scores benefit student learning, teacher instruction, and program decisions remains to be examined.
