
© Pacific Metrics 2016

AUTOMATED TRAIT SCORES FOR TOEFL & GRE WRITING

Sandip Sinharay and Yigal Attali

Pacific Metrics Corporation & ETS


Few wish to assess others, fewer still wish to be assessed, but everyone wants to see the scores.

Paul W. Holland (2001)

Presenter

Presentation Notes


Few wish to assess others, fewer still wish to be assessed, but everyone wants to see the subscores.



ANALYTIC SCORES

• In the context of essay scoring, subscores usually refer to analytic scores, for example, those produced by the 6+1 Trait writing model (Education Northwest).

• Analytic scores are expensive to obtain, because human raters must score each trait separately.



AUTOMATED TRAIT SCORES

• Given the popularity of automated essay scoring, a question of increasing interest is: “Is it viable to report automated subscores, or trait scores?”

• The focus here is on e-rater V.2 (Attali & Burstein, 2006).



PREVIOUS RESEARCH ON AUTOMATED TRAIT SCORES

• Attali (2007) and Attali and Powers (2008) found three factors underlying the e-rater non-content features for TOEFL CBT and for a developmental writing scale:
  – word choice (W)
  – grammatical conventions (G)
  – fluency and organization (F)



GOAL OF THE STUDY

• We explore the reporting of four automated trait scores for TOEFL iBT Writing and GRE Writing:
  – W
  – G
  – F
  – Content (C)



GOAL OF THE STUDY

Three important questions:
• What should the trait scores be based on?
• How should the trait scores be computed?
• Do the trait scores have added value?



AUTOMATED TRAIT SCORES

• Features of e-rater V.2 used in this study: Vocabulary, Word length, Grammar, Usage, Mechanics, Col/prep (collocation and preposition use), Organization, Essay length, Style, Value cosine, and Pattern cosine.



TOEFL AND GRE

• TOEFL iBT Writing: integrated task and independent task

• GRE Writing: argument task and issue task

• In some computations, reliability was estimated from the correlation between the scores on the two tasks (cross-task reliability).



THREE TYPES OF WEIGHTS ON THE FEATURES

• Weights based on a regression of the human score on the features

• Weights based on the cross-task reliability r of each feature: weight proportional to √r/(1 − r)

• Weights based on factor analysis
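A minimal sketch of the reliability-based weights, assuming standardized features and per-feature cross-task reliabilities r. One way to motivate the formula: if each standardized feature loads √r on a common true score and has error variance 1 − r, weights proportional to √r/(1 − r) maximize the reliability of the weighted composite. This is an illustration, not necessarily the authors' derivation; the function name and the example reliabilities are hypothetical.

```python
import math


def reliability_weights(reliabilities):
    """Return unnormalized weights sqrt(r) / (1 - r) for each feature.

    reliabilities: dict mapping feature name -> cross-task reliability r (0 < r < 1).
    Only the ratios between weights matter for a weighted composite.
    """
    weights = {}
    for name, r in reliabilities.items():
        if not 0.0 < r < 1.0:
            raise ValueError(f"reliability of {name!r} must lie strictly between 0 and 1")
        weights[name] = math.sqrt(r) / (1.0 - r)
    return weights


# Hypothetical reliabilities, for illustration only.
print(reliability_weights({"Grammar": 0.5, "Essay length": 0.75}))
```

Because the weight grows quickly as r approaches 1, the most reliable features dominate the composite; the slides do not say how the reported weights were rescaled, so none is applied here.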



THREE TYPES OF WEIGHTS

• Weights based on regression were occasionally negative.

• The other two sets of weights were similar; the reliability-based weights are used henceforth.



RELIABILITY-BASED WEIGHTS FOR TOEFL iBT

• Vocabulary: 1.3, Word length: 1.3
• Grammar: 1.3, Usage: 1.3, Mechanics: 1.8, Col/prep: 0.8
• Organization: 1.6, Essay length: 2.9, Style: 0.7
• Value cosine: 1.2, Pattern cosine: 1.3
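A minimal sketch of turning these weights into trait scores. Two assumptions are not stated on the slide: that the four groups of features correspond to W, G, F, and C (following the factor structure from earlier research plus the two content features), and that each trait score is the weighted sum of z-standardized features. The function names are hypothetical.

```python
from statistics import mean, stdev

# TOEFL iBT reliability-based weights from the slide; the W/G/F/C grouping is an assumption.
TOEFL_WEIGHTS = {
    "W": {"Vocabulary": 1.3, "Word length": 1.3},
    "G": {"Grammar": 1.3, "Usage": 1.3, "Mechanics": 1.8, "Col/prep": 0.8},
    "F": {"Organization": 1.6, "Essay length": 2.9, "Style": 0.7},
    "C": {"Value cosine": 1.2, "Pattern cosine": 1.3},
}


def standardize(values):
    """Convert raw feature values to z-scores (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def trait_scores(features, weights=TOEFL_WEIGHTS):
    """features: dict mapping feature name -> list of raw values, one per essay.

    Returns a dict mapping each trait -> list of weighted sums of standardized features.
    """
    z = {name: standardize(vals) for name, vals in features.items()}
    n_essays = len(next(iter(features.values())))
    return {
        trait: [sum(w * z[f][i] for f, w in wts.items()) for i in range(n_essays)]
        for trait, wts in weights.items()
    }


# Hypothetical raw values for three essays, scoring only the W trait.
print(trait_scores({"Vocabulary": [1, 2, 3], "Word length": [10, 20, 30]},
                   weights={"W": TOEFL_WEIGHTS["W"]}))
```

Standardizing first keeps features measured on very different scales (for example, essay length versus a cosine similarity) from dominating the composite purely because of their units.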



RELIABILITY OF THE TRAIT SCORES

• Reliabilities (based on cross-task correlation):
  – TOEFL: 0.29-0.71 (0.29 for C, ≥ 0.62 for the other three)
  – GRE: 0.51-0.72



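ADDED VALUE OF THE TRAIT SCORES

• Haberman’s criterion for added value of subscores: a subscore has added value if it predicts the corresponding true subscore more accurately than the total score does, that is, if its proportional reduction in mean squared error (PRMSE) is larger than that of the total score.

[Figure: two test forms, each yielding Subscore 1, Subscore 2, Subscore 3, … and a total score]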



VALUE OF THE TRAIT SCORES

• The three trait scores other than C (W, G, and F) had added value.

[Figure: trait scores W, G, F, and C compared with the e-rater score for the TOEFL independent (Ind) and integrated (Int) tasks]



AUGMENTED SUBSCORES

• Wainer et al. (2001) and Haberman (2008) suggested “augmented subscores”, in which a subscore is improved by borrowing strength from other subscores.
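A minimal sketch of one way to augment a subscore: the best linear predictor of the true subscore k from all observed subscores, in the spirit of Haberman (2008). It assumes classical test theory with measurement errors uncorrelated across subscores, so the covariance of true subscore k with observed subscore j equals the observed covariance for j ≠ k, and equals reliability × variance for j = k. The helper names and inputs are hypothetical, and Wainer et al.'s empirical Bayes formulation differs in detail.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def augmentation_weights(cov, reliabilities, k):
    """Weights for predicting true subscore k from all observed subscores.

    cov: observed covariance matrix of the subscores (list of lists)
    reliabilities: reliability of each observed subscore
    k: index of the subscore being augmented
    Returns (weights, prmse); the augmented score is
    mean_k + sum_j weights[j] * (s_j - mean_j).
    """
    n = len(cov)
    var_true_k = reliabilities[k] * cov[k][k]
    # cov(true subscore k, observed subscore j), assuming uncorrelated errors.
    c = [var_true_k if j == k else cov[k][j] for j in range(n)]
    beta = solve(cov, c)
    prmse = sum(ci * bi for ci, bi in zip(c, beta)) / var_true_k
    return beta, prmse


# Hypothetical two-subscore example.
weights, prmse = augmentation_weights([[1.0, 0.5], [0.5, 1.0]], [0.8, 0.7], k=0)
print([round(w, 3) for w in weights], round(prmse, 3))
```

Comparing the returned PRMSE with the subscore's own reliability (its unaugmented PRMSE) shows how much strength the augmentation borrows; a low-reliability subscore that correlates with the others gains the most.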



AUGMENTED TRAIT SCORES

• For example, for the TOEFL independent task, Augmented W = 0.60 W − 0.01 G + 0.02 F + 0.20 C.



RELIABILITY OF THE TRAIT SCORES

• Reliabilities (based on cross-task correlation) of the augmented trait scores are 0.58-0.72 for TOEFL and 0.55-0.76 for GRE.

• The reliability of the augmented C score is much larger than that of the (unaugmented) C score.



ADDED VALUE OF AUGMENTED TRAIT SCORES

• Haberman (2008) suggested that an augmented subscore has added value if its PRMSE is substantially larger than that of the corresponding unaugmented subscore.

• All augmented trait scores have added value by that criterion.



CONCLUSIONS

• We explored the reporting of four trait scores for TOEFL iBT Writing and GRE Writing.

• Three of them (W, G, and F) have added value.

• The fourth (C) has added value after augmentation.



CONCLUSIONS

• The added value of the three trait scores provides further evidence of the validity of e-rater scores for measuring writing skill.

• Future research should examine whether the trait scores benefit student learning, teacher instruction, and program decisions.

