Investigating the Effect of Raters’ L1 Background on Writing Assessment


Investigating the Effect of Raters' L1 Background on Writing Assessment

    A Presentation for

IJAS, Paris, France

    April 8, 2013

    by

Farah Bahrouni

Sultan Qaboos University (SQU)

    OMAN

    [email protected]


"Confusion is the beginning of learning." Socrates (469-399 BC)

"If we knew what we were doing, we wouldn't call it research." Albert Einstein

    These 2 quotations might explain why I am here!


    Outline:

    1) Claim

    2) Literature

    3) Study

    3.1 Data collection

    3.2 Tool: FACETS & One-Way ANOVA

    3.3 Results

    4) Conclusion

    Implication & Significance

    5) References & Further Readings


    1. Claim

L1 downplayed in the literature: treated as secondary to culture

Significant standalone source of rater discrepancy in performance assessment

    Should be studied as a facet (aspect/feature/factor) on its own



    2. Literature

Research has established that writing assessment can by no means be objective.

Studies have extensively probed possible reasons for its subjectivity:


Weigle (1994: 23-24) grouped sources of raters' disagreement into three categories:

within the text: prompt, writer's background & ability

within the rater (the focus of this study): physical & psychological conditions

within the rating context: when, where & under what conditions the rating is done

She adds that interactions among these sources are also possible:

"A rater from a certain background may react to a text written in a certain style differently from the way a rater from a different background would." (p. 24)


Bachman (1990) refers to the above sources as "potential sources of measurement error" and categorizes them into three groups:

test method factors (e.g. raters, prompt type, etc.)

personal attributes (e.g. test taker's cognitive style, knowledge of particular content, etc.)

random factors (e.g. fatigue, time of day, etc.)


    3. Study

    Quantitative Data collection

32 ESL teachers from 4 different language backgrounds (8 native speakers, 8 Arabs sharing the students' mother tongue, 8 Indians, and 8 Russians) scored 3 essays written by 3 Omani university students. All raters are experienced ESL/EFL teachers and have taught in the Omani context for a minimum of 2 years.

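As a rough sketch only, the scores from this design could be organized in a simple long-format table (one row per rater-by-essay rating) for the subsequent FACETS and ANOVA analyses; the rater IDs, group labels, and score values below are illustrative, not the study's data:

```python
import pandas as pd

# Hypothetical long-format score table: one row per rater x essay rating.
# IDs, group labels, and scores are illustrative only.
scores = pd.DataFrame({
    "rater_id": ["R01", "R01", "R01", "R09", "R09", "R09"],
    "l1_group": ["Native", "Native", "Native", "Arab", "Arab", "Arab"],
    "essay":    ["E1", "E2", "E3", "E1", "E2", "E3"],
    "score":    [72, 68, 75, 61, 58, 64],
})

# Mean score awarded by each L1 group, collapsed over raters and essays
print(scores.groupby("l1_group")["score"].mean())
```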


3.2 Analysis

FACETS

Vertical ruler: the higher up a rater appears in the column, the more severe the rater

L1 Measurement Report (Table 7.3.1): all indices show that the difference between the 4 L1s is significant:

Measure

Fit analysis

Reliability
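FACETS estimates a many-facet Rasch model. As a hedged illustration of the model itself (not of the FACETS program), the sketch below computes rating-category probabilities from examinee ability, rater severity, essay difficulty, and category thresholds, all in logits; the function name and the parameter values are hypothetical.

```python
import numpy as np

def mfrm_category_probs(ability, severity, difficulty, thresholds):
    """Category probabilities under a many-facet Rasch (rating scale) model.

    ability    : examinee ability (logits)
    severity   : rater severity (logits)
    difficulty : essay/task difficulty (logits)
    thresholds : category thresholds F_k (logits), one per score step
    Returns probabilities for categories 0..K, where K = len(thresholds).
    """
    # Log-numerator of category k is the cumulative sum of
    # (ability - severity - difficulty - F_j) for j = 1..k; category 0 gets 0.
    log_num = np.concatenate(
        ([0.0], np.cumsum(ability - severity - difficulty - np.asarray(thresholds)))
    )
    expo = np.exp(log_num)
    return expo / expo.sum()

# A more severe rater (larger severity) shifts probability toward lower categories
print(mfrm_category_probs(ability=1.0, severity=0.5, difficulty=0.0,
                          thresholds=[-1.0, 0.0, 1.0]))
```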

One-Way ANOVA

One-Way ANOVA indicates some similarities between native speakers and Indians on the one hand, and between Arabs and Russians on the other, in the ways they scored the 3 essays. The significant discrepancy is between Arabs and Indians.
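A minimal sketch of the kind of one-way ANOVA involved, comparing scores grouped by rater L1; the numbers below are invented for illustration and are not the study's data.

```python
from scipy import stats

# Hypothetical scores per L1 group (e.g. one value per rater, pooled over essays);
# the real figures come from the study's SPSS output.
native  = [72, 70, 68, 75, 71, 69, 73, 70]
arab    = [65, 63, 66, 62, 64, 61, 67, 63]
indian  = [71, 69, 70, 72, 68, 73, 70, 69]
russian = [64, 62, 65, 63, 61, 66, 62, 64]

f_stat, p_value = stats.f_oneway(native, arab, indian, russian)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Pairwise post-hoc comparisons (e.g. Tukey HSD, available in statsmodels as
# pairwise_tukeyhsd) then show which specific group pairs differ significantly.
```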


4. Implication & Significance:

Findings indicate that L1 does have a significant impact on a rater's behavior in a writing assessment event.

This jeopardizes the reliability of the scoring process as well as the validity of the obtained results.

A range of measures could be used to mitigate the L1 effect:

Training

Double/Triple marking

Improving the marking criteria so that raters' idiosyncrasies are prevented from playing a role



5. References & Further Readings

Alderson, J. C. (1991). Bands and scores. In J. C. Alderson & B. North (Eds.), Language Testing in the 1990s: The Communicative Legacy (pp. 71-86). London and Basingstoke: Macmillan Publishers Limited.

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language Test Construction and Evaluation. Cambridge: Cambridge University Press.

Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.

Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.

Brindley, G. (1998). Describing language development? Rating scales and SLA. In L. F. Bachman & A. D. Cohen (Eds.), Interfaces between Second Language Acquisition and Language Testing Research. Cambridge: Cambridge University Press.

Fulcher, G. (2000). The 'communicative' legacy in language testing. System, 28, 483-497.

Fulcher, G. (2010). Practical Language Testing. Hodder Education, an Hachette UK Company.

Fulcher, G., Davidson, F., & Kemp, J. (2011). Effective rating scale development for speaking tests: Performance decision trees. Language Testing, 28(1), 5-29.

Hamp-Lyons, L. (1991). Scoring procedures for ESL contexts. In L. Hamp-Lyons (Ed.), Assessing Second Language Writing in Academic Contexts (pp. 241-276). Norwood, NJ: Ablex Publishing Corporation.

Hunter, D. M., Jones, R. M., & Randhawa, B. S. (1996). The use of holistic versus analytic scoring for large-scale assessment of writing. The Canadian Journal of Program Evaluation, 11(2), 61-85.

North, B. (2000). The Development of a Common Framework Scale of Language Proficiency (Theoretical Studies in Second Language Acquisition). P. Lang.

North, B. (2003). Scales for rating language performance: Descriptive models, formulation styles, and presentation formats. TOEFL Monograph, 24.

North, B., & Schneider, G. (1998). Scaling descriptors for language proficiency scales. Language Testing, 15(2), 217-263.

Weigle, S. C. (1994). Effects of training on raters of English as a second language compositions: Quantitative and qualitative approaches. University of California, Los Angeles.

Weigle, S. C. (2002). Assessing Writing. Cambridge: Cambridge University Press.

    Thank you