A model for investigating the impact of language assessment within state educational contexts

Nick Saville

Teaching and Testing: Opportunities for Learning
June 2006
Hellenic American Union, Athens


Outline

• Background: a personal perspective
  • 1980s
  • Messick, Bachman: early 1990s
• The literature on washback/impact
  • early work and recent progress
  • gaps? where next?
• Analysis of three case studies: what can be learnt?
• Towards a comprehensive Model of Impact
• Applying the model in a state educational context
  • the Asset Languages Project

“Practicality in Language Testing: an educational management model”

[Diagram: the test balanced between validity (V), reliability (R) and practicality (P)]

Main argument: test development is a form of educational innovation, and needs to be managed as such

“... achieving a balance between the purpose of the test, its validity for the purpose, the required reliability for the purpose and the constraints imposed by the context is essentially the task facing the test designer ….”

Saville (1990), University of Reading: based on a test development project in Japan, 1987-89

Putting the test into context

[Diagram: the test, with validity (V), reliability (R) and practicality (P), placed within its context]

The aim … is not only to encourage good testing practice, but to prevent bad tests being produced ....

... a bad test is not only one with low reliability and dubious validity but also one which has a damaging backwash on the curriculum.

Saville 1990:11-13

A logical consequence … is that ethicality will be achieved as a result …

… this is because any test which is produced should be appropriate to the educational context in which it is to be used and the effect on learners and institutions will be a major consideration.

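Impact Ripples

[Diagram: the test and its qualities (V, R, P) at the centre, with impact (I) rippling outwards as local impact at the “micro” level and wider impact at the “macro” level]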

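Developing “useful tests”, fit for purpose

Balancing the test qualities: U = V + R + I + P (usefulness = validity + reliability + impact + practicality)

Bachman and Palmer (1996): U = Cv + A + I + R + I + P (usefulness = construct validity + authenticity + interactiveness + reliability + impact + practicality)

Usefulness as overall validity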

Maxims for achieving/monitoring intended impact

Maxim 1 PLAN: Use a rational and explicit approach to test development

Maxim 2 SUPPORT: Support stakeholders in the testing process

Maxim 3 COMMUNICATE: Provide comprehensive, useful and transparent information

Maxim 4 MONITOR and EVALUATE: Collect all relevant data and analyse as required

Milanovic and Saville 1995: Considering the impact of the Cambridge EFL examinations

The literature on washback/impact

• Readings in the language testing literature:
  • Hamp-Lyons (1989)
  • Wall and Alderson (1993) Does washback exist? etc.
  • Language Testing (1996: 13, 3): Messick, Bailey, etc.
  • Hamp-Lyons (1997)
  • Watanabe (1997)
  • Cheng and Watanabe (eds) (2004)
• Recent PhD studies and subsequent books in SILT series based on research conducted in the 1990s:
  • Cheng (SILT 21, 2005)
  • Wall (SILT 23, 2005)
  • Hawkey (SILT 24, 2006)
  • Green (SILT 25, forthcoming): “washback in context”
• Current work in Lancaster, ETS, UCLA, Cambridge, IELTS funded projects, etc.

[Diagram: Green's model of washback. Test design characteristics (item format, content, complexity, etc.) and their overlap with the focal construct set the potential for positive or negative backwash (washback direction). Participants' perceptions of test importance (important/unimportant) and test difficulty (easy/challenging/unachievable) set washback intensity, from no backwash to intense backwash. Participant characteristics and values (knowledge/understanding of test demands, resources to meet test demands, acceptance of test demands) account for washback variability across stakeholders: course providers, materials writers, publishers, teachers and learners.]

Tony Green, IELTS Washback in Context: Preparation for academic writing in higher education (SILT 25, forthcoming)

The model starts from test design characteristics and the related validity issues of construct representation identified with washback by Messick (1996).

Washback will be most intense (have the most powerful effects on teaching and learning behaviours) where participants see the test as challenging and the results as important.
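To make the intensity claim concrete, it can be encoded as a rule over the two perception axes in the diagram. This is a hypothetical simplification in Python: the category names come from the diagram, but the two-valued output and the treatment of the easy and unachievable cases are assumptions, not Green's own operationalisation, and the sketch ignores washback direction and variability.

```python
from enum import Enum


class Importance(Enum):
    """Perceived importance of the test results (one axis of the diagram)."""
    UNIMPORTANT = "unimportant"
    IMPORTANT = "important"


class Difficulty(Enum):
    """Perceived difficulty of the test (the other axis of the diagram)."""
    EASY = "easy"
    CHALLENGING = "challenging"
    UNACHIEVABLE = "unachievable"


def washback_intensity(importance: Importance, difficulty: Difficulty) -> str:
    """Hypothetical rule: intense washback only when the results are seen as
    important AND the test is seen as challenging; otherwise little or none."""
    if importance is Importance.IMPORTANT and difficulty is Difficulty.CHALLENGING:
        return "intense"
    # Unimportant results, or a test perceived as too easy or unachievable,
    # give participants little reason to change teaching/learning behaviour.
    return "weak or none"


# Print the full grid of perceptions and the intensity assigned to each.
for imp in Importance:
    for diff in Difficulty:
        print(f"{imp.value:>11} / {diff.value:<12} -> {washback_intensity(imp, diff)}")
```

Only one of the six cells yields intense washback, which mirrors the point that both conditions must hold together.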


The literature on washback/impact

So:

• Impact is relatively new in the field of language assessment: an extension of the notion of washback, and related to ethicality
• It is now considered to be of growing importance
• It is part of a validity argument, and evidence needs to be provided

Broadly speaking, there is consensus that:
• washback is an aspect of impact related to the “micro contexts” of the classroom and the school (teaching and learning)
• impact deals with wider influences and includes the “macro contexts” of tests and examinations in society

BUT

• the dynamics between the micro and macro contexts mean that this is a complex rather than a simple relationship: a “complex dynamic system”

The literature on washback/impact

And currently:

• there is no comprehensive model of test or examination impact within educational contexts

• impact has not yet been fully integrated into an approach to test development and validation in a systematic way

Three case studies: 1995 to 2004

• Case 1: the worldwide survey of the impact of IELTS
  • a starting point for the work and the original model for what has followed
  • a conceptualisation of impact and the design/validation of suitable instruments to investigate it
• Case 2: the Italian PL2000 project
  • an application of the model within a macro educational context
  • an initial attempt at applying the approach on a limited basis within a state educational context
• Case 3: the Florence Language Learning Gains Project
  • an extension and re-application of the model within a single school context
  • at the micro level, focusing on individual stakeholders within a single language teaching institution

Hawkey (SILT 24, 2006)

Learning from the case studies

• What can be learned using these specific impact projects as meta-data?

• Three key factors of contemporary educational systems need to be accounted for:

1. the nature of complex dynamic systems

2. the roles that stakeholders play within such systems

3. the need to see assessment projects as educational innovations within the systems and to manage change effectively

1. The nature of complex dynamic systems

2. The roles that stakeholders play

Stakeholders in the testing community

[Diagram: the testing system within its context, with stakeholders on both sides]

Input to test design, provided by stakeholders:
Government agencies, professional bodies, learners, teachers and heads, school owners, test writers, consultants, examiners, test centre administrators, materials writers, publishers, inter alia

Testing system (Cambridge ESOL): test construct, test format, test conditions, test assessment criteria, producing test scores

Context of test use, where decisions are made by stakeholders using test scores:
Learners, parents/carers, teachers and heads, school owners, receiving institutions, government agencies, professional bodies, employers, academic researchers, inter alia
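Reading the two stakeholder lists together shows that several groups appear on both sides: they provide input to test design and also make decisions using the scores. The Python sketch below simply intersects the two lists as given on the slide; both lists are marked inter alia, so they are not exhaustive, and the set names are hypothetical labels chosen for illustration.

```python
# Stakeholder groups as listed on the slide (spelling normalised; lists are "inter alia").
INPUT_TO_TEST_DESIGN = {
    "Government agencies", "Professional bodies", "Learners", "Teachers, Heads",
    "School owners", "Test writers", "Consultants", "Examiners",
    "Test centre administrators", "Materials writers", "Publishers",
}
CONTEXT_OF_TEST_USE = {
    "Learners", "Parents/carers", "Teachers, Heads", "School owners",
    "Receiving institutions", "Government agencies", "Professional bodies",
    "Employers", "Academic researchers",
}

# Groups that both shape the test and act on its results.
both_roles = INPUT_TO_TEST_DESIGN & CONTEXT_OF_TEST_USE
# Groups that appear on only one side of the testing system.
design_only = INPUT_TO_TEST_DESIGN - CONTEXT_OF_TEST_USE
use_only = CONTEXT_OF_TEST_USE - INPUT_TO_TEST_DESIGN

print(sorted(both_roles))
# ['Government agencies', 'Learners', 'Professional bodies', 'School owners', 'Teachers, Heads']
```

One plausible reading is that these overlapping groups are where influence flows back from the use of scores into test design, which is consistent with treating the system as dynamic rather than one-directional.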

3. The need to see assessment projects as educational innovations and to manage change effectively

• See Wall (2005): a case study using insights from testing and innovation theory, e.g. Henrichsen (1989)

[Diagram: Hybrid Model of the Diffusion/Implementation Process: Antecedents, Process, Consequences]

Learning from the case studies

• When applied to language assessment, two key factors also need to be accounted for:

1. the nature of language itself as a socio-cognitive phenomenon: the latest views on validity

2. the nature of the test development and validation process, from conception to routine data collection and analysis

• Impact research, therefore, is another kind of validation activity.

Investigating impact as validation

• The investigation of impact is not a discrete or one-off activity

• It is an essential component in establishing the overall validity (usefulness) of an assessment system in terms of its fitness for specific purposes and contexts of use

• The proposed model locates the study of test impact as one of a set of research and development tools within an iterative approach to ongoing test validation

• It is consistent with Messick (1996): “In essence ... test validation is empirical evaluation of meaning and consequences of measurement, taking into account extraneous factors in the applied setting that might erode or promote validity of local score interpretation and use.”

1. A socio-cognitive framework (Messick, Bachman, Kane, Mislevy, Weir, etc.)

[Diagram, built up over three slides: theory, test taking and context, with the consequential aspects of validity. The test taking context comprises the target language use (TLU) domain, the learning context and the context of score use; the testing system, with its core construct, sits at the centre; the surrounding contexts are learning contexts, testing contexts and use-of-results contexts.]

2. Model of the Test Development Process: “validity by design”

• Identifying stakeholders and their needs
• Linking these needs to the requirements of test usefulness, including predicted impact
  - theoretical
  - practical
• Long-term, iterative processes: a key feature of validation

[Diagram: Model of the Test Development Process]

Next phase: applying the model

• Asset Languages within the UK educational context

Welcome to Asset Languages

Asset Languages is a new way of motivating language learners and rewarding their language skills.

The Asset Languages assessment scheme is for language learners of all ages and abilities: from primary school through to further, higher and adult education. Asset Languages is being developed by Cambridge Assessment through OCR and Cambridge ESOL, as part of the DfES' National Languages Strategy.

• Asset is being developed by Cambridge Assessment and is a voluntary “recognition scheme” based on the Languages Ladder, to complement existing qualifications and the CEFR

• Will this approach deliver the intended impacts as a key element of the UK’s National Languages Strategy?

Asset Languages: a new qualifications framework

• National Languages Strategy 2002 (Nuffield Inquiry 2000)
  Grew out of a perception that:
  • language learning in the UK is not successful
  • assessment is part of the problem

• Proposed: the Languages Ladder
  • ‘a new voluntary recognition system to complement existing national qualifications frameworks and the Common European Framework’

Problems with existing language frameworks

• Current qualifications: “confusing and uninformative about the levels of competence they represented” (Nuffield Languages Programme 2002)
  • New framework should stress meaningful proficiency levels

• Current qualifications do not support learning well
  • New framework should provide a “learning ladder” of bite-sized, accessible learning targets

• The CEFR is referred to as a model for addressing both problems

Asset Languages’ two main goals

• Two distinct goals:
  • Support language learning, providing a motivating ‘ladder’ of learning targets which enables recognition of each step achieved
  • Accredit useful language ability within a can-do framework, so that levels are comparable across languages

• A very complex framework:
  • over 20 languages
  • 6 CEFR levels
  • 3 contexts (primary, secondary, adult)
  • 2 assessment strands (external, teacher-assessed)
  • 4 skills accredited separately

• Needs a strong methodology for test development and validation
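The scale of the validation task can be made concrete with simple arithmetic over the framework dimensions listed above. Assuming, hypothetically, that every combination were offered, and taking 20 as a floor for the number of languages, the framework would contain 20 × 6 × 3 × 2 × 4 = 2,880 distinct language/level/context/strand/skill cells, each needing levels that are comparable with the others. The Python sketch below computes this; the dictionary keys are illustrative names, not terms from the scheme itself.

```python
from math import prod

# Framework dimensions from the slide. 20 is a floor ("over 20 languages"),
# and offering every combination is a hypothetical assumption.
DIMENSIONS = {
    "languages": 20,
    "cefr_levels": 6,
    "contexts": 3,            # primary, secondary, adult
    "assessment_strands": 2,  # external, teacher-assessed
    "skills": 4,              # accredited separately
}


def framework_cells(dimensions: dict[str, int]) -> int:
    """Number of distinct combinations across all dimensions (their product)."""
    return prod(dimensions.values())


total = framework_cells(DIMENSIONS)
per_language = total // DIMENSIONS["languages"]
print(total)         # 2880
print(per_language)  # 144
```

Even per language, 144 cells would need to be linked to the same CEFR levels, which is why comparability across languages calls for a systematic development and validation methodology.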

“… seek validity by design as a likely basis for washback” (Messick 1996: 252)

Inference to a framework (Dr Neil Jones)

[Diagram: a chain of inferences from test performance to test score, true score, the “real world” (target situation of use) and CEFR levels]

• Evaluation (scoring model): from test performance to test score
  How can we score what we observe? Relates to marking and rating criteria.

• Generalization (measurement model): from test score to true score
  Does the test measure consistently? Relates to test reliability, rater training, scale construction and version equating using IRT, etc.

• Extrapolation: from true score to the “real world” (target situation of use)
  Does the test score reflect the candidate’s actual ability? Relates to validity, e.g. a socio-cognitive model linking features of the learners, the test content and the skills to be measured.

• Idealization: from the specific testing context to a context-neutral framework (CEFR levels)
  How does the specific learning/testing context relate to a more general proficiency framework? This depends on identifying the salient features of the levels and of the specific learner group; not all salient features may be relevant to all groups. Quantitative and qualitative evidence may be provided.
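The chain of inferences on this slide can be sketched as an ordered sequence in which a claim is warranted only if its own link and every earlier link are backed by evidence. This is a hypothetical Python illustration: the `Inference` type, the `backed` flags and the example values are assumptions for demonstration, not part of the original slide or of any published validation tool.

```python
from dataclasses import dataclass


@dataclass
class Inference:
    name: str     # inference label from the slide, e.g. "Evaluation"
    source: str   # what the inference starts from
    target: str   # what it licenses a claim about
    backed: bool  # hypothetical flag: is this link supported by evidence?


def supported_claim(chain: list[Inference]) -> str:
    """Return the furthest point in the chain that the evidence reaches.

    Inferences are checked in order; the first unsupported link stops the
    argument, even if later links happen to be backed."""
    if not chain:
        raise ValueError("empty interpretive argument")
    reached = chain[0].source
    for step in chain:
        if not step.backed:
            break
        reached = step.target
    return reached


# Example: extrapolation evidence is (hypothetically) missing.
chain = [
    Inference("Evaluation", "test performance", "test score", backed=True),
    Inference("Generalization", "test score", "true score", backed=True),
    Inference("Extrapolation", "true score", "real world (target situation of use)", backed=False),
    Inference("Idealization", "specific testing context", "CEFR levels", backed=True),
]

print(supported_claim(chain))  # true score
```

The design choice here is that evidence for a later link (such as a CEFR alignment study) cannot compensate for a gap earlier in the chain.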