Teaching and Testing: Opportunities for Learning
June 2006
Hellenic American Union, Athens
A model for investigating the impact of
language assessment within
state educational contexts
Nick Saville
Outline
• Background - a personal perspective
  • 1980s
  • Messick, Bachman – early 1990s
• The literature on washback/impact
  • early work and recent progress
  • gaps? where next?
• Analysis of three case studies – what can be learnt?
• Towards a comprehensive Model of Impact
• Applying the model in a state educational context
  • the Asset Languages Project
Practicality
[Diagram: Test at the centre, surrounded by V, R and P]
“Practicality in Language Testing: an educational management model”
Main argument: test development is a form of educational innovation - and needs to be managed as such
“... achieving a balance between the purpose of the test, its validity for the purpose, the required reliability for the purpose and the constraints imposed by the context is essentially the task facing the test designer ….”
Saville (1990), University of Reading - based on a test development project in Japan, 1987-9
Putting the test into context
[Diagram: Test at the centre, surrounded by V, R and P]
The aim … is not only to encourage good testing practice, but to prevent bad tests being produced ....
... a bad test is not only one with low reliability and dubious validity but also one which has a damaging backwash on the curriculum.
Saville 1990:11-13
A logical consequence …. is that ethicality will be achieved as a result ..
……. this is because any test which is produced should be appropriate to the educational context in which it is to be used and the effect on learners and institutions will be a major consideration.
U = V + R + I + P
Bachman and Palmer, 1996: U = Cv + A + I + R + I + P
Developing “useful tests”, fit for purpose
Balancing the test qualities
Usefulness as overall validity
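The two usefulness formulas above can be written out with their symbols expanded. This is a reading aid under stated assumptions, not part of the original slide. It assumes V, R, I and P stand for validity, reliability, impact and practicality, and that Bachman and Palmer's terms are construct validity, authenticity, interactiveness, reliability, impact and practicality. The subscripts on the two I terms are added here only to tell them apart, and the order in which they are assigned is a guess. The plus signs denote a balance of qualities, not arithmetic addition. The block needs the amsmath package.

```latex
\begin{align*}
U &= V + R + I + P
  && \text{validity, reliability, impact, practicality} \\
U &= C_v + A + I_{\text{int}} + R + I_{\text{imp}} + P
  && \text{Bachman and Palmer (1996)}
\end{align*}
% C_v: construct validity, A: authenticity,
% I_int: interactiveness, I_imp: impact (subscripts are illustrative)
```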
• Maxims for achieving/monitoring intended impact
Maxim 1 PLAN: Use a rational and explicit approach to test development
Maxim 2 SUPPORT: Support stakeholders in the testing process
Maxim 3 COMMUNICATE: Provide comprehensive, useful and transparent information
Maxim 4 MONITOR and EVALUATE: Collect all relevant data and analyse as required.
Milanovic and Saville 1995 – Considering the impact of the Cambridge EFL examinations
The literature on washback/impact
• Readings in the language testing literature:
  • Hamp-Lyons (1989)
  • Wall and Alderson (1993) Does washback exist? etc.
  • Language Testing (1996: 13, 3) Messick, Bailey, etc.
  • Hamp-Lyons (1997)
  • Watanabe (1997)
  • Cheng and Watanabe (eds) (2004)
• Recent PhD studies and subsequent books in SILT series based on research conducted in the 1990s:
  • Cheng (SILT 21 - 2005)
  • Wall (SILT 23 - 2005)
  • Hawkey (SILT 24 - 2006)
  • Green (SILT 25 forthcoming) - “washback in context”
• Current work in Lancaster, ETS, UCLA, Cambridge, IELTS funded projects etc.
[Diagram: model of washback direction, variability and intensity]
• Washback direction: focal construct; test design characteristics (item format, content, complexity, etc.); overlap; potential for positive backwash; potential for negative backwash
• Washback variability: participant characteristics and values (knowledge/understanding of test demands; resources to meet test demands; acceptance of test demands); other stakeholders (course providers, materials writers, publishers, teachers, learners)
• Washback intensity: perception of test importance (unimportant to important); perception of test difficulty (easy, challenging, unachievable); backwash to participant (no backwash to intense backwash)
Tony Green, IELTS Washback in Context: Preparation for academic writing in higher education (SILT 25 - forthcoming)
The model starts from test design characteristics and related validity issues of construct representation identified with washback by Messick (1996).
Washback will be most intense – have the most powerful effects on teaching and learning behaviours – where participants see the test as challenging and the results as important (see the blue arrow in the diagram).
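The intensity claim above can be sketched as a small lookup. This is an illustrative sketch only, not Green's model. The names `Importance`, `Difficulty` and `washback_intensity` are hypothetical. Only one cell comes from the text: important results plus a challenging test give intense washback. Treating unimportant results as giving no backwash is a reading of the diagram labels, and the "weak" label for the remaining cells is an assumption.

```python
from enum import Enum


class Importance(Enum):
    UNIMPORTANT = "unimportant"
    IMPORTANT = "important"


class Difficulty(Enum):
    EASY = "easy"
    CHALLENGING = "challenging"
    UNACHIEVABLE = "unachievable"


def washback_intensity(importance: Importance, difficulty: Difficulty) -> str:
    """Return a coarse, illustrative intensity label for one participant.

    Source-backed cell: important + challenging -> "intense".
    Assumed cells: unimportant -> "none"; everything else -> "weak".
    """
    if importance is Importance.UNIMPORTANT:
        return "none"      # assumption based on the diagram labels
    if difficulty is Difficulty.CHALLENGING:
        return "intense"   # stated in the text
    return "weak"          # assumption: easy or unachievable tests


print(washback_intensity(Importance.IMPORTANT, Difficulty.CHALLENGING))
```

The function describes one participant's perceptions, which fits the model's point that washback varies between participants.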
The literature on washback/impact
So
• Impact is relatively new in the field of language assessment - an extension of the notion of washback and related to ethicality
• It is now considered to be of growing importance
• It is part of a validity argument and evidence needs to be provided
Broadly speaking there is consensus
• washback is an aspect of impact related to the “micro contexts” of the classroom and the school (teaching and learning)
• impact deals with wider influences and includes the “macro contexts” - tests and examinations in society
BUT
• The dynamics between the micro and macro contexts mean that this is a complex rather than a simple relationship - a “complex dynamic system”
The literature on washback/impact
And currently:
• there is no comprehensive model of test or examination impact within educational contexts
• impact has not yet been fully integrated into an approach to test development and validation in a systematic way
Three case studies – 1995 to 2004
• Case 1 - the world-wide survey of the impact of IELTS
  • a starting point for the work and the original model for what has followed
  • a conceptualisation of impact and design/validation of suitable instruments to investigate it
• Case 2 - the Italian PL2000 project
  • an application of the model within a macro educational context
  • an initial attempt at applying the approach on a limited basis within a state educational context
• Case 3 - the Florence Language Learning Gains Project
  • an extension and re-application of the model within a single school context
  • at the micro level, focusing on individual stakeholders within a single language teaching institution
Hawkey – SILT 24 (2006)
Learning from the case studies
• What can be learned using these specific impact projects as meta-data?
• Three key factors of contemporary educational systems need to be accounted for:
1. the nature of complex dynamic systems
2. the roles that stakeholders play within such systems
3. the need to see assessment projects as educational innovations within the systems and to manage change effectively
2. The roles that stakeholders play
Context
[Diagram: Stakeholders in the Testing Community]
• Input to test design, provided by stakeholders: Government agencies; Professional bodies; Learners; Teachers, Heads; School owners; Test writers; Consultants; Examiners; Test centre administrators; Materials writers; Publishers (inter alia)
• Testing system (Cambridge ESOL): test construct; test format; test conditions; test assessment criteria; test scores
• Context of test use, where decisions are made by stakeholders using test scores: Learners; Parents/carers; Teachers, Heads; School owners; Receiving institutions; Government agencies; Professional bodies; Employers; Academic researchers (inter alia)
3. The need to see assessment projects as educational innovations and to manage change effectively
• See Wall (2005) … a case study using insights from testing and innovation theory, e.g. Henrichsen (1989)
Hybrid Model of the Diffusion / Implementation Process
Antecedents -> Process -> Consequences
Learning from the case studies
• When applied to language assessment, two key factors also need to be accounted for:
1. the nature of language itself as a socio-cognitive phenomenon - the latest views on validity
2. the nature of the test development and validation process
  • from conception to routine data collection and analysis
• Impact research, therefore, is another kind of validation activity
Investigating impact as validation
• The investigation of impact is not a discrete or one-off activity
• It is an essential component in establishing the overall validity (usefulness) of an assessment system in terms of its fitness for specific purposes and contexts of use
• The proposed model locates the study of test impact as one of a set of research and development tools within an iterative approach to on-going test validation
• It is consistent with Messick 1996: “In essence ... test validation is empirical evaluation of meaning and consequences of measurement, taking into account extraneous factors in the applied setting that might erode or promote validity of local score interpretation and use.”
1. A SOCIO-COGNITIVE FRAMEWORK (Messick, Bachman, Kane, Mislevy, Weir, etc.)
[Diagram, built up over three slides: Theory; Test Taking Context (TLU, learning context, context of score use); consequential aspects of validity]
The testing system
• Core construct
The contexts
• Learning contexts
• Testing contexts
• Use of results contexts
Model of the Test Development Process
Identifying stakeholders and their needs
Linking these needs to the requirements of test usefulness - including predicted impact
- theoretical
- practical
Long term, iterative processes - a key feature of validation
Next phase: applying the model
• Asset Languages within the UK educational context
Welcome to Asset Languages
Asset Languages is a new way of motivating languages learners and rewarding their language skills.
The Asset Languages assessment scheme is for language learners of all ages and abilities: from primary school through to further, higher and adult education. Asset Languages is being developed by Cambridge Assessment through OCR and Cambridge ESOL, as part of the DfES' National Languages Strategy.
• Asset is being developed by Cambridge Assessment and is a voluntary “recognition scheme” based on the Languages Ladder to complement existing qualifications and the CEFR
• Will this approach deliver the intended impacts as a key element of the UK’s National Languages Strategy?
Asset Languages: a new qualifications framework
• National Languages Strategy 2002 (Nuffield Inquiry 2000):
Grew out of a perception that:
  • Language learning in the UK is not successful
  • Assessment is part of the problem
• Proposed: the Languages Ladder
  • 'a new voluntary recognition system to complement existing national qualifications frameworks and the Common European Framework'
Problems with existing language frameworks
• Current qualifications “confusing and uninformative about the levels of competence they represented” (Nuffield Languages Programme 2002)
• New framework should stress meaningful proficiency levels
• Current qualifications do not support learning well
• New framework should provide a “learning ladder” of bite-sized, accessible learning targets
• The CEFR referred to as a model for addressing both problems
Asset Languages’ two main goals
• Two distinct goals
  • Support language learning, providing a motivating 'ladder' of learning targets which enables recognition of each step achieved
  • Accredit useful language ability within a can do framework, so that levels are comparable across languages
• A very complex framework:
  • Over 20 languages
  • 6 CEFR levels
  • 3 contexts (primary, secondary, adult)
  • 2 assessment strands (external, teacher-assessed)
  • 4 skills accredited separately
• Needs a strong methodology for test development and validation
“ … seek validity by design as a likely basis for washback” (Messick 1996: 252)
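To give a rough sense of the framework's scale, the dimensions listed above can be multiplied out. This is a back-of-envelope sketch with two loud assumptions: "over 20 languages" is entered as exactly 20, and every combination of language, level, context, strand and skill is treated as a separate assessment, which the slide does not claim. The names `dimensions` and `grid_size` are illustrative.

```python
from math import prod
from typing import Mapping

# Dimension sizes quoted on the slide. "Over 20 languages" is entered as 20,
# so the product describes a hypothetical full grid, not an actual count.
dimensions = {
    "languages": 20,
    "cefr_levels": 6,
    "contexts": 3,            # primary, secondary, adult
    "assessment_strands": 2,  # external, teacher-assessed
    "skills": 4,              # accredited separately
}


def grid_size(dims: Mapping[str, int]) -> int:
    """Number of cells if every combination of the dimensions were offered."""
    return prod(dims.values())


print(grid_size(dimensions))  # 2880
```

Even if only some of these cells are ever populated, the number suggests why a systematic development and validation method is called for.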
Inference to a framework - Dr Neil Jones
[Diagram: chain of inferences linking test performance; test score; true score; “real world” (target situation of use); CEFR levels]
• Evaluation (scoring model): How can we score what we observe? Relates to marking, rating criteria.
• Generalization (measurement model): Does the test measure consistently? Relates to test reliability, rater training, scale construction and version equating using IRT, etc.
• Extrapolation: Does the test score reflect the candidate’s actual ability? Relates to validity, e.g. a socio-cognitive model linking features of the learners, the test content and the skills to be measured.
• Idealization (specific testing context, link to context-neutral framework): How does the specific learning/testing context relate to a more general proficiency framework? Depends on identifying the salient features of the levels and the specific learner group - not all salient features may be relevant to all groups. Quantitative and qualitative evidence may be provided.