When What You Have Is Not Enough NCOLCTL 24 April 2009 Ray Clifford.
When What You Have Is Not Enough
NCOLCTL
24 April 2009
Ray Clifford
How Do You Understand the Title of This Session?
• In the LCTLs there is a shortage of…
– Expertise.
– Time.
– Textbooks.
– Tests.
– Patience.
– Students.
– Something else.
And the answer (from a test development perspective) is … there are often too few students to follow “normal” validation procedures for Reading and Listening proficiency tests.
But are you sure you want to use a proficiency test?
• Proficiency testing is not always the right choice.
Testing is complicated – but it is important!
• Language tests can motivate.
• Language tests can demotivate.
Language Testing and Motivation
• Appropriate tests can motivate learners to improve their skills.
• Appropriate tests can motivate teachers to refine their teaching to match their students’ needs.
• Inappropriate tests can de-motivate both students and teachers.
“Washback” Effects
• Testing has a negative impact when:
– Educational goals are reduced to those that are most easily measured.
– Testing procedures do not reflect course goals, for instance…
    • Giving multiple choice tests in writing classes.
    • Using grammar tests as a measure of general proficiency.
    • Basing speaking ability on pronunciation alone.
Washback Effects of Tests
• Testing has a positive impact when:
– Tests reinforce course objectives.
– Tests act as change agents for improving teaching and learning.
If Tests Are to Be Positive Motivators
• We have to select the right type of test for each testing purpose.
3 Major Types of Tests
• Achievement
• Performance
• Proficiency
3 Major Types of Learning
• Limited Transfer
• Near Transfer
• Far Transfer
Aligned Test and Learning Types
• Achievement (Limited Transfer)
– Memorized responses using the content of a specific textbook or curriculum.
• Performance (Near Transfer)
– Rehearsed ability to communicate in specific, familiar settings.
• Proficiency (Far Transfer)
– Unrehearsed general ability to accomplish real-world communication tasks across a wide range of topics and settings.
More on Types of Tests
• Achievement Tests measure:
– Rehearsed, memorized responses.
– What was taught.
– Content of a specific textbook or curriculum.
Sample Achievement Test Item
Complete the following with the correct verb form in the past tense.
(go) I _____________ to the United States last year.
(be) My seat on the plane _______ in business class.
(have) My associates and I _________ meetings each day.
(eat) We _________ at typical American restaurants.
More on Types of Tests
• Performance Tests measure:
– Semi-rehearsed and rehearsed responses.
– Ability to communicate in constrained, familiar, and predictable settings.
– What one can do with what has been taught and practiced.
Sample Performance Test Item
Complete the following sentences about an upcoming business trip. Add a minimum of 5 additional words to each sentence.
For an upcoming business trip I plan to __________________________________________.
I am certain that the trip will be successful, because __________________________________________.
More on Types of Tests
• Proficiency Tests measure:
– Spontaneous, unrehearsed communication ability.
– General ability to accomplish communication tasks in a variety of settings.
– Whether skills are transferable from one context to another.
Sample Proficiency Test Item
You will be taking a business trip abroad. Plan an itinerary that spends at least two days in each of the three cities you must visit and costs less than $4,000 for all travel expenses. Then negotiate with a travel agent to purchase the airplane tickets, arrange hotel reservations, and obtain sufficient information about local transportation options to be able to complete the trip within your budget.
What Distinguishes Proficiency Tests from Other Tests?
• They test real world tasks.
• They measure a person’s ability to function in a language.
• They provide an overall evaluation across a range of real-world tasks.
• They rate a person’s unrehearsed ability against a set of task, condition, and accuracy criteria.
ACTFL Proficiency Scale: Novice
Memorized language
• Lists words/phrases
– Telegraphic
• Attempts at conversation
– Reactive
• Limited topic areas
– Social courtesies
– Dates, numbers, colors
– Family, home, common objects
• May be difficult to comprehend beyond memorized material.
ACTFL Proficiency Scale: Intermediate
Survival Proficiency
• Has sufficient language to create and express own meaning
• Engage in simple conversation
• Deal with a simple social transaction
• Ask and answer questions
• Comprehensible to a sympathetic conversation partner
ACTFL Proficiency Scale: Advanced
Limited Work Proficiency
• Speaks with confidence
• Can narrate and describe in all major time frames
• Can elaborate, clarify, illustrate
• Can handle a situation with a complication
• Can be a “Story Teller”
• Fully comprehensible to native speakers
ACTFL Proficiency Scale: Superior
Professional Proficiency
• Can support opinions and hypothesize
• Converse both formally and informally
• Handle abstract treatment of subject
• No pattern of linguistic errors
ACTFL Criteria: Speaking
Quick Review
• 3 main types of Tests.
– Ac…
– Pe…
– Pr…
A Summary that Contrasts: Achievement, Performance and Proficiency
• Achievement (Memorized, Limited Transfer)
– Task: Repeat, produce, choose
– Context: Textbook, curriculum
– Accuracy: Determined by the teacher
• Performance (Rehearsed, Near Transfer)
– Task: Specific skills in familiar settings
– Context: Focused, constrained, or restricted
– Accuracy: Situation dependent
• Proficiency (Unrehearsed, Far Transfer)
– Task: A wide range of abilities
– Context: Broad, in-depth, variable
– Accuracy: Ascending expectations
Matching Test Type with Testing Purpose…
Some Common Testing Purposes
• Assigning grades in a class.
• Placing students into a sequence of courses.
• Selecting an applicant for a job with limited, static language requirements.
• Screening employees for future jobs with broad, general language requirements.
What would happen if students who were studying the same textbook were given achievement tests by different teachers?
• Unless the two tests asked exactly the same questions, the students would have different responses on one test than on the other.
• Even if the questions were the same, unless the teachers graded using exactly the same criteria, each student’s score would be different.
What would happen if the same students were tested on their rehearsed performance by University A and University B?
• Unless tests A and B covered exactly the same performance areas, the students’ performance on one test would be different from their performance on the other test.
• Even if the tests were identical, unless the raters from both Universities applied the same performance standards, the students would be given different ratings.
And what would happen if you compared students’ classroom achievement ratings with their performance ratings on a university test and with their proficiency ratings?
• Those who can pass an unrehearsed, general proficiency test can also pass a performance test and an achievement test.
• Those who can pass an achievement test or a rehearsed performance test may not be able to pass a general, unrehearsed proficiency test.
And what does all this mean?
All Three Types of Language Tests Are Needed.
3 Major Types of Tests
• Achievement = Memorized responses using the content of a specific textbook or curriculum.
• Performance = Rehearsed ability to communicate in constrained, familiar settings.
• Proficiency = Unrehearsed general ability to accomplish real-world communication tasks across a wide range of topics and settings.
Activity # 1
• You will be asked about 8 different testing purposes.
• For each of those test purposes, which type of test would you choose?
a. Achievement
b. Performance
c. Proficiency
Which test type would you choose: Achievement, Performance, or Proficiency?
1. To assess students’ language learning after Chapter 3 of a beginning language course?
2. To place students into a university’s sequence of courses?
3. To test students completing a year-long, intensive language course?
4. To screen job applicants for a specific job with well-defined, repetitive tasks?
5. To select someone to be your spokesperson on a news show with a “hostile” moderator?
6. To document employees’ language ability in their personnel records?
7. To compare the results of my students with those of other students using the same textbook?
8. To compare the skills of students in Study Abroad programs with “regular” students?
Solving Testing Problems
• “The solutions to our problems should be as simple as possible, but no simpler.” Albert Einstein
• There is no answer for the overly simple question of “Which test is best?”
• There is an answer to the question, “Which type of test is best for a given purpose?”
Which type of test is best?
• The test that matches the purpose for which the results will be used.
– Use achievement tests for testing mastery of lessons in a textbook.
– Use performance tests for checking rehearsed abilities within specific contexts.
– Use proficiency tests for determining general, unrehearsed ability in real-world situations.
If You Do Want to Test Reading and Listening Proficiency
• It is not as easy as you might think.
• Start by answering the question, “What is reading?”
A Proposed Definition of Reading
• Reading: The process of deriving meaning from the written symbols used to represent a given language.
But What is Reading Proficiency?
• Reading for achievement purposes may be defined differently for each curriculum.
• Reading for specific performance purposes can result in a different definition of reading for each purpose.
But What is Reading Proficiency?
• “Proficient Reading” has some consistent, core expectations:
– Understanding of texts for the purpose(s) for which they were written.
– Automatic comprehension rather than laborious decoding.
– Comprehension abilities that are sustained beyond one’s own areas of specialization.
A Proposed Definition of Reading Proficiency
• Proficient reading: The active, automatic process of using one’s internalized language and culture expectancy system to obtain new information and comprehend authors’ views and communicative purposes from the written language symbols those authors have used to communicate their messages.
A Proposed Definition of Reading Proficiency
• Note: Proficient readers can “read to learn”.
A Summary of Receptive Skill Contrasts: Achievement, Performance and Proficiency
• Achievement
– Author’s purpose & reader’s task: Understand discrete pieces of learned content (learning to read)
– Context: Textbook, curriculum
– Accuracy: Determined by the teacher
• Performance
– Author’s purpose & reader’s task: Understand new information within familiar contexts (learning to read)
– Context: Focused, restricted
– Accuracy: Situation dependent
• Proficiency
– Author’s purpose & reader’s task: Understand new information about unfamiliar topics (reading to learn)
– Context: Broad, in-depth, variable
– Accuracy: Ascending expectations aligned with increasing task complexity.
Tests of Reading and Listening should follow the central principles of proficiency testing.
• Does the test go beyond decoding?
• Are the tasks tested (questions asked) linked to specific proficiency levels?
• Do the ratings assigned represent a sustained ability across topical domains?
• Are the ratings based on non-compensatory task, domain, and accuracy criteria?
An Example: Total Score Versus Criterion-Referenced Scoring
(3 test takers with the same total score, but different proficiency levels)
Columns: Learner; Results @ Level 1; Results @ Level 2; Results @ Level 3; Overall Results; True Level
• Alice: overall results 65%
• Bob: overall results 65%
• Carol: overall results 65%
Criterion-Referenced Approach
• Report scores for each proficiency level separately.
• Check for “sustained” ability at each level.
• A notional reporting scale:
– Sustained (consistent evidence) ≈ 70% to 100%
– Developing (a lot; not sustained) ≈ 55% to 69%
– Emerging (some evidence) ≈ 26% to 54%
– Random (occasional evidence) ≈ 0% to 25%
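The notional scale above can be read as a simple lookup from a by-level percentage to a band label. The following is a minimal Python sketch of that reading, not part of the original slides. The function name `band` is hypothetical, and treating each band's lower bound as a hard cut-off is an assumption, since the slide gives the ranges only as approximate (≈).

```python
def band(percent_correct: float) -> str:
    """Map a by-level percentage score to a notional reporting band.

    Assumption: each band's lower bound from the slide is used as a
    hard cut-off, although the slide marks the ranges as approximate.
    """
    if not 0 <= percent_correct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_correct >= 70:
        return "Sustained"    # consistent evidence
    if percent_correct >= 55:
        return "Developing"   # a lot of evidence, but not sustained
    if percent_correct >= 26:
        return "Emerging"     # some evidence
    return "Random"           # occasional evidence


if __name__ == "__main__":
    for score in (85, 60, 40, 20):
        print(f"{score}% -> {band(score)}")
```

A score is banded separately at each proficiency level, so one learner receives one band per level rather than a single overall label.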
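Total Score Versus Criterion-Referenced Scoring
• Alice
– Results @ Level 1: 85% (Sustained)
– Results @ Level 2: 70% (Sustained)
– Results @ Level 3: 40% (Emerging)
– Overall results: 65%
– True level: Adv / 2 (Barely)
• Bob
– Results @ Level 1: 90% (Sustained)
– Results @ Level 2: 85% (Sustained)
– Results @ Level 3: 20% (Random)
– Overall results: 65%
– True level: Adv / 2 (Clearly)
• Carol
– Results @ Level 1: 90% (Sustained)
– Results @ Level 2: 60% (Developing)
– Results @ Level 3: 45% (Emerging)
– Overall results: 65%
– True level: Int Hi / 1+ (1 with developing ability @ 2)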
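The contrast between the two scoring approaches can be worked through in a few lines. The sketch below is illustrative only and rests on two assumptions that are not stated in the slides: that the overall result is the unweighted mean of the three by-level scores (which does reproduce the 65% shown for each learner), and that a learner's base level is the highest level reached with "Sustained" results (70% or more) at that level and at every level below it. The function names are hypothetical, and the "+" and "barely/clearly" refinements from the slide are not modelled.

```python
from statistics import mean

SUSTAINED_CUTOFF = 70  # lower bound of the notional "Sustained" band

# By-level percentage scores (Level 1, Level 2, Level 3) from the slide.
results = {
    "Alice": (85, 70, 40),
    "Bob": (90, 85, 20),
    "Carol": (90, 60, 45),
}


def total_score(scores):
    """Compensatory scoring: strong levels offset weak ones in the average."""
    return mean(scores)


def highest_sustained_level(scores):
    """Non-compensatory scoring: climb from Level 1 and stop at the first
    level that is not sustained. Returns 0 if Level 1 is not sustained."""
    level = 0
    for score in scores:
        if score < SUSTAINED_CUTOFF:
            break
        level += 1
    return level


if __name__ == "__main__":
    for name, scores in results.items():
        print(name, total_score(scores), highest_sustained_level(scores))
```

Under these assumptions all three learners share the same total score, while the non-compensatory rule places Alice and Bob at Level 2 and Carol at Level 1, in line with the base levels on the slide.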
Why Aren’t Criterion-Referenced Tests More Common?
• Traditional testing practices are predominately norm-referenced.
• There has been a lack of agreement on the construct to be tested.
• Descriptions of the receptive skills are quite complex.
For reading, how many rating profiles are possible?
• For 10 factors, each with 4 levels there are 40 cells in which a rating may be assigned.
• With one rating per factor, how many different profiles are possible?
[Rating grid: 10 factors by 4 rating levels]
• Author factors: Purpose, Topical Domains, Genre, Text Type, Accuracy
• Reader factors: Purpose, Topical Domains, Type of Reading, Reading Strategy, Accuracy
• Rating levels: Superior, Advanced, Intermediate, Novice
With 10 different factors, how many rating profiles are possible?
10 factors with 4 levels produces 4^10 combinations …
or 1,048,576 possible profiles.
[Rating grid: 10 Author and Reader factors by 4 rating levels, with 10 sample ratings marked, one per factor.]
How might this unwieldy complexity be made more manageable?
• We can reduce the scoring complexity by aligning the rating factors!
• For instance, it would make sense to align the author “topical domains” with the author “purposes” generally associated with those topics.
For 9 factors, how many rating profiles are possible?
9 factors with 4 levels produces 4^9 combinations …
or 262,144 possible profiles.
[Rating grid: Author and Reader factors by 4 rating levels, with 9 sample ratings marked.]
How might this unwieldy complexity be made more manageable?
• Every instance of alignment across factors significantly simplifies the testing and rating process.
• For instance, it would make sense to align the author “genre” with the author “purposes” and “topical domains” generally associated with those genres.
For 8 factors, how many rating profiles are possible?
8 factors with 4 levels produces 4^8 combinations …
or 65,536 possible profiles.
[Rating grid: Author and Reader factors by 4 rating levels, with 8 sample ratings marked.]
For 7 factors, how many rating profiles are possible?
7 factors with 4 levels produces 4^7 combinations …
or 16,384 possible profiles.
[Rating grid: Author and Reader factors by 4 rating levels, with 7 sample ratings marked.]
For 6 factors, how many rating profiles are possible?
6 factors with 4 levels produces 4^6 combinations …
or 4,096 possible profiles.
[Rating grid: Author and Reader factors by 4 rating levels, with 6 sample ratings marked.]
For 5 factors, how many rating profiles are possible?
5 factors with 4 levels produces 4^5 combinations …
or 1,024 possible profiles.
[Rating grid: Author and Reader factors by 4 rating levels, with 5 sample ratings marked.]
For 4 factors, how many rating profiles are possible?
4 factors with 4 levels produces 4^4 combinations …
or 256 possible profiles.
[Rating grid: Author and Reader factors by 4 rating levels, with 4 sample ratings marked.]
For 3 factors, how many rating profiles are possible?
3 factors with 4 levels produces 4^3 combinations …
or 64 possible profiles.
[Rating grid: Author and Reader factors by 4 rating levels, with 3 sample ratings marked.]
For 2 factors, how many rating profiles are possible?
2 factors with 4 levels produces 4^2 combinations …
or 16 possible profiles.
[Rating grid: Author and Reader factors by 4 rating levels, with 2 sample ratings marked.]
For 1 factor per level, how many rating profiles are possible?
1 factor with 4 levels produces 4^1 combinations …
or 4 possible profiles.
[Rating grid: aligned Author and Reader factors by 4 rating levels, with 1 sample rating marked.]
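The profile counts quoted on the preceding slides all follow from one formula: with one rating per factor, the number of possible profiles is the number of levels raised to the number of independent factors. The short Python sketch below recomputes the counts and brute-force checks one small case. The function name `profile_count` is hypothetical, and the sketch assumes, as the slides do, that every remaining factor can take any of the 4 levels independently of the others.

```python
from itertools import product

LEVELS = ("Novice", "Intermediate", "Advanced", "Superior")


def profile_count(factors: int, levels: int = len(LEVELS)) -> int:
    """Number of distinct rating profiles with one rating per factor."""
    return levels ** factors


if __name__ == "__main__":
    # Each alignment of two factors removes one independent factor,
    # which divides the number of possible profiles by 4.
    for n in range(10, 0, -1):
        print(f"{n:2d} factors: {profile_count(n):>9,} possible profiles")

    # Brute-force check for a small case: enumerate every 3-factor profile.
    enumerated = list(product(LEVELS, repeat=3))
    assert len(enumerated) == profile_count(3)
```

Seen this way, fully aligned factors leave a single rating decision with 4 outcomes, one per proficiency level.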
Benefits of Aligning Factors
• Complexity is reduced.
• Each level becomes a separate “Task, Condition, and Accuracy” ability criterion.
• This hierarchy of levels establishes by-level criteria for measuring “reading proficiency”.
• With this ascending hierarchy of criteria, raters can look for sustained ability.
• Students’ abilities can be compared regardless of the textbook used or the program attended.
Proficiency Tests Are Not Incremental Progress Tests
[Rating grid: aligned Author and Reader factors by 4 rating levels (Superior, Advanced, Intermediate, Novice).]
Proficiency Tests Are Milestone Tests
[Rating grid: aligned Author and Reader factors by 4 rating levels (Superior, Advanced, Intermediate, Novice).]
Activity # 2
• Can you rank these 4 reading passages from easiest to most difficult?
• Renumber the passages according to their relative difficulty.
– 1 = Easiest
– 2 = 2nd easiest
– 3 = the 2nd most difficult
– 4 = the most difficult
• Justify your ranking decisions.
Activity # 3
• Align each of these 4 reading passages with the proficiency levels summarized in the “text characteristics” handout.
• Justify your proposed alignment decisions.
Activity # 4
• Write an “aligned” question for each text.
• Does the question you wrote require the test taker to read the text for the purpose for which the author wrote it?
Conclusion
• Proficiency tests are criterion-referenced tests.
• If criterion-referenced tests are well constructed, they can be scored based on the criteria they are designed to measure.
• Such criterion-referenced scoring does not require the testing of hundreds of test takers to be able to interpret the test results.
Add Handouts
• 4 texts Levels 0 – 3 … fair use?
• Overview of …