NATO BAT Testing: The First 200

BILC Professional Seminar

6 October, 2009

Copenhagen, Denmark

Dr. Elvira Swender, ACTFL

This Report

1. History of Benchmark Advisory Tests (BAT)

2. 2009 Administration of BAT in 4-Skills

3. BAT Scores

4. Comparing National Scores to Benchmark Scores

5. Observations

Why Benchmark Testing?

• To provide an external measure against which nations can compare their national STANAG test results

• To promote relative parity of scale interpretation and application across national testing programs

• To standardize what is tested and how it is tested

BAT History

• Launched as a volunteer, collaborative project
  – The BILC Test Working Group
    • 13 members from 8 nations
    • Contributions received from many other nations
  – The original goal was to develop a Reading test

• Later awarded a competitive contract by ACT
  – December, 2006

BAT History (cont’d)

• ACTFL working with BILC Working Group
  – To develop tests in 4 skill modalities

• Reading and Listening tests piloted and validated

• Speaking and Writing tests developed
  – Testers and raters trained and certified

• Test administration and reporting protocols developed

• 200 BAT 4-skills tests allocated under the contract

• Tests administered and rated

• Scores reported to Nations

BAT Reading and Listening Tests

• Internet-delivered and computer scored

• Criterion-referenced tests
  – Allow for direct application of the STANAG Proficiency Scale

• Each proficiency level is tested separately
  – Test takers take all items for Levels 1, 2, 3
  – 20 texts at each level; one item with multiple choice responses per text

• The proficiency rating is assigned based on two separate scores
  – “Floor”: sustained ability across a range of tasks and contexts specific to one level
  – “Ceiling”: non-sustained ability at the next higher proficiency level
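The floor and ceiling logic can be sketched in code. This is only an illustration: the slides do not give the cut scores, so the thresholds `SUSTAINED` and `PARTIAL`, the function name `assign_rating`, and the rule that a partial ceiling score produces a "plus" rating are all assumptions made for the example, not the actual BAT scoring rules.

```python
# Hypothetical cut scores; the actual BAT values are not given in the slides.
SUSTAINED = 0.70  # share correct treated as sustained ability at a level
PARTIAL = 0.50    # share correct treated as partial ability at the next level

ITEMS_PER_LEVEL = 20  # from the slides: 20 texts per level, one item per text


def assign_rating(correct_by_level):
    """Return a rating such as '1', '1+' or '3' from per-level correct counts.

    correct_by_level maps each tested level (1, 2, 3) to the number of
    items answered correctly at that level.
    """
    floor = 0
    for level in (1, 2, 3):
        if correct_by_level[level] / ITEMS_PER_LEVEL >= SUSTAINED:
            floor = level  # ability is sustained at this level
        else:
            break  # the first level that is not sustained ends the floor search
    if floor == 3:
        return "3"  # highest level tested; there is no ceiling level above it
    # Ceiling: performance at the level just above the floor (assumed rule).
    ceiling_share = correct_by_level[floor + 1] / ITEMS_PER_LEVEL
    plus = "+" if ceiling_share >= PARTIAL else ""
    return f"{floor}{plus}"


print(assign_rating({1: 18, 2: 15, 3: 6}))  # sustained at 1 and 2, weak at 3
print(assign_rating({1: 18, 2: 11, 3: 4}))  # sustained at 1, partial at 2
```

With these assumed thresholds, the first test taker is rated 2 and the second 1+. Because each level is scored separately, a high score at Level 3 cannot make up for a weak Level 2 score in this sketch.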

BAT Speaking Test

• Telephonic Oral Proficiency Interview
  – Goal is to produce a speech sample that best demonstrates the speaker’s highest level of spoken language ability across the tasks and contexts for the level

• Interview consists of
  – Standardized structure of “level checks” and “probes”
  – NATO-specific role-play situation

• Conducted and rated by one certified BAT-S Tester
  – Independently second rated by a separate certified tester or rater

• Ratings must agree exactly
  – Level and plus level scores are assigned
  – Discrepancies are arbitrated
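The double-rating rule can be sketched as a small function. The names `SCALE`, `final_rating`, and `arbitrate` are hypothetical. The slides do not say how arbitration is decided, so the arbitrator here is a caller-supplied stand-in rather than the BAT procedure.

```python
# Ratings that appear in the reported BAT scores (levels and plus levels).
SCALE = ("0+", "1", "1+", "2", "2+", "3")


def final_rating(first, second, arbitrate):
    """Return the agreed rating, or the arbitrated one if the raters differ.

    first, second: ratings from two independent certified testers/raters.
    arbitrate: hypothetical callable that resolves a discrepancy.
    """
    for rating in (first, second):
        if rating not in SCALE:
            raise ValueError(f"unknown rating: {rating!r}")
    if first == second:  # ratings must agree exactly, plus levels included
        return first
    return arbitrate(first, second)


# Stand-in arbitrator for the example only: keep the lower of the two ratings.
def lower_of(a, b):
    return min(a, b, key=SCALE.index)


print(final_rating("2", "2", lower_of))   # exact agreement, no arbitration
print(final_rating("2", "2+", lower_of))  # discrepancy goes to the arbitrator
```

Note that 2 and 2+ count as a discrepancy here, since exact agreement applies to plus levels as well as base levels.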

BAT Writing Test

• Internet-delivered

• Open constructed response

• Four multi-level prompts
  – Prompts target tasks and contexts of STANAG levels 1, 2, 3
  – NATO-specific prompt

• Rated by a minimum of two certified BAT-W Raters
  – Ratings must agree exactly
  – Level and plus level scores are assigned
  – Discrepancies are arbitrated

2009 BAT Administration

• Allocation to 11 Nations
  – 8 Nations have completed testing

• Testing began in May, 2009

• Tests administered by LTI, the ACTFL Testing Office

2009 BAT Administration

• Each Nation has a customized client site
  – Request tests
  – View and print test schedules
  – Obtain test administration instructions, passwords, and test codes
  – Retrieve ratings

Total Number of BAT Scores

Skill       BAT
Listening   119
Speaking    115
Reading     119
Writing     115

BAT Scores by Level Cumulative

[Bar chart: number of BAT scores at each level (0+, 1, 1+, 2, 2+, 3) for Listening, Speaking, Reading, and Writing; vertical axis from 0 to 70]

Alignment of National Scores and BAT Scores

          Listening    Speaking     Reading      Writing
Black     (5) 40%      (7) 29%      –            –
White     (11) 64%     (18) 56%     (13) 92%     (18) 39%
Red       (18) 89%     (18) 83%     (18) 83%     (18) 50%
Blue      (20) 85%     (19) 47%     (20) 55%     (20) 60%
Maroon    (16) 69%     (15) 47%     (14) 64%     (18) 50%
Purple    (12) 8%      –            (13) 54%     –
Yellow    (17) 24%     (18) 0%      (18) 33%     (18) 0%

Observations – Listening Scores

• Exact agreement of BAT and National Scores is 58%
  – 69 of the 119 Listening scores agree exactly

• When the scores disagree, the National score is HIGHER 88% of the time

• In 8 cases (7%), the disagreement is across two levels
  – 1 vs 3 and 2 vs 4

Observations – Speaking Scores

• Exact agreement of BAT and National Scores is 46%
  – 53 of the 115 Speaking scores agree exactly

• When the scores disagree, the National score is HIGHER in all cases

• In 6 cases (6%), the disagreement is across two levels
  – 1 vs 3 and 2 vs 4

Observations – Reading Scores

• Exact agreement of BAT and National Scores is 62%
  – 74 of the 119 Reading scores agree exactly

• When the scores disagree, the National score is HIGHER in 85% of the cases

• In 2 cases, the disagreement is across two levels
  – 1 vs 3

Observations – Writing Scores

• Exact agreement of BAT and National Scores is 38%
  – 44 of the 115 Writing scores agree exactly

• When the scores disagree, the National score is HIGHER in all cases

• In 15 cases, the disagreement is across two levels
  – 1 vs 3 and 2 vs 4
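The exact-agreement percentages on the four Observations slides follow directly from the counts they report. A short sketch that reproduces them is given here; the counts come from the slides, while the names `AGREEMENT` and `percent` are made up for the example.

```python
# (scores in exact agreement, total scores) per skill, as reported on the slides
AGREEMENT = {
    "Listening": (69, 119),
    "Speaking": (53, 115),
    "Reading": (74, 119),
    "Writing": (44, 115),
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


for skill, (agree, total) in AGREEMENT.items():
    print(f"{skill}: {agree} of {total} = {percent(agree, total)}% exact agreement")
```

The rounded results are 58%, 46%, 62%, and 38%, matching the slides: agreement is higher for the receptive skills (Reading, Listening) than for the productive skills (Speaking, Writing).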

Accounting for Strictness or Leniency

• Testing rehearsed rather than unrehearsed material
  – Performance vs proficiency

• Inconsistencies in interpretation of the STANAG

• When “plus” ratings are not used, the tendency to award the next higher level rating to a performance that is substantially better than a baseline performance

For Receptive Skills

• Compensatory cut score setting

• Lack of alignment of author purpose, text type, and reader task at level

• Inadequate item response alternatives

For Productive Skills

• Misalignment of test type and test purpose
  – Ex: list of discrete questions when goal is to measure spoken language proficiency

• Inadequate tester/rater norming

Plus Ratings

• Within the Level 1 Range
  – 60% of ratings are 1
  – 40% of ratings are 1+

• Within the Level 2 Range
  – 50% of ratings are 2
  – 50% of ratings are 2+

Profiles

• Only 12 of 115 profiles (10%) were “flat”
  – 1 1 1 1 (8)
  – 2 2 2 2 (2)
  – 3 3 3 3 (2)

• All remaining profiles are mixed
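A "flat" profile is one where all four skill ratings are the same. A minimal sketch of that check follows; the function name `is_flat` and the two sample profiles are invented for illustration and are not taken from the BAT data.

```python
SKILLS = ("Listening", "Speaking", "Reading", "Writing")


def is_flat(profile):
    """Return True when the ratings for all four skills are identical."""
    return len({profile[skill] for skill in SKILLS}) == 1


# Invented sample profiles, for illustration only.
flat = {"Listening": "2", "Speaking": "2", "Reading": "2", "Writing": "2"}
mixed = {"Listening": "3", "Speaking": "2", "Reading": "3", "Writing": "1+"}

print(is_flat(flat), is_flat(mixed))

# Share of flat profiles reported on the slide: 12 of 115.
flat_share = round(100 * 12 / 115)
print(f"{flat_share}% of profiles are flat")
```

The last line reproduces the 10% figure from the slide.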

We are all wondering: What will the future bring?

Let’s hope it’s not the same kind of anxiety these early linguists experienced.

Questions?

Extra Slides

Side-by-side BAT and National Test Scores

Skill       BAT scores only   BAT Scores and National Scores
Reading     119               103
Listening   119               100
Speaking    115               95
Writing     115               95

BAT Scores by Level Reading

[Bar chart: number of BAT-R scores at each level, 0+ through 3]

Level   BAT-R   % of Total
0+      1       1
1       11      9
1+      13      11
2       16      14
2+      12      10
3       66      55
Total   119

BAT Scores by Level Listening

[Bar chart: number of BAT-L scores at each level, 0+ through 3]

Level   BAT-L   % of Total
0+      3       2
1       12      10
1+      19      16
2       15      13
2+      21      18
3       49      41
Total   119

BAT Scores by Level Speaking

[Bar chart: number of BAT-S scores at each level, 0+ through 3]

Level   BAT-S   % of Total
1       10      9
1+      22      19
2       39      34
2+      29      25
3       15      13
Total   115

BAT Scores by Level Writing

[Bar chart: number of BAT-W scores at each level, 0+ through 3]

Level   BAT-W   % of Total
1       11      10
1+      28      24
2       51      44
2+      22      19
3       3       3
Total   115

Comparing Scores by Level Reading

          BAT-R   National Test
Level 1   23      9
Level 2   23      35
Level 3   55      49
Level 4   -       10

              BAT L1   BAT L2   BAT L3
National L1   9        -
National L2   12       17       6
National L3   2        5        40
National L4                     10

Comparing Scores by Level Listening

          BAT-L   National Test
Level 1   24      12
Level 2   29      28
Level 3   44      52
Level 4   -       8

              BAT L1   BAT L2   BAT L3
National L1   10
National L2   8        15       5
National L3   6        12       33
National L4            2        6

Comparing Scores by Level Speaking

          BAT-S   National Test
Level 1   28      11
Level 2   52      34
Level 3   15      44
Level 4   -       6

              BAT L1   BAT L2   BAT L3
National L1   11
National L2   14       20
National L3   4        28       12
National L4            4        2

Comparing Scores by Level Writing

          BAT-W   National Test
Level 1   35      14
Level 2   57      36
Level 3   3       35
Level 4   -       10

              BAT L1   BAT L2   BAT L3
National L1   14
National L2   16       20
National L3   5        27       3
National L4            10
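A cross-tabulation like the Writing one can be summarized with a few sums. The sketch below is illustrative only. The cell counts are the Writing figures from the slide, but the column each count belongs to is inferred from the BAT column totals (35, 57, 3) and should be treated as an assumption. The names `WRITING` and `summarize` are hypothetical. The sketch counts whole levels only, so its agreement figure need not match the one on the Observations slide.

```python
# Writing cross-tabulation: (national_level, bat_level) -> number of test takers.
# Column placement is inferred from the column totals and is an assumption.
WRITING = {
    (1, 1): 14,
    (2, 1): 16, (2, 2): 20,
    (3, 1): 5, (3, 2): 27, (3, 3): 3,
    (4, 2): 10,
}


def summarize(crosstab):
    """Count agreement and disagreement patterns in a level cross-tabulation."""
    cells = crosstab.items()
    return {
        "total": sum(crosstab.values()),
        "exact": sum(n for (nat, bat), n in cells if nat == bat),
        "national_higher": sum(n for (nat, bat), n in cells if nat > bat),
        "two_levels_apart": sum(n for (nat, bat), n in cells if abs(nat - bat) == 2),
    }


summary = summarize(WRITING)
print(summary)
```

Under that placement the 95 paired Writing scores include 15 that are two levels apart (5 at 1 vs 3 and 10 at 2 vs 4), which agrees with the 15 cases noted in the Writing observations, and no cell has the BAT level above the National level.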
