Report on the Standardisation of National Reading Tests for 2013 ...

1

for the Welsh Government

Report on the Standardisation of National Reading Tests for 2013 Overview report

December 2012

Contents

1 Introduction 1

2 Structure of the standardisation 1

3 The samples 73.1 Sample representation 73.2 Learner data 17

4 Statistical outcomes 194.1 Year 2/3 test – Welsh medium 194.2 Anchor test outcomes 234.3 Bilingual sample outcomes 26

5 Feedback 29

6 Marker panel outcomes 306.1 Outline of the day 306.2 Background of participants 306.3 Introduction to test materials 316.4 Observations during marking 326.5 Feedback after marking 336.6 Optional open-response questions 356.7 Outcomes 366.8 Conclusions and recommendations from marker panel meeting 38

7 Conclusions and Recommendations 30

Appendix 1: Year 2/3 Welsh-medium Test Score Distributions 41

1

1 Introduction This report presents an overview of the findings of the standardisation of tests proposed for use as the National Reading Tests in 2013 and subsequently. A two-week trialling period was offered to schools and the standardisation took place 8th – 19th October 2012. The main purpose of the standardisation was to provide information about whole test functioning, to gather data in order to generate the progress scale and age-standardised scores and to finalise mark schemes and associated teacher guidance.

2 Structure of the standardisation A suite of eight ‘main’ tests were standardised. One test in English and one in Welsh (not translations of each other) was developed for each of year 2/3, year 4/5, year 6/7 and year 8/9. All learners took one ‘main’ test and one other test; either an anchor test or an open response test. Each ‘main’ test consisted of three sections. The Year 2/3 test consisted of a sentence completion section followed by two texts with associated items. The tests for all the other year groups consisted of three texts with associated items. The total number of marks available in each ‘main’ test booklet ranged between 35 and 40 marks. Questions were presented in a variety of formats. The majority of learners also took one of three anchor tests (Anchor A-C in either English or Welsh). Information about the anchor tests, their purpose and the test outcomes are provided in section 4.2. A smaller number of learners took one of the optional open response tests which consisted of three open response questions based on the three texts in the ‘main’ test. These are designed for diagnostic purposes. Each test instrument had a unique identifying number and all materials were logged in and out. For reasons of security, the test was conducted by NFER administrators, who, together with each teacher supervising a class, signed statements of confidentiality. Questionnaires were completed by administrators for each visit and their feedback is included in the relevant reports. All materials were trialled in black and white.

2

The trial was administered in two sessions, with the ‘main’ test items being trialled in session 1 and the anchor or open questions being trialled in session 2. It was recommended that learners had a break between the two sessions. As the tests were taken early in the academic year, samples of learners were drawn from the year above the lowest target age group in order to ensure full coverage of the age range that the ‘live’ tests will need to accommodate (i.e. the Year 4/5 test was taken by Year 5 and Year 6 learners). However, for the Y2/3 test, an additional sample of Year 2 learners was included to ensure full coverage of the age and ability range at the lower end of the scale. For the Year 2/3 test (taken by learners in years 2-4), administrators were asked to liaise with teachers about the length of each session based on the needs of the class. This was to allow for each test to be ‘chunked’ so that learners could do the ‘main’ test in smaller amounts to suit their needs. However, the advised time allowance was for up to 50 minutes to complete session 1 and up to 20 minutes to complete session 2. For years 5-10, it was recommended that session 1 lasted up to 60 minutes and session 2 lasted up to 20 minutes. Tables 2.1 – 2.3 show the numbers of test booklets despatched and completed.

3

Table 2.1 Number of tests sent to schools and completed: English-medium samples

Target year groups (for ‘live’ test) Year 2/3 Year 4/5 Year 6/7 Year 8/9

Year group taking the test at standardisation Year 2 Year 3 Year 4 Year 5 Year 6 Year 7 Year 8 Year 9 Year 10

Main test name EM1 EM2 EM3 EM4 Target sample size 500 500 500 500 500 500 500 500 500 Tests sent to schools 580 565 559 559 563 601 561 600 553 Completed tests 530 503 489 480 496 561 504 524 462

Anchor test A A A A B A B B C B C C C Target sample size 500 500 500 200 200 200 200 200 200 200 200 400 400 Tests sent to schools 580 565 559 217 216 226 225 241 242 228 228 455 448 Completed tests 522 494 482 181 182 193 200 220 225 206 199 384 390

Open response test n/a EO2v1

EO2v2 EO2v3

EO3v1 EO3v2 EO3v3

EO4v1 EO4v2 EO4v3

Target sample size n/a 200 across the 3 versions 200 across the 3 versions 200 across the 3 versions Tests sent to schools n/a EO2v1 - 42

EO2v2 - 42 EO2v3 - 41

EO2v1 - 37 EO2v2 - 37 EO2v3 – 38

EO3v1 - 39 EO3v2 - 39 EO3v3 – 39

EO3v1 - 35 EO3v2 - 35 EO3v3 – 35

EO4v1 - 49 EO4v2 - 48 EO4v3 – 48

EO4v1 - 27 EO4v2 - 27 EO4v3 – 26

Completed tests n/a EO2v1 - 36 EO2v2 - 37 EO2v3 - 39

EO2v1 - 31 EO2v2 - 33 EO2v3 - 36

EO3v1 - 37 EO3v2 - 35 EO3v3 - 37

EO3v1 - 34 EO3v2 - 30 EO3v3 - 33

EO4v1 - 46 EO4v2 - 47 EO4v3 - 45

EO4v1 - 24 EO4v2 - 23 EO4v3 - 24

4

Table 2.2 Number of tests sent to schools and completed: Welsh-medium samples



Main test WM1 WM2 WM3 WM4 Target sample size 500 500 500 500 500 500 500 500 500 Tests sent to schools 571 561 557 576 560 555 557 535 529 Completed tests 516 486 503 515 492 483 488 462 428

Anchor test A A A A B A B B C B C C C Target sample size 500 500 500 200 200 200 200 200 200 200 200 400 400 Tests sent to schools 571 561 557 229 229 226 224 222 221 222 222 445 419 Completed tests 449 477 499 206 207 196 191 190 189 193 189 376 325

Open response test n/a WO2v1

WO2v2 WO2v3

WO3v1 WO3v2 WO3v3

WO4v1 WO4v2 WO4v3

Target sample size n/a 200 across the 3 versions 200 across the 3 versions 200 across the 3 versions Tests sent to schools n/a WO2v1 - 37

WO2v2 - 37 WO2v3 - 36

WO2v1 - 37 WO2v2 - 37 WO2v3 – 36

WO2v1 - 38 WO2v2 - 37 WO2v3 – 37

WO2v1 - 38 WO2v2 - 38 WO2v3 - 37

WO2v1 - 30 WO2v2 - 30 WO2v3 - 30

WO2v1 - 37 WO2v2 - 37 WO2v3 - 36

Completed tests n/a WO2v1 - 31 WO2v2 - 32 WO2v3 - 29

WO2v1 - 34 WO2v2 - 32 WO2v3 - 30

WO2v1 - 35 WO2v2 - 34 WO2v3 - 33

WO2v1 - 36 WO2v2 - 35 WO2v3 - 35

WO2v1 – 28 WO2v2 - 28 WO2v3 - 29

WO2v1 - 34 WO2v2 - 34 WO2v3 - 32

5

Table 2.3 Number of tests sent to schools and completed: Bilingual samples



Test name EM1 EM2 EM3 EM4 Target sample size n/a 170 170 170 170 170 170 Tests sent to schools n/a 192 191 216 212 204 201 Completed tests n/a 165 158 200 189 172 153

Test name WM1 WM2 EM3 EM4 Target sample size n/a 170 170 170 170 170 170 Tests sent to schools n/a 191 216 212 204 201 192 Completed tests n/a 158 198 191 172 155 165

6

The marking of the booklets took place in the two weeks following the trial. For the main test and anchor test items, about 70 per cent of items were automatically marked by data capture (i.e. all multiple-choice items) and the remaining items were marked by clerical markers, following a training session. All markers were monitored for accuracy and ongoing quality checks were conducted during the marking period. For the open response questions, marking was completed by expert markers who had received training from the development team.

7

3 The samples

3.1 Sample representation Six stratified, random samples were used for this trial. One sample was drawn for each of English-medium primary schools, Welsh-medium primary schools, English- medium secondary schools and Welsh-medium secondary schools. Two samples were also drawn to specifically target year 2 learners in each of English-medium infant / primary schools and Welsh-medium infant / primary schools. The samples of schools were stratified by FSM eligibility, urban/rural characteristic, school size and were drawn from maintained schools in Wales with the appropriate age groups. In total 833 schools were sampled. In anticipation of a drop out at learner level at the testing stage, an extra 10% of learners were to be recruited over and above the achieved numbers required, in an attempt to ensure that all targets were met. In addition to these samples, two further samples were drawn from the pool of bilingual schools in order to meet the DfES in the Welsh Government’s requirement to gather information about the comparability of the tests in the two languages. These samples aimed to gather data from 1000 bilingual learners across years 5-10. Year 2/3 were not included due to reluctance to overburden this youngest group and some concern about teachers’ confidence in judging learners’ reading competence when learners are still at an early stage of learning to read. Table 3.1 and 3.2 gives details of responses received from schools.

8

Table 3.1 School participation in standardisation (main samples)

Number of schools Sample numbers 53096 53097 53098 53099 53100 53101 English

Y2 Welsh

Y2 English Primary

Y3-6

Welsh Primary

Y3-6

English Secondary

Y7-10

Welsh Secondary

Y7-10

Drawn in sample 60 60 250 211 160 30

Withdrawn by LA 2 - 1 9 1 -

School closed - - - 2 - -

Schools contacted 58 60 249 200 159 30

Refused / unable to take part 5 2 27 17 28 3

No reply 19 21 92 67 44 7

Agreed to take part 34 37 130 116 87 20

Not required 9 11 30 6 3 -

Tests sent 25 26 100 110 84 20

Withdrew / unable to test - - 1 3 - -

Test materials completed by schools

25 26 99 107 84 20

9

Table 3.2 School participation in standardisation (additional bilingual samples)

Number of schools Sample numbers 53102 53103 Bilingual Primary Y5-6 Bilingual Secondary Y7-10 Drawn in sample 38 24

Withdrawn by LA - -

School closed - -

Schools contacted 38 24

Refused / unable to take part 5 1

No reply 10 4

Agreed to take part 23 19

Not required 5 -

Tests sent 18 19

Withdrew / unable to test 1 -

Test materials completed by schools 17 19

Forty schools did not give a reason for declining to participate in the pre-test, but a total of 49 schools gave one or more reasons which included: • too many requests for help / involvement in too many other projects (9 schools)

• inspection (7 schools)

• lack of time / pressure of work / other commitments (7 schools)

• scheduling / disruption concerns (5 schools)

• school with special problems i.e. reorganisation / closing (4 schools)

• too few eligible learners (2 schools)

• staff or headteacher illness / change / shortage (1 school)

• belief that there is too much testing (1 school)

Tables 3.3-3.8 show the representativeness of the achieved samples. In drawing the samples, the stratification variables were: percentage of learners eligible for Free School Meals, rural/urban location and school size. However, in the tables below, information about school type and region are also included. There were no significant differences between the achieved samples and the national distribution of schools in relation to the Free School Meals variable, which is considered the nearest proxy to attainment.

10

In terms of the primary schools, there were some differences between the schools in the achieved sample and the national distribution of schools in Wales. There were differences between regions and rural/urban descriptors at the five per cent level of significance for Welsh-medium schools as well as a difference in relation to the school sizes at the 0.1 per cent level of significance. In terms of the English-medium primary schools, there was also a difference between the achieved sample and the population in relation to rural/urban at the five per cent level and school sizes at a highly significant level (p<.0005). These differences are due, in part, to the higher proportions of larger schools in the achieved sample than in the national distribution. This is probably due to capacity issues whereby larger schools had a greater tendency to volunteer to take part than medium or small sized schools. There were no significant differences between the secondary schools achieved samples and the national distribution of secondary schools. Due to the small numbers involved, no tests for significant differences between the achieved sample and the whole bilingual schools population were carried out for the bilingual samples.

11

Table 3.3 Representation of the samples – English-medium primary

*p<.05 **p<.01 ***p<.001 ****p<.0005 Percentages may not sum to 100 due to rounding.

Achieved sample All schools n % n %

School type Infant / First 2 1.7 62 6.9 Primary / Combined 113 93.4 790 87.7 Junior 6 5.0 49 5.4 Region North 19 15.7 177 19.6 Powys and South West 14 11.6 142 15.8 South East 88 72.7 582 64.6 Rural / Urban Large towns – less sparse 86 71.1* 573 63.6* Large towns – sparsest 3 2.5* 18 2.0* Small towns – less sparse 23 19.0* 138 15.3* Small towns - sparsest 3 2.5* 15 1.7* Others – less sparse 5 4.1* 99 11.0* Others - sparsest 1 0.8* 58 6.4*

Percentage of learners eligible for Free School Meals Lowest 20% 20 16.5 161 17.9 2nd lowest 20% 21 17.4 155 17.2 Middle 20% 18 14.9 144 16.0 2nd highest 20% 29 24.0 201 22.3 Highest 20% 33 27.3 239 26.5 Missing 0 0.0 1 0.1 Relative school size Small 39 32.2**** 502 55.7**** Medium 49 40.5**** 211 23.4**** Large 33 27.3**** 188 20.9****

Total 121 100.0% 901 100.0%

12

Table 3.4 Representation of the samples – Welsh-medium primary



School type Infant / First 0 0.0 2 0.5 Primary / Combined 132 99.2 389 99.0 Junior 1 0.8 2 0.5 Region North 47 35.3* 191 48.6* Powys and South West 50 37.6* 126 32.1* South East 36 27.1* 76 19.3* Rural / Urban Large towns – less sparse 41 30.8* 85 21.6* Large towns – sparsest 1 0.8* 9 2.3* Small towns – less sparse 17 12.8* 36 9.2* Small towns - sparsest 13 9.8* 26 6.6* Others – less sparse 20 15.0* 71 18.1* Others - sparsest 41 30.8* 166 42.2*

Percentage of learners eligible for Free School Meals Lowest 20% 38 28.6 110 28.0 2nd lowest 20% 36 27.1 98 24.9 Middle 20% 41 30.8 106 27.0 2nd highest 20% 14 10.5 57 14.5 Highest 20% 4 3.0 21 5.3 Missing 0 0.0 1 0.3 Relative school size Small 27 20.3** 136 34.6** Medium 47 35.3** 141 35.9** Large 59 44.4** 116 29.5**

Total 133 100.0% 393 100.0%

13

Table 3.5 Representation of the samples – English-medium secondary



School type Comprehensive to 16 20 24.4 43 26.5 Comprehensive to 18 62 75.6 119 73.5 Region North 15 18.3 30 18.5 Powys and South West 14 17.1 24 14.8 South East 53 64.6 108 66.7 Rural / Urban Large towns – less sparse 64 78.0 121 74.7 Large towns – sparsest 2 2.4 5 3.1 Small towns – less sparse 4 4.9 19 11.7 Small towns - sparsest 4 4.9 6 3.7 Others – less sparse 4 4.9 7 4.3 Others - sparsest 4 4.9 4 2.5

Percentage of learners eligible for Free School Meals Lowest 20% 9 11.0 16 9.9 2nd lowest 20% 20 24.4 33 20.4 Middle 20% 21 25.6 46 28.4 2nd highest 20% 22 26.8 45 27.8 Highest 20% 9 11.0 20 12.3 Missing 1 1.2 2 1.2 Relative school size Small 26 31.7 52 32.1 Medium 22 26.8 55 34.0 Large 34 41.5 55 34.0

Total 82 100.0% 162 100.0%

14

Table 3.6 Representation of the samples – Welsh-medium secondary



School type Comprehensive to 16 5 26.3 8 25.8 Comprehensive to 18 14 73.7 23 74.2 Region North 8 42.1 18 58.1 Powys and South West 0 0.0 1 3.2 South East 11 57.9 12 38.7 Rural / Urban Large towns – less sparse 11 57.9 14 45.2 Large towns – sparsest 0 0.0 0 0.0 Small towns – less sparse 4 21.1 7 22.6 Small towns - sparsest 1 5.3 6 19.4 Others – less sparse 0 0.0 0 0.0 Others - sparsest 3 15.8 4 12.9


Total 19 100.0% 31 100.0%

15

Table 3.7 Representation of the samples – bilingual primary

Percentages may not sum to 100 due to rounding.

Achieved sample All bilingual schools n % n %

School type Infant / First 0 0.0 3 6.7 Primary / Combined 15 88.2 38 84.4 Junior 1 5.9 3 6.7 Missing 1 5.9 1 2.2 Region North 1 5.9 7 15.6 Powys and South West 14 82.4 33 73.3 South East 2 11.8 5 11.1 Rural / Urban Large towns – less sparse 1 5.9 6 13.3 Large towns – sparsest 1 5.9 1 2.2 Small towns – less sparse 5 29.4 8 17.8 Small towns - sparsest 5 29.4 12 26.7 Others – less sparse 0 0.0 6 13.3 Others - sparsest 5 29.4 12 26.7


Total 17 100.0% 45 100.0%

16

Table 3.8 Representation of the samples – bilingual secondary

Percentages may not sum to 100 due to rounding.

Achieved sample All bilingual schools n % n %

School type Comprehensive to 16 0 0.0 0 0.0 Comprehensive to 18 18 94.7 23 95.8 Missing 1 5.3 1 4.2 Region North 7 36.8 7 29.2 Powys and South West 12 63.2 17 70.8 South East 0 0.0 0 0.0 Rural / Urban Large towns – less sparse 2 10.5 2 8.3 Large towns – sparsest 0 0.0 2 8.3 Small towns – less sparse 3 15.8 3 12.5 Small towns - sparsest 8 42.1 9 37.5 Others – less sparse 2 10.5 2 8.3 Others - sparsest 4 21.1 6 25.0


Total 19 100.0% 24 100.0%

17

3.2 Learner data Teachers were asked to provide information about each of the learners taking part in the trial and this was gathered electronically on a secure portal. Information included the learner’s name, date of birth, gender, TA level in reading, home language and fluency in English/Welsh (if EAL or WAL). This information formed part of the analysis and will be crucial in constructing the progress scores and age-standardised scores. 3.2.1 Home language

Teachers were asked to identify each learner’s home language. With the exception of the bilingual samples, the majority of the learners had English as their home language. Across the English-medium samples, 85 per cent of learners had English as their home language. Less than three per cent of the sample spoke Welsh or were bilingual at home. In the Welsh-medium samples, 23 per cent of the learners had Welsh as their home language, whilst 58 per cent had English and 17 per cent were bilingual in Welsh and English. In the bilingual samples, just under half (48%) had English as their home language whilst nearly a third (31%) had Welsh. Eighteen per cent of the bilingual sample was reported to be bilingual, in English and Welsh, at home. Five per cent of the learners in the English-medium samples and about one per cent of learners in the Welsh-medium and bilingual samples spoke home languages other than English and/or Welsh. Across all samples, more than 40 home languages were reported. The most commonly reported home languages included: Bengali, Somali, Urdu/Mirpuri, Polish, Arabic/Egyptian, Punjabi/Pur and Chinese/Mandarin/Taiwanese. Interestingly, information about fluency of those who were taking the test in a language which was not their home language revealed that the majority of learners were either ‘becoming confident users of the [test] language’ or were a ‘very fluent user of [test] language in most social and learning contexts’. For example, 43 per cent of learners in the English-medium sample for whom English was not their home languages were described as being ‘very fluent users of English in most social and learning contexts’, as were 92 per cent of the learners in the bilingual sample. More than two-thirds of the learners (69%) in the Welsh-medium samples for whom Welsh was not their first language were reportedly ‘very fluent users of Welsh in most social and learning contexts’. Less than 10 per cent of learners in any of the samples taking a test in a language that was different to their home language were ‘new to the language’.

18

3.2.2 Reading TA Levels

Teachers were asked to provide a Teacher Assessment (TA) level for reading in the language of the test. A summary of the reported TA levels is provided in Tables 3.9 and 3.10 below.

Table 3.9 Reading TA levels – English-medium samples

Percentage of learners with each of the following TA levels

Year 2* Year 3 Year 4 Year 5 Year 6 Year 7 Year 8 Year 9 Year 10

n = 530 495 488 474 495 558 479 502 430

Level 1 0.8 13.5 5.3 2.7 0.6 0.0 0.0 0.2 0.0

Level 2 1.3 55.2 32.8 16.0 7.1 0.4 0.6 0.0 0.2

Level 3 11.1 22.2 47.7 47.9 29.9 7.5 9.6 4.0 1.4

Level 4 52.6 1.0 14.1 31.2 51.9 48.4 43.4 21.3 11.4

Level 5 29.8 5.5 0.0 2.1 10.5 42.3 36.7 47.2 43.3

Level 6 4.3 2.6 0.0 0.0 0.0 1.4 7.3 21.9 33.5

Level 7 0.0 0.0 0.0 0.0 0.0 0.0 2.3 5.4 9.8

Level 8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5

Total % 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0

Due to rounding, percentages may not sum to 100.

* For Year 2, ‘levels’ refer to the Foundation Phase Outcomes 1-6

Table 3.10 Reading TA levels – Welsh-medium samples

Percentage of learners with each of the following TA levels

Year 2* Year 3 Year 4 Year 5 Year 6 Year 7 Year 8 Year 9 Year 10

n= 479 482 500 513 491 441 441 411 393

Level 1 0.0 14.5 6.8 1.0 0.2 0.0 0.0 0.0 0.0

Level 2 1.3 47.9 33.2 12.5 4.1 1.4 1.1 0.2 0.3

Level 3 9.2 22.0 51.0 49.1 34.0 12.2 6.1 6.3 3.3

Level 4 45.5 3.3 8.8 35.9 46.8 57.8 50.3 29.4 18.8

Level 5 39.0 9.1 0.2 1.6 14.7 27.9 34.2 43.8 31.8

Level 6 5.0 3.1 0.0 0.0 0.2 0.7 8.2 18.5 33.1

Level 7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.7 10.7

Level 8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0

Total % 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0

Due to rounding, percentages may not sum to 100.

* For Year 2, ‘levels’ refer to the Foundation Phase Outcomes 1-6

19

4 Statistical outcomes Statistical analysis was conducted for each booklet. The outcomes for each test are reported in the accompanying reports. For each test the following information is provided: • learner enjoyment ratings

• whole test statistics

• text level statistics

• item level statistics, including analysis of gender differences

• results of administrator questionnaires. In each report, comments are only provided for items where the discrimination value is lower than anticipated and/or where an item gave rise to differential item functioning (DIF) by gender.

4.1 Year 2/3 test – Welsh medium

The main analysis and discussion of the findings for the Welsh-medium test for year 2/3 is in a separate report but additional analysis was carried out as a result of the score distribution observed. The test did not appear to have a normal score distribution, as would be expected, but did appear to have a bi-modal distribution, shown in Figure 4.1 below. Figure 4.1 Total score distribution (n=1505)

0.0

1.0

2.0

3.0

4.0

5.0

6.0

0 3 6 9 12 15 18 21 24 27 30 33

% o

f pup

ils

Score

20

Given that this score distribution differs from those on all the other tests in the suite, it was felt important to investigate the reasons for this and an analysis was carried out looking at test functioning by year group and by home language. Table 4.1 shows the mean scores achieved on this test by learners in the three different year groups (year 2, 3 and 4) and by home language, as recorded by the learners’ teachers on the Pupil Data Form.

Table 4.1 Mean scores on Y2/3 Welsh-medium test by year group and home language

Home language

Year 2 Year 3 Year 4

n Percent Mean score



Welsh 142 27 15.41 110 23 21.35 75 15 29.64

English 275 53 11.07 280 58 17.14 329 65 25.19

Bilingual 93 18 15.60 62 13 21.23 91 18 28.20

Other 8 2 11.38 7 1 19.43 7 1 26.57

Overall 518 100 13.08 484* 100 18.94 503** 100 26.43 Due to rounding, percentages may not sum to 100.

* includes 25 non-respondents

** includes 1 non-respondent

The table shows that the proportion of learners with Welsh as their home language varied considerably between year groups, with the largest proportion in the year 2 group. The proportion of bilingual learners (those with English and Welsh as their home language) was fairly consistent across year groups and their performance was very close to that of their Welsh home language peers. Learners with English as their home language formed the largest sub-group of learners in all year groups. Understandably, the learners with Welsh (or bilingual) as their home language scored more highly than their peers who had English as their home language and this was the case across all year groups. Examining the score distribution of each sub-group graphically (see appendix A1.1) shows that the performance of these groups varies considerably in years 2 and 3 but becomes less apparent by year 4. The year 2 English home language group distribution has a positive skew indicating that the majority of learners have a score that is below half the available marks. The Welsh home language learners have a more normal distribution, as do the bilingual home language learners, although there is a slightly positive skew with these learners tending to obtain less than half of the available marks. In year 3, the Welsh home language learners’ distribution has a slightly negative skew indicating that more learners obtain over half marks whilst the English home

21

language learners’ score distribution appears to be bi-modal, with similar proportions of learners obtaining below half marks and above half marks. Interestingly, by year 4, the score distributions for all three of the groups under discussion are very similar with a negative skew indicating that the majority of the year 4 learners obtain over half of the available marks regardless of home language. This pattern of negative skewness is apparent for all three language groups but, on average, the English home language learners score less well than their Welsh and bilingual home language peers. Figure 4.2 provides a year group summary of the data provided in Appendix A1.1. It shows the score distribution by cumulative percentage of pupils for each of the three home languages such that each point on the line includes the percentage pupils from all the previous points. The vertical line in the middle indicates a score of 50 per cent. In year 2, 80 per cent of the English home language group scores half marks or less compared to about 60 per cent of the Welsh and bilingual home language groups. In year 3, this figure fell to 53 per cent achieving half marks or less for the English home language group and just over 30 per cent for the Welsh and bilingual home language groups. In year 4, about 80 per cent of the English home language group scored half marks or more bringing them much closer to the distribution achieved by their Welsh and bilingual home language peers. The investigation of the year 2/3 test in this way reveals very interesting patterns of performance. Learners with English home language score less well than their Welsh home language peers during years 2 and 3. Whilst they have not caught up and still achieve a lower mean than their Welsh home language peers, English home language learners establish a more similar pattern of performance in reading in Welsh by year 4.

22

Figure 4.2 Cumulative score distributions by year group

23

4.2 Anchor test outcomes

The majority of learners took an anchor test after their main test. The purpose of the anchor tests is to provide a link between the main tests in order to allow all the tests to be placed on the same scale. This allows the difficulty of items within the various tests to be compared against one another and inferences to be made about how scores achieved on one test relate to the score a learner is likely to have achieved on another. These anchor tests will be used as part of the standardisation process in 2013 and 2014 in order to allow for progress on a continuum from year to year, longitudinally, to be calculated. The anchor tests will be used to calibrate the tests for different year groups against one another and the basis for such a calibration is item response theory (IRT). Item response theory works by associating the likelihood of a learner achieving any number of marks on an item to their underlying latent ability. Thus for any level of ability, the expected number of marks to be achieved on any item and hence the expected number of marks to be achieved on any test can be calculated. Using this method it is possible to translate scores on any one of the tests into an overall ability scale independent of which test a learner has taken; that is, the abilities of year 2 and 3 learners can be placed on the same scale as the abilities of year 4 and 5 learners and so on. The outcomes of the anchor tests are summarised in Tables 4.2-4.4 below. The mean score of a given year group has been compared with the mean score of each of the other year groups taking that test. The results of these comparisons, including any significant differences, are also shown in the table below.

24

Table 4.2 Whole test statistics – Anchor test A

English Welsh n 1872 1826 max. possible score 12 13 mean score 6.0

(50%) 6.0

(46%)

median 6.0 6.0 reliability (coefficient alpha) 0.84 0.81 standard deviation 3.5 3.3 mean score boys 5.7** 5.5**** mean score girls 6.2** 6.4****

Year

3 Year

4 Year

5 Year

6 Year

3 Year

4 Year

5 Year

6

mean score Year 2 3.4 **** **** **** **** 3.8 **** **** **** ****

mean score Year 3 4.9 **** **** **** 5.0 **** **** **** mean score Year 4 7.6 **** **** 6.7 **** **** mean score Year 5 8.7 n.s. 8.0 **** mean score Year 6 9.3 9.2 *p<.05 **p<.01 ***p<.001 ****p<.0005 Table 4.3 Whole test statistics – Anchor test B


(60%) 7.7

(51%)

median 9.0 8.0 reliability (coefficient alpha) 0.80 0.80 standard deviation 3.5 3.8 mean score boys 7.9**** 7.2**** mean score girls 8.9**** 8.2****

Year

6 Year

7 Year

8 Year

6 Year

7 Year

8

mean score Year 5 6.7 * **** **** 5.7 **** **** **** mean score Year 6 7.7 **** **** 7.6 n.s. **** mean score Year 7 9.7 n.s. 8.2 **** mean score Year 8 9.4 9.8 *p<.05 **p<.01 ***p<.001 ****p<.0005

25

Table 4.4 Whole test statistics – Anchor test C


(46%) 6.5

(46%)

median 6.0 6.0 reliability (coefficient alpha) 0.74 0.75 standard deviation 3.2 3.1 mean score boys 6.1**** 6.1**** mean score girls 6.9**** 6.8****

Year

8 Year

9 Year 10

Year 8

Year 9

Year 10

mean score Year 7 5.4 n.s. **** **** 4.4 **** **** **** mean score Year 8 5.2 **** **** 5.7 **** **** mean score Year 9 6.8 **** 6.9 ** mean score Year 10 7.7 7.7 *p<.05 **p<.01 ***p<.001 ****p<.0005

The outcomes from the anchor tests confirmed that the tests are functioning as expected with mean scores of about half marks and good reliability values. In the main, the mean scores increased with the age of the learners, as expected. Where this was not the case, for example Anchor Test B (English) where the mean for Year 7 is higher than Year 8, the difference in mean scores was not significant. The reliability of the tests was slightly lower than those observed on the ‘main’ tests but was probably due to the length of the anchor tests being relatively short (maximum of 15 marks compared to 35-40 marks for the ‘main’ tests).

26

4.3 Bilingual sample outcomes

Following discussion with Welsh Government colleagues it was agreed that a change to the specification was required. Originally, the requirement was for separately standardised tests. This meant that the tests would measure the reading attainment of the two ‘populations’ (Welsh-speaking / English-speaking) separately. It was subsequently determined that the outcome measure needed to be the attainment of the single population i.e. learners of a given age across the country, regardless of the language of the test. In order to do this, a further sample of learners for the standardisation of the reading tests was required in order to ascertain comparability between the English-medium and Welsh-medium tests materials. As shown in section 3.1, two bilingual samples were drawn to provide a sample of 1000 bilingual learners (spread across years 4-9) to take both English and Welsh versions of the test. Learners in years 2 and 3 were not included in this sample due to reluctance to overburden this youngest group and due to the concern about teachers’ confidence in judging learners’ reading competence when learners are still at an early stage of learning to read. In addition to the data collected for the other samples, TA levels in reading in both English and Welsh for these learners was gathered in order to ascertain their relative competence in the two languages. This was further supplemented by collection of a teacher judgement of each learner to establish if the learner was considered to be ‘better at reading in English’, ‘better at reading in Welsh’ or was ‘equally good at reading in English and Welsh’. A summary of this data is presented in Table 4.5 below.

Table 4.5 Learners’ reading ability in English and Welsh – by teacher assessment and teacher judgement

Y4/5 test Year 6/7 test Year 8/9 test Better at reading

in: Equally good at reading in English and Welsh

Better at reading in:

Equally good at reading in English and Welsh

Better at reading in:

Equally good at reading in English and Welsh

English Welsh English Welsh English Welsh

overall n 323 389 325 Based on TA levels

n 43 16 263 73 60 253 71 73 178 % 13.3 5.0 81.4 18.8 15.4 65.0 21.8 22.5 54.8

Based on teacher judgement

n 164 31 127 91 107 168 97 125 103

% 50.8 9.6 39.3 23.4 27.5 43.2 29.8 38.5 31.7

Each learner in the bilingual samples took two tests, one in English and one in Welsh. The whole test data for each test is shown in Tables 4.6-4.8 below, together

27

with the whole test data for the English-medium and Welsh-medium samples from the main study for comparison.

Table 4.6 Whole test statistics – comparison samples for Year 4/5 test

Y4/5 test Bilingual sample

(taking both Y4/5 English and Welsh tests)

English-medium sample

Welsh-medium sample

test English Welsh English Welsh n 323 323 976 1007 max. possible score 38 36 38 36 mean score 19.1

(50%) 19.4

(54%) 18.7

(49%) 20.0

(56%) median 19.0 19.0 19.0 21.0 reliability (coefficient alpha) 0.90 0.91 0.90 0.91 standard deviation 8.7 8.2 8.8 8.4 mean score boys 17.9* 17.5**** 17.4**** 18.7**** mean score girls 20.1* 20.9**** 20.1**** 21.4**** *p<.05 **p<.01 ***p<.001 ****p<.0005 Table 4.7 Whole test statistics – comparison samples for Year 6/7 test




Welsh-medium sample


(55%) 20.4

(57%) 23.4

(59%) 21.4

(59%) median 23.0 20.0 24.0 22.0 reliability (coefficient alpha) 0.89 0.88 0.88 0.87 standard deviation 7.9 7.3 7.7 7.1 mean score boys 21.5 20.3 22.1**** 20.5**** mean score girls 22.1 20.6 24.5**** 22.2**** *p<.05 **p<.01 ***p<.001 ****p<.0005

28

Table 4.8 Whole test statistics – comparison samples for Year 8/9 test




Welsh-medium sample


(49%) 18.0

(51%) 20.0

(50%) 19.8

(57%) median 20.0 18.0 20 20.0 reliability (coefficient alpha) 0.87 0.89 0.88 0.89 standard deviation 8.0 7.7 8.1 7.6 mean score boys 18.5* 16.5** 18.5**** 19.3 mean score girls 20.7* 19.3** 21.3**** 20.2 *p<.05 **p<.01 ***p<.001 ****p<.0005

Using the outcomes of these analyses, where learners take tests in both English and Welsh, in conjunction with adjusting for differences learners’ relative ability in English and Welsh according to their TA levels, it is hoped to equate scores achieved in English and Welsh. For example, if it was found that within the bilingual sample, there were fewer learners who tended to have higher ability in Welsh than English then we would give extra weight to these learners such that the combined sample has equally ability in English and Welsh.

29

5 Feedback Feedback was gathered via administrator questionnaires. No teacher questionnaires were used for this trial as it is not possible to make any changes to the test materials that may have been suggested. Teacher consultation took place earlier in the development via informal trialling, teacher questionnaires at item trial and the teacher panel meeting convened by DfES in the Welsh Government colleagues. However, some teachers expressed opinions to the administrators and some submitted written views along with the administrator questionnaires. The feedback from these questionnaires is provided in the appropriate reports.

30

6 Marker panel outcomes

6.1 Outline of the day A marker panel meeting was convened on 23 November 2012 in Llandrindod Wells. A group of 27 teachers, drawn from the four consortia, were invited to attend. The main purpose of the meeting was to gather evidence about the usability of the mark schemes and the time required to mark a ‘typical’ class set of 30 test papers. At the meeting the attendees were first given an opportunity to read a blank copy of one test booklet and its associated mark scheme. They were then asked to mark 30 anonymised test scripts taken from the standardisation exercise. Whilst doing the marking, the teachers were encouraged to annotate their mark schemes with any comments, for example, noting particular answers for which it was difficult to decide on a mark allocation. Following this marking period, a plenary session was held during which comments relating to general marking issues were raised and discussed. Following the meeting, the marks awarded on the teacher marked scripts were compared to the marks given by the development team and analysis of any variations was carried out.

6.2 Background of participants The participants were asked to complete a brief proforma outlining their teaching experience, any marking experience and opinion about the ease of application of the mark schemes.

31

Table 6.1 Summary of background of participants Number of teachers School representation English-medium primary 7 English-medium secondary 7 teachers and 1 teaching assistant

(1 teacher was delayed and was unable to take part in the marking task, but did contribute to discussions)

Welsh-medium primary 7 Welsh-medium secondary 6* Local Authority / Consortia SWAMWAC 5 SEWC 7 South Central 9 North 6 Teaching experience - years as a practitioner Less than 5 years 3 5-10 years 11 11-15 years 5 16-20 years 2 More than 20 years 4 Previous marking experience None 13 National Curriculum tests 5 GCSE marker 6 A level marker 1

* not all teachers completed a questionnaire

6.3 Introduction to test materials Prior to the meeting, sets of materials were prepared for the participants. Each pack consisted of a blank test booklet and associated mark scheme. Sets of 30 scripts were also compiled from the standardisation trial and pre-marked by the project team. Scripts were selected to represent a range of marks and to provide some examples of more challenging responses to mark (e.g. borderline responses). All materials were individually numbered and a record of these was kept to ensure security of the materials. At the start of the meeting, the confidential nature of the materials was emphasised and participants were reminded that they should not talk about the content of the materials with colleagues outside the meeting. Before the marking exercise began participants were allocated to a particular test to work with for the day, based on their current primary or secondary teaching experience. As far as possible, the teachers were divided equally between the four

32

tests in each language with an overlap of primary and secondary teachers for the year 6/7 test. Some of the participants who attended the marker meeting had previously attended the teacher panel meeting held in June 2012. All of these teachers made comments noting the changes that had happened to the texts and questions since they last saw them. Notably, two teachers observed that comments that they had made had been actioned and they were pleased that they had been listened to. Initially, the teachers did not like the practice questions as they did not think they would be helpful but, once the purpose of the practice questions had been further explained, they agreed that the practice items would be supportive. However, this did highlight the need to strengthen the advice given about practice questions in the test manuals that will accompany the tests. Some participants felt that some of the texts were too demanding and questioned the appropriacy of asking learners to read about things with which they were not familiar. Some teachers felt that providing a glossary of more words within some of the texts would be beneficial and some felt that the English-medium texts needed to have a greater Welsh ‘flavour’ to provide relevance for the test takers, although there was recognition from some quarters that there needs to be a balance between prior knowledge and the skill of reading for information. Participants were concerned about the font size on all but the Year 2-3 tests. In particular they felt that the size of the font for Year 4-5 was too small and not consistent with the size of font normally used with learners of this age. Some participants commented that the tests were visually unappealing and requested more colour illustrations in keeping with the current look of GCSE papers. Some concern was expressed as to whether learners with ALN would be able to access the materials and participants felt clear teacher guidance would be needed to ensure that these learners were properly and consistently catered for in all schools. Some teachers felt that it was unnecessary for Year 2-3 learners to read the whole text first, before attempting the questions on a particular section.

6.4 Observations during marking Having had an initial period of studying a test and its associated mark scheme, teachers were then given a set of 30 tests to mark. Although no specific guidance was given on how to approach the exercise, the methods adopted by different participants were of interest.

33

6.4.1 English-medium

The secondary participants worked individually and at some speed. No questions were raised. In contrast, the majority of the primary participants worked more collegiately, moving materials so that they could sit with another participant marking the same test. There was quite often discussion of the marks awarded and some teachers even compared marks awarded for each page and discussed any discrepancies that arose. One teacher, an experienced marker of National Curriculum test papers in England, worked alone and at speed and was the first to complete the task. Several of the primary participants raised queries as they worked through the marking. The main aspect of the marking that caused concern was the use of brackets to indicate whether or not an answer was allowable. Markers of the Year 2/3 test also sought clarity on the need for accurate transcription and whether clear intention for a particular spelling was acceptable or not (e.g. if rigled was acceptable for wriggled). Secondary participants tended to finish more quickly than primary participants. Observations and queries raised will usefully inform the teacher guidance and documents. 6.4.2 Welsh-medium

All the teachers in this group tended to work collegiately, discussing the mark scheme with colleagues and raising queries as they marked.

6.5 Feedback after marking On the whole, participants were happy that the time spent marking 30 scripts was not overly burdensome and that the mark schemes were relatively easy to apply – and would become more so with familiarity. This information was confirmed by the data from the questionnaire completed by participants at the end of the session. This showed that five participants considered the mark schemes ‘very easy’ to apply; 16 found them ‘easy’ and four considered them to be ‘manageable’. No participants indicated that they were ‘difficult’ to apply. It is interesting to note that the marking process allowed the participants to see a range of responses and total marks and some participants commented that by seeing this variety, their initial concerns about the texts and/or questions being too difficult or unsuitable for the target cohort were alleviated.

In discussion, participants raised the following points: • teachers will need to be careful to follow the instructions for each question to

avoid over or under marking particular questions. For example, where the question asks for learners to tick two boxes but receive one mark for both ticks,

34

teachers would need to be careful not to award two marks; one mark per correct tick.

• further guidance is required about how to deal with responses where two parts of an answer are given on one line whilst a second response line is left blank.

• layout – some participants felt that it would be preferable to combine the mark scheme and item information rather than splitting it over the two pages, e.g. so that each question is presented with its associated mark scheme. Some teachers also considered it would be preferable for the mark scheme to be presented as double page spreads the same as the test booklets.

• some participants considered the font size of the mark scheme was too small

• although the use of emboldening key words (e.g. the number of ticks) was considered very helpful, some participants thought that it would be beneficial to extend the emboldening to include the ‘thing’ associated with the number e.g. ‘words’ or ‘phrases’ in questions such as Find and copy one word to describe ....It was noted that the positioning of the instruction was not always consistent and that this could cause confusion for learners.

• some participants felt that the inclusion of any given response (e.g. ticks given as examples in table completion questions, numbers given as examples in sequencing questions) should be made clearer in the mark scheme so that markers do not count a ‘given’ response as part of the learner’s response.

• in table completion exercises it was noted that a significant number of learners of all ages and abilities were marking the correct column with a tick and the incorrect column with a cross. Teachers felt that guidance on this specific point should be included in the mark scheme. They recommended that teachers should be told to give credit for either a tick or cross as long as the response was unambiguous and correct.

In addition to these general comments, remarks relating to specific mark schemes were made and recorded for further reference by the project team. Although invited to annotate the mark schemes, very few comments were received. However, any comments made were noted by project team members and amendments were made, as appropriate. 6.5.1 Comments from teachers about the mark scheme and

marking process • “Easy to mark and better than the English SATs marking that I do”

• “Easy to apply the mark scheme”

• “It makes sense to mark a whole text at a time rather than a double page spread”

• “These tests could be marked by an LSA because the guidance is clear and they might be more objective than the class teacher”

• “It’s much better than the [named test] that we use at the moment” Following mark scheme specific discussions, broader issues relating to the administration and marking of the tests were raised and these included:

35

• some participants felt that the tests did not necessarily need to be marked by teachers. In fact, one of the participants was a learning support assistant who regularly marks tests within her school and was attending the meeting in anticipation of marking these tests next May

• some participants were not clear where the data gathered from the tests was being collected, or the purpose of the data collection. They also raised some concern about what data needed to be provided and particularly whether time would need to be spent calculating standardised scores

• some participants raised concern about the rigour of teacher marking and what quality assurance measures would be in place, particularly if the information provided was to be used for Banding purposes

• one secondary teacher voiced concern that there was likely to be a clash with the GCSE period and felt that this might impact on the time available to mark the tests within the required timescale

• several secondary teachers felt that the onus for the marking would fall on the English and Welsh departments, even though there was recognition that literacy was supposed to be taught across the curriculum. One teacher reported that her school were discussing the possibility of form teachers marking the test for their class to relieve the burden on the English department

• the question of whether online marking was a possibility was raised

• some teachers requested more information about how to arrive at the standardised scores as they were uncertain about the procedure

• some participants felt that it would be useful to have data at item level to gauge which questions proved problematic to individual learners or classes.

6.6 Optional open-response questions Following discussions about the ‘main’ tests, participants were introduced to the open response questions and their purpose as optional materials to provide teachers with additional formative and diagnostic information was explained. Generally, participants felt that the wording of the questions was good, allowing learners to develop their answers and explore understanding of the text. In the main, secondary participants were receptive to the idea, considering that the questions provided useful practice for the types of questions they are expected to do during GCSE. It was also felt that they could usefully be used for transition work between Year 6 and Year 7. Some of the primary participants commented that the questions were very similar to the kind of work they do during guided group reading as oral exercises. They also commented that they would be useful for a) assessing lower ability learners if used as a question to stimulate discussion or b) to assess more able learners and to use as a tool to stretch their understanding.

36

All participants agreed that these types of questions would have to be marked by teachers with a more in-depth knowledge of reading development and of their learners. However, it was also considered that the marking would be demanding without any training provision.

6.7 Outcomes Following the marker panel meeting, all materials were collected in and checked. At this point, it was discovered that two of the blank test booklets were missing. Due to the individual numbering of all items, it was possible to determine who had been working with the materials and DfES colleagues were able to contact the appropriate participants, recall the materials and reinforce the confidential status of the materials. Tables 6.2 and 6.3 give a breakdown of the analysis by test. Each table is then followed by a brief discussion of the questions which posed the most challenge in the marking process.

Table 6.2 Marking data – English tests

Y2/3 test Y4/5 test Y6/7 test Y8/9 test

Number of markers 3 3 4 4

Number of tests marked by each marker

30 30 30 30

Total number of tests marked

90 90 120 120

Time taken to complete marking

First complete: 1hr 22 mins Last complete: 2hrs 5 mins


First complete: 1hr 40 mins Last complete: 1hr 45 mins


Clerical accuracy – (total score awarded [raw score] matches number of marks awarded to questions)

100% 96% 95% 97%

Agreement between NFER mark and markers (percentage of items) Maximum number of marks (undermarked)

1 2* 1 4**

Overmarked 19% 32% 15% 33%

Maximum number of extra marks (overmarked)

3 3 2 3

Undermarked 2% 18% 10% 8%

Incomplete marking n/a 9% 2% n/a * this excludes data for tests which were not completely marked ** due to one section of a test being unmarked, no other discrepancies exceeded 1m

37

Analysis showed that some questions were more prone to marking discrepancies. These can be summarised as follows: [this section redacted for reasons of test confidentiality] One marker did not mark some pages of one test. This resulted in an incorrect total score (3 marks less than the ‘agreed’ mark) but in obtaining the total score the omitted marking was not picked up.

Table 6.3 Marking data – Welsh tests Y2/3 test Y4/5 test Y6/7 test Y8/9 test Number of markers 3 3 6 4

Number of tests marked by each marker

30 30 30 30

Total number of tests marked

90 90 180 120

Time taken to complete marking



First complete: 1hr 27 mins Last complete: 2hrs 10mins


Clerical accuracy – (total score awarded [raw score] matches number of marks awarded to questions)

84% 98% 95% 98%

Agreement between NFER mark and markers (percentage of items) Overmarked 24% 35% 25% 28%

Maximum number of extra marks (overmarked)

5 3 3 3

Undermarked 13% 13% 43% 8%

Maximum number of marks (undermarked)

3 2 7 2

Incomplete marking n/a n/a 1% n/a

Analysis showed that some questions were more prone to marking discrepancies. These can be summarised as follows: [this section redacted for reasons of test confidentiality]

38

6.8 Conclusions and recommendations from marker panel meeting Feedback from the participants indicated that the mark schemes were easy to follow and to apply. However, analysis of the marking indicates that accuracy of marking was worryingly variable both within and across tests. It is worth reiterating that the marking done in the marker meeting was conducted under artificial conditions and that when marking the ‘live’ tests teachers might reasonably be expected to spend longer becoming familiar with the test materials and marking guidance which may help to increase marking accuracy and reliability. Following the meeting, several amendments have been made to the test materials as a result of discussions and comments received. These include: • strengthening the guidance relating to the use of practice questions • strengthening the guidance on reading the generic marking information before

starting to mark any learners’ tests. This is particularly pertinent in relation to information about reading the number of ticks / words / rows needed to award one or two marks and the use of bracketed words and any other additional words

• amending the layout of the mark scheme to put each question and associated mark scheme explanation next to each other

• increasing the text size in the test booklets. In addition to these changes, further recommendations include: • additional support or training would be helpful for teachers; this may be best

delivered by the local authority. This should provide guidance / emphasis on the general marking rules and could extend to incorporate training on the use of the optional open response questions

• consideration should be given to moderating the marking of the tests to ensure quality and reliability of marking and recording and reporting of scores.

39

7 Conclusions and Recommendations Overall the standardisation exercise showed that the eight main tests all functioned appropriately and have good measures of reliability. The anchor tests also functioned as anticipated and will provide robust information with which to link all tests onto the same scale. The time allowed for the tests was appropriate, although it tended to be too generous for older learners. It is recommended that the timings are left as 50 minutes for the year 2/3 and as 60 minutes for the other tests. This will allow flexibility for future rounds of test development and will provide some consistency of administration time across the range of age groups. However, it is also recommended that the guidance to teachers indicates that allowing 45-50 minutes for the tests for years 6/7 and 8/9 would be advisable. The guidance could also confirm that if all learners within a class complete the test within 45-50 minutes, it would be acceptable to stop the test at that point. The open response questions all proved useful for providing formative and diagnostic information, though some function better than others and the numbers of questions to be taken forward for final use remains to be agreed. Year 4/5 learners continued to demonstrate some difficulty with the open response questions, as observed during the item trial. However, the diagnostic information that can be gleaned from the activity is potentially very useful to teachers. It is recommended that they continue to be provided as optional materials that could be administered orally by the teacher working either with individuals or with small groups of learners. This would allow learners to display their understanding without the pressure of having to write a response. Analysis of the administrator questionnaires revealed that some learners commented on the size of the font and this issue was also raised by teachers and the marking panel. To that end, it is recommended that the font size of the reading texts is increased. Other feedback from the administrators suggests that the tests were generally well received and this was confirmed, in the main, by the enjoyment ratings from learners. This information is provided in the specific reports. Administrators reported that, with the exception of some of the youngest learners, tests could be completed within the time allowed. Year 2/3 learners certainly appear to benefit from taking the test in several ‘chunks’. Analysis of the marker panel outcomes revealed that different teachers approached the marking task in different ways and with different levels of success. As a result of the exercise, additional guidance will be provided in the sample materials and in the teacher guidance to try and clarify best practice and approaches. The layout of the mark schemes will also be amended to address feedback received at the meeting. Finally, it is recommended that consideration is given to marker training being made

40

available to all teachers to ensure consistency of approach and ensure reliability of marking. This could be further supported by the introduction of moderation exercises.

41

Appendix 1: Year 2/3 Welsh-medium Test Score Distributions

42

Figure A1.1 Year 2 score distributions – by home language

43


44


45

NRTW

Report on the Standardisation of National Reading Tests for 2013 ...

Documents

Transcript of Report on the Standardisation of National Reading Tests for 2013 ...