
DIFFERENCES IN READING STRATEGIES AND

DIFFERENTIAL ITEM FUNCTIONING ON PCAP

2007 READING ASSESSMENT

by

Tanya Scerbina

A thesis submitted in conformity with the requirements

for the degree of Master of Arts

Department of Human Development and Applied

Psychology

Ontario Institute for Studies in Education

University of Toronto

© Copyright by Tanya Scerbina 2012



Abstract

Pan-Canadian Assessment Program (PCAP) 2007 reading ability item data and contextual data

on reading strategies were analyzed to investigate the relationship between self-reported reading

strategies and item difficulty. Students who reported using higher- or lower-order strategies

were identified through a factor analysis. The purpose of this study was to investigate whether

students with the same underlying reading ability but who reported using different reading

strategies found the items differentially difficult. Differential item functioning (DIF) analyses

showed that students who tended to use higher-order reading strategies excelled on selected

response items, whereas students who preferred lower-order strategies found these items

more difficult. The opposite pattern was found for constructed

response items. The results of the study suggest that DIF analyses can be used to investigate

which reading strategies are related to item difficulty when controlling for students’ level of

ability.


ACKNOWLEDGMENTS

My deepest gratitude goes to my supervisor, Dr. Ruth Childs, for her expertise,

guidance and support throughout my Master’s program, as well as for her insight and feedback

in the process of completing this thesis. I would also like to thank Monique Herbert, the second

member of my supervisory committee, for being an invaluable mentor to me in my program of

study.

The completion of this thesis would not have been possible without the statistical

expertise of Olesya Falenchuk and professional input of Pierre Brochu; I thank both for their

patience and ongoing assistance. I would also like to express my gratitude to the employees of

the Council of Ministers of Education, Canada, specifically Kathryn O’Grady and Pierre Brochu,

for providing me with indispensable resources and services.

Finally, I would like to thank all of the members of Datahost and my classmates,

especially Amanda, Christie, Jayme and Marija, for giving excellent advice. A special thank-you

goes to my family and friends for their immense support and encouragement.


TABLE OF CONTENTS

Abstract

Acknowledgments

Table of Contents

List of Tables

List of Figures

1 Introduction

1.1 Differential Item Functioning

1.2 Reading Process and Strategies

1.3 Objectives

2 Method

2.1 Data

2.2 Grouping Variable

2.3 Analyses

3 Results and Discussion

3.1 Score Distributions

3.2 Classical Item Analysis

3.3 Reading Strategies and Test Scores

3.4 DIF and DSF Analyses

3.4.1 Dichotomous Items

3.4.2 Polytomous Items and DSF

3.4.3 DIF with Scaled Matching Score, Dichotomous Items

3.4.4 DIF with Scaled Matching Score, Polytomous Items and DSF

4 Implications and Conclusion

4.1 Limitations and Future Directions

References

Appendix A. Factor Analyses for Booklet 1 and Booklet 2

Appendix B. Means, Standard Deviations and Frequencies of Students’ Questionnaire Responses for Grouping Variable Sample, Booklet 1 and Booklet 2

Appendix C. The Relationship between Total and Scaled Scores for Booklet 1 and Booklet 2

Appendix D. Missing Item Data for Booklet 1 and Booklet 2

Appendix E. Grouping Variable Sample: Item Statistics for Booklet 1 and Booklet 2

Appendix F. Anchor Items Eliminated: Item Discrimination for Booklet 1 and Booklet 2

Appendix G. Item Analysis by Section for Booklet 1 and Booklet 2

Appendix H. Chi-Square Analyses for Reading Strategies by Item, Booklet 1 and Booklet 2

Appendix I. DIF with Total Matching Score for Booklet 1 and Booklet 2


LIST OF TABLES

Table 1 Student Questionnaire: Assessment of Reading Strategies

Table 2 Means, Standard Deviations and Frequencies of Students’ Questionnaire Responses, Booklet 1

Table 3 Means, Standard Deviations and Frequencies of Students’ Questionnaire Responses, Booklet 2

Table 4 3-Factor Model: Reading Strategies

Table 5 2-Factor Model

Table 6 Item Statistics, Booklet 1

Table 7 Item Statistics, Booklet 2

Table 8 Distractor Analysis, Booklet 1

Table 9 Distractor Analysis, Booklet 2

Table 10 Item Discrimination by Subscores, Booklet 1

Table 11 Item Discrimination by Subscores, Booklet 2

Table 12 DIF for Dichotomous Items, Booklet 1

Table 13 DIF for Dichotomous Items, Booklet 2

Table 14 DIF for Polytomous Items, Booklet 1

Table 15 DSF for Polytomous Items, Booklet 1

Table 16 DIF for Dichotomous Items with Scaled Matching Score, Booklet 1

Table 17 DIF for Dichotomous Items with Scaled Matching Score, Booklet 2

Table 18 DIF for Polytomous Items with Scaled Matching Score, Booklet 1

Table 19 DIF for Polytomous Items with Scaled Matching Score, Booklet 2

Table 20 DSF for Polytomous Items with Scaled Matching Score, Booklet 1

Table 21 DSF for Polytomous Items with Scaled Matching Score, Booklet 2


LIST OF FIGURES

Figure 1. The relationship between PCAP 2007 Reading Assessment total score and IRT scaled score.


1 INTRODUCTION

1.1 Differential Item Functioning

Test items exhibit differential item functioning (DIF) when individuals with the same

level of ability belonging to different groups have different probabilities of responding correctly

(Holland & Thayer, 1988). Traditionally this procedure, called item bias analysis, has been used

to identify unfair test items, which exhibited differential difficulty for individuals having the

same level of knowledge or ability. Test items were said to be potentially biased when

respondents found them differentially difficult depending on characteristics irrelevant to

performance, such as gender, ethnicity, or disability (Holland & Thayer, 1988). However, this

analysis can also be performed to investigate if and to what extent any two groups demonstrate

differential item functioning on the test after matching individuals on their ability level.
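To make the definition concrete, the following minimal sketch computes the Mantel-Haenszel common odds ratio, the statistic associated with Holland and Thayer (1988), for a single dichotomous item. It is offered as an illustration of matching on ability, not as a description of the procedure used in this study. The function name, the layout of the strata and the counts are hypothetical and are not PCAP data.

```python
from math import log


def mantel_haenszel_odds_ratio(strata):
    """Return the Mantel-Haenszel common odds ratio for one item.

    Each stratum holds the counts for examinees matched at one score level:
    (reference_correct, reference_incorrect, focal_correct, focal_incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # an empty score level carries no information
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


# Hypothetical counts at three matched score levels (illustration only).
strata = [
    (30, 20, 20, 30),
    (40, 10, 30, 20),
    (45, 5, 40, 10),
]
alpha_mh = mantel_haenszel_odds_ratio(strata)

# Delta-scale transformation; negative values mean the item favours the
# reference group among examinees of comparable ability.
delta_mh = -2.35 * log(alpha_mh)
```

An odds ratio of 1 (a delta of 0) means that matched examinees in the two groups have the same odds of answering correctly, that is, no DIF on that item.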

The purpose of this study is to demonstrate that DIF analysis can be used to detect

differences in item difficulty for groups of students who tend to use different strategies during

reading. Results from such analysis can suggest which reading strategies may be most effective,

and thus inform educators about optimal choices for reading strategy instruction.

1.2 Reading Process and Strategies

Reading is a complex information processing system, an interaction between such

mental operations as attention, perception, memory and thought, language acquisition and

retention, and other cognitive processes (Koda, 2005). According to a prevailing view, reading

is multidimensional as it involves the reader, text and nature of the reading activity within the

reader’s sociocultural context, prior knowledge and experience (Magliano, Millis, Ozuru, &

McNamara, 2007). It starts with the reader (1) attending to and perceiving visual input, (2)


identifying words using phonological decoding, prior knowledge and context, (3) syntactically

integrating words into sentences, (4) interpreting sentences by semantically integrating words

into the overall message, (5) integrating sentences into bigger units of meaning, such as

paragraphs, and finally, but not necessarily, (6) making inferences of the implied meaning of the

text to establish a deeper understanding of the reading material (McNamara, O’Reilly, Rowe,

Boonthum, & Levinstein, 2007; Perfetti, 2001).

In the literature, reading is referred to as a highly complex set of mental processes

(Magliano et al., 2007; Perfetti, 2001). In 1963, however, Charles Fries offered an alternative

way of conceptualizing reading in his simple view of reading. According to this view, successful

reading ability is made up of only two components, decoding and linguistic comprehension (now

more commonly referred to in the literature as word recognition and reading comprehension).

Whereas the first component transforms printed letters into mental representations of words, the

second component of reading comprehension integrates these disparate representations into a

meaningful whole (Sweet & Snow, 2003). Fries did not deny the complexity of the reading process;

rather, he argued that all other mental operations involved in reading, other than decoding and

linguistic comprehension, are also developed and accessible to individuals who cannot read

(Fries, 1963; Hoover & Gough, 1990).

Over the years, other researchers have elaborated on these two processes of reading.

Word recognition involves phonological awareness and decoding, vocabulary knowledge,

fluency and semantic access (Koda, 2005; VanderVeen, Huff, Gierl, McNamara, Louwerse, &

Graesser, 2007), and therefore entails some degree of semantic processing as it involves an

integration of lexical and contextual information because words’ precise meaning is deeply

rooted in context. On the other hand, reading comprehension is directly related to semantic

processing and incorporates an array of integrative, interpretive and inferential abilities and


skills, such as activation of background knowledge, comprehension monitoring, inference and

prediction making, integration of multiple sources of information, understanding text structure,

and other processes (McNamara et al., 2007; Oakhill & Cain, 2007; VanderVeen et al., 2007). It

is worth noting that in this paper this model of reading is further simplified as other factors

necessary for successful reading are not discussed. For instance, other underlying abilities and

skills required for successful decoding are print awareness, alphabetic knowledge, individual

differences, etc. Motivation and working memory capacity also represent constraints for word

recognition and reading comprehension (VanderVeen et al., 2007).

Successful reading and superior comprehension occur when both word recognition and

comprehension of the text are accomplished in unison (McNamara et al., 2007). Thus, a

successful reader is one who correctly and rapidly decodes words and coherently integrates

all mental representations into the overall meaning of the text (Magliano et al., 2007; Oakhill &

Cain, 2007). However, in some cases reading components of word recognition and

comprehension are dissociated. Although most good comprehenders are also good word

decoders, the relationship between these two processes is not necessarily sequential (Oakhill &

Cain, 2007). For example, dyslexic individuals struggle with word recognition, but frequently

achieve deep comprehension, and hyperlexic children can achieve superior decoding in the

absence of prior training, but frequently encounter comprehension failures (Hoover & Gough,

1990). Based on these and similar findings, Rapp and colleagues (2007) concluded that these

components develop simultaneously and independently, suggesting that teaching reading

comprehension skills is likely to be effective regardless of students’ proficiency in decoding

(Rapp, van den Broek, McMaster, Kendeou, & Espin, 2007).

However, Fries’ dual-process classification of reading does more than simplify the

reading process. It is convenient to dichotomize reading into decoding and comprehension


because these processes correspond to distinct cognitive classes reported in the literature, lower-

and higher-level cognitive abilities (Cain, Oakhill, Barnes, & Bryant, 2001; Graesser, 2007;

Magliano et al., 2007; McNamara, 2007; Oakhill & Cain, 2007; Oakhill & Yuill, 1996; Rapp et

al., 2007; VanderVeen et al., 2007). That is, (1) the processes making up the decoding

component correspond to lower-level processes such as explicit words/text decoding abilities

and (2) the processes involved in comprehension encompass higher-level processes such as

complex cognitive integration and inferential abilities.

In the last decade, most of the research on reading has focused on reading

comprehension in order to isolate key higher-level cognitive processes involved in meaning

construction (McNamara, 2007), as these processes are positively associated with superior

reading ability indices and academic achievement in general (Graesser, 2007; Magliano et al.,

2007). Also, in the last two decades, the abundance of reading comprehension research has

translated into application and practice; today, reading comprehension and meaning construction

instruction is widespread in educational settings (Dole, Duffy, Roehler, & Pearson, 1991; Dole,

Nokes, & Drits, 2008). Teaching effective reading comprehension skills is fundamental to

children’s academic performance as these skills expand readers’ capability to successfully

understand, derive and construct meaning in reading and in general.

A concept related to comprehension instruction is that of reading strategies, which are

equally important in teaching children to become effective readers. Reading strategies are

defined as “deliberate, goal-directed attempts to control and modify the reader’s efforts to

decode text, understand words, and construct meanings of text” (Afflerbach, Pearson, & Paris,

2008, p. 368). In the literature, the terms reading skills and reading strategies are often used

interchangeably; however, there are important distinctions. Acquired reading skills are automatic


actions pertaining to reading proficiency, fluency, and comprehension, which require no

conscious awareness.

Paris and colleagues (1983) noted that reading instruction involves a progression from

effortful behaviour and use of reading strategies–whether simple word recognition or complex

meaning comprehension techniques–to acquiring automatic reading skill sets (Paris, Lipson, &

Wixson, 1983). In fact, the same action can be a skill or a strategy for different individuals

depending on the person’s reading proficiency and context (Afflerbach et al., 2008). Therefore,

being an effective reader does not simply involve using reading strategies, but rather gauging

the situation, monitoring the effectiveness of behaviour, and adapting strategies that are

appropriate to the reading material at hand. In other words, not all reading strategies are effective in

any given situation. An interesting finding regarding effective readers indicates that their use of

reading strategies is more purposeful and varied than that of an average reader (Fogarty, 2006).

Another distinction is that effective readers are more metacognitively aware and therefore are

better at recognizing when a reading strategy is no longer effective (McNamara et al., 2007;

Mokhtari & Reichard, 2002; Oakhill & Yuill, 1996). The distinction between cognitive and

metacognitive reading strategies is often made in the literature. Whereas cognitive strategies are

related to dealing directly with the content of the material to assist with comprehension,

metacognitive strategies refer to being aware of the reading process, being able to recognize

difficulties, and being able to modify behaviour to facilitate comprehension (Afflerbach et al.,

2008).

Cognitive reading strategies cited in the literature encompass an array of techniques

that help readers with word recognition and comprehension. In fact, there are strategies that deal

with explicit techniques to aid word/text decoding and strategies that help with meaning

integration and meaning construction (McNamara et al., 2007; Oakhill & Cain, 2007). Just as


word recognition and comprehension components of reading correspond to lower-level and

higher-level cognitive processes, respectively, various reading strategies can also be

conceptualized as lower- versus higher-level strategies based on what component of reading

they address. To differentiate cognitive reading processes and cognitive reading strategies in this

paper, the terms ‘lower-level’ and ‘higher-level’ refer to cognitive processes during reading,

whereas the terms ‘lower-order’ and ‘higher-order’ refer to cognitive reading strategies. Thus,

lower-order strategies address word recognition and include such techniques as defining

unfamiliar words or assimilating new words into existing vocabulary, whereas higher-order

strategies address reading comprehension and involve such techniques as connecting themes

within the text or inferring the author’s message.

The use of reading strategies has been shown to significantly predict achievement

outcomes evaluated by large-scale assessments (O’Reilly & McNamara, 2007). The results

suggest that even basic strategies can facilitate comprehension and positively relate to ability

indicators. However, studies such as that by O’Reilly and McNamara (2007) do not show the

relationship between reading strategies and academic achievement when controlling for general

reading ability. The question that remains to be addressed is, are test items differentially difficult

for students with the same underlying reading ability, but who report using different reading

strategies?

1.3 Objectives

The answer to the question of whether test items are differentially difficult for students

who report using different reading strategies and have the same level of reading ability can be

assessed with differential item functioning analysis. Therefore, the main purpose of this study is


to demonstrate that DIF can be used as a tool to investigate individual differences that are

relevant to test performance, such as preference for reading strategies.

This study also addresses the following research questions:

- Is there a pattern in students’ self-reported use of reading strategies, and if so, what

is it?

- Can groups of students be identified who exhibit a preference for lower-order versus

higher-order reading strategies?

- Does the use of different reading strategies affect test performance (overall test and

each item), and if so, in what way?

- Does employing DIF analysis to detect differences in item difficulty between groups of

students who are matched on ability but use different reading strategies reveal additional

information regarding the effects of reading strategies on test performance?


2 METHOD

2.1 Data

PCAP is a cyclical Pan-Canadian assessment program of students 13 years of age in

reading, mathematics and science administered by the Council of Ministers of Education, Canada

(CMEC). It is a paper-and-pencil test containing constructed and selected response items

administered with student, teacher and school contextual questionnaires. Data from the 2007 PCAP

Reading Assessment (reading was the primary domain in 2007) and the corresponding student questionnaire

were used in the following analyses. Data from students who took the test in French were not

included, nor were the mathematics and science items.

The PCAP Reading Assessment contained 50 items: 37 selected response and 13

constructed response items. Test items assessed three subdomains of reading: comprehension,

interpretation and response to text. Two different forms/booklets, designed to be matched on

difficulty and content, were administered to a random Pan-Canadian sample. All but 11 anchor

items were different for the two versions of the test, including different reading prompts. The

anchor reading prompt and anchor items were in the same location in both booklets. The

distribution of items assessing the three subdomains of reading was similar for both booklets of

the test. Selected response items were coded dichotomously and constructed responses were

coded on a scale of 0 to 3. Subscores of selected and constructed response items were also

obtained. On the student questionnaires, 15 items assessed how often students used specific

strategies when reading, rated on a 3-point Likert scale from ‘rarely or never’ to ‘often’ (see

Table 1).


Table 1

Student Questionnaire: Assessment of Reading Strategies

How often do you use the following strategies to help you understand what you are

reading?

(a) Reading out loud to myself

(b) Sounding out as many words as I can

(c) Looking for clues such as headings or captions

(d) Trying to make connections to what I already know

(e) Thinking about the author’s message

(f) Looking at charts and pictures

(g) Asking someone to help me

(h) Applying what I know about word origins or word parts

(i) Using an outside source like a dictionary

(j) Thinking about the other words in a sentence to figure out the meaning

(k) Finding a quiet place to read

(l) Re-reading the more difficult parts

(m) Highlighting or making notes or drawings on the important parts

(n) Sometimes reading more quickly or more slowly, depending on the material

(o) Trying to predict what the material is about

After eliminating data from students who took the French language version of the test,

the data set contained 7,537 students who wrote the English Reading Assessment Booklet 1 and

7,472 who wrote English Reading Assessment Booklet 2. Students’ responses on the contextual

questionnaire were matched to their reading achievement scores.

2.2 Grouping Variable

Descriptive analysis was performed for self-reported data regarding the use of the

reading strategies. The distribution of the ratings, means, standard deviations and ratings’

patterns are reported in Tables 2 and 3. These results demonstrate that (1) students had different

preferences for reading strategies and (2) the results were nearly identical for the students who

were administered Booklet 1 and those administered Booklet 2.
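As an illustration of how the descriptive statistics relate to the response distributions, the sketch below recovers a mean and standard deviation from the three response percentages of one questionnaire item. It assumes that the ratings were coded 1 ('rarely or never'), 2 ('sometimes') and 3 ('often'), which is not stated explicitly here but is consistent with the reported means, and it uses the population formula for the standard deviation. The function name is hypothetical.

```python
from math import sqrt


def mean_sd_from_percentages(percentages, ratings=(1, 2, 3)):
    """Mean and (population) SD of a rating from its response percentages."""
    total = sum(percentages)  # may differ slightly from 100 because of rounding
    proportions = [p / total for p in percentages]
    mean = sum(r * p for r, p in zip(ratings, proportions))
    variance = sum((r - mean) ** 2 * p for r, p in zip(ratings, proportions))
    return mean, sqrt(variance)


# 'Reading out loud to myself', Booklet 1: 49.8%, 38.2% and 11.9%.
mean, sd = mean_sd_from_percentages([49.8, 38.2, 11.9])
print(round(mean, 2), round(sd, 2))  # 1.62 0.69
```

Under the assumed coding, the rounded values agree with the mean and standard deviation reported for this item in Booklet 1.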


Table 2

Means, Standard Deviations and Frequencies of Students’ Questionnaire Responses, Booklet 1

                                                                                              Score Distributions
Questions                                                                      M    SD  Rarely or never  Sometimes  Often
Reading out loud to myself                                                  1.62  0.69            49.8%      38.2%  11.9%
Sounding out as many words as I can                                         1.54  0.66            55.3%      35.4%   9.3%
Looking for clues such as headings or captions                              1.82  0.69            34.2%      49.3%  16.5%
Trying to make connections to what I already know                           2.09  0.67            18.8%      53.8%  27.5%
Thinking about the author’s message                                         1.80  0.72            37.8%      44.0%  18.2%
Looking at charts and pictures                                              2.04  0.71            23.3%      49.1%  27.5%
Asking someone to help me                                                   1.68  0.69            45.3%      41.5%  13.2%
Applying what I know about word origins or word parts                       1.78  0.68            36.8%      48.6%  14.6%
Using an outside source like a dictionary                                   1.72  0.69            42.2%      43.9%  14.0%
Thinking about the other words in a sentence to figure out the meaning      2.08  0.69            20.0%      51.8%  28.2%
Finding a quiet place to read                                               2.16  0.76            22.1%      39.9%  38.0%
Re-reading the more difficult parts                                         2.28  0.71            15.4%      41.2%  43.4%
Highlighting or making notes or drawings on the important parts             1.54  0.70            57.9%      30.2%  11.9%
Sometimes reading more quickly or more slowly, depending on the material    2.12  0.64            15.0%      58.3%  26.7%
Trying to predict what the material is about                                1.93  0.69            27.5%      52.1%  20.4%

Table 3

Means, Standard Deviations and Frequencies of Students’ Questionnaire Responses, Booklet 2

                                                                                              Score Distributions
Questions                                                                      M    SD  Rarely or never  Sometimes  Often
Sounding out as many words as I can                                         1.55  0.67            54.6%      35.6%   9.8%
Looking for clues such as headings or captions                              1.82  0.70            34.9%      47.9%  17.2%
Trying to make connections to what I already know                           2.07  0.68            20.0%      52.9%  27.1%
Thinking about the author’s message                                         1.76  0.72            40.9%      42.8%  16.4%
Applying what I know about word origins or word parts                       1.76  0.69            38.8%      46.7%  14.5%
Using an outside source like a dictionary                                   1.71  0.69            42.3%      44.5%  13.2%
Thinking about the other words in a sentence to figure out the meaning      2.08  0.69            20.2%      51.6%  28.2%
Finding a quiet place to read                                               2.16  0.75            21.6%      40.8%  37.6%
Re-reading the more difficult parts                                         2.27  0.71            15.4%      41.8%  42.8%
Highlighting or making notes or drawings on the important parts             1.53  0.69            58.5%      30.0%  11.4%
Sometimes reading more quickly or more slowly, depending on the material    2.11  0.63            15.3%      58.6%  26.1%
Trying to predict what the material is about                                1.91  0.69            28.7%      51.3%  20.1%

By taking a preliminary look at the content of reading strategies and taking into account

the reviewed literature in the introduction of this paper, it appears that some strategies (e.g.,

‘trying to make connections to what I already know’, ‘thinking about the author’s message’, and

‘trying to predict what the material is about’) rely on higher-level integrative and inferential

cognitive processes, whereas others (e.g., ‘reading out loud to myself’ and ‘sounding out as

many words as I can’) rely on lower-level skills such as phonological awareness and decoding.

In fact, students reported using the former strategies more often than the latter.

Because students reported using strategies that conceptually represent higher-order

strategies more often than those that represent lower-order strategies, a factor analysis was

performed to explore whether the above strategies could be dichotomized into these two


categories, higher-order and lower-order reading strategies. Specifically, a principal axis factor

(PAF) analysis with a Promax oblique rotation of the 15 Likert-scale questions was conducted on data from

both samples combined (Booklets 1 and 2), a total of 15,009 participants (performing separate

analyses yielded nearly identical results; see Appendix A). The minimum criterion for factor

loadings was .30. Table 4 reports the pattern matrix; the 15 reading strategies clustered into

three factors. A subsequent factor analysis was performed, after dropping the items of the third

factor (see Table 5), because factor 3 consisted of strategies that were more ambiguous in terms

of their cognitive function than the higher- and lower-order strategies of factors 1 and 2,

respectively. The final analysis with two factors had a Cronbach’s alpha of .75 for factor 1

(higher-order reading strategies) and .67 for factor 2 (lower-order reading strategies).
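For readers unfamiliar with the reliability coefficient reported above, the following minimal sketch shows the standard formula for Cronbach's alpha applied to a small matrix of ratings. The ratings are hypothetical (they are not PCAP responses), the function name is illustrative, and the sketch assumes complete data with no missing ratings; the values reported in this study were obtained with statistical software.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of item ratings per respondent, all of equal length.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_columns = list(zip(*rows))  # transpose: one tuple of ratings per item
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings (1-3) from five respondents on three strategies.
ratings = [
    [3, 3, 2],
    [2, 2, 2],
    [1, 2, 1],
    [3, 2, 3],
    [1, 1, 1],
]
alpha = cronbach_alpha(ratings)
```

Alpha approaches 1 when the items within a factor rise and fall together across respondents, which is why it is commonly used as an index of the internal consistency of each set of strategies.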

Table 4

3-Factor Model: Reading Strategies

Questions Pattern

1 2 3

Trying to make connections to what I already know .723

Thinking about the author’s message .646

Applying what I know about word origins or word parts .618

Looking for clues such as headings or captions .435 .325

Thinking about the other words in a sentence to figure out the meaning .383 .253

Trying to predict what the material is about .331 .217

Asking someone to help me .620 -.100

Sounding out as many words as I can .593

Reading out loud to myself -.170 .559 .165

Highlighting or making notes or drawings on the important parts .109 .391

Using an outside source like a dictionary .184 .356

Looking at charts and pictures .236 .327

Re-reading the more difficult parts .663

Finding a quiet place to read .591

Sometimes reading more quickly or more slowly, depending on the material .487


Table 5

2-Factor Model

Questions Pattern

1 2


Thinking about the author's message .662 -.107


Thinking about the other words in a sentence to figure out the meaning .530

Trying to predict what the material is about .442


Asking someone to help me -.127 .597


Reading out loud to myself .577




According to the results of the two-factor model, factor 1 was consistent with higher-order
reading strategies and factor 2 was consistent with lower-order strategies. “Looking for
clues such as headings or captions” was the only strategy with cross-factor loadings above .30.
Based on the literature review, using information such as headings or captions probably relies
more on higher-level processes, because this information can serve as a tool for deriving
meaning from the reading material by integrating it with the overall text. Therefore, this
strategy was more consistent with factor 1, higher-order reading strategies.

Finally, a grouping variable was computed for each individual as the difference between the
mean ratings of higher-order and lower-order strategies. A difference score of zero meant that
a student had no preference for either type of strategy; students with positive difference
scores tended to report using higher-order reading strategies, and those with negative scores
tended to report using lower-order strategies. To create distinct groups, students with
difference scores close to zero (i.e., within ±0.33) were dropped from the DIF analysis. The
final sample size was 2,667 for Booklet 1 and 2,623 for Booklet 2. The distribution of the
ratings, means and standard deviations for self-reported use of reading strategies in the new
sample are reported in Appendix B. A limitation of this grouping variable is that it captures
reported reading strategy use in general rather than strategy use during the test.
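The grouping rule described above can be sketched as follows. This is an illustration only: the function name and example ratings are hypothetical, and treating scores exactly at ±0.33 as "close to zero" is an assumption, since the text does not say whether the cutoff is inclusive.

```python
from statistics import mean


def strategy_group(higher_ratings, lower_ratings, cutoff=0.33):
    """Classify a student by the difference between mean strategy ratings.

    Returns "higher" (reference group), "lower" (focal group), or None when
    the difference score falls within +/- cutoff and the student is dropped.
    """
    diff = mean(higher_ratings) - mean(lower_ratings)
    if diff > cutoff:
        return "higher"
    if diff < -cutoff:
        return "lower"
    return None  # assumed: scores exactly at the cutoff are also dropped


# Hypothetical Likert ratings for one student.
group = strategy_group([4, 4, 3], [2, 2, 1])  # -> "higher"
```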

2.3 Analyses

SPSS 17.0 was used to perform descriptive analyses, factor analyses, t-tests, chi-square

tests of independence and classical item analyses (including item difficulty and discrimination).

DIFAS 4.0 software, developed by Penfield (2005), was used to compute DIF statistics. This

software uses non-parametric indices, such as the Mantel-Haenszel statistic. One
advantage of non-parametric tests is that they make few assumptions beyond requiring an
adequate sample size for each combination of the variables. DIFAS assesses DIF for

dichotomous (e.g., selected response) and polytomous (constructed response) items, detecting

differences in difficulty while controlling for the matching variable of ability. For polytomous

items, DIF indices measure the overall (omnibus) difference in difficulty and differential step

functioning (DSF) indices measure the differences at each score level within the item, or each

step. In the PCAP 2007 Reading Assessment, constructed response items were coded from 0 to

3; DSF uses a cumulative step function to detect differences in difficulty at each step (i.e., first

step being a change from score 0 to 1, second step a change from score ≤1 to 2, and third step a

change from score ≤2 to 3).
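The cumulative steps described above can be sketched as a set of dichotomous indicators, one per step, where step j separates scores below j from scores of j or higher. This is a minimal illustration of the cumulative formulation as described in the text, not DIFAS code; the function name is hypothetical.

```python
def cumulative_steps(score, max_score=3):
    """Return one 0/1 indicator per step of a polytomous item.

    Step j is passed (1) when the item score is at least j, so step 1
    contrasts 0 with >=1, step 2 contrasts <=1 with >=2, and so on.
    """
    return [1 if score >= j else 0 for j in range(1, max_score + 1)]


# A score of 2 on a 0-3 item passes the first two steps but not the third.
steps = cumulative_steps(2)  # -> [1, 1, 0]
```

Each indicator can then be analysed like a dichotomous item, which is what allows step-level differences between groups to be examined.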

For DIF with dichotomous items, DIFAS produces Mantel-Haenszel (MH) chi-square
statistics and an MH Common Log-Odds Ratio that indicates the direction of the DIF. A
categorization of the effect size according to ETS criteria is also produced: 'A' as small,
'B' as moderate and 'C' as large (Penfield, 2007). With polytomous items, a cumulative
step-level Log-Odds Ratio (CU-LOR) is produced, as well as the Liu-Agresti Common Log-Odds
Ratio as a measure of effect size. For more information on DIFAS procedures and
interpretations, please refer to Penfield (2005; 2007) and Penfield, Gattamorta, and Childs (2009).
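To make the dichotomous statistics above concrete, the sketch below computes the Mantel-Haenszel common log-odds ratio and the continuity-corrected MH chi-square from a list of 2×2 tables, one per level of the matching variable. It is not the DIFAS implementation. The table layout (reference correct, reference incorrect, focal correct, focal incorrect), the function names and the example counts are assumptions for illustration. The category function is a simplified version of the ETS scheme based only on effect size in the delta metric (delta = -2.35 × log-odds ratio); the significance conditions of the full scheme are omitted.

```python
import math


def mantel_haenszel(strata):
    """MH common log-odds ratio and chi-square across matched strata.

    Each stratum is (a, b, c, d): reference correct, reference incorrect,
    focal correct, focal incorrect. With this layout, a positive log-odds
    ratio means the item favours the reference group.
    """
    num = den = 0.0
    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d
        n_correct, n_incorrect = a + c, b + d
        sum_a += a
        sum_expected += n_ref * n_correct / n
        sum_var += n_ref * n_foc * n_correct * n_incorrect / (n * n * (n - 1))
    log_odds = math.log(num / den)
    chi_sq = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    return log_odds, chi_sq


def ets_category(log_odds):
    """Simplified ETS label from effect size alone (significance ignored)."""
    delta = abs(-2.35 * log_odds)
    if delta < 1.0:
        return "A"
    if delta < 1.5:
        return "B"
    return "C"


# Hypothetical counts for two ability strata.
log_odds, chi_sq = mantel_haenszel([(30, 10, 20, 20), (40, 10, 30, 20)])
label = ets_category(log_odds)
```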

When conducting DIF analyses through DIFAS, the grouping variable described above

was used; the reference group was identified as students who reported using higher-order

reading strategies and the focal group consisted of students preferring lower-order strategies.

Both total scores and scaled scores were used to match students on ability. The total score on
selected response items was used for the dichotomous DIF analysis and the total score on
constructed response items was used for the polytomous DIF analysis. The PCAP 2007 Reading
Assessment data set also included a standardized IRT scaled score, with a Canadian mean of 500
and a standard deviation of 100, ranging from 100 to 800. DIF analysis was also performed using
this scaled score as the matching variable. To include this variable in DIFAS, it was
transformed into a categorical variable with 30 categories based on equal percentiles.
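One way to form 30 equal-percentile categories from a continuous score is to rank the scores and cut the ranks into equally sized groups, as in the sketch below. This is an assumption about the general technique, not the procedure actually used in SPSS; the function name is hypothetical, and tied scores are split by their original order here, whereas statistical packages usually keep ties in the same category.

```python
def percentile_bins(scores, n_bins=30):
    """Assign each score to one of n_bins rank-based categories (1..n_bins)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, i in enumerate(order):
        # Integer arithmetic keeps the categories as equal in size as possible.
        bins[i] = rank * n_bins // len(scores) + 1
    return bins


# Hypothetical scaled scores: 60 students yield two students per category.
scaled = [400 + 5 * i for i in range(60)]
categories = percentile_bins(scaled)
```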


3 RESULTS AND DISCUSSION

3.1 Score Distributions

A total of 15,009 English-speaking students wrote the PCAP 2007 Reading Assessment;
7,537 wrote Booklet 1 and 7,472 wrote Booklet 2. Missing values on test items were treated as
incorrect responses, and therefore the following item analyses contained no missing data. The
results for the two booklets are reported separately because the booklets consisted of different
items, with the exception of 11 anchor items. The constructed response item scores [0, 1, 2, or 3]
were added to compute a constructed response subscore, with a maximum obtainable value of
39 [13 items × 3]. The selected response items were scored as correct/incorrect; correct
responses were added to obtain a selected response subscore, with a maximum value of 37. The
total score was calculated by adding the two subscores, with a maximum value of 76.

Independent-samples t-tests were performed to evaluate whether students' performance
was equivalent between booklets. The differences were significant for the total test score,
t(14092.31) = -32.63, p < .001, η² = .07, the constructed response subscore, t(15007) = -17.63,
p < .001, η² = .02, and the selected response subscore, t(14820.39) = -38.15, p < .001, η² = .09.¹
That is, performance on the constructed and selected response subtests and on the overall test
was significantly lower for students writing Booklet 1 (constructed response subscore:
M = 13.84, SD = 7.40; selected response subscore: M = 23.74, SD = 7.00; total score: M = 37.58,
SD = 13.19) than for students writing Booklet 2 (constructed response subscore: M = 15.97,
SD = 7.42; selected response subscore: M = 27.85, SD = 6.20; total score: M = 43.82, SD = 10.07).
The PCAP 2007 dataset also included an IRT scaled total score, with a Canadian mean of 500 and
a standard deviation of 100, ranging from 100 to 800. When an independent-samples t-test was
conducted using the scaled reading ability score, no significant difference was found between
the booklets. This finding suggests that the test score underwent a considerable transformation
during scaling. Figure 1 illustrates the relationship between the total score and the scaled
score (separate scatterplots for Booklets 1 and 2 are presented in Appendix C).

¹ The equal variances assumption was violated for the selected response subscore and total
score tests, but not for the constructed response subscore; when appropriate, adjusted t
statistics are reported.
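The text does not state how the η² effect sizes were obtained. One common conversion from a t statistic is η² = t² / (t² + df); assuming that conversion, the sketch below reproduces the three booklet-comparison values reported above to two decimals. The function name is hypothetical, and values based on adjusted (unequal-variance) t statistics may not match software output exactly in other cases.

```python
def eta_squared(t, df):
    """Effect size from a t statistic, assuming eta^2 = t^2 / (t^2 + df)."""
    return t * t / (t * t + df)


# Booklet comparisons reported above (t statistic, degrees of freedom).
total_score = round(eta_squared(-32.63, 14092.31), 2)   # -> 0.07
constructed = round(eta_squared(-17.63, 15007), 2)      # -> 0.02
selected = round(eta_squared(-38.15, 14820.39), 2)      # -> 0.09
```

Values of this size indicate that, despite the very large t statistics produced by the sample size, booklet membership accounts for less than 10% of the variance in any of the three scores.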

Figure 1. The relationship between PCAP 2007 Reading Assessment total score and IRT scaled

score.

In conclusion, although students were randomly assigned to write one of the two

booklets, students writing Booklet 1 performed considerably worse, regardless of the item


format (constructed or selected responses). To investigate the differences between the items of

the two booklets further, classical item analyses were conducted.

3.2 Classical Item Analysis

SPSS 17.0 was used to perform item analysis. Item difficulty indices, also referred to as

p-values, were equivalent to item means for dichotomous variables, in this case selected

response items. For constructed response items, item difficulty was computed by dividing the

item mean by its maximum obtainable score. Therefore, the item difficulty indices for the
selected response items are not directly comparable to the indices for the constructed response
items, because the latter indicate the mean proportion of possible points that students obtained.

For item discrimination indices, corrected (i.e., the item was not included in the total score)

point-biserial correlations are reported. Although some sources contend that discrimination

values between .10 and .30 represent fair items (Office of Educational Assessment, 2005), here,

indices below .25 are interpreted as potentially problematic.
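The two classical indices described above can be sketched as follows: item difficulty as the item mean divided by the maximum obtainable score, and discrimination as the correlation between the item and the total score with that item removed. This is an illustration rather than the SPSS output used in the study; the function names and the toy data are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def item_difficulty(item_scores, max_score=1):
    """Mean item score as a proportion of the maximum obtainable score."""
    return mean(item_scores) / max_score


def corrected_item_total(item_scores, total_scores):
    """Correlate the item with the total score after removing the item."""
    rest = [total - item for item, total in zip(item_scores, total_scores)]
    return pearson(item_scores, rest)


# Hypothetical data: one dichotomous item and total scores for four students.
item = [1, 1, 0, 0]
totals = [4, 3, 2, 1]
p_value = item_difficulty(item)               # -> 0.5
r_pb = corrected_item_total(item, totals)
```

Removing the item from the total avoids inflating the correlation, because an item always correlates with a sum that contains it.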

Tables 6 and 7 list classical item statistics, including item difficulty and item
discrimination, for all test items in Booklets 1 and 2, respectively. Consistent with the
results of the t-tests reported above, there were substantial differences between the items of
Booklet 1 and Booklet 2. Regarding difficulty, both constructed and selected response items
were easier on Booklet 2 (constructed response items: Booklet 1 M = 0.36, Booklet 2 M = 0.41;
selected response items: Booklet 1 M = 0.64, Booklet 2 M = 0.75), except for anchor items,
which had similar item difficulty indices. Regarding item discrimination, both constructed and
selected response items had higher discriminative power on Booklet 1 than on Booklet 2
(constructed response items: Booklet 1 M = 0.54, Booklet 2 M = 0.45; selected response items:
Booklet 1 M = 0.35, Booklet 2 M = 0.24). Anchor items also had highly divergent point-biserial
correlations between the booklets. However, item discrimination indices on both booklets were
higher for constructed response items than for selected response items. This suggests that for
constructed responses, the test discriminated well between lower and higher performing students.

Table 6

Item Statistics, Booklet 1

Item  Item Format  Section/Source  p-value  rpb  Proportion Incorrect  Mean Score Correct†  Mean Score Incorrect

1 Constructed Response Items Only A1-1 .35 .46 .27 40.63† 29.32

2 A2-1 .39 .57 .26 41.55† 26.23

3 A3-1 .17 .35 .68 44.24† 34.46

4 Constructed B1-1_Anchor .48 .48 .13 39.69† 23.67

5 Selected B2-1_Anchor .91 .27 .09 38.79 25.14










15 Selected Response Items Only C1-1 .91 .36 .09 39.15 21.56

16 C2-1 .87 .39 .13 39.63 23.45

17 C3-1 .56 .39 .44 42.46 31.29

18 C4-1 .88 .32 .12 39.29 25.52

19 C5-1 .71 .31 .29 40.44 30.54

20 C6-1 .74 .39 .26 40.81 28.16

21 C7-1 .37 .32 .63 43.64 33.96

22 C8-1 .62 .29 .38 40.94 32.04

23 Constructed D1-1 .24 .50 .54 44.94† 31.42

24 Selected D2-1 .33 .25 .67 42.95 34.93

25 Selected D3-1 .53 .35 .47 42.30 32.30

26 Selected D4-1 .53 .23 .47 40.80 33.89

27 Selected D5-1 .46 .41 .54 43.88 32.14

28 Selected D6-1 .55 .40 .45 42.73 31.30

29 Constructed D7-1 .31 .56 .36 43.21† 27.34

30 Constructed D8-1 .39 .54 .25 41.29† 26.27

31 Constructed D9-1 .38 .60 .27 42.01† 25.79

32 Selected D10-1 .55 .26 .45 41.12 33.31

33 Constructed D11-1 .34 .59 .34 42.87† 27.40


34 Selected Response Items Only E1-1 .62 .39 .38 41.90 30.42

35 E2-1 .47 .31 .53 42.34 33.29

36 E3-1 .66 .40 .34 41.63 29.77

37 E4-1 .59 .46 .41 43.00 29.89

38 E5-1 .53 .26 .47 41.19 33.52

39 E6-1 .61 .47 .39 42.87 29.30

40 E7-1 .48 .40 .52 43.45 32.13

41 E8-1 .61 .34 .39 41.55 31.47

42 E9-1 .72 .49 .29 41.84 26.91

43 Selected Response Items Only F1-1 .59 .36 .41 41.94 31.37

44 F2-1 .60 .31 .40 41.25 32.00

45 F3-1 .78 .42 .22 40.68 26.55

46 F4-1 .66 .29 .34 40.65 31.62

47 F5-1 .42 .35 .59 43.55 33.34

48 F6-1 .72 .35 .28 40.71 29.44

49 F7-1 .86 .26 .14 39.07 28.25

50 F8-1 .57 .42 .43 42.78 30.66

Note. Indices of item difficulty below 0.40 and indices of item discrimination below .25 are in bold. †For constructed response items, mean score correct includes score of 1 or above (i.e., partial and/or full credit).

Table 7

Item Statistics, Booklet 2


Item  Item Format  Section/Source  p-value  rpb  Proportion Incorrect  Mean Score Correct†  Mean Score Incorrect

1 Constructed Response Items Only A1-2 .41 .38 .18 45.32† 36.96

2 A2-2 .38 .43 .27 46.13† 37.63

3 A3-2 .21 .28 .57 47.09† 41.38













15 Selected Response Items Only C1-2 .71 .15 .29 45.04 40.81

16 C2-2 .88 .25 .12 44.86 36.34

17 C3-2 .88 .28 .12 44.98 35.52

18 C4-2 .86 .24 .14 44.95 37.03

19 C5-2 .90 .25 .10 44.74 35.50

20 C6-2 .79 .23 .21 45.20 38.61

21 C7-2 .81 .19 .19 44.95 39.01

22 C8-2 .82 .31 .18 45.48 36.50

23 C9-2 .69 .15 .31 45.15 40.94

24 Constructed D1-2 .49 .47 .13 45.28† 33.83

25 Selected D2-2 .79 .32 .21 45.64 36.81

26 Selected D3-2 .54 .23 .46 46.38 40.87

27 Selected D4-2 .82 .29 .18 45.33 36.91

28 Selected D5-2 .89 .31 .11 45.02 34.05

29 Selected D6-2 .68 .28 .32 46.04 39.04

30 Selected D7-2 .46 .14 .54 45.85 42.09

31 Constructed D8-2 .37 .46 .28 46.59† 36.83

32 Constructed D9-2 .49 .51 .15 45.73† 33.21

33 Constructed D10-2 .40 .48 .22 46.27† 35.35

34 Constructed D11-2 .48 .47 .14 45.47† 33.99

35 Selected Response Items Only E1-2 .66 .23 .34 45.81 39.91

36 E2-2 .73 .23 .27 45.46 39.30

37 E3-2 .84 .29 .16 45.24 36.44

38 E4-2 .79 .21 .21 45.12 38.91

39 E5-2 .62 .29 .38 46.44 39.52

40 E6-2 .77 .26 .23 45.47 38.30

41 E7-2 .71 .27 .29 45.86 38.95

42 E8-2 .75 .28 .25 45.70 38.34

43 Selected Response Items Only F1-2 .74 .23 .26 45.45 39.12

44 F2-2 .70 .23 .30 45.59 39.70

45 F3-2 .76 .26 .24 45.52 38.44

46 F4-2 .94 .30 .06 44.65 31.12

47 F5-2 .82 .28 .18 45.30 37.06

48 F6-2 .59 .19 .41 45.82 40.97

49 F7-2 .69 .20 .31 45.44 40.25

50 F8-2 .61 .17 .39 45.57 41.11


To investigate the reasons for the booklet effect further, especially before the main

results of this study (i.e., differential item functioning) are presented, a distractor analysis was


also performed. Tables 8 and 9 report the results of the distractor analysis for Booklet 1 and

Booklet 2, respectively. With few exceptions, this analysis demonstrated several patterns. First,

for constructed response items, the lowest mean total score was observed for students who did

not answer the item correctly (or partially correctly) and the highest mean total score was

observed for those who received the highest item score. In fact, the higher the item score,
the higher the mean total score. Second, the pattern for the selected response items was

similar; the highest mean total score was obtained by students who chose the correct response

option and the lower mean total scores were observed for other, incorrect options. Specifically,

students with the lowest mean scores tended to select the least frequently chosen option. Finally,

the distractor analysis revealed differences in results between the two booklets of the test, but

only regarding the difficulty level. That is, the aforementioned patterns were found for both

booklets, but the mean total scores were lower across all response scores/options in Booklet 1

and higher in Booklet 2. However, despite being useful, these results did not provide any insight

as to why Booklet 2 contained more items with low item discrimination.

Note that the percentages of students responding with A, B, C or D to selected response
items, shown in Tables 8 and 9, do not add up to 100 percent because of missing responses.
Although missing responses were scored as incorrect, they were not included when individual
response options were examined (as opposed to binary correct/incorrect scoring). For this
reason, additional tables that include the percentage of missing data are presented in
Appendix D. These results reveal slightly more missingness for Booklet 1 than for Booklet 2,
perhaps because the items of Booklet 1 were more difficult.
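A distractor analysis of the kind summarized above can be sketched as a tabulation, for each response option, of the percentage of students choosing it and their mean total score. The function name and data below are hypothetical. Percentages are taken over all students, including those with missing responses, which is why they need not sum to 100.

```python
from collections import defaultdict


def distractor_table(responses, total_scores):
    """Percent choosing each option and mean total score per option.

    responses holds the chosen option per student, or None when missing.
    """
    totals_by_option = defaultdict(list)
    for option, total in zip(responses, total_scores):
        if option is not None:          # missing responses form no category
            totals_by_option[option].append(total)
    n_students = len(responses)
    return {
        option: (100 * len(totals) / n_students, sum(totals) / len(totals))
        for option, totals in sorted(totals_by_option.items())
    }


# Hypothetical item: four students, one of whom left the item blank.
table = distractor_table(["A", "B", "B", None], [10, 30, 40, 20])
# -> {"A": (25.0, 10.0), "B": (50.0, 35.0)}
```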


Table 8

Distractor Analysis, Booklet 1

Item Key Frequencies (Percent) Mean Score

0 1 2 3 0 1 2 3

Constructed Response Items

A1-1 ≥1 27.0 47.5 20.5 5.1 29.32 36.88 46.38 52.54

A2-1 ≥1 25.9 36.9 30.9 6.3 26.23 36.66 44.93 53.65

A3-1 ≥1 68.1 17.5 9.7 4.8 34.46 40.18 47.20 53.11

B1-1 ≥1 13.2 36.2 44.5 6.1 23.67 34.57 42.45 49.86

B6-1 ≥1 26.9 25.5 38.8 9.4 25.61 35.28 44.20 51.15

B7-1 ≥1 34.2 23.7 35.4 6.7 26.74 37.40 45.39 52.23

B9-1 ≥1 31.8 27.7 34.7 5.8 26.86 36.81 45.43 52.80

B11-1 ≥1 35.5 24.6 32.9 6.9 28.73 36.14 44.88 53.38

D1-1 ≥1 54.4 24.0 17.3 4.4 31.42 41.44 47.50 54.04

D7-1 ≥1 35.5 39.5 21.8 3.2 27.34 40.55 46.41 54.21

D8-1 ≥1 24.7 38.8 31.0 5.5 26.27 36.89 45.01 51.53

D9-1 ≥1 27.3 35.5 32.0 5.2 25.79 37.15 45.33 54.83

D11-1 ≥1 34.2 34.7 26.7 4.4 27.40 38.60 46.26 55.92

Item Key A B C D A B C D

Selected Response Items

B2-1 B 4.3 91.1 2.9 0.6 29.90 38.79 24.43 21.64

B3-1 D 2.8 3.6 13.1 79.2 27.33 28.88 35.36 39.09

B4-1 B 3.6 76.8 7.6 10.9 22.75 40.47 26.00 32.95

B5-1 A 79.0 6.4 6.6 7.0 40.88 25.64 26.22 26.35

B8-1 A 80.0 6.1 1.4 10.1 39.86 30.38 22.21 30.54

B10-1 A 56.6 12.9 9.8 17.7 42.34 31.16 27.91 35.27

C1-1 B 3.7 91.1 0.9 2.8 24.35 39.15 21.76 22.02

C2-1 D 6.4 2.3 2.5 87.3 26.37 24.37 21.36 39.63

C3-1 D 14.0 21.7 6.6 56.3 36.20 31.52 24.24 42.46

C4-1 B 3.1 87.6 3.0 5.1 24.09 39.29 25.72 30.19

C5-1 D 13.1 13.3 1.0 71.1 32.48 31.46 17.05 40.44

C6-1 D 10.5 6.4 7.3 74.4 30.90 23.78 30.96 40.81

C7-1 A 37.3 10.1 38.3 12.1 43.64 34.05 35.93 30.13

C8-1 B 2.8 62.2 17.1 16.1 24.10 40.94 33.54 33.61

D2-1 D 14.3 16.3 32.7 33.0 31.67 34.72 37.90 42.95

D3-1 C 35.4 5.6 52.8 3.5 34.90 26.55 42.30 26.61

D4-1 C 13.0 19.5 53.4 11.5 35.90 33.36 40.80 36.66

D5-1 C 7.8 32.1 46.3 11.4 24.96 34.39 43.88 34.34

D6-1 C 11.8 5.9 54.9 24.8 29.05 27.47 42.73 34.81

D10-1 C 21.6 7.4 54.6 10.6 37.76 29.12 41.12 33.75

E1-1 D 8.3 20.1 6.1 62.4 27.05 33.97 29.69 41.90

E2-1 D 14.2 16.7 18.9 47.4 32.99 30.90 38.18 42.34

E3-1 B 13.5 65.8 5.1 12.7 31.53 41.63 24.72 32.79


E4-1 A 58.7 13.1 3.7 21.7 43.00 28.40 24.35 33.45

E5-1 C 17.2 23.2 53.0 3.7 31.91 37.73 41.19 27.36

E6-1 A 61.0 16.3 11.6 6.4 42.87 28.98 31.21 31.47

E7-1 D 7.1 14.5 25.6 48.1 26.99 34.38 34.08 43.45

E8-1 D 7.2 18.5 9.2 60.6 33.14 33.78 30.16 41.55

E9-1 C 6.8 11.6 71.5 5.6 28.92 28.10 41.84 26.11

F1-1 A 58.8 15.2 13.1 9.5 41.94 30.05 35.72 31.45

F2-1 B 3.3 60.3 10.9 22.1 24.86 41.25 35.87 33.00

F3-1 D 9.2 7.5 2.1 78.0 29.97 27.29 21.62 40.68

F4-1 D 5.7 21.7 2.9 66.0 27.18 35.68 23.90 40.65

F5-1 C 23.8 12.0 41.5 18.9 34.18 36.23 43.55 32.97

F6-1 C 6.7 9.6 72.2 7.9 27.46 32.58 40.71 31.88

F7-1 A 86.2 7.1 1.7 1.2 39.07 35.39 20.74 19.50

F8-1 B 30.2 57.1 2.4 6.3 31.95 42.78 27.23 31.60

Table 9

Distractor Analysis, Booklet 2


Item Key Frequencies (Percent) Mean Score

0 1 2 3 0 1 2 3

Constructed Response Items


A1-2 ≥1 17.9 46.6 29.0 6.5 36.96 42.54 47.92 53.64

A2-2 ≥1 27.2 36.9 30.5 5.4 37.63 42.64 48.73 55.33

A3-2 ≥1 57.2 25.9 12.9 4.0 41.38 44.64 49.51 55.10

B1-2 ≥1 12.0 36.8 44.8 6.4 34.21 41.49 46.96 53.21

B6-2 ≥1 26.8 27.2 35.8 10.2 36.46 41.88 48.20 53.07

B7-2 ≥1 32.4 24.1 35.7 7.8 36.65 43.06 48.65 53.98

B9-2 ≥1 29.5 29.8 34.3 6.4 36.61 43.22 48.61 54.18

B11-2 ≥1 33.0 25.3 33.2 8.5 37.86 42.21 48.39 53.89

D1-2 ≥1 12.7 38.7 38.2 10.3 33.83 41.24 47.29 53.01

D8-2 ≥1 28.3 36.7 30.9 4.1 36.83 43.60 48.90 56.12

D9-2 ≥1 15.2 30.4 47.6 6.8 33.21 41.12 47.41 54.58

D10-2 ≥1 22.4 39.0 34.1 4.5 35.35 43.01 48.89 54.67

D11-2 ≥1 14.3 34.8 44.3 6.6 33.99 41.25 47.57 53.61

Item Key A B C D A B C D

Selected Response Items


B2-2 B 4.4 91.2 3.1 0.5 38.71 44.49 36.22 34.32

B3-2 D 2.6 4.1 13.3 78.6 37.64 38.54 42.46 44.66

B4-2 B 3.5 76.3 8.2 11.2 35.12 45.42 37.04 41.47

B5-2 A 78.7 6.1 7.1 7.3 45.86 36.96 36.48 36.19


B8-2 A 79.9 6.1 1.7 10.3 45.15 39.53 34.38 39.21

B10-2 A 56.4 13.4 9.7 17.8 46.12 40.92 39.43 42.09

C1-2 C 10.3 9.3 71.2 6.0 40.49 42.50 45.04 40.52

C2-2 A 87.9 4.3 5.0 1.9 44.86 36.23 38.26 36.08

C3-2 D 5.5 1.5 4.1 87.8 38.15 30.39 35.49 44.98

C4-2 C 2.7 5.9 85.8 4.4 33.58 39.18 44.95 38.01

C5-2 B 2.9 90.1 2.7 3.3 34.73 44.74 37.29 37.04

C6-2 B 2.5 79.1 12.7 4.6 35.14 45.20 40.71 37.08

C7-2 A 81.1 10.5 2.2 4.8 44.95 42.15 31.15 38.04

C8-2 C 3.1 10.4 81.6 3.4 37.05 36.73 45.48 37.73

C9-2 C 15.6 11.1 68.5 3.2 42.06 41.33 45.15 38.64

D2-2 D 6.5 6.6 5.4 79.4 39.40 36.46 36.94 45.64

D3-2 D 31.9 4.8 7.0 53.7 42.98 36.49 37.46 46.38

D4-2 B 6.8 82.1 3.3 5.6 38.54 45.33 35.67 38.32

D5-2 C 2.0 3.6 89.1 3.3 34.36 35.20 45.02 35.08

D6-2 C 8.2 13.0 68.3 8.2 38.66 40.88 46.04 38.75

D7-2 D 13.1 24.5 13.5 46.2 41.29 43.60 41.93 45.85

E1-2 C 21.5 2.7 66.4 6.9 40.36 38.25 45.81 42.81

E2-2 A 73.4 7.6 9.2 6.9 45.46 39.80 40.71 40.14

E3-2 D 6.4 4.1 2.9 84.0 39.16 37.28 34.92 45.24

E4-2 B 4.1 79.1 3.6 10.1 37.10 45.12 36.20 42.66

E5-2 B 12.6 62.3 15.4 7.1 38.59 46.44 41.40 40.40

E6-2 D 3.8 2.9 13.6 77.0 37.39 36.44 40.56 45.47

E7-2 B 13.3 70.5 2.2 11.2 39.67 45.86 34.52 40.94

E8-2 C 6.7 8.7 74.5 7.2 40.45 38.40 45.70 39.37

F1-2 C 3.3 7.4 74.3 12.7 37.26 39.31 45.45 41.36

F2-2 D 16.7 2.2 8.6 70.0 41.21 34.35 40.81 45.59

F3-2 A 76.0 4.7 8.5 8.3 45.52 39.09 39.48 39.61

F4-2 D 1.4 1.0 1.4 93.9 33.97 32.69 30.93 44.65

F5-2 D 2.0 2.4 10.9 82.1 32.99 35.50 39.90 45.30

F6-2 D 15.4 3.1 19.6 58.8 41.75 35.74 42.57 45.82

F7-2 B 12.9 68.8 5.2 9.9 41.93 45.44 37.64 42.08

F8-2 D 4.5 23.9 7.4 60.9 38.32 43.65 38.64 45.57

To investigate the nature of the difference between the booklets further, item analyses
were also run separately for constructed and selected response items. Tables 10 and 11 present
three types of point-biserial correlations: item to constructed response subscore, item to
selected response subscore and, as reported before, item to total score (the far right column).


Table 10

Item Discrimination by Subscores, Booklet 1

Item Discrimination

Item  rpb for constructed response items  rpb for selected response items  rpb for all items


A1-1 .50 --- .46

A2-1 .59 --- .57

A3-1 .38 --- .35

B1-1 .48 --- .48

B6-1 .61 --- .61

B7-1 .63 --- .61

B9-1 .60 --- .60

B11-1 .56 --- .55

D1-1 .50 --- .50

D7-1 .58 --- .56

D8-1 .57 --- .54

D9-1 .63 --- .60

D11-1 .63 --- .59


B2-1 --- .29 .27

B3-1 --- .20* .20*

B4-1 --- .37 .37

B5-1 --- .45 .46

B8-1 --- .32 .32

B10-1 --- .39 .38

C1-1 --- .39 .36

C2-1 --- .42 .39

C3-1 --- .41 .39

C4-1 --- .36 .32

C5-1 --- .32 .31

C6-1 --- .40 .39

C7-1 --- .33 .32

C8-1 --- .31 .29

D2-1 --- .24* .25

D3-1 --- .35 .35

D4-1 --- .22* .23*

D5-1 --- .39 .41

D6-1 --- .40 .40

D10-1 --- .25 .26

E1-1 --- .42 .39

E2-1 --- .32 .31

E3-1 --- .42 .40

E4-1 --- .48 .46


E5-1 --- .28 .26

E6-1 --- .49 .47

E7-1 --- .42 .40

E8-1 --- .35 .34

E9-1 --- .50 .49

F1-1 --- .39 .36

F2-1 --- .34 .31

F3-1 --- .45 .42

F4-1 --- .30 .29

F5-1 --- .35 .35

F6-1 --- .38 .35

F7-1 --- .29 .26

F8-1 --- .43 .42

Note. *rpb is between .200 and .249.

Table 11

Item Discrimination by Subscores, Booklet 2


Item Discrimination

Item  rpb for constructed response items  rpb for selected response items  rpb for all items


A1-2 .49 --- .38

A2-2 .55 --- .43

A3-2 .35 --- .28

B1-2 .53 --- .41

B6-2 .62 --- .48

B7-2 .64 --- .50

B9-2 .61 --- .48

B11-2 .57 --- .45

D1-2 .60 --- .47

D8-2 .58 --- .46

D9-2 .67 --- .51

D10-2 .63 --- .48

D11-2 .63 --- .47


B2-2 --- .04*** .19**

B3-2 --- .02*** .12***

B4-2 --- .04*** .24*

B5-2 --- .06*** .35


B8-2 --- .06*** .23*

B10-2 --- .33 .21*

C1-2 --- .24* .15**

C2-2 --- .42 .25

C3-2 --- .44 .28

C4-2 --- .39 .24*

C5-2 --- .41 .25

C6-2 --- .37 .23*

C7-2 --- .33 .19**

C8-2 --- .50 .31

C9-2 --- .25 .15**

D2-2 --- .53 .32

D3-2 --- .39 .23*

D4-2 --- .48 .29

D5-2 --- .54 .31

D6-2 --- .45 .28

D7-2 --- .22* .14***

E1-2 --- .40 .23*

E2-2 --- .37 .23*

E3-2 --- .48 .29

E4-2 --- .34 .21*

E5-2 --- .50 .29

E6-2 --- .43 .26

E7-2 --- .47 .27

E8-2 --- .45 .28

F1-2 --- .37 .23*

F2-2 --- .39 .23*

F3-2 --- .42 .26

F4-2 --- .50 .30

F5-2 --- .44 .28

F6-2 --- .30 .19**

F7-2 --- .31 .20*

F8-2 --- .30 .17**

Note. *rpb is between .200 and .249; **between .150 and .199; ***between .000 and .149.

For both booklets, constructed response items' discrimination indices were always
higher than those of selected response items. Also, when constructed and selected response
items were correlated with their own scales, the finding that more items in Booklet 2 than in
Booklet 1 exhibited low point-biserial correlations was replicated. However, correlating items
to their own subscores, as opposed to the total score, revealed that (1) point-biserials for all
constructed response items increased, (2) fewer selected response items had low point-biserial
correlations, (3) most of those selected response items which did have low point-biserials were
anchor items and finally (4) these results were found only for Booklet 2.

These findings have important implications. In Table 11, when rpb values are examined

separately for the two subscores instead of the total score, fewer items have low coefficients,

which implies that the test items of Booklet 2 measured more than one ability. This suspicion is

strengthened by the fact that most of these items with low discrimination are anchor items,

which suggests that section B (anchor prompt and items) measured a different dimension than

the rest of the test. Also, the finding that rpb values for constructed responses were higher when

correlated with their own subscale implies that constructed response items also measured

something different than selected response items. Classical item analysis indices, including item

difficulty and item to total score/item to subscore discrimination indices are also reported for the

grouping variable sample in Appendix E. Although the precise values (p-values and rpb values)

are different from those of the full sample, the pattern and conclusions are identical. That is, the

two booklets of the test were different from each other, and potentially measured different

aspects of reading ability.

Cronbach's alpha obtained through the reliability analysis does not, however, reveal
any potential problems with the internal consistency of the test items. In fact, alpha
coefficients for (1) the constructed response subscore, (2) the selected response subscore and
(3) the total score (Booklets 1 and 2 separately) are above .85. However, correlating the test
subscales with each other and with the total score supports the notion that Booklet 2 measured
more than one ability. That is, for Booklet 2, the relationship between constructed and
selected response subscores was almost non-existent (rCRxSR = .09), whereas these subscales
were strongly related to the total score (rCRxTotal = .79 and rSRxTotal = .66 for constructed
and selected response subscales, respectively); this is not surprising, as the items in each
subscore also contributed to the total score. In comparison, all of these correlations were
strong for Booklet 1 (rCRxSR = .68, rCRxTotal = .92, rSRxTotal = .91).
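The remark that subscore-to-total correlations are high partly by construction can be illustrated with the standard formula for the correlation between a part and a sum that contains it. The sketch below is an illustration under the assumption that the total is the plain sum of the two subscores; the function name is hypothetical, and the standard deviations are the full-sample values reported in Section 3.1. With the Booklet 1 values it returns .92 and .91, in line with the correlations reported above. With the Booklet 2 standard deviations and a subscore correlation of exactly zero, the constructed response subscore would still correlate about .77 with the total.

```python
import math


def part_whole_correlation(sd_part, sd_other, r_parts):
    """Correlation between one subscore and the sum of both subscores."""
    cov_with_total = sd_part ** 2 + r_parts * sd_part * sd_other
    var_total = sd_part ** 2 + sd_other ** 2 + 2 * r_parts * sd_part * sd_other
    return cov_with_total / (sd_part * math.sqrt(var_total))


# Booklet 1: SD = 7.40 (constructed), SD = 7.00 (selected), r = .68.
r_cr_total = round(part_whole_correlation(7.40, 7.00, 0.68), 2)   # -> 0.92
r_sr_total = round(part_whole_correlation(7.00, 7.40, 0.68), 2)   # -> 0.91

# Booklet 2 SDs with uncorrelated subscores (r = 0), for illustration only.
r_if_unrelated = round(part_whole_correlation(7.42, 6.20, 0.0), 2)  # -> 0.77
```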

To investigate the poor fit between anchor items and the rest of the Booklet 2 test,

classical item analysis was also performed without the anchor items (Appendix F). The results

indicate that there were substantially fewer items low in discrimination on Booklet 2 after

eliminating anchors, which suggests that anchor items were measuring something other than the

rest of the test items on Booklet 2. When items were grouped by section (A, B, C, D, E, F) and
correlations of anchor items to their section subscale scores were computed, slightly different
results between the booklets persisted (mean rpb coefficients for anchor items: Booklet 1
M = 0.44, Booklet 2 M = 0.40; see Appendix G). Also, more items with lower coefficients on
Booklet 2 than on Booklet 1

were observed for Section D, the only other section containing both constructed and selected

response items. This implies that constructed response items measured something other than

selected response items, but only on Booklet 2. Because PCAP was designed to assess three

subdomains of reading ability (comprehension, interpretation and response to text), separate

analyses were performed to evaluate whether there was an effect by the reading domain.

However, consistent with previous analyses, lower item to subdomain correlations were found

on Booklet 2 than Booklet 1 regardless of the domain of reading ability. Thus, the potential

multidimensionality of Booklet 2 cannot be explained by the aforementioned domains evaluated

by PCAP. Only two factors had an effect on these results, the inclusion of anchor items and the

item format. Therefore, the findings of the item discrimination analyses (1) by subscore, (2)

with anchor items eliminated, (3) by section and (4) by subdomain suggest that Booklet 2 was a

multidimensional measure of reading ability because constructed response and selected response


items were not consistent with each other and anchor items were not consistent with other

Booklet 2 test items.

All of the above analyses were carried out in order to better understand the ensuing DIF

results and to be able to make appropriate interpretations. Because of the findings reported
above, Booklets 1 and 2 will be treated as two entirely different examinations, each measuring
one or more aspects of reading ability.

3.3 Reading Strategies and Test Scores

As described in the literature review, higher-level cognitive skills contribute to

successful reading comprehension and deeper meaning construction. Thus, it is important to

explore whether the reported use of higher-order reading strategies is also positively related
to students' reading ability, and to investigate the role of lower-order reading strategies.

Independent-samples t-tests were conducted to assess whether the grouping variable
(students' preference for lower- versus higher-order reading strategies) predicted students'
reading ability as assessed by PCAP. The findings for Booklet 1 were significant for the total
test score, t(593.00) = 15.53, p < .001, η² = .09, the constructed response item subscore,
t(2665) = 12.37, p < .001, η² = .05, and the selected response item subscore, t(585.40) = 16.12,
p < .001, η² = .10.² The findings for Booklet 2 were slightly different: significant results
were found for the total test score, t(2621) = 7.91, p < .001, η² = .02, and the selected
response item subscore, t(530.27) = 12.94, p < .001, η² = .08, but the results for the
constructed response item subscore were not significant, t(2621) = -.07, p = .95, η² = .00. For
the total score, for instance, students who tended to use higher-order reading strategies
scored higher on the overall test (Booklet 1: M = 43.45, SD = 11.40; Booklet 2: M = 46.45,
SD = 9.33) than those who used lower-order strategies (Booklet 1: M = 33.57, SD = 12.33;
Booklet 2: M = 42.49, SD = 9.91). This finding was replicated with the IRT scaled score,
t(2665) = 16.15, p < .001, η² = .09 for Booklet 1 and t(2621) = 10.66, p < .001, η² = .04 for
Booklet 2. That is, the scaled reading ability score was higher for students who reported
using higher-order reading strategies (Booklet 1: M = 520.58, SD = 85.11; Booklet 2:
M = 506.51, SD = 82.57) than for those who reported using lower-order strategies (Booklet 1:
M = 448.00, SD = 90.25; Booklet 2: M = 459.74, SD = 83.02). In conclusion, students who
preferred higher-order reading strategies tended to do better on the overall assessments of
reading ability.

² The equal variances assumption was violated for some tests; when appropriate, adjusted t
statistics are reported.

Two-way contingency table analyses were performed to assess the relationship between

higher-/lower-order reading strategies and students’ performance on each test item using the

chi-square test of independence (without taking the level of ability into account). For Booklet 1,

students who reported using higher-order reading strategies performed significantly better on all

constructed response and selected response items except one (B3); no negative effects were

found. For Booklet 2, such students performed significantly better on 31 out of 37 selected

response items and significantly worse on one selected response item (B8) in comparison to

students who reported using lower-order reading strategies; the use of reading strategies on

Booklet 2 was not significantly related to answering any constructed response items correctly.

Therefore, using lower-order reading strategies was almost never related to better performance

on test items. See Appendix H for the full list of reading strategies by item crosstabulations and

chi-square results.
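As an illustration of the item-level test described above, the following sketch computes a Pearson chi-square test of independence for a single 2 x 2 table (strategy group by correct/incorrect). The counts are hypothetical and are not taken from Appendix H; the function name is also hypothetical. No continuity correction is applied, which may differ from the software output used in the thesis.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = strategy group, columns = correct/incorrect.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: higher-order group 300 correct / 100 incorrect,
# lower-order group 200 correct / 200 incorrect.
statistic, p_value = chi_square_2x2(300, 100, 200, 200)
print(round(statistic, 2), p_value < 0.01)
```

Because this test ignores ability, a significant result only shows that the two groups differ in raw success rates on the item; it does not indicate DIF.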

3.4 DIF and DSF Analyses

3.4.1 Dichotomous Items

DIF analysis was conducted on all 37 selected response items, matching students on the

total score (i.e., selected response subscore). The Mantel-Haenszel chi-square statistic indicates

which items exhibited significant DIF. Another statistic, the Breslow-Day chi-square, is effective

in detecting nonuniform DIF (Penfield, 2007). The combined decision rule (CDR) combines

both statistics in the decision to flag an item for DIF. The direction of DIF is also identified,

either in favour of students who used higher-order reading strategies or those who used

lower-order strategies. Finally, the last column of each table reports the “effect size” as small (A),

moderate (B) or large (C), based on the Educational Testing Service’s (ETS’s) categorization scheme.

For more details, please refer to Penfield (2007) and Penfield, Gattamorta, and Childs (2009).
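To make the Mantel-Haenszel statistic concrete, the sketch below computes the common odds ratio, the continuity-corrected Mantel-Haenszel chi-square, and the ETS delta value (-2.35 times the natural log of the odds ratio) from a set of 2 x 2 tables, one per level of the matching score. The counts and function names are hypothetical, and the category function is a deliberate simplification: it uses only the magnitude of delta (below 1.0, 1.0 to 1.5, 1.5 and above) and ignores the significance conditions that are part of the full ETS scheme. It is not a reproduction of the DIFAS computations.

```python
import math


def mantel_haenszel(strata):
    """strata: iterable of (a, b, c, d) per matching-score level, where
    a = reference correct, b = reference incorrect,
    c = focal correct,     d = focal incorrect.
    Returns (common odds ratio, MH chi-square, ETS delta).
    """
    num = den = sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        n_ref, n_foc = a + b, c + d
        n_correct, n_incorrect = a + c, b + d
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_expected += n_ref * n_correct / n
        sum_var += n_ref * n_foc * n_correct * n_incorrect / (n * n * (n - 1))
    odds_ratio = num / den
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    delta = -2.35 * math.log(odds_ratio)
    return odds_ratio, chi2, delta


def ets_category_by_magnitude(delta: float) -> str:
    """Simplified A/B/C label based on |delta| only (assumption: the
    significance conditions of the full ETS rule are omitted here)."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


# Two hypothetical score levels.
example = [(30, 10, 20, 20), (40, 10, 30, 20)]
odds_ratio, chi2, delta = mantel_haenszel(example)
print(round(odds_ratio, 3), round(chi2, 2), ets_category_by_magnitude(delta))
```

With this coding, an odds ratio above 1 (negative delta) means the item favours the reference group after matching on the score.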

Before interpreting the results of the DIF analysis, the assumption of adequacy of cell

count was verified. For Booklet 1, 29% of all cells (grouping by total score matching variable)

had fewer than five cases and 9% had a count of zero. For Booklet 2, 38% of all cells had fewer

than five cases and 17% had zero; thus, the adequacy of cell count might be compromised, as

the percentages of cells with fewer than five cases exceed the desirable maximum of 20 percent.
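The cell-count check described above can be sketched as follows. The example treats every entry of every 2 x 2 table (one table per matching-score level) as a cell and reports the percentage of cells with fewer than five cases and with zero cases. How DIFAS defines and counts cells is an assumption here, and the counts and function name are hypothetical.

```python
def cell_count_summary(strata):
    """strata: iterable of (a, b, c, d) counts, one 2 x 2 table per level of
    the matching variable. Returns (percent of cells with fewer than five
    cases, percent of cells with zero cases)."""
    cells = [count for table in strata for count in table]
    total = len(cells)
    sparse = sum(1 for count in cells if count < 5)
    empty = sum(1 for count in cells if count == 0)
    return 100.0 * sparse / total, 100.0 * empty / total


# Three hypothetical score levels (12 cells in total).
example = [(30, 10, 20, 20), (4, 0, 6, 3), (12, 7, 9, 2)]
pct_sparse, pct_empty = cell_count_summary(example)
print(round(pct_sparse, 1), round(pct_empty, 1))
```

Under the 20 percent guideline mentioned in the text, the hypothetical data above would be flagged, since a third of the cells hold fewer than five cases.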

Table 12

DIF for Dichotomous Items, Booklet 1

Item    Mantel-Haenszel   Breslow-Day   Direction of DIF          Combined Decision   Effect
        Chi-Square        Chi-Square                              Rule (CDR)          Size
B3-1    13.91**           0.73          Lower-order strategies    DIF                 Moderate
B4-1    0.37              7.01**        Higher-order strategies   DIF                 Small
B5-1    8.42**            1.47          Higher-order strategies   DIF                 Small
F6-1    5.92*             0.87          Higher-order strategies   DIF                 Small

Note. *p < .05, **p < .01

Table 12 lists statistics for the selected response items that were flagged for DIF in

Booklet 1. Only one of the four flagged items showed a moderate effect; the rest showed

small levels of DIF. Also, for most of these items DIF was in favour of the reference group,

except one item (B3) with DIF in favour of the focal group. In other words, most items with

differential functioning based on the reported use of reading strategies favoured students who

used higher-order reading strategies, and students who tended to use lower-order strategies

performed better than expected on only one test item out of the four flagged items.

Table 13

DIF for Dichotomous Items, Booklet 2


Item    Mantel-Haenszel   Breslow-Day   Direction of DIF          Combined Decision   Effect
        Chi-Square        Chi-Square                              Rule (CDR)          Size
C9-2    0.00              6.00*         Higher-order strategies   DIF                 Small
D3-2    5.04*             2.38          Higher-order strategies   DIF                 Small
D6-2    32.75**           7.80**        Higher-order strategies   DIF                 Large
E4-2    5.95*             0.63          Lower-order strategies    DIF                 Small
E5-2    10.79**           1.03          Higher-order strategies   DIF                 Moderate
F1-2    10.21**           0.66          Higher-order strategies   DIF                 Moderate

Note. *p < .05, **p < .01

Table 13 shows the DIF results obtained for Booklet 2. There was a difference in

results between the booklets: the Booklet 2 analysis flagged more items, 8 out of 37.

Only one of these test items showed a large effect and half of the DIF items showed

a moderate effect; the rest showed small levels of DIF. Also, for most of these items DIF

was in favour of the reference group, except three items (B5, B8, E4) with DIF in favour of the

focal group. In other words, again, most test items functioned in favour of students who used

higher-order reading strategies, and students who tended to use lower-order strategies performed

better than expected on three items out of the eight items flagged for DIF. See Appendix I for

the full list of DIF statistics for dichotomous items, Booklet 1 and Booklet 2.

Interestingly, differences in the DIF results between the two booklets were found

even for the same items (anchor items B2, B3, B4, B5, B8 and B10); different anchor items

were flagged for DIF. Also, all of the DIF items which favoured the focal group were anchor

items, with the exception of item E4 on Booklet 2. However, on the selected response items

overall, most DIF favoured the students who reported using higher-order reading strategies.

3.4.2 Polytomous Items and DSF

DIF analysis for polytomous items was conducted on all 13 constructed response items,

matching students on the total score (i.e., constructed response subscore). Again, the adequacy

of cell count assumption was verified. For Booklet 1, 24% of all cells (grouping by total score

matching variable) had fewer than five cases and 10% had a count of zero. For Booklet 2, 20%

of all cells had fewer than five cases and 6% had zero. Thus, the adequacy of cell count

assumption was almost met.

Table 14

DIF for Polytomous Items, Booklet 1


Item    Chi-Square   Step(s)         Direction of DIF          Combined Decision Rule (CDR)
A1-1    0.02         ---             ---                       ---
A2-1    3.57         ---             ---                       ---
A3-1    0.31         ---             ---                       ---
B1-1    0.09         ---             ---                       ---
B6-1    0.01         ---             ---                       ---
B7-1    0.00         ---             ---                       ---
B9-1    0.60         ---             ---                       ---
B11-1   0.03         ---             ---                       ---
D1-1    4.11*        1st             Higher-order strategies   DIF
D7-1    0.17         ---             ---                       ---
D8-1    7.32**       1st             Lower-order strategies    DIF
D9-1    2.25         ---             ---                       ---
D11-1   1.71         ---             ---                       ---

Note. *p < .05, **p < .01

Table 15

DSF for Polytomous Items, Booklet 1

Item    Step   CU-LOR   Z         DSF Size
D1-1    1      0.364    2.881*    Small
        2      0.229    1.369     Small
        3      -0.592   -1.924    Small
D8-1    1      -0.315   -2.061*   Small
        2      -0.260   -1.938    Small
        3      -0.333   -1.234    Small

Note. *p < .05

Tables 14 and 15 summarize DIF and DSF results for Booklet 1, respectively. No DIF

was detected for items of Booklet 2 (see Appendix I for the list of statistics). DIF was detected

for only two items on Booklet 1, one in favour of the focal group and another in favour of the

reference group. However, the effect was small and involved only the first step (change in

scoring from 0 to 1).
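The step-level analysis can be illustrated with a cumulative dichotomization: for each step j, responses are recoded as reaching at least score j or not, and a pooled log-odds ratio is computed across matching-score levels. The sketch below uses the Mantel-Haenszel common odds ratio as the pooling estimator; this is an assumption made for illustration, and the exact CU-LOR estimator and its standard error in DIFAS may differ in detail. The records, group labels, and function name are hypothetical.

```python
import math
from collections import defaultdict


def cumulative_step_log_odds(records, max_score):
    """records: iterable of (group, stratum, score) with group in
    {"ref", "foc"}. For each step j = 1..max_score, dichotomize the item as
    score >= j versus score < j and pool the 2 x 2 tables across strata.
    Returns {step: log of the pooled odds ratio}; positive values favour the
    reference group.
    """
    result = {}
    for step in range(1, max_score + 1):
        # Per stratum: [ref reached, ref not reached, foc reached, foc not reached]
        tables = defaultdict(lambda: [0, 0, 0, 0])
        for group, stratum, score in records:
            base = 0 if group == "ref" else 2
            offset = 0 if score >= step else 1
            tables[stratum][base + offset] += 1
        num = den = 0.0
        for a, b, c, d in tables.values():
            n = a + b + c + d
            num += a * d / n
            den += b * c / n
        result[step] = math.log(num / den)
    return result


# Hypothetical data: one matching-score level, item scored 0 to 2.
records = [("ref", 1, s) for s in (0, 1, 2, 2)] + [("foc", 1, s) for s in (0, 0, 1, 2)]
print(cumulative_step_log_odds(records, max_score=2))
```

In this framing, an effect limited to the first step means the groups differ in their odds of moving from a score of 0 to at least 1, but not in their odds of reaching the higher score levels.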

To summarize, (1) some differences in DIF for dichotomous and polytomous items

existed between the booklets and (2) most of the items flagged for DIF favoured the reference

group (use of higher-order reading strategies).

3.4.3 DIF with Scaled Matching Score, Dichotomous Items

It is conventional to perform DIF analyses using the total score as the matching

variable; however, since the dataset included a scaled reading ability measure, DIF analysis was

rerun using this variable as a matching variable. The scaled score had to be transformed into a

categorical variable before the results could be analyzed in DIFAS. The scaled score, which

ranged from 100 to 800, was divided into 30 equal percentile categories. The assumption of

adequacy of cell count was met. For Booklet 1, 2% of all cells (grouping by scaled score

matching variable) had fewer than five cases and no counts of zero were observed. For Booklet

2, all cells had more than five cases and none had zero.
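The categorization step can be sketched as follows: cut points are taken at equally spaced ranks of the sorted scaled scores, and each student is assigned the index of the interval containing his or her score. This is an illustrative assumption about how the 30 percentile categories could be formed; the thesis does not describe the exact procedure, and the function name is hypothetical. Tied scores always receive the same category, so categories can be unequal in size when ties are frequent.

```python
from bisect import bisect_right
from collections import Counter


def percentile_bins(scores, n_bins=30):
    """Assign each score to one of n_bins categories (1..n_bins) using cut
    points at equally spaced ranks of the sorted scores."""
    ordered = sorted(scores)
    n = len(ordered)
    cuts = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    return [bisect_right(cuts, score) + 1 for score in scores]


# Hypothetical scaled scores: 300 distinct values.
bins = percentile_bins(list(range(100, 400)))
print(min(bins), max(bins), set(Counter(bins).values()))
```

The resulting category index can then serve as the matching variable in place of the total score.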

Table 16

DIF for Dichotomous Items with Scaled Matching Score, Booklet 1


Item    Mantel-Haenszel   Breslow-Day   Direction of DIF          Combined Decision   Effect
        Chi-Square        Chi-Square                              Rule (CDR)          Size
B2-1    0.03              0.09          ---                       ---                 ---
B3-1    8.16**            0.39          Lower-order strategies    DIF                 Small
B4-1    1.33              1.75          ---                       ---                 ---
B5-1    6.96**            4.49*         Higher-order strategies   DIF                 Small
B8-1    3.62              0.56          ---                       ---                 ---
B10-1   4.94              3.97          ---                       ---                 ---
C1-1    0.72              0.06          ---                       ---                 ---
C2-1    0.91              0.00          ---                       ---                 ---
C3-1    5.90*             0.64          Higher-order strategies   DIF                 Small
C4-1    0.04              0.21          ---                       ---                 ---
C5-1    0.07              0.22          ---                       ---                 ---
C6-1    0.00              0.11          ---                       ---                 ---
C7-1    11.36**           1.26          Higher-order strategies   DIF                 Moderate
C8-1    0.92              2.16          ---                       ---                 ---
D2-1    0.19              0.62          ---                       ---                 ---
D3-1    0.33              5.38*         Higher-order strategies   DIF                 Small
D4-1    0.66              0.48          ---                       ---                 ---
D5-1    0.14              0.36          ---                       ---                 ---
D6-1    4.32              2.73          ---                       ---                 ---
D10-1   1.67              3.21          ---                       ---                 ---
E1-1    3.45              4.62          ---                       ---                 ---
E2-1    0.19              0.12          ---                       ---                 ---
E3-1    3.31              2.78          ---                       ---                 ---
E4-1    8.96**            0.12          Higher-order strategies   DIF                 Small
E5-1    0.00              0.31          ---                       ---                 ---
E6-1    0.08              1.19          ---                       ---                 ---
E7-1    1.46              0.09          ---                       ---                 ---
E8-1    0.76              0.71          ---                       ---                 ---
E9-1    0.34              0.05          ---                       ---                 ---
F1-1    7.23**            2.93          Higher-order strategies   DIF                 Small
F2-1    6.30*             3.15          Higher-order strategies   DIF                 Small
F3-1    0.00              0.32          ---                       ---                 ---
F4-1    0.88              1.96          ---                       ---                 ---
F5-1    3.02              6.02*         Higher-order strategies   DIF                 Small
F6-1    10.65**           5.39*         Higher-order strategies   DIF                 Small
F7-1    0.10              2.02          ---                       ---                 ---
F8-1    0.00              0.00          ---                       ---                 ---

Note. *p < .05, **p < .01

Table 17

DIF for Dichotomous Items with Scaled Matching Score, Booklet 2


Item    Mantel-Haenszel   Breslow-Day   Direction of DIF          Combined Decision   Effect
        Chi-Square        Chi-Square                              Rule (CDR)          Size
B2-2    2.86              0.32          ---                       ---                 ---
B3-2    2.27              0.01          ---                       ---                 ---
B4-2    3.53              1.27          ---                       ---                 ---
B5-2    29.79**           1.04          Lower-order strategies    DIF                 Large
C1-2    0.06              3.96          ---                       ---                 ---
C5-2    3.03              0.82          ---                       ---                 ---
C6-2    27.87**           0.78          Higher-order strategies   DIF                 Large
C7-2    6.45*             1.56          Higher-order strategies   DIF                 Small
C8-2    22.60**           0.00          Higher-order strategies   DIF                 Moderate
C9-2    6.29*             2.67          Higher-order strategies   DIF                 Small
D2-2    25.47**           0.42          Higher-order strategies   DIF                 Large
D3-2    29.57**           0.08          Higher-order strategies   DIF                 Moderate
D4-2    29.11**           0.21          Higher-order strategies   DIF                 Large
D5-2    4.75              0.07          ---                       ---                 ---
D6-2    85.43**           6.60**        Higher-order strategies   DIF                 Large
D7-2    1.96              0.34          ---                       ---                 ---
E1-2    5.75*             1.65          Higher-order strategies   DIF                 Small
E2-2    13.51**           0.01          Higher-order strategies   DIF                 Moderate
E3-2    10.14**           0.97          Higher-order strategies   DIF                 Moderate
E4-2    0.43              3.98          ---                       ---                 ---
E5-2    57.77**           0.05          Higher-order strategies   DIF                 Large
E6-2    2.48              3.37          ---                       ---                 ---
E7-2    17.16**           3.92          Higher-order strategies   DIF                 Moderate
E8-2    7.82*             1.06          Higher-order strategies   DIF                 Small
F1-2    0.13              2.65          ---                       ---                 ---
F4-2    3.21              8.40          Higher-order strategies   DIF                 Small
F5-2    4.58              1.02          ---                       ---                 ---
F6-2    4.18              0.02          ---                       ---                 ---
F7-2    0.02              1.27          ---                       ---                 ---

Note. *p < .05, **p < .01

Table 16 reports the DIF statistics for all 37 selected response items for Booklet 1. DIF

analysis with the scaled score matching variable rather than the total score flagged 10 out of 37

items (Table 16). Only one of these test items showed a moderate effect; the rest showed

small levels of DIF. Also, for most of these items DIF was in favour of the reference group,

except one item (B3) with DIF in favour of the focal group. In other words, most items

functioned in favour of students who tended to use higher-order reading strategies, and students

who used lower-order strategies performed better than expected on only one item out of the ten

flagged items.

Table 17 shows the DIF analysis results obtained for Booklet 2. There was a large

difference in results between the booklets. DIF for Booklet 2 revealed that 65% of all selected

response items were flagged for DIF. Most of these items showed moderate-to-large effects.

However, similar to Booklet 1’s results, only two items were in favour of the focal group.

Anchor items also demonstrated a different pattern of DIF between the two booklets.

For Booklet 1, DIF was detected for B3 and B5, with small effects. In the case of Booklet 2,

items B5, B8, and B10 were flagged for DIF with small-to-large effects. Interestingly, only one

item in Booklet 1 and two in Booklet 2 (all three items were anchor items) were in favour of the

focal group, those who reported using lower-order reading strategies more often. However, on

all other selected response items, differential item functioning favoured the reference group.

3.4.4 DIF with Scaled Matching Score, Polytomous Items and DSF

In terms of the DIF for polytomous items, the results were similar to the previous

analysis with dichotomous items in that a large difference in the proportion of DIF items was

observed between the two booklets. See Tables 18 and 19 for DIF analyses of polytomous items

for Booklet 1 and Booklet 2, respectively.

Table 18

DIF for Polytomous Items with Scaled Matching Score, Booklet 1


Item    Chi-Square   Step(s)         Direction of DIF          Combined Decision Rule (CDR)
A1-1    0.00         ---             ---                       ---
A2-1    0.74         ---             ---                       ---
A3-1    0.58         ---             ---                       ---
B1-1    0.89         ---             ---                       ---
B6-1    3.01         ---             ---                       ---
B7-1    4.49*        3rd             Lower-order strategies    DIF
B9-1    2.07         ---             ---                       ---
B11-1   2.42         ---             ---                       ---
D1-1    0.19         ---             ---                       ---
D7-1    1.37         ---             ---                       ---
D8-1    16.45**      1st, 2nd, 3rd   Lower-order strategies    DIF
D9-1    9.92**       1st, 2nd        Lower-order strategies    DIF
D11-1   9.25**       1st, 2nd        Lower-order strategies    DIF

Note. *p < .05, **p < .01

Table 19

DIF for Polytomous Items with Scaled Matching Score, Booklet 2


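Item    Chi-Square   Step(s)         Direction of DIF          Combined Decision Rule (CDR)
A1-2    16.80**      1st, 2nd, 3rd   Lower-order strategies    DIF
A2-2    30.40**      1st, 2nd        Lower-order strategies    DIF
A3-2    6.89**       2nd             Lower-order strategies    DIF
B1-2    26.97**      1st, 2nd        Lower-order strategies    DIF
B6-2    44.01**      1st, 2nd, 3rd   Lower-order strategies    DIF
B7-2    47.04**      1st, 2nd        Lower-order strategies    DIF
B9-2    38.49**      1st, 2nd, 3rd   Lower-order strategies    DIF
B11-2   23.17**      1st, 2nd, 3rd   Lower-order strategies    DIF
D1-2    47.92**      1st, 2nd, 3rd   Lower-order strategies    DIF
D8-2    43.38**      1st, 2nd, 3rd   Lower-order strategies    DIF
D9-2    44.36**      1st, 2nd        Lower-order strategies    DIF
D10-2   45.59**      1st, 2nd, 3rd   Lower-order strategies    DIF
D11-2   31.29**      1st, 2nd        Lower-order strategies    DIF

Note. **p < .01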

DIF was detected for four items in Booklet 1 and all of the items in Booklet 2. The

replication of Booklet 2’s higher rate of DIF with constructed response items confirms that the

two booklets were in fact quite different. As mentioned earlier, lower test performance was

observed for examinees in Booklet 1 compared to Booklet 2; perhaps this lack of equivalency

between the two booklets in terms of performance is implicated in these DIF findings.

Another intriguing finding was that all DIF items, on both booklets, were in favour of

the focal group. For these items lower-order reading strategies facilitated better performance.

One potential explanation may be that selected responses predominantly require reading,

whereas constructed responses involve writing. Perhaps different strategies pertaining to writing

can promote higher scores on these items, but higher-order reading strategies fail to do so.

The results of the follow-up differential step functioning analyses are reported in Tables

20 and 21. For Booklet 1, mostly small DSF effects were detected on the four items that

demonstrated DIF (see Table 20). For Booklet 2, moderate-to-large DSF effects were observed for

most of the steps (see Table 21). For both booklets, the majority of the items involved more than one step.

Table 20

DSF for Polytomous Items with Scaled Matching Score, Booklet 1


Item Step CU-LOR Z DSF Size

B7-1 1 -0.278 -1.949 Small

2 -0.066 -0.473 Small

3 -0.674 -2.828* Large

D8-1 1 -0.480 -3.177** Moderate

2 -0.395 -2.979** Small

3 -0.490 -1.843* Moderate

D9-1 1 -0.323 -2.242* Small

2 -0.393 -2.849* Small

3 -0.225 -0.651 Small

D11-1 1 -0.354 -2.526* Small

2 -0.372 -2.550* Small

3 -0.128 -0.352 Small

Note. *p < .05, **p < .01

Table 21

DSF for Polytomous Items with Scaled Matching Score, Booklet 2


Item Step CU-LOR Z DSF Size

A1-2 1 -0.579 -3.760** Moderate

2 -0.313 -2.578* Small

3 -0.484 -2.034* Moderate

A2-2 1 -0.664 -4.972** Large

2 -0.494 -3.950** Moderate

3 -0.517 -1.871 Moderate

A3-2 1 -0.187 -1.672 Small

2 -0.420 -2.863* Small

3 -0.434 -1.517 Small

B1-2 1 -1.018 -4.959** Large

2 -0.440 -3.756** Moderate

3 -0.313 -1.327 Small

B6-2 1 -0.858 -6.032** Large

2 -0.639 -5.147** Moderate

3 -0.491 -2.646* Moderate

B7-2 1 -0.873 -6.334** Large

2 -0.703 -5.608** Large

3 -0.380 -1.620 Small

B9-2 1 -0.712 -5.278** Large

2 -0.551 -4.414** Moderate

3 -0.754 -3.278** Large

B11-2 1 -0.526 -4.120** Moderate

2 -0.484 -3.900** Moderate

3 -0.487 -2.323* Moderate

D1-2 1 -1.067 -5.146** Large

2 -0.640 -5.267** Large

3 -0.621 -3.284** Moderate

D8-2 1 -0.697 -5.163** Large

2 -0.681 -5.400** Large

3 -0.651 -2.004* Large

D9-2 1 -0.848 -4.829** Large

2 -0.759 -6.061** Large

3 -0.294 -1.187 Small

D10-2 1 -0.771 -5.059** Large

2 -0.669 -5.342** Large

3 -0.726 -2.420* Large

D11-2 1 -0.805 -4.562** Large

2 -0.534 -4.535** Moderate

3 -0.288 -1.128 Small

Note. *p < .05, **p < .01

An informal review of the content of the test booklets was also performed. The first

question posed was ‘why do certain items exhibit DIF?’ Test items appeared to differ along such

dimensions as length, difficulty level, specific/advanced vocabulary, etc. Some of the items

relied primarily on the content of the reading prompt, that is, the answers could be found

directly in the prompt. Other items demanded reflection and connecting many ideas; the answers

to such items were not in the reading prompt.

The second question addressed by the review of the booklets was ‘why wasn’t DIF

found for other items?’ This question asked whether the other items had any specific

characteristics that “protected” them from differential item functioning

pertaining to reading strategies. Once again, items that were not flagged for DIF were highly

diverse, differing in length, difficulty, demand (e.g., content versus reflection),

etc. However, this preliminary content review did not reveal any specific patterns. Therefore,

the question as to ‘why was performance for some items facilitated by higher-order reading

strategies, but impeded for others?’ remained.

To summarize, (1) large differences in DIF and DSF existed between the booklets, (2)

on selected response items most of the DIF favoured the reference group (use of higher-order

reading strategies), (3) on constructed response items most of the DIF favoured the focal group

(use of lower-order reading strategies) and (4) different DIF results were obtained for analyses

with total score and scaled score matching variables.

The findings of this study are encouraging even though evidence of DIF was

discovered. Specifically, the presence of DIF within this study does not point to item bias; rather

it provides support that this analysis can be used for other research purposes. Here, it

demonstrates that the general use of various reading strategies is important for achievement

outcomes; the same reading strategies can facilitate or hinder students’ performance based on

context (in this study, answering selected versus constructed response items) and, potentially,

content of the items. In conclusion, to put it simply, knowing and using higher-order reading

strategies is good, but being able to use situation-appropriate strategies and being able to switch

back and forth between different types of strategies is better.

4 IMPLICATIONS AND CONCLUSION

Differential item functioning offers a unique way of assessing test fairness and validity.

DIF occurs when test items are differentially difficult for individuals from different groups with

the same ability. Although this analysis has traditionally been used to identify problematic

items that assessed traits irrelevant to the test, this study demonstrates that it can also be used to

investigate attributes that are important to performance regardless of students’ ability. Thus, in

addition to examining fairness, DIF can be used as a tool to identify individual differences that

increase (or decrease) the probability of correctly responding to test items.

In this study, DIF analysis was used to examine whether employing different types of

strategies during reading affected students’ performance on a reading assessment. To

demonstrate that DIF offers a unique perspective on the effects of reading strategies on test

performance, it is useful to contrast DIF findings with a similar analysis that does not account

for underlying reading ability. Chi-square test of independence statistics for each test item

suggested that it was always more advantageous to use higher-order reading strategies than

lower-order strategies on the PCAP 2007 reading assessment, with the exception of only one item in

Booklet 2. However, analyzing the results with DIF offers additional information as it

demonstrates that, when taking reading ability into account, the use of higher-order reading

strategies is not always facilitative and suitable because these strategies can facilitate or hinder

the performance on an item depending on the context, and possibly content. That is, using DIF,

this study found that higher-order strategies were only effective for answering selected response

items correctly, but lower-order reading strategies were more helpful when answering

constructed response items.

Another reported finding was that the two booklets of the test had different magnitudes

of observed DIF. Again, the interpretation of this finding might be aided by comparing DIF

results to the chi-square statistics. For Booklet 1, without accounting for ability level, students

who reported using higher-order reading strategies performed significantly better on most of the

test items. For Booklet 2, however, the use of higher-order strategies was not significantly

related to answering constructed response items. For both booklets, using lower-order reading

strategies was never related to higher test performance.³ Yet, DIF results identified significant

differences in performance between students who reported using higher- and lower-order

strategies for both booklets. The implications of these findings are that (1) the effectiveness of

reading strategies depends on the students’ reading ability level and (2) demonstrating the

interaction between ability level and the use of reading strategies was made possible by

performing DIF analyses in addition to evaluating the effectiveness of reading strategies on their

own.

The finding that the performance on constructed response items was hindered by

higher-order reading strategies is counterintuitive. Answering these types of items correctly

might be qualitatively different from answering selected response items, because the former

involves a writing component. Therefore, it is possible that other higher-order strategies would

facilitate higher performance on constructed response items, but these strategies might be

specific to cognitive processes and skills that pertain to the writing process rather than reading.

Another interesting finding was the variation of DIF results when different estimates of

reading ability were used. As was shown before (Figure 1 and Appendix C), the total and scaled

scores were not well related within Booklet 2, which might have contributed to these DIF

results. This finding also indicates that decisions regarding the choice of a

matching variable must be made carefully. In this study, however, even though the magnitude

of the results was different depending on the estimate of ability, the finding about the direction

of DIF (i.e., better performance on selected response items by students using higher-order

strategies and better performance on constructed response items by students using lower-order

reading strategies) persisted regardless of which matching variable was employed in the

analysis.⁴

³ One exception is item B8 on Booklet 2; without accounting for ability level, this selected response item was in the

direction of lower-order reading strategies.

One of the merits of using differential item functioning is that this procedure helps shed

additional light on other conventional analyses and helps to fine-tune the interpretation of the

findings. To conclude, DIF offers a unique approach to studying the relationships of individual

differences with test performance.

4.1 Limitations and Future Directions

A major limitation of this study is that the questions on the student contextual

questionnaire assessing the use of reading strategies asked about general behaviours and

preferences. Thus, an assumption underlying this study was that students who reported using,

for example, higher-order reading strategies more often than lower-order strategies, used these

strategies during the examination as well. Another limitation involves the generalizability of the

results; only PCAP reading assessment data were analyzed in this study. However, the use of

specific reading strategies might affect other domains differently, as was shown here with

writing. Using other datasets, such as PCAP mathematics and/or science assessments, it would

be interesting to investigate whether the results of this study are generalizable to other domains.

Also, replicating the results with an alternative reading assessment would further strengthen the

generalizability of results of this study.

⁴ One exception is item D1 on Booklet 1 for DIF analysis with the total score matching variable; this constructed

response item was in the direction of higher-order reading strategies.

An investigation of the role of metacognitive strategies during reading would also

supplement the findings of this study. As discussed in the literature review, metacognitive

strategies aid in monitoring the effectiveness of cognitive reading strategies on successful

comprehension. Thus, it would be interesting to examine if, and how, metacognitive strategies

affect the choice of cognitive reading strategies, lower- and higher-order strategies. Another

future direction involves performing a content analysis of the test. Specifically, content of the

reading prompts and item stems/options can be classified according to predetermined criteria

such as length, difficulty level, theme, etc., and analyzed for patterns, a procedure that was

beyond the scope of the present study. Exploring the dimensionality of the two versions of the

test with alternative analyses would also strengthen the results found here.

REFERENCES

Afflerbach, P., Pearson, P. D., & Paris, S. G. (2008). Clarifying differences between reading

skills and reading strategies. The Reading Teacher, 61(5), 364-373.

Cain, K., Oakhill, J. V., Barnes, M. A., & Bryant, P. E. (2001). Comprehension skill, inference-

making ability, and their relation to knowledge. Memory & Cognition, 29(6), 850-

859.

Dole, J. A., Duffy, G. G., Roehler, L. R., & Pearson, P. D. (1991). Moving from the old to the

new: Research on reading comprehension instruction. Review of Educational

Research, 61(2), 239-264.

Dole, J. A., Nokes, J. D., & Drits, D. (2008). Cognitive strategy instruction. In G. G. Duffy & S.

E. Israel (Eds.), Handbook of research on reading comprehension (pp. 347-373).

Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Fogarty, E. A. (2006). Teachers’ use of differentiated reading strategy instruction for talented,

average, and struggling readers in regular and SEM-R classrooms (Doctoral

dissertation). Retrieved April 27, 2012, from

http://www.gifted.uconn.edu/siegle/Dissertations/Elizabeth%20Fogarty.pdf

Fries, C. C. (1963). Linguistics and reading. New York: Holt, Rinehart & Winston.

Graesser, A. C. (2007). An introduction to strategic reading comprehension. In D. S. McNamara

(Ed.), Reading comprehension strategies: Theories, interventions, and technologies

(pp. 3-26). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel

procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale,

NJ: Lawrence Erlbaum Associates, Inc.

Hoover, W. A., & Gough, P. B. (1990). The simple view of reading. Reading and Writing: An

Interdisciplinary Journal, 2(2), 127-160.

Koda, K. (2005). Insights into second language reading: A cross-linguistic approach. New

York: Cambridge University Press.

Magliano, J. P., Millis, K., Ozuru, Y., & McNamara, D. S. (2007). A multidimensional

framework to evaluate reading assessment tools. In D. S. McNamara (Ed.), Reading

comprehension strategies: Theories, interventions, and technologies (pp. 107-136).

Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

McNamara, D. S. (2007). Reading comprehension strategies: Theories, interventions, and

technologies. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

McNamara, D. S., O’Reilly, T., Rowe, M., Boonthum, C., & Levinstein, I. (2007). iSTART: A

web-based tutor that teaches self-explanation and metacognitive reading strategies.

In D. S. McNamara (Ed.), Reading comprehension strategies: Theories,

interventions, and technologies (pp. 397-420). Mahwah, NJ: Lawrence Erlbaum

Associates, Inc.

Mokhtari, K., & Reichard, C. A. (2002). Assessing students’ metacognitive awareness of

reading strategies. Journal of Educational Psychology, 94(2), 249-259.

Oakhill, J., & Cain, K. (2007). Issues of causality in children’s reading comprehension. In D. S.

McNamara (Ed.), Reading comprehension strategies: Theories, interventions, and

technologies (pp. 47-71). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Oakhill, J., & Yuill, N. (1996). Higher order factors in comprehension disability: Processes and

remediation. In C. Cornoldi & J. Oakhill (Eds.), Reading comprehension difficulties:

Processes and intervention (pp. 69-92). Mahwah, NJ: Lawrence Erlbaum

Associates, Inc.

Office of Educational Assessment. (2005). ScorePak®: Item Analysis. Seattle, WA: Office of

Educational Assessment. Retrieved August 28, 2012 from

http://www.washington.edu/oea/pdfs/resources/item_analysis.pdf

O’Reilly, T., & McNamara, D. S. (2007). The impact of science knowledge, reading skill, and

reading strategy knowledge on more traditional “high-stakes” measures of high

school students’ science achievement. American Educational Research Journal,

44(1), 161-196.

Paris, S. G., Lipson, M. Y., & Wixson, K. K. (1983). Becoming a strategic reader.

Contemporary Educational Psychology, 8, 293-316.

Penfield, R. D. (2005). DIFAS: Differential item functioning analysis system. Applied

Psychological Measurement, 29(2), 150-151.

Penfield, R. D. (2007). DIFAS 4.0 user’s manual. Retrieved October 24, 2011, from

http://www.education.miami.edu/facultysites/penfield/index.html

Penfield, R. D., Gattamorta, K., & Childs, R. A. (2009). An NCME instructional module on

using differential step functioning to refine the analysis of DIF in polytomous items.

Educational Measurement: Issues and Practice, 28(1), 38-49.

Perfetti, C. (2001). Reading skills. In N. J. Smelser & P. B. Baltes (Eds.), International

encyclopedia of the social & behavioral sciences (pp. 12800-12805). Oxford:

Pergamon.

Rapp, D. N., van den Broek, P., McMaster, K. L., Kendeou, P., & Espin, C. A. (2007). Higher-

order comprehension processes in struggling readers: A perspective for research and

intervention. Scientific Studies of Reading, 11(4), 289-312.

Sweet, A. P., & Snow, C. E. (2003). Rethinking reading comprehension. New York: Guilford

Press.

VanderVeen, A., Huff, K., Gierl, M., McNamara, D. S., Louwerse, M., & Graesser, A. C.

(2007). Developing and validating instructionally relevant reading competency

profiles measured by the critical reading section of the SAT Reasoning Test. In D. S.

McNamara (Ed.), Reading comprehension strategies: Theories, interventions, and

technologies (pp. 137-172). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

APPENDIX A. FACTOR ANALYSES FOR BOOKLET 1 AND BOOKLET 2

3-Factor Model: Reading Strategies, Booklet 1

Questions Pattern

1 2 3







Asking someone to help me .641 -.112









material

.500

2-Factor Model, Booklet 1

Questions Pattern

1 2













3-Factor Model: Reading Strategies, Booklet 2

Questions Pattern

1 2 3








Asking someone to help me .598








material

.477

2-Factor Model, Booklet 2

Questions Pattern

1 2













APPENDIX B. MEANS, STANDARD DEVIATIONS AND FREQUENCIES OF

STUDENTS’ QUESTIONNAIRE RESPONSES FOR GROUPING VARIABLE

SAMPLE, BOOKLET 1 AND BOOKLET 2

Grouping Variable Sample, Booklet 1 [N = 2667]

                                                                        Score       Distributions
Questions                                                               M     SD    Rarely or never  Sometimes  Often

Higher-order Reading Strategies
Trying to make connections to what I already know                       2.32  0.66  10.9%            45.8%      43.3%
Thinking about the author's message                                     2.04  0.75  26.2%            43.4%      30.4%
Applying what I know about word origins or word parts                   1.98  0.71  26.6%            49.3%      24.1%
Thinking about the other words in a sentence to figure out the meaning  2.29  0.68  12.9%            45.1%      42.0%
Trying to predict what the material is about                            2.11  0.71  20.2%            48.2%      31.6%
Looking for clues such as headings or captions                          1.95  0.70  27.3%            50.3%      22.5%

Lower-order Reading Strategies
Sounding out as many words as I can                                     1.43  0.64  64.4%            27.8%      7.8%
Highlighting or making notes or drawings on the important parts         1.43  0.67  67.2%            22.7%      10.1%
Using an outside source like a dictionary                               1.65  0.70  48.4%            38.2%      13.4%



Grouping Variable Sample, Booklet 2 [N = 2623]

                                                                        Score       Distributions
Questions                                                               M     SD    Rarely or never  Sometimes  Often

Higher-order Reading Strategies
Trying to make connections to what I already know                       2.32  0.66  10.5%            46.5%      43.0%
Thinking about the author's message                                     2.00  0.75  28.4%            43.1%      28.5%
Applying what I know about word origins or word parts                   1.96  0.72  27.5%            48.7%      23.8%
Thinking about the other words in a sentence to figure out the meaning  2.30  0.68  12.4%            45.3%      42.3%
Trying to predict what the material is about                            2.12  0.72  20.3%            47.5%      32.2%
Looking for clues such as headings or captions                          1.95  0.71  27.7%            49.7%      22.6%

Lower-order Reading Strategies
Sounding out as many words as I can                                     1.45  0.64  62.9%            28.7%      8.3%
Highlighting or making notes or drawings on the important parts         1.43  0.67  66.6%            23.6%      9.8%
Using an outside source like a dictionary                               1.64  0.69  48.1%            39.8%      12.0%
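The M and SD columns in this appendix appear to follow directly from the response distributions. The sketch below assumes the three response categories are coded 1 (Rarely or never), 2 (Sometimes) and 3 (Often); the appendix does not state the coding, but this assumption reproduces the reported values for the rows checked. The function name is illustrative only.

```python
import math

def mean_sd_from_percentages(percentages, codes=(1, 2, 3)):
    """Return (mean, sd) of a coded questionnaire item from its category percentages."""
    total = sum(percentages)
    probs = [p / total for p in percentages]
    mean = sum(c * p for c, p in zip(codes, probs))
    # Population variance; with N in the thousands the sample correction is negligible.
    variance = sum(p * (c - mean) ** 2 for c, p in zip(codes, probs))
    return mean, math.sqrt(variance)

# "Trying to make connections to what I already know", Booklet 1
m, sd = mean_sd_from_percentages([10.9, 45.8, 43.3])
print(round(m, 2), round(sd, 2))  # 2.32 0.66
```

The same calculation applied to "Thinking about the author's message" in Booklet 1 (26.2%, 43.4%, 30.4%) gives 2.04 and 0.75, matching the table.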



APPENDIX C. THE RELATIONSHIP BETWEEN TOTAL AND SCALED SCORES

FOR BOOKLET 1 AND BOOKLET 2

Booklet 1


Booklet 2


APPENDIX D. MISSING ITEM DATA FOR BOOKLET 1 AND BOOKLET 2

Booklet 1

Item Frequencies (Percent)

0 1 2 3 Missing


A1-1 6.3 47.5 20.5 5.1 20.6

A2-1 6.8 36.9 30.9 6.3 19.1

A3-1 2.6 17.5 9.7 4.8 65.5

B1-1 9.6 36.2 44.5 6.1 3.6

B6-1 17.1 25.5 38.3 9.4 9.8

B7-1 22.8 23.7 35.4 6.7 11.3

B9-1 24.5 27.7 34.7 5.8 7.3

B11-1 23.7 24.6 32.9 6.9 11.8

D1-1 42.2 24.0 17.3 4.4 12.2

D7-1 20.9 39.5 21.8 3.2 14.6

D8-1 15.2 38.8 31.0 5.5 9.6

D9-1 14.2 35.5 32.0 5.2 13.1

D11-1 16.9 34.7 26.7 4.4 17.3

Item A B C D Missing


B2-1 4.3 91.1 2.9 0.6 1.1

B3-1 2.8 3.6 13.1 79.2 1.3

B4-1 3.6 76.8 7.6 10.9 1.0

B5-1 79.0 6.4 6.6 7.0 1.0

B8-1 80.0 6.1 1.4 10.1 2.5

B10-1 56.6 12.9 9.8 17.7 3.0

C1-1 3.7 91.1 0.9 2.8 1.6

C2-1 6.4 2.3 2.5 87.3 1.4

C3-1 14.0 21.7 6.6 56.3 1.4

C4-1 3.1 87.6 3.0 5.1 1.2

C5-1 13.1 13.3 1.0 71.1 1.5

C6-1 10.5 6.4 7.3 74.4 1.4

C7-1 37.3 10.1 38.3 12.1 2.1

C8-1 2.8 62.2 17.1 16.1 1.8

D2-1 14.3 16.3 32.7 33.0 3.9

D3-1 35.4 5.6 52.8 3.5 2.7

D4-1 13.0 19.5 53.4 11.5 2.6

D5-1 7.8 32.1 46.3 11.4 2.3

D6-1 11.8 5.9 54.9 24.8 2.6

D10-1 21.6 7.4 54.6 10.6 5.9

E1-1 8.3 20.1 6.1 62.4 3.1

E2-1 14.2 16.7 18.9 47.4 2.8

E3-1 13.5 65.8 5.1 12.7 2.9


E4-1 58.7 13.1 3.7 21.7 2.8

E5-1 17.2 23.2 53.0 3.7 2.9

E6-1 61.0 16.3 11.6 6.4 4.7

E7-1 7.1 14.5 25.6 48.1 4.6

E8-1 7.2 18.5 9.2 60.6 4.4

E9-1 6.8 11.6 71.5 5.6 4.6

F1-1 58.8 15.2 13.1 9.5 3.5

F2-1 3.3 60.3 10.9 22.1 3.4

F3-1 9.2 7.5 2.1 78.0 3.2

F4-1 5.7 21.7 2.9 66.0 3.7

F5-1 23.8 12.0 41.5 18.9 3.8

F6-1 6.7 9.6 72.2 7.9 3.6

F7-1 86.2 7.1 1.7 1.2 3.8

F8-1 30.2 57.1 2.4 6.3 4.0

Booklet 2


0 1 2 3 Missing


A1-2 4.0 46.6 29.0 6.5 13.8

A2-2 2.9 36.9 30.5 5.4 24.3

A3-2 1.4 25.9 12.9 4.0 55.8

B1-2 9.2 36.8 44.8 6.4 2.8

B6-2 18.2 27.2 35.8 10.2 8.6

B7-2 22.5 24.1 35.7 7.8 9.9

B9-2 23.3 29.8 34.3 6.4 6.2

B11-2 23.1 25.3 33.2 8.5 9.9

D1-2 7.3 38.7 38.2 10.3 5.4

D8-2 15.0 36.7 30.9 4.1 13.3

D9-2 8.7 30.4 47.6 6.8 6.5

D10-2 11.7 39.0 34.1 4.5 10.7

D11-2 6.2 34.8 44.3 6.6 8.2

Item A B C D Missing


B2-2 4.4 91.2 3.1 0.5 0.9

B3-2 2.6 4.1 13.3 78.6 1.4

B4-2 3.5 76.3 8.2 11.2 0.9

B5-2 78.7 6.1 7.1 7.3 0.8

B8-2 79.9 6.1 1.7 10.3 2.1

B10-2 56.4 13.4 9.7 17.8 2.7


C1-2 10.3 9.3 71.2 6.0 3.2

C2-2 87.9 4.3 5.0 1.9 0.9

C3-2 5.5 1.5 4.1 87.8 1.1

C4-2 2.7 5.9 85.8 4.4 1.2

C5-2 2.9 90.1 2.7 3.3 1.0

C6-2 2.5 79.1 12.7 4.6 1.1

C7-2 81.1 10.5 2.2 4.8 1.4

C8-2 3.1 10.4 81.6 3.4 1.6

C9-2 15.6 11.1 68.5 3.2 1.7

D2-2 6.5 6.6 5.4 79.4 2.2

D3-2 31.9 4.8 7.0 53.7 2.6

D4-2 6.8 82.1 3.3 5.6 2.1

D5-2 2.0 3.6 89.1 3.3 2.0

D6-2 8.2 13.0 68.3 8.2 2.3

D7-2 13.1 24.5 13.5 46.2 2.7

E1-2 21.5 2.7 66.4 6.9 2.5

E2-2 73.4 7.6 9.2 6.9 2.9

E3-2 6.4 4.1 2.9 84.0 2.6

E4-2 4.1 79.1 3.6 10.1 3.1

E5-2 12.6 62.3 15.4 7.1 2.6

E6-2 3.8 2.9 13.6 77.0 2.6

E7-2 13.3 70.5 2.2 11.2 2.6

E8-2 6.7 8.7 74.5 7.2 2.8

F1-2 3.3 7.4 74.3 12.7 2.3

F2-2 16.7 2.2 8.6 70.0 2.5

F3-2 76.0 4.7 8.5 8.3 2.5

F4-2 1.4 1.0 1.4 93.9 2.3

F5-2 2.0 2.4 10.9 82.1 2.6

F6-2 15.4 3.1 19.6 58.8 3.2

F7-2 12.9 68.8 5.2 9.9 3.1

F8-2 4.5 23.9 7.4 60.9 3.3
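A table of this kind can be produced by tabulating each item's responses with non-responses kept as their own category. The following is a minimal sketch on hypothetical responses; how the original data set coded missing responses (for example, omitted versus not reached) is not stated in this appendix, so `None` is used here purely as an assumption.

```python
from collections import Counter

def response_percentages(responses, categories, missing=None):
    """Percentage of each response category plus a 'Missing' entry (one decimal place)."""
    counts = Counter("Missing" if r is missing else r for r in responses)
    n = len(responses)
    labels = list(categories) + ["Missing"]
    return {label: round(100 * counts.get(label, 0) / n, 1) for label in labels}

# Hypothetical responses to one selected response item (None = no response)
responses = ["B"] * 7 + ["A"] + ["D"] + [None]
table_row = response_percentages(responses, "ABCD")
print(table_row)
```

For constructed response items the same function could be called with score codes such as `[0, 1, 2, 3]` as the categories.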


APPENDIX E. GROUPING VARIABLE SAMPLE: ITEM STATISTICS FOR

BOOKLET 1 AND BOOKLET 2

Item Statistics, Booklet 1 [N = 2667]


Source   p-value   rpb   Proportion Incorrect   Mean Score Correct†   Mean Score Incorrect

1 Constructed Response Items Only A1-1 .39 .44 .21 43.96† 33.76

2 A2-1 .46 .55 .18 44.48† 29.60

3 A3-1 .22 .35 .61 47.21† 38.44












15 Selected Response Items Only C1-1 .94 .31 .06 42.86 26.81

16 C2-1 .92 .31 .08 43.04 28.79

17 C3-1 .64 .38 .36 45.63 35.16

18 C4-1 .91 .25 .09 42.87 31.49

19 C5-1 .75 .28 .25 43.97 35.25

20 C6-1 .80 .33 .20 44.01 32.95

21 C7-1 .45 .32 .55 46.59 37.95

22 C8-1 .67 .26 .33 44.35 36.74

23 Constructed D1-1 .29 .48 .47 47.23† 35.62

24 Selected D2-1 .38 .23 .62 45.98 39.33

25 Selected D3-1 .59 .30 .41 45.24 36.86

26 Selected D4-1 .56 .22 .44 44.62 38.25

27 Selected D5-1 .54 .40 .46 46.69 36.13

28 Selected D6-1 .63 .37 .37 45.50 35.46

29 Constructed D7-1 .35 .53 .28 45.79† 31.36

30 Constructed D8-1 .43 .48 .19 44.31† 31.47

31 Constructed D9-1 .43 .56 .21 44.79† 30.80

32 Selected D10-1 .59 .22 .41 44.38 38.10

33 Constructed D11-1 .39 .57 .26 45.50† 31.34

34 Selected Response Items Only E1-1 .69 .36 .31 44.96 34.77

35 E2-1 .53 .27 .47 45.30 37.90

36 E3-1 .73 .35 .27 44.64 34.12

37 E4-1 .69 .44 .31 45.63 33.45

38 E5-1 .57 .23 .43 44.63 38.10

39 E6-1 .71 .45 .29 45.55 32.92

40 E7-1 .56 .37 .44 46.18 36.28

41 E8-1 .68 .31 .32 44.70 35.77

42 E9-1 .80 .46 .20 44.76 29.93

43 Selected Response Items Only F1-1 .67 .33 .33 44.93 35.42

44 F2-1 .66 .30 .34 44.73 36.16

45 F3-1 .84 .35 .16 43.83 31.58

46 F4-1 .70 .29 .30 44.36 35.83

47 F5-1 .50 .34 .50 46.44 37.28

48 F6-1 .78 .31 .22 44.02 34.16

49 F7-1 .89 .15 .11 42.57 35.56

50 F8-1 .65 .40 .35 45.68 34.71


Item Statistics, Booklet 2 [N = 2623]


Source   p-value   rpb   Proportion Incorrect   Mean Score Correct†   Mean Score Incorrect

1 Constructed Response Items Only A1-2 .42 .42 .18 47.35† 38.68

2 A2-2 .38 .41 .27 48.08† 39.70

3 A3-2 .23 .31 .55 49.19† 43.01












15 Selected Response Items Only C1-2 .74 .11 .26 46.70 43.28

16 C2-2 .92 .18 .08 46.41 39.11

17 C3-2 .92 .25 .08 46.60 36.92

18 C4-2 .89 .20 .11 46.58 39.29

19 C5-2 .94 .18 .06 46.30 38.00

20 C6-2 .84 .20 .16 46.81 40.59

21 C7-2 .84 .13 .16 46.52 42.21


22 C8-2 .88 .24 .12 46.79 38.95

23 C9-2 .72 .10 .28 46.66 43.62

24 Constructed D1-2 .50 .50 .12 47.15† 35.50

25 Selected D2-2 .86 .27 .14 46.97 38.61

26 Selected D3-2 .61 .20 .39 47.70 42.84

27 Selected D4-2 .88 .23 .12 46.72 39.16

28 Selected D5-2 .94 .23 .06 46.41 36.35

29 Selected D6-2 .76 .25 .24 47.35 40.88

30 Selected D7-2 .50 .10 .50 47.27 44.37

31 Constructed D8-2 .38 .49 .28 48.43† 39.02

32 Constructed D9-2 .50 .55 .14 47.56† 35.12

33 Constructed D10-2 .41 .52 .21 48.07† 37.10

34 Constructed D11-2 .48 .50 .14 47.40† 35.82

35 Selected Response Items Only E1-2 .72 .19 .28 47.19 42.27

36 E2-2 .77 .17 .23 46.93 42.08

37 E3-2 .89 .24 .11 46.69 38.54

38 E4-2 .81 .14 .19 46.61 42.32

39 E5-2 .70 .26 .30 47.69 41.40

40 E6-2 .83 .21 .17 46.90 40.64

41 E7-2 .78 .21 .22 47.10 41.22

42 E8-2 .80 .25 .20 47.17 40.35

43 Selected Response Items Only F1-2 .81 .19 .19 46.87 41.36

44 F2-2 .78 .16 .22 46.86 42.15

45 F3-2 .82 .22 .18 46.95 40.53

46 F4-2 .96 .27 .04 46.36 32.08

47 F5-2 .87 .24 .13 46.79 38.98

48 F6-2 .65 .15 .35 47.22 43.23

49 F7-2 .73 .14 .27 46.90 42.94

50 F8-2 .66 .13 .34 47.04 43.39
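For a dichotomously scored item, the columns of these tables can be illustrated with classical item statistics: the p-value (proportion correct), the proportion incorrect, the mean total score of students who answered correctly and incorrectly, and the point-biserial correlation (rpb) between item and total score. The sketch below uses hypothetical data and covers only the dichotomous case; whether the total score in the original analysis included the item itself, and how the constructed response items were summarized, is not stated here and is not assumed.

```python
import math

def item_statistics(item_scores, total_scores):
    """Classical statistics for one dichotomous (0/1) item against a total score."""
    n = len(item_scores)
    correct = [t for i, t in zip(item_scores, total_scores) if i == 1]
    incorrect = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(correct) / n
    mean_total = sum(total_scores) / n
    sd_total = math.sqrt(sum((t - mean_total) ** 2 for t in total_scores) / n)
    mean_correct = sum(correct) / len(correct)
    mean_incorrect = sum(incorrect) / len(incorrect)
    # Point-biserial: mean difference in SD units, weighted by sqrt(p * q)
    rpb = (mean_correct - mean_incorrect) / sd_total * math.sqrt(p * (1 - p))
    return {
        "p_value": p,
        "proportion_incorrect": 1 - p,
        "mean_score_correct": mean_correct,
        "mean_score_incorrect": mean_incorrect,
        "rpb": rpb,
    }

# Hypothetical data: five students, one item
stats = item_statistics([1, 1, 1, 0, 0], [5, 4, 3, 2, 1])
print(stats)
```

The point-biserial computed this way equals the Pearson correlation between the 0/1 item score and the total score.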



Item Discrimination Indices, Booklet 1 [N = 2667]


rpb for constructed response items   rpb for selected response items   rpb for all items


A1-1 .48 --- .44

A2-1 .56 --- .55

A3-1 .38 --- .35

B1-1 .43 --- .42

B6-1 .56 --- .56

B7-1 .60 --- .59

B9-1 .57 --- .57

B11-1 .54 --- .53

D1-1 .49 --- .48

D7-1 .56 --- .53

D8-1 .52 --- .48

D9-1 .59 --- .56

D11-1 .61 --- .57


B2-1 --- .22* .20*

B3-1 --- .18** .18**

B4-1 --- .33 .32

B5-1 --- .42 .41

B8-1 --- .26 .26

B10-1 --- .40 .37

C1-1 --- .34 .31

C2-1 --- .34 .31

C3-1 --- .41 .38

C4-1 --- .31 .25

C5-1 --- .28 .28

C6-1 --- .36 .33

C7-1 --- .34 .32

C8-1 --- .29 .26

D2-1 --- .22* .23*

D3-1 --- .31 .30

D4-1 --- .22* .22*

D5-1 --- .38 .40

D6-1 --- .38 .37

D10-1 --- .19** .22*

E1-1 --- .39 .36

E2-1 --- .28 .27

E3-1 --- .39 .35

E4-1 --- .46 .44

E5-1 --- .26 .23*

E6-1 --- .47 .45


E7-1 --- .40 .37

E8-1 --- .30 .31

E9-1 --- .47 .46

F1-1 --- .37 .33

F2-1 --- .33 .30

F3-1 --- .38 .35

F4-1 --- .30 .29

F5-1 --- .35 .34

F6-1 --- .33 .31

F7-1 --- .18** .15**

F8-1 --- .40 .40

Note.*rpb is between .200 - .249, **.150 - .199.

Item Discrimination Indices, Booklet 2 [N = 2623]


rpb for constructed response items   rpb for selected response items   rpb for all items


A1-2 .50 --- .42

A2-2 .57 --- .41

A3-2 .39 --- .31

B1-2 .52 --- .42

B6-2 .60 --- .51

B7-2 .63 --- .54

B9-2 .61 --- .52

B11-2 .58 --- .49

D1-2 .60 --- .50

D8-2 .59 --- .49

D9-2 .67 --- .55

D10-2 .64 --- .52

D11-2 .62 --- .50


B2-2 --- .03*** .18**

B3-2 --- .01*** .12***

B4-2 --- .04*** .24*

B5-2 --- .06*** .37

B8-2 --- .03*** .21*

B10-2 --- .30 .16**

C1-2 --- .19** .11***

C2-2 --- .34 .18**


C3-2 --- .41 .25

C4-2 --- .38 .20*

C5-2 --- .34 .18**

C6-2 --- .35 .20*

C7-2 --- .25 .13***

C8-2 --- .45 .24*

C9-2 --- .21* .10***

D2-2 --- .49 .27

D3-2 --- .37 .20*

D4-2 --- .44 .23*

D5-2 --- .45 .23*

D6-2 --- .44 .25

D7-2 --- .16** .10***

E1-2 --- .34 .19**

E2-2 --- .31 .17**

E3-2 --- .42 .24*

E4-2 --- .22* .14***

E5-2 --- .50 .26

E6-2 --- .37 .21*

E7-2 --- .44 .21*

E8-2 --- .45 .25

F1-2 --- .30 .19**

F2-2 --- .33 .16**

F3-2 --- .37 .22*

F4-2 --- .46 .27

F5-2 --- .40 .24*

F6-2 --- .27 .15**

F7-2 --- .24* .14***

F8-2 --- .29 .13***

Note.*rpb is between .200 - .249, **.150 - .199, ***.000-.149.
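The asterisk flags in the notes to these tables follow fixed rpb bands, which can be written as a small lookup. This is a sketch of the rule as stated in the notes; the function name is illustrative, and treating negative values as falling in the lowest band is an assumption, since the notes only define bands from .000 upward.

```python
def flag_rpb(rpb):
    """Return the asterisk flag for a point-biserial value, per the table notes."""
    if rpb < 0.150:
        return "***"  # .000 - .149 (negative values also land here: an assumption)
    if rpb < 0.200:
        return "**"   # .150 - .199
    if rpb < 0.250:
        return "*"    # .200 - .249
    return ""         # .250 and above: no flag

for value in (0.11, 0.18, 0.22, 0.25):
    print(f"{value:.2f}{flag_rpb(value)}")
```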


APPENDIX F. ANCHOR ITEMS ELIMINATED: ITEM DISCRIMINATION FOR

BOOKLET 1 AND BOOKLET 2

Booklet 1


rpb for constructed response items   rpb for selected response items   rpb for all items


A1-1 .49 --- .45

A2-1 .55 --- .55

A3-1 .37 --- .34

D1-1 .51 --- .50

D7-1 .60 --- .56

D8-1 .60 --- .54

D9-1 .65 --- .61

D11-1 .65 --- .59


C1-1 --- .37 .35

C2-1 --- .41 .37

C3-1 --- .40 .37

C4-1 --- .35 .32

C5-1 --- .31 .30

C6-1 --- .40 .39

C7-1 --- .32 .31

C8-1 --- .31 .29

D2-1 --- .24* .25

D3-1 --- .35 .35

D4-1 --- .22* .22*

D5-1 --- .39 .41

D6-1 --- .40 .40

D10-1 --- .24* .27

E1-1 --- .42 .40

E2-1 --- .32 .30

E3-1 --- .41 .40

E4-1 --- .47 .46

E5-1 --- .28 .26

E6-1 --- .48 .48

E7-1 --- .42 .40

E8-1 --- .35 .35

E9-1 --- .50 .49

F1-1 --- .39 .37

F2-1 --- .34 .31

F3-1 --- .45 .43

F4-1 --- .30 .30


F5-1 --- .35 .35

F6-1 --- .38 .36

F7-1 --- .29 .27

F8-1 --- .43 .42

Booklet 2


rpb for constructed response items   rpb for selected response items   rpb for all items


A1-1 .50 --- .29

A2-1 .55 --- .32

A3-1 .35 --- .21*

D1-1 .59 --- .34

D7-1 .58 --- .34

D8-1 .66 --- .37

D9-1 .63 --- .35

D11-1 .64 --- .34


C1-1 --- .24* .19**

C2-1 --- .42 .33

C3-1 --- .45 .36

C4-1 --- .40 .31

C5-1 --- .42 .32

C6-1 --- .37 .29

C7-1 --- .33 .26

C8-1 --- .50 .40

D2-1 --- .25 .20*

D3-1 --- .53 .42

D4-1 --- .40 .30

D5-1 --- .49 .38

D6-1 --- .55 .42

D10-1 --- .46 .36

E1-1 --- .23* .18**

E2-1 --- .40 .31

E3-1 --- .37 .29

E4-1 --- .49 .38

E5-1 --- .35 .27

E6-1 --- .50 .38

Note.*rpb is between .200 - .249.


E7-1 --- .44 .34

E8-1 --- .48 .36

E9-1 --- .46 .37

F1-1 --- .38 .30

F2-1 --- .40 .30

F3-1 --- .43 .35

F4-1 --- .52 .40

F5-1 --- .45 .36

F6-1 --- .31 .24*

F7-1 --- .32 .25

F8-1 --- .31 .23*

Note.*rpb is between .200 - .249, **.150 - .199.


APPENDIX G. ITEM ANALYSIS BY SECTION FOR BOOKLET 1 AND BOOKLET 2

Booklet 1

Item Discrimination

Item   rpb (item to section subscale)   rpb (item to total)

Section A

A1-1_CR .59 .46

A2-1_CR .59 .57

A3-1_CR .43 .35

Section B – Anchor

B1-1_CR .50 .48

B2-1_SR .23* .27

B3-1_SR .17** .20*

B4-1_SR .34 .37

B5-1_SR .43 .46

B6-1_CR .65 .61

B7-1_CR .67 .61

B8-1_SR .30 .32

B9-1_CR .64 .60

B10-1_SR .33 .38

B11-1_CR .56 .55

Section C

C1-1_SR .35 .36

C2-1_SR .44 .39

C3-1_SR .36 .39

C4-1_SR .32 .32

C5-1_SR .29 .31

C6-1_SR .32 .39

C7-1_SR .24* .32

C8-1_SR .28 .29

Section D

D1-1_CR .52 .50

D2-1_SR .63 .25

D3-1_SR .62 .35

D4-1_SR .68 .23*

D5-1_SR .65 .41

D6-1_SR .23* .40

D7-1_CR .30 .56

D8-1_CR .17** .54

D9-1_CR .36 .60

D10-1_SR .34 .26

D11-1_CR .25 .59

Section E

E1-1_SR .38 .39


E2-1_SR .29 .31

E3-1_SR .38 .40

E4-1_SR .42 .46

E5-1_SR .24* .26

E6-1_SR .45 .47

E7-1_SR .39 .40

E8-1_SR .33 .34

E9-1_SR .47 .49

Section F

F1-1_SR .33 .36

F2-1_SR .32 .31

F3-1_SR .42 .42

F4-1_SR .29 .29

F5-1_SR .29 .35

F6-1_SR .34 .35

F7-1_SR .32 .26

F8-1_SR .38 .42

Booklet 2

Item Discrimination

Item   rpb (item to section subscale)   rpb (item to total)

Section A

A1-2_CR .60 .38

A2-2_CR .55 .43

A3-2_CR .36 .28

Section B – Anchor

B1-2_CR .50 .41

B2-2_SR .22* .19**

B3-2_SR .16** .12***

B4-2_SR .31 .24*

B5-2_SR .44 .35

B6-2_CR .65 .48

B7-2_CR .67 .50

B8-2_SR .29 .23*

B9-2_CR .62 .48

B10-2_SR .03*** .21*

B11-2_CR .56 .45

Note. CR stands for constructed response items, SR

stands for selected response items.

*rpb is between .200 - .249, **.150 - .199.


Section C

C1-2_SR .20* .15**

C2-2_SR .39 .25

C3-2_SR .44 .28

C4-2_SR .37 .24*

C5-2_SR .40 .25

C6-2_SR .36 .23*

C7-2_SR .34 .19**

C8-2_SR .46 .31

C9-2_SR .24* .15**

Section D

D1-2_CR .54 .47

D2-2_SR .18** .32

D3-2_SR .13*** .23*

D4-2_SR .16** .29

D5-2_SR .18** .31

D6-2_SR .17** .28

D7-2_SR .08*** .14***

D8-2_CR .52 .46

D9-2_CR .61 .51

D10-2_CR .57 .48

D11-2_CR .57 .47

Section E

E1-2_SR .38 .23*

E2-2_SR .35 .23*

E3-2_SR .47 .29

E4-2_SR .32 .21*

E5-2_SR .46 .29

E6-2_SR .41 .26

E7-2_SR .45 .27

E8-2_SR .43 .28

Section F

F1-2_SR .35 .23*

F2-2_SR .34 .23*

F3-2_SR .39 .26

F4-2_SR .51 .30

F5-2_SR .42 .28

F6-2_SR .29 .19**

F7-2_SR .29 .20*

F8-2_SR .28 .17**

Note. CR stands for constructed response items, SR

stands for selected response items.

*rpb is between .200 - .249, **.150 - .199, ***.000-.149.


APPENDIX H. CHI-SQUARE ANALYSES FOR READING STRATEGIES BY ITEM

FOR BOOKLET 1 AND BOOKLET 2

Chi-Square Tests for Selected Response Items, Booklet 1

Items   Higher-order Strategies   Lower-order Strategies    Chi-square   p-value
        Frequencies (Percent)     Frequencies (Percent)
        Correct   Incorrect       Correct   Incorrect

B2-1 94.2 5.8 89.3 10.7 14.18 .00**

B3-1 82.2 17.8 82.2 17.8 0.00 .10

B4-1 83.5 16.5 68.7 31.3 52.41 .00**

B5-1 88.7 11.3 68.9 31.1 114.80 .00**

B8-1 86.5 13.5 74.0 26.0 44.10 .00**

B10-1 67.7 32.3 46.3 53.7 72.27 .00**

C1-1 94.5 5.5 88.8 11.2 19.92 .00**

C2-1 93.2 6.8 82.9 17.1 50.55 .00**

C3-1 67.5 32.5 44.5 55.5 83.42 .00**

C4-1 91.9 8.1 85.4 14.6 18.49 .00**

C5-1 77.2 22.8 66.7 33.3 21.98 .00**

C6-1 82.4 17.6 69.4 30.6 39.12 .00**

C7-1 48.3 51.7 27.2 72.8 66.20 .00**

C8-1 69.1 30.9 55.7 44.3 29.60 .00**

D2-1 39.0 61.0 30.1 69.9 12.23 .00**

D3-1 61.8 38.2 46.3 53.7 36.31 .00**

D4-1 57.4 42.6 50.2 49.8 7.61 .01**

D5-1 56.7 43.3 40.2 59.8 40.24 .00**

D6-1 66.8 33.2 45.9 54.1 69.28 .00**

D10-1 60.7 39.3 52.7 47.3 9.73 .00**

E1-1 72.5 27.5 52.7 47.3 67.12 .00**

E2-1 55.2 44.8 42.2 57.8 24.80 .00**

E3-1 76.3 23.7 58.0 42.0 62.43 .00**

E4-1 72.9 27.1 47.5 52.5 110.47 .00**

E5-1 58.7 41.3 48.9 51.1 14.42 .00**

E6-1 73.7 26.3 54.3 45.7 66.09 .00**

E7-1 59.2 40.8 39.7 60.3 56.46 .00**

E8-1 69.8 30.2 57.8 42.2 24.14 .00**

E9-1 83.4 16.6 64.2 35.8 85.51 .00**

F1-1 70.7 29.3 50.7 49.3 66.45 .00**

F2-1 69.3 30.7 50.2 49.8 59.26 .00**

F3-1 85.6 14.4 73.7 26.3 37.97 .00**

F4-1 72.6 27.4 58.4 41.6 35.29 .00**

F5-1 52.8 47.2 33.8 66.2 52.69 .00**

F6-1 80.9 19.1 61.6 38.4 78.77 .00**

F7-1 90.4 9.6 84.7 15.3 12.36 .00**

F8-1 67.7 32.3 50.2 49.8 49.28 .00**

Note. **p < .01
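Each row of the table above corresponds to a test of independence on a 2 x 2 table (strategy group by correct/incorrect). The sketch below computes a Pearson chi-square and its p-value for one such table. It works on counts, which this appendix does not report (only percentages are given), so the counts used are hypothetical; whether the original analysis applied a continuity correction is also not stated, and none is applied here.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table of counts.

    Rows are the two groups; columns are correct and incorrect:
        group 1: a correct, b incorrect
        group 2: c correct, d incorrect
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 80% correct in one group of 100, 60% in another
chi2, p_value = chi_square_2x2(80, 20, 60, 40)
print(round(chi2, 2), round(p_value, 4))
```

The constructed response items have four score categories per group, so their tests have three degrees of freedom and would need a general contingency-table routine rather than this 2 x 2 shortcut.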


Chi-Square Tests for Selected Response Items, Booklet 2

Note. *p < .05, **p < .01

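Items   Higher-order Strategies   Lower-order Strategies    Chi-square   p-value
        Frequencies (Percent)     Frequencies (Percent)
        Correct   Incorrect       Correct   Incorrect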

B2-2 91.6 8.4 91.7 8.3 0.01 .93

B3-2 79.0 21.0 79.9 20.1 0.19 .66

B4-2 76.7 23.3 75.4 24.6 0.32 .57

B5-2 79.4 20.6 83.0 17.0 2.82 .09

B8-2 80.4 19.6 84.9 15.1 4.70 .03*

B10-2 67.4 32.6 52.0 48.0 36.71 .00**

C1-2 74.8 25.2 70.2 29.8 3.84 .05*

C2-2 93.5 6.5 83.2 16.8 50.09 .00**

C3-2 93.8 6.2 82.0 18.0 65.54 .00**

C4-2 91.1 8.9 80.4 19.6 43.34 .00**

C5-2 95.0 5.0 89.4 10.6 20.74 .00**

C6-2 86.6 13.4 70.0 30.0 72.55 .00**

C7-2 85.1 14.9 76.1 23.9 20.86 .00**

C8-2 90.0 10.0 74.9 25.1 73.29 .00**

C9-2 73.5 26.5 63.8 36.2 16.60 .00**

D2-2 88.8 11.2 72.1 27.9 82.93 .00**

D3-2 64.8 35.2 41.8 58.2 78.80 .00**

D4-2 90.5 9.5 75.2 24.8 79.05 .00**

D5-2 95.2 4.8 88.2 11.8 31.69 .00**

D6-2 80.9 19.1 51.5 48.5 168.64 .00**

D7-2 51.1 48.9 43.0 57.0 9.23 .00**

E1-2 74.1 25.9 61.2 38.8 29.35 .00**

E2-2 79.2 20.8 64.8 35.2 41.71 .00**

E3-2 91.0 9.0 79.9 20.1 44.86 .00**

E4-2 81.8 18.2 79.2 20.8 1.55 .21

E5-2 74.7 25.3 46.3 53.7 136.57 .00**

E6-2 84.2 15.8 74.2 25.8 24.68 .00**

E7-2 80.8 19.2 63.8 36.2 59.82 .00**

E8-2 82.4 17.6 68.1 31.9 45.67 .00**

F1-2 81.5 18.5 77.1 22.9 4.49 .03*

F2-2 80.5 19.5 62.9 37.1 63.86 .00**

F3-2 84.7 15.3 70.0 30.0 52.69 .00**

F4-2 97.0 3.0 91.5 8.5 29.58 .00**

F5-2 89.0 11.0 79.0 21.0 32.81 .00**

F6-2 66.5 33.5 55.1 44.9 20.41 .00**

F7-2 73.4 26.6 68.1 31.9 5.05 .03*

F8-2 69.2 30.8 51.1 48.9 52.39 .00**


Chi-Square Tests for Constructed Response Items, Booklet 1

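Items   Higher-order Strategies   Lower-order Strategies    Chi-square   p-value
        Frequencies (Percent)     Frequencies (Percent)
        0   1   2   3             0   1   2   3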

A1-1 19.3 46.4 26.5 7.8 29.2 51.1 16.7 3.0 45.30 .00**

A2-1 15.0 36.3 38.9 9.8 32.2 38.6 25.1 4.1 93.65 .00**

A3-1 59.5 19.9 13.0 7.6 70.8 15.8 9.6 3.9 21.41 .00**

B1-1 8.3 32.3 49.8 9.5 15.8 40.0 40.0 4.3 45.91 .00**

B6-1 17.9 23.0 45.1 14.0 31.5 29.2 33.6 5.7 70.87 .00**

B7-1 22.5 23.1 44.2 10.1 40.4 27.2 25.8 6.6 83.31 .00**

B9-1 21.6 26.4 42.9 9.0 38.8 29.2 28.5 3.4 78.14 .00**

B11-1 25.1 22.4 42.3 10.2 39.5 28.5 26.5 5.5 64.93 .00**

D1-1 42.8 27.9 23.1 6.2 65.3 20.1 10.5 4.1 78.32 .00**

D7-1 23.8 45.9 25.6 4.7 46.1 34.0 16.9 3.0 91.90 .00**

D8-1 17.8 39.1 36.0 7.0 26.9 41.6 26.9 4.6 28.48 .00**

D9-1 19.0 34.8 38.6 7.6 32.2 35.8 29.0 3.0 51.10 .00**

D11-1 23.5 36.3 33.2 7.0 38.6 35.6 23.3 2.5 55.47 .00**

Note. **p < .01

Chi-Square Tests for Constructed Response Items, Booklet 2

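Items   Higher-order Strategies   Lower-order Strategies    Chi-square   p-value
        Frequencies (Percent)     Frequencies (Percent)
        0   1   2   3             0   1   2   3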

A1-2 18.0 45.7 29.7 6.6 16.5 50.1 27.2 6.1 2.81 .42

A2-2 27.7 36.4 30.4 5.5 23.9 41.6 30.0 4.5 5.20 .16

A3-2 54.3 26.9 14.0 4.8 56.3 25.1 14.7 4.0 1.24 .74

B1-2 11.6 35.9 45.7 6.8 8.3 39.5 46.3 5.9 5.24 .16

B6-2 26.6 26.0 36.4 11.0 22.7 30.5 36.4 10.4 4.87 .18

B7-2 31.2 24.2 36.1 8.5 28.4 26.7 38.8 6.1 4.92 .18

B9-2 28.9 29.6 34.8 6.8 27.4 32.9 32.6 7.1 2.05 .56

B11-2 31.9 24.2 34.5 9.4 31.7 27.0 33.3 8.0 1.90 .60

D1-2 11.9 39.1 38.0 11.0 9.2 41.1 39.0 10.6 2.73 .44

D8-2 28.1 35.9 31.6 4.4 26.2 36.6 33.8 3.3 2.08 .56

D9-2 14.2 31.0 46.8 8.0 13.5 30.5 50.6 5.4 4.38 .22

D10-2 20.7 41.0 33.6 4.6 19.9 40.9 35.2 4.0 0.65 .88

D11-2 13.8 35.5 43.3 7.5 13.0 36.4 45.4 5.2 3.17 .37


APPENDIX I: DIF WITH TOTAL MATCHING SCORE FOR BOOKLET 1 AND

BOOKLET 2

DIF for Dichotomous Items with Total Matching Score, Booklet 1


Chi-Square   Breslow-Day   Combined Decision Rule (CDR)   Effect Size

B2-1 0.01 0.76 --- --- ---

B4-1 0.37 7.01** Higher-order strategies DIF Small

B8-1 1.87 0.04 --- --- ---

B10-1 0.92 2.91 --- --- ---

C1-1 1.65 1.15 --- --- ---

C2-1 0.35 0.29 --- --- ---

C3-1 1.58 0.35 --- --- ---

C4-1 2.05 0.09 --- --- ---

C5-1 0.51 0.01 --- --- ---

C6-1 0.24 0.92 --- --- ---

C7-1 3.24 0.06 --- --- ---

C8-1 0.12 1.94 --- --- ---

D2-1 1.48 0.17 --- --- ---

D3-1 0.04 2.48 --- --- ---

D4-1 4.15 1.91 --- --- ---

D5-1 0.98 0.32 --- --- ---

D6-1 1.36 0.16 --- --- ---

D10-1 1.27 0.52 --- --- ---

E1-1 1.00 0.57 --- --- ---

E2-1 0.81 1.05 --- --- ---

E3-1 0.26 0.27 --- --- ---

E4-1 4.43 0.21 --- --- ---

E5-1 3.47 0.02 --- --- ---

E6-1 0.27 1.04 --- --- ---

E7-1 0.10 2.41 --- --- ---

E8-1 1.35 0.11 --- --- ---

E9-1 0.37 1.00 --- --- ---

F1-1 1.26 2.46 --- --- ---

F2-1 1.44 0.02 --- --- ---

F3-1 0.53 2.73 --- --- ---

F4-1 0.00 0.22 --- --- ---

F5-1 0.10 1.53 --- --- ---

F6-1 5.92* 0.87 Higher-order strategies DIF Small

F7-1 0.01 0.02 --- --- ---

F8-1 0.27 0.67 --- --- ---

Note. *p < .05, **p < .01


DIF for Dichotomous Items with Total Matching Score, Booklet 2


Chi-Square   Breslow-Day   Combined Decision Rule (CDR)   Effect Size

B2-2 1.58 2.05 --- --- ---

B3-2 2.56 1.26 --- --- ---

B4-2 1.48 2.96 --- --- ---

B10-2 0.01 0.01 --- --- ---

C1-2 3.30 3.83 --- --- ---

C2-2 3.32 0.23 --- --- ---

C3-2 3.40 0.16 --- --- ---

C4-2 0.13 1.17 --- --- ---

C5-2 0.01 1.77 --- --- ---

C6-2 4.35 0.70 --- --- ---

C7-2 0.01 1.99 --- --- ---

C8-2 1.31 0.55 --- --- ---

C9-2 0.00 6.00* Higher-order strategies DIF Small

D2-2 1.99 0.29 --- --- ---

D3-2 5.04* 2.38 Higher-order strategies DIF Small

D4-2 4.92 0.89 --- --- ---

D5-2 0.09 0.16 --- --- ---

D6-2 32.75** 7.80** Higher-order strategies DIF Large

D7-2 0.46 0.17 --- --- ---

E1-2 0.31 0.03 --- --- ---

E2-2 0.89 1.40 --- --- ---

E3-2 0.02 1.89 --- --- ---

E4-2 5.95* 0.63 Lower-order strategies DIF Small

E5-2 10.79** 1.03 Higher-order strategies DIF Moderate

E6-2 1.32 1.14 --- --- ---

E7-2 0.02 1.08 --- --- ---

E8-2 0.63 0.05 --- --- ---

F1-2 10.21** 0.66 Higher-order strategies DIF Moderate

F2-2 4.20 0.65 --- --- ---

F3-2 0.69 2.50 --- --- ---

F4-2 0.07 1.31 --- --- ---

F5-2 0.56 1.14 --- --- ---

F6-2 0.93 2.92 --- --- ---

F7-2 4.83 0.33 --- --- ---

F8-2 2.44 0.93 --- --- ---

Note. *p < .05, **p < .01
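As an illustration of how a DIF chi-square for a dichotomous item can be obtained with a total matching score, the sketch below computes a Mantel-Haenszel chi-square and common odds ratio across score levels. This is an assumption about the kind of statistic reported in the Chi-Square column, which the table header does not name; the Breslow-Day test of homogeneous odds ratios and the combined decision rule are not implemented. The counts are hypothetical, and which group serves as reference or focal is likewise an arbitrary choice here.

```python
def mantel_haenszel(strata):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    strata: one (a, b, c, d) tuple per matching-score level, where
        a = reference group correct,  b = reference group incorrect,
        c = focal group correct,      d = focal group incorrect.
    Returns (chi-square with continuity correction, common odds ratio).
    """
    sum_a = sum_expected = sum_variance = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        if t < 2:
            continue  # a level with fewer than two students carries no information
        n_ref, n_foc = a + b, c + d
        m_correct, m_incorrect = a + c, b + d
        sum_a += a
        sum_expected += n_ref * m_correct / t
        sum_variance += n_ref * n_foc * m_correct * m_incorrect / (t * t * (t - 1))
        or_num += a * d / t
        or_den += b * c / t
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_variance
    return chi2, or_num / or_den

# Hypothetical counts at two matching-score levels
mh_chi2, mh_odds_ratio = mantel_haenszel([(30, 20, 20, 30), (40, 10, 30, 20)])
print(round(mh_chi2, 2), round(mh_odds_ratio, 2))
```

An odds ratio above 1 in this layout means the reference group has higher odds of a correct response than focal-group students at the same matching score.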


DIF for Polytomous Items with Total Matching Score, Booklet 2


Chi-Square   Combined Decision Rule (CDR)

A1-2 0.27 --- --- ---

A2-2 0.27 --- --- ---

A3-2 0.15 --- --- ---

B1-2 0.28 --- --- ---

B6-2 0.30 --- --- ---

B7-2 0.02 --- --- ---

B9-2 0.00 --- --- ---

B11-2 0.75 --- --- ---

D1-2 0.51 --- --- ---

D8-2 0.29 --- --- ---

D9-2 0.56 --- --- ---

D10-2 0.08 --- --- ---

D11-2 0.59 --- --- ---
