DIFFERENCES IN READING STRATEGIES AND
DIFFERENTIAL ITEM FUNCTIONING ON PCAP
2007 READING ASSESSMENT
by
Tanya Scerbina
A thesis submitted in conformity with the requirements
for the degree of Master of Arts
Department of Human Development and Applied
Psychology
Ontario Institute for Studies in Education
University of Toronto
© Copyright by Tanya Scerbina 2012
Abstract
Pan-Canadian Assessment Program (PCAP) 2007 reading ability item data and contextual data
on reading strategies were analyzed to investigate the relationship between self-reported reading
strategies and item difficulty. Students who reported using higher- or lower-order strategies
were identified through a factor analysis. The purpose of this study was to investigate whether
students with the same underlying reading ability but who reported using different reading
strategies found the items differentially difficult. Differential item functioning (DIF) analyses
identified the items on which students who tended to use higher-order reading strategies
excelled, which were selected response items, but students who preferred using lower-order
strategies found these items more difficult. The opposite pattern was found for constructed
response items. The results of the study suggest that DIF analyses can be used to investigate
which reading strategies are related to item difficulty when controlling for students’ level of
ability.
ACKNOWLEDGMENTS
My deepest gratitude goes to my supervisor, Dr. Ruth Childs, for her expertise,
guidance and support throughout my Master’s program, as well as for her insight and feedback
in the process of completing this thesis. I would also like to thank Monique Herbert, the second
member of my supervisory committee, for being an invaluable mentor to me in my program of
study.
The completion of this thesis would not have been possible without the statistical
expertise of Olesya Falenchuk and professional input of Pierre Brochu; I thank both for their
patience and ongoing assistance. I would also like to express my gratitude to the employees of
the Council of Ministers of Education, Canada, specifically Kathryn O’Grady and Pierre Brochu
for providing me with indispensable resources and services.
Finally, I would like to thank all of the members of Datahost and my classmates,
especially Amanda, Christie, Jayme and Marija, for giving excellent advice. A special thank-you
goes to my family and friends for their immense support and encouragement.
TABLE OF CONTENTS
Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
1 Introduction
1.1 Differential Item Functioning
1.2 Reading Process and Strategies
1.3 Objectives
2 Method
2.1 Data
2.2 Grouping Variable
2.3 Analyses
3 Results and Discussion
3.1 Score Distributions
3.2 Classical Item Analysis
3.3 Reading Strategies and Test Scores
3.4 DIF and DSF Analyses
3.4.1 Dichotomous Items
3.4.2 Polytomous Items and DSF
3.4.3 DIF with Scaled Matching Score, Dichotomous Items
3.4.4 DIF with Scaled Matching Score, Polytomous Items and DSF
4 Implications and Conclusion
4.1 Limitations and Future Directions
References
Appendix A. Factor Analyses for Booklet 1 and Booklet 2
Appendix B. Means, Standard Deviations and Frequencies of Students’ Questionnaire Responses for Grouping Variable Sample, Booklet 1 and Booklet 2
Appendix C. The Relationship between Total and Scaled Scores for Booklet 1 and Booklet 2
Appendix D. Missing Item Data for Booklet 1 and Booklet 2
Appendix E. Grouping Variable Sample: Item Statistics for Booklet 1 and Booklet 2
Appendix F. Anchor Items Eliminated: Item Discrimination for Booklet 1 and Booklet 2
Appendix G. Item Analysis by Section for Booklet 1 and Booklet 2
Appendix H. Chi-Square Analyses for Reading Strategies by Item, Booklet 1 and Booklet 2
Appendix I. DIF with Total Matching Score for Booklet 1 and Booklet 2
LIST OF TABLES
Table 1 Student Questionnaire: Assessment of Reading Strategies
Table 2 Means, Standard Deviations and Frequencies of Students’ Questionnaire Responses, Booklet 1
Table 3 Means, Standard Deviations and Frequencies of Students’ Questionnaire Responses, Booklet 2
Table 4 3-Factor Model: Reading Strategies
Table 5 2-Factor Model
Table 6 Item Statistics, Booklet 1
Table 7 Item Statistics, Booklet 2
Table 8 Distractor Analysis, Booklet 1
Table 9 Distractor Analysis, Booklet 2
Table 10 Item Discrimination by Subscores, Booklet 1
Table 11 Item Discrimination by Subscores, Booklet 2
Table 12 DIF for Dichotomous Items, Booklet 1
Table 13 DIF for Dichotomous Items, Booklet 2
Table 14 DIF for Polytomous Items, Booklet 1
Table 15 DSF for Polytomous Items, Booklet 1
Table 16 DIF for Dichotomous Items with Scaled Matching Score, Booklet 1
Table 17 DIF for Dichotomous Items with Scaled Matching Score, Booklet 2
Table 18 DIF for Polytomous Items with Scaled Matching Score, Booklet 1
Table 19 DIF for Polytomous Items with Scaled Matching Score, Booklet 2
Table 20 DSF for Polytomous Items with Scaled Matching Score, Booklet 1
Table 21 DSF for Polytomous Items with Scaled Matching Score, Booklet 2
LIST OF FIGURES
Figure 1. The relationship between PCAP 2007 Reading Assessment total score and IRT scaled score.
1 INTRODUCTION
1.1 Differential Item Functioning
Test items exhibit differential item functioning (DIF) when individuals with the same
level of ability who belong to different groups have different probabilities of responding correctly
(Holland & Thayer, 1988). Traditionally, this procedure, called item bias analysis, has been used
to identify unfair test items, that is, items that exhibit differential difficulty for individuals having the
same level of knowledge or ability. Test items were said to be potentially biased when
respondents found them differentially difficult depending on characteristics irrelevant to
performance, such as gender, ethnicity, or disability (Holland & Thayer, 1988). However, this
analysis can also be performed to investigate if and to what extent any two groups demonstrate
differential item functioning on the test after matching individuals on their ability level.
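As an illustration only, one widely used way to operationalize this definition is the Mantel-Haenszel procedure described by Holland and Thayer (1988). The Python sketch below assumes a dichotomous item, a reference group and a focal group, and examinees stratified by a matching score. The function name and the record layout are hypothetical, and the sketch is not presented as the procedure used in this study.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Mantel-Haenszel common odds ratio and delta for one dichotomous item.

    records: iterable of (group, matching_score, correct), where group is
    "ref" or "focal" and correct is 0 or 1 (hypothetical layout).
    """
    # Per matching-score stratum:
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        strata[score][idx] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n  # reference correct x focal incorrect
        den += b * c / n  # reference incorrect x focal correct
    if den == 0:
        raise ValueError("odds ratio undefined for these data")

    alpha = num / den                # common odds ratio across strata
    delta = -2.35 * math.log(alpha)  # rescaled to the delta metric
    return alpha, delta
```

An odds ratio above 1 (a negative delta) indicates that, among examinees matched on ability, the item is easier for the reference group; a value of 1 (a delta of 0) indicates no DIF.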
The purpose of this study is to demonstrate that DIF analysis can be used to detect
differences in item difficulty for groups of students who tend to use different strategies during
reading. Results from such analysis can suggest which reading strategies may be most effective,
and thus inform educators about optimal choices for reading strategy instruction.
1.2 Reading Process and Strategies
Reading is a complex information processing system, an interaction between such
mental operations as attention, perception, memory and thought, language acquisition and
retention, and other cognitive processes (Koda, 2005). According to a prevailing view, reading
is multidimensional as it involves the reader, text and nature of the reading activity within the
reader’s sociocultural context, prior knowledge and experience (Magliano, Millis, Ozuru, &
McNamara, 2007). It starts with the reader (1) attending to and perceiving visual input, (2)
identifying words using phonological decoding, prior knowledge and context, (3) syntactically
integrating words into sentences, (4) interpreting sentences by semantically integrating words
into the overall message, (5) integrating sentences into bigger units of meaning, such as
paragraphs, and finally, but not necessarily, (6) making inferences of the implied meaning of the
text to establish a deeper understanding of the reading material (McNamara, O’Reilly, Rowe,
Boonthum, & Levinstein, 2007; Perfetti, 2001).
In the literature, reading is referred to as a highly complex set of mental processes
(Magliano et al., 2007; Perfetti, 2001). In 1963, however, Charles Fries offered an alternative
way of conceptualizing reading in his simple view of reading. According to this view, successful
reading ability is made up of only two components, decoding and linguistic comprehension (or,
as they are more commonly called in the literature today, word recognition and reading comprehension).
Whereas the first component transforms printed letters into mental representations of words, the
second component of reading comprehension integrates these disparate representations into a
meaningful whole (Sweet & Snow, 2003). Fries did not deny the complexity of the reading process;
rather, he argued that all other mental operations involved in reading, other than decoding and
linguistic comprehension, are also developed and accessible to individuals who cannot read
(Fries, 1963; Hoover & Gough, 1990).
Over the years, other researchers have elaborated on these two processes of reading.
Word recognition involves phonological awareness and decoding, vocabulary knowledge,
fluency and semantic access (Koda, 2005; VanderVeen, Huff, Gierl, McNamara, Louwerse, &
Graesser, 2007), and therefore entails some degree of semantic processing as it involves an
integration of lexical and contextual information because words’ precise meaning is deeply
rooted in context. On the other hand, reading comprehension is directly related to semantic
processing and incorporates an array of integrative, interpretive and inferential abilities and
skills, such as activation of background knowledge, comprehension monitoring, inference and
prediction making, integration of multiple sources of information, understanding text structure,
and other processes (McNamara et al., 2007; Oakhill & Cain, 2007; VanderVeen et al., 2007). It
is worth noting that in this paper this model of reading is further simplified as other factors
necessary for successful reading are not discussed. For instance, successful decoding also draws on
other underlying abilities and skills, such as print awareness and alphabetic knowledge, and is subject to individual
differences. Motivation and working memory capacity also represent constraints for word
recognition and reading comprehension (VanderVeen et al., 2007).
Successful reading and superior comprehension occur when both word recognition and
comprehension of the text are accomplished in unison (McNamara et al., 2007). Thus, a
successful reader is one who correctly and rapidly decodes words and coherently integrates
all mental representations into the overall meaning of the text (Magliano et al., 2007; Oakhill &
Cain, 2007). However, in some cases reading components of word recognition and
comprehension are dissociated. Although most good comprehenders are also good word
decoders, the relationship between these two processes is not necessarily sequential (Oakhill &
Cain, 2007). For example, dyslexic individuals struggle with word recognition, but frequently
achieve deep comprehension, and hyperlexic children can achieve superior decoding in the
absence of prior training, but frequently encounter comprehension failures (Hoover & Gough,
1990). Based on these and similar findings, Rapp and colleagues (2007) concluded that these
components develop simultaneously and independently, suggesting that teaching reading
comprehension skills is likely to be effective regardless of students’ proficiency in decoding
(Rapp, van den Broek, McMaster, Kendeou, & Espin, 2007).
However, Fries’ dual-process classification of reading does more than simplify the
reading process. It is convenient to dichotomize reading into decoding and comprehension
because these processes correspond to distinct cognitive classes reported in the literature, lower-
and higher-level cognitive abilities (Cain, Oakhill, Barnes, & Bryant, 2001; Graesser, 2007;
Magliano et al., 2007; McNamara, 2007; Oakhill & Cain, 2007; Oakhill & Yuill, 1996; Rapp et
al., 2007; VanderVeen et al., 2007). That is, (1) the processes making up the decoding
component correspond to lower-level processes such as explicit words/text decoding abilities
and (2) the processes involved in comprehension encompass higher-level processes such as
complex cognitive integration and inferential abilities.
In the last decade, most of the research on reading has focused on reading
comprehension in order to isolate key higher-level cognitive processes involved in meaning
construction (McNamara, 2007), as these processes are positively associated with superior
reading ability indices and academic achievement in general (Graesser, 2007; Magliano et al.,
2007). Also, in the last two decades, the abundance of reading comprehension research has
translated into application and practice; today, reading comprehension and meaning construction
instruction is widespread in educational settings (Dole, Duffy, Roehler, & Pearson, 1991; Dole,
Nokes, & Drits, 2008). Teaching effective reading comprehension skills is fundamental to
children’s academic performance as these skills expand readers’ capability to successfully
understand, derive and construct meaning in reading and in general.
A concept related to comprehension instruction is that of reading strategies, which are
equally important in teaching children to become effective readers. Reading strategies are
defined as “deliberate, goal-directed attempts to control and modify the reader’s efforts to
decode text, understand words, and construct meanings of text” (Afflerbach, Pearson, & Paris,
2008, p. 368). In the literature, the terms reading skills and reading strategies are often used
interchangeably; however, there are important distinctions. Acquired reading skills are automatic
actions pertaining to reading proficiency, fluency, and comprehension, which require no
conscious awareness.
Paris and colleagues (1983) noted that reading instruction involves a progression from
effortful behaviour and use of reading strategies–whether simple word recognition or complex
meaning comprehension techniques–to acquiring automatic reading skill sets (Paris, Lipson, &
Wixson, 1983). In fact, the same action can be a skill or a strategy for different individuals
depending on the person’s reading proficiency and context (Afflerbach et al., 2008). Therefore,
being an effective reader does not simply involve using reading strategies, but rather gauging
the situation, monitoring the effectiveness of behaviour, and adapting strategies that are
appropriate to the reading material at hand. Consequently, not every reading strategy is effective in
every situation. An interesting finding regarding effective readers indicates that their use of
reading strategies is more purposeful and varied than that of an average reader (Fogarty, 2006).
Another distinction is that effective readers are more metacognitively aware and therefore are
better at recognizing when a reading strategy is no longer effective (McNamara et al., 2007;
Mokhtari & Reichard, 2002; Oakhill & Yuill, 1996). The distinction between cognitive and
metacognitive reading strategies is often made in the literature. Whereas cognitive strategies are
related to dealing directly with the content of the material to assist with comprehension,
metacognitive strategies refer to being aware of the reading process, being able to recognize
difficulties, and being able to modify behaviour to facilitate comprehension (Afflerbach et al.,
2008).
Cognitive reading strategies cited in the literature encompass an array of techniques
that help readers with word recognition and comprehension. In fact, there are strategies that deal
with explicit techniques to aid word/text decoding and strategies that help with meaning
integration and meaning construction (McNamara et al., 2007; Oakhill & Cain, 2007). Just as
word recognition and comprehension components of reading correspond to lower-level and
higher-level cognitive processes, respectively, various reading strategies can also be
conceptualized as lower- versus higher-level strategies based on what component of reading
they address. To differentiate cognitive reading processes and cognitive reading strategies in this
paper, the terms ‘lower-level’ and ‘higher-level’ refer to cognitive processes during reading,
whereas the terms ‘lower-order’ and ‘higher-order’ refer to cognitive reading strategies. Thus,
lower-order strategies address word recognition and include such techniques as defining
unfamiliar words or assimilating new words into existing vocabulary, whereas higher-order
strategies address reading comprehension and involve such techniques as connecting themes
within the text or inferring the author’s message.
The use of reading strategies has been shown to significantly predict achievement
outcomes evaluated by large-scale assessments (O’Reilly & McNamara, 2007). The results
suggest that even basic strategies can facilitate comprehension and positively relate to ability
indicators. However, studies such as that by O’Reilly and McNamara (2007) do not show the
relationship between reading strategies and academic achievement when controlling for general
reading ability. The question that remains to be addressed is whether test items are differentially difficult
for students with the same underlying reading ability who report using different reading
strategies.
1.3 Objectives
Whether test items are differentially difficult for students
who report using different reading strategies but have the same level of reading ability can be
assessed with differential item functioning analysis. Therefore, the main purpose of this study is
to demonstrate that DIF can be used as a tool to investigate individual differences that are
relevant to test performance, such as preference for reading strategies.
This study also addresses the following research questions:
- Is there a pattern in students’ self-reported use of reading strategies, and if so, what
is it?
- Can groups of students be identified who exhibit preference for lower-order versus
higher-order reading strategies?
- Does the use of different reading strategies affect test performance (overall test and
each item), and if so, in what way?
- Does employing DIF analysis, to detect differences in item difficulty for groups of
students matched on ability who use different reading strategies, reveal additional
information regarding the effects of reading strategies on test performance?
2 METHOD
2.1 Data
PCAP is a cyclical Pan-Canadian assessment program of students 13 years of age in
reading, mathematics and science administered by the Council of Ministers of Education, Canada
(CMEC). It is a paper-and-pencil test containing constructed and selected response items
administered with student, teacher and school contextual questionnaires. Data from the 2007 PCAP
Reading Assessment, in which reading was the primary domain, and the corresponding student questionnaire
were used in the following analysis. Data from students who took the test in French were not
included, nor were the mathematics and science items.
The PCAP Reading Assessment contained 50 items: 37 selected response and 13
constructed response items. Test items assessed three subdomains of reading: comprehension,
interpretation and response to text. Two different forms/booklets, designed to be matched on
difficulty and content, were administered to a random Pan-Canadian sample. All but 11 anchor
items were different for the two versions of the test, including different reading prompts. The
anchor reading prompt and anchor items were in the same location in both booklets. The
distribution of items assessing the three subdomains of reading was similar for both booklets of
the test. Selected response items were coded dichotomously and constructed responses were
coded on a scale of 0 to 3. Subscores of selected and constructed response items were also
obtained. On the student questionnaires, 15 items assessed how often students used specific
strategies when reading, rated on a 3-point Likert scale from ‘rarely or never’ to ‘often’ (see
Table 1).
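The scoring scheme described above can be sketched in Python as follows. The sketch assumes that raw subscores and the total score are simple sums of item scores, which the text does not state explicitly; the function and constant names are hypothetical, and missing responses are not handled.

```python
N_SELECTED = 37     # selected response items, scored 0 or 1
N_CONSTRUCTED = 13  # constructed response items, scored 0 to 3


def booklet_scores(selected, constructed):
    """Return raw selected-response, constructed-response and total scores.

    Assumption: subscores and total are plain sums of item scores.
    """
    if len(selected) != N_SELECTED or len(constructed) != N_CONSTRUCTED:
        raise ValueError("unexpected number of item scores")
    if any(s not in (0, 1) for s in selected):
        raise ValueError("selected response items must be scored 0 or 1")
    if any(c not in (0, 1, 2, 3) for c in constructed):
        raise ValueError("constructed response items must be scored 0 to 3")
    sel, con = sum(selected), sum(constructed)
    return {"selected": sel, "constructed": con, "total": sel + con}


# Highest possible raw total under the summing assumption above.
MAX_TOTAL = N_SELECTED * 1 + N_CONSTRUCTED * 3
```

Under these assumptions the constructed response items, although fewer in number, contribute up to 39 of the 76 possible raw points.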
Table 1
Student Questionnaire: Assessment of Reading Strategies
How often do you use the following strategies to help you understand what you are
reading?
(a) Reading out loud to myself
(b) Sounding out as many words as I can
(c) Looking for clues such as headings or captions
(d) Trying to make connections to what I already know
(e) Thinking about the author’s message
(f) Looking at charts and pictures
(g) Asking someone to help me
(h) Applying what I know about word origins or word parts
(i) Using an outside source like a dictionary
(j) Thinking about the other words in a sentence to figure out the meaning
(k) Finding a quiet place to read
(l) Re-reading the more difficult parts
(m) Highlighting or making notes or drawings on the important parts
(n) Sometimes reading more quickly or more slowly, depending on the material
(o) Trying to predict what the material is about
After eliminating data from students who took the French language version of the test,
the data set contained 7,537 students who wrote the English Reading Assessment Booklet 1 and
7,472 who wrote English Reading Assessment Booklet 2. Students’ responses on the contextual
questionnaire were matched to their reading achievement scores.
2.2 Grouping Variable
Descriptive analyses were performed on the self-reported data regarding the use of the
reading strategies. The distributions of the ratings, their means and standard deviations, and the rating
patterns are reported in Tables 2 and 3. These results demonstrate that (1) students had different
preferences for reading strategies and (2) the results were nearly identical for the students who
were administered Booklet 1 and those administered Booklet 2.
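The means and standard deviations in Tables 2 and 3 can be checked against the reported frequency distributions. The Python sketch below assumes that the three response options were coded 1 (‘rarely or never’), 2 (‘sometimes’) and 3 (‘often’), a coding that is inferred from the reported means rather than stated in the text; the function name is hypothetical.

```python
import math


def mean_sd_from_percentages(percentages, codes=(1, 2, 3)):
    """Mean and standard deviation implied by a frequency distribution.

    percentages: share of students choosing each response option.
    codes: numeric value of each option (assumed 1, 2, 3).
    """
    total = sum(percentages)  # rounded percentages may not sum to exactly 100
    props = [p / total for p in percentages]
    mean = sum(c * p for c, p in zip(codes, props))
    variance = sum(p * (c - mean) ** 2 for c, p in zip(codes, props))
    return mean, math.sqrt(variance)


# 'Reading out loud to myself', Booklet 1: 49.8%, 38.2%, 11.9%
mean, sd = mean_sd_from_percentages([49.8, 38.2, 11.9])
```

For this item the sketch reproduces the tabled values of 1.62 and 0.69 to two decimal places, which supports the assumed 1 to 3 coding.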
Table 2
Means, Standard Deviations and Frequencies of Students’ Questionnaire Responses, Booklet 1
                                                                                      Score Distributions
Questions                                                                 M     SD    Rarely or never  Sometimes  Often
Reading out loud to myself                                                1.62  0.69  49.8%            38.2%      11.9%
Sounding out as many words as I can                                       1.54  0.66  55.3%            35.4%      9.3%
Looking for clues such as headings or captions                            1.82  0.69  34.2%            49.3%      16.5%
Trying to make connections to what I already know                         2.09  0.67  18.8%            53.8%      27.5%
Thinking about the author’s message                                       1.80  0.72  37.8%            44.0%      18.2%
Looking at charts and pictures                                            2.04  0.71  23.3%            49.1%      27.5%
Asking someone to help me                                                 1.68  0.69  45.3%            41.5%      13.2%
Applying what I know about word origins or word parts                     1.78  0.68  36.8%            48.6%      14.6%
Using an outside source like a dictionary                                 1.72  0.69  42.2%            43.9%      14.0%
Thinking about the other words in a sentence to figure out the meaning    2.08  0.69  20.0%            51.8%      28.2%
Finding a quiet place to read                                             2.16  0.76  22.1%            39.9%      38.0%
Re-reading the more difficult parts                                       2.28  0.71  15.4%            41.2%      43.4%
Highlighting or making notes or drawings on the important parts           1.54  0.70  57.9%            30.2%      11.9%
Sometimes reading more quickly or more slowly, depending on the material  2.12  0.64  15.0%            58.3%      26.7%
Trying to predict what the material is about                              1.93  0.69  27.5%            52.1%      20.4%
Note. The Pattern column of the original table is graphical and is not reproduced.
Table 3
Means, Standard Deviations and Frequencies of Students’ Questionnaire Responses, Booklet 2
                                                                                      Score Distributions
Questions                                                                 M     SD    Rarely or never  Sometimes  Often
Reading out loud to myself                                                1.62  0.69  50.2%            37.5%      12.2%
Sounding out as many words as I can                                       1.55  0.67  54.6%            35.6%      9.8%
Looking for clues such as headings or captions                            1.82  0.70  34.9%            47.9%      17.2%
Trying to make connections to what I already know                         2.07  0.68  20.0%            52.9%      27.1%
Thinking about the author’s message                                       1.76  0.72  40.9%            42.8%      16.4%
Looking at charts and pictures                                            2.02  0.72  24.9%            48.7%      26.4%
Asking someone to help me                                                 1.67  0.70  46.6%            40.1%      13.2%
Applying what I know about word origins or word parts                     1.76  0.69  38.8%            46.7%      14.5%
Using an outside source like a dictionary                                 1.71  0.69  42.3%            44.5%      13.2%
Thinking about the other words in a sentence to figure out the meaning    2.08  0.69  20.2%            51.6%      28.2%
Finding a quiet place to read                                             2.16  0.75  21.6%            40.8%      37.6%
Re-reading the more difficult parts                                       2.27  0.71  15.4%            41.8%      42.8%
Highlighting or making notes or drawings on the important parts           1.53  0.69  58.5%            30.0%      11.4%
Sometimes reading more quickly or more slowly, depending on the material  2.11  0.63  15.3%            58.6%      26.1%
Trying to predict what the material is about                              1.91  0.69  28.7%            51.3%      20.1%
Note. The Pattern column of the original table is graphical and is not reproduced.
A preliminary look at the content of the reading strategies, together with the literature
reviewed in the introduction of this paper, suggests that some strategies (e.g.,
‘trying to make connections to what I already know’, ‘thinking about the author’s message’, and
‘trying to predict what the material is about’) rely on higher-level integrative and inferential
cognitive processes, whereas others (e.g., ‘reading out loud to myself’ and ‘sounding out as
many words as I can’) rely on lower-level skills such as phonological awareness and decoding.
In fact, students reported using the former strategies more often than the latter.
Because students reported using strategies that conceptually represent higher-order
strategies more often than those that represent lower-order strategies, a factor analysis was
performed to explore whether the above strategies could be dichotomized into these two
categories, higher-order and lower-order reading strategies. Specifically, a principal axis factoring
(PAF) analysis with a Promax oblique rotation of the 15 Likert scale questions was conducted on data from
both samples combined, Booklets 1 and 2, a total of 15,009 participants (performing separate
analyses yielded nearly identical results; see Appendix A). The minimum criterion for factor
loadings was .30. Table 4 reports the pattern matrix; the 15 reading strategies clustered into
three factors. A subsequent factor analysis was performed after dropping the items of the third
factor (see Table 5), because factor 3 consisted of strategies that were more ambiguous in terms
of their cognitive function than the higher- and lower-order strategies of factors 1 and 2,
respectively. In the final two-factor analysis, Cronbach’s alpha was .75 for factor 1
(higher-order reading strategies) and .67 for factor 2 (lower-order reading strategies).
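For readers unfamiliar with the internal consistency coefficient reported above, the standard formula for Cronbach’s alpha can be sketched in Python as follows. The data layout (one row of item ratings per student) and the function name are assumptions made for illustration; the sketch does not reproduce the software or settings used in this study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a set of items.

    rows: one sequence of item ratings per respondent (hypothetical layout),
    all of the same length k >= 2.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("each respondent needs the same k >= 2 items")
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Alpha approaches 1 when the items rise and fall together across students and falls toward 0 when they are unrelated, so the values of .75 and .67 indicate that the items within each factor are moderately consistent with one another.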
Table 4
3-Factor Model: Reading Strategies
Questions Pattern
1 2 3
Trying to make connections to what I already know .723
Thinking about the author’s message .646
Applying what I know about word origins or word parts .618
Looking for clues such as headings or captions .435 .325
Thinking about the other words in a sentence to figure out the meaning .383 .253
Trying to predict what the material is about .331 .217
Asking someone to help me .620 -.100
Sounding out as many words as I can .593
Reading out loud to myself -.170 .559 .165
Highlighting or making notes or drawings on the important parts .109 .391
Using an outside source like a dictionary .184 .356
Looking at charts and pictures .236 .327
Re-reading the more difficult parts .663
Finding a quiet place to read .591
Sometimes reading more quickly or more slowly, depending on the material .487
Table 5
2-Factor Model
Questions Pattern
1 2
Trying to make connections to what I already know .759
Thinking about the author’s message .662 -.107
Applying what I know about word origins or word parts .578
Thinking about the other words in a sentence to figure out the meaning .530
Trying to predict what the material is about .442
Looking for clues such as headings or captions .422 .306
Asking someone to help me -.127 .597
Sounding out as many words as I can .592
Reading out loud to myself .577
Highlighting or making notes or drawings on the important parts .110 .392
Using an outside source like a dictionary .191 .355
Looking at charts and pictures .228 .316
According to the results of the two-factor model, factor 1 was consistent with higher-
order reading strategies and factor 2 was consistent with lower-order strategies. “Looking for
clues such as headings or captions” was the only strategy that exhibited cross-factor loadings
above .30. Based on the literature review, it is probable that using such information as headings
or captions relies more on higher-level processes because this information can be used as a tool
to derive meaning from the reading material by integrating it with the overall text. Therefore,
this strategy was more consistent with factor 1, higher-order reading strategies.
Finally, a grouping variable was computed by calculating, for each individual, the difference
between the mean rating of higher-order strategies and the mean rating of lower-order strategies. A difference
score of zero meant that a student had no preference for either type of strategy; students with
positive difference scores tended to report using higher-order reading strategies, and those with
negative scores tended to report using lower-order strategies. To create groups that were
distinct, students with difference scores close to zero (i.e., within ±0.33) were dropped from the DIF
analysis. The final sample size was 2,667 for Booklet 1 and 2,623 for Booklet 2. The
distribution of the ratings, means and standard deviations for self-reported use of reading
strategies for the new sample is reported in Appendix B. A limitation of this grouping variable is
that it captures reported reading strategy use in general rather than strategy use during
the test.
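The construction of the grouping variable can be sketched as follows. This is an illustrative reconstruction only: the function name is hypothetical, and treating the ±0.33 cutoff as inclusive is an assumption, since the text does not say how scores exactly at the boundary were handled.

```python
from statistics import mean

def strategy_group(higher_ratings, lower_ratings, cutoff=0.33):
    """Classify one student from their strategy ratings.

    Returns "higher" (reference group), "lower" (focal group), or None when the
    difference score is within the cutoff and the student is dropped.
    Assumption: the boundary value itself counts as "close to zero".
    """
    difference = mean(higher_ratings) - mean(lower_ratings)
    if abs(difference) <= cutoff:
        return None
    return "higher" if difference > 0 else "lower"

print(strategy_group([3, 4, 3, 4, 3, 4], [1, 2, 1, 2, 1, 2]))  # higher
print(strategy_group([2, 2, 2, 2, 2, 2], [2, 2, 2, 2, 2, 2]))  # None
```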
2.3 Analyses
SPSS 17.0 was used to perform descriptive analyses, factor analyses, t-tests, chi-square
tests of independence and classical item analyses (including item difficulty and discrimination).
DIFAS 4.0 software, developed by Penfield (2005), was used to compute DIF statistics. This
software uses non-parametric indices, such as the Mantel-Haenszel statistic. One
advantage of non-parametric tests is that they require few assumptions beyond
an adequate sample size for each combination of the variables. DIFAS software assesses DIF for
dichotomous (e.g., selected response) and polytomous (constructed response) items, detecting
differences in difficulty while controlling for the matching variable of ability. For polytomous
items, DIF indices measure the overall (omnibus) difference in difficulty and differential step
functioning (DSF) indices measure the differences at each score level within the item, or each
step. In the PCAP 2007 Reading Assessment, constructed response items were coded from 0 to
3; DSF uses a cumulative step function to detect differences in difficulty at each step (i.e., first
step being a change from score 0 to 1, second step a change from score ≤1 to 2, and third step a
change from score ≤2 to 3).
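The cumulative step coding described above can be illustrated with a short sketch. It assumes that step j contrasts scores of j or higher with scores below j, which is one reading of the description; the function name is hypothetical and this is not DIFAS code.

```python
def cumulative_steps(score, max_score=3):
    """Recode a polytomous item score into cumulative step indicators.

    Step j is 1 when the student scored j or higher and 0 otherwise, so a
    0-3 item yields three dichotomous step variables.
    """
    return [1 if score >= step else 0 for step in range(1, max_score + 1)]

for s in range(4):
    print(s, cumulative_steps(s))
```

Each step variable can then be analyzed like a dichotomous item, which is what allows differences to be located at a particular score level.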
For DIF with dichotomous items, DIFAS produces Mantel-Haenszel (MH) chi-square
statistics and an MH Common Log-Odds Ratio that indicates the direction of the DIF. A
categorization of the effect size according to ETS criteria is also produced: ‘A’ as small, ‘B’ as
moderate and ‘C’ as large (Penfield, 2007). With polytomous items, a cumulative step-level
Log-Odds Ratio (CU-LOR) is produced, as well as the Liu-Agresti Common Log-Odds Ratio as a
measure of effect size. For more information on DIFAS’ procedures and interpretations, please
refer to Penfield (2005; 2007) and Penfield, Gattamorta, and Childs (2009).
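For readers unfamiliar with the Mantel-Haenszel common log-odds ratio, the following minimal sketch shows the standard estimator computed across levels of the matching variable. It is not the DIFAS implementation; the function name and counts are hypothetical, and the sign convention (positive values indicating that the item is easier for the reference group) follows from the way the 2 x 2 tables are laid out here.

```python
import math

def mh_common_log_odds(strata):
    """Mantel-Haenszel common log-odds ratio for one dichotomous item.

    strata: iterable of (ref_correct, ref_incorrect, focal_correct,
    focal_incorrect) counts, one tuple per level of the matching variable.
    A positive value means higher odds of success for the reference group
    among students matched on ability.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty ability level contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    return math.log(numerator / denominator)

# Hypothetical counts at two ability levels
print(round(mh_common_log_odds([(30, 10, 20, 20), (20, 20, 10, 30)]), 3))
```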
When conducting DIF analyses through DIFAS, the grouping variable described above
was used; the reference group was identified as students who reported using higher-order
reading strategies and the focal group consisted of students preferring lower-order strategies.
When conducting the DIF analyses, total scores and scaled scores were used for matching
ability; the total score of selected response items was used for dichotomous DIF analysis and the
total score of constructed response items was used for polytomous DIF analysis. The PCAP
2007 Reading Assessment data set also included the test’s standardized score, an IRT scaled score
with a Canadian mean of 500 and standard deviation of 100, ranging from 100 to 800. DIF
analysis was also performed using this scaled matching score. In order to include this variable in
DIFAS, it was transformed into a categorical variable with 30 categories by equal
percentiles.
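One way to picture the transformation of the scaled score into 30 equal-percentile categories is the rank-based sketch below. The exact binning procedure used in the study is not described, so this is an assumption; in particular, this version may place tied scores in adjacent categories, which a statistical package might handle differently.

```python
def equal_percentile_bins(scores, n_bins=30):
    """Assign each score to one of n_bins categories of (nearly) equal size.

    Categories are numbered 1 (lowest scores) to n_bins (highest scores).
    Assumption: ties are broken by position, so tied scores can be split
    across neighbouring categories.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(scores) + 1
    return bins

# Hypothetical scaled scores: 60 students, two per category
scaled = [800 - 10 * i for i in range(60)]
categories = equal_percentile_bins(scaled)
print(categories[0], categories[-1])
```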
3 RESULTS AND DISCUSSION
3.1 Score Distributions
A total of 15,009 English-speaking students wrote the PCAP 2007 Reading Assessment;
7,537 wrote Booklet 1 and 7,472 wrote Booklet 2. Missing values on test items were treated as
incorrect responses, and therefore the following item analyses contained no missing data. The
results for the two booklets are reported separately because the booklets consisted of different
items, with the exception of 11 anchor items. The constructed response item scores [0, 1, 2, or 3]
were added to compute a constructed response subscore, with a maximum obtainable value of
39 [13 items × 3]. The selected response items were scored as correct/incorrect; correct
responses were added to obtain a selected response subscore, with a maximum value of 37. The
total score was calculated by adding the two subscores, with a maximum value of 76.
Independent-samples t-tests were performed to evaluate whether students’ performance
was equivalent between booklets. The findings were significant for the total test score,
t(14092.31) = -32.63, p < .001, η² = .07, the constructed response subscore, t(15007) = -17.63,
p < .001, η² = .02, and the selected response subscore, t(14820.39) = -38.15, p < .001, η² = .09.¹ That is,
performance on the constructed and selected response subtests and on the overall test was significantly
lower for students writing Booklet 1 (constructed response subscore: M = 13.84, SD = 7.40;
selected response subscore: M = 23.74, SD = 7.00; total score: M = 37.58, SD = 13.19) than for
students writing Booklet 2 (constructed response subscore: M = 15.97, SD = 7.42; selected
response subscore: M = 27.85, SD = 6.20; total score: M = 43.82, SD = 10.07). The PCAP 2007
dataset also included an IRT scaled total score, with a Canadian mean of 500 and a standard
deviation of 100, ranging from 100 to 800. When an independent-samples t-test was conducted
using the scaled reading ability score, no significant difference was found between the booklets.
This finding suggests that the test score underwent a considerable transformation during scaling.
Figure 1 illustrates the relationship between the total score and the scaled score (separate
scatterplots for Booklet 1 and 2 are presented in Appendix C).
¹ The equal variances assumption was violated for the selected response subscore and total score tests, but not for the
constructed response subscore; when appropriate, adjusted t statistics are reported.
Figure 1. The relationship between PCAP 2007 Reading Assessment total score and IRT scaled
score.
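The effect sizes reported for the booklet comparison can be approximately recovered from the t statistics. The sketch below assumes the common conversion η² = t² / (t² + N1 + N2 − 2); the text does not state which formula was used, so this is offered only as a plausibility check, and the function name is hypothetical.

```python
def eta_squared_from_t(t, n1, n2):
    """Eta squared for an independent-samples t-test.

    Assumption: eta^2 = t^2 / (t^2 + n1 + n2 - 2), a commonly used conversion.
    """
    df = n1 + n2 - 2
    return t * t / (t * t + df)

# Booklet 1 (n = 7537) versus Booklet 2 (n = 7472), t values as reported
for label, t in [("total", -32.63), ("constructed", -17.63), ("selected", -38.15)]:
    print(label, round(eta_squared_from_t(t, 7537, 7472), 2))
```

Under this assumption, the three reported t values yield .07, .02 and .09 after rounding, in line with the values given for the booklet comparison.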
In conclusion, although students were randomly assigned to write one of the two
booklets, students writing Booklet 1 performed considerably worse, regardless of the item
format (constructed or selected responses). To investigate the differences between the items of
the two booklets further, classical item analyses were conducted.
3.2 Classical Item Analysis
SPSS 17.0 was used to perform item analysis. Item difficulty indices, also referred to as
p-values, were equivalent to item means for dichotomous variables, in this case selected
response items. For constructed response items, item difficulty was computed by dividing the
item mean by its maximum obtainable score. Therefore, the item difficulty indices for the
selected response items are not directly comparable to the indices for the constructed response
items because the latter indicate the mean proportion of possible points that students obtained.
For item discrimination indices, corrected (i.e., the item was not included in the total score)
point-biserial correlations are reported. Although some sources contend that discrimination
values between .10 and .30 represent fair items (Office of Educational Assessment, 2005), here,
indices below .25 are interpreted as potentially problematic.
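The two classical indices described above can be sketched as follows. This is a minimal illustration with hypothetical function names and toy data, not the SPSS procedure used in the study: difficulty is the item mean divided by the maximum obtainable score, and the corrected discrimination index correlates the item with the total score after the item itself has been removed.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)

def item_difficulty(item_scores, max_score=1):
    """Item mean divided by the maximum obtainable item score."""
    return sum(item_scores) / len(item_scores) / max_score

def corrected_item_total(item_scores, total_scores):
    """Correlation of the item with the total score excluding that item."""
    rest = [t - i for i, t in zip(item_scores, total_scores)]
    return pearson(item_scores, rest)

# Hypothetical data for one constructed response item scored 0-3
item = [0, 3, 3, 2, 1]
total = [20, 55, 48, 40, 31]
print(round(item_difficulty(item, max_score=3), 2))
print(round(corrected_item_total(item, total), 2))
```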
Tables 6 and 7 list classical item statistics, including item difficulty and item
discrimination, for all test items in Booklets 1 and 2, respectively. Consistent with the results of the
t-tests reported above, there were substantial differences between the items of Booklet 1 and Booklet 2.
Regarding item difficulty, both constructed and selected response items were easier
on Booklet 2 (constructed response items: M = 0.36 for Booklet 1, M = 0.41 for Booklet 2; selected response
items: M = 0.64 for Booklet 1, M = 0.75 for Booklet 2), except for anchor items, which had similar item difficulty
indices. Regarding item discrimination, both constructed and selected response items had higher
discriminative power on Booklet 1 than on Booklet 2 (constructed response items: M = 0.54 for Booklet 1,
M = 0.45 for Booklet 2; selected response items: M = 0.35 for Booklet 1, M = 0.24 for Booklet 2). Anchor items
also had highly divergent point-biserial correlations between the booklets. However, item
discrimination indices on both booklets were higher for constructed response items than for selected
response items. This suggests that, for constructed responses, the test discriminated well between
lower and higher performing students.
Table 6
Item Statistics, Booklet 1
Item | Item Format | Section/Source | p-value | rpb | Proportion Incorrect | Mean Score Correct† | Mean Score Incorrect
1 Constructed A1-1 .35 .46 .27 40.63† 29.32
2 Constructed A2-1 .39 .57 .26 41.55† 26.23
3 Constructed A3-1 .17 .35 .68 44.24† 34.46
4 Constructed B1-1_Anchor .48 .48 .13 39.69† 23.67
5 Selected B2-1_Anchor .91 .27 .09 38.79 25.14
6 Selected B3-1_Anchor .79 .20 .21 39.09 31.81
7 Selected B4-1_Anchor .77 .37 .23 40.47 27.99
8 Selected B5-1_Anchor .79 .46 .21 40.88 25.20
9 Constructed B6-1_Anchor .43 .61 .27 41.98† 25.61
10 Constructed B7-1_Anchor .38 .61 .34 43.20† 26.74
11 Selected B8-1_Anchor .80 .32 .20 39.86 28.46
12 Constructed B9-1_Anchor .38 .60 .32 42.57† 26.86
13 Selected B10-1_Anchor .57 .38 .43 42.34 31.37
14 Constructed B11-1_Anchor .37 .55 .36 42.45† 28.73
15 Selected C1-1 .91 .36 .09 39.15 21.56
16 Selected C2-1 .87 .39 .13 39.63 23.45
17 Selected C3-1 .56 .39 .44 42.46 31.29
18 Selected C4-1 .88 .32 .12 39.29 25.52
19 Selected C5-1 .71 .31 .29 40.44 30.54
20 Selected C6-1 .74 .39 .26 40.81 28.16
21 Selected C7-1 .37 .32 .63 43.64 33.96
22 Selected C8-1 .62 .29 .38 40.94 32.04
23 Constructed D1-1 .24 .50 .54 44.94† 31.42
24 Selected D2-1 .33 .25 .67 42.95 34.93
25 Selected D3-1 .53 .35 .47 42.30 32.30
26 Selected D4-1 .53 .23 .47 40.80 33.89
27 Selected D5-1 .46 .41 .54 43.88 32.14
28 Selected D6-1 .55 .40 .45 42.73 31.30
29 Constructed D7-1 .31 .56 .36 43.21† 27.34
30 Constructed D8-1 .39 .54 .25 41.29† 26.27
31 Constructed D9-1 .38 .60 .27 42.01† 25.79
32 Selected D10-1 .55 .26 .45 41.12 33.31
33 Constructed D11-1 .34 .59 .34 42.87† 27.40
34 Selected E1-1 .62 .39 .38 41.90 30.42
35 Selected E2-1 .47 .31 .53 42.34 33.29
36 Selected E3-1 .66 .40 .34 41.63 29.77
37 Selected E4-1 .59 .46 .41 43.00 29.89
38 Selected E5-1 .53 .26 .47 41.19 33.52
39 Selected E6-1 .61 .47 .39 42.87 29.30
40 Selected E7-1 .48 .40 .52 43.45 32.13
41 Selected E8-1 .61 .34 .39 41.55 31.47
42 Selected E9-1 .72 .49 .29 41.84 26.91
43 Selected F1-1 .59 .36 .41 41.94 31.37
44 Selected F2-1 .60 .31 .40 41.25 32.00
45 Selected F3-1 .78 .42 .22 40.68 26.55
46 Selected F4-1 .66 .29 .34 40.65 31.62
47 Selected F5-1 .42 .35 .59 43.55 33.34
48 Selected F6-1 .72 .35 .28 40.71 29.44
49 Selected F7-1 .86 .26 .14 39.07 28.25
50 Selected F8-1 .57 .42 .43 42.78 30.66
Note. Indices of item difficulty below 0.40 and indices of item discrimination below .25 are in bold. †For constructed response items, mean score correct includes score of 1 or above (i.e., partial and/or full credit).
Table 7
Item Statistics, Booklet 2
Item | Item Format | Section/Source | p-value | rpb | Proportion Incorrect | Mean Score Correct† | Mean Score Incorrect
1 Constructed A1-2 .41 .38 .18 45.32† 36.96
2 Constructed A2-2 .38 .43 .27 46.13† 37.63
3 Constructed A3-2 .21 .28 .57 47.09† 41.38
4 Constructed B1-2_Anchor .49 .41 .12 45.13† 34.21
5 Selected B2-2_Anchor .91 .19 .09 44.49 36.93
6 Selected B3-2_Anchor .79 .12 .21 44.66 40.75
7 Selected B4-2_Anchor .76 .24 .24 45.42 38.70
8 Selected B5-2_Anchor .79 .35 .21 45.86 36.29
9 Constructed B6-2_Anchor .43 .48 .27 46.53† 36.46
10 Constructed B7-2_Anchor .40 .50 .32 47.27† 36.65
11 Selected B8-2_Anchor .80 .23 .20 45.15 38.56
12 Constructed B9-2_Anchor .39 .48 .30 46.84† 36.61
13 Selected B10-2_Anchor .56 .21 .44 46.12 40.85
14 Constructed B11-2_Anchor .39 .45 .33 46.76† 37.86
15 Selected C1-2 .71 .15 .29 45.04 40.81
16 Selected C2-2 .88 .25 .12 44.86 36.34
17 Selected C3-2 .88 .28 .12 44.98 35.52
18 Selected C4-2 .86 .24 .14 44.95 37.03
19 Selected C5-2 .90 .25 .10 44.74 35.50
20 Selected C6-2 .79 .23 .21 45.20 38.61
21 Selected C7-2 .81 .19 .19 44.95 39.01
22 Selected C8-2 .82 .31 .18 45.48 36.50
23 Selected C9-2 .69 .15 .31 45.15 40.94
24 Constructed D1-2 .49 .47 .13 45.28† 33.83
25 Selected D2-2 .79 .32 .21 45.64 36.81
26 Selected D3-2 .54 .23 .46 46.38 40.87
27 Selected D4-2 .82 .29 .18 45.33 36.91
28 Selected D5-2 .89 .31 .11 45.02 34.05
29 Selected D6-2 .68 .28 .32 46.04 39.04
30 Selected D7-2 .46 .14 .54 45.85 42.09
31 Constructed D8-2 .37 .46 .28 46.59† 36.83
32 Constructed D9-2 .49 .51 .15 45.73† 33.21
33 Constructed D10-2 .40 .48 .22 46.27† 35.35
34 Constructed D11-2 .48 .47 .14 45.47† 33.99
35 Selected E1-2 .66 .23 .34 45.81 39.91
36 Selected E2-2 .73 .23 .27 45.46 39.30
37 Selected E3-2 .84 .29 .16 45.24 36.44
38 Selected E4-2 .79 .21 .21 45.12 38.91
39 Selected E5-2 .62 .29 .38 46.44 39.52
40 Selected E6-2 .77 .26 .23 45.47 38.30
41 Selected E7-2 .71 .27 .29 45.86 38.95
42 Selected E8-2 .75 .28 .25 45.70 38.34
43 Selected F1-2 .74 .23 .26 45.45 39.12
44 Selected F2-2 .70 .23 .30 45.59 39.70
45 Selected F3-2 .76 .26 .24 45.52 38.44
46 Selected F4-2 .94 .30 .06 44.65 31.12
47 Selected F5-2 .82 .28 .18 45.30 37.06
48 Selected F6-2 .59 .19 .41 45.82 40.97
49 Selected F7-2 .69 .20 .31 45.44 40.25
50 Selected F8-2 .61 .17 .39 45.57 41.11
Note. Indices of item difficulty below 0.40 and indices of item discrimination below .25 are in bold. †For constructed response items, mean score correct includes score of 1 or above (i.e., partial and/or full credit).
To investigate the reasons for the booklet effect further, especially before the main
results of this study (i.e., differential item functioning) are presented, a distractor analysis was
also performed. Tables 8 and 9 report the results of the distractor analysis for Booklet 1 and
Booklet 2, respectively. With few exceptions, this analysis demonstrated several patterns. First,
for constructed response items, the lowest mean total score was observed for students who did
not answer the item correctly (or partially correctly) and the highest mean total score was
observed for those who received the highest item score. In fact, the higher the obtained score
was, the higher was the mean total score. Second, the pattern for the selected response items was
similar; the highest mean total score was obtained by students who chose the correct response
option and the lower mean total scores were observed for other, incorrect options. Specifically,
students with the lowest mean scores tended to select the least frequently chosen option. Finally,
the distractor analysis revealed differences in results between the two booklets of the test, but
only regarding the difficulty level. That is, the aforementioned patterns were found for both
booklets, but the mean total scores were lower across all response scores/options in Booklet 1
and higher in Booklet 2. However, despite being useful, these results did not provide any insight
as to why Booklet 2 contained more items with low item discrimination.
It is worth noting that the percentages of students responding with A, B,
C or D to selected response items, shown in Tables 8 and 9, do not add up to 100 percent
because of missing responses. Although missing responses were scored as incorrect, they were
not included when individual response options were examined (as opposed to binary correct/incorrect scoring).
For this reason, additional tables that include the percentage of missing data are presented in
Appendix D. These results reveal that there was
slightly more missing data for Booklet 1 than for Booklet 2, perhaps because the items of Booklet
1 were more difficult.
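The layout of the distractor tables can be illustrated with the following sketch, which reports, for each response option, the percentage of students choosing it and their mean total score. The function name and data are hypothetical. Missing responses (coded as None here, an assumption) are left out of the option rows but kept in the denominator, which is why the percentages need not sum to 100.

```python
from collections import defaultdict

def distractor_table(responses, total_scores):
    """Percent choosing each option and mean total score per option.

    Missing responses (None) are excluded from the option rows but still
    count toward the number of students, so percentages may sum to < 100.
    """
    groups = defaultdict(list)
    for response, total in zip(responses, total_scores):
        if response is not None:
            groups[response].append(total)
    n_students = len(responses)
    return {
        option: (100 * len(scores) / n_students, sum(scores) / len(scores))
        for option, scores in sorted(groups.items())
    }

# Hypothetical responses to one selected response item
print(distractor_table(["A", "B", "A", None], [40, 20, 30, 10]))
```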
Table 8
Distractor Analysis, Booklet 1
Item Key Frequencies (Percent) Mean Score
0 1 2 3 0 1 2 3
Constructed Response Items
A1-1 ≥1 27.0 47.5 20.5 5.1 29.32 36.88 46.38 52.54
A2-1 ≥1 25.9 36.9 30.9 6.3 26.23 36.66 44.93 53.65
A3-1 ≥1 68.1 17.5 9.7 4.8 34.46 40.18 47.20 53.11
B1-1 ≥1 13.2 36.2 44.5 6.1 23.67 34.57 42.45 49.86
B6-1 ≥1 26.9 25.5 38.8 9.4 25.61 35.28 44.20 51.15
B7-1 ≥1 34.2 23.7 35.4 6.7 26.74 37.40 45.39 52.23
B9-1 ≥1 31.8 27.7 34.7 5.8 26.86 36.81 45.43 52.80
B11-1 ≥1 35.5 24.6 32.9 6.9 28.73 36.14 44.88 53.38
D1-1 ≥1 54.4 24.0 17.3 4.4 31.42 41.44 47.50 54.04
D7-1 ≥1 35.5 39.5 21.8 3.2 27.34 40.55 46.41 54.21
D8-1 ≥1 24.7 38.8 31.0 5.5 26.27 36.89 45.01 51.53
D9-1 ≥1 27.3 35.5 32.0 5.2 25.79 37.15 45.33 54.83
D11-1 ≥1 34.2 34.7 26.7 4.4 27.40 38.60 46.26 55.92
Item Key A B C D A B C D
Selected Response Items
B2-1 B 4.3 91.1 2.9 0.6 29.90 38.79 24.43 21.64
B3-1 D 2.8 3.6 13.1 79.2 27.33 28.88 35.36 39.09
B4-1 B 3.6 76.8 7.6 10.9 22.75 40.47 26.00 32.95
B5-1 A 79.0 6.4 6.6 7.0 40.88 25.64 26.22 26.35
B8-1 A 80.0 6.1 1.4 10.1 39.86 30.38 22.21 30.54
B10-1 A 56.6 12.9 9.8 17.7 42.34 31.16 27.91 35.27
C1-1 B 3.7 91.1 0.9 2.8 24.35 39.15 21.76 22.02
C2-1 D 6.4 2.3 2.5 87.3 26.37 24.37 21.36 39.63
C3-1 D 14.0 21.7 6.6 56.3 36.20 31.52 24.24 42.46
C4-1 B 3.1 87.6 3.0 5.1 24.09 39.29 25.72 30.19
C5-1 D 13.1 13.3 1.0 71.1 32.48 31.46 17.05 40.44
C6-1 D 10.5 6.4 7.3 74.4 30.90 23.78 30.96 40.81
C7-1 A 37.3 10.1 38.3 12.1 43.64 34.05 35.93 30.13
C8-1 B 2.8 62.2 17.1 16.1 24.10 40.94 33.54 33.61
D2-1 D 14.3 16.3 32.7 33.0 31.67 34.72 37.90 42.95
D3-1 C 35.4 5.6 52.8 3.5 34.90 26.55 42.30 26.61
D4-1 C 13.0 19.5 53.4 11.5 35.90 33.36 40.80 36.66
D5-1 C 7.8 32.1 46.3 11.4 24.96 34.39 43.88 34.34
D6-1 C 11.8 5.9 54.9 24.8 29.05 27.47 42.73 34.81
D10-1 C 21.6 7.4 54.6 10.6 37.76 29.12 41.12 33.75
E1-1 D 8.3 20.1 6.1 62.4 27.05 33.97 29.69 41.90
E2-1 D 14.2 16.7 18.9 47.4 32.99 30.90 38.18 42.34
E3-1 B 13.5 65.8 5.1 12.7 31.53 41.63 24.72 32.79
E4-1 A 58.7 13.1 3.7 21.7 43.00 28.40 24.35 33.45
E5-1 C 17.2 23.2 53.0 3.7 31.91 37.73 41.19 27.36
E6-1 A 61.0 16.3 11.6 6.4 42.87 28.98 31.21 31.47
E7-1 D 7.1 14.5 25.6 48.1 26.99 34.38 34.08 43.45
E8-1 D 7.2 18.5 9.2 60.6 33.14 33.78 30.16 41.55
E9-1 C 6.8 11.6 71.5 5.6 28.92 28.10 41.84 26.11
F1-1 A 58.8 15.2 13.1 9.5 41.94 30.05 35.72 31.45
F2-1 B 3.3 60.3 10.9 22.1 24.86 41.25 35.87 33.00
F3-1 D 9.2 7.5 2.1 78.0 29.97 27.29 21.62 40.68
F4-1 D 5.7 21.7 2.9 66.0 27.18 35.68 23.90 40.65
F5-1 C 23.8 12.0 41.5 18.9 34.18 36.23 43.55 32.97
F6-1 C 6.7 9.6 72.2 7.9 27.46 32.58 40.71 31.88
F7-1 A 86.2 7.1 1.7 1.2 39.07 35.39 20.74 19.50
F8-1 B 30.2 57.1 2.4 6.3 31.95 42.78 27.23 31.60
Table 9
Distractor Analysis, Booklet 2
Item Key Frequencies (Percent) Mean Score
0 1 2 3 0 1 2 3
Constructed Response Items
A1-2 ≥1 17.9 46.6 29.0 6.5 36.96 42.54 47.92 53.64
A2-2 ≥1 27.2 36.9 30.5 5.4 37.63 42.64 48.73 55.33
A3-2 ≥1 57.2 25.9 12.9 4.0 41.38 44.64 49.51 55.10
B1-2 ≥1 12.0 36.8 44.8 6.4 34.21 41.49 46.96 53.21
B6-2 ≥1 26.8 27.2 35.8 10.2 36.46 41.88 48.20 53.07
B7-2 ≥1 32.4 24.1 35.7 7.8 36.65 43.06 48.65 53.98
B9-2 ≥1 29.5 29.8 34.3 6.4 36.61 43.22 48.61 54.18
B11-2 ≥1 33.0 25.3 33.2 8.5 37.86 42.21 48.39 53.89
D1-2 ≥1 12.7 38.7 38.2 10.3 33.83 41.24 47.29 53.01
D8-2 ≥1 28.3 36.7 30.9 4.1 36.83 43.60 48.90 56.12
D9-2 ≥1 15.2 30.4 47.6 6.8 33.21 41.12 47.41 54.58
D10-2 ≥1 22.4 39.0 34.1 4.5 35.35 43.01 48.89 54.67
D11-2 ≥1 14.3 34.8 44.3 6.6 33.99 41.25 47.57 53.61
Item Key A B C D A B C D
Selected Response Items
B2-2 B 4.4 91.2 3.1 0.5 38.71 44.49 36.22 34.32
B3-2 D 2.6 4.1 13.3 78.6 37.64 38.54 42.46 44.66
B4-2 B 3.5 76.3 8.2 11.2 35.12 45.42 37.04 41.47
B5-2 A 78.7 6.1 7.1 7.3 45.86 36.96 36.48 36.19
B8-2 A 79.9 6.1 1.7 10.3 45.15 39.53 34.38 39.21
B10-2 A 56.4 13.4 9.7 17.8 46.12 40.92 39.43 42.09
C1-2 C 10.3 9.3 71.2 6.0 40.49 42.50 45.04 40.52
C2-2 A 87.9 4.3 5.0 1.9 44.86 36.23 38.26 36.08
C3-2 D 5.5 1.5 4.1 87.8 38.15 30.39 35.49 44.98
C4-2 C 2.7 5.9 85.8 4.4 33.58 39.18 44.95 38.01
C5-2 B 2.9 90.1 2.7 3.3 34.73 44.74 37.29 37.04
C6-2 B 2.5 79.1 12.7 4.6 35.14 45.20 40.71 37.08
C7-2 A 81.1 10.5 2.2 4.8 44.95 42.15 31.15 38.04
C8-2 C 3.1 10.4 81.6 3.4 37.05 36.73 45.48 37.73
C9-2 C 15.6 11.1 68.5 3.2 42.06 41.33 45.15 38.64
D2-2 D 6.5 6.6 5.4 79.4 39.40 36.46 36.94 45.64
D3-2 D 31.9 4.8 7.0 53.7 42.98 36.49 37.46 46.38
D4-2 B 6.8 82.1 3.3 5.6 38.54 45.33 35.67 38.32
D5-2 C 2.0 3.6 89.1 3.3 34.36 35.20 45.02 35.08
D6-2 C 8.2 13.0 68.3 8.2 38.66 40.88 46.04 38.75
D7-2 D 13.1 24.5 13.5 46.2 41.29 43.60 41.93 45.85
E1-2 C 21.5 2.7 66.4 6.9 40.36 38.25 45.81 42.81
E2-2 A 73.4 7.6 9.2 6.9 45.46 39.80 40.71 40.14
E3-2 D 6.4 4.1 2.9 84.0 39.16 37.28 34.92 45.24
E4-2 B 4.1 79.1 3.6 10.1 37.10 45.12 36.20 42.66
E5-2 B 12.6 62.3 15.4 7.1 38.59 46.44 41.40 40.40
E6-2 D 3.8 2.9 13.6 77.0 37.39 36.44 40.56 45.47
E7-2 B 13.3 70.5 2.2 11.2 39.67 45.86 34.52 40.94
E8-2 C 6.7 8.7 74.5 7.2 40.45 38.40 45.70 39.37
F1-2 C 3.3 7.4 74.3 12.7 37.26 39.31 45.45 41.36
F2-2 D 16.7 2.2 8.6 70.0 41.21 34.35 40.81 45.59
F3-2 A 76.0 4.7 8.5 8.3 45.52 39.09 39.48 39.61
F4-2 D 1.4 1.0 1.4 93.9 33.97 32.69 30.93 44.65
F5-2 D 2.0 2.4 10.9 82.1 32.99 35.50 39.90 45.30
F6-2 D 15.4 3.1 19.6 58.8 41.75 35.74 42.57 45.82
F7-2 B 12.9 68.8 5.2 9.9 41.93 45.44 37.64 42.08
F8-2 D 4.5 23.9 7.4 60.9 38.32 43.65 38.64 45.57
To investigate the nature of the difference between the booklets further, item analyses
were also run with constructed and selected responses separately. Tables 10 and 11 present three
types of point-biserial correlations: item to constructed responses subscore, item to selected
responses subscore and as reported before, item to total score (the far right column).
Table 10
Item Discrimination by Subscores, Booklet 1
Item | rpb for constructed response items | rpb for selected response items | rpb for all items
Constructed Response Items
A1-1 .50 --- .46
A2-1 .59 --- .57
A3-1 .38 --- .35
B1-1 .48 --- .48
B6-1 .61 --- .61
B7-1 .63 --- .61
B9-1 .60 --- .60
B11-1 .56 --- .55
D1-1 .50 --- .50
D7-1 .58 --- .56
D8-1 .57 --- .54
D9-1 .63 --- .60
D11-1 .63 --- .59
Selected Response Items
B2-1 --- .29 .27
B3-1 --- .20* .20*
B4-1 --- .37 .37
B5-1 --- .45 .46
B8-1 --- .32 .32
B10-1 --- .39 .38
C1-1 --- .39 .36
C2-1 --- .42 .39
C3-1 --- .41 .39
C4-1 --- .36 .32
C5-1 --- .32 .31
C6-1 --- .40 .39
C7-1 --- .33 .32
C8-1 --- .31 .29
D2-1 --- .24* .25
D3-1 --- .35 .35
D4-1 --- .22* .23*
D5-1 --- .39 .41
D6-1 --- .40 .40
D10-1 --- .25 .26
E1-1 --- .42 .39
E2-1 --- .32 .31
E3-1 --- .42 .40
E4-1 --- .48 .46
E5-1 --- .28 .26
E6-1 --- .49 .47
E7-1 --- .42 .40
E8-1 --- .35 .34
E9-1 --- .50 .49
F1-1 --- .39 .36
F2-1 --- .34 .31
F3-1 --- .45 .42
F4-1 --- .30 .29
F5-1 --- .35 .35
F6-1 --- .38 .35
F7-1 --- .29 .26
F8-1 --- .43 .42
Note.*rpb is between .200 - .249.
Table 11
Item Discrimination by Subscores, Booklet 2
Item | rpb for constructed response items | rpb for selected response items | rpb for all items
Constructed Response Items
A1-2 .49 --- .38
A2-2 .55 --- .43
A3-2 .35 --- .28
B1-2 .53 --- .41
B6-2 .62 --- .48
B7-2 .64 --- .50
B9-2 .61 --- .48
B11-2 .57 --- .45
D1-2 .60 --- .47
D8-2 .58 --- .46
D9-2 .67 --- .51
D10-2 .63 --- .48
D11-2 .63 --- .47
Selected Response Items
B2-2 --- .04*** .19**
B3-2 --- .02*** .12***
B4-2 --- .04*** .24*
B5-2 --- .06*** .35
B8-2 --- .06*** .23*
B10-2 --- .33 .21*
C1-2 --- .24* .15**
C2-2 --- .42 .25
C3-2 --- .44 .28
C4-2 --- .39 .24*
C5-2 --- .41 .25
C6-2 --- .37 .23*
C7-2 --- .33 .19**
C8-2 --- .50 .31
C9-2 --- .25 .15**
D2-2 --- .53 .32
D3-2 --- .39 .23*
D4-2 --- .48 .29
D5-2 --- .54 .31
D6-2 --- .45 .28
D7-2 --- .22* .14***
E1-2 --- .40 .23*
E2-2 --- .37 .23*
E3-2 --- .48 .29
E4-2 --- .34 .21*
E5-2 --- .50 .29
E6-2 --- .43 .26
E7-2 --- .47 .27
E8-2 --- .45 .28
F1-2 --- .37 .23*
F2-2 --- .39 .23*
F3-2 --- .42 .26
F4-2 --- .50 .30
F5-2 --- .44 .28
F6-2 --- .30 .19**
F7-2 --- .31 .20*
F8-2 --- .30 .17**
Note. *rpb is between .200 and .249, **between .150 and .199, ***between .000 and .149.
For both booklets, constructed response items’ discrimination indices were always
higher than those of selected response items. Also, when constructed and selected response
items were correlated with their own scales, the finding that more items in Booklet 2 exhibited
lower point-biserial correlations than in Booklet 1 was replicated. However, correlating items to
their own subscores, as opposed to the total score, revealed that (1) point-biserials for all
constructed response items increased, (2) fewer selected response items had low point-biserial
correlations, (3) most of those selected response items which did have low point-biserials were
anchor items and finally (4) these results were found only for Booklet 2.
These findings have important implications. In Table 11, when rpb values are examined
separately for the two subscores instead of the total score, fewer items have low coefficients,
which implies that the test items of Booklet 2 measured more than one ability. This suspicion is
strengthened by the fact that most of these items with low discrimination are anchor items,
which suggests that section B (anchor prompt and items) measured a different dimension than
the rest of the test. Also, the finding that rpb values for constructed responses were higher when
correlated with their own subscale implies that constructed response items also measured
something different than selected response items. Classical item analysis indices, including item
difficulty and item to total score/item to subscore discrimination indices are also reported for the
grouping variable sample in Appendix E. Although the precise values (p-values and rpb values)
are different from those of the full sample, the pattern and conclusions are identical. That is, the
two booklets of the test were different from each other, and potentially measured different
aspects of reading ability.
Cronbach’s alpha obtained through the reliability analysis does not, however, reveal
any potential problems with the internal consistency of the test items. In fact, alpha coefficients for (1)
the constructed responses subscore, (2) the selected responses subscore and (3) the total score
(Booklets 1 and 2 separately) are above .85. However, correlating the test subscales with each
other and with the total score supports the notion that Booklet 2 measured more than one ability.
That is, for Booklet 2, the relationship between constructed and selected response subscores was
almost non-existent (CR × SR: r = .09), whereas these subscales were strongly related to the total
score (CR × Total: r = .79; SR × Total: r = .66, for constructed and selected response subscales,
respectively); this is not surprising, as the items in each subscore also contributed to the total
score. In comparison, all of these correlations were strong for Booklet 1 (CR × SR: r = .68; CR × Total: r =
.92; SR × Total: r = .91).
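The point that subscore-to-total correlations are inflated by overlap can be made concrete with the part-whole identity r(X, X + Y) = (SD_X + r_XY · SD_Y) / SD_(X+Y). The sketch below is an illustration using the full-sample standard deviations reported earlier; it is not an analysis performed in this study, and the function name is hypothetical.

```python
import math

def part_whole_r(sd_part, sd_other, r_parts):
    """Correlation between a subscore and the sum of that subscore and another.

    Derived from cov(X, X + Y) = var(X) + cov(X, Y).
    """
    sd_total = math.sqrt(sd_part ** 2 + sd_other ** 2
                         + 2 * r_parts * sd_part * sd_other)
    return (sd_part + r_parts * sd_other) / sd_total

# Booklet 1: SD = 7.40 (constructed), 7.00 (selected), subscore r = .68
print(round(part_whole_r(7.40, 7.00, 0.68), 2))  # constructed with total
print(round(part_whole_r(7.00, 7.40, 0.68), 2))  # selected with total
# Two uncorrelated subscores with equal SDs still correlate about .71 with the total
print(round(part_whole_r(1.0, 1.0, 0.0), 2))
```

With the Booklet 1 values this identity gives approximately .92 and .91, close to the reported coefficients, and the last line shows that sizeable subscore-to-total correlations arise even when the subscores are unrelated to each other.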
To investigate the poor fit between anchor items and the rest of the Booklet 2 test,
classical item analysis was also performed without the anchor items (Appendix F). The results
indicate that there were substantially fewer items low in discrimination on Booklet 2 after
eliminating anchors, which suggests that anchor items were measuring something other than the
rest of the test items on Booklet 2. When items were grouped by section (A, B, C, D, E, F) and
correlations of anchor items to their section subscale scores were computed, slightly different
results between the booklets persisted (anchor item rpb coefficients: M = 0.44 for Booklet 1, M = 0.40 for
Booklet 2; see Appendix G). Also, more items with lower coefficients on Booklet 2 than on Booklet 1
were observed for Section D, the only other section containing both constructed and selected
response items. This implies that constructed response items measured something other than
selected response items, but only on Booklet 2. Because PCAP was designed to assess three
subdomains of reading ability (comprehension, interpretation and response to text), separate
analyses were performed to evaluate whether there was an effect by the reading domain.
However, consistent with previous analyses, lower item to subdomain correlations were found
on Booklet 2 than Booklet 1 regardless of the domain of reading ability. Thus, the potential
multidimensionality of Booklet 2 cannot be explained by the aforementioned domains evaluated
by PCAP. Only two factors had an effect on these results, the inclusion of anchor items and the
item format. Therefore, the findings of the item discrimination analyses (1) by subscore, (2)
with anchor items eliminated, (3) by section and (4) by subdomain suggest that Booklet 2 was a
multidimensional measure of reading ability because constructed response and selected response
items were not consistent with each other and anchor items were not consistent with other
Booklet 2 test items.
All of the above analyses were carried out in order to better understand the ensuing DIF
results and to be able to make appropriate interpretations. Because of the findings reported
above, Booklets 1 and 2 will be treated as two entirely different examinations, both measuring
one or more aspects of reading ability.
3.3 Reading Strategies and Test Scores
As described in the literature review, higher-level cognitive skills contribute to
successful reading comprehension and deeper meaning construction. Thus, it is important to
explore whether the reported use of higher-order reading strategies is positively related to
students‟ reading ability too, and to investigate the role of lower-order reading strategies.
Independent-samples t-tests were conducted to assess whether students’ preference for lower-
versus higher-order reading strategies (the grouping variable) was related to their reading
ability as assessed by PCAP. The findings for Booklet 1 were significant for the total test
score, t(593.00) = 15.53, p = .00, η² = .09, the constructed response item subscore,
t(2665) = 12.37, p = .00, η² = .05, and the selected response item subscore, t(585.40) = 16.12,
p = .00, η² = .10.² The findings for Booklet 2 were slightly different: significant results were
found for the total test score, t(2621) = 7.91, p = .00, η² = .02, and the selected response
item subscore, t(530.27) = 12.94, p = .00, η² = .08, but the results for the constructed
response item subscore were not significant, t(2621) = -.07, p = .95, η² = .00. For instance,
on the total score, students who tended to use higher-order reading strategies scored higher on
the overall test (Booklet 1: M = 43.45, SD = 11.40; Booklet 2: M = 46.45, SD = 9.33) than those
who used lower-order strategies (Booklet 1: M = 33.57, SD = 12.33; Booklet 2: M = 42.49,
SD = 9.91). This finding was also replicated with the IRT scaled score, t(2665) = 16.15,
p = .00, η² = .09 for Booklet 1 and t(2621) = 10.66, p = .00, η² = .04 for Booklet 2. That is,
the reading ability scaled score was higher for students who reported using higher-order reading
strategies more often (Booklet 1: M = 520.58, SD = 85.11; Booklet 2: M = 506.51, SD = 82.57)
than for students who reported using lower-order strategies more often (Booklet 1: M = 448.00,
SD = 90.25; Booklet 2: M = 459.74, SD = 83.02). In conclusion, students who preferred
higher-order reading strategies tended to do better on the overall assessments of reading
ability.
² The equal variances assumption was violated for some tests; where appropriate, adjusted t
statistics are reported.
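The effect sizes reported with these t-tests can be related to the t statistics by a standard conversion. The sketch below assumes that η² was obtained as t² / (t² + df); the thesis does not state the formula, so this is an assumption. It reproduces the reported values for the tests with unadjusted degrees of freedom, and the function name is hypothetical.

```python
def eta_squared_from_t(t: float, df: float) -> float:
    """Eta squared for an independent-samples t-test: t^2 / (t^2 + df).

    Assumption: this standard conversion is the one behind the reported values.
    """
    return t * t / (t * t + df)


# Booklet 1, constructed response subscore: t(2665) = 12.37
print(round(eta_squared_from_t(12.37, 2665), 2))
# Booklet 2, total test score: t(2621) = 7.91
print(round(eta_squared_from_t(7.91, 2621), 2))
```

Under this assumption, the two calls give values that round to the .05 and .02 reported above.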
Two-way contingency table analyses were performed to assess the relationship between
higher-/lower-order reading strategies and students’ performance on each test item, using
chi-square tests of independence (without taking the level of ability into account). For
Booklet 1, students who reported using higher-order reading strategies performed significantly
better on all constructed response and selected response items except one (B3); no negative
effects were found. For Booklet 2, such students performed significantly better on 31 out of 37
selected response items and significantly worse on one selected response item (B8) in comparison
to students who reported using lower-order reading strategies; the use of reading strategies on
Booklet 2 was not significantly related to answering any constructed response items correctly.
Therefore, using lower-order reading strategies was almost never related to better performance
on test items. See Appendix H for the full list of reading strategies by item crosstabulations
and chi-square results.
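For a dichotomous item, each of these analyses reduces to a 2 x 2 table of strategy group by item score. The following is a minimal sketch of the Pearson chi-square test of independence for such a table, without a continuity correction. The counts are hypothetical and are not taken from the PCAP data, and the function name is an illustration only.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Rows: strategy group (higher-order, lower-order).
    Columns: item response (correct, incorrect).
    Returns (chi-square, p-value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: 300 of 400 higher-order students answer correctly,
# compared with 200 of 400 lower-order students.
chi2, p = chi_square_2x2(300, 100, 200, 200)
print(f"chi-square = {chi2:.2f}, p = {p:.4g}")
```

Because this test ignores ability, a significant result may only reflect the overall score difference between the groups, which is why the DIF analyses that follow match students on ability first.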
3.4 DIF and DSF Analyses
3.4.1 Dichotomous Items
DIF analysis was conducted on all 37 selected response items, matching students on the
total score (i.e., the selected response subscore). The Mantel-Haenszel chi-square statistic
indicates which items exhibited significant DIF. Another statistic, the Breslow-Day chi-square,
is effective in detecting nonuniform DIF (Penfield, 2007). The combined decision rule (CDR)
combines both statistics in the decision to flag an item for DIF. The direction of DIF is also
identified, either in favour of students who used higher-order reading strategies (the reference
group) or those who used lower-order strategies (the focal group). Finally, the last column of
each table reports the effect size as small (A), moderate (B) or large (C), based on the
Educational Testing Service’s (ETS’s) categorization scheme. For more details, please refer to
Penfield (2007) and Penfield, Gattamorta, and Childs (2009).
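The Mantel-Haenszel procedure can be sketched as follows. This is an illustration of the textbook statistic with a continuity correction, not a reproduction of the DIFAS implementation; the strata counts are hypothetical and the function name is an assumption.

```python
import math


def mantel_haenszel(strata):
    """Mantel-Haenszel chi-square and common log-odds ratio for one item.

    strata: iterable of (a, b, c, d), one tuple per level of the matching score:
        a, b = reference group correct / incorrect
        c, d = focal group correct / incorrect
    Returns (chi-square with continuity correction, log of the common odds ratio).
    A positive log-odds ratio means the item favours the reference group.
    """
    sum_a = sum_expected = sum_variance = numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        n_ref, n_foc = a + b, c + d
        m_correct, m_incorrect = a + c, b + d
        sum_a += a
        sum_expected += n_ref * m_correct / n
        sum_variance += n_ref * n_foc * m_correct * m_incorrect / (n * n * (n - 1))
        numerator += a * d / n
        denominator += b * c / n
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_variance
    return chi2, math.log(numerator / denominator)


# Hypothetical counts for three matched score levels.
example = [(40, 10, 30, 20), (30, 20, 20, 30), (20, 30, 10, 40)]
chi2, log_odds = mantel_haenszel(example)
print(f"MH chi-square = {chi2:.2f}, log-odds ratio = {log_odds:.3f}")
```

Matching on the score level is what separates this from a plain chi-square test: the groups are compared only within strata of similar ability, so a remaining difference points to the item rather than to overall ability.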
Before interpreting the results of the DIF analysis, the assumption of adequacy of cell
count was verified. For Booklet 1, 29% of all cells (grouping by total score matching variable)
had fewer than five cases and 9% had a count of zero. For Booklet 2, 38% of all cells had fewer
than five cases and 17% had zero; thus, the adequacy of cell count might be compromised, as
the proportion of cells with fewer than five cases exceeds the desirable maximum of 20 percent.
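The cell-count check described above amounts to two proportions. A minimal sketch with hypothetical counts follows; the function name and the example list are illustrations, not part of the original analysis.

```python
def cell_count_summary(cells):
    """Proportion of cells with fewer than five cases and proportion with zero.

    cells: observed counts, one per cell of the group-by-response tables
    across all levels of the matching variable.
    """
    total = len(cells)
    sparse = sum(1 for count in cells if count < 5)  # includes empty cells
    empty = sum(1 for count in cells if count == 0)
    return sparse / total, empty / total


# Hypothetical counts for ten cells.
sparse_share, empty_share = cell_count_summary([0, 3, 12, 25, 4, 40, 7, 0, 18, 9])
print(f"fewer than five: {sparse_share:.0%}, zero: {empty_share:.0%}")
print("adequate" if sparse_share <= 0.20 else "possibly compromised")
```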
Table 12
DIF for Dichotomous Items, Booklet 1
Item   Mantel-Haenszel   Breslow-Day   Direction of DIF         Combined Decision   Effect
       Chi-Square        Chi-Square                             Rule (CDR)          Size
B3-1   13.91**           0.73          Lower-order strategies   DIF                 Moderate
B4-1   0.37              7.01**        Higher-order strategies  DIF                 Small
B5-1   8.42**            1.47          Higher-order strategies  DIF                 Small
F6-1   5.92*             0.87          Higher-order strategies  DIF                 Small
Note. *p < .05, **p < .01
Table 12 lists statistics for the selected response items that were flagged for DIF in
Booklet 1. Only one of the four flagged items showed a moderate effect; the rest showed
small levels of DIF. Also, for most of these items DIF was in favour of the reference group,
except one item (B3) with DIF in favour of the focal group. In other words, most items with
differential functioning based on the reported use of reading strategies favoured students who
used higher-order reading strategies, and students who tended to use lower-order strategies
performed better than expected on only one test item out of the four flagged items.
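The combined decision rule can be illustrated with the statistics in Table 12. The sketch below assumes that an item is flagged when either the Mantel-Haenszel or the Breslow-Day chi-square exceeds the critical value for one degree of freedom at a Bonferroni-adjusted alpha of .025 (about 5.02). This threshold is an assumption that is not stated in this section, although the flags reported in the tables are consistent with it.

```python
# Chi-square critical value for 1 degree of freedom at alpha = .025.
# Assumption: the CDR tests each statistic at this Bonferroni-adjusted level.
CRITICAL_CHI2 = 5.024


def flag_by_cdr(mh_chi2: float, bd_chi2: float, critical: float = CRITICAL_CHI2) -> bool:
    """Flag an item for DIF when either statistic exceeds the critical value."""
    return mh_chi2 > critical or bd_chi2 > critical


# (Mantel-Haenszel, Breslow-Day) values from Table 12.
table_12 = {"B3-1": (13.91, 0.73), "B4-1": (0.37, 7.01),
            "B5-1": (8.42, 1.47), "F6-1": (5.92, 0.87)}
for item, (mh, bd) in table_12.items():
    print(item, "DIF" if flag_by_cdr(mh, bd) else "---")
```

Item B4-1 shows why two statistics are combined: its Mantel-Haenszel value is negligible, but the Breslow-Day value suggests nonuniform DIF.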
Table 13
DIF for Dichotomous Items, Booklet 2
Item   Mantel-Haenszel   Breslow-Day   Direction of DIF         Combined Decision   Effect
       Chi-Square        Chi-Square                             Rule (CDR)          Size
B5-2   12.97**           1.70          Lower-order strategies   DIF                 Moderate
B8-2   14.74**           0.74          Lower-order strategies   DIF                 Moderate
C9-2   0.00              6.00*         Higher-order strategies  DIF                 Small
D3-2   5.04*             2.38          Higher-order strategies  DIF                 Small
D6-2   32.75**           7.80**        Higher-order strategies  DIF                 Large
E4-2   5.95*             0.63          Lower-order strategies   DIF                 Small
E5-2   10.79**           1.03          Higher-order strategies  DIF                 Moderate
F1-2   10.21**           0.66          Higher-order strategies  DIF                 Moderate
Note. *p < .05, **p < .01
Table 13 shows the DIF results obtained for Booklet 2. The results differed between the
booklets: more items were flagged for DIF in Booklet 2, that is, 8 out of 37. Only one of these
test items showed a large effect and half of the DIF items showed a moderate effect; the rest
showed small levels of DIF. Also, for most of these items DIF was in favour of the reference
group, except three items (B5, B8, E4) with DIF in favour of the focal group. In other words,
again, most test items functioned in favour of students who used higher-order reading
strategies, and students who tended to use lower-order strategies performed better than expected
on three items out of the eight items flagged for DIF. See Appendix I for the full list of DIF
statistics for dichotomous items, Booklet 1 and Booklet 2.
Interestingly, the DIF results differed between the two booklets even for the items they
shared (anchor items B2, B3, B4, B5, B8 and B10); different anchor items were flagged for DIF
in each booklet. Also, all of the DIF items which favoured the focal group were anchor items,
with the exception of item E4 on Booklet 2. However, on the selected response items overall,
most DIF favoured the students who reported using higher-order reading strategies.
3.4.2 Polytomous Items and DSF
DIF analysis for polytomous items was conducted on all 13 constructed response items,
matching students on the total score (i.e., constructed response subscore). Again, the adequacy
of cell count assumption was verified. For Booklet 1, 24% of all cells (grouping by total score
matching variable) had fewer than five cases and 10% had a count of zero. For Booklet 2, 20%
of all cells had fewer than five cases and 6% had zero. Thus, the adequacy of cell count
assumption was almost met.
Table 14
DIF for Polytomous Items, Booklet 1
Item   Mantel-Haenszel   Step(s)         Direction of DIF         Combined Decision
       Chi-Square                                                 Rule (CDR)
A1-1   0.02              ---             ---                      ---
A2-1   3.57              ---             ---                      ---
A3-1   0.31              ---             ---                      ---
B1-1   0.09              ---             ---                      ---
B6-1   0.01              ---             ---                      ---
B7-1   0.00              ---             ---                      ---
B9-1   0.60              ---             ---                      ---
B11-1  0.03              ---             ---                      ---
D1-1   4.11*             1st             Higher-order strategies  DIF
D7-1   0.17              ---             ---                      ---
D8-1   7.32**            1st             Lower-order strategies   DIF
D9-1   2.25              ---             ---                      ---
D11-1  1.71              ---             ---                      ---
Note. *p < .05, **p < .01
Table 15
DSF for Polytomous Items, Booklet 1
Item Step CU-LOR Z DSF Size
D1-1 1 0.364 2.881* Small
2 0.229 1.369 Small
3 -0.592 -1.924 Small
D8-1 1 -0.315 -2.061* Small
2 -0.260 -1.938 Small
3 -0.333 -1.234 Small
Note. *p < .05
Tables 14 and 15 summarize DIF and DSF results for Booklet 1, respectively. No DIF
was detected for items of Booklet 2 (see Appendix I for the list of statistics). DIF was detected
for only two items on Booklet 1, one in favour of the focal group and another in favour of the
reference group. However, the effect was small and involved only the first step (change in
scoring from 0 to 1).
To summarize, (1) some differences in DIF for dichotomous and polytomous items
existed between the booklets and (2) most of the items flagged for DIF favoured the reference
group (use of higher-order reading strategies).
3.4.3 DIF with Scaled Matching Score, Dichotomous Items
It is conventional to perform DIF analyses using the total score as the matching
variable; however, since the dataset included a scaled reading ability measure, DIF analysis was
rerun using this variable as a matching variable. The scaled score had to be transformed into a
categorical variable before the data could be analyzed in DIFAS. The scaled score, which
ranged from 100 to 800, was divided into 30 equal percentile categories. The assumption of
adequacy of cell count was met. For Booklet 1, 2% of all cells (grouping by scaled score
matching variable) had fewer than five cases and no counts of zero were observed. For Booklet
2, all cells had more than five cases and none had zero.
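The transformation of the scaled score into 30 percentile categories can be sketched as below. This is a rank-based illustration with synthetic scores; the function name is hypothetical, and the handling of tied scores (which may land in adjacent categories here) is an assumption, since the thesis does not describe how ties were treated.

```python
from collections import Counter


def percentile_bins(scores, n_bins=30):
    """Assign each score to one of n_bins equal-frequency categories (0 .. n_bins - 1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, index in enumerate(order):
        # Integer arithmetic keeps the category sizes as equal as possible.
        bins[index] = rank * n_bins // len(scores)
    return bins


# Synthetic scaled scores: every integer from 100 to 699, in scrambled order.
scores = [100 + (i * 37) % 600 for i in range(600)]
bins = percentile_bins(scores)
print(Counter(bins).most_common(3))
```

Equal-frequency categories spread the students evenly across levels of the matching variable, which is consistent with the improved cell counts reported above.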
Table 16
DIF for Dichotomous Items with Scaled Matching Score, Booklet 1
Item   Mantel-Haenszel   Breslow-Day   Direction of DIF         Combined Decision   Effect
       Chi-Square        Chi-Square                             Rule (CDR)          Size
B2-1   0.03              0.09          ---                      ---                 ---
B3-1   8.16**            0.39          Lower-order strategies   DIF                 Small
B4-1   1.33              1.75          ---                      ---                 ---
B5-1   6.96**            4.49*         Higher-order strategies  DIF                 Small
B8-1   3.62              0.56          ---                      ---                 ---
B10-1  4.94              3.97          ---                      ---                 ---
C1-1   0.72              0.06          ---                      ---                 ---
C2-1   0.91              0.00          ---                      ---                 ---
C3-1   5.90*             0.64          Higher-order strategies  DIF                 Small
C4-1   0.04              0.21          ---                      ---                 ---
C5-1   0.07              0.22          ---                      ---                 ---
C6-1   0.00              0.11          ---                      ---                 ---
C7-1   11.36**           1.26          Higher-order strategies  DIF                 Moderate
C8-1   0.92              2.16          ---                      ---                 ---
D2-1   0.19              0.62          ---                      ---                 ---
D3-1   0.33              5.38*         Higher-order strategies  DIF                 Small
D4-1   0.66              0.48          ---                      ---                 ---
D5-1   0.14              0.36          ---                      ---                 ---
D6-1   4.32              2.73          ---                      ---                 ---
D10-1  1.67              3.21          ---                      ---                 ---
E1-1   3.45              4.62          ---                      ---                 ---
E2-1   0.19              0.12          ---                      ---                 ---
E3-1   3.31              2.78          ---                      ---                 ---
E4-1   8.96**            0.12          Higher-order strategies  DIF                 Small
E5-1   0.00              0.31          ---                      ---                 ---
E6-1   0.08              1.19          ---                      ---                 ---
E7-1   1.46              0.09          ---                      ---                 ---
E8-1   0.76              0.71          ---                      ---                 ---
E9-1   0.34              0.05          ---                      ---                 ---
F1-1   7.23**            2.93          Higher-order strategies  DIF                 Small
F2-1   6.30*             3.15          Higher-order strategies  DIF                 Small
F3-1   0.00              0.32          ---                      ---                 ---
F4-1   0.88              1.96          ---                      ---                 ---
F5-1   3.02              6.02*         Higher-order strategies  DIF                 Small
F6-1   10.65**           5.39*         Higher-order strategies  DIF                 Small
F7-1   0.10              2.02          ---                      ---                 ---
F8-1   0.00              0.00          ---                      ---                 ---
Note. *p < .05, **p < .01
Table 17
DIF for Dichotomous Items with Scaled Matching Score, Booklet 2
Item   Mantel-Haenszel   Breslow-Day   Direction of DIF         Combined Decision   Effect
       Chi-Square        Chi-Square                             Rule (CDR)          Size
B2-2   2.86              0.32          ---                      ---                 ---
B3-2   2.27              0.01          ---                      ---                 ---
B4-2   3.53              1.27          ---                      ---                 ---
B5-2   29.79**           1.04          Lower-order strategies   DIF                 Large
B8-2   17.17**           2.88          Lower-order strategies   DIF                 Moderate
B10-2  10.76**           3.47          Higher-order strategies  DIF                 Small
C1-2   0.06              3.96          ---                      ---                 ---
C2-2   17.01**           1.91          Higher-order strategies  DIF                 Moderate
C3-2   18.98**           1.10          Higher-order strategies  DIF                 Moderate
C4-2   10.73**           0.21          Higher-order strategies  DIF                 Moderate
C5-2   3.03              0.82          ---                      ---                 ---
C6-2   27.87**           0.78          Higher-order strategies  DIF                 Large
C7-2   6.45*             1.56          Higher-order strategies  DIF                 Small
C8-2   22.60**           0.00          Higher-order strategies  DIF                 Moderate
C9-2   6.29*             2.67          Higher-order strategies  DIF                 Small
D2-2   25.47**           0.42          Higher-order strategies  DIF                 Large
D3-2   29.57**           0.08          Higher-order strategies  DIF                 Moderate
D4-2   29.11**           0.21          Higher-order strategies  DIF                 Large
D5-2   4.75              0.07          ---                      ---                 ---
D6-2   85.43**           6.60**        Higher-order strategies  DIF                 Large
D7-2   1.96              0.34          ---                      ---                 ---
E1-2   5.75*             1.65          Higher-order strategies  DIF                 Small
E2-2   13.51**           0.01          Higher-order strategies  DIF                 Moderate
E3-2   10.14**           0.97          Higher-order strategies  DIF                 Moderate
E4-2   0.43              3.98          ---                      ---                 ---
E5-2   57.77**           0.05          Higher-order strategies  DIF                 Large
E6-2   2.48              3.37          ---                      ---                 ---
E7-2   17.16**           3.92          Higher-order strategies  DIF                 Moderate
E8-2   7.82*             1.06          Higher-order strategies  DIF                 Small
F1-2   0.13              2.65          ---                      ---                 ---
F2-2   24.62**           0.73          Higher-order strategies  DIF                 Moderate
F3-2   14.30**           0.06          Higher-order strategies  DIF                 Moderate
F4-2   3.21              8.40          Higher-order strategies  DIF                 Small
F5-2   4.58              1.02          ---                      ---                 ---
F6-2   4.18              0.02          ---                      ---                 ---
F7-2   0.02              1.27          ---                      ---                 ---
F8-2   22.90**           0.03          Higher-order strategies  DIF                 Moderate
Note. *p < .05, **p < .01
Table 16 reports the DIF statistics for all 37 selected response items for Booklet 1. DIF
analysis with the scaled score matching variable rather than the total score flagged 10 out of
37 items. Only one of these test items showed a moderate effect; the rest showed small levels
of DIF. Also, for most of these items DIF was in favour of the reference group, except one item
(B3) with DIF in favour of the focal group. In other words, most items functioned in favour of
students who tended to use higher-order reading strategies, and students who used lower-order
strategies performed better than expected on only one item out of the ten flagged items.
Table 17 shows the DIF analysis results obtained for Booklet 2. There was a large
difference in results between the booklets. DIF for Booklet 2 revealed that 65% of all selected
response items were flagged for DIF. Most of these items showed moderate-to-large effects.
However, similar to the Booklet 1 results, only two items were in favour of the focal group.
Anchor items also demonstrated a different pattern of DIF between the two booklets.
For Booklet 1, DIF was detected for B3 and B5, with small effects. In the case of Booklet 2,
items B5, B8, and B10 were flagged for DIF with small-to-large effects. Interestingly, only one
item in Booklet 1 and two in Booklet 2 (all three items were anchor items) were in favour of the
focal group, those who reported using lower-order reading strategies more often. However, on
all other flagged selected response items, differential item functioning favoured the reference
group.
3.4.4 DIF with Scaled Matching Score, Polytomous Items and DSF
In terms of the DIF for polytomous items, the results were similar to the previous
analysis with dichotomous items in that a large difference in the proportion of DIF items was
observed between the two booklets. See Tables 18 and 19 for DIF analyses of polytomous items
for Booklet 1 and Booklet 2, respectively.
Table 18
DIF for Polytomous Items with Scaled Matching Score, Booklet 1
Item   Mantel-Haenszel   Step(s)         Direction of DIF         Combined Decision
       Chi-Square                                                 Rule (CDR)
A1-1   0.00              ---             ---                      ---
A2-1   0.74              ---             ---                      ---
A3-1   0.58              ---             ---                      ---
B1-1   0.89              ---             ---                      ---
B6-1   3.01              ---             ---                      ---
B7-1   4.49*             3rd             Lower-order strategies   DIF
B9-1   2.07              ---             ---                      ---
B11-1  2.42              ---             ---                      ---
D1-1   0.19              ---             ---                      ---
D7-1   1.37              ---             ---                      ---
D8-1   16.45**           1st, 2nd, 3rd   Lower-order strategies   DIF
D9-1   9.92**            1st, 2nd        Lower-order strategies   DIF
D11-1  9.25**            1st, 2nd        Lower-order strategies   DIF
Note. *p < .05, **p < .01
Table 19
DIF for Polytomous Items with Scaled Matching Score, Booklet 2
Item   Mantel-Haenszel   Step(s)         Direction of DIF         Combined Decision
       Chi-Square                                                 Rule (CDR)
A1-2   16.80**           1st, 2nd, 3rd   Lower-order strategies   DIF
A2-2   30.40**           1st, 2nd        Lower-order strategies   DIF
A3-2   6.89**            2nd             Lower-order strategies   DIF
B1-2   26.97**           1st, 2nd        Lower-order strategies   DIF
B6-2   44.01**           1st, 2nd, 3rd   Lower-order strategies   DIF
B7-2   47.04**           1st, 2nd        Lower-order strategies   DIF
B9-2   38.49**           1st, 2nd, 3rd   Lower-order strategies   DIF
B11-2  23.17**           1st, 2nd, 3rd   Lower-order strategies   DIF
D1-2   47.92**           1st, 2nd, 3rd   Lower-order strategies   DIF
D8-2   43.38**           1st, 2nd, 3rd   Lower-order strategies   DIF
D9-2   44.36**           1st, 2nd        Lower-order strategies   DIF
D10-2  45.59**           1st, 2nd, 3rd   Lower-order strategies   DIF
D11-2  31.29**           1st, 2nd        Lower-order strategies   DIF
Note. **p < .01
DIF was detected for four items in Booklet 1 and all of the items in Booklet 2. The
replication of Booklet 2’s higher rate of DIF with constructed response items confirms that the
two booklets were in fact quite different. As mentioned earlier, lower test performance was
observed for examinees who completed Booklet 1 than for those who completed Booklet 2; perhaps
this lack of equivalency between the two booklets in terms of performance is implicated in these
DIF findings.
Another intriguing finding was that all constructed response items flagged for DIF, on both
booklets, were in favour of the focal group. For these items lower-order reading strategies
facilitated better performance. One potential explanation may be that selected responses
predominantly require reading, whereas constructed responses involve writing. Perhaps different
strategies pertaining to writing can promote higher scores on these items, but higher-order
reading strategies fail to do so.
The results of follow-up differential step functioning (DSF) analyses are reported in Tables
20 and 21. For Booklet 1, mostly small DSF effects were detected on the four items that
demonstrated DIF (see Table 20). For Booklet 2, moderate-to-large DSF effects were observed for
most of the steps (see Table 21). For both booklets, the majority of the items involved more
than one step.
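The DSF size labels in Tables 20 and 21 can be approximated from the magnitude of the cumulative log-odds ratio (CU-LOR). The sketch below assumes cut-offs of 0.43 and 0.64 on the log-odds scale, which correspond to the ETS delta thresholds of 1.0 and 1.5; these cut-offs are an assumption rather than something stated in this section. A few nonsignificant steps in the tables carry a different label than magnitude alone would give, so the original analysis may apply additional criteria.

```python
def dsf_size(cu_lor: float) -> str:
    """Label a step-level effect by the magnitude of its log-odds ratio.

    Assumed cut-offs: below 0.43 is small, below 0.64 is moderate, otherwise large.
    """
    magnitude = abs(cu_lor)
    if magnitude < 0.43:
        return "Small"
    if magnitude < 0.64:
        return "Moderate"
    return "Large"


def dsf_direction(cu_lor: float) -> str:
    """Negative values favour the focal group (lower-order strategies)."""
    return "Lower-order strategies" if cu_lor < 0 else "Higher-order strategies"


# CU-LOR values for item D8-1, steps 1 to 3, from Table 20.
for step, value in enumerate([-0.480, -0.395, -0.490], start=1):
    print(step, dsf_size(value), dsf_direction(value))
```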
Table 20
DSF for Polytomous Items with Scaled Matching Score, Booklet 1
Item Step CU-LOR Z DSF Size
B7-1 1 -0.278 -1.949 Small
2 -0.066 -0.473 Small
3 -0.674 -2.828* Large
D8-1 1 -0.480 -3.177** Moderate
2 -0.395 -2.979** Small
3 -0.490 -1.843* Moderate
D9-1 1 -0.323 -2.242* Small
2 -0.393 -2.849* Small
3 -0.225 -0.651 Small
D11-1 1 -0.354 -2.526* Small
2 -0.372 -2.550* Small
3 -0.128 -0.352 Small
Note. *p < .05, **p < .01
Table 21
DSF for Polytomous Items with Scaled Matching Score, Booklet 2
Item Step CU-LOR Z DSF Size
A1-2 1 -0.579 -3.760** Moderate
2 -0.313 -2.578* Small
3 -0.484 -2.034* Moderate
A2-2 1 -0.664 -4.972** Large
2 -0.494 -3.950** Moderate
3 -0.517 -1.871 Moderate
A3-2 1 -0.187 -1.672 Small
2 -0.420 -2.863* Small
3 -0.434 -1.517 Small
B1-2 1 -1.018 -4.959** Large
2 -0.440 -3.756** Moderate
3 -0.313 -1.327 Small
B6-2 1 -0.858 -6.032** Large
2 -0.639 -5.147** Moderate
3 -0.491 -2.646* Moderate
B7-2 1 -0.873 -6.334** Large
2 -0.703 -5.608** Large
3 -0.380 -1.620 Small
B9-2 1 -0.712 -5.278** Large
2 -0.551 -4.414** Moderate
3 -0.754 -3.278** Large
B11-2 1 -0.526 -4.120** Moderate
2 -0.484 -3.900** Moderate
3 -0.487 -2.323* Moderate
D1-2 1 -1.067 -5.146** Large
2 -0.640 -5.267** Large
3 -0.621 -3.284** Moderate
D8-2 1 -0.697 -5.163** Large
2 -0.681 -5.400** Large
3 -0.651 -2.004* Large
D9-2 1 -0.848 -4.829** Large
2 -0.759 -6.061** Large
3 -0.294 -1.187 Small
D10-2 1 -0.771 -5.059** Large
2 -0.669 -5.342** Large
3 -0.726 -2.420* Large
D11-2 1 -0.805 -4.562** Large
2 -0.534 -4.535** Moderate
3 -0.288 -1.128 Small
Note. *p < .05, **p < .01
An informal review of the content of the test booklets was also performed. The first
question posed was ‘why do certain items exhibit DIF?’ Test items appeared to differ along such
dimensions as length, difficulty level, specific/advanced vocabulary, etc. Some of the items
relied primarily on the content of the reading prompt, that is, the answers could be found
directly in the prompt. Other items demanded reflection and connecting many ideas; the answers
to such items were not in the reading prompt.
The second question addressed by the review of the booklets was ‘why wasn’t DIF
found for other items?’ This question asked whether the unflagged items shared any specific
characteristics that “protected” them from differential item functioning pertaining to reading
strategies. Once again, items that were not flagged for DIF were highly diverse, differing in
length, difficulty, demand (e.g., content versus reflection), etc. Overall, this preliminary
content review did not reveal any specific patterns. Therefore, the question ‘why was
performance on some items facilitated by higher-order reading strategies, but impeded on
others?’ remained.
To summarize, (1) large differences in DIF and DSF existed between the booklets, (2)
on selected response items most of the DIF favoured the reference group (use of higher-order
reading strategies), (3) on constructed response items most of the DIF favoured the focal group
(use of lower-order reading strategies) and (4) different DIF results were obtained for analyses
with total score and scaled score matching variables.
The findings of this study are encouraging even though evidence of DIF was
discovered. Specifically, the presence of DIF within this study does not point to item bias; rather
it provides support that this analysis can be used for other research purposes. Here, it
demonstrates that the general use of various reading strategies is important for achievement
outcomes; the same reading strategies can facilitate or hinder students’ performance based on
context (in this study, answering selected versus constructed response items) and, potentially,
content of the items. In conclusion, to put it simply, knowing and using higher-order reading
strategies is good, but being able to use situation-appropriate strategies and being able to switch
back and forth between different types of strategies is better.
4 IMPLICATIONS AND CONCLUSION
Differential item functioning offers a unique way of assessing test fairness and validity.
DIF occurs when test items are differentially difficult for individuals from different groups with
the same ability. Although this analysis has traditionally been used to identify problematic
items that assessed traits irrelevant to the test, this study demonstrates that it can also be used to
investigate attributes that are important to performance regardless of students’ ability. Thus, in
addition to examining fairness, DIF can be used as a tool to identify individual differences that
increase (or decrease) the probability of correctly responding to test items.
In this study, DIF analysis was used to examine whether employing different types of
strategies during reading affected students’ performance on a reading assessment. To
demonstrate that DIF offers a unique perspective on the effects of reading strategies on test
performance, it is useful to contrast the DIF findings with a similar analysis that does not
account for underlying reading ability. Chi-square tests of independence for each test item
suggested that it was always more advantageous to use higher-order reading strategies than
lower-order strategies on the PCAP 2007 reading assessment, with the exception of only one item
in Booklet 2. However, DIF analysis offers additional information: it demonstrates that, when
reading ability is taken into account, the use of higher-order reading strategies is not always
facilitative, because these strategies can facilitate or hinder performance on an item
depending on the context and, possibly, the content. That is, using DIF, this study found that
higher-order strategies were only effective for answering selected response items correctly,
whereas lower-order reading strategies were more helpful when answering constructed response
items.
Another reported finding was that the two booklets of the test had different magnitudes
of observed DIF. Again, the interpretation of this finding might be aided by comparing DIF
results to the chi-square statistics. For Booklet 1, without accounting for ability level, students
who reported using higher-order reading strategies performed significantly better on most of the
test items. For Booklet 2, however, the use of higher-order strategies was not significantly
related to answering constructed response items. For both booklets, using lower-order reading
strategies was never related to higher test performance.³ Yet, DIF results identified significant
differences in performance between students who reported using higher- and lower-order
strategies for both booklets. The implications of these findings are that (1) the effectiveness of
reading strategies depends on the students’ reading ability level and (2) demonstrating the
interaction between ability level and the use of reading strategies was made possible by
performing DIF analyses in addition to evaluating the effectiveness of reading strategies on their
own.
The finding that the performance on constructed response items was hindered by
higher-order reading strategies is counterintuitive. Answering these types of items correctly
might be qualitatively different from answering selected response items, because the former
involves a writing component. Therefore, it is possible that other higher-order strategies would
facilitate higher performance on constructed response items, but these strategies might be
specific to cognitive processes and skills that pertain to the writing process rather than reading.
Another interesting finding was the variation of DIF results when different estimates of
reading ability were used. As was shown before (Figure 1 and Appendix C), the total and scaled
scores were not well related within Booklet 2, which might have contributed to these DIF
results. This finding also points out that decisions regarding the choice of a matching variable
must be made carefully. In this study, however, even though the magnitude of the results
differed depending on the estimate of ability, the finding about the direction of DIF (i.e.,
better performance on selected response items by students using higher-order strategies and
better performance on constructed response items by students using lower-order reading
strategies) persisted regardless of which matching variable was employed in the analysis.⁴
³ One exception is item B8 on Booklet 2; without accounting for ability level, this selected
response item was in the direction of lower-order reading strategies.
One of the merits of using differential item functioning is that this procedure helps shed
additional light on other conventional analyses and helps to fine-tune the interpretation of the
findings. To conclude, DIF offers a unique approach to studying the relationships of individual
differences with test performance.
4.1 Limitations and Future Directions
A major limitation of this study is that the questions on the student contextual
questionnaire assessing the use of reading strategies asked about general behaviours and
preferences. Thus, an assumption underlying this study was that students who reported using,
for example, higher-order reading strategies more often than lower-order strategies, used these
strategies during the examination as well. Another limitation involves the generalizability of
the results; only PCAP reading assessment data were analyzed in this study. However, the use of
specific reading strategies might affect other domains differently, as was shown here with
writing. Using other datasets, such as PCAP mathematics and/or science assessments, it would
be interesting to investigate whether the results of this study are generalizable to other
domains. Also, replicating the results with an alternative reading assessment would further
strengthen the generalizability of the results of this study.
⁴ One exception is item D1 on Booklet 1 for DIF analysis with the total score matching
variable; this constructed response item was in the direction of higher-order reading
strategies.
An investigation of the role of metacognitive strategies during reading would also
supplement the findings of this study. As discussed in the literature review, metacognitive
strategies aid in monitoring the effectiveness of cognitive reading strategies on successful
comprehension. Thus, it would be interesting to examine if, and how, metacognitive strategies
affect the choice of cognitive reading strategies (both lower- and higher-order). Another
future direction involves performing a content analysis of the test. Specifically, the content
of the reading prompts and item stems/options can be classified according to predetermined
criteria such as length, difficulty level, theme, etc., and analyzed for patterns; such a
procedure was beyond the scope of the present study. Exploring the dimensionality of the two
versions of the
test with alternative analyses would also strengthen the results found here.
REFERENCES
Afflerbach, P., Pearson, P. D., & Paris, S. G. (2008). Clarifying differences between reading
skills and reading strategies. The Reading Teacher, 61(5), 364-373.
Cain, K., Oakhill, J. V., Barnes, M. A., & Bryant, P. E. (2001). Comprehension skill,
inference-making ability, and their relation to knowledge. Memory & Cognition, 29(6),
850-859.
Dole, J. A., Duffy, G. G., Roehler, L. R., & Pearson, P. D. (1991). Moving from the old to the
new: Research on reading comprehension instruction. Review of Educational
Research, 61(2), 239-264.
Dole, J. A., Nokes, J. D., & Drits, D. (2008). Cognitive strategy instruction. In G. G. Duffy & S.
E. Israel (Eds.), Handbook of research on reading comprehension (pp. 347-373).
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Fogarty, E. A. (2006). Teachers’ use of differentiated reading strategy instruction for talented,
average, and struggling readers in regular and SEM-R classrooms (Doctoral
dissertation). Retrieved April 27, 2012, from
http://www.gifted.uconn.edu/siegle/Dissertations/Elizabeth%20Fogarty.pdf
Fries, C. C. (1963). Linguistics and reading. New York: Holt, Rinehart & Winston.
Graesser, A. C. (2007). An introduction to strategic reading comprehension. In D. S. McNamara
(Ed.), Reading comprehension strategies: Theories, interventions, and technologies
(pp. 3-26). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel
procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale,
NJ: Lawrence Erlbaum Associates, Inc.
Hoover, W. A., & Gough, P. B. (1990). The simple view of reading. Reading and Writing: An
Interdisciplinary Journal, 2(2), 127-160.
Koda, K. (2005). Insights into second language reading: A cross-linguistic approach. New
York: Cambridge University Press.
Magliano, J. P., Millis, K., Ozuru, Y., & McNamara, D. S. (2007). A multidimensional
framework to evaluate reading assessment tools. In D. S. McNamara (Ed.), Reading
comprehension strategies: Theories, interventions, and technologies (pp. 107-136).
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
McNamara, D. S. (2007). Reading comprehension strategies: Theories, interventions, and
technologies. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
McNamara, D. S., O’Reilly, T., Rowe, M., Boonthum, C., & Levinstein, I. (2007). iSTART: A
web-based tutor that teaches self-explanation and metacognitive reading strategies.
In D. S. McNamara (Ed.), Reading comprehension strategies: Theories,
interventions, and technologies (pp. 397-420). Mahwah, NJ: Lawrence Erlbaum
Associates, Inc.
Mokhtari, K., & Reichard, C. A. (2002). Assessing students’ metacognitive awareness of
reading strategies. Journal of Educational Psychology, 94(2), 249-259.
Oakhill, J., & Cain, K. (2007). Issues of causality in children’s reading comprehension. In D. S.
McNamara (Ed.), Reading comprehension strategies: Theories, interventions, and
technologies (pp. 47-71). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Oakhill, J., & Yuill, N. (1996). Higher order factors in comprehension disability: Processes and
remediation. In C. Cornoldi & J. Oakhill (Eds.), Reading comprehension difficulties:
Processes and intervention (pp. 69-92). Mahwah, NJ: Lawrence Erlbaum
Associates, Inc.
Office of Educational Assessment. (2005). ScorePak®: Item Analysis. Seattle, WA: Office of
Educational Assessment. Retrieved August 28, 2012 from
http://www.washington.edu/oea/pdfs/resources/item_analysis.pdf
O’Reilly, T., & McNamara, D. S. (2007). The impact of science knowledge, reading skill, and
reading strategy knowledge on more traditional “high-stakes” measures of high
school students’ science achievement. American Educational Research Journal,
44(1), 161-196.
Paris, S. G., Lipson, M. Y., & Wixson, K. K. (1983). Becoming a strategic reader.
Contemporary Educational Psychology, 8, 293-316.
Penfield, R. D. (2005). DIFAS: Differential item functioning analysis system. Applied
Psychological Measurement, 29(2), 150-151.
Penfield, R. D. (2007). DIFAS 4.0 user’s manual. Retrieved October 24, 2011, from
http://www.education.miami.edu/facultysites/penfield/index.html
Penfield, R. D., Gattamorta, K., & Childs, R. A. (2009). An NCME instructional module on
using differential step functioning to refine the analysis of DIF in polytomous items.
Educational Measurement: Issues and Practice, 28(1), 38-49.
Perfetti, C. (2001). Reading skills. In N. J. Smelser & P. B. Baltes (Eds.), International
encyclopedia of the social & behavioral sciences (pp. 12800-12805). Oxford:
Pergamon.
Rapp, D. N., van den Broek, P., McMaster, K. L., Kendeou, P., & Espin, C. A. (2007). Higher-
order comprehension processes in struggling readers: A perspective for research and
intervention. Scientific Studies of Reading, 11(4), 289-312.
Sweet, A. P., & Snow, C. E. (2003). Rethinking reading comprehension. New York: Guilford
Press.
VanderVeen, A., Huff, K., Gierl, M., McNamara, D. S., Louwerse, M., & Graesser, A. C.
(2007). Developing and validating instructionally relevant reading competency
profiles measured by the critical reading section of the SAT Reasoning Test. In D. S.
McNamara (Ed.), Reading comprehension strategies: Theories, interventions, and
technologies (pp. 137-172). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
APPENDIX A. FACTOR ANALYSES FOR BOOKLET 1 AND BOOKLET 2
3-Factor Model: Reading Strategies, Booklet 1
Questions Pattern
1 2 3
Trying to make connections to what I already know .745
Thinking about the author’s message .635
Applying what I know about word origins or word parts .616
Looking for clues such as headings or captions .471 .286
Thinking about the other words in a sentence to figure out the meaning .368 .258
Trying to predict what the material is about .318 .234
Asking someone to help me .641 -.112
Sounding out as many words as I can .552
Reading out loud to myself -.142 .550 .153
Highlighting or making notes or drawings on the important parts .109 .388
Using an outside source like a dictionary .161 .363
Looking at charts and pictures .225 .317
Re-reading the more difficult parts .689
Finding a quiet place to read .579
Sometimes reading more quickly or more slowly, depending on the material .500
2-Factor Model, Booklet 1
Questions Pattern
1 2
Trying to make connections to what I already know .770
Thinking about the author’s message .653
Applying what I know about word origins or word parts .581
Thinking about the other words in a sentence to figure out the meaning .517
Looking for clues such as headings or captions .451 .267
Trying to predict what the material is about .434
Asking someone to help me -.146 .612
Reading out loud to myself .571
Sounding out as many words as I can .553
Highlighting or making notes or drawings on the important parts .115 .390
Using an outside source like a dictionary .187 .364
Looking at charts and pictures .227 .310
3-Factor Model: Reading Strategies, Booklet 2
Questions Pattern
1 2 3
Trying to make connections to what I already know .703
Thinking about the author’s message .651 -.111
Applying what I know about word origins or word parts .616
Thinking about the other words in a sentence to figure out the meaning .403 .246
Looking for clues such as headings or captions .399 .361
Trying to predict what the material is about .336 .206
Sounding out as many words as I can .634
Asking someone to help me .598
Reading out loud to myself -.195 .566 .178
Highlighting or making notes or drawings on the important parts .110 .393
Using an outside source like a dictionary .215 .347
Looking at charts and pictures .248 .336
Re-reading the more difficult parts .627
Finding a quiet place to read .605
Sometimes reading more quickly or more slowly, depending on the material .477
2-Factor Model, Booklet 2
Questions Pattern
1 2
Trying to make connections to what I already know .747
Thinking about the author’s message .670 -.123
Applying what I know about word origins or word parts .573
Thinking about the other words in a sentence to figure out the meaning .544
Trying to predict what the material is about .447
Looking for clues such as headings or captions .393 .344
Sounding out as many words as I can .632
Asking someone to help me -.107 .581
Reading out loud to myself .580
Highlighting or making notes or drawings on the important parts .105 .395
Using an outside source like a dictionary .198 .344
Looking at charts and pictures .228 .322
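One way to read the pattern matrices above is to assign each questionnaire item to the factor on which its loading is largest in absolute value. The short Python sketch below illustrates that reading for three rows of the 2-factor model for Booklet 2 in which both loadings are printed. The function name and the largest-absolute-loading rule are assumptions made for illustration; they are not a description of the procedure used in the factor analysis.

```python
def assign_factor(loadings):
    """Return the 1-based index of the factor with the largest absolute loading.

    Assumption: an item belongs to the factor on which it loads most strongly.
    """
    best = max(range(len(loadings)), key=lambda i: abs(loadings[i]))
    return best + 1


# Pattern loadings copied from the 2-factor model for Booklet 2 (both columns printed).
booklet2_two_factor = {
    "Looking for clues such as headings or captions": [0.393, 0.344],
    "Asking someone to help me": [-0.107, 0.581],
    "Thinking about the author's message": [0.670, -0.123],
}

assignments = {item: assign_factor(values) for item, values in booklet2_two_factor.items()}
for item, factor in assignments.items():
    print(f"{item}: factor {factor}")
```

Under this rule, an item with two similar loadings, such as "Looking for clues such as headings or captions", is still assigned to a single factor, so cross-loadings need separate inspection.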
APPENDIX B. MEANS, STANDARD DEVIATIONS AND FREQUENCIES OF
STUDENTS’ QUESTIONNAIRE RESPONSES FOR GROUPING VARIABLE
SAMPLE, BOOKLET 1 AND BOOKLET 2
Grouping Variable Sample, Booklet 1 [N = 2667]
Columns: Questions; M; SD; Score Distributions (Rarely or never; Sometimes; Often)
Higher-order Reading Strategies
Trying to make connections to what I already know 2.32 0.66 10.9% 45.8% 43.3%
Thinking about the author’s message 2.04 0.75 26.2% 43.4% 30.4%
Applying what I know about word origins or word parts 1.98 0.71 26.6% 49.3% 24.1%
Thinking about the other words in a sentence to figure out the meaning 2.29 0.68 12.9% 45.1% 42.0%
Trying to predict what the material is about 2.11 0.71 20.2% 48.2% 31.6%
Looking for clues such as headings or captions 1.95 0.70 27.3% 50.3% 22.5%
Lower-order Reading Strategies
Asking someone to help me 1.52 0.68 58.6% 30.7% 10.7%
Sounding out as many words as I can 1.43 0.64 64.4% 27.8% 7.8%
Reading out loud to myself 1.52 0.69 58.9% 29.7% 11.4%
Highlighting or making notes or drawings on the important parts 1.43 0.67 67.2% 22.7% 10.1%
Using an outside source like a dictionary 1.65 0.70 48.4% 38.2% 13.4%
Looking at charts and pictures 2.02 0.75 26.8% 44.5% 28.7%
Grouping Variable Sample, Booklet 2 [N = 2623]
Columns: Questions; M; SD; Score Distributions (Rarely or never; Sometimes; Often)
Higher-order Reading Strategies
Trying to make connections to what I already know 2.32 0.66 10.5% 46.5% 43.0%
Thinking about the author’s message 2.00 0.75 28.4% 43.1% 28.5%
Applying what I know about word origins or word parts 1.96 0.72 27.5% 48.7% 23.8%
Thinking about the other words in a sentence to figure out the meaning 2.30 0.68 12.4% 45.3% 42.3%
Trying to predict what the material is about 2.12 0.72 20.3% 47.5% 32.2%
Looking for clues such as headings or captions 1.95 0.71 27.7% 49.7% 22.6%
Lower-order Reading Strategies
Asking someone to help me 1.50 0.67 59.5% 30.5% 10.0%
Sounding out as many words as I can 1.45 0.64 62.9% 28.7% 8.3%
Reading out loud to myself 1.51 0.68 59.7% 29.6% 10.7%
Highlighting or making notes or drawings on the important parts 1.43 0.67 66.6% 23.6% 9.8%
Using an outside source like a dictionary 1.64 0.69 48.1% 39.8% 12.0%
Looking at charts and pictures 1.99 0.74 28.1% 44.8% 27.1%
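The means and standard deviations in these tables can be recovered from the response distributions. The Python sketch below assumes that the response options were coded 1 (Rarely or never), 2 (Sometimes), and 3 (Often), and that the standard deviation is computed over the whole distribution; both the coding and the function name are assumptions made for illustration. It is applied to the Booklet 2 row for "Looking at charts and pictures".

```python
from math import sqrt


def mean_sd_from_percentages(percentages, codes=(1, 2, 3)):
    """Mean and standard deviation of a coded response distribution.

    Assumption: options are coded 1 = Rarely or never, 2 = Sometimes, 3 = Often.
    """
    total = sum(percentages)
    proportions = [p / total for p in percentages]
    mean = sum(code * prop for code, prop in zip(codes, proportions))
    variance = sum(prop * (code - mean) ** 2 for code, prop in zip(codes, proportions))
    return mean, sqrt(variance)


# Booklet 2, "Looking at charts and pictures": 28.1% / 44.8% / 27.1%.
mean, sd = mean_sd_from_percentages([28.1, 44.8, 27.1])
print(round(mean, 2), round(sd, 2))
```

For this row the sketch gives values that round to the tabled M = 1.99 and SD = 0.74, which is consistent with the assumed 1 to 3 coding.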
APPENDIX C. THE RELATIONSHIP BETWEEN TOTAL AND SCALED SCORES
FOR BOOKLET 1 AND BOOKLET 2
Booklet 1 [figure not reproduced]
Booklet 2 [figure not reproduced]
APPENDIX D. MISSING ITEM DATA FOR BOOKLET 1 AND BOOKLET 2
Booklet 1
Item Frequencies (Percent)
0 1 2 3 Missing
Constructed Response Items
A1-1 6.3 47.5 20.5 5.1 20.6
A2-1 6.8 36.9 30.9 6.3 19.1
A3-1 2.6 17.5 9.7 4.8 65.5
B1-1 9.6 36.2 44.5 6.1 3.6
B6-1 17.1 25.5 38.3 9.4 9.8
B7-1 22.8 23.7 35.4 6.7 11.3
B9-1 24.5 27.7 34.7 5.8 7.3
B11-1 23.7 24.6 32.9 6.9 11.8
D1-1 42.2 24.0 17.3 4.4 12.2
D7-1 20.9 39.5 21.8 3.2 14.6
D8-1 15.2 38.8 31.0 5.5 9.6
D9-1 14.2 35.5 32.0 5.2 13.1
D11-1 16.9 34.7 26.7 4.4 17.3
Item A B C D Missing
Selected Response Items
B2-1 4.3 91.1 2.9 0.6 1.1
B3-1 2.8 3.6 13.1 79.2 1.3
B4-1 3.6 76.8 7.6 10.9 1.0
B5-1 79.0 6.4 6.6 7.0 1.0
B8-1 80.0 6.1 1.4 10.1 2.5
B10-1 56.6 12.9 9.8 17.7 3.0
C1-1 3.7 91.1 0.9 2.8 1.6
C2-1 6.4 2.3 2.5 87.3 1.4
C3-1 14.0 21.7 6.6 56.3 1.4
C4-1 3.1 87.6 3.0 5.1 1.2
C5-1 13.1 13.3 1.0 71.1 1.5
C6-1 10.5 6.4 7.3 74.4 1.4
C7-1 37.3 10.1 38.3 12.1 2.1
C8-1 2.8 62.2 17.1 16.1 1.8
D2-1 14.3 16.3 32.7 33.0 3.9
D3-1 35.4 5.6 52.8 3.5 2.7
D4-1 13.0 19.5 53.4 11.5 2.6
D5-1 7.8 32.1 46.3 11.4 2.3
D6-1 11.8 5.9 54.9 24.8 2.6
D10-1 21.6 7.4 54.6 10.6 5.9
E1-1 8.3 20.1 6.1 62.4 3.1
E2-1 14.2 16.7 18.9 47.4 2.8
E3-1 13.5 65.8 5.1 12.7 2.9
E4-1 58.7 13.1 3.7 21.7 2.8
E5-1 17.2 23.2 53.0 3.7 2.9
E6-1 61.0 16.3 11.6 6.4 4.7
E7-1 7.1 14.5 25.6 48.1 4.6
E8-1 7.2 18.5 9.2 60.6 4.4
E9-1 6.8 11.6 71.5 5.6 4.6
F1-1 58.8 15.2 13.1 9.5 3.5
F2-1 3.3 60.3 10.9 22.1 3.4
F3-1 9.2 7.5 2.1 78.0 3.2
F4-1 5.7 21.7 2.9 66.0 3.7
F5-1 23.8 12.0 41.5 18.9 3.8
F6-1 6.7 9.6 72.2 7.9 3.6
F7-1 86.2 7.1 1.7 1.2 3.8
F8-1 30.2 57.1 2.4 6.3 4.0
Booklet 2
Item Frequencies (Percent)
0 1 2 3 Missing
Constructed Response Items
A1-2 4.0 46.6 29.0 6.5 13.8
A2-2 2.9 36.9 30.5 5.4 24.3
A3-2 1.4 25.9 12.9 4.0 55.8
B1-2 9.2 36.8 44.8 6.4 2.8
B6-2 18.2 27.2 35.8 10.2 8.6
B7-2 22.5 24.1 35.7 7.8 9.9
B9-2 23.3 29.8 34.3 6.4 6.2
B11-2 23.1 25.3 33.2 8.5 9.9
D1-2 7.3 38.7 38.2 10.3 5.4
D8-2 15.0 36.7 30.9 4.1 13.3
D9-2 8.7 30.4 47.6 6.8 6.5
D10-2 11.7 39.0 34.1 4.5 10.7
D11-2 6.2 34.8 44.3 6.6 8.2
Item A B C D Missing
Selected Response Items
B2-2 4.4 91.2 3.1 0.5 0.9
B3-2 2.6 4.1 13.3 78.6 1.4
B4-2 3.5 76.3 8.2 11.2 0.9
B5-2 78.7 6.1 7.1 7.3 0.8
B8-2 79.9 6.1 1.7 10.3 2.1
B10-2 56.4 13.4 9.7 17.8 2.7
C1-2 10.3 9.3 71.2 6.0 3.2
C2-2 87.9 4.3 5.0 1.9 0.9
C3-2 5.5 1.5 4.1 87.8 1.1
C4-2 2.7 5.9 85.8 4.4 1.2
C5-2 2.9 90.1 2.7 3.3 1.0
C6-2 2.5 79.1 12.7 4.6 1.1
C7-2 81.1 10.5 2.2 4.8 1.4
C8-2 3.1 10.4 81.6 3.4 1.6
C9-2 15.6 11.1 68.5 3.2 1.7
D2-2 6.5 6.6 5.4 79.4 2.2
D3-2 31.9 4.8 7.0 53.7 2.6
D4-2 6.8 82.1 3.3 5.6 2.1
D5-2 2.0 3.6 89.1 3.3 2.0
D6-2 8.2 13.0 68.3 8.2 2.3
D7-2 13.1 24.5 13.5 46.2 2.7
E1-2 21.5 2.7 66.4 6.9 2.5
E2-2 73.4 7.6 9.2 6.9 2.9
E3-2 6.4 4.1 2.9 84.0 2.6
E4-2 4.1 79.1 3.6 10.1 3.1
E5-2 12.6 62.3 15.4 7.1 2.6
E6-2 3.8 2.9 13.6 77.0 2.6
E7-2 13.3 70.5 2.2 11.2 2.6
E8-2 6.7 8.7 74.5 7.2 2.8
F1-2 3.3 7.4 74.3 12.7 2.3
F2-2 16.7 2.2 8.6 70.0 2.5
F3-2 76.0 4.7 8.5 8.3 2.5
F4-2 1.4 1.0 1.4 93.9 2.3
F5-2 2.0 2.4 10.9 82.1 2.6
F6-2 15.4 3.1 19.6 58.8 3.2
F7-2 12.9 68.8 5.2 9.9 3.1
F8-2 4.5 23.9 7.4 60.9 3.3
APPENDIX E. GROUPING VARIABLE SAMPLE: ITEM STATISTICS FOR
BOOKLET 1 AND BOOKLET 2
Item Statistics, Booklet 1 [N = 2667]
Columns: Item; Item Format; Section/Source; p-value; rpb; Proportion Incorrect; Mean Score Correct†; Mean Score Incorrect
1 Constructed A1-1 .39 .44 .21 43.96† 33.76
2 Constructed A2-1 .46 .55 .18 44.48† 29.60
3 Constructed A3-1 .22 .35 .61 47.21† 38.44
4 Constructed B1-1_Anchor .52 .42 .10 43.08† 29.95
5 Selected B2-1_Anchor .93 .20 .07 42.53 32.01
6 Selected B3-1_Anchor .82 .18 .18 43.00 36.40
7 Selected B4-1_Anchor .81 .32 .19 43.88 33.02
8 Selected B5-1_Anchor .85 .41 .15 44.02 28.96
9 Constructed B6-1_Anchor .49 .56 .20 44.76† 30.18
10 Constructed B7-1_Anchor .45 .59 .26 45.76† 30.32
11 Selected B8-1_Anchor .84 .26 .16 43.32 33.73
12 Constructed B9-1_Anchor .44 .57 .24 45.42† 30.74
13 Selected B10-1_Anchor .64 .37 .36 45.52 35.22
14 Constructed B11-1_Anchor .44 .53 .28 45.24† 32.82
15 Selected C1-1 .94 .31 .06 42.86 26.81
16 Selected C2-1 .92 .31 .08 43.04 28.79
17 Selected C3-1 .64 .38 .36 45.63 35.16
18 Selected C4-1 .91 .25 .09 42.87 31.49
19 Selected C5-1 .75 .28 .25 43.97 35.25
20 Selected C6-1 .80 .33 .20 44.01 32.95
21 Selected C7-1 .45 .32 .55 46.59 37.95
22 Selected C8-1 .67 .26 .33 44.35 36.74
23 Constructed D1-1 .29 .48 .47 47.23† 35.62
24 Selected D2-1 .38 .23 .62 45.98 39.33
25 Selected D3-1 .59 .30 .41 45.24 36.86
26 Selected D4-1 .56 .22 .44 44.62 38.25
27 Selected D5-1 .54 .40 .46 46.69 36.13
28 Selected D6-1 .63 .37 .37 45.50 35.46
29 Constructed D7-1 .35 .53 .28 45.79† 31.36
30 Constructed D8-1 .43 .48 .19 44.31† 31.47
31 Constructed D9-1 .43 .56 .21 44.79† 30.80
32 Selected D10-1 .59 .22 .41 44.38 38.10
33 Constructed D11-1 .39 .57 .26 45.50† 31.34
34 Selected E1-1 .69 .36 .31 44.96 34.77
35 Selected E2-1 .53 .27 .47 45.30 37.90
36 Selected E3-1 .73 .35 .27 44.64 34.12
37 Selected E4-1 .69 .44 .31 45.63 33.45
38 Selected E5-1 .57 .23 .43 44.63 38.10
39 Selected E6-1 .71 .45 .29 45.55 32.92
40 Selected E7-1 .56 .37 .44 46.18 36.28
41 Selected E8-1 .68 .31 .32 44.70 35.77
42 Selected E9-1 .80 .46 .20 44.76 29.93
43 Selected F1-1 .67 .33 .33 44.93 35.42
44 Selected F2-1 .66 .30 .34 44.73 36.16
45 Selected F3-1 .84 .35 .16 43.83 31.58
46 Selected F4-1 .70 .29 .30 44.36 35.83
47 Selected F5-1 .50 .34 .50 46.44 37.28
48 Selected F6-1 .78 .31 .22 44.02 34.16
49 Selected F7-1 .89 .15 .11 42.57 35.56
50 Selected F8-1 .65 .40 .35 45.68 34.71
Note. Indices of item difficulty below 0.40 and indices of item discrimination below .25 are in bold. †For constructed response items, mean score correct includes score of 1 or above (i.e., partial and/or full credit).
Item Statistics, Booklet 2 [N = 2623]
Columns: Item; Item Format; Section/Source; p-value; rpb; Proportion Incorrect; Mean Score Correct†; Mean Score Incorrect
1 Constructed A1-2 .42 .42 .18 47.35† 38.68
2 Constructed A2-2 .38 .41 .27 48.08† 39.70
3 Constructed A3-2 .23 .31 .55 49.19† 43.01
4 Constructed B1-2_Anchor .49 .42 .11 47.00† 36.22
5 Selected B2-2_Anchor .92 .18 .08 46.41 39.26
6 Selected B3-2_Anchor .79 .12 .21 46.62 42.76
7 Selected B4-2_Anchor .76 .24 .24 47.32 40.89
8 Selected B5-2_Anchor .80 .37 .20 47.72 38.16
9 Constructed B6-2_Anchor .44 .51 .26 48.45† 38.28
10 Constructed B7-2_Anchor .41 .54 .31 49.11† 38.37
11 Selected B8-2_Anchor .81 .21 .19 46.94 40.96
12 Constructed B9-2_Anchor .40 .52 .29 48.77† 38.43
13 Selected B10-2_Anchor .65 .16 .35 47.28 43.10
14 Constructed B11-2_Anchor .40 .49 .32 48.63† 39.79
15 Selected C1-2 .74 .11 .26 46.70 43.28
16 Selected C2-2 .92 .18 .08 46.41 39.11
17 Selected C3-2 .92 .25 .08 46.60 36.92
18 Selected C4-2 .89 .20 .11 46.58 39.29
19 Selected C5-2 .94 .18 .06 46.30 38.00
20 Selected C6-2 .84 .20 .16 46.81 40.59
21 Selected C7-2 .84 .13 .16 46.52 42.21
22 Selected C8-2 .88 .24 .12 46.79 38.95
23 Selected C9-2 .72 .10 .28 46.66 43.62
24 Constructed D1-2 .50 .50 .12 47.15† 35.50
25 Selected D2-2 .86 .27 .14 46.97 38.61
26 Selected D3-2 .61 .20 .39 47.70 42.84
27 Selected D4-2 .88 .23 .12 46.72 39.16
28 Selected D5-2 .94 .23 .06 46.41 36.35
29 Selected D6-2 .76 .25 .24 47.35 40.88
30 Selected D7-2 .50 .10 .50 47.27 44.37
31 Constructed D8-2 .38 .49 .28 48.43† 39.02
32 Constructed D9-2 .50 .55 .14 47.56† 35.12
33 Constructed D10-2 .41 .52 .21 48.07† 37.10
34 Constructed D11-2 .48 .50 .14 47.40† 35.82
35 Selected E1-2 .72 .19 .28 47.19 42.27
36 Selected E2-2 .77 .17 .23 46.93 42.08
37 Selected E3-2 .89 .24 .11 46.69 38.54
38 Selected E4-2 .81 .14 .19 46.61 42.32
39 Selected E5-2 .70 .26 .30 47.69 41.40
40 Selected E6-2 .83 .21 .17 46.90 40.64
41 Selected E7-2 .78 .21 .22 47.10 41.22
42 Selected E8-2 .80 .25 .20 47.17 40.35
43 Selected F1-2 .81 .19 .19 46.87 41.36
44 Selected F2-2 .78 .16 .22 46.86 42.15
45 Selected F3-2 .82 .22 .18 46.95 40.53
46 Selected F4-2 .96 .27 .04 46.36 32.08
47 Selected F5-2 .87 .24 .13 46.79 38.98
48 Selected F6-2 .65 .15 .35 47.22 43.23
49 Selected F7-2 .73 .14 .27 46.90 42.94
50 Selected F8-2 .66 .13 .34 47.04 43.39
Note. Indices of item difficulty below 0.40 and indices of item discrimination below .25 are in bold. †For constructed response items, mean score correct includes score of 1 or above (i.e., partial and/or full credit).
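For readers less familiar with the two indices reported above, the following Python sketch shows one common way of computing them for a dichotomous item. It assumes that the p-value is the proportion of students who answered correctly and that rpb is the Pearson correlation between the 0/1 item score and the total score, with the item itself included in the total. The function names and the four-student data set are hypothetical and are not taken from the PCAP data.

```python
from math import sqrt


def item_difficulty(item_scores):
    """Proportion of students answering a dichotomous (0/1) item correctly."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Pearson correlation between 0/1 item scores and total test scores."""
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    covariance = sum(
        (x - mean_item) * (y - mean_total) for x, y in zip(item_scores, total_scores)
    )
    ss_item = sum((x - mean_item) ** 2 for x in item_scores)
    ss_total = sum((y - mean_total) ** 2 for y in total_scores)
    return covariance / sqrt(ss_item * ss_total)


# Hypothetical responses for four students (illustration only).
item = [1, 1, 0, 0]
totals = [4, 3, 2, 1]

difficulty = item_difficulty(item)              # 0.5
discrimination = point_biserial(item, totals)   # 2 / sqrt(5), about 0.894
print(round(difficulty, 2), round(discrimination, 3))
```

The sketch requires variation in both the item scores and the total scores; an item that every student answers correctly has no defined point-biserial correlation.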
Item Discrimination Indices, Booklet 1 [N = 2667]
Columns: Item; Item Discrimination (rpb for constructed response items; rpb for selected response items; rpb for all items)
Constructed Response Items
A1-1 .48 --- .44
A2-1 .56 --- .55
A3-1 .38 --- .35
B1-1 .43 --- .42
B6-1 .56 --- .56
B7-1 .60 --- .59
B9-1 .57 --- .57
B11-1 .54 --- .53
D1-1 .49 --- .48
D7-1 .56 --- .53
D8-1 .52 --- .48
D9-1 .59 --- .56
D11-1 .61 --- .57
Selected Response Items
B2-1 --- .22* .20*
B3-1 --- .18** .18**
B4-1 --- .33 .32
B5-1 --- .42 .41
B8-1 --- .26 .26
B10-1 --- .40 .37
C1-1 --- .34 .31
C2-1 --- .34 .31
C3-1 --- .41 .38
C4-1 --- .31 .25
C5-1 --- .28 .28
C6-1 --- .36 .33
C7-1 --- .34 .32
C8-1 --- .29 .26
D2-1 --- .22* .23*
D3-1 --- .31 .30
D4-1 --- .22* .22*
D5-1 --- .38 .40
D6-1 --- .38 .37
D10-1 --- .19** .22*
E1-1 --- .39 .36
E2-1 --- .28 .27
E3-1 --- .39 .35
E4-1 --- .46 .44
E5-1 --- .26 .23*
E6-1 --- .47 .45
E7-1 --- .40 .37
E8-1 --- .30 .31
E9-1 --- .47 .46
F1-1 --- .37 .33
F2-1 --- .33 .30
F3-1 --- .38 .35
F4-1 --- .30 .29
F5-1 --- .35 .34
F6-1 --- .33 .31
F7-1 --- .18** .15**
F8-1 --- .40 .40
Note. *rpb is between .200 and .249; **rpb is between .150 and .199.
Item Discrimination Indices, Booklet 2 [N = 2623]
Columns: Item; Item Discrimination (rpb for constructed response items; rpb for selected response items; rpb for all items)
Constructed Response Items
A1-2 .50 --- .42
A2-2 .57 --- .41
A3-2 .39 --- .31
B1-2 .52 --- .42
B6-2 .60 --- .51
B7-2 .63 --- .54
B9-2 .61 --- .52
B11-2 .58 --- .49
D1-2 .60 --- .50
D8-2 .59 --- .49
D9-2 .67 --- .55
D10-2 .64 --- .52
D11-2 .62 --- .50
Selected Response Items
B2-2 --- .03*** .18**
B3-2 --- .01*** .12***
B4-2 --- .04*** .24*
B5-2 --- .06*** .37
B8-2 --- .03*** .21*
B10-2 --- .30 .16**
C1-2 --- .19** .11***
C2-2 --- .34 .18**
C3-2 --- .41 .25
C4-2 --- .38 .20*
C5-2 --- .34 .18**
C6-2 --- .35 .20*
C7-2 --- .25 .13***
C8-2 --- .45 .24*
C9-2 --- .21* .10***
D2-2 --- .49 .27
D3-2 --- .37 .20*
D4-2 --- .44 .23*
D5-2 --- .45 .23*
D6-2 --- .44 .25
D7-2 --- .16** .10***
E1-2 --- .34 .19**
E2-2 --- .31 .17**
E3-2 --- .42 .24*
E4-2 --- .22* .14***
E5-2 --- .50 .26
E6-2 --- .37 .21*
E7-2 --- .44 .21*
E8-2 --- .45 .25
F1-2 --- .30 .19**
F2-2 --- .33 .16**
F3-2 --- .37 .22*
F4-2 --- .46 .27
F5-2 --- .40 .24*
F6-2 --- .27 .15**
F7-2 --- .24* .14***
F8-2 --- .29 .13***
Note. *rpb is between .200 and .249; **rpb is between .150 and .199; ***rpb is between .000 and .149.
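The asterisk bands defined in the note can be applied mechanically. The Python sketch below is a minimal illustration of that flagging rule; the function name is hypothetical, and treating any value below .150 (including a negative value, which the note does not cover) as belonging to the lowest band is an assumption.

```python
def flag_rpb(rpb):
    """Return the asterisk flag for a point-biserial correlation.

    Bands follow the table note: * for .200-.249, ** for .150-.199,
    *** for .000-.149. Assumption: values below .000 also receive ***.
    """
    if rpb < 0.150:
        return "***"
    if rpb < 0.200:
        return "**"
    if rpb < 0.250:
        return "*"
    return ""


# Selected-response rpb values copied from the Booklet 2 table above.
for item, rpb in [("B2-2", 0.03), ("C1-2", 0.19), ("C9-2", 0.21), ("B10-2", 0.30)]:
    print(item, f"{rpb:.2f}{flag_rpb(rpb)}")
```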
APPENDIX F. ANCHOR ITEMS ELIMINATED: ITEM DISCRIMINATION FOR
BOOKLET 1 AND BOOKLET 2
Booklet 1
Columns: Item; Item Discrimination (rpb for constructed response items; rpb for selected response items; rpb for all items)
Constructed Response Items
A1-1 .49 --- .45
A2-1 .55 --- .55
A3-1 .37 --- .34
D1-1 .51 --- .50
D7-1 .60 --- .56
D8-1 .60 --- .54
D9-1 .65 --- .61
D11-1 .65 --- .59
Selected Response Items
C1-1 --- .37 .35
C2-1 --- .41 .37
C3-1 --- .40 .37
C4-1 --- .35 .32
C5-1 --- .31 .30
C6-1 --- .40 .39
C7-1 --- .32 .31
C8-1 --- .31 .29
D2-1 --- .24* .25
D3-1 --- .35 .35
D4-1 --- .22* .22*
D5-1 --- .39 .41
D6-1 --- .40 .40
D10-1 --- .24* .27
E1-1 --- .42 .40
E2-1 --- .32 .30
E3-1 --- .41 .40
E4-1 --- .47 .46
E5-1 --- .28 .26
E6-1 --- .48 .48
E7-1 --- .42 .40
E8-1 --- .35 .35
E9-1 --- .50 .49
F1-1 --- .39 .37
F2-1 --- .34 .31
F3-1 --- .45 .43
F4-1 --- .30 .30
F5-1 --- .35 .35
F6-1 --- .38 .36
F7-1 --- .29 .27
F8-1 --- .43 .42
Booklet 2
Columns: Item; Item Discrimination (rpb for constructed response items; rpb for selected response items; rpb for all items)
Constructed Response Items
A1-1 .50 --- .29
A2-1 .55 --- .32
A3-1 .35 --- .21*
D1-1 .59 --- .34
D7-1 .58 --- .34
D8-1 .66 --- .37
D9-1 .63 --- .35
D11-1 .64 --- .34
Selected Response Items
C1-1 --- .24* .19**
C2-1 --- .42 .33
C3-1 --- .45 .36
C4-1 --- .40 .31
C5-1 --- .42 .32
C6-1 --- .37 .29
C7-1 --- .33 .26
C8-1 --- .50 .40
D2-1 --- .25 .20*
D3-1 --- .53 .42
D4-1 --- .40 .30
D5-1 --- .49 .38
D6-1 --- .55 .42
D10-1 --- .46 .36
E1-1 --- .23* .18**
E2-1 --- .40 .31
E3-1 --- .37 .29
E4-1 --- .49 .38
E5-1 --- .35 .27
E6-1 --- .50 .38
E7-1 --- .44 .34
E8-1 --- .48 .36
E9-1 --- .46 .37
F1-1 --- .38 .30
F2-1 --- .40 .30
F3-1 --- .43 .35
F4-1 --- .52 .40
F5-1 --- .45 .36
F6-1 --- .31 .24*
F7-1 --- .32 .25
F8-1 --- .31 .23*
Note. *rpb is between .200 and .249; **rpb is between .150 and .199.
APPENDIX G. ITEM ANALYSIS BY SECTION FOR BOOKLET 1 AND BOOKLET 2
Booklet 1
Columns: Item; Item Discrimination (rpb, item to section subscale; rpb, item to total)
Section A
A1-1_CR .59 .46
A2-1_CR .59 .57
A3-1_CR .43 .35
Section B – Anchor
B1-1_CR .50 .48
B2-1_SR .23* .27
B3-1_SR .17** .20*
B4-1_SR .34 .37
B5-1_SR .43 .46
B6-1_CR .65 .61
B7-1_CR .67 .61
B8-1_SR .30 .32
B9-1_CR .64 .60
B10-1_SR .33 .38
B11-1_CR .56 .55
Section C
C1-1_SR .35 .36
C2-1_SR .44 .39
C3-1_SR .36 .39
C4-1_SR .32 .32
C5-1_SR .29 .31
C6-1_SR .32 .39
C7-1_SR .24* .32
C8-1_SR .28 .29
Section D
D1-1_CR .52 .50
D2-1_SR .63 .25
D3-1_SR .62 .35
D4-1_SR .68 .23*
D5-1_SR .65 .41
D6-1_SR .23* .40
D7-1_CR .30 .56
D8-1_CR .17** .54
D9-1_CR .36 .60
D10-1_SR .34 .26
D11-1_CR .25 .59
Section E
E1-1_SR .38 .39
E2-1_SR .29 .31
E3-1_SR .38 .40
E4-1_SR .42 .46
E5-1_SR .24* .26
E6-1_SR .45 .47
E7-1_SR .39 .40
E8-1_SR .33 .34
E9-1_SR .47 .49
Section F
F1-1_SR .33 .36
F2-1_SR .32 .31
F3-1_SR .42 .42
F4-1_SR .29 .29
F5-1_SR .29 .35
F6-1_SR .34 .35
F7-1_SR .32 .26
F8-1_SR .38 .42
Booklet 2
Columns: Item; Item Discrimination (rpb, item to section subscale; rpb, item to total)
Section A
A1-2_CR .60 .38
A2-2_CR .55 .43
A3-2_CR .36 .28
Section B – Anchor
B1-2_CR .50 .41
B2-2_SR .22* .19**
B3-2_SR .16** .12***
B4-2_SR .31 .24*
B5-2_SR .44 .35
B6-2_CR .65 .48
B7-2_CR .67 .50
B8-2_SR .29 .23*
B9-2_CR .62 .48
B10-2_SR .03*** .21*
B11-2_CR .56 .45
Section C
C1-2_SR .20* .15**
C2-2_SR .39 .25
C3-2_SR .44 .28
C4-2_SR .37 .24*
C5-2_SR .40 .25
C6-2_SR .36 .23*
C7-2_SR .34 .19**
C8-2_SR .46 .31
C9-2_SR .24* .15**
Section D
D1-2_CR .54 .47
D2-2_SR .18** .32
D3-2_SR .13*** .23*
D4-2_SR .16** .29
D5-2_SR .18** .31
D6-2_SR .17** .28
D7-2_SR .08*** .14***
D8-2_CR .52 .46
D9-2_CR .61 .51
D10-2_CR .57 .48
D11-2_CR .57 .47
Section E
E1-2_SR .38 .23*
E2-2_SR .35 .23*
E3-2_SR .47 .29
E4-2_SR .32 .21*
E5-2_SR .46 .29
E6-2_SR .41 .26
E7-2_SR .45 .27
E8-2_SR .43 .28
Section F
F1-2_SR .35 .23*
F2-2_SR .34 .23*
F3-2_SR .39 .26
F4-2_SR .51 .30
F5-2_SR .42 .28
F6-2_SR .29 .19**
F7-2_SR .29 .20*
F8-2_SR .28 .17**
Note. CR stands for constructed response items, SR
stands for selected response items.
*rpb is between .200 and .249; **rpb is between .150 and .199; ***rpb is between .000 and .149.
APPENDIX H. CHI-SQUARE ANALYSES FOR READING STRATEGIES BY ITEM
FOR BOOKLET 1 AND BOOKLET 2
Chi-Square Tests for Selected Response Items, Booklet 1
Columns: Items; Higher-order Strategies, Frequencies (Percent): Correct, Incorrect; Lower-order Strategies, Frequencies (Percent): Correct, Incorrect; χ²; p-value
B2-1 94.2 5.8 89.3 10.7 14.18 .00**
B3-1 82.2 17.8 82.2 17.8 0.00 .10
B4-1 83.5 16.5 68.7 31.3 52.41 .00**
B5-1 88.7 11.3 68.9 31.1 114.80 .00**
B8-1 86.5 13.5 74.0 26.0 44.10 .00**
B10-1 67.7 32.3 46.3 53.7 72.27 .00**
C1-1 94.5 5.5 88.8 11.2 19.92 .00**
C2-1 93.2 6.8 82.9 17.1 50.55 .00**
C3-1 67.5 32.5 44.5 55.5 83.42 .00**
C4-1 91.9 8.1 85.4 14.6 18.49 .00**
C5-1 77.2 22.8 66.7 33.3 21.98 .00**
C6-1 82.4 17.6 69.4 30.6 39.12 .00**
C7-1 48.3 51.7 27.2 72.8 66.20 .00**
C8-1 69.1 30.9 55.7 44.3 29.60 .00**
D2-1 39.0 61.0 30.1 69.9 12.23 .00**
D3-1 61.8 38.2 46.3 53.7 36.31 .00**
D4-1 57.4 42.6 50.2 49.8 7.61 .01**
D5-1 56.7 43.3 40.2 59.8 40.24 .00**
D6-1 66.8 33.2 45.9 54.1 69.28 .00**
D10-1 60.7 39.3 52.7 47.3 9.73 .00**
E1-1 72.5 27.5 52.7 47.3 67.12 .00**
E2-1 55.2 44.8 42.2 57.8 24.80 .00**
E3-1 76.3 23.7 58.0 42.0 62.43 .00**
E4-1 72.9 27.1 47.5 52.5 110.47 .00**
E5-1 58.7 41.3 48.9 51.1 14.42 .00**
E6-1 73.7 26.3 54.3 45.7 66.09 .00**
E7-1 59.2 40.8 39.7 60.3 56.46 .00**
E8-1 69.8 30.2 57.8 42.2 24.14 .00**
E9-1 83.4 16.6 64.2 35.8 85.51 .00**
F1-1 70.7 29.3 50.7 49.3 66.45 .00**
F2-1 69.3 30.7 50.2 49.8 59.26 .00**
F3-1 85.6 14.4 73.7 26.3 37.97 .00**
F4-1 72.6 27.4 58.4 41.6 35.29 .00**
F5-1 52.8 47.2 33.8 66.2 52.69 .00**
F6-1 80.9 19.1 61.6 38.4 78.77 .00**
F7-1 90.4 9.6 84.7 15.3 12.36 .00**
F8-1 67.7 32.3 50.2 49.8 49.28 .00**
Note. **p < .01
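The tests in this table compare two strategy groups on a correct/incorrect outcome, which corresponds to a 2 x 2 contingency table with one degree of freedom. The Python sketch below shows how a Pearson chi-square statistic and its p-value can be computed in that case. It assumes no continuity correction, which may differ from the software settings used for the table, and the cell counts are hypothetical because the table reports percentages rather than counts.

```python
from math import erfc, sqrt


def pearson_chi_square(table):
    """Pearson chi-square statistic for a table of observed counts.

    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Hypothetical counts: rows are strategy groups, columns are correct / incorrect.
hypothetical_counts = [[10, 20], [20, 10]]
statistic = pearson_chi_square(hypothetical_counts)
print(round(statistic, 2), round(p_value_df1(statistic), 3))
```

As a check on the one-degree-of-freedom assumption, `p_value_df1(7.61)` rounds to .01, the p-value reported above for item D4-1.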
Chi-Square Tests for Selected Response Items, Booklet 2
Note. *p < .05, **p < .01
Columns: Items; Higher-order Strategies, Frequencies (Percent): Correct, Incorrect; Lower-order Strategies, Frequencies (Percent): Correct, Incorrect; χ²; p-value
B2-2 91.6 8.4 91.7 8.3 0.01 .93
B3-2 79.0 21.0 79.9 20.1 0.19 .66
B4-2 76.7 23.3 75.4 24.6 0.32 .57
B5-2 79.4 20.6 83.0 17.0 2.82 .09
B8-2 80.4 19.6 84.9 15.1 4.70 .03*
B10-2 67.4 32.6 52.0 48.0 36.71 .00**
C1-2 74.8 25.2 70.2 29.8 3.84 .05*
C2-2 93.5 6.5 83.2 16.8 50.09 .00**
C3-2 93.8 6.2 82.0 18.0 65.54 .00**
C4-2 91.1 8.9 80.4 19.6 43.34 .00**
C5-2 95.0 5.0 89.4 10.6 20.74 .00**
C6-2 86.6 13.4 70.0 30.0 72.55 .00**
C7-2 85.1 14.9 76.1 23.9 20.86 .00**
C8-2 90.0 10.0 74.9 25.1 73.29 .00**
C9-2 73.5 26.5 63.8 36.2 16.60 .00**
D2-2 88.8 11.2 72.1 27.9 82.93 .00**
D3-2 64.8 35.2 41.8 58.2 78.80 .00**
D4-2 90.5 9.5 75.2 24.8 79.05 .00**
D5-2 95.2 4.8 88.2 11.8 31.69 .00**
D6-2 80.9 19.1 51.5 48.5 168.64 .00**
D7-2 51.1 48.9 43.0 57.0 9.23 .00**
E1-2 74.1 25.9 61.2 38.8 29.35 .00**
E2-2 79.2 20.8 64.8 35.2 41.71 .00**
E3-2 91.0 9.0 79.9 20.1 44.86 .00**
E4-2 81.8 18.2 79.2 20.8 1.55 .21
E5-2 74.7 25.3 46.3 53.7 136.57 .00**
E6-2 84.2 15.8 74.2 25.8 24.68 .00**
E7-2 80.8 19.2 63.8 36.2 59.82 .00**
E8-2 82.4 17.6 68.1 31.9 45.67 .00**
F1-2 81.5 18.5 77.1 22.9 4.49 .03*
F2-2 80.5 19.5 62.9 37.1 63.86 .00**
F3-2 84.7 15.3 70.0 30.0 52.69 .00**
F4-2 97.0 3.0 91.5 8.5 29.58 .00**
F5-2 89.0 11.0 79.0 21.0 32.81 .00**
F6-2 66.5 33.5 55.1 44.9 20.41 .00**
F7-2 73.4 26.6 68.1 31.9 5.05 .03*
F8-2 69.2 30.8 51.1 48.9 52.39 .00**
Chi-Square Tests for Constructed Response Items, Booklet 1
Columns: Items; Higher-order Strategies, Frequencies (Percent) for scores 0, 1, 2, 3; Lower-order Strategies, Frequencies (Percent) for scores 0, 1, 2, 3; χ²; p-value
A1-1 19.3 46.4 26.5 7.8 29.2 51.1 16.7 3.0 45.30 .00**
A2-1 15.0 36.3 38.9 9.8 32.2 38.6 25.1 4.1 93.65 .00**
A3-1 59.5 19.9 13.0 7.6 70.8 15.8 9.6 3.9 21.41 .00**
B1-1 8.3 32.3 49.8 9.5 15.8 40.0 40.0 4.3 45.91 .00**
B6-1 17.9 23.0 45.1 14.0 31.5 29.2 33.6 5.7 70.87 .00**
B7-1 22.5 23.1 44.2 10.1 40.4 27.2 25.8 6.6 83.31 .00**
B9-1 21.6 26.4 42.9 9.0 38.8 29.2 28.5 3.4 78.14 .00**
B11-1 25.1 22.4 42.3 10.2 39.5 28.5 26.5 5.5 64.93 .00**
D1-1 42.8 27.9 23.1 6.2 65.3 20.1 10.5 4.1 78.32 .00**
D7-1 23.8 45.9 25.6 4.7 46.1 34.0 16.9 3.0 91.90 .00**
D8-1 17.8 39.1 36.0 7.0 26.9 41.6 26.9 4.6 28.48 .00**
D9-1 19.0 34.8 38.6 7.6 32.2 35.8 29.0 3.0 51.10 .00**
D11-1 23.5 36.3 33.2 7.0 38.6 35.6 23.3 2.5 55.47 .00**
Note. **p < .01
Chi-Square Tests for Constructed Response Items, Booklet 2
Columns: Items; Higher-order Strategies, Frequencies (Percent) for scores 0, 1, 2, 3; Lower-order Strategies, Frequencies (Percent) for scores 0, 1, 2, 3; χ²; p-value
A1-2 18.0 45.7 29.7 6.6 16.5 50.1 27.2 6.1 2.81 .42
A2-2 27.7 36.4 30.4 5.5 23.9 41.6 30.0 4.5 5.20 .16
A3-2 54.3 26.9 14.0 4.8 56.3 25.1 14.7 4.0 1.24 .74
B1-2 11.6 35.9 45.7 6.8 8.3 39.5 46.3 5.9 5.24 .16
B6-2 26.6 26.0 36.4 11.0 22.7 30.5 36.4 10.4 4.87 .18
B7-2 31.2 24.2 36.1 8.5 28.4 26.7 38.8 6.1 4.92 .18
B9-2 28.9 29.6 34.8 6.8 27.4 32.9 32.6 7.1 2.05 .56
B11-2 31.9 24.2 34.5 9.4 31.7 27.0 33.3 8.0 1.90 .60
D1-2 11.9 39.1 38.0 11.0 9.2 41.1 39.0 10.6 2.73 .44
D8-2 28.1 35.9 31.6 4.4 26.2 36.6 33.8 3.3 2.08 .56
D9-2 14.2 31.0 46.8 8.0 13.5 30.5 50.6 5.4 4.38 .22
D10-2 20.7 41.0 33.6 4.6 19.9 40.9 35.2 4.0 0.65 .88
D11-2 13.8 35.5 43.3 7.5 13.0 36.4 45.4 5.2 3.17 .37
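For the constructed response items, two strategy groups are compared across four score categories, so a test of independence has (2 - 1)(4 - 1) = 3 degrees of freedom. The Python sketch below uses the closed-form upper-tail probability of a chi-square distribution with three degrees of freedom to recover p-values from the statistics in the table above. The function name is hypothetical, and the assumption that the reported tests used three degrees of freedom is inferred from the table layout rather than stated in the source.

```python
from math import erfc, exp, pi, sqrt


def p_value_df3(chi2):
    """Upper-tail probability of a chi-square statistic with 3 degrees of freedom."""
    return erfc(sqrt(chi2 / 2)) + sqrt(2 * chi2 / pi) * exp(-chi2 / 2)


# Chi-square statistics copied from the Booklet 2 constructed response table.
for item, chi2 in [("A1-2", 2.81), ("A2-2", 5.20), ("D10-2", 0.65)]:
    print(item, round(p_value_df3(chi2), 2))
```

For these three items the sketch gives .42, .16, and .88, matching the tabled p-values, which supports the three-degrees-of-freedom reading.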
APPENDIX I: DIF WITH TOTAL MATCHING SCORE FOR BOOKLET 1 AND
BOOKLET 2
DIF for Dichotomous Items with Total Matching Score, Booklet 1
Columns: Item; Mantel-Haenszel Chi-Square; Breslow-Day Chi-Square; Direction of DIF; Combined Decision Rule (CDR); Effect Size
B2-1 0.01 0.76 --- --- ---
B3-1 13.91** 0.73 Lower-order strategies DIF Moderate
B4-1 0.37 7.01** Higher-order strategies DIF Small
B5-1 8.42** 1.47 Higher-order strategies DIF Small
B8-1 1.87 0.04 --- --- ---
B10-1 0.92 2.91 --- --- ---
C1-1 1.65 1.15 --- --- ---
C2-1 0.35 0.29 --- --- ---
C3-1 1.58 0.35 --- --- ---
C4-1 2.05 0.09 --- --- ---
C5-1 0.51 0.01 --- --- ---
C6-1 0.24 0.92 --- --- ---
C7-1 3.24 0.06 --- --- ---
C8-1 0.12 1.94 --- --- ---
D2-1 1.48 0.17 --- --- ---
D3-1 0.04 2.48 --- --- ---
D4-1 4.15 1.91 --- --- ---
D5-1 0.98 0.32 --- --- ---
D6-1 1.36 0.16 --- --- ---
D10-1 1.27 0.52 --- --- ---
E1-1 1.00 0.57 --- --- ---
E2-1 0.81 1.05 --- --- ---
E3-1 0.26 0.27 --- --- ---
E4-1 4.43 0.21 --- --- ---
E5-1 3.47 0.02 --- --- ---
E6-1 0.27 1.04 --- --- ---
E7-1 0.10 2.41 --- --- ---
E8-1 1.35 0.11 --- --- ---
E9-1 0.37 1.00 --- --- ---
F1-1 1.26 2.46 --- --- ---
F2-1 1.44 0.02 --- --- ---
F3-1 0.53 2.73 --- --- ---
F4-1 0.00 0.22 --- --- ---
F5-1 0.10 1.53 --- --- ---
F6-1 5.92* 0.87 Higher-order strategies DIF Small
F7-1 0.01 0.02 --- --- ---
F8-1 0.27 0.67 --- --- ---
Note. *p < .05, **p < .01
DIF for Dichotomous Items with Total Matching Score, Booklet 2
Columns: Item; Mantel-Haenszel Chi-Square; Breslow-Day Chi-Square; Direction of DIF; Combined Decision Rule (CDR); Effect Size
B2-2 1.58 2.05 --- --- ---
B3-2 2.56 1.26 --- --- ---
B4-2 1.48 2.96 --- --- ---
B5-2 12.97** 1.70 Lower-order strategies DIF Moderate
B8-2 14.74** 0.74 Lower-order strategies DIF Moderate
B10-2 0.01 0.01 --- --- ---
C1-2 3.30 3.83 --- --- ---
C2-2 3.32 0.23 --- --- ---
C3-2 3.40 0.16 --- --- ---
C4-2 0.13 1.17 --- --- ---
C5-2 0.01 1.77 --- --- ---
C6-2 4.35 0.70 --- --- ---
C7-2 0.01 1.99 --- --- ---
C8-2 1.31 0.55 --- --- ---
C9-2 0.00 6.00* Higher-order strategies DIF Small
D2-2 1.99 0.29 --- --- ---
D3-2 5.04* 2.38 Higher-order strategies DIF Small
D4-2 4.92 0.89 --- --- ---
D5-2 0.09 0.16 --- --- ---
D6-2 32.75** 7.80** Higher-order strategies DIF Large
D7-2 0.46 0.17 --- --- ---
E1-2 0.31 0.03 --- --- ---
E2-2 0.89 1.40 --- --- ---
E3-2 0.02 1.89 --- --- ---
E4-2 5.95* 0.63 Lower-order strategies DIF Small
E5-2 10.79** 1.03 Higher-order strategies DIF Moderate
E6-2 1.32 1.14 --- --- ---
E7-2 0.02 1.08 --- --- ---
E8-2 0.63 0.05 --- --- ---
F1-2 10.21** 0.66 Higher-order strategies DIF Moderate
F2-2 4.20 0.65 --- --- ---
F3-2 0.69 2.50 --- --- ---
F4-2 0.07 1.31 --- --- ---
F5-2 0.56 1.14 --- --- ---
F6-2 0.93 2.92 --- --- ---
F7-2 4.83 0.33 --- --- ---
F8-2 2.44 0.93 --- --- ---
Note. *p < .05, **p < .01
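To make the Mantel-Haenszel statistic in these tables more concrete, the Python sketch below computes it for a dichotomous item from a set of 2 x 2 tables, one per level of the matching score. It uses the textbook formula with a 0.5 continuity correction and also returns the Mantel-Haenszel common odds ratio. Whether the DIF software applied the same correction is an assumption, as is the labelling of one strategy group as the reference group; the counts are hypothetical.

```python
def mantel_haenszel(strata):
    """Mantel-Haenszel chi-square (continuity corrected) and common odds ratio.

    Each stratum is (a, b, c, d): reference correct, reference incorrect,
    focal correct, focal incorrect, at one level of the matching score.
    Assumes at least one stratum has variation in both group and response.
    """
    sum_a = sum_expected = sum_variance = 0.0
    odds_numerator = odds_denominator = 0.0
    for a, b, c, d in strata:
        total = a + b + c + d
        if total < 2:
            continue  # a stratum with fewer than two students carries no information
        n_ref, n_focal = a + b, c + d
        n_correct, n_incorrect = a + c, b + d
        sum_a += a
        sum_expected += n_ref * n_correct / total
        sum_variance += (
            n_ref * n_focal * n_correct * n_incorrect / (total ** 2 * (total - 1))
        )
        odds_numerator += a * d / total
        odds_denominator += b * c / total
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_variance
    return chi2, odds_numerator / odds_denominator


# Hypothetical counts for two matching-score levels (illustration only).
hypothetical_strata = [(10, 5, 5, 10), (8, 2, 4, 6)]
mh_chi2, common_odds_ratio = mantel_haenszel(hypothetical_strata)
print(round(mh_chi2, 2), round(common_odds_ratio, 2))
```

A common odds ratio above 1 in this sketch indicates that, at matched score levels, the group labelled as reference was more likely to answer the item correctly.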
DIF for Polytomous Items with Total Matching Score, Booklet 2
Columns: Item; Mantel-Haenszel Chi-Square; Step(s); Direction of DIF; Combined Decision Rule (CDR)
A1-2 0.27 --- --- ---
A2-2 0.27 --- --- ---
A3-2 0.15 --- --- ---
B1-2 0.28 --- --- ---
B6-2 0.30 --- --- ---
B7-2 0.02 --- --- ---
B9-2 0.00 --- --- ---
B11-2 0.75 --- --- ---
D1-2 0.51 --- --- ---
D8-2 0.29 --- --- ---
D9-2 0.56 --- --- ---
D10-2 0.08 --- --- ---
D11-2 0.59 --- --- ---