Core Assessment 4A Section One: Literacy Assessment and Professional
Development Report
For many reading specialists, assessment of student reading abilities is one of
the most important aspects of the job. Student assessment measures inform
instructional choices and practices, and provide a complex picture of why and how
struggling readers are encountering challenges in their journey towards becoming
better readers. The informed reading specialist must have a deep understanding of
current research on literacy assessment if they are to be an effective agent of
assessment. The following is a synthesis of research on factors that contribute to
reading success; assessments, their uses, and misuses; purposes for assessing the
performance of all readers, including tools for screening, diagnosis, progress
monitoring, and outcome measurement; reliability, content validity, and construct validity; and state
assessment frameworks, proficiency standards, and student benchmarks.
There are a great many studies that investigate factors that contribute to
reading success in both the home and school settings. Included here is research on
how student temperament can impact academic resiliency when learning reading
skills; an analysis of statistically significant correlatives of contributing factors to
the academic success of students placed into a program for gifted children, yet who
have come from low socio-economic backgrounds; a general examination of factors
that contribute to reading success, and one study that focuses specifically on family
contributions to reading success.
McTigue, Washburn, and Liew (2009) have conducted research into the
impact that children’s temperaments and the development of their social-emotional
skills have upon their academic resiliency and literacy learning. Their analysis of
current research supports the idea that “time spent developing early socioemotional
skills boosts students’ future success in literacy” (p. 423). When students of all
temperaments, whether extroverted or introverted, aggressive or more hesitant, are
coached to develop a sense of self-efficacy (which is teachable and not necessarily
inherent or fixed), their chances of developing literacy skills and reading ability
increase. Teachers of reading should take note that self-efficacy should be taught
hand in hand with early literacy, and indeed with literacy at any level.
Bailey’s recent study of students coming from low socioeconomic
backgrounds in a program for gifted children (children identified for and referred to
the Questioning, Understanding, Enriching, Seeking and Thinking (QUEST)
program) suggests that one key at-home influence on children’s later reading
success is the frequency with which they are read to. Upon statistical analysis of
questionnaires and interviews with students’ parents, the study found that of three
variables analyzed—regular parental reading (activities that take place at least 3-4
times per week), preschool exposure, and age at which children receive initial pre-
reading or reading instruction— “[I]t was determined that the economically at-risk
QUEST students privy to regular parental reading were more likely to experience
early reading success than QUEST students that were not exposed to the variable”
(Bailey, 2006, p. 314). In fact, regular parental reading was the only factor that
indicated a statistically significant influence on the reading ability of the at-risk QUEST
children. Reading specialists who are working with family literacy programs—
especially with parents who may be economically at-risk—should be sure to
provide information and resources about the importance of frequent parental
reading to children.
Leslie and Allen (1999) found three independent variables that exerted
statistically significant influence on reading scores: the amount of time spent reading in
classroom settings, the level of parental involvement (whether attendance at literacy
events or the return rate of forms sent home), and the amount of time a student spent
reading recreationally. All three were strongly correlated with higher reading
achievement in students who participated in the study.
Many schools are finally recognizing the power of families as first teachers and as
significant factors in the development of students’ literacy, and they
are implementing family literacy programs that connect the school and the home.
John Holloway (2004) nicely summarizes some of the research that was then current,
concluding that “research indicates that family literacy activities
contribute to children’s success in school and that family literacy programs can
provide opportunities for educational success for parents and children. These
programs can also serve as models of family involvement, showing how families can
become part of an extended classroom and build on the work of the school”
(Holloway, 2004, p. 89).
Understanding factors that contribute to reading success is one of the first
steps when constructing a comprehensive literacy program designed to raise the
reading levels of all students. Additionally, a program of this nature would be
incomplete without a well-selected set of assessments that can provide rich data on
individual students so instruction can be tailored to meet their specific needs. The
reading specialist should be aware of common uses—and misuses—of such
assessments, and the direction in which the field of reading assessment is headed.
Included here are several articles discussing the need for a greater balance between
process and product assessments in an educational era where high-stakes,
summative, standardized testing has been privileged above other, more detailed
forms of individualized assessment.
Upon my review of the literature, I would assert that one of the most active
points of discussion in the field of reading assessment today is
“Balancing the assessment of learning and for learning in support of student literacy
achievement.” In their 2008 article of the same name, Edwards, Turner, and Mokhtari
explore the frustrations that literacy educators face in dealing with this
imbalance (also described by others as assessment of “product” versus “process”), and
suggest a handful of ideas to help instructors strike a balance between the two
types of assessment. According to Edwards, Turner, and Mokhtari, using multiple
assessments, selecting culturally appropriate assessments, engaging students in the
assessment process, and engaging school personnel in inquiry and action research
would be first steps toward greater balance. The reading specialist
would do well to heed the research and recommendations in this area. A paradigm
shift is needed, and reading specialists will be among the prime movers and agents
of this change.
Winograd, Paris, and Bridge (1991) echo concerns about balance in literacy
assessment. They cite research stating that “traditional assessments are based upon
an outdated model of literacy,” “traditional assessments prohibit the use of learning
strategies,” “traditional assessments redefine educational goals,” and “traditional
assessments are easily misinterpreted and misused.” Their suggestions for
improving assessment are helpful. They suggest clarifying the goals of instruction
as well as the purposes of assessment, selecting multiple measures, and interpreting
results in ways that enhance instruction. They subsequently propose a model for
improving literacy assessment that includes helping students gain ownership of
their learning by monitoring their own comprehension and fluency and by keeping
lists of books read and preferred authors. They also suggest helping teachers make
instructional decisions and helping parents understand their children’s progress
through various measures, including conferences or comments added to plain letter
grades. They even suggest helping administrators and community members make
larger policy decisions that would affect the selection of tests that provide more
detailed feedback on student progress.
While assessment for learning is an admirable goal, it can be difficult to use
well—and easy to misuse—in a traditional educational setting. Many teachers and reading
specialists with good intentions may not know how to go about the data-driven
instruction process. Mokhtari, Rosemary, and Edwards (2007) present a structure
for data analysis teams to use to help guide efforts at data-driven instruction. Called
“The Data Analysis Framework for Instructional Decision Making,” it is a list of
guiding questions to help teams new to the data-driven analysis procedure. Efforts
to use literacy assessments should always be based on research with proven
validity and on the guidance of educated professionals. McKenna and Walpole
(2005) also describe an assessment model for schools in the Reading First program,
which assumes that a comprehensive reading program has already been carefully
selected district-wide and that various screening assessments are already in use to
catch students at risk in specific areas. The framework of interventions varies
according to the level of risk a student demonstrates on assessments and provides a
structure to help guide teachers through a process with which they may be unfamiliar.
It should be clear from the previous section that process-focused assessment
for learning is in need of bolstering. The research on product-focused assessments
is copious and exhaustive. Standardized, product-based tests have been in use for
decades, yet it is common knowledge in the educational field that national reading
scores have held steady for decades as well. What about research on process-based
testing? What tests do we use to assess student performance that provide us with
complex knowledge about the multiple facets of a student’s reading abilities?
Understanding these uses and misuses is only the beginning of the deeper
knowledge of assessment that should be cultivated by the reading specialist.
Additionally, reading specialists should be aware of the different types of reading
assessments, their intended audiences, and how to use them to monitor progress
and measure outcomes.
Nina Nilsson (2008) provides an analysis of eight informal reading
inventories (IRIs) used to assess students’ reading levels and reading processes. The
inventories analyzed were Applegate, Quinn, and Applegate’s (2008) The critical
reading inventory: Assessing students’ reading and thinking (2nd ed.), Bader’s (2005)
Bader reading and language inventory (5th ed.), Burns and Roe’s (2007) Informal
reading inventory, Cooter, Flynt, and Cooter’s (2007) Comprehensive reading
inventory: Measuring reading development in regular and special education
classrooms, Johns’ (2005) Basic reading inventory (9th ed.), Leslie and Caldwell’s
(2006) Qualitative reading inventory-4, Silvaroli and Wheelock’s (2004) Classroom
reading inventory, and Woods and Moe’s (2007) Analytical reading inventory. All of
the informal reading inventories included passages to be read aloud and/or silently
by the student being evaluated. Each of the IRIs took a slightly different approach to
vocabulary, although all but one included word lists of varying levels to gain insights
into the student’s word recognition and decoding skills. Emphasis on word
recognition for the sake of identification versus word identification for the sake of
vocabulary knowledge and comprehension varied. Some IRIs provided
supplemental sections for phonemic awareness and phonics, but these were not
required portions of the main set of recommended evaluations. Additionally, all but
one of the IRIs included some measure of fluency. Nilsson provides a handy
summary of recommendations for choosing an IRI:
For reading professionals who work with diverse populations and are
looking for a diagnostic tool to assess the five critical components of reading
instruction, the CRI-CFC, in Spanish and English (Cooter et al., 2007) for
regular and special education students, as well as some sections of the BRLI
(Bader, 2005), are attractive options. Most likely, those who work with
middle and high school students will find the QRI-4 (Leslie & Caldwell, 2006)
and ARA (Woods & Moe, 2007) passages and assessment options appealing.
The CRI-2 (Applegate et al., 2008) would be a good fit for reading
professionals concerned with thoughtful response and higher-level thinking.
In addition, the variety of passages and rubrics in BRI (Johns, 2005) and
contrasting format options in CRI-SW (Silvaroli & Wheelock, 2004) would
provide flexibility for those who work with diverse classrooms that are
skills-based and have more of a literacy emphasis. For literature-based
literacy programs, the IRI-BR (Burns & Roe, 2007) with its appendix of
leveled literature selections is a valuable resource for matching students
with appropriate book selections after students’ reading levels are
determined. (p. 535)
A reading specialist would be wise to follow up on Nilsson’s
recommendations when seeking the proper IRI for the school district in which they
work. Additionally, it is important to have a more in-depth understanding of
some of the classical components of each IRI. Miscue analysis is experiencing
something of a resurgence, and McKenna and Picard provide a brief re-assessment
of the technique in their 2006 article, Revisiting the role of miscue analysis in effective
teaching. After a brief discussion of the history of miscue analysis, they explore one
study that put the validity of miscue analysis—insofar as it measures how and why
students make miscues based on context and prior knowledge—into question. In
other words, the reason why students make the errors they make are still not
completely clear, but the fact that they make errors should be considered. They
suggest that miscue analysis can be useful, but should results should be interpreted
with caution. They encourage the use of error totals for determining a student’s
independent and instructional reading levels, but semantically correct miscue tallies
are not supported by research and should be avoided. They write, “teachers should
view meaningful miscues (like substituting pony for horse) as evidence of
inadequate decoding skills, and not as an end result to be fostered. Because
beginning readers will attempt to compensate for weak decoding by reliance on
context, teachers should instruct them in how to use the graphophonic, semantic,
and syntactic cueing systems to support early reading” (McKenna & Picard, 2006).
They conclude that teachers and reading specialists should focus on using
miscue analysis to monitor whether a student is relying too heavily on context
rather than on decoding to figure out unknown words.
Kuhn, Schwanenflugel, and Meisinger provide a closer look at the
assessment of reading fluency in their 2010 article, Aligning theory and assessment
of reading fluency: Automaticity, prosody, and definitions of fluency. They explore
several theoretical perspectives on reading fluency and finally suggest an updated
definition of fluency that synthesizes the body of research presented earlier in the
article:
Fluency combines accuracy, automaticity, and oral reading prosody, which,
taken together, facilitate the reader’s construction of meaning. It is
demonstrated during oral reading through ease of word recognition,
appropriate pacing, phrasing, and intonation. It is a factor in both oral and
silent reading that can limit or support comprehension. (p. 240)
The purpose of their analysis and definition seems to be to shed new light on
the perception that a “fast” reader is a “good” reader. True reading fluency is a
combination of speed and prosody: the speed indicates that reading is occurring at a
rate fast enough for whole phrases and ideas to be comprehended, and the prosody
indicates the reader’s comprehension of the interpreted meaning. They note three
final implications for assessment. First, they suggest that if a words-per-minute
assessment is being used to assess student reading fluency, a prosodic measure such as
the NAEP oral reading fluency scale (Pinnell et al., 1995) or the multidimensional
fluency scoring guide (Rasinski et al., 2009; Zutell & Rasinski, 1991) should supplement its
use. Second, they suggest that fast decoding not be over-emphasized, and that a
comprehension evaluation be administered any time fluency is measured, which
could be as simple as a few impromptu questions or a brief discussion about what
was just read. Finally, they assert that oral reading fluency is only one measure of
student reading ability, and that it should be held in context with testing that evaluates
other aspects of reading ability, such as comprehension questions, retellings,
or miscue analyses.
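To make the rate-plus-prosody recommendation concrete, the sketch below pairs a words-correct-per-minute calculation with a holistic prosody rating so that neither number is reported alone. It is a minimal illustration only: the record fields and the 1-4 prosody levels are hypothetical placeholders, not the actual NAEP or multidimensional fluency scales cited above.

```python
# Illustrative sketch only: pairs a rate measure (WCPM) with a prosody rating,
# echoing the recommendation that words-per-minute scores be supplemented with
# a prosodic measure. Field names and the 1-4 rubric levels are hypothetical
# placeholders, not the actual NAEP or multidimensional fluency scales.

from dataclasses import dataclass

@dataclass
class OralReadingRecord:
    words_in_passage: int     # total words in the passage read aloud
    errors: int               # uncorrected miscues counted by the examiner
    seconds_elapsed: float    # time taken to read the passage
    prosody_level: int        # examiner's holistic rating, 1 (word-by-word) to 4 (fluent)

def words_correct_per_minute(record: OralReadingRecord) -> float:
    """Standard rate calculation: words read correctly divided by minutes elapsed."""
    words_correct = max(record.words_in_passage - record.errors, 0)
    return words_correct / (record.seconds_elapsed / 60.0)

def fluency_summary(record: OralReadingRecord) -> str:
    """Report rate and prosody side by side so neither is read in isolation."""
    wcpm = words_correct_per_minute(record)
    return f"{wcpm:.0f} WCPM, prosody level {record.prosody_level} of 4"

# Example: a student reads a 120-word passage in 90 seconds with 6 errors.
print(fluency_summary(OralReadingRecord(120, 6, 90.0, prosody_level=2)))
# -> "76 WCPM, prosody level 2 of 4"
```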
While awareness of tests designed to assess different aspects of a reader’s
ability is important, it is also important to consider their reliability and construct
validity. We will take a brief look at research exploring the validity of IRIs, validity
as perceived by teachers who are the end users of many reading assessment tools, the
validity of such “qualitative” forms of assessment as student portfolios, and the
validity of tests designed to measure ELL reading ability.
In 1985, Klesius and Homan published an article entitled A validity and
reliability update on the informal reading inventory with suggestions for
improvement. In it they explore several aspects of validity in a broad comparison of
a commonly used group of informal reading inventories. They explored content
validity, concurrent validity (“a comparison of performance on a new [IRI] test to
performance on existing [IRI] tests,” p. 72), inter-scorer reliability, and the impact of
passage length on the validity of reading scores. In terms of content validity,
concerns included the percentage of comprehension questions that could be answered
without having read the passage and the scoring criteria used to determine
students’ instructional reading levels, which varied greatly from one test to
another. The research on concurrent validity showed that generally from one test to
another the coefficients were acceptable, although little research has been done in
this area. Based on five separate studies, it was found that there was generally a
70% inter-scorer reliability rate. Research on passage length suggests that passages
shorter than 125 words may result in erratic or inaccurate results. The authors
make several suggestions in light of their findings, both for teachers and for those
evaluating IRIs. They conclude that, despite issues with validity, IRIs are still
valuable reading assessment tools and should be used with these research-based
suggestions and precautions in mind.
Kyriakides (2004) explores the possibilities inherent in asking the teachers
themselves how useful the testing measures are to them, and how they use them in
their teaching. He suggests that evaluating test “validity” in this way could be
useful in developing the test in the future. Teachers responded to a set of
questionnaires, and the data were processed to show the mean and standard
deviation of each of the responses. I found this article very interesting, and will
consider a strategy such as this to evaluate the information that is most useful to my
teachers as I provide them with student reading ability score information. In
addition to traditional measures of test validity, a reading specialist should consider
the usefulness of the test to the teachers and students as a key component in its true
“validity”.
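As an illustration of the kind of item-level summary described above, the sketch below computes a mean and standard deviation for each questionnaire item. The item wordings and the 1-to-5 ratings are hypothetical examples, not data from the Kyriakides study.

```python
# Illustrative sketch only: summarizing teacher questionnaire responses by item
# with a mean and standard deviation, in the spirit of the analysis described
# above. The item names and 1-5 ratings below are hypothetical, not study data.

from statistics import mean, stdev

responses = {
    "running records inform my grouping decisions": [4, 5, 3, 4, 5],
    "the screening report is easy to interpret": [2, 3, 2, 4, 3],
    "score reports arrive in time to adjust lessons": [3, 2, 2, 3, 1],
}

for item, ratings in responses.items():
    # Higher means suggest reports teachers find useful; larger standard
    # deviations flag items where teachers disagree and follow-up is needed.
    print(f"{item}: mean={mean(ratings):.2f}, sd={stdev(ratings):.2f}")
```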
For many schools, portfolio assessment is assumed to be outside the scope
of possibility, either because it takes too much work to maintain properly or
because such open-ended measures of student progress are simply not “valid” and
there are no “high stakes” portfolio evaluations coming down from on high.
Johnson, Fisher, Willeke, and McDaniel (2003) explore validity measures of a family
literacy portfolio in an effort to contribute to research on such open-ended
assessments. They examined inter-scorer reliability rates for the goals
assessed in the portfolio and for a holistic rubric used to evaluate 42 family portfolios.
While the inter-rater reliability estimates for the six goals ranged from
.47 to .70, the holistic rubric had a much stronger reliability of .79.
According to the authors’ research, various sources of guidance suggest that “low
stakes assessments require a minimal reliability of .70; whereas, in applied settings
with high-stakes, tests require a minimal reliability of .9 (Herman et al., 1992;
Nunnally, 1978)” (p. 373). Although only the holistic rubric would
qualify as acceptably reliable under these terms, “feedback from stakeholders
indicated that the collaborative decision-making resulted in a credible assessment.
Family educators reported that their involvement focused attention on program
goals, contributed to their professional development, and increased their
understanding of families” (p. 375). The authors conclude that portfolio evaluation
has much potential, and that further research on reliability and validity measures
would contribute greatly to the field.
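The decision rule implied by the cited guidance can be made explicit with a short check like the one below, which compares reported reliability estimates to the .70 (low-stakes) and .90 (high-stakes) rules of thumb. The figures simply restate numbers from the discussion above; treat this as a sketch of the reasoning, not a substitute for the original analyses.

```python
# Illustrative sketch only: checking reported reliability estimates against the
# rule-of-thumb thresholds cited above (.70 for low-stakes uses, .90 for
# high-stakes uses). The dictionary restates figures from the discussion; any
# real decision would rest on the full studies, not on this check.

LOW_STAKES_MIN = 0.70
HIGH_STAKES_MIN = 0.90

estimates = {
    "portfolio goal scores (lowest of six)": 0.47,
    "portfolio goal scores (highest of six)": 0.70,
    "holistic portfolio rubric": 0.79,
}

for measure, reliability in estimates.items():
    if reliability >= HIGH_STAKES_MIN:
        verdict = "meets the high-stakes threshold"
    elif reliability >= LOW_STAKES_MIN:
        verdict = "acceptable for low-stakes use only"
    else:
        verdict = "below the low-stakes threshold"
    print(f"{measure}: {reliability:.2f} -> {verdict}")
```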
One validity construct that is becoming increasingly relevant in
contemporary U.S. culture is the validity of ELL test scores. While it is important to
measure how ELL students are performing in English—as that is commonly the
language of instruction—such testing may also provide an inaccurate or irrelevant
assessment of knowledge or understanding that cannot be properly evaluated in a
language in which students are not fully fluent. Sireci, Han, and Wells (2008) provide a
complex set of statistical formulas that could potentially be used by those seeking to
evaluate test validity for ELLs in the future, although the evidence required to apply
them has yet to be collected.
One final key factor in literacy assessment is the set of standards used to
provide guidance at the state and district levels for students, teachers, and
reading professionals. McCombes-Tolis and Feinn (2008) cite research affirming
“a direct relationship between teachers’ knowledge and skills about essential
components of literacy instruction and student literacy outcomes” (p. 236).
Unfortunately they also note that there is a lack of consistent certification and
content standards for reading professionals, content-area teachers and classes,
which results in an inconsistent quality of literacy education. They discuss the
Connecticut Blueprint for Reading Achievement as one potentially successful
response. The authors identify this publication as an exhaustive and
comprehensive source of effective, research-backed literacy standards for educators
and classrooms. In an effort to evaluate the effectiveness of this document for
creating change within the teaching and learning community in Connecticut, the
authors administered an extensive questionnaire that measured teacher knowledge
of and perceived effectiveness of the blueprint. Their results were disappointing.
Most teachers did not correctly answer questions about basic content of the
blueprint, and were unable to correctly answer questions about basic literacy
competencies. They write,
Collectively, these results indicate that simply articulating essential teacher
competencies (knowledge/skills) within state reading blueprints is
inadequate to promote mastery of these competencies across targeted
teacher populations. Findings suggest instead that states that have taken
care to articulate essential teacher competencies within their reading
blueprints should also ensure that higher education teacher preparation
practices systematically prepare teacher candidates to meet these
competency standards so they may begin their careers as educators able to
effectively serve the literacy needs of diverse student populations (Ehri &
Williams, 1995). (p. 263)
Clearly it makes little sense to go through the trouble of producing high-
quality literacy standards if educators are unaware of them, or lack the training
necessary to implement them. Reading specialists everywhere should be sure to
familiarize themselves with the reading standards adopted within their states, as
research links teachers’ knowledge of the essential components of literacy
instruction to student reading success.
There have been those who question the effectiveness of reading
standards altogether. In his 1995 article, Can reading standards really help?,
Shannon discusses the original efforts of a joint task force between the
International Reading Association (IRA) and the National Council of Teachers of
English (NCTE) to create a set of national reading standards for reading
professionals and educators. The author’s primary concern is that the standards
created are themselves open for interpretation, especially from a social justice
standpoint. He asserts that the standards are open enough that different literacy
educators starting from different points and with different audiences could arrive at
different ends using the same guidelines, and that they do not help to address
student inequalities. He writes, “My point is that standards (or even laws) cannot
change biased thinking and behavior” (p. 6). The article is an opinion piece, and it
seems the author assumes that the primary, or even secondary, aim of the
authors of the standards was to directly affect de facto or de jure
inequality in student reading levels. However, his suggestion that the IRA
and NCTE “put some teeth into their declarations against bias in and out of schools”
(p. 7) was a relevant call in 1995, when schools were only just beginning to grapple
in earnest with racial and economic inequality.
Most reading educators and researchers agree that standards are only as
successful as those who choose to enforce and evaluate them in the classroom and
with individual students.
Bibliography
Bailey, L. B. (2006). Examining gifted students who are economically at-risk
to determine factors that influence their early reading success. Early Childhood
Education Journal, 33(5), 307-315.
Edwards, P. A., Turner, J. D., & Mokhtari, K. (2008). Balancing the assessment
of learning and for learning in support of student literacy achievement. The Reading
Teacher, 61(8), 682-684.
Holloway, J. H. (2004). Family literacy. Educational Leadership, 61(6), 88-89.
Johnson, R. L., Fisher, S., Willeke, M. J., & McDaniel, F. (2003). Portfolio
assessment in a collaborative program evaluation: The reliability and validity of a
family literacy portfolio. Evaluation and Program Planning, 26(1), 367-377.
Klesius, J. P., & Homan, S. (1985). A validity and reliability update on the
informal reading inventory with suggestions for improvement. Journal of Learning
Disabilities, 18(2), 71-76.
Kuhn, M. R., Schwanenflugel, P. J., & Meisinger, E. B. (2010). Aligning theory
and assessment of reading fluency: Automaticity, prosody, and definitions of
fluency. Reading Research Quarterly, 45(2), 230-251.
Kyriakides, L. (2004). Investigating validity from teachers' perspectives
through their engagement in large-scale assessment. Assessment in Education, 11(2),
143-163.
Leslie, L., & Allen, L. (1999). Factors that predict success in an early literacy
intervention project. Reading Research Quarterly, 34(4), 404-424.
McCombes-Tolis, J., & Feinn, R. (2008). Comparing teachers' literacy-related
knowledge to their state's standards for reading. Reading Psychology, 29(1), 236-
265.
McKenna, M. C., & Picard, M. C. (2007). Revisiting the role of miscue analysis
in effective teaching. The Reading Teacher, 60(4), 378-380.
McKenna, M. C., & Walpole, S. (2005). How well does assessment inform our
reading instruction? The Reading Teacher, 59(1), 84-86.
McTigue, E. M., Washburn, E. K., & Liew, J. (2009). Academic resilience and
reading: Building successful readers. The Reading Teacher, 65(2), 422-432.
Nilsson, N. L. (2008). A critical analysis of eight informal reading inventories.
The Reading Teacher, 61(7), 526-536.
Shannon, P. (1995). Can reading standards really help? Clearing House,
68(4), 1-7.
Sireci, S. G., Han, K. T., & Wells, C. S. (2008). Methods for evaluating the
validity of test scores for English language learners. Educational Assessment, 13(1),
108-131.
Winograd, P., Paris, S., & Bridge, C. (1991). Improving the assessment of
literacy. The Reading Teacher, 45(2), 108-116.