THE EFFECTIVENESS OF TIERED SUMMATIVE ASSESSMENTS IN A GEORGIA HIGH SCHOOL MATH CLASS
Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisor. This thesis does not include proprietary
or classified information.
Scott W. Barnett
Certificate of Approval
_____________________________ ______________________________
Donald R. Livingston, Ed.D. Sharon M. Livingston, Ph.D. Thesis Co-Chair Thesis Co-Chair Education Department Education Department
THE EFFECTIVENESS OF TIERED SUMMATIVE ASSESSMENTS IN A GEORGIA
HIGH SCHOOL MATH CLASS
A thesis submitted
by
Scott William Barnett
to
LaGrange College
in partial fulfillment of
the requirement for the
degree of
MASTER OF EDUCATION
In
Curriculum and Instruction
LaGrange, Georgia
May 10, 2011
Tiered Summative Assessment iii
Abstract
This study explores the effectiveness of a tiered assessment model in a regular
education, multi-leveled Georgia high school mathematics classroom. Students from two
classes were pretested and instructed in exactly the same manner. The treatment class was
offered a choice of one of three summative assessments tiered by difficulty; the control
class was unaware any option existed. Data were gathered and interpreted using the three
focus questions guiding the study, and statistical and qualitative data were triangulated in
determining the results. This study found that students did in fact benefit from having a
tiered test as an option for their summative assessment, a method that is transferable to
other subjects and disciplines. Keywords: student choice, alternative assessments,
multi-leveled tests.
Table of Contents
Abstract
Table of Contents
List of Tables and Figures
Chapter 1: Introduction
    Statement of the Problem
    Significance of the Problem
    Theoretical and Conceptual Frameworks
    Focus Questions
    Overview of Methodology
    Human as Researcher
Chapter 2: Review of the Literature
    Constructivism
    Differentiation
    Assessments
    Student Outcomes
    Teacher's Reaction
Chapter 3: Methodology
    Research Design
    Setting
    Subjects
    Procedures and Data Collection Methods
    Validity, Reliability, Dependability, and Bias
    Analysis of Data
    Validation
    Credibility
    Transferability
    Transformational
Chapter 4: Results
Chapter 5: Analysis and Discussion of Results
    Analysis of Results
    Discussion
    Implications
    Impact on Student Learning
    Recommendations for Future Research
References
Appendixes
List of Tables and Figures
Tables
Table 3.1 Data Shell
Table 4.1 Chi Square for Treatment and Control Student Surveys
Figures
Figure 2.1 The Assessment Equation
Figure 2.2 Grade Ranges on Tiered GCSE
CHAPTER ONE -- INTRODUCTION
Statement of the Problem
Currently there is an outcry from the education community that learning in classrooms
across the United States is not taking place at its full potential. Students are not grasping the
material as they should. Many scholars argue that mastery of the curriculum is not the
problem; rather, it is the method of assessing that curriculum that is failing. Wood (2005)
says, “We have to establish assessments designed to reflect the variety of achievement targets
that underpin standards: mastery of content knowledge, the ability to use knowledge to
reason, demonstration of performance skills and product development capabilities” (p. 89).
The notion of information in, information out of these empty vessels called students is
outdated and not working. High-stakes testing has led many to conclude that summative
assessment has no place in today’s education arena; that it is a false indicator of just how
much a particular student really knows.
Education is in a transition period, and anyone who has been involved in the
education process during the last twenty-five years can clearly see it. However, students are
not ready to drop pencil and paper, break free from the classroom and teacher, and begin
personal pilgrimages searching for their own educational Zen. In the meantime, educators
push their students toward greatness using any strategy they can imagine that makes students
not just pass the curriculum but learn it. In doing so, with the abundance of differing
assessment methods flooding today’s education arena, educators need to know which one or
few will give a true depiction of just how much of the material a given student learned. With
alternate learning styles and differentiation here to stay, how do educators know if the
assessment style they use is the correct indicator of what their students are learning?
Significance of the Problem
Wood (2005) says, “The entire method of evaluating what our high school students
have learned is unique to the school setting itself. Nowhere else in our society, will one’s
worth or abilities be measured by a paper-and-pencil test of short-term memory” (p.84).
Statements like Wood’s are plentiful. Many are opinions and many have scientific backing.
Summative assessments under No Child Left Behind (NCLB) have not only been brought to
the forefront of discussion in virtually every school system in the United States over the last
eight years, but they have been, according to Cizek (2010), designated as an end-all, be-all
decision maker for whether a student graduates, transitions to the next level or course, or obtains
a license or credentials from a course. Now, in some areas, proposals are under consideration
to base teachers’ pay on students’ scores from these types of assessments. With so much
riding on these summative assessments, educators need to explore different methods of
assessment, so that promoting or retaining, rewarding or reprimanding a student actually
reflects whether that student learned the curriculum, understood the material, and mastered
the lessons. This thesis will explore
alternative summative assessments to investigate whether differing assessment styles will
have an impact on students’ grades.
Measuring a student’s mastery with the right assessment tool is vital to that student’s
overall success in education. Failing grades can cause some students to throw in the towel on
particular courses and even on finishing school altogether. Sprick (2002) echoes, “The way
you organize instructional content and evaluate student mastery of that content can play a
major role in whether students’ expectancy of success is high or low” (p. 27). Poor test
scores can give students a poor outlook on a class; behavior can deteriorate, snowballing
toward a poor outlook on school in general. This drowning in poor grades and test scores can
increase the dropout rate, thus affecting graduation rates (Sprick, 2002). Educators must find
a way to assess what students know without damaging their drive to succeed.
Theoretical and Conceptual Frameworks
Constructivism is a guiding philosophy adopted by the Education Department of
LaGrange College. In constructivism, a teacher or educator acts as a facilitator to education.
Students are charged with constructing their own education on their own terms, grounded in
their own backgrounds, with guidance from the teacher. This makes their education
something they own, not something borrowed from a lecturer and riddled with unfamiliar
terminology that means nothing to the pupil. Piaget describes constructivism as a method of teaching
where the student owns his or her knowledge and learning (Ackermann, 2001). Students
need to interpret what they are learning to make a personal connection with the material that
is being presented. An interaction must take place from what is being taught to life and the
world around the student. And finally, once personal connections are established, students
will seek more knowledge (Ackermann, 2001).
Tenet 1 of the LaGrange College Education Department’s [LCED] (2008) Conceptual
Framework calls for enthusiastic engagement in learning and is termed the “professional
knowledge tenet.” In this tenet, candidates (educators) will understand the concepts and
structure of a given discipline and use that to create learning experiences that make the
subject matter meaningful to students. Furthermore, educators will journey across the
curriculum with the subject matter, linking it to real world applications making it more
relevant to the student. Educators will employ a range of instructional tools and techniques
while meeting state, national, and professional association content standards. Lastly,
educators will understand their students from learning styles and developmental growth to
diversity and culture and how they along with outside influences affect student learning and
engagement. (LCED, 2008)
This study will align with LCED’s Conceptual Framework in every tenet and its core
philosophy. The study will be conducted in the area of mathematics and summative
assessment. The study will take place in an actual active classroom during the regular school
year in a Georgia high school. Keeping students’ culture and learning styles on the forefront,
differentiated instruction will be used to teach new material as well as to review old material. A
variety of methods of summative assessments will be used to gauge the progress of the
student subjects and participants used in the study. This study will follow the current
curriculum put forth by the state of Georgia and the county in which the study will take
place. This study will be testing the effectiveness of summative assessments in a high school
mathematics classroom; however, this study does involve real students during a real school
year. At no time will the education of these students suffer or be jeopardized for the
advancement of this study or its completion.
Tenet 2 of LCED’s Conceptual Framework states that educators will use professional
teaching practices while working with and preparing for students in the classroom. For this
study, backwards design will be used for instruction and assessment. Wiggins and McTighe
(1999) define backwards design as a process where “One starts with the end – the desired
results (goals or standards) –and then derives the curriculum from the evidence of learning
(performances) called for by the standard and the teaching needed to equip students to
perform” (p. 8). It is a process by which desired goals are determined, the assessment
method by which mastery of those goals will be measured is created, and then instruction
methods are derived, planned, and delivered to the students.
SSTs and IEPs will be considered, and modifications will be made as appropriate
during the planning, delivery, and assessment of the subject matter during this study. The
classroom management will be held to the highest standards, in terms of behavior plans, on-
task engagement, and educational integrity. Students will be given high quality hands-on
tasks suitable to the understanding of the curriculum and presented in such a way that is
respectful and applicable to students and their needs.
Tenet 3 of the Conceptual Framework calls for caring and supportive classrooms and
learning communities. This tenet requires the educator to be informed of a student’s
struggles during and away from school, and to take those struggles, along with the student’s
culture, into account during instruction, assessment, and remediation. The educator is
charged with adapting teaching methods to better fit the student’s background and learning
style.
This study will take place in an environment conducive to learning. Remediation and
support, group and individual, will be provided for those students who need it. The idea of
student learning and success will remain paramount throughout the study. This study will
mimic an everyday classroom in a real school. To be true to that aim, the study will be
carried out during the normal interactions of the school day, where multiple instances of
collaboration with other students, teachers, and even administrators take place. As
administrators of the school are aware of and have approved this study, collaboration will
continue and the findings of this study will be shared with them.
Focus Questions
This study will pinpoint the effectiveness of a tiered assessment model of summative
assessments in a single multi-ability math classroom by focusing on the following three
questions that will guide the research.
1. How can tiered assessments be infused into the curriculum?
2. What is the process by which tiered assessment effectiveness can be measured?
3. How do students respond attitudinally to tiered assessments?
Overview of Methodology
This study was completed using action research. Hendricks (2009) says action
research is a systematic, step-by-step approach that produces findings through structured
experimentation and ongoing reflection. The research process is not steered toward a
desired outcome: data are collected and evaluated, and conclusions are drawn. Hendricks
(2009) goes on to explain that action research is implemented in such a way that it is
ongoing, whether continued by the individual performing the research or by future
researchers.
The study was performed using two classes of the same subject matter in a high
school mathematics classroom. It consisted of approximately 50 high school juniors in
Georgia’s Math 3, an equivalent to Trigonometry, in the spring of 2011. Some of the
students had Individualized Education Programs (IEP) and others had Student Support
Teams (SST). Most were regular education students and the groups were quasi-randomly
selected based on the fact that they had me as their teacher and which period they were
placed in, neither of which I could control. For this study, the students were assessed over
the same material by alternative methods to see if a tiered assessment model presented a
more accurate account of a given student’s mastery of the curriculum. In order to do this, a
baseline needed to be determined. A pretest was given prior to any instruction in a given
unit. The same method of pretesting was given to all students in all classes participating in
the study. All classes were instructed for exactly the same amount of time and in the same
manner. The treated class was given the opportunity to choose one of three tiered summative
assessments aligned by difficulty to the GPS. The control class was not offered an option,
nor were they even aware there were multiple tests. Posttest scores were compared with
pretest scores to establish a baseline of natural learning and progress, and each group’s gains
were then compared against that baseline to determine whether the treatment showed a
different level of progress or decline. Finally, the posttest scores from each group were
compared to determine whether student-chosen tiered summative testing positively impacted
student outcomes.
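The baseline comparison described above can be sketched in code. This is only an illustrative sketch: the score lists are invented for demonstration and are not the study's data, and the simple mean-gain comparison is an assumption about how such a baseline might be computed.

```python
# Hypothetical pretest/posttest scores (percent correct) for illustration only.
control_pre  = [52, 61, 48, 70, 55, 63]
control_post = [68, 74, 60, 82, 70, 75]
treat_pre    = [50, 58, 47, 72, 54, 60]
treat_post   = [72, 80, 66, 90, 75, 78]

def mean_gain(pre, post):
    """Average pretest-to-posttest gain for one class."""
    return sum(b - a for a, b in zip(pre, post)) / len(pre)

# The control class's gain serves as the baseline of natural learning;
# the treatment class's gain is compared against it.
baseline = mean_gain(control_pre, control_post)
treatment = mean_gain(treat_pre, treat_post)

print(f"Control (baseline) mean gain: {baseline:.1f}")
print(f"Treatment mean gain:          {treatment:.1f}")
print(f"Difference: {treatment - baseline:.1f}")
```

With real class data, the difference between the two mean gains is what the final group comparison would examine.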
For qualitative data, two surveys were given to the study subjects in order to collect
data on which testing method they preferred and why. Finally, I, as the administrator of the
assessments and data collection, kept a reflective journal of my findings and observations
prior to, throughout, and after the completion of the study.
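Survey responses such as these are often compared with a chi-square test, the analysis named in Table 4.1. The following is a minimal sketch with hypothetical response counts, not the study's data; the three response categories and the counts are assumptions for illustration.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a 2-D contingency table."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical counts (rows: treatment, control; columns: agree, neutral, disagree).
observed = [[18, 5, 2],
            [9, 8, 8]]
stat = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {stat:.2f} with df = {df}")
# With df = 2, a statistic above the 5.99 critical value is significant at alpha = 0.05.
```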
In the planning stages, a unit plan and a rubric for critiquing the unit plan were
developed. The rubric was used by a highly qualified third party, otherwise unassociated
with the study, to ensure the validity and alignment of the curriculum for the study.
Human as Researcher
I have been teaching high school math at a Title One school in Georgia for 5 years. I
believe that some tests can be “beaten”; that is to say, multiple-choice test questions can be
answered correctly without the student actually knowing the answer. True/false tests are the
same in nature, a guess, with 50-50 odds that a student will get the right answer while having
no idea what the question is even asking. My greatest fear, because mathematics is a
discipline that builds from one course to the next, is whether my students will learn what
they need to know not only to reach the next course in the math track, but also whether I will
have given them the proper foundation to succeed at that next level.
The purpose of testing is for the teacher to measure how much of the taught and
sometimes background or foundational material is learned by the student. The test has to be a
true and accurate measurement tool; otherwise, the teacher will get a false impression of the
learning that took place during that particular unit. As a teacher, I have to trust the tests
(measuring tools) that I am using. Because I use tests and other forms of assessment to
monitor my students’ progress, I need to know which assessment will yield the best
correlation between what the students know and the scores on their tests.
CHAPTER TWO – REVIEW OF THE LITERATURE
This study was accomplished using the Constructivist philosophy of learning along
with an action research approach at its core, driving its execution, data collection, and
findings. Constructivism, as defined by Crotty (1998), is knowledge constructed, not
discovered, by individuals through experiences. Later, in 2004, Maclellan and Soden defined
Constructivism as “knowledge, not passively received from the world or from authoritative
sources, but constructed by individuals or groups making sense of their experimental
worlds” (p. 255). From these definitions one can deduce that Constructivism, at the heart of
its meaning, is an active process of learning in which the student is partly the teacher,
building on his or her frame of knowledge from his or her own experiences and
interpretations of those experiences. Simply put, it is akin to a child learning not to touch a
red-hot eye on a stove by touching it. The child, intrigued by the eye, actively decided to
touch it, and learned by being burned not to touch a red-hot stove eye in the future, because
the child experienced the effects of the touch and deduced that it hurt and might hurt again
next time. That child created, or constructed, his or her own knowledge by experiment: it is
definitely not a good idea to touch a red-hot eye on a stove. That, in essence, is
Constructivism.
The premise of Constructivism is that this form of learning advances the
meaning-making of knowledge gained by the learner or student if that knowledge is mostly
self-led and self-taught with the aid of an authoritative teacher or mentor (Yilmaz, 2008).
Yilmaz (2008) believes that a student who has learned under the philosophy of
Constructivism will make better use of what was learned, because the knowledge gained will
have a personal meaning and connection to the learner and will be personalized to fit the
specific needs of the student who learned the material. In other words, the knowledge will
have a larger impact on the student and, in turn, be internalized better, because the student
played a role in how the material was taught or delivered. If a student were going to devise a
research method for experimentation under the Constructivist philosophy, the best fit would
be action research. Because action research and Constructivism are allies at their cores, it
just fits that one cannot work without the other.
Differentiation
The notion of differentiation has reshaped education to conform to today’s student.
With the recognition of different learning styles and how they affect students’ learning
abilities, differentiation is a tool developed to reach students who do not perform well in a
traditional classroom setting. Within the realm of the classroom, a practitioner of
differentiation will attempt differing, even revolutionary, tactics and strategies to reach all
students with differing methods of instruction and assessment. Differentiating the delivery of
instruction and the methods of assignments and assessments is designed to reach each
student through his or her own learning style, so that learning can occur with the clutter of
how material was delivered or assessed removed (Tomlinson, 1995).
An example of differentiating an assessment for a student would be giving an oral
examination to an English as a Second Language [ESL] student. The ESL student may not
speak English fluently enough to take a traditional test, but knows the material well enough
to pass the test. Giving the student an alternative form of assessment in this case would
benefit the student and the teacher by providing the student a fair chance to show what he or
she knows, and the teacher will receive a more accurate picture of what that student knows.
Carol Tomlinson (1995) has blazed the trail of modern differentiation in today’s
schools as one of the most published experts on the subject. She defines it as, “At its most
basic level, differentiating means ‘shaking up’ what goes on in the classroom so that students
have multiple options for taking in information, making sense of ideas, and expressing what
they learn” (p. 3). She argues that differentiation is not only necessary in today’s classroom,
but vital for an increase in student outcomes. She says, “Acknowledging that students learn
at different speeds and that they differ widely in their ability to think abstractly or understand
complex ideas is like acknowledging that students at any given age are not the same height”
(1995, p. 2).
Differentiation is crucial here because in Tomlinson’s words, “teachers can create a
‘user-friendly’ environment, one in which they flexibly adapt pacing, approaches to learning,
and channels for expressing learning in response to their students’ differing needs” (1995, p.
2). Tomlinson believes that education has been and needs to be reformed, replacing
instruction and teaching of yesterday with differentiation. Because students are individually
different, they should be taught individually differently, as much as is possible within the
teacher’s resources. Tomlinson (2000a) maintains that differentiation is not just an
instructional strategy, nor is it a recipe for teaching, rather it is an innovative way of thinking
about teaching and learning. She echoes that notion in 1995, “Differentiated instruction is so
powerful because it focuses on concepts and principles instead of predominantly on facts” (p.
47). Students given alternative options for expressing what they know will show that the
intended concepts of the material were internalized, without the medium of assessment
getting in the way and causing mistranslation from student to teacher.
As differentiation gets a stronger hold on educators today, techniques expand and
become specialized. Differentiated instruction can be broken down into subgroups:
assessment, classroom instruction and delivery, curricula, classroom management, and
planning. Assessment, along with delivery, is at the forefront right now, especially for
students with special needs and ESL students. Most experts argue that when differentiation
in terms of planning and instruction occurs, assessment will naturally follow. Tomlinson,
Kaplan, Renzulli, Purcell, Leppien, Burns, Strickland, and Imbeau (2009) say, “An
assessment usually involves the demonstration of a behavior or product that results from the
student’s interaction with content” (p. 45). Tomlinson et al. (2009) go on to chart, in
Figure 2.1, how a student’s brain internalizes an assessment question, processes it, and
answers it. Differentiated planning and instruction lead to differentiated tasks and
assignments. From there, differentiated assessments complete the differentiated process.
Differentiating assessments in a differentiated classroom is a further attempt by the teacher
and curriculum planner to teach and assess toward students’ individual struggles and
readiness levels (Tomlinson, 1995).
Participant + Content + Task + Cognitive Processing = Assessment

Figure 2.1 The Assessment Equation
Assessments
Assessments in a traditional classroom setting are the culminating activity that shows
what students have learned throughout that testing period. Under differentiation, the
definition does not change as much as one might think. Tomlinson et al. (2009) define
assessments, with and without differentiation, as “assessments are tasks assigned to
students in order to determine the extent to which they have acquired the knowledge and/or
skills embedded within a performance standard or content goal” (p.44). This is true with any
assessment, whether it is a pre-assessment, informative progress assessment, an intermediate
formative assessment, or a summative or culminating assessment. Tomlinson et al. (2009)
continue in terms of summative assessments, “More specifically, summative assessments
help teachers understand who has mastered content and skills objectives at a designated
‘ending point’ of instruction” (p. 45).
As defined earlier, assessments are supposed to be clear measurement tools that
illustrate to the teacher what the test taker knows about a given topic or list of topics. With
a better understanding of the way students learn today, several problems with traditional
assessment methods have arisen hindering the ability of the teacher to receive an accurate
picture of what that student actually knows. One problem that is widely overlooked is a
student’s interest in the test method itself. The assessment needs to be something tangible
that the student can visualize before instruction begins. The student needs to feel that he or
she has a personal stake in the assessment whether he or she actually does or not. Wormeli
(2006) reports, “Students are likely to do the homework assignment if they have a clear
picture of the finished product. If the assessment is fuzzy, they won’t” (p. 22). If the student
has no concept of accomplishment, or at least an initial grasp of what success looks like, at
the beginning of the unit, then most likely the student will not perform well on the
culminating assessment. One of the most common assessment practices, especially in a math
classroom, is for the teacher to secretly choose test questions based on examples worked in
class. This secrecy is sometimes so well guarded that teachers will not even allow
colleagues to view their tests. Wormeli (2006) says this is hindering student learning,
“Nothing in the post-school world is kept a secret, so we shouldn’t play games with students,
coyly declaring that we maintain the right to choose anything we want from the chapter text
when they ask what’s on the test” (p.22).
Giving students a choice in their assessment method can add confidence to the
student that the assignment can be completed successfully. That feeling alone can increase
student outcomes. Without alternative assessments, this achievement can never be tapped. A
study by Scouller (1998), examining students’ performance outcomes based solely on their
preferred assessment method, found the following: “When performance outcome
was analyzed in terms of preference for assessment method, highly significant differences
were found between the two groups in terms of their assignment essay marks. Those, who
preferred assignment essays as the assessment method, were significantly more likely to be
successful in their assignment” (p. 465).
Assessments are supposed to measure knowledge. Most traditional tests are built to
test for facts and memorization of those facts. Just knowing those facts is not enough.
Today’s student needs to be able to decide, process, and infer information, and
knowledge-based tests are just not measuring that. Schwartz and Arena (2009) argue, “Knowledge
assessments are inherently retrospective, but past knowledge is a small slice of what matters.
Current knowledge assessments miss critical factors relevant to learning that include
motivations to learn, responses to feedback and change, tacit understandings, and abilities to
learn when no longer being told what to do” (p. 12).
Student Outcomes
In August of 2010, Wheadon and Beguin published an article in Assessment in
Education: Principles, Policy & Practice testing the notion of multi-stage tiered tests,
investigating whether tiering a test using Item Response Theory [IRT] would increase
student outcomes for learning in the British high school standards called the General
Certificate of Secondary Education [GCSE], the British equivalent of a high school diploma
in the United States. The experiment grouped students of like abilities and labeled them by
grade, A* through G as the passing grades and a failing grade simply labeled ‘fail’. In the
model, A* is the highest-achieving student and G is the lowest yet still passing student, or in
this case, groups of students. This label is not the score on the test; instead it is the student’s
cumulative type of performance, much as an average-ability student is labeled a ‘C’ student
and the highest-achieving student an ‘A’ student.
The model from the article is depicted in Figure 2.2.

Figure 2.2: Grade ranges available on tiered GCSE papers (highest to lowest: A*, A, B, C, D, E, F, G, fail).

Since the experiment was testing two levels of tests, and groups C, D, and E took both
versions of the test, the study only discusses the outcomes of those groups. These groups
took two versions of the same test, one more difficult than the other, but both covering the
same standards. The standards set forth by the GCSE were not compromised during the
study. Wheadon and Beguin found that in the treatment group, 8% of the C-level test takers
failed to achieve a grade higher than C on the higher-tier test,
and 4% failed to achieve a grade higher than C on the lower leveled test. That implies a 4%
pass/fail difference for the C students than if the group took only the higher leveled test, or if
tiered testing was not offered (Wheadon & Beguin, 2010). The D and E student groups also
showed an increase within the tiered system. According to Wheadon and Beguin (2010),
16% of the students who would have made a D on the higher-level test made a
C on the lower-level test. Similarly, the same proportion of students who would have made an E on the
higher-level test made a D on the lower one. That is about one in six students
in the D group and one in six in the E group; in addition, one in twenty-five students
in the C group benefited from the tiered system of tests.
In terms of fairness, Wheadon and Beguin (2010) note that a maximum grade of C was
placed on the lower-level test. With that cap in place, they found that 25% of
the students in the C group had their grades limited by the maximum-grade rule: those
students scored in the B range, but because they took the lower-level test their final grade
was recorded as a C. That is an additional one in four students who showed improved
outcomes under the tiered system. Wheadon and Beguin (2010) warn that for a tiered system
to be successful and relevant, the standards cannot be fluid; they must not be altered in any
way. They caution that it is easy to alter the tests in ways that diminish the standards and
urge practitioners of tiered testing to guard against that.
A similar study occurred in Australia in 2004, aimed at eight- and nine-year-old
swimmers as part of a physical education class. Whipp (2004) argues, “Readiness
gaps were seen to negatively impact on a student’s level of concentration, involvement,
potency, achievement, motivation and self-worth” (p. 4). The study consisted of twenty-eight
students in three different physical education classes in the Perth area of Western Australia.
The students were given swimming tasks to complete based on readiness of the individual
student and that student’s potential for growth. Each instructor chose the task for the
individual students without the student’s input based on teacher observations and past
performance. Whipp (2004) “believed the low ability swimmers improved their swimming,
and he thought that some of the middle ability girls also improved. [He] conceded a failure to
extend the higher ability swimmers, thoughts echoed by the students with 58.9% agreeing
that their swimming had improved” (p. 10). He continued to explain that the low and
middle ability students showed an increased interest in completing the tasks set before them
and improving on their previous scores. This information was obtained through a non-participant
observer and through student surveys. Whipp (2004) concludes by explaining
that the highest-level students' improvement was too small to measure. He did not expect
the highest ability students to improve much because those students were already performing
at the highest possible levels, and their range for improvement was small, especially given
the nature of assessment in physical education, where preparation for examination is vastly
different from that of a regular cognitive course of study.
In July of 1997, Herman, Klein, and Wakai studied student attitudes towards
alternative assessments. The study began in 1993 among 13 schools and over 800 eighth
grade students in California. The alternative assessments were designed to encourage critical
thinking and performance. The traditional test was a state mandated multiple-choice test.
The research group showed that 14% of students performed better on the alternative
assessment than the traditional assessment. However, 67% of the students preferred the
multiple-choice test method to the alternative method. In terms of alternative assessments, “students
try harder on these items; and they recognize that open-ended items require them to think
harder, explain their thinking, and communicate their understanding of mathematical
knowledge” (p. 16). Herman et al. (1997) explain the students’ perceptions of multiple-choice
questions: “students express a preference for multiple-choice items. They find
multiple-choice items easier to understand and believe that they perform better on such
items” (p. 16).
Teachers’ Reaction
Tomlinson (2000b) said, “What we call differentiation is not a recipe for teaching. It
is not an instructional strategy. It is not what a teacher does when he or she has time” (p. 1).
She stresses that many educators struggle with differentiation simply because they do not
know what it is, do not have time, or face some other obstacle that hinders their
execution of the philosophy. She continues by explaining that another factor preventing
teachers from implementing differentiation in the classroom is standards-based learning.
Teachers have been pushed by local administrators, curriculum developers, and state
standards to teach exclusively toward high-stakes tests. These tests are almost exclusively
multiple-choice assessments that test students for trivia-type knowledge, rewarding the recall
of facts rather than the processing of information into inferences; the latter is what
differentiation in the classroom promotes. Soloman (1998) reports, “unfortunately, a multiple choice test is out of sync with
the more constructive demand of real life” (p. 110). She continues by explaining that
multiple-choice tests are easy to standardize, norm, and validate, which is why states use
them to measure the learning of such standards. They are also easier and quicker to grade.
Oberg (2009) says, “Finding adequate and appropriate assessments is a constant
challenge for teachers. Purpose, time, results, and how results will be used contribute to
determine the type of assessment that best fit teachers’ needs” (p. 3). Teachers find it
difficult to locate assessments that meet standards pushing all students to be alike
while at the same time assessing students' individual needs and abilities.
Teachers who embrace differentiation in the classroom follow its natural flow into
differentiated assessment. If a teacher has freedom from authorities and time to
prepare, he or she can become an effective practitioner of differentiation (Tomlinson, 2000b).
Tomlinson (2000b) quotes one teacher: “I feel as if I'm a better teacher. I understand
what I'm teaching better, and I certainly have come to understand the students I teach more
fully. I no longer see my curriculum as a list to be covered” (p. 6). Differentiation comes
from collaboration among the school board, the teacher, the parent, and the student; without
any one of those, differentiation loses its effectiveness. If differentiation is in regular practice
in the classroom, assessment will naturally be performance based and differentiated.
Tomlinson sums it all up by saying,
Teaching is hard. Teaching well is fiercely so. Confronted by too many
students, a schedule without breaks, a pile of papers that regenerates daily,
and incessant demands from every educational stakeholder, no wonder we
become habitual and standardized in our practices. Not only do we have
no time to question why we do what we do, but we also experience the
discomfort of change when we do ask the knotty questions. (2000b, p. 7)
In 2005, Watt conducted a study on teachers and the use of alternative assessment.
She studied three math classes in schools in New South Wales and Sydney, Australia. The
purpose of the study was to examine teachers' acceptance of alternative methods of
assessment. She found that teachers on the whole are beginning to embrace alternative
forms of assessment: 71% of the teachers studied were using some form of alternative
assessment in the classroom. However, 68% of the teachers with more than three years’
experience regarded the assessment method poorly. The number one reason given
for the lack of acceptance was time to plan. Teachers felt that creating and implementing
alternative measures was time consuming, and for an already time starved curriculum,
alternative assessments were not feasible (Watt, 2005). The next reason for teachers’ poor
acceptance of alternative assessment was that, in the opinions of the seasoned teachers, the
grading of the assessments was unstructured; they also felt there was little room to
make an alternative assessment ‘fit’ most lessons. Overall, seasoned teachers saw no
reason to change from traditional ways of assessing. That reluctance concerned only
assessment, not alternative instruction and classroom procedures. They felt that their
traditional assessments did not need to be overhauled. Watt (2005) summarizes, “Teachers
were relatively satisfied with traditional mathematics tests as a measure of students’
mathematical ability” (p. 28).
Conversely, Watt (2005) explains, teachers with three years or less of teaching
experience were more eager to embrace alternative assessments and showed more
enthusiasm for planning and implementing the assessments. They had the same
complaint, that creating alternative assessments was very time consuming, but they felt it
was worth the effort. She explains the thinking of newer teachers by noting that
alternative assessments and instruction were part of their college curriculum, so newer
teachers received their training with the philosophy of alternative methods already
embedded. The culture of newer teachers has differentiation and alternative methods among
its foundations, so the move away from tradition meets less resistance.
The significance of the problem points to a need for reform in assessment. With the
strong hold differentiation has had, and continues to have, on teaching structure and
techniques, assessment is naturally affected by these new trends in education. Countless
experts have emerged urging educators when to test, how to test, and even what to test. To
complicate matters further, according to Linn (1998), high-stakes testing has
become the determining factor in whether students are promoted or retained, so these
methods of assessment have the education world’s attention right now. This study
examines the effectiveness of a multi-leveled, or tiered, assessment model in a regular
education, total inclusion, high school math classroom.
CHAPTER THREE - METHODOLOGY
Research Design
Hendricks (2009) says, “Educational research is conducted to advance our
understanding of a variety of issues…” (p. 1). She continues to explain that in education,
research is used to develop theory, test hypotheses, study relationships among variables,
describe educational phenomena, and determine whether actions are based on results. In its
infancy, action research was described by Kurt Lewin in the 1930s as “a spiraling process
that included reflection and inquiry on the part of its stakeholders for the purposes of
improving work environments and dealing with social problems” (Burns, 1999). This
definition originated in the context of Lewin's work; he was charged with improving the
production of factory workers as he studied them in the workplace (Burns, 1999). Lewin's
research was based on conducting an experiment with real workers in a real
environment in real time. This concept was new in those days, as most research of the time
was done by thinkers theorizing about outcomes based on intellectual perceptions. As time
passed and action research evolved, it moved into the classroom, but its essence remains,
actively doing an experiment or study with real students in the actual classroom in real time,
interpreting real results. Hendricks (2009) argues, “The purpose of action research is for
practitioners to investigate and improve their practices” (p. 3). She wants practicing
teachers to study how the teaching process can be improved for the purpose of producing
better educated students. This research is done by the teacher as a self-study, so the teacher
can take the findings and improve his or her future teaching in the classroom (Hendricks, 2009).
This study was conducted in a high school math classroom in January of 2011 in the
metro-Atlanta area of Georgia. The students involved were regular education and inclusion
special education students of mixed abilities in a regular education Math 3 classroom. Math 3 is
Georgia’s equivalent to Advanced Algebra, or post-Algebra 2. I was the teacher for the
classroom, and I wanted to test tiered assessments for the mixed-ability students in my class
because I was looking for a better way to assess my students within the philosophy of
differentiation. The study was supervised by the Education Department at LaGrange College
in LaGrange, Georgia, with permission from my county’s school board and my school’s
principal. To protect the integrity of the study and the interests of the students involved,
Institutional Review Board [IRB] approval was also obtained in order to conduct this study.
Setting
The study took place in a metropolitan Atlanta suburban high school in the spring of
2011. The school is located in the county seat and is deeply rooted in the town’s culture. The
school housed 2,200 students, 16.4% of whom were special education students in a total
inclusion environment. The school makeup was 28.8% black, 0.7% Hispanic, and 0.4%
Asian, according to the U.S. Census Bureau's 2000 Census and the National Center for
Education Statistics, U.S. Department of Education. Additionally, the school was 49% male
and 36.3% economically disadvantaged. The school was a Title 1 distinguished school and had made
Adequate Yearly Progress [AYP] every year from its inception. The school had a 75.5%
graduation rate and a 17.4 student to teacher ratio.
Subjects
For the spring semester in 2011, I taught two Math 3 classes. The students in each
class, sixteen to seventeen years of age, were mixed-ability students in a traditional regular
education classroom. The sophomores and juniors had some mainstream special education
students mixed in each class. Those students were usually part of regular education
classrooms; as my school system practices total inclusion for special education students and
has for some time. The students, referred to as subjects, did not know they were being studied.
The subjects were not chosen at random but were grouped by their class schedules. One
Math 3 class served as the untreated group and the other as the treated group: I designated
my first block class the treated class and my second block class the untreated class. That
designation was made blindly, before I had even seen the rosters for the upcoming semester.
Because I tested one entire class against another entire class, subgroups were not necessary
for this study. The untreated group consisted of 21 students: 17 regular education students,
3 mainstream special education students, and 1 gifted student. The treated group consisted
of 28 students: 23 regular education students, 2 mainstream special education students, and
3 gifted students.
Procedures and Data Collection Methods
Because my study deals with real grades, and in the interest of fairness and protection
for my students, an agreement was made between my principal and me that any student
involved in this one-unit study could retake his or her test, at his or her discretion, after the
study. This was to ensure that all students of the school had the same opportunity for success
and that no student was given an advantage over another. That is not to imply that the
assessments used in the study are not aligned to the Georgia Performance Standards [GPS]. I
offered a tiered assessment for one group and no tiering for the other. My principal felt that
some students in the non-tiered class may benefit from the tiered assessment, and he wanted
the opportunity to be available without tarnishing the study. The scores on the retake
examination were not used for the study as it may have caused the data to be skewed in one
direction or another. Furthermore, the students were not aware of the retesting possibility at
the time of their assessment, so as not to compromise the integrity of the original assessment.
They thought that the study’s assessment was their one and only summative grade for that
unit until after the conclusion of data collection.
This study examined student outcomes in a tiered assessment system on a unit in an
upper level mathematics classroom from start to finish. This study was conducted using
action research. Action research was the best type of research for this particular study
because the research was actually done in a real world setting with real outcomes. Because I
am the researcher and I am a practicing teacher trying to improve my methods, the study fits
an action research model best. Hendricks (2009) says that Classroom Action Research is a
form of action research used by active teachers in their classrooms to hone their skills. The
results will have an impact on the future of my methods of practice inside the classroom. I
was the administrator of the study, and data were collected and observations were recorded
by me in real time as events happened. This method provided the best detail about the
feelings of the subjects and the researcher, since observations were interpreted and recorded
at the time they were made. Hendricks
(2009) says, “Observational data are the most important source of information in an action
research study” (p. 90). It is crucial that the correct interpretation of the observations is
recorded as soon as possible to protect the validity of the information retrieved. The study
was guided using a data shell, a table containing the focus questions and a data collection
summary (see Table 3.1).
Table 3.1. Data Shell

Focus Question 1: How can tiered assessments be infused into the curriculum?
    Literature sources: Wormeli, R.; Scouller, K.; Tomlinson, C.
    Data sources: 1) rubric from unit plan; 2) archival; 3) instructional plan.
    Why these data answer the question: 1) peer reviewed for validity; 2) content validity for the study; 3) implementation of treatment.
    How data are analyzed: qualitatively, coded for themes aligned with the focus questions.

Focus Question 2: What is the process by which tiered assessment effectiveness can be measured?
    Literature sources: Wheadon, C. & Beguin, A.; Oberg, C.; Tomlinson, C.
    Data sources: test scores obtained 1) pre-pre between the treated and untreated classes; 2) pre-post in the treated class; 3) post-post between classes.
    Why these data answer the question: 1) scores will or will not show a significant difference between the treated and untreated classes; 2) scores will or will not show increased student outcomes; 3) scores will or will not show significant gains in the treated class.
    How data are analyzed: quantitatively, 1) pre-pre: independent t-test with unequal variances; 2) pre-post: dependent t-test; 3) post-post: independent t-test with unequal variances.

Focus Question 3: How do students respond attitudinally to tiered assessments?
    Literature sources: Watt, H.; Herman, J., Klein, C. & Wakai, S.; Tomlinson, C.
    Data sources: 1) subject survey; 2) reflective journal; 3) subject survey.
    Why these data answer the question: 1) subjects will take a survey on which level of test they felt would best express their understanding of the material; 2) I will record observations about the tiered tests that I find significant to the process; 3) an additional survey will be conducted pre-treatment in order to draw conclusions on which types of students choose which test.
    How data are analyzed: 1) quantitatively, chi-square on the survey with descriptive statistics; qualitatively, 2) and 3) coded for themes aligned with the focus questions.
As the study embarked, a pretest was given to both groups of students. One group
served as a control group, separated from the treated group only by class membership. The
construction of the two groups was semi-random: the groups were formed by the students'
schedules, and I had neither control over who was in each group nor prior knowledge of who
was in each class. The pretest was intended to determine whether the two groups
showed significant differences in test scores prior to treatment.
After the pretest, instruction for the unit was exactly the same for both groups. The
students were introduced to topics in a given unit of Math 3. The students were assigned
classwork and homework, quizzed, remediated, and
lectured in exactly the same manner for the purposes of the study. The one difference
between the groups was the treated group was aware from the day after the pretest before any
instruction took place that they would have a choice of the summative assessment in terms of
level of difficulty. The process was explained to them along with how the assessment would
be administered and what level each assessment aligned with the Georgia High School
Graduation Test [GHSGT]. In addition, the score values for the three levels of assessments
were explained.
The explanation went as follows: at the end of the unit, each student would have a
choice of assessment, Meets, Exceeds, or Excels. The Meets test is aligned with a Meets
Standards score of 500 on the GHSGT, the minimum passing score. In addition, the Meets
test carries a maximum score of 80% in the classroom. Students who choose this test can
score no higher, as its questions test for only a
basic knowledge of the concepts from the unit. The Exceeds test is aligned with the GHSGT
with an Exceeds Standards score of 516. This test expects a higher level of learning from the
student and awards appropriately. This test carries a maximum score of 100% in the
classroom. This score is representative of a traditional ‘B’ student, above average but not
tops in the class. The Excels test is the final level in the tier, assessing students at the
highest level of learning. Students who choose this test are expected to have excelled in
every concept from the unit. This test has difficult questions aligned above the standards set
forth by the GHSGT. Some of the questions on this test were not covered in class, as they
involve concepts reached through transfer of knowledge or interpretation of the concepts
presented in class. Students will have to make inferences and predictions using the concepts taught. A
maximum grade of 110% is allowed for this test. The students were also informed that the
homework is based on the Meets test, quizzes were based on the Exceeds test, and the
delivery of instruction without the inferences and predictions was based on the Excels test.
The untreated group took only the Exceeds test with a maximum score of 100%.
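The scoring rules above amount to a small lookup table, sketched below in code. Only the tier names, GHSGT alignment scores, and classroom score caps come from the text; the data structure and the `classroom_score` function are hypothetical illustrations, not the instrument used in the study.

```python
# Hypothetical sketch of the tier rules described above. The tier names,
# GHSGT alignments, and classroom score caps are taken from the text; the
# data structure and function names are illustrative only.

TIERS = {
    "Meets":   {"ghsgt_alignment": 500,  "max_score": 80},   # minimum passing GHSGT score
    "Exceeds": {"ghsgt_alignment": 516,  "max_score": 100},  # Exceeds Standards score
    "Excels":  {"ghsgt_alignment": None, "max_score": 110},  # aligned above GHSGT standards
}

def classroom_score(tier: str, raw_percent: float) -> float:
    """Cap a raw test percentage at the maximum allowed for the chosen tier."""
    return min(raw_percent, TIERS[tier]["max_score"])
```

Under this sketch, a student who chose the Meets test could answer every question correctly and still record no more than 80%, which is the ceiling the text describes for that tier.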
Instruction, assignments, quizzes, and remediation proceeded identically for both
classes throughout the unit. If one class spent twenty minutes on a quiz, then the other class
spent twenty minutes on the same quiz. The unit test review was also delivered in exactly the
same way. As day-to-day work was conducted, observational notes were taken on both
groups. Student attitudes were watched closely here, since one group knew it had a choice of
assessment and the other had no choice.
On test day, after questions were taken from students seeking to understand the
process of tiered assessments, each student in the treated group informed me one by one
which test he or she would take. The untreated group was simply given the test; they took it
and turned it in. The treated group was given the test that each student had individually and
anonymously chosen. I wanted the test choice kept secret to eliminate peer pressure. The tests were
graded and the scores recorded. Later, the scores from the untreated group were compared to
the scores of the treated group looking for a significant difference. The pretest scores from
each group were compared to the posttest scores respectively looking for significant gains in
student outcomes from pretest to posttest. I expected gains because teaching occurred
between the two tests, but if the treated group’s scores showed a greater increase than the
non-treated group’s scores, then the validity of a tiered system could be argued.
Validity, Reliability, Dependability, and Bias
Popham (2011) defines validity as “not simply a synonym for test-related goodness.
Rather, validity refers to the accuracy of test-based inferences” (p. 437). Popham (2011) feels
that validity of a study is the most significant concept of the study, and if a study is not valid
then its findings are not valid. Steps were taken during this study to ensure that the
inferences drawn from the findings are valid.
Focus question one queries, how can tiered assessments be infused into the
curriculum? First, an instructional plan (see Appendix A) was developed and peer-reviewed
by the school’s Title 1 math coach using a designed rubric (see Appendix B) to ensure
alignment of instruction and, more importantly, the assessments used to Georgia’s
educational standards and for validity. Once the plan was in place and aligned with the
content and curriculum, research was gathered focusing on other scholars that had attempted
similar studies. These studies were compared and used as a guide for this study. Similar
studies include Watt (2005), Whipp (2004), and Wheadon and Beguin (2010). These
scholars’ works can be found in the reference section of this thesis, and specifics of each of
these studies, along with others not mentioned here, can be found in Chapter Two.
Content validity, as Popham (2011) defines, “refers to the adequacy with which the
content of the test represents the content of the curricular aim” (p. 89), was ensured in that all
lessons, practice questions, notes, instruction, and even quizzes were identical for both
groups, treated and untreated throughout the study. That was also reinforced by the math
coach’s critique of the instructional plan as explained earlier. Because this study was
focusing on the summative assessment, there was no reason to alter the instruction and day to
day teaching and exercises of the students between the groups. Because of this, both groups
were assigned exactly the same practice exercises and quizzes throughout the study. In
addition, both groups were presented the same notes and instruction throughout the execution
of the study. This was enabled by the use of PowerPoint presentations as notes and
instructional guides to ensure consistency. This also serves as a tool for the dependability of
the study.
Golafshani (2003) illustrates that dependability of qualitative data is akin to the
reliability of quantitative data. Since focus question one was measured with qualitative data,
reliability will not be discussed here; rather, dependability was the goal for focus
question one. In addition to the example of dependability at the end of the previous
paragraph, dependability was also maintained through the math coach’s critique of the
instructional plan and her alignment of the assessments with the Georgia education standards
and with each other. Popham (2011) explains the importance of the reliability/dependability of
alternate forms of assessment when comparing student outcomes from two distinct groups.
As for bias in the content portion of this study, there is inherently minimal risk of
bias infecting the study because of how the content was presented and scored. The
content and curriculum, having been aligned with the state’s curriculum, left me little room
to alter the lessons taught. This keeps much of the bias, at least as far as content is
concerned, out of the picture.
Focus question two asks, what is the process by which tiered assessment effectiveness
can be measured? These data were collected using the subjects' pretest and posttest scores
for the unit in the study. The posttest refers to the tiered assessment option for the treated
group and the non-tiered option for the control group. Those scores were used in multiple
quantitative tests that are discussed in further detail in Chapter Four. These methods are
strong, as they are time-proven statistical forms of comparison, and as Salkind (2010) states, “tools developed specifically to
understand the world around us” (p. 9). Popham (2011) describes this type of validity as
Criterion Validity, using measurements between two groups as a basis of a predictive
inference. In addition, care was taken in the grading process to ensure that the first
assessment was scored in the same manner and under the same scrutiny as the last. If one set
of assessments were scored more harshly than another set, validation of the findings would
be questionable.
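The three score comparisons named above could be carried out as in the following sketch, which uses SciPy. The score arrays are fabricated placeholders standing in for the real pretest and posttest scores; they are not the study's data, and the group means chosen here are arbitrary.

```python
# Illustrative sketch of the three score comparisons described above, using
# SciPy. The score arrays are fabricated placeholders, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treated_pre  = rng.normal(60, 10, 28)   # treated class pretest, n = 28
control_pre  = rng.normal(60, 10, 21)   # control class pretest, n = 21
treated_post = treated_pre + rng.normal(15, 5, 28)
control_post = control_pre + rng.normal(10, 5, 21)

# 1) Pre-pre: independent t-test with unequal variances (Welch's t-test)
t1, p1 = stats.ttest_ind(treated_pre, control_pre, equal_var=False)

# 2) Pre-post within the treated class: dependent (paired) t-test
t2, p2 = stats.ttest_rel(treated_pre, treated_post)

# 3) Post-post between classes: Welch's t-test again
t3, p3 = stats.ttest_ind(treated_post, control_post, equal_var=False)

# Test/retest correlation within the treated class (reliability check)
r, _ = stats.pearsonr(treated_pre, treated_post)
```

A significant pre-pre result would mean the groups differed before treatment; a non-significant one supports comparing them directly on the posttest, which is the logic the data shell describes.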
Reliability was shown by correlations: test/retest reliability within each group. This
gave me a clear picture of where each group stood before and after the study. A correlation
here can show the progress of a particular group, and inferences can be made from the
correlation. Next, parallel-forms reliability is illustrated by correlating the control group
with the treated group; inferences can be made from those correlations as well
(Salkind, 2010).
Students’ identities were unknown during scoring to prevent the so-called “halo
effect,” the overlooking of errors by certain students based on the expectation that those
students had answered the question correctly (Nisbett & Wilson, 1977). The study was also
done at the beginning of the semester, before the scorer had the opportunity to learn which
students or groups would stand out or lag; this also aided in preventing the halo effect.
Focus question three probes, how do students respond attitudinally to tiered
assessments? The study was kept secret from its subjects so as not to taint the students’
efforts. If they had thought they were part of a study and that their grades on the assessments
might not count as real grades, they might not have given the assessment their best effort,
thus invalidating the findings for those students. The students were unaware a study was
taking place for that reason.
Data for this focus question were collected from surveys given to the students in the
treated group and control group. The surveys for each group were not the same. These data
were converted into numbers, and a chi-square quantitative analysis was performed. This
form of data collection concerned me because the students might have had no interest
in the study. Rogelberg, Fisher, Maynard, Hakel, and Horvath (2001) warn of making
surveys mandatory, arguing that the responses given may be invalid due to the respondents
being forced to participate in the survey itself. Because the subjects had no idea they were
being studied, they may not have taken the surveys seriously. They also may not have
thought through their responses before answering. Since this was a concern prior to
the distribution of the surveys, the subjects were prompted that the information gathered was
important and needed to be taken seriously. They were also told that the survey was not required and
only the subjects who intended to answer the survey seriously should participate. In addition,
the subjects were urged to think thoroughly about the question before answering each
question. Not all surveys distributed were returned, but more than enough were completed to
infer from the data. For criterion validity, a chi square test was calculated to
determine significance. Cronbach’s Alpha showed internal consistency on the surveys,
demonstrating the reliability of the data collected (Salkind, 2010).
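The internal-consistency check mentioned above can be illustrated with a short computation of Cronbach's Alpha. The Likert responses below are hypothetical placeholders, not the study's survey data, and this is only a minimal sketch of the statistic, not the software actually used in the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's Alpha for rows of respondents, one column per survey item."""
    n_items = len(scores[0])
    item_vars = [variance(col) for col in zip(*scores)]   # sample variance of each item
    total_var = variance([sum(row) for row in scores])    # variance of respondents' totals
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical Likert responses (5 respondents x 3 items), not the study's data
responses = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 4, 3], [1, 2, 2]]
alpha = cronbach_alpha(responses)
```

Alpha near 1 indicates high internal consistency; values such as the 0.44 and 0.19 reported later in this thesis would indicate weaker consistency.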
The qualitative portion of this focus question was in the form of a reflective journal
kept by me to record day-to-day observations of the treated and control groups. The
journal was written daily, just after each group’s departure from class, and the observations
were kept consistent by using the same writing prompts each day (see Appendix C). This
consistency illustrates the dependability of the journal data.
Bias for the third focus question centers on the researcher. As I indicated in
Chapter One, I had some knowledge of tiered assessing and had formed some opinions
prior to this study. However, the writing prompts used for the journal were adhered to
throughout its writing to keep my personal feelings from getting in the way.
For fairness, the negative aspects of tiered assessment were researched. Given the
limited research on this aspect of tiered assessments, Oberg (2009) and
Wheaton and Beguin (2010) reported findings that I would consider negative outcomes.
Oberg (2009) reported poor teacher attitudes toward tiered assessments, as teachers felt they
lacked the time to prepare for such endeavors; the process does take an exorbitant amount
of planning time to accomplish. Wheaton and Beguin (2010) reported that only certain
students were given a tiered assessment, and that assessment was chosen by the instructor,
not the student. This study, by contrast, concentrated on tiered assessments with a student choice.
Popham (2011) defines offensiveness as, “[something] that contains elements that
would insult any group of students on the basis of their personal characteristics, such as
religion or race” (p. 503). This study had no elements of offensiveness: the instruction was
aligned with the Georgia Performance Standards [GPS], and both the instructional
techniques and the assessments used were peer reviewed and tested for offensiveness.
Popham (2011) also explains that if one group of students’ scores is decidedly different
from that of the rest of the test takers, then disparate impact has occurred. The groups of
students can be socioeconomic, religious, cultural, racial, or gender based. The student
outcomes from this study yielded no disparate impact, as no group or subgroup of students
showed markedly different scores on the assessments.
Analysis of Data
How can tiered assessments be infused into the curriculum? Focus question one
examines the pedagogy of tiered assessments. The data collected for this question were
analyzed qualitatively and coded for themes. An instructional plan was developed and
implemented. The plan was designed for dependability and consistency as a guide throughout
the study, ensuring that both groups received the same treatment throughout except for the
treatment itself. This reduced the variables and margin of error so that the results had merit.
A rubric of that plan was peer reviewed for validity and to ensure that the methods adhered
to the curriculum and content of the course. The rubric was also examined for fairness and
to ensure that no unintended or unwanted variables or byproducts
arose. Next, archival data were collected by examining the methods of other scholars. This
information was used to structure the study in a professional, research-oriented manner.
The data were coded for themes, looking for common threads and consistency. Portions
of other scholars’ works, cited in Chapter Two, were used to fine-tune this study to ensure
validity.
What is the process by which tiered assessment effectiveness can be measured?
This second focus question deals specifically with the scores of the subjects’ assessments.
This is the essence of the study, as it focuses on test scores and how to improve them. The
data for this portion are the actual assessment scores from each group. First, with a null
hypothesis that there is no significant difference between the scores of each group, an
independent t-test with unequal variances at the P < 0.05 significance level was done on the
pretest of the control and the pretest of the treated group to determine if there were
significant differences between each group. Next, with a null hypothesis that there is no
significant difference between the pretest versus the posttest scores within each group, a
dependent t-test at a P < 0.05 significance level was done. This was to account for the normal
learning curve that took place between pre and posttests. Third, with a null hypothesis that
there is no significant difference between the scores of each group, an independent t-test with
unequal variances at a P < 0.05 significance level was done on the posttests between each
group to determine if there were significant differences between them. The effect size for
each analysis was also calculated.
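The testing sequence described above can be sketched with scipy's t-test routines. The score lists below are hypothetical stand-ins, not the study's data, and scipy is an assumed tool; the thesis does not state which software performed its calculations.

```python
from scipy import stats

# Hypothetical score lists standing in for the study's real data
control_pre  = [12, 0, 20, 35, 10, 72, 8]
control_post = [55, 35, 60, 80, 71, 95, 62]
treated_pre  = [18, 6, 25, 45, 12, 30, 20, 22]
treated_post = [72, 52, 75, 95, 70, 88, 73, 76]

# 1. Independent t-test with unequal variances (Welch) on the two pretests
t_pre, p_pre = stats.ttest_ind(treated_pre, control_pre, equal_var=False)

# 2. Dependent (paired) t-tests, pretest vs. posttest within each group
t_ctrl, p_ctrl = stats.ttest_rel(control_pre, control_post)
t_trt,  p_trt  = stats.ttest_rel(treated_pre, treated_post)

# 3. Independent t-test with unequal variances on the two posttests
t_post, p_post = stats.ttest_ind(treated_post, control_post, equal_var=False)
```

The `equal_var=False` argument requests the unequal-variances (Welch) form used in the study's independent tests, while `ttest_rel` pairs each student's pretest with his or her posttest.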
How do students respond attitudinally to tiered assessments? Focus question three
questioned the attitudes of the students and me as the researcher. A reflective journal was
kept throughout the study guided by writing prompts. The journal was coded for themes
looking for categorical, repeating data that formed patterns of behavior. One survey
was given only to the treated group (see Appendix D) while another survey was given to both
groups (see Appendix E). Cronbach’s Alpha was done on the results from each survey for
internal consistency reliability. A Chi Square was calculated for each survey question to find
which questions were significant and which ones were not.
Validation
In terms of consensual validation of the study, my goal was to contribute to the
conversation about the use of a tiered assessment model in a multi-level high school classroom
and its influence on student outcomes and learning. In addition, this work was reviewed by
the faculty at LaGrange College. As Eisner (1991) states, “’Consensual Validation’ is an
agreement among competent others that the description, interpretation, evaluation, and
thematic are right” (p. 112). Kvale (1995) echoes Eisner in saying that a “consensual theory of
truth aims at universally valid truths as an ideal.” This means that the analysis of this study
should be consistent with similar analyses of like studies by other competent scholars. That is
to say, this study and its analysis of the data were considered with the whole, and the impact
of the results, in mind.
Carberry, Ohland, and Swan (2010) define epistemology as “a branch of philosophy
that concerns the nature and scope of knowledge and the process(es) by which knowledge is
gained.” Epistemological validation is validation conferred on a piece of research by the
manner in which the research was constructed, executed, and concluded. A study is valid if
these aspects were adhered to with the nature of the research kept at the forefront of the
study’s intentions, for the
benefit of the whole research community as well as its findings. This study, in keeping with
the spirit of valid research, was constructed from a montage of the works of other scholars.
Credibility
Eisner (1991) says structural corroboration is a confluence of multiple data sources
coming together to make an argument concerning the whole. From the data shell (Table 3.1)
multiple data collection techniques and devices have given rise to the inferences made in this
study. In Chapter Two, the fairness of the study was illustrated by the example of
Watt (2005), who found that many teachers wanted to use neither differentiation in the
classroom nor a tiered testing model. Conversely, Tomlinson (2000a) urges the necessity
of such techniques for student achievement. For rightness of fit, great care was
taken to ensure precision and accuracy in this study. Records were kept with the integrity of
the data collected in mind so that a tight argument could be made.
Transferability
Trochim (2006) offers this perspective on transferability: “Transferability refers to the degree
to which the results of qualitative research can be generalized or transferred to other contexts
or settings. The qualitative researcher can enhance transferability by doing a thorough job of
describing the research context and the assumptions that were central to the research.” This
study was constructed in the spirit of other studies, with their merit and credibility, and its
original portion is likewise true to the spirit of research that can be used by future scholars
and researchers. This work is qualified to stand beside the works of others as credible and
transferable.
For Referential Adequacy, this study was completely assessment based. Since
assessments are virtually universal to all disciplines, this study can easily be replicated. Care
was taken to reduce the variables that might skew the data in this study; with the exception
of the assessments themselves, no differences occurred between the control and treated
groups. Since most classrooms consist of instruction followed by assessment, a researcher
could easily reproduce this study.
Transformational
Catalytic Validity is the degree to which the researcher anticipates this study will
transform the subjects, participants, and the school (Lather as cited in Kinchloe & McLaren,
1998). Because of this study, I was approached by colleagues interested in the concept of
tiered assessment models for their own fields, and by administrators interested in how
student outcomes increased. Since differentiation now has a strong hold on today’s
education, I expect this study to generate interest in this researcher’s general area and,
hopefully, in anyone this study reaches. Being a math teacher, I consulted the science
department at my school, and they chose to roll out this model in the fall of 2011.
The students involved in this study seemed to develop an ownership of their grades
and learning from having the option to decide at what level they would express it under the
tiered model. At this time it is not possible to say whether the students’ renewed interest in
their learning was linked to the choice or to the tier; further research is needed to develop
those inferences. I have implemented this model full time in my classroom, and it has seen
success holistically, not only in grades but in education on the whole.
CHAPTER FOUR – RESULTS
Focus question 1 investigates the pedagogy of the study, the design if you will. The
data shell, Table 3.1, on page 26 of this thesis lists the focus questions and the data
collection methods for each. Focus question 1 presents three methods of data collection:
the unit plan for the study, the peer-reviewed rubric for the unit plan, and the archival data
collected for the study. The unit plan and rubric are located in Appendices A and B,
respectively. The archival data appear throughout Chapter Two of this thesis.
The unit plan was written with the state of Georgia’s Department of Education
standards, called the Georgia Performance Standards [GPS], as a resource for alignment.
The validity for this resource and data collection method is discussed in Chapter Three of this
thesis. Furthermore, the rubric for the unit plan was peer reviewed by the Title I mathematics
instructional coach at my high school. This review was to ensure alignment to the GPS and
the mathematics content of the course while putting a highly qualified and trained eye on
the study’s pedagogy. The validity of this method of data collection is also discussed in
Chapter Three of this thesis.
The review of the literature was conducted to locate previous academic studies to ensure
consistency and reliable findings for this study. Once again, the validity of this method is
also discussed in detail in Chapter Three of this thesis. From the archival data, the emerging
theme of the literature is that educators and education designers are probing for anything that
will increase student learning and mastery of the concepts. In doing so, some educators
and designers of education have found success with tiered alternative assessment within
the classroom. Those researchers have run into problems with their own studies,
such as Watt (2005), who had trouble with veteran teachers not buying into the ideas of
tiered assessments; the teachers’ attitudes toward the testing method were poor. This
same notion was echoed by Tomlinson (2000b), who urged that teachers must accept that
change is necessary to teach today’s students and that assessment change and differentiation
are a natural progression of instructional differentiation; some teachers were resisting the
change. The literature also illustrates the emerging theme that, at least for many of the
researchers spotlighted in Chapter Two of this thesis, tiered assessments within the
classroom have had some success, and student outcomes have increased.
Focus question 2 of this research deals with the student outcomes of my study. As
explained in Chapter Three of this thesis, the untreated, or control, group consisted of 21
students: 17 regular education students, 3 mainstreamed special education students, and
1 gifted student. The treated group consisted of 28 students: 23 regular education students,
2 mainstreamed special education students, and 3 gifted students.
The students in both groups were given an identical pretest on the content prior to any
instruction. This took place on day 1 of the study. The students had never been exposed to the
material prior to the pretest, as the content is not covered in any prerequisite course. The
highest grade on the pretest in the untreated group was 72%; the lowest was 0%. The class
mean of the untreated group on the pretest was 19.5%, and the median score was 12%. For the
treated group, the highest pretest score was 45% and the lowest was 6%. The class mean for
the treated group was 21.8%, with a median score of 18%.
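Descriptive statistics like those above (high, low, mean, median) can be reproduced for any score list with Python's standard library. The scores below are illustrative placeholders, not the study's actual pretest data.

```python
from statistics import mean, median

# Placeholder pretest scores; the study's actual score lists are not reproduced here
untreated_pretest = [72, 45, 20, 12, 12, 8, 5, 0]

high, low = max(untreated_pretest), min(untreated_pretest)
class_mean = mean(untreated_pretest)      # 21.75 for this made-up list
class_median = median(untreated_pretest)  # 12 (mean of the two middle scores)
```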
The two classes’ pretest scores were compared using an independent t-test with unequal
variances at an alpha (significance level) of P < 0.05. The null hypothesis for the t-test was
that there was no significant difference between the groups’ pretest scores. For the hypothesis
to be rejected, the obtained value [OV], calculated from the data, must be larger than the
critical value [CV], set by the alpha of 0.05. From the independent t-test of the pretest
scores between the treated and untreated groups, the CV was 1.693889 and the result of
the t-test was t(32) = –0.51458, p > 0.05. The purpose of this t-test was to show that both
groups were relatively equal in ability and prior knowledge of the material from the onset of
the study. Rejecting the null would quantitatively show a significant difference between the
groups and not rejecting the null would show no significant difference in the students’ ability
at the beginning of the study.
Posttests were given to both groups at the end of the unit, the last day of the study.
The highest grade on the untreated group’s posttest was 95% and the lowest was 35%. The
mean for this group was 67.3% and the median score was 71%. This posttest was untiered
and without student choice; the students simply completed the assessment they were handed
on the final day of the study. A
dependent t-test was conducted using a 0.05 alpha, with the null hypothesis being no
significant difference between the two sets of scores. This dependent t-test was used to
determine the natural learning growth normally expected on the posttests: as instruction
was delivered throughout the unit, natural learning was going to occur, and the t-test helped
determine how much of it to expect in the final score comparisons. The results from this
t-test of the untreated group’s pretest and posttest scores were t(20) = –9.4381, p < 0.05,
and the CV was 1.724718.
The treated group’s posttest scores were also dependently t-tested against the pretest
scores for that group. The treated group’s highest grade on the posttest was 95% and the
lowest was 52%. The mean score was 75.1% and the median score was 72.5%. The
dependent t-test was conducted using a 0.05 alpha, with the null hypothesis being no
significant difference between the two sets of scores. This t-test was compared with
the dependent t-test of the untreated group’s pretest and posttest scores to determine whether
student outcomes were higher in the treated or the untreated group, so that conclusions could
be drawn on the effectiveness of tiered assessment in the classroom. This test could also show
a similarity in the normal learning curve for each group. The results for this t-test were
t(27) = –17.183, p < 0.05, and the CV was 1.703288.
The final t-test was an independent t-test with unequal variances on the treated
group’s posttest scores versus the untreated group’s posttest scores. The alpha for calculating
the critical value of the t-test was 0.05, and the null hypothesis was no significant
difference between the scores. This test was used to determine the effectiveness of a tiered
assessment model in a multi-level/ability classroom, giving a more accurate account of
how well, or not so well, the assessment model performed in the study. The results for this
t-test were t(30) = –1.89661, p < 0.05, and the CV was 1.695519.
An effect size test was run on the posttests of the treated group against the posttests of
the non-treated group. Since the test compared two different classes from two different
populations, a Cohen’s d was calculated on these data. The treated group had a mean score of
75.07143% with a standard deviation of 11.98434. The non-treated group had a mean score
of 66% with a standard deviation of 18.84005; the Cohen’s d = 0.57. In addition, an effect
size statistic was run on the pretest and posttest scores of the treated group and of the control
group. The effect size for the treated group’s pre/posttest scores was 0.91, and the effect size
for the control group’s pre/posttest scores was 0.82.
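The reported Cohen's d can be reproduced from the listed means and standard deviations, assuming the pooled standard deviation is taken as the root mean square of the two group SDs. That is a common convention; the study does not state which form it used.

```python
from math import sqrt

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d using the root mean square of the two standard deviations."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Means and standard deviations as reported for the two posttest groups
d = cohens_d(75.07143, 11.98434, 66.0, 18.84005)  # rounds to the reported 0.57
```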
Focus question 3 deals with the attitudes of the subjects and the researcher throughout
the study. As part of the data collection for this thesis, a reflection journal was kept by this
researcher to record the day-to-day observations and attitudes of the students as well as of
this researcher. Even though the journal was more of a summary piece dealing with the study
as a whole, it will be discussed first here but last in Chapter Five, where it will be analyzed.
Writing prompts for the journal were used to ensure consistency throughout the process. The
emerging themes were recorded and interpreted; the interpretation of these themes will be
revealed in Chapter Five of this thesis. For now, the emerging themes are listed from the
journal in two parts: the students’ attitudes toward the study and its execution, and this
researcher’s attitude toward the study and its pedagogy.
From the reflective journal kept by this researcher throughout the duration of this
study, several themes emerged. First, the students seemed to be a little bewildered with the
process of tiered assessments in the beginning of the study, but as the study progressed, this
confusion began to subside. The students frequently asked questions about how each test
was structured and aligned with the standards of the class. They were also curious about how
many points each test counted toward their overall grade so that they could plan which test
to choose for the points they felt they could earn.
Another observation made during the process was the students seemed very aware of
what they understood and what they needed to learn in terms of the content; they would
frequently ask questions such as, “If I understand addition, subtraction, and multiplication,
but not inverses, which test should I take to make the best grade I can?” Then follow up
questions would be, for example, “If I learn inverses, should I take the Excels test?”
Towards the end of the study, when the assessment was within reach, students would also say
things like, “I don’t believe I know enough to take the Excels test.” Another student was
overheard saying, “I made a 100 on the first quiz and a 92 on the last one, so I am definitely
taking the ‘X’ test.” X refers to the Excels test.
Pedagogical observations also became apparent during the study. It was noted that
structure, in the form of a study guide for each test, seemed to be needed. The students
wanted to know what they needed to understand, as a checkpoint for choosing the appropriate
test and for determining how well they would do on that chosen assessment. This was asked for
by many throughout the study, but mostly toward the end when the students started planning
for which assessment they would choose.
It was also recorded that my school’s administration and Title I content coach began
taking notice and asking questions about how and why the study was conducted. Some other
content department leaders also became interested in the study and its workings. It was
even mentioned at our school system’s monthly math meeting, highlighted in the
differentiation portion of the meeting. Teachers, colleagues, and other education
professionals were taking interest in the study pedagogically, and they wanted the results of
the study, whether it worked or not, “worked” meaning whether it increased student learning
and outcomes.
Two surveys were completed by the subjects of the study. One survey was
administered to both groups (see Appendix E) as a hypothetical, baseline-developing survey
after the completion of the unit. It consisted of eight questions answered on a Likert scale
from 1 to 5, 1 being “Strongly Disagree” and 5 being “Strongly Agree.” This anonymous
survey was designed to give the researcher an overall assessment of the attitudes of the
subjects with regards to the content and their attitudes about the possibility of a tiered
assessment program at some unknown point in the future. The control group knew nothing
of the study or of a tiered assessment policy at the time of this survey. A Cronbach’s Alpha
was run on this survey for internal consistency reliability, with an obtained Alpha of 0.44 for
the treated group and 0.19 for the control group.
Table 4.1: Chi Square Values for Treatment (n = 28) and Control (n = 21) Student Surveys

Q1: I like math. | Treatment: 2.1 | Control: 3.7
Q2: I feel that a test grade shows my teacher how much I really know about a unit. | Treatment: 7.6 | Control: 3.1
Q3: I feel that having a choice on what level of test I take will improve my chances to pass. | Treatment: 18.8*** | Control: 18.6***
Q4: I like having an option on which level test I take. | Treatment: 34.4*** | Control: 32***
Q5: I feel that taking one version of a test will increase my chances of failure. | Treatment: 26.3*** | Control: 2
Q6: I feel that if I know the material and I am properly prepared, the type of test I take will not affect my grade. | Treatment: 5.6 | Control: 5.3
Q7: The tiered test options gave me confidence that I could pass the test. | Treatment: 19*** | Control: 6.4
Q8: All students should take the same tests. | Treatment: 11.6* | Control: 10.9*

* P < 0.05, ** P < 0.01, *** P < 0.001
From this survey, a chi squared statistical value was obtained for each question of the
Likert scale formatted survey. In the treated group, from Table 4.1, question 1 showed
χ²(4) = 2.09, p > 0.05. Question 2 gave χ²(4) = 7.64, p > 0.05; question 3,
χ²(4) = 18.80, p < 0.001; and question 4, χ²(4) = 34.40, p < 0.001. The remaining questions,
5, 6, 7, and 8, yielded χ²(4) = 26.31, p < 0.001; χ²(4) = 5.58, p > 0.05; χ²(4) = 19.00,
p < 0.001; and χ²(4) = 11.60, p < 0.05, respectively. Each question will be listed in Chapter
Five of this thesis, along with its chi squared value and the interpretation of each question.
The non-treated group’s chi squared values for this survey are as follows: question 1,
χ²(4) = 3.67, p > 0.05; question 2, χ²(4) = 3.11, p > 0.05; question 3, χ²(4) = 18.59,
p < 0.001; and question 4, χ²(4) = 32.00, p < 0.001. The values for questions 5 through 8
are: question 5, χ²(4) = 2.00, p > 0.05; question 6, χ²(4) = 5.33, p > 0.05; question 7,
χ²(4) = 6.44, p > 0.05; and question 8, χ²(4) = 10.89, p < 0.05.
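Each χ²(4) value above is consistent with a goodness-of-fit test of one Likert item's five response counts against a uniform expectation, hence df = 5 − 1 = 4. Below is a sketch with hypothetical counts, assuming scipy is available; the counts are not the study's actual responses.

```python
from scipy.stats import chisquare

# Hypothetical counts for one Likert item, Strongly Disagree .. Strongly Agree
observed = [2, 3, 4, 9, 10]                    # 28 respondents in total
expected = [sum(observed) / 5] * 5             # uniform expectation: 5.6 per category

stat, p_value = chisquare(observed, expected)  # df = 5 - 1 = 4
```

For these made-up counts the statistic works out to 9.5, just above the 0.05 critical value of 9.488 for four degrees of freedom.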
Another survey, actually the first of the two sequentially, was given to the treated
group just prior to the beginning of the unit, before any instruction or preparation for the
assessment had begun. The completely anonymous survey consisted of five open-ended
questions that were coded for emerging themes. In question 1, the students simply listed
which level of assessment they chose for the posttest. Of the three choices, three students
chose the Meets, or lowest level, test; thirteen chose the Exceeds, or middle level, test;
and seven chose the Excels, or highest level, test.
Question 2 asked the students to explain why they chose that particular test. The three
factors that drove their decisions were security from failing, confidence in the material, and
the points offered for each test. Question 3 posed the counter question: why not choose one
of the other posttests? The points offered for each test, confidence in the material, and the
students’ feeling about whether the test was a true measure of what they actually knew were
the most common reasons given for question 3.
Tiered Summative Assessment 47
Question 4 asked the students why having a choice in their post assessment was
important to them. The most common answers were that the choice gave them confidence
that they could do well on the assessment, and that it gave them flexibility and control of
their learning, putting it more on their terms. The students also felt that the multiple levels
gave them a sense of preparation: they knew either how prepared they were or how prepared
they needed to be prior to the test. One student felt that having a choice did not matter.
Question 5 asked the students to explain why a tiered assessment program would
help or hinder his or her grade holistically. The emergent themes here were the familiar
ones: confidence, preparation, and flexibility. The option gave the students confidence to do
well holistically if continued throughout the course, gave them a clear picture of how prepared
they were or needed to be from each unit to the next, and gave them the flexibility to change
their level from unit to unit depending on how well they felt they understood each unit. And
once again, one student felt that the tiered assessment program did not matter and would not
change his or her final grade.
Tiered Summative Assessment 48
CHAPTER FIVE – ANALYSIS AND DISCUSSION OF RESULTS
Analysis of Results
For focus question one, how can tiered assessments be infused into the curriculum, data
were gathered on three aspects of the question. A unit plan was devised by the researcher
with detailed plans for implementation of the tiered assessment. This plan was aligned with
the content of the course as laid out by the county’s department of education, which was in
turn structured on the state of Georgia’s curriculum through the
Georgia Performance Standards [GPS]. A rubric was also developed based on the unit plan to
be used solely for grading the unit plan by the school’s Title I Mathematics Coach. The
Math Coach has more than 20 years’ experience in the classroom.
The purpose of the rubric was to have an experienced set of eyes, unrelated to the
study, examine the unit plan checking for validity and alignment to the content and GPS
standards. In addition, the Math Coach, whose training is based on student success with
increased use of differentiation in the classroom, was used as the grader of the rubric for the
purpose of aligning the tiered assessments with the content under the umbrella of the use of
differentiated assessments in the classroom. With the training, experience, and title, the
Math Coach was well qualified to grade the rubric and unit plan accurately and with merit.
The results of the rubric from the Math Coach revealed that the unit plan was well
aligned not only with the county’s curriculum but also with the GPS. In addition, the unit
plan and rubric show that each assessment in the tier was aligned by difficulty with the
Georgia High School Graduation Test [GHSGT]: the Meets assessment, the easiest test,
was aligned in question difficulty with a minimum passing grade on the GHSGT. The
Exceeds assessment was successfully aligned with a score of 516 on the GHSGT, which is
equivalent to an Exceeds Standards score on the GHSGT. The Excels assessment was
aligned in difficulty above the GHSGT, as its questions are deemed too difficult to appear
on the GHSGT but are still aligned with the content put forth by the GPS.
In addition to having the unit plan scrutinized, via a rubric specifically designed to test
for such alignment and validity, by personnel experienced in teaching, in county and state
standards, and in hands-on work with the GHSGT, authors’ works were researched for
specifics on how to integrate a tiered assessment program seamlessly and successfully into
the curriculum. Wormeli (2006) explains the importance of
students having a personal connection with the assessment when possible. A student
choosing which level of assessment to take is that student owning his or her test and having
a say in the final product. Scouller (1998) explains that infusing instruction with a
choice-based tiered assessment will give the students a personal stake in the assessment and
their learning, and Scouller’s (1998) study showed significant differences in students who
were offered different levels of assessment as measured against a group of students who
were offered only one test.
For focus question two, what is the process by which tiered assessment effectiveness
can be measured, data were also gathered on three aspects of the question. First, the two
groups were given the same pretest over the material. This pretest was to establish that the
groups were on an equal playing field in terms of prior knowledge before the unit began.
From the scores of the pretests, an independent t-test with unequal variances was run on the
two groups, because the sizes of the two groups were different. The t-test showed a critical
value [CV] of 1.693889, and the result of the t-test was t(32) = –0.51, p > 0.05. Since the
absolute value of the Obtained Value [OV] of the t-test was smaller than the CV, statistically
the researcher fails to reject the null hypothesis that there was no significant difference
between the groups’ pretest scores. Therefore, one can conclude that there were no
significant differences between the two groups’ prior knowledge of the material at the onset
of the study, and the aforementioned even playing field was established.
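The decision rule described above, reject the null only when the obtained value exceeds the critical value in magnitude, can be checked by reproducing the critical value from the t distribution. Here scipy's `t.ppf` is an assumed tool; the thesis does not name the software that produced its critical values.

```python
from scipy.stats import t

alpha = 0.05
df = 32                    # degrees of freedom from the pretest t-test
cv = t.ppf(1 - alpha, df)  # one-tailed critical value, ~1.693889

# Decision rule: reject the null only if the obtained value exceeds the CV in magnitude
obtained = -0.51458
reject_null = abs(obtained) > cv  # no significant pretest difference here
```

The value agrees with the 1.693889 reported in the study for the pretest comparison.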
The second part of focus question two, the pre-post test scores of the two groups
independently from one another, was designed to show the natural learning curve as a direct
result of instruction, and any larger gain in student outcomes for the treated group as a result
of the study. Because this study focuses on the summative assessment, measuring for an
increase in student outcomes in the treated group rests on the students knowing beforehand
that they would have a choice of which test to take at the end of the unit. Any increase here
can be attributed to that advance knowledge; the untreated group did not know what to
expect and had no choice in the matter. This is not to say that the treated group saw test
questions prior to their assessment; they did not. They only had a choice on the level of
difficulty of the test. Scouller (1998), cited in Chapter Two, echoes this notion by saying
that students perform better when they have a choice in the assessment process.
Separate t-tests were then run within each group: one comparing the untreated
group’s pretest scores with its posttest scores, and another comparing the treated group’s
pretest scores with its posttest scores. For the untreated group, the t-test revealed a CV of
1.724 and an OV of t(20) = – 9.4, p < 0.05. With the absolute value of the OV greater than
the CV, the null hypothesis that there were no significant differences between the scores is
statistically rejected. That is to be expected, because one score was taken before instruction
and one after.
An effect size was calculated for the untreated group to determine whether the group
was large enough to yield valid results. The effect size for the control group’s pre/posttest
scores was 0.82. According to Salkind (2010), any effect size larger than 0.50 is considered
large. This means that the two sets of scores have little overlap, which makes for a stronger
argument for the validity of the findings; the closer the effect size is to 2, the stronger the
argument that the findings are valid because the testing group was large enough to yield
accurate results. In addition, the effect size for the treated group’s pre/posttest scores was
0.91. Because this effect size is also larger than 0.5 and close to 1, the treated group is
likewise considered large enough to produce valid findings.
In the treated group, a t-test was run for the same reasons as the t-test for the control
group. The t-test of their pretest and posttest scores rendered t(27) = – 17.183, p < 0.05 and
the CV was 1.7. Once again, the null hypothesis should be rejected because the absolute
value of the OV is greater than the CV. Again this is to be expected.
When compared to the learning curve of the untreated group, with its effect size of
0.82, it can be argued that because the students were aware they had a say in the final
assessment, their learning curve increased and they ultimately retained more of the material
during the unit. The treated group had an effect size of 0.91, which makes a stronger
argument than the 0.82 effect size of the untreated group, and it has already been established
that both groups started the unit in the same place in terms of knowledge. It can be said that
the treated group learned more of the same material, almost double by comparison, in the
same amount of time and from the same instruction, differing only in the final assessment
and in the knowledge that the students would decide which assessment they would
ultimately take. Herman et al. (1997) attributed the increased student outcomes of their
study directly to the students knowing, prior to instruction, that they had a choice in the
assessment; significant gains were measured.
The third part of focus question two compared the posttests of the two groups for
significant differences. An independent t-test with unequal variances was run on the two
groups’ posttest scores to test the null hypothesis that there was no significant difference
between them. For this thesis to have viability, there should be a significant difference
between these two groups. The result of this t-test was t(30) = – 1.89, p < 0.05, and the CV
was 1.69. Because the absolute value of the OV is greater than the CV, the null hypothesis is
statistically rejected, and this test showed that there were significant differences in the
scores. Wheadon and Beguin (2010) and Whipp (2004) both recorded significant gains in
middle achieving and low achieving students simply by introducing alternative levels of
assessment for those students to take. Ultimately, the researchers all showed improvements
in student scores by tiering the tests for their respective groups.
As previously discussed in the second part of focus question two, the treated group’s
scores were significantly better than the untreated group’s scores. Thus, the posttest scores
of the treated group increased by a greater magnitude than those of the untreated group. A
Cohen’s d was run on these data to determine whether the groups’ sizes were large enough
to validate the study. The treated group had a mean score of 75.07143% with a standard
deviation of 11.98434. The non-treated group had a mean score of 66% with a standard
deviation of 18.84005; the Cohen’s d = 0.57. The strength of Cohen’s d is read the same
way as effect size: if the number is larger than 0.5, then the group is large enough to argue a
strong case for validity. Since this Cohen’s d is above that mark, this test has validity.
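The Cohen's d above can be reproduced directly from the reported means and standard deviations. The sketch below uses the root-mean-square of the two standard deviations as the pooling method; the thesis does not state which pooling formula it used, but this common variant recovers the reported 0.57.

```python
import math

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d using the root-mean-square of the two standard
    deviations (one simple pooling; an assumption, since the study
    does not specify its formula)."""
    pooled = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled

# Posttest statistics reported in the study
d = cohens_d(75.07143, 11.98434, 66.0, 18.84005)
# d comes out to about 0.57, matching the reported value
```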
Focus question three, how do students respond attitudinally to tiered assessments,
deals with the students’ and the teacher’s feelings and attitudes about the tiered assessment
model posed in this study. For this part of the study, surveys were administered and a
reflective journal was kept to record the attitudes and observations of the students and the
teacher. These findings were in turn coded for themes and interpreted. In the initial stages
of the study, the treated group was given a survey that asked about their attitudes toward a
tiered testing model and math in general. Second, at the conclusion of the study, a survey
was given to both groups asking how they felt about having a choice in a tiered assessment
program. The control group’s answers were theoretical because this was their first exposure
to the process, while the treated group’s answers were more of a reflection from the
students’ point of view. Lastly, a reflective journal was kept throughout the study by the
researcher, recording observations about the process, the attitudes of the students and
teacher, and any other information the researcher deemed relevant to the study.
The first part of focus question three, the initial survey given to the treated group,
was coded for emerging themes. This survey was discussed last in chapter four, but it is
discussed first here because it came first chronologically. At the time of the survey, the
concept of tiered testing, and specifically the program installed for the study, had been
explained to the students. All of the specifics were covered, the students were given a
handout to read, and all student questions were answered. However, the unit itself had not
yet begun. The survey questions were open-ended and the results were gathered
qualitatively.
For the first three questions, the survey asked which test the student would ultimately
choose, why they would choose that one, and why they would avoid the other tests offered.
Most students chose the hardest, X, or the middle, E, test due to the rewards (points) offered
for taking the harder tests. Some shied away from the hardest test, stating that a lack of
confidence in the upcoming material scared them off. However, for the most part, students
did not “take the easy road” as some might expect.
The students felt that they were rewarded for taking the harder tests through a tiered
points option that was part of the model. As the level of difficulty increased, the number of
points they could earn increased respectively. This motivated the students to attempt the
more difficult test for the chance to earn the most points, translating into a higher grade.
Question four of the survey dealt with student choice. It asked students to discuss
what it meant to them to have a choice of assessment. Herman et al. (1997) showed in their
study that students responded positively when given a choice in assessment: scores
increased and more material was retained. The students in this study echoed the responses
of Herman et al.’s students, reporting that having a say in their assessment motivated them
and gave them confidence in the impending assessment. The notion of choice emerged as
one of the vital components of a successful tiered model; without it, the model would not
have been as successful.
Question five of the survey deals with the students’ holistic view of a full-time tiered
assessment system in the classroom. Again, the vast majority of the students enjoyed being
able to choose the difficult test for one unit, change to a less difficult one for the next if they
struggled, and go back to the harder test for the following unit if their struggles diminished.
Confidence in how well they were learning the material was another frequent response.
Students felt that by having an option of which test to take, and by making that choice at the
end of the unit, there was a lower risk in stretching their levels of learning to reach for a
higher learning threshold.
Part two of focus question three was the second survey, given to both groups after
the conclusion of the unit and the posttest. This survey was not open-ended but provided
answer options on a Likert scale from 1 to 5, 1 being “Strongly Disagree” and 5 being
“Strongly Agree.” The eight questions asked both groups their opinions about tiered
assessments and were analyzed using chi-square and Cronbach’s alpha statistics. The
Cronbach’s alpha for the control group, 0.19, and for the treated group, 0.44, indicated
unacceptable internal consistency according to George and Mallery (2003). This could be
due to confusion over some of the questions, which was reported by some subjects, or to
subjects not taking the survey seriously.
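Cronbach's alpha for a set of Likert items can be computed from the item variances and the variance of the respondents' totals. The sketch below is a minimal illustration; the responses shown are hypothetical, not the study's survey data.

```python
def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondent rows (one score per
    survey item), using sample variances."""
    k = len(rows[0])            # number of items
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    items = [[row[i] for row in rows] for i in range(k)]
    item_var_sum = sum(var(col) for col in items)
    total_var = var([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical Likert responses (5 respondents x 8 items)
responses = [
    [4, 5, 4, 3, 4, 5, 4, 4],
    [2, 1, 2, 3, 2, 1, 2, 2],
    [5, 4, 5, 5, 4, 5, 5, 4],
    [3, 3, 2, 3, 3, 2, 3, 3],
    [1, 2, 1, 2, 1, 2, 1, 1],
]
alpha = cronbach_alpha(responses)
# Values below roughly 0.5 are usually read as unacceptable
# internal consistency (George & Mallery, 2003).
```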
Neither group’s answers to question one, I like math, were concentrated enough to
be significant. Although this question was not really expected to be significant, the result
does show that the studied groups were ordinary high school classrooms filled with students
of varying likes and dislikes toward school and particular subjects. Question two, I feel that
a test grade shows my teacher how much I really know about a unit, was designed to see
how much students understood about the reason for testing. The results were also not
significantly concentrated toward one answer. Again, this is not a surprise, for many
students never analyze why they are tested beyond the fact that the teacher assigned it.
For question 3, I feel that having a choice on what level of test I take will improve
my chances to pass, both groups answered in the Strongly Agree direction to a significance
of ***, or p < 0.001, meaning there is less than a one-in-a-thousand probability that this
response pattern was a random, chance event. The students felt very strongly that having an
option would contribute to increasing their test scores; having a choice was obviously very
important to them. Question 4, I like having an option on which level test I take, also had
major significance. The students strongly agreed with this question, again to *** or
p < 0.001. This question, along with question 3, shows that the students felt that having an
option was a good thing and that it had the potential to increase their scores.
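The chi-square test used to flag concentrated Likert answers can be sketched as a goodness-of-fit statistic against a uniform expectation. The answer counts below are hypothetical, not the study's data; the critical value 18.47 is the standard chi-square cutoff for p < 0.001 with 4 degrees of freedom.

```python
def chi_square_stat(observed):
    """Chi-square goodness-of-fit statistic against a uniform
    expectation, as a rough test of whether Likert answers are
    concentrated on one option."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical answer counts for one question, options 1..5
counts = [1, 0, 2, 6, 19]          # heavily toward "Strongly Agree"
stat = chi_square_stat(counts)
# With 4 degrees of freedom, the p < 0.001 critical value is 18.47,
# so a statistic this large would be flagged ***.
```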
Question 5, I feel that taking one version of a test will increase my chances of
failure, was a problem question; many students complained during the administration of the
survey that it was confusing, and they did not understand what specifically it was asking.
The treated group used the undecided vote for “I don’t know,” so the question became
significant in the middle, or undecided, direction. The control group’s answers were spread
across the option list, so the question was not significant for them. Because of the issue with
this question, I did not use the information obtained from it in my interpretation or analysis
of the data.
Question 6, I feel that if I know the material and I am properly prepared, the type of
test I take will not affect my grade, was not significant for either group. Many students were
undecided here. This could have stemmed from a question clarity problem, or perhaps the
students simply did not have an opinion on the question.
For question 7, the tier tests option gave me confidence that I could pass the test, the
treated group’s answers were sharply significant toward Strongly Agree, to *** or
p < 0.001. This question really shows that those students felt the model gave them
confidence and ease in taking the test because it was tiered. The control group’s answers
were not significant, but 10 out of the 18 who took the survey answered agree or strongly
agree, only 2 of the 18 marked disagree or strongly disagree, and six answered undecided. I
am not sure why they answered this way, but even with the answers scattered, it still appears
that the majority of the students felt the tiered model gave them confidence to pass the test.
Question 8, all students should take the same tests, was made significant by both
groups to *, or p < 0.05. This question was a direct result of the tiered assessment
conversation, so it was clearly asked in the context of tiered versus non-tiered tests. From
the significance in the Strongly Disagree direction, it mattered to the students that they have
the option of taking a tiered assessment of the material. In all, the students showed in this
survey that having a choice of a tiered assessment was important to them because they felt it
gave them confidence in passing the test. They also showed that, in their minds, tests are not
one size fits all.
The third part of focus question three was the reflective journal that was kept
throughout the study. The journal was an ongoing record of observations and attitudes of
the students and teacher during the process. The most prominent theme from the journal
was that the students were continually trying to understand the process of the tiered model;
it was obvious that they had not been exposed to a tiered assessment model before. As the
students got used to the idea, they became very engaged and aware of how, and how well,
they needed to learn the material in order to take a particular test. The journal also recorded
that the students took a great interest in the test, and that interest translated into the students
engaging the material with purpose throughout the study. This journal finding corroborated
the finding from the student surveys that the students felt having a choice of test was
beneficial to their learning and test scores.
Discussion
This study gave students a choice within a tiered summative assessment model that
was pre-organized and prepared. The model offered easy, medium, and harder summative
assessments over the same material for the students to choose from. It is important that the
choice and the different levels go hand in hand. Without the choice, I feel that the students
would not have embraced the different levels of assessment the way they did, or reached
beyond their comfort zones to pass the hardest test; and of course, without any options, the
choice is a moot point. As the study progressed, through the survey questions and the
observations made, it became increasingly apparent that the choice was just as important as
the multiple levels. This study could have been done without student choice; research was
found in which scholars chose the test for the students based on past and present ability and
performance, with many finding positive results. But I felt that giving the students the
opportunity to choose for themselves would tap into their motivation and accountability for
their own learning, and in the process the students’ outcomes increased.
The results obtained from this study came directly from the students. All I really did
was prepare the lessons and the structure of the tiered program. Other than teaching the
material and doing the work any normal teacher does, the students drove the study, and
especially the results. The students felt like participants in their learning because they were
part of the planning for it. They had a say in the process, at least for the final test, and they
took ownership of that from very early in the study. Further, they prepared for the hard test
or the middle test from the onset. They understood that there was a failsafe test, the easiest
one, so they did not feel they were at risk in trying to learn the more difficult concepts.
I must admit that I was surprised by the findings. I thought students would not step
up to the challenge and would instead take the easy test to get it over with; I am not a
pessimistic teacher by any means, but that was my expectation. That is not at all what
happened. The students, once they understood how the tiered program worked, ran with it.
They asked questions I never thought they would come up with. They showed interest in
their own education, which is increasingly rare, especially in a middle to low level high
school math classroom. They cared about their grades again. They worked toward
understanding the material well enough to take the hardest test. Of the 49 students tested
across both groups, only three M (easiest) tests were taken, one in the treated group and two
in the control group. That surprised me; I never thought the numbers would be that low on
the easiest test. It suggests that the students strive to be better, and that they want to be
more than basic.
Tomlinson (2000a) says of assessment, “with differentiated instruction in full
swing, differentiating the assessment is a natural progression” (p. 28). Two aspects of this
study are very relevant to today’s educational trends. First, the method of assessment is a
form of differentiated instruction and assessment, and no one in the education field has to
be told how important that is. Teachers are constantly trying to find and implement new
ways to motivate, educate, and graduate students by changing up the pace, delivery, and
method of instruction, and now of testing. Second, it is an assessment, specifically a
summative assessment.
Testing drives the education world right now. There are tests for everything, and
there seems to be a different high stakes test every month in our classrooms these days.
Students in this study became excited about the material and even the test itself, and with
this study, students’ test scores increased. That, I am sure, is what every school
administrator in the country is preaching right now.
Choosing the test became a big deal to the students. They gave great thought to
which test they would take. Several students who had prepared to take the middle test
requested, at the last minute, to take the hardest one. That might drive some to tears of joy;
for me, it brought pride. I was proud of my students for striving to be better than average,
and for deciding, for no one but themselves, which test to take without fearing the worst.
As teachers, if we could get most or all of our students, especially the middle and low level
students, to work that hard and care that much, then it would all be worth it.
Structural corroboration was carefully planned throughout this study, and the notion
of triangulation was paramount in its development. Each of the three focus questions was
researched from at least three different aspects, with all of them pointing back to the same
answer. Triangulation was achieved in this study, and therefore the study has credibility.
For the first focus question, how can tiered assessments be infused into the
curriculum, a detailed unit plan, a rubric critique of the unit plan by a highly qualified third
party, and archival methods already published by other researchers guided the focus
question toward one answer. The pedagogy of the study was addressed, and therefore
triangulation for focus question one was achieved.
For focus question two, what is the process by which tiered assessment effectiveness
can be measured, multiple statistical tests were run on the data. Four t-tests, two effect size
statistics, and one Cohen’s d statistic all showed that the groups started in the same place
and that the treated group retained more and yielded higher test scores than the control
group. For good measure, archival data were introduced from other successful researchers,
corroborating that my findings were not uncommon.
For focus question three, how do students respond attitudinally to tiered
assessments, two surveys were used along with a reflective journal for triangulation. Again,
archival data were introduced from scholars who had already recorded their findings on
student attitudes, corroborating my own. As student attitudes toward tiered assessments
became more prevalent during the study, this focus question and its data gathering methods
became the most important part of the study. This was a surprise: because the study was
focused on comparing test scores, focus question two seemingly would have been the
driving force, but the students’ attitudes toward the study are really what make it worth
repeating and implementing as the norm in the classroom.
With the three focus questions approached from at least three different directions, all
pointing toward the same outcome, and with statistical data showing validity and strong
results, this study is indeed strong enough to argue from and make judgments from. It is
credible and has rightness of fit. The study was accomplished with no altering of any data,
and the research done during it points toward the same findings. The study has impressed
this researcher enough that it will become my testing policy for the near future, ready for
full implementation in my classroom next term. Already, two other departments at my
school have shown interest in using this testing method in their classes, and I am currently
“teaching” those colleagues how to implement the program in their classrooms.
Implications
Even though this study was too small to generalize to all classrooms, all students
who are taught will eventually take some sort of summative assessment during their
education. No matter the discipline, because all courses have standards that guide educators
on what is considered basic and advanced understanding of a given topic, there will be an
assessment. When creating the assessment for a given topic, it is easy to build a test bank of
questions ranging from a basic understanding of the concepts taught to a more complex
understanding, even higher than the standards require. From that bank of questions, tests
can be created by grouping the questions by like difficulty; once more than one such
leveled test is formed, a tiered assessment has been created. The testing is summative and
appears at the end of the testing period, so there is little need to alter already constructed
lesson plans. Further, because the testing choice is an individual decision made by each
student, the size of the class or group of students involved in the testing model is limitless.
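The test-bank approach described above can be sketched in a few lines. The questions and structure here are illustrative assumptions, not the study's actual bank; the tier labels follow the study's naming (M easiest, E middle, X hardest).

```python
import random

# Hypothetical question bank: each question tagged with a difficulty tier
bank = [
    {"id": 1, "text": "Solve x + 3 = 7", "tier": "M"},         # basic
    {"id": 2, "text": "Solve 2x - 5 = 9", "tier": "M"},
    {"id": 3, "text": "Solve x^2 - 5x + 6 = 0", "tier": "E"},  # middle
    {"id": 4, "text": "Solve |2x - 1| < 7", "tier": "E"},
    {"id": 5, "text": "Solve x^2 < 4 and graph", "tier": "X"}, # hardest
    {"id": 6, "text": "Prove the quadratic formula", "tier": "X"},
]

def build_tiered_tests(bank, per_test):
    """Group questions by difficulty tier and draw a fixed number of
    questions for each tiered test over the same material."""
    tiers = {}
    for q in bank:
        tiers.setdefault(q["tier"], []).append(q)
    return {tier: random.sample(qs, min(per_test, len(qs)))
            for tier, qs in tiers.items()}

tests = build_tiered_tests(bank, per_test=2)
# tests["M"], tests["E"], and tests["X"] are the easy, middle,
# and hard versions; the student chooses which one to take.
```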
It was discovered that students enjoyed having power over their learning by being
masters of which assessment they would take to show what they had learned. Choosing
their test gave the students a sense of ownership over what they learned, and it motivated
them to work harder because they felt a sense of collaboration with the teacher instead of
subordination. With that said, if teachers can find a way to make students take ownership
of their learning, it will motivate them to learn to the top of their potential and sometimes
push the bar farther, no matter the course description. These students engaged their
education in a way that I have never seen before. This increased engagement came about
because the students found a personal connection to the material, even if just for a good
grade; the students made the learning process their own because they had more control of
the outcome than in a normal situation.
For anyone in the education field who wants their students to achieve higher grades
while taking accountability for their own learning, this process can help. This study has
shown that my students achieved higher test grades and retained more information while
challenging themselves to process the most difficult aspects of the topic, all of their own
volition and without any pressure from the teacher. Moreover, for referential adequacy, this
study is easy to replicate because it is simply a test modification of the normal classroom
environment. The study was simple and focused on one thing, the summative assessment.
Great strides were taken to ensure that no other variables were introduced that might taint
the findings, so the findings would reflect only the alternative testing method. As a
byproduct, reducing the variables made the study simple in design and easy to replicate.
Since the study ended, my students have not wanted to return to having no say in
how they are assessed. They reported that they felt more confident in what they learned and
suffered less test anxiety because they chose the test; it was their decision, not someone
else’s. The students requested that all of their assessments be tiered from now on, and in
trying to help them learn in any way I can, I have accommodated them. A couple of units
after the study, I noticed that grades and the students’ attitudes were dropping, so I tiered
the last two unit tests; grades shot back up and morale increased. So, tiered assessments are
the method by which all of my students will be assessed from here on.
As I reported earlier, many departments in our school and throughout the county
that have become aware of the testing model have shown interest in implementing it in
their departments and classrooms. Tiered assessment in practice will be the norm for my
department next fall, as all the teachers in my department have agreed to implement the
model across the board for math. The social studies and science departments have
scheduled meetings with me and my administrator to develop their own pilot testing
programs. The school has embraced the notion of alternative assessment and is willing to
hear from those trying to help students achieve more.
As for me, I am sold. Starting immediately, all of my classes, at the students’
request, are tiered testing classes for summative assessments. I have learned that I can
become a better teacher by listening to my students and working with them as “colleagues”
in their education, as a facilitator of education instead of a provider of it. My whole
philosophy of teaching has changed: I no longer feel that I am giving students knowledge,
but that I am assisting them in creating their own.
Impact on Student Learning
As I have said in previous paragraphs, the major emerging theme from this study
was the students’ self-imposed accountability for the curriculum. That alone is enough to
change a syllabus. Getting students to care about their own education is a struggle I have
heard voiced by just about every teacher I have ever met. “If I could just get them [the
students] to care about their grade as much as I [the teacher] do…” is something that even I
have said. But taking a statistical look at what this model did for students is another,
related story.
Administrators, schools, systems, states, and even the federal government seemingly
care about one thing: test scores. If test scores are an indication of what a student has
learned, then a higher test score means the student learned more. These scores increased
because of this study. Looking at the pre/post t-tests for each group, the treated group
yielded a higher effect size than the natural learning curve of the control group. The
natural learning curve increased due to the study, and test scores improved. Classroom
means increased and the median score was higher.
Recommendations for Future Research
Fortunately for me, my statistical data all pointed toward the same outcome, so I
really did not have any data that I could not explain. In addition, the qualitative data were
also very concentrated in terms of the emerging themes. However, I do not believe that I
eliminated all of the variables I set out to eliminate; it is ludicrous to believe that all the
variables in a classroom of thirty students could be controlled or eliminated. As I worked
on the study, I realized that there is one condition I did not take into consideration.
I did not examine the effect of test anxiety on students and how it can reduce test
scores. In addition, I did not research how test anxiety can be reduced, or whether student
choice is considered a reliever of test anxiety. As a spinoff of this study, an investigation of
tiered assessment as a reducer of test anxiety would, if favorable outcomes were obtained,
make a strong argument for implementing tiered assessments in the classroom.
The largest part of this thesis that will need further investigation is the idea of
student choice. I realized during the study, from my students, that student choice was the
major contributing factor in the success of this tiered model; without it, I do not feel the
study would have been such a success. I plan to continue this thesis and expand it into a
dissertation with further inquiry into how student choice affects tiered assessments and
how, working together, they can increase student outcomes.
References
Ackermann, E. (2001). Piaget’s constructivism, Papert’s constructionism: What’s the
difference? Future of Learning Group Publication (MIT). 4(3). 438 – 442.
http://learning.media.mit.edu/content/publications/EA.Piaget%20_%20Papert.pdf
Burns, A. (1999). Collaborative action research for English language teachers. Cambridge,
England: Cambridge University Press.
Carberry, A., Ohland, M., & Swan, C. (2010). A pilot validation study of the epistemological
beliefs assessment for engineering (EBAE): First-year engineering student beliefs.
American Society for Engineering Education. 9(1).
Cizek, G.J. (2010). An introduction to formative assessment. In H. L. Andrade, & G. J.
Cizek (Eds.), Handbook of formative assessment. 3 – 17. New York, NY: Routledge.
Crotty, M. (1998). The Foundations of social research: Meaning and perspective in the
research process. Thousand Oaks, CA: Sage Publications. ISBN 0761961054
Eisner, E.W. (1991). The enlightened eye. New York, NY: Macmillan.
George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and
reference. 11.0 update. (4th ed.). Boston: Allyn & Bacon.
Golafshani, N. (2003). Understanding reliability and validity in qualitative research. The
Qualitative Report. 8(4). 597 – 607.
http://www.nova.edu/ssss/QR/QR8-4/golafshani.pdf
Hendricks, C. (2009). Improving schools through action research: A comprehensive guide
for educators. (2nd Ed.). Upper Saddle River, NJ: Pearson Education, Inc.
Herman, J., Klein, C. & Wakai, S. (1997). American students’ perspectives on alternative
assessment: do they know it’s different? CSE Technical Report 439.
CRESST/University of California, Los Angeles, CA.
Tiered Summative Assessment 67
http://www.nova.edu/ssss/QR/QR8-4/golafshani.pdf
Kinchloe, J., & McLaren, P. (1998). Rethinking critical theory and qualitative research. In N.
Denzin & Y. Lincoln (Eds.), The landscape of qualitative research: Theories and
issues (pp. 260 – 299). Thousand Oaks, CA: Sage Publications.
Kvale, S. (1995). The social construction of validity. Qualitative Inquiry. 1(1). 19 – 40.
Lagrange College Education Department. (2008). Conceptual framework. Lagrange, GA:
Lagrange College.
Linn, R. (1998 November). Assessment and accountability. CSE Technical Report 490.
National Center for Research on Evaluation. Los Angeles, CA.
http://research.cse.ucla.edu/Reports/TECH490.pdf
Maclellan, E. & Soden, R. (2004). The importance of epistemic cognition in student-centered
learning. Instructional Science. 32(3). 253–268. DOI:
10.1023/B:TRUC.0000024213.03972.ce
Nisbett, R.E. & Wilson, T. D. (1977). The halo effect: Evidence for unconscious alteration of
judgments. Journal of Personality and Social Psychology. 35(4). 250-256.
Oberg, C. (2009). Guiding classroom instruction through performance assessment. Online
Journal of Case Studies in Accreditation and Assessment. 1(1). 1–11. ISSN: 1941–
3386. http://www.aabri.com/manuscripts/09257.pdf
Popham, W. J. (2011). Classroom assessment what teachers need to know. (6th Ed.). Boston,
MA: Pearson Education Inc.
Rogelberg, S. G., Fisher, G. G., Maynard, D. C., Hakel M. D., & Horvath, M. (2001).
Attitudes toward surveys: Development of a measure and its relationship to
Tiered Summative Assessment 68
respondent behavior. Organizational Research Methods. 4(3). DOI:
10.1177/109442810141001. http://orm.sagepub.com/cgi/content/abstract/4/1/3
Salkind, N. J. (2010). Statistics for people who (think they) hate statistics: Excel 2007
Edition. (2nd Ed.). Thousand Oaks, CA: Sage Publications, Inc. ISBN 978-1-4129-
7102-7.
Schwartz D. & Arena D. (2009, August). Choice-based assessments for the digital age.
Stanford University, School of Education. Stanford, CA. White paper for the
MacArthur Foundation
http://aaalab.stanford.edu/papers/ChoiceSchwartzArenaAUGUST232009.pdf
Scouller, K. (1998). The influence of assessment method on students’ learning approaches:
multiple choice question examination versus assignment essay. Higher Education.
35(4). 453–472. Netherlands: Kluwer Academic Publishers. DOI:
10.1023/A:1003196224280
Soloman, P. (1998). The Curriculum Bridge: From Standards to Actual Classroom Practice.
Los Angeles, CA: Corwin Press.
Sprick, R. S. (2002). Discipline in the secondary classroom: A positive approach to behavior
management. San Francisco, CA: John Wiley & Sons.
Tomlinson, C. (1995). How to differentiate instruction in mixed-ability classrooms.
Alexandria, VA: Association for Supervision and Curriculum Development.
Tomlinson, C. (2000a). The differentiated classroom: Responding to the needs of all
learners. Alexandria, VA: Association for Supervision and Curriculum Development.
Tomlinson, C. (2000b). Reconcilable differences? Standards-based teaching and
differentiation. Educational Leadership. 58(1). 6 – 11.
Tiered Summative Assessment 69
Tomlinson, C., Kaplan, S., Renzulli, J., Purcell, J., Leppien, J., Burns, D., Strickland, C., &
Imbeau, M. (2009). The parallel curriculum: A design to develop learner potential
and challenge advanced learners. (2nd Ed.). Thousand Oaks, CA: Corwin Press.
ISBN 978-1-4129-6131-8 {pbk.}
Trochim,W. (2006). Research methods knowledge base. Social Research Methods. Online
journal. http://www.socialresearchmethods.net/kb/qualval.php
Watt. H. (2005). Attitudes to the use of alternative assessments methods in mathematics: a
study with secondary mathematics teachers in Sydney, Australia. Educational
Studies in Mathematics. 58(1). 21 – 44.
Wheadon, C., & Beguin, A. (2010). Fears for tiers: are candidates being appropriately
rewarded for their performance in tiered examinations? Assessment in Education:
Principles, Policy, & Practice. 17(3). 287 – 300. ISSN: 0969594X. DOI:
10.1080/0969594X.2010.496239
Whipp. P. (2004). Differentiation in outcomes focused physical education: pedagogical
rhetoric and reality. The University of Western Australia. Paper presented at the
AARE International Educational Research Conference, Melbourne, Nov-Dec 2004.
Wiggins, G. & McTighe J. (1999). Understanding by design. Alexandria, VA: Association
for Supervision and Curriculum Development. http://www.flec.ednet.ns.ca/staff/What
%20is%20Backward%20Design%20etc.pdf
Wood, G. H. (2005). Time to learn: How to create high schools that serve all students. (2nd
Ed.). Portsmouth, NH: Heinemann.
Wormeli, R. (2006). Fair isn’t always equal: Assessing and grading in the differentiated
classroom. Portland, ME: Stenhouse Publishers. ISBN 1-57110-424-0.
Tiered Summative Assessment 70
Yilmaz, K. (2008). Constructivism: Its theoretical underpinnings, variations, and implications
for classroom instruction. Educational Horizons. 86(3). 161 – 172. (EJ798521).
http://www.eric.ed.gov/PDFS/EJ798521.pdf
Appendix A
Lesson Plan: Math 3 Matrix Operations Unit
Name: Scott Barnett
Stage 1 – Desired Results
GPS and/or Elements (use only the elements that you teach in THIS lesson!):
MM3A4. Students will perform basic operations with matrices.
a. Add/subtract, multiply, and invert matrices, when possible, choosing appropriate methods including technology.
b. Find the inverses of two-by-two matrices using pencil and paper, and find inverses of larger matrices using technology.
c. Examine the properties of matrices, contrasting them with properties of real numbers.
MM3A5. Students will use matrices to formulate and solve problems.
a. Represent a system of linear equations as a matrix equation.
b. Solve matrix equations using inverse matrices.
c. Represent and solve realistic problems using systems of linear equations.
Enduring Understandings:
Students will understand that…
Matrices have many properties; students will be able to answer questions concerning determinants, addition/subtraction, multiplication, Cramer's rule, and inverses.
Real World Understandings (What might transfer to their world?):
Students will answer real world questions concerning matrices, such as encryption. They will use formulas to answer questions about inverses and solving systems.
Essential Question(s):
What are the properties of matrices?
How do you use the determinant to find the inverse of a 2 x 2 matrix?
How do you use Cramer’s rule to find the solution to a linear system?
What do the dimensions of a matrix have to do with how two matrices are related?
What kinds of matrices are commutative?
How do the dimensions of a matrix determine how matrices can be multiplied?
How is scalar multiplication different from matrix multiplication?
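For reference, the standard formulas these essential questions build toward (stated here as a brief sketch; they are not part of the original unit materials) can be written as follows. For a 2 x 2 matrix, the determinant and inverse are:

```latex
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
\det A = ad - bc, \qquad
A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\quad \text{(provided } ad - bc \neq 0\text{)}.

% Cramer's rule for the linear system ax + by = e, \; cx + dy = f:
x = \frac{\begin{vmatrix} e & b \\ f & d \end{vmatrix}}
         {\begin{vmatrix} a & b \\ c & d \end{vmatrix}}, \qquad
y = \frac{\begin{vmatrix} a & e \\ c & f \end{vmatrix}}
         {\begin{vmatrix} a & b \\ c & d \end{vmatrix}}.
```

Both results require a nonzero determinant, which is why the unit treats the determinant before the inverse and before Cramer's rule.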
Knowledge (NOUNS for the GPS): Skills (VERBS from the GPS):
Students will know…
Properties of matrix addition/subtraction
Properties of matrix scalar/ matrix multiplication
Formula for determinant/ 2x2 matrix inverse
Real World knowledge (Where do they use this KNOWLEDGE in their real world):
Students will solve problems involving matrices in mock situations including computer email encryption.
Students will be able to…
Understand
Solve
Justify/ verify/ show
Apply
Determine
Real World Applications (Where do they use these SKILLS in their real world):
Use of these skills is evidenced by students’ ability to solve problems and work through tasks effectively.
Stage 2 – Assessment Evidence
Performance Task(s) and Product(s) to be assessed (What will they put in my hand to be assessed that they created individually):
Daily concept worksheets
1 performance task
Formal Assessment Grading Format(s) (How will I grade it, letting them know in advance how to receive every point in my grading scale):
1 unit pre test
3 homework concept checks
2 quizzes covering individual lessons
1 unit post test
Stage 3 – Learning Plan
Procedures/Sequence:
Day 1:
Students will complete the pre-test before beginning the Matrix unit
Students will collect and define terms that will become part of their word wall (a cumulative collection of vocabulary terms needed for Math 3).
Day 2-3:
Students will learn theorems and processes associated with matrix dimension and equality and the properties of matrix addition and subtraction. After a PowerPoint lesson, students will use properties of matrices from their notes to complete 20 questions. Answers will be checked by comparing student work to the worksheet key along with teacher checkpoints throughout the assignment with individual students. A student homework concept check will be taken after day 3.
Day 4:
Students will learn theorems and processes associated with matrix and scalar multiplication. After a PowerPoint lesson, students will use their notes and individual teacher guidance to answer 18 questions. Answers will be checked by comparing student work to the worksheet key along with teacher checkpoints throughout the assignment with individual students. A student homework concept check will be taken after day 4.
Day 5:
Students will learn formulas and processes associated with matrix inverse of a 2 x 2 matrix. After a PowerPoint lesson, students will use properties of matrices from their notes to complete 20 questions. Answers will be checked by comparing student work to the worksheet key along with teacher checkpoints throughout the assignment with individual students.
Day 6 & 7:
Day 6: Quiz #1: concepts: matrix dimension, equality, addition/subtraction, matrix/scalar multiplication. 10 questions.
Day 6 & 7: Students will use a performance task to discover and reinforce concepts and properties of matrices and to transfer those concepts to real-life situations. Answers will be checked by comparing student work to the worksheet key along with teacher checkpoints throughout the assignment with individual students.
Day 8 & 9:
Students will learn concepts and processes associated with solving linear systems of equations with matrix operations (day 8) and Cramer’s rule (day 9). After a PowerPoint lesson, students will use properties of matrices from their notes to complete 15 questions. Answers will be checked by comparing student work to the worksheet key along with teacher checkpoints throughout the assignment with individual students. A student homework concept check will be taken after day 8.
Day 10:
Quiz #2: concepts: 2 x 2 matrix inverse, solving linear systems, Cramer's rule; 10 questions.
Unit test review / flex-grouping remediation.
Day 11:
Post-Test
Enrichment, Hands-On, Student–Centered Activity:
(outlined above)
Materials:
Power Point lessons; worksheets for each day; performance task; tiered post-tests; pre-tests; quizzes; remediation assignments; homework concept checks.
1. Student LD: (i.e. Process, Product, Content)
Students may come in before or after school or during their study hall to receive more individual help from the teacher. Collaborative classes will utilize the collaborative teacher to assist in all activities, instruction and smaller group activities.
2. Student ESL: (Process, Product, Content)
Students will receive an outline of the unit in their native language.
Appendix B
Unit Plan Rubric for: Scott Barnett
Scoring scale: 3 (highest) to 0 (lowest); each criterion also carries a Score and a Comments column.

Standards/Learning Objectives
3: Curriculum standards and learning objectives are specific and clearly stated, linked to each concept.
2: Curriculum standards and learning objectives are specific but vaguely stated and linked to each concept.
1: Curriculum standards and learning objectives are included but not specific nor linked to each concept.
0: No presence of curriculum standards and learning objectives.

Check Points for Mastery
3: Check points for mastery are frequent and varied for student redirection and remediation.
2: Check points for mastery are infrequent OR too similar for student redirection and remediation.
1: Check points for mastery are infrequent AND too similar for student redirection and remediation.
0: No presence of adequate check points for mastery.

Assessment Practices
3: Student product assessed on content and application of the content in a variety of ways.
2: Student product assessed on content and application of the content but not in a variety of ways.
1: Student product poorly assessed on content and application of the content and not enough variety.
0: There is no evidence of assessment of the student.

Summative Assessment
3: Assessment adheres directly to the lesson's standards and is well designed, testing all parts of the unit.
2: Assessment adheres loosely to the lesson's standards OR tests most parts of the unit.
1: Assessment adheres loosely to the lesson's standards AND tests some or few parts of the unit.
0: Assessment has little to do with the standards and the unit.

Overall Focus on Student Outcomes
3: Formative assessments are well placed throughout the unit and are geared toward success on the summative assessment.
2: Formative assessments are well placed throughout the unit but are poorly geared toward success on the summative assessment.
1: Formative assessments are poorly placed throughout the unit and are poorly geared toward success on the summative assessment.
0: Formative assessments are too few and poorly placed and have no correlation to the summative assessment.
Appendix C
Reflective Journal Prompts
1. What did we do today?
2. What went well?
3. What went wrong?
4. How did the students feel about what we did?
5. How do I feel about what we did?
6. Observations?
Appendix D
Student Assessment Survey
Treatment Group
DO NOT PUT YOUR NAME ON THIS
Test Option 1: Meets
Mastery Level: Meets Standards (GHSGT)
Basic Content Mastery (Score 500 on GHSGT)
Maximum Grade: 80/100

Test Option 2: Exceeds
Mastery Level: Exceeds Standards (GHSGT)
Proficient Content Mastery (Score 516 on GHSGT)
Maximum Grade: 95/100

Test Option 3: Excels
Mastery Level: Advanced Standards (GHSGT)
Exemplary Content Mastery (Score > 516 on GHSGT)
Maximum Grade: 110/100
Other information necessary for your test decision:
1. Homework checks are aligned with the Meets Test.
2. Quizzes are aligned with the Exceeds Test.
3. Class notes and instruction are aligned with the Excels Test.
1. Which test would you choose?
2. Why would you choose the test from the previous question?
3. Why would you not choose the other tests?
4. Explain why having a choice on which test you take does/does not matter to you.
5. Explain why a tiered testing program will improve/deteriorate your final grade for this course.
Appendix E
Student Assessment Survey
DO NOT PUT YOUR NAME ON THIS
Rate each statement on a 5-point scale: 1 = Strongly Disagree, 3 = Undecided, 5 = Strongly Agree.

1. I like math.
2. I feel that a test grade shows my teacher how much I really know about a unit.
3. I feel that having a choice on what level of test I take will improve my chances to pass.
4. I like having an option on which level of test I take.
5. I feel that taking one version of a test will increase my chances of failure.
6. I feel that if I know the material and I am properly prepared, the type of test I take will not affect my grade.
7. The tiered test option gave me confidence that I could pass the test.
8. All students should take the same tests.