Post on 25-Dec-2015
Assessing Students’ Reading and Texts
Agenda
Assessment Humor
Major Issues With Accountability and Reading Assessment
America’s Infatuation with Assessment and Accountability
What Statistics Tell Us: A Little and a Lot
Reading Assessments for Your Class
Assessment
Two types of assessment: formal and informal
Assessment is an ever-present reality
Good teachers are always assessing their students
Good teachers are always assessing their own practice
A sign of a bad teacher is one who is not introspective and does not seek constructive feedback! EXAMPLES
Accountability = Assessment ???
High-Stakes Testing
“You can’t fatten a calf by weighing it.”
- proverb (also quoted on the House floor during the NCLB debate)
Origins of High-Stakes Testing
Schooling and high stakes testing grew exponentially in conjunction with industrialization and with “modern” psychology (a la E.L. Thorndike)
Worldwide “paradigm shift” (Kuhn, 1962) from rural farming to industrialization
Origins of High-Stakes Testing
Schools as Factories: Assembly lines (age-graded classrooms)
Interchangeable parts (teachers all teach same curriculum in each grade)
Product (all students have same knowledge when finished)
Quality control (tests at specific intervals to ensure learning)
Belief that similar inputs = similar outcomes = measurable knowledge
Standardized Tests: Why Americans Love ‘em
Belief that standardized tests are inherently fair
Standardized tests lend themselves to statistical analysis and reporting
The U.S. population tends to trust statistics
Belief that STATISTICS = UNBIASED
Belief that statistics are math
U.S. Statistics: Test your knowledge
At what pace has the murder rate grown over the last 40 years (in other words, how much worse is murder per capita now than 40 years ago)?
How has the violent crime rate changed in the last 30 years?
How much more likely are children to be abducted today compared to 25 years ago? 40 years ago?
Statistics
Answer to ALL of the above: far less (these rates have fallen).
Violent crimes have fallen relatively consistently since 1972.
U.S. Statistics: Test your knowledge
If you were answering these questions as a lay person—and not in the context of this class—how would you answer? How would most people you know answer? What leads people to answer as they do?
Misuse of Statistics
Take the following example:
"Every year since 1950, the number of American children gunned down has doubled.” (Children’s Defense Fund)
Now the real statistic:
"The number of American children killed each year by guns has doubled since 1950.” (Children’s Defense Fund)
By the first (misquoted) version, the number of American children gunned down in 1995 (when the first quote above appeared) would have been roughly 35 trillion.
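The arithmetic behind that 35-trillion figure is easy to check: even starting from a single death in 1950, "doubling every year" means 45 doublings by 1995.

```python
# "Doubling every year" from 1950 to 1995 is 45 doublings. Even starting
# from one death in 1950, the misquoted claim would imply:
deaths_1995 = 1 * 2 ** (1995 - 1950)
print(deaths_1995)  # 35184372088832, i.e. about 35 trillion "children"
```

That is several thousand times the population of the planet, which is why the misquote collapses under basic arithmetic.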
Statistics
When people who are ‘trusted’ quote statistics, what they say is given even more credibility (often regardless of how ridiculous the claim). Using your incredible statistical (and basic math) skills, what is wrong with the following statistical analysis?
Bad Statistics! Bad Dog!
Statistics
1) PER CAPITA is a ratio: it does NOT change when the population size changes.
Per capita is Latin for "by the head" or "for each head."
Example: If I say that 20% of Russians smoke, it does not matter whether the population of Russia is 10 people, 200,000 people, or 400,000 people: one in five Russians smoke.
2) Statistical procedures and measurements are prone to specific kinds of bias, but the underlying mathematics does not change.
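A toy calculation (the 20% figure and populations are illustrative, as above) makes the point concrete: the raw count of smokers scales with population size, but the per-capita rate does not.

```python
# A per-capita figure is a ratio: raw counts grow with the population,
# but the rate itself stays fixed (20% here is illustrative).
SMOKING_RATE = 0.20

def smoker_count(population, rate=SMOKING_RATE):
    return population * rate

for population in (10, 200_000, 400_000):
    count = smoker_count(population)
    print(population, count, count / population)  # rate column is always 0.20
```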
Statistics = Fact?
Statistics can be manipulated and are seldom free of bias:
Discarding unfavorable data
Sampling bias/margin of error
Leading/loaded question bias
False causality
Misuse of the null hypothesis
Numerous other issues (see examples in the hyperlinks below)
For more examples of common misuses of statistics, see:
http://en.wikipedia.org/wiki/Misuse_of_statistics
http://cseweb.ucsd.edu/~ricko/CSE3/Lie_with_Statistics.pdf
http://knowledge.wharton.upenn.edu/article/the-use-and-misuse-of-statistics-how-and-why-numbers-are-so-easily-manipulated/
Statistics = Fact? Discarding Unfavorable Data
Companies and their research staff can (and often do) ignore data that contradict what they hope to find, and/or fail to publish studies that are disadvantageous to them.
Medical studies in which the outcomes do not favor the introduction and use of a new (and costly) medicine.
Ignoring myriad unfavorable variables and outcomes while selectively using those that are favorable (e.g., the drug reduces arthritis pain but increases risk of death fourfold).
An antidepressant company researching the benefits of a new drug chooses to discard from the study sample a group of people who showed dramatically increased risk of suicide while on the drug (excluding them for myriad reasons).
Statistics = Fact? Sampling Bias (also correlates with Margin of Error)
Recent example: the 2012 Romney campaign managers and statisticians genuinely thought the race would be close or that they would win handily. They based this on statistics garnered by sampling voters through telephone polls. They ignored the findings of statistician Nate Silver, who had accurately predicted the 2008 electoral vote and was again (accurately) predicting the 2012 electoral vote.
“Dick Morris, former Campaign manager for Bill Clinton's 1996 reelection [and a leading strategist for Mitt Romney] has absolutely put his political pundit reputation on the line by declaring that Mitt Romney will win the Presidency in a landslide, which of course mirrors yours truely's [sic] prediction of Romney getting 52% of the votes against Obama's 47%.” - JustPlainPolitics.com
Romney sampling = “likely voters” with home phones willing to answer a poll about their preference of presidential candidates.
Statistics = Fact? Leading/Loaded Question Bias
Do you support the attempt by the USA to bring freedom and democracy to other places in the world?
Do you support the unprovoked military action by the USA?
Do you support ObamaCare?
Do you support the Affordable Care Act?
Do you think teachers should be held to high standards that are measured fairly and accurately?
Do you support more standardized testing in K-12 public school classrooms?
Statistics = Fact? False Causality (A ‘causes’ B)
Correlation is NOT causation. Many things are correlated (related) with each other, but this does not mean that one thing causes another.
Almost all heavy drug use begins with alcohol or marijuana use. Thus, marijuana use causes heroin addiction (FALSE).
The number of people buying ice cream at the beach is statistically related to the number of people who drown at the beach. Thus, ice cream causes drowning (FALSE).
Statistics = Fact? Misuse of the Null Hypothesis
Statisticians use the ‘null hypothesis’ as their starting point (that there is no relationship between two measured phenomena, or that a potential medical treatment has no effect) and assume it true until proven otherwise via conclusive evidence (confidence intervals). The U.S. court system follows a similar approach: innocent until proven guilty beyond a reasonable doubt. But the acquittal of a defendant does not prove the defendant “innocent” of the crime; rather, it merely states that there is insufficient evidence for a conviction.
Suppose a tobacco company runs studies to show that its products do not cause cancer, but it uses a small sample and the study covers a short period of time. It is then unlikely to disprove the null hypothesis (that there is no relationship between using the tobacco product and cancer). It should not therefore report that its product does not cause cancer.
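A quick simulation shows why the small study is misleading (all rates, sample sizes, and the significance threshold below are hypothetical choices of mine, not from the slides): even when a real effect exists, a tiny study usually fails to reach statistical significance, while a larger study almost always detects it.

```python
# Simulate many studies comparing a control group (5% cancer rate) with an
# exposed group (15%) and count how often a two-proportion z-test finds a
# significant difference. Small n => the real effect is usually "missed".
import math
import random

def two_prop_z_p(x1, n1, x2, n2):
    """Two-sided p-value for a two-proportion z-test."""
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x2 / n2 - x1 / n1) / se
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

def power(n, p_control=0.05, p_exposed=0.15, trials=2000, alpha=0.05, seed=1):
    """Fraction of simulated studies of size n per arm that reject the null."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1 = sum(rng.random() < p_control for _ in range(n))
        x2 = sum(rng.random() < p_exposed for _ in range(n))
        if two_prop_z_p(x1, n, x2, n) < alpha:
            hits += 1
    return hits / trials

small = power(20)   # tiny, short study: usually fails to detect the effect
large = power(500)  # larger study: almost always detects it
```

Failing to reject the null in the small study says almost nothing; the study simply lacked the power to find what was there.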
Statistics Bias: Reporting
Based upon performance, which stock would you be more inclined to buy?
[The original slides compared two charts of stock performance.] The stocks performed at the same rate; the only difference is in how their performance is charted.
Statistics = Fact? (& “reading is reading”)
The U.S. population tends to trust statistics; belief that STATISTICS = UNBIASED (statistics = math)
Thus, belief that standardized tests are inherently fair
Statistics can be manipulated, poorly done, and biased to find what one is seeking to find:
Romney campaign polling
SAT scores as an example of school failure (Berliner and Biddle); concern that SAT scores over time are not changing or getting higher
“Culture of Fear” & statistics
The Bell Curve (Herrnstein & Murray)
Statistics and Standardized Tests
Standardized tests are trusted to give us an accurate picture of how well students are doing in school. Because they are statistically based, norm-referenced, and relatively easily scored, most people trust the information they give.
WHAT DO THEY NOT GIVE?
WHAT DO THEY NOT MEASURE?
WHAT SOURCES SHOULD WE TRUST FOR READING STATISTICS?
A Real (valid) Statistic
According to the National Assessment of Educational Progress (NAEP), approximately one in four 12th-grade students (who have not already dropped out of school) still reads at a "below basic" level, while only one student in twenty reads at an "advanced" level.
High-Stakes Testing
Your performance will be based, at least in part, upon your students’ test scores; measures such as the Student Success Act (2011) and Race to the Top rely on students’ scores as indicators of teachers’ ability and tie performance pay to them.
50% of your yearly evaluation, regardless of content area, will be based upon one test score!
This is yet another reason to make sure that your students can read their content area texts effectively: Your job will depend upon it!
Different Kinds of Testing
Norm-referenced tests vs. criterion-referenced tests
Norm-Referenced Tests
NCLB and large-scale testing tend to be norm-referenced
Norm-referenced means a student’s performance is measured against a ‘norm’ for that age, ability level, etc. Students are compared to a large average
Teacher success is based largely on norm-referenced test scores
Criterion-Referenced Tests
Classroom testing is almost always criterion-referenced
Criterion-referenced tests measure knowledge or ability in a specific area (whether or not a student has learned specific material)
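A minimal sketch (all scores invented, function names mine) contrasts the two interpretations of a single result: a norm-referenced score locates a student within a comparison group, while a criterion-referenced score reports mastery of the specific material.

```python
# Norm-referenced vs. criterion-referenced interpretation of one test result.
def percentile_rank(score, norm_group):
    """Norm-referenced: where does this score fall relative to peers?"""
    below = sum(s < score for s in norm_group)
    return 100 * below / len(norm_group)

def percent_mastered(correct, total_items):
    """Criterion-referenced: how much of the specific material was learned?"""
    return 100 * correct / total_items

norm_group = [52, 60, 65, 70, 72, 75, 78, 80, 85, 93]  # hypothetical peers
pr = percentile_rank(78, norm_group)  # compared to the "large average"
pm = percent_mastered(39, 50)         # compared to the material itself
```

The same student can look strong on one interpretation and weak on the other, which is why the two test types answer different questions.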
Reliability vs. Validity
Reliability refers to the confidence we can place in the measuring instrument to give us the same numeric value when the measurement is repeated on the same object.
Will students score roughly the same if the test/assessment is repeated (results are not random)?
Validity refers to whether the assessment instrument/tool actually measures the property it is supposed to measure.
Does the assessment tool actually measure the right thing?
Reliability vs. Validity
Reliability
“Another way to think of reliability is to imagine a kitchen scale. If you weigh five pounds of potatoes in the morning, and the scale is reliable, the same scale should register five pounds for the potatoes an hour later (unless, of course, you peeled and cooked them). Likewise, instruments such as classroom tests and national standardized exams should be reliable – it should not make any difference whether a student takes the assessment in the morning or afternoon; one day or the next”
(http://fcit.usf.edu/assessment/basic/basicc.html)
Reliability CLASSROOM EXAMPLE:
“Another measure of reliability is the internal consistency of the items. For example, if you create a quiz to measure students’ ability to solve quadratic equations, you should be able to assume that if a student answers an item correctly, he or she will also be able to answer other, similar items correctly. The following table outlines three common reliability measures.”
http://fcit.usf.edu/assessment/basic/basicc.html
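The kitchen-scale idea can be quantified: test-retest reliability is commonly reported as the correlation between two administrations of the same instrument. The scores below are invented for illustration.

```python
# Test-retest reliability as a Pearson correlation between two sittings
# of the same quiz (all scores are hypothetical).
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

morning = [70, 80, 90, 60, 75]    # same five students, morning sitting
afternoon = [72, 78, 91, 59, 76]  # afternoon sitting
r = pearson_r(morning, afternoon) # a value near 1.0 suggests reliability
```

A correlation near 1.0 plays the role of the scale that "registers five pounds an hour later": the instrument gives consistent answers.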
Validity
Validity refers to the accuracy of an assessment: whether or not it measures what it is supposed to measure. Even if a test is reliable, it may not provide a valid measure.
If a test is valid, it is almost always reliable; a reliable test, however, is not necessarily valid.
GENERAL RULE:
Validity implies reliability
Reliability does not imply validity
Validity ≠ Reliability
Imagine a bathroom scale that consistently tells you that you weigh 118 pounds. The reliability (consistency) of this scale is very good, but it is not accurate (valid).
Validity
Because “teachers, parents, and school districts make decisions about students based on assessments (such as grades, promotions, and graduation), the validity inferred from the assessments is essential -- even more crucial than the reliability.”
(http://fcit.usf.edu/assessment/basic/basicc.html)
When you try to assess whether or not a student is able to read a text, you must use a valid measurement. For instance, students may be able to read and comprehend all of the words of a text but not understand the content. Testing for their ability to literally ‘read’ the text would not be a valid assessment.
The original slide's figure (four archery targets, not reproduced here) shows four possible situations. In the first, you are hitting the target consistently but missing the center: you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable but not valid (consistent but wrong). The second shows hits randomly spread across the target: you seldom hit the center but, on average, you get the right answer for the group (though not for individuals); you have a valid group estimate but are inconsistent. Here you can clearly see that reliability is directly related to the variability of your measure. The third scenario shows hits spread across the target and consistently missing the center: your measure is neither reliable nor valid. Finally, the "Robin Hood" scenario: you consistently hit the center of the target. Your measure is both reliable and valid.
High-Stakes Tests
Today’s Corollary: A gun aimed at teachers saying “If all of your students don’t score well on the test, you’re a bad teacher who needs to go.” Bang
Assessment & Reading
You, the content area teacher, can do a number of relatively simple assessments to gauge a student’s or a group of students’ reading ability/level
There are also numerous ways to determine the reading level of various texts (textbooks, articles, web pages, etc.)
The goal: to match readers’ ability to appropriate texts (within the Zone of Proximal Development)
Assessment & Reading
Informal Assessments: Questioning
Questioning students orally about textual information (from general to specific, making note of students’ responses in some format)
Questioning students directly (but privately) about their reading abilities
Students know when they struggle with reading & are often more open about their struggles than you would initially imagine.
Assessment & Reading
Informal Assessments: Observation
When and where do students struggle with reading?
Watch for (and make note of) those students who never volunteer to read and who avoid reading out loud, even in small groups
Listen to peers’ comments about individuals’ reading ability
Watch students as they read (do they look up often? are they easily distracted? do they react negatively or disruptively?)
Speed of reading is a good indicator of reading ability (though not always!)
Rate of Comprehension
There is, obviously, a relationship (correlation) between how quickly one reads and one’s level of fluency: good readers read more quickly. This is not a one-to-one correspondence.
Rate of Comprehension: create a simple ratio for how long it takes students to read a passage of a specific length.
This measure is best used with a Comprehension Inventory (see p. 111)
Note that speed does not necessarily correspond with accuracy (sometimes slower readers can be reading for better understanding than faster readers). By combining the Comprehension Inventory and the Rate of Comprehension one can get a fuller picture (i.e., students who rush through a reading but read superficially and students who spend too much time on one passage).
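The ratio itself is trivial to compute; a words-per-minute figure (the function name is mine) is one common form, and it only becomes meaningful when paired with a comprehension measure, as noted above.

```python
# Words per minute: a simple rate-of-reading ratio. Speed alone says
# nothing about comprehension, so pair it with a comprehension inventory.
def reading_rate(word_count, seconds):
    return 60 * word_count / seconds

rate = reading_rate(300, 120)  # a 300-word passage read in 2 minutes
print(rate)                    # 150.0 words per minute
```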
Readability
Readability formulas
Can be a good arbiter of the difficulty of a text
Used in conjunction with other data (using a formula alone tells you very little that can be of use; you must use such information with other points of data and what you know about your students)
Zone of Proximal Development
Do NOT use readability formulas to tailor all of your information/texts to students’ respective abilities; doing so can hinder student reading growth
Readability formulas do not correlate with students’ individual abilities, background knowledge, or ability to read specific texts
What did you find about the texts you selected using the Fry Readability Graph or the Flesch-Kincaid Readability Formula?
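For reference, the Flesch-Kincaid grade level is 0.39 x (words/sentences) + 11.8 x (syllables/words) - 15.59. The sketch below uses a crude vowel-group syllable counter of my own, so treat its output as approximate.

```python
# Flesch-Kincaid grade level. The syllable counter is a rough
# vowel-group heuristic, so scores are approximate.
import re

def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / sentences)
            + 11.8 * (syllables / len(words)) - 15.59)

print(fk_grade("The cat sat on the mat."))  # well below first grade
```

Note how completely the formula depends on word and sentence length: it is exactly the kind of surface measure the caveats below warn about.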
Reading and ZPD
Zone of Proximal Development (Lev Vygotsky)
Readability
Readability formulas: A caveat
Readability formulas can be misused and misunderstood. Basic readability measures are a simple formula: a ratio of syllables to words. But they do NOT take into account:
Specialized vocabulary (regardless of length of word)
Difficulty in construction of passages (think in terms of “Truth is untruth insofar as…” and poetry)
Assessment & Reading
There are a number of formal assessment tools for determining a student’s ability to read well:
Dynamic Indicators of Basic Early Literacy Skills (DIBELS): primarily for emergent and early literacy
Woodcock Reading Mastery Test: likewise, primarily for emergent and early literacy
Diagnostic Assessment of Reading (DAR): can be used for secondary students (expensive)
District Measures (FCAT and other measures)
Lexile Scores
Readability
Lexile© Scores
Lexile scores can be obtained through state agencies (Departments of Education). The Scholastic Reading Inventory (SRI) provides a Lexile score. Check with your school or district to find out if a student or students have been tested in ways that measure Lexile.
Some resources are free (analyzing a classroom text for example)
Readability
Lexile Scores
Lexile gives a score that roughly corresponds to the range within which average readers of that age should fall.
Lexile also analyzes texts to assign each a Lexile score (reading difficulty)
Teachers can use this data to find appropriate reading materials for students.
Cloze Tests
Help you see how well students know material (can be especially helpful as a pre-test of content) while also helping you determine reading ability
Formula for a cloze test:
1) Select 250-500 words for a selected piece of text
2) Leaving first sentence intact, begin deleting every fifth to seventh word of the text thereafter (delete a mixture of important vocabulary, conjunctions, verbs, etc.) as this tells teachers a great deal about comprehension
3) Delete fifty words.
4) Multiply students’ exact word replacements (or very close substitutions) by two to get percentage correct
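The four steps above can be sketched directly (function names are mine; the teacher still judges which "very close substitutions" count before scoring):

```python
# A minimal sketch of the cloze procedure above.
def make_cloze(text, gap=5, max_blanks=50):
    """Keep the first sentence intact, then blank every `gap`-th word."""
    first, sep, rest = text.partition(". ")
    words = rest.split()
    answers = []
    for i in range(gap - 1, len(words), gap):
        if len(answers) == max_blanks:
            break
        answers.append(words[i])
        words[i] = "_____"
    return first + sep + " ".join(words), answers

def score_cloze(responses, answers):
    """Exact (or teacher-approved close) replacements x 2 = percent correct,
    assuming the standard fifty blanks."""
    correct = sum(r.strip().lower() == a.strip().lower()
                  for r, a in zip(responses, answers))
    return correct * 2
```

Deleting every fifth word produces fifty blanks from roughly a 300-word stretch, which is why the 250-500 word selection in step 1 matters.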
Simple Cloze Tests as a class
http://www.edict.com.hk/vlc/cloze/cloze.htm
Readability & Comprehension
Readability vs. Comprehension
Readability formulas rate the text's complexity in terms of words and grammar, but we are actually more interested in the text's difficulty in terms of readers' comprehension of the content. Sad to say, no formula can measure whether readers actually understand a text.
Take, for example, the following two sentences:
He waved his hands.
He waived his rights.
Both score well in readability formulas: simple words, short sentences. But whereas everybody understands what the first sentence describes, you might need a law degree to fully comprehend the implications of the second sentence.
Assessment & Reading
Observation & Questioning
As the teacher, your knowledge is the best indicator of students’ relative abilities to read.
Informal Observation & Questioning Video
Assessment of Reading: Individual
San Diego Quick (SDQ)
Create a list of vocabulary words from your discipline/content area that range in “readability”
Monosyllabic to polysyllabic, common English words to Latin-based words, etc.
Paste individual words onto note cards. Note the readability level of each word on the back of its card (using your judgment and/or readability formulas)
Starting with lower-level words, test the student’s ability to pronounce the words, presented in random order and quickly (gauge their confidence with the words)
Note where students begin to struggle. Move forward AND backward in word difficulty to try to determine where the student is reading comfortably; this will help determine their reading level
See also: San Diego Quick
Assessment & Reading: Group
Content Area Reading Inventory (CARI)
This is basically an informal test that you develop, using your knowledge of the text, to see how well students are reading specific material: do they understand the main ideas, vocabulary, etc.?
It should be based upon literal, inferred, and predictive answers:
In the text (word-for-word); searching the text (in the text but in different wording); inferential (implied by the text but not directly stated); applied (in one’s head/experience, thought-provoking)
Can be used in a fashion similar to a pre-test (NOT graded)
Click here for more on creating and using CARI
Assessment & Reading: Group
Informal Reading Inventory (IRI)
Assign a longer passage for all students to read (one that they have not read before); have them put their names on a note card with four squares on it
Tell them that they will each read the passage aloud
Create a note-taking system (for yourself) to note students’ strengths and weaknesses
Go around the room, stopping to listen to students as they read
Look for fluency, vocabulary struggles, mispronunciations, frequent stopping and starting, speed of reading, etc.
Make (simple) notes of each student’s areas of strength and weakness & record this in a reading inventory log
Create learning opportunities and specific activities for students based upon reading ability
Click here for more on creating and using IRI
Assessment of Reading: Group
Student Response Form (SRF)
Have students read a passage they have not seen (in your content area/lesson)
Give them a response form
Include on the form a place to mark Reading Time and Part I and Part II questions
When students complete the reading they:
1) Mark at what time they completed the reading
2) Answer Part I questions (literal questions “from the text”) WITHOUT using the text
3) Answer Part II questions (interpretive, analytical, applied); they may refer to the text for this part.
Click here for more information on Student Response Forms
Readability
Readability formulas
Can be a good arbiter of the difficulty of a text; however, these are NOT a measure of a student’s reading ability, nor should they be used that way!
Use in conjunction with other data (using a formula alone tells you very little that can be of use; you must use such information with other points of data and what you know about your students)
Readability
Readability formulas: A caveat
Readability formulas can be misused and misunderstood.
Basic readability measures are a simple formula: a ratio of syllables to words. But they do NOT take into account:
Specialized vocabulary (regardless of length of word)
Difficulty in construction of passages (think in terms of “Truth is untruth insofar as…”)
Unusual word formations, such as poetry
Readability
Readability formulas: A caveat
Parting
My life closed twice before its close;
It yet remains to see
If Immortality unveil
A third event to me.
So huge, so hopeless to conceive,
As these that twice befell.
Parting is all we know of heaven,
And all we need of hell.
- Emily Dickinson
5.6th-grade level according to Flesch-Kincaid
Summary and Discussion: Using This Information
What can YOU do? Find out how well your students are reading
Prior test data (state measures, other measures, IEPs, etc.)
Informal assessments that you conduct in your content area
Determine readability AND supplement accordingly
You may not have a choice of whether or not to use specific text
If you do have a choice, choose wisely (not catering to students’ weaknesses, but within their ‘zone’ of proximal development)
Summary and Discussion: Using This Information
What can YOU do?
Use a checklist to gauge texts for appropriateness for different readers (see p. 134)
Directly Teach the Text and Model Effective Reading to help struggling readers
Use read/think alouds, guided questions, pre-reading strategies
Use paired or group activities with mixed ability students