
Page 1: Overview

Peer Review Process & Alignment Study

Joe Willhoft Assistant Superintendent of Assessment and Student Information

Yoonsun Lee Director of Assessment and Psychometrics

Office of Superintendent of Public Instruction


Page 2: Overview

Overview

Peer Review
- Purpose of Peer Review
- NCLB Peer Review Components
- WA Peer Review Results

Alignment Study
- Alignment Method
- Alignment Study Example

Page 3: Overview

NCLB Accountability Requirements

NCLB requires states to:

- Establish challenging standards
  - The state must apply the same academic standards to all public schools and public school students
  - Standards include a description of the knowledge, skills, and levels of achievement expected of all students
  - Standards must include at least mathematics, reading or language arts, and science
- Develop aligned assessments
- Build accountability systems for districts and schools, based on educational results

Page 4: Overview

NCLB Peer Review

Each state’s standards and accountability program is subject to review and must be approved by a panel of peers

Panels consist of three assessment professionals

States submit documentation that the state has met the requirements for each of seven “critical elements”

Categories of approval: Not Approved, Approval Pending, Approval Expected, Fully Approved (with or without recommendations)


Page 5: Overview

NCLB Peer Review

Review August 6, 2008 Letter


Page 6: Overview

Seven Peer Review Critical Elements

1. Challenging academic content standards
2. Challenging academic achievement standards
3. System of annual high-quality assessments
4. System of assessments with high technical quality
5. Alignment of academic content standards, academic achievement standards, and assessments
6. Inclusion of all students in the assessment system
7. An effective system of assessment reports

Page 7: Overview

Seven Peer Review Critical Elements

• Each table discusses two elements
• Table 1: Discuss Elements 1 & 2
• Table 2: Review Elements 2 & 3
• Table 3: Review Elements 3 & 4, etc.

• Discussion:
1. What does this element mean?
2. How does Washington address this element?
3. What impact does this element have on schools?

• Be prepared to report out

Page 8: Overview

Table Discussion

1. Challenging academic content standards

2. Challenging academic achievement standards

3. System of annual high-quality assessments

4. System of assessments with high technical quality

5. Alignment of academic content standards, academic achievement standards, and assessments

6. Inclusion of all students in the assessment system

7. An effective system of assessment reports

What does this element mean?

How does Washington address this element?

What impact does this element have on schools?


Page 9: Overview

NCLB Peer Review

Review May 5, 2006 Letter


Page 10: Overview

1. Challenging Academic Content Standards

States must develop a set of challenging academic content standards. Standards must:

- include grade-specific expectations in addition to the standards themselves
- define the knowledge and skills that are expected of all students prior to graduation (high school level)
- be rigorous and encourage the teaching of advanced skills

Washington's evidence:
- Standards review by an external panel
- Careful review of the grade-level expectations development process with curriculum and assessment personnel
- Online survey to gather feedback on refinements to the standards

Page 11: Overview

2. Challenging Academic Achievement Standards

Academic achievement standards must:

- Include at least three achievement levels (e.g., basic, proficient, and advanced). Proficient and advanced must represent high achievement, and basic must represent achievement that is not yet proficient.
- Include descriptions of the content-based competencies associated with each level. Cut scores must be established through a process that involves both expert judgments and consideration of assessment results.
- Be aligned with the state's academic content standards, in that they capture the full range and depth of knowledge and skills defined in those standards.

Page 12: Overview

3. Alternate Academic Achievement Standards

A state is permitted to define alternate achievement standards to evaluate the achievement of students with the most significant cognitive disabilities. Alternate academic achievement standards must:

- Be aligned with the state's academic content standards for the grade in which the student is enrolled.
- Be challenging for eligible students, but may be less difficult than the grade-level academic achievement standards.
- Include at least three achievement levels.
- Be developed through a documented and validated standards-setting process that includes broad stakeholder input.

Page 13: Overview

4. System of Annual High-Quality Assessments

NCLB requires states to develop a single statewide system of high-quality assessments. All public school students must participate in this assessment system, including those with disabilities and those who are not yet proficient in English.

Reading and mathematics components of the assessment system must have been in place by the 2005-2006 school year (science by 2007-2008) and must be administered annually to all students in each of grades 3-8 and at least once to students in the 10-12 grade range.

Page 14: Overview

4. System of Annual High-Quality Assessments (continued)

The State's assessment system should involve multiple measures that assess higher-order thinking skills and understanding of challenging content.

WASL includes multiple measures (multiple-choice, short-answer, and extended-response items) to assess higher-order thinking skills and different levels of cognitive complexity.

Page 15: Overview

System of Assessments with High Technical Quality

The Standards for Educational and Psychological Testing delineates the characteristics of high-quality assessments and describes the processes that a state can employ to ensure that its assessments and use of results are appropriate, credible, and technically defensible.

- Validity
- Reliability
- Other dimensions of technical quality

Page 16: Overview

System of Assessments with High Technical Quality

Validity: whether the State has evidence that the assessment results can be interpreted in a manner consistent with their intended purpose(s).

- Evidence based on test content
- Evidence based on the assessment's relation to other variables
- Evidence based on student response processes
- Evidence based on internal structure

Page 17: Overview

Validity

Evidence based on test content (content validity): alignment of the standards and the assessment.

Content validity is important but not sufficient. States must document not only the surface aspects of validity illustrated by a good content match but also the more substantive aspects of validity that clarify the real meaning of a score.

For WASL, content validity is confirmed by content specialists (teachers, curriculum and assessment specialists), who examine whether each item is aligned with the content standards.

Page 18: Overview

Validity

Evidence based on the assessment's relation to other variables: demonstrate the validity of an assessment by confirming its positive relationship with other assessments or evidence that is known or assumed to be valid.

For example, students who do well on the assessment in question should also do well on some trusted assessment or rating, such as teachers' judgments.

Page 19: Overview

Validity

Evidence based on student response processes: Eliminate sources of test invalidity during the test development process through reviews for ambiguity, irrelevant clues, and inaccuracy.


Page 20: Overview

Validity

Evidence based on internal structure: use statistical techniques to study the structure of a test.

- Item correlations
- Generalizability analyses
- Factor analysis
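As an illustrative sketch (not part of the WASL documentation), the first of these techniques, inter-item correlation, amounts to computing a Pearson correlation between scores on pairs of items. The item scores below are invented:

```python
# Hypothetical example: Pearson correlation between two items' scores.
# High inter-item correlations suggest the items measure a common construct.
from statistics import fmean, pstdev

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Invented right/wrong scores for two items across six students
item_a = [1, 0, 1, 1, 0, 1]
item_b = [1, 0, 1, 0, 0, 1]
print(round(pearson(item_a, item_b), 3))  # 0.707
```

In practice, a full internal-structure analysis would examine the whole item correlation matrix and apply factor analysis, but the per-pair computation is the building block.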

Page 21: Overview

Reliability

Reliability refers to the consistency, stability, and accuracy of test results.

State assessment systems are obliged to:
- Make a reasonable effort to determine the types of error that may distort interpretations of the findings
- Estimate their magnitude
- Make every possible effort to alert users to this lack of certainty

Page 22: Overview

Reliability

Traditional methods of portraying the consistency of test results are:
- Reliability coefficients
- Standard errors of measurement
- Actual level of accuracy
- Actual level of consistency
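The first two statistics in this list can be illustrated with a small sketch. This is not OSPI's actual analysis; the data are invented, and Cronbach's alpha is used here as one common reliability coefficient:

```python
# Illustrative computation of Cronbach's alpha (a reliability coefficient)
# and the standard error of measurement (SEM) from made-up item scores.
from math import sqrt
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one inner list per item, scores across students.
    alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(item_scores)
    totals = [sum(per_student) for per_student in zip(*item_scores)]
    item_var = sum(pvariance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

def sem(item_scores):
    """SEM = SD_total * sqrt(1 - alpha): typical error attached to a score."""
    totals = [sum(per_student) for per_student in zip(*item_scores)]
    return sqrt(pvariance(totals)) * sqrt(1 - cronbach_alpha(item_scores))

# Invented data: 4 items scored right/wrong for 6 students
scores = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
]
print(round(cronbach_alpha(scores), 3))  # 0.823
print(round(sem(scores), 3))             # 0.631
```

The SEM is what the "lack of certainty" warning above refers to: a reported score of, say, 400 with an SEM of 0.6 scale points should be read as a band, not a point.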

Page 23: Overview

Other Dimensions of Technical Quality

Fairness/Accessibility
- Do the items and tasks provide an equal opportunity for all students to fully demonstrate their knowledge and skills?
- Are the assessments administered in ways that ensure fairness?
- Are the results reported in ways that ensure fairness?
- Are the results interpreted or used in ways that lead to equal treatment?

Page 24: Overview

Other Dimensions of Technical Quality

Comparability of results
- Comparability from year to year, from student to student, and from school to school

Procedures for test administration, scoring, data analysis, and reporting
- Are the assessments properly administered?
- Are directions followed?
- Are test security requirements clearly specified and followed?

Page 25: Overview

Other Dimensions of Technical Quality

Interpretation and use of results
- Do the results reflect the goals of instruction, especially those related to higher-order thinking and understanding?

Use of accommodations
- Are appropriate accommodations available to students with disabilities and students covered by Section 504?
- Are appropriate accommodations available to limited English proficient students?
- Do scores for these students (students with disabilities, limited English proficient students) allow for valid inferences about students' knowledge and skills, and can they be combined meaningfully with scores from non-accommodated administrations?

Page 26: Overview

5. Alignment of academic content standards, achievement standards, and assessments

Do a State’s assessments adequately measure the knowledge and skills specified in the State’s academic content standards?

Do the assessments cover the full range of content specified in the State’s academic content standards?

Do the assessments measure both the content and the process aspects of the academic content standards?

Do the assessments reflect the full range of cognitive complexity and level of difficulty of the concepts and processes described in the State's academic content standards?


Page 27: Overview

5. Alignment of academic content standards, achievement standards, and assessments

Alignment studies should:
- Demonstrate the breadth and depth of the match between assessments and content standards.
- Demonstrate that the performance descriptors are consistent with the demands of the test content and content standards.
- Document the link between alternate assessments based on alternate achievement standards and grade-level content standards.

Page 28: Overview

6. Inclusion of All Students in the Assessment System

Inclusion of all students in a State's system of standards, assessments, and accountability.

For students with disabilities and for students who are not yet proficient in English, participation in the State's assessment system may require special considerations. For LEP students who have been in school in the U.S. for less than 12 months, regulations permit the State to substitute participation in the State's English proficiency test for participation in the grade-level reading test, for one year only.

Page 29: Overview

7. An effective system of assessment reports

Can a parent, educator, or other stakeholder find answers to questions about how well a student or group of students is achieving, as well as important information on how to improve achievement in the future?

Do States produce reports at the individual student, school, LEA, and State levels? Reports must include scores that are aligned with the State's academic content standards.

Page 30: Overview

Peer Review Process

To determine whether or not states have met NCLB standards and assessments requirements, the U.S. Dept. of Education uses a peer review process involving experts (peer reviewers) in the fields of standards and assessments.

Peer reviewers examine characteristics of a State’s assessment system that will be used to hold schools and school districts accountable under NCLB.

Peer reviewers advise the Dept. of Education on whether a State assessment system meets a particular requirement based on totality of evidence submitted.


Page 31: Overview

WA Peer Review Results

In August 2008, Washington's standards and assessment system was approved.

The decision was based on input from peer reviewers external to the U.S. Dept. of Education, who reviewed the evidence demonstrating that Washington's system includes academic content and student academic achievement standards in reading, mathematics, and science, as well as alternate academic achievement standards for students with the most significant cognitive disabilities in those subjects.

Page 32: Overview

WA Alignment Study


Page 33: Overview

Science Alignment Study

How does the WASL align with Essential Academic Learning Requirements (EALRs) and Grade Level Expectations (GLEs) in science at the 5th, 8th, and 10th grade levels?

Panels of educators participated in this study. The primary task was to evaluate how well score points from the WASL science assessments matched the state GLEs in terms of content and cognitive complexity.

Page 34: Overview

Methodology

Frisbie (2003); Webb (1997)

Judgments:
- Cognitive complexity of GLEs and EOLs
- Cognitive complexity of score points for each item
- Content fit of score points with GLEs and EOLs

Scoring guides and exemplar responses for score points were available for review by panelists.

Two-stage procedure:
- Independent judgment
- Group consensus discussion and recommendation

Page 35: Overview

Procedures

14 science educators, selected on the basis of geographic representation and school size, participated in the alignment study.

The participants had content and assessment expertise.

The panelist review was facilitated by an independent contractor, who also wrote the summary report.

Page 36: Overview

Procedures

Panelists were asked to rate the cognitive complexity of each GLE and Evidence of Learning (EOL). Three levels of complexity were used:

Level 1 – Conceptual understanding and comprehension: Assessment items, GLEs, or EOLs at this level focus on remembering facts, comprehension of concepts, recognizing attributes of a process, and understanding ideas. Assessment items at this level might ask examinees to identify, recognize, recall, classify, summarize, or compare.

Page 37: Overview

Procedures

Level 2 – Application, analysis, synthesis, and evaluation: Assessment items, GLEs, or EOLs at this level focus on application of concepts and ideas to human problems and situations through predictive analysis, synthesis of information and evaluation of situations or problems. Assessment items at this level might ask examinees to conclude, plan, differentiate, critique, create new ideas or meaning, design, explain, evaluate, or organize.

Unclassifiable – This level applies when a GLE or EOL is worded so ambiguously that it is not possible to determine how students are expected to interact with the content.


Page 38: Overview

Procedures

After rating the cognitive complexity of each GLE and Evidence of Learning (EOL), panelists evaluated the degree of fit between each item (or score point, for constructed-response items) and the EOL the item is designed to assess. Four fit categories were used:

C – Complete fit: The main content required to answer the item correctly is contained in the GLE/EOL. If the student gets the item right, this is one relevant piece of information about the student's level of achievement of the content stated in the GLE/EOL.

Page 39: Overview

Procedures

P – Partial fit: A significant portion of the content required to answer the item correctly is embodied in the GLE/EOL, but there is additional, significant understanding required that is represented by some other GLE/EOL. If the student gets the item (point) right, it is because the student has some other significant knowledge that is not part of this GLE/EOL.

S – Slight fit: There is some relationship between the item content and the content of the EOL, but much more is needed to answer the item correctly. Alignment would probably be more complete with some other GLE/EOL, or it might take several GLE/EOLs to cover the content of the item sufficiently.

X – No fit: The item does not fit the content of any GLE/EOL.
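The first stage of the two-stage procedure (independent judgments, later reconciled by group discussion) can be sketched as a simple tally. This is a hypothetical illustration, not the study's actual tooling; the item names and ratings are invented:

```python
# Hypothetical tally of panelists' independent fit ratings per score point.
# The C/P/S/X codes are the fit categories defined above.
from collections import Counter

FIT_LEVELS = {"C": "Complete fit", "P": "Partial fit",
              "S": "Slight fit", "X": "No fit"}

def modal_rating(ratings):
    """Most common independent rating for one score point; in the study,
    remaining disagreements went to group consensus discussion."""
    return Counter(ratings).most_common(1)[0][0]

# ratings[score_point] = one letter per panelist (invented data)
ratings = {
    "item01": ["C", "C", "C", "P"],
    "item02": ["P", "P", "S", "P"],
    "item03": ["C", "C", "C", "C"],
}
for point, votes in sorted(ratings.items()):
    print(point, FIT_LEVELS[modal_rating(votes)])
```

A real facilitation would also flag low-agreement points (e.g., a 2-2 split) for discussion rather than taking the mode blindly.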

Page 40: Overview

2007 Results – Grade 5

Overall:
- All score points judged to align to one GLE
- Coverage is balanced across content and cognitive levels

Systems of Science: 20 score points
Inquiry in Science: 20 score points
Application of Science: 9 score points

Page 41: Overview

2007 Results – Grade 8

Overall:
- All score points judged to align to at least one GLE
- Coverage is balanced across content and cognitive levels

Systems of Science: 27 score points
Inquiry in Science: 24 score points
Application of Science: 12 score points

Page 42: Overview

2007 Results – Grade 10

Overall:
- All score points judged to align to at least one GLE
- Coverage is balanced across content and cognitive levels

Systems of Science: 31 score points
Inquiry in Science: 27 score points
Application of Science: 9 score points

Page 43: Overview

Conclusions

Results suggest increasingly challenging content standards across grade levels.

Score points were balanced across GLEs on content and cognitive complexity.

Panelists’ evaluation ratings and comments were positive.
