
New York State Education Department

Understanding The Process:

Science Assessments and the New York State Learning Standards

January 2002

NYSED

New York State Learning Standards

• April 1994: the Board of Regents approved a plan to revise the State assessment system based on learning standards.

• July 1996: the Board of Regents approved 28 learning standards in seven standard areas:
  – Mathematics, Science, and Technology; Social Studies; The Arts; English Language Arts; Languages Other Than English; Career Development and Occupational Studies; Family and Consumer Sciences/Health/Physical Education


New York State Learning Standards

• Learning standards outline what students should know, understand, and be able to do in a specific subject area.

• Learning standards contain content and performance standards:
  – Content standards: the knowledge, skills, and understandings that individuals can habitually demonstrate over time as a consequence of instruction and experience
  – Performance standards: levels of student achievement in domains of study


New York State Learning Standards

• Learning standards consist of performance indicators at the:
  – Elementary (K-4),
  – Intermediate (5-8), and
  – Commencement (9-12) levels

• Performance indicators are embedded in the learning standards and are aligned to the Science Core Curriculum Guides and State assessments


State Assessments

• Provide a uniform measure of student achievement across all districts, all schools, and all classrooms
• Assess the extent to which students have achieved the learning standards in a content area
• Are important indicators of student achievement of the learning standards
• Are used to understand individual student needs in conjunction with other appropriate measures
• Drive necessary changes in curriculum and classroom instruction


Science Assessments

• Elementary Science: Elementary Science Program Evaluation Test (ESPET), administered at Grade 4

• Intermediate Science: Intermediate Level Science, administered at Grade 8

• Commencement Level: Regents Science Exams
  – Living Environment
  – Physical Setting/Earth Science
  – Physical Setting/Chemistry
  – Physical Setting/Physics


Test Development Process in Science

• The test development process ensures that the assessments created are fair, valid, and reliable measures of student performance in relation to meeting the State learning standards

• The process involves 19 steps and approximately two to three years to develop a State assessment


Test Development Process

Learning Standards → CORE Guides/Subject-Specific Content Area → Test Specifications/Test Blueprint

• Item Writing: solicit item writers; train item writers; test items are submitted and reviewed
• Testing Items: pre-test items/forms; field test items/forms; operational forms/tests
• Test Analysis: pre-test data/field test data; item analysis/test analysis; exam review committees/standards setting study


Test Development …continued

• Review learning standards in the subject content area
• Design test specifications ("test blueprint")
• Solicit and train item writers
• Publish prototypes of items/generic rubrics (sample tests)
• Review and edit submitted items
• Pre-test items; scan pre-tests; read and score performance items
• Perform item analysis; review items and data


Test Development …continued

• Field test forms; scan field tests; read and score performance items
• Perform item and test analysis
• Submit to Statewide Examination Review Committees:
  – Sensitivity Review: ensures that all people are depicted with dignity; certified, trained reviewers review or reject test items
  – Bias Analysis: evaluates whether a test question asks the same question, at the same level of difficulty, across subgroups of test takers
• Determine student performance levels through a Standards Setting Study ("cut scores")


Test Construction

• New York State teachers and content consultants, in coordination with the Office of State Assessment and the Office of Curriculum and Instruction, determine test specifications

• A "test blueprint" determines the percentage of questions weighted for each standard and key idea


Item Writing

New York State teachers and content specialists:

• Are trained as item writers by New York State Education Department staff

• Align all test items generated to the State learning standards contained in the Science Core Curriculum Guides

• Write items and scoring rubrics for State tests in science


Pre-Tests

• Prospective test items are "pre-tested" by a diverse sample of students across the State

• Approximately 200 students are tested for each item

• Results from pre-tested items are statistically analyzed to determine item difficulty and fairness
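The simplest statistic of this kind, in classical test theory, is the p-value: the proportion of the pre-test sample that answered the item correctly. A minimal sketch (illustrative only, not NYSED's actual analysis; the item names and the review thresholds below are invented):

```python
def item_difficulty(responses):
    """responses: 0/1 scores for one item across the ~200 pre-test students.
    Returns the p-value: the proportion answering correctly."""
    return sum(responses) / len(responses)

def flag_for_review(item_responses, low=0.10, high=0.95):
    """Flag items that almost nobody or almost everybody gets right.
    The 0.10/0.95 thresholds are invented for this example."""
    return {
        item_id: "keep" if low <= item_difficulty(r) <= high else "review"
        for item_id, r in item_responses.items()
    }

responses = {"item_1": [1, 1, 1, 0], "item_2": [0, 0, 0, 0]}
print(flag_for_review(responses))  # → {'item_1': 'keep', 'item_2': 'review'}
```

An item answered correctly by 75% of students has difficulty 0.75; an item nobody answers correctly tells the test developers little and would be flagged.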


Field Tests

• Field test items are developed from pre-test questions and administered in "short forms" to a representative sample of students (800-1,000) across the State

• Based on statistical analysis and student performance, the different field test forms are made comparable in difficulty


Field Tests

• Each field test form is "equated": two or more test forms are constructed to cover the same explicit content, conform to the same statistical specifications, and are administered under identical procedures

• Two or more essentially parallel tests are placed on a common scale ("equating")
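The simplest version of placing parallel forms on a common scale is mean equating: a score on one form is adjusted by the difference in form means so that it expresses the same level of achievement as a score on the reference form. A minimal sketch (an assumed, simplified technique for illustration, not NYSED's actual equating procedure; the numbers are invented):

```python
def mean_equate(score_b, mean_a, mean_b):
    """Convert a raw score on Form B to Form A's scale by shifting it
    by the difference between the two forms' mean scores."""
    return score_b + (mean_a - mean_b)

# If Form A averaged 55 and Form B averaged 52, Form B was harder,
# so a 60 on Form B corresponds to a 63 on Form A's scale.
print(mean_equate(60, mean_a=55, mean_b=52))  # → 63
```

Operational equating methods also match the spread and shape of the score distributions, not just the means, but the goal is the same: identical scale scores should represent identical achievement regardless of which form a student took.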


Field Tests to Operational Tests

• "State Assessments" (operational tests) are assembled from field test forms

• Statistical analysis ensures different test forms are comparable in fairness, validity, and reliability

• Operational tests are placed on a "scale score," a derived score to which raw scores are converted by numerical transformation (raw scores to standard scores)

• Full-length forms are presented to the State Examinations Review Committee for sensitivity and bias review


Standard Setting Process

• State tests assess the extent to which students have met the learning standards in a content area

• Although scores for the Regents Exams are placed on a numerical scale based on field test data, there are essentially three performance levels:
  – Does not meet the standards
  – Meets the standards
  – Meets the standards with distinction


Standard Setting: Performance Levels

• Standard Setting committee members are given definitions of student performance levels

• Student performance levels are applied to all State assessments that are developed, including Regents tests


Standard Setting …Three Performance Levels

Example: Physical Setting/Earth Science
• Does Not Meet Learning Standards
• Meets Learning Standards
• Meets Learning Standards with Distinction

– The student demonstrates, on demand, proficiency in Physical Setting/Earth Science content, concepts, science skills, and basic science knowledge in any or most of the science learning standards and key ideas addressed, and has sufficient knowledge and skill for productive citizenship and for the demands of most workplaces or postsecondary academic environments


Setting the “Cut Score”

• The Board of Regents has determined “65” as passing a New York State Regents Examination and “85” as passing with distinction

• “Passing” = “proficient,” the performance needed to achieve the learning standards

• To determine the “passing score” of “65,” a formal Standards Setting Study is conducted, based on the reasoned judgment of subject-matter specialists and student performance data


Scoring and Scaling

• Based on statistics from student pre-test and field test data, items are placed on a logarithmic scale according to item difficulty level and student ability

• The two points, “passing/65” and “passing with distinction/85,” are then algebraically mapped to a 0-100 scale (a scale score, not a raw score)
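One simple way to realize such a mapping is piecewise-linear interpolation through the two anchor points: the raw score at the passing cut maps to 65, the raw score at the distinction cut maps to 85, and the endpoints are pinned at 0 and 100. A minimal sketch (illustrative only, not NYSED's actual transformation; the raw cut values below are invented):

```python
def raw_to_scale(raw, raw_max, raw_pass, raw_distinction):
    """Convert a raw score to a 0-100 scale score by interpolating
    between the anchors (0 -> 0, raw_pass -> 65,
    raw_distinction -> 85, raw_max -> 100)."""
    anchors = [(0, 0), (raw_pass, 65), (raw_distinction, 85), (raw_max, 100)]
    # Find the segment the raw score falls in, then interpolate linearly.
    for (x0, y0), (x1, y1) in zip(anchors, anchors[1:]):
        if raw <= x1:
            return y0 + (raw - x0) * (y1 - y0) / (x1 - x0)
    return 100.0

# Hypothetical form: 85 raw points, passing cut at 48 raw, distinction at 72.
print(raw_to_scale(48, raw_max=85, raw_pass=48, raw_distinction=72))  # → 65.0
print(raw_to_scale(72, raw_max=85, raw_pass=48, raw_distinction=72))  # → 85.0
```

The two cut points stay fixed at 65 and 85 from administration to administration even though the raw scores that reach them change with each form's difficulty.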


Standards Setting Committee

• Committee members:
  – are knowledgeable in the learning standards for science
  – come from public and nonpublic schools
  – are current and former classroom teachers
  – represent urban, suburban, and rural schools
  – include selected members from business and industry

• Each member makes individual judgments with respect to item difficulty, the scaling and equating of field tests, and his or her professional expertise


Standards Setting Process

• New York State teachers and content experts use the "bookmarking" method, in conjunction with professional judgment, to set a "cut score"

• In the "bookmarking" procedure, multiple-choice and constructed-response items are ordered by item difficulty

• Test items corresponding to various points on the scale are presented as examples of test items at that difficulty level

• The purpose of the items is to illustrate the meaning of the difficulty scale at specific points
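The mechanics of the bookmark idea can be sketched in a few lines (illustrative only, not NYSED's implementation; the item names, difficulty values, and bookmark position are invented): items are sorted from easiest to hardest into an "ordered item booklet," a panelist places a bookmark at a chosen item, and the cut score is read off that item's difficulty.

```python
def ordered_item_booklet(items):
    """items: list of (item_id, difficulty) pairs; sort easiest first."""
    return sorted(items, key=lambda pair: pair[1])

def cut_from_bookmark(booklet, bookmark_index):
    """The cut score is the difficulty of the bookmarked item."""
    return booklet[bookmark_index][1]

items = [("q3", 0.8), ("q1", -1.2), ("q2", 0.1), ("q4", 1.5)]
booklet = ordered_item_booklet(items)  # order: q1, q2, q3, q4
print(cut_from_bookmark(booklet, 2))   # → 0.8
```

In practice each panelist's bookmark placement is a professional judgment about what a just-proficient student should be able to do, and the individual judgments are then combined.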


Standard Setting Process

• Test items used come from an "anchor" form: the test form upon which all cut points are set and to which all later forms of the test will be equated

• Committee members apply their professional judgments to these ordered items

• A "cut score," or performance standard, is a specified point on the score scale ("65"), set such that scores at or above that point are acted upon differently from scores below it


Science Regents Scoring and Scaling

• The conversion chart provided for each test administration translates raw scores to scale scores (performance standards), mapped to a 0-100 scale
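In operational terms a conversion chart is just a lookup table from raw score to scale score. A minimal sketch (the chart values below are invented; each real administration publishes its own chart):

```python
# Hypothetical fragment of a conversion chart: raw score -> scale score.
CONVERSION_CHART = {45: 62, 46: 63, 47: 64, 48: 65, 49: 66, 50: 67}

def scale_score(raw):
    """Look up the scale score for a raw score on this administration."""
    return CONVERSION_CHART[raw]

print(scale_score(48))  # → 65
```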


Science Regents Examinations: Scoring

• Each test administration is "equated" so that the same scale score represents the same level of achievement

• Test forms vary somewhat in the mix of easier and more difficult items, so the relationship between the raw score and the scale score also varies from one test administration to the next


Science Regents Examinations

Syllabus-Based:
• Addressed a selective student population
• Assessments were designed from a course of study
• Syllabi contained prescriptive content

Standards-Based:
• Universal access for all students
• Assessments are derived from the standards
• Standards drive the content of the courses designed

Sample Test Items

• This item's difficulty level, based on field test data, was the easiest question on the ES June 2001 exam.

• Test item 9 on the LE June 2001 exam has an item difficulty level at the passing performance level, "meets the standards," based on field test data and the standards setting process.

• This item's difficulty level, based on field test data, is an example of a test item at the designated passing performance level, "meets the standards."

• Test item 30 on the LE June 2001 exam has an item difficulty level at the passing performance level, "meets the standards," based on field test data and the standards setting process.

• This item's difficulty level, based on field test data, is another example of a question on the ES June 2001 exam that "meets the standards."

• This item's difficulty level, based on field test data, was one of the most difficult questions on the LE June 2001 exam.

• This item's difficulty level, based on field test data, was the most difficult question on the ES June 2001 exam.

Science Regents Examinations

Old:
• "65"/passing was determined by a raw score
• A student's score was based on a maximum of 100 points
• Test item difficulty varied from one test form to another

New:
• "65"/passing is determined by a scale score
• A student's score is derived by converting a raw score to a scale score based on student field test data
• The item difficulty values represent the same level of difficulty from each test administration


Science Regents Examinations: Number of Students Tested (Total State)

Regents Test              1997-1998    1998-1999    1999-2000
Biology                   131,992      141,424      149,605
Earth Science             68,405       80,512       75,357
Earth Science (pro mod)   54,318       63,556       67,114
  Earth Science Total     122,723      144,068      142,471
Chemistry                 98,016       104,230      104,763
Physics                   48,345       49,517       50,159


Regents Science Examinations: Statistics

Regents Science            Total # of Students    Increase in # of Students
                           Tested 2001            Tested 1997-2001
Regents Biology            70,387
Living Environment         179,489
  Total                    249,876                117,884
Earth Science              36,804
Physical Setting/ES        129,564
  Total                    166,368                43,645
Chemistry                  113,253                15,237
Physics                    50,663                 2,318


Reliability of State Assessments

• Reliability focuses on the consistency of test scores (performance) for a group of test takers across measures over time

• Reliability is best achieved by evaluating the whole test before considering smaller portions of the test

• Inter-rater reliability is audited after each test administration: teams of teachers are given uniform training and scoring procedures to re-score 10% of the Regents examinations
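A basic inter-rater statistic from such an audit is the exact agreement rate: the proportion of re-scored papers on which the second reader assigns the same score as the original reader. A minimal sketch (illustrative only, not NYSED's audit procedure; the scores below are invented):

```python
def exact_agreement(original_scores, rescored):
    """Proportion of papers where the re-score matches the original score."""
    matches = sum(1 for a, b in zip(original_scores, rescored) if a == b)
    return matches / len(original_scores)

# Hypothetical rubric scores (0-4) for eight re-scored papers.
original = [3, 2, 4, 1, 3, 2, 4, 0]
second   = [3, 2, 3, 1, 3, 2, 4, 1]
print(exact_agreement(original, second))  # → 0.75
```

Low agreement on a particular item would signal that its rubric or the scorer training needs revision.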