Presenters Sharon Hall U.S. Department of Education Martin Kehe

State Exemplar: Maryland’s Alternate Assessment Using Alternate Achievement Standards The Alternate Maryland School Assessment Presenters Sharon Hall U.S. Department of Education Martin Kehe Maryland State Department of Education William Schafer University of Maryland


Page 1

State Exemplar: Maryland’s Alternate Assessment Using Alternate Achievement Standards

The Alternate Maryland School Assessment

Presenters

Sharon Hall, U.S. Department of Education
Martin Kehe, Maryland State Department of Education
William Schafer, University of Maryland

Page 2

Session Summary

This session highlights the Alternate Assessment based on Alternate Academic Achievement Standards in Maryland: the Alternate Maryland School Assessment (Alt-MSA).

Discussion will focus on:
A description of the assessment and the systems-change process that was required to develop and implement the testing program
Development of reading, mathematics, and science item banks
The process to ensure alignment with grade-level content standards, and results of independent alignment studies
Technical documentation and research agenda to support validity and reliability

Page 3

Agenda

Developing Maryland’s AA-AAAS: A Systems Change Perspective
Conceptual Framework
Alt-MSA Design
Developing the Mastery Objective Banks
Evaluation of the Alt-MSA’s alignment with content standards
Technical Documentation and Establishing a Research Agenda to Support Validity and Reliability
Questions and Answers

Page 4

A Systems Change Perspective

Process

Collaboration
  Divisions of Special Education and Assessment
  Stakeholder Advisory
  Alt-MSA Facilitators
  Alt-MSA Facilitators and LACs
  MSDE and Vendor

Instruction and Assessment
  Students assigned to age-appropriate grade (for purposes of Alt-MSA)
  Local School System Grants

Page 5

A Systems Change Perspective

Content

Reading and Mathematics mastery objectives and artifacts (evidence) linked with grade level content standards

No program evaluation criteria

Page 6

Maryland’s Alternate Assessment Design (Alt-MSA)

Portfolio Assessment
10 Reading and 10 Mathematics Mastery Objectives (MOs)
Evidence of Baseline (50% or less attained)
Evidence of Mastery (80%-100%): 1 artifact for each MO
2 Reading and 3 Mathematics MOs aligned with science: vocabulary and informational text; measurement and data analysis

Page 7

What’s Assessed: Reading

Maryland Reading Content Standards

1.0 General Reading Processes
  Phonemic awareness, phonics, fluency (2 MOs)
  Vocabulary (2 MOs; 1 aligned with science)
  General reading comprehension (2 MOs)

2.0 Comprehension of Informational Text (2 MOs; 1 aligned with science)

3.0 Comprehension of Literary Text (2 MOs)

Page 8

What’s Assessed: Mathematics

Algebra, Patterns, and Functions (2 MOs)
Geometry (2 MOs)
Measurement (2 MOs; 1 aligned with science)
Statistics-Data Analysis (2 MOs aligned with science)
Number Relationships and Computation (2 MOs)

Page 9

What’s Assessed: Science (2008)

Grades 5, 8, 10

Grades 5 and 8; select 1 MO each:
  Earth/Space Science
  Life Science
  Chemistry
  Physics
  Environmental Science

Grade 10: 5 Life Science MOs

Page 10

Steps in the Alt-MSA Process

Step 1: September

Principal meets with Test Examiner Teams

Review results or conduct pre-assessment

Page 11

Steps in the Alt-MSA Process

Step 2: September-November

TET selects or writes Mastery Objectives

Principal reviews and submits

Share with parents

Revise (written) Mastery Objectives

Page 12

Steps in the Alt-MSA Process

Step 3: September-March

Collect Baseline Data for Mastery Objectives: 50% or less accuracy

Teach Mastery Objectives

Assess Mastery Objectives

Construct Portfolio

Page 13

Standardized

Number of mastery objectives assessed
Format of mastery objectives
Content standards/topics assessed
All MOs must have baseline data and evidence of mastery at 80%-100%
Types of artifacts permissible
Components of artifacts
Training and Handbook provided
Scoring training and procedures

Page 14

MO Format

Page 15

Evidence (Artifacts)

Acceptable Artifacts (Primary Evidence)
  Videotapes (1 reading and 1 math mandatory)
  Audiotape
  Student work (original)
  Data collection charts (original)

Unacceptable Artifacts
  Photographs, checklists, narrative descriptions

Page 16

Artifact Requirements

Aligned with Mastery Objective
Must include baseline data that demonstrates student performs MO with 50% or less accuracy
Data chart must show 3-5 demonstrations of instruction prior to mastery
The observable, measurable student response must be evident (not “trial 1”)
Mastery is 80%-100% accuracy
Name, date, accuracy score, prompts
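The numeric requirements above (baseline at 50% or less, at least 3 observations of instruction, mastery at 80%-100%) can be read as simple checks on an artifact's data chart. The following is a minimal sketch only; the `ArtifactRecord` fields and the `check_artifact` function are hypothetical names introduced for illustration and are not part of the Alt-MSA scoring procedure.

```python
from dataclasses import dataclass
from typing import List, Optional

BASELINE_MAX = 50.0        # baseline must be 50% accuracy or less
MASTERY_MIN = 80.0         # mastery is 80%-100% accuracy
MIN_INSTRUCTION_OBS = 3    # assumption: "3-5 demonstrations" read as "at least 3"


@dataclass
class ArtifactRecord:
    """Hypothetical summary of one artifact's data chart (percent values)."""
    baseline_accuracy: float
    instruction_observations: int
    mastery_accuracy: Optional[float]  # None if no accuracy score is reported


def check_artifact(rec: ArtifactRecord) -> List[str]:
    """Return a list of unmet numeric requirements (empty if all are met)."""
    problems = []
    if rec.baseline_accuracy > BASELINE_MAX:
        problems.append("baseline above 50%")
    if rec.instruction_observations < MIN_INSTRUCTION_OBS:
        problems.append("fewer than 3 observations of instruction")
    if rec.mastery_accuracy is None:
        problems.append("accuracy score not reported")
    elif not MASTERY_MIN <= rec.mastery_accuracy <= 100.0:
        problems.append("mastery accuracy outside 80%-100%")
    return problems


print(check_artifact(ArtifactRecord(40.0, 4, 90.0)))
print(check_artifact(ArtifactRecord(60.0, 2, None)))
```

The sketch covers only the quantitative rules; alignment with the MO and the acceptability of the artifact type are judgments made by trained scorers.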

Page 17

Scores and Condition Codes

A  MO is not aligned
B  Artifact is missing or not acceptable
C  Artifact is incomplete
D  Artifact does not align with MO, or components of MO are missing
E  Data Chart does not show 3-5 observations of instruction on different days prior to demonstration of mastery
F  Accuracy score is not reported

Page 18

Reliability: Scorer Training

Conducted by contractor scoring director, MSDE always present
Must attain 80% accuracy on each qualifying set
Every portfolio is scored twice by 2 different teams
Daily backreading by supervisors and scoring directors
Daily inter-rater reliability data
Twice weekly validity checks
Ongoing retraining

Page 19

Maryland’s Alt-MSA Report

Page 20

Development of the Mastery Objective Banks

Initial three years of program involved teachers writing individualized reading and mathematics Mastery Objectives (approximately 100,000 objectives each year)

Necessary process to help staff learn the content standards

Maryland and contractor staff reviewed 100% of MOs for alignment and technical quality

Page 21

Mastery Objective Banks

Prior to year 4, Maryland conducted an analysis of written MOs to create the MO Banks for reading and mathematics
Banked items available in an online application, linked to and aligned with content standards
Provided additional degree of standardization
Process still allows for writing of customized MOs, as needed

Page 22

Mastery Objective Banks

In year 4, Baseline MO measurement was added
  Teachers take stock of where a student is, without prompts, at the beginning of the year on each proposed MO
  This helps to ensure that students are learning and assessed on skills and knowledge that have not already been mastered
Year 5 added Science MO Bank

Page 23

Mastery Objective Banks

Page 24

Mastery Objective Banks

Page 25

Mastery Objective Banks

Page 26

Mastery Objective Banks

Page 27

Mastery Objective Banks

Page 28

National Alternate Assessment Center (NAAC) Alignment Study of the Alt-MSA

Page 29

NAAC Alt-MSA Alignment Study

Conducted by staff from University of North Carolina at Charlotte and Western Carolina University from March to August 2007

Study was an investigation of the alignment of Alt-MSA Mastery Objectives in Reading and Mathematics to grade-level content standards

Page 30

NAAC Alt-MSA Alignment Study

Eight (8) criteria used to evaluate
Developed in collaboration with content experts, special educators, and measurement experts at University of North Carolina at Charlotte (Browder, Wakeman, Flowers, Rickleman, Pugalee, & Karvonen, 2006)
A stratified random sampling method (stratified on grade level) was used to select the portfolios: grades 3-8 and 10, 225 reading and 231 mathematics
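Stratifying on grade level means portfolios are grouped by grade and a random draw is made within each group. A minimal sketch of that idea follows; the `stratified_sample` function, the `grade` field, and the equal per-grade allocation are assumptions for illustration, since the slides do not say how the 225 reading and 231 mathematics portfolios were allocated across grades.

```python
import random
from collections import defaultdict


def stratified_sample(portfolios, per_grade, seed=0):
    """Draw up to `per_grade` portfolios at random from each grade level.

    `portfolios` is a list of dicts with a "grade" key (hypothetical layout).
    Equal allocation per grade is an assumption, not the study's documented design.
    """
    rng = random.Random(seed)          # seeded so the draw is repeatable
    by_grade = defaultdict(list)
    for p in portfolios:
        by_grade[p["grade"]].append(p)

    sample = []
    for grade in sorted(by_grade):
        group = by_grade[grade]
        k = min(per_grade, len(group))  # never request more than the stratum holds
        sample.extend(rng.sample(group, k))
    return sample


# Made-up population: 50 portfolios in each of grades 3-8 and 10.
grades = [3, 4, 5, 6, 7, 8, 10]
population = [{"id": f"{g}-{i}", "grade": g} for g in grades for i in range(50)]
picked = stratified_sample(population, per_grade=5)
print(len(picked))
```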

Page 31

Alignment Results by Criterion

Criterion 1: The content is academic and includes the major domains/strands of the content area as reflected in state and national standards (e.g., reading, math, science)

Outcome:
Reading: 99% of MOs were rated academic
Math: 94% of MOs were rated academic

Page 32

Alignment Results by Criterion

Criterion 2: The content is referenced to the student’s assigned grade level (based on chronological age)

Outcome:
Reading: 82% of the MOs reviewed were referenced to a grade level standard (2% were not referenced to a grade level standard; 16% were referenced to off-grade (K-2) standards for phonics and phonemic awareness)
Math: 97% were referenced to a grade level standard

Page 33

Alignment Results by Criterion

Criterion 3: The focus of achievement maintains fidelity with the content of the original grade level standards (content centrality) and, when possible, the specified performance

Outcome:
Reading: 99% of MOs rated as far or near for content centrality, 92% of MOs rated partial or full performance centrality, and 90% rated as being linked to the MO
Math: 92% of MOs rated as far in content centrality, 92% of MOs rated partial performance centrality, and 92% rated as being linked to the MO

Page 34

Alignment Results by Criterion

Criterion 4: The content differs from grade level in range, balance, and Depth of Knowledge (DOK), but matches high expectations set for students with significant cognitive disabilities.

Outcome:
Reading: All the reading standards had multiple MOs that were linked to the standard, and although 73% were rated at the depth of knowledge level of memorize/recall, there were MOs rated at the higher depth of knowledge levels (i.e., comprehension, application, and analysis)
Math: MOs were aligned to all grade level standards and distributed across all levels of depth of knowledge except the lowest level (i.e., attention), with the largest percentage of MOs at the performance and analysis/synthesis/evaluation levels.

Page 35

Alignment Results by Criterion

Criterion 5: There is some differentiation in achievement across grade levels or grade bands.

Outcome:
Reading: Overall, reading has good differentiation across grade levels
Math: While there is some limited differentiation, some items were redundant from lower to upper grades

Criterion 6: The expected achievement for students is for the students to show learning of grade referenced academic content

Outcome: The Alt-MSA score is not augmented with program factors. However, in cases where more intrusive prompting is used, the level of inference that can be made is limited.

Page 36

Alignment Results by Criterion

Criterion 7: The potential barriers to demonstrating what students know and can do are minimized in the assessment

Outcome: Alt-MSA minimizes barriers for the broadest range of heterogeneity within the population, because flexibility is built into the tasks teachers select. (92% of the MOs were accessible at an abstract level of symbolic communication, while the remaining MOs were accessible to students at a concrete level of symbolic communication.)

Criterion 8: The instructional program promotes learning in the general curriculum

Outcome: The Alt-MSA Handbook is well developed and covers the grade level domains that are included in alternate assessment. Some LEAs in MD have exemplary professional development materials.

Page 37

Study Summary

Overall the Alt-MSA demonstrated good access to the general curriculum

The Alt-MSA was well developed and covered the grade level standards

The quality of the professional development materials varied across the different counties

Page 38

Technical Documentation of the Alt-MSA

Page 39

Sources

Alt-MSA Technical Manuals (2004, 2005, 2006)

Schafer, W. D. (2005). Technical Documentation for Alternate Assessments. Practical Assessment, Research and Evaluation, 10(10). At PAREonline.net.

Marion, S. F. & Pellegrino, J. W. (2007). A validity framework for evaluating the technical adequacy of alternate assessments. Educational Measurement: Issues and Practice, 25(4), 47-57.

Report from the National Alternate Assessment Center from a panel review of the Alt-MSA.

Contracted technical studies on Alt-MSA

Page 40

Validity of the Criterion Is Always Important

To judge proficiency in any assessment, a student’s score is compared with a criterion score
Regular assessment: standard setting generates a criterion score for all examinees
Regular assessment: the criterion score is assumed appropriate for everyone
  It defines an expectation for minimally acceptable performance
  It is interpreted in behavioral terms through achievement level descriptions

Page 41

Criterion in Alternate Assessment

A primary question in alternate assessment is: Should the same criterion score apply to everyone?

Our answer was no, because behaviors that imply success for some students imply failure for others

This implies that flexible criteria are needed to judge the success of a student or of a teacher, unlike the regular assessment

Page 42

Criterion Validity

The quality of criteria is documented for the regular assessment through a standard setting study

When criteria vary, then each different criterion needs to be documented

So we need to consider both score and criterion reliability & validity for Alt-MSA.

Page 43

Technical Research Agenda

There are four sorts of technical research we should undertake:
  Reliability of Criteria
  Reliability of Scores
  Validity of Criteria
  Validity of Scores

We will describe some examples and possibilities for each.

Page 44

Reliability of Criteria

Could see if the criteria (MOs) are internally consistent for a student in terms of difficulty, cognitive demand, and/or levels of the content elements they represent

Could do that for, say, 9 samples of students: L-M-H degrees of challenge for L-M-H grade levels

Degree of challenge might be assessed by age of identification of disability or by location in the extended standards of last year’s MOs

Page 45

Reliability of Scores

2007 rescore of a 5% sample of 2006 portfolios (n=266) showed agreement rates of 82%-89% for reading & 83%-89% for math

A NAAC review concluded the inter-rater evidence of scorer reliability is strong

Amount of evidence could be evaluated using Smith’s (2003) approach of modeling error using the binomial distribution to get decision accuracy estimates:

Page 46

Decision Accuracy Study

Assume each student produces a sample of size 10 from a binomial population of MOs
Can use the binomial distribution to generate the probabilities of all outcomes (X = 0 to 10) for any π
For convenience, use the midpoints of ten equally-spaced intervals for π (.05 … .95)
Using 0-50% of MOs mastered for Basic, 60-80% for Proficient, and 90-100% for Advanced (that is, X = 0-5, 6-8, and 9-10) yields:

Page 47

Classification Probabilities for Students with Various πs

π      Basic    Proficient   Advanced
.95    .0001    .0861        .9138
.85    .0098    .4458        .5443
.75    .0781    .6779        .2440
.65    .2485    .6656        .0860
.55    .4956    .4812        .0232
.45    .7384    .2571        .0045
.35    .9052    .0944        .0005
.25    .9803    .0207        .0000
.15    .9986    .0013        .0000
.05    1.000    .0000        .0000
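The table above can be regenerated directly from the binomial distribution. This is a minimal sketch under the assumptions stated on the previous slide (10 MOs per student; Basic = 0-5 mastered, Proficient = 6-8, Advanced = 9-10); the function and constant names are illustrative, and the last printed digit may differ slightly from the slide because of rounding.

```python
from math import comb

N_MOS = 10  # each portfolio is treated as 10 independent pass/fail MOs


def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def classification_probs(pi):
    """Return (P(Basic), P(Proficient), P(Advanced)) for a true mastery rate pi."""
    basic = sum(binom_pmf(x, N_MOS, pi) for x in range(0, 6))       # 0-5 MOs mastered
    proficient = sum(binom_pmf(x, N_MOS, pi) for x in range(6, 9))  # 6-8 MOs mastered
    advanced = sum(binom_pmf(x, N_MOS, pi) for x in range(9, 11))   # 9-10 MOs mastered
    return basic, proficient, advanced


# Midpoints of ten equally spaced intervals, listed from .95 down to .05.
PI_MIDPOINTS = [(2 * k + 1) / 20 for k in range(9, -1, -1)]

print("pi    Basic   Prof.   Adv.")
for pi in PI_MIDPOINTS:
    b, p, a = classification_probs(pi)
    print(f"{pi:.2f}  {b:.4f}  {p:.4f}  {a:.4f}")
```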

Page 48

3x3 Decision Accuracy

Collapsing across π with True Basic = .05-.55, True Proficient = .65-.85, True Advanced = .95:

              Classification
True Level    Basic    Proficient   Advanced   Total
Advanced      .0000    .0086        .0914      .1000
Proficient    .0336    .1789        .0874      .3000
Basic         .5118    .0855        .0028      .6000

P(Accurate) = .5118 + .1789 + .0914 = .7821

This assumes equally-weighted πs
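The collapse from ten π values to a 3x3 table is a weighted sum: each π contributes its classification probabilities, multiplied by its weight, to the row for its true level. The sketch below assumes the cut points given in the slides and equal weights of .1 per π; the function names and the dictionary layout are illustrative.

```python
from math import comb

N_MOS = 10
LEVELS = ("Basic", "Proficient", "Advanced")


def observed_level(x):
    """Achievement level assigned to a student who masters x of 10 MOs."""
    if x <= 5:
        return "Basic"
    if x <= 8:
        return "Proficient"
    return "Advanced"


def true_level(pi):
    """True level for a mastery rate pi: .05-.55 Basic, .65-.85 Proficient, .95 Advanced."""
    if pi < 0.60:
        return "Basic"
    if pi < 0.90:
        return "Proficient"
    return "Advanced"


def decision_table(weights):
    """Joint probabilities keyed by (true level, observed level)."""
    joint = {(t, o): 0.0 for t in LEVELS for o in LEVELS}
    for pi, w in weights.items():
        for x in range(N_MOS + 1):
            prob = comb(N_MOS, x) * pi**x * (1 - pi) ** (N_MOS - x)
            joint[(true_level(pi), observed_level(x))] += w * prob
    return joint


def p_accurate(joint):
    """Probability that the observed level matches the true level."""
    return sum(joint[(lvl, lvl)] for lvl in LEVELS)


equal_weights = {(2 * k + 1) / 20: 0.1 for k in range(10)}
table = decision_table(equal_weights)
for t in reversed(LEVELS):
    print(t, [round(table[(t, o)], 4) for o in LEVELS])
print("P(Accurate) =", round(p_accurate(table), 4))
```

Passing a different `weights` dictionary (for example, one estimated from observed score distributions) gives the empirically weighted version of the same table.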

Page 49

Empirically Weighted πs

Mastery Objectives Mastered in 2006 for Reading and Math (N = 4851 students)

Percent Mastered   Reading Percent   Math Percent
100                21.8              26.4
90                 16.1              16.7
80                 11.6              10.3
70                 8.0               7.8
60                 6.7               6.1
50                 5.5               5.8
40                 4.9               4.6
30                 5.1               4.1
20                 4.7               4.1
10                 6.7               6.3
0                  6.9               7.7

Page 50

3x3 Decision Accuracy with Empirical Weights - Reading

              Observed Achievement Level
True Level    Basic    Proficient   Advanced   Total
Advanced      .0000    .0258        .2726      .2984
Proficient    .0274    .1768        .1057      .3099
Basic         .3414    .0486        .0017      .3917

P(Accurate) = .3414 + .1768 + .2726 = .7908

Page 51

NCLB requires decisions in terms of Proficient/Advanced vs. Basic

                         Observed Level Group - Reading
True Level               Basic    Proficient or Advanced
Proficient or Advanced   .0451    .9549
Basic                    .8716    .1284

These are conditional probabilities; they sum to 1 by rows.

P[Type I Error (taking action)] = .0451
P[Type II Error (taking no action)] = .1284

These are less than Cohen’s guidelines of .05 and .20.
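The 2x2 conditional probabilities above follow from the 3x3 reading table on the previous slide: merge the Proficient and Advanced rows and columns, then divide each cell by its row total. A minimal sketch, using the rounded joint probabilities as printed (so results agree with the slide only to about three decimals); the names `reading_joint` and `collapse_to_2x2` are illustrative.

```python
# Joint probabilities from the 3x3 reading table, as
# true level -> [observed Basic, observed Proficient, observed Advanced].
reading_joint = {
    "Advanced":   [0.0000, 0.0258, 0.2726],
    "Proficient": [0.0274, 0.1768, 0.1057],
    "Basic":      [0.3414, 0.0486, 0.0017],
}


def collapse_to_2x2(joint):
    """Return row-conditional error rates for Proficient/Advanced vs. Basic."""
    # True Proficient or Advanced: the error is being observed as Basic.
    pa_rows = [joint["Proficient"], joint["Advanced"]]
    pa_total = sum(sum(row) for row in pa_rows)
    pa_observed_basic = sum(row[0] for row in pa_rows)

    # True Basic: the error is being observed as Proficient or Advanced.
    basic_row = joint["Basic"]
    basic_total = sum(basic_row)
    basic_observed_pa = basic_row[1] + basic_row[2]

    return {
        "type_I": pa_observed_basic / pa_total,      # labeled "taking action" on the slide
        "type_II": basic_observed_pa / basic_total,  # labeled "taking no action" on the slide
    }


rates = collapse_to_2x2(reading_joint)
print(round(rates["type_I"], 3), round(rates["type_II"], 3))
```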

Page 52

3x3 Decision Accuracy with Empirical Weights - Math

              Observed Achievement Level
True Level    Basic    Proficient   Advanced   Total
Advanced      .0000    .0299        .3174      .3474
Proficient    .0256    .1676        .1014      .2946
Basic         .3092    .0472        .0017      .3581

P(Accurate) = .3092 + .1676 + .3174 = .7942

Page 53

NCLB requires decisions in terms of Proficient/Advanced vs. Basic

                         Observed Level Group - Math
True Level               Basic    Proficient or Advanced
Proficient or Advanced   .0398    .9602
Basic                    .8635    .1365

These are conditional probabilities; they sum to 1 by rows.

P[Type I Error (taking action)] = .0398
P[Type II Error (taking no action)] = .1365

These are also less than Cohen’s guidelines of .05 and .20.

Page 54

Reliability of Scores: Conclusions

Decision accuracy of Reading is 79.1%
Decision accuracy of Math is 79.4%
Misclassification probabilities are:

False        Reading   Math
Prof.        12.8%     13.6%
Not Prof.    4.5%      4.0%

These are within Cohen’s guidelines

Page 55

Validity of Criteria: Content Evidence

Could study MO development & review process for 9 samples of students, L-M-H degrees of challenge for L-M-H grade levels

Could map student progress along content standard strands over time

Could evaluate and monitor the use of the bank

Could survey parents: are MOs too modest, about right, or too idealistic

MSDE will conduct a new cut-score study

Page 56

Validity of Criteria: Quantitative Evidence

For n=267 same-student portfolio pairs from 2006 & 2007, 95% of 2007 reading MOs and 90% of 2007 math MOs were completely new or more demanding than the respective student’s 2006 MOs (suggesting growth)

Alternate standard-setting studies could generate evidence about validity of the existing (or resulting) criteria:

Page 57

Possible Alternate Standard Setting Study Approaches

Develop percentage cut-scores for groups with different degrees of disability (e.g., modified Angoff) & articulate vertically & horizontally

Establish criterion groups using an external criterion and identify cut scores that minimize classification errors (contrasting groups)

Set cutpoints that match the percentages of students in the achievement levels in the general population (equipercentile)
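As one illustration of the contrasting-groups idea above, a cut score can be chosen by trying each candidate value and counting how many students from the two externally defined groups it would misclassify. This is a minimal sketch with made-up scores; the function name, the data, and the equal weighting of the two kinds of error are all assumptions, not a procedure described in the slides.

```python
def contrasting_groups_cut(scores_not_proficient, scores_proficient, candidate_cuts):
    """Return (cut, errors) for the candidate cut with the fewest misclassifications.

    A student is classified Proficient when score >= cut. Ties keep the lowest cut.
    """
    best_cut, best_errors = None, None
    for cut in candidate_cuts:
        # Externally "not proficient" students who would pass at this cut.
        false_pass = sum(s >= cut for s in scores_not_proficient)
        # Externally "proficient" students who would fail at this cut.
        false_fail = sum(s < cut for s in scores_proficient)
        errors = false_pass + false_fail
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Made-up percent-mastered scores for two groups defined by an external criterion.
not_proficient = [10, 20, 30, 40, 50, 60]
proficient = [50, 60, 70, 80, 90, 100]
cuts = range(0, 101, 10)

print(contrasting_groups_cut(not_proficient, proficient, cuts))
```

In practice the two kinds of error could be weighted differently, and the external criterion used to form the groups would itself need to be justified.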

Page 58

Validity of Criteria: Consequential Evidence

Could study IEPs to see if they have become more oriented toward academic goals over time

Could study the ability of Alt-MSA to drive instruction, e.g., do the enacted content standards move toward the assessed content standards?

Page 59

Validity of Scores: Content Evidence

Could study how well raters can categorize samples of artifacts into the content strand elements their MOs were designed to represent

Page 60

Validity of Scores: Consequential Evidence

Could survey stakeholders: How have the scores been used? How have the scores been misused?

Page 61

Two Philosophical Issues

Justification is needed for implementing flexible performance expectations all the way down to the individual student

Justification is needed for using standardized percentages for success categories across the flexible performance expectations