Independent Alignment Review of the Florida Standards Alternate … · 2019-08-21 · Independent...

Florida Standards Alternate Assessment Alignment Study i

2017 No. 041

Independent Alignment Review of the Florida Standards Alternate Assessment – Performance Task (FSAA-PT): Civics, US History, and the Writing Prompts

Final Report

Prepared for:

Vince Verges Florida Department of Education Turlington Building 325 West Gaines Street Tallahassee, Florida 32399

Prepared under:

Florida Department of Education Turlington Building 325 West Gaines Street Tallahassee, Florida 32399

Authors: Yvette M. Nemeth Justin Purl Tatiana Longabach Elizabeth Patton Caroline Wiley

Date:

December 26, 2017

Independent Alignment Review of the FSAA-PT: Civics, US History, and the Writing Prompts i

Table of Contents

Executive Summary
  Overview
  Methodology
  Alignment Study Workshop
  Access Point to Standards Alignment Summary
  FSAA-PT to Access Points Alignment Summary
  Recommendations

Chapter 1: Introduction
  Organization and Contents of the Report

Chapter 2: Alignment Study Design and Methodology
  Alignment of Assessments and Standards on Content and Performance
    AP and FSAA-PT Overview
  Content Alignment and Accessibility
  Scope of Alignment Evaluations
    Training
    Panelists
    Materials
    Procedures
    Workshop Progress

Chapter 3: Alignment of Access Points to Standards
  Overview of Access Points
  LAL Criteria
    Criterion 1: Age Appropriateness
    Criterion 2a: Content Centrality
    Criterion 2b: Performance Centrality
    Criterion 3: Content Coverage – HumRRO Alignment Method
    Criterion 4: Content Differentiation
    Criterion 5: Achievement
    Criterion 6: Performance Accuracy


Chapter 4: Alignment of FSAA-PT Tasks to APs
  LAL Criteria
    Criterion 1: Age Appropriateness
    Criterion 2a: Content Centrality
    Criterion 2b: Performance Centrality
    Criterion 3a: Tasks Represent Intended Content
    Criterion 3b: Tasks Represent Intended Categories
    Criterion 3c: Task DOK Represent Alternate Standards
    Criterion 4: Content Differentiation
    Criterion 5: Achievement
    Criterion 6: Performance Accuracy

Chapter 5: Summary and Recommendations
  Access Point to Standards Alignment Summary
  FSAA-PT Alignment Summary
  Recommendations

References

Appendix A. Panelist Alignment Review Materials Samples


List of Tables

Table 1. Percent of Grade-Level APs Which Met Each LAL Criterion
Table 2. Percent of Grade-Level Tasks Which Met Each LAL Criterion
Table 3. Grade/Content Areas Included in Alignment Study
Table 4. LAL Criteria for AP and FSAA-PT Alignment Evaluation
Table 5. Professional and Demographic Characteristics of Panelists
Table 6. Alignment Steps for Panelists’ Ratings
Table 7. Alignment Steps Completed by Each Panel Group June 21-22, 2017
Table 8. Number of Blueprint Standards Compared to APs for Writing
Table 9. Number of Blueprint Standards Compared to APs for Social Studies
Table 10. Percent of Writing APs Rated as Age Appropriate
Table 11. Percent of Social Studies APs Rated as Age Appropriate
Table 12. Percent of Writing APs Linked to On-Grade Level Writing LAFS
Table 13. Percent of Social Studies APs Linked to On-Grade Level NGSSS for Social Studies
Table 14. Percent of Writing APs at Lower, Same, or Higher Levels of Complexity Compared to Related Writing Standards
Table 15. Percent of Social Studies APs at Lower, Same, or Higher Levels of Complexity Compared to Related NGSSS for Social Studies
Table 16. Percent of Linked APs at Various Levels of Performance Centrality – Writing
Table 17. Percent of Linked APs at Various Levels of Performance Centrality – Social Studies
Table 18. Consensus AP Content Differentiation – Writing
Table 19. Percent of APs Rated as Accessible to Different Disability Groups – Writing
Table 20. Percent of APs Rated as Accessible to Different Disability Groups – Social Studies
Table 21. Percent of Writing Tasks Rated as Age Appropriate
Table 22. Percent of Social Studies Tasks Rated as Age Appropriate
Table 23. Percent of Writing Tasks at Various Levels of Performance Centrality
Table 24. Percent of Social Studies Tasks at Various Levels of Performance Centrality
Table 25. Writing Task Alignment Ratings
Table 26. Social Studies Task Alignment Ratings
Table 27. Mean Number of Aligned Writing Items by Content Category
Table 28. Mean Number of Aligned Social Studies Items by Content Category
Table 29. Percent of Writing Tasks at Lower, Same, or Higher Levels of Complexity
Table 30. Percent of Social Studies Tasks at Lower, Same, or Higher Levels of Complexity


Table 31. Distribution of Panelist DOK Ratings – Writing
Table 32. Distribution of Panelist DOK Ratings – Social Studies
Table 33. Percent of Writing Tasks at Lower, Same, or Higher Levels of Complexity Compared to Related APs
Table 34. Percent of Social Studies Tasks at Lower, Same, or Higher Levels of Complexity Compared to Related APs
Table 35. Percent of Writing Tasks at Lower, Same, or Higher Levels of Volume of Information
Table 36. Percent of Social Studies Tasks at Lower, Same, or Higher Levels of Volume of Information
Table 37. Percent of Writing Tasks at Lower, Same, or Higher Levels of Vocabulary
Table 38. Percent of Social Studies Tasks at Lower, Same, or Higher Levels of Vocabulary
Table 39. Percent of Writing Tasks at Lower, Same, or Higher Levels of Context
Table 40. Percent of Social Studies Tasks at Lower, Same, or Higher Levels of Context
Table 41. Consensus Content Differentiation Across Grades – Writing Prompt Portion of the ELA FSAA-PT
Table 42. Consensus Content Differentiation – Social Studies FSAA-PT Item Set
Table 43. Consensus Student Learning – Writing Prompt Portion of the ELA FSAA-PT
Table 44. Consensus Student Learning – Social Studies FSAA-PT
Table 45. Percent of FSAA-PT Tasks as Accessible to Different Disability Groups – Writing
Table 46. Percent of FSAA-PT Tasks as Accessible to Different Disability Groups – Social Studies
Table 47. Percent of FSAA-PT Tasks as Amenable to Accommodations or Supports – Writing
Table 48. Percent of FSAA-PT Tasks as Amenable to Accommodations or Supports – Social Studies
Table 49. Consensus Whole Test Barriers to Demonstrating Student Knowledge
Table 50. Consensus Whole Test Barriers to Demonstrating Student Knowledge for Certain Disability Groups
Table 51. Percent of Grade-Level APs Which Met Each LAL Criterion
Table 52. Percent of Grade-Level Tasks Which Met Each LAL Criterion


Executive Summary

Overview

The Florida Department of Education (FDOE) requested an external, independent alignment study (review and analysis) of the Florida Standards Alternate Assessment – Performance Task (FSAA-PT) in civics and US history. In addition, the writing prompt portion of the English/Language Arts (ELA) assessment¹ was evaluated. The ELA assessment includes Reading and Writing selected response items as well as a writing prompt. An alignment review provides one form of evidence supporting the validity of the state assessment system. All aspects of the state assessment system must coincide, including the grade-level standards, academic content standards, and each assessment. The FSAA-PT is an alternate assessment designed for students with significant cognitive disabilities. As a result of their cognitive disabilities, these students would not be appropriately assessed by the general statewide assessment program. The assessment is designed to evaluate the Florida Standards Access Points (AP) for Language Arts and the Next Generation Sunshine State Standards for Social Studies Access Points (AP)², a reduced and marginally simplified version of the Florida content standards.

The Human Resources Research Organization (HumRRO) was contracted to complete the alignment of the FSAA-PT for FDOE. Our alignment approach was designed to indicate the extent to which the reviewed APs, associated with the FSAA-PT blueprint for the ELA writing prompt section, civics, and US history, are related to the Language Arts Florida Standards (LAFS) and the Next Generation Sunshine State Standards (NGSSS) for Social Studies. In addition, we evaluated whether the APs are age appropriate, whether the APs differ in breadth and depth across grade levels, and whether the APs are accessible to a wide range of students with varying disabilities. Our approach is flexible and also allows us to evaluate whether items on the FSAA-PT are related not only to their assigned APs, reporting categories, and cognitive complexity, but also to age appropriateness, differing levels of complexity across tasks, and accessibility for students with varying disabilities.

Methodology

HumRRO used the Links for Academic Learning alignment method (LAL) developed by the National Alternate Assessment Center as a basis to conduct the content alignment reviews and analyze the results (Flowers, Wakeman, Browder, & Karvonen, 2007). The original LAL method includes Webb’s methodology for Criterion 3: Content Coverage. HumRRO adapted the LAL method³ to best fit FDOE’s data analysis needs and substituted the HumRRO alignment methodology for Webb’s methodology in Criterion 3. The criteria considered in this study are listed below:

Criterion 1: Age Appropriate – The content is referenced to the student’s assigned grade-level (based on chronological age).

Criterion 2: Standards Fidelity
- 2a: Content Centrality – The target content of the APs maintains fidelity with the content of the original grade-level standards.
- 2b: Performance Centrality – The focus of achievement of the APs maintains fidelity with the specified performance in the grade-level standards.

Criterion 3: Content Coverage (using the HumRRO Alignment Method)
- 3a: Content Representation – A basic measure of alignment between APs and item content. Simply stated, this criterion is a check of the AP assigned to each item by item writers.
- 3b: Category Representation – This is a measure of how well items represent reporting categories as indicated in the test blueprint.
- 3c: Depth of Knowledge (DOK) Representation – This is a measure of the cognitive complexity of tasks and whether that represents the cognitive complexity of the content in the APs.
- 3d: Category Reporting – Reporting categories are sufficiently measured.

Criterion 4: Content Differentiation – The level of differentiation of content across grade-levels within a grade span panel group.

Criterion 5: Achievement – The expected achievement provides the students an adequate opportunity to show learning of grade referenced academic content.

Criterion 6: Performance Accuracy – The potential barriers to demonstrating what students know and can do are minimized in the assessment to increase measurement accuracy of student performance.

¹ An alignment study of the ELA FSAA-PT was conducted in 2016 and results can be found in Nemeth, Purl, and Smith (2016 No. 029). For that study, the full ELA FSAA-PT was evaluated; however, the writing prompts were at the field test stage while the rest of the assessment was operational. Thus, the current alignment study only focuses on the now operational writing prompts and the associated writing APs.

² Downloadable versions of the Florida Standards and Next Generation Sunshine State Standards Access Points can be found at: http://www.cpalms.org/Downloads.aspx

The LAL method is appropriate for alignment of the APs to the corresponding LAFS and NGSSS for Social Studies, as well as for alignment of the FSAA-PT to APs. Criteria 1, 2, 4, and 6 are appropriate for the alignment of APs to Standards. For the alignment of the FSAA-PT to APs, all six criteria are applicable. The methodology described above meets or exceeds prior requirements for Federal peer review.

Alignment Study Workshop

Five panel groups (writing grades 4-5, 6-8, and 9-10; civics EOC; and US history EOC) were recruited from a database of Florida educators, both general education and Exceptional Student Education (ESE) teachers, provided by FDOE and Measured Progress. HumRRO conducted the alignment workshop on June 21-22, 2017 at a hotel in Jacksonville, Florida. The workshop began with a general session to introduce HumRRO staff, review reimbursement logistics, read and sign affidavits of nondisclosure for the secure materials panelists would review, and conduct general training. Throughout the workshop, panelists were told and reminded that the alignment ratings and other evaluative information they provided were independent of both FDOE and Measured Progress.

³ The full LAL method contains an additional criterion. Criterion 1: Academic evaluates whether the content is academic and includes the major domains/strands of the content area. As alternate assessments have progressed, this criterion is no longer of added value. Thus, we did not ask panelists to rate tasks on this criterion and do not refer to it in the report.

Panelists received paper copies of the FSAA-PT to review first. They were also provided paper and electronic copies of various resource materials such as the APs, presentation rubric, DOK definitions, and Panelist Instructions to support their evaluation. Panelists used electronic Microsoft Excel rating forms for their data entry and notes.

The HumRRO project director oriented all of the panelists to the work they would conduct at the workshop and supported the facilitators by answering questions and providing further guidance if needed. Because the project director oversaw all groups, she made certain that process decisions and information were shared among the rooms.

Following the general session, panelists began working in their assigned groups. US history and civics panel groups were located in a separate room free from other groups and distractions. Writing 4-5, 6-8, and 9-10 panel groups were located in one room, since their instructions were similar and could be provided to them at the same time. A HumRRO facilitator was assigned to each of the panel groups. The facilitators reviewed how panelists should use materials and provided detailed training on rating procedures. They also answered questions and guided the pace of the workshop.

Before each alignment step was conducted, facilitators trained panelists on the purpose of the step, the rating code definitions, and entering data in the appropriate rating form. Before allowing panelists to work independently on certain tasks, facilitators had panelists complete the first two to three ratings as a group to ensure everyone understood the task and rating code definitions. Additionally, facilitators conducted periodic consistency checks to ensure panelists were continuing to understand the task.

Access Point to Standards Alignment Summary

For this alignment evaluation, panelists reviewed the APs associated with the FSAA-PT blueprints for the writing prompt section of the ELA assessment, civics, and US history in multiple ways. First, they evaluated the content centrality (Criterion 2) between the blueprint APs and the corresponding LAFS and NGSSS for Social Studies. Second, panelists evaluated the progression of content (Criterion 4) from one grade to the next, only for the blueprint identified writing APs. Lastly, panelists rated the appropriateness and accessibility (Criteria 1 and 6) of the AP content for this population of students.

The rules for the LAL criteria applied to the alignment between blueprint identified APs and the LAFS and NGSSS for Social Studies are as follows:

Criterion 1: Age Appropriateness (individual panelist rating)
- 90% or more of the APs were rated as ‘adapted’ or ‘neutral’

Criterion 2a: Content Centrality (individual panelist rating)
- 90% or more of the APs were linked to the LAFS or NGSSS for Social Studies

Criterion 2b: Performance Centrality (individual panelist rating)
- 90% or more of the APs were comparable in complexity to the LAFS or NGSSS for Social Studies

Criterion 4: Content Differentiation (consensus group rating)
- Dimension ratings were ‘clear’ or ‘partial’ and the Identical dimension is ‘no’

Criterion 6: Performance Accuracy (individual panelist rating)
- 90% or more of the APs were accessible to different disability groups

Criterion 3: Content Coverage is not included because the content coverage criterion focuses on the relationship between items and APs regarding content, category, and DOK representation and is not applicable to the AP to Standards evaluation. Criterion 5: Achievement focuses on the degree to which the assessment provides evidence of a student’s ability to demonstrate what they know and can do on grade referenced academic content. Thus, this criterion is not applicable to the evaluation of the AP to Standards relationship.
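The decision rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HumRRO's analysis code: the function names, the rating label ‘inappropriate’, the differentiation labels ‘low’ and ‘none’, and the sample ratings are all hypothetical. It also assumes that a Criterion 4 result of the form "x out of 5" counts how many of the five consensus dimensions (breadth, depth, prerequisite, new learning, identical) satisfied the rule.

```python
def meets_percent_rule(ratings, acceptable, threshold=0.90):
    """Return (met, share) for a '90% or more' rule.

    ratings    -- list of rating labels, one per AP
    acceptable -- set of labels that count toward the criterion
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    share = sum(1 for r in ratings if r in acceptable) / len(ratings)
    return share >= threshold, share


def differentiation_count(consensus):
    """Count how many of the five Criterion 4 dimensions satisfy the rule.

    Assumption: the four graded dimensions pass when rated 'clear' or
    'partial', and the Identical dimension passes when it is 'no'.
    """
    graded = ("breadth", "depth", "prerequisite", "new learning")
    met = sum(1 for dim in graded if consensus[dim] in ("clear", "partial"))
    if consensus["identical"] == "no":
        met += 1
    return met


# Hypothetical Criterion 1 ratings: 19 of 20 APs rated 'adapted' or 'neutral'.
age_ratings = ["adapted"] * 12 + ["neutral"] * 7 + ["inappropriate"]
age_met, age_share = meets_percent_rule(age_ratings, {"adapted", "neutral"})

# Hypothetical consensus ratings for two grade spans.
span_a = {"breadth": "low", "depth": "low", "prerequisite": "low",
          "new learning": "low", "identical": "yes"}
span_b = {"breadth": "none", "depth": "partial", "prerequisite": "partial",
          "new learning": "low", "identical": "yes"}

print(age_met, round(age_share, 2))
print(differentiation_count(span_a), differentiation_count(span_b))
```

Keeping the percent rule separate from the consensus count mirrors the distinction drawn above between individual panelist ratings and group consensus ratings.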

Table 1 provides summary conclusions on the alignment of the blueprint identified APs to their respective LAFS and NGSSS for Social Studies. As a reminder, only the writing APs and LAFS are of interest in this alignment study. For non-writing APs and LAFS, refer to the Nemeth et al. (2016 No. 029) report. If APs met the criterion, a green highlighted box containing a ‘✓’ is assigned. For results falling slightly below a criterion, a yellow highlighted box containing the criterion results is assigned. Finally, a red highlighted box contains results that fell well below the criterion.


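Table 1. Percent of Grade-Level APs Which Met Each LAL Criterion

Columns:
- Criterion 1, Age Appropriate: Is the content of the APs age appropriate? (Tables 10-11)
- Criterion 2, Content Centrality: Does the AP content link with the associated LAFS or NGSSS? (Tables 12-13)
- Criterion 2, Performance Centrality: Are the APs comparable in complexity to the LAFS & NGSSS? (Tables 16-17)
- Criterion 4, Content Differentiation: Does content differ across grade-levels within a grade span?⁴ (Table 18)
- Criterion 6, Performance Accuracy: Are barriers to demonstrating student knowledge minimized? (Tables 19-20)

Grade/Subject | Age Appropriate | Content Centrality | Performance Centrality | Content Differentiation | Performance Accuracy
W4  | ✓ | ✓ | ✓ | 0 out of 5 | ✓
W5  | ✓ | ✓ | ✓ | 0 out of 5 | ✓
W6  | ✓ | ✓ | ✓ | 2 out of 5 | ✓
W7  | ✓ | ✓ | ✓ | 2 out of 5 | ✓
W8  | ✓ | ✓ | ✓ | 2 out of 5 | ✓
W9  | ✓ | ✓ | ✓ | NA | ✓
W10 | ✓ | ✓ | ✓ | NA | ✓
Civ | ✓ | ✓ | ✓ | NA | ✓
USH | ✓ | ✓ | ✓ | NA | ✓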

4 For Writing grades 4-8, a comparison between this study and the 2016 alignment study (see Nemeth, et al. [2016 No. 029]) reveals drastically different results. In the 2016 alignment study, panelists evaluated all the blueprint identified APs for Language Arts associated with the ELA FSAA-PT. However, the current alignment study required panelists to review the blueprint identified APs for Language Arts associated with only the writing prompt section of the ELA FSAA-PT.


In general, the blueprint identified APs exhibited high content linkage with the grade-level standards. Specifically, the APs across all grades and subjects were rated by panelists as age appropriate (Criterion 1) and were found to assess the same content and performance expectations as the grade-level standards (Criterion 2). Panelists also judged the blueprint identified APs to be accessible to different disability groups (Criterion 6).

Criterion 4 (content differentiation) was assessed only for writing grades 4-5 and 6-8 and was conducted as a group consensus activity within each grade span. The civics assessment and the US history assessment were not intended to have content differentiation between grades. Similarly, writing grades 9-10 APs were the same between these two grades. Content differentiation appears to be an area in need of improvement. For writing grades 4-5, panelists found content differentiation to be low in all areas (breadth, depth, prerequisite, new learning), and consequently rated the APs to be identical between the grades. For writing grades 6-8, the panelists found no differentiation in breadth between any of the three grades, low differentiation in new learning between grades 7 and 8, and partial differentiation in depth and prerequisite. As a result, they concluded that one of the APs (AP 2.4) is identical across the grades.

FSAA-PT to Access Points Alignment Summary

Table 2 provides summary conclusions on the alignment of the FSAA-PT writing prompt section of the ELA, civics, and US history assessments to the associated LAFS and NGSSS for Social Studies APs, respectively. If tasks met the criterion, a green highlighted box containing a ‘✓’ was assigned. For results falling slightly below a criterion, a yellow highlighted box containing the criterion results was assigned. Finally, a red highlighted box contains results that fell well below the criterion.

The rules for the LAL and HumRRO criteria applied to the alignment between FSAA-PT tasks and APs are as follows:

Criterion 1: Age Appropriateness (individual panelist rating)
- 90% or more of the tasks were rated as ‘adapted’ or ‘neutral’

Criterion 2b: Performance Centrality (individual panelist rating)
- 90% or more of the tasks were rated as ‘some’ or ‘all’

Criterion 3a: Content Representation (individual panelist rating)
- 90% or more of the tasks were rated as ‘partial’ or ‘fully’ aligned

Criterion 3b: Category Representation (based on individual panelist rating)
- Tasks match the FSAA-PT Test Specifications targets

Criterion 3c: DOK Representation (individual panelist rating)
- 50% or more of the prompt tasks and task 3 of an item set were at the same or higher DOK level as the AP
- 90% or more of the assigned complexity ratings are confirmed by panelists for DOK, Volume of Information, Vocabulary, and Context

Criterion 5: Achievement (consensus group rating)
- 6 of the 7 dimensions have some level of inference, either low or high
- At least 4 dimensions have a high level of inference

Criterion 6: Performance Accuracy (individual panelist rating)
- 90% or more of the tasks were accessible to different disability groups
- 90% or more of the tasks were amenable to accommodations or supports
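As a rough illustration of the two rules that are not simple 90% thresholds, the sketch below checks the Criterion 5 consensus rule and the first Criterion 3c rule. It is a hedged example, not the study's actual procedure: the function names, the dimension names, the label ‘none’, and the sample data are hypothetical, and "6 of the 7 dimensions" is read here as "at least 6".

```python
def achievement_rule(inference_levels):
    """Apply the Criterion 5 rule to a mapping of dimension -> inference level.

    Returns (dimensions with some inference, dimensions with high inference,
    whether both parts of the rule are met).
    """
    levels = list(inference_levels.values())
    some = sum(1 for lvl in levels if lvl in ("low", "high"))
    high = sum(1 for lvl in levels if lvl == "high")
    return some, high, (some >= 6 and high >= 4)


def dok_range_rule(task_dok, ap_dok, threshold=0.50):
    """Check whether enough tasks sit at the same or a higher DOK than their AP.

    task_dok and ap_dok are parallel lists of integer DOK levels.
    """
    if len(task_dok) != len(ap_dok) or not task_dok:
        raise ValueError("DOK lists must be non-empty and of equal length")
    same_or_higher = sum(1 for t, a in zip(task_dok, ap_dok) if t >= a)
    share = same_or_higher / len(task_dok)
    return share >= threshold, share


# Hypothetical consensus ratings on seven placeholder dimensions.
consensus = {
    "dimension 1": "high", "dimension 2": "high", "dimension 3": "high",
    "dimension 4": "low", "dimension 5": "low", "dimension 6": "low",
    "dimension 7": "none",
}
print(achievement_rule(consensus))

# Hypothetical DOK levels for six tasks and their assigned APs.
range_met, range_share = dok_range_rule([1, 1, 2, 1, 2, 1], [2, 2, 2, 2, 3, 2])
print(range_met, round(range_share, 2))
```

Under this reading, a result with six dimensions showing some inference but only three showing high inference fails the rule because the second condition is not satisfied.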


Table 2. Percent of Grade-Level Tasks Which Met Each LAL Criterion

Columns:
- 1: Criterion 1, Age Appropriate: Is the content of the tasks age appropriate? (Tables 21-22)
- 2b: Criterion 2, Performance Centrality: Is the item set task comparable in complexity to the AP? (Tables 23-24)
- 3a: Criterion 3, Content Coverage, Item Alignment: Are tasks fully aligned with APs? (Tables 25-26)
- 3b: Criterion 3, Content Coverage, Represent Intended Categories: Do tasks adequately represent reporting categories? (Tables 27-28)
- 3c Range: Criterion 3, Content Coverage, Task Complexity: Do tasks reflect the range of DOK in the APs?⁵ (Tables 33-34)
- 3c DOK: Do panelists agree with DOK? (Tables 29-30)
- 3c Volume: Do panelists agree with Volume of Information? (Tables 35-36)
- 3c Vocabulary: Do panelists agree with Vocabulary? (Tables 37-38)
- 3c Context: Do panelists agree with Context? (Tables 39-40)
- 4: Criterion 4, Content Differentiation: Writing: Do prompts increase in complexity across grade levels?⁶ Civics & USH: Do tasks within an item set increase in complexity? (Tables 41-42)
- 5: Criterion 5, Achievement: Student achievement demonstrates learning. (Tables 43-44)
- 6: Criterion 6, Performance Accuracy: Are tasks accessible to different disability groups? (Tables 45-46) Are tasks amenable to accommodations or supports? (Tables 47-48)

Grade/Subject | 1 | 2b | 3a | 3b | 3c Range | 3c DOK | 3c Volume | 3c Vocabulary | 3c Context | 4 | 5 | 6
W4  | ✓ | ✓   | ✓ | ✓ | 17% | 80% | 70% | 75% | 75% | 2 out of 5 | 6 out of 7; 3 | 75%
W5  | ✓ | ✓   | ✓ | ✓ | 13% | ✓   | ✓   | ✓   | ✓   | 2 out of 5 | 6 out of 7; 3 | ✓
W6  | ✓ | 88% | ✓ | ✓ | 17% | ✓   | ✓   | ✓   | ✓   | 0 out of 5 | ✓ | ✓
W7  | ✓ | 88% | ✓ | ✓ | 0%  | ✓   | ✓   | ✓   | ✓   | 0 out of 5 | ✓ | ✓
W8  | ✓ | ✓   | ✓ | ✓ | 0%  | ✓   | ✓   | ✓   | ✓   | 0 out of 5 | ✓ | ✓
W9  | ✓ | ✓   | ✓ | ✓ | 21% | 65% | 85% | 75% | ✓   | ✓ | ✓ | ✓
W10 | ✓ | ✓   | ✓ | ✓ | 33% | ✓   | ✓   | ✓   | ✓   | ✓ | ✓ | ✓
Civ | ✓ | ✓   | ✓ | ✓ | ✓   | ✓   | ✓   | ✓   | ✓   | ✓ | 3 out of 7; 3 | ✓
USH | ✓ | ✓   | ✓ | ✓ | ✓   | ✓   | ✓   | ✓   | ✓   | ✓ | ✓ | ✓

⁵ For Writing grades 4-10, a comparison between this study and the 2016 alignment study (see Nemeth, et al. [2016 No. 029]) reveals different results. In the 2016 alignment study, panelists evaluated the field test writing prompts still under development. Also, panelists from last year and this year were not the same educators.

⁶ In the 2016 alignment study, panelists evaluated all tasks and prompts on the ELA FSAA-PT. However, the current alignment study required panelists to review only the writing prompts of the ELA FSAA-PT.


In general, the civics and US history FSAA-PT exhibited good overall alignment with the fewest areas for improvement. The writing prompts associated with the ELA FSAA-PT showed more areas for improvement. Panelists found the APs and assessment tasks for all subjects and grades to be age appropriate (Criterion 1). They determined that for the most part, the assessment tasks maintain fidelity with the performance expectations in the APs for civics and US history, and for writing grades 4-5, 8, and 9-10. For writing grades 6 and 7, 88% of tasks were found to call for comparable performance levels as the standards (Criterion 2).

There were mixed results on Criterion 3. Panelists found the tasks for each grade and subject to be fully aligned with the standards, and the percentage of aligned tasks matches test specifications. However, panelists found the task cognitive complexity to be substantially lower than the AP complexity in writing for all grades. In civics and US history, the cognitive complexity of tasks was found to match the AP cognitive complexity.

For the most part, panelists agreed with the assigned DOK. There was some disagreement in writing grades 4 and 9, but the overall cognitive complexity assigned by the panelists was either the same or higher. For writing grade 4, panelists agreed with 80% of assigned DOK levels, rating 10% of tasks as requiring a higher DOK, and 10% of tasks as requiring a lower DOK. For writing grade 9, panelists agreed with only 65% of assigned DOK levels, rating other tasks as lower (25%) and higher (10%). Similarly, panelists agreed with most of the Volume of Information levels, except for writing grades 4 and 9. They agreed with 70% of grade 4 writing tasks, and rated the other 30% as having a higher Volume of Information. For grade 9, on the other hand, panelists agreed with 85% of the tasks, and rated the other 15% as having a lower Volume of Information. For the most part, panelists agreed with the Vocabulary rating, with the exception of writing grade 4 and grade 9, where they agreed with 75% of the tasks. For grade 4, 15% of the tasks were rated as having a higher Vocabulary level, and for grade 9, 10% of the tasks were rated as having a lower Vocabulary level while 15% were rated as having a higher Vocabulary level. Panelists agreed with the rating of Context in most cases, with the exception of grade 4. In this case, they agreed with 75% of the tasks, and rated the Context of the other 25% of the tasks as higher.
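The agreement percentages reported above amount to a three-way tally of panelist ratings against the levels originally assigned to each task. A minimal Python sketch of that tally follows. The function name `agreement_breakdown` and the ten example ratings are hypothetical and are not study data, and the study's actual analysis procedure may differ.

```python
def agreement_breakdown(assigned, panelist):
    """Share of tasks where the panelist level equals, exceeds, or falls below the assigned level."""
    if len(assigned) != len(panelist) or not assigned:
        raise ValueError("need two non-empty lists of equal length")
    n = len(assigned)
    pairs = list(zip(assigned, panelist))
    return {
        "agree": sum(p == a for a, p in pairs) / n,
        "higher": sum(p > a for a, p in pairs) / n,
        "lower": sum(p < a for a, p in pairs) / n,
    }

# Hypothetical DOK levels for ten tasks (illustrative only, not study data).
assigned_dok = [1, 1, 2, 2, 2, 1, 2, 1, 2, 2]
panelist_dok = [1, 1, 2, 3, 2, 1, 1, 1, 2, 2]
breakdown = agreement_breakdown(assigned_dok, panelist_dok)
print(breakdown)  # {'agree': 0.8, 'higher': 0.1, 'lower': 0.1}
```

With these illustrative inputs the tally gives 80% agreement, 10% rated higher, and 10% rated lower, which is the same pattern reported for the writing grade 4 DOK ratings.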

Criterion 4 was evaluated differently for the writing and social studies assessments; however, the criterion was evaluated as a group consensus rating for all panel groups. Since the writing tasks, unlike the tasks for civics and US history, were not ordered from easiest to hardest, these tasks were evaluated for content differentiation by asking the following question: Is there a progression in breadth, depth, prerequisite skills, and new learning from the lower-grade prompt 1 and prompt 2 to the higher-grade prompt 1 and prompt 2? Content differentiation ratings at the prompt level agree with the overall AP content differentiation ratings for these grades. The progression of prompts for grades 4-5 writing was judged to have no differentiation in new learning, limited prerequisite differentiation, and partial differentiation in breadth and depth. As a result, panelists concluded that content differentiation was limited for these grades. For writing grades 6-8, panelists evaluated depth as limited between grades 6-7 and 8, and prerequisite as limited between grades 6 and 7. They stated that breadth and new learning were absent across all three grades, and consequently determined that the tasks were identical between grades 6 and 7, but not between grade 8 and the other two grades. For writing grades 9-10, panelists found clear content differentiation. For civics and US history, panelists were asked to evaluate whether content differentiation existed from task 1 to task 3 in an item set. Overall, panelists found clear content differentiation for civics and US history.

Criterion 5, a group consensus rating within each grade span, is an evaluation of whether the assessment system, in general, allows students to demonstrate learning. Here, as well, some dimensions were rated by panelists as providing ‘no inference’ of student learning. For example, in civics, panelists stated that little inference can be made about the presence of new learning, and the assessment results may be challenging to generalize across people and settings, and materials and activities. In grade 4-5 writing, panelists stated as a group that the assessment was seen to include tasks where hand over hand teacher guidance may be reducing the level of inference about student knowledge; therefore, the level of independence was judged to provide no inference about student knowledge. For the most part, across the subjects, panelists felt the FSAA-PT provides an assessment in which student learning can be demonstrated.

One finding from this study is that even after we discussed with panelists the allowable accommodations and modifications described in the test administration manual, panelists tended to think back to how these and similar assessments are administered in the field. In some cases, for example, if a teacher is unable to elicit a response from a student by the means specified in the test administration manual, the teacher may implement solutions that are not explicitly prohibited, but also not explicitly endorsed, in the manual. While the manual does not explicitly endorse hand over hand assistance, the participants observed it being implemented in the field and mentioned it in the discussion. We value these statements by the teachers, even though they diverge from the test administration manual instructions, since they come from the teachers’ expertise. To make ratings more consistent, it may be helpful to state more explicitly in the test administration manual not only which accommodations/modifications are allowed, but also which ones are prohibited.

For Criterion 6, the ratings provided by panelists for all grades and subjects except for writing grade 4 found 100% of the tasks to be accessible to different disability groups. For writing grade 4, only 75% of the tasks were rated as accessible to different disability groups. Panelists voiced concerns about the tasks translating to ASL, and students with visual impairments having trouble with some tasks.

Recommendations7

HumRRO makes the following recommendations to strengthen the alignment between the components of the Florida assessment system:

Review the cognitive complexity of writing tasks. Tasks should assess APs at the same or higher complexity level. This ensures the tasks appropriately assess the content of the AP on which the task asks a student to demonstrate knowledge and ability. The majority of writing tasks associated with prompt 1 did not assess students at a cognitive complexity level similar to that of the AP; instead, tasks were judged to be too low. It is recommended that the writing tasks, particularly for prompt 1, be reviewed to ensure the cognitive complexity level of the tasks is in accordance with the assessment design and, if needed, that additional writing tasks be developed measuring a wider range of complexity to better match the cognitive complexity of the APs.

Review content differentiation of writing APs and tasks across grades. APs should increase in content breadth, depth, and newer knowledge, as well as growth on prerequisite skills. Similarly, for assessments in which tasks are structured in such a way that they increase in cognitive complexity between grades (writing grades 4-10), there should be a progression of breadth and depth in the tasks across grades. However, in writing grades 4-5 and 6-8 no content differentiation was found between APs across grades within grade spans, and little task differentiation between grades within grade spans among the prompts. It is recommended, especially for writing grades 4-5 and 6-8, that the APs and tasks be reviewed to ensure appropriate content differentiation within and across grade spans. If the content differentiation between APs and thus prompts is not meant to be reflected in the AP, per se, but in the complexity of the reading passage associated with the writing prompt, then additional training or communication to educators in the field regarding such is recommended.

7 A supplemental appendix, not for public dissemination as it contains item information, identifying specific items and tasks that FDOE and Measured Progress may want to review will be provided.

Review the DOK, Volume of Information, Vocabulary, and Context assigned to tasks. For writing grades 4 and 9, panelists agreed with fewer than 90% of the DOK, Volume of Information, Vocabulary, and Context (writing grade 4 only) levels assigned to tasks. It is recommended that the tasks, especially for these grades, be reviewed to ensure they reflect the appropriate DOK, Volume of Information, Vocabulary, and Context.

Review the degree to which the assessment provides evidence of a student’s ability to demonstrate what they know and can do. For civics and writing grades 4-5, panelists expressed concern that the tasks may not generalize to people and settings, and materials and activities, and that a student’s responses may not be sufficiently independent from the teacher. It is recommended that the accommodations and modifications that are allowed and not allowed be explicitly stated in the test administration manual.

Review the accessibility of tasks to different disability groups. For grade 4 writing, panelists rated only 75% of the tasks as accessible to all disability groups. Their specific concerns were the accessibility of tasks for deaf and deaf/blind students, and for students who communicate nonverbally with pictures. While only grade 4 writing did not meet the criterion for accessibility of tasks, concerns about these population groups were voiced by panelists in other subject groups as well. It is recommended that accommodations for these groups be provided and/or outlined more clearly and specifically.




Chapter 1: Introduction

The Florida Department of Education (FDOE) requested an external, independent alignment study (review and analysis) of the Florida Standards Alternate Assessment – Performance Task (FSAA-PT) End of Course (EOC) in civics and US history. In addition, the writing prompt portion of the English/Language Arts (ELA) assessment was evaluated. In general, the ELA assessment includes Reading and Writing selected response items as well as a writing prompt. An alignment review provides one form of evidence supporting the validity of the state assessment system. Alignment results demonstrate that assessments represent the full range of the content standards, and they measure student knowledge in the same manner and at the same level of complexity as expected in the content standards. All aspects of the state assessment system must coincide, including the grade-level standards, academic content standards, and each assessment.

FDOE requested the alignment study to meet both state and federal requirements. The federal requirement of the U.S. Department of Education (USDE) stems from the No Child Left Behind (NCLB) Act of 2001 and most recently the Every Student Succeeds Act (ESSA) of 2015. The federal government has established regulations for students with significant cognitive disabilities, often referred to as the “1% rule” (U.S. Department of Education, 2005). This rule allows the state to accommodate students with significant cognitive disabilities by setting different performance expectations for up to 1% of their student population. States can develop alternate academic standards, achievement standards, and assessments that more fairly and accurately demonstrate the achievement of these students. However, states must show that the alternate academic standards and achievement standards for these students link to the general, statewide grade-level expectations, although the breadth and depth of these expectations can be reduced (USDE, 2005).

The FSAA-PT is an alternate assessment designed for students with significant cognitive disabilities. Because of their cognitive disabilities, these students would not be appropriately assessed by the general statewide assessment program. The FSAA-PT EOC in civics and US history consists of 16 operational item sets, with each item set containing three tasks ranging from low to high complexity. A student’s teacher has the ability to scaffold the first task, if needed, by reducing the response options if the student does not respond correctly. The second and third tasks do not allow for scaffolding if the student responds incorrectly. Students are also provided with appropriate stimuli, if needed, to demonstrate parts of the question. The FSAA-PT writing prompt portion of the ELA assessment consists of two prompts that represent two levels of complexity. Prompt 1 consists of five selected-response questions in response to text. These questions are not written to increase in complexity, but are intended to lead a student to a full writing product. All five questions must be administered to the student; there is no scaffolding allowed. Prompt 2 is an open response format that requires a student to create a writing product (e.g., an essay). The assessment is designed to evaluate the Florida Standards Access Points (AP) for Language Arts and the Next Generation Sunshine State Standards for Social Studies Access Points (AP)8, a reduced and marginally simplified version of the Florida content standards.

Therefore, in accordance with federal requirements, Florida must demonstrate that: (1) the Language Arts Florida Standards (LAFS) and the Next Generation Sunshine State Standards (NGSSS) for Social Studies link to the corresponding APs; and (2) the FSAA-PT writing prompt portion of ELA, civics, and US history link to the corresponding APs.

Organization and Contents of the Report

This report contains five chapters. Chapter 2 explains alignment methodologies, including general methods used to evaluate alignment of alternate assessments. Subsequent chapters provide alignment results for comparison between the components of the assessment system: (a) Chapter 3 presents results of the alignment comparison between the APs and the LAFS and NGSSS for Social Studies; (b) Chapter 4 presents results on the content review of the FSAA-PT writing prompts and tasks relative to the corresponding LAFS and NGSSS for Social Studies; and (c) Chapter 5 provides recommendations for FDOE to strengthen alignment over time. Appendix A includes examples of rating forms and training materials used in the alignment workshop.

8 Downloadable versions of the Florida Standards and Next Generation Sunshine State Standards Access Points can be found at: http://www.cpalms.org/Downloads.aspx


Chapter 2: Alignment Study Design and Methodology

In this section, we discuss key concepts related to alignment research, followed by a description of the alignment evaluations and methods used as part of the study.

Alignment of Assessments and Standards on Content and Performance

Alignment studies answer one vital question related to the validity of an assessment, “Does the assessment content adequately reflect the content that students are expected to learn as provided in the state standards?” For Florida, the content is found in the APs associated with the LAFS and NGSSS for Social Studies. Assessments must measure only the content specified in the standards, and student scores generated from these assessments should adequately reflect student knowledge of the content standards. The FSAA-PTs were built based on the assessable APs listed in the Florida Standards Alternate Assessment Performance Task: Test Design, Blueprint, and Item Specifications for English Language Arts and Social Studies (FSAA-PT Test Specifications).

In general, alignment evaluations for an assessment reveal the breadth, or scope, of knowledge as well as the depth-of-knowledge, or cognitive processing, expected of students by the state’s content standards. Alignment analyses for alternate assessments help to answer questions such as the following:

How much and what type of content is covered by the FSAA-PT?

Is the content in the alternate assessment or alternate standards sufficiently similar to the expectations of Florida’s content standards?

Are students asked to demonstrate this knowledge at the same level of rigor as expected in the full content standards?

Does the assessment accurately measure student knowledge of content standards?

Is the alternate assessment accessible to all students in the targeted population?

These questions can be grouped into two categories: content alignment and performance alignment. However, all alignment evaluations tie back to the state content standards.

AP and FSAA-PT Overview

The FSAA-PT is available for those students with significant cognitive disabilities for whom the general assessment is not suitable, even with accommodations. A student’s knowledge and understanding of the APs in writing, civics, and US history are measured by the set of prompts and item tasks on the FSAA-PT.

The FSAA-PT is administered one-on-one between the student and the student’s teacher or other licensed professional who has worked with the student. The FSAA-PT in civics and US history is designed such that an item set, containing three tasks, measures one AP. The first task is written to the Participatory AP, the second task is written to the Supported AP, and the third task is written to the Independent AP. A student’s teacher can scaffold the first task by reducing the response options if the student does not respond correctly. If scaffolding is used on the first task, then the student does not receive the second and third tasks. The second and third tasks do not allow for scaffolding if the student responds incorrectly.


The exception to the structure of the tasks outlined above is the writing prompt portion of the ELA assessment. To begin, scaffolding is not allowed for the writing prompt section of the ELA assessment. The writing prompt section consists of two different prompts. The first prompt consists of five selected-response questions associated with a passage, and the second prompt consists of a single open-response format prompt associated with another passage. Table 3 shows the grade levels in which each FSAA-PT subject area is administered as well as all the assessments reviewed in this alignment study. The civics and US history EOC assessments are administered to the majority of students at the indicated grade level but not exclusively.

Table 3. Grade/Content Areas Included in Alignment Study

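| Grade | Writing | Civics EOC | US History EOC |
|---|---|---|---|
| 4 | X | | |
| 5 | X | | |
| 6 | X | | |
| 7 | X | X | |
| 8 | X | | |
| 9 | X | | |
| 10 | X | | |
| High School | | | X |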

Content Alignment and Accessibility

Alignment methodologies can be used on general and alternate assessments. Several methods of alignment (e.g., Porter, 2002; Webb, 1997, 1999, 2005) are in current use and involve rating a number of different aspects of assessment items relative to the content standards. In particular, alignment studies of alternate assessments often require review of additional aspects of alignment unique to the design of the alternate assessments. These dimensions include the extent to which the alternate benchmarks link to the general content standards and the accessibility of the assessment system to students with a variety of disabilities. Alternate assessments differ from general state assessments in form and structure; thus, an alignment methodology must be responsive to these differences. Approaches outlined in the Links for Academic Learning (LAL) Alignment Method (Flowers, Wakeman, Browder, & Karvonen, 2007) and HumRRO’s alignment methodology (Nemeth, Purl, & Smith, 2016; Smith, Deatz, Wen, & Nemeth, 2014; Smith, Wen, Nemeth, Levinson, & Deatz, 2014) provide an overall model for evaluating alternate assessments. The methodology we used meets or exceeds prior requirements for Federal peer review.

Links for Academic Learning Alignment Method. For this alignment study, HumRRO used the Links for Academic Learning alignment method (referred to in this report as LAL) developed by the National Alternate Assessment Center as a basis to conduct the content alignment reviews and analyze the results (Flowers et al., 2007). The original LAL method includes Webb’s methodology for Criterion 3: Content Coverage. HumRRO adapted the LAL method9 to best fit FDOE’s data analysis needs and substituted the HumRRO alignment methodology for Webb’s methodology in Criterion 3. The criteria are listed below:

9 The full LAL method contains an additional criterion. Criterion 1: Academic evaluates whether the content is academic and includes the major domains/strands of the content area. As alternate assessments have progressed, this criterion is no longer of added value. Thus, we did not ask panelists to rate tasks on this criterion and do not refer to it in the report.


Criterion 1: Age Appropriate – The content is referenced to the student’s assigned grade-level (based on chronological age).

Criterion 2: Standards Fidelity
- 2a: Content Centrality – The target content of the APs maintains fidelity with the content of the original grade-level standards.
- 2b: Performance Centrality – The focus of achievement of the APs maintains fidelity with the specified performance in the grade-level standards.

Criterion 3: Content Coverage – (HumRRO Alignment Method, described in more detail below)
- 3a: Content Representation – Items represent AP content.
- 3b: Category Representation – Items represent content categories.
- 3c: Depth of Knowledge (DOK) Representation – Item DOK represents content APs.
- 3d: Category Reporting – Reporting categories are sufficiently measured.

Criterion 4: Content Differentiation – The level of differentiation of content across grade-levels within a grade span panel group.

Criterion 5: Achievement – The expected achievement provides the students an adequate opportunity to show learning of grade referenced academic content.

Criterion 6: Performance Accuracy – The potential barriers to demonstrating what students know and can do are minimized in the assessment to increase measurement accuracy of student performance.

The LAL method is appropriate for alignment of the APs to the corresponding LAFS and NGSSS for Social Studies, as well as for alignment of the FSAA-PT to the APs. Table 4 shows which of the LAL criteria are appropriate for each evaluation. An IR denotes that the criterion data were obtained from individual panelist ratings, while a CR denotes that the criterion data were obtained through a consensus group rating in which panelists collectively determined the response.

Table 4. LAL Criteria for AP and FSAA-PT Alignment Evaluation

Alignment; Criterion 1; Criterion 2 (2a, 2b); Criterion 3 (3a, 3b, 3c, 3d); Criterion 4; Criterion 5; Criterion 6

APs to Standards: IR IR IR CR IR

FSAA-PT Items to APs: IR IR IR IR IR CR CR IR

Criterion 3: Content Coverage using HumRRO Alignment Method. HumRRO has used this method in previous alignments of alternate assessments for the Minnesota Department of Education in 2013 and 2014 as well as the Indiana Department of Education in 2015. The method borrows much from the Webb (1997, 1999, 2005) alignment methodology, but diverges in key ways that include:

Instructed reviewers to determine whether they agree with item writers’ link to the standards instead of having reviewers provide independent ratings (allowing comparison of reviewers’ judgements to item writers).

Instructed reviewers to assign an overall degree of alignment rating to ascertain if assessments adequately capture the intended content.


Criterion 3a: Tasks Represent Intended Content. This is a basic measure of alignment between APs and tasks. Simply stated, this criterion is a check of the AP, assigned to each task during the item writing process, by a group of independent panelists that were not involved in the item writing process. Using a previously developed rating scale, panelists rated task alignment as (1) not aligned, (2) partially aligned, or (3) strongly aligned. For ratings of 1 or 2, panelists provided an explanation for why the task is poorly aligned or unrepresented within the indicated content and identified another AP to which the task is better aligned, if applicable. We reported the proportion of tasks with each rating. Tasks with ratings of 1 or 2 were identified for scrutiny by FDOE and/or the testing contractor in a confidential, supplemental appendix.

In addition to the ratings, the total number of APs indicated in the test specifications was compared to the task results to verify that a range of APs was being assessed by the test.
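The rating tabulation described for Criterion 3a can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: it assumes one rating per task, and the function name `summarize_alignment`, the task identifiers, and the example ratings are all hypothetical.

```python
from collections import Counter

# Rating scale from the text: 1 = not aligned, 2 = partially aligned, 3 = strongly aligned.
LABELS = {1: "not aligned", 2: "partially aligned", 3: "strongly aligned"}

def summarize_alignment(ratings):
    """Return (proportion of tasks at each rating, sorted ids of tasks rated 1 or 2)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings.values())
    total = len(ratings)
    proportions = {LABELS[r]: counts.get(r, 0) / total for r in LABELS}
    # Tasks rated 1 or 2 are the ones set aside for further scrutiny.
    flagged = sorted(task for task, r in ratings.items() if r in (1, 2))
    return proportions, flagged

# Hypothetical ratings for four tasks (illustrative only, not study data).
example_ratings = {"task_01": 3, "task_02": 3, "task_03": 2, "task_04": 1}
proportions, flagged = summarize_alignment(example_ratings)
```

Here `proportions` holds the share of tasks at each rating, and `flagged` lists the tasks that would be passed on for review.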

Criterion 3b: Tasks Represent Intended Categories. For this criterion, we compared the expected distribution of tasks by reporting category (e.g., Origin and Purposes of Law and Government; Roles, Rights, and Responsibilities of Citizens; Government Policies and Political Processes; Organization and Function of Government), as presented in the test specifications, to the actual proportion found on each test. We report acceptability in terms of meeting test blueprint requirements.
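As a rough illustration of the Criterion 3b comparison, the Python sketch below checks the observed share of tasks in each reporting category against a blueprint range. The function name `check_blueprint`, the task counts, and the 15%-35% range are hypothetical placeholders chosen for the example; they are not values from the FSAA-PT Test Specifications, and the blueprint may express its requirements differently.

```python
def check_blueprint(task_categories, blueprint_ranges):
    """Map each category to (observed share of tasks, whether the share is within range).

    blueprint_ranges maps category -> (min_share, max_share). The ranges used
    below are hypothetical, not taken from the actual test specifications.
    """
    if not task_categories:
        raise ValueError("task_categories must not be empty")
    total = len(task_categories)
    results = {}
    for category, (low, high) in blueprint_ranges.items():
        share = task_categories.count(category) / total
        results[category] = (share, low <= share <= high)
    return results

# Hypothetical assignment of eight tasks to the civics reporting categories.
categories = (
    ["Origin and Purposes of Law and Government"] * 2
    + ["Roles, Rights, and Responsibilities of Citizens"] * 2
    + ["Government Policies and Political Processes"] * 3
    + ["Organization and Function of Government"] * 1
)
# Hypothetical blueprint: every category should hold 15% to 35% of tasks.
ranges = {name: (0.15, 0.35) for name in dict.fromkeys(categories)}
report = check_blueprint(categories, ranges)
```

In this made-up example two categories fall outside the assumed range, one above it and one below it.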

Criterion 3c: Task DOK Represent Alternate Standards. This measure is a comparison of the DOK ratings assigned by panelists to FSAA-PT tasks and the APs linked to that task (HumRRO Criterion 1). The assignment of DOK ratings to the APs was completed as one of the panelists’ first alignment process steps.

Since the FSAA-PT Test Specifications do not contain ranges for the proportion of tasks at each DOK level, the recommended level of cognitive complexity is taken from Webb’s alignment criteria (2005). For ratings of acceptable, 50% of tasks must be rated at the same or higher DOK level as the APs.
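The acceptability rule above can be expressed as a short Python check. This is a sketch only: it assumes the threshold is inclusive (at least 50%), and the function name `dok_representation` and the example DOK pairs are hypothetical rather than drawn from the study.

```python
def dok_representation(pairs, threshold=0.5):
    """Return (share of tasks rated at or above their AP's DOK, whether that share meets the threshold).

    pairs is a list of (task_dok, ap_dok) tuples. Treating the 50% cut as
    inclusive (>=) is an assumption of this sketch.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    at_or_above = sum(1 for task_dok, ap_dok in pairs if task_dok >= ap_dok)
    share = at_or_above / len(pairs)
    return share, share >= threshold

# Hypothetical (task DOK, AP DOK) pairs for six tasks (illustrative only).
example_pairs = [(1, 2), (2, 2), (1, 3), (2, 3), (3, 2), (1, 2)]
share, acceptable = dok_representation(example_pairs)
```

In this illustration only two of six tasks reach the DOK level of their AP, so the set would not be rated acceptable under the assumed rule.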

Criterion 3d: Item Sufficiency for Category Reporting. This is a measure of the extent to which reporting categories are sufficiently measured. In contrast to the other criteria, student assessment data is used to inform this criterion. Specifically, we conduct psychometric analyses to determine if the category reporting practices can be supported by evidence of factor structure and reliability estimates rather than simply requiring a minimum number of items per reporting category. Criterion 3d is not included in this alignment study due to the smaller student sample, reduced number of items, and potential variances in assessment administration for the alternate assessment.

Scope of Alignment Evaluations

Two different types of alignment evaluations were performed for this study: (a) the APs linked to the LAFS and NGSSS for Social Studies and (b) the writing prompt portion of the ELA, civics, and US history FSAA-PT tasks linked to the APs in writing, civics, and US history. Both alignment evaluations were conducted using Florida educators and HumRRO staff familiar with alignment studies.

Training

An essential aspect of alignment is training for both HumRRO facilitators and panelists so they are familiar with the methodology. Alignment workshops do not occur weekly nor are all studies exactly the same, so it is important to train even experienced alignment facilitators and panelists for the nuances of each study.

Facilitators attended a 2-hour training session that included a presentation of the Florida assessment system, the alignment process steps, and examples of the rating documents panelists would use. The alignment steps for facilitators were summarized in a Facilitator Instructions document. Facilitators participated in a detailed walk-through of the document and specific procedural and anecdotal guidance that could/should be provided to panelists was highlighted.

Panelists’ training was conducted in two ways at the workshop: (1) alignment familiarization training on Day 1 of the workshop as a full group, and (2) targeted procedural training in their panel groups prior to starting each alignment task. The full group training focused on the Florida assessment system and included information specific to the FSAA-PT requirements, the APs, and recent changes that required the current alignment study. The training also covered the roles of FDOE, Measured Progress, HumRRO, and panelists; the definition of alignment; why alignment is important; the alignment process; cognitive complexity; and the rating forms used in the study. The in-group training focused on specific task processes, rating definitions, and calibration activities to reinforce panelists’ shared understanding.

During the general and targeted training, panelists were reminded that their role was to provide their independent judgements using their expert knowledge.

Panelists

Panelists were recruited by FDOE, Measured Progress, and HumRRO from a database of Florida educators, both general education and Exceptional Student Education (ESE) teachers, provided by FDOE and Measured Progress. Each of the five panels (writing grades 4-5, writing grades 6-8, writing grades 9-10, civics EOC, and US history EOC) consisted of a combination of special education teachers and general education teachers or content specialists; each group had at least one special education teacher and at least one general education teacher. Panelists were assigned to groups based on their experience in the subject area and grade level. Table 5 presents the characteristics of the panelists.

Table 5. Professional and Demographic Characteristics of Panelists

Panel; Experience Current Position: Less than 1 year, 1-5 years, 6-15 years, More than 15 years; Gender: Female, Male; Ethnicity: White, Black, Hispanic; Current Position: ESEb Teacher, Gen Ed Teacher

Writing Gr 4-5 1 2 1 3 1 2 1 1 2 2

Writing Gr 6-8a 1 2 1 4 3 1 3

Writing Gr 9-10a 2 2 3 1 2 2 1 3

Civics 2 2 2 2 4 1 3

US History 1 2 1 3 1 3 1 2 2

Total: 1 6 9 4 15 5 14 4 1 7 13

a One panelist did not provide ethnicity.

b Exceptional Student Education (ESE).


Materials

Panelists received paper copies of the FSAA-PT to review. They were also provided paper and electronic copies of various resource materials such as the APs, presentation rubric, DOK definitions, and Panelist Instructions to support their evaluation. The panelists used electronic rating forms, in Microsoft Excel. Examples of rating forms and panelist instructions are presented in Appendix A.

Test Forms. There were four alternate forms (Form A – Form D) of the FSAA-PT, which included a common set of items and a set of field test items, some of which were on multiple forms. Panelists reviewed all the common FSAA-PT tasks in civics and US history and only the two writing prompts associated with the ELA assessment.

Panelist Instructions and Rating Forms. Panelists were given a Panelist Instructions document listing their alignment tasks, as well as rating codes and code definitions (see Appendix A). The rating forms were Excel documents. During the 2-day workshop, panelists completed two tasks individually, while the other tasks were completed by consensus (see Appendix A).

Procedures

HumRRO conducted the alignment workshop on June 21-22, 2017 in Jacksonville, Florida. The workshop began with a general session to introduce HumRRO staff, review reimbursement logistics, read and sign affidavits of nondisclosure for the secure materials panelists would review, and conduct 30 minutes of general training. In both the general session and in each panel group, panelists were informed that the alignment reviews were independent from FDOE and Measured Progress, the testing vendor.

Following the general session, panelists began working in their panel groups. US history and civics panel groups were located in a separate room free from other groups and distractions. Writing 4-5, 6-8, and 9-10 panel groups were located in one room, since their instructions were similar and could be provided to them at the same time. A HumRRO facilitator was assigned to each of the panel groups, and the HumRRO project director supported the facilitators by answering questions and providing further guidance if needed. The project director also made certain that the different groups retained their shared understanding of the alignment method and tasks. Panelists received detailed training on rating procedures from the facilitator responsible for leading the group through each alignment step as listed in Table 6.

Table 6. Alignment Steps for Panelists’ Ratings

Step Alignment Step Description

1 LAFS & NGSSS DOK (consensus)

2 LAFS & NGSSS AP DOK (consensus)

3 AP to LAFS/NGSSS alignment

4 AP content differentiation (if applicable) (consensus)

5 FSAA-PT item review

6 FSAA-PT content differentiation (consensus)

7 Whole test (consensus)

8 Student learning review (consensus)


To begin, the facilitator gave a brief introduction and had all panelists introduce themselves and share where they were from and what they taught. The facilitator provided panelists with the Panelist Instructions document (see Appendix A), the writing-specific LAFS or NGSSS for Social Studies specific to the panel group, a Depth of Knowledge reference guide specific to the subject area and provided by Measured Progress (see Appendix A), the APs specific to the panel group, electronic Excel files on individual computers, all assessment materials for each grade/subject being reviewed (Test Booklet, Response Booklet, and Passage Booklets [writing only]), and a Presentation Rubric reference sheet provided by Measured Progress (see Appendix A). A single copy of the Test Administration Manual was available in each panel group.

Throughout the workshop, facilitators offered general suggestions and comments on procedural concerns when appropriate; however, they emphasized that they would not get involved in determining the ratings, since the panelists were valued as the content experts. Before each alignment step was conducted, facilitators trained panelists on the purpose of the step, the rating code definitions, and entering data in the appropriate rating form. Before allowing panelists to work independently on certain tasks, facilitators had panelists complete the first two to three ratings as a group to ensure that everyone understood the task and rating code definitions. Additionally, facilitators conducted periodic consistency checks to ensure that panelists continued to understand the task. If ratings varied widely across panelists, the facilitator reviewed the task and rating code definitions and instructed panelists to alter their ratings only if they felt they had misinterpreted the task and/or rating code definitions.

The first alignment step was to assign a DOK level to the writing-specific LAFS or NGSSS for Social Studies that were linked to the corresponding APs. The second step was similar, assigning a DOK level to the APs for each grade/subject listed in the FSAA-PT Test Specifications. Both steps were completed as consensus ratings. For each step, panelists first assigned DOK ratings independently. Panelists then discussed their ratings and determined a consensus DOK rating for each writing-specific LAFS or NGSSS for Social Studies and the corresponding APs. If full group consensus could not be reached, then the DOK agreed upon by the majority of panelists was recorded as consensus.
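The recording rule described above (full consensus where possible, otherwise the majority rating) can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `record_consensus_dok` and the example ratings are hypothetical, and the report does not say how a case with no majority was resolved, so the sketch returns `None` for it.

```python
from collections import Counter
from typing import Optional, Sequence


def record_consensus_dok(final_ratings: Sequence[int]) -> Optional[int]:
    """Return the DOK level to record after discussion, or None if no level has a majority.

    Hypothetical helper: the ratings are the panelists' DOK levels after discussion.
    """
    if not final_ratings:
        return None
    level, votes = Counter(final_ratings).most_common(1)[0]
    # Full consensus and a simple majority pass the same test:
    # the most common level must be held by more than half of the panelists.
    if votes > len(final_ratings) / 2:
        return level
    # Tie or plurality only: the source does not describe this case.
    return None


print(record_consensus_dok([3, 3, 3, 3]))  # full consensus
print(record_consensus_dok([2, 3, 3, 3]))  # majority of panelists
print(record_consensus_dok([2, 2, 3, 3]))  # no majority
```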

Next, panelists evaluated the APs on a variety of factors. The first rating was to indicate if the AP content was fully aligned with the linked writing LAFS or NGSSS for Social Studies listed. If not, panelists were asked to explain what content was missing and provide another standard if it was better linked. Additional factors for rating APs included: (a) whether the AP matched the measure of student performance expected in the writing LAFS or NGSSS for Social Studies, (b) whether the AP was appropriate for the chronological age at which it was measured, and (c) whether the content expectation of the AP was accessible to various disability groups. These ratings were made individually; no consensus ratings were obtained.

In step 4, panelists evaluated the APs for differentiation of breadth, depth, prerequisite knowledge, and new knowledge across grades. This step was only applicable for the panel groups evaluating writing grade 4-5 and writing grade 6-8. Panelists indicated whether they found clear, limited, partial, or no differentiation across the grades they reviewed and provided comments explaining their reasoning, with supporting evidence. This task was completed as a consensus rating among panelists.

For step 5, panelists evaluated the FSAA-PT tasks on several factors similar to those in the AP review. FSAA-PT tasks were linked to APs during the item development process, and reviewers were asked to rate how well the task content was aligned (not, partially, fully) with the assigned AP. If they indicated that the task was partially aligned or not aligned, panelists were asked to describe their reasoning and provide another AP they felt was better linked with the task. Panelists continued the FSAA-PT task review by (a) verifying the complexity levels (DOK, Volume of Information [VI], Vocabulary [V], & Context [C]) assigned to the task, (b) judging whether the task measured student performance of the AP, (c) judging whether the task was appropriate for the chronological age at which it was measured, and (d) judging whether the task could be modified or supported without changing the meaning or difficulty.

Steps 6, 7, and 8 were completed as consensus ratings. In step 6, content differentiation was evaluated using the same dimensions and rating levels as the AP review in step 4. However, the civics and US history panel groups were asked to complete this step with a focus on the progression of the complexity levels from task 1 to task 3 within an item set. For writing, since there was no progression in difficulty from task 1 to task 5 in prompt 1, the panelists were asked to evaluate content differentiation between grades. Step 7 provided a ‘Whole Test’ rating in which panelists were asked to determine whether, overall, barriers existed for some students (e.g., students who are blind or deaf) to demonstrate learning on the FSAA-PT. Lastly, in step 8, panelists evaluated student learning by rating the level of inference that can be made about students based on the score they receive, or whether the score may be more a result of the teachers or assessment program. As with all the alignment steps, panelists were encouraged to provide comments if they rated a task low on any dimension.

Workshop Progress

On the first day, panelists reached DOK consensus ratings for all the writing LAFS and NGSSS for Social Studies, as well as the corresponding APs. All APs were also reviewed and rated for step 3. On day 2, panelists completed the remaining steps (4-8). Table 7 shows the steps that were completed by each panel group.

Table 7. Alignment Steps Completed by Each Panel Group June 21-22, 2017

Panel Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8

Writing Gr 4-5

Writing Gr 6-8

Writing Gr 9-10 NA

Civics NA

US History NA


Chapter 3: Alignment of Access Points to Standards

Overview of Access Points

The first challenge in evaluating the alignment of any alternate assessment to traditional standards is to define what the alternate assessment purposefully measures, versus what is intentionally omitted from the assessment. The FSAA-PT is designed from a test blueprint specifying the standards that items should measure. Items are written to address these blueprint standards, and the blueprint standards are a subset of APs. In this alignment study, panelists evaluated the APs associated with the writing LAFS and NGSSS for Social Studies in the FSAA-PT Test Specifications.

The assessable APs listed in the FSAA-PT Test Specifications are a subset of the grade-level APs, as indicated in Tables 8 and 9 below. Roughly 21 – 44% of the available writing APs are eligible for use in the writing assessments, while 58% and 30% are represented on civics and US history, respectively. The assessable APs are typically selected to represent the most important or key aspects of the content, to be accessible to the widest possible group of students, and to provide the most actionable test results for alternate assessment students and educators.

Table 8. Number of Blueprint Standards Compared to APs for Writing

Grade Number of Assessable APs Total Number of Writing APs Percent of APs Represented

Grade 4 7 34 20.59%

Grade 5 15 38 39.47%

Grade 6 17 41 41.46%

Grade 7 9 43 20.93%

Grade 8 18 43 41.86%

Grade 9 20 45 44.44%

Grade 10 17 45 37.78%

Table 9. Number of Blueprint Standards Compared to APs for Social Studies

Grade Number of Assessable APs Total Number of APs Percent of APs Represented

Civics 23 40 57.50%

US History EOC 25 82 30.49%
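The "Percent of APs Represented" column in Tables 8 and 9 is the number of assessable APs divided by the total number of APs. The sketch below recomputes that column from the counts reported in the two tables; the dictionary layout and the function name `percent_represented` are illustrative choices, not part of the study.

```python
# (assessable APs, total APs), copied from Tables 8 and 9
ap_counts = {
    "Writing Grade 4": (7, 34),
    "Writing Grade 5": (15, 38),
    "Writing Grade 6": (17, 41),
    "Writing Grade 7": (9, 43),
    "Writing Grade 8": (18, 43),
    "Writing Grade 9": (20, 45),
    "Writing Grade 10": (17, 45),
    "Civics": (23, 40),
    "US History EOC": (25, 82),
}


def percent_represented(assessable: int, total: int) -> float:
    """Share of the available APs that are eligible for the assessment, in percent."""
    return round(100 * assessable / total, 2)


for label, (assessable, total) in ap_counts.items():
    print(f"{label}: {percent_represented(assessable, total):.2f}%")
```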

LAL Criteria

For the alignment of APs to Standards, four of the six LAL criteria are suitable: age appropriateness, standards fidelity, content differentiation, and performance accuracy. The remainder of this chapter will highlight the results of these four LAL criteria.


Criterion 1: Age Appropriateness

Criterion 1 pertains to the developmental level of the content included in the APs. For this evaluation, panelists were asked to individually determine whether the content of the APs is appropriate for the age and grade-level indicated. Several response options were possible:

Adapted = Linked to grade-level content
Neutral = Content is not age-bound and is appropriate at any age
Inappropriate = Content is off-grade level

For this criterion, at least 90% of the APs should be rated as ‘adapted’ or ‘neutral’10. As seen in Tables 10 and 11, 100% of the APs were rated as ‘adapted’ or ‘neutral’ for all subjects and grade levels.

Table 10. Percent of Writing APs Rated as Age Appropriate

Grade N Raters N APs % Inappropriate % Neutral % Adapted

Grade 4 4 7 0.00 75.00 25.00

Grade 5 4 13 0.00 75.00 25.00

Grade 6 4 17 0.00 100.00 0.00

Grade 7 4 9 0.00 100.00 0.00

Grade 8 4 18 0.00 100.00 0.00

Grade 9 4 20 0.00 0.00 100.00

Grade 10 4 17 0.00 0.00 100.00

Table 11. Percent of Social Studies APs Rated as Age Appropriate

Grade N Raters N APs % Inappropriate % Neutral % Adapted

Civics 4 23 0.00 0.00 100.00

US History 4 25 0.00 44.00 56.00

Criterion 2a: Content Centrality

Panelists were asked to indicate, individually, whether the AP content fully linked to the writing LAFS or NGSSS for Social Studies associated with the AP. For every AP not fully linked to the content standard, panelists provided an explanation of the content missing from the standard and identified an alternate content standard, if applicable. For this criterion, at least 90% of the APs should be rated as ‘yes, the AP content fully links to the writing LAFS or NGSSS for Social Studies’.11

Tables 12 and 13 show the relationship between the APs and the writing LAFS and NGSSS for Social Studies. For writing across all grades, 100% of the blueprint APs contained content fully linked to the writing LAFS. In civics and US history, approximately 99% of the blueprint APs contained content fully linked to the NGSSS for Social Studies.

10 The LAL method does not specify a minimum for Criterion 1. This minimum level was established by HumRRO.
11 The LAL method does not specify a minimum for Criterion 2a. This minimum level was established by HumRRO.

Table 12. Percent of Writing APs Linked to On-Grade Level Writing LAFS

Grade N Raters N APs % Yes % No

Grade 4 4 7 100.00 0.00

Grade 5 4 13 100.00 0.00

Grade 6 4 17 100.00 0.00

Grade 7 4 9 100.00 0.00

Grade 8 4 18 100.00 0.00

Grade 9 4 20 100.00 0.00

Grade 10 4 17 100.00 0.00

Table 13. Percent of Social Studies APs Linked to On-Grade Level NGSSS for Social Studies

Grade N Raters N APs % Yes % No

Civics 4 23 98.91 1.09

US History 4 25 99.33 0.67

Criterion 2b: Performance Centrality

The APs should link to the writing LAFS and NGSSS for Social Studies in performance expectations as well as content, although the depth of these expectations can be reduced for the alternate assessment. Several analyses were conducted to compare the performance levels specified in the APs to the writing LAFS and NGSSS for Social Studies. One analysis focused on the depth of knowledge (DOK) ratings. Panelists worked together to achieve consensus DOK ratings on the APs and the writing LAFS and NGSSS for Social Studies separately. These ratings were analyzed for comparability.

We compared the DOK ratings of the APs from the FSAA-PT Test Specifications to the ratings given to the corresponding writing LAFS and NGSSS for Social Studies. Tables 14 and 15 present the percentage of APs per grade-level/subject rated as expecting performance at the same level as, or at a higher or lower level than, the writing LAFS and NGSSS for Social Studies. Although there is no minimum level of acceptable overlap in DOK established by the LAL criteria, there is an assumption that APs should be skewed to lower cognitive complexity than the state standards (Flowers et al., 2007). It may be reasonable, then, to expect that as many as half of the APs would require students to demonstrate performance at a lower level than the state standards. On the other hand, it would be problematic to find several APs with performance expectations at a higher level than the writing LAFS and NGSSS for Social Studies.

Across all content areas, at least 84% of the APs were given a DOK level at the same or lower level than the corresponding writing LAFS and NGSSS for Social Studies. In civics, 16% of APs were assigned higher levels of cognitive complexity than the corresponding NGSSS. Some of the writing APs were also assigned a higher level of complexity than the state standards, most notably in grade 7, where 11% of the writing APs were assigned higher levels of cognitive complexity than the corresponding writing standards.

Table 14. Percent of Writing APs at Lower, Same, or Higher Levels of Complexity Compared to Related Writing Standards

Grade N APs % Lower % Same % Higher % Same or Lower

Grade 4 7 57.14 42.86 0.00 100.00

Grade 5 15 46.67 53.33 0.00 100.00

Grade 6 17 76.47 17.65 5.88 94.12

Grade 7 9 77.78 11.11 11.11 88.89

Grade 8 18 88.89 5.56 5.56 94.45

Grade 9 20 75.00 20.00 5.00 95.00

Grade 10 17 75.00 25.00 0.00 100.00

Table 15. Percent of Social Studies APs at Lower, Same, or Higher Levels of Complexity Compared to Related NGSSS for Social Studies

Grade N APs % Lower % Same % Higher % Same or Lower

Civics 23 50.72 33.33 15.94 84.05

US History 25 82.67 16.00 1.33 98.67
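The lower/same/higher breakdown in Tables 14 and 15 can be derived by comparing each AP's consensus DOK level with the DOK level of its linked standard. The sketch below shows one way to do that. The function name `compare_dok` and the seven DOK pairs are hypothetical, chosen only so that the output has the same shape as a table row; they are not the panelists' actual ratings.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compare_dok(pairs: List[Tuple[int, int]]) -> Dict[str, float]:
    """Percent of APs rated lower than, the same as, or higher than the linked standard.

    pairs holds (ap_dok, standard_dok) consensus ratings, one pair per AP.
    """
    if not pairs:
        raise ValueError("at least one (ap_dok, standard_dok) pair is required")
    counts = Counter(
        "lower" if ap < std else "higher" if ap > std else "same"
        for ap, std in pairs
    )
    n = len(pairs)
    result = {k: round(100 * counts[k] / n, 2) for k in ("lower", "same", "higher")}
    # Computed from the counts rather than by adding the rounded percentages.
    result["same_or_lower"] = round(100 * (counts["lower"] + counts["same"]) / n, 2)
    return result


# Hypothetical consensus ratings for seven APs: four lower, three same, none higher.
example_pairs = [(1, 2), (2, 3), (1, 3), (2, 4), (2, 2), (3, 3), (1, 1)]
print(compare_dok(example_pairs))
```

Computing the combined percentage from the counts avoids a small rounding effect: adding two already-rounded percentages can differ in the last digit from rounding the combined count, which may account for values such as 94.45 in Table 14.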

We also asked panelists to directly compare the written performance expectations in the APs with the associated writing LAFS and NGSSS for Social Studies. Panelists, individually, evaluated the language of each AP to decide whether the expectations are the same, partially similar, or differ entirely from what is expected in the corresponding writing LAFS and NGSSS for Social Studies. For example, if an NGSSS for Social Studies expects students to ‘identify and explain’, while the AP asks students to ‘identify’ only, these expectations are rated as partially similar. When students are asked to ‘distinguish between’ in the writing LAFS, but the AP requires students to ‘recognize’, then the expectation for demonstrating knowledge is different. Tables 16 and 17 show the results of this comparison. At least 90% of the APs should be rated as ‘Some’ or ‘All’ compared with the state standards.

In all grades and subjects, panelists rated approximately 99% or more of the APs as having some or all of the same performance expectations as the corresponding writing LAFS and NGSSS for Social Studies.


Table 16. Percent of Linked APs at Various Levels of Performance Centrality – Writing

Grade N Raters N APs % None % Some % All

Grade 4 4 7 0.00 0.00 100.00

Grade 5 4 13 0.00 0.00 100.00

Grade 6 4 17 0.00 79.41 20.59

Grade 7 4 9 0.00 91.67 8.33

Grade 8 4 18 0.00 83.33 16.67

Grade 9 4 20 0.00 0.00 100.00

Grade 10 4 17 0.00 0.00 100.00

Table 17. Percent of Linked APs at Various Levels of Performance Centrality – Social Studies

Grade N Raters N APs % None % Some % All

Civics 4 23 1.09 94.57 4.35

US History EOC 4 25 1.00 57.67 41.33

Criterion 3: Content Coverage – HumRRO Alignment Method

Since the content coverage criterion focuses on the relationship between items and APs regarding content, category, and DOK representation, Criterion 3: Content Coverage is not applicable to the AP to Standards evaluation.

Criterion 4: Content Differentiation

This criterion focuses on whether the content expectations change appropriately between grade-levels within a grade span panel group. For this reason, the evaluation of content differentiation involves a comparison between grade-level content expectations. Panelists in the writing grade 4-5 and grade 6-8 panel groups were asked to review the APs of the grades they were evaluating, and determine a consensus rating of the extent to which higher grade-levels evidenced broader, deeper, and newer knowledge, as well as growth on prerequisite skills (see Appendix A for a more detailed explanation of the categories). For each category in Table 18, panelists came to a consensus as to whether the content differentiation of the APs between grades was clear, partial, limited, or there was none. According to the LAL method, content expectations should show evidence of at least partial differences in content between grades on the dimensions of Broader, Deeper, Prerequisite, and New. After panelists evaluated the four categories, they were asked to give an overall yes/no rating of whether the content expectations between grades were identical. A rating of ‘yes’ (they are identical) would suggest there are generally no increases or changes in the expectations between grade-levels. Thus, a rating of ‘No’ would be preferable.

As Table 18 exhibits, the degree of content differentiation varies across dimensions and grade-levels. The LAL method suggests that any rating indicating that differentiation exists (clear, partial, or limited) is acceptable for each category. Because the standards are identical for writing grades 9 and 10, and civics and US history are EOC assessments, this evaluation step was completed for writing grades 4-5 and 6-8 only. For writing grades 4-5, panelists found content differentiation to be limited in all areas (breadth, depth, prerequisite, new learning), and consequently rated the APs to be identical between the grades. For writing grades 6-8, the panelists found no differentiation in breadth between any of the three grades, limited differentiation in new learning between grades 7 and 8, and partial differentiation in depth and prerequisite. They concluded that AP 2.4 is identical across the grades.

Table 18. Consensus AP Content Differentiation – Writing

Grades Reviewed Category Rating Rating Support

4 – 5

Broader L

There is limited increase in breadth between grades 4 and 5. We have taken the same concept and added another component to it. This applies to one access point (AP LAFS5.W.1.AP.2b and AP LAFS4.W.1.AP.2b) but not to the majority of the access points. Organizing is the deeper concept. The addition of entertainment in AP LAFS5.W.1.AP.4b is abstract - moving away from purely concrete ideas.

Deeper L

There is an increase in complexity between grades 4 and 5. We have taken the same concept and added another component to it. This applies to one access point (AP LAFS5.W.1.AP.2b and AP LAFS4.W.1.AP.2b) but not to the majority of the access points. Organizing is the deeper concept. The addition of entertainment in AP LAFS5.W.1.AP.5b is abstract - moving away from purely concrete ideas.

Prerequisite L They are not prerequisite skills because they are the same skills from grade 4 APs to grade 5.

New L

There are new skills and strategies mentioned at grade 5 that are not in grade 4 (AP LAFS5.W.1.AP.2b and AP LAFS4.W.1.AP.2b). Otherwise, the access points are nearly identical.

Identical Y

The standards are identical, with the exception of added complexity between AP LAFS5.W.1.AP.2b and AP LAFS4.W.1.AP.2b, and the added complexity of entertainment in LAFS5.W.1.AP.4b.

6 – 8

Broader N For standard 2.4 in grade 6, 7, & 8 there is no differentiation - they are identical; other APs only get deeper and not broader - additional text types are not added

Deeper P in 1.1 the claims and counter claims become more specific across the grades; in 1.2 transitions are deeper because there is mastery of transitions as the standards move up the grades

Prerequisite P 1.1 and 1.2 build on each other as move up in grade level

New L there is an added task from 6th - 7th grade, "identify claims" to "identify and acknowledge claims" in 1.1; this doesn't occur between 7th and 8th grades

Identical Y 2.4 is identical across grades 6 - 8

Criterion 5: Achievement

Criterion 5: Achievement focuses on the degree to which the assessment provides evidence of a student’s ability to demonstrate what they know and can do on grade referenced academic content. Thus, this criterion is not applicable to the evaluation of the AP to Standards relationship.

Criterion 6: Performance Accuracy

Panelists, individually, evaluated whether students could reasonably demonstrate the content and performance expected in the APs. In general, for alternate standards and assessments, it is expected that teachers and test administrators can modify the content to instruct and assess students at the appropriate level based on their Individual Education Plans (IEPs). Panelists rated the general accessibility to students based on various types of disabilities. For example, can students with visual impairments, an inability to follow instructions, or a need for assistive technology demonstrate the knowledge expected by the APs? Panelists provided a simple ‘yes’ (accessible to all) or ‘no’ (not accessible to some groups) response to indicate their judgments. Tables 19 and 20 include the percent of APs judged as accessible to all groups. At least 90% of the APs should be rated as ‘Yes.’

Across all grades and subjects, panelists rated nearly 100% of the APs as accessible by a wide range of students with different physical and cognitive disabilities.

Table 19. Percent of APs Rated as Accessible to Different Disability Groups – Writing

Grade N Raters N APs % Yes % No

Grade 4 4 7 100.00 0.00

Grade 5 4 13 100.00 0.00

Grade 6 4 17 100.00 0.00

Grade 7 4 9 100.00 0.00

Grade 8 4 18 100.00 0.00

Grade 9 4 20 100.00 0.00

Grade 10 4 17 100.00 0.00

Table 20. Percent of APs Rated as Accessible to Different Disability Groups – Social Studies

Grade N Raters N APs % Yes % No

Civics 4 23 99.64 0.36

US History 4 25 100.00 0.00


Chapter 4: Alignment of FSAA-PT Tasks to APs

In this chapter, we report the results of panelists’ ratings of the FSAA-PT tasks in the writing prompt portion of the ELA assessment at each grade, as well as in the civics and US history End of Course (EOC) assessments. We present the results on the LAL Criteria 1 through 6. In general, and unless otherwise specified, at least 90% of FSAA-PT tasks must achieve acceptable ratings to demonstrate linkage to grade-level content for each LAL criterion.

As a reminder, the FSAA-PT for civics and US history consists of 16 item sets, containing three tasks each and purportedly measuring one AP. The first task is written to the Participatory AP, the second task is written to the Supported AP, and the third task is written to the Independent AP. The student’s teacher has the ability to scaffold the first task by reducing the response options if the student does not respond correctly. The second and third tasks do not allow for scaffolding if the student responds incorrectly. The writing section of the ELA assessment consists of two different prompts. The first prompt consists of five selected-response questions associated with a passage, and the second prompt consists of a single open-response format prompt associated with another passage. Scaffolding is not allowed in the writing assessment. Unless otherwise stated, the results presented are across all tasks regardless of the item set or prompt.

Throughout this chapter, the column ‘N Raters’ will denote the total number of panelists used in the analyses, while the ‘N Tasks’ column shows the range of tasks that panelists evaluated. If a panelist was not able to review a task or skipped a task, the total number of tasks, for a particular panelist, equals the number of tasks actually evaluated by the panelist and not all of the tasks.

LAL Criteria

Criterion 1: Age Appropriateness

Panelists, individually, evaluated the FSAA-PT tasks on whether the content and task assessed students at an appropriate level linked to their assigned grade. Tables 21 and 22 display the percentage of tasks judged as adapted (linked on-grade level), inappropriate (off-grade), and neutral (not age-bound). For acceptable linkage, at least 90% of tasks must be judged ‘adapted’ or ‘neutral.’ In this case, all of the FSAA-PT tasks across subjects and grades were rated by panelists as being either adapted or neutral.

Table 21. Percent of Writing Tasks Rated as Age Appropriate

Grade N Raters N Tasks % Inappropriate % Neutral % Adapted

Grade 4 4 6 0.00 29.17 70.83

Grade 5 4 6 0.00 25.00 75.00

Grade 6 4 6 0.00 100.00 0.00

Grade 7 4 6 0.00 100.00 0.00

Grade 8 4 6 0.00 100.00 0.00

Grade 9 4 6 0.00 0.00 100.00

Grade 10 4 6 0.00 0.00 100.00


Table 22. Percent of Social Studies Tasks Rated as Age Appropriate

Grade N Raters N Tasks % Inappropriate % Neutral % Adapted

Civics 4 48 0.00 0.00 100.00

US History 4 48 0.00 52.60 47.40

Criterion 2a: Content Centrality

Since panelists were provided the AP linked to the FSAA-PT task, a content centrality rating was not made. Instead, the task content match to the assigned AP was evaluated as part of criterion 3 below.

Criterion 2b: Performance Centrality

In addition to the targeted content, the FSAA-PT tasks should retain the performance intended by the APs to some extent. For example, if the AP requires students to ‘compare and contrast’ content, the task should require students to make some type of distinction. Tables 23 and 24 show the percentage of tasks rated, individually by panelists, as retaining all (same performance), some, or none of the performance expectations of the corresponding APs. At least 90% of tasks should receive ratings of ‘some’ or ‘all.’

For the majority of grades and subjects, the percentage of FSAA-PT tasks rated ‘some’ or ‘all’ surpassed the 90% minimum level of acceptability for performance centrality. For civics and US history, panelists rated all tasks as measuring the same performance level as the AP. However, panelists rated 12.5% of the grade 6 and grade 7 writing tasks as retaining none of the performance expectations of the corresponding AP, which left both grades at 87.5%, below the minimum. Panelists in the writing group stated that the tasks did not require students to perform to the full extent of the associated AP.

Table 23. Percent of Writing Tasks at Various Levels of Performance Centrality

Grade N Raters N Tasks % None % Some % All % All or Some

Grade 4 4 6 0.00 8.33 91.67 100.00

Grade 5 4 6 4.17 4.17 91.67 95.84

Grade 6 4 6 12.50 20.83 66.67 87.50

Grade 7 4 6 12.50 16.67 70.83 87.50

Grade 8 4 6 4.17 20.83 75.00 95.83

Grade 9 4 6 0.00 0.00 100.00 100.00

Grade 10 4 6 0.00 0.00 100.00 100.00


Table 24. Percent of Social Studies Tasks at Various Levels of Performance Centrality

Grade N Raters N Tasks % None % Some % All % All or Some

Civics 4 48 0.00 0.00 100.00 100.00

US History 4 48 0.00 0.00 100.00 100.00
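The 90% minimum applied to Tables 23 and 24 amounts to adding the ‘some’ and ‘all’ percentages and comparing the sum with the threshold. The following sketch applies that check to the writing values copied from Table 23; the names `MIN_ACCEPTABLE` and `below_minimum` are illustrative, not terms from the study.

```python
MIN_ACCEPTABLE = 90.0  # minimum percent of tasks that should be rated 'some' or 'all'

# (% None, % Some, % All) for writing tasks, copied from Table 23
table_23 = {
    "Grade 4": (0.00, 8.33, 91.67),
    "Grade 5": (4.17, 4.17, 91.67),
    "Grade 6": (12.50, 20.83, 66.67),
    "Grade 7": (12.50, 16.67, 70.83),
    "Grade 8": (4.17, 20.83, 75.00),
    "Grade 9": (0.00, 0.00, 100.00),
    "Grade 10": (0.00, 0.00, 100.00),
}


def below_minimum(rows, minimum=MIN_ACCEPTABLE):
    """Return the grades whose 'some' + 'all' percentage falls below the minimum."""
    return [
        grade
        for grade, (_none, some, all_) in rows.items()
        if some + all_ < minimum
    ]


print(below_minimum(table_23))
```

With these values, the check flags grades 6 and 7, which matches the grades discussed in the text.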

Criterion 3a: Tasks Represent Intended Content

Panelists did not identify an AP for each FSAA-PT task. Instead, panelists, individually, verified that the AP assigned to the task by item writers was an accurate match. Panelists gave each task and matching AP a rating of (1) not aligned, (2) partially aligned, or (3) fully aligned. The cross-tabulation of the ratings is presented below in Tables 25 and 26. To estimate the approximate number of tasks assigned a specific rating, we divided by the number of panelists who provided ratings at the task level. At least 90% of tasks should receive ratings of ‘partially’ or ‘fully’ aligned.

More than 90% of tasks were rated as either partially or fully aligned to the indicated AP. In general, these results indicate that the FSAA-PT tasks are assessing the intended APs.

Table 25. Writing Task Alignment Ratings

Grade N Raters N Tasks % Not Aligned % Partially Aligned % Fully Aligned % Fully or Partially Aligned

Grade 4 4 6 0.00 4.17 95.83 100.00

Grade 5 4 6 4.17 0.00 95.83 95.83

Grade 6 4 6 0.00 16.67 83.33 100.00

Grade 7 4 6 0.00 16.67 83.33 100.00

Grade 8 4 6 0.00 16.67 83.33 100.00

Grade 9 4 6 0.00 0.00 100.00 100.00

Grade 10 4 6 0.00 0.00 100.00 100.00

Table 26. Social Studies Task Alignment Ratings

Grade N Raters N Tasks % Not Aligned % Partially Aligned % Fully Aligned % Fully or Partially Aligned

Civics 4 48 2.08 0.00 97.92 97.92

US History 4 48 1.04 0.52 98.44 98.96
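Because every panelist rated every task individually, the percentages in Tables 25 and 26 are computed over the pooled task-level ratings, and dividing a pooled count by the number of panelists gives the approximate number of tasks with that rating. The sketch below illustrates the arithmetic. The rating matrix is hypothetical (four panelists, six tasks, one ‘not aligned’ rating), and the function name `summarize_alignment` is an illustrative choice.

```python
from collections import Counter

RATING_LABELS = {1: "not aligned", 2: "partially aligned", 3: "fully aligned"}


def summarize_alignment(ratings_by_panelist):
    """Pool task-level ratings (codes 1-3) across panelists.

    ratings_by_panelist holds one list of ratings per panelist; a panelist who
    skipped a task simply contributes a shorter list.
    """
    pooled = Counter(r for ratings in ratings_by_panelist for r in ratings)
    n_ratings = sum(pooled.values())
    n_panelists = len(ratings_by_panelist)
    return {
        label: {
            # pooled count divided by the number of panelists
            "approx_tasks": pooled[code] / n_panelists,
            # share of all task-level ratings, in percent
            "percent": round(100 * pooled[code] / n_ratings, 2),
        }
        for code, label in RATING_LABELS.items()
    }


# Hypothetical ratings: 4 panelists x 6 tasks, with a single "not aligned" rating.
example_ratings = [
    [3, 3, 3, 3, 3, 3],
    [3, 3, 3, 3, 3, 3],
    [3, 3, 1, 3, 3, 3],
    [3, 3, 3, 3, 3, 3],
]
summary = summarize_alignment(example_ratings)
for label, values in summary.items():
    print(label, values)
```

In this hypothetical case, one rating out of 24 is about 4.17%, the same size of step that appears in the writing tables, where four raters each reviewed six tasks.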

Criterion 3b: Tasks Represent Intended Categories

To address this criterion, we examined the distribution of FSAA-PT item sets aligned by AP and compared it to the target stated in the FSAA-PT Test Specifications.


Data for this analysis was provided through individual panelists’ evaluation of the task to AP alignment. Each AP is associated with a content category; thus, when panelists agreed with the AP paired with a task, or they proposed an alternate AP that better assessed the task, they also aligned tasks to content categories.

Table 27 below shows that the mean number of aligned writing tasks (partially or fully aligned rating) across panelists matched the target criterion stated in the FSAA-PT Test Specifications for the content category in all writing grades. Even though Table 25 shows that 4% of the tasks in writing grade 5 were not aligned, the alternate AP assigned still placed the task in the same content category as the original AP.

Table 27. Mean Number of Aligned Writing Items by Content Category

Grade Reporting Category Genre Criterion (N of Tasks) Mean SD

Grade 4 Text-based Writing Informative 6 6.00 0.00







Table 28 shows that the mean number of aligned item sets (partially or fully aligned rating) across panelists generally matched the criterion for each content category in civics and US history. In civics, panelists’ assignment of alternate APs resulted in some item sets moving from the reporting category “Origin and Purposes of Law and Government” to the reporting category “Roles, Rights, and Responsibilities of Citizens.” There was a slight variation in US history in the reporting category “Late Nineteenth and Early Twentieth Century, 1860-1910”. Overall, though, the item sets generally matched the target criterion in the FSAA-PT Test Specifications.

Table 28. Mean Number of Aligned Social Studies Items by Content Category

Grade Reporting Category Criterion (N of Item Sets) Mean SD

Civics

Origin and Purposes of Law and Government 12 9.00 0.00

Roles, Rights, and Responsibilities of Citizens 12 15.00 0.00

Government Policies and Political Processes 12 12.00 0.00

Organization and Function of Government 12 12.00 0.00

Total Mean Number of Linked Item Sets 48.00 0.00

US History

Late Nineteenth and Early Twentieth Century, 1860-1910 15 14.75 0.50

Global Military, Political, and Economic Challenges, 1890-1940 18 18.00 0.00

The United States and the Defense of the International Peace, 1940-present 12 12.00 0.00

Introduced in all Reporting Categories 3 3.00 0.00

Total Mean Number of Linked Item Sets 47.75 0.50
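The Mean and SD columns in Tables 27 and 28 summarize, across panelists, how many items each panelist aligned to a reporting category. The sketch below shows the calculation for one category. The four per-panelist counts are hypothetical, chosen to be consistent with the reported mean of 14.75 and SD of 0.50, and the use of the sample standard deviation is an assumption; the report does not state which formula was used.

```python
from statistics import mean, stdev


def summarize_counts(counts):
    """Return (mean, sample SD) of the per-panelist aligned counts, rounded to 2 places."""
    return round(mean(counts), 2), round(stdev(counts), 2)


# Hypothetical: three panelists aligned 15 items to the category and one aligned 14.
aligned_counts = [15, 15, 15, 14]
category_mean, category_sd = summarize_counts(aligned_counts)
print(f"{category_mean:.2f} {category_sd:.2f}")
```

When every panelist agrees on the count, as in most rows of Table 28, the SD is 0.00.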


Criterion 3c: Task DOK Represent Alternate Standards

The tasks on each assessment should reflect the range of cognitive complexity in the APs, as interpreted by the state. Since the FSAA-PT Test Specifications do not indicate an intended DOK target, this criterion will be assessed by evaluating the assigned DOK of a task, evaluating the distribution of DOK levels, and comparing the DOK level of the aligned tasks and APs.

Data for these analyses was provided through panelists’ DOK evaluations. Panelists used the following DOK levels while evaluating the tasks (see Appendix A for the complete LAL DOK level descriptions).

DOK 1 = Attention
DOK 2 = Memorize/recall
DOK 3 = Performance
DOK 4 = Comprehension
DOK 5 = Application
DOK 6 = Analysis, Synthesis, Evaluation

As panelists reviewed FSAA-PT tasks, they individually determined whether the DOK level assigned to each task during the item writing process matched the task, or whether it was too low or too high. Tables 29 and 30 summarize the percent of tasks (across panelists) that panelists rated at a lower, the same, or a higher DOK level than the level assigned to the task. It is reasonable to expect panelists to agree with 90% of the DOK levels assigned to tasks.

Across all grades and subjects, the majority of tasks met the expectation of 90% agreement between panelists and assigned DOK levels. However, writing grades 4 and 9 did not meet this expectation: panelists confirmed the cognitive complexity of only 80% of the tasks in grade 4 and only 65% in grade 9. When panelists rated a task's DOK lower than assigned, they usually cited a lack of inference in the task; that is, the task required a simple inference or recall rather than the further extension of drawing a conclusion. Note that while there were 2 prompts for writing in all grades (prompt 1 includes 5 selected-response tasks and prompt 2 is an open-response task), prompt 2 was not assigned a DOK by the prompt writer; therefore, we could not compare the DOK levels assigned by the prompt writers and the panelists for that prompt. Only prompt 1 (5 selected-response tasks) is included in these comparisons for writing.

Table 29. Percent of Writing Tasks at Lower, Same, or Higher Levels of Complexity

Grade  N Raters  N Tasks  % of Linked Tasks with: Lower Complexity  Same Complexity  Higher Complexity

Grade 4 4 5 10.00 80.00 10.00

Grade 5 4 5 5.00 90.00 5.00

Grade 6 4 5 0.00 90.00 10.00

Grade 7 4 5 0.00 95.00 5.00

Grade 8 4 5 0.00 100.00 0.00

Grade 9 4 5 25.00 65.00 10.00

Grade 10 4 5 0.00 100.00 0.00


Table 30. Percent of Social Studies Tasks at Lower, Same, or Higher Levels of Complexity


Grade  N Raters  N Tasks  Lower Complexity  Same Complexity  Higher Complexity

Civics 4 48 1.56 98.44 0.00

US History 4 48 1.56 96.88 1.56
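
The percentages in Tables 29 and 30 can be read as shares of panelist ratings falling into each of the three categories. The sketch below illustrates the calculation and the 90% check. The individual ratings are hypothetical (4 panelists times 5 tasks, chosen to reproduce the grade 4 writing row of Table 29), and pooling ratings across panelists is an assumption about how the averages were formed.

```python
from collections import Counter

def agreement_percentages(ratings):
    """Return the percent of ratings that are 'lower', 'same', or 'higher'."""
    counts = Counter(ratings)
    total = len(ratings)
    return {level: 100.0 * counts[level] / total
            for level in ("lower", "same", "higher")}

# Hypothetical pooled ratings: 4 panelists x 5 tasks = 20 ratings.
ratings = ["lower"] * 2 + ["same"] * 16 + ["higher"] * 2

result = agreement_percentages(ratings)
meets_expectation = result["same"] >= 90.0  # 90% agreement expectation

print(result, meets_expectation)
```

With equal numbers of tasks per panelist, pooling the ratings gives the same percentages as averaging each panelist's percentages.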

To examine the distribution of DOK levels across tasks, we used the DOK level that each panelist rated as the best fit for the task. If a panelist agreed with the assigned DOK level, the task kept that DOK level for that panelist. If the panelist felt the assigned DOK level was too low or too high, we asked the panelist to identify the DOK level that was more appropriate for the task. Consequently, the DOK levels used for any one panelist in the distributions are a mix of levels assigned during item writing and levels assigned by the panelist.

In writing (Table 31), most tasks, in general, were rated DOK levels 2, 3, and 4. None of the tasks were given a DOK level of 1 or 5, but a few tasks were given a DOK 6 in grades 9 and 10. While grades 8 and 9 writing had about 50% of tasks in DOK level 3, in the other grades the distribution was approximately even between DOK 2, 3, and 4.

Table 31. Distribution of Panelist DOK Ratings – Writing

Grade Statistic DOK 1 DOK 2 DOK 3 DOK 4 DOK 5 DOK 6

Grade 4

Mean 0.00 2.00 2.25 1.75 0.00 0.00

SD NA 1.63 2.06 0.50 0.00 NA

Percent 0.00 33.33 37.50 29.17 0.00 0.00

Grade 5

Mean 0.00 2.00 2.00 2.00 0.00 0.00

SD NA 0.00 0.82 0.82 NA NA

Percent 0.00 33.33 33.33 33.33 0.00 0.00

Grade 6

Mean 0.00 1.75 2.00 2.25 0.00 0.00

SD NA 0.50 0.00 0.50 NA NA

Percent 0.00 29.17 33.33 37.50 0.00 0.00

Grade 7

Mean 0.00 1.75 2.25 2.00 0.00 0.00

SD NA 0.50 0.50 0.00 NA NA

Percent 0.00 29.17 37.50 33.33 0.00 0.00

Grade 8

Mean 0.00 1.00 3.00 2.00 0.00 0.00

SD NA 0.00 0.00 0.00 NA NA

Percent 0.00 16.67 50.00 33.33 0.00 0.00

Grade 9

Mean 0.00 1.50 3.00 1.00 0.00 0.50

SD NA 0.58 0.82 0.82 NA 0.58

Percent 0.00 25.00 50.00 16.67 0.00 8.33

Grade 10

Mean 0.00 2.00 2.00 1.00 0.00 1.00

SD NA 0.00 0.00 0.00 NA 0.00

Percent 0.00 33.33 33.33 16.67 0.00 16.67
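
The Mean and Percent rows in Table 31 are related: the percent for a DOK level is the mean number of tasks at that level divided by the number of tasks each panelist rated. The sketch below illustrates this relationship. The individual ratings are hypothetical (4 panelists, 6 tasks each), chosen so that the means match the grade 4 row; the function name is ours and not part of the LAL method.

```python
from collections import Counter
from statistics import mean

DOK_LEVELS = range(1, 7)

def dok_distribution(ratings_by_panelist):
    """Mean count and percent of tasks at each DOK level across panelists."""
    per_panelist = [Counter(r) for r in ratings_by_panelist]
    n_tasks = len(ratings_by_panelist[0])  # assumes every panelist rated all tasks
    summary = {}
    for level in DOK_LEVELS:
        avg = mean(c[level] for c in per_panelist)
        summary[level] = (avg, 100.0 * avg / n_tasks)
    return summary

# Hypothetical best-fit DOK ratings: 4 panelists, 6 tasks each.
ratings = [
    [2, 2, 3, 3, 4, 4],
    [2, 2, 3, 3, 4, 4],
    [2, 2, 3, 3, 4, 4],
    [2, 2, 3, 3, 3, 4],
]

distribution = dok_distribution(ratings)
for level, (avg, pct) in distribution.items():
    print(f"DOK {level}: mean={avg:.2f} percent={pct:.2f}")
```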


In civics and US history (Table 32), the majority of tasks were given a DOK level of 2, 3, or 4, with an approximately even distribution of tasks among those DOK levels. No items were rated as DOK level 1 or 6, and a few items were rated as DOK 5.

Table 32. Distribution of Panelist DOK Ratings – Social Studies

Grade Statistic DOK 1 DOK 2 DOK 3 DOK 4 DOK 5 DOK 6

Civics

Mean 0.00 16.00 15.00 14.75 2.25 0.00

SD NA 0.00 0.00 1.50 1.50 NA

Percent 0.00 33.33 31.25 30.73 4.69 0.00

US History

Mean 0.00 16.00 15.00 16.00 1.00 0.00

SD NA 0.00 1.15 1.15 0.00 NA

Percent 0.00 33.33 31.25 33.33 2.08 0.00

In addition to determining the agreement between panelists’ DOK ratings and the DOK levels assigned to tasks, we compared the DOK ratings panelists provided for the APs and FSAA-PT tasks to evaluate the degree of alignment between the cognitive expectations. Tables 33 and 34 summarize the percent of tasks (across panelists) which were assigned DOK levels that were lower, the same, or higher than the DOK level of the aligned AP. It is reasonable to expect 50% of the tasks to be at the same or higher complexity level as the corresponding AP.

Table 33. Percent of Writing Tasks at Lower, Same, or Higher Levels of Complexity Compared to Related APs


Grade  N Raters  N Tasks  Lower Complexity  Same Complexity  Higher Complexity  % of Linked Tasks with Same or Higher Complexity

Grade 4 4 6 83.33 16.67 0.00 16.67

Grade 5 4 6 87.50 12.50 0.00 12.50

Grade 6 4 6 83.33 12.50 4.17 16.67

Grade 7 4 6 100.00 0.00 0.00 0.00

Grade 8 4 6 100.00 0.00 0.00 0.00

Grade 9 4 6 79.17 12.50 8.33 20.83

Grade 10 4 6 66.67 33.33 0.00 33.33

Table 34. Percent of Social Studies Tasks at Lower, Same, or Higher Levels of Complexity Compared to Related APs


Grade  N Raters  N Tasks  Lower Complexity  Same Complexity  Higher Complexity  % of Linked Tasks with Same or Higher Complexity

Civics 4 48 4.17 54.17 41.67 95.83

US History 4 48 10.42 70.83 18.75 89.58

In civics and US history, the majority of tasks were rated as the same or higher complexity than the AP. However, the majority of writing tasks for all grades were rated as having lower complexity than the AP. In fact, none of the writing grades met the 50% criterion, with grades 7 and 8 having no tasks with the same or higher complexity.
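
The comparison against the 50% criterion can be sketched as follows. The percentages are transcribed from Tables 33 and 34; the variable and function names are illustrative only.

```python
# (lower, same, higher) percentages transcribed from Tables 33 and 34.
complexity_vs_ap = {
    "Writing Grade 4": (83.33, 16.67, 0.00),
    "Writing Grade 5": (87.50, 12.50, 0.00),
    "Writing Grade 6": (83.33, 12.50, 4.17),
    "Writing Grade 7": (100.00, 0.00, 0.00),
    "Writing Grade 8": (100.00, 0.00, 0.00),
    "Writing Grade 9": (79.17, 12.50, 8.33),
    "Writing Grade 10": (66.67, 33.33, 0.00),
    "Civics": (4.17, 54.17, 41.67),
    "US History": (10.42, 70.83, 18.75),
}

CRITERION = 50.0  # expected percent of tasks at the same or higher complexity

def same_or_higher(row):
    """Sum the 'same' and 'higher' percentages for one table row."""
    _, same, higher = row
    return same + higher

below_criterion = [name for name, row in complexity_vs_ap.items()
                   if same_or_higher(row) < CRITERION]
print(below_criterion)
```

Summing the rounded percentages can differ from the published last column by 0.01 (for example, civics), which does not affect the comparison with the criterion.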

Besides DOK, FSAA-PT tasks are assigned three additional complexity ratings, Volume of Information, Vocabulary, and Context, according to the Presentation Rubric (see Appendix A).

Panelists individually evaluated whether the additional complexity ratings assigned to each task during the item writing process matched the task, or whether they were too low or too high. We would expect panelists to agree with at least 90% of each additional complexity rating associated with the FSAA-PT tasks for the ratings to be considered acceptable. Note that while there were 2 prompts for writing in all grades (prompt 1 includes 5 selected-response tasks and prompt 2 is an open-response task), prompt 2 was not assigned a Volume of Information, Vocabulary, or Context rating by the prompt writer; therefore, we could not compare the ratings assigned by the prompt writers and the panelists for that prompt. Only prompt 1 (the 5 selected-response tasks) is included in these comparisons for writing. Tables 35 and 36 present average panelist agreement with the assigned task complexity rating for Volume of Information. The number of tasks with each rating was averaged across panelists and is presented as a percentage.

As seen in Tables 35 and 36, tasks in civics and US history met the expectation that panelists would rate at least 90% of the tasks at the same Volume of Information level as assigned to the task. All of the writing grades except grade 4 (70%) and grade 9 (85%) met or exceeded the 90% agreement expectation.

Table 35. Percent of Writing Tasks at Lower, Same, or Higher Levels of Volume of Information


Grade  N Raters  N Tasks  Lower Volume of Information  Same Volume of Information  Higher Volume of Information

Grade 4 4 5 0.00 70.00 30.00

Grade 5 4 5 0.00 90.00 10.00

Grade 6 4 5 0.00 100.00 0.00

Grade 7 4 5 0.00 100.00 0.00

Grade 8 4 5 0.00 100.00 0.00

Grade 9 4 5 15.00 85.00 0.00

Grade 10 4 5 0.00 100.00 0.00

Table 36. Percent of Social Studies Tasks at Lower, Same, or Higher Levels of Volume of Information


Grade  N Raters  N Tasks  Lower Volume of Information  Same Volume of Information  Higher Volume of Information

Civics 4 48 0.00 100.00 0.00

US History 4 48 0.52 98.44 1.04


Tables 37 and 38 show the average panelist agreement with the assigned task complexity ratings for Vocabulary. The number of tasks with each rating was averaged across panelists and is presented as a percentage.

The majority of grades and subjects met the expectation that panelists would rate at least 90% of the tasks at the same Vocabulary level as assigned to the task. In civics and US history, panelists rated more than 90% of the tasks at the same Vocabulary level as assigned to the task. All of the writing grades except grades 4 and 9 (75% each) exceeded the 90% agreement expectation.

Table 37. Percent of Writing Tasks at Lower, Same, or Higher Levels of Vocabulary


Grade  N Raters  N Tasks  Lower Vocabulary  Same Vocabulary  Higher Vocabulary

Grade 4 4 5 0.00 75.00 25.00

Grade 5 4 5 0.00 100.00 0.00

Grade 6 4 5 0.00 100.00 0.00

Grade 7 4 5 0.00 100.00 0.00

Grade 8 4 5 0.00 100.00 0.00

Grade 9 4 5 10.00 75.00 15.00

Grade 10 4 5 0.00 100.00 0.00

Table 38. Percent of Social Studies Tasks at Lower, Same, or Higher Levels of Vocabulary


Grade  N Raters  N Tasks  Lower Vocabulary  Same Vocabulary  Higher Vocabulary

Civics 4 48 0.52 99.48 0.00

US History 4 48 3.65 95.83 0.52

Tables 39 and 40 show the average panelist agreement with the assigned task complexity ratings for Context. The number of tasks with each rating was averaged across panelists and is presented as a percentage.

As seen in Tables 39 and 40, tasks in civics, US history, and writing grades 5-10 exceeded the expectation that panelists would rate at least 90% of the tasks at the same Context level as assigned to the task. In grade 4 writing, 75% of the tasks were rated as having the same Context level, while the other 25% were rated as written to a higher Context level.


Table 39. Percent of Writing Tasks at Lower, Same, or Higher Levels of Context


Grade  N Raters  N Tasks  Lower Context  Same Context  Higher Context

Grade 4 4 5 0.00 75.00 25.00

Grade 5 4 5 0.00 100.00 0.00

Grade 6 4 5 0.00 100.00 0.00

Grade 7 4 5 0.00 100.00 0.00

Grade 8 4 5 0.00 100.00 0.00

Grade 9 4 5 0.00 100.00 0.00

Grade 10 4 5 0.00 100.00 0.00

Table 40. Percent of Social Studies Tasks at Lower, Same, or Higher Levels of Context


Grade  N Raters  N Tasks  Lower Context  Same Context  Higher Context

Civics 4 48 0.00 99.48 0.52

US History 4 48 3.13 95.31 1.56

Criterion 4: Content Differentiation

This criterion focuses on whether the content increases in depth, breadth, and complexity at higher grade-levels for FSAA-PT tasks. For the writing prompt portion of the ELA assessment, the comparison was made across grades. However, we modified this criterion to focus instead on the three tasks within each item set for civics and US history. The FSAA-PT is structured such that each item contains three tasks which are written in increasing complexity on at least one of the complexity levels (DOK, Volume of Information, Vocabulary, or Context). In the civics and US history panel groups, panelists were asked to review the item sets and rate the amount of content differentiation evident between the tasks. This comparison required a more global judgment of each set of item tasks. Tables 41 and 42 show consensus ratings among panelists across the categories using the rating scheme of clear, partial, limited, or no differentiation. Although no minimum level of differentiation has been established, the LAL method suggests that all ratings of differentiation (clear, partial, or limited) are acceptable across grade-levels for each category so this same principle will be applied in evaluating the three tasks within an item set.

For writing, the content differentiation was determined by looking at the prompts across grade levels. As seen in Table 41, panelists found clear task differentiation for the writing prompts in grades 9-10. However, only partial, limited, or no content differentiation was found for the writing prompts in grades 4-5 and 6-8. For writing grades 6 and 7, panelists found no content differentiation in the FSAA-PT tasks.


Table 41. Consensus Content Differentiation Across Grades – Writing Prompt Portion of the ELA FSAA-PT

Grade Category Consensus Rating Consensus Rating Support

Grade 4-5

Broader Partial

Grade five introduces "reasons" versus grade four "details" in the teacher script (Q1 T1). However, this does not impact the breadth or depth of the question. Otherwise processes remain the same, "link" and "link".

Deeper Partial

Grade four Q1 T4 is simpler than a comparable question and grade five demonstrating a requirement of deeper mastery. The reading passage in grade five is longer, with more distractors and complexity. Sentences are longer in Q1 T5 grade five versus four.

Prerequisite Limited Grade four Q1 T4 is simpler than a comparable question and grade five thus building upon a skill.

New No differentiation

Vocabulary is familiar in both passages, comparable processes used, some complexity is added.

Identical No Different reading passages, some additional complexity in wording, length of responses.

Grade 6-8

Broader No differentiation

each item sticks with one source text for all grades; a simple 2 - 3 paragraph with a graphic is used at each grade level; there's no additional text, graphs, etc. that are being included as the grades increase - don't ask for making connections

Deeper Limited

in 6th/7th grade there is textual scaffolding in the writing prompt seen through the tasks but in 8th grade the scaffolding is done through the teacher presentation of the tasks

Prerequisite Limited 6th/7th grade provide the foundation for 8th grade but 6th & 7th grade do not build on each other

New No differentiation

same concept is presented across all grades

Identical Yes

6th & 7th are identical regarding the presentation of tasks, the format of tasks, and same concepts of assessment; however, 8th grade is not identical to 6th/7th because there is a different presentation of tasks and the format of tasks is different

Grade 9-10

Broader Clear

Going from informative to argumentative writing provides progression; the task in 9th grade only involves previewing and summarizing, but in 10th grade the task includes developing an argument.

Deeper Clear

Grade 10 task 5 states that a significant support for idea development is needed, which is deeper than the requirements of task 5 in grade 9. Instead of connecting different ideas, you are finding relationships between ideas.

Prerequisite Clear Informative writing is a prerequisite for argumentative writing; you have to have basic skills down before you create an argument.

New Clear Increased and new skills are needed to create an argument.

Identical No Clear content differentiation is present.


For civics and US history, the consensus content differentiation was determined by looking at the set of tasks associated with each item. Table 42 shows that panelists determined clear task differentiation present for the set of tasks for each item.

Table 42. Consensus Content Differentiation – Social Studies FSAA-PT Item Set

Grade Category Consensus Rating Consensus Rating Support

Civics

Broader Clear The higher tasks clearly reflect a broader application of the target skill.

Deeper Clear The higher tasks clearly reflect a deeper mastery of the target skill.

Prerequisite Clear The higher tasks clearly reflect a target prerequisite for mastery of the AP.

New Clear Task 1 plus Task 2 plus task 3 clearly reflects a new skill.

Identical No

Each level of the task clearly builds to the performance level of the AP. Without having the tasks available the understanding of the Access points is not as clear. Specifically, the verb interpretation in reference to the performance task.

US History

Broader Clear Overall it does get broader. Independent are the most broad.

Deeper Clear The first one is general and then more specific. More rigorous

Prerequisite Clear Participatory tells the answer, then later ones require the first ones.

New Clear Have to know more than what is in the first one to get the later one right. They build to the last one.

Identical No Very definitely different

Criterion 5: Achievement

The fifth LAL criterion pertains to inferences that can be made about a student based on their FSAA-PT score. The alternate assessment should allow students with disabilities to demonstrate academic skills or knowledge acquired from their coursework on the assessment, free from teacher or program influence. To determine the extent to which the FSAA-PT enables students to demonstrate this learning, panelists evaluated the scoring rubrics, scoring guidelines, assessment administration manual, FSAA-PT tasks, and FSAA-PT Test Specifications. Panelists worked together to form consensus ratings regarding the level of inference (high, low, or no evidence) of student learning provided by the alternate assessment system. The ratings were made across several learning dimensions, which are described below (adapted from Flowers et al., 2007):

Level of accuracy – extent to which scoring makes clear distinctions in student responses (minimal leeway for teacher interpretation of student response).

Level of independence – extent to which student performance is based on independent response without teacher supports.


New learning – extent to which evidence of new learning is demonstrable based on use of baseline or pretest OR clear content differentiation between grade tests.

Generalization across people and settings – extent to which students demonstrate knowledge regardless of people (test administrator) or assessment setting.

Generalization across materials and activities – extent to which students demonstrate knowledge across different types of materials (i.e., objects) or activities.

Standard setting – extent to which achievement standards are distinct and based on demonstration of independent student performance.

Program quality indicators – extent to which the inclusion of program characteristics (e.g., quality of the task, completeness or accuracy of IEP) are not part of a student’s score.

Ratings of ‘no inference’ suggest an assessment may not allow students to adequately demonstrate their knowledge, while ratings of ‘high inference’ indicate that students’ scores clearly reflect their level of learning. It is reasonable to expect some inference (low or high) on at least six of the seven dimensions and high inference on at least four of those dimensions. Tables 43 and 44 contain the group consensus ratings on the degree of inference on student learning evident in the FSAA-PT, along with rationales for their ratings.
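
The expectation stated above (some inference on at least six of the seven dimensions, and high inference on at least four) can be expressed as a simple check. The sketch below applies it to the consensus ratings for writing grades 4 and 9 as listed in Table 43; the function name is illustrative, and treating any rating other than "High" or "Low" as providing no inference is an assumption.

```python
def meets_inference_expectation(ratings):
    """Check the 'six of seven with some inference, four high' expectation."""
    some_inference = sum(r in ("High", "Low") for r in ratings)
    high_inference = sum(r == "High" for r in ratings)
    return some_inference >= 6 and high_inference >= 4

# Consensus ratings in dimension order, transcribed from Table 43.
grade_4 = ["High", "No", "Low", "High", "Low", "Low", "High"]
grade_9 = ["High"] * 7

print(meets_inference_expectation(grade_4))  # six with some inference, only three high
print(meets_inference_expectation(grade_9))  # all seven dimensions rated high
```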

As seen in Table 43, the majority of dimensions for the writing grades were rated as having some level of inference. The panelists’ main concerns were that some tasks (where hand-over-hand teacher guidance is allowed) may provide the lowest level of inference; that some task completion may only be understood by a teacher very familiar with the student; and that some tasks may be more material-dependent than others. For writing grades 4 and 5, while panelists rated most dimensions as having some level of inference, only 3 out of 7 dimensions had a high level of inference.

Table 43. Consensus Student Learning – Writing Prompt Portion of the ELA FSAA-PT

Grade Dimension Consensus Rating Consensus Rating Support

Grade 4

Level of Accuracy High Prompt two may have less inference.

Level of Independence No Hand over hand is a current practice per administration manual.

New Learning Low Limited differentiation between grades four and five, no pretest.

Generalizations Across People and Settings

High A person with student familiarity would be able to administer test.

Generalizations Across Materials and Activities

Low Some items are material dependent.


Standard Setting Low Any student has the ability to respond correctly by chance.

Program Quality Indicators

High Scoring is straight forward.

Grade 5

Level of Accuracy High Prompt two may be more subjective to teacher.

Level of Independence

No Hand over hand is a current practice per administration manual.

New Learning Low Limited differentiation between grades four and five, no pretest.

Generalizations Across People and Settings

High A person with student familiarity would be able to administer test.

Generalizations Across Materials and Activities

Low Some items are material dependent.

Standard Setting Low Any student has the ability to respond correctly by chance.

Program Quality Indicators

High Scoring is straight forward.

Grade 6

Level of Accuracy Low There are some students where teacher interaction is needed to obtain a response (i.e., hand drop, sign language)


Level of Independence

Low

Across all kids, it depends on the level of support and level of guidance the student requires as to the level of inference - in general it is low

New Learning No The writing tasks are similar across grade levels.

Generalizations Across People and Settings

High For most tasks this is true for students.

Generalizations Across Materials and Activities

High

Every standard is tested using a scaffolding approach, building up the full standard

Standard Setting High need more comprehension as mastery is not likely to be shown by chance especially for prompt 2

Program Quality Indicators

High student's score is indicative of what they can do

Grade 7


Level of Independence Low


New Learning No The writing tasks are similar across grade levels.


Generalizations Across People and Settings

High For most tasks this is true for students.

Generalizations Across Materials and Activities

High Every standard is tested using a scaffolding approach, building up the full standard




Grade 8


Level of Independence Low


New Learning Low The level of student independence is related to student guidance


Generalizations Across People and Settings

High For most tasks this is true for students.

Generalizations Across Materials and Activities

High

Every standard is tested using a scaffolding approach, building up the full standard




Grade 9

Level of Accuracy High Student has to get items correct to receive credit.


Level of Independence

High

mostly independent responses are acceptable. Dependent responses are occasionally acceptable when teacher prompt is specified.

New Learning High there is task differentiation across grades; the tasks increase in breadth, depth, and are often prerequisites for higher grade.

Generalizations Across People and Settings

High

students are expected to demonstrate knowledge across different testers who are familiar with the student.

Generalizations Across Materials and Activities

High

students are expected to demonstrate knowledge across materials and activities.

Standard Setting High students are expected to demonstrate high level of knowledge to be able to pass.

Program Quality Indicators

High Program quality indicators are not used according to the participant; only the student knowledge influences the score.

Grade 10

Level of Accuracy High Student has to get items correct to receive credit.


Level of Independence

High

mostly independent responses are acceptable. Dependent responses are occasionally acceptable when teacher prompt is specified.

New Learning High there is task differentiation across grades; the tasks increase in breadth, depth, and are often prerequisites for higher grade.

Generalizations Across People and Settings

High

students are expected to demonstrate knowledge across different testers who are familiar with the student.

Generalizations Across Materials and Activities

High

students are expected to demonstrate knowledge across materials and activities.

Standard Setting High students are expected to demonstrate high level of knowledge to be able to pass.

Program Quality Indicators

High Program quality indicators are not used according to the participant; only the student knowledge influences the score.

The civics panelists did not view several dimensions as having a high level of inference; their comments reflect concern over a lack of generalization across people and materials (Table 44). Panelists in the US history group viewed all dimensions as having some level of inference. They believed that the US history EOC score provides information about what a student knows and can do independently of the teacher or the assessment system.

Table 44. Consensus Student Learning – Social Studies FSAA-PT

Grade Dimension Consensus Rating Consensus Rating Support

Civics

Level of Accuracy High There is one answer they get credit for.

Level of Independence High Hand over hand maybe challenging to interpret.

New Learning No County by county each has their own EOC prerequisites, but not state wide


Generalizations Across People and Settings

No Test admin dictates training and familiarity with the student.

Generalizations Across Materials and Activities

No The tasks are specific to the access point and then tied to the specific standard.

Standard Setting NA Not established.

Program Quality Indicators

High Program quality indicators are reflective of the score not outside indicators.





US History

Level of Accuracy Low

They try to make sure that everyone gives it the same way, but in reality there is some room for inference by the administrator. No way to fix it. In a perfect world we would have said high student inference. A lot of room for human error.

Level of Independence High You only get down to a 50/50 chance. Used to be the middle because you would scaffold twice.

New Learning High

Some questions are prior knowledge, but some are from this grade level. 25% is from middle school but the rest are grade level differentiated


Generalizations Across People and Settings

Low

Because the students are hard to understand, so someone else might not understand them. Someone else giving the test would affect the score. Have to be familiar with the student. You would need to have in the IEP that the person needs to be someone who understands the test. Some would be fine but some students would not.

Generalizations Across Materials and Activities

High

You can use other materials and other tasks could get at the same standard equally well.

Standard Setting High In the independent level items they are held to a high standard. That standard might even be too high for this group of students.

Program Quality Indicators

High No connection between an IEP and a score.

Criterion 6: Performance Accuracy

Criterion 6 is intended to evaluate the degree of accessibility of the FSAA-PT for all student groups who take it. Reduced access to the tasks would decrease accurate measurement of students’ skills. Panelists, individually, rated tasks on whether accommodations or supports can be provided for different types of students without substantially altering the target content.

Tables 45 and 46 display the mean percent of tasks rated as accessible to all students. At least 90% of the tasks should be rated as accessible for the whole assessment to be considered accessible. The ratings for all grades and subjects except writing grade 4 indicate that panelists found 100% of the tasks to be accessible to different disability groups. For writing grade 4, they found only 75% of the tasks to be accessible to different disability groups. The panelists commented that some tasks may not translate well to ASL; that students with disabilities may not be able to relate to some content; and that some items may not be accessible to visually impaired students.

Table 45. Percent of FSAA-PT Tasks as Accessible to Different Disability Groups – Writing

Grade  N Raters  N Tasks  % Yes  % No

Grade 4 4 6 75.00 25.00

Grade 5 4 6 100.00 0.00

Grade 6 4 6 100.00 0.00

Grade 7 4 6 100.00 0.00

Grade 8 4 6 100.00 0.00

Grade 9 4 6 100.00 0.00

Grade 10 4 6 100.00 0.00

Table 46. Percent of FSAA-PT Tasks as Accessible to Different Disability Groups – Social Studies

Grade  N Raters  N Tasks  % Yes  % No

Civics 4 48 100.00 0.00

US History 4 48 100.00 0.00

The second rating required panelists to evaluate whether tasks could be modified or supports offered without altering the meaning or purpose of the task. A common approach to administering an alternate assessment is for teachers to offer accommodations based on student IEPs or supports (e.g., assistive technology; scaffolding) as appropriate for a given student.

Tables 47 and 48 include the mean percent of tasks panelists found amenable to these types of changes. Panelists found the majority of tasks could be altered appropriately for individual students.

Table 47. Percent of FSAA-PT Tasks as Amenable to Accommodations or Supports – Writing

Grade  N Raters  N Tasks  % Yes  % No


Grade 4 4 6 100.00% 0.00%

Grade 5 4 6 100.00% 0.00%

Grade 6 4 6 100.00% 0.00%

Grade 7 4 6 100.00% 0.00%

Grade 8 4 6 100.00% 0.00%

Grade 9 4 6 95.83% 4.17%

Grade 10 4 6 100.00% 0.00%


Table 48. Percent of FSAA-PT Tasks as Amenable to Accommodations or Supports – Social Studies

Grade  N Raters  N Tasks(a)  % Yes  % No

Civics 4 47-48 100.00% 0.00%

US History 4 47-48 100.00% 0.00%

(a) A range of values denotes at least one panelist did not provide a rating on all tasks.

The ratings above addressed, across all disabilities in general, whether the FSAA-PT is accessible and amenable to accommodations. To further evaluate the FSAA-PT on accessibility and accommodations, panelists were asked as a group to provide a consensus rating on four questions across nine disability groups. This evaluation allowed panelists to judge whether students with certain disabilities may have difficulty accessing the FSAA-PT or whether accommodations are difficult to provide. Table 49 shows that panelists believed there are sufficient provisions in the assessment to capture responses for students without clear, intentional communication in civics and writing grades 4-5 and 9-10, but not in US history or writing grades 6-8. Panelists felt that accommodations, modifications, and supports were defined sufficiently to maintain standardized administration for all grades and subjects except for writing grades 6-8.

Table 49. Consensus Whole Test Barriers to Demonstrating Student Knowledge

Question Yes No Consensus Comments

Are there provisions in the assessment to capture responses for students without clear, intentional communication (even at non-symbolic level)?

Yes: Civics. No: US History. If in the classroom you never know if you are getting to them, then the test doesn't have anything to help with that.

Yes: Writing 4-5. No: Writing 6-8. Writing Prompt 2 is too in-depth to capture an answer for those students who do not have clear, intentional communication

Yes: Writing 9-10.

Are accommodations, modifications, and supports defined sufficiently to maintain standardized administration?

Yes: Civics. No: Writing 6-8. ASL script is not standardized and if it becomes standardized, needs to allow flexibility to take into account the sign vocabulary of the student

Yes: US History, Writing 4-5, Writing 9-10.

Table 50 indicates that, overall, panelists felt that the FSAA-PT is accessible to many different disability groups. The main issue panelists found was with writing, across all grades, specifically regarding deaf/blind students. Panelists stated that for students who are deaf, deaf/blind, or communicate nonverbally with pictures, the accommodations would result in the meaning being changed.


Table 50. Consensus Whole Test Barriers to Demonstrating Student Knowledge for Certain Disability Groups

Question Disability Group Consensus Comments

Disability groups (column headings): Visually Impaired/Legally Blind; Hearing Impaired; Deaf/Blind; Nonverbal – Printed Words; Nonverbal – Pictures; Nonverbal – Manual Signs; Nonverbal – Eye Gaze; Verbal but no use of hands; Communicates with objects or by indicating yes/no

Does the FSAA-PT contain provisions for students with these characteristics?

Writing gr 9-10

Writing gr 6-8

The instructions to teacher do not state specifically which items will be brailleable. Unclear what accommodations will be made for deaf/blind students.

Student can do the FSAA-PT tasks as designed with flexibility built into tasks?

Writing gr 9-10

Writing gr 6-8

The instructions to teacher do not state specifically which items will be brailleable. Unclear what accommodations will be made for deaf/blind students.

Student can do the FSAA-PT tasks with accommodations (no change to meaning)?

Writing gr 4-5

Writing gr 4-5, Writing gr 9-10

Writing gr 6-8

For students who routinely communicate utilizing ASL concepts rather than exact English there could be changes to meaning. It is not clear what accommodations are provided to deaf/blind students.

Student can do the FSAA-PT tasks with modifications/supports (may change meaning)?

Writing gr 9-10

It is not clear what accommodations may be provided to deaf/blind students. We need to know exactly what modifications are allowed for deaf/blind students to be able to evaluate this question.


Chapter 5: Summary and Recommendations

In this section, we summarize the results of the alignment study and provide recommendations to strengthen portions of the Florida alternate assessment system.

Access Point to Standards Alignment Summary

For this alignment evaluation, panelists reviewed APs, associated with the FSAA-PT blueprints, for the writing prompt section of the ELA assessment, civics, and US history in multiple ways. First, they evaluated the content centrality (Criterion 2) between the blueprint APs and the corresponding LAFS and NGSSS for Social Studies. Second, panelists evaluated the progression of content (Criterion 4) from one grade to the next only for the blueprint identified writing APs. Lastly, panelists rated the appropriateness and accessibility (Criteria 1 and 6) of the AP content for this population of students.

The rules for the LAL criteria applied to the alignment between blueprint identified APs and the LAFS and NGSSS for Social Studies are as follows:

Criterion 1: Age Appropriateness (individual panelist rating) - 90% or more of the APs were rated as ‘adapted’ or ‘neutral’

Criterion 2a: Content Centrality (individual panelist rating) - 90% or more of the APs were linked to the LAFS or NGSSS for Social Studies

Criterion 2b: Performance Centrality (individual panelist rating) - 90% or more of the APs were comparable in complexity to the LAFS or NGSSS for Social Studies


Criterion 6: Performance Accuracy (individual panelist rating) - 90% or more of the APs were accessible to different disability groups
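
The "90% or more" rules listed above share one arithmetic pattern, which can be sketched in a few lines of Python. This is only an illustration, not the procedure HumRRO used: the function names, the label 'inappropriate', and the sample ratings are hypothetical, and only the labels 'adapted' and 'neutral' and the 90% cut-off come from the criteria themselves.

```python
# Sketch of the "90% or more" decision rule described above.
# The rating labels 'adapted' and 'neutral' come from Criterion 1;
# 'inappropriate' and the sample ratings are made up for illustration.

def count_acceptable(ratings, acceptable):
    """Return (number of ratings in the acceptable set, total ratings)."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one rating is required")
    hits = sum(1 for rating in ratings if rating in acceptable)
    return hits, len(ratings)


def meets_threshold(ratings, acceptable, threshold_pct=90):
    """True when at least threshold_pct percent of ratings are acceptable."""
    hits, total = count_acceptable(ratings, acceptable)
    # Integer comparison avoids floating-point rounding at the boundary.
    return hits * 100 >= threshold_pct * total


# Hypothetical panelist ratings: 9 of 10 fall in an acceptable category.
sample = ["adapted"] * 6 + ["neutral"] * 3 + ["inappropriate"]
print(meets_threshold(sample, {"adapted", "neutral"}))
```

The comparison is done on integer counts so that a result sitting exactly at 90% is treated as meeting the criterion.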

Table 51 provides summary conclusions on the alignment of the blueprint identified APs to their respective LAFS and NGSSS for Social Studies. As a reminder, only the writing APs and LAFS are of interest in this alignment study. For non-writing APs and LAFS, refer to the Nemeth et al. (2016 No. 029) report. If APs met the criterion, a green highlighted box containing a check mark is assigned. For results falling slightly below a criterion, a yellow highlighted box containing the criterion results is assigned. Finally, a red highlighted box contains results that fell well below the criterion.

As illustrated in Table 51, the blueprint identified APs generally exhibited high content linkage with the grade-level standards. Specifically, panelists rated the APs across all grades and subjects as age appropriate (Criterion 1) and found them to assess the same content and performance expectations as the grade-level standards (Criterion 2). Panelists also felt that the blueprint identified APs were accessible to different disability groups (Criterion 6).

Table 51. Percent of Grade-Level APs Which Met Each LAL Criterion

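Columns (source tables in parentheses):

Criterion 1, Age Appropriate: Is the content of the APs age appropriate? (Tables 10-11)
Criterion 2, Content Centrality: Does the AP content link with the associated LAFS or NGSSS? (Tables 12-13)
Criterion 2, Performance Centrality: Are the APs comparable in complexity to the LAFS & NGSSS? (Tables 16-17)
Criterion 4, Content Differentiation: Does content differ across grade-levels within a grade span?12 (Table 18)
Criterion 6, Performance Accuracy: Are barriers to demonstrating student knowledge minimized? (Tables 19-20)

Grade/Subject | Age Appropriate | Content Centrality | Performance Centrality | Content Differentiation | Performance Accuracy
W4 | ✓ | ✓ | ✓ | 0 out of 5 | ✓
W5 | ✓ | ✓ | ✓ | 0 out of 5 | ✓
W6 | ✓ | ✓ | ✓ | 2 out of 5 | ✓
W7 | ✓ | ✓ | ✓ | 2 out of 5 | ✓
W8 | ✓ | ✓ | ✓ | 2 out of 5 | ✓
W9 | ✓ | ✓ | ✓ | NA | ✓
W10 | ✓ | ✓ | ✓ | NA | ✓
Civ | ✓ | ✓ | ✓ | NA | ✓
USH | ✓ | ✓ | ✓ | NA | ✓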

12 For Writing grades 4-8, a comparison between this study and the 2016 alignment study (see Nemeth, et al. [2016 No. 029]) reveals different results. In the 2016 alignment study, panelists evaluated all the blueprint identified APs for Language Arts associated with the ELA FSAA-PT. However, the current alignment study required panelists to review the blueprint identified APs for Language Arts associated with only the writing prompt section of the ELA FSAA-PT.


Criterion 4 (content differentiation) at the grade level was assessed only for writing grades 4-5 and 6-8. The civics assessment and US history assessment were not intended to have content differentiation between grades. Similarly, writing grades 9-10 APs were the same between these two grades. Content differentiation appears to be an area in need of improvement. For writing grades 4-5, panelists found content differentiation to be low in all areas (breadth, depth, prerequisite, new learning), and consequently rated the APs to be identical between the grades. For writing grades 6-8, the panelists found no differentiation in breadth between any of the three grades, low differentiation in new learning between grades 7 and 8, and partial differentiation in depth and prerequisite. As a result, they concluded that one of the APs (AP 2.4) is identical across the grades.

FSAA-PT Alignment Summary

Table 52 provides summary conclusions on the alignment of the FSAA-PT writing prompt section of the ELA, civics, and US history assessments to the LAFS and NGSSS for Social Studies APs, respectively. If tasks met the criterion, a green highlighted box containing a check mark is assigned. For results falling slightly below a criterion, a yellow highlighted box containing the criterion results is assigned. Finally, a red highlighted box contains results that fell well below the criterion.

The rules for the LAL and HumRRO criteria applied to the alignment between FSAA-PT tasks and APs are as follows:

Criterion 1: Age Appropriateness (individual panelist rating) - 90% or more of the tasks were rated as ‘adapted’ or ‘neutral’

Criterion 2b: Performance Centrality (individual panelist rating) - 90% or more of the tasks were rated as ‘some’ or ‘all’

Criterion 3a: Content Representation (individual panelist rating) - 90% or more of the tasks were rated as ‘partial’ or ‘fully’ aligned

Criterion 3b: Category Representation (based on individual panelist rating) - Tasks match the FSAA-PT Test Specifications targets

Criterion 3c: DOK Representation (individual panelist rating) - 50% or more of the item set task 3 were at the same or higher DOK level as the AP - 90% or more of the assigned complexity ratings are confirmed by panelists for DOK, Volume of Information, Vocabulary, and Context


Criterion 5: Achievement (consensus group rating) - 6 of the 7 dimensions have some level of inference, either low or high - At least 4 dimensions have a high level of inference

Criterion 6: Performance Accuracy (individual panelist rating) - 90% or more of the tasks were accessible to different disability groups - 90% or more of the tasks were amenable to accommodations or supports
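
Unlike the percentage-based criteria, Criterion 5 is a two-part count over seven dimensions. A minimal Python sketch of that rule follows. It rests on several assumptions that the report does not state outright: that "6 of the 7" means at least six, that both parts must hold, and that an entry such as "6 out of 7; 3" reads as six dimensions with some inference, three of them high. The level labels and function names are hypothetical.

```python
# Sketch of the two-part Criterion 5 rule. Assumptions: "6 of the 7" means
# at least 6, both parts must hold, and the labels 'none'/'low'/'high' are
# hypothetical encodings of the panel's consensus ratings.

def summarize_inference(levels):
    """Return (dimensions with some inference, dimensions with high inference)."""
    allowed = {"none", "low", "high"}
    unknown = set(levels) - allowed
    if unknown:
        raise ValueError(f"unexpected levels: {sorted(unknown)}")
    some = sum(1 for level in levels if level in ("low", "high"))
    high = sum(1 for level in levels if level == "high")
    return some, high


def meets_criterion_5(levels, min_some=6, min_high=4):
    """True when enough dimensions show some inference and enough show high."""
    some, high = summarize_inference(levels)
    return some >= min_some and high >= min_high


# Six dimensions with some inference but only three rated high,
# which corresponds to a "6 out of 7; 3" style entry.
example = ["high", "high", "high", "low", "low", "low", "none"]
print(summarize_inference(example), meets_criterion_5(example))
```

Under these assumptions the example satisfies the first part of the rule but not the second, so it would not meet the criterion.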

Table 52. Percent of Grade-Level Tasks Which Met Each LAL Criterion

Columns (source tables in parentheses):

Criterion 1, Age Appropriate: Is the content of the tasks age appropriate? (Tables 21-22)
Criterion 2, Performance Centrality: Is the item set task comparable in complexity to the AP? (Tables 23-24)
Criterion 3, Content Coverage, Item Alignment: Are tasks fully aligned with APs? (Tables 25-26)
Criterion 3, Content Coverage, Represent Intended Categories: Do tasks adequately represent reporting categories? (Tables 27-28)
Criterion 3, Content Coverage, Task Complexity: Do tasks reflect the range of DOK in the APs?13 (Tables 33-34)
Criterion 3, Content Coverage, Task Complexity: Do panelists agree with DOK? (Tables 29-30)
Criterion 3, Content Coverage, Task Complexity: Do panelists agree with Volume of Information? (Tables 35-36)
Criterion 3, Content Coverage, Task Complexity: Do panelists agree with Vocabulary? (Tables 37-38)
Criterion 3, Content Coverage, Task Complexity: Do panelists agree with Context? (Tables 39-40)
Criterion 4, Content Differentiation: Writing: Do prompts increase in complexity across grade levels?14 Civics & USH: Do tasks within an item set increase in complexity? (Tables 41-42)
Criterion 5, Achievement: Student achievement demonstrates learning. (Tables 43-44)
Criterion 6, Performance Accuracy: Are tasks accessible to different disability groups? (Tables 45-46)
Criterion 6, Performance Accuracy: Are tasks amenable to accommodations or supports? (Tables 47-48)

W4 17%

80% 70% 75% 75% 2 out of 5

6 out of 7; 3

75%

W5 13%

6 out of 7;

3

W6 88% 17%

0 out of 5

W7 88% 0%

W8 0%

W9 21% 65% 85% 75%

W10 33%

13 For Writing grades 4-10, a comparison between this study and the 2016 alignment study (see Nemeth, et al. [2016 No. 029]) reveals different results. In the 2016 alignment study, panelists evaluated the field test writing prompts which were still under development. Also, the panelists participating last year and this year were not the same educators.

14 In the 2016 alignment study, panelists evaluated all tasks and prompts on the ELA FSAA-PT. However, the current alignment study required panelists to review only the writing prompts of the ELA FSAA-PT.

Civ 3 out of 7; 3

USH


In general, the civics and US history FSAA-PT exhibited good overall alignment with the fewest areas for improvement. The writing prompts associated with the ELA FSAA-PT showed more areas for improvement. Panelists found the APs and assessment tasks for all subjects and grades to be age appropriate (Criterion 1). They determined that for the most part, the assessment tasks maintain fidelity with the performance expectations in the APs for civics and US history, and for writing grades 4-5, 8, and 9-10. For writing grades 6 and 7, 88% of tasks were found to call for comparable performance levels as the standards (Criterion 2).

There were mixed results on Criterion 3. Panelists found the tasks for each grade and subject to be fully aligned with the standards, and the percent of aligned tasks matches test specifications. However, panelists found the task cognitive complexity to be substantially lower than the AP complexity in writing for all grades. In civics and US history, the cognitive complexity of tasks was found to match the AP cognitive complexity. For the most part, panelists agreed with the assigned DOK. There was some disagreement in writing grades 4 and 9, but the overall cognitive complexity assigned by the panelists was either the same or higher. For writing grade 4, panelists agreed with 80% of assigned DOK levels, rating 10% of tasks as requiring a higher DOK, and 10% of tasks as requiring a lower DOK. For writing grade 9, panelists agreed with only 65% of assigned DOK levels, rating other tasks as lower (25%) and higher (10%). Similarly, panelists agreed with most of Volume of Information levels, except for writing grades 4 and 9. They agreed with 70% of grade 4 writing tasks, and rated the other 30% as having a higher Volume of Information. For grade 9, on the other hand, panelists agreed with 85% of the tasks, and rated the other 15% as having a lower Volume of Information. For the most part, panelists agreed with the Vocabulary rating, with the exception of writing grade 4 and grade 9, where they agreed with 75% of the tasks. For grade 4, the other 15% of the tasks were rated as having a higher Vocabulary level, and for grade 9, 10% of the tasks were rated as having a lower Vocabulary level while 15% as having a higher Vocabulary level. Panelists agreed with the rating of Context in most cases, with the exception of grade 4. In this case, they agreed with 75% of the tasks, and rated the Context of the other 25% of the tasks as higher.
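
The agree/higher/lower breakdowns reported above can be tabulated from pairs of assigned and panelist-rated levels. The Python sketch below shows one way to do this. The function name and the input pairs are invented for illustration and are not the study's data; they are chosen only so that the result matches the 80% / 10% / 10% split reported for writing grade 4 DOK.

```python
from collections import Counter

# Sketch: tabulate how often a panelist rating agrees with, exceeds, or falls
# below the assigned complexity level. Input pairs are (assigned, panelist)
# integer levels; the sample pairs are invented, not taken from the study.

def agreement_breakdown(pairs):
    """Return percentages of ratings that agree, are higher, or are lower."""
    counts = Counter()
    for assigned, rated in pairs:
        if rated == assigned:
            counts["agree"] += 1
        elif rated > assigned:
            counts["higher"] += 1
        else:
            counts["lower"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one rating pair is required")
    return {key: 100 * counts[key] / total for key in ("agree", "higher", "lower")}


# Ten invented rating pairs: eight agree, one is higher, one is lower.
pairs = [(2, 2)] * 8 + [(2, 3)] + [(2, 1)]
print(agreement_breakdown(pairs))
```

The same function applies unchanged to Volume of Information, Vocabulary, and Context, provided their levels are encoded as ordered integers.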

Criterion 4 was evaluated differently for the writing and social studies assessments. Since the writing tasks, unlike the tasks for civics and US history, were not ordered from easiest to hardest, these tasks were evaluated for content differentiation in the following way: Is there a progression in breadth, depth, prerequisite, new learning from lower grade prompt 1 and prompt 2 to higher grade prompt 1 and prompt 2? Content differentiation ratings at the prompt level agree with the overall AP content differentiation ratings for these grades. The progression of prompts for grades 4-5 writing was judged to have no differentiation in new learning, limited prerequisite differentiation, and partial differentiation in the breadth and depth. As a result, the panelists concluded that content differentiation was limited for these grades. However, for writing grades 6-8, panelists evaluated depth as limited between grades 6-7 and 8, and prerequisite as limited between grades 6 and 7. They stated that breadth and new learning were absent across all three grades, and consequently determined that the tasks were identical between grades 6 and 7, but not between grade 8 and the other two grades. For writing grades 9-10, panelists found clear content differentiation. For civics and US history, panelists were asked to evaluate whether the content differentiation existed from task 1 to task 3 in an item set. Overall, panelists found clear content differentiation for civics and US history.

Criterion 5 is an evaluation of whether the assessment system, in general, provides student demonstration of learning. Here, as well, some dimensions were rated by panelists as providing ‘no inference’ of student learning. For example, in civics panelists stated that little inference can be made about the presence of new learning, and the assessment results may be challenging to generalize across people and settings, and materials and activities. In grade 4-5 writing, the assessment was seen to include tasks where hand over hand teacher guidance may be reducing the level of inference about student knowledge; therefore, the level of independence was judged to provide no inference about student knowledge. For the most part, across the subjects, panelists felt the FSAA-PT provides an assessment in which student learning can be demonstrated.

One thing we found in the course of this study is that even after we discussed with panelists the allowable accommodations and modifications as described in the test administration manual, panelists tend to think back to how these and similar assessments are being administered in the field. In some cases, for example, if a teacher is unable to elicit a response from a student by the means specified in the test administration manual, they are going to implement some solutions that are not explicitly prohibited, but also not explicitly endorsed in the manual. While the manual does not explicitly endorse hand over hand assistance, the participants observed it being implemented in the field, and mentioned it in the discussion. We value these statements by the teachers, even though they diverge from the test administration manual instructions, since they come from their expertise. To make ratings more consistent, it may be helpful to state more explicitly in the test administration manual not only which accommodations/modifications are allowed, but also which ones are prohibited.

For Criterion 6, the ratings provided by panelists for all grades and subjects except for writing grade 4 found 100% of the tasks to be accessible to different disability groups. For writing grade 4, only 75% of the tasks were rated as accessible to different disability groups. Panelists voiced concerns about the tasks translating to ASL, and students with visual impairments having trouble with some tasks.

Recommendations15

HumRRO makes the following suggestions to strengthen the alignment between the components of the Florida assessment system:

Review the cognitive complexity of writing tasks. Tasks should assess APs at the same or higher complexity level. This ensures that each task appropriately assesses the content of the AP for which it asks a student to demonstrate knowledge and ability. The majority of writing tasks associated with prompt 1 did not assess students at a cognitive complexity level similar to that of the AP; instead, tasks were judged to be too low. It is recommended that the writing tasks, particularly for prompt 1, be reviewed to ensure the cognitive complexity level of the tasks is in accordance with the assessment design and, if needed, that additional writing tasks be developed measuring a wider range of complexity to better match the cognitive complexity of the APs.

Review content differentiation of writing APs and tasks across grades. APs should increase in content breadth, depth and newer knowledge, as well as growth on prerequisite skills. Similarly, for assessments in which tasks are structured in such a way that they increase in cognitive complexity between grades (writing grades 4-10), there should be a progression of breadth and depth between the tasks between grades. However, in writing grades 4-5 and 6-8 no content differentiation was found between APs across grades within grade spans, and little task differentiation between grades within grade spans among the prompts. It is recommended, especially for writing grades 4-5 and 6-8 that the APs and tasks be reviewed to ensure appropriate content differentiation within and across grade spans. If the content differentiation between APs and thus prompts is not meant to be reflected in the AP, per se, but in the complexity of the reading passage associated with the writing prompt, then additional training or communication to educators in the field regarding such is recommended.

15 A supplemental appendix, not for public dissemination as it contains item information, identifying specific items and tasks that FDOE and Measured Progress may want to review will be provided.

Review the DOK, Volume of Information, Vocabulary, and Context assigned to tasks. For writing grades 4 and 9, panelists agreed with less than 90% of the DOK, Volume of Information, Vocabulary, and Context (writing grade 4 only) levels assigned to tasks. It is recommended, especially for these grades, that the tasks be reviewed to ensure they reflect the appropriate DOK, Volume of Information, Vocabulary, and Context.

Review the degree to which the assessment provides evidence of a student’s ability to demonstrate what they know and can do. For civics and writing grades 4-5, panelists expressed concern that the tasks may not generalize to people and settings, and materials and activities, and that a student’s responses may not be sufficiently independent from the teacher. It is recommended that the accommodations and modifications that are allowed and not allowed be explicitly stated in the test administration manual.

Review the accessibility of tasks to different disability groups. For grade 4 writing, panelists rated only 75% of the tasks as accessible to all disability groups. Their specific concerns were the accessibility of tasks for deaf students, deaf/blind students, and students who communicate nonverbally with pictures. While only grade 4 writing did not meet the criterion for accessibility of tasks, concerns about these population groups were voiced by panelists in other subject groups as well. It is recommended that accommodations for these groups be provided and/or outlined in a clearer and more specific fashion.


References

Flowers, C., Wakeman, S., Browder, D., & Karvonen, M. (2007). Links for academic learning: An alignment protocol for alternate assessments based on alternate achievement standards. Charlotte, NC: University of North Carolina at Charlotte. Retrieved from: http://www.naacpartners.org/LAL/documents/NAAC_AlignmentManualVer8_3.pdf

Nemeth, Y. M., Purl, J., & Smith, E. A. (2016). Independent alignment review of the Florida Standards Assessment (FSA) in English Language Arts and Mathematics (2016 No. 029). Alexandria, VA: Human Resources Research Organization.

Porter, A. C. (2002, October). Measuring the content of instruction: Uses in research and practice. Educational Researcher, 31(7), 3-14.

Smith, E. A., Deatz, R. C., Wen, Y., & Nemeth, Y. M. (2014). Independent alignment review of the mathematics grade 11 Minnesota Test of Academic Skills (MTAS) (2014 No. 048). Alexandria, VA: Human Resources Research Organization.

Smith, E. A., Wen, Y., Nemeth, Y. M., Levinson, H., & Deatz, R. C. (2014). Independent alignment review of the mathematics grade 11 Minnesota Comprehensive Assessment (MCA-III) (2014 No. 058). Alexandria, VA: Human Resources Research Organization.

Webb, N. L. (1997). Criteria for alignment of expectations and assessments in mathematics and science education (Research Monograph No. 6). Washington, DC: Council of Chief State School Officers.

Webb, N. L. (1999). Alignment of science and mathematics standards and assessments in four states (Research Monograph No. 18). Madison, WI: National Institute for Science Education and Council of Chief State School Officers. (ERIC Document Reproduction Service No. ED440852)

Webb, N. L. (2005). Webb alignment tool: Training manual. Madison, WI: Wisconsin Center for Education Research. Available: http://www.wcer.wisc.edu/WAT/index.aspx


Appendix A. Panelist Alignment Review Materials Samples

Panelists received the following instruction sheet as a reference guide to accompany verbal instructions from HumRRO facilitators.

FSAA-PT Panelist Instructions

Rating Task: Documents Needed | File Format

1 FSA Standards DOK (Consensus)
(1) FSA Standards | Print copy
(2) FSA_1_DOKConsensus_subject Grade x – x | Print copy
(3) Panelist Instructions | Print copy
(4) Depth of Knowledge_rev_nov21 subject ONLY.docx | Print copy

2 FSAA Access Points (AP) DOK (Consensus)
(1) FSAA Access Points | Print copy
(2) FSAA_2_APDOKConsensus_ subject Grade x – x | Print copy
(3) Panelist Instructions | Print copy
(4) Depth of Knowledge_rev_nov21 subject ONLY.docx | Print copy

3 AP Review (Individual)
(1) FSA Standards | Print copy
(2) FSAA Access Points | Print copy
(3) FSAA_3_APReview_subject Grade x – x | Excel spreadsheet
(4) Panelist Instructions | Print copy
(5) Depth of Knowledge_rev_nov21 subject ONLY.docx | Print copy

4 AP Content Differentiation (Consensus) Writing 4-5, 6-8 ONLY
(1) FSAA Access Points | Print copy
(2) FSAA_4_AP Content Diff_ subject Grade x – x | Excel spreadsheet

5 FSAA Task Review (Individual)
(1) FSAA Tasks (Prompts and Responses) | Print copy
(2) Item Workbook – subject Grade x – x | Excel spreadsheet
(3) Panelist Instructions | Print copy
(4) FSAA Test Administration Manual | Print copy
(5) Presentation Rubric_rev_nov21.pdf | Print copy
(6) Depth of Knowledge_rev_nov21 subject ONLY.docx | Print copy
(7) FSAA Access Points | Print copy

6 Task Content Differentiation (Consensus)
(1) FSAA Tasks | Print copy
(2) FSAA_6_ContentDiff_ subject Grade x – x | Excel spreadsheet

7 Whole Test (Consensus)
(1) FSAA Tasks (Prompts and Responses) | Print copy
(2) FSAA Test Administration Manual | Print copy
(3) FSAA_8_WholeTestCon_ subject Grade x – x | Excel spreadsheet

8 Student Learning Review (Consensus)
(1) FSAA Test Specs for writing, civics, and US history | Print copy
(2) FSAA Assessment Manual excerpts | Print copy
 | Excel spreadsheet


Rating Form Excel files: Access HumRRO item rating forms:

a. Locate folder on desktop, double click to open.
b. Open file specified by facilitator (example - FSAA_3_APReview_subject Grade x – x).
c. File, Save As, same file name with an underscore and your three initials as an extension (e.g., FSAA_3_APReview_subject Grade x – x _eas).
d. Autosave will be set to every “1” minute; however, please save often as this doesn’t work all the time.
e. Repeat for each rating form.

1 Rate FSA Standard DOK (Consensus)

(1) Use rating sheet, FSA Standards, and Depth of Knowledge
(2) Calibration: Rate 5 Standards independently, answer on rating sheet handout, and discuss as group to reach consensus. Note: if unable to reach consensus, majority rules, then tie break is higher DOK rating.
(3) The facilitator may repeat before you start entering your independent ratings.
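
The fallback rule in the note above (majority rules, then the higher DOK rating breaks a tie) can be expressed as a short Python sketch. It assumes that "majority" means the most frequently chosen rating and that DOK levels are encoded as integers; the function name is hypothetical, and the sketch is not part of the panelist materials.

```python
from collections import Counter

# Sketch of the stated fallback when a group cannot reach consensus:
# take the most frequently chosen DOK level, and if several levels are
# tied for most frequent, take the higher one. Treating "majority" as
# "most frequent" is an assumption.

def fallback_dok(ratings):
    """Return the DOK level chosen by the majority-then-higher rule."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    top_count = max(counts.values())
    tied_levels = [level for level, n in counts.items() if n == top_count]
    return max(tied_levels)


print(fallback_dok([2, 2, 3]))  # a clear majority for level 2
print(fallback_dok([2, 3]))     # a tie, resolved toward the higher level
```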

2 Rate FSAA Extended Standard (AP) DOK (Consensus)

(1) Use rating sheet, FSAA Access Points, and Depth of Knowledge
(2) Calibration: Rate 5 Access Points independently, answer on rating sheet handout, and discuss as group to reach consensus. Note: if unable to reach consensus, majority rules, then tie break is higher DOK rating.
(3) The facilitator may repeat before you start entering your independent ratings.

3 Access Point (AP) Review

(1) Open FSAA_3_APReview_subject Grade x – x.xls and save with initial extension.
(2) Review rating categories (codes on following pages). Reminder: White cells are for your data; however, this file does have slightly greyed-out columns that you may need to enter data in. They are greyed-out because using them will likely be rare.

a. Content Centrality: Is all of the content in AP in the indicated standard? If you rate other than “Fully aligned”, you must explain in the second column what content the AP covers that is not part of the standard. Column 3 is available to suggest another standard if you feel there is one that is more appropriate.

b. Performance Centrality: Are students called upon to perform similarly between the AP and FSA standard? For example, do both standards require the student to select, identify, compare, analyze, or evaluate? If there are differences, then rate accordingly.

c. Age Appropriateness: Are the content and context of the AP indicative of age/grade level content?


d. Barriers to Demonstrating Knowledge. There are two ratings.

Symbolic (This is asking the level of communication required by the AP for this student population to reasonably demonstrate knowledge):

Awareness (pre-symbolic): 10% of alternate assessment students have minimal intentional communication skills, and may not respond to any on-demand assessment. Their physical challenges may make it difficult to judge responses. Teachers who know student may make inferences. In reading, objects and pictures are more concrete, and the student may make some limited connections to printed text, Braille, or raised symbols.

Early Symbolic: Intentional communication that can be interpreted more clearly but students lack abstract symbolic language; also described as emergent symbolic. This student can convey what they know and can do, but may not always communicate clearly or consistently. In reading, the student relies on picture discrimination and read-alouds with simplified text for listening comprehension.

Symbolic: Communicates with verbal or written words, sign language, braille, or language-based augmentative and alternative communication systems. Like the emergent symbolic student, however, this student can communicate what they know and can do, but communication is not always clear or consistent. Reading may be limited to pictures and sight words, and writing responses with sight words only.

Accessibility (This is outside of communication abilities, such as visual impairments, an inability to follow instructions, or a need for assistive technology):

Yes, all FSAA eligible students can demonstrate the knowledge required by this AP.

No, some FSAA eligible students cannot demonstrate the knowledge required by this AP.

(3) Calibration: Rate 5 Access Points independently and discuss as group. This is NOT consensus and is only to ensure everyone is comfortable with the ratings.

(4) The facilitator may repeat before you start entering your independent ratings.

4 Content Differentiation for AP – Writing Grades 4-5 & 6-8 ONLY

This criterion focuses on whether the content expectations (access points) change appropriately between grade levels. NOTE: THIS IS ONLY FOR WRITING GRADES 4-5 & 6-8

(1) Open FSAA_4_APContentDiff_subject Grade x – x.xls and save with initial extension.
(2) Review rating categories (codes on following pages).

a. Use FSAA Access Points.
b. Review APs for adjacent grades.
c. Always specify an example(s) when explaining rating.


5 FSAA Task Review

(1) Panelists access Item Workbook - subject Grade x - x.xls and save with initial extension. (2) Review rating categories (codes on following pages)

a. Task Complexity Ratings (columns C-N) (Use Presentation Rubric_rev_nov21.pdf for definitions of VI, V, and C):

i. DOK: Does the assigned DOK indicate the correct level of complexity for this task? If not, then provide an alternate DOK rating and an explanation for the change.

ii. Volume of Information: Does the assigned VI indicate the correct level of volume of information? If not, then provide an alternate VI rating and an explanation for the change.

iii. Vocabulary: Does the assigned V indicate the correct level of vocabulary? If not, then provide an alternate V rating and explanation for the change.

iv. Context: Does the assigned C indicate the correct level of context? If not, then provide an alternate C rating and an explanation for the change.

b. Content Centrality: Does the content in the task match with content indicated in AP? If you rate other than “Fully Aligned” explain what content is missing from EB and provide another benchmark if you feel there is one that is more appropriate.

c. Performance Centrality: Do the tasks allow students to demonstrate content at a similar performance level as the AP? Performance types include: select, identify, compare, analyze, or evaluate.

d. Age Appropriateness: Is the content and context of the content age/grade level appropriate?

e. Barriers to Demonstrating Knowledge. This has three ratings, symbolic, accessibility, and modification. See Task #3, AP Review, for information for Symbolic and Accessibility.

Modification (This is asking if there are supports teachers can provide, such as assistive technology or additional prompts of some type (ask for suggestions from the special ed teachers), as appropriate for a given student):

i. Yes, the task could be modified to be more accessible for some students without changing meaning.

ii. No, modifying the task further would change the meaning or difficulty.

(3) Calibration: Rate 5 tasks independently and discuss as group. This is NOT consensus and is only to ensure everyone is comfortable with the ratings.
(4) The facilitator may repeat before you start entering your independent ratings.


6 Content Differentiation for Tasks

This criterion focuses on whether the content presented in items changes appropriately between task levels.

(1) Open FSAA_6_ContentDiff_ subject Grade x – x.xls and save with initial extension.
(2) Review rating categories (codes on following pages).

a. Use all items.
b. Rate based on task levels for items on the test.
c. Explain ratings for each category by citing specific example(s).

7 Rate ‘Whole Test’ (Consensus) by grade level assessment form

The purpose of this step is to determine if barriers exist for some students to demonstrate learning per test form, similar to the AP and task ratings earlier, only as a consensus discussion.

(1) Open FSAA_7_WholeTestCon_ subject Grade x - x for reference only. The facilitator will record the group's discussion.

(2) Focus on the assessment form in general, but use task examples as evidence in support of the rating.

(3) Use FSAA materials, FSAA Administration Manual, and FSAA Tasks.

8 Student Learning (Consensus)

This criterion is to identify if inferences can be made about a student from their scores. In other words, are the scores indicative of student learning and knowledge, or are the scores entirely, or partly, the result of the teacher or program?

(1) Open FSAA_7_subject Grade x – x.xls as a reference only. The facilitator will record the group's discussion.

a. You will need FSAA materials, FSAA Assessment Manual, FSAA tasks.

b. Discuss each criterion; the facilitator records “y” or “n” in one of the 3 available cells for each criterion and documents the discussion (key points).

(2) Additional clarification support:

a. Accuracy: From the tasks and assessment administration guidance, does it appear the teacher has wide latitude in interpreting student responses, or is it clear that the student response shows learning has occurred?

b. Independence: From assessment administration guidance, what level of assistance is the teacher allowed to provide their student? For example: Hand-over-hand assistance is the teacher physically helping the student indicate the response selection.

c. New learning: Ask them to think about their responses to the Content Differentiation steps provided for Access Points and Tasks in addition to looking through test specs and assessment administration manual for indication of baseline or pretesting.

d. Generalize across people/settings: This is asking if the tasks are designed to work across people and settings, or if they are designed at the lowest end (no student inference), as being answerable only if one particular person gives them to a student. Can more than one particular person present a task item and record responses for any given student?

e. Generalize across materials/activities: This is similar to above, only with regard to the materials and activities. Are the tasks designed to fit only one standard and in only one context, with no options for using different materials?

f. Standard setting: Have panelists review Access Points, scoring guidelines, and rubrics to determine if at the lowest option students could make proficiency by chance. Are the APs written such that students would not be proficient without having to show some independent learning?

g. Program quality: Does FL use program quality indicators, and do those indicators influence a student’s score? Examples of indicators that could impact a student's score would be if the task prompt was part of the evaluation of the student’s score or if the completeness and accuracy of the student’s IEP were part of the scoring (program indicator).


Training Support Materials

Depth of Knowledge_rev_nov21 ELA ONLY.docx
Depth of Knowledge_rev_nov21 SOCIAL STUDIES ONLY.docx
Presentation Rubric_rev_nov21.pdf

Step 3 and 5 Access Point and Task Reviews

Category, Code, and Description

Content Centrality
1 - Not aligned: AP/Task does not match standard content at all
2 - Partially aligned: AP/Task is not fully aligned to the standard content
3 - Fully aligned: AP/Task is a good match to standard content

Age Appropriateness
I - Inappropriate: Content is off-grade level
N - Neutral: Content is not age-bound, it is appropriate at any age or grade
A - Adapted: Adapted from, or linked to, age/grade-level content

Performance Centrality
N - None: AP/Task has no similar performance types
S - Some: AP/Task has some similar performance types
A - All: AP/Task has the same performance types

Symbolic Communication
A - Awareness: Minimal intentional communication; teacher inferred
E - Early symbolic: Recognizes some symbol-object relationships
S - Symbolic: Has a broad knowledge of symbols, communicates picture or words through speech, assistive technology, signing

Accessibility
Y - Yes: AP is accessible to all students
N - No: Some students cannot access content (explain who & why)

Modifications or Supports
Y - Yes: Modifications and supports can be provided for this Task.
N - No: This Task is not amenable to supports or modifications without changing meaning or difficulty.
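The rating codes above lend themselves to a simple lookup for checking entries before analysis. The following is a minimal sketch only, assuming ratings are exported as one record per AP or task; the dictionary keys, the `invalid_ratings` function, and the example record are hypothetical and are not part of the study's rating workbooks.

```python
# Hypothetical encoding of the Step 3 and 5 rating codes; category keys and
# the function name are illustrative assumptions, not the study's own format.
VALID_CODES = {
    "content_centrality": {"1": "Not aligned", "2": "Partially aligned", "3": "Fully aligned"},
    "age_appropriateness": {"I": "Inappropriate", "N": "Neutral", "A": "Adapted"},
    "performance_centrality": {"N": "None", "S": "Some", "A": "All"},
    "symbolic_communication": {"A": "Awareness", "E": "Early symbolic", "S": "Symbolic"},
    "accessibility": {"Y": "Yes", "N": "No"},
    "modifications_or_supports": {"Y": "Yes", "N": "No"},
}


def invalid_ratings(rating):
    """Return (category, code) pairs that are missing or not in the code list."""
    problems = []
    for category, codes in VALID_CODES.items():
        # Normalize so that 3, "3", and " y " are all handled consistently.
        code = str(rating.get(category, "")).strip().upper()
        if code not in codes:
            problems.append((category, code))
    return problems


# Made-up example record for one task.
example = {
    "content_centrality": 3,
    "age_appropriateness": "A",
    "performance_centrality": "S",
    "symbolic_communication": "E",
    "accessibility": "y",
    "modifications_or_supports": "Y",
}
print(invalid_ratings(example))
```

An empty list means every category carries a recognized code; any other result names the categories a panelist would need to revisit.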

Step 4 and 6 Content Differentiation (across grades) for APs and Tasks

Category and Description

Broader
APs: Higher-grade APs reflect broader application of target skill/knowledge.
Tasks: Higher tasks reflect broader application of target skill/knowledge (AP).

Deeper
APs: Higher-grade APs reflect deeper mastery of the target skill/knowledge.
Tasks: Higher tasks reflect deeper mastery of the target skill/knowledge (AP).

Prerequisite
APs: Lower-grade APs target a prerequisite skill for mastery of the higher grade AP.
Tasks: Lower tasks target a prerequisite skill for mastery of the AP.

New
APs: The higher-grade has a new skill or knowledge unrelated to skill/knowledge covered at prior grades.
Tasks: The higher task has a new skill or knowledge that combined with the lower tasks allows for the complete AP.

Identical
APs: Higher-grade APs appear identical to one of the lower-grade APs.
Tasks: Higher tasks appear identical to one of the lower tasks in what a student is being asked to know/do.


Depth of Knowledge – ELA Reference Sheet revised 11/21/14

All items should be assigned a Depth of Knowledge level based on the information presented in the table below. Content clarification examples are not exhaustive and general performance verbs are not the defining criteria for Depth of Knowledge classification.

1 Attention

General Performance Verbs: touch, look, vocalize, repeat, attend

• Simple commands that require no answer—only require doing the command.

• Generally not assessed as a skill. Used to focus the student on a task.

Examples: Look at me. Listen while I read this story.

2 Rote Knowledge, Memorize & Recall

General Performance Verbs: list, identify, state, label, recognize, record, match, recall, retell

• Habitual response—recalls previously heard or learned information.
• Practiced, rote behavior.
• No inferences are required for correct answer.
• Habitual response of common day to day activities or objects.

English Language Arts

Matches picture/word to picture/word.
Identifies rhyming words.
Identifies letters by phonics/sounds or sight.
Identifies detail of text of 2-3 simple sentences using verbatim wording.
Identifies correct spelling of misspelled word.
Identifies misspelled common words.
Identifies letters and phonetically regular, high frequency words (self-read).

Examples: Show me/tell me…
…which can you drink from? (book, cup, pen)
…what do you read? (book, desk, stapler)
…which pair of words rhyme?


3 Use of Knowledge and Information

General Performance Verbs: perform, tell, demonstrate, follow, count, locate, name, read, describe, define, spell

• Engagement of some mental processing beyond habitual response.
• Simple inferences may be needed.
• Uses information from a chart or graph to make simple inferences in order to correctly respond.
• Chooses what comes next in a sequence.


Indicates comprehension of basic/common words or two to three word sentences.

Identifies main idea by applying information gained from text.

Identifies detail by making simple inferences.

Identifies a relevant or best sentence to add to passage.

Self-reads materials/passages.

Identifies best word to complete sentence.

Identifies initial word in sentence in need of capitalization.

Identifies the correct spelling of grade appropriate words presented in sentence.

Identifies prefixes/suffixes in words.

Identifies incorrectly used common punctuation.

Identifies basic punctuation including periods, commas, and question marks.

Examples: Show me/tell me… …what is the main idea?

…who is this story about?

…what fits in the blank of this sentence?

…what happens next in the story?

…which word in this sentence is misspelled?

…which word uses the prefix…

…which group of words has a comma?

…which word describes sound?

…which piece of evidence supports this claim?


4 Comprehension

General Performance Verbs: explain, conclude, group, categorize, restate, review, translate, describe, paraphrase, infer, summarize, illustrate, compute, classify, solve

• Strategic thinking—requires reasoning, planning a sequence of steps.
• Answer choices summarize and are not verbatim from passage.

FROM INFORMATION THAT IS INFERRED:

• Identifies theme or message of a story.
• Identifies main idea by drawing conclusions or making inferences.
• Identifies elements of a story without definition of the element.
• Identifies purpose of writing passage.
• Selects best sentence(s) for middle or end of passage (correct order required).
• Orders three or more sentences to communicate logical sequence of events.
• Sorts or groups words or items with categories given.
• Identifies sentence that best supports topic.
• Identifies two or more sentences to complete a composition.
• Identifies correct meaning of words from context sentence.
• Edits for correct use of subject and verb agreement.
• Edits for correct use of singular and plural nouns.
• Identifies proper nouns and pronouns within sentences, and book titles in need of capitalization.
• Identifies correct usage of punctuation.

Examples: Show me/tell me…

…what is the main idea?
…who is this story about?
…what is the “plot” of this story?
…which of these is found inside a house and which are found outside a house? (bed, swing set, trees, car, computer)
Bed becomes a plural (more than one bed) by adding an “s”. …what would more than one tree be? (tree, treeses, trees)
…which sentence shows commas used correctly?
…which sentence provides the best conclusion by stating why the claim is significant?

5 Application

General Performance Verbs: organize, collect, apply, construct, use, develop, generate, interact with text, implement, compare, contrast

• Extended thinking—making connections within and between subject domains, non routine problem solving.
• Student generates answer without cues.

English Language Arts

• Makes connections between multiple sources.
• Compares events in two passages.
• Generates response.
• Implements a plan.

Examples: Show me/tell me…

…how the poem and the story are the same.
…how the structure of both passages is the same.
…how to revise this sentence using fewer words. (no response options)


6 Analysis Evaluation

General Performance Verbs: pattern, analyze, compose, predict, extend, plan, judge, evaluate, interpret, cause/effect, investigate, examine, distinguish, differentiate, generate

• Requires investigation.
• Student predicts based on information given.
• Student creates possible alternative outcomes.
• Student uses multiple sources to answer question without cues/supports.
• Generally, DOK levels of 6 will not be found on the assessment unless open response items that require investigation using two or more texts are assessed.

Examples:
…tell me another possible ending to the story (no options provided).
…what kind of science experiment can you do to find out how many hours of sun a seed needs to sprout?


Depth of Knowledge Rubric - Social Studies revised 6/2017

All items should be assigned a Depth of Knowledge level based on the information presented in the table below. Content clarification examples are not exhaustive and general performance verbs are not the defining criteria for Depth of Knowledge classification.

1 Attention

General Performance Verbs: Touch, look, vocalize, repeat, attend

Simple commands that require no answer—only require doing the command.

Generally not assessed as a skill. Used to focus the student on a task.

Examples: Look at me. Listen while I read this story.

2 Rote Knowledge, Memorize & Recall

General Performance Verbs: List, identify, state, label, recognize, record, match, recall, retell

Habitual response—recalls previously heard or learned information.
Practiced, rote behavior.
No inferences are required for correct answer.
Habitual response of common day to day activities or objects.

Social Studies

Matches pictures and/or words.
Identifies details from text (1-2 simple sentences) using verbatim wording.
Identifies familiar characteristics of time periods or situations.
Recognizes simple definitions of social studies related terms when definition is provided.

Examples

…what is something else that is built by people? (ship, rock, leaf)

…what is a manufactured good? (cats, shoes, trees)

What is a [law, rule, right, constitution, amendment]?

3 Use of Knowledge and Information

General Performance Verbs: Perform, tell, demonstrate, follow, count, locate, name, read, describe, define, spell

Engagement of some mental processing beyond habitual response.
Simple inferences may be needed.
Uses information from a chart or graph to make simple inferences in order to correctly respond.
Chooses what comes next in a sequence.

Social Studies

Identifies detail of text with 2-4 sentences requiring a slight inference or connection of ideas.

Indicates comprehension of common social studies content words or concepts.

Identifies the how, who, what, and/or why of governmental processes.

Identifies reasons or importance of events and/or actions.

Examples:

Why did (name of person) build a (name of structure or invention)?

What was one reason why the (name of event or situation) took place?

What is the process for making a (law, rule, constitutional amendment)?

Why is (law, rule, right, constitution, amendment) important?


4 Comprehension

General Performance Verbs: Explain, conclude, group, categorize, restate, review, translate, describe, paraphrase, infer, summarize, illustrate, compute, classify, solve

Strategic thinking—requires reasoning, planning a sequence of steps.
Answer choices summarize and are not verbatim from passage.

Social Studies

Draws conclusions based on information provided in a chart, table, or diagram.
Uses information to complete a chart.
Identifies trends and/or changes in processes or in ways of life.
Identifies reasons and/or consequences of changes.

Examples:

Based on information in the chart, how has (process, occupation, way of living, law, constitution) changed over the years?

Which sentence best completes the chart?

What was one result of the change in (event, people living in area, law, economic situation, invention)?

5 Application

General Performance Verbs: Organize, collect, apply, construct, use, develop, generate, interact with text, implement, compare, contrast

Extended thinking—making connections within and between subject domains, non routine problem solving.

Student generates answer without cues.

Social Studies

Explains cause and effect relationships.
Explains similarities.
Explains differences.

Examples:

Based on the agreements, what would have happened if. . . ?

In what way are these two (people, organizations, laws, events, governmental programs) alike?

What is one difference between. . . ?


6 Analysis Evaluation

General Performance Verbs: pattern, analyze, compose, predict, extend, plan, judge, evaluate, interpret, cause/effect, investigate, examine, distinguish, differentiate, generate

Requires investigation.
Student predicts based on information given.
Student creates possible alternative outcomes.
Student uses multiple sources to answer question without cues/supports.
Generally, DOK levels of 6 will not be found on the assessment unless open response items that require investigation using two or more texts are assessed.

Examples:

…tell me another possible ending to the story (no options provided).

…what kind of science experiment can you do to find out how many hours of sun a seed needs to sprout?
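Both rubrics share the same six level names, so a panel's DOK ratings can be tallied against a single lookup. This is a minimal sketch, assuming each item receives one integer level from 1 to 6; the `DOK_LEVELS` mapping, the `dok_distribution` function, and the sample ratings are hypothetical and are not drawn from the study's data.

```python
from collections import Counter

# Level names as they appear in the two reference sheets.
DOK_LEVELS = {
    1: "Attention",
    2: "Rote Knowledge, Memorize & Recall",
    3: "Use of Knowledge and Information",
    4: "Comprehension",
    5: "Application",
    6: "Analysis Evaluation",
}


def dok_distribution(item_levels):
    """Count items at each DOK level; raise ValueError on an unknown level."""
    unknown = [level for level in item_levels if level not in DOK_LEVELS]
    if unknown:
        raise ValueError(f"Unknown DOK level(s): {unknown}")
    counts = Counter(item_levels)
    # Report every level, including those with no items, in level order.
    return {level: counts.get(level, 0) for level in DOK_LEVELS}


# Made-up ratings for four items.
for level, count in dok_distribution([2, 3, 3, 4]).items():
    print(f"{level} {DOK_LEVELS[level]}: {count}")
```

The tally only counts assigned levels; it does not classify items from their verbs, since the rubrics state that performance verbs are not the defining criteria.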


Panelists reviewed the individual FSAA item tasks using the following rating form in electronic format. The format of the rating form was identical for each subject grade-level.
