
NH SAS Items and Sample Blueprints

Contents

Summative Assessments
Interim Assessments
Blueprints
    ELA
    Mathematics
    Science
Item Types and Number of Items
    Sample Mathematics Task
        Exhibit D1.2-22: Interaction Types
        Exhibit D1.2-23: Sample Mathematics Item
    Sample English Language Arts Task
        Exhibit D1.2-25: Sample ELA Item
    Sample Science Task
        Exhibit D1.2-27: Sample Science Item
Interaction Types
Number of Items


Summative Assessments

ELA and Mathematics. The ICCR item pools have been constructed explicitly to support statewide assessment programs and have been administered as part of the Arizona, Florida, Ohio, Tennessee, and Utah statewide assessments; they have also been embedded in Oregon's administration of the Smarter Balanced assessments. ICCR items were developed in conjunction with state departments of education, following a rigorous system of internal and external review procedures implemented by each participating state. Thus, all ICCR items have been reviewed by content review committees comprising educators in one or more states, as well as by bias and fairness review committees in those states. All ICCR items have been field tested in embedded slots within operational test administrations, so item parameter estimates, based on the large numbers of students participating in state summative assessments, are highly precise and stable. Following field-test administration, item statistics for all ICCR items are evaluated for discrimination, difficulty, and differential item functioning. Any items flagged for out-of-range statistics are forwarded for further review by AIR content staff, AIR psychometric staff, and state assessment staff, and items that performed poorly are rejected from the pool.
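To make the statistical screening step concrete, the sketch below shows one way such review flags could be raised. It is an illustration only: the thresholds, the use of ETS-style Mantel-Haenszel DIF categories, and the field names are hypothetical placeholders, not AIR's actual review criteria.

```python
# Illustrative sketch (not production code): screening field-test item
# statistics for out-of-range values before committee review.
from dataclasses import dataclass

@dataclass
class ItemStats:
    item_id: str
    p_value: float    # proportion correct (classical difficulty)
    biserial: float   # item-total correlation (discrimination)
    dif_category: str # Mantel-Haenszel class: "A" (negligible) to "C" (large)

def review_flags(stats: ItemStats) -> list[str]:
    """Return the review flags raised by an item's field-test statistics."""
    flags = []
    if not 0.10 <= stats.p_value <= 0.90:   # too hard or too easy
        flags.append("difficulty out of range")
    if stats.biserial < 0.20:               # weak discrimination
        flags.append("low discrimination")
    if stats.dif_category == "C":           # large differential item functioning
        flags.append("DIF category C")
    return flags

# Example: an item with weak discrimination is forwarded for further review.
item = ItemStats("MA-07-01234", p_value=0.42, biserial=0.12, dif_category="A")
flags = review_flags(item)
if flags:
    print(item.item_id, "->", ", ".join(flags))
```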

Following each test administration, ICCR items are calibrated using multiple IRT models and linked back to each state's scale. In addition, a linking design was enacted to link item parameters from each state assessment system to the common ICCR scale, so the performance levels for each ICCR state can be represented on the ICCR scale. Moreover, benchmarks for other assessments identified for any participating state, such as NAEP, TIMSS, or PISA, can also be represented on the ICCR scale. The ICCR items are therefore not only very robust, each having passed through rigorous reviews in (typically) multiple statewide assessment systems; the validity of test score interpretations based on the ICCR scale is also greatly enhanced by the abundant benchmarking of scale score locations.
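As an illustration of what a linking step involves, the following sketch applies mean-sigma linking, a standard method for placing IRT difficulty parameters from one calibration onto another scale using anchor (common) items. The numbers are hypothetical, and AIR's actual linking design is not specified here.

```python
# Minimal mean-sigma linking sketch: estimate a linear transformation that
# places a state calibration's difficulties onto a common scale, using items
# calibrated on both scales. All difficulty values are made up.
import statistics

def mean_sigma_link(b_state: list[float], b_common: list[float]):
    """Find A and B such that b_common ≈ A * b_state + B for the anchor items."""
    A = statistics.stdev(b_common) / statistics.stdev(b_state)
    B = statistics.mean(b_common) - A * statistics.mean(b_state)
    return A, B

# Anchor-item difficulties on the state scale and on the common scale.
b_state  = [-1.20, -0.35, 0.10, 0.80, 1.45]
b_common = [-1.05, -0.20, 0.25, 0.95, 1.60]

A, B = mean_sigma_link(b_state, b_common)
new_item_b = 0.50  # a newly calibrated item on the state scale
print("difficulty on common scale:", round(A * new_item_b + B, 3))
```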

The ICCR item bank has been constructed to enact a model college and career ready blueprint. Blueprints are generally consistent across the participating states but allow flexibility for each state to craft and implement a custom blueprint that meets its unique requirements and needs. AIR will work with the Department to finalize a blueprint that aligns to the New Hampshire College and Career Ready Standards; yields test scores that are valid and reliable, both overall and for all domain reporting categories; and ensures that test administration times remain within desired limits.

An important focus of the New Hampshire Statewide Assessments is on reporting, and adaptive test administration provides important advantages for interpreting test scores for all test users. With fixed-form tests, test forms are necessarily constructed using only a small sample of items from the content domain. At the level of individual standards, fixed-form tests may contain only 1–2 items and are thus not representative of all items comprising the standard. Conversely, adaptive test administrations proceed from the full pool of available items, providing a far better representation of the intended content domain. As with a fixed-form test, any one student may see only 1–2 items sampled from a standard, but in the aggregate, students see different samples of items measuring those standards; at the level of classrooms and schools, each standard is therefore assessed across a much broader sample of items representing the content domain, allowing for a much more robust measure of classroom and school performance at a finer-grained level of analysis. Thus, AIR is able to provide educators with standard-level analyses of class and school performance that are highly reliable and that lead to more valid interpretations of student performance, since those indicators are based on a more representative range of the knowledge and skills subsumed within those standards.
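The aggregation argument can be pictured with a small sketch: even though each student answers only a couple of items per standard, a classroom-level indicator pools responses across all the items sampled for different students. The records and field names below are hypothetical.

```python
# Sketch: classroom-level, standard-level aggregation across the varying
# item samples produced by adaptive administrations. Data are made up.
from collections import defaultdict

# (student, standard, item_id, score) records
responses = [
    ("s1", "7.RP.1", "itemA", 1), ("s1", "7.RP.1", "itemB", 0),
    ("s2", "7.RP.1", "itemC", 1), ("s2", "7.RP.1", "itemD", 1),
    ("s3", "7.RP.1", "itemE", 0), ("s3", "7.RP.1", "itemA", 1),
]

scores_by_standard = defaultdict(list)
items_by_standard = defaultdict(set)
for student, standard, item_id, score in responses:
    scores_by_standard[standard].append(score)
    items_by_standard[standard].add(item_id)

for standard, scores in scores_by_standard.items():
    pct = 100 * sum(scores) / len(scores)
    n_items = len(items_by_standard[standard])
    print(f"{standard}: classroom percent correct = {pct:.0f}% "
          f"(based on {n_items} distinct items)")
```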


Science Assessments. As the Department is aware, the ICCR item bank in science is being developed in collaboration with a group of states developing common item and item-cluster specifications to measure three-dimensional science standards. As with the ICCR item pools in ELA and mathematics, the ICCR item pools in science will allow the Department to identify the performance standards for any consortium state on the common ICCR scale. ICCR science items will be available for administration in spring 2018. For spring 2018, we propose to administer science item clusters and items in an operational field-test design that administers test forms meeting all blueprint specifications, allows for calibration and equating of science items to establish the ICCR science scale, and supports identification and adoption of performance standards for New Hampshire's statewide assessments in science. Because parameter estimation must follow test administration in spring 2018, immediate scoring and reporting of test scores will not be possible as part of the first administration of the science assessments. However, since the standard-setting workshops necessary to recommend performance standards cannot commence until after the first operational test administration (in order to obtain impact data), the post-equating approach would not further delay reporting of student test scores.


Interim Assessments

To support more truly formative uses of interim test results, we propose an interim test design built around AIRWays, our innovative reporting tool for interim benchmark assessments. AIRWays is designed to leverage testing events to drive students and teachers to interact. The system reports only non-secure test results and shows the teacher both the item and each student's actual response, providing a platform and an opportunity for the teacher and student to begin exploring gaps in knowledge to support instruction. AIRWays fosters more truly formative assessment by allowing teachers and students to explore students' responses to test items and to focus instruction in more meaningful ways. Truly formative testing requires teachers and students to work together to understand how and why students respond to test items, which precludes administration of secure item content.

Eventually, however, the NH DOE may wish to aggregate student interim results to replace a summative test administration for accountability purposes. Since any accountability use of test items requires rigorous test security, an assessment regime in which interim assessments replace a summative assessment for accountability would have to be administered using the same ICCR item pool and test administration procedures as the summative assessments, precluding reporting of test results in AIRWays.

Since aggregating interim assessments to replace a summative test administration is not yet an available option for New Hampshire, we propose a two-stage solution to the NH DOE's interim assessment requirement. Until the U.S. Department of Education (USED) determines that summative assessments can be replaced by aggregating a series of interim assessments, we propose to deliver New Hampshire's interim assessments using one of several non-secure formative assessment banks.

One option is to deliver interim assessments from our own Learning Point Navigator item bank, as well as items from banks provided by Key Data System (KDS). Items in Navigator were developed by the same AIR test developers who develop items for summative tests, including the ICCR items. Items in the Navigator bank thus have the same look and feel as our ICCR items and are delivered using the same test engine as the summative assessments we are proposing. The KDS pools, which offer quality mathematics items aligned to state College and Career Ready Standards, provide a variety of item types that will allow us to deliver interim math assessments consistent with the summative assessments in both content and functionality. AIR will work with the Department to embed interim items in the summative assessments to calibrate and equate the Navigator and KDS item pools to the ICCR scale, so that interim assessment results based on these item pools can be reported on the New Hampshire reporting scale.

New Hampshire may also be able to access Utah's formative item pool, which is populated with items already calibrated and equated to the ICCR scale. The item pools making up Utah's formative assessments have several very important strengths. Utah's formative item pool is composed of the same kinds of items and item types used to administer Utah's Student Assessment of Growth and Excellence (SAGE) accountability assessments; in fact, the formative item pools were drawn from the SAGE item pools. Thus, all formative items were developed to align with Utah's Core Standards, which are consistent with the New Hampshire College and Career Ready Standards. Moreover, because these items were originally part of Utah's SAGE accountability assessment item pool, the items were developed following the same rigorous procedures described in Topic 2 Item Development. All items currently in the formative pool passed through all levels of content, fairness, and field-test data review and were promoted to the SAGE operational item pool, and all items in the formative pool have item calibrations estimated from student responses in operational test administrations. Because Utah is one of several states collaborating in the development of AIR's ICCR item pool, its formative item bank includes the same high-quality items and versatile machine-scored item types as the ICCR item banks that we propose for New Hampshire's summative assessments. Thus, the look and feel of the Utah formative items will be consistent with the ICCR summative assessments. Importantly, IRT parameters for items in the formative pool are already linked to the ICCR scale to support consistent reporting of interim and summative assessment results.

Moreover, because Utah’s formative item banks were designed to support multiple-opportunity adaptive test administrations, they can be flexibly deployed to meet a range of formative assessment goals. For example, the NH DOE could elect to deploy a system of multiple-opportunity comprehensive interim

Page 5: NH SAS Items and Sample Blueprints...AIR test developers who develop items for summative tests, including the ICCR items. Items in the Navigator bank thus have the same look and feel

NH SAS Items and Sample Blueprints assessments configured to administer adaptive test administrations meeting a proportional length summative blueprint. These interim assessments can be used by educators to track student progress toward achievement of the grade-level standards. Alternatively, the formative pool can be configured to administer a series of fixed-form benchmark assessments that educators can administer to evaluate student achievement of discrete instructional modules.

Utah’s SAGE and ICCR scales have been linked via a common-item design so that interim assessment results, including scale scores and performance levels, can be reported on the same scale used to report New Hampshire’s summative test results. This allows educators to monitor student progress toward achieving New Hampshire College and Career Ready Standards.

Because the items making up the Utah formative assessments are not secure, the results of benchmark assessments can be reported back to educators using AIRWays, allowing teachers to observe how students respond to individual items and to identify gaps in student understanding.

Should USED determine that the Every Student Succeeds Act (ESSA) allows states to report aggregate interim assessment results in lieu of a summative assessment for accountability purposes, the NH DOE may wish to offer this option to New Hampshire schools. Of course, any interim assessment system designed to substitute for summative test scores would need to satisfy the same peer review elements required of summative assessment systems, including requirements for item and test security. Thus, should the Department wish to transition to an interim assessment system that supports requirements for an accountability system, we propose to administer both the interim and summative assessments from the ICCR item pools. Test administrations can be configured for comprehensive or block interim assessments in addition to grade-level summative assessments. The ICCR item pools are sufficiently large to support both interim and summative test administrations, and AIR's adaptive algorithm is configured to ensure that students are not administered the same item across test administrations. (This constraint can be relaxed to allow a previously administered item when necessary to meet the blueprint.) Because the accountability-based interim assessments would need to be secure, reporting for both the interim and summative assessments would be through our Online Reporting System (ORS), which we describe fully in Topic 22 Reports and which provides educators with a highly intuitive and powerful tool for navigating assessment results.
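The item-exclusion rule described above can be sketched as follows. This is a simplified illustration under assumed data structures, not AIR's adaptive algorithm, which optimizes item information against the blueprint rather than taking the first eligible candidate.

```python
# Sketch: select an item for a required blueprint standard while excluding
# items the student has already seen, relaxing the exclusion only when the
# blueprint cannot otherwise be met. Pool structure is hypothetical.
def select_item(pool, seen, required_standard):
    candidates = [i for i in pool
                  if i["standard"] == required_standard and i["id"] not in seen]
    if not candidates:
        # Relax the exclusion: allow a previously administered item rather
        # than violate the blueprint requirement.
        candidates = [i for i in pool if i["standard"] == required_standard]
    if not candidates:
        raise ValueError(f"no items in pool for {required_standard}")
    # A real CAT would maximize information at the current ability estimate;
    # here we simply take the first eligible candidate.
    return candidates[0]

pool = [{"id": "A1", "standard": "7.NS.1"}, {"id": "A2", "standard": "7.NS.1"}]
seen = {"A1", "A2"}                       # both items used in an interim test
print(select_item(pool, seen, "7.NS.1"))  # exclusion relaxed -> returns A1
```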


Blueprints

The ICCR item banks have been constructed to enact model college and career ready blueprints that are generally consistent across participating states but allow flexibility for each state to craft and implement a custom blueprint that meets its unique requirements and needs. AIR will work with the Department to finalize a blueprint that aligns to the New Hampshire College and Career Ready Standards and that yields test scores that are valid and reliable, both overall and for all domain reporting categories. In addition, as New Hampshire works to revise its academic standards, AIR will work with the Department to revise the blueprints as necessary to align with the new standards.

ELA. Although multiple configurations can be supported, the ICCR item banks have been constructed to support model ELA blueprints designed to report student achievement in Reading Literature, Reading Informational Text, and Writing/Language. Exhibit D1.1-1 provides a sample ELA blueprint for grades 3–5. This blueprint supports a test design in which students are administered four reading passages (two Reading Literature and two Reading Informational Text), two editing tasks comprising 3–4 language items each, and one writing prompt. AIR proposes to work with the Department to evaluate whether this blueprint meets the needs of New Hampshire's statewide assessment system and to make modifications to it as needed.

Exhibit D1.1-1: Sample ELA Blueprint for Grades 3–5 (42 items; Reading: 50% Literary, 50% Informational)

Reading Literature: 16–18 items
    Key Ideas and Details: 4–6
    Craft and Structure: 4–6
    Integration of Knowledge and Ideas: 1–4
    Language/Vocabulary Acquisition and Use: 1–2
    Listening: 0–4

Reading Informational: 16–18 items
    Key Ideas and Details: 6–8
    Craft and Structure: 6–8
    Integration of Knowledge and Ideas: 2–6
    Language/Vocabulary Acquisition and Use: 1–2
    Listening: 0–4

Language/Editing: 6–8 items

Writing: 1 task
    Task 1: Expository Essay: 0–1
    Task 2: Argumentative Essay: 0–1



Mathematics. The model blueprints on which the ICCR mathematics item banks have been constructed are designed to support content domain reporting of student performance, combining domains when necessary to support reporting requirements. The blueprints call for 50 items for the grades 3–8 tests.

The blueprints contain item ranges for each reporting category, content cluster, and content standard, ensuring comprehensive content coverage while still allowing flexibility to adapt item selection to student ability. The blueprints also contain ranges for Depth of Knowledge (DoK), ensuring that test administrations probe student achievement across the range of cognitive demand specified in the academic standards. AIR proposes to work with the Department to ensure that this blueprint meets the needs of New Hampshire's statewide assessment system and to make modifications to it as needed. Exhibit D1.1-2 below provides a sample from the grade 7 blueprint showing how the minimum and maximum numbers of items are nested within the reporting category/cluster/standard hierarchy.

Exhibit D1.1-2: Sample Mathematics Blueprint for Grade 7 (test total: 50 items; MIN–MAX item ranges shown for each domain, cluster, and standard)

RP Ratios and Proportional Relationships: 11–13
    A. Analyze proportional relationships and use them to solve real-world and mathematical problems: 11–13
        7.RP.1: 0–5; 7.RP.2(abcd): 0–5; 7.RP.3: 0–5

NS The Number System: 9–11
    B. Apply and extend previous understandings of operations with fractions to add, subtract, multiply, and divide rational numbers: 9–11
        7.NS.1(abcd): 0–4; 7.NS.2(abcd): 0–4; 7.NS.3: 0–4

EE Expressions and Equations: 8–10
    C. Use properties of operations to generate equivalent expressions: 2–6
        7.EE.1: 0–3; 7.EE.2: 0–3
    D. Solve real-life and mathematical problems using numerical and algebraic expressions and equations: 2–6
        7.EE.3: 0–3; 7.EE.4(ab): 0–3

G Geometry: 9–11
    E. Draw, construct, and describe geometrical figures and describe the relationships between them: 2–6
        7.G.1: 0–2; 7.G.2: 0–2; 7.G.3: 0–2
    F. Solve real-life and mathematical problems involving angle measure, area, surface area, and volume: 2–6
        7.G.4: 0–2; 7.G.5: 0–2; 7.G.6: 0–2

SP Statistics and Probability: 9–11
    G. Use random sampling to draw inferences about a population: 0–3
        7.SP.1: 0–2; 7.SP.2: 0–2
    H. Draw informal comparative inferences about two populations: 0–3
        7.SP.3: 0–2; 7.SP.4: 0–2
    I. Investigate chance processes and develop, use, and evaluate probability models: 0–6
        7.SP.5: 0–2; 7.SP.6: 0–2; 7.SP.7(ab): 0–2; 7.SP.8(abc): 0–2

TOTAL ITEMS (for affinity groups): 46–56
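As an illustration of how such ranges operate, the sketch below checks a selected 50-item form against the domain-level ranges of Exhibit D1.1-2. The check itself is a simplified stand-in for the constraints enforced during adaptive item selection.

```python
# Sketch: verify that each domain's item count on a form falls inside its
# blueprint [min, max] range. Ranges echo Exhibit D1.1-2's domain rows.
BLUEPRINT = {            # domain: (min items, max items)
    "RP": (11, 13),
    "NS": (9, 11),
    "EE": (8, 10),
    "G":  (9, 11),
    "SP": (9, 11),
}

def check_blueprint(selected_domains: list[str]) -> list[str]:
    """Return violations for a list of domain codes, one per selected item."""
    violations = []
    for domain, (lo, hi) in BLUEPRINT.items():
        n = selected_domains.count(domain)
        if not lo <= n <= hi:
            violations.append(f"{domain}: {n} items, expected {lo}-{hi}")
    return violations

form = ["RP"] * 12 + ["NS"] * 10 + ["EE"] * 9 + ["G"] * 10 + ["SP"] * 9  # 50 items
print(check_blueprint(form) or "form meets blueprint")
```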

Science. Construction of test blueprints for the New Hampshire College and Career Ready Standards in science presents a special challenge due to the cluster design of test items and the very large number of standards to be assessed. The goal of blueprint construction for these assessments is to ensure that students are administered psychometrically equivalent test forms, with respect both to coverage of disciplinary core ideas (DCIs) and to the distribution of test information, and, at the aggregate level, to ensure that the full range of standards is assessed and that group means for standards are as precise as possible.

Exhibit D1.1-3 presents a sample grade 5 ICCR science blueprint. In this blueprint, each student is administered a summative test form comprising 7 clusters and 12 stand-alone items, plus an embedded field-test slot for administering one field-test cluster or six stand-alone items. Three clusters and approximately 16 items measure standards in the Physical Science DCI, while the Life Science and Earth/Space Science DCIs are each measured by two clusters and approximately 12 items.

Employing a matrix design, multiple test forms are constructed, each conforming to blueprint specifications but measuring different standards within the DCIs. Thus, while each student is administered equivalent numbers of clusters and items for each DCI, students in a classroom will be assessed across the range of standards defining each DCI, resulting in aggregate measures of science achievement that measure the entire content domain defined by the New Hampshire College and Career Ready Standards. Moreover, for larger aggregate units, student achievement in science can be meaningfully evaluated at the standard level, helping educators to identify areas of strength and weakness in the science curriculum.

Exhibit D1.1-3: Sample Grade 5 Science Blueprint

                      Operational Item Ranges       Embedded Field-Test Ranges    Total
                      Clusters  Stand-Alone  Items  Clusters  Stand-Alone  Items  Items
Physical Science      3         4            15–17  0–1       0–6          0–6    21–23
Life Science          2         4            11–13  0–1       0–6          0–6    17–19
Earth/Space Science   2         4            11–13  0–1       0–6          0–6    17–19
Total                 7         12           39–41  0–1       0–6          6      45–47

Within each DCI, any individual standard may contribute at most one cluster and one stand-alone item to a form (0–5 operational items and 0–6 embedded field-test items, for a per-standard total of 0–6 items). The standards eligible under each DCI are:

Physical Science: 3-PS2-1, 3-PS2-2, 3-PS2-3, 3-PS2-4, 4-PS3-1, 4-PS3-2, 4-PS3-3, 4-PS3-4, 4-ESS3-1, 4-PS4-1, 4-PS4-3, 5-PS1-1, 5-PS1-2, 5-PS1-3, 5-PS1-4

Life Science: 3-LS2-1, 3-LS4-1, 3-LS4-3, 3-LS4-4, 3-LS1-1, 3-LS3-1, 3-LS3-2, 3-LS4-2, 4-PS4-2, 4-LS1-1, 4-LS1-2, 5-PS3-1, 5-LS1-1, 5-LS2-1

Earth/Space Science: 3-ESS2-1, 3-ESS2-2, 3-ESS3-1, 4-ESS1-1, 4-ESS2-1, 4-ESS2-2, 4-ESS3-2, 5-ESS2-1, 5-ESS2-2, 5-ESS3-1, 5-PS2-1, 5-ESS1-1, 5-ESS1-2


Item Types and Number of Items

AIR offers a wide array of interaction types that go above and beyond those listed in the RFP. We provide sample items in this section that not only demonstrate our capability to provide a variety of machine/AI-scorable items but that also show the functionality of our cutting-edge item authoring tool, AIRCraft. AIRCraft removes constraints that have historically impeded item developers from developing items that engage students in meaningful activities. AIRCraft also allows us to challenge students to apply the skills and knowledge that they have acquired in class.

To understand AIRCraft, set aside the notion of an item type. Think instead about interactions, that is, different ways that students can interact with an item to demonstrate what they know and can do. A multiple-choice item has a single interaction, a selection interaction. Another item might ask a student to draw a graph (graphic-response interaction) or write an equation (equation interaction). Our revolution in test development breaks down the barriers between these types of interactions and enables us to incorporate all of them in a single item. Students can be scored not only on whether they get the right answer, but also on whether they appropriately coordinate the choices made in earlier interactions with the choices made in later ones. This capability allows for multiple solution paths and can challenge students to apply knowledge in meaningful settings to craft multi-step solutions.
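A minimal sketch can convey the idea of interdependent scoring. In the hypothetical two-interaction item below, a student first chooses a rate for a linear model and then plots points; the plotted points are scored against the student's own chosen rate, so different solution paths can earn full credit. The rubric and names are illustrative only.

```python
# Sketch: scoring that carries a choice from one interaction into the next.
def score_item(response: dict) -> dict:
    slope = response["chosen_slope"]        # interaction 1: pick a rate
    points = response["plotted_points"]     # interaction 2: graph the model

    # Assertion 1: the chosen rate is defensible for the task context
    # (suppose the task allows either of two rates).
    valid_rate = slope in (2.0, 4.0)

    # Assertion 2: the plotted points are consistent with the *chosen* rate,
    # whatever it was -- this is the dependency between interactions.
    graph_matches = all(abs(y - slope * x) < 1e-9 for x, y in points)

    return {"valid_rate": valid_rate, "graph_matches_choice": graph_matches}

# Two students take different but internally consistent paths; both earn credit.
print(score_item({"chosen_slope": 2.0, "plotted_points": [(1, 2.0), (3, 6.0)]}))
print(score_item({"chosen_slope": 4.0, "plotted_points": [(1, 4.0), (2, 8.0)]}))
```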

The interaction types currently available are shown in Exhibit D1.2-22. Please see Appendix Q for screen captures of each of the interaction types we offer.

At the same time that AIRCraft facilitates more meaningful, challenging items, it also integrates accessibility into each item, automatically inserting Accessible Rich Internet Applications (ARIA) tags and other accessibility markup needed to have the items comply with Web Content Accessibility Guidelines (WCAG) 2.0 accessibility standards and promote compatibility with assistive technologies. Doing so makes our online tests accessible to virtually all students.

AIRCraft enables test developers to create machine-scored items that challenge students to actually construct something and demonstrate their understanding. Perhaps AIRCraft’s most important innovation is the ability to base scoring on specific features of a student’s response, rather than simply matching correct answers to a student’s selection. As illustrated in the item samples that follow, AIRCraft items embody evidence-centered design; for each item, the scoring is based on an explicit set of assertions that link features of student responses to the specific skill or knowledge that the response demonstrates. AIRCraft is designed for both efficiency and power. Common item interactions are established using templates, and scoring keys or scoring rules can be specified with a few selections from a graphical interface. More complex item interactions require test developers to introduce (sometimes complex) scoring rules. AIRCraft includes an easy-to-use graphical user interface to author complex rubrics for items with minimal effort. The system has built-in tools to unit test the rubric by subjecting it to a variety of inputs. Validation checks are integrated into each step to guide the item-authoring process.
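The rubric unit-testing idea can be pictured with a small example. The sketch below defines a hypothetical equation-scoring rubric that accepts any expression equivalent to 3x + 5, then subjects it to a suite of inputs with known expected scores; the rubric and test cases are illustrative and are not drawn from AIRCraft.

```python
# Sketch: "unit testing" a scoring rubric by running it against a variety of
# candidate responses with known expected outcomes.
def equation_rubric(answer: str) -> bool:
    """Give credit for any expression equivalent to 3x + 5, checked by
    evaluating the response at a few probe points. (Demo only: eval is not
    safe for untrusted input in a real system.)"""
    try:
        return all(abs(eval(answer, {"x": x}) - (3 * x + 5)) < 1e-9
                   for x in (-2.0, 0.0, 1.5))
    except Exception:          # unparseable response earns no credit
        return False

# A small test suite: each case pairs a response with its expected score.
cases = [("3*x + 5", True), ("5 + 3*x", True), ("x*3 + 5", True),
         ("3*x - 5", False), ("not an equation", False)]
for answer, expected in cases:
    assert equation_rubric(answer) is expected, answer
print("rubric passed all", len(cases), "checks")
```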

Sample Mathematics Task

Exhibit D1.2-23 presents an example of the type of machine-scored, multiple-interaction item that test developers can develop using AIRCraft. This sample item is adapted from a classroom activity recommended by Illustrative Mathematics, an organization founded by Bill McCallum.

This item consists of five interactions, and the scoring of the interactions is interdependent: the choices the student makes in one interaction are recognized and taken into account in other interactions. Using this technology, students have great latitude to make choices and demonstrate how they are thinking.


Exhibit D1.2-22: Interaction Types


Exhibit D1.2-23: Sample Mathematics Item

The way the items are scored creates a direct linkage between what the student does and how features of the student's response provide evidence about the knowledge and skills the student has achieved. This approach provides a physical embodiment of evidence-centered design, Mislevy's well-regarded approach to cognitive measurement. It also provides a structure for ensuring and reviewing alignment during test development, and a clear explanation not only of what was measured but also of how and why, supporting test scoring and reporting.

Exhibit D1.2-24 provides a real example of the scoring assertions (evidence statements) that are used to score the sample item above. This sample item assesses student understanding of the concept of functions (8.F.A.1) and graphs of functions via the use of ordered pairs (also 8.F.A.1). It requires students to construct a model of a function given key information (8.F.B.4) and to interpret key parts of the function (8.F.B.4). The item draws upon students' knowledge of different representations of functions, in particular linear functions (all part of Cluster A: define, evaluate, and compare functions). Further, students attend to the Standards for Mathematical Practice by interpreting parts of the model with respect to the context (MP4), reasoning abstractly about a function and generating an algebraic model (MP2), and attending to the structure of a function by focusing on its key parts (MP7). This item assesses knowledge of multiple complex mathematical concepts in one authentic, coherent flow. Students implement several steps of the modeling cycle and make choices in how they construct their models (with both an algebraic function and a graph modeling the function). Our scoring system recognizes the choices the student has made and machine-scores subsequent responses based on those choices.

Exhibit D1.2-24: Sample Scoring Assertions Relating Specific Features from the Student Response to the Skills and Knowledge Being Tested of Which They Provide Evidence

Please note that the item asks the student for five interactions, but the scoring assertions extract seven pieces of information, covering four different standards and three mathematics practices. By inspecting the student response for every meaningful piece of student input, we are able to harvest more information about what the student knows and can do. This provides greater measurement efficiency, supporting shorter tests with more detailed reporting.

Sample English Language Arts Task

Many of AIR’s most innovative English language arts items use scoring assertions, which are indicators of whether a student’s response provides evidence of understanding tied to content standards or elements of those standards. In Exhibit D1.2-25, we present a sample ELA grade 10 item that demonstrates how scoring assertions explicitly tie features of a student response to the standards for which those features provide evidence. In this example, five interactions yield six scoring assertions (see Exhibit D1.2-24), providing evidence of four unique standards across two strands. The passage pair used for the item is a retelling of the Prometheus myth paired with an excerpt from Mary Shelley’s Frankenstein. The item aligns to Reading Literature standards 2, 6, and 9 and Writing standard 9 in the Common Core State Standards.

The multiple interactions in this item provide opportunities for students to peel away the layers of a complex text in order to provide evidence of deeper understanding. Beginning with an analysis of ideas in the text, the interactions bring in more information that allows students to analyze the texts in the context of "ideas beyond the text." The scoring assertions in this item allow us to measure multiple standards by eliciting different types of evidence from the student. While some students may only provide evidence that demonstrates an understanding of the surface-level meaning in the texts, others will provide evidence that they have a full understanding of the texts' subtleties and relationship to ideas outside of the text.

Exhibit D1.2-25: Sample ELA Item


Exhibit D1.2-26: Scoring Assertions

Scoring Assertion (Strand/Standard in parentheses)

1. The student selects only "the negative impact of civilization" as the theme, providing evidence of the ability to determine a theme or central idea of a text. (RL.2)

2. The student selects one or more of the following options, and no others, providing evidence of the ability to determine how a theme is shaped by specific details (RL.2):
    • Supports "the negative impact of civilization" with "I heard of the division of property, of immense wealth and squalid poverty, of rank, descent, and noble blood."
    • Supports "the negative impact of civilization" with "I learned that the possessions most esteemed by your fellow creatures were high and unsullied descent united with riches."
    • Supports "the negative impact of civilization" with "A man might be respected with only one of these advantages, but without either he was considered, except in very rare instances, as a vagabond and a slave, doomed to waste his powers for the profits of the chosen few!"

3. The student correctly supports how the selected theme and sentence support the connection between the creature and the myth, providing evidence that he or she can analyze how an author draws on and transforms source material in a specific work. (RL.9)

4. The student selects only "The creature's search for knowledge only leads to misery" and "The creature is rejected by both his creator and the cottagers," providing evidence of the ability to relate a work of fiction to the seminal ideas of its time. (RL.8.A)

5. The student correctly analyzes how the quote supports the ideas in the myth and the excerpt, providing evidence of the ability to draw evidence from literary texts to support analysis, reflection, and research. (W.9)

6. The student correctly analyzes how the quote supports the ideas in the myth and the excerpt, providing evidence of the ability to analyze a particular point of view or cultural experience reflected in a work of literature from outside the United States and to draw evidence from literary texts to support analysis, reflection, and research. (W.9 and RL.6)

Sample Science Task

We offer a sample item cluster as an example of the clusters that we are building in other states and the capabilities that would be available to the Department under this contract.

Here we present a cluster measuring a middle school level performance expectation related to the cycling of matter and energy in the water cycle. The student will develop a model to explain that solar energy is driving the cycling of water. We begin with a phenomenon: fog regularly forms and then dissipates over the course of a morning in an Oregon valley. The phenomenon is communicated verbally and with an animation, as shown in Exhibit D1.2-27. The introduction and animation appear on the left side of the screen, and the items appear on the right.


Exhibit D1.2-27: Sample Science Item

In this cluster, the student is asked to develop a mathematical model by identifying and graphing three factors that combine to create the phenomenon. Each empty graph has a 24-hour period on the horizontal axis. The period during which the fog is visible is marked on the graph. Using the drop-down menu, the student selects which factors to graph from a list containing distractors. Each graph is heuristic, rather than requiring specific quantities. Even though the student is asked to graph, the scoring rubric is looking for patterns reflecting conceptual understanding rather than mathematical understanding of the phenomenon.

Exhibit D1.2-28 illustrates one of many (virtually infinite) correct answers. The student should graph the amount of sunlight, the temperature, and the proportion of water in the air that is in a gas form. The final item asks the student to indicate the causal sequence of the fog’s formation and dissipation. Note that students can graph the factors in any order, as long as the graphs have the right characteristics (for example, solar energy increasing over the course of the morning as the fog dissipates).


Exhibit D1.2-28: Items and Sample Answer

These interactions are actually a single item, and the scoring depends on the collection of responses rather than any single interaction. Our technology enables a scoring rubric to look across multiple interactions.
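A sketch of such a pattern-based rubric for the fog cluster follows. It checks only qualitative features of the graphs the student draws, for example that each chosen factor increases as the fog dissipates; the factor names and feature checks are hypothetical simplifications of the actual rubric.

```python
# Sketch: shape-based scoring of student-drawn graphs; exact values are
# ignored and only qualitative patterns are checked.
def is_increasing(series: list[float]) -> bool:
    return all(b >= a for a, b in zip(series, series[1:])) and series[-1] > series[0]

def score_fog_graphs(graphs: dict[str, list]) -> dict[str, bool]:
    """graphs maps each factor the student chose to its drawn values over the
    morning, sampled left to right. Only the shape of each curve is checked."""
    required = {"sunlight", "temperature", "water as gas"}
    assertions = {"chose the three causal factors": required <= set(graphs)}
    for factor in required & set(graphs):
        assertions[f"{factor} increases as fog dissipates"] = is_increasing(graphs[factor])
    return assertions

student = {
    "sunlight": [0.0, 0.3, 0.7, 1.0],
    "temperature": [10, 12, 16, 19],
    "water as gas": [0.2, 0.4, 0.7, 0.9],
}
print(score_fog_graphs(student))
```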

Using this approach, we engage students in actual scientific activities—in this case, modeling for the purpose of explanation. The performance expectation calls on students to actually employ a model, and they do that in this cluster. Moreover, they use a model that explains energy and matter transfers within part of the water cycle, thereby weaving in elements of all three dimensions of the performance expectation.

The questions are truly open-ended constructed-response items. These items are also immediately and accurately machine scored. Our tools allow our test developers to develop these sophisticated item clusters without requiring the assistance of software developers.

Finally, the features of the student responses that receive credit, and the inference that the test developer would like to make from that evidence, are explicitly captured as part of the item in the scoring assertions. Exhibit D1.2-29 presents the scoring assertions for this response to this item. These scoring assertions embody evidence-centered design as a physical part of the item.

Exhibit D1.2-29: Scoring Assertions for Fog Cluster

Interaction Types

The reader can see that AIR offers all of the interaction types called for in the RFP, plus many more. The mathematics task alone demonstrates a variety of selected-response interactions, along with an equation interaction, which allows for far more response types than the traditional "gridded" response. In English language arts, our sample task provides selected-response items, including "hot text," which requires the student to interact with the passage in order to link an inference to textual evidence. Short constructed-response items similar to the one in this sample task can be machine scored using our natural language tool, which matches the propositions in a student's response to an explicit rubric. Extended constructed-response items requiring a multi-paragraph response can be machine scored using our AI scoring engine, which is described further in Topic 16 Machine Scored Items. The science task demonstrates our sophisticated, simulation-based items.
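For a sense of what proposition matching involves, the toy sketch below checks a short response against explicit rubric propositions using simple lexical cues. AIR's natural language tool is far more sophisticated; this is only a schematic of the idea, with a hypothetical rubric.

```python
# Very reduced sketch of rubric-proposition matching for a short
# constructed response. Cue words and propositions are made up.
RUBRIC = {  # proposition -> lexical cues that must all appear
    "fog forms when air cools": {"fog", "cool"},
    "sun warms the air": {"sun", "warm"},
}

def match_propositions(response: str) -> list[str]:
    text = response.lower()
    return [prop for prop, cues in RUBRIC.items()
            if all(cue in text for cue in cues)]

print(match_propositions("The sun warms the valley, so the cool fog burns off."))
```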


Number of Items

As we describe in Topic 1 Test Design, AIR proposes to offer our internally developed ICCR item bank to assess student achievement of the New Hampshire College and Career Ready Standards in ELA, mathematics, and science. The ICCR bank provides comprehensive, robust item pools at each grade. The item pools were developed to ensure that ICCR assessments cover the full range and depth of the content standards at the aggregate level for each test administration. The bank grows larger each year as we continue to field test new items across multiple states, and the multi-state participation in ICCR item development and test administration further strengthens the state-to-state comparisons afforded by the common ICCR scale.

Exhibit D1.2-30 provides the number of items that we anticipate having in the operational pools at each grade in ELA and mathematics following field testing in spring 2017. (Note that these item counts do not reflect the attrition that may occur during item data review after field testing in spring 2017.)

In addition, AIR has been developing science item clusters and stand-alone items to assess the three-dimensional science standards. Exhibit D1.2-30 also documents the number of clusters and stand-alone items that we anticipate will be eligible for administration in spring 2018. As we describe in Topic 18 Calibration and Scaling and Topic 19 Equating, we propose to administer the science items in an operational field-test design in which a matrix of fixed forms, each conforming to blueprint specifications, enacts a balanced incomplete block linking design to allow for concurrent calibration of all science items on a common IRT scale.

Exhibit D1.2-30: Item Counts at Each Grade in ICCR Bank Following Spring 2017 Field Testing

Pre-Equated Items Available for Adaptive Testing:

Grade         Mathematics  ELA
3             417          253
4             401          288
5             381          279
6             379          306
7             333          328
8             381          255
9             N/A          322
10            N/A          341
11            N/A          317
High School   1,017        N/A

Science Items Available for Operational Field Testing:

Grade Band    Clusters  Stand-Alone
3−5           40        21
6−8           55        21
High School   67        21

The ELA numbers in Exhibit D1.2-30 show items in the ICCR item bank aligned to Reading Literature, Reading Informational Text, Language, and Speaking and Listening. In addition to these items, the ICCR bank also offers an extensive pool of writing prompts developed to support both informative and explanatory writing as well as opinion and argumentative writing. Exhibit D1.2-31 provides the number of writing prompts at each grade that have been field tested and are part of the operational pool.


Exhibit D1.2-31: Number of Writing Prompts in ICCR at Each Grade (field tested and part of the operational pool)

Grade  Total  Informative/Explanatory  Opinion/Argumentative
3      6      3                        3
4      14     7                        7
5      15     10                       5
6      16     9                        7
7      16     8                        8
8      17     10                       7
9      14     9                        5
10     15     9                        6
11     14     7                        7

Many of these writing prompts are accompanied by operationalized AI scoring models that are already in use in several states. Exhibit D1.2-32 shows the number of prompts at each grade in the ICCR bank with operationalized scoring models. Because we would not be using immediate scoring in year one, the Department could elect to administer some prompts that do not yet have scoring models; we would hand-score these and build scoring models for future administrations.


Exhibit D1.2-32: Number of Writing Prompts with Operationalized Scoring Models

Grade  Number of Writing Prompts
3      6
4      6
5      6
6      6
7      6
8      9
9      8
10     6
11     6
Total  59

In spring of 2018, we anticipate field testing more than 1,200 new items in ELA, more than 400 new items in mathematics, approximately 165 science clusters, and about 65 stand-alone science items. As we mention in previous sections, we believe the Department will find the reading and mathematics item pools more than sufficient to support either our existing blueprint or a similar blueprint based on Department modifications. Our proposed spring 2018 operational field-test design will ensure that the full bank of science items is calibrated and equated to a common scale to support construction of psychometrically equivalent science matrix forms for future test administrations, as well as immediate reporting of test results.

Based on the ICCR model blueprints exhibited in Topic 1, Exhibit D1.2-33 summarizes the number of items that would appear on each assessment component. Please note that in ELA one item would be a text-based writing task.

We understand the state’s desire for meaningful assessments that balance the need for a comprehensive measurement design while mitigating the sometimes onerous burden on schools in terms of testing time. In Topic 1.2 Test Administration we present 85th percentile testing times for Utah’s SAGE summative assessments, which deliver test administrations based on blueprints similar to those we propose for New Hampshire’s statewide assessments. We will work with the Department to finalize a set of blueprints that meet reporting requirements while limiting test administration times.

Exhibit D1.2-33: Number of Items on Each Assessment Component

Assessment Component    Grade        Number of Items on Assessment Component
English Language Arts   Grades 3−8   42
Mathematics             Grades 3−8   50
Science                 Grade 3      32
Science                 Grade 5      37
Science                 High School  46