ACT Aspire® Technical Manual

discoveractaspire.org

2016 Version 1

Preface

The ACT Aspire Technical Manual contains detailed technical information about the ACT Aspire® summative assessment. The principal purpose of the manual is to document the technical characteristics of the assessment in light of its intended purposes. The manual presents the collection of validity evidence that supports appropriate interpretations of test scores and describes the content and psychometric aspects of ACT Aspire. It articulates the test design and development processes, documenting how ACT builds the assessment in line with the validity argument and how construct validity, fairness, and accessibility are attended to throughout the process. Also described are the routine analyses designed to support ongoing, continuous improvement and the research intended to ensure that the program remains psychometrically sound.

ACT endorses and is committed to complying with industry standards and criteria, including the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014). ACT also endorses the Code of Fair Testing Practices in Education (Joint Committee on Testing Practices, 2004), a statement of the obligations to test takers of those who develop, administer, or use educational tests and test data in four areas: developing and selecting appropriate tests, administering and scoring tests, reporting and interpreting test results, and informing test takers. ACT likewise endorses and is committed to complying with the Code of Professional Responsibilities in Educational Measurement (NCME Ad Hoc Committee on the Development of a Code of Ethics, 1995), a statement of professional responsibilities for those involved with various aspects of assessment, including development, marketing, interpretation, and use.

We encourage individuals who want more detailed information on a topic discussed in this manual, or on a related topic, to contact ACT.

Please direct comments or inquiries to the address below:

Research Services
ACT, Inc.
500 ACT Drive
Iowa City, Iowa 52243-0168

© 2016 by ACT, Inc. All rights reserved. ACT Aspire®, the ACT® test, ACT Explore®, ACT Plan®, EPAS®, and ACT NCRC® are registered trademarks, and ACT National Career Readiness Certificate™ is a trademark of ACT, Inc.


Contents

1 General Description of ACT Aspire Assessments and Standards

1.1 Overview
1.2 Purposes, Claims, Interpretations, and Uses of ACT Aspire
1.3 ACT Aspire: Performance Level Descriptors and College and Career Readiness Standards
1.4 ACT Aspire and Standards Alignment

2 ACT Aspire Test Development
2.1 Assessment Design Elements and Construct Coherence
2.1.1 ACT Aspire Item Types
2.1.2 Cognitive Complexity and Depth of Knowledge (DOK)
2.2 ACT Aspire Test Development Processes
2.2.1 Selecting and Training Item Writers
2.2.2 Designing Items that Elicit Student Evidence
2.2.3 Item Review
2.2.4 Field-Testing

3 Assessment Specifications
3.1 Overview
3.2 ACT Aspire Assessment Support Materials
3.2.1 ACT Knowledge and Skills Map
3.2.2 ACT Aspire Exemplar Items
3.3 English Language Arts Assessments Overview
3.3.1 English Test
3.3.1.1 English Framework
3.3.1.2 English Reporting Categories
3.3.1.3 English Item Types


3.3.1.4 English Test Blueprints
3.3.2 Reading Test
3.3.2.1 Reading Framework
3.3.2.2 Reading Reporting Categories
3.3.2.3 Reading Test Complexity and Types of Texts
3.3.2.4 Reading Test Blueprints
3.3.3 Writing Test
3.3.3.1 Writing Task Types by Grade
3.3.3.2 Writing Scoring Rubrics and Reporting Categories
3.4 Mathematics Test
3.4.1 Mathematics Reporting Categories
3.4.2 Calculator Policy
3.4.3 Mathematics Item Types, Tasks, Stimulus
3.4.4 Mathematics Test Blueprints
3.5 Science Test
3.5.1 Science Reporting Categories
3.5.2 Continuum of Complexity
3.5.3 Science Item Types
3.5.4 Science Test Blueprints

4 Item and Task Scoring
4.1 Selected-Response Item Scoring
4.2 Technology-Enhanced Items Scoring
4.3 Constructed-Response Tasks
4.3.1 Performance Scoring Quality Control
4.3.2 Automated Scoring

5 Accessibility
5.1 Development of the ACT Aspire Accessibility Support System
5.2 Test Administration and Accessibility Levels of Support
5.2.1 Understanding Levels of Accessibility Support
5.2.2 Support Level 1: Default Embedded System Tools
5.2.3 Support Level 2: Open Access Tools
5.2.4 Support Level 3: Accommodations
5.2.5 Support Level 4: Modifications
5.3 Accommodations, Open Access and Embedded Tools
5.4 2016 Paper Summative Testing
5.5 2016 Online Summative Testing

6 Test Administration
6.1 Policies and Procedures
6.2 Standardized Procedures


6.3 Selecting and Training Testing Staff
6.3.1 Room Supervisors
6.3.2 Responsibilities of Other Testing Staff
6.3.3 Staff Training Sessions

7 Test Security
7.1 Data Security
7.1.1 Personally Identifiable Information
7.1.2 Security Features of Data Systems
7.1.3 Data Breaches and Remedies
7.1.4 Data Privacy and Use of Student Data
7.2 Test Security and Administration Irregularities

8 ACT Aspire Reporting

9 ACT Aspire Scores
9.1 ACT Aspire Subject Scores
9.2 ACT Aspire Composite Score
9.3 ACT Aspire ELA Score
9.4 ACT Aspire STEM Score
9.5 Reporting Category Scores
9.6 Progress with Text Complexity
9.7 Progress toward Career Readiness Indicator

10 ACT Aspire Score Scale
10.1 ACT Aspire Scaling Philosophy
10.2 Scaling Study
10.2.1 Scaling Study Design
10.2.2 Construction of Scaling Tests
10.2.3 Data Collection Design
10.2.4 Creation of Vertical Scales
10.3 Evaluation of the Vertical Scales
10.4 Results
10.4.1 Scaling Test Raw Scores
10.4.2 Evaluating On-the-Same-Scale Property
10.4.3 Evaluating Constant Conditional Standard Error of Measurement Property
10.4.4 Growth Patterns
10.5 Scaling ACT Aspire Writing

11 ACT Aspire Norms


12 EPAS® to ACT Aspire Concordance
12.1 Method
12.2 Results
12.3 Summary

13 ACT Readiness Benchmarks
13.1 Introduction
13.2 ACT Readiness Benchmarks for English, Mathematics, Reading, and Science
13.2.1 Grades 8–10
13.2.2 Grades 3–7
13.3 ACT Readiness Benchmark for Writing
13.4 ACT Readiness Benchmarks for ELA and STEM
13.5 ACT Readiness Levels
13.6 Readiness Ranges for Reporting Categories

14 ACT Aspire Growth
14.1 Predicted Score Paths
14.1.1 Student Progress Reports
14.1.2 Aggregate Progress Reports
14.1.3 Samples Used to Develop the Predicted Paths
14.1.4 Statistical Models Used to Develop the Predicted Paths
14.1.5 Limitations of the Predicted Paths
14.2 Student Growth Percentiles
14.2.1 Consistency of SGPs across Subject Areas
14.3 ACT Aspire Gain Scores
14.3.1 Interpreting ACT Aspire Gain Scores
14.4 Growth-to-Standards Models with ACT Aspire
14.5 Measurement Error of Growth Scores
14.6 Aggregate Growth Scores for Research and Evaluation
14.6.1 Normative Data for School Mean SGPs
14.6.2 Mean SGP Differences across Student Subgroups
14.7 Additional Growth Modeling Resources

15 Progress toward Career Readiness
15.1 Introduction
15.2 Link ACT NCRC Levels to the ACT Composite Scores
15.3 Link the EPAS Composite Scores to ACT Aspire Composite Scores
15.4 Identify the ACT Aspire Composite Scores Corresponding to Each ACT NCRC Level
15.5 Report Progress toward Career Readiness Using ACT Aspire Cut Scores


16 ACT Aspire Reliability
16.1 Introduction
16.2 Raw Score Reliability
16.3 Scale Score Reliability and Conditional Standard Error of Measurement

17 ACT Aspire Validity
17.1 Interpretations of ACT Aspire Scores
17.2 Validity Evidence
17.2.1 Content-Oriented Evidence
17.2.2 Evidence Regarding Cognitive Processes
17.2.3 Evidence Regarding Internal Structure
17.2.4 Evidence Regarding Relationships with Conceptually Related Constructs
17.2.5 Evidence Regarding Relationships with Criteria
17.2.6 Evidence Based on Consequences of Tests
17.3 Summary

18 Fairness
18.1 Overview
18.2 Fairness During the Testing Process
18.3 Fairness in Test Accessibility
18.4 Fairness in Score Interpretations for Intended Uses

19 ACT Aspire Mode Comparability Study
19.1 Introduction
19.2 Method and Results
19.3 Spring 2013 Comparability Study Demographic Summary Tables
19.4 Mode Effects for Raw Scores on Common Items
19.5 Item-Level Performance across Modes
19.6 Comparisons of Scale Scores across Modes
19.7 Summary

20 ACT Aspire Equating
20.1 Methods and Procedures
20.2 Summary from the Spring 2014 Administration
20.2.1 Results
20.2.2 Standard Error of Equating

Appendices
Appendix A: Illustration of the ACT Aspire Mathematics Assessments
Appendix B: The ACT with Constructed-Response Items
Appendix C: Spring 2013 Scaling Study Demographic Summary Tables
Appendix D: Descriptive Statistics and Reliabilities of Reporting Category Raw Scores


Appendix E: ACT Aspire ELA Content Framework and Alignment: English
Appendix F: ACT Aspire ELA Content Framework and Alignment: Reading

References

2016 Updates
Section 1: Updating ACT Aspire Norms
  ACT Aspire Norm Samples
  Weighting Methodology for ACT Aspire Norms
  Norm Tables and Summary Statistics
Section 2: Updating ACT Aspire STEM Readiness Benchmarks
Section 3: Reliability for Spring 2015 ACT Aspire Forms
Section 4: Changes to the Program for 2016


Tables

Table 3.1. Specification Ranges by Item Type and Reporting Category for Grade 3

Table 3.2. Specification Ranges by Item Type and Reporting Category for Grades 4–5
Table 3.3. Specification Ranges by Item Type and Reporting Category for Grades 6–8
Table 3.4. Specification Ranges by Item Type and Reporting Category for Grade EHS
Table 3.5. Number of Passages by Text Complexity for the ACT Aspire Reading Test by Grade Level
Table 3.6. Specification Ranges by Item Type and Reporting Category for Grades 3–7
Table 3.7. Specification Ranges by Item Type and Reporting Category for Grades 8–EHS
Table 3.8. Items by Passage Type for the ACT Aspire Reading Test
Table 3.9. Tasks by Writing Mode for Each Grade of the Writing Test
Table 3.10. Writing Test Scoring Rubric Domains and Associated Reporting Categories
Table 3.11. Points by Content Category for the Writing Test
Table 3.12. Number of Justification & Explanation Tasks, Raw-Score Points (and Percentage Weight) in Reporting Categories by Grade
Table 3.13. Modeling Minimum Number (and Percentage) of Raw-Score Points by Reporting Category and Grade
Table 3.14. Modeling Actual Average Number of Raw-Score Points for 2 Forms by Grade
Table 3.15. Modeling Actual Average Percentage of Raw-Score Points for 2 Forms by Grade
Table 3.16. Target Percentage of Raw-Score Points by Item Type and Grade
Table 3.17. Target for Raw-Score Points (Percentage of Test) [Percentage of Reporting Category] by DOK and Grade
Table 3.18. Specification Ranges by Item Type and Reporting Category for Grades 3–5
Table 3.19. Specification Ranges by Item Type and Reporting Category for Grades 6–7
Table 3.20. Specification Ranges by Item Type and Reporting Category for Grade 8
Table 3.21. Specification Ranges by Item Type and Reporting Category for EHS
Table 3.22. Stimulus Modes Used on the ACT Aspire Science Test


Table 3.23. ACT Aspire Science College and Career Readiness Knowledge and Skill Domain
Table 3.24. Specification Ranges by Item Type and Reporting Category for Grades 3–5
Table 3.25. Specification Ranges by Item Type and Reporting Category for Grades 6–7
Table 3.26. Specification Ranges by Item Type and Reporting Category for Grades 8–EHS
Table 4.1. Scorer Performance Reports
Table 5.1. Paper Summative Testing Presentation Supports
Table 5.2. Paper Summative Testing Interaction and Navigation Supports
Table 5.3. Paper Summative Testing Response Supports
Table 5.4. Paper Summative Testing General Test Condition Supports
Table 5.5. Online Summative Testing Presentation Supports
Table 5.6. Online Summative Testing Interaction and Navigation Supports
Table 5.7. Online Summative Testing Response Supports
Table 5.8. Online Summative Testing General Test Condition Supports
Table 5.9. Frequency of Use of Accommodated Forms, ACT Aspire, Spring 2015
Table 9.1. Composite Score Descriptive Statistics Based on the 2013 Scaling Study
Table 9.2. Composite Score Effective Weights-Equal Nominal Weights
Table 9.3. ELA Score Effective Weights-Equal Nominal Weights
Table 9.4. STEM Score Effective Weights-Equal Nominal Weights
Table 9.5. Reporting Categories within Each Subject Area
Table 10.1. Data Collection Design of the Scaling Study
Table 10.2. Raw Scores of On-Grade Tests for Two Groups Taking Lower/Upper Level Scaling Tests
Table 10.3. Raw Scores on Common Items for Two Groups Taking Lower/Upper Level Scaling Tests
Table 10.4. Mean and Standard Deviation of Raw Scores on English Scaling Test (ST) by Grade Specific Items
Table 10.5. Mean and Standard Deviation of Raw Scores on Mathematics Scaling Test (ST) by Grade Specific Items
Table 10.6. Mean and Standard Deviation of Raw Scores on Reading Scaling Test (ST) by Grade Specific Items
Table 10.7. Mean and Standard Deviation of Raw Scores on Science Scaling Test (ST) by Grade Specific Items
Table 10.8. English Scale Score Descriptive Statistics Based on Scaling Study by Grade
Table 10.9. Mathematics Scale Score Descriptive Statistics Based on Scaling Study by Grade
Table 10.10. Reading Scale Score Descriptive Statistics Based on Scaling Study by Grade
Table 10.11. Science Scale Score Descriptive Statistics Based on Scaling Study by Grade
Table 10.12. Writing Scale Score Descriptive Statistics Based on Scaling Study by Grade
Table 10.13. Lowest Obtainable Scale Scores (LOSS) and Highest Obtainable Scale Scores (HOSS)


Table 11.1. Percentage of Students Meeting Subject Benchmark in the 2014 Norm Sample
Table 11.2. 2014 ACT Aspire English Norm Group Demographics
Table 11.3. 2014 ACT Aspire Mathematics Norm Group Demographics
Table 11.4. 2014 ACT Aspire Reading Norm Group Demographics
Table 11.5. 2014 ACT Aspire Science Norm Group Demographics
Table 11.6. 2014 ACT Aspire Writing Norm Group Demographics
Table 11.7. 2014 ACT Aspire English Norms: Percentage of Students at or below Each Scale Score
Table 11.8. 2014 ACT Aspire Mathematics Norms: Percentage of Students at or below Each Scale Score
Table 11.9. 2014 ACT Aspire Reading Norms: Percentage of Students at or below Each Scale Score
Table 11.10. 2014 ACT Aspire Science Norms: Percentage of Students at or below Each Scale Score
Table 11.11. 2014 ACT Aspire Writing Norms: Percentage of Students at or below Each Scale Score
Table 12.1. Descriptive Statistics of the Sample for Concordance Analysis
Table 12.2. Demographic Information of the Sample for Concordance Analysis
Table 12.3. Descriptive Statistics of ACT Explore and Concorded ACT Aspire Scores in Cross-Sectional Evaluation Sample
Table 12.4. Descriptive Statistics of ACT Plan and Concorded ACT Aspire Scores in Cross-Sectional Evaluation Sample
Table 12.5. Descriptive Statistics of ACT Explore Grade 8 and ACT Plan Grade 10 Scale Scores in Longitudinal Data
Table 12.6. Descriptive Statistics of Concorded ACT Aspire Scale Scores in Longitudinal Data
Table 12.7. EPAS to ACT Aspire Concordance
Table 13.1. ACT College Readiness Benchmarks and ACT Readiness Benchmarks for Grades 8–10
Table 13.2. Classification Agreement Rate between ACT Aspire and EPAS Benchmarks
Table 13.3. ACT Readiness Benchmarks for Grades 3–7
Table 13.4. ACT Readiness Benchmarks in ELA and STEM for Grades 3–10
Table 13.5. Benchmarks, Low Cuts, and High Cuts for ACT Readiness Benchmarks by Subject and Grade
Table 14.1. Longitudinal Samples Used to Develop Predicted Paths, Predicted ACT Score Ranges, and Student Growth Percentiles for Spring 2016 Reporting
Table 14.2. Correlations of SGPs across Subject Areas
Table 14.3. ACT Aspire Gain Score Means and Standard Deviations
Table 14.4. Distributions of School MGPs, by Subject Area
Table 14.5. SGP Sample Size by Subject Area and Student Subgroup
Table 14.6. Mean SGP by Subject Area and Student Subgroup
Table 15.1. Descriptive Statistics of ACT Composite Scores by ACT NCRC Level


Table 15.2. ACT Composite Scores Corresponding to a 50% Chance of Obtaining Different ACT NCRC Levels
Table 15.3. Descriptive Statistics of the Analysis Sample (N=13,528)
Table 15.4. Grade 11 EPAS and ACT Aspire Composite Scores Corresponding to ACT NCRC Levels
Table 15.5. ACT Aspire Composite Scores Corresponding to a 50% Chance of Obtaining Different ACT NCRC Levels
Table 16.1. Raw and Scale Score Reliability Coefficient Ranges by Grade for Subject Tests and Composite Score from Spring 2013 Special Studies Data
Table 16.2. Raw Score Reliability Coefficient Ranges by Grade for Subject Tests from Spring 2014 Operational Data
Table 16.3. Percentages of Students for Writing Rater Analysis in the Spring 2013 Special Studies by Grade and Demographics
Table 16.4. Writing Test Correlations between Rater 1 and Rater 2 by Domain and by Form from Spring 2013 Special Studies Data
Table 16.5. Writing Test Reliability Coefficients Based on Four Domain Scores from Spring 2013 Special Studies Data
Table 16.6. Scale Score Reliability Coefficient and Standard Error of Measurement by Grade and Subject from Spring 2014 Operational Data
Table 17.1. English ACT Aspire Online and Paper Mode Exploratory Factor Analysis Eigenvalues and Percentage of Variance Explained, by Grade Level, Using Data from the Spring 2013 Comparability Study
Table 17.2. Mathematics ACT Aspire Online and Paper Mode Exploratory Factor Analysis Eigenvalues and Percentages of Variance Explained, by Grade Level, Using Data from the Spring 2013 Comparability Study
Table 17.3. Reading ACT Aspire Online and Paper Mode Exploratory Factor Analysis Eigenvalues and Percentages of Variance Explained, by Grade Level, Using Data from the Spring 2013 Comparability Study
Table 17.4. Science ACT Aspire Online and Paper Mode Exploratory Factor Analysis Eigenvalues and Percentages of Variance Explained, by Grade Level, Using Data from the Spring 2013 Comparability Study
Table 17.5. English Test Fit Statistics for One- and Two-Factor Models Using Data from the 2013 ACT Aspire Mode Comparability Study
Table 17.6. Mathematics Test Fit Statistics for One- and Two-Factor Models Using Data from the 2013 ACT Aspire Mode Comparability Study
Table 17.7. Reading Test Fit Statistics for One- and Two-Factor Models Using Data from the 2013 ACT Aspire Mode Comparability Study
Table 17.8. Science Test Fit Statistics for One- and Two-Factor Models Using Data from the 2013 ACT Aspire Mode Comparability Study
Table 17.9. Two-Factor Exploratory Factor Model Loadings for Four ACT Aspire Forms Identified as Showing Evidence of Two Factors Based on Analysis of Fit Statistics
Table 17.10. Summary Statistics for ACT Aspire Scale Scores by Grade
Table 17.11. Correlations among ACT Aspire Scale Scores by Grade


Table 17.12. Percentages of Students in the ACT Explore, ACT Plan, and ACT Aspire Matched Sample by Subject, Grade and Demographics
Table 17.13. Descriptive Statistics for ACT Explore, ACT Plan and ACT Aspire Scale Scores
Table 17.14. Correlations between ACT Explore/ACT Plan Scale Scores and ACT Aspire Scale Scores
Table 17.15. Disattenuated Correlations between ACT Explore/ACT Plan Scale Scores and ACT Aspire Scale Scores
Table 17.16. Scale Score Reliabilities
Table 17.17. Multitrait-Multimethod Matrices for ACT Explore/Plan and ACT Aspire Scale Scores by Grade Level
Table 17.18. Sample Sizes by Grade and Subject for the Sample of Students with Scores on ARMT+ and ACT Aspire in Spring 2013
Table 17.19. Descriptive Statistics for ARMT+ and ACT Aspire Scale Scores
Table 17.20. Correlations between ARMT+ Scale Scores and ACT Aspire Scale Scores
Table 17.21. Disattenuated Correlations between ARMT+ Scale Scores and ACT Aspire Scale Scores
Table 17.22. Multitrait-Multimethod Matrices for ARMT+ and ACT Aspire Scale Scores by Grade Level
Table 17.23. Disattenuated Correlations between ARMT+ and ACT Aspire Scale Scores by Grade Level
Table 19.1. Percentage of Items Different across Online and Paper Forms
Table 19.2. Sample Sizes by Grade and Subject
Table 19.3. Percentages of Students Taking English Tests in the Comparability Study by Grade, Mode and Demographics
Table 19.4. Percentages of Students Taking Mathematics Tests in the Comparability Study by Grade, Mode and Demographics
Table 19.5. Percentages of Students Taking Reading Tests in the Comparability Study by Grade, Mode and Demographics
Table 19.6. Percentages of Students Taking Science Tests in the Comparability Study by Grade, Mode and Demographics
Table 19.7. Percentages of Students Taking Writing Tests in the Comparability Study by Grade, Mode and Demographics
Table 19.8. English Common-Item Raw Score Summary for Online and Paper Modes
Table 19.9. Mathematics Common-Item Raw Score Summary for Online and Paper Modes
Table 19.10. Reading Common-Item Raw Score Summary for Online and Paper Modes
Table 19.11. Science Common-Item Raw Score Summary for Online and Paper Modes
Table 19.12. Writing Common-Item Raw Score Summary for Online and Paper Modes
Table 19.13. ACT Aspire Spring 2013 Comparability Study P-values for English Early High School Items Flagged in Delta Plots
Table 19.14. English Scale Score Summary for Online and Paper Modes
Table 19.15. Mathematics Scale Score Summary for Online and Paper Modes


Table 19.16. Reading Scale Score Summary for Online and Paper Modes
Table 19.17. Science Scale Score Summary for Online and Paper Modes
Table 19.18. Writing Scale Score Summary for Online and Paper Modes
Table 20.1. Percentages of Students Taking English Online Forms in the Spring 2014 Equating Study by Grade and Demographics
Table 20.2. Percentages of Students Taking English Paper Forms in the Spring 2014 Equating Study by Grade and Demographics
Table 20.3. Percentages of Students Taking Mathematics Online Forms in the Spring 2014 Equating Study by Grade and Demographics
Table 20.4. Percentages of Students Taking Mathematics Paper Forms in the Spring 2014 Equating Study by Grade and Demographics
Table 20.5. Percentages of Students Taking Reading Online Forms in the Spring 2014 Equating Study by Grade and Demographics
Table 20.6. Percentages of Students Taking Reading Paper Forms in the Spring 2014 Equating Study by Grade and Demographics
Table 20.7. Percentages of Students Taking Science Online Forms in the Spring 2014 Equating Study by Grade and Demographics
Table 20.8. Percentages of Students Taking Science Paper Forms in the Spring 2014 Equating Study by Grade and Demographics
Table B.1. Number of Constructed-Response Items and Number of Score Points Included in the On-Grade ACT with Constructed-Response Test Forms in the Scaling Study
Table C.1. Percentages of Students Taking English Tests in the Scaling Study by Grade and Demographics
Table C.2. Percentages of Students Taking Mathematics Tests in the Scaling Study by Grade and Demographics
Table C.3. Percentages of Students Taking Reading Tests in the Scaling Study by Grade and Demographics
Table C.4. Percentages of Students Taking Science Tests in the Scaling Study by Grade and Demographics
Table C.5. Percentages of Students Taking Writing Tests in the Scaling Study by Grade and Demographics
Table D.1. English Reporting Category Raw Score Descriptive Statistics and Reliabilities from Spring 2014 Test Administration
Table D.2. Mathematics Reporting Category Raw Score Descriptive Statistics and Reliabilities from Spring 2014 Test Administration
Table D.3. Reading Reporting Category Raw Score Descriptive Statistics and Reliabilities from Spring 2014 Test Administration
Table D.4. Science Reporting Category Raw Score Descriptive Statistics and Reliabilities from Spring 2014 Test Administration
Table U.1. Sample Sizes Used for Norm Establishment
Table U.2. Correlation of District Mean ACT Composite Scores with District Mean ACT Aspire Score
Table U.3. Target Public School Population and ACT Public School Population

Table U.4. One Example of ACT Aspire Norm Sample Weighting: Grade 5 English  . . . . . . . . . . . U.6

Table U.5. 2015 ACT Aspire Weighted English Norm Group Demographics  . . . . . . . . . . . . . . . . . U.9

Table U.6. 2015 ACT Aspire Weighted Mathematics Norm Group Demographics  . . . . . . . . . . . U.11

Table U.7. 2015 ACT Aspire Weighted Reading Norm Group Demographics  . . . . . . . . . . . . . . . U.13

Table U.8. 2015 ACT Aspire Weighted Science Norm Group Demographics  . . . . . . . . . . . . . . . . U.15

Table U.9. 2015 ACT Aspire Weighted Writing Norm Group Demographics  . . . . . . . . . . . . . . . . . U.17

Table U.10. 2015 ACT Aspire English Norms: Percentage of Students at or below Each Scale Score  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U.19

Table U.11. 2015 ACT Aspire Mathematics Norms: Percentage of Students at or below Each Scale Score  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U.21

Table U.12. 2015 ACT Aspire Reading Norms: Percentage of Students at or below Each Scale Score  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U.23

Table U.13. 2015 ACT Aspire Science Norms: Percentage of Students at or below Each Scale Score  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U.25

Table U.14. 2015 ACT Aspire Writing Norms: Percentage of Students at or below Each Scale Score  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U.27

Table U.15. National Average Scale Scores Using 2015 Norm Sample  . . . . . . . . . . . . . . . . . . . . . U.28

Table U.16. Percentage of Students Who Met Subject Benchmark in the 2015 Norm Sample  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U.29

Table U.17. New ACT Readiness Benchmarks for ACT Aspire STEM Scores  . . . . . . . . . . . . . . . U.31

Table U.18. Cronbach’s Alpha of ACT Aspire Forms Administered in Spring 2015  . . . . . . . . . . U.32

Table U.19. ACT Aspire Summative Testing Time Changes (in minutes)  . . . . . . . . . . . . . . . . . . . . . U.34

Table U.20. English: Number of Items by Item Type  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U.36

Table U.21. English: Number of Points by Reporting Category  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U.36

Table U.22. Mathematics: Number of Items by Item Type  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U.36

Table U.23. Mathematics: Number of Points by Reporting Category  . . . . . . . . . . . . . . . . . . . . . . . . U.37

Figures

Figure 1.1. The full picture: evidence and validity.  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2

Figure 2.1. ACT Aspire field testing model  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5

Figure 3.1. ACT Aspire English test reporting with example skill targets for one reporting category.  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3

Figure 3.2. ACT Aspire reading test reporting categories with example skill targets for one reporting category  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7

Figure 3.3. ACT Aspire reading test item types.  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.9

Figure 3.4. ACT Aspire Writing test modes and reporting categories  . . . . . . . . . . . . . . . . . . . . . . . . 3.13

Figure 3.5. Mathematics Domains, and Primary Currents in the Flow Across Grades  . . . . . . . . 3.20

Figure 3.6. Foundation and Grade Level Progress (illustrated for Grade 7)  . . . . . . . . . . . . . . . . . . 3.21

Figure 3.7. ACT and ACT Aspire science content complexity continuum.  . . . . . . . . . . . . . . . . . . . . 3.32

Figure 4.1. Components of student writing performance.  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.6

Figure 5.1. Accessibility feature mapping process.  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3

Figure 5.2. ACT Aspire Levels of Accessibility  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4

Figure 5.3. Default embedded system tools.  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6

Figure 5.4. Open access tools.  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.7

Figure 5.5. Accommodations  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.8

Figure 5.6. Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.9

Figure 10.1. Construction of the ACT Aspire scaling tests  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.3

Figure 10.2. Effect sizes on individual scaling test raw score scale and whole scaling test raw score scale derived from the 2PL model  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.9

Figure 10.3. Scatter plots of average scale scores against scaling test raw scores by grade  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.17

Figure 10.4. Conditional standard error of measurement on raw and scale score scales  . . . 10.22

Figure 10.5. Effect sizes on individual scaling test raw score scale, whole scaling test raw score scale, and ACT Aspire score scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.26

Figure 12.1. Scatter plots of ACT Aspire scale scores versus EPAS scale scores  . . . . . . . . . . . 12.5

Figure 12.2. Distributions of ACT Explore scale scores and concorded ACT Aspire scale scores  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.8

Figure 12.3. Distributions of ACT Plan scale scores and concorded ACT Aspire scale scores  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.10

Figure 12.4. Box plots of ACT Plan Scores (or concorded ACT Aspire scale scores) conditional on ACT Explore Scores (or concorded ACT Aspire Scores)  . . . . . . . . . . . . . . . . . . . . 12.12

Figure 14.1. Prototype of ACT Aspire Student Progress Report  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.2

Figure 14.2. Prototype ACT Aspire Aggregate Progress Report  . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.3

Figure 14.3. ACT Aspire gain score statistics for grade 7–8 reading  . . . . . . . . . . . . . . . . . . . . . . . . 14.8

Figure 14.4. Distributions of school MGPs, by subject area  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.11

Figure 15.1. Sample Progress toward Career Readiness indicator from ACT Aspire report  . . 15.2

Figure 17.1. Scree plots of ACT Aspire online and paper forms by subject and grade using data from the spring 2013 comparability study  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.10

Figure 17.2. Box plots of ACT Explore or ACT Plan English scale scores conditional on ACT Aspire English scale scores  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.38

Figure 17.3. Box plots of ACT Explore or ACT Plan mathematics scale scores conditional on ACT Aspire mathematics scale scores  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.39

Figure 17.4. Box plots of ACT Explore or ACT Plan reading scale scores conditional on ACT Aspire reading scale scores  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.41

Figure 17.5. Box plots of ACT Explore or ACT Plan science scale scores conditional on ACT Aspire science scale scores  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.42

Figure 17.6. Box plots of ACT Explore or ACT Plan Composite scale scores conditional on ACT Aspire Composite scale scores  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.44

Figure 17.7. Box plots of ARMT+ mathematics scale scores conditional on ACT Aspire mathematics scale scores  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.52

Figure 17.8. Box plots of ARMT+ reading scale scores conditional on ACT Aspire reading scale scores  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.55

Figure 17.9. Box plots of ARMT+ science scale scores conditional on ACT Aspire science scale scores  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.58

Figure 19.1. Cumulative percentage of students for common item raw scores across modes  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19.15

Figure 19.2. English common item mean scores for paper and online forms by item type*  . 19.28

Figure 19.3. Mathematics common item mean scores for paper and online forms by item type*  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19.30

Figure 19.4. Reading common item mean scores for paper and online forms by item type*  19.32

Figure 19.5. Science common item mean scores for paper and online forms by item type*  19.34

Figure 19.6. Delta plots for online and paper common items with predicted delta and one- and two-standard error bands.  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19.36

Figure 19.7. Cumulative percentage of students for scale scores across modes  . . . . . . . . . . 19.50

Figure 19.8. Box plots of scale scores by grade and mode for each subject area  . . . . . . . . . . 19.60

Figure 20.1. Average English scale scores against common item raw scores  . . . . . . . . . . . . . 20.13

Figure 20.2. Average mathematics scale scores against common item raw scores  . . . . . . . . 20.14

Figure 20.3. Average reading scale score against common item raw scores  . . . . . . . . . . . . . . 20.15

Figure 20.4. Average science scale scores against common item raw scores  . . . . . . . . . . . . . 20.16

Figure 20.5. Unrounded scale score standard error of equating for selected subjects and grades  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20.17

Figure A.1. Mathematics test content specifications and flow  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A.1

Figure U.1. Conceptual diagram of chain-linked weighting strategy  . . . . . . . . . . . . . . . . . . . . . . . . . . U.8

Chapter 1

General Description of ACT Aspire Assessments and Standards

1.1 Overview

The ACT Aspire® summative assessment program includes a vertically scaled battery of achievement tests designed to measure student growth in a longitudinal assessment system for grades 3–Early High School (EHS) in English, reading, writing, mathematics, and science. Taken as individual subject tests or as a battery, ACT Aspire can be delivered via computer or as a paper-and-pencil administration. The scale scores are linked to college and career data through scores on the ACT® test and the ACT National Career Readiness Certificate™ (ACT NCRC®) program. To enhance score interpretation, reporting categories for ACT Aspire use the same terminology as the ACT College and Career Readiness Standards (ACT CCRS) and other standards that target college and career readiness, including the standards of many states and the Common Core State Standards (CCSS).

1.2 Purposes, Claims, Interpretations, and Uses of ACT Aspire

The purpose of the ACT Aspire assessments is to measure student achievement and progress toward college and career readiness. To identify college and career readiness constructs, ACT uses empirical and performance data to define requisite constructs in the content areas of English, reading, writing, mathematics, and science. Every three years, ACT administers the ACT National Curriculum Survey study to identify what kindergarten through postsecondary teachers, including instructors of entry-level college and workforce-training courses, expect of their entering students; this includes the most current knowledge and skills students need to demonstrate to be ready for entry-level postsecondary courses and jobs. ACT also collects data about what is actually being taught in elementary, middle, and high school classrooms. Taken together, these results support ACT Aspire’s curriculum base and ACT’s ability to identify the key skill targets and knowledge that are most important for students to know to be ready for the next step. ACT uses these and other research results to design assessments and inform test blueprints so that each test targets the most important college and career readiness skills across skill progressions.

Additional validation of the constructs occurs through ACT’s longitudinal research (see Figure 1.1).

[Figure 1.1 is a diagram connecting content validity inputs (the ACT National Curriculum Survey, academic research, and the College and Career Readiness Standards) to predictive validity evidence (ACT WorkKeys performance levels, career success, job profiles, college course grades, ACT Aspire Readiness Benchmarks, and ACT College Readiness Benchmarks) through a cycle: identify the skills that lead to college and career readiness, design tasks that elicit valid evidence on these skills, design test forms that produce reliable measurements, build forms, administer forms, score results, report instructionally actionable reporting categories, and describe what students can do at each benchmark.]

Figure 1.1. The full picture: evidence and validity.

Figure 1.1 shows how ACT uses research, including empirical feedback loops, to help inform continuous improvement. ACT Aspire development starts by thoroughly understanding the most important requirements and the most desired evidence needed to support inferences about college and career readiness, and tests are designed specifically to elicit the identified evidence. Subject matter experts (SMEs), item writers, and other educators work collaboratively to write items and review them for accuracy, appropriateness, and fairness. Operational forms are assembled from pretested items and are built to test specifications. Scores are calculated and reported on various reports designed for each audience. Instructionally relevant reporting categories provide immediate, actionable information about how students perform on key skills at a more granular level. Additional psychometric work continues, including analyzing student performance along the Kindergarten to Early High School (K–EHS) continuum. Students who meet readiness score benchmarks at one grade level are compared to students who succeed in the next grade. This information is used to ensure the benchmarks are accurate and up-to-date. Through research and feedback loops, ACT ensures that key targets are identified and measured appropriately across the continuum.

ACT Aspire was created by employing a theory of action (TOA) approach that fuses content validity (academic research) with predictive validity (empirical data), following methodologies similar to those used to build the ACT. The TOA begins by answering fundamental questions about the purpose of the assessment, such as: Who are the intended users? What are the intended uses of the assessment results? What are the consequences that may result from using the assessment? What are the intended interpretations or claims based on the assessment? What are measurable outcomes from using the assessment? The answers to these questions emerge from rigorous research and data collection that inform and allow for the identification of high-value skill targets in each subject area, resulting in focal points for the development of tasks and test forms. The procedure set forth by the TOA further gives rise to hypothesized mechanisms or processes for bringing about the intended goals of the assessment results. For example, cognitive labs, piloting, and field testing are used to validate results and to iteratively improve the specifications and design of the assessment. Operational results are used to continuously improve the components of the assessment.

Artifacts of the assessment architecture emerge from the research and data collection process to ensure that items and test forms elicit the intended evidence to support the claims made by the assessment. For example, content and item specifications, test blueprints, benchmarks, and performance level descriptors (PLDs) influence the technical quality and output of test items and forms. These artifacts are informed by several factors, including:

• Academic research on skill targets, sequencing of skills, and grade level placement

• Data and evidence of student understanding collected from the assessments

• The ACT National Curriculum Survey

• Survey of standards frameworks, including, but not limited to, the ACT CCRS, the CCSS, and the Next Generation Science Standards

• Subject matter experts (SMEs)

The principal claims, interpretations, and uses of ACT Aspire are the following:

1. To measure student readiness on an empirically derived college readiness trajectory. (Note that students taking the ACT Aspire battery will receive scores that can be compared to ACT Readiness Benchmarks that are linked to the ACT.)

2. To measure student readiness on a career readiness trajectory.

The secondary claims, interpretations, and uses of ACT Aspire are the following:

1. To provide instructionally actionable information to educators. Data from the ACT Aspire summative assessment can be used to identify areas of student strength and weakness in content areas at student, classroom, and school levels. Data can inform instruction and facilitate the identification of interventions.

2. To provide empirical data for inferences related to accountability. ACT Aspire data can be one of multiple measures for making inferences about student progress and growth with respect to college and career readiness in reading, language, writing, mathematics, and science, as reported in terms of ACT CCRS. ACT Aspire can also be used for accountability reporting where college and career readiness standards (such as CCSS and others) have been adopted.

3. To provide empirical support for inferences about international comparisons. Building on ACT research that marks the ACT College Readiness Benchmarks as internationally competitive, ACT Aspire scores will also be linked to scores on the Programme for International Student Assessment (PISA) for comparisons on an international scale. Results will be available in the near future.

1.3 ACT Aspire: Performance Level Descriptors and College and Career Readiness Standards

ACT Aspire assessments are aligned with leading frameworks of content standards that target college and career readiness. ACT Aspire Performance Level Descriptors (PLDs) have been created to provide specific descriptions of student performance within and across grades. The PLDs were developed in a 2016 study run by an independent facilitator; over 90 subject-matter experts (SMEs) from 14 states drafted the PLD statements. The PLDs at each grade are organized into four performance levels: In Need of Support, Close, Proficient, and Exceeding. The cut score for the Proficient level is based on the ACT Readiness Benchmark at that grade. The ACT Readiness Benchmarks were derived empirically by working backwards from the ACT College and Career Readiness Benchmark scores, which indicate the level of achievement required to be on target for success at the next level (see Chapter 13 for details). In the 2016 PLD study, SMEs reviewed ACT Aspire materials, performance data, and administered test items to generate statements that describe what students know and are able to do within each category in each content area and grade. PLDs are found at https://www.discoveractaspire.org/performance-level-descriptors/.
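Chapter 13 documents the actual benchmark derivation. Purely to illustrate the general notion of backmapping, the sketch below assumes that a grade-level benchmark can be located where students have roughly a 50% chance of meeting the next level’s benchmark, estimated with a logistic model; that criterion, the function names, and all data are assumptions made for this illustration and are not the documented procedure.

# Illustrative sketch of backmapping a readiness benchmark
# (hypothetical data and a 50% criterion assumed for illustration;
# see Chapter 13 for the documented derivation).
import numpy as np
from sklearn.linear_model import LogisticRegression

def backmap_benchmark(scores, met_next_benchmark):
    """Estimate the score at which students have ~50% probability of
    meeting the benchmark at the next level."""
    X = np.asarray(scores, dtype=float).reshape(-1, 1)
    y = np.asarray(met_next_benchmark, dtype=int)
    model = LogisticRegression().fit(X, y)
    # The 0.5-probability point is where intercept + slope * score = 0.
    return -model.intercept_[0] / model.coef_[0][0]

# Hypothetical longitudinal records: grade-level scale scores paired
# with whether each student later met the next level's benchmark.
scores = [415, 418, 420, 422, 424, 426, 428, 430, 432, 434]
met    = [0,   0,   0,   0,   1,   0,   1,   1,   1,   1]
print(round(float(backmap_benchmark(scores, met))))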

In addition, ACT Aspire assessments in grade 8 and Early High School are aligned with the ACT College and Career Readiness Standards (CCRS). The ACT CCRS, developed for each content test, are descriptions of the skills and knowledge that ACT has empirically linked to readiness in postsecondary education and the world of work. Different groups of SMEs developed the ACT CCRS statements by synthesizing the domain-specific knowledge and skills demonstrated by students in particular score bands, drawing on thousands of students’ scores. Within each content area, the CCRS are organized by score band and by strand, with strands mirroring the reporting categories featured in ACT Aspire. The ACT CCRS are organized into six achievement levels; each level is defined by a 3-point score band. The CCRS also indicate the ACT College Readiness Benchmarks in each content area (except on the optional writing test) that were derived using test performance and college placement data. The ACT College Readiness Benchmark scores are clearly delineated on the CCRS tables (see the ACT Technical Manual for more information on the derivation of the ACT CCRS and the ACT College Readiness Benchmarks).

Science learning standards currently vary dramatically across states and organizations. ACT Aspire science assessments for grades 3–7 are associated with the draft ACT Aspire Science College and Career Readiness Knowledge and Skill Domain (see Table 3.23). SMEs developed these statements by back-mapping the ACT CCRS down to grade 3. Content and measurement specialists then consulted several sources, including ACT’s evidence, research in science education, the National Science Education Standards, NAEP findings, and input from nationally known SMEs in elementary-level science education. These statements were then used to develop the ACT Aspire PLDs. The ACT Aspire PLDs provide a greater level of specificity in describing student achievement across grades and at different scale-score levels.

1.4 ACT Aspire and Standards Alignment

Many sets of standards that target college and career readiness exist, including the ACT CCRS, the CCSS, and unique standards from states across the United States. The purpose of these standards is to articulate the requisite knowledge and skills needed to prepare students for postsecondary and career success. As previously described, ACT Aspire assessments are designed to measure progress and provide evidence to back up the claim that students are on target for college and career readiness.

To demonstrate alignment, evidence must be provided that assessment scores support inferences about student achievement and content understanding identified by a given set of expectations (often articulated by a state’s standards). Some assessments are designed to measure a single set of content standards; alignment is sometimes assumed in those cases. Sometimes alignment is demonstrated by asking SMEs to examine several content test forms and standards and judge whether the content standards are covered adequately by the test forms. ACT Aspire has been involved in several SME-based alignment studies, with each study demonstrating slightly different results depending on the standards examined and on the SMEs who participated.

Strong empirical evidence supports interpreting ACT Aspire scores as measures of college and career readiness. ACT score scales for English, mathematics, reading, and science have been empirically linked to ACT Aspire scores and to the ACT College Readiness Benchmarks, which relate scores on the ACT assessment directly to performance in college courses. ACT Aspire Readiness Benchmark scores have been backmapped from student performance data that support inferences of a student being on target for readiness. An empirical link has also been established between the ACT Aspire Composite score and the ACT National Career Readiness Certificate (ACT NCRC), which provides information about student achievement and employable skills in reading, math, and locating information. These research-based connections to real-world college and career performance contribute direct evidence of alignment from ACT Aspire assessments to college and career readiness skills and knowledge. Therefore, ACT Aspire assessments exhibit strong alignment with standards that target college and career readiness.

ACT has a historical connection to the CCSS, a set of academic standards adopted by many states. In 2008, the National Governors Association (NGA) and the Council of Chief State School Officers (CCSSO) invited ACT and other partners to participate in the development of a set of research-supported college and career readiness standards. ACT’s role included providing data, empirical research, and staff expertise about what constitutes college and career readiness. Because the CCSS drew on some of the same research, significant overlap exists between the CCSS and the college and career readiness constructs used to guide the development of ACT Aspire.

ACT closely monitors and has contributed to the creation of the Next Generation Science Standards (NGSS), which are being implemented in some states. ACT science content experts shared ACT research on science curricula and career and college readiness with NGSS developers early in the NGSS drafting process. Considerable alignment between ACT Aspire and the NGSS exists; however, the ACT Aspire assessments are not designed specifically to assess the NGSS. As previously described, the ACT Aspire science assessments are based on ACT research on current curricula at the elementary, middle, and high school levels as well as ACT research on college and career readiness.

Some areas of achievement are not measured by ACT Aspire summative assessments. Specifically, any standards that require extended time, advanced uses of technology, active research, peer collaboration, producing evidence of a practice over time, or speaking and listening are not currently assessed on ACT Aspire summative assessments.

Further information on the alignment of ACT assessments can be found in the document “How ACT Assessments Align with State College and Career Readiness Standards,” available at: http://www.discoveractaspire.org/pdf/ACT_Alignment-White-Paper.pdf

Chapter 2

ACT Aspire Test Development

2.1 Assessment Design Elements and Construct Coherence

ACT Aspire tests are designed to measure student achievement and progress toward college and career readiness. A principal philosophical basis for a longitudinal system of tests such as ACT Aspire is that readiness is best assessed by measuring, as directly and authentically as possible, the academic knowledge and skills that students will need in order to be successful. Coherence across the learning trajectory of college and career readiness constructs is required to support inferences about individual and aggregate growth. Skills and understanding develop over time, which means that students must learn new material and must also apply what they have previously learned in more sophisticated ways. ACT Aspire assessments are designed to measure key college and career readiness constructs in a way that recognizes that knowledge and skills are not isolated to specific grades but rather progress across grades. Items and tasks are designed to elicit evidence about constructs learned primarily in a particular grade and also to collect evidence about the skill progression across grades. From a measurement perspective, this design is an important component of supporting a vertical scale, thus reinforcing inferences about growth. This design is also important for developing actionable reports for students across all achievement levels.

Although sampling constructs across grades occurs in all the subject tests, it can be illustrated most clearly by examining the design of the mathematics test. In each form, certain items contribute to Foundation scores. Foundation items are explicitly designed to elicit evidence about a concept that was introduced in a previous year but that the student must now apply in a more advanced context. An illustration of this for the mathematics test is provided in Appendix A. These items are important for understanding where a student in the current grade actually falls on the learning trajectory.
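As a minimal sketch of how such a design can be scored, the fragment below tags each item as a Foundation item or a grade-level item and accumulates two subscores. The item identifiers, tags, and point values are invented for illustration and are not ACT Aspire’s operational data structures.

# Hypothetical sketch: Foundation items feed a separate subscore from
# grade-level items. All names and values are invented for illustration.
items = [
    {"id": "M01", "tag": "grade-level", "points": 1},
    {"id": "M02", "tag": "foundation",  "points": 1},  # concept from a prior
    {"id": "M03", "tag": "foundation",  "points": 2},  # grade, applied anew
    {"id": "M04", "tag": "grade-level", "points": 1},
]
responses = {"M01": 1, "M02": 1, "M03": 2, "M04": 0}  # points earned

foundation_score = sum(responses[i["id"]] for i in items
                       if i["tag"] == "foundation")
grade_progress_score = sum(responses[i["id"]] for i in items
                           if i["tag"] == "grade-level")
print(foundation_score, grade_progress_score)  # 3 1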

As illustrated in Appendix B, the ACT Aspire assessments are designed to be developmentally and conceptually linked across grade levels. To clearly reflect and help interpret that linkage, the reporting categories are associated across all grade levels. The ACT Aspire scores are on the same vertical score scale for each subject. Higher grade levels of ACT Aspire assessments are taken by students who are closest in age and skill development to students taking the ACT and ACT NCRC, and, therefore, these scores have an even stronger empirical bridge to inferring levels of college and career readiness.

2.1.1 ACT Aspire Item Types

The goal of assessment is to collect relevant evidence from the student as authentically as possible while sampling enough of the construct to support inferences based on the student's responses. ACT Aspire uses multiple item types to achieve this goal. The ACT Aspire assessments measure student knowledge and abilities with the following item types.

• Selected-response/multiple-choice (SR/MC) items require the examinee to select a single correct response among several provided choices.

• Constructed-response (CR) tasks require students to generate their own response to the questions.

• Technology-enhanced (TE) items and tasks incorporate computer interfaces to ask questions and pose scenarios that are not possible in traditional paper-based formats. TE items may require students to generate their responses, or they may present students with a set of answer options.

Each ACT Aspire test contains a combination of these item types. ACT Aspire English tests across all grade levels feature selected-response and TE items. The constructed-response component of English is handled in the ACT Aspire writing tests, which feature only constructed response (based on a single writing prompt). For the reading, mathematics, and science tests, each grade-level exam contains selected-response, constructed-response, and technology-enhanced items (with selected-response items replacing TE items on paper forms).

2.1.2 Cognitive Complexity and Depth of Knowledge (DOK)

The cognitive complexity level of written passages and the cognitive demands of an assessment item are important characteristics to consider when measuring a student’s academic achievement. ACT Aspire assessments reflect expectations that students will need to think, reason, and analyze at high levels of cognitive complexity in order to be college and career ready; items and tasks require the sampling of different levels of cognitive complexity, with most items targeted at upper levels. Due to the wide use of Norman Webb’s depth-of-knowledge terminology, this document describes the cognitive complexity of items using language such as depth of knowledge (DOK). Given the various definitions of DOK levels, understanding ACT’s interpretation of them is important to understanding the test design.

Similar to Webb’s definition, DOK levels are assigned to reflect the complexity of the cognitive process required, not the psychometric “difficulty” of the item. Unlike other DOK interpretations, ACT assigns a DOK level 4 value only to multiday, potentially collaborative classroom activities and assessments designed for learning purposes. By this definition, DOK assignments on any summative assessment (including ACT Aspire) are limited to values of 1 to 3.

ACT’s DOK level 1 corresponds to Webb’s level 1, where students primarily use knowledge and skills with limited extended processing. ACT’s DOK level 2 extends beyond level 1 and involves applying these cognitive processes to many situations, including real-world situations. Therefore, ACT’s DOK level 2 actually aligns with Webb’s DOK level 2 and some of Webb’s DOK level 3. ACT’s DOK level 3 involves situations where the student must apply high-level, strategic thinking skills to short- and long-term situations. Some of these situations are novel, and some require generating something like a graph, but all involve higher-level thinking skills. Given this interpretation, ACT’s DOK level 3 aligns with Webb’s DOK level 3 and DOK level 4.
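The correspondence described above can be summarized in a small lookup structure. The sketch below is merely a restatement of this paragraph in code form, a hypothetical structure for illustration only and not part of any ACT system.

# Summary of the ACT-to-Webb DOK correspondence described above
# (a hypothetical lookup for illustration only).
ACT_TO_WEBB_DOK = {
    1: ("Webb 1",),                    # limited extended processing
    2: ("Webb 2", "part of Webb 3"),   # application to many situations
    3: ("Webb 3", "Webb 4"),           # strategic, higher-level thinking
}
# ACT reserves DOK 4 for multiday classroom activities, so summative
# items carry only levels 1-3.
assert set(ACT_TO_WEBB_DOK) == {1, 2, 3}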

2.2 ACT Aspire Test Development Processes

2.2.1 Selecting and Training Item Writers

Item writers are chosen from applicants who demonstrate adequate qualifications (e.g., practicing teachers, subject specialists, curriculum coordinators, department chairs). Item writers have extensive content and pedagogical knowledge and teach at the grade levels covered by ACT Aspire. These educators are actively engaged in teaching at various levels and at a variety of institutions, from small private schools to large public schools.

ACT recruits item writers who represent the diversity found in the United States with respect to ethnic background, gender, English-language proficiency, and geographic location.

Figure 2.1 illustrates the process flow of item and test form development followed by ACT Aspire. The first three steps of this process are described in the following sections of this chapter, and steps 4–7 are described in subsequent chapters of this technical manual.

2.2.2 Designing Items That Elicit Student Evidence

Item writers are instructed to consider the entire construct when crafting assessment tasks and items. Items are designed to elicit evidence about students’ knowledge and skills that spans the full construct, which, in many cases, goes beyond the language of any specific achievement standard. Item writers use templates that frame what knowledge, skills, and abilities are of greatest interest in construct measurement while calling out unintended knowledge, skills, and abilities that should not be measured. Items must fulfill task template requirements (e.g., content, DOK, word count, accessibility), reflect diversity, and meet fairness standards.

The goal of crafting high-quality items or tasks is to design situations that collect relevant evidence from the student in a manner that is as authentic as possible while sampling enough of the construct to support the inferences based on the student's responses.

2.2.3 Item Review

All items undergo rigorous content reviews by internal and external content experts to ensure that they elicit sufficient student evidence, are developmentally appropriate, and are free of errors in content and context. The process includes internal peer reviews, internal senior-level reviews, and external reviews. The content reviews also ensure that each item or task measures what is intended and functions at the intended DOK. External experts participate in fairness reviews to ensure that items and tasks are fair and not biased against any student demographic.

Initial forms are constructed to conform to test specifications. Once constructed, forms are reviewed by ACT staff and external content panelists. These panelists evaluate items for content accuracy and evaluate the test form for context appropriateness and representation. They also confirm that the form will collect the necessary evidence to support the intended inferences from scores.

After forms are finalized, they are administered operationally and equated. Before students receive scores, the data are carefully checked to ensure items and forms are working as intended.
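Equating methodology is covered in chapters 19 and 20 of this manual. As a generic, minimal illustration of what equating accomplishes (not ACT Aspire's operational method), a linear equating function places new-form scores on a reference form's metric by matching means and standard deviations; all scores below are invented for the example.

# Minimal linear equating sketch: place new-form raw scores on the
# reference form's scale by matching means and standard deviations.
# A generic textbook illustration, not ACT Aspire's operational method.
from statistics import mean, stdev

def linear_equate(score, new_form_scores, ref_form_scores):
    """Linearly transform a new-form score to the reference-form metric."""
    slope = stdev(ref_form_scores) / stdev(new_form_scores)
    return mean(ref_form_scores) + slope * (score - mean(new_form_scores))

new_form = [12, 15, 18, 20, 22, 25]   # hypothetical raw scores, new form
ref_form = [14, 16, 19, 22, 24, 27]   # hypothetical raw scores, reference form
print(round(linear_equate(18, new_form, ref_form), 2))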

2.2.4 Field Testing

ACT Aspire field testing follows an embedded model in which field-test items are spiraled across operational forms. All field-test items are prepared and delivered for administration within the same forms as the operational items. This model allows field-test items to be delivered to some students with accommodations. No specific directions about field-test items are provided to students or test coordinators, except that students and educators are alerted that some items may not count toward their overall score.
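A schematic of the embedded, spiraled design described above follows. The form counts, item labels, and block sizes are hypothetical and chosen only to illustrate the round-robin distribution of field-test blocks across forms, not ACT's operational assembly process.

# Schematic of an embedded field-test design: each operational form
# carries one of several field-test item blocks, spiraled across forms
# (and thus across students) in round-robin fashion. Hypothetical names.
from itertools import cycle

operational_items = [f"OP{i:02d}" for i in range(1, 26)]       # 25 scored items
field_test_blocks = [[f"FT{b}{i}" for i in range(1, 7)]        # 6 unscored items
                     for b in ("A", "B", "C", "D")]            # per block

forms = []
for form_id, ft_block in zip(range(1, 9), cycle(field_test_blocks)):
    # Field-test items are embedded among operational items, so students
    # cannot tell which items are unscored.
    forms.append({"form": form_id, "items": operational_items + ft_block})

for f in forms[:4]:
    print(f["form"], f["items"][-6:])  # show which FT block each form carries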

[Figure 2.1 is a flowchart of the ACT Aspire field testing model, in which ACT’s test development staff coordinate a seven-step flow: item writers; pre-tryout content and fairness reviews; tryout national administration; national forms initial construction; national forms content and fairness reviews; equating administration; and national administration.]

Figure 2.1. ACT Aspire field testing model

Chapter 3

Assessment Specifications

3.1 Overview

Each test’s design includes a specified range of item types (i.e., selected-response, constructed-response, technology-enhanced) and a specified range of item difficulties at targeted depths of knowledge, organized into content-specific reporting categories. The amount and nature of evidence needed to support an inference about a student’s achievement on a given construct are taken into consideration when selecting the types and number of items developed for each reporting category. These requirements are balanced against maintaining manageable administration conditions. ACT Aspire assessments cover topic progressions from foundational concepts to sophisticated applications. Taken as individual subject tests or as a battery, ACT Aspire assessments can be delivered via computer or as a paper-and-pencil administration.

3.2 ACT Aspire Assessment Support Materials

3.2.1 ACT Knowledge and Skills Map

The ACT Knowledge and Skills Map is an interactive, online tool that can be used to learn more about ACT’s college and career readiness domain across grades. For each subject area, the map provides a grade-by-grade overview of the knowledge and skill domains that research suggests a student should achieve on the path to college and career readiness. In addition to providing a framework for understanding the path to college and career readiness, the ACT Knowledge and Skills Map provides sample items and research citations.

http://skillsmap.actlabs.org/map

• username: actlabs

• password: actlabs

3.2.2 ACT Aspire Exemplar Items

ACT Aspire has developed two resources designed to help students, parents, educators, and policymakers become familiar with ACT Aspire test presentation and content. These resources, the Student Sandbox and the Exemplar Test Question Booklets, illustrate the different types of test questions and formats found in both paper-based and computer-based testing modes.

http://actaspire.pearson.com/exemplars.html

http://www.discoveractaspire.org/assessments/test-items/

3.3 English Language Arts Assessments Overview

ELA knowledge and skills include overlapping constructs related to English, reading, and writing. ACT Aspire ELA assessments are designed to elicit sufficient evidence to support general inferences about student achievement in ELA as well as student achievement in each of these content areas separately. Inferences about student achievement in reading, writing, and English language skills can be made by analyzing ACT Aspire reading, writing, and English test scores, respectively. ACT Aspire reporting uses language consistent with college and career readiness standards in ELA and literacy so that students, parents, and educators can understand performance on the test in terms that clearly relate to instruction.

Students who take the ACT Aspire English, reading, and writing tests receive a Composite ELA score that shows how their overall performance compares to the readiness benchmark at their grade. The benchmark is based on empirical research on students who have been identified as college and career ready. The Composite ELA score is reported to provide insight into student skills that are integrated across the English language arts. This perspective aligns with the integrated approach to literacy education in leading achievement standards frameworks.
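The composite computation itself is not specified in this chapter. Purely as a hypothetical illustration of comparing a composite with a grade-level benchmark, the sketch below assumes an equal-weight rounded average of the three subject scale scores; both that weighting and the benchmark value shown are invented for the example.

# Hypothetical illustration only: the ELA Composite formula is not given
# in this chapter, so an equal-weight rounded average is assumed here,
# and the benchmark value is invented for the example.
def ela_composite(english, reading, writing):
    return round((english + reading + writing) / 3)

GRADE_6_ELA_BENCHMARK = 424  # invented placeholder value

score = ela_composite(english=427, reading=425, writing=423)
print(score, "meets benchmark:", score >= GRADE_6_ELA_BENCHMARK)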

3.3.1 English Test

The English test puts the student in the position of a writer who makes decisions to revise and edit a text. Short texts and essays in different genres provide a variety of rhetorical situations. Students must use the rich context of the passage to make editorial choices, demonstrating their understanding of grammar, usage, and mechanics conventions. Students must also apply understanding of rhetorical purposes and strategies, structure and organize sentences and paragraphs for effective communication, and maintain a consistent style and tone.

Figure 3.1 shows an example of the construct hierarchy for the English test.

3.3.1.1 English Framework

The English Framework articulates the constructs measured across the grade levels. Each grade is broken out, with descriptive statements of the knowledge and skills measured within each reporting category. Please see Appendix E for more details.

3.3.1.2 English Reporting Categories

The English test measures student knowledge and skill in the following reporting categories.

Production of Writing

The items in this category require students to apply their understanding of the rhetorical purpose and focus of a piece of writing to develop a topic effectively and to use various strategies to achieve logical organization, topical unity, and general cohesion.

• Topic Development: These items require students to demonstrate an understanding of, and control over, the rhetorical aspects of texts by identifying the purposes of parts of texts, determining whether a text or part of a text has met its intended goal, and evaluating the relevance of material in terms of a text’s focus.

• Organization, Unity, and Cohesion: These items require students to use various strategies to ensure that a text is logically organized, flows smoothly, and has an effective introduction and conclusion.

Knowledge of Language

The items in this category require students to demonstrate effective language use through ensuring precision and concision in word choice and maintaining consistency in style and tone.

Conventions of Standard English

The items in this category require students to apply an understanding of the conventions of standard English grammar, usage, and mechanics to revise and edit text.

• Punctuation, Usage, and Capitalization Conventions: These items require students to edit text to conform to standard English punctuation, usage, and capitalization.

• Sentence Structure and Formation: These items test understanding of relationships between and among clauses, placement of modifiers, and shifts in sentence construction.

[Figure 3.1 is a diagram: ACT Aspire English branches into the reporting categories Production of Writing, Knowledge of Language, and Conventions of Standard English; Production of Writing branches into the ACT skill targets Topic Development and Organization.]

Figure 3.1. ACT Aspire English test reporting categories with example skill targets for one reporting category.


3.3.1.3 English Item Types

The ACT Aspire English Test puts the student in the position of a writer who makes decisions to revise and edit a text. Different passage types provide a variety of rhetorical situations, each accompanied by multiple-choice or constructed-response items.

3.3.1.4 English Test Blueprints

The test consists of texts (such as essays, sentences, and paragraphs), each accompanied by selected-response and, on computer-based forms of the test, technology-enhanced test items (on paper forms, technology-enhanced items are replaced by selected-response items). Different essay genres are employed to provide a variety of rhetorical situations. Texts vary in length depending on the grade level. The stimuli are chosen not only for their appropriateness in assessing writing skills, but also to reflect students' interests and experiences. Some questions refer to underlined or highlighted portions of the essay and offer several alternatives to the portion underlined. These items include "NO CHANGE" to the underlined or highlighted portion in the essay as one of the possible responses. Some questions ask about a section of a paragraph or an essay, or about the paragraph or essay as a whole. The student must decide which choice best answers the question posed. Non-operational items are included in the total number of items in Tables 3.1–3.4 but do not contribute towards raw-score points.

Table 3.1. Specification Ranges by Item Type and Reporting Category for Grade 3

Operational Items                    Number of Items   Raw-score points (percentage of test)
Item Types
  Multiple choice                    21–23             21–23 (84–92%)
  Technology enhanced                2–4               2–4 (8–16%)
Reporting Categories
  Production of Writing              8–10              8–10 (32–40%)
  Conventions of Standard English    15–17             15–17 (60–68%)
Depth of Knowledge
  DOK Level 1                        13–14             13–14 (52–56%)
  DOK Level 2                        3–7               3–7 (12–28%)
  DOK Level 3                        4–8               4–8 (16–32%)
Non-operational
  Field-test                         6–10              —
TOTAL                                31–35*            25 (100%)*

*Total number of items includes field-test items that do not contribute to score points.
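Specification ranges like those in Table 3.1 lend themselves to automated checks during form assembly. The sketch below is a simplified illustration, not ACT's assembly system: the item records and the check_form function are hypothetical, while the (low, high) ranges and the 25-point operational total come from Table 3.1.

```python
# Sketch of a form check against the Table 3.1 ranges. The item records
# and the check_form function are hypothetical; the (low, high) ranges and
# the 25-point operational total are taken from Table 3.1.

SPEC = {
    "mc": (21, 23),         # multiple-choice operational items
    "te": (2, 4),           # technology-enhanced operational items
    "field_test": (6, 10),  # non-operational items (no raw-score points)
}

def check_form(items):
    """items: list of dicts, each with a 'type' key ('mc', 'te', 'field_test')."""
    counts = {kind: 0 for kind in SPEC}
    for item in items:
        counts[item["type"]] += 1
    problems = [
        f"{kind}: {counts[kind]} outside {low}-{high}"
        for kind, (low, high) in SPEC.items()
        if not low <= counts[kind] <= high
    ]
    # Each operational item on this test is worth 1 raw-score point.
    if counts["mc"] + counts["te"] != 25:
        problems.append(f"operational points: {counts['mc'] + counts['te']} != 25")
    return problems

form = [{"type": "mc"}] * 22 + [{"type": "te"}] * 3 + [{"type": "field_test"}] * 8
print(check_form(form) or "form meets the Table 3.1 ranges")
```

The same pattern extends to the reporting-category and DOK ranges in Tables 3.2 through 3.4.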


Table 3.2. Specification Ranges by Item Type and Reporting Category for Grades 4–5

Operational Items                    Number of Items   Raw-score points (percentage of test)
Item Types
  Multiple choice                    21–23             21–23 (84–92%)
  Technology enhanced                2–4               2–4 (8–16%)
Reporting Categories
  Production of Writing              6–8               6–8 (24–32%)
  Knowledge of Language              2–3               2–3 (8–12%)
  Conventions of Standard English    15–16             15–16 (60–64%)
Depth of Knowledge
  DOK Level 1                        13–14             13–14 (52–56%)
  DOK Level 2                        3–7               3–7 (12–28%)
  DOK Level 3                        4–8               4–8 (16–32%)
Non-operational
  Field-test                         6–10              —
TOTAL                                31–35*            25 (100%)*

*Total number of items includes field-test items that do not contribute to score points.

Table 3.3. Specification Ranges by Item Type and Reporting Category for Grades 6–8

Operational Items                    Number of Items   Raw-score points (percentage of test)
Item Types
  Multiple choice                    33–35             33–35 (94–100%)
  Technology enhanced                0–2               0–2 (0–6%)
Reporting Categories
  Production of Writing              10–12             10–12 (29–45%)
  Knowledge of Language              3–6               3–6 (29–45%)
  Conventions of Standard English    19–20             19–20 (55–61%)
Depth of Knowledge
  DOK Level 1                        15–17             15–17 (43–49%)
  DOK Level 2                        6–9               6–9 (17–26%)
  DOK Level 3                        11–12             11–12 (31–34%)
Non-operational
  Field-test                         9–12              —
TOTAL                                44–47*            35 (100%)

*Total number of items includes field-test items that do not contribute to score points.


Table 3.4. Specification Ranges by Item Type and Reporting Category for Grade EHS

Operational Items                    Number of Items   Raw-score points (percentage of test)
Item Types
  Multiple choice                    48–50             48–50 (96–100%)
  Technology enhanced                0–2               0–2 (0–4%)
Reporting Categories
  Production of Writing              13–14             13–14 (26–28%)
  Knowledge of Language              6–7               6–7 (12–14%)
  Conventions of Standard English    30–31             30–31 (60–62%)
Depth of Knowledge
  DOK Level 1                        22–24             22–24 (44–48%)
  DOK Level 2                        10–12             10–12 (20–24%)
  DOK Level 3                        15–17             15–17 (30–34%)
Non-operational
  Field-test                         9–10              —
TOTAL                                59–60*            50 (100%)*

*Total number of items includes field-test items that do not contribute to score points.

3.3.2 Reading Test

The reading test measures a student's ability to read closely, reason logically about texts using evidence, and integrate information from multiple sources. Passages in the reading test include both literary narratives, such as prose fiction, memoirs, and personal essays, and informational texts from the natural sciences and social sciences. Across the grades, these texts span a range of complexity levels that lead up to the reading level of challenging first-year college courses. On the test at each grade, passages at different levels of complexity within a range appropriate for the grade present students with different opportunities to demonstrate understanding. In addition, reading items assess the student's ability to complete reading-related tasks at various depth-of-knowledge (DOK) levels, and, as a whole, the tests reflect a range of task difficulty appropriate for the grade level.

Figure 3.2 shows an example of the construct hierarchy for the Reading test.

3.3.2.1 Reading Framework

The Reading Framework articulates constructs measured across the grade levels. Each grade is broken out and each has descriptive statements of knowledge and skills measured within each reporting category. Please see Appendix F for more details.


3.3.2.2 Reading Reporting Categories

The Reading Test assesses skills in the following reporting categories.

Key Ideas and Details

• The items in this category require students to read texts closely; to determine central ideas and themes, and summarize information and ideas accurately; and to understand relationships and to draw logical inferences and conclusions, including understanding sequential, comparative, and cause-effect relationships.

Craft and Structure

• The items in this category require students to determine word and phrase meanings, analyze an author’s word choice rhetorically, analyze text structure, and understand authorial purpose and characters’ points of view. They interpret authorial decisions rhetorically and differentiate between various perspectives and sources of information.

Integration of Knowledge and Ideas

• The items in this category require students to understand authors’ claims, differentiate between facts and opinions, and use evidence to make connections between different texts that are related by topic. Students read a range of informational and literary texts critically and comparatively, making connections to prior knowledge and integrating information across texts. They analyze how authors construct arguments, evaluating reasoning and evidence from various sources.

[Figure 3.2 is a diagram: ACT Aspire Reading branches into the reporting categories Key Ideas and Details, Craft and Structure, Integration of Knowledge and Ideas, and Progress with Text Complexity; Key Ideas and Details branches into the skill targets Close Reading; Central Ideas, Themes, and Summaries; and Relationships.]

Figure 3.2. ACT Aspire reading test reporting categories with example skill targets for one reporting category.

3.3.2.3 Reading Test Complexity and Types of Texts

During the development of the reading test, ACT evaluates reading passages using both quantitative and qualitative analyses. This information is used to determine the levels of text complexity of passages and to build assessments with passages at a range of text complexity appropriate for each grade level.

Table 3.5 shows ACT Aspire's five levels of text complexity. These levels were established through ACT's research and validated through a review process that included ELA teachers. Together with tasks that target important reading skills at each grade, these complexity levels ensure a coherent progression across the assessment continuum. At each grade level, the Reading Test includes a range of text complexities as specified in the table. Note that a sixth level of complexity, Highly Complex, is not included in the table because those passages are administered only on the ACT.

ACT research staff has analyzed the effect of text complexity on reading comprehension. Drawing on this research, ACT designed a measure that reports student performance on specific reading tasks that require comprehension and integration of information across texts. In addition to the overall reading score, each student receives this Text Complexity Progress Measure. While the overall reading score provides a measure of important comprehension skills, this score offers a focused measure of the skills that a reader uses to draw inferences, make connections, and integrate meaning across the challenging texts found in educational and workplace settings.

Table 3.5. Number of Passages by Text Complexity for the ACT Aspire Reading Test

                          Grade Level
Text Complexity           3   4   5   6   7   8   EHS
Basic                     X   X   X
Straightforward           X   X   X   X   X
Somewhat challenging              X   X   X   X   X
More challenging                              X   X
Complex                                           X

Performance on the Text Complexity Progress Measure is compared to a readiness level empirically derived from ACT College Readiness Benchmarks. Students who perform above the benchmark level will receive an indication that they are making sufficient progress toward reading the complex texts they will encounter in college and career. Students who perform below the benchmark level will receive recommendations for improvement such as practicing reading appropriately complex texts from a variety of genres, monitoring understanding, and using other strategies to comprehend literary and informational texts.
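The reporting logic described above amounts to a benchmark comparison with two outcome messages. A minimal sketch follows, in which the measure's scale, the benchmark value, and the message wording are placeholders rather than ACT's actual reporting rules; the recommendations echo the examples in the paragraph above.

```python
# Sketch of the above/below-benchmark reporting logic described above.
# The measure's scale, the benchmark value, and the message wording are
# placeholders, not ACT's actual reporting rules.

RECOMMENDATIONS = [
    "Practice reading appropriately complex texts from a variety of genres.",
    "Monitor understanding while reading.",
    "Use other strategies to comprehend literary and informational texts.",
]

def progress_messages(measure: float, benchmark: float) -> list[str]:
    if measure >= benchmark:
        return ["Making sufficient progress toward reading complex texts."]
    return RECOMMENDATIONS

print(progress_messages(measure=0.62, benchmark=0.70))
```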

ACT Aspire Reading passages are drawn from the following range of text types:

• Literary Narrative: Literary passages from short stories, novels, memoirs, and personal essays

• Social science: Informational passages on topics such as anthropology, archaeology, biography, business, economics, education, environmentalism, geography, history, political science, psychology, and sociology

• Natural science: Informational passages on topics such as anatomy, astronomy, biology, botany, chemistry, ecology, geology, medicine, meteorology, microbiology, natural history, physiology, physics, technology, and zoology


[Figure 3.3 is a diagram: ACT Aspire Reading branches into the passage types Literary Narrative, Social Science, and Natural Science.]

Figure 3.3. ACT Aspire reading test passage types.

The reading test contains selected-response and constructed-response items and, on the CBT version, technology-enhanced items.

Examples of constructed-response tasks in the reading test include:

• Make claims about a passage and provide support using specific details from the text.

• Identify similarities and differences between the key ideas of paired passages and provide support using specific details from both texts.

Constructed-response tasks are scored according to rubrics that give students varying amounts of credit for responses that are correct or partially correct, enabling differentiation between multiple skill levels.
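Because constructed-response tasks earn partial credit, a reading raw score aggregates dichotomous multiple-choice points with polytomous rubric points. The sketch below shows the arithmetic only; the response values and point maxima are hypothetical (Table 3.6 gives the actual specification ranges).

```python
# Minimal sketch of raw-score aggregation with partial credit. The point
# values below are hypothetical; Table 3.6 gives the actual ranges.

mc_points = [1, 0, 1, 1, 0, 1]         # multiple choice: 0 or 1 point each
cr_points = [(2, 3), (3, 3), (1, 2)]   # constructed response: (earned, maximum)

raw = sum(mc_points) + sum(earned for earned, _ in cr_points)
maximum = len(mc_points) + sum(most for _, most in cr_points)
print(f"{raw}/{maximum} raw-score points")  # 10/14 raw-score points
```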

3.3.2.4 Reading Test Blueprints

The Reading test comprises four sections in grades 3 through 5 and three sections in grades 6 through EHS. Each section contains one prose passage that is representative of the level and kinds of text commonly encountered in the school curriculum at that grade. Each passage is accompanied by a set of multiple-choice test items, and, on computer-based forms of the test, some sections include technology-enhanced and constructed-response items (technology-enhanced items are replaced by multiple-choice items on paper forms). These items focus on complementary and mutually supportive skills that readers use to understand texts written for different purposes across a range of subject areas. Items do not test the rote recall of facts from outside the passage or rules of formal logic, nor do they assess vocabulary in isolation from the context. Additionally, every test includes a constructed-response item that introduces a second, shorter passage on the same topic as the primary passage. These items require the student to synthesize information from across the texts. Non-operational items are included in the total number of items provided in Tables 3.6–3.7, but do not contribute towards raw-score points.


Table 3.6. Specification Ranges by Item Type and Reporting Category for Grades 3–7

Operational Items                       Number of Items   Raw-score points (percentage of test)
Item Types
  Multiple choice                       18–20             18–20 (62–69%)
  Technology enhanced                   1–3               1–3 (3–10%)
  Constructed response                  3                 8 (28%)
Reporting Categories
  Key Ideas and Details                 13–15             14–18 (48–62%)
  Craft and Structure                   7–9               7–11 (24–38%)
  Integration of Knowledge and Ideas    2–3               4–6 (14–21%)
Depth of Knowledge
  DOK Level 1                           2–5               2–6 (8–21%)
  DOK Level 2                           12–16             15–21 (50–67%)
  DOK Level 3                           5–8               6–10 (21–33%)
Non-operational
  Field-test                            5–7               —
TOTAL                                   29–31*            29 (100%)

*Total number of items includes field-test items that do not contribute to score points.

Table 3.7. Specification Ranges by Item Type and Reporting Category for Grades 8–EHS

Operational Items                       Number of Items   Raw-score points (percentage of test)
Item Types
  Multiple choice                       20–21             20–21 (65–68%)
  Technology enhanced                   0–1               0–1 (0–3%)
  Constructed response                  3                 10 (32%)
Reporting Categories
  Key Ideas and Details                 14–15             17–18 (45–58%)
  Craft and Structure                   6–9               7–11 (23–35%)
  Integration of Knowledge and Ideas    1–3               4–6 (13–19%)
Depth of Knowledge
  DOK Level 1                           5–7               7–9 (23–29%)
  DOK Level 2                           9–13              12–17 (38–54%)
  DOK Level 3                           5–9               7–12 (21–38%)
Non-operational
  Field-test                            6–8               —
TOTAL                                   30–32*            31 (100%)

*Total number of items includes field-test items that do not contribute to score points.


Table 3.8. Items by Passage Type for the ACT Aspire Reading Test

                    Grade Level
Passage Type        3      4      5      6    7    8    EHS
LN                  10–11  10–11  10–11  7    7    7    7
INFO                10–11  10–11  10–11  14   14   14   14
  SSC               5–6    5–6    5–6    7    7    7    7
  NSC               5–6    5–6    5–6    7    7    7    7

Note. LN = Literary Narrative (prose fiction, memoirs, personal essays); INFO = Informational Text (SSC = Social Science and NSC = Natural Science).

3.3.3 Writing Test

The ACT Aspire Writing Test consists of a single 30-minute summative writing task at each grade. The tasks target one of three primary modes of writing: reflective narrative, analytical expository, or persuasive/argumentative. Students respond in essay form to a writing stimulus. The assessments are designed to provide an indication of whether students have the writing skills they will need to succeed at the next level. All writing tasks have been assigned to DOK Level 3.

3.3.3.1 Writing Task Types by Grade

Because there is one extended writing task at each grade level, ACT Aspire rotates through the three modes to ensure coverage across the grades.

The reflective narrative mode appears at grades 3 and 6. The analytical expository mode appears at grades 4, 7, and early high school. The persuasive/argumentative mode appears at grades 5 and 8. The ACT Aspire assessments are designed to give students at every grade level an opportunity to display the higher-order thinking skills needed for meaningful reflection, analytical explanation, or persuasive argumentation.

Table 3.9. Tasks by Writing Mode for Each Grade of the Writing Test

Grade   Writing Mode
3       Reflective Narrative
4       Analytical Expository
5       Persuasive/Argumentative
6       Reflective Narrative
7       Analytical Expository
8       Persuasive/Argumentative
EHS     Analytical Expository
ACT     Persuasive/Argumentative


3.3.3.2 Writing Scoring Rubrics and Reporting Categories

The Writing Test is scored with a four-domain analytic scoring rubric. Each grade level has a unique rubric because the writing tasks assess different writing modes, but the underlying design is the same across grades.

Each of the four rubric domains corresponds to a different trait of the writing sample; traits in the writing sample are evidence of the following writing competencies:

Reflective Narrative/Analysis/Argument

The name of the first rubric domain corresponds to the mode of writing assessed at the grade level. Regardless of the mode, this rubric domain is associated with the writer’s generation of ideas. Scores in this domain reflect the ability to generate productive ideas and engage with the writing task. Depending on the mode, writers generate ideas to provide reflection, analysis, or persuasive and reasoned argument. Competent writers understand the topic they are invited to address, the purpose for writing, and the audience. They generate ideas that are relevant to the situation.

Development and Support

Scores in this domain reflect the writer’s ability to develop ideas. Competent writers explain and explore their ideas, supporting them with reasons, examples, and detailed descriptions. Their support is well integrated with their ideas. They help the reader understand their thinking about the topic.

Organization

Scores in this domain reflect the writer’s ability to organize ideas with clarity and purpose. Competent writers arrange their writing in a way that clearly shows the relationship between ideas, and they guide the reader through their reflection, analysis, or argument about the topic.

Language Use

Scores in this domain reflect the writer’s ability to use language to convey their ideas with clarity. Competent writers make use of the conventions of grammar, syntax, and mechanics. Their word choice is precise, and they are also aware of their audience, adjusting voice and tone to enhance their purpose.

The reporting categories on the writing test correspond to the rubric domains.

Table 3.10. Writing Test Scoring Rubric Domains and Associated Reporting Categories

Scoring Rubric Domain                                                   Reporting Category
Reflective Narrative/Analysis/Argument (corresponds to writing mode)    Ideas and Analysis
Development                                                             Development and Support
Organization                                                            Organization
Language Use                                                            Language Use and Conventions


[Figure 3.4 is a diagram: ACT Aspire Writing has writing modes (Reflective Narrative, Analytical Expository, Persuasive/Argumentative) and reporting categories (Ideas and Analysis, Development and Support, Organization, Language Use and Conventions).]

Figure 3.4. ACT Aspire writing test modes and reporting categories.

At grades 3 through 5, the rubrics have five performance levels; at grade 6 and above, the rubrics differentiate among six performance levels. At each grade level and in each domain, a score of 4 is associated with "adequacy," indicating that a student who achieves this score is on track for success upon entering the next grade level. The five-point rubrics for grades 3 through 5 allow for only one degree of performance above adequate, whereas the six-point rubrics allow for two degrees of differentiation above "adequate": at grade 6 and above, a score of 5 indicates an advancing level of skill.

The analytic rubrics delineate four dimensions of writing, but traits are scored holistically within each domain. Descriptors within each rubric domain level are not intended to function as independent features assessed separately. Instead, scorers use those elements collectively to form a holistic evaluation of the student’s ability in that domain.

Table 3.11. Points by Content Category for the Writing Test

                                    Raw-score points (percentage of test)
Content Category                    Grades 3–5     Grades 6–EHS
Ideas and Analysis                  5 (25%)        6 (25%)
Development and Support             5 (25%)        6 (25%)
Organization                        5 (25%)        6 (25%)
Language Use and Conventions        5 (25%)        6 (25%)
TOTAL                               20 (100%)      24 (100%)
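The arithmetic implied by Table 3.11 is straightforward: four domain scores on a 1–5 scale (grades 3–5) or a 1–6 scale (grades 6–EHS) sum to 20 or 24 raw-score points, and a domain score of 4 marks adequacy. The sketch below illustrates that arithmetic; the function and the example scores are invented, and whether ACT aggregates domain scores exactly this way for reporting is not stated here.

```python
# Sketch of the writing raw-score arithmetic implied by Table 3.11.
# Grades 3-5 score each domain 1-5 (20 points total); grades 6-EHS score
# each domain 1-6 (24 points). The example scores are invented.

DOMAINS = ("Ideas and Analysis", "Development and Support",
           "Organization", "Language Use and Conventions")

def writing_summary(domain_scores: dict, max_level: int):
    total = sum(domain_scores.values())
    adequate = {domain: score >= 4 for domain, score in domain_scores.items()}
    return total, max_level * len(DOMAINS), adequate

scores = {"Ideas and Analysis": 4, "Development and Support": 3,
          "Organization": 4, "Language Use and Conventions": 5}
total, maximum, adequate = writing_summary(scores, max_level=6)
print(f"{total}/{maximum} raw-score points", adequate)  # 16/24 on a six-point rubric
```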

Further details about the Writing Rubrics for grades 3–EHS can be found in Appendix F.

3.4 Mathematics Test

The mathematics test measures the whole of a student's mathematical development, breaking it into progress with topics new to the grade and progress with lasting topics from previous grades. The mathematics test provides nine reporting categories, facilitating actionable insights.

The ACT Aspire mathematics assessments emphasize quantitative reasoning frequently applied to real-world contexts. The mathematics construct requires making sense of the problem, context, and notation; accessing appropriate mathematical knowledge from memory; incorporating the given information; doing mathematical calculations and manipulations; interpreting; applying thinking skills; providing justification; making decisions based on the mathematics; and communicating results. Knowledge of basic formulas and computational skills are assumed as background for the problems, but memorization of complex formulas and extensive computation are not required.

The mathematical content assessed at a given grade is based on mathematical learning expectations up through that grade that are important to future college and career readiness. Test tasks focus on what students can do with the mathematics they have learned, which encompasses not only content but also mathematical practices.

3.4.1 Mathematics Reporting Categories

Sometimes tasks cut across these domains; they are assigned to the most appropriate single domain for reporting purposes. All tasks involve some level of mathematical practices. Reporting categories are specific to the grade bands indicated in the parentheses following each category name; EHS indicates early high school.

Number & Operations in Base 10 (K–5)

Starting from an understanding of the base 10 system involving 1s, 10s, and 100s, students extend to 1,000s and beyond, working toward seeing place-value connections between each place and the neighboring places. Students use these place-value connections to extend their number system to include decimals. Students can compute using place-value understanding, flexibly combining and decomposing groups. This thinking undergirds mental mathematics for one's lifetime, and being able to explain and critique strategies helps cement these important skills. Concurrently, students are seeing the need for efficient computation, and they understand and can explain the procedures they use. By the end of this grade band, students use place-value strategies fluently for mental arithmetic, explain their reasoning, and compute fluently using efficient computation strategies with multidigit whole numbers and decimals to hundredths.

Number & Operations—Fractions (3–5)

Students develop an understanding of several useful interpretations of fractions. There are connections from unit fractions to equal sharing, and representations of a whole broken into equal-size pieces. But fractions are also numbers, each with a place on the number line and a role in computation. Students compute with fractions (up through division with a unit fraction and a whole number) and understand how fraction computation is built on the properties of operations with whole numbers. They can explain why a step in a procedure is necessary.

The Number System (6–8)

Students coming into this grade band have nearly finished extending whole-number operations to work with fractions, and they complete the task, understanding division of one fraction by another. Then the cycle continues: students extend the four operations to work for all rational numbers. Students understand the order of rational numbers, which now allows them to plot points in all four quadrants of the coordinate plane, and they understand absolute value in terms of distance. Students solve problems and use properties of the rational numbers to aid in computation. They clearly tie their computations into their arguments and draw conclusions from conditional statements. Students can convert rational numbers to decimal and to fraction form. But not all numbers are rational numbers, and by the end of this grade band students understand that a number like √2 is not rational and they can work with approximations.

Number & Quantity (EHS)

Coming into high school, students have knowledge of the real number system and they have strong understanding and fluency with rational numbers and the four basic operations so they can work with irrational numbers by working with rational numbers that are close. Students are ready to move from integer exponents to rational exponents. And they are ready to probe a little deeper into properties of the real number system. Later, students extend to complex numbers, which offer the solutions to some simple equations that have no real-number solutions, and students learn to compute in this system. Students are invited to go further, exploring properties of complex numbers and therefore learning more about real numbers. Students are invited to explore vectors and matrices and to view them as number systems with properties, operations, and applications.

Throughout high school, students are maturing in their understanding of quantity and its connections to measurement. Attending to the types of quantities and units can guide solution strategies and help avoid errors. Students work with derived quantities, and when modeling they choose appropriate quantities to model.

Operations & Algebraic Thinking (K–5)

Students bring an understanding of addition and subtraction as well as counting by 2s and 5s to this grade band and use it to develop an understanding of multiplication in terms of combining equal-size groups and of division as forming equal-size groups for sharing. Arrays and rectangular area provide models for multiplication and division, and from those connections, students learn about properties of operations, using those properties to aid in computation. Solving problems, working with factors and multiples, and learning in other domains all contribute to fluency with all four operations and the ability to explain reasoning.


Students use conditional statements (when that is true, then this is true) mainly with specific numbers but working toward general statements. Numerical expressions and equations capture calculations and relationships, and working with number patterns develops operational sense as well as moving toward the concept of function.

Expressions & Equations (6–8)

This category is closely related to Ratio and Proportional Reasoning/Functions, building a foundation of algebraic understanding. Prior to this grade band, students routinely write numerical expressions. Now students routinely write expressions using a letter to represent a number and are working to attach meaning to expressions and to understand how the properties of numbers are useful for working with expressions. Integer exponents come into play for representing very large and very small quantities, and later, students extend to use rational exponents. Students need to understand what it means for a number to be a solution to an equation, and that solving an equation involves a process of reasoning about solutions, justified by the properties of numbers and operations. Linear equations are the focus, and students use graphs and equations to express relationships. Lines (excepting vertical lines) have a well-defined slope, and the concurrent geometric progress with similar triangles allows students to prove this. Students understand the connections between proportional relationships, lines, and linear equations. Reasoning about the solution of linear equations is applied to solve linear inequalities and to find simultaneous solutions of a pair of linear equations.

Ratios & Proportional Reasoning (6–7)

Proportional relationships are multiplicative relationships between quantities and are the next step toward general linear relationships. Students coming into this grade band start with strong multiplication skills and expand the ways of working with "how many times as much?" comparisons, including ratios, rates, and percentages. They look at rate of change and connect unit rates with the slope of a graph, a pattern in a table, and a multiplier in an equation, using all forms of rational numbers. This builds to a general understanding of function as input and output pairs and understanding of proportional functions and then linear functions. Functions are not tied to a specific representation: tables can show representative values, graphs can show behavior in a representative region, equations can represent some types of functions. Linear equations y = mx + b represent all linear functions. The Expressions and Equations domain provides ways of working with these equations. Linear functions are useful in modeling.

Functions (8–EHS)


For EHS, functions have been with students since their early years: consider the counting function that takes an input of “seven” and gives “eight” and an input of “twelve” to give “thirteen.” In grade 8, the concept of function was named and became an object of study. Understanding some general properties of functions will equip students for problem solving with new functions they create over their continued studies and careers. Functions provide a framework for modeling real-world phenomena, and students become adept at interpreting the characteristics of functions in the context of a problem and become attuned to differences between a model and reality. Some functions accept all numbers as inputs, but many accept only some numbers. Function notation gives another way to express functions that highlights general properties and behaviors. Students work with functions that have no equation, functions that follow the pattern of an equation, and functions based on sequences, which can even be recursive. Students investigate particular families of functions—like linear functions, quadratic functions, and exponential functions—in terms of the general function framework: looking at rates of change, algebraic properties, and connections to graphs and tables, and applying these functions in modeling situations. Students also examine a range of functions like those defined in terms of square roots, cube roots, polynomials, and exponentials, and also piecewise-defined functions.

Algebra (EHS)

Students coming into high school build on their understanding of linear equations to make sense of other kinds of equations and inequalities: what their graphs look like, how to solve them, and what kinds of applications they have for modeling. Students develop fluency with algebra. They continue to make sense of expressions in terms of their parts in order to use their fluency with strategy and to solve problems. Through repeated reasoning, students develop general understanding of solving equations as a process that provides justification that all the solutions will be found, and students extend to quadratic equations, polynomial equations, rational equations, radical equations, and systems, integrating an understanding of solutions in terms of graphs. Families of equations have properties that make them useful for modeling. Solutions of polynomial equations are related to factors of a polynomial. Students recognize relationships in applications and create expressions, equations, and inequalities to represent problems and constraints.


Geometry (K–EHS)

For grades 3–5, students start this grade band recognizing shapes in terms of attributes, which now makes for interrelated categories of shapes that share certain attributes (e.g., the category of parallelograms is made up of all shapes where each of the 4 sides is parallel to the opposite side). Viewed as categories, squares make up a subcategory of parallelograms, and so all the properties of parallelograms apply to squares. In general terms, the attributes of a category apply for all subcategories. Shapes can be made from points, lines or line segments, angles, and other shapes. Shapes can have lines of symmetry. Shapes can be decomposed, which has connections to fractions. All of these concepts can be used to solve problems. Students can find examples and counterexamples. Toward the end of this grade band, students extend number-line concepts to the coordinate plane and can plot points in the first quadrant and make interpretations in the context of solving real-world problems.

For grades 6–8, this grade band starts with a focus on area, surface area, and volume. The additive property of area means the area of a polygon can be found by decomposing it into simpler figures the student already can find the area of. Through repeated reasoning, and in conjunction with the Expressions and Equations domain, students bring meaning to formulas for perimeter/circumference, area, surface area, and volume, and solve problems involving triangles, special quadrilaterals, and polygons, with attention to symmetry, working up to circles, cubes, right prisms, cylinders, cones, and spheres. Students develop the spatial reasoning to describe the cross-section of 3-dimensional figures sliced by a plane. Students address geometric distance. In the coordinate plane, they start plotting polygons and finding the length of horizontal and vertical sides. Later they reach the Pythagorean Theorem and its converse, justifying these relations and using them to solve problems in 2 and 3 dimensions and in a coordinate plane. They understand how to use scale. Students delve into creating figures that meet given conditions based on angle measures and lengths and symmetry, noting when the conditions lead to a unique figure. And this leads to the notion of congruence and then to similarity in terms of dilations, translations, rotations, and reflections. Students learn about these transformations and establish facts about, for example, angle sums in a triangle, and use these facts to write equations and solve problems.

For EHS, building on their knowledge from grade 8, students add depth to what they know about transformations, reflections, rotations, and dilations and add precision to their understanding of congruence, similarity, and symmetry. Justification is the glue that holds structure together and lets students answer "Why?" questions, and students justify by using definitions and theorems, tying in calculations and diagrams, considering cases, understanding general versus specific statements, applying counterexamples, and putting statements together into coherent arguments. Students make constructions, solve problems, and model with geometric objects. Informal arguments give a chain of reasoning that leads to formulas for the area of a circle and then on to volume of cylinders, pyramids, and cones. Students solve right-triangle problems. All of these results transfer to the coordinate plane, where analytic treatment of distance allows students to derive conditions for parallel and perpendicular lines, to split a line segment into pieces with a given ratio of lengths, to find areas, and to develop equations.

Measurement & Data (K–5)

Measurement and Data topics in this grade band provide mutual reinforcement with topics from other domains: unit conversions related to scaling and multiplication, whole numbers and fractions related to measurement and operations. Students solve problems that involve units and conversions, starting with time. They have a framework for measurement and can work with measurements as data for line plots and scaled picture graphs, and they can interpret these displays during problem solving. Students develop the concepts of perimeter and area through measurement and relate these to multiplication and addition, moving on to the concept of volume with its relationships to measurement, multiplication, and addition. Students understand angle measure and connect this to fractions of a circle.

Statistics & Probability (6–EHS)

For grades 6–8, statistics is about distributions of quantities—like the amount of electricity used by each home in Anchorage, Alaska, last month—and how to tell how likely something is—given what is known about the shape of the distribution and being sure to take into account the context of the data and the data-gathering process. Students have already graphed distributions, but now they pay more attention to the shape of the distribution, particularly where it seems to be centered and how spread out it is. Box plots capture some of the differences between distributions in a way that makes comparison surer. Students make informal inferences and learn about randomness in sampling. Probability is a language for conveying likelihood, and students develop uniform probability models and probability models from empirical data. They can use lists, tables, tree diagrams, and simulation results to represent sample spaces and estimate the probability of compound events.

For EHS, students add to their understanding of distributions of a single quantity, describing center and spread with statistics and interpreting these in the context of the data.

Before high school, students have used 2-way tables and scatter plots to look at relationships between different quantities, and have used linear functions to model relationships that look linear. Now students pay more attention to informal model fit and use other functions to model relationships; they use models for prediction and they interpret characteristics of the model in the context of the data. From 2-way tables, students interpret relative probabilities (including joint, marginal, and conditional relative frequencies but not tied to these terms) and relate these to probabilities. Students look for association and distinguish correlation and causation.


Randomness unlocks the power of statistics to estimate likelihood, and students learn about the role of randomness in sample surveys, experiments, and observational studies. Students use data to estimate population mean or proportion and make informal inferences based on their maturing judgment of likelihood. They can compare qualities of research reports based on data and can use simulation data to make estimates and inform judgment.

Before high school, students have tacitly used independence, but now the idea is developed with a precise definition. Students relate the sample space to events defined in terms of “and,” “or,” and “not,” and calculate probabilities, first using empirical results or independence assumptions.

Figure 3.5 shows the general flow of the mathematical domains across the grade levels.

Figure 3.5. Mathematics domains and primary currents in the flow across grades.

For each grade, assessment of student progress within the 5 domains of topics new to the grade is complemented by assessment of student progress with lasting topics from previous grades. Not only should students retain what they learned in previous grades, they should actually strengthen and integrate their learning and what they can do with those topics.


Figure 3.6. Foundation and Grade Level Progress (illustrated for grade 7).

Progress with topics new to a grade is reflected in the Grade Level Progress reporting category. Progress with lasting topics from previous grades is reflected in the Foundation reporting category. Figure 3.6 shows how these components fit together. The sampling within the Grade Level Progress component is done in such a way as to create a reporting category around each of the 5 domains of topics new to the grade. This additional layer of detail contributes greater precision to support insights about students.

Mathematical Practices

Mathematical Practices highlight cross-cutting mathematical skills and understandings and the complex and vital ways they integrate with content. Test items focus on important mathematics, which includes various levels of using mathematical practices. Selection of items for a form takes mathematical practices into account, providing a balance across content and mathematical practices.

In addition to assessing mathematical practices as a general part of scores, ACT Aspire reports individually on two important mathematical practices: Justification & Explanation and Modeling.

Justification & Explanation

Justification & Explanation tasks focus on the whys of mathematics. These are a direct measure, using a constructed-response format, of how well students function with mathematical argument, as appropriate for the given grade. Each task involves Justification & Explanation as it applies to some mathematical content. Scores across 4 or 5 Justification & Explanation tasks are reported so that students and teachers can look in more depth at this important mathematical practice and how it is developing in each student. The tasks also contribute to the Grade Level Progress reporting category or to the Foundation reporting category according to the mathematical content involved. The number of Justification & Explanation tasks, raw-score points, and percentage of raw-score points is captured in Table 3.12 by grade. Justification & Explanation tasks are each worth 4 raw-score points.

Table 3.12. Number of Justification & Explanation Tasks, Raw-Score Points (and Percentage Weight) in Reporting Categories by Grade

                                      Grade
Justification & Explanation (JE)      3–5            6–7            8–EHS
JE tasks overall                      4              4              5
JE weight overall                     16 pts (43%)   16 pts (35%)   20 pts (38%)
JE tasks in Grade Level Progress      2              2              3
JE weight in Grade Level Progress     8 pts (36%)    8 pts (29%)    12 pts (36%)
JE tasks in Foundation                2              2              2
JE weight in Foundation               8 pts (53%)    8 pts (44%)    8 pts (40%)

Note. EHS = Early High School.
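The overall weights in Table 3.12 follow directly from the task counts: each Justification & Explanation task is worth 4 raw-score points, and the percentage is those points divided by the test's operational raw-score total (37, 46, and 53 points for the three bands, per the blueprints in Tables 3.18–3.21). A quick check of that arithmetic:

```python
# Arithmetic behind the overall weights in Table 3.12: each Justification &
# Explanation task is worth 4 raw-score points, and the operational totals
# (37, 46, 53) come from the blueprints in Tables 3.18-3.21.

bands = {           # band: (J&E tasks overall, operational raw-score points)
    "3-5": (4, 37),
    "6-7": (4, 46),
    "8-EHS": (5, 53),
}
for band, (tasks, total) in bands.items():
    points = 4 * tasks
    print(f"grades {band}: {points} pts ({points / total:.0%} of test)")
# grades 3-5: 16 pts (43% of test)
# grades 6-7: 16 pts (35% of test)
# grades 8-EHS: 20 pts (38% of test)
```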

The Justification & Explanation rubric, for all grades, is based on a single progression of Justification & Explanation skills, providing a connected measure of progress from grade to grade. The skills are divided into 3 levels at each grade, but these groupings differ depending on the grade. The more advanced skills are at level 3. Some Justification & Explanation skills at level 3 for grade 3 become level 2 skills in later grades, and some skills at level 2 become level 1 skills in later grades.

Justification & Explanation tasks for a grade are designed to elicit evidence of skills at level 2 or level 3 for the grade, meaning that a successful solution to the task will necessarily demonstrate Justification & Explanation skills at level 2 or level 3, respectively. Most tasks that elicit Justification & Explanation skills at a given level will also elicit Justification & Explanation skills at all lower levels.

Modeling

Modeling involves two objects: the actual and the model. A model often helps one predict or understand the actual. The Modeling reporting category represents items that involve producing, interpreting, understanding constraints of, evaluating, and improving models.

A minimum number of raw-score points for Modeling is specified for each form. Table 3.13 shows the minimum number by grade. Table 3.14 and Table 3.15 show the actual average number for 2 forms, and the actual average percentage for those 2 forms, respectively.

Table 3.13. Modeling Minimum Number (and Percentage) of Raw-Score Points by Reporting Category and Grade

                                                            Grade
Modeling                                                    3–5       6–7       8–EHS
Minimum number (percentage) of raw-score points overall     7 (19%)   9 (20%)   11 (21%)
Minimum number (percentage) in Grade Level Progress         5 (22%)   6 (21%)   7 (21%)
Minimum number (percentage) in Foundation                   2 (14%)   3 (17%)   4 (20%)

Note. Percentages are rounded.

Table 3.14. Modeling Actual Average Number of Raw-Score Points for 2 Forms by Grade

                                          Grade
Modeling                                  3     4    5    6     7     8     EHS
Average number of raw-score points        16    13   15   17.5  17.5  21.5  20.5
Average number in Grade Level Progress    10.5  9    8    13    10    12.5  11.5
Average number in Foundation              5.5   4    6    4.5   7.5   9     9

Table 3.15. Modeling Actual Average Percentage of Raw-Score Points for 2 Forms by Grade

                                          Grade
Modeling                                  3     4     5     6     7     8     EHS
Average percentage overall                43%   35%   39%   38%   38%   41%   41%
Average percentage in Grade Level Progress 46%  39%   29%   46%   36%   38%   35%
Average percentage in Foundation          39%   29%   43%   25%   42%   45%   45%

Note. Percentages are rounded.
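The Foundation row of Table 3.15 can be reproduced from Table 3.14 and the Foundation point totals in the blueprint tables (14 points for grades 3–5, 18 for grades 6–7, and 20–21 for grade 8 and EHS; using 20 matches the published percentages). A sketch of that arithmetic:

```python
# Reproducing the Foundation row of Table 3.15: percentage = average
# Modeling points (Table 3.14) / Foundation raw-score points (Tables
# 3.18-3.21). The grade 8 and EHS blueprints give 20-21 points; 20 is used.

avg_modeling = {"3": 5.5, "4": 4, "5": 6, "6": 4.5, "7": 7.5, "8": 9, "EHS": 9}
foundation_pts = {"3": 14, "4": 14, "5": 14, "6": 18, "7": 18, "8": 20, "EHS": 20}

for grade, pts in avg_modeling.items():
    print(f"grade {grade}: {pts / foundation_pts[grade]:.0%}")
# 39%, 29%, 43%, 25%, 42%, 45%, 45% -- the Foundation row of Table 3.15
```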

3.4.2 Calculator Policy

Students taking the mathematics test for grade 6 or higher are encouraged to bring a calculator they are familiar with and can use fluently. A calculator tool is available within the online version. Students are permitted to use almost any four-function, scientific, or graphing calculator. The ACT calculator policy for ACT Aspire is available at www.act.org.


3.4.3 Mathematics Item Types, Tasks, Stimulus

The Mathematics Test uses a specified variety of item types to collect appropriate evidence. Table 3.16 below shows the percentage of raw-score points that come from each item type. These percentages reflect targets and may vary slightly.

Table 3.16. Target Percentage of Raw-Score Points by Item Type and Grade

                          Grade
Item Type                 3–5    6–7    8–EHS
Technology Enhanced       15%    12%    9%
Multiple Choice           42%    53%    53%
Constructed Response      43%    35%    38%

Item Sets

Some test items are part of a set: a common stimulus is shared by all of the items in the set, and each item's solution must use some information from that common stimulus. Test items in the set are independent of one another, meaning that getting the correct answer to one item does not hinge on getting the correct answer to another item.

Each form has approximately 2 item sets, with 2–5 items in each set. Items may be technology enhanced, multiple-choice, or constructed response.
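The item-set structure described above can be captured in a simple data model: one shared stimulus with several independently scored items. The sketch below is illustrative only; the field names and example content are invented, not drawn from ACT's systems.

```python
# Illustrative data model for an item set: one shared stimulus, several
# independently scored items. Field names and content are invented.

from dataclasses import dataclass, field

@dataclass
class Item:
    item_id: str      # hypothetical identifier
    item_type: str    # "mc", "te", or "cr"
    max_points: int   # raw-score points the item can earn

@dataclass
class ItemSet:
    stimulus: str                              # common stimulus
    items: list = field(default_factory=list)  # 2-5 independent items

price_set = ItemSet(
    stimulus="A table of unit prices for different package sizes of juice.",
    items=[Item("M101", "mc", 1), Item("M102", "te", 1), Item("M103", "cr", 4)],
)
print(len(price_set.items), "independent items share one stimulus")
```

Because the items are scored independently, no item's key or credit depends on another item in the set, which keeps the set consistent with the independence property described above.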

Justification & Explanation

Justification & Explanation tasks require students to read and understand a problem and then generate a response. These tasks are DOK 3. Grade Level Progress has a mix of DOK 3, DOK 2, and DOK 1 items. The Foundation reporting category looks for growth in what students can do with content from prior years and therefore uses only DOK 2 and DOK 3. Table 3.17 shows the target distribution of DOK by reporting category and grade. Form distributions may vary slightly from these targets. See 2016 Updates on page U.1 for updates to these specifications for 2015–16.


Table 3.17. Target for Raw-Score Points (Percentage of Test) [Percentage of Reporting Category] by DOK and Grade

                                          Grade
Reporting Category      DOK Level         3–5              6–7              8–EHS
Overall                 DOK 3             21 (57%)         24 (52%)         29 (55%)
                        DOK 2             12 (32%)         17 (37%)         18 (34%)
                        DOK 1             4 (11%)          5 (11%)          6 (11%)
Grade Level Progress    DOK 3             10 (27%) [45%]   12 (26%) [43%]   16 (30%) [48%]
                        DOK 2             8 (22%) [36%]    11 (24%) [39%]   11 (21%) [33%]
                        DOK 1             4 (11%) [18%]    5 (11%) [18%]    6 (11%) [18%]
Foundation              DOK 3             11 (30%) [73%]   12 (26%) [67%]   13 (25%) [65%]
                        DOK 2             4 (11%) [27%]    6 (13%) [33%]    7 (13%) [35%]
                        DOK 1             none             none             none

Note. Percentages may not sum to 100% due to rounding.
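The parenthesized and bracketed figures in Table 3.17 are two views of the same target point count: one uses the whole-test denominator, the other the reporting category's own total. The sketch below reproduces the grades 6–7 Grade Level Progress DOK 3 cell using the point totals from Table 3.19.

```python
# The two denominators behind one cell of Table 3.17: the grades 6-7
# Grade Level Progress DOK 3 target of 12 points, with the 46-point test
# total and 28-point Grade Level Progress total from Table 3.19.

dok3_glp_points = 12  # target DOK 3 raw-score points within Grade Level Progress
test_total = 46       # operational raw-score points on the whole test
glp_total = 28        # raw-score points in the Grade Level Progress category

print(f"12 ({dok3_glp_points / test_total:.0%}) [{dok3_glp_points / glp_total:.0%}]")
# prints "12 (26%) [43%]", matching the Table 3.17 cell
```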

3.4.4 Mathematics Test Blueprints

The Mathematics Test is composed of different item types (i.e., multiple choice, technology enhanced, and constructed response). There are some nuances to be aware of across delivery modes (i.e., computer and paper) and within reporting categories.

The numbers of the different item types provided in Tables 3.18–3.21 show representative ranges. The reporting category specifications are based on points, not on numbers of items. Technology-enhanced items are replaced by multiple-choice items on paper forms. Item counts for Justification & Explanation and Modeling are also included in the counts of items in Grade Level Progress and Foundation. Non-operational items are included in the total number of items but do not contribute towards raw-score points.


Table 3.18. Specification Ranges by Item Type and Reporting Category for Grades 3–5

Operational Items                     Number of Items   Raw-score points (percentage of test)
Item Types
  Multiple choice                     17–20             17–20 (46–54%)
  Technology enhanced                 1–4               1–4 (3–11%)
  Constructed response                4                 16 (43%)
Reporting Categories
  Grade Level Progress                17                23 (62%)
    Numbers & Operations in Base 10   3–4               3–4 (8–11%)
    Numbers & Operations—Fractions    3–4               3–4 (8–11%)
    Operations & Algebraic Thinking   3–4               3–4 (8–11%)
    Geometry                          3–4               3–4 (8–11%)
    Measurement & Data                3–4               3–4 (8–11%)
  Foundation                          8                 14 (38%)
  Justification & Explanation         4                 16 (43%)
  Modeling                            ≥12               ≥12 (≥48%)
Depth of Knowledge
  DOK Level 1                         3–5               3–5 (8–14%)
  DOK Level 2                         11–13             11–13 (30–35%)
  DOK Level 3                         8–10              20–22 (54–59%)
Non-Operational Items
  Field-test                          3–6               —
TOTAL                                 28–31*            37 (100%)

*Total number of items includes field-test items that do not contribute to score points.


Table 3.19. Specification Ranges by Item Type and Reporting Category for Grades 6–7

Operational Items                     Number of Items   Raw-score points (percentage of test)
Item Types
  Multiple choice                     23–25             23–25 (50–54%)
  Technology enhanced                 5–7               5–7 (11–15%)
  Constructed response                4                 16 (35%)
Reporting Categories
  Grade Level Progress                22                28 (61%)
    Numbers & Operations in Base 10   4–5               4–5 (9–11%)
    Numbers & Operations—Fractions    4–5               4–5 (9–11%)
    Operations & Algebraic Thinking   4–5               4–5 (9–11%)
    Geometry                          4–5               4–5 (9–11%)
    Measurement & Data                4–5               4–5 (9–11%)
  Foundation                          12                18 (39%)
  Justification & Explanation         4                 16 (35%)
  Modeling                            ≥14               ≥14 (≥41%)
Depth of Knowledge
  DOK Level 1                         4–6               4–6 (9–13%)
  DOK Level 2                         16–18             16–18 (35–39%)
  DOK Level 3                         11–13             23–25 (50–54%)
Non-Operational Items
  Field-test                          4–9               —
TOTAL                                 38–43*            46 (100%)

*Total number of items includes field-test items that do not contribute to score points.


Table 3.20. Specification Ranges by Item Type and Reporting Category for Grade 8

Operational Items                     Number of Items   Raw-score points (percentage of test)
Item Types
  Multiple choice                     29–30             29–30 (55–57%)
  Technology enhanced                 3–4               3–4 (6–8%)
  Constructed response                5                 20 (38%)
Reporting Categories
  Grade Level Progress                23–24             32–33 (60–62%)
    Numbers & Operations in Base 10   2–3               2–3 (4–6%)
    Numbers & Operations—Fractions    6–7               6–7 (11–13%)
    Operations & Algebraic Thinking   4–5               4–5 (8–9%)
    Geometry                          5–6               5–6 (9–11%)
    Measurement & Data                3–4               3–4 (6–8%)
  Foundation                          14–15             20–21 (38–40%)
  Justification & Explanation         5                 20 (38%)
  Modeling                            ≥15               ≥15 (≥39%)
Depth of Knowledge
  DOK Level 1                         5–7               5–7 (9–13%)
  DOK Level 2                         17–19             17–19 (32–36%)
  DOK Level 3                         13–15             28–30 (53–57%)
Non-Operational Items
  Field-test                          5–8               —
TOTAL                                 43–46*            53 (100%)

*Total number of items includes field-test items that do not contribute to score points.


Table 3.21. Specification Ranges by Item Type and Reporting Category for EHS

Operational Items                     Number of Items   Raw-score points (percentage of test)
Item Types
  Multiple choice                     29–30             29–30 (55–57%)
  Technology enhanced                 3–4               3–4 (6–8%)
  Constructed response                5                 20 (38%)
Reporting Categories
  Grade Level Progress                23–24             32–33 (60–62%)
    Numbers & Operations in Base 10   2–3               2–3 (4–6%)
    Numbers & Operations—Fractions    4–5               4–5 (8–9%)
    Operations & Algebraic Thinking   5–6               5–6 (9–11%)
    Geometry                          5–6               5–6 (9–11%)
    Measurement & Data                4–5               4–5 (8–9%)
  Foundation                          14–15             20–21 (38–40%)
  Justification & Explanation         5                 20 (38%)
  Modeling                            ≥17               ≥17 (≥45%)
Depth of Knowledge
  DOK Level 1                         5–7               5–7 (9–13%)
  DOK Level 2                         17–19             17–19 (32–36%)
  DOK Level 3                         13–15             28–30 (53–57%)
Non-Operational Items
  Field-test                          6–8               —
TOTAL                                 44–46*            53 (100%)

*Total number of items includes field-test items that do not contribute to score points.

Note that for the EHS level, topics new to grade 8 are included in the Grade Level Progress (GLP) reporting category rather than the Foundation reporting category. This reporting organization groups related topics, particularly algebra topics, for greater coherence in score interpretation.


3.5 Science Test

The ACT Aspire science assessments are designed based on research evidence about what students need to know and be able to do in order to be on a trajectory toward success in college and career.

Only the ACT Aspire science test questions contribute to students' science scores; questions on the ACT Aspire math, reading, English, and writing tests are not repurposed to contribute to the science score. Correlational analyses among the ACT Aspire tests support the conclusion that the science test score represents unique achievement information. ACT National Curriculum Survey (NCS) results also show that educators at all levels, from elementary through postsecondary, report that science is better assessed using a science test rather than a reading, writing, or math test. Testing reading in science is important and aligns with most college and career readiness state standards related to literacy; for that reason, reading in science is assessed on the ACT Aspire reading test.

3.5.1 Science Reporting Categories

The science test assesses and reports on science knowledge, skills, and practices across three domains:

• Interpretation of Data

• Scientific Investigation

• Evaluation of Models, Inferences, and Experimental Results

The knowledge, skills, and practices contained in these domains comprise the ACT College and Career Readiness Standards for Science, which link specific skills and knowledge with quantitatively determined score ranges for the ACT science test and a benchmark science score that is predictive of success in science at the post-secondary level. The science test is built on these same skills—skills that students need to learn early, and then continually apply and refine, in order to be on a path to college and career readiness in science. All items on the science test are based on authentic scientific scenarios that are built around important scientific concepts and are designed to mirror the experiences of students and working scientists engaging in real science. Some of the items require that the students have discipline-specific content knowledge (e.g., for the Early High School test, knowledge specific to an introductory middle school biology course), but science content is always assessed in concert with science skills and practices. ACT’s research on science curricula and instruction at the high school and post-secondary levels shows that while science content is important, science skills and practices are more strongly tied to college and career readiness in science.

The content of the science test includes biology (life sciences at the earlier grades), chemistry and physics (physical science at the earlier grades), and Earth/space sciences (e.g., geology, astronomy, and meteorology). Advanced knowledge in these areas is not required, but background knowledge acquired in general, introductory science courses may be needed to correctly respond to some of the items in the upper-grade assessments. The assessments do not, however, sample specific content knowledge with enough regularity to make inferences about a student's attainment of content knowledge in any broad area, or specific part, of the science content domain. The science test stresses science practices over recall of scientific content, complex mathematics skills, and reading ability. When the science test is taken with the mathematics test, scores from the two tests are combined to provide a STEM (science, technology, engineering, mathematics) score.

3.5.2 Continuum of Complexity

There are a wide variety of scenarios in which students may need to apply their scientific knowledge and reasoning skills. Scenarios are categorized along two dimensions on ACT Aspire: Mode and Complexity. Different modes focus on different aspects of science (e.g., scientific data/results, experimentation, and debate). Complexity is determined by several factors that contribute to how challenging a scientific scenario is likely to be for a student. ACT Aspire focuses in particular on the topic (e.g., weather, photosynthesis rates), the tone (i.e., formality), the nature of the presented data (i.e., the density, content, and structure of graphs, tables, charts, and diagrams), and the difficulty of the embedded concepts (i.e., the introduced and assumed concepts, symbols, terminology, units of measure, and experimental methods). These elements combine in various proportions to result in scenarios of different complexity, and this complexity affects suitability for different grade levels. It also affects the difficulty of the questions and tasks students are asked to perform in a given context (e.g., an easy question can become much more difficult when posed in a more challenging context).

The science test development process and item design are informed by science education research on learning progressions and misconceptions. This research informs the item task designs and test designs to ensure that science knowledge, skills, and practices are tested at developmentally appropriate times, in appropriate ways, and at increasing levels of sophistication from grade 3 through EHS. Core skills and practices are similar across the grade continuum, as our research shows that these skills and practices should be taught early and continually refined as a student progresses through school. What differs most from grade to grade in the science test is the complexity of the contexts in which science knowledge, skills, and practices are assessed. The contexts used on the tests place students in rich and authentic scientific scenarios that require them to apply their knowledge, skills, and practices in science to both familiar and unfamiliar situations. With increasing grade level, the scenarios have increasingly complex scientific graphics and experimental designs and are based on increasingly complex concepts. Students must then apply their knowledge, skills, and practices with increasing levels of sophistication. The amount of assumed discipline-specific science content knowledge a student needs to answer questions on the tests also increases. Figure 3.7 illustrates this continuum.


Figure 3.7. ACT and ACT Aspire science content complexity continuum.

3.5.3 Science Item Types

ACT Aspire science items and tasks require students to apply what they have learned in their science classes to novel, richly contextualized problems. The test is structured to collect evidence about students' ability to, for example, evaluate the validity of competing scientific arguments, scrutinize the designs of experiments, and find relationships in the results of those experiments. Technology-enhanced items and tasks provide various ways to collect unique evidence about how students engage with scientific material. To fully cover this complexity continuum, six different stimulus (passage) modes are employed across the science test, varying from grade 3 through Early High School. These include modes focused on data interpretation, scientific experimentation, and scientific debate.

Table 3.22 shows how many passages of each stimulus mode are used at each grade level.

Conflicting Viewpoints

A passage format used in grades 8 and EHS in which two or more differing scientific explanations/arguments on the same topic are given, sometimes with one or more graphs, tables, and/or diagrams. Items focus on understanding the explicit and implicit points of each explanation, comparing and contrasting the explanations, drawing conclusions from the arguments, and evaluating the validity of the arguments.

Research Summaries

A passage format used for grades 7–EHS involving a detailed experimental procedure along with one or more graphs, tables, and/or diagrams for experimental setups and results. (Technology enhancements involve gathering the data by manipulating variables.) Items focus on aspects of experimental design, accompanying procedures, obtaining and analyzing data, and drawing conclusions from the data.


Data Representation

A passage format used for grades 6–EHS in which a brief description is given for one or more graphs, tables, and/or diagrams. Items focus on obtaining and analyzing data and drawing conclusions from the data. Technology enhancements include videos, animations, and data manipulations (e.g., moving data points to affect a trend).

Student Viewpoints

A passage format used for grades 6 and 7 that is analogous to conflicting viewpoints but with one to three brief and simple explanations/arguments that support an observation (e.g., students may have differing interpretations of a data set or teacher demonstration). Items focus on understanding the explicit and implicit points of the argument(s), drawing conclusions from an argument, and evaluating the validity of an argument.

Science Investigations

A passage format used for grades 3–7 that is analogous to research summaries but with shorter, simpler, and more student-centered experimental procedures, along with one or two graphs, tables, and/or diagrams for experimental setups and results. (Technology enhancements involve gathering the data by manipulating variables.) Items focus on aspects of experimental design, accompanying procedures, obtaining and analyzing data, and drawing conclusions from the data.

Data Presentation

A passage format used for grades 3–5 that is analogous to data representation, but data sources are simpler (and rarely more than two per unit) and more related to everyday life, with briefer descriptions. Technology enhancements include videos, animations, and data manipulations (e.g., moving data points to affect a trend).

Table 3.22. Stimulus Modes Used on the ACT Aspire Science Test

                              Grade Level
Stimulus Mode                 3     4     5     6     7     8     EHS
Conflicting Viewpoints        –     –     –     –     –     1     1
Research Summaries            –     –     –     –     1     2     2
Data Representation           –     –     –     1     1     2     2
Student Viewpoints            –     –     –     1     1     –     –
Science Investigations        2     2     2     2     1     –     –
Data Presentation             2     2     2     –     –     –     –

(Entries show the number of passages of each mode at each grade level; a dash indicates that the mode is not used at that grade.)


Table 3.23. ACT Aspire Science College and Career Readiness Knowledge and Skill Domain
(Each skill statement is followed, in parentheses, by the associated College and Career Readiness Standards.)

Interpretation of Data (IOD)

  Locating and Understanding
    IOD-LU-01  Select one piece of data from a data presentation (IOD 201, 401)
    IOD-LU-02  Find information in text that describes a data presentation (IOD 203, 303)
    IOD-LU-03  Select two or more pieces of data from a data presentation (IOD 301, 401)
    IOD-LU-04  Identify features of a table, graph, or diagram (e.g., axis labels, units of measure) (IOD 202)
    IOD-LU-05  Understand common scientific terminology, symbols, and units of measure (IOD 302)

  Inferring and Translating
    IOD-IT-01  Translate information into a table, graph, or diagram (IOD 403)
    IOD-IT-02  Determine how the value of a variable changes as the value of another variable changes in a data presentation (IOD 304, 503)
    IOD-IT-03  Compare data from a data presentation (e.g., find the highest/lowest value; order data from a table) (IOD 402, 502)
    IOD-IT-04  Combine data from a data presentation (e.g., sum data from a table) (IOD 501, 601, 701)
    IOD-IT-05  Compare data from two or more data presentations (e.g., compare a value in a table to a value in a graph) (IOD 501, 601, 701)
    IOD-IT-06  Combine data from two or more data presentations (e.g., categorize data from a table using a scale from another table) (IOD 501, 601, 701)
    IOD-IT-07  Determine and/or use a mathematical relationship that exists between data (e.g., averaging data, unit conversions) (IOD 504, 602)

  Extending and Reevaluating
    IOD-ER-01  Perform an interpolation using data in a table or graph (IOD 404, 603)
    IOD-ER-02  Perform an extrapolation using data in a table or graph (IOD 404, 603)
    IOD-ER-03  Analyze presented data when given new information (e.g., reinterpret a graph when new findings are provided) (IOD 505, 702)

Scientific Investigation (SIN)

  Locating and Comparing
    SIN-LC-01  Find information in text that describes an experiment (SIN 201, 303)
    SIN-LC-02  Identify similarities and differences between experiments (SIN 404)
    SIN-LC-03  Determine which experiments utilized a given tool, method, or aspect of design (SIN 405)

  Designing and Implementing
    SIN-DI-01  Understand the methods, tools, and functions of tools used in an experiment (SIN 202, 301, 302, 402)
    SIN-DI-02  Understand an experimental design (SIN 401, 403, 501)
    SIN-DI-03  Determine the scientific question that is the basis for an experiment (e.g., the hypothesis) (SIN 601)
    SIN-DI-04  Evaluate the design or methods of an experiment (e.g., possible flaws or inconsistencies; precision and accuracy issues) (SIN 701)

  Extending and Improving
    SIN-EI-01  Predict the results of an additional trial or measurement in an experiment (SIN 502)
    SIN-EI-02  Determine the experimental conditions that would produce specified results (SIN 503)
    SIN-EI-03  Determine an alternate method for testing a hypothesis (SIN 602)
    SIN-EI-04  Predict the effects of modifying the design or methods of an experiment (SIN 702)
    SIN-EI-05  Determine which additional trial or experiment could be performed to enhance or evaluate experimental results (SIN 703)

Evaluation of Models, Inferences, and Experimental Results (EMI)

  Inferences and Results: Evaluating and Extending
    EMI-IE-01  Determine which hypothesis, prediction, or conclusion is, or is not, consistent with a data presentation or piece of information in text (EMI 401, 601)
    EMI-IE-02  Determine which experimental results support or contradict a hypothesis, prediction, or conclusion (EMI 505)
    EMI-IE-03  Determine which hypothesis, prediction, or conclusion is, or is not, consistent with two or more data presentations and/or pieces of information in text (EMI 501, 701)
    EMI-IE-04  Explain why a hypothesis, prediction, or conclusion is, or is not, consistent with a data presentation or piece of information in text (EMI 401, 601)
    EMI-IE-05  Explain why presented information, or new information, supports or contradicts a hypothesis or conclusion (EMI 502, 702)
    EMI-IE-06  Explain why a hypothesis, prediction, or conclusion is, or is not, consistent with two or more data presentations and/or pieces of information in text (EMI 501, 701)

  Models: Understanding and Comparing (Grade 6 and higher)
    EMI-MU-01  Find information in a theoretical model (a viewpoint proposed to explain scientific observations) (EMI 201)
    EMI-MU-02  Identify implications and assumptions in a theoretical model (EMI 301, 402)
    EMI-MU-03  Determine which theoretical models present or imply certain information (EMI 302, 403)
    EMI-MU-04  Identify similarities and differences between theoretical models (EMI 404)

  Models: Evaluating and Extending (Grade 6 and higher)
    EMI-ME-01  Determine which hypothesis, prediction, or conclusion is, or is not, consistent with a theoretical model (EMI 401, 601)
    EMI-ME-02  Determine which hypothesis, prediction, or conclusion is, or is not, consistent with two or more theoretical models (EMI 501, 701)
    EMI-ME-03  Identify the strengths and weaknesses of theoretical models (EMI 503)
    EMI-ME-04  Determine which theoretical models are supported or weakened by new information (EMI 504)
    EMI-ME-05  Determine which theoretical models support or contradict a hypothesis, prediction, or conclusion (EMI 505)
    EMI-ME-06  Use new information to make a prediction based on a theoretical model (EMI 603)
    EMI-ME-07  Explain why presented information, or new information, supports or weakens a theoretical model (EMI 602)
    EMI-ME-08  Explain why a hypothesis, prediction, or conclusion is, or is not, consistent with a theoretical model (EMI 401, 601)
    EMI-ME-09  Explain why a hypothesis, prediction, or conclusion is, or is not, consistent with two or more theoretical models (EMI 501, 701)


3.5.4 Science Test Blueprints

The test presents several sets of scientific information, each followed by a number of test items. The scientific information is conveyed in one of three different formats: data representation (graphs, tables, and other schematic forms), research summaries (descriptions of several related experiments), or conflicting viewpoints (expressions of several related hypotheses or views that are inconsistent with one another).

Tables 3.24–3.26 show the makeup of the ACT Aspire science tests for each grade level. Technology-enhanced items are replaced by multiple-choice items on paper forms. Non-operational items are included in the total number of items but do not contribute to raw-score points.

Table 3.24. Specification Ranges by Item Type and Reporting Category for Grades 3–5

Operational Items                                           Number of Items   Raw-Score Points (% of Test)

Item Types
  Multiple choice                                           29–30             29–30 (55–57%)
  Technology enhanced                                       3–4               3–4 (6–8%)
  Constructed response                                      5                 20 (38%)

Reporting Categories
  Interpretation of Data                                    12–17             14–19 (39–53%)
  Scientific Investigation                                  6–8               8–11 (22–31%)
  Evaluation of Models, Inferences, and Experimental
    Results                                                 5–8               9–13 (25–36%)

Depth of Knowledge
  DOK Level 1                                               4–7               4–11 (10–25%)
  DOK Level 2                                               13–20             6–27 (45–65%)
  DOK Level 3                                               3–8               8–15 (20–35%)

Non-Operational Items
  Field-test                                                5–6               –

TOTAL                                                       33–34*            36 (100%)

*Total number of items includes field-test items that do not contribute to score points.


Table 3.25. Specification Ranges by Item Type and Reporting Category for Grades 6–7

Operational Items                                           Number of Items   Raw-Score Points (% of Test)

Item Types
  Multiple choice                                           22–23             22–23 (55–58%)
  Technology enhanced                                       3–4               3–4 (8–10%)
  Constructed response                                      6                 14 (35%)

Reporting Categories
  Interpretation of Data                                    18–19             19–20 (48–50%)
  Scientific Investigation                                  6–8               9–11 (23–28%)
  Evaluation of Models, Inferences, and Experimental
    Results                                                 5–8               9–12 (23–30%)

Depth of Knowledge
  DOK Level 1                                               2–8               2–9 (5–20%)
  DOK Level 2                                               15–25             20–30 (45–65%)
  DOK Level 3                                               3–9               12–18 (25–40%)

Non-Operational Items
  Field-test                                                5–6               –

TOTAL                                                       37–38*            40 (100%)

*Total number of items includes field-test items that do not contribute to score points.


Table 3.26. Specification Ranges by Item Type and Reporting Category for Grades 8–EHS

Operational Items                                           Number of Items   Raw-Score Points (% of Test)

Item Types
  Multiple choice                                           23–24             23–24 (58–60%)
  Technology enhanced                                       3–4               3–4 (8–10%)
  Constructed response                                      5                 13 (33%)

Reporting Categories
  Interpretation of Data                                    12–15             17–19 (43–48%)
  Scientific Investigation                                  7–9               9–11 (23–28%)
  Evaluation of Models, Inferences, and Experimental
    Results                                                 10–11             11–12 (28–30%)

Depth of Knowledge
  DOK Level 1                                               2–8               2–9 (5–20%)
  DOK Level 2                                               15–25             20–29 (45–65%)
  DOK Level 3                                               3–9               11–18 (25–40%)

Non-Operational Items
  Field-test                                                6–8               –

TOTAL                                                       38–40*            40 (100%)

*Total number of items includes field-test items that do not contribute to score points.


Chapter 4

Item and Task Scoring

4.1 Selected-Response Item Scoring

Selected-response items require students to select the correct answer from several alternatives. Each correct selected-response item has a value of 1 point. Incorrect responses, missing responses (items that a student did not answer), and multiple responses have a value of 0 points.

4.2 Technology-Enhanced Item Scoring

Technology-enhanced (TE) items require students to make decisions about how to correctly answer questions. Some TE items require more than one decision or action. All TE items are dichotomously scored: a correct response is worth 1 point; incorrect or missing responses are scored as 0 points.
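The dichotomous rule described in sections 4.1 and 4.2 can be summarized in a few lines. The sketch below is illustrative only; the response representation (None for an omitted item, a set for multiple marks) is an assumption for illustration, not ACT's internal format:

```python
# A minimal sketch of the dichotomous scoring rule for selected-response and
# TE items. The response representation is an illustrative assumption.

def score_dichotomous(response, key):
    """Return 1 for a single correct response; 0 for omitted, incorrect,
    or multiple responses."""
    if response is None:              # item left blank
        return 0
    if isinstance(response, set):     # more than one option marked
        if len(response) != 1:
            return 0
        response = next(iter(response))
    return 1 if response == key else 0

print(score_dichotomous("B", "B"))         # 1 (correct)
print(score_dichotomous("C", "B"))         # 0 (incorrect)
print(score_dichotomous(None, "B"))        # 0 (omitted)
print(score_dichotomous({"A", "B"}, "B"))  # 0 (multiple marks)
```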

4.3 Constructed-Response Tasks

Student responses to constructed-response tasks are read and scored by trained raters. Training materials for field-testing of constructed-response tasks were created during range finding. Range finding involves test development specialists, content experts, and expert raters previewing student responses to determine whether the content-specific scoring criteria for each task accurately reflect and encompass all of the acceptable student responses. The initial round of range finding also includes analysis and validation of the scoring rubrics used across tasks. Once the tryout tasks are scored, they undergo statistical analysis focusing on difficulty, validity, and accessibility to determine whether they are suitable for operational use. During range finding, a selection of student responses is individually rated by multiple raters using the scoring criteria and the appropriate rubric. Responses that do not receive the same score from all raters are discussed by the entire group, and a consensus score is reached. Additions and clarifications to the scoring criteria can be made at this time, if necessary.

After test development specialists and expert raters have completed range finding, responses are sorted and assembled into training sets. Training materials for each task include an anchor set, multiple practice sets, and at least two qualification sets that prospective raters must score accurately in order to be eligible to score field-test responses.

Responses chosen for the anchor set represent clear examples of each score point. Practice set and qualification set responses also include clear examples as well as responses that are not as perfectly aligned to the scoring criteria; these frequently occurring responses are less straightforward and fall either slightly high or low within each score point, making them challenging to score.

Each response included in the training materials is analyzed during range finding. An articulation (rationale) explaining why a particular score was assigned accompanies the response. The articulation explains how the rubric and scoring criteria were used to determine the score; citations from the exemplar response are included, where appropriate, to illustrate the claims made in the articulation.

4.3.1 Performance Scoring Quality Control

4.3.1.1 Scorer Qualifications and Experience

Scoring quality starts with the recruitment process and extends through screening and placement (assigning scorers to items based on their skills and experience), training, qualification, and scoring.

Priority is given to scorers with previous experience in scoring similar assessments. In most cases, these professional scorers have specialized educational and professional experience, including valuable experience in performance scoring. All scorers have, at a minimum, a four-year college degree.

The pool of scorers typically reflects a cross section in terms of age, ethnicity, and gender, although placement and retention of scorers are based upon their qualifications and the quality and accuracy of their scoring.

4.3.1.2 Managing Scoring Quality

Scorer exception processing is used at defined intervals to check scorer accuracy. Scorers who fall below predetermined standards receive automatically generated messages that interrupt their scoring and direct them to work with a scoring supervisor to review anchor papers or take other steps to improve their scoring accuracy.


Validity responses are prescored responses strategically interspersed in the pool of live responses in order to gauge scorer accuracy. These responses are not distinguishable from live responses; the scores assigned to them are compared to the true scores assigned by scoring supervisors and do not factor into final student scores in any way. The validity mechanism provides an objective and systematic check of accuracy. It verifies that scorers are applying the same standards throughout the project and, therefore, guards against scorer drift and, ultimately, group drift.

Backreading is the primary tool for proactively guarding against scorer drift. Scoring supervisors review a sample of scored responses to confirm that the scores were correctly assigned and to give customized feedback and remedial training to individual scorers. A backreading score overrides the first score.

Frequency distribution reports show a breakdown of score points assigned on a given task. Expressed in percentages, data in these reports show how often scorers, individually and as a group, assign each score point. A project may have general expectations for frequency distribution ranges, which are typically based on historical data.
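As an illustration of what such a breakdown contains, the following small sketch tallies the percentage of responses at each score point; the sample scores are invented:

```python
# A small sketch of the score-point frequency breakdown such reports contain.
# The assigned scores below are invented for illustration.

from collections import Counter

assigned_scores = [2, 3, 3, 4, 2, 1, 3, 2, 4, 3]

counts = Counter(assigned_scores)
total = len(assigned_scores)
for point in sorted(counts):
    print(f"score point {point}: {100 * counts[point] / total:.0f}%")
# score point 1: 10% / 2: 30% / 3: 40% / 4: 20%
```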

Ten percent of ACT Aspire CR student responses are read and scored independently by two different scorers. Second scoring allows scoring supervisors to closely monitor the performance of scorers, though these second scores do not contribute to a student's final reported score. Supervisors use the second scores to compute inter-rater reliability statistics and monitor scorer performance.

Inter-rater reliability is the agreement between the first and second scores assigned to student responses. Inter-rater reliability measurements include exact, adjacent, and nonadjacent agreement. Scoring supervisors use inter-rater reliability statistics as one factor in determining the needs for continuing training and intervention on both individual and group levels.
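The agreement categories named above can be computed directly from paired first and second scores. A minimal sketch, with invented score data:

```python
# A minimal sketch of the exact/adjacent/nonadjacent agreement breakdown,
# computed from paired first and second scores (data invented for illustration).

def agreement_rates(first, second):
    """Return (exact, adjacent, nonadjacent) agreement percentages."""
    n = len(first)
    exact = sum(a == b for a, b in zip(first, second))
    adjacent = sum(abs(a - b) == 1 for a, b in zip(first, second))
    nonadjacent = n - exact - adjacent
    return tuple(100.0 * k / n for k in (exact, adjacent, nonadjacent))

first_scores  = [3, 2, 4, 1, 3, 2, 4, 3]
second_scores = [3, 2, 3, 1, 3, 4, 4, 3]
print(agreement_rates(first_scores, second_scores))  # (75.0, 12.5, 12.5)
```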

The electronic scoring system is capable of purging the scores assigned by a scorer whose work is deemed substandard. Scoring supervisors can reset scores by individual scorer, date range, or item, whereby the scores assigned by an individual are cleared from the database and the affected responses are reset. The responses are then randomly rerouted to qualified scorers and rescored according to the original scoring design.

Reports used to monitor quality and project completion status are generated and updated automatically and are available to scoring supervisors at any time via the digital scoring system. The reports give daily and cumulative statistics and provide individual and group average agreement percentages. Table 4.1 summarizes the reports.


Calibration sets are used to reinforce range finding standards, communicate scoring decisions, or correct scoring issues and trends. The primary goal of calibration is to continue training and reinforce the scoring standards. Calibration set responses may be “on the line” between score points, or they may contain unusual examples that are challenging to score, and therefore useful for reinforcing the scoring rubric.

Table 4.1. Scorer Performance Reports

Daily/Cumulative Inter-Rater Reliability Summary
  Group-level summary of both daily and cumulative inter-rater reliability statistics for each day of the scoring project.

Frequency Distribution Report
  Task-level summary of score point distribution percentages on both a daily and a cumulative basis.

Daily/Cumulative Validity Summary
  Summary of agreement for validity reads of a given task on both a daily and a cumulative basis.

Completion Report
  Breakdown of the number of responses scored and the number of responses in each stage of scoring (first score, second score, resolution).

Performance Scoring Quality Management Report
  Summary of task-level validity and inter-rater reliability on a daily and cumulative basis. This report also shows the number of resolutions required and completed, and task-level frequency distribution.

4.3.2 Automated Scoring

Combining human and AI scoring provides an effective and efficient process for scoring student responses while maintaining scoring quality.

For ACT Aspire constructed-response (CR) items in writing, science, and reading, we provide a combined human and artificial intelligence approach to scoring. We train an automated scoring engine on responses collected during the field test to determine which items can be satisfactorily scored using automated scoring. Rigorous statistical measures are used to ensure that the scores provided by each item's automated scoring calibration are comparable to, and interchangeable with, human scores. This entails comparing human and automated scores on exact agreement, quadratic weighted kappa, and conditional agreement by score point. Automated scoring is used for an item only if the item's calibration passes the established performance benchmarks for these key calibration approval criteria.
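Quadratic weighted kappa, one of the criteria named above, penalizes human-machine disagreements in proportion to their squared distance on the score scale. A standard implementation is sketched below for illustration; the human and machine score vectors are invented:

```python
# A standard implementation of quadratic weighted kappa (QWK), sketched for
# illustration; the human and machine score vectors are invented.

import numpy as np

def quadratic_weighted_kappa(a, b, min_score, max_score):
    """QWK between two raters' scores on an integer score scale."""
    k = max_score - min_score + 1
    scores = np.arange(k)
    # Quadratic disagreement weights: 0 on the diagonal, growing with distance
    w = (scores[:, None] - scores[None, :]) ** 2 / (k - 1) ** 2
    # Observed joint distribution of (human, machine) score pairs
    obs = np.zeros((k, k))
    for x, y in zip(a, b):
        obs[x - min_score, y - min_score] += 1
    obs /= obs.sum()
    # Chance-expected distribution from the marginals
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))
    return 1.0 - (w * obs).sum() / (w * exp).sum()

human   = [0, 1, 2, 3, 2, 1, 0, 3, 2, 1]
machine = [0, 1, 2, 3, 2, 2, 0, 3, 1, 1]
print(round(quadratic_weighted_kappa(human, machine, 0, 3), 3))
```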

To provide an effective and efficient solution, our goal is for the CR responses to be scored using automated scoring as the first score with a 10% second score by human scorers.


Overview of the Automated Scoring Process

Automated scoring for ACT Aspire uses a machine-learning approach in which the engine is trained to score based on the collective wisdom of trained human scorers. Using a sample of human-scored student responses, the engine learns the different features that human scorers evaluate when scoring a response and how they weigh and combine those features to produce a score. The process begins with several hundred to one thousand or more human-scored student responses. The collection of student responses is divided into two sets, one set used to train and tune the scoring engine and the other used to validate performance.
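The division of the human-scored responses into training and validation sets is standard machine-learning practice. A generic sketch follows; the split ratio and data are illustrative, not ACT's actual procedure:

```python
# A generic sketch of a train/validate split over human-scored responses.
# The 80/20 ratio and the synthetic data are illustrative assumptions.

import random

# (response text, human score) pairs; synthetic stand-ins for real data
responses = [(f"response_{i}", random.randint(0, 4)) for i in range(1000)]
random.shuffle(responses)

cut = int(0.8 * len(responses))                      # e.g., 80% train, 20% validate
train_set, validation_set = responses[:cut], responses[cut:]
print(len(train_set), len(validation_set))           # 800 200
```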

The performance of automated scoring depends on both the sample of student responses available and the performance of human scorers. The goal is to have as much, and as accurate, information as possible about how a response should be evaluated. Pearson's human scorers double-score the field-test responses, applying resolution for nonadjacent disagreements, to fully support automated scoring. Having double-scored responses is critically important. First, double-scoring provides more information about the true quality of a constructed response that can be used when training the automated scoring engine. Knowing that a response is on the cusp between two score points, a high 2 or a low 3 for example, provides a more complete picture of the score range. Second, double-scoring provides a measure of human performance against which automated scoring performance can be evaluated.


Involving Humans in the Automated Scoring Process

Humans play a key role in maintaining quality throughout the scoring process, even when automated scoring is used. Not only are human scorers involved in the selection of responses used to train the scoring engine, but human scorers also second-score 10% of the responses in order to provide an additional quality check of the engine. Human scoring is further monitored through an increased distribution of validity responses and through supervisor backreading, both key indicators of scoring accuracy.

If human scores are discrepant from the automated scoring engine, an expert human scorer reviews the response to resolve the discrepancy.

Finally, if the automated scoring engine determines it cannot score a response, that response is routed to a human scorer. This is particularly useful with responses that appear to be off-topic, not English, highly unusual, or creative.


Training the Automated Scoring Engine

To learn to score essays, the automated scoring engine incorporates a range of machine learning and natural language processing technologies to evaluate the structure, style, and content of writing in the responses. The engine uses a variety of semantic analysis techniques to estimate the semantic similarity of words and passages by analyzing large bodies of relevant text. The automated scoring engine can then "understand" the meaning of text in much the same way as a human reader.

Figure 4.1 illustrates some of the features used by the automated scoring engine to identify key components of student writing performance.

Figure 4.1. Components of student writing performance.

To evaluate aspects of the essay related to development, audience, content, and organization, the automated scoring engine derives semantic models of English based on a statistical analysis of large volumes of text. For essay scoring applications, it is trained on a collection of text that is equivalent to the reading a student is likely to have done over the course of their academic career (about 12 million words).

Because semantic analysis operates over the semantic representation of texts, rather than at the word level, it is able to evaluate similarity even when texts have few words in common. For example, semantic analysis finds the following two sentences to have a high semantic similarity despite having no words in common:

Surgery is often performed by a team of doctors.

On many occasions, several physicians are involved in an operation.
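The engine's internal representation is proprietary, but the general idea can be illustrated with a simple sketch: texts are mapped to semantic vectors and compared by cosine similarity, which can be high even when word overlap is zero. The vectors below are invented stand-ins for a trained model's output:

```python
# An illustrative sketch only: semantic vectors (invented values standing in
# for a trained model's output) compared by cosine similarity.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return dot / (norm(u) * norm(v))

surgery_team  = [0.81, 0.40, 0.12]   # hypothetical vector for the first sentence
physicians_op = [0.78, 0.44, 0.15]   # hypothetical vector for the second sentence
print(round(cosine(surgery_team, physicians_op), 3))  # close to 1.0

# Word-level overlap between the two sentences, by contrast, is empty:
s1 = set("surgery is often performed by a team of doctors".split())
s2 = set("on many occasions several physicians are involved in an operation".split())
print(s1 & s2)  # set()
```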


Semantic analysis allows the automated scoring engine to compare a new response against a large set of prescored responses, using the scores of the most similar responses to help determine the score of the new response. This comparison is similar to the approach that human scorers take when evaluating responses. Human scorers are trained on anchor papers of annotated student responses with agreed-upon scores. Human scorers compare new responses against the anchor set to determine the appropriate score.

The automated scoring engine scores responses similarly, except that it is able to make comparisons against a much larger set of examples. Rather than comparing a new response against a set of 16–24 examples in an anchor set, for example, it compares against the set of hundreds or thousands of responses on which it was trained. It is thus able to take advantage of a much wider range of examples that have been scored and validated based on all of the training and knowledge that goes into human scoring and the methods that are used to monitor and maintain human scoring quality.
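Under the same illustrative assumptions as the previous sketch, this comparison process can be pictured as a nearest-neighbor lookup. The trained set of (vector, human score) pairs and the knn_score helper below are hypothetical; they are not the engine's actual algorithm:

```python
# A minimal nearest-neighbor sketch of scoring a new response from the human
# scores of its most similar prescored neighbors. All names and data here are
# hypothetical illustrations.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return dot / (norm(u) * norm(v))

def knn_score(new_vec, trained, k=3):
    """Score a new response from the human scores of its k most similar
    prescored neighbors."""
    neighbors = sorted(trained, key=lambda t: cosine(new_vec, t[0]), reverse=True)
    top = neighbors[:k]
    return round(sum(score for _, score in top) / k)

trained = [([0.9, 0.1], 4), ([0.8, 0.2], 4), ([0.7, 0.3], 3),
           ([0.2, 0.8], 2), ([0.1, 0.9], 1)]
print(knn_score([0.85, 0.15], trained))  # 4 (nearest neighbors scored 4, 4, 3)
```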

Role of Automated Scoring

Automated scoring is used in a carefully calculated combination with human scoring of the constructed-response items, with the intent of reducing scoring turnaround time and enabling more rapid reporting while still maintaining scoring reliability and consistency.

Enhancing Human Scoring with an Automated System

AI technology enhances human scoring rather than replacing it. In the traditional scoring model, two human scorers assess each essay. In the AI model, the scoring engine applies the first score, and human scorers provide the 10% second score.

Consistent Results with Artificial Intelligence Scoring

The automated scoring engine has been used in a variety of assessment settings over two decades to score responses written by regular education students, English learners, and students with disabilities. Indeed, automated scoring can isolate and evaluate different aspects of an essay without bias. As a result, students who express good ideas, but perhaps do not use standard English, can still be rewarded for their ideas.


Chapter 5

Accessibility

ACT Aspire uses a variety of levels of accessibility support, including default embedded tools, open access tools, and full accommodations, to allow students with disabilities to participate in testing. For more information about ACT Aspire accommodations, see the ACT Aspire Accessibility User's Guide.

5.1 Development of the ACT Aspire Accessibility Support System

The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) address fairness in testing as the central concern posed by the threat to validity known as measurement bias. The Standards specify two major concepts that have emerged in the literature for minimizing such bias, namely accessibility and Universal Design. Accessibility is defined as "the notion that all test takers should have an unobstructed opportunity to demonstrate their standing on the construct(s) being measured" (p. 49). The second major concept, Universal Design, is defined as "an approach to test design that seeks to maximize accessibility for all intended examinees" (p. 50).

The development of all ACT Aspire accessibility supports followed a theory of action known as Access by Design (Fedorchak, 2013), which incorporates into its conceptual structure elements of Universal Design for Learning (UDL) described by the Center for Applied Special Technologies (CAST, 2011) and of Evidence-Centered Design (Mislevy, Almond, & Lukas, 2004; Mislevy & Haertel, 2006).

As illustrated in Appendix B, the ACT Aspire assessments are designed to be developmentally and conceptually linked across grade levels. To reflect that linkage, the assessment names are the same across all ACT Aspire levels and the scores are on the same vertical score scale. Higher grade levels of ACT Aspire are directly connected with the ACT and, therefore, offer an empirical bridge to understanding college and career readiness.

Universal Design is necessary, but it is not by itself sufficient to meet the needs of every learner (UDL Guidelines, Version 2.0, Principle III, 2011). Multiple means of engagement are required to meet the needs of all users. To build a system of accessibility supports that meets the needs of all learners and provides a fair performance pathway for each of them, a system of accessibility levels of support was identified and created using a structured data collection procedure, designed at ACT, called Accessibility Feature Mapping. The development team conducted a detailed feature analysis of every passage and item to determine which accessibility supports could be permitted or provided for at least seven diverse learner populations while fully honoring the constructs being measured. The data collected from this feature mapping process served to inform the accessibility policy and support system for ACT Aspire. (The ACT Aspire accessibility levels of support resulting from this procedure are described more fully in section 5.2.)

The Accessibility Feature Mapping process involved creating a data chart for every audio-scripted ACT Aspire item in every form. Every feature within each item (passage, stem, equations, graphics, interaction type, etc.) was cross-inventoried by (1) the constructs being measured (the original content metadata targets) and (2) the cognitive performance demands of the item (presentation demands, interaction and navigation demands, response demands, and general test condition demands). Then, for each of seven targeted learner populations (see the schematic below), valid access pathways to the item (either already available or shown to be needed by the analysis) were identified and compared with the targets to ensure that all task performance pathways led to valid measurement outcomes and did not violate the construct intended for the item. The schematic components of this development procedure are shown below:

Components of the Accessibility Feature Mapping Process used to determine valid communication performance pathways (accessibility support options) to be allowed during ACT Aspire Testing.


What claim(s) does performance on this item support?
Task/item content metadata claims (as available):
• Claim level 1: Content area
• Claim level 2: Broad subarea within content
• Claim level 3: Domain
• Claim level 4: Primary cluster
• Claim level 5: Secondary cluster, as applicable

What performance does this task require?
Communication demands of the task based on claims (boundaries of acceptable performance evidence):
1. Task presentation demands
2. Task interaction and navigation demands
3. Task response demands
4. General test condition demands

Who has a valid pathway to demonstrating the required performance?
Known communication access pathways usable by or needed by learners with:
1. Default/typical communication access needs
2. Blind/low vision communication access needs
3. Deaf/hard of hearing communication access needs
4. Limited motor control communication access needs
5. English language learner communication access needs
6. Reading or language impaired communication access needs
7. Attention, focus, or endurance communication access needs

Figure 5.1. Accessibility feature mapping process.

5.2 Test Administration and Accessibility Levels of Support

All accessibility supports permitted during ACT Aspire testing and described in the ACT Aspire Accessibility User Guide are designed to remove unnecessary barriers to student performance on the assessments. Based on the feature mapping development process described above, which incorporated the targeted construct definitions, all permitted supports fully honor the content, knowledge, and skills the tests measure.

5.2.1 Understanding Levels of Accessibility Support

Accessibility is a universal concept that is not restricted to any one group of students. It describes needs we all have, regardless of whether or not we have an official diagnostic label. The term "accessibility" also fully encompasses the needs of students with disabilities as well as those who are English learners. The older and more familiar term "accommodations" describes only one intensive level of support that few students actually need.

Over the last decade in educational research and practice we have come to understand that all students have tools they need and use every day to engage in the classroom and communicate effectively what they have learned and can do. (See the References section later in this manual.) There are different levels of support that students may need in order to demonstrate what they know and can do on academic tests. ACT Aspire assessments make several possible levels of support available. All these levels of support taken together are called accessibility supports. These accessibility supports:

• allow all students to gain access to effective means of communication that in turn allow them to demonstrate what they know without providing an advantage over any other student

• enable effective and appropriate engagement, interaction, and communication of student knowledge and skills

• honor and measure academic content as the test developers originally intended

• remove unnecessary barriers to students’ demonstrating the content, knowledge, and skills being measured on ACT Aspire assessments.

Figure 5.2. ACT Aspire levels of accessibility.

In short, accessibility supports do nothing for the student academically that he or she should be doing independently; they just make interaction and communication possible and fair for each student.

The ACT Aspire accessibility system defines four levels of support that range from minor support (default embedded system tools) to extreme support (modifications). Figure 5.2 shows the architectural structure of ACT Aspire accessibility supports.

ACT Aspire permits the use of those accessibility supports that will honor and validly preserve the skills and knowledge that our tests claim to measure, while removing needless, construct-irrelevant barriers to student performance. The four levels of support in the ACT Aspire accessibility system represent a continuum of supports, from least intensive to most intensive, and assume all users have communication needs that fall somewhere on this continuum. The unique combination of supports needed by a single test taker is called the Personal Needs Profile (PNP). A PNP tells the system which supports to provide for a specific test taker. Many students will not need a documented PNP. When a student's communication needs are not documented in a PNP, the system treats the student as a default user whose accessibility needs are sufficiently met through the default test administration represented by the base of the pyramid, that is, without accessibility features other than the basic set already embedded for all test takers. (See support level 1, "Default Embedded System Tools," in Figure 5.2; these supports are also described in the next section.) The continuum of supports permitted in ACT Aspire results in a personalized performance opportunity for all. This systematic continuum of supports is schematically illustrated in Figure 5.2.
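As a schematic illustration of the PNP concept, a profile can be thought of as a per-student record listing the supports to activate beyond the default embedded tools. The field names, example values, and helper below are hypothetical, not the actual ACT Aspire PNP schema:

```python
# A schematic sketch of the PNP concept: a per-student record of supports to
# activate beyond the default embedded tools. All names here are hypothetical.

from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_EMBEDDED_TOOLS = ["answer eliminator", "scratch paper", "mark items for review"]

@dataclass
class PersonalNeedsProfile:
    student_id: str
    open_access_tools: List[str] = field(default_factory=list)  # e.g., "answer masking"
    accommodations: List[str] = field(default_factory=list)     # e.g., "extra time"

def supports_for(pnp: Optional[PersonalNeedsProfile]) -> List[str]:
    """A student with no documented PNP is a default user and receives only
    the default embedded system tools."""
    if pnp is None:
        return DEFAULT_EMBEDDED_TOOLS
    return DEFAULT_EMBEDDED_TOOLS + pnp.open_access_tools + pnp.accommodations

print(supports_for(None))
print(supports_for(PersonalNeedsProfile("S123", ["answer masking"], ["extra time"])))
```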

5.2.2 Support Level 1: Default Embedded System Tools

The first level of supports is called the default embedded system tools (see Figure 5.3). They are automatically available to a default user whose accessibility needs are sufficiently met through the basic test administration experience.

Default embedded system tools meet the common, routine accessibility needs of the most typical test takers. All students are provided these tools, as appropriate, even students who have no documented PNP. Default embedded system tools include but are not limited to the following examples in online and paper tests:

• paper test booklet (paper)

• answer document (paper)

• number 2 pencils (paper)

• erasers (paper)

• computer keyboard (online)

• computer screen display (online)

• mouse (online)

• cut, copy, and paste functions in a text entry box (online)

• browser zoom magnification (online)

• answer eliminator (online and paper)

• scratch paper (online and paper)

• personal calculators for Mathematics tests (online and paper)

• mark items for review (online and paper)

These tools are either embedded in the computer test delivery platform or provided at the local level automatically. They are the accessibility tools that nearly everyone uses routinely and assumes will be made available, although we seldom think of them in this way. These tools serve a basic accessibility function for all.


As seen in Figure 5.3, default embedded system tools are common supports made available to all users upon launch of the test. These tools are either embedded in the basic computer test delivery platform, or they may be locally provided as needed. No advance request is needed for these supports. Students whose needs are met by default embedded tools do not need a PNP.

Figure 5.3. Default embedded system tools.

5.2.3 Support Level 2: Open Access Tools

Open Access tools (see Figure 5.4) are available to all users but must be identified in advance in the PNP, planned for, and then selected from the pull-down menu inside the test to be activated (online), or else provided locally.

Many students’ unique sensory and communication accessibility needs are predictable and can be met through a set of accessibility features designed into the underlying structure and delivery format of test items. Rather than overwhelm the user with all the possible tools, Open Access tools provide just the tools needed by individual users.

Open Access tools are slightly more intensive than default embedded system tools but can be delivered in a fully standardized manner that is valid, appropriate, and personalized to the specific access needs identified within an individual student’s PNP. Some of these require the use of tool-specific administration procedures. In ACT Aspire, Open Access tools include but are not limited to the following examples:

• large print (paper)

• color overlay (paper)

• respond in test booklet or on separate paper (paper)

• line reader (online and paper)

• magnifier tool (online and paper)

• answer masking (online and paper)

• dictate responses (online and paper)

• keyboard or augmentative or assistive communication (AAC) + local print (online and paper)

• breaks: supervised within each day (online and paper)


• special seating/grouping (online and paper)

• location for movement (online and paper)

• individual administration (online and paper)

• home administration (online and paper)

• other setting (online and paper)

• audio environment (online and paper)

• visual environment (online and paper)

• physical/motor equipment (online and paper)

Open Access tools should be chosen carefully and specifically to prevent overwhelming or distracting the student during testing. Remember: routine annual documentation of successful (and unsuccessful) use of accessibility tools through the student’s educational experience helps to inform and improve future choices.

As seen in Figure 5.4, Open Access tools may be used by anyone, but to be activated they must be identified in advance in the PNP, planned for, and selected from the pull-down menu inside the test (online), or else provided locally. Room supervisors must follow required procedures. Users should be practiced with and comfortable using these types of tools, both alone and in combination with any other tools they use.

Figure 5.4. Open access tools.

5.2.4 Support Level 3: Accommodations

Accommodations are high-level accessibility tools needed by relatively few students (see Figure 5.5). The ACT Aspire system requires accommodation-level supports to be requested by educational personnel on behalf of a student through the online PNP process. This allows any needed resources to be assigned and documented for the student.1

1 Qualifying procedures or formal documentation required to request and receive accommodation-level support during ACT Aspire testing should be set by schools.

Typically, students who receive this high level of support have a formally documented need for resources or equipment that requires expertise, special training, and/or extensive monitoring to select, administer, and even to use the support effectively and securely. These can include but are not limited to the following examples:

• text-to-speech English audio

• text-to-speech English audio + orienting description for blind/low vision

• text-to-speech Spanish audio

• word-to-word dictionary

• human reader, English audio

• translated test directions

• Braille + tactile graphics (online and paper)

• sign language interpretation

• abacus, locally provided (online and paper)

• extra time (online and paper)

• breaks: securely extend session over multiple days (paper)

Decisions about accommodation-level supports are typically made by an educational team on behalf of and including the student. Accommodation decisions are normally based on a formal, documented evaluation of specialized need. Accommodation supports require substantial additional local resources or highly specialized, expert knowledge to deliver successfully and securely.

As seen in Figure 5.5, accommodations are available to users who have been qualified by their school or district to use them. ACT Aspire recommends that students who use accommodation-level supports have a formally documented need as well as relevant knowledge of and familiarity with these tools. Accommodations must be requested through the online PNP process. Any formal qualifying procedure required by the responsible educational authority must be completed prior to completing the PNP request process.

Figure 5.5. Accommodations.
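The gating logic described above, in which Open Access tools need only be planned in the PNP while accommodation-level supports also require a formal qualification step, can be expressed compactly. The following is a minimal illustrative sketch; the class, field, and support names are hypothetical and do not reflect the actual PNP system's interface.

```python
from dataclasses import dataclass, field

# Hypothetical labels; the real accommodation list is broader (see Tables 5.1-5.8).
ACCOMMODATION_LEVEL = {
    "text-to-speech English audio",
    "braille + tactile graphics",
    "extra time",
    "sign language interpretation",
}

@dataclass
class PNPRequest:
    student_id: str
    supports: set = field(default_factory=set)   # supports identified in advance
    has_formal_documentation: bool = False       # set by the school/district

def can_activate(request: PNPRequest, support: str) -> bool:
    """A support is activated only if identified in advance in the PNP;
    accommodation-level supports also need the formal qualification step."""
    if support not in request.supports:
        return False                             # not planned in advance
    if support in ACCOMMODATION_LEVEL:
        return request.has_formal_documentation
    return True                                  # Open Access tool
```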


5.2.5 Support Level 4: Modifications

Modifications are supports that are sometimes used during instruction, but they alter what the test is attempting to measure and thereby prevent meaningful access to performance of the construct being tested (see Figure 5.6). Because modifications violate the construct being tested, they invalidate performance results and communicate low expectations of student achievement. Modifications are not permitted during ACT Aspire testing. (Modifications are further discussed in the Accessibility User Guide in the section titled "When Instruction and Assessment Supports Differ.")

As seen in Figure 5.6, modifications are supports that alter what the test is attempting to measure and are therefore not permitted on ACT Aspire tests.

Figure 5.6. Modifications.

5.3 Accommodations, Open Access, and Embedded Tools

In our commitment to provide a level playing field and fair performance opportunities for all students, ACT Aspire provides an integrated system of accessibility supports that includes accommodations as well as other, less intensive levels of accessibility support. At times, supports provided for students who test in the online format are combined with other types of locally provided or paper-format supports. The reverse is also true, as students using the paper format sometimes take advantage of certain online options. Regardless of test format, all students who use accessibility features must have that use documented in the online Personal Needs Profile (PNP) portal by appropriate school personnel. For this reason, the general description of ACT Aspire accessibility supports is provided here in one section. Full procedural requirements and instructions for using permitted supports during test administration are provided in the ACT Aspire Accessibility User Guide. In addition, the following procedural appendices and test administration resources are provided with the ACT Aspire Accessibility User Guide:

• Appendix A: Personal Needs Profile (PNP) Worksheet

• Appendix B: General Response Dictation and Scribing Procedures

• Appendix C: Guidelines for Sign Language Interpretation

• Appendix D: Approved Bilingual Word-to-Word Dictionaries

• Appendix E: Procedures for Local Delivery of Read-Aloud Support


The following eight tables provide the full list of supports permitted during the ACT Aspire Summative Tests. Tables 5.1–5.4 on the following pages identify the accessibility supports available in the paper summative test format. Tables 5.5–5.8 identify the accessibility supports available in the online summative test format.

5.4 2016 Paper Summative Testing

Table 5.1. Paper Summative Testing Presentation Supports

Presentation Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Human Reader (English Audio)

• Intended for user with ability to see graphics.

• Requires: Locally provided; follow procedure in appendix E

• Recommended: Extra time 300%—must separately select.

Accommodation* | Reading: Directions Only | English: Directions Only | Writing: Yes | Math: Yes | Science: Yes

Human Reader (English Audio + Orienting Description)

• Intended for user with blindness or low vision.

• Requires: Locally provided; follow procedure in appendix E. Must separately select and use Braille + Tactile Graphics companion.

• Allow time for shipping of braille materials. Student will also need response support to record responses in paper form.

• Recommended: Extra time 300%—must separately select.

Accommodation* | Reading: Directions Only (then must use Braille + Tactile Graphics) | English: Directions Only (then must use Braille + Tactile Graphics) | Writing: Yes (with Braille + Tactile Graphics) | Math: Yes (with Braille + Tactile Graphics) | Science: Yes (with Braille + Tactile Graphics)

Translated Directions

• Allowed for all grades.

• Requires: Locally provided.

Accommodation* Yes Yes Yes Yes Yes

Word-to-Word Dictionary, ACT-Approved

• Requires: Locally provided; follow procedure in appendix D.

Accommodation* — — Yes Yes Yes

American Sign Language (ASL): Directions Only (English Text)

• Requires: Locally provided; follow procedure in appendix C.

Accommodation* Yes Yes Yes Yes Yes

American Sign Language (ASL): Test Items (English Text)

• Requires: Locally provided 1:1 administration; follow procedure in appendix C.

• Recommended: Extra time.

Accommodation* — — Yes Yes Yes

Signed Exact English (SEE): Directions Only (English Text)

• Requires: Locally provided; follow procedure in appendix C.

Accommodation* Yes Yes Yes Yes Yes

* Qualification for use of permitted accessibility supports must follow policies of your local educational authority.


Table 5.1. (continued)

Presentation Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Signed Exact English (SEE): Test Items (English Text)

• Requires: Locally provided 1:1 administration; follow procedure in appendix C.

• Recommended: Extra time.

Accommodation* — — Yes Yes Yes

Cued Speech

• Requires: Locally provided; follow procedure in appendix E.

Accommodation* — — Yes Yes Yes

Braille, Contracted, American Edition (EBAE) Includes Tactile Graphics

• Requires: Response support to record responses; time for shipment of materials.

• Recommended: Extra time.

Accommodation* Yes Yes Yes Yes Yes

Braille, Uncontracted, American Edition (EBAE) Includes Tactile Graphics

• Requires: Response support to record responses; time for shipment of materials.

• Recommended: Extra time.

Accommodation* Yes Yes Yes Yes Yes

Braille, Contracted, Unified English (UEB) Includes Tactile Graphics

• Requires: Response support to record responses; time for shipment of materials.

• Recommended: Extra time.

Accommodation* Yes Yes Yes Yes Yes

Braille, Uncontracted, Unified English (UEB) Includes Tactile Graphics

• Requires: Response support to record responses; time for shipment of materials.

• Recommended: Extra time.

Accommodation* Yes Yes Yes Yes Yes

Large Print

• Requires: Time for shipment of materials.

Open Access Yes Yes Yes Yes Yes

Magnifier Tool

• Requires: Locally provided.

Open Access Yes Yes Yes Yes Yes

Line Reader

• Requires: Locally provided.

Open Access Yes Yes Yes Yes Yes

Color Overlay

• Requires: Locally provided.

Open Access Yes Yes Yes Yes Yes

* Qualification for use of permitted accessibility supports must follow policies of your local educational authority.


Table 5.2. Paper Summative Testing Interaction and Navigation Supports

Interaction and Navigation Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Abacus

• Requires: Locally provided.

Accommodation* — — — Yes —

Answer Masking

• Requires: Locally provided.

Open Access Yes Yes — Yes Yes

Answer Eliminator

• Requires: Locally provided; used in test booklet only.

Embedded Yes Yes Yes Yes Yes

Highlighter

• Requires: Locally provided; used in test booklet only.

Embedded Yes Yes Yes Yes Yes

Scratch Paper

• Requires: Locally provided.

Embedded Yes Yes Yes Yes Yes

Calculator (Grades 6–EHS)

• Requires: Locally provided.

• Follow ACT Aspire Calculator Policy; may use accessible calculators.

Embedded — — — Yes —

* Qualification for use of permitted accessibility supports must follow policies of your local educational authority.

Table 5.3. Paper Summative Testing Response Supports

Response Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Electronic Spell Checker

• Requires: Locally provided separate device, which must meet the specifications provided in the Procedures for Administration in the Guide.

Accommodation* — — Yes Yes Yes

Respond in Test Booklet or on Separate Paper

• Requires: Response transcription; original work must be returned.

• Recommended: Extra time.

Open Access Yes Yes Yes Yes Yes

Dictate Responses

• Requires: Follow procedure in appendix B.

• Recommended: Extra time.

Open Access Yes Yes Yes Yes Yes

Keyboard or AAC + Local Print

• Requires: Response transcription; original work must be returned.

• Recommended: Extra time.

Open Access Yes Yes Yes Yes Yes

Mark Item for Review

• Requires: Student mark, once made, must be erased thoroughly.

Embedded Yes Yes Yes Yes Yes

* Qualification for use of permitted accessibility supports must follow policies of your local educational authority.


Table 5.4. Paper Summative Testing General Test Condition Supports

General Test Condition Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Extra Time† Accommodation* Yes Yes Yes Yes Yes

Breaks: Securely Extend Session over Multiple Days

Accommodation* Yes Yes Yes Yes Yes

Breaks: Supervised within Each Day Open Access Yes Yes Yes Yes Yes

Special Seating/Grouping Open Access Yes Yes Yes Yes Yes

Location for Movement Open Access Yes Yes Yes Yes Yes

Individual Administration Open Access Yes Yes Yes Yes Yes

Home Administration Open Access Yes Yes Yes Yes Yes

Other Setting Open Access Yes Yes Yes Yes Yes

Audio Environment Open Access Yes Yes Yes Yes Yes

Visual Environment Open Access Yes Yes Yes Yes Yes

Physical/Motor Equipment Open Access Yes Yes Yes Yes Yes

* Qualification for use of permitted accessibility supports must follow policies of your local educational authority.

† Extra time represents the maximum allowed within a same-day test. The session may end earlier if the time is not needed. Automatic assignment of extra time occurs ONLY with TTS audio formats. If using only paper-form braille without TTS, extra time must be manually selected in the PNP.
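The footnote's assignment rule reduces to a small decision. The following is a minimal sketch, assuming hypothetical function and parameter names; the real PNP system's interface is not documented here.

```python
def extra_time_percent(uses_tts_audio: bool, manually_selected: bool) -> int:
    """Percent of standard time available within a same-day test."""
    if uses_tts_audio:
        return 300   # PNP assigns extra time 300% automatically with TTS
    if manually_selected:
        return 300   # e.g., paper-form braille without TTS: select in PNP
    return 100       # standard time; a session may end early if unused

assert extra_time_percent(uses_tts_audio=True, manually_selected=False) == 300
assert extra_time_percent(uses_tts_audio=False, manually_selected=False) == 100
```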

5.5 2016 Online Summative Testing

Table 5.5. Online Summative Testing Presentation Supports

Presentation Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Text-to-Speech (English Audio)

• Intended for user with ability to see graphics.

• Requires: PNP system automatically assigns extra time 300%.†

Accommodation* | Reading: Directions Only | English: Directions Only | Writing: Yes | Math: Yes | Science: Yes

Text-to-Speech (English Audio + Orienting Description)

• Intended for user with blindness or low vision.

• Requires: Braille + Tactile Graphics Companion; response support to record responses; time for shipment of braille materials; PNP system automatically assigns extra time 300%.†

• PNP system automatically prompts choice of Braille, Contracted or Braille, Uncontracted.

Accommodation* | Reading: Directions Only (then must use Braille + Tactile Graphics) | English: Directions Only (then must use Braille + Tactile Graphics) | Writing: Yes (with Braille + Tactile Graphics) | Math: Yes (with Braille + Tactile Graphics) | Science: Yes (with Braille + Tactile Graphics)


Table 5.5. (continued)

Presentation Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Translated Test Directions

• Allowed for all grades.

• Requires: Must be provided before test launch.

• Spanish provided in online system; other languages must be locally provided.

Accommodation* Yes Yes Yes Yes Yes

Text-to-Speech (Spanish Audio) Item Translation

• Grades 3–6 only.

• Requires: Online prerecorded format; PNP system automatically assigns extra time 300%.†

Accommodation* — — Yes Yes Yes

Word-to-Word Dictionary, ACT-Approved

• Requires: Locally provided; follow procedure in appendix D.

Accommodation* — — Yes Yes Yes

Cued Speech

• Requires: Locally provided; follow procedure in appendix E.

Accommodation* — — Yes Yes Yes

Braille, Contracted, American Edition (EBAE) Includes Tactile Graphics

• Requires: Response support to record responses; time for shipment of materials.

• Recommended: Extra time.

Accommodation* Yes Yes Yes Yes Yes

Braille, Uncontracted, American Edition (EBAE) Includes Tactile Graphics

• Requires: Response support to record responses; time for shipment of materials.

• Recommended: Extra time.

Accommodation* Yes Yes Yes Yes Yes

Braille, Contracted, Unified English (UEB) Includes Tactile Graphics

• Requires: Response support to record responses; time for shipment of materials.

• Recommended: Extra time.

Accommodation* Yes Yes Yes Yes Yes

Braille, Uncontracted, Unified English (UEB) Includes Tactile Graphics

• Requires: Response support to record responses; time for shipment of materials.

• Recommended: Extra time.

Accommodation* Yes Yes Yes Yes Yes

Magnifier Tool

• Online platform tool; may be locally provided.

Open Access Yes Yes Yes Yes Yes


Table 5.5. (continued)

Presentation Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Line Reader

• Online platform tool; may be locally provided.

Open Access Yes Yes Yes Yes Yes

Color Contrast

• Online platform tool or locally provided color overlay.

Open Access Yes Yes Yes Yes Yes

Browser Zoom Magnification

• Online only.

Embedded Yes Yes Yes Yes Yes

* Qualification for use of permitted accessibility supports must follow policies of your local educational authority.

† Extra time represents the maximum allowed within a same-day test. The session may end earlier if the time is not needed. Automatic assignment of extra time occurs ONLY with TTS audio formats. If using only paper-form braille without TTS, extra time must be manually selected in the PNP.

Table 5.6. Online Summative Testing Interaction and Navigation Supports

Interaction and Navigation Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Abacus

• Requires: Locally provided.

Accommodation* — — — Yes —

Answer Masking

• Online platform tool.

Open Access Yes Yes — Yes Yes

Answer Eliminator

• Online platform tool.

Embedded Yes Yes Yes Yes Yes

Highlighter Tool

• Online platform tool.

Embedded Yes Yes Yes Yes Yes

Browser Cut, Copy, and Paste

• Online only.

Embedded Yes Yes Yes Yes Yes

Scratch Paper

• Requires: Locally provided.

Embedded Yes Yes Yes Yes Yes

Calculator (Grades 6–EHS)

• Requires: Locally provided.

• Follow ACT Aspire Calculator Policy; may use accessible calculators.

Embedded — — — Yes —

* Qualification for use of permitted accessibility supports must follow policies of your local educational authority.


Table 5.7. Online Summative Testing Response Supports

Response Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Electronic Spell Checker

• Requires: Locally provided separate device, which must meet the specifications provided in the Procedures for Administration in the Guide.

Accommodation* — — Yes Yes Yes

Respond on Separate Paper

• Requires: Locally provided; response transcription; original work must be returned.

• Recommended: Extra time.

Open Access Yes Yes Yes Yes Yes

Dictate Responses

• Requires: Follow procedure in appendix B.

• Recommended: Extra time.

Open Access Yes Yes Yes Yes Yes

Keyboard or AAC + Local Print

• Requires: Response transcription; original work must be returned.

• Recommended: Extra time.

Open Access Yes Yes Yes Yes Yes

Mark Item for Review

• Online platform.

Embedded Yes Yes Yes Yes Yes

* Qualification for use of permitted accessibility supports must follow policies of your local educational authority.

Table 5.8. Online Summative Testing General Test Condition Supports

General Test Condition Supports | Support Level | Content Area (Reading, English, Writing, Math, Science)

Extra Time† Accommodation* Yes Yes Yes Yes Yes

Breaks: Supervised within Each Day Open Access Yes Yes Yes Yes Yes

Special Seating/Grouping Open Access Yes Yes Yes Yes Yes

Location for Movement Open Access Yes Yes Yes Yes Yes

Individual Administration Open Access Yes Yes Yes Yes Yes

Home Administration Open Access Yes Yes Yes Yes Yes

Other Setting Open Access Yes Yes Yes Yes Yes

Audio Environment Open Access Yes Yes Yes Yes Yes

Visual Environment Open Access Yes Yes Yes Yes Yes

Physical/Motor Equipment Open Access Yes Yes Yes Yes Yes

* Qualification for use of permitted accessibility supports must follow policies of your local educational authority.

† Extra time represents the maximum allowed within a same-day test. The session may end earlier if the time is not needed. Automatic assignment of extra time occurs ONLY with TTS audio formats. If using only paper-form braille without TTS, extra time must be manually selected in the PNP.


Table 5.9. Frequency of Use of Accommodated Forms, ACT Aspire, Spring 2015

Subject/Grade | Braille Contracted | Braille Uncontracted | English Audio for Blind | English Audio for Sighted | Large Print | Spanish | Grand Total

English 80 15 455 550

3 15 5 79 99

4 11 3 58 72

5 13 2 68 83

6 7 1 50 58

7 9 60 69

8 7 2 60 69

EHS 18 2 80 100

Mathematics 104 9 10 23,291 590 33 24,037

3 17 3 1 2,582 103 7 2,713

4 17 1 2,692 83 9 2,802

5 17 2,814 94 13 2,938

6 16 1 2,592 66 4 2,679

7 10 1 2,417 78 2,506

8 7 1 3 2,309 79 2,399

EHS 20 3 5 7,885 87 8,000

Reading 117 14 606 737

3 18 4 110 132

4 18 2 86 106

5 19 1 97 117

6 11 3 67 81

7 14 81 95

8 14 3 81 98

EHS 23 1 84 108


Table 5.9. (continued)

Subject/Grade | Braille Contracted | Braille Uncontracted | English Audio for Blind | English Audio for Sighted | Large Print | Spanish | Grand Total

Science 50 7 9 18,046 337 16 18,465

3 2 2 1 1,331 52 1,388

4 3 1 1,494 32 2 1,532

5 11 1 2,656 56 12 2,736

6 6 1 1,266 28 2 1,303

7 5 1 2,129 51 2,186

8 4 2 2 1,412 38 1,458

EHS 19 5 7,758 80 7,862

Writing 70 23 7 13,600 452 6 14,158

3 13 5 1 912 85 1,016

4 9 4 1,043 57 3 1,116

5 8 3 941 70 1 1,023

6 8 3 1,055 52 2 1,120

7 6 3 1 1,039 57 1,106

8 6 3 2 1,017 58 1,086

EHS 20 2 3 7,593 73 7,691

Grand Total 421 68 26 54,937 2,440 55 57,947

% of Total 0.73% 0.12% 0.04% 94.81% 4.21% 0.09% 100.00%
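The "% of Total" row follows directly from the column totals divided by the grand total of 57,947 accommodated forms. A quick arithmetic check, with the values copied from Table 5.9:

```python
totals = {
    "Braille Contracted": 421,
    "Braille Uncontracted": 68,
    "English Audio for Blind": 26,
    "English Audio for Sighted": 54937,
    "Large Print": 2440,
    "Spanish": 55,
}
grand_total = sum(totals.values())              # 57,947
for form, n in totals.items():
    print(f"{form}: {n / grand_total:.2%}")     # e.g., 94.81% for English Audio for Sighted
```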


Chapter 6

Test Administration

The ACT Aspire Test Coordinator and Room Supervisor Manuals contain the instructions for administering the ACT Aspire grades 3–Early High School English, Mathematics, Reading, Science, and Writing subject tests. The test coordinator is the main ACT Aspire contact at a school and the person who makes arrangements for the test administration. The room supervisor is responsible for secure administration of the tests to the students in a testing room. There are separate Room Supervisor Manuals for the paper and computer-based testing formats.

In addition to the Test Coordinator and Room Supervisor Manuals, all training and test administration resources are available online on the websites associated with ACT Aspire: actaspire.tms.pearson.com houses training videos, actaspire.avocet.pearson.com includes links to other training materials, and actaspire.pearson.com provides access to many more resources.

6.1 Policies and Procedures

The Test Coordinator and Room Supervisor Manuals provide direction in administering ACT Aspire tests, including specific instructions for conducting the timed tests. It is important to follow these instructions to successfully measure students' academic skills. All testing personnel, including room supervisors and other testing staff, are required to read the materials provided by ACT Aspire.

6.2 Standardized Procedures

Throughout the Test Coordinator and Room Supervisor Manuals, there are detailed directions for securing test materials and administering tests in a standardized manner.

Whenever possible, relatives or guardians of students taking ACT Aspire should not serve in the role of test coordinator.


However, in some circumstances, this may not be possible. In those circumstances, the district/school should monitor the testing process so that test coordinators who have a relative testing do not handle the test materials of the relative without another responsible individual present.

Relatives of students taking ACT Aspire should not serve in the role of room supervisor in the same testing room as the student relative. It is permissible for a relative to serve as a room supervisor in the same school/district as a related student, provided that student is not testing in the room being supervised by the related testing staff person.

The following actions violate ACT Aspire policies and procedures:

• accessing or obtaining a test booklet or test questions prior to the test for any reason (An exception is provided for American Sign Language and Signed Exact English interpreters assisting students. See "Preparation for Sign Interpretation" in appendix C of the ACT Aspire Accessibility User Guide.)

• photocopying, making an electronic copy, or keeping a personal copy of the test or of any test items (An exception is provided for students who need to use digital scanning magnification for test items. See the ACT Aspire Accessibility User Guide.)

• taking notes about test questions or any paraphrase of test questions to aid in preparing students for testing

• aiding or assisting a student with a response or answer to a secure test item, including providing formulas

• rephrasing test questions for students

• creating an answer key or “crib sheet” of answers to test questions

• editing or changing student answers after completion of the test, with or without the student’s permission

• allowing students to test in an unsupervised setting

• leaving test materials in an unsecured place or unattended

• failing to properly report and document incidents of prohibited behavior involving students, staff, or others

• allowing students to test longer than the permitted time

• failing to return and account for all testing materials after the testing session has ended

6.3 Selecting and Training Testing Staff

The test coordinator is responsible for selecting and training all room supervisors and other testing staff. The test coordinator provides the continuity and administrative uniformity necessary to ensure that students at a school are tested under the same conditions as students at every other school, and to ensure the security of the test.


6.3.1 Room Supervisors

Typically, teachers will administer the tests to students during regular class periods. Depending on the number of students in a class, you may wish to assign adult school staff (not students) to assist the teacher with distributing and collecting test materials and with monitoring testing.

Be sure that everyone involved in test administration has access to the Room Supervisor Manual and is familiar with its contents. All manuals are periodically updated, so it is important to check the Avocet website for updated versions before each new test administration. A room supervisor is needed in each testing room to read directions and monitor students. If test rooms have more than 25 students, additional personnel may be assigned to assist the room supervisor.

Before the test day, all testing personnel should read all of the testing instructions carefully, particularly the verbal instructions, which will be read aloud to students on the test day. It is important that testing personnel are familiar with these instructions.

6.3.1.1 Room Supervisor Qualifications

The ACT Aspire test coordinator should confirm that the room supervisor(s) meet all of the following criteria. Each room supervisor should be

• proficient in English

• experienced in testing and classroom management

• a staff member of the institution or district where the test administration will take place

To protect both students and the room supervisor from questions of possible conflict of interest, the following conditions should also be met. The room supervisor should

• not be a relative or guardian of a student in the assigned room

• not be a private consultant or individual tutor whose fees are paid by a student or student’s family

6.3.1.2 Room Supervisor Responsibilities

Specific responsibilities are:

• Read and thoroughly understand the policies, procedures, and instructions in the Room Supervisor Manual and other materials provided.

• Supervise a test room.

• Distribute test materials if administering paper tests.*

• Start a test session if administering online tests.

• Help students sign in to the online testing system if administering online tests.*

• Read test instructions.

• Properly time tests.

• Walk around the testing room during testing to be sure students are working on the correct test and to observe student behavior.*


• Monitor the online testing system as needed.

• Pay careful attention to monitoring students’ behavior during the entire testing session.*

• Collect and account for all answer documents, test booklets, and scratch paper (or, for online testing, authorization tickets) before dismissing students.*

• Ensure students have stopped testing and have correctly signed out of the online testing system.*

• Complete documentation of any testing irregularities.

*Other school staff may assist with these activities.

6.3.2 Responsibilities of Other Testing Staff

Other school staff can assist the room supervisor with an administration to a group. Another staff member is recommended if a room has 26 or more students, and further staff members can be added for each incremental increase of 25 students when testing in a large group setting.
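As a rough illustration, this guideline amounts to one recommended assistant per additional block of 25 students beyond the first 25. The following is a minimal sketch with a hypothetical function name; local policy governs actual staffing decisions.

```python
import math

def recommended_staff(num_students: int) -> tuple:
    """(room_supervisors, recommended_assistants) for one testing room."""
    supervisors = 1                                       # one per room
    assistants = max(0, math.ceil((num_students - 25) / 25))
    return supervisors, assistants

assert recommended_staff(25) == (1, 0)   # 25 or fewer: supervisor only
assert recommended_staff(26) == (1, 1)   # 26 or more: add a staff member
assert recommended_staff(51) == (1, 2)   # each further block of 25 adds one
```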

The staff member can assist in the administration of the tests according to the policies and procedures in the appropriate Room Supervisor Manual. If used, the staff members should meet the same qualifications as a room supervisor (see above).

The testing staff member's responsibilities may include the following:

• Read and understand the appropriate Room Supervisor Manual.

• Help distribute test materials if administering paper tests.

• Help students sign in to the online testing system if administering online tests.

• Verify the timing of the test with the room supervisor.

• Walk around the room during testing to be sure all students are working on the correct test and to observe student behavior.

• Report any irregularities to the room supervisor immediately.

• Accompany students to the restroom if more than one leaves during the timed portion of the test.

• Pay strict attention to monitoring students during the entire testing session.

• Help collect and account for all answer documents, test booklets, and scratch paper (or, for online testing, authorization tickets).

• Ensure students have stopped testing and have correctly signed out of the online testing system.


6.3.3 Staff Training Sessions

To ensure that the administration of the tests is standard, it is important for all testing staff to understand the testing procedures outlined in the Room Supervisor Manual and other training resources.

ACT Aspire recommends that a training session be conducted prior to testing for all testing staff to discuss the testing guidelines and organizational details, including:

1. Security of Test Materials (paper testing only)

a. Describe how the materials will be distributed to the test rooms and how room supervisors are to count them.

b. Emphasize that room supervisors are to count test booklets when they receive them from the test coordinator and again before students are dismissed.

c. Emphasize that staff members should never leave a test room unattended.

2. Security and Materials for Online Testing

a. Ensure that all room supervisors have access to the ACT Aspire Portal.

b. Explain that authorization tickets should be printed in advance and stored in a secure location.

c. Describe how to use the online testing system—see the ACT Aspire Portal User Guide posted on the Avocet website for step-by-step instructions.

d. Emphasize that room supervisors must collect used and unused scratch paper and authorization tickets after testing.

e. Emphasize that staff members should never leave a test room unattended.

f. Emphasize that test sessions must be started in the ACT Aspire Portal before students can sign in to the test.

3. Activities Before the Test

a. Determine which set of verbal instructions room supervisors are to follow. Room supervisors should clearly mark those instructions in their manuals.

b. Develop and share with staff a contingency plan for handling unexpected situations that may arise that could affect testing. See Avocet for the ACT Aspire Contingency Plan document.

4. Test Day

a. Discuss when and where staff members are to report on the test day.

b. Stress that no one may be admitted to the testing room once the timed tests have begun. Determine how to handle late arrivals.

c. Stress that verbal instructions for the tests must be read verbatim.

d. Stress that answer documents, test booklets, and login credentials should not be distributed prior to admitting students.


e. Stress that accurate timing of each test is critical. For paper testing, room supervisors must record the start, five-minute warning, and stop times in the manuals. Discuss the consequences of a mistimed section.

f. Emphasize that staff members must not read (other than the Room Supervisor Manual), correct papers, or do anything unrelated to administering the test. Their attention should be focused on the students.

g. Emphasize that conversations among staff must be quiet and kept to a minimum. Even whispered conversations can be distracting to students while testing.

h. For room supervisors who are administering the grades 6–Early High School (grades 9–10) mathematics test, emphasize that calculators should be checked before testing to ensure they meet ACT Aspire standards. Review the permitted and prohibited calculators listed on the Avocet website (Avocet/Calculators/Calculator FAQ).

i. Note that during the test, staff members should walk quietly around the room, be available to respond to students’ questions, assist in the case of illness, and check that students are working on the correct test. In rooms where students are taking online tests, staff should ensure students have signed in to the correct test and assist them with technical or system navigation issues.

j. Discuss procedures for a student leaving during the test to go to the restroom. For online testing, review “Test Session Interruptions” on the Avocet website for instructions on how to pause a test for a student.

k. Discuss what actions to take if staff members observe prohibited behavior. Review plans for dismissing students (e.g., where they are to be sent, how to maintain vigilance in the test room, documenting actions taken).

l. Discuss what actions to take in the case of a group irregularity (e.g., a power outage) or an emergency.

m. Discuss potential individual irregularities and actions to take.

n. Review the Testing Irregularity Report.

5. After the test

a. Emphasize that room supervisors administering paper tests must verify the count of used and unused test booklets, then return test materials, scratch paper, and Irregularity Reports to the test coordinator.

b. Emphasize that room supervisors administering paper tests should verify that all answer documents have the correct test form gridded.

c. Emphasize that room supervisors administering online tests must collect all used and unused scratch paper and authorization tickets and return them to the test coordinator.

d. Emphasize that all test sessions must be closed in the ACT Aspire Portal after all students assigned to the test session have completed testing. Once test sessions are closed, they cannot be reopened.


Chapter 7

Test Security

In order to ensure the validity of ACT Aspire test scores, test takers, individuals who have a role in administering the tests, and those otherwise involved in facilitating the testing process must strictly observe ACT Aspire's standardized testing policies, including the Test Security Principles and test security requirements. Those requirements are set forth in the instructions for the ACT Aspire test and the ACT Aspire Test Supervisor's Manual and may be supplemented by ACT from time to time with additional communications to test takers and testing staff.

ACT Aspire’s test security requirements are designed to ensure that examinees have an equal opportunity to demonstrate their academic achievement and skills, that examinees who do their own work are not unfairly disadvantaged by examinees who do not, and that scores reported for each examinee are valid. Strict observation of the test security requirements is required to safeguard the validity of the results.

Testing staff must protect the confidentiality of the ACT Aspire test items and responses. Testing staff should be aware of and competent in their roles, including understanding ACT Aspire's test administration policies and procedures and acknowledging and avoiding conflicts of interest in their roles as test administrators for ACT Aspire.

Testing staff must be alert to activities that can compromise the fairness of the test and the validity of the scores. Such activities include, but are not limited to, cheating and questionable test taking behavior (such as copying answers or using prohibited electronic devices during testing); accessing questions prior to the test; taking photos or making copies of test questions or test materials; posting test questions on the internet; or test proctor or test administrator misconduct (such as providing answers or questions to test takers or permitting test takers to engage in prohibited conduct during testing).

7.1 Data Security

ACT systems hardening baselines are built from the Center for Internet Security (CIS) standards (see http://www.cisecurity.org) and modified to meet the risk profile of the applications and systems that they support, per the CIS guidance.

7.1.1 Personally Identifiable Information

ACT Aspire, LLC's general policy on the release of personally identifiable information (PII) collected as part of the ACT Aspire program is that such information will not be provided to a third party without the consent of the individual (or, in the case of an individual under age 18, the consent of a parent or legal guardian). However, ACT Aspire may share PII as follows, as allowed by applicable laws:

• ACT Aspire will share PII with its member companies, NCS Pearson, Inc., and ACT, Inc. Both member companies provide services to ACT Aspire that are essential for ACT Aspire to provide testing services, and both members will use PII only in the same ways ACT Aspire can, in accordance with this policy. In the event of dissolution of ACT Aspire, LLC, its members will assume ownership of its assets, including routine business records that may include PII.

• ACT Aspire may provide PII to the CDE once it has been determined that the state will pay for the individual to take the assessment. This may include linking information relating to ACT Aspire assessments with information relating to ACT programs such as the ACT assessment.

• If the individual is under 18 years of age, ACT Aspire may provide PII to the individual’s parent or guardian, who may direct us to release the information to third parties. If the individual is 18 years of age or older, ACT Aspire may provide PII to the individual, who may direct us to release the information to third parties.

• ACT Aspire may enter into contracts with third parties to perform functions on its behalf, which may include assisting it in processing PII. These parties are contractually required to maintain the confidentiality of the information and are restricted from using the information for any purposes other than performance of those functions.

• ACT Aspire may disclose PII to facilitate the testing process, for example, to test proctors.

• ACT Aspire may provide PII as required or allowed by operation of law, or to protect the health and safety of its customers or others.

ACT Aspire may authorize its members and other subcontractors to use de-identified and aggregate data for other purposes, such as research. PII will not be included in any de-identified or aggregate data sets.

7.1.2 Security Features of Data Systems

ACT employs the Principle of Least Privilege when providing access to users, and access entitlement reviews are conducted periodically. All student information is secured and password-controlled. Access to student information during various testing processes is limited to personnel who need that access in order to fulfill the


requirements of the contract. Access to confidential student information is reviewed on a periodic basis. ACT is in the process of modernizing its infrastructure to meet established policy; that project is ongoing. ACT takes a risk-based approach in prioritizing the order in which systems are modernized. ACT does not permit any customer access to data systems in its environment, as this would create exposure for other ACT customers.

The overall approach to security and cyber security standards by Pearson, ACT Aspire’s subcontractor, includes information about how the systems prevent infiltration and protect student information. Pearson’s overall approach involves the following facets:

Security and Audit Management. Pearson undergoes regular audits from its internal corporate audit function as well as third-party audits. Identified gaps are reported to the Pearson Chief Information Security Officer, the responsible Business Information Security Officers, and the specific business function management, where a remediation plan is identified, responsibilities are assigned, milestones are created, and follow-up documentation is stored. Summary reports from the independent audits can be made available to the OIT Office of Cyber Security upon request.

Update Processes. The "change management" process manages changes to optimize risk mitigation, minimize the severity of any known impact, and succeed on the first attempt. This includes additions, modifications, and removals of authorized components of services and their associated documentation. The process includes acknowledging and recording changes; assessing the impact, cost, benefit, and risk of proposed changes; obtaining approval(s); managing and coordinating implementation of changes; and reviewing and closing requests for change. Critical steps in the change process require management approval before proceeding to the next step of the process.

System Behavior Monitoring. Both during test-taking and at other times, Pearson monitors for anomalous activity throughout the entire system, not just at the application layer. Should student testing activity trigger a behavior alert, the test is halted and the proctor is alerted. System monitors may also be triggered, generating an alert to the appropriate team for further investigation.

Features That Prevent Infiltration. Several methods mitigate the risk of uninvited access to systems. All external traffic is encrypted. All item content payloads are encrypted end-to-end using AES-128. All student responses are encrypted using AES-256, with a different key than the content payloads. Students receive a one-time password to take a test. Account and password controls meet accepted standards for length, complexity, lockout, reuse, and expiration.
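The key separation described here, AES-128 for item content and AES-256 for student responses under a distinct key, can be illustrated with the third-party Python cryptography package. The GCM mode, nonce handling, and key generation below are assumptions made for the sketch; the manual does not specify them.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

item_key = AESGCM.generate_key(bit_length=128)      # content payloads: AES-128
response_key = AESGCM.generate_key(bit_length=256)  # responses: AES-256, different key

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(12)                          # unique nonce per message
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

item_blob = encrypt(item_key, b"<item>...</item>")
response_blob = encrypt(response_key, b'{"answer": "C"}')
```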

Systems and components use the following appropriate protocols and design features to neutralize known threats:

Component-to-Component. All sensitive communication between application components that crosses the Internet is encrypted using Secure Sockets Layer (SSL)/Transport Layer Security (TLS). The strongest version of TLS supported by the user's browser is negotiated. This does not apply to assessment item content delivery, which is encrypted as described below. A different method is used to enable proctor caching, which reduces schools' overall bandwidth requirements.
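For the SSL/TLS transport just described, a client can refuse weak protocol versions. A minimal sketch using Python's standard ssl module against a hypothetical endpoint:

```python
import ssl
import urllib.request

ctx = ssl.create_default_context()             # verifies server certificates
ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # floor the negotiated version

with urllib.request.urlopen("https://example.com/", context=ctx) as resp:
    print(resp.status)                         # connection used TLS >= 1.2
```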

User Authentication and Authorization. Users are only created when there is a business need and are granted a user role in the system based on job requirements. A user requests access with the specific privileges (a role) needed to perform their required activities. User accounts are only authorized to access the data specific to their role and organization. The request for access is reviewed by the appropriate program manager and either approved or denied. Users are removed as part of an overall termination process in accordance with Human Resources policy. Network access is removed immediately, and accountable managers are notified of specific applications from which the users must have access removed. All applications, devices, databases, and network access require authentication of the user.
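In the spirit of the role-based model just described, a request is granted only if the action falls within the role's scope and targets the requester's own organization. The role names and scopes below are hypothetical:

```python
ROLE_SCOPES = {
    "room_supervisor": {"session:start", "session:monitor"},
    "test_coordinator": {"session:start", "session:monitor",
                         "pnp:edit", "irregularity:report"},
}

def is_authorized(role: str, action: str, own_org: str, target_org: str) -> bool:
    """Least privilege: the action must be in the role's scope, and users may
    only touch data belonging to their own organization."""
    return action in ROLE_SCOPES.get(role, set()) and own_org == target_org

assert is_authorized("room_supervisor", "session:start", "sch-1", "sch-1")
assert not is_authorized("room_supervisor", "pnp:edit", "sch-1", "sch-1")
```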

Assessment Item-Level Security. When a test is marked "secure" during publishing, each individual item is encrypted using the Advanced Encryption Standard (AES) and a 128-bit key. Test items remain encrypted until they are downloaded to the TestNav client, decrypted, and presented to the student during testing.

Student Response and Score Data Security. Student responses are encrypted using an AES 256-bit key. Score data are communicated between components using SSL/TLS.

Data Storage. Database backups are AES encrypted using Amazon's S3 Server-Side Encryption functionality.
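Requesting S3 server-side encryption on upload is a one-flag operation with boto3; the bucket and object names here are hypothetical:

```python
import boto3

s3 = boto3.client("s3")
with open("backup-2016-04-01.dump", "rb") as body:
    s3.put_object(
        Bucket="example-db-backups",       # hypothetical bucket
        Key="backups/2016-04-01.dump",
        Body=body,
        ServerSideEncryption="AES256",     # SSE-S3 server-side encryption
    )
```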

Internal corporate security policies are aligned with ISO 27001 and are updated regularly to reflect changes to the industry landscape. Application security practices are based on a variety of accepted good practices, including ISO 27001, Open Web Application Security Project, Web Application Security Consortium, and internally developed standards. Pearson security analysts update the standards and practices on a regular basis.

Specific security protocols for device types, operating systems, browsers, plugins and settings, enterprise management, and virtual desktop practices vary over time as new software versions are released by vendors, such as a new version of Windows or Java, and as new devices come on the market or are rendered obsolete because of age. These protocols are communicated regularly and clearly, early enough for schools and technology managers to confirm they are prepared well in advance of test day. In general, protocols include locking down web browsers or devices so that nothing but the test may be accessed during testing and verifying that supported versions of software include the latest security releases from the original vendor.


7.1.3 Data Breaches and Remedies

ACT's Information Security Policy governs the handling of student data that is classified as confidential restricted. The policy states that confidential restricted information must meet the following guidelines:

• Electronic information assets must only be stored on ACT-approved systems/media with appropriate access controls.

• Only limited authorized users may have access to this information.

• Physical records must be locked in drawers or cabinets while not being used.

• ACT also has a Clear Desk and Clean Screen Standard, Access Management Standard, Malware Protection Standard, Network Security Management Standard, Secure System Configuration Standard, Security Event Logging and Monitoring Standard, and System Vulnerability and Patch Management Standard to form a system of control to protect student data.

ACT has an Information Security Incident Response Plan (ISIRP) that brings needed resources together in an organized manner to deal with an incident, classified as an adverse event, related to the safety and security of ACT networks, computer systems, and data resources. The adverse event could come in a variety of forms: technical attacks (e.g., denial of service attack, malicious code attack, exploitation of a vulnerability), unauthorized behavior (e.g., unauthorized access to ACT systems, inappropriate usage of ACT data, loss of physical assets containing Confidential or Confidential Restricted data), or a combination of activities. The purpose of the plan is to outline specific steps to take in the event of any information security incident.

This Information Security Incident Response Plan charters an ACT Information Security Incident Response Team (ISIRT) to provide an around-the-clock (i.e., 24x7) coordinated security incident response throughout ACT. The Information Security Officer is the overall ACT representative and has the responsibility and authority to manage the Information Security Incident Response Team and implement necessary ISIRP actions and decisions during an incident.

The ACT Aspire program, as part of the National Services group of NCS Pearson, Inc. (Pearson), responds to circumstances that relate to security or privacy issues involving loss of or access to student data processed by or on behalf of ACT Aspire.

Pearson has a formally documented Incident Response Policy and Process so that necessary stakeholders are notified and engaged in the process to investigate, remediate, and follow up on any suspected or actual breach. If a breach is suspected, Pearson staff will communicate with the Chief Information Security Officer and Security team to initiate an investigation. If a breach is confirmed during the initial investigation, the Incident Response Plan is executed, and key stakeholders in the security, technical, business, and legal groups are notified. Steps are promptly taken to restrict access to the compromised host or application while maintaining the integrity of the environment, to preserve any potential evidence for the investigation.


Pearson analysts will perform an investigation using a combination of tools and manual analysis in an attempt to identify the root cause and the extent of the compromise.

At the same time, the business and legal teams will determine what external notification requirements Pearson is subject to, based on legislation and contractual obligations. Required notifications and remedies will be performed.

If/when the root cause is identified, a remediation plan will be created to address the vulnerability, not only on the host(s) or application(s) in question, but across systems that may face the same vulnerability even if they have not yet been compromised. Systems or applications are patched promptly and must pass a security assessment prior to being released for general use. The investigator will submit a report with the details of the incident, and a post-mortem meeting will be held to review that report and discuss any lessons learned from the incident.

7.1.4 Data Privacy and Use of Student Data

7.1.4.1 Data Privacy Assurances

ACT and our subcontractors agree not to use student PII in any way or for any purpose other than those expressly granted by the contract.

ACT does not employ technology to encrypt its storage frames at rest; however, the following compensating solutions are in place:

• Storage frames supporting the ACT are dedicated to ACT for our in-house systems.

• Storage frames supporting the ACT are housed at secure data centers, which have SOC 1 controls and provide reports to ACT.

• E-Business does encrypt PII data within the database structures.

• ACT does not use any external tape solutions; all backups are replicated within the solution to a secondary solution at our non-production data center for disaster recovery purposes.

• Office 365 (email, SharePoint) is encrypted as a cloud SaaS solution.

7.1.4.2 Security Processes and Procedures

ACT conducts new-hire training in information security practices and the handling of sensitive data, with annual refreshers conducted on an ongoing basis. ACT employs the Principle of Least Privilege so that only individuals with a legitimate need can access data, and employs role-based security so that users have access only to the pieces of data necessary to conduct their work on behalf of our customers. Accounts are terminated immediately when a user departs the organization, and periodic entitlement reviews are conducted to ensure that those who have access still require that level of access to perform their duties.


Pearson's security processes for ACT Aspire include the following aspects:

Security Training of Staff. A Security Awareness Program is an ongoing effort providing guidance to every employee so they understand security policies, individual responsibilities for compliance, and how behaviors affect the ability to protect systems and data. The core of these efforts is built on the well-known principles of Confidentiality, Integrity, and Availability (CIA). Security awareness begins immediately with orientation for new hires. Training covers acceptable use of systems and fundamental best practices. Web-based training modules are available, and all employees are required to complete the course when hired and thereafter are routinely asked to refresh their knowledge. Specific online courses covering Payment Card Industry (PCI) compliance and PII have also been developed. An annual refresher course including components of all of these control areas is also provided.

Access to Identifiable Student Data. Pearson authorizes access to networks, systems, and applications based on business need to know, or the least privilege necessary to perform stated job duties. Any elevated rights not granted by virtue of job responsibilities at time of hire must be approved by management, and in certain instances, the Information Security Group (such as administrative rights).

Internal Network Access Controls. Policy requires system access to be granted with a unique user ID. Every employee is provided with a unique domain account and a secure one-time password. To safeguard confidentiality, passwords must be changed upon first logon. Pearson stipulates a minimum password length as well as complexity requirements. Accounts are locked out after five failed login attempts. Passwords must be changed every 60 days and cannot be reused within one year. A screen lock is also enabled by Domain Security Policy to activate after 15 minutes of inactivity. All accounts are locked at time of termination. To verify compliance, domain controls identify and lock all accounts that have been idle for more than 30 days. If they are still unused after 90 days, the account is deleted.
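Collected as constants plus one illustrative idle-account check, the paragraph's rules look like the sketch below. The constant and function names are mine; the values come from the paragraph above.

```python
from datetime import datetime, timedelta

MAX_FAILED_LOGINS = 5                          # lockout threshold
PASSWORD_MAX_AGE = timedelta(days=60)          # forced rotation interval
PASSWORD_REUSE_WINDOW = timedelta(days=365)    # no reuse within one year
SCREEN_LOCK_IDLE = timedelta(minutes=15)       # domain screen lock
IDLE_LOCK = timedelta(days=30)                 # idle accounts are locked
IDLE_DELETE = timedelta(days=90)               # then deleted if still unused

def idle_account_state(last_login: datetime, now: datetime) -> str:
    """Classify an account by how long it has been idle."""
    idle = now - last_login
    if idle > IDLE_DELETE:
        return "deleted"
    if idle > IDLE_LOCK:
        return "locked"
    return "active"
```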

Application Access Controls. For externally facing systems and applications, access to specific data is determined based on user roles and permission requirements established by our customers. At a minimum, applications hosted on Pearson systems are required to provide the same level of security as network access would require.


System Access Controls. Separation of duties between system administrators, application developers, application DBAs, and system DBAs provides clear definition of responsibility and prevents an individual from attaining the access rights that would allow for easy compromise.

De-identifying Data During Scoring. The digital scoring platform routes student responses to scorers randomly. Scorers cannot see any student information such as name or other demographics, unless a student self-identifies in his or her response.

7.1.4.3 Transparency Requirements

For paper testing, ACT will store all scored answer documents and used test books at a secure facility until they are securely destroyed in accordance with ACT's retention schedule. This includes all materials used to capture student responses for scoring. ACT's standard retention policy is to store all scanned answer documents and secure test materials for two years. ACT is also committed to retaining any electronic files, including but not limited to scanned images, scored files, file exchanges, and contractual documentation, in accordance with ACT's standard retention schedule.

7.1.4.4 Student Privacy Pledge

ACT does not place limits on data retention, because doing so would conflict with ACT's mission of issuing college-reportable scores and conducting research to improve the ACT and report data trends in education. Pearson, the subcontractor for ACT Aspire, LLC, has not signed the Student Privacy Pledge. Pearson nonetheless remains committed to protecting student privacy and to working with state and local school officials to be good stewards of the student personally identifiable information in its care.

7.2 Test Security and Administration Irregularities

The irregularity reporting tool in the ACT Aspire Portal is intended primarily for school personnel to record any test administration irregularities that may affect student scores or the analysis of ACT Aspire results. Recording an irregularity for a student is not the same as voiding his or her test and dismissing him or her for prohibited behavior. Instructions for using this tool can be found in the related section of the ACT Aspire Portal User Guide. Testing personnel should use the tool to report any of the irregularities occurring within the room.

Room supervisors should enter test irregularities directly into the reporting tool in the ACT Aspire Portal, if possible. If room supervisors do not have access to the Portal, they should use the Testing Irregularity Report located at the end of this manual. All Testing Irregularity Reports should be forwarded to the local test coordinator after testing and be entered into the Portal.


For either format of the test, room supervisors should document any of the following occurrences during administration:

• A student engages in any instance of prohibited behavior as outlined above.

• A student becomes ill or leaves the room during testing.

• A student fails to follow instructions (responds to questions randomly, obviously does not read questions prior to responding, or refuses to respond to questions).

• A general disturbance or distraction occurs which could affect one or more students’ results.

• A student questions the accuracy or validity of an item.

• A student has a technical issue that interrupts testing (online testing).

• A student has a defective test booklet or answer document (for paper testing).

For any instances where students can resume testing after illness, a technical issue, or a general disturbance, room supervisors should follow the instructions about how to resume a test session from the “Resume a Test” section of the Avocet website.

For the latest update of irregularity categories and codes used in the ACT Aspire Portal, see the “Irregularities” section of the Avocet website.

The irregularities in the Environment/Materials category cover external factors that may affect student testing: outside noise or uncomfortably hot or cold room temperatures; damaged, missing, or stolen test materials; and occurrences such as power outages, severe weather, or emergency evacuations.

The Examinee category of irregularities includes student behaviors that may affect their performance or the performance of other students. These include the exhibition of prohibited behaviors described previously, student complaints about testing conditions, or challenges of test items.

The Staff category includes actions by testing staff that affect testing. These include failure to follow testing procedures, such as mistiming a test or not reading the verbal instructions from the Room Supervisor Manual, and other inappropriate behavior, such as engaging in personal communication with other staff or by telephone or text during testing.

The Technical category pertains to the performance of online testing and includes system failure, slowness, or freezing; difficulties launching the test or with students using the testing platform; and other system issues like problems with using a keyboard, mouse, monitor, or other related hardware.

Chapter 8

ACT Aspire Reporting

ACT Aspire assessments are vertically scaled and designed to measure student growth in a longitudinal assessment system. ACT Aspire's scale scores are linked to college and career data through scores on the ACT and the ACT National Career Readiness Certificate™ (ACT NCRC®) program.

The objectives of ACT Aspire summative reporting are:

• To clearly communicate what College and Career Readiness means and whether a student is on track to achieve it

• To clearly communicate strengths and opportunity areas early in a child's educational career to enable greater personal, academic, and career success

• To include new measures that provide a better picture of student academic and personal achievement

• To clearly communicate each student’s personal path to improvement and provide actionable next steps

• To connect teaching and learning by providing educators with reporting on student-specific skills

• To provide educators at all levels with the insights needed to improve student performance

• To leverage technology to make the reporting user experience faster and more effective through the inclusion of deeper, richer insights and more valuable data online than in a traditional paper report

• To leverage technology to make reporting results available across more platforms including computers and tablets

The ACT Aspire Portal serves as the delivery engine for test results. Individual student reports, a student performance file, and several aggregate reports are posted in the portal after assessment scoring is completed. Interpretive materials are available to help parents, teachers, and students better understand what is needed for continual academic growth.


Support materials include the following: Interpretive Guide for ACT Aspire Summative Reports; Understanding Your ACT Aspire Summative Results (for students and parents); and Sample Items for Reporting. These and other reporting resources can be found on the ACT Aspire Avocet web page.

Subsequent chapters in this Technical Manual describe in greater detail the scores and metrics included in ACT Aspire reporting.

Chapter 9

ACT Aspire Scores

This chapter introduces the ACT Aspire subject scale scores, the Composite score, the ELA score, and the STEM score. Also described are scores based on subsets of items from each subject test, including the reporting category scores and the Progress with Text Complexity indicator.

9.1 ACT Aspire Subject Scores

ACT Aspire scale scores are reported from grade 3 through EHS in English, mathematics, reading, science, and writing. Scale scores for each subject are provided on a three-digit scale. Scale score ranges for each grade and subject are provided in Table 10.13 (Chapter 10). Summary descriptive statistics for the ACT Aspire subject scale scores from the 2013 scaling study are listed in Tables 10.8–10.12.

9.2 ACT Aspire Composite Score

The ACT Aspire Composite score represents the overall performance on the English, mathematics, reading, and science tests.1 It is calculated as the average of the scale scores on the four subjects rounded to an integer (.5 rounds up). The Composite score is reported only for students taking grade 8 or EHS tests in all four subjects. Table 9.1 lists descriptive statistics for ACT Aspire Composite scores from the 2013 scaling study.

Table 9.1. Composite Score Descriptive Statistics based on the 2013 Scaling Study

Grade N Mean SD Min P10 P25 P50 P75 P90 P95 Max

8 2945 423.02 6.84 406 414 418 423 428 432 435 442

9 1641 423.99 7.93 408 414 418 423 430 435 438 447

10 1649 426.28 8.53 409 415 419 426 433 438 440 447

1 Note that the writing test is not included in the Composite score.
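
To make the rounding rule concrete, the following minimal sketch applies the equal-weight average with ".5 rounds up"; the function name composite_score is hypothetical, not part of any ACT system.

```python
import math

def composite_score(english, mathematics, reading, science):
    """Average of the four subject scale scores, with .5 rounded up.

    Python's built-in round() rounds halves to even, so floor(x + 0.5)
    is used here to reproduce the ".5 rounds up" rule stated above."""
    mean = (english + mathematics + reading + science) / 4
    return math.floor(mean + 0.5)

# Example: (423 + 418 + 425 + 420) / 4 = 421.5, which rounds up to 422.
print(composite_score(423, 418, 425, 420))  # 422
```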


Averaging four ACT Aspire subject test scale scores to obtain a Composite score implies that each test contributes equally to the Composite score. The weights used to calculate the Composite (in this case, .25) are often referred to as nominal weights. Other definitions of the contribution of a test score to a composite score may be more useful. For example, Wang and Stanley (1970) described effective weights as an index of the contribution of a test score to a composite score. Specifically, the contribution of a test score is defined as the sum of the covariances between that test score and all components contributing to the Composite score. Dividing each test's contribution by the sum of the contributions across tests yields proportional effective weights, referred to simply as effective weights here.
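
A minimal sketch of this computation, assuming a matrix of subject scale scores with one column per test and equal nominal weights; the function name effective_weights is illustrative only.

```python
import numpy as np

def effective_weights(scores):
    """Proportional effective weights in the sense of Wang and Stanley (1970).

    scores: array of shape (n_students, n_tests). Each test's contribution
    is the sum of its covariances with all components of the composite;
    dividing by the overall sum yields proportional effective weights."""
    cov = np.cov(scores, rowvar=False)  # test-by-test covariance matrix
    contributions = cov.sum(axis=0)     # summed covariances per test
    return contributions / contributions.sum()
```

For four tests contributing equally, the returned weights would all be near .25, matching the nominal weight interpretation checked in Table 9.2.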

With nominal weights of .25 for each test, the effective weights can be used to verify that the nominal weight interpretation of Composite scores (i.e., composite as an equal weighted combination of contributing scores) is reasonable. Wang and Stanley (1970) state that variables will rarely have equal effective weights, unless explicitly designed to do so. Therefore, the effective weights would need to deviate substantially and consistently from nominal weights to justify applying different weights or a different interpretation of weights.

The effective weights using spring 2013 scale scores are shown in Table 9.2 for the ACT Aspire Composite scores and range from .14 to .32. The mathematics tests at grades 3–6 appeared to have smaller effective weights compared to English, reading, and science due to relatively smaller variances and covariances. However, the equal nominal weights appeared justifiable for ACT Aspire Composite scores.

Table 9.2. Composite Score Effective Weights-Equal Nominal Weights

Grade English Mathematics Reading Science

3 .29 .16 .25 .29

4 .28 .14 .27 .31

5 .29 .15 .26 .29

6 .29 .18 .25 .28

7 .29 .21 .23 .27

8 .29 .23 .23 .26

9 .32 .23 .22 .24

10 .31 .24 .22 .24


9.3 ACT Aspire ELA Score

The ACT Aspire ELA score represents the overall performance on the English, reading, and writing tests. It is calculated as the average of the scale scores on the three subjects rounded to an integer (.5 rounds up). ELA scores are provided to students at all grade levels between 3 and EHS, but only when a student obtains scale scores for all three subject tests at the same grade level. Nominal weights for the ELA score are .33 (equal weights for each of the three contributing subject scores). The effective weights of spring 2013 scale scores are listed in Table 9.3 and ranged from .26 to .43. English contributed more to the effective weights for grades 7–10, with weights greater than .40, compared to writing, where weights dipped below .30. However, the effective weights did not deviate far from the equal nominal weights for ELA scores.

Table 9.3. ELA Score Effective Weights-Equal Nominal Weights

Grade English Reading Writing

3 .35 .30 .35

4 .33 .31 .36

5 .36 .32 .33

6 .36 .30 .33

7 .41 .31 .29

8 .42 .32 .26

9 .43 .30 .27

10 .43 .30 .27

9.4 ACT Aspire STEM Score

The ACT Aspire STEM score represents the overall performance on the mathematics and science tests. It is calculated as the average of the scale scores on the two subjects rounded to an integer (.5 rounds up). The STEM score is provided for students at all grade levels between 3 and EHS, but only when a student obtains scale scores for the mathematics and science tests at the same grade level. Nominal weights for STEM are .5 (equal weights for each of the two contributing subject scores). The effective weights of spring 2013 scale scores are listed in Table 9.4 and ranged from .33 to .67. Similar to the Composite scores, the mathematics scores had smaller variances and covariances than science, particularly for grades 3, 4, and 5, which contributed to smaller effective weights. Despite the observed differences at the lower grades, where science contributed nearly double what mathematics did to the STEM score variance, the effective weights for grades 6–10 were progressively more similar to the nominal weights.


Table 9.4. STEM Score Effective Weights-Equal Nominal Weights

Grade Mathematics Science

3 .36 .64

4 .33 .67

5 .37 .63

6 .42 .58

7 .45 .55

8 .48 .52

9 .50 .50

10 .50 .50

9.5 Reporting Category Scores

Student performance is also described in terms of the ACT Aspire reporting category scores. Score reports describe the percent and number of points students earn out of the total number of points possible in each reporting category. A list of the reporting categories can be found in Table 9.5. Reporting category scores can be used as supplemental information to further identify strengths and weaknesses within a subject.

Table 9.5. Reporting Categories within Each Subject Area

Subject Reporting Category

English Production of Writing

Knowledge of Language

Conventions of Standard English

Mathematics Grade Level Progress

Number and Operations—Fractions

Number and Operations in Base 10

The Number System

Number and Quantity

Operations and Algebraic Thinking

Expressions and Equations

Ratios and Proportional Relationships

Algebra

Functions

Geometry

Measurement and Data

Statistics and Probability


Table 9.5. (continued)

Subject Reporting Category

Mathematics (continued) Foundation

Justification and Explanation

Modeling

Reading Key Ideas and Details

Craft and Structure

Integration of Knowledge and Ideas

Science Interpretation of Data

Scientific Investigation

Evaluation of Models, Inferences, and Experimental Results

Writing Ideas and Analysis

Development and Support

Organization

Language Use and Conventions

9.6 Progress with Text Complexity

The ACT Aspire reading test includes a Progress with Text Complexity indicator ("Yes" or "No") that indicates whether students have made sufficient progress in reading increasingly complex texts (see Chapter 3, Section 3.2.2.3). This indicator is based on a set of items from the ACT Aspire reading test that are judged as indicative of performance when reading and understanding increasingly complex texts. It is calculated by comparing the percent correct on these items to an empirically defined cut score. The cut score was derived by regressing percent correct on this item set on the reading scale score and evaluating the prediction at the reading benchmark (see Chapter 13 for a description of the reading benchmark).
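
The sketch below illustrates this logic under stated assumptions: a simple first-degree regression of percent correct on reading scale score defines the cut at the benchmark, and meeting or exceeding the cut yields "Yes." The function names, the linear fit, and the use of "meets or exceeds" are illustrative assumptions, not ACT's documented procedure.

```python
import numpy as np

def empirical_cut(reading_scale_scores, pct_correct, benchmark):
    # Illustrative: regress percent correct on the designated item set
    # against the reading scale score, then evaluate at the benchmark.
    slope, intercept = np.polyfit(reading_scale_scores, pct_correct, 1)
    return slope * benchmark + intercept

def text_complexity_indicator(pct_correct, cut):
    # Assumes percent correct at or above the cut counts as progress.
    return "Yes" if pct_correct >= cut else "No"
```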

9.7 Progress toward Career Readiness Indicator

Students at grade 8 through EHS with Composite scores (i.e., receiving scores on the English, mathematics, reading, and science tests) receive a Progress toward Career Readiness indicator to help them predict future achievement on the ACT National Career Readiness Certificate (NCRC) toward the end of high school using their ACT Aspire performance. See Chapter 15 for details of how the indicator was developed. The indicator reported to the student is a verbal statement indicating which certificate level (i.e., Bronze, Silver, or Gold) the student is making progress toward. One purpose of the indicator is to encourage students to think about the knowledge and skills future job training will require. Whether or not a student plans to enter college immediately after high school, there is likely a need for some type of postsecondary training.

Chapter 10

ACT Aspire Score Scale

This chapter presents the vertical scaling process for English, mathematics, reading, and science. A scaling test design was employed to establish the vertical scales for these ACT Aspire tests: each student took the grade-appropriate tests and scaling tests in the same subjects. To be operationally feasible, four separate scaling tests per subject were constructed instead of one whole scaling test spanning all grade levels. These four separate scaling tests were then re-connected to recover the whole scaling test for each subject for further analysis. Interim vertical scales with desired psychometric properties were first constructed based on the re-connected whole scaling test. Each on-grade test was then linked to the interim vertical scale for each subject and each grade. To maintain the vertical scale, future forms will be horizontally equated to the base form. Scaling for ACT Aspire writing, presented at the end of the chapter, was rubric-driven, and writing scale scores do not have a vertical scale interpretation like the scale scores for the other subjects.

10.1 ACT Aspire Scaling Philosophy

A conceptual definition of growth is crucial in developing a vertical scale. Kolen and Brennan (2014) provide two general definitions: the domain definition and the grade-to-grade definition. Under the domain definition, growth is defined over the entire range of test content covered by the battery, or the domain of content. Under the grade-to-grade definition, growth is defined over the content on a test level appropriate for typical students at a particular grade. In designing ACT Aspire assessments, ACT did not treat the skills at each grade level as isolated to that grade level. ACT Aspire assessments and items were designed to elicit evidence across the construct and across the learning trajectory. As such, ACT subscribes to a domain definition of growth, where student achievement over the entire range of content is considered. This conceptualization of growth leads naturally to the scaling test data collection design.


The scaling test design involves creation of a single test with items that cover the range of content and difficulty across the domain. This test is then administered to students across all covered grade levels and vertical scaling is used to place performance of students across grade levels onto the same scale. The next section describes in detail the scaling study used to establish the ACT Aspire score scales.

10.2 Scaling Study

10.2.1 Scaling Study Design

The purpose of the scaling study was to establish a vertical scale for the ACT Aspire English, mathematics, reading, and science assessments. The ACT Aspire assessments include seven grade-level tests per subject: grades 3–8 and early high school (EHS), which is given to students in grades 9 and 10. In addition, by including grade 11 in the scaling study, the vertical scale could be extended to include the ACT (see Appendix B), even though the ACT is not included in the ACT Aspire assessments.

The scaling test design was adopted to create the vertical scale for each subject. Under this design, each student completes two tests: an on-grade test appropriate for his/her grade level and a scaling test with items from multiple grades. The vertical scale is defined using the scaling test, and the scores for each on-grade test are linked to the scale through the scaling test.

10.2.2 Construction of Scaling Tests

The design of the scaling tests is a key component in creating a sound vertical scale under the scaling test design. Items on the scaling tests should cover all content that defines the vertical scale and be sensitive enough to measure growth. Since the vertical scale covers grade 3 to grade 11, it would be ideal to construct one scaling test per subject with items from all grade levels and administer this single scaling test to students from grades 3 to 11.

However, to include a sufficient number of items from each content domain, a single test consisting of eight levels of tests (i.e., grades 3–8, 9/10, and 11) would be much too long to administer operationally and would require students in certain grades to complete items significantly above or below their grade level and likely achievement level. Given such a test's length and the prospect of, for example, administering grade 3 items to high school students and high-school-level items to third graders, good measurement would be unlikely. To create scaling tests of reasonable length, the single scaling test covering items from grades 3 through 11 (hereafter referred to as the whole scaling test and abbreviated WST) was broken into four separate tests in this study, each of which includes items from two or three consecutive grades with one overlapping grade (referred to as a bridge grade) between tests. Specifically, scaling test 1 (ST1) includes items from grades 3 to 5, scaling test 2 (ST2) includes items from grades 5 to 7, scaling test 3 (ST3) includes items from grades 7 to EHS, and scaling test 4 (ST4) includes items from EHS and the ACT (see Figure 10.1).


Figure 10.1. Construction of the ACT Aspire scaling tests

The scaling tests were designed to represent a miniature version of each on-grade test. The items chosen for the miniature version of each on-grade test were selected to be representative with respect to both content coverage and item statistics to the extent possible (approximately one-third of the items on an on-grade form for each grade were used in the scaling tests). There were no common items between the scaling test and the on-grade test administered to a student.

Within each scaling test, items were grouped and sorted by grade level so that lower grade items were positioned earlier in the test than upper grade items. Items in the bridge grade 5 were identical in ST1 and ST2. Similarly, items in the bridge grades 7 and EHS were identical in ST2 and ST3, and in ST3 and ST4, respectively. The only difference between these items in different scaling tests was that they appeared at the end of the lower level scaling test but at the beginning of the upper level scaling test. The time limit for each scaling test was 40 minutes.

10.2.3 Data Collection Design

The purpose of the scaling study was to set up a vertical scale spanning grades 3 to 11, and the cross-sectional data collected for the study would also be used for growth analysis. To minimize the impact on estimated growth of potential differences in student characteristics across grade levels, cross-grade samples needed to be similarly representative of student populations. To achieve this goal, invitations were sent out at the district level. Districts that agreed to participate in the study were asked to test students from each grade level. While participating districts were requested to test at least some students from every grade level, not every school in a district needed to participate. ACT encouraged each district to test at least half, but preferably all, of its students.

Each student in grades 3–10 was asked to take five on-grade tests at the appropriate grade level (English, mathematics, reading, science, and writing) and two scaling tests of different subjects, all of which were delivered online. Students were not required to take the tests in one sitting or on the same day; a four-week testing window from April 22 through May 17, 2013, provided flexibility in scheduling and administering the tests. Grade 11 students took the ACT battery tests with multiple-choice items on paper, plus the ACT writing test on paper if the district elected to administer it. All ACT tests were administered under standardized conditions, and students received college-reportable ACT scores. In addition, each grade 11 student took three online-delivered 30-minute constructed-response sections as the augmented piece to the ACT (mathematics, reading, and science), two scaling tests of different subjects, and the ACT writing test (if the district did not elect to administer the paper ACT writing test together with the ACT battery tests). In total, around 37,000 students from 232 schools and 23 districts in 14 states completed at least one test in the study. Tables C1–C5 (Appendix C) present demographic information for students participating in the scaling study.

Once data were collected, analyses to create the scale were conducted separately for each subject. Students from grades 9 and 10 were combined for analysis of the early high school test. Table 10.1 presents the data structure used to analyze each subject. Each row lists the student grade level and the columns list the on-grade or scaling test assigned to students. Bridge grades are listed twice, one row as “A” for those students assigned to a lower level scaling test and one row as “B” for those assigned the upper level scaling test. The second column lists the on-grade test taken by students. The third column lists the scaling test taken by students. For example, grade 7 students (bridge grade) assigned to the “A” group would take the grade 7 on-grade test and the grades 5–7 scaling test (ST2). Because grades 9 and 10 were bridge grades, grade 9 students completed the EHS on-grade test plus either the lower level scaling test (ST3) or the upper level scaling test (ST4). Grade 10 followed a similar pattern: grade 10 students completed the EHS on-grade test plus ST3 or ST4.

In the scaling study, each student in each grade completed one scaling test and one on-grade test for the same subject. Approximately twice as many students were sampled for bridge grades (5, 7, and 9/10) compared to those from non-bridge grades to allow half of the bridge grade students to take the lower level scaling test and half to take the upper level one. All students in the same grade took the same on-grade test. The two groups taking lower/upper level scaling tests in each bridge grade were designed to be randomly equivalent. Once lists of recruited students were available, students at a bridge grade were randomly assigned to take the upper or lower level scaling test by spiraling tests across students. For example, for four students at a bridge grade within the same classroom, if the first student was assigned the lower level scaling test, the second would be assigned the upper, the third the lower, and the fourth the upper level scaling test. This spiraling pattern continued for all students in this classroom and continued into other classrooms and schools at the same grade level.
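
As a toy sketch of the spiraling pattern just described, the assignment simply alternates in roster order; the function name and the "lower"/"upper" labels are hypothetical.

```python
def spiral_scaling_tests(students):
    # Alternate bridge-grade students between the lower and upper level
    # scaling tests, continuing the pattern across classrooms and schools.
    return {student: ("lower" if i % 2 == 0 else "upper")
            for i, student in enumerate(students)}

# Four students in one classroom -> lower, upper, lower, upper.
print(spiral_scaling_tests(["s1", "s2", "s3", "s4"]))
```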


Table 10.1. Data Collection Design of the Scaling Study

Student Grade    On-Grade Test    Scaling Test

3 3 ST1

4 4 ST1

5A 5 ST1

5B 5 ST2

6 6 ST2

7A 7 ST2

7B 7 ST3

8 8 ST3

9A EHS ST3

9B EHS ST4

10A EHS ST3

10B EHS ST4

11 ACT ST4

12 ACT ST4

Note. ST1 includes items from grades 3–5; ST2 includes items from grades 5–7; ST3 includes items from grades 7–EHS; ST4 includes items from EHS and the ACT.

10.2.4 Creation of Vertical Scales

Under the scaling test design, the vertical alignment of student performance across all grades was obtained from the scaling tests. Once the alignment was established, a scale with desirable properties (i.e., target mean, standard deviation, standard error of measurement, number of scale score points, etc.) was created. Then, the base form of each on-grade test was linked to the vertical scale. As new on-grade forms are developed, they are horizontally equated to the base form to maintain the vertical scale (see Chapter 20).

The process of establishing the ACT Aspire vertical scale can be summarized in three steps:

1. Link across the four scaling tests to establish the vertical relationship from grades 3 to 11 in a subject.

2. Create the scale with desired properties based on the linked scaling test.

3. Link each on-grade test to the vertical scale.

Detailed descriptions of these three steps are presented below.


Step 1: Link across the Four Scaling Tests

The goal of this step was to link the four separate scaling tests so that scores of students taking different scaling tests were placed on the same scale. Note that if the whole scaling test had been given to all students, this step would not have been necessary. ST3 (composed of items from grades 7 to EHS) was selected as the base test for the linking because (1) it contained items at the top of the ACT Aspire scale, (2) two of the three other scaling tests (ST2 and ST4) could be linked to it directly, and (3) it is adjacent to ST4, which was used to link the ACT with constructed-response tests (see Appendix B) to the ACT Aspire scale.

Adjacent scaling tests were linked through random equivalent groups taking lower/upper level scaling tests in each bridge grade (e.g., link ST1: grades 3-5 to ST2: grades 5-7 using grade 5 students). This linking design assumes the two groups at the bridge grades, one taking the upper level scaling test (denoted as B) and one taking the lower level scaling test (denoted as A), are randomly equivalent.

At the data collection stage, form spiraling occurred at the student level within each grade in a school. Therefore, the number of students assigned to the lower level scaling test was very close to that of students taking the upper level test in each bridge grade of each school. However, for a variety of reasons, students recruited and assigned to a form may not have actually tested. This could lead to unbalanced numbers of students taking different scaling tests that would affect the defensibility of randomly equivalent groups. A data cleaning rule was applied to attempt to reduce such effects and help ensure group equivalence. Specifically, if all students in the bridge grade in a school only took one level of a scaling test or if the ratio of students taking one level of the scaling test over the other level was more than 2, all students in that grade from that school were removed from the analysis.

To further evaluate whether the two groups were equivalent after data cleaning, on-grade test raw scores were compared between the two groups because they took the same on-grade test form. Table 10.2 presents sample size, mean, and standard deviation of on-grade test scores for the two groups in the bridge grades, as well as p-values of independent-sample t-tests and effect sizes between the two groups. None of the t-tests were statistically significant at the .05 level — an indication that scores for the two groups from each bridge grade did not differ significantly. Magnitudes of effect sizes were also very small (mostly within ±0.05).
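
A sketch of this equivalence check, assuming two arrays of on-grade raw scores; it pairs an independent-samples t-test (here via scipy.stats.ttest_ind) with a standardized mean difference computed against the average of the two group variances. The function name is illustrative.

```python
import numpy as np
from scipy import stats

def equivalence_check(group_a, group_b):
    """t-test p-value and effect size for two bridge-grade groups."""
    _, p_value = stats.ttest_ind(group_a, group_b)
    pooled_sd = np.sqrt((np.var(group_a, ddof=1) + np.var(group_b, ddof=1)) / 2)
    effect_size = (np.mean(group_a) - np.mean(group_b)) / pooled_sd
    return p_value, effect_size
```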

Instead of using randomly equivalent groups to link across scaling tests, an alternative possible design was to link through common items between adjacent scaling tests. However, since common items appeared at the end of the lower level scaling test and at the beginning of the upper level scaling test, context effects were a concern. Items might appear easier when positioned at the beginning of the test than at the end of the test. To investigate whether context effects were present, the total raw score points on the common items were compared between the two groups of students taking lower/upper level scaling tests. Since it was previously shown that the two groups were randomly equivalent, the average raw scores on the common item set across groups should be similar if there were no context effects. On the other hand, if context effects do exist, non-ignorable mean differences on the common items across groups would be expected. For example, it has been shown that groups 5A (grade 5, taking the lower level scaling test) and 5B (grade 5, taking the upper level scaling test) are equivalent, so if the average raw score on grade 5 items in the scaling test for 5A differs from that for 5B, context effects likely exist. Table 10.3 shows descriptive statistics of lower and upper level scaling test raw scores for groups taking common items, as well as p-values from t-tests and effect sizes between scores from the two groups. Group B scored higher on the common items for all subjects and all bridge grades. Items appearing at the beginning of the upper level scaling test appeared to be easier than when the same items appeared at the end of the lower level scaling test. The t-tests were statistically significant (at the 0.05 level) for grades 5 and 7 in all subjects and grade 9/10 in mathematics and science. Context effects appeared to affect performance on the common items, which is a violation of the assumption that items should perform similarly under the common-item design. Therefore, this design was not adopted.

Table 10.2. Raw Scores of On-Grade Tests for Two Groups Taking Lower/Upper Level Scaling Tests

Subject        Grade   Group A: N, Mean, SD      Group B: N, Mean, SD      p-value   Effect Size
English        5       845   14.588   4.345      871   14.701   4.456      0.595     -0.03
               7       889   19.580   5.898      928   19.496   5.820      0.760      0.01
               9/10    667   26.187  10.461      687   25.128  10.431      0.062      0.10
Mathematics    5       868   10.058   3.382      870   10.176   3.459      0.472     -0.03
               7       863   15.651   6.453      882   15.593   6.239      0.849      0.01
               9/10    688   17.247   8.440      706   17.479   8.927      0.618     -0.03
Reading        5       822   15.798   5.492      833   15.739   5.717      0.831      0.01
               7       806   13.166   5.241      800   13.340   5.364      0.511     -0.03
               9/10    707   14.651   7.058      684   14.415   7.088      0.534      0.03
Science        5       727   18.171   7.132      721   18.337   7.104      0.657     -0.02
               7       819   18.812   7.511      834   18.801   7.955      0.977      0.00
               9/10    634   14.218   8.082      648   14.520   7.772      0.495     -0.04


Table 10.3. Raw Scores on Common Items for Two Groups Taking Lower/Upper Level Scaling Tests

Subject        Grade   Group A: N, Mean, SD      Group B: N, Mean, SD      p-value    Effect Size
English        5       845   10.174   3.818      871   11.200   3.292      < 0.0005   -0.29
               7       889    9.529   3.907      928   10.335   3.534      < 0.0005   -0.22
               9/10    667   12.337   5.614      687   12.582   5.280        0.408    -0.04
Mathematics    5       868    4.358   1.930      870    4.670   1.952        0.001    -0.16
               7       863    3.928   2.299      882    4.195   2.296        0.015    -0.12
               9/10    688    3.911   2.954      706    4.249   2.840        0.030    -0.12
Reading        5       822    3.456   1.720      833    3.619   1.635        0.048    -0.10
               7       806    4.227   2.262      800    4.591   2.309        0.001    -0.16
               9/10    707    4.494   3.225      684    4.722   3.182        0.185    -0.07
Science        5       727    3.884   2.129      721    4.277   2.080      < 0.0005   -0.19
               7       819    3.891   1.984      834    4.125   1.910        0.015    -0.12
               9/10    634    6.412   4.302      648    7.110   4.180        0.003    -0.16

Item response theory (IRT) was used as the statistical method to conduct the linking. This method involves putting IRT item parameters from the four scaling tests on the same scale. The two-parameter logistic (2PL) model was used for the dichotomously scored items. The generalized partial credit model (GPCM) was used for polytomously scored (i.e., constructed-response) items. The 2PL model contains two parameters for an item plus a parameter for student proficiency (theta). The GPCM is analogous to the 2PL model but incorporates more than two score categories. All calibrations were conducted using single-group analysis with the PARSCALE software (Muraki & Bock, 2003). Each of the four scaling tests was first independently calibrated. Under the randomly equivalent groups design, the mean and standard deviation (SD) of theta scores between the two groups in each bridge grade were used to compute the scale transformation constants (see Kolen & Brennan, 2014, pp. 180–182). Students with extreme theta scores (less than or equal to −6 or greater than or equal to +6) were excluded from the calculation of the mean and SD, because these theta scores were fixed by arbitrary boundaries set in the scoring software. The obtained scale transformation slope and intercept were then applied to the item parameter estimates from each independent calibration to generate scaled item parameters for all scaling tests. Students' theta scores were then estimated from the scaled item parameters, which put them on the same scale.
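
The following is a minimal sketch of a mean/sigma-style transformation consistent with this description, assuming theta estimates for the two randomly equivalent bridge-grade groups; the exact procedure in Kolen and Brennan (2014, pp. 180–182) has details this simplification omits, and the function names are illustrative.

```python
import numpy as np

def mean_sigma_constants(theta_base, theta_new):
    # Exclude theta estimates fixed at the arbitrary software boundaries,
    # as described above, before computing means and SDs.
    theta_base = theta_base[np.abs(theta_base) < 6]
    theta_new = theta_new[np.abs(theta_new) < 6]
    slope = np.std(theta_base, ddof=1) / np.std(theta_new, ddof=1)
    intercept = np.mean(theta_base) - slope * np.mean(theta_new)
    return slope, intercept

def transform_2pl_items(a, b, slope, intercept):
    # Standard 2PL rescaling: a -> a / slope, b -> slope * b + intercept.
    return np.asarray(a) / slope, slope * np.asarray(b) + intercept
```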

To examine whether the process of linking the four scaling tests distorted the grade-to-grade growth relationships in the scaling tests, effect sizes were computed based on the predicted whole scaling test scores and compared against the effect sizes computed from individual scaling test raw scores. The predicted whole scaling test scores were true score estimates obtained by applying the scaled item parameter estimates of all items on the whole scaling test to each student's scaled theta estimate, which was itself obtained from the part of the whole scaling test the student was actually administered.

Effect size was computed as the difference between the mean scores of adjacent grades divided by the square root of the average of the variances for the two groups (Kolen & Brennan, 2014, p. 461). Using the reading test as an example, Figure 10.2 plots the grade-to-grade growth in effect size derived from the whole scaling test scores using the 2PL model alongside the effect size derived from the individual scaling test raw scores. For example, the grade 3 to grade 4 effect size using individual scaling test raw scores was computed using ST1 raw scores of grade 3 and 4 students. For the bridge grades, the effect size was computed between one of the random groups and its adjacent grade taking the same scaling test. For example, the effect size between grades 4 and 5 was computed on ST1 raw scores between grade 4 and 5A students; the effect size between grades 5 and 6 was computed on ST2 raw scores between grade 5B and 6 students. Since the 5A and 5B groups were shown to be equivalent, either group could be treated as representative of the entire grade. The effect size between grades 9 and 10 was the average of the effect sizes for 9A versus 10A and 9B versus 10B.
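
In symbols, the effect size between adjacent grades g and g + 1 is

$$\mathrm{ES}_{g,\,g+1} = \frac{\bar{X}_{g+1} - \bar{X}_{g}}{\sqrt{\left(s_{g}^{2} + s_{g+1}^{2}\right)/2}}$$

where the means and variances are computed from the relevant score for each grade group.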

The overall growth pattern is very similar between the two sets of effect sizes, which indicates that the linking process did not distort the original grade-to-grade relationships. The magnitude of growth tends to decrease as grade level increases, with growth from grades 3 to 5 among the largest. Zero to slightly negative growth is observed between grades 5 and 6.

Figure 10.2. Effect sizes on the individual scaling test raw score scale and the whole scaling test raw score scale derived from the 2PL model (reading; effect size plotted by grade span, 3-4 through 10-11)


Step 2: Create the Vertical Scale Based on the Linked Scaling Tests

Generate the Projected Whole Scaling Test Raw Scores

After step 1 was completed, IRT item parameters for all four scaling tests were placed on the same scale. Students’ theta scores were also on the same scale when estimated from the scaled item parameters. Based on the scaled item parameter estimates and theta estimates, students’ true scores on the whole scaling test, which were on the raw score metric, were estimated from the IRT test characteristic curve (TCC) of the whole scaling test.
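
In the usual notation, with dichotomous items following the 2PL model (polytomous items contribute their expected item scores under the GPCM), the true score estimate for a student with theta estimate theta-hat is the TCC evaluated at that value:

$$\hat{\tau} = \sum_{j} E\!\left(X_j \mid \hat{\theta}\right), \qquad E\!\left(X_j \mid \theta\right) = P_j(\theta) = \frac{\exp\left[a_j(\theta - b_j)\right]}{1 + \exp\left[a_j(\theta - b_j)\right]} \text{ for 2PL items.}$$

Whether a scaling constant D appears in the logit is a software convention the manual does not specify; the form above omits it.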

Create an Interim Scale with Constant Conditional Standard Error of Measurement (CSEM)

The conditional standard error of measurement (CSEM) of the raw scores on a test typically follows an inverted-U shape, with CSEM values much smaller at the two ends of the score range. To stabilize the CSEM along the score scale, raw scores on the whole scaling test were nonlinearly transformed to an interim score scale using the arcsine transformation (e.g., see Kolen & Brennan, 2014, p. 405). Although this stabilization was applied to the scaling test to obtain a constant CSEM property for the ACT Aspire scale, once the on-grade tests were linked to the whole scaling test there was no guarantee that the constant CSEM property would be maintained on the on-grade tests. Empirical results indicate that applying the arcsine transformation did help to stabilize the scale score CSEM of each on-grade test after it was linked to the whole scaling test.
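
One common variant of the arcsine transformation (see Kolen & Brennan, 2014) can be sketched as follows; the exact form ACT applied may differ, and the function name is illustrative.

```python
import numpy as np

def arcsine_transform(raw_score, max_raw):
    # Variance-stabilizing transformation of a raw score on a test with
    # maximum raw score max_raw; averaging two arcsine terms smooths the
    # mapping between adjacent raw score points.
    u = np.arcsin(np.sqrt(raw_score / (max_raw + 1)))
    v = np.arcsin(np.sqrt((raw_score + 1) / (max_raw + 1)))
    return 0.5 * (u + v)
```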

To compute the CSEM, an extended Lord-Wingersky recursive algorithm (Hanson, 1994; Wang, Kolen, & Harris, 2000, p. 219; Kolen & Brennan, 2014, p. 199) was adopted to obtain the expected distribution on the interim scale score for each student based on his or her theta score. The standard deviation of this expected score distribution is the CSEM of the interim score scale for that student. Weights were used to equalize the contribution of students from different grade levels to the CSEM calculation because sample sizes varied across grades, particularly for the bridge grades, where the data collection design resulted in larger samples of students compared to other grades (e.g., see the sample sizes listed in Tables 10.4–10.7).
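
For dichotomous items, the core Lord-Wingersky recursion and the CSEM it yields can be sketched as below; the extended algorithm cited above generalizes the recursion to polytomous (GPCM) items, which this simplification omits. Function names are illustrative.

```python
import numpy as np

def lord_wingersky(p_correct):
    """Distribution of number-correct scores for one examinee, given the
    item-correct probabilities p_correct (e.g., 2PL evaluated at theta-hat)."""
    f = np.array([1.0])
    for p in p_correct:
        # Either the new item is missed (score unchanged) or answered
        # correctly (score shifts up by one point).
        f = np.append(f, 0.0) * (1 - p) + np.append(0.0, f) * p
    return f  # f[x] = P(number-correct = x | theta-hat)

def csem_interim(p_correct, raw_to_interim):
    """CSEM on the interim scale: SD of the examinee's expected distribution
    over interim scale values (one value per raw score point)."""
    f = lord_wingersky(p_correct)
    s = np.asarray(raw_to_interim, dtype=float)
    mean = np.sum(f * s)
    return np.sqrt(np.sum(f * (s - mean) ** 2))
```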

Linear Transformation of the Interim Scale to the Final Scale

The interim scale scores, with the property of constant CSEM, were then linearly transformed to the ACT Aspire scale. Since linear transformation does not change the relative magnitude of the CSEMs, the constant CSEM property was carried over to the final scale. The linear transformation was selected so that the standard error of measurement on the whole scaling test was about 1.8 scale score points and the mean was set to an arbitrary value to anchor the scale.


Step 3: Link the On-Grade Test to the Vertical Scale

The last step in the vertical linking process was to link each of the on-grade English, mathematics, reading, and science tests to the vertical scale established using the whole scaling test in each subject. A single group linking design was adopted, since the same group of students took both the on-grade test and the scaling test in the same subject.

The IRT observed score linking method (Kolen & Brennan, 2014) was used to create the link. First, each on-grade test form was calibrated independently. Second, item parameter estimates from the on-grade test were transformed to the whole scaling test scale by matching the mean and standard deviation of theta scores estimated from each on-grade test to theta scores estimated from the scaled item parameters of the scaling tests. Third, the estimated distribution of number-correct scores (using scaled item parameters) on the on-grade form was linked to that of the whole scaling test. The IRT observed score linking yields the raw-to-scale conversion for each on-grade test. The scale scores were rounded, truncated, and shifted to desired score ranges to provide the final reportable scale scores.

10.3 Evaluation of the Vertical Scales

Once the scale was created and students' scale scores were generated, the following analyses were conducted to evaluate the scales:

1. Check whether the scale maintains the on-the-same-scale property. When multiple grades of students took the same scaling test, their scores on that scaling test were automatically on the same scale. The entire scaling process for ACT Aspire involved multiple steps, including connecting the individual scaling tests and linking the on-grade tests to the whole scaling test. Therefore, it was important to verify that ACT Aspire scale scores obtained from on-grade tests maintain the vertical relationships obtained from the scaling tests. This property is referred to as the on-the-same-scale property. To evaluate this property, raw scores on each of the four scaling tests were used as a reference to define the observed relationships across grade levels. If the on-the-same-scale property was maintained, students who had the same raw score on a scaling test should have similar scale scores derived from the on-grade tests, even though their scale scores come from on-grade tests at different grade levels.

2. Evaluate the constant CSEM property. As mentioned above, the CSEM stabilization process was applied to the scale defined by the scaling tests. To evaluate whether this property was maintained for the on-grade tests, CSEM curves for each on-grade test were plotted and examined.

3. Evaluate the growth pattern. The growth pattern derived from ACT Aspire scale scores was examined and compared against the growth pattern derived from the raw scaling test scores.


10.4 Results

10.4.1 Scaling Test Raw Scores

Tables 10.4–10.7 present the means and standard deviations of raw scores on the individual scaling tests and the means and standard deviations of raw scores on the portion of grade-specific items in each test. For example, grade 3 students' average raw score is 22.525 on the English scaling test (ST1); when items are grouped by grade level, the average score is 7.313 on all grade 3 items, 7.481 on all grade 4 items, and 7.731 on all grade 5 items. The average scores on items from different grade levels (i.e., across the columns) are not directly comparable, since the number of items differs across grade levels. Similarly, average scaling test raw scores are only comparable among groups who took the same scaling test, since the numbers of score points and most of the items differ across the four scaling tests.

Within each scaling test, summarizing the average total raw score by grade level provides diagnostic information about where growth occurs. For example, slightly negative growth occurs between grade 5 students (5B) and grade 6 students on ST2 in reading. Reviewing the means on items grouped by grade level reveals that the reverse growth occurred on the grade 6 items for reading. Another example of negative growth is on ST4 in reading between groups 9B and 10B; group 10B performed worse on both the EHS and the ACT items.

Context effects are also observable in Tables 10.4–10.7 between the bridge-grade groups of students (A and B groups), because the common item mean scores are always higher in the B group (where common items are given at the beginning of the test) than in the A group (where common items are given at the end of the test). This is true for all subjects and all bridge grades. As explained earlier, this was one reason why the common-item linking design was not used to develop the vertical scale.

The standard deviations of raw scores tended to increase as grade level increased within the same scaling test. In other words, group variability increased as students advanced in grade level.

10.4.2 Evaluating the On-the-Same-Scale Property

Figure 10.3 presents plots of the average scale scores per grade against the scaling test raw score, by scaling test. As can be seen, students with the same scaling test raw score have similar scale scores, on average, regardless of which on-grade tests they have taken. These results are consistent with an interpretation of scores from the on-grade tests as being on the same vertical scale.


Table 10.4. Mean and Standard Deviation of Raw Scores on the English Scaling Test (ST) by Grade-Specific Items

ST1 (0–50)   Grade   N      ST Mean (SD)       G3 (0-16)        G4 (0-17)        G5 (0-17)
             3       1810   22.525 (7.975)     7.313 (2.542)    7.481 (3.308)     7.731 (3.499)
             4       1617   26.078 (8.320)     8.208 (2.585)    8.730 (3.347)     9.140 (3.689)
             5A       845   29.060 (8.631)     9.058 (2.713)    9.828 (3.380)    10.174 (3.818)

ST2 (0–50)   Grade   N      ST Mean (SD)       G5 (0-17)        G6 (0-16)        G7 (0-17)
             5B       871   28.021 (8.704)    11.200 (3.292)    8.503 (2.998)     8.318 (3.582)
             6       1824   28.548 (9.528)    11.386 (3.604)    8.490 (3.189)     8.672 (3.834)
             7A       889   30.670 (9.678)    12.048 (3.496)    9.093 (3.280)     9.529 (3.907)

ST3 (0–58)   Grade   N      ST Mean (SD)       G7 (0-17)        G8 (0-16)        G_EHS (0-25)
             7B       928   29.763 (10.335)   10.335 (3.534)    8.672 (3.524)    10.755 (4.514)
             8       1436   31.769 (11.059)   10.818 (3.683)    9.171 (3.677)    11.781 (4.807)
             9A       343   32.402 (12.237)   11.117 (3.975)    9.172 (4.042)    12.114 (5.228)
             10A      324   32.710 (13.787)   10.843 (4.267)    9.293 (4.410)    12.574 (5.995)

ST4 (0–70)   Grade   N      ST Mean (SD)       G_EHS (0-25)     G_ACT (0-45)
             9B       368   29.481 (12.552)   12.332 (5.122)   17.149 (8.281)
             10B      319   31.367 (14.036)   12.871 (5.45)    18.495 (9.288)
             11       760   32.925 (13.797)   13.943 (5.206)   18.982 (9.356)


Table 10.5. Mean and Standard Deviation of Raw Scores on the Mathematics Scaling Test (ST) by Grade-Specific Items

ST1 (0–37)   Grade   N      ST Mean (SD)      G3 (0-12)        G4 (0-12)        G5 (0-13)
             3       1793    9.090 (3.573)    3.732 (1.944)    2.697 (1.472)    2.661 (1.503)
             4       1624   11.580 (4.360)    4.573 (2.029)    3.345 (1.841)    3.662 (1.743)
             5A       868   13.778 (5.169)    5.315 (2.133)    4.105 (2.212)    4.358 (1.930)

ST2 (0–37)   Grade   N      ST Mean (SD)      G5 (0-13)        G6 (0-11)        G7 (0-13)
             5B       870    9.924 (3.434)    4.670 (1.952)    2.600 (1.237)    2.654 (1.501)
             6       1751   10.817 (4.216)    4.812 (2.047)    2.939 (1.428)    3.065 (1.752)
             7A       863   12.818 (5.17)     5.395 (2.129)    3.495 (1.693)    3.928 (2.299)

ST3 (0–37)   Grade   N      ST Mean (SD)      G7 (0-13)        G8 (0-7)         G_EHS (0-17)
             7B       882    9.347 (4.689)    4.195 (2.296)    2.407 (1.508)    2.745 (1.942)
             8       1451   10.580 (5.289)    4.504 (2.369)    2.682 (1.505)    3.394 (2.375)
             9A       381   11.213 (6.218)    4.559 (2.704)    2.877 (1.679)    3.777 (2.719)
             10A      307   12.160 (7.023)    5.156 (2.836)    2.925 (1.745)    4.078 (3.219)

ST4 (0–38)   Grade   N      ST Mean (SD)      G_EHS (0-17)     G_ACT (0-21)
             9B       376   10.580 (5.691)    4.133 (2.632)    6.447 (3.427)
             10B      330   11.361 (6.926)    4.382 (3.060)    6.979 (4.269)
             11       669   12.765 (6.632)    5.139 (3.041)    7.626 (4.087)


Table 10.6. Mean and Standard Deviation of Raw Scores on the Reading Scaling Test (ST) by Grade-Specific Items

ST1 (0–23)   Grade   N      ST Mean (SD)      G3 (0-8)         G4 (0-8)         G5 (0-7)
             3       1769    8.077 (4.054)    3.003 (1.686)    2.757 (1.879)    2.317 (1.451)
             4       1569    9.811 (4.467)    3.516 (1.755)    3.587 (2.04)     2.709 (1.573)
             5A       822   12.102 (4.67)     4.153 (1.71)     4.493 (2.154)    3.456 (1.72)

ST2 (0–26)   Grade   N      ST Mean (SD)      G5 (0-7)         G6 (0-9)         G7 (0-10)
             5B       833   12.552 (4.903)    3.619 (1.635)    5.261 (2.322)    3.672 (2.036)
             6       1759   12.495 (5.357)    3.624 (1.695)    5.077 (2.422)    3.794 (2.179)
             7A       806   13.878 (5.384)    3.97 (1.685)     5.681 (2.388)    4.227 (2.262)

ST3 (0–28)   Grade   N      ST Mean (SD)      G7 (0-10)        G8 (0-5)         G_EHS (0-13)
             7B       800   10.824 (5.344)    4.591 (2.309)    2.644 (1.502)    3.589 (2.491)
             8       1321   11.775 (5.687)    4.861 (2.359)    2.864 (1.597)    4.05 (2.697)
             9A       386   12.212 (6.105)    4.982 (2.375)    2.876 (1.621)    4.355 (2.988)
             10A      321   12.879 (7.243)    5.202 (2.807)    3.016 (1.769)    4.66 (3.487)

ST4 (0–31)   Grade   N      ST Mean (SD)      G_EHS (0-13)     G_ACT (0-18)
             9B       369   11.756 (6.02)     4.772 (3.094)    6.984 (3.529)
             10B      315   11.514 (6.701)    4.663 (3.287)    6.851 (3.932)
             11       707   12.795 (6.181)    5.413 (3.059)    7.382 (3.775)


Table 10.7. Mean and Standard Deviation of Raw Scores on the Science Scaling Test (ST) by Grade-Specific Items

ST1 (0–31)   Grade   N      ST Mean (SD)      G3 (0-6)         G4 (0-16)        G5 (0-9)
             3       1806   12.108 (5.049)    3.538 (1.515)    6.095 (2.909)    2.475 (1.699)
             4       1551   14.453 (5.466)    3.983 (1.434)    7.362 (3.168)    3.108 (1.873)
             5A       727   16.993 (5.945)    4.415 (1.286)    8.693 (3.484)    3.884 (2.129)

ST2 (0–31)   Grade   N      ST Mean (SD)      G5 (0-9)         G6 (0-14)        G7 (0-8)
             5B       721   15.434 (6.267)    4.277 (2.08)     7.712 (3.247)    3.445 (1.852)
             6       1514   15.687 (6.789)    4.327 (2.236)    7.814 (3.448)    3.546 (1.984)
             7A       819   17.098 (6.836)    4.75 (2.245)     8.457 (3.44)     3.891 (1.984)

ST3 (0–36)   Grade   N      ST Mean (SD)      G7 (0-8)         G8 (0-11)        G_EHS (0-17)
             7B       834   14.138 (6.306)    4.125 (1.91)     4.788 (2.266)    5.225 (3.302)
             8       1426   15.243 (6.993)    4.391 (1.946)    5.1 (2.4)        5.752 (3.749)
             9A       336   16.202 (8.015)    4.464 (2.141)    5.39 (2.728)     6.348 (4.195)
             10A      298   16.319 (8.281)    4.47 (2.181)     5.366 (2.707)    6.483 (4.425)

ST4 (0–39)   Grade   N      ST Mean (SD)      G_EHS (0-17)     G_ACT (0-22)
             9B       328   14.137 (6.547)    6.747 (3.832)    7.39 (3.517)
             10B      320   15.272 (7.795)    7.481 (4.484)    7.791 (4.066)
             11       685   15.54 (7.576)     7.672 (4.232)    7.869 (4.138)


Figure 10.3. Scatter plots of average scale scores against scaling test raw scores by grade

A. English

Page 138: ACT Aspire® Technical Manual - archchicago.orgocs.archchicago.org/Portals/23/ACT Aspire Technical Manual 2016.pdf · The ACT Aspire Technical Manual documents the collection of validity

aCt aSPIRE SCORE SCalE

10.18

Figure 10.3. (continued)

B. Mathematics

Page 139: ACT Aspire® Technical Manual - archchicago.orgocs.archchicago.org/Portals/23/ACT Aspire Technical Manual 2016.pdf · The ACT Aspire Technical Manual documents the collection of validity

aCt aSPIRE SCORE SCalE

10.19

Figure 10.3. (continued)

C. Reading

Page 140: ACT Aspire® Technical Manual - archchicago.orgocs.archchicago.org/Portals/23/ACT Aspire Technical Manual 2016.pdf · The ACT Aspire Technical Manual documents the collection of validity

aCt aSPIRE SCORE SCalE

10.20

Figure 10.3. (continued)

D. Science

Page 141: ACT Aspire® Technical Manual - archchicago.orgocs.archchicago.org/Portals/23/ACT Aspire Technical Manual 2016.pdf · The ACT Aspire Technical Manual documents the collection of validity

aCt aSPIRE SCORE SCalE

10.21

10.4.3 Evaluating the Constant Conditional Standard Error of Measurement Property

The CSEMs of raw scores and scale scores for English, mathematics, reading, and science are plotted in Figure 10.4. Each curve represents a grade level, and each panel presents CSEMs for raw or scale scores on one ACT Aspire subject.

The ACT Aspire score scales begin at 400 and have different maximum values (up to 460) depending on the grade and subject area. For most grades and subjects, the scale score CSEM curves are relatively flat across the scale range, especially when compared with the raw score CSEM curves, which show the typical inverted-U shape.

In Figure 10.4, the CSEMs for most ACT Aspire scale score points fall within a range of two scale score points. The CSEMs drop sharply only at the lowest scale scores and never exceed roughly 4 scale score points. While these CSEM curves are not perfectly flat, which would imply a perfectly constant CSEM across the score scale, they are reasonably constant from a practical perspective. For example, the English panel of Figure 10.4 shows CSEMs between 2 and 4 scale score points for most scale score points across all grades. Within each grade for all subjects, scale score CSEMs generally vary by no more than one scale score point over most of the score scale; the grade 4 English scale score CSEM, for instance, fluctuates between 2 and 3 across most of the score scale, excluding the bottom few scale score points.
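As an illustration of how this near-constancy might be quantified, the short sketch below computes the spread of CSEM values across score points. This is a minimal sketch, not ACT's procedure, and the example values are hypothetical stand-ins for the data behind Figure 10.4.

import numpy as np

def csem_spread(csems):
    """Return max minus min CSEM; a flat (constant) CSEM curve gives a spread near 0."""
    csems = np.asarray(csems, dtype=float)
    return csems.max() - csems.min()

# Hypothetical grade 4 English scale score CSEMs, illustration only:
print(csem_spread([2.3, 2.5, 2.6, 2.8, 2.9]))  # 0.6 -> within one scale score point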

Figure 10.4. Conditional standard error of measurement on raw and scale score scales. Panels: A. English; B. Mathematics; C. Reading; D. Science. [Plots not reproduced.]

10.4.4 Growth Patterns

Effect sizes derived from comparing different grade levels across three types of scores were computed and compared. Figure 10.5 displays three sets of effect sizes: one derived from the final ACT Aspire scale scores on the on-grade test, one from the projected whole scaling test scores, and one from individual scaling test raw scores. Effect sizes derived from individual scaling test raw scores were computed between two groups taking the same scaling test, as described in the section "Creating the Vertical Scale – Step 1: Link across the Four Scaling Tests."

In general, the three sets of effect sizes for the lower grades (3–8) are very similar. The patterns diverge in some cases for the upper grades. One reason might be that grade 9 and grade 10 students were combined during the scaling analysis. Also, there were two sources of growth from grade 9 to grade 10: between the 9A and 10A students who took ST3 and between the 9B and 10B students who took ST4. The effect size computed from the scaling test raw scores is the average of both sources, so if growth from 9A to 10A differed from growth from 9B to 10B, combining the A and B groups could produce a different effect size between grades 9 and 10.

Figure 10.5. Effect sizes on individual scaling test raw score scale, whole scaling test raw score scale, and ACT Aspire score scale. Panels: English, Mathematics, Reading, Science. Each panel plots effect size (y-axis, −0.10 to 0.90) against grade span (x-axis: 3-4, 4-5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11) for three series: Individual Scaling Test Raw Score, Whole Scaling Test Score, and ACT Aspire Scale Score. [Plots not reproduced.]

Basic descriptive statistics of the final scale scores for all students in the scaling study are presented in Tables 10.8–10.12. Note that the n-counts in these tables are larger than those of the groups used for the scaling analysis, which included only students who took both scaling tests and on-grade tests. Because these are not longitudinal data and the samples may not be representative at each grade, the decreasing means observed for certain subjects as grade increases may reflect a characteristic of the sample rather than a characteristic of the scale.1 Future results from operational administrations, with larger samples of motivated students taking ACT Aspire over multiple years, will provide more robust estimates of longitudinal growth on the ACT Aspire scale.

Table 10.8. English Scale Score Descriptive Statistics Based on Scaling Study by Grade

Grade N Mean SD Min P10 P25 P50 P75 P90 P95 Max

3 4,277 416.50 6.14 403 409 412 416 421 424 428 435

4 3,825 419.65 6.23 402 411 415 420 423 428 430 438

5 4,301 421.98 7.04 403 413 417 421 427 431 435 442

6 4,683 422.56 8.09 400 412 416 423 428 433 437 448

7 4,762 424.84 8.56 400 413 419 425 430 436 439 450

8 3,609 426.38 8.94 401 415 421 426 433 438 442 452

9 2,415 425.48 10.76 400 413 417 424 433 440 445 456

10 2,258 429.07 11.59 400 414 419 429 439 445 447 456

11 2,240 429.15 11.64 400 414 420 429 438 445 448 460

Table 10.9. Mathematics Scale Score Descriptive Statistics Based on Scaling Study by Grade

Grade N Mean SD Min P10 P25 P50 P75 P90 P95 Max

3 4,300 411.75 3.69 400 407 409 412 414 417 418 426

4 3,830 414.75 3.81 403 410 413 415 417 420 421 429

5 4,260 416.78 4.43 404 412 414 417 419 422 424 435

6 4,577 418.15 5.72 402 412 414 418 421 426 429 441

7 4,497 419.83 6.42 402 413 415 419 424 429 431 442

8 3,746 422.48 7.30 403 413 417 422 427 432 436 448

9 2,479 422.82 8.05 406 414 417 422 428 434 438 449

10 2,528 425.15 9.10 406 414 418 424 432 438 441 450

11 1,796 427.34 9.38 407 417 420 426 433 441 445 457

1 Writing scale scores would not be expected to increase across grades because of how the writing test was scaled. See section 10.5, "Scaling ACT Aspire Writing."


Table 10.10. Reading Scale Score Descriptive Statistics Based on Scaling Study by Grade

Grade N Mean SD Min P10 P25 P50 P75 P90 P95 Max

3 4,307 411.56 5.27 401 406 407 411 415 419 422 429

4 3,661 414.02 5.66 401 407 410 414 418 423 424 431

5 4,129 416.20 6.23 401 408 411 416 420 425 427 434

6 4,520 416.90 6.84 402 409 412 416 422 427 428 436

7 4,475 418.75 6.73 402 410 414 419 424 427 429 438

8 3,585 419.76 7.14 401 410 414 420 425 429 431 440

9 2,257 419.71 7.86 403 410 414 419 426 430 433 442

10 2,260 421.30 8.22 403 411 415 421 428 433 434 442

11 1,789 422.43 7.37 402 413 417 422 428 433 435 442

Table 10.11. Science Scale Score Descriptive Statistics Based on Scaling Study by Grade

Grade N Mean SD Min P10 P25 P50 P75 P90 P95 Max

3 4,214 413.89 5.99 401 407 409 414 418 422 424 433

4 3,571 416.29 6.53 400 407 412 416 421 425 427 435

5 3,903 419.05 6.78 401 410 414 420 424 427 430 438

6 4,642 418.76 7.54 400 409 412 419 424 429 431 440

7 4,756 420.49 7.65 401 410 414 421 426 431 432 441

8 3,544 422.14 7.91 401 412 416 422 427 433 435 446

9 2,167 422.65 8.22 402 412 417 421 429 434 437 447

10 2,314 424.43 8.89 402 414 417 424 431 436 439 449

11 1,717 424.59 9.04 400 412 417 425 431 437 439 449

Table 10.12. Writing Scale Score Descriptive Statistics Based on Scaling Study by Grade

Grade N Mean SD Min P10 P25 P50 P75 P90 P95 Max

3 4,109 422.57 6.49 408 414 418 422 428 430 432 440

4 3,439 423.16 7.06 408 416 418 424 428 432 434 440

5 3,935 422.21 7.00 408 412 418 424 426 432 432 440

6 4,512 425.80 8.06 408 416 420 426 432 436 440 448

7 4,666 425.47 6.87 408 416 420 426 430 434 438 448

8 3,522 422.66 6.40 408 416 418 424 426 432 432 448

9 2,117 422.22 7.66 408 410 418 424 428 432 434 448

10 2,258 424.40 8.28 408 412 418 424 430 434 440 448


The Lowest Obtainable Scale Scores (LOSS) and Highest Obtainable Scale Scores (HOSS) of the ACT Aspire scales for all subjects and grades are presented in Table 10.13. The English, mathematics, reading, and science tests all share a minimum scale score of 400, while the maximum scale scores differ across the four subjects and increase with grade level within each subject. The writing tests across all grades share a minimum scale score of 408, with maximum scale scores of either 440 or 448, depending on the grade level. Scaling of the ACT Aspire writing test was conducted differently from the other subjects and is described in the next section.

Table 10.13. Lowest Obtainable Scale Scores (LOSS) and Highest Obtainable Scale Scores (HOSS)

Subject (LOSS)       HOSS for grades 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10
English (400)        435 / 438 / 442 / 448 / 450 / 452 / 456 / 456
Mathematics (400)    434 / 440 / 446 / 451 / 453 / 456 / 460 / 460
Reading (400)        429 / 431 / 434 / 436 / 438 / 440 / 442 / 442
Science (400)        433 / 436 / 438 / 440 / 443 / 446 / 449 / 449
Writing (408)        440 / 440 / 440 / 448 / 448 / 448 / 448 / 448


10.5 Scaling ACT Aspire Writing

The ACT Aspire writing scale scores are rubric driven and are based on four domains (Ideas and Analysis, Development and Support, Organization, and Language Use and Conventions), with each domain scored on a 1 to 5 scale in grades 3–5 and a 1 to 6 scale in grades 6–EHS. The scoring rubrics become more demanding as grade increases, and a domain score of 4 indicates expected performance at each grade in each rubric domain. The total writing raw score is the sum of the four domain raw scores and ranges from 4 to 20 for grades 3–5 and from 4 to 24 for grades 6–EHS. A common linear function shared by all grades was used to convert the total raw scores to the three-digit ACT Aspire scale scores for the writing base forms. The lowest obtainable scale score for writing is 408 for all grades, and the highest obtainable scale score is 440 for grades 3–5 and 448 for grades 6–EHS. Table 10.13 lists the lowest and highest obtainable scale scores for writing by grade level.
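The coefficients of that common linear function are not printed in this section, but the stated raw-score ranges and endpoints are consistent with (an inference from the values above, not a quoted formula):

\[ \text{scale score} = 400 + 2 \times \text{raw score}, \]

which maps a raw score of 4 to 408 at every grade, a raw score of 20 to 440 (grades 3–5), and a raw score of 24 to 448 (grades 6–EHS).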

The scoring process for writing implies that writing scale scores do not have a vertical scale interpretation like the scale scores in the other subjects. Specifically, writing scale scores at every grade level reflect performance relative to rubric-driven, grade-level expectations; the underlying performance needed to receive a given score automatically rises with grade-level expectations. The same score at two grade levels therefore indicates the same performance relative to grade-level expectations. For example, a student who receives a score of 432 on the writing test in grade 3 and a score of 432 in grade 8 has maintained the same standing relative to grade-level expectations; in other words, the student has grown enough to keep meeting grade-level expectations. Table 10.12 lists the descriptive statistics for ACT Aspire writing scale scores from the sample of students included in the scaling study.

CHAPTER 11

ACT Aspire Norms

This chapter describes the percentages of students meeting ACT Readiness Benchmarks, sample demographic information, and the norm tables, based on the norm sample used for spring 2014 through spring 2015 ACT Aspire reporting.

ACT Aspire norms are defined as the cumulative percentage of students scoring at or below a given score in the norm sample. The norm sample is a reference sample of students taking the ACT Aspire tests. Norms tables list the cumulative percentage of students who are at or below each scale score point in the norm sample. For example, a cumulative percent of 50 for a scale score of 420 on the mathematics test means that 50% of students in the norm group achieved a scale score of 420 or below on that test.
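Computationally, each norms table entry is a cumulative relative frequency. The following is a minimal sketch (not ACT's production code) of that computation; the example scores are hypothetical.

import numpy as np

def cumulative_percent_at_or_below(scores, scale_points):
    """For each scale score point, percentage of norm-group students at or below it."""
    scores = np.asarray(scores)
    return {point: 100.0 * np.mean(scores <= point) for point in scale_points}

# Hypothetical mathematics scores: half the group is at or below 420,
# so the norms table entry for 420 is 50.
norms = cumulative_percent_at_or_below([410, 420, 425, 430], range(400, 461))
print(norms[420])  # 50.0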

In 2014, normative information was provided based on student samples from 2013 special study data and 2014 operational data. Starting in fall 2015, ACT Aspire began reporting three-year rolling norms, similar to the ACT. Rolling norms include student samples from the three most recent years of ACT Aspire test administrations.

Prior to fall 2015, ACT Aspire norms were national, with broad representation across the country, but they were not nationally representative norms, since they had not been statistically weighted to more closely mirror the national distribution of demographics. Norm information from fall 2015 onward is described in a later addendum section. ACT Aspire includes national norms for grades 3 through EHS in English, mathematics, reading, science, and writing. Table 11.1 shows the percentage of students in the 2014 norm sample who met the benchmark in each of the five subjects; these are also the percentages of students who attained the "Ready" or "Exceeding" performance levels (see Chapter 13). Norm group demographic characteristics, including gender, state, and race/ethnicity, for each grade and subject area are included in Tables 11.2–11.6. The 2014 ACT Aspire norms are listed in Tables 11.7–11.11.


Table 11.1. Percentage of Students Meeting Subject Benchmark in the 2014 Norm Sample

Grade English Mathematics Reading Science Writing

3 71 50 34 29 16

4 69 45 37 35 19

5 68 40 33 37 26

6 68 43 41 38 42

7 71 34 35 33 26

8 73 31 45 34 26

9 60 35 38 30 35

10 63 32 34 31 45

Table 11.2. 2014 ACT Aspire English Norm Group Demographics

Grade (%)

3 (n=36,139)

4 (n=36,353)

5 (n=36,890)

6 (n=35,258)

7 (n=35,618)

8 (n=37,796)

9 (n=23,096)

10 (n=19,417)

Gender

F 45.24 45.29 46.12 44.99 44.49 45.41 44.34 44.26

M 47.21 47.22 46.89 46.11 45.61 45.72 43.31 43.59

No response 7.56 7.48 6.99 8.90 9.90 8.87 12.36 12.15

State

AL 66.38 67.74 65.16 61.33 56.29 51.44 7.13 6.44

AR 0.05 0.05 0.04 0.07 0.04 0.06 0.07 0.09

AZ — — — — — 0.15 0.31 0.26

CA 0.28 0.24 2.94 2.99 0.33 0.24 1.26 1.59

CO 2.42 2.28 2.03 1.70 1.62 1.91 2.22 1.98

CT 0.16 0.16 0.20 0.18 0.12 0.19 1.27 1.34

FL 0.06 0.11 — — — 0.10 1.02 0.62

GA 0.08 0.09 0.12 0.10 — — — —

IA 0.54 0.58 0.54 0.35 0.52 0.33 0.23 1.12

IL 1.54 1.47 1.60 1.79 1.87 1.66 3.97 4.29

IN 1.01 1.10 1.02 2.22 1.84 0.64 1.13 1.73

KS 1.66 1.64 1.51 2.84 2.54 1.64 2.55 2.44

KY 4.91 4.98 4.03 3.40 3.82 2.66 5.94 2.42

LA 2.17 2.07 1.99 2.05 1.66 1.55 3.16 2.98

MA — — — — — — — 0.09

MI 3.65 2.96 3.59 4.75 8.41 17.28 22.38 26.19

MN 0.32 0.31 0.34 0.48 0.38 0.37 — 0.52

MO 1.84 1.44 1.93 1.65 1.51 1.37 2.93 4.11

Note. — indicates no students tested.


Table 11.2. (continued)

Grade (%)

3 (n=36,139)

4 (n=36,353)

5 (n=36,890)

6 (n=35,258)

7 (n=35,618)

8 (n=37,796)

9 (n=23,096)

10 (n=19,417)

MS 2.27 2.64 2.59 2.77 2.65 2.07 2.55 1.48

ND — — — — — 0.23 0.16 0.31

NE — — — — — — 1.59 1.89

NJ — — — — — — 0.44 —

NM — — — — — — 0.57 —

NV — — — — — 0.09 0.06 0.07

OH 1.48 1.24 1.14 1.61 3.76 1.74 9.30 5.28

OK 0.70 0.91 0.69 1.14 1.08 1.03 0.14 0.09

PA — — — — — — 1.13 1.18

SC 3.61 3.57 3.63 3.73 3.66 3.63 1.69 1.99

SD — — 0.20 — — 0.21 — 0.25

TN 1.31 1.43 1.42 1.45 1.84 1.95 4.30 4.99

TX 1.96 1.16 1.60 1.32 1.90 2.07 2.32 5.04

UT 0.79 0.82 0.80 0.96 1.34 0.96 0.71 —

VA — — — 0.08 — 0.25 — —

WI 0.82 1.01 0.89 1.01 2.74 4.11 19.16 18.98

No response — — — 0.01 0.06 0.08 0.31 0.27

Race/Ethnicity

Black/African American 22.62 22.41 22.43 21.47 20.97 18.37 9.07 8.49

American Indian/Alaska Native

1.07 1.03 0.92 1.19 0.82 0.88 0.52 0.64

White 44.81 45.24 43.64 41.79 42.25 44.53 35.21 38.01

Hispanic/Latino 5.47 5.27 5.29 6.67 6.52 5.62 5.19 4.65

Asian 1.68 1.62 1.59 1.28 1.42 1.44 1.40 2.02

Native Hawaiian/Other Pacific Islander

0.58 0.52 0.91 0.78 0.55 0.40 0.48 0.37

Two or more races 0.41 0.32 0.32 0.41 0.61 0.45 0.51 0.78

No response 23.35 23.59 24.89 26.42 26.85 28.31 47.63 45.04

Note. — indicates no students tested.


Table 11.3. 2014 ACT Aspire Mathematics Norm Group Demographics

Grade (%)

3 (n=76,861)

4 (n=76,012)

5 (n=77,083)

6 (n=75,651)

7 (n=77,891)

8 (n=80,763)

9 (n=23,011)

10 (n=19,441)

Gender

F 47.15 47.00 47.30 46.91 46.46 47.50 44.60 44.04

M 48.92 49.08 48.96 48.50 48.62 48.19 43.26 43.56

No response 3.93 3.92 3.74 4.59 4.92 4.32 12.14 12.40

State

AL 83.98 84.62 82.65 81.73 80.05 76.23 5.53 4.99

AR 0.02 0.02 0.02 0.03 0.02 0.03 0.07 0.09

AZ — — — — — 0.07 0.30 0.26

CA 0.13 0.11 1.33 1.37 0.11 0.09 1.24 1.62

CO 1.15 1.10 0.98 0.81 0.74 0.89 2.23 2.06

CT 0.06 0.09 0.09 0.02 0.06 0.08 1.27 1.33

FL 0.03 0.05 — — — 0.04 0.98 0.59

GA 0.04 0.04 0.06 0.05 — — — —

IA 0.26 0.28 0.26 0.16 0.24 0.15 0.24 1.15

ID — — — — — 0.09 — —

IL 0.74 0.70 0.76 0.85 0.86 0.79 4.05 3.93

IN 0.52 0.52 0.49 1.06 0.82 0.37 1.02 1.91

KS 0.81 0.77 0.73 1.20 1.00 0.75 2.88 2.74

KY 2.25 2.38 1.93 1.52 1.72 1.28 5.59 2.41

LA 1.03 0.99 0.95 0.95 0.76 0.73 3.09 2.95

MA — — — — — — — 0.09

MI 1.77 1.44 1.88 2.79 4.10 8.67 22.38 26.31

MN 0.15 0.15 0.16 0.29 0.01 0.17 — 0.53

MO 0.85 0.69 0.91 0.72 0.72 0.64 2.89 3.37

MS 1.11 1.26 1.24 1.32 1.09 0.98 1.79 1.50

ND — — — — — 0.11 0.17 0.31

NE — — — — — — 1.60 1.89

NJ — — — — — — 0.44 —

NM — — — — — — 0.59 —

NV 0.01 0.01 0.01 — — 0.04 0.06 0.07

Note. — indicates no students tested.


Table 11.3. (continued)

Grade (%)

3 (n=76,861)

4 (n=76,012)

5 (n=77,083)

6 (n=75,651)

7 (n=77,891)

8 (n=80,763)

9 (n=23,011)

10 (n=19,441)

OH 0.69 0.56 0.54 0.75 1.65 0.94 9.51 5.92

OK 0.43 0.43 0.33 0.54 0.54 0.50 0.17 0.09

PA — — — — — — 1.00 1.18

SC 1.69 1.71 1.73 1.74 1.66 1.68 2.05 1.99

SD — — 0.09 — — 0.10 — 0.25

TN 0.62 0.67 0.68 0.68 0.84 0.88 4.24 4.98

TX 0.90 0.57 1.39 0.56 1.07 1.04 2.63 5.12

UT 0.37 0.38 0.36 0.35 0.66 0.46 0.72 —

VA — — — 0.04 — 0.12 — —

WI 0.40 0.45 0.42 0.47 1.25 2.04 20.98 20.11

No response — — — 0.00 0.03 0.04 0.30 0.27

Race/Ethnicity

Black/African American 26.63 27.08 27.02 27.63 27.71 26.77 8.14 8.29

American Indian/Alaska Native

0.84 0.84 0.83 0.93 0.78 0.80 0.65 0.67

White 50.40 50.64 50.00 49.05 49.75 50.61 35.77 36.98

Hispanic/Latino 6.17 5.96 5.70 5.81 5.64 5.12 5.34 4.92

Asian 1.48 1.50 1.48 1.25 1.31 1.35 1.40 2.01

Native Hawaiian/Other Pacific Islander

0.31 0.28 0.47 0.36 0.28 0.22 0.56 0.38

Two or more races 0.21 0.16 0.16 0.18 0.32 0.22 0.53 0.78

No response 13.96 13.54 14.34 14.79 14.21 14.91 47.63 45.97

Note. — indicates no students tested.


Table 11.4. 2014 ACT Aspire Reading Norm Group Demographics

Grade (%)

3 (n=76,817)

4 (n=75,577)

5 (n=76,316)

6 (n=74,862)

7 (n=77,497)

8 (n=80,077)

9 (n=22,711)

10 (n=19,060)

Gender

F 47.17 47.10 47.34 46.93 46.59 47.48 44.57 43.92

M 49.05 49.12 49.06 48.56 48.84 48.20 43.15 43.48

No response 3.78 3.78 3.60 4.51 4.56 4.33 12.28 12.60

State

AL 84.13 85.03 83.56 82.45 80.36 76.93 5.70 5.70

AR 0.02 0.02 0.02 0.03 0.02 0.03 0.07 0.09

AZ — — — — — 0.07 0.32 0.27

CA 0.12 0.10 1.43 1.39 0.19 0.05 1.23 1.58

CO 1.15 1.10 0.98 0.81 0.75 0.91 2.23 2.03

CT 0.04 0.06 0.09 0.09 0.02 0.10 1.16 1.36

FL 0.03 0.05 — — — 0.05 1.06 0.63

GA 0.04 0.04 0.06 0.05 — — — —

IA 0.26 0.28 0.26 0.17 0.24 0.15 0.24 1.17

ID — — — — — 0.09 — —

IL 0.73 0.70 0.77 0.85 0.83 0.79 4.12 4.47

IN 0.49 0.41 0.42 1.02 0.82 0.30 0.78 1.21

KS 0.77 0.74 0.67 1.18 1.00 0.77 2.29 2.46

KY 2.24 2.39 1.95 1.50 1.71 1.03 5.14 1.41

LA 1.03 1.00 0.96 0.94 0.76 0.73 3.05 2.90

MA — — — — — — — 0.09

MI 1.83 1.48 1.85 2.26 3.88 8.46 22.80 26.78

MN 0.15 0.15 0.17 0.26 0.04 0.18 — 0.52

MO 0.85 0.70 0.89 0.75 0.71 0.55 2.90 3.81

MS 1.23 1.27 1.23 1.34 1.09 0.94 2.49 1.45

ND — — — — — 0.11 0.17 0.31

NE — — — — — — 1.62 1.92

NJ — — — — — — 0.43 —

NM — — — — — — 0.61 —

Note. — indicates no students tested.


Table 11.4. (continued)

Grade (%)

3 (n=76,817)

4 (n=75,577)

5 (n=76,316)

6 (n=74,862)

7 (n=77,497)

8 (n=80,077)

9 (n=22,711)

10 (n=19,060)

NV 0.01 0.01 0.01 — — 0.04 0.06 0.07

OH 0.69 0.59 0.55 0.65 1.75 0.93 9.70 5.67

OK 0.42 0.38 0.27 0.53 0.54 0.48 0.13 0.09

PA — — — — — — 0.96 1.23

SC 1.69 1.71 1.76 1.76 1.67 1.72 2.09 2.00

SD — — 0.09 — — 0.10 — 0.27

TN 0.63 0.68 0.69 0.69 0.85 0.91 4.38 4.99

TX 0.92 0.56 0.75 0.62 0.85 0.94 2.27 5.24

UT 0.17 0.14 0.13 0.22 0.62 0.45 0.70 —

VA — — — 0.03 — 0.12 — —

WI 0.39 0.41 0.43 0.40 1.26 2.04 20.99 19.99

No response — — — 0.00 0.03 0.04 0.32 0.27

Race/Ethnicity

Black/African American 26.87 27.18 27.22 27.82 27.90 26.92 8.44 7.96

American Indian/Alaska Native

0.82 0.83 0.82 0.92 0.77 0.80 0.66 0.71

White 50.30 50.77 50.01 49.17 49.74 50.54 35.08 36.85

Hispanic/Latino 6.14 5.86 5.56 5.79 5.63 5.00 5.03 4.71

Asian 1.45 1.48 1.42 1.24 1.34 1.30 1.36 2.00

Native Hawaiian/Other Pacific Islander

0.26 0.23 0.41 0.31 0.22 0.22 0.53 0.38

Two or more races 0.21 0.15 0.15 0.18 0.33 0.18 0.55 0.80

No response 13.94 13.50 14.41 14.57 14.09 15.04 48.35 46.59

Note. — indicates no students tested.


Table 11.5. 2014 ACT Aspire Science Norm Group Demographics

Grade (%)

3 (n=33,076)

4 (n=34,135)

5 (n=33,317)

6 (n=31,928)

7 (n=32,618)

8 (n=35,162)

9 (n=22,238)

10 (n=18,616)

Gender

F 45.24 44.98 45.81 44.92 44.15 45.35 44.29 44.08

M 46.72 47.18 46.59 45.72 45.37 45.31 43.05 43.60

No response 8.04 7.84 7.59 9.36 10.48 9.34 12.66 12.32

State

AL 63.30 66.95 62.04 57.38 52.50 49.04 5.81 5.23

AR 0.05 0.05 0.05 0.07 0.05 0.06 0.07 0.09

AZ — — — — — 0.16 0.31 0.27

CA 0.24 0.24 3.16 3.19 0.29 0.18 0.36 1.21

CO 2.68 2.31 2.13 1.78 1.64 2.00 2.31 2.19

CT 0.15 0.19 0.20 0.19 0.13 0.17 1.17 1.39

FL 0.07 0.12 — — — 0.11 0.99 0.61

GA 0.08 0.09 0.14 0.12 — — — —

IA 0.60 0.62 0.61 0.39 0.46 0.26 0.24 1.20

IL 1.76 1.58 1.72 1.90 2.06 1.81 3.98 4.26

IN 0.93 0.91 0.80 2.27 1.91 0.71 0.88 1.62

KS 1.86 1.71 1.67 3.16 2.80 1.76 2.32 2.35

KY 5.31 5.59 4.78 3.45 4.06 2.92 5.27 1.44

LA 2.36 2.20 2.19 2.22 1.74 1.67 3.11 3.01

MA — — — — — — — 0.09

MI 3.99 2.74 4.35 6.12 9.31 17.46 22.21 27.14

MN 0.34 0.32 0.37 1.02 0.42 0.39 — 0.56

MO 1.85 1.50 1.91 1.84 1.67 1.46 2.76 3.24

MS 2.84 2.55 2.58 3.11 2.45 2.10 2.62 1.63

ND — — — — — 0.24 0.17 0.32

NE — — — — — — 1.65 1.97

NJ — — — — — — 0.44 —

NM — — — — — — 0.60 —

NV — — — — — 0.10 0.06 0.07

Note. — indicates no students tested.


Table 11.5. (continued)

Grade (%)

3 (n=33,076)

4 (n=34,135)

5 (n=33,317)

6 (n=31,928)

7 (n=32,618)

8 (n=35,162)

9 (n=22,238)

10 (n=18,616)

OH 1.68 1.27 1.32 1.51 4.43 2.07 9.92 6.53

OK 0.92 0.94 0.77 1.25 1.28 1.10 0.13 0.10

PA — — — — — — 0.98 1.26

SC 3.91 3.79 4.00 4.12 3.98 3.89 1.81 2.05

SD — — 0.22 — — 0.23 — 0.25

TN 1.42 1.51 1.56 1.61 2.18 2.06 4.46 5.12

TX 2.12 1.27 1.78 1.37 2.07 2.44 2.77 4.79

UT 0.68 0.63 0.65 0.86 1.54 1.03 0.71 —

VA — — — 0.09 — 0.27 — —

WI 0.89 0.90 1.00 0.98 2.96 4.22 21.55 19.72

No response — — — 0.01 0.07 0.09 0.32 0.28

Race/Ethnicity

Black/African American 21.71 22.31 23.33 21.86 22.99 18.78 8.16 7.71

American Indian/Alaska Native

0.98 0.86 0.61 1.04 0.47 0.78 0.65 0.70

White 43.12 44.11 40.92 40.85 38.62 43.42 35.12 37.13

Hispanic/Latino 6.35 5.72 5.35 6.79 6.95 5.72 4.99 4.40

Asian 1.57 1.39 1.34 1.23 1.23 1.32 1.34 2.01

Native Hawaiian/Other Pacific Islander

0.64 0.54 1.00 0.78 0.58 0.45 0.44 0.40

Two or more races 0.47 0.35 0.34 0.43 0.77 0.49 0.54 0.79

No response 25.18 24.70 27.12 27.01 28.38 29.04 48.75 46.85

Note. — indicates no students tested.


Table 11.6. 2014 ACT Aspire Writing Norm Group Demographics

Grade (%)

3 (n=25,301)

4 (n=26,867)

5 (n=27,071)

6 (n=29,045)

7 (n=30,935)

8 (n=32,984)

9 (n=15,361)

10 (n=13,144)

Gender

F 47.21 45.59 46.30 46.83 46.20 46.68 43.84 42.73

M 46.01 47.74 47.12 47.15 47.07 46.36 41.22 41.68

No response 6.78 6.67 6.58 6.01 6.73 6.96 14.94 15.59

State

AL 61.82 66.06 60.88 61.58 56.81 55.33 — —

AR 0.07 0.07 0.06 0.08 0.05 0.06 0.10 0.13

AZ — — — — — 0.17 0.47 0.39

CA 0.39 0.30 4.14 3.55 0.37 0.18 0.44 1.34

CO 3.36 3.10 2.79 2.06 1.87 2.18 3.19 2.91

CT 0.20 0.08 0.10 0.05 0.10 0.06 1.74 1.45

FL 0.07 0.15 — — — 0.11 1.38 0.73

GA 0.10 0.11 0.17 0.12 — — — —

IA 0.13 0.12 0.13 0.12 0.34 0.09 0.17 1.12

IL 2.09 1.88 2.07 2.00 1.95 1.81 5.06 5.25

IN 0.83 0.49 0.75 2.20 1.92 0.79 1.35 2.58

KS 2.21 2.13 1.94 3.39 2.83 1.72 2.87 3.38

KY 6.40 6.32 5.10 3.43 3.81 2.62 5.81 0.86

LA 2.81 2.64 2.56 2.25 1.74 1.29 3.71 3.25

MA — — — — — — — 0.13

MI 1.78 1.24 2.75 3.47 7.98 14.88 27.43 30.48

MN 0.44 0.41 0.46 0.63 0.44 0.42 — 0.68

MO 2.49 1.73 2.56 1.93 1.80 1.59 3.54 4.50

MS 2.99 2.77 2.12 2.52 2.09 1.76 1.87 1.38

ND — — — — — 0.26 — 0.12

NE — — — — — — 0.10 0.17

NJ — — — — — — 0.64 —

NM — — — — — — 0.77 —

NV — — — — — — 0.09 0.10

OH 1.91 1.54 1.39 1.55 3.98 1.73 9.35 6.77

Note. — indicates no students tested.


Table 11.6. (continued)

Grade (%)

3 (n=25,301)

4 (n=26,867)

5 (n=27,071)

6 (n=29,045)

7 (n=30,935)

8 (n=32,984)

9 (n=15,361)

10 (n=13,144)

OK 0.98 1.08 0.94 1.34 1.27 1.14 0.19 0.14

PA — — — — — — 1.41 1.73

SC 4.88 4.78 4.92 4.50 4.15 4.15 2.67 2.87

SD — — 0.26 — — 0.25 — —

TN 0.53 0.52 0.64 0.67 0.61 0.67 0.87 1.71

TX 2.57 1.53 2.22 1.39 2.16 2.25 3.32 7.44

UT 0.37 0.38 0.37 0.49 1.10 0.48 — —

VA — — — 0.10 — 0.29 — —

WI 0.57 0.57 0.69 0.56 2.61 3.62 21.00 18.00

No response — — — 0.01 — 0.09 0.45 0.40

Race/Ethnicity

Black/African American 24.68 24.97 23.11 25.31 23.19 22.37 6.61 6.48

American Indian/Alaska Native

1.00 0.89 0.99 1.05 1.16 0.95 0.60 0.70

White 43.80 45.24 43.80 42.16 43.24 46.71 31.91 34.24

Hispanic/Latino 6.62 6.35 5.94 7.56 7.45 6.16 4.95 5.58

Asian 1.36 1.22 1.16 1.05 1.17 1.13 0.92 1.64

Native Hawaiian/Other Pacific Islander

0.25 0.19 0.68 0.46 0.23 0.11 0.05 0.11

Two or more races 0.42 0.33 0.33 0.39 0.77 0.48 0.59 1.01

No response 21.88 20.82 23.99 22.02 22.80 22.07 54.37 50.23

Note. — indicates no students tested.


Table 11.7. 2014 ACT Aspire English Norms: Percentage of Students at or below Each Scale Score

Scale Score

Grade

3 4 5 6 7 8 9 10

400 1 1 1 1 1 1 1 1

401 1 1 1 1 1 1 1 1

402 1 1 1 1 1 1 1 1

403 1 1 1 1 1 1 1 1

404 1 1 1 1 1 1 1 1

405 2 1 1 1 1 1 1 1

406 3 1 1 1 2 1 1 1

407 5 2 1 2 2 1 2 1

408 8 2 2 2 4 2 2 2

409 12 4 2 4 4 3 3 2

410 17 6 4 5 6 3 4 3

411 24 9 5 6 7 4 5 4

412 29 13 7 8 8 6 7 5

413 36 16 10 10 10 6 8 6

414 43 21 15 13 12 8 10 7

415 50 27 18 15 14 10 13 9

416 56 31 23 18 17 11 15 10

417 63 38 27 22 20 14 16 12

418 66 44 32 26 21 17 19 14

419 72 52 37 32 26 21 22 16

420 77 55 44 35 29 22 25 19

421 81 61 48 42 33 27 28 21

422 83 69 52 46 38 31 30 22

423 88 72 60 50 43 36 33 25

424 91 77 65 56 44 39 37 28

425 93 84 68 60 50 42 40 31

426 95 85 72 66 55 47 43 33

427 95 90 79 68 60 53 47 37

428 97 92 80 73 62 55 50 40

429 97 95 86 77 67 61 54 43

430 99 96 86 81 73 67 57 47

431 99 98 91 84 75 67 61 50

432 99 98 91 87 78 73 65 53

433 99 99 95 89 83 76 68 57

434 99 99 95 92 85 78 71 60

435 100 99 97 93 87 83 74 63

436 99 98 95 91 86 77 67

437 99 98 96 92 88 80 70

438 100 99 97 94 90 82 72


Table 11.7. (continued)

Scale Score

Grade

3 4 5 6 7 8 9 10

439 99 98 96 92 84 75

440 99 98 97 94 87 79

441 99 99 98 95 89 81

442 100 99 99 97 91 84

443 99 99 97 93 87

444 99 99 99 94 90

445 99 99 99 96 92

446 99 99 99 97 94

447 99 99 99 98 95

448 100 99 99 99 96

449 99 99 99 98

450 100 99 99 98

451 99 99 99

452 100 99 99

453 99 99

454 99 99

455 99 99

456 100 100


Table 11.8. 2014 ACT Aspire Mathematics Norms: Percentage of Students at or below Each Scale Score

Scale Score

Grade

3 4 5 6 7 8 9 10

400 1 1 1 1 1 1 1 1

401 1 1 1 1 1 1 1 1

402 1 1 1 1 1 1 1 1

403 2 1 1 1 1 1 1 1

404 3 1 1 1 1 1 1 1

405 5 1 1 1 1 1 1 1

406 8 1 1 1 2 1 1 1

407 11 3 1 2 3 1 1 1

408 16 5 3 2 5 2 1 1

409 22 6 5 3 5 3 2 2

410 32 10 8 5 8 5 2 2

411 39 17 10 8 12 7 4 4

412 50 23 16 11 13 10 6 6

413 58 34 21 16 19 13 8 6

414 69 45 31 22 25 17 11 9

415 76 55 40 29 32 24 14 12

416 84 65 50 35 39 29 18 15

417 89 74 60 42 43 34 23 19

418 94 80 64 50 50 41 28 23

419 96 85 70 57 56 46 32 26

420 98 90 76 63 61 50 37 30

421 99 93 82 70 66 55 39 32

422 99 95 87 75 71 61 44 36

423 99 96 88 79 75 65 49 40

424 99 98 91 83 79 69 53 44

425 99 98 94 86 82 72 58 48

426 99 99 95 89 86 75 61 51

427 99 99 97 91 89 79 65 54

428 99 99 98 94 91 82 69 58

429 100 99 98 95 93 84 72 61

430 100 99 99 96 94 87 75 65

431 100 99 99 97 96 88 79 68

432 100 99 99 98 97 90 82 72

433 100 99 99 99 97 92 84 75

434 100 100 99 99 98 94 88 79

435 100 99 99 98 95 90 82

436 100 99 99 99 96 92 85

437 100 99 99 99 97 93 87

438 100 99 99 99 98 95 89


Table 11.8. (continued)

Scale Score

Grade

3 4 5 6 7 8 9 10

439 100 99 99 99 98 96 91

440 100 99 99 99 99 97 93

441 99 99 99 99 98 95

442 99 99 99 99 98 96

443 99 99 99 99 99 97

444 99 99 99 99 99 98

445 100 99 99 99 99 99

446 100 99 99 99 99 99

447 99 99 99 99 99

448 99 100 99 99 99

449 99 100 99 99 99

450 100 100 99 99 99

451 100 100 99 99 99

452 100 100 99 99

453 100 100 99 99

454 100 99 99

455 100 99 99

456 100 99 99

457 99 100

458 99 100

459 100 100

460 100 100


Table 11.9. 2014 ACT Aspire Reading Norms: Percentage of Students at or below Each Scale Score

Scale Score

Grade

3 4 5 6 7 8 9 10

400 1 1 1 1 1 1 1 1

401 1 1 1 1 1 1 1 1

402 1 1 1 1 1 1 1 1

403 2 1 1 1 1 1 1 1

404 5 1 1 1 1 1 1 1

405 8 3 2 1 1 1 1 1

406 14 6 3 2 1 1 1 1

407 23 10 5 4 3 2 2 2

408 28 14 7 6 5 2 4 3

409 36 20 10 9 5 4 4 4

410 43 25 16 13 9 5 8 6

411 48 31 21 16 13 8 11 10

412 54 36 26 18 16 10 15 13

413 61 45 31 22 21 13 16 14

414 66 51 35 26 25 17 21 17

415 72 57 40 31 31 20 26 21

416 77 63 49 35 33 23 30 24

417 82 68 55 40 38 27 34 28

418 87 75 61 46 44 32 39 31

419 91 80 67 50 48 36 43 35

420 94 84 73 59 53 41 47 39

421 95 86 78 61 59 47 52 43

422 96 90 80 67 65 49 57 47

423 98 93 84 73 71 55 61 51

424 99 96 86 78 76 58 62 52

425 99 97 91 83 81 64 66 57

426 99 98 94 87 86 70 71 62

427 99 99 94 89 90 76 75 66

428 99 99 97 93 93 82 80 72

429 100 99 98 96 96 87 84 77

430 99 99 96 96 89 89 82

431 100 99 98 98 91 89 83

432 99 99 99 95 92 87

433 99 99 99 97 95 92

434 100 99 99 98 97 95

435 99 99 99 99 98

436 100 99 99 99 98

437 99 99 99 99

438 100 99 99 99


Table 11.9. (continued)

Scale Score

Grade

3 4 5 6 7 8 9 10

439 99 99 99

440 100 99 99

441 99 99

442 100 100


Table 11.10. 2014 ACT Aspire Science Norms: Percentage of Students at or below Each Scale Score

Scale Score

Grade

3 4 5 6 7 8 9 10

400 1 1 1 1 1 1 1 1

401 1 1 1 1 1 1 1 1

402 1 1 1 1 1 1 1 1

403 2 1 1 1 1 1 1 1

404 4 2 1 1 1 1 1 1

405 6 3 2 2 1 1 1 1

406 10 6 3 2 2 1 1 1

407 15 8 4 4 3 1 1 1

408 22 12 6 5 5 2 2 2

409 28 15 8 8 7 3 2 2

410 34 18 11 10 10 5 4 4

411 39 22 14 13 14 7 4 4

412 46 26 18 17 18 9 7 6

413 51 31 22 22 21 12 7 7

414 58 37 26 25 26 15 11 9

415 62 42 30 29 28 19 16 14

416 66 47 34 34 32 23 17 15

417 71 54 39 38 38 27 22 19

418 76 60 45 42 42 28 24 21

419 81 65 51 47 46 33 29 25

420 84 70 57 51 50 37 35 29

421 88 75 63 57 56 42 40 34

422 91 79 68 62 60 46 45 38

423 94 84 74 67 64 51 46 39

424 95 87 78 72 67 57 50 42

425 97 90 83 77 72 62 54 46

426 98 93 89 81 76 66 59 50

427 99 95 92 85 80 70 63 53

428 99 97 94 89 83 75 67 57

429 99 98 96 91 86 79 70 61

430 99 99 97 93 88 82 76 66

431 99 99 98 95 91 85 79 69

432 99 99 99 96 94 88 82 73

433 100 99 99 98 96 90 84 76

434 99 99 99 98 92 87 79

435 99 99 99 99 95 91 83

436 100 99 99 99 96 92 86

437 99 99 99 97 94 88

438 100 99 99 98 96 91


Table 11.10. (continued)

Scale Score

Grade

3 4 5 6 7 8 9 10

439 99 99 99 97 94

440 100 99 99 98 95

441 99 99 99 97

442 99 99 99 98

443 100 99 99 99

444 99 99 99

445 99 99 99

446 100 99 99

447 99 99

448 99 99

449 100 100


Table 11.11. 2014 ACT Aspire Writing Norms: Percentage of Students at or below Each Scale Score

Scale Score

Grade

3 4 5 6 7 8 9 10

408 1 1 2 1 1 2 1 1

409 1 1 2 2 1 2 1 1

410 4 4 5 2 5 4 5 4

411 4 4 5 2 5 5 5 4

412 7 5 7 3 9 7 7 6

413 7 5 7 3 9 7 7 6

414 11 7 8 7 11 9 9 7

415 11 7 8 7 12 9 9 7

416 20 14 18 14 19 19 13 10

417 20 14 18 15 19 19 13 10

418 33 27 27 18 29 31 26 20

419 33 29 27 19 30 31 26 20

420 51 40 35 22 40 39 31 24

421 51 40 35 22 41 39 31 24

422 63 51 40 35 49 44 35 27

423 63 51 40 42 49 44 35 27

424 74 65 58 51 65 61 47 37

425 75 65 58 51 65 61 47 37

426 83 80 73 58 73 74 65 55

427 84 81 74 58 74 74 65 55

428 89 87 78 61 81 81 71 61

429 89 87 79 61 82 81 71 61

430 94 92 82 64 86 84 76 66

431 94 92 82 80 87 84 76 66

432 97 97 93 87 95 95 88 81

433 97 97 93 89 95 95 88 81

434 98 98 96 91 96 97 95 90

435 98 98 96 91 96 97 95 90

436 99 99 97 92 98 98 96 92

437 99 99 97 93 98 98 96 92

438 99 99 98 94 98 98 97 94

439 99 99 98 94 98 98 97 94

440 100 100 100 99 99 99 99 98

441 99 99 99 99 98

442 99 99 99 99 99

443 99 99 99 99 99

444 99 99 99 99 99

445 99 99 99 99 99


Table 11.11. (continued)

Scale Score

Grade

3 4 5 6 7 8 9 10

446 99 99 99 99 99

447 99 99 99 99 99

448 100 100 100 100 100

CHAPTER 12

EPAS® to ACT Aspire Concordance

This chapter presents a concordance study between the 1–36 scale of the ACT Educational Planning and Assessment System (EPAS®), which consists of the ACT (grades 11–12) and two legacy tests, ACT Explore® (grades 8–9) and ACT Plan® (grade 10), and the three-digit ACT Aspire scale. The concordance study established a direct link between scores on EPAS and ACT Aspire. This link was used to facilitate a smooth transition to ACT Aspire for users of ACT Explore and ACT Plan, to establish the ACT Readiness Benchmarks for grades 3–11 (described in Chapter 13), and to establish the Progress toward Career Readiness indicator (Chapter 15).

EPAS is an integrated series of paper-administered, curriculum-based tests of educational development with selected-response items, typically taken by students from grade 8 through high school. ACT Aspire, on the other hand, is offered both on paper and online; includes selected-response, constructed-response, and technology-enhanced item types; and is a vertically articulated, benchmarked, and standards-based system of assessments that can be taken by students from grade 3 through early high school. The grade 8 through early high school tests in EPAS and ACT Aspire are intended to measure similar constructs but differ in test specifications, which is a circumstance where concordances are an applicable type of scale alignment (Holland & Dorans, 2006).


12.1 Method

The concordance analysis related scores from ACT Explore, ACT Plan, and the ACT to ACT Aspire scores using percentile ranks, where concorded scores are defined as those having the same percentage of students at or below a given score with respect to the group of students used in the study. The collection of concorded scores is referred to as a concordance table, which is useful for determining the cut scores on one test that identify approximately the same proportion of students as a given cut score on the other test, although not necessarily the same students.

The students in the concordance analysis came from the spring 2013 special studies, in which participating students took ACT Aspire English, mathematics, reading, or science tests and had also taken ACT Explore, ACT Plan, or the ACT during the 2012–2013 academic year. Note that grade 11 students took the ACT plus a separately timed section of constructed-response items designed as an add-on to the ACT (see Appendix B). The constructed-response portion was combined with an operationally administered form of the ACT (selected response only) to obtain scores on the ACT Aspire scale. This version of the ACT is referred to as the ACT with constructed-response tests. The final analysis sample included students from grade 7 and above who took one of the EPAS tests (ACT Explore, ACT Plan, or the ACT) and the ACT Aspire test in the same subject and grade level (including, for grade 11 students, the ACT with constructed-response tests).

Since both EPAS and ACT Aspire are vertically scaled assessments covering different grade spans, the concordance was established between the two scales (i.e., the 1–36 EPAS scale and the ACT Aspire 400+ scale) rather than between two particular tests (e.g., between ACT Explore and the ACT Aspire grade 8 test).

The concordance relationship between EPAS and ACT Aspire scales was estimated using the equipercentile method described by Kolen and Brennan (2014). This method uses the percentile ranks (i.e., the proportion of students at or below each score) to define the relationship between the two scales. For a particular EPAS score, the corresponding ACT Aspire score is the one that has the same percentile rank.
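The sketch below illustrates the discrete core of this idea. It is a minimal sketch, not ACT's implementation: operational equipercentile methods continuize the score distributions (Kolen & Brennan, 2014), whereas this version simply matches each EPAS score point to the ACT Aspire score point with the nearest percentile rank, and the 400–460 ACT Aspire span is assumed for illustration.

import numpy as np

def percentile_ranks(scores, points):
    """Percentage of students at or below each score point."""
    scores = np.asarray(scores)
    return np.array([100.0 * np.mean(scores <= p) for p in points])

def equipercentile_concordance(epas_scores, aspire_scores):
    epas_points = np.arange(1, 37)        # 1-36 EPAS scale
    aspire_points = np.arange(400, 461)   # ACT Aspire scale span (assumed here)
    pr_epas = percentile_ranks(epas_scores, epas_points)
    pr_aspire = percentile_ranks(aspire_scores, aspire_points)
    # For each EPAS point, take the ACT Aspire point with the closest percentile rank.
    return {int(e): int(aspire_points[np.argmin(np.abs(pr_aspire - pr))])
            for e, pr in zip(epas_points, pr_epas)}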

Evaluation of Results

The derived concordance table was applied to existing datasets containing students' ACT Explore and ACT Plan scores. The purpose of the evaluation was to examine whether distributions of the concorded ACT Aspire scale scores were similar to those of the original EPAS scale scores. Two evaluation samples were used. One was a cross-sectional sample from the 2013–2014 academic year containing the cohort of grade 8 students who took ACT Explore and the cohort of grade 10 students who took ACT Plan. The other was a longitudinal sample containing the cohort of grade 8 students in the 2010–2011 academic year who took ACT Explore that year and took ACT Plan two years later, in grade 10. The following properties were examined:


1. Using the cross-sectional data, compare distributions of ACT Explore/ACT Plan scores against the distributions of concorded ACT Aspire scale scores. The purpose of this analysis was to verify that ACT Explore/ACT Plan and concorded ACT Aspire score distributions did not appear to differ.

2. Using the longitudinal data, create box plots of ACT Plan scores conditional on each ACT Explore score point, and create the same set of plots based on the concorded ACT Aspire scale scores. The purpose of this analysis was to examine whether the relationship between grade 8 and grade 10 scores of the same cohort stayed the same when the concorded ACT Aspire scores were used.

3. Using the longitudinal data, compute effect sizes from grade 8 to grade 10 based on the EPAS scale scores and on the concorded ACT Aspire scale scores. The effect size was computed as the scale score mean difference between the two grades divided by the square root of the average of the two variances (see the formula following this list). The purpose of this analysis was to examine whether the magnitude of growth, as measured by effect size, was similar when the concordance was applied.
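Written out, the effect size described in item 3 is

\[ ES = \frac{\bar{X}_{10} - \bar{X}_{8}}{\sqrt{\left(S_{8}^{2} + S_{10}^{2}\right)/2}}. \]

As a check against Table 12.5 (English): $(17.10 - 14.66)/\sqrt{(4.17^{2} + 4.42^{2})/2} \approx 2.44/4.30 \approx .57$, matching the reported value.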

12.2 Results

Table 12.1 presents the n-counts of the analysis sample, the correlation between the EPAS and ACT Aspire scale scores, and descriptive statistics of scores on both scales. As shown, samples of over 16,000 students who had both EPAS and ACT Aspire scores were used to conduct the concordance analysis for the four subject areas: English, mathematics, reading, and science. Correlations between the two scale scores are higher in English and mathematics than in reading and science. Table 12.2 displays the n-counts and percentages of students by gender, race/ethnicity, and geographic region for the sample used in the concordance analysis. The scatter plots between the two sets of scores, shown in Figure 12.1, indicate a linear relationship.

Table 12.1. Descriptive Statistics of the Sample for Concordance Analysis

Subject  N  Correlation between EPAS and ACT Aspire  EPAS (Mean SD Min Max)  ACT Aspire (Mean SD Min Max)

English 18,204 .81 16.04 4.87 1 36 427.68 10.24 400 460

Mathematics 17,787 .80 16.50 4.37 1 36 423.44 8.62 401 457

Reading 16,922 .70 16.02 4.72 1 36 420.74 7.57 401 446

Science 17,102 .74 17.62 3.87 1 36 423.06 8.32 400 449


Table 12.2. Demographic Information of the Sample for Concordance Analysis

Demographic Grouping %

Subject (N)

English (18,204)

Mathematics (17,787)

Reading (16,922)

Science (17,102)

Gender

Not Specified 19 16 16 17

Female 40 41 41 41

Male 41 42 42 42

Race/Ethnicity

Not Specified 40 38 39 39

Black/African American 19 20 20 19

American Indian/Alaskan Native 1 1 1 1

White 34 35 34 35

Hispanic/Latino 4 4 4 4

Asian 1 1 1 1

Native Hawaiian/Other Pacific Islander 1 1 1 1

Two or More Races 1 1 1 1

Geographic Region

East 56 55 57 57

Midwest 34 35 34 34

Southwest 6 5 5 5

West 4 4 3 4


Figure 12.1. Scatter plots of ACT Aspire scale scores versus EPAS scale scores. Panels: English, Mathematics, Reading, Science. [Plots not reproduced.]

Evaluating Results

Descriptive statistics of the ACT Explore scores and the concorded ACT Aspire scores in the cross-sectional evaluation sample are shown in Table 12.3, and those for ACT Plan are shown in Table 12.4. Figure 12.2 presents the distributions of ACT Explore and concorded ACT Aspire scores, and Figure 12.3 presents the same information for ACT Plan scores. The concorded score distributions closely match those of the original scores.

In each pair of panels in Figure 12.4, the top panel presents box plots of ACT Plan scores conditional on ACT Explore scores in the longitudinal evaluation sample, and the bottom panel presents the same plots based on the concorded ACT Aspire scores. As shown, the relationship between students' grade 10 and grade 8 performance remains highly similar when the concordance is applied.

Table 12.3. Descriptive Statistics of ACT Explore and Concorded ACT Aspire Scores in Cross-Sectional Evaluation Sample

Subject  N  ACT Explore (Mean SD Min Max)  Concorded ACT Aspire (Mean SD Min Max)

English 1,034,067 14.40 4.37 1 25 424.17 9.64 400 445

Math 1,033,788 15.40 3.73 1 25 421.33 7.71 400 439

Reading 1,032,226 14.56 3.99 1 25 418.23 7.05 400 434

Science 1,030,623 16.54 3.33 1 25 420.78 7.51 400 438

Table 12.4. Descriptive Statistics of ACT Plan and Concorded ACT Aspire Scores in Cross-Sectional Evaluation Sample

Subject  N  ACT Plan (Mean SD Min Max)  Concorded ACT Aspire (Mean SD Min Max)

English 1,251,757 16.94 4.72 1 32 429.59 9.72 400 453

Math 1,251,809 17.74 4.85 1 32 425.83 9.25 400 452

Reading 1,250,597 17.15 4.66 1 30 422.66 7.21 400 437

Science 1,250,022 18.58 4.13 1 32 425.01 8.53 400 447


Figure 12.2. Distributions of ACT Explore scale scores and concorded ACT Aspire scale scores. [Plots not reproduced.]


Figure 12.3. Distributions of ACT Plan scale scores and concorded ACT Aspire scale scores. [Plots not reproduced.]


Figure 12.4. Box plots of ACT Plan scores (or concorded ACT Aspire scale scores) conditional on ACT Explore scores (or concorded ACT Aspire scores). Panels, one pair per subject (EPAS scale and concorded ACT Aspire scale): English, Mathematics, Reading, Science. [Plots not reproduced.]

Descriptive statistics of the ACT Explore and ACT Plan scale scores for the longitudinal sample are presented in Table 12.5, and the corresponding concorded ACT Aspire scores are presented in Table 12.6. The effect sizes, shown in the last column of each table, are very similar on the EPAS scale and on the concorded ACT Aspire scale.

Table 12.5. Descriptive Statistics of ACT Explore Grade 8 and ACT Plan Grade 10 Scale Scores in Longitudinal Data

Subject  N  Grade 8—ACT Explore (Mean SD Min Max)  Grade 10—ACT Plan (Mean SD Min Max)  Effect Size

English 572,302 14.66 4.17 1 25 17.10 4.42 1 32 .57

Mathematics 572,051 15.63 3.46 1 25 17.84 4.57 1 32 .55

Reading 571,328 14.80 3.91 1 25 17.19 4.48 1 30 .57

Science 570,063 16.80 3.31 1 25 18.52 3.77 1 32 .48

Table 12.6. Descriptive Statistics of Concorded ACT Aspire Scale Scores in Longitudinal Data

Subject  N  Grade 8—Concorded ACT Aspire (Mean SD Min Max)  Grade 10—Concorded ACT Aspire (Mean SD Min Max)  Effect Size

English 572,302 424.79 9.31 400 445 430.05 9.08 400 453 .57

Mathematics 572,051 421.85 7.24 400 439 426.12 8.76 400 452 .53

Reading 571,328 418.73 6.93 400 434 422.69 6.96 400 437 .57

Science 570,063 421.39 7.54 400 438 425.10 8.04 400 447 .48

The final derived EPAS to ACT Aspire concordance table is presented in Table 12.7.

12.3 Summary

The primary purpose of the concordance analysis was to facilitate the transition from the ACT Explore and ACT Plan assessments to ACT Aspire. While the EPAS to ACT Aspire concordance tables provide a link between the two scales, the concordance should be used cautiously. For example, the sample of students included in the concordance analysis may not be representative of all EPAS or ACT Aspire test takers. In addition, population dependence often results when the two linked tests do not measure the same construct. It is generally inappropriate to use concorded scores to estimate individual student performance on the EPAS or ACT Aspire tests, because scores resulting from a concordance analysis are not interchangeable with actual scores. Actual EPAS or ACT Aspire scores should be used whenever possible.


Table 12.7. EPAS to ACT Aspire Concordance

EPAS           Concorded ACT Aspire Scale Score
Scale Score    English    Mathematics    Reading    Science

1 400 400 400 400

2 401 402 401 401

3 402 403 402 402

4 403 404 403 403

5 404 406 404 403

6 405 407 405 404

7 406 408 406 405

8 408 409 407 406

9 410 409 408 407

10 414 411 410 408

11 417 412 411 409

12 419 414 413 410

13 422 415 415 412

14 424 417 418 414

15 426 420 420 416

16 428 422 422 419

17 430 425 424 422

18 433 428 425 425

19 435 431 427 427

20 437 432 428 430

21 439 434 429 432

22 440 435 430 434

23 442 437 431 435

24 443 438 432 436

25 445 439 434 438


26 447 440 434 439

27 448 441 435 440

28 449 443 435 441

29 450 444 436 442

30 451 446 437 443

31 451 449 437 445

32 453 452 438 447


CHAPTER 13

ACT Readiness Benchmarks

This chapter presents the methods and procedures used to establish the ACT Readiness Benchmarks, which provide information about students' college and career readiness by grade 11. The ACT Readiness Benchmarks for grades 8 through 10 were derived from the EPAS to ACT Aspire concordance tables: the concorded ACT Aspire scale scores corresponding to the ACT College Readiness Benchmarks were taken as the ACT Readiness Benchmarks on the ACT Aspire scale. The grade 8 ACT Readiness Benchmarks were then used to obtain the ACT Readiness Benchmarks for grades 3–7 through a z-score backmapping procedure. The backmapped benchmarks will be reviewed and may be updated once longitudinal data are available.

13.1 Introduction

The ACT College Readiness Benchmarks are cornerstones of the ACT and the legacy assessments ACT Explore (grades 8 and 9) and ACT Plan (grade 10), which together form EPAS. The Benchmarks were established to reflect college and career readiness. The Benchmark on each of the four subject tests of the ACT (English, mathematics, reading, and science) is the score on the 1–36 EPAS scale at which students have a 50% probability of attaining a grade of B or higher, or about a 75% chance of obtaining a C or higher, in selected credit-bearing first-year college courses (for additional details, see ACT, 2014b).

The ACT Readiness Benchmarks used with ACT Aspire were created to be aligned with the ACT College Readiness Benchmarks used with EPAS. Similar to EPAS, each ACT Aspire grade and subject has its own ACT Readiness Benchmark. Students at or above the benchmark are on target to meet the corresponding ACT College Readiness Benchmarks in grade 11.


13.2 ACT Readiness Benchmarks for English, Mathematics, Reading, and Science

For English, mathematics, reading, and science, the ACT Readiness Benchmarks for grades 8 through 10 were derived using the EPAS to ACT Aspire concordance tables (see Chapter 12). Benchmarks for grades 3–7 were created using a backmapping procedure.

13.2.1 Grades 8–10

A concordance between the EPAS 1–36 scale and the ACT Aspire three-digit scale was used to establish ACT Readiness Benchmarks for grades 8–10 using ACT College Readiness Benchmarks that had already been established for EPAS. The concordance study included students taking both the EPAS and ACT Aspire tests. The concordance between EPAS scale scores and ACT Aspire scale scores was obtained using equipercentile linking. The ACT College Readiness Benchmarks for grades 8, 9, and 10 were used to identify the corresponding concorded ACT Aspire scores, which were then defined as the ACT Readiness Benchmarks for grades 8–10. Note that the concorded benchmarks derived for grades 8–10 may be updated as more longitudinal data become available.
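To illustrate the equipercentile linking idea, the following minimal Python sketch matches percentile ranks across the two score distributions. Function and variable names are illustrative; the operational analysis followed Kolen and Brennan (2014) and included refinements, such as smoothing, that this toy version omits.

```python
import numpy as np

def percentile_rank(scores, x):
    # Midpoint percentile rank: percent scoring below x plus half
    # the percent scoring exactly x.
    scores = np.asarray(scores)
    return 100 * (np.mean(scores < x) + np.mean(scores == x) / 2)

def equipercentile_concordance(epas_scores, aspire_scores, epas_points):
    # For each EPAS score point, find the ACT Aspire score whose
    # percentile rank is closest to that of the EPAS score.
    aspire_points = np.arange(min(aspire_scores), max(aspire_scores) + 1)
    aspire_pr = np.array([percentile_rank(aspire_scores, y) for y in aspire_points])
    return {x: int(aspire_points[np.argmin(np.abs(aspire_pr - percentile_rank(epas_scores, x)))])
            for x in epas_points}
```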

The ACT College Readiness Benchmarks used in this analysis differ from the set typically reported for ACT Explore and ACT Plan. This alternate set of benchmarks was based on students tested in spring, while the EPAS benchmarks typically used for grades 8–10 are based on students tested in fall. ACT Aspire is administered in both fall and spring, and benchmarks would be expected to differ slightly within the same grade because of the additional instruction and academic growth that occur throughout the school year. Spring benchmarks were used because they anchored the ACT Readiness Benchmarks to spring levels of educational development, near the end of a grade level, when ACT Aspire was anticipated to be most commonly administered. Therefore, the ACT Readiness Benchmarks used for ACT Aspire are based on students tested in spring and reflect performance levels of students toward the end of an academic year; this should be kept in mind when interpreting results from fall testing. Table 13.1 presents the ACT College Readiness Benchmarks on the 1–36 scale for grades 8, 9, and 10 used in this study and the corresponding ACT Readiness Benchmarks, which are the concorded ACT Aspire scores obtained from the concordance. As shown in the table, ACT College Readiness Benchmarks and ACT Readiness Benchmarks are based on scale scores, and only one set of such benchmarks is available. Students' scale scores are made comparable through the equating process regardless of whether they take the ACT Aspire tests online or on paper. For more details about equating, see Chapter 20.


Table 13.1. ACT College Readiness Benchmarks and ACT Readiness Benchmarks for Grades 8–10

Subject        Grade    ACT College Readiness     ACT Readiness Benchmark
                        Benchmark (Spring)        (Concorded ACT Aspire Score)

English 8 13 422

9 15 426

10 16 428

Mathematics 8 17 425

9 18 428

10 20 432

Reading 8 17 424

9 18 425

10 20 428

Science 8 19 427

9 20 430

10 21 432

Agreement rates for classifying students at or above both sets of benchmarks were compared using data from the concordance study. There was agreement if students were classified into the same category using both the EPAS benchmarks and the ACT Aspire benchmarks for their particular grade. As shown in Table 13.2, the agreement rates for all subjects were at or above 80%.

Table 13.2. Classification Agreement Rate between ACT Aspire and EPAS Benchmarks

               ACT Aspire Classification Agreement Rate
Subject        ACT Explore               ACT Plan

English 81% 80%

Mathematics 82% 85%

Reading 81% 81%

Science 83% 84%

13.2.2 Grades 3–7

ACT Readiness Benchmarks for grades 3–7 in English, mathematics, reading, and science were backmapped from the grade 8 ACT Readiness Benchmark described above. Backmapping used a z-score approach, which involved identifying the ACT Aspire scores at a standardized score (z-score) corresponding to the z-score of the grade 8 ACT Readiness Benchmark. Specifically, a z-score (z_8) corresponding to the grade 8 ACT Readiness Benchmark was computed as the difference between the benchmark score (x) and the mean (μ) of grade 8 scale scores, divided by the standard deviation (σ) of those scores: z_8 = (x − μ)/σ. This z-score was then used to obtain the Readiness Benchmarks for grades 3–7 (x_j) based on the standard deviation (σ_j) and mean (μ_j) of the scale scores for each of those grades, with the formula x_j = z_8·σ_j + μ_j, where j represents a grade (j = 3, 4, 5, 6, 7). Spring 2013 ACT Aspire special study data were used to create benchmarks for grades 3–7. Table 13.3 shows the backmapped ACT Readiness Benchmarks for grades 3–7. Note that the backmapped benchmarks derived for grades 3–7 will be reviewed and may be updated as more longitudinal data become available.
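A minimal sketch of the z-score backmapping computation, assuming arrays of grade-level scale scores are available; the names and the choice of population standard deviation are illustrative, not taken from the manual.

```python
import numpy as np

def backmap_benchmarks(benchmark_g8, g8_scores, scores_by_grade):
    # z-score of the grade 8 benchmark in the grade 8 distribution
    z8 = (benchmark_g8 - np.mean(g8_scores)) / np.std(g8_scores)
    # x_j = z8 * sigma_j + mu_j, rounded to the integer score scale
    return {grade: int(round(z8 * np.std(s) + np.mean(s)))
            for grade, s in scores_by_grade.items()}

# Hypothetical usage with spring 2013 special study data:
# backmap_benchmarks(422, g8_english, {3: g3, 4: g4, 5: g5, 6: g6, 7: g7})
```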

Table 13.3. ACT Readiness Benchmarks for Grades 3–7

Grade English Mathematics Reading Science

3 413 413 415 418

4 417 416 417 420

5 419 418 420 422

6 420 420 421 423

7 421 422 423 425

13.3 ACT Readiness Benchmark for Writing

The ACT Readiness Benchmark for writing was set as the scale score corresponding to raw scores at a cut determined by ACT content experts, based on their knowledge, expertise, and experience with writing content and their development of the rubrics used to score writing prompts. A domain score of 4, which corresponds to performance consistent with grade-level expectations on the rubric, formed the basis for the writing cut. Recall that writing scores consist of four domains, each scored on a rubric from 1 to 5 (grades 3–5) or 1 to 6 (grades 6–10). Content experts determined that a domain score of 4 was at the expected readiness level for each grade level. Across the four domains, a student who obtained two fours and two threes would be considered at the cut for identifying ready performance on the spring 2013 writing base form. This pattern of four domain scores corresponded to a scale score of 428 on the base form, and a scale score of 428 is used as the cut score for writing at all grade levels. The writing benchmark may be updated as more longitudinal data become available.


13.4 ACT Readiness Benchmarks for ELA and STEM

The ACT Readiness Benchmarks for ELA and STEM are computed as the average of the subject Readiness Benchmarks that contribute to each score. ACT Readiness Benchmarks for ELA are the average of the benchmarks for the English, reading, and writing tests. Prior to fall 2015, ACT Readiness Benchmarks for STEM were the average of the benchmarks for the mathematics and science tests. Table 13.4 lists the ACT Readiness Benchmarks for ELA and STEM across grade levels. These benchmarks could have been established through a separate backmapping process, but the average was used because it was anticipated to be simpler for users and less likely to lead to confusion, and there was no such score or benchmark on the ACT test to backmap from at that time. However, starting in fall 2015, both ELA and STEM scores were reported for the ACT tests and a benchmark for the STEM score was established for the ACT test; accordingly, the ACT Aspire STEM benchmark was also updated. Please see the "2016 Updates" chapter of this technical manual for more details.1

Table 13.4. ACT Readiness Benchmarks in ELA and STEM for Grades 3–10

Grade ELA STEM

3 419 416

4 421 418

5 422 420

6 423 422

7 424 424

8 425 426

9 426 429

10 428 432

13.5 ACT Readiness Levels

In addition to the ACT Readiness Benchmarks, students are provided with a description of how they perform relative to the ACT Readiness Benchmarks in English, mathematics, reading, science, and writing. This description is called the ACT Readiness Level and includes four categories defined by three cut scores for each subject and each grade: a high cut above the benchmark, the benchmark itself, and a low cut below the benchmark.

1 A student's subject-area readiness can be inconsistent with the ELA or STEM readiness indicator because of the compensatory nature of the ELA and STEM scores, in which subject-area scores are combined to obtain the ELA and STEM scores. For example, a student could be below the mathematics benchmark and still be above the STEM benchmark if the student had a relatively high science test score; in such a case, the science test score could pull the STEM score up enough to meet the STEM benchmark.


The four ACT Readiness Levels include:

1. Exceeding: at or above the high cut

2. Ready: at or above the benchmark and below the high cut

3. Close: at or above the low cut and below the benchmark

4. In Need of Support: below the low cut

Each of the four ACT Readiness Levels is intended to provide a description of how students perform relative to the ACT Readiness Benchmarks in each subject area.

The high and low cuts were set considering the standard error of measurement (SEM; Geisinger, 1991). Specifically, for all subjects except writing, the high cut was set two SEMs above the Benchmark and the low cut two SEMs below it. Two SEMs were chosen because they represent a substantial deviation from the benchmark. From a statistical perspective, under typical assumptions, two standard errors around the benchmark represent a roughly 95% confidence interval for the Ready and Close categories: we can be about 95% confident that scores falling outside this range are indeed above or below the benchmark. Of course, values other than two SEMs could have been chosen to define the ACT Readiness Levels, but two SEMs was deemed a reasonable compromise between adequate representation of the descriptors Exceeding and In Need of Support and statistical characteristics indicative of deviation from Ready and Close.
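As a sketch of this classification rule for the non-writing subjects, the function below derives the cuts from a benchmark and an SEM; names are illustrative, and the operational cuts in Table 13.5 reflect whatever rounding was applied when they were set.

```python
def readiness_level(score, benchmark, sem):
    high_cut = benchmark + 2 * sem  # "Exceeding" at or above this
    low_cut = benchmark - 2 * sem   # "In Need of Support" below this
    if score >= high_cut:
        return "Exceeding"
    if score >= benchmark:
        return "Ready"
    if score >= low_cut:
        return "Close"
    return "In Need of Support"

# Grade 8 English in Table 13.5 (low cut 415, benchmark 422, high cut 429)
# implies an SEM of about 3.5: readiness_level(417, 422, 3.5) -> "Close"
```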

For writing, once the ACT Readiness Benchmark was established, the low and high cuts were set by content experts based on rubrics. Specifically, the low cut was defined as the scale score on the spring 2013 writing base form that corresponded to two threes and two twos on the four domains. The high cut was defined as the scale score on the spring 2013 writing base form that corresponded to two fives and two fours on the four domains. All benchmarks, low cuts, and high cuts are presented in Table 13.5.


Table 13.5. Benchmarks, Low Cuts, and High Cuts for ACT Readiness Benchmarks by Subject and Grade

Subject Tested Grade Low Cut Benchmark High Cut

English 3 408 413 418

4 411 417 423

5 412 419 426

6 413 420 427

7 413 421 429

8 415 422 429

9 419 426 433

10 421 428 435

Mathematics 3 409 413 417

4 411 416 421

5 412 418 424

6 414 420 426

7 416 422 428

8 419 425 431

9 422 428 434

10 426 432 438

Reading 3 411 415 419

4 412 417 422

5 415 420 425

6 416 421 426

7 417 423 429

8 418 424 430

9 419 425 431

10 422 428 434

Science 3 414 418 422

4 415 420 425

5 417 422 427

6 418 423 428

7 420 425 430

8 422 427 432

9 424 430 436

10 426 432 438

Writing 3 420 428 436

4 420 428 436

5 420 428 436

6 420 428 436

7 420 428 436

8 420 428 436

9 420 428 436

10 420 428 436


13.6 Readiness Ranges for Reporting Categories

In order to provide students with more detailed information within each subject, items that measure the same skills and abilities are grouped into reporting categories. For each reporting category, the total number of points possible, the total number of points the student earned, and the percentage of points correct are provided.

In addition, the ACT Readiness Range in each reporting category is provided to show how a student who has met the ACT Readiness Benchmark in a particular subject area would typically perform within the reporting category. In this way students can compare the percentage of points achieved in each category to the percentage of points attained by a typical student who is on track to be ready. If their scores fall below the ACT Readiness Range, they may be in need of additional support.

The minimum of the ACT Readiness Range for each reporting category corresponds to the predicted percent correct on that reporting category at the ACT Readiness Benchmark for that subject test. The maximum corresponds to 100% correct. Regression was used to predict the percent correct on the reporting category using the scale score at the Benchmark point. Note that the ACT Readiness Range will differ across forms.
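The paragraph above implies a simple computation for the lower end of each range. The sketch below is an assumption-laden illustration rather than the documented procedure: the manual does not specify the regression's functional form, so ordinary least squares is assumed, and all names are hypothetical.

```python
import numpy as np

def readiness_range_minimum(scale_scores, pct_correct, benchmark):
    # Regress percent correct on a reporting category onto scale score,
    # then evaluate the fitted line at the subject's Readiness Benchmark.
    slope, intercept = np.polyfit(scale_scores, pct_correct, deg=1)
    return slope * benchmark + intercept  # the range maximum is 100%
```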


CHAPTER 14

ACT Aspire Growth

ACT Aspire supports interpretations of student- and group-level growth. Score reports include the following components, each of which contributes to interpretations of growth:

• A student’s current and prior-year scores in all tested subjects (English, mathematics, reading, science, and writing)

• Comparison to ACT Readiness Benchmarks (Chapter 13) that indicate whether students are on target to meet the ACT College Readiness Benchmarks in grade 11

• Classification of a student into ACT Readiness Levels (Chapter 13) that describe where a student scores relative to the ACT Readiness Benchmarks

• Predicted score paths that provide ranges for a student’s expected scores in future years (and predicted score ranges on the ACT for grades 9 and 10)

• Classification of student growth as “Low,” “Average,” or “High,” based on student growth percentiles (SGPs).

The latter two components (predicted score paths and SGPs) are described in this chapter. Other topics discussed in this chapter include ACT Aspire gain scores, growth-to-standards models, measurement error of growth scores, and aggregate growth scores for research and evaluation.

14.1 Predicted Score Paths

Predicted score paths are reported to enhance understanding of where a student (or a group of students) is likely to score in future years, assuming typical growth. This information can be used:

• To determine if students are likely to meet ACT Readiness Benchmarks over the next two years


• To identify students who are unlikely to meet a future-year standard (other than the ACT Readiness Benchmark) and thus are candidates for extra academic support

• To predict aggregate future achievement for a classroom, school, district, or state

• To predict ACT score ranges (for grades 9 and 10)

Predicted ACT Aspire score ranges are provided for English, mathematics, reading, science, and writing. For grades 9 and 10, predicted grade 11 ACT scores are provided for the same subjects, as well as ACT Composite scores. Writing scores are not vertically scaled across grade levels, and predicted paths for writing scores only encompass one year.

Figure 14.1. Prototype of ACT Aspire Student Progress Report

14.1.1 Student Progress Reports

An example Student Progress Report that illustrates the predicted path is shown in Figure 14.1. The predicted path is represented by a cone-shaped, orange-shaded area that covers two years. In this example, an eighth-grade student scored 417 on the ACT Aspire English assessment. The student's predicted path covers the score range 416–424 for grade 9 and 415–431 for grade 10. (Note that the numbers forming the predicted score ranges do not appear on the progress report and are subject to change over time.) Assuming typical growth, it is estimated that the two-year predicted score ranges cover about 75% of actual future scores. The one-year predicted ranges cover about 50% of actual future scores.

14.1.2 Aggregate Progress Reports

Predicted mean scores are used to form aggregate one-year predicted paths for classrooms, schools, and districts. The aggregate predicted paths are drawn as lines connecting the current-year mean score to the next year's predicted mean score.

Predicted mean scores for classrooms, schools, and districts are calculated as the mean of individual student predicted scores. Individual student predicted scores are based on the estimated mean of the test score distribution, conditional on the prior-year test score. A prototype Aggregate Progress Report is shown in Figure 14.2. In this example, the mean ACT Aspire science score was 418 for a group of grade 9 students. The predicted grade 10 mean ACT Aspire science score for the same group is plotted, and the predicted grade 11 mean science score on the ACT is 17.8.


We summarize the methods used to develop the predicted paths later in this chapter. The development of the predicted paths that were used for reporting in spring 2014 through spring 2015 is documented in a separate report (Allen, 2014).

Figure 14.2. Prototype ACT Aspire Aggregate Progress Report

14.1.3 Samples Used to Develop the Predicted Paths

Students who tested in multiple years form a longitudinal sample of ACT Aspire-tested students that was used to develop the predicted paths used for ACT Aspire reports. Each year, the longitudinal sample is updated with data from the most recent testing year. Here, we describe the sample used in 2015 to develop the predicted paths used for reporting in spring 2016. The longitudinal sample was formed by matching ACT Aspire student records from spring 2013, spring 2014, and spring 2015. For grades 9–11 and 10–11, the longitudinal samples were formed by matching ACT Aspire student records from spring 2013 and spring 2014 to ACT test records from spring 2014 and spring 2015.

Table 14.1 provides the sample sizes for each grade level and subject area. For writing (all grade levels) and grade 9 test scores, one-year predicted paths were developed using students tested in consecutive years (e.g., grade level pairs 3–4, 4–5, 5–6, 6–7, 7–8, 8–9, and 9–10). For all other subject areas, two-year predicted paths were developed using students tested two years apart (3–5, 4–6, etc.). Predicted ACT score ranges for grade 10 are based on students tested in consecutive years with ACT Aspire and the ACT, while predicted ACT score ranges for grade 9 are based on students who tested two years apart with ACT Aspire and the ACT.


Table 14.1. Longitudinal Samples Used to Develop Predicted Paths, Predicted ACT Score Ranges, and Student Growth Percentiles for Spring 2016 Reporting

Grade Level Pair    English    Mathematics    Reading    Science    Writing    Composite

3–4 16,658 61,514 61,615 15,371 13,935

3–5 4,858 9,473 9,566 7,882

4–5 16,949 61,696 61,640 17,651 15,062

4–6 4,969 9,503 9,510 4,956

5–6 16,498 60,824 60,897 12,974 14,132

5–7 5,196 9,139 9,154 7,984

6–7 15,456 59,407 59,382 15,648 15,240

6–8 3,996 6,996 6,937 3,530

7–8 14,768 58,884 58,810 11,452 15,177

7–9 744 717 711 694

8–9 3,874 4,148 3,924 3,876 3,363

8–10 453 423 405 357

9–10 6,436 6,634 6,600 6,172 4,862

9–11 (the ACT) 2,414 2,085 2,002 2,085 392 1,515

10–11 (the ACT) 11,275 10,916 10,923 10,720 6,444 9,243

*Concordance-derived sample

14.1.4 Statistical Models Used to Develop the Predicted Paths

Two-year predicted paths are based on a regression model in which the grade g test score is used to predict the grade g+2 test score. The regression model uses linear and quadratic effects of the grade g score, and a 70% prediction interval forms the endpoints of the two-year predicted path score range. (The 70% prediction interval results in a coverage rate of about 75% because test scores take integer values and scores equal to the interval endpoints are counted as being within the interval.) The full predicted path, which encompasses one-year and two-year predictions, is drawn by connecting the current-year score to the endpoints of the two-year predicted path. This is the approach used for English, mathematics, reading, and science for grades 3–8.
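A minimal sketch of this model using statsmodels, under the assumption that paired grade g and grade g+2 scores are available as arrays; rounding of the interval endpoints to integer scale scores, and any operational details beyond the quadratic fit and the 70% interval, are omitted.

```python
import numpy as np
import statsmodels.api as sm

def two_year_path(scores_g, scores_g2, new_scores_g, alpha=0.30):
    # Fit grade g+2 score on linear and quadratic terms of grade g score.
    X = sm.add_constant(np.column_stack([scores_g, np.square(scores_g)]))
    fit = sm.OLS(scores_g2, X).fit()
    # A 70% prediction interval (alpha = 0.30) gives the path endpoints.
    X_new = sm.add_constant(np.column_stack([new_scores_g, np.square(new_scores_g)]))
    frame = fit.get_prediction(X_new).summary_frame(alpha=alpha)
    return frame[["obs_ci_lower", "obs_ci_upper"]]
```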

For writing, all grade 9 test scores, and grade 10 predicted ACT score ranges, only one-year predicted score ranges are reported. For these cases, the predicted path endpoints are defined as the scores closest to the 25th and 75th percentiles of the score distribution at grade g+1, conditional on score at grade g. The quantiles of the conditional score distribution are estimated using the same methodology used to estimate student growth percentiles (described later in this chapter).


14.1.5 Limitations of the Predicted Paths

The predicted paths were developed using samples of students tested in multiple years between spring 2013 and spring 2015. A large percentage of students come from one state (Alabama, the first state to administer ACT Aspire on a statewide basis). With larger samples of students, the estimation of the predicted paths will have less sampling error. It is also possible that predicted path estimates could shift up or down with the inclusion of more school districts and states in the sample, or with changes in student growth over time. The predicted paths will be re-estimated as more data become available. The predicted paths are currently designed to have coverage rates of about 75% for the two-year paths and about 50% for the one-year paths.

14.2 Student Growth Percentiles

Student growth percentiles (SGPs) represent the relative standing of a student's current achievement compared to others with similar prior achievement. ACT Aspire SGPs, ranging from 1 to 100, are provided for students who test in consecutive years. They are used to classify students' growth into the following categories: "Low" (SGP < 25), "Average" (SGP between 25 and 75), or "High" (SGP > 75). ACT Aspire SGPs measure growth over one-year time intervals.

SGPs are estimated using quantile regression methods (Koenker, 2005) with the SGP R package (Betebenner, VanIwaarden, Domingue, & Shang, 2014). SGPs are reported for students who also tested the previous year with tests one grade level apart.
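The conditional-percentile idea can be sketched as follows. This toy version fits linear quantile regressions at each percentile and reports the highest percentile whose fitted value falls at or below the student's current score; the operational SGP package instead uses B-spline quantile regression with additional safeguards, and all names here are illustrative.

```python
import numpy as np
import statsmodels.api as sm

def student_growth_percentile(prior, current, prior_new, current_new):
    X = sm.add_constant(np.asarray(prior, dtype=float))
    x_new = np.array([1.0, prior_new])
    sgp = 1
    for q in range(1, 100):
        # Conditional q-th percentile of current score given prior score
        fit = sm.QuantReg(current, X).fit(q=q / 100)
        if x_new @ fit.params <= current_new:
            sgp = q
    return sgp
```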

When interpreting SGPs, the reference group used to estimate the model should always be considered. The sample sizes used to estimate the SGPs reported in fall 2015 and spring 2016 are given in Table 14.1 (for grade level pairs one year apart). The SGPs used for ACT Aspire are based on all students tested nationally in adjacent years, so the reference group is expected to change over time. Because of these expected changes and possible changes in the amount of student growth observed nationally over time, SGPs reported by ACT Aspire can shift over time, confounding comparisons of growth for different cohorts of students.

14.2.1 Consistency of SGPs across Subject Areas

Based on the same samples used to develop the predicted paths and SGPs, we estimated the correlations of SGPs across subject areas. A positive correlation indicates consistency between SGPs for different subject areas; a negative correlation indicates inconsistency. In Table 14.2, the correlations are presented for each pair of subject areas. The sample used to estimate each correlation includes each instance of different subject tests being taken in consecutive years. Results are presented for all grade levels combined.


All of the correlations in Table 14.2 are positive and range from 0.149 (writing and mathematics) to 0.273 (reading and science). The correlations provide some evidence of consistency between SGPs for different subject areas. Students who demonstrate high (or low) growth in one subject area are more likely to also demonstrate high (or low) growth in other subject areas. Possible explanations for this consistency in SGPs across subject areas include 1) development of knowledge and skills in one subject area may directly influence development in other subject areas, and 2) growth across subject areas may be driven by a shared factor, such as motivation. The correlations are relatively small, suggesting modest levels of consistency. The correlations are attenuated towards 0 because of measurement error in the SGPs.

Table 14.2. Correlations of SGPs across Subject Areas

Subject Area Pair           N          Correlation

English, Reading 99,414 0.233

English, Writing 67,233 0.157

English, Mathematics 99,438 0.195

English, Science 78,092 0.236

Reading, Writing 87,606 0.156

Reading, Mathematics 319,935 0.194

Reading, Science 92,207 0.273

Writing, Mathematics 87,601 0.149

Writing, Science 62,957 0.159

Mathematics, Science 92,397 0.227

14.3 ACT Aspire Gain Scores

ACT Aspire English, mathematics, reading, and science scores are vertically scaled across grade levels (see Chapter 10 for details on the scaling procedures), making it possible to monitor progress over time on the same scale. Writing scores are not vertically scaled across grade levels, but instead are based on scoring rubrics where a score of "4" indicates meeting expectations for that grade level. Consistency in writing scores over time suggests the same performance relative to grade-level expectations (see Chapter 10). For all subjects, viewing scores graphically over multiple years provides insights about a student's current achievement level, as well as progress made with respect to ACT Readiness Benchmarks and ACT Readiness Levels.

Gain scores are the arithmetic difference in scores from one year to the next. Gain scores are an attractive growth measure because of their simplicity and intuitive appeal. Unlike SGPs, calculations of gain scores are independent of reference groups. However, normative information from a reference group is still helpful for the interpretation of gain scores. For example, knowing the mean gain over one year can be used to determine whether a student's gain is higher or lower than average. For all subjects except writing, positive ACT Aspire gain scores are anticipated because students are expected to increase their knowledge and skills in the tested areas after one year of schooling.

Table 14.3. ACT Aspire Gain Score Means and Standard Deviations

Grade Level Pair    English     Mathematics    Reading     Science     Writing     Composite
                    Mean (SD)   Mean (SD)      Mean (SD)   Mean (SD)   Mean (SD)   Mean (SD)

3–4 3.8 (4.9) 3.1 (3.5) 2.6 (4.0) 3.3 (4.5) 2.3 (5.5)

4–5 2.9 (5.2) 2.2 (4.0) 2.5 (4.2) 2.2 (4.5) 1.0 (6.3)

5–6 2.1 (5.8) 2.9 (4.7) 2.3 (4.6) 1.9 (4.8) 1.0 (7.1)

6–7 2.5 (6.1) 0.1 (5.0) 0.3 (4.9) 1.1 (5.1) -2.9 (7.1)

7–8 1.8 (6.3) 2.1 (4.7) 2.5 (4.9) 2.8 (5.4) 1.4 (6.0)

8–9 1.0 (5.9) 1.4 (5.0) 0.0 (5.5) 1.9 (5.6) 1.7 (6.4) 1.1 (3.4)

9–10 1.0 (6.3) 1.3 (5.2) 1.1 (5.7) 1.3 (6.4) 1.9 (6.3) 1.2 (3.7)

Source: Based on data from spring 2013, spring 2014, and spring 2015.

For the longitudinal samples used to develop the ACT Aspire Student Growth Percentiles (see Table 14.1), gain score means and standard deviations are provided in Table 14.3. There is considerable variation across grade levels and subject areas in mean gain scores. For the subjects with vertical scales (all but writing), mean gain scores are typically positive, showing that students in the sample typically increased their knowledge and skills in the tested areas after one year of schooling. Mean gain scores are very small for mathematics grades 6–7 (0.1), reading grades 6–7 (0.3), and reading grades 8–9 (0.0). For the writing test, positive mean gain scores should not necessarily be expected because scores are not vertically scaled across grade levels; the mean gain scores for ACT Aspire writing ranged from 2.3 for grades 3–4 to −2.9 for grades 6–7.

14.3.1 Interpreting ACT Aspire Gain Scores

Because ACT Aspire scores in all subject areas except writing are reported on a common vertical scale, gain scores are intended to be interpreted as measuring change in knowledge and skills from one year's test to another. However, because no educational scale can be interpreted as having equal intervals in a strict sense, it should not be assumed that the meaning of gain scores is the same across the score scale. For example, it is possible that some regions of the score scale are more sensitive to student learning. Also, students who score at the low end of a scale are generally expected to have larger one-year gain scores than students who score at the high end. This phenomenon is illustrated in Figure 14.3 using the longitudinal sample of ACT Aspire-tested students who took the grade 7 reading test and the grade 8 reading test in consecutive years. The figure shows the mean gain score, as well as the 25th and 75th percentiles of the gain score distribution, for each grade 7 reading score point with a sample size of at least 100 students. The negative relationship between prior-year score and one-year gain score arises because test scores are only estimates of true achievement levels: students who score far above (below) the mean are more likely than others to have scored above (below) their true achievement level and tend to score closer to the mean the next year.

Figure 14.3. ACT Aspire Gain Score Statistics for Grade 7–8 Reading

Across grade levels and subject areas, the median correlation of prior-year score and gain score is −0.29; the 25th percentile is −0.41, and the 75th percentile is −0.25. For reading, the correlation of grade 7 reading score with one-year gain score is −0.29, so Figure 14.3 represents a typical strength of relationship between prior-year score and gain score.

14.4 Growth-to-Standards Models with ACT Aspire

Unlike normative growth measures such as SGPs, growth-to-standards models determine whether students are making sufficient progress toward a performance standard, such as the ACT Readiness Benchmarks or other ACT Readiness Levels. ACT Aspire supports growth-to-standards models because scores are reported over time and plotted against the ACT Readiness Benchmarks and ACT Readiness Levels (see Figure 14.1). The predicted paths also support growth-to-standards models because they indicate how students are likely to perform over the next two years, assuming typical growth. For example, if a fourth grader's predicted path falls below the ACT Readiness Benchmark for grades 5 and 6, atypically high growth over the next two years will be needed to reach the ACT Readiness Benchmark.


Growth-to-standards models can be implemented by specifying a performance standard such as the ACT Readiness Benchmark and specifying the amount of time students have to reach the performance standard. For the student represented by the example report in Figure 14.1, the student’s score (417) was 5 points below the grade 8 ACT Readiness Benchmark for English (422). If the model assumes students have two years to catch up, the student would need to gain 11 score points to reach the grade 10 ACT Readiness Benchmark for English (428). Under a growth-to-standards model, the student would need to gain at least 6 points from grade 8 to grade 9 to have made sufficient progress toward the grade 10 performance standard. SGPs can also be used within a growth-to-standards model to communicate how much progress is needed using the growth percentile metric instead of the gain score metric.
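One simple reading of this example treats sufficient progress as closing an equal share of the gap each year; the manual does not prescribe this rule, so the sketch below is illustrative only, with hypothetical names.

```python
import math

def sufficient_yearly_gain(current_score, target_benchmark, years):
    # Points needed per year to reach a future benchmark, assuming the
    # gap must be closed in equal annual increments.
    gap = max(target_benchmark - current_score, 0)
    return math.ceil(gap / years)

# The Figure 14.1 example: a score of 417 vs. a grade 10 English benchmark
# of 428 with two years to catch up implies a gain of at least 6 points
# from grade 8 to grade 9: sufficient_yearly_gain(417, 428, 2) -> 6
```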

14.5 Measurement Error of Growth Scores

Measures of individual student growth, including SGPs, are subject to measurement error. This means that a student's actual growth in academic achievement may differ from what is represented by their SGP (Wells, Sireci, & Bahry, 2014) or gain score. All test scores have measurement error, and the measurement errors of SGPs and gain scores are more pronounced because the measurement error of multiple test scores is compounded. In tests with vertical scales such as ACT Aspire, negative gain scores may be observed, due in part to measurement error. Because of their measurement error, neither SGPs nor gain scores should be used as the sole indicator of a student's academic progress over one year. Instead, SGPs and gain scores can complement other evidence of learning, such as performance on homework and quizzes, unit tests, and teacher observation.

14.6 Aggregate Growth Scores for Research and Evaluation

For classrooms, schools, districts, states, and other user-defined groups, aggregate growth statistics are available to describe how much growth occurred in each group. Mean scale scores are plotted on longitudinal progress reports, and the percentage of students in each growth category (low, average, or high) is reported. It is also possible to calculate other summary growth measures, such as the mean or median SGP, using available data.

Data from ACT Aspire may be used as one indicator of program effectiveness. One of the secondary interpretations of ACT Aspire scores involves providing empirical data for inferences related to accountability (see Chapter 1). Before using ACT Aspire for evaluating program effectiveness, a content review is critical to ensure that ACT Aspire measures important outcomes of the program. For example, a district might want to use the multiyear change in mean SGP of 10th-grade English students as one measure of the effectiveness of a new 10th-grade English curriculum. Prior to implementation, ACT Aspire content should be reviewed against the intended outcomes of the program to evaluate its appropriateness for measuring curriculum effectiveness.


14.6.1 Normative Data for School Mean SGPs

If used for evaluating educator effectiveness, ACT Aspire growth data should be one of multiple sources of evidence. As the stakes for particular uses increase, it becomes more important to carefully evaluate ACT Aspire score interpretations for those uses and to gather multiple sources of evidence. Measures of aggregate growth include the mean and median SGP. Research suggests that the mean SGP may have advantages over the median SGP in terms of efficiency, greater alignment with expected values, and greater robustness to scale transformations (Castellano & Ho, 2015).

We now examine distributions of school mean SGPs (MGPs) by subject area. To derive summary statistics of the distributions of school mean SGPs, the following steps were taken (a minimal sketch of these steps follows the list):

1. For each combination of school, grade level, and subject area with at least ten students tested in consecutive grade levels, the mean SGP is calculated.

2. For each subject area, summary statistics of mean SGP, including N (the number of combinations of school and grade level pairs), mean, standard deviation, and selected percentiles, are calculated.
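A pandas sketch of the two steps, assuming a student-level data frame with illustrative column names ('school', 'grade_pair', 'subject', 'sgp'); these names are not from the manual.

```python
import pandas as pd

def school_mgp_summary(df):
    # Step 1: mean SGP per school/grade-pair/subject, requiring at
    # least ten students tested in consecutive grade levels.
    mgp = (df.groupby(["school", "grade_pair", "subject"])["sgp"]
             .agg(mgp="mean", n="size")
             .query("n >= 10"))
    # Step 2: summary statistics of the MGPs within each subject area.
    return mgp.groupby("subject")["mgp"].describe(
        percentiles=[.05, .10, .20, .25, .30, .40,
                     .50, .60, .70, .75, .80, .90, .95])
```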

The results presented in Table 14.4 (see also Figure 14.4) can be used to aid interpretation of school mean SGP values. For example, a mean mathematics SGP of 40 is between the 10th and 20th percentiles. Conversely, a mean mathematics SGP of 60 is between the 80th and 90th percentiles. The results in Table 14.4 are combined across grade levels, and so do not reveal possible variation in mean SGP across grade levels.

It is also important to note that the distribution of school mean SGP is affected by the size of the group. When mean SGP is based on a smaller sample size, it is less precise, and extreme mean SGP values are more likely to result.


Table 14.4. Distributions of School MGPs, by Subject Area

Statistic        English    Mathematics    Reading    Science    Writing

N students 100,134 322,142 321,935 91,381 86,635

N units* 1,093 3,285 3,279 1,084 1,130

Mean 49.6 49.2 49.5 50.0 50.9

SD 9.1 9.9 8.4 9.9 11.4

Percentiles

5 35 34 36 34 32

10 39 37 39 37 36

20 42 41 42 42 41

25 43 42 44 43 43

30 45 44 45 45 45

40 47 46 48 47 48

50 49 49 50 50 50

60 52 51 52 52 54

70 54 54 54 55 57

75 56 56 55 57 58

80 58 57 56 58 60

90 61 62 60 63 65

95 65 66 64 66 70

*The number of school/grade level combinations.

Figure 14.4. Distributions of School MGPs, by Subject Area (mean SGP by percentile for English, mathematics, reading, science, and writing)


14.6.2 Mean SGP Differences across Student Subgroups

The mean SGP can be used to identify growth differences across districts, schools, classrooms, or other groups. However, this alone does not explain the reasons for the growth differences. When comparing mean SGPs across groups, it is important to consider whether differences in the student composition of the groups could explain the group differences. For example, a school serving economically disadvantaged students might be expected to have a lower mean SGP than a school serving students from affluent families.

Analyses were conducted to examine SGP differences across student subgroups. The subgroups are defined by race/ethnicity, gender, English language learner status, economic disadvantage status, and special education status. All subgroup indicators are provided by school or district personnel. Special education status is determined by student participation in an individualized education program (IEP). In Table 14.5, we present the SGP sample sizes for selected student subgroups, using all available longitudinal data collected through spring 2015. In Table 14.6, we present the mean SGP for selected student subgroups. Results are aggregated across grade levels.

By race/ethnicity, mean SGPs are consistently highest for Asian students and lowest for African American students. The mean SGPs for white students were slightly above average (50), ranging from 51.6 in English to 53.2 in writing. For Hispanic students, mean SGPs ranged from 45.8 in English to 48.3 in mathematics and writing. The gender differences are relatively small, with the mean SGP for females consistently larger than the mean SGP for males. The largest gender difference is observed for writing (55.2 for females, 47.7 for males) and the smallest for mathematics (50.9 for females, 49.2 for males).

Mean SGPs for English language learners were below average, ranging from 42.9 in English to 47.0 in mathematics. Similarly, for economically disadvantaged students, mean SGP ranged from 43.9 in English to 49.6 in writing. Across student subgroups, mean SGPs were lowest for special education students, ranging from 36.0 in writing to 38.4 in mathematics.


Table 14.5. SGP Sample Size by Subject Area and Student Subgroup

Subgroup English Mathematics Reading Science Writing

Race/ethnicity

Black/African American 23,828 95,901 96,195 23,145 21,664

American Indian/Alaska Native 1,246 2,964 2,934 844 1,024

Asian 1,965 4,700 4,670 1,423 1,159

Hispanic or Latino 6,237 19,827 19,715 6,004 5,518

Native Hawaiian/Other Pacific Islander

Two or more races 668 664 672 642 462

White 59,353 188,028 187,590 53,639 51,391

Gender

Female 53,095 161,944 161,748 48,965 45,095

Male 48,064 161,302 161,250 44,137 42,543

English language learner 1,310 1,500 1,482 1,314 1,261

Economically disadvantaged 6,685 9,927 9,943 6,618 5,778

Special education 2,377 4,608 4,609 2,469 2,190

Table 14.6. Mean SGP by Subject Area and Student Subgroup

Subgroup English Mathematics Reading Science Writing

Race/ethnicity

Black/African American 44.8 44.1 45.7 44.1 46.5

American Indian/Alaska Native 49.9 50.6 51.4 49.2 49.6

Asian 55.7 61.3 58.3 56.4 57.6

Hispanic or Latino 45.8 48.3 47.9 47.1 48.3

Native Hawaiian/Other Pacific Islander

Two or more races 47.5 48.5 48.0 47.7 49.8

White 51.6 52.8 52.0 51.9 53.2

Gender

Female 51.7 50.9 51.7 50.8 55.2

Male 47.7 49.2 48.2 48.7 47.7

English language learner 42.9 47.0 45.2 46.1 46.5

Economically disadvantaged 43.9 46.2 47.6 45.1 49.6

Special education 36.7 38.4 37.8 38.0 36.0


14.7 Additional Growth Modeling Resources

For additional information on growth models, see ACT Growth Modeling Resources online at http://www.act.org/content/act/en/research/act-growth-modeling-resources.html. Student growth percentile lookup tables for ACT Aspire are provided under "Student Growth Percentile Model."

For information that can help parents, teachers, and administrators interpret growth data, see “Answers to questions about student growth” (parent, teacher, and administrator versions), available at:

http://www.act.org/content/dam/act/unsecured/documents/answers-to-administrators-Questions-about-Student-growth.pdf

http://www.act.org/content/dam/act/unsecured/documents/answers-to-teachers-Questions-about-Student-growth.pdf

http://www.act.org/content/dam/act/unsecured/documents/answers-to-Parents-Questions-about-Student-growth.pdf


CHAPTER 15

Progress toward Career Readiness

This chapter describes the procedures and results of a study used to develop predicted performance levels on the ACT National Career Readiness Certificate using ACT Aspire scores. This predicted performance was used to develop the Progress toward Career Readiness indicator included on ACT Aspire score reports.

15.1 Introduction

For decades it has been a commonly held belief that high school students planning to go to college need to take more rigorous coursework than those going directly into the workforce. Today, however, many employers report that in an expanding global economy, entry-level workers need many of the same types of knowledge and skills as college-bound students (ACT, 2006). To help students predict and monitor whether they are on track to be career ready toward the end of high school using ACT Aspire performance, students at grade 8 through EHS with Composite scores (i.e., receiving scores on the English, mathematics, reading, and science tests) receive a Progress toward Career Readiness indicator.

The ACT National Career Readiness Certificate™ (ACT NCRC®), awarded based on ACT WorkKeys® test results, is a portable credential that demonstrates achievement and is based on assessment level scores associated with workplace employability skills in three areas: applied mathematics, locating information, and reading for information. Assessment level scores typically run from Level 3 to Level 6 or 7. In addition, the ACT NCRC has four levels of certificate: Bronze, Silver, Gold, and Platinum. A Platinum certificate is earned if the three assessment level scores a student earns are all Level 6 or higher. A Gold certificate is earned if the three assessment level scores are all Level 5 or higher. A Silver certificate is earned if the three assessment level scores are all Level 4 or higher. A Bronze certificate is earned if the three assessment level scores are all Level 3 or higher. If any of the three assessment level scores is below Level 3, no certificate is achieved (ACT, 2007).
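These certificate rules reduce to a check on the minimum of the three level scores, as in this minimal sketch (function name illustrative):

```python
def ncrc_certificate(level_scores):
    # The certificate is determined by the lowest of the three
    # ACT WorkKeys assessment level scores.
    floor = min(level_scores)
    if floor >= 6:
        return "Platinum"
    if floor >= 5:
        return "Gold"
    if floor >= 4:
        return "Silver"
    if floor >= 3:
        return "Bronze"
    return "No Certificate"

# e.g., ncrc_certificate([5, 6, 4]) -> "Silver"
```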

The actual knowledge and skills required across careers and career domains may differ greatly, and the ACT and ACT Aspire do not measure all such knowledge and skills. Rather, the ACT and ACT Aspire measure academic achievement, or foundational skills, which are a subset of the skills associated with career readiness. The linkage between ACT Aspire and the ACT NCRC is based only on these academic skills and provides a prediction of future performance. Because the constructs and content of the ACT and ACT Aspire differ somewhat from those of the ACT WorkKeys tests used in the ACT NCRC, a prediction method was used to indicate whether a student is likely to meet the Bronze, Silver, or Gold ACT NCRC level at the completion of high school.

The Progress toward Career Readiness cut scores were created for grades 8–11 to indicate progress toward career readiness and to predict future ACT NCRC performance (see Figure 15.1). The first step in creating the cut scores was linking ACT NCRC level to the ACT Composite scores using a prediction method. A concordance was then used to find the ACT Aspire Composite scores that corresponded to each of the cut scores on the EPAS scale. Finally, backmapping was used to obtain the cut scores associated with the Bronze, Silver, and Gold ACT NCRC levels for grades 8–10.

Figure 15.1. Sample Progress toward Career Readiness Indicator from an ACT Aspire Report


15.2 Link ACT NCRC Levels to the ACT Composite Scores

Data from over 110,000 grade 11 students who took ACT WorkKeys and the ACT were used to establish the link between ACT Composite scores and ACT NCRC levels. Four separate logistic regressions were conducted to determine the ACT Composite score that corresponded to a 50% chance of obtaining each of the four ACT NCRC levels. The ACT Composite score was the independent variable, and the status of whether a student was at or above a specific ACT NCRC level was the dependent variable (similar methods have been used for setting the ACT College Readiness Benchmarks; e.g., Allen & Sconing, 2005).
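A minimal sketch of one such regression and the 50%-chance cut it implies; names are illustrative, and details of the operational analysis beyond this basic setup are not documented here. At a probability of 0.5 the logit is zero, so the cut score is −intercept/slope.

```python
import numpy as np
import statsmodels.api as sm

def composite_cut_for_level(act_composite, at_or_above_level):
    # at_or_above_level: 0/1 indicator of reaching a given NCRC level.
    X = sm.add_constant(np.asarray(act_composite, dtype=float))
    fit = sm.Logit(at_or_above_level, X).fit(disp=0)
    intercept, slope = fit.params
    return -intercept / slope  # unrounded cut; round up for reporting
```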

Table 15.1 provides descriptive statistics of the ACT Composite scores for students at different ACT NCRC levels. As ACT NCRC level increases, the mean and median of ACT Composite scores increase, which is indicative of the positive relationship between ACT NCRC level and ACT Composite scores. The variability in performance on the ACT increases as ACT NCRC level increases through the Gold level. Between the Gold and Platinum levels, variability decreases slightly, which is likely the result of instability in the standard deviation due to the relatively small number of students in the Platinum group compared to the others.

Table 15.2 presents the logistic regression results—unrounded and rounded ACT Composite scores that are associated with a 50% chance of obtaining different ACT NCRC levels. For example, an ACT Composite score of 17 is required for a 50% chance of obtaining the ACT NCRC Silver certificate or higher while an ACT Composite score of 25 is required for a 50% chance of obtaining the ACT NCRC Gold certificate or higher.

Table 15.1. Descriptive Statistics of ACT Composite Scores by ACT NCRC Level

ACT NCRC Level     N         Mean     Median    SD      Min    Max

No Certificate 13,628 13.69 13 2.28 4 32

Bronze 23,726 15.86 16 2.64 6 33

Silver 50,087 19.76 20 3.56 7 35

Gold 25,669 25.20 25 3.81 11 36

Platinum 1,497 30.33 31 3.15 18 36


Table 15.2. ACT Composite Scores Corresponding to a 50% Chance of Obtaining Different ACT NCRC Levels

                   ACT Composite Cut Score
ACT NCRC Level     Unrounded      Rounded

Bronze 12.74 13

Silver 16.43 17

Gold 24.15 25

Platinum 34.80 35

To ensure each final ACT Composite cut score corresponds to at least a 50% chance of receiving an ACT NCRC level, all unrounded Composite scores were rounded up to the final reported cut values. These four cut values are based on a sample of grade 11 students, so they should be interpreted as an indicator of career readiness for grade 11 students.

15.3 Link the EPAS Composite Scores to ACT Aspire Composite Scores

The second step in the linkage between ACT Aspire and ACT NCRC certificate levels was to establish a concordance between Composite scores on the EPAS tests (ACT Explore, ACT Plan, or the ACT) and ACT Aspire Composite scores. Data from the spring 2013 ACT Aspire tests were merged with historical EPAS student data files to obtain a sample of students with both EPAS and ACT Aspire Composite scores. Table 15.3 presents means, standard deviations, and the correlation between the EPAS and ACT Aspire Composite scores for the 13,528 students included in the sample.

Table 15.3. Descriptive Statistics of the Analysis Sample (N = 13,528)

               EPAS Composite Score    ACT Aspire Composite Score

Mean 16.66 424.05

SD 3.95 7.72

Correlation .86

The concordance relationship between EPAS and ACT Aspire Composite scores was estimated using the equipercentile method described by Kolen and Brennan (2014). The same methodology was used to find the concordance relationship between EPAS and ACT Aspire for the individual subjects. See Chapter 12 for details on the methodology.


15.4 Identify the ACT Aspire Composite Scores Corresponding to Each ACT NCRC Level

Once the concordance between the EPAS Composite scores and the ACT Aspire Composite scores was established, the Composite score on the ACT with constructed-response tests that best predicts each ACT NCRC level for grade 11 was taken directly from the concordance table. Table 15.4 presents the EPAS Composite scores corresponding to ACT NCRC levels and their concorded Composite scores on the ACT with constructed-response tests in grade 11. The Platinum level was excluded due to small sample sizes.

To obtain the Progress toward Career Readiness indicator for each ACT NCRC level for grades 8, 9, and 10, the grade 11 Progress toward Career Readiness cut scores were backmapped to the lower grades using the z-score method, the same procedure used to backmap the ACT Readiness Benchmarks in Chapter 13. For each grade, the ACT Aspire Composite score associated with an ACT NCRC level was the score corresponding to the standardized z-score of the EPAS Composite score for that grade 11 ACT NCRC level in the sample used for the EPAS-to-ACT-Aspire concordance analysis. Grade 11 z-scores associated with the Bronze, Silver, and Gold ACT NCRC levels were first computed. Then, for each of grades 8–10, the backmapped Composite score corresponding to a Bronze, Silver, or Gold certificate was identified by multiplying the z-score by the grade-specific scale score standard deviation and adding the grade-specific scale score mean. Table 15.5 presents the ACT Aspire Composite score for each of the three ACT NCRC levels for grades 8–10.

Table 15.4. Grade 11 EPAS and ACT Aspire Composite Scores Corresponding to ACT NCRC Levels

                  Corresponding EPAS    Corresponding Composite Score on the ACT
ACT NCRC Level    Composite Score       with Constructed-Response Tests (Grade 11)
Bronze            13                    416
Silver            17                    425
Gold              25                    439

Table 15.5. ACT Aspire Composite Scores Corresponding to a 50% Chance of Obtaining Different ACT NCRC Levels

          ACT NCRC Level
Grade     Bronze    Silver    Gold
8         415       422       434
9         415       423       436
10        416       425       439


15.5 Report Progress toward Career Readiness Using ACT Aspire Cut Scores

Students who take the four 8th-grade or EHS tests of English, mathematics, reading, and science obtain a Composite score and receive an indicator of their Progress toward Career Readiness (see Figure 15.1 for a sample from the score report). The student's Composite score is listed along with the Bronze, Silver, and Gold level ACT NCRC certificates. The indicator reported to the student is a verbal statement indicating which certificate level the student is making progress toward. A purpose of the indicator is to encourage students to think about the knowledge and skills future job training will require. Whether or not a student plans to enter college immediately after high school, there is likely a need for some type of postsecondary training. It is not possible to profile the requirements for all possible postsecondary paths a student may consider, but the ACT NCRC and the ACT WorkKeys job profiles provide a starting point for students to understand that even if they are not planning to go to college, there are still knowledge and skills they need to attain before they leave high school. The specific foundational knowledge and skills required across careers and career paths differ, and ACT Aspire may not measure all such skills.

The Progress toward Career Readiness indicator provides a statistical prediction of the likely ACT NCRC level a student would obtain toward the end of high school given the student’s current performance, but caution should be used in its interpretation. The Progress toward Career Readiness indicator is not a substitute for actual ACT NCRC level obtained by taking ACT WorkKeys. Actual performance could differ from the statistically predicted performance for a variety of reasons, including such factors as statistical uncertainty in the prediction, a student’s individual educational achievement, and a student’s growth trajectory.


Chapter 16

ACT Aspire Reliability

16.1 Introduction

This chapter presents the methods and results of ACT Aspire raw score, scale score, and composite score reliabilities and conditional standard errors of measurement.

Some degree of inconsistency or error is contained in the measurement of cognitive characteristics. A student who took one form of a test on one occasion and a second, parallel form on another occasion would likely earn somewhat different scores on the two administrations. These differences might be due to the student or the testing situation, such as different motivations or different levels of distraction across occasions, or to student growth between testing events. Differences across testing occasions might also be due to the particular sample of test items or prompts included on each test form. While procedures are in place to reduce differences across testing occasions, differences cannot be eliminated.

Reliability coefficients are estimates of the consistency of test scores. They typically range from zero to one, with values near one indicating greater consistency and those near zero indicating little or no consistency. The standard error of measurement (SEM) is closely related to test reliability. The SEM summarizes the amount of error or inconsistency in scores on a test. As the Standards for Educational and Psychological Testing states, "For each total score, subscore, or combination of scores that is to be interpreted, estimates of relevant indices of reliability/precision should be reported" (AERA, APA, & NCME, 2014, p. 43).


16.2 Raw Score Reliability

Reliability coefficients are usually estimated based on a single test administration by calculating the inter-item covariances. These coefficients are referred to as internal consistency reliability. Cronbach's coefficient alpha (Cronbach, 1951) is one of the most widely used estimates of test reliability and was computed for all of the ACT Aspire tests. Coefficient alpha can be computed using the following formula:

$$\hat{\alpha} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_x^2}\right) = \frac{k}{k-1} \cdot \frac{\sum_{i=1}^{k}\sum_{j \neq i} s_{ij}}{s_x^2}\,,$$

where $k$ is the number of test items, $s_i^2$ is the sample variance of the $i$th item, $s_{ij}$ is the sample covariance between item $i$ and item $j$, and $s_x^2$ is the sample variance of the observed total raw score.
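As a minimal sketch of the computation (with hypothetical data; this is not ACT's operational code), coefficient alpha can be computed directly from an examinee-by-item score matrix:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Coefficient alpha from an (n_examinees, k_items) score matrix,
    using the item-variance form of the formula above."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1)       # s_i^2 for each item
    total_var = item_scores.sum(axis=1).var(ddof=1)   # s_x^2 of total raw scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical 0/1 item scores for five examinees on four items.
x = np.array([[1, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0]])
print(round(cronbach_alpha(x), 3))
```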

Although coefficient alpha is often used to estimate internal consistency reliability, it can, at times, underestimate the true value of test reliability, depending on the characteristics of the specific test under consideration. In computing test reliabilities for the ACT Aspire tests, stratified coefficient alpha (Cronbach, Schönemann, & McKie, 1965) and congeneric reliability coefficients (Gilmer & Feldt, 1983) were also computed as a check on coefficient alpha. All three reliability coefficients produced nearly equal values for ACT Aspire subject tests at each grade level. As a consequence, Cronbach's coefficient alpha was used in reporting raw score reliability in Tables 16.1 and 16.2.

Table 16.1. Raw and Scale Score Reliability Coefficient Ranges by Grade for Subject Tests and Composite Score from Spring 2013 Special Studies Data

                       Grade
Subject      Score     3        4        5        6        7        8        EHS
English      Raw       .67-.79  .75-.80  .76-.79  .78-.81  .78-.85  .84-.87  .90-.91
             Scale     .70-.79  .76-.80  .77-.79  .80-.81  .79-.85  .84-.86  .90-.91
Mathematics  Raw       .73-.79  .55-.76  .57-.77  .61-.74  .66-.83  .84-.87  .86-.89
             Scale     .75-.79  .62-.75  .65-.77  .74-.77  .70-.85  .86-.89  .87-.90
Reading      Raw       .83-.85  .83-.84  .81-.84  .81-.84  .77-.83  .81-.86  .87-.87
             Scale     .83-.85  .83-.85  .81-.84  .82-.85  .79-.84  .82-.87  .88-.88
Science      Raw       .85-.88  .83-.84  .84-.87  .85-.88  .85-.89  .84-.88  .87-.90
             Scale     .86-.88  .83-.85  .84-.88  .86-.89  .85-.90  .85-.89  .86-.89
Composite*   Scale                                                  .95-.96  .96-.97

* Composite scores are not reported below grade 8.


Table 16.2. Raw Score Reliability Coefficient Ranges by Grade for Subject Tests from Spring 2014 Operational Data

               Grade
Subject        3        4        5        6        7        8        EHS
English        .78-.82  .74-.78  .75-.78  .80-.82  .78-.83  .84-.85  .88-.90
Mathematics    .78-.79  .67-.68  .67-.71  .77-.79  .81-.84  .86-.87  .82-.88
Reading        .83-.85  .82-.84  .82-.84  .82-.84  .79-.80  .81-.83  .85-.87
Science        .86-.89  .82-.84  .83-.84  .86-.87  .87-.89  .86-.87  .85-.89

The raw score reliabilities from the spring 2013 special studies listed in Table 16.1 for English, mathematics, reading, and science tests ranged from .55 (grade 4 mathematics) to a high of .91 (EHS English). Mathematics reliabilities tended to be quite low in some cases, particularly for grades 4–7.

The raw score reliabilities from the spring 2014 operational administration are listed in Table 16.2 for English, mathematics, reading, and science tests. Compared to 2013, reliabilities improved and ranged from .67 (grades 4 and 5 mathematics) to .90 (EHS English). Mathematics reliabilities still tended to be relatively low, specifically in grades 4 and 5.

Tests that consist of a single item typically require different types of information to estimate raw score reliability. For a test like ACT Aspire writing, which consists of a single writing prompt scored mostly by a single rater, one aspect of score reliability is rater consistency. While rater consistency should not be confused with score reliability due to task (in this case, the particular writing prompt administered), rater consistency is an important contributor to the reliability of writing prompt scores. One way to analyze rater consistency is by estimating correlations between two raters. In the spring 2013 ACT Aspire special studies, a subsample of students at each grade was rated by two raters. Table 16.3 lists the sample sizes and percentages of students in demographic groupings, and Table 16.4 contains Pearson correlations between raters. The writing prompt is rated on four domains, or traits: Ideas and Analysis, Development and Support, Organization, and Language Use and Conventions. Scores on these domains are combined to obtain a total score.1 Correlations between raters are reported for domain scores and total scores. Writing total score correlations ranged from .70 (grade 7) to .81 (grade 3 and EHS). Domain correlations ranged from .61 to .77.

1 For additional details, see Chapter 3, Section 3.2.3.


Table 16.3. Percentages of Students for Writing Rater Analysis in the Spring 2013 Special Studies by Grade and Demographics

                                            Grade (N)
                                            3          4          5          6          7          8          EHS
Demographic Grouping                        (2,141)    (1,902)    (2,041)    (2,190)    (2,243)    (1,701)    (2,033)
Gender
  Not Specified                              7          8          6          4          5          6          4
  Female                                    47         46         46         49         48         48         51
  Male                                      45         46         48         47         47         46         45
Race/Ethnicity
  Not Specified                             36         38         34         27         30         29         38
  Black/African American                    20         19         21         22         19         21         22
  American Indian/Alaskan Native             1          1          1          1          1          1          1
  White                                     34         34         33         31         29         31         24
  Hispanic/Latino                            6          7          8         16         15         15         11
  Asian                                      2          1          2          2          2          2          3
  Native Hawaiian/Other Pacific Islander     1          1          1          1          1          0
  Two or More Races                          1          0          1          1          3          1          1
Geographic Region
  East                                      46         51         39         34         33         40         26
  Midwest                                   31         30         36         39         41         38         53
  Southwest                                 17         13         18         12         15         16         15
  West                                       6          6          7         14         11          6          6

Note. Zero indicates a percentage fewer than 1%. Missing values indicate no students. Percentages within a demographic grouping may not sum to 100% due to rounding.


Table 16.4. Writing Test Correlations between Rater 1 and Rater 2 by Domain and by Form from Spring 2013 Special Studies Data

                  Trait                    Average Correlation      Total Score
Grade    N        1      2      3      4   among the Four Traits    Correlation
3        2,141    .68    .72    .73    .66    .70                       .81
4        1,902    .74    .72    .70    .71    .72                       .78
5        2,041    .63    .65    .65    .69    .66                       .73
6        2,190    .65    .66    .68    .65    .66                       .72
7        2,243    .61    .62    .62    .65    .62                       .70
8        1,701    .70    .72    .72    .65    .70                       .77
EHS      2,033    .76    .76    .77    .72    .75                       .81

Table 16.5. Writing Test Reliability Coefficients Based on Four Domain Scores from Spring 2013 Special Studies Data

Grade    N        Reliability
3        5,307    .91
4        4,709    .96
5        5,046    .95
6        5,407    .96
7        5,563    .95
8        4,717    .96
EHS      5,073    .96

Although ACT Aspire writing consists of a single writing prompt, ratings on the four domains that contribute to a final writing score can be used to obtain an internal consistency reliability estimate, like the coefficient alpha, for a writing prompt. These reliabilities are listed in Table 16.5 for writing prompts from spring 2013 and ranged from .91 to .96, which indicates that the four traits within a single writing prompt are quite reliable. However, these coefficients do not account for rater, prompt, or occasion variability, each of which is likely to be a more important contributor to the precision of the ACT Aspire writing scores than the domains within a prompt. The large internal consistency reliability based on domain scores is a reflection of the large correlations among domains within a writing prompt.

ACT Aspire also reports students' performance (the percent and number of points students earn out of the total number of points possible) in each reporting category under each subject test. The total number of points possible in each reporting category may vary across forms (see Chapter 3). Raw score reliabilities (i.e., Cronbach's alpha) and standard errors of measurement for each reporting category on the forms administered in spring 2014 are reported in Appendix D. The standard error of measurement is computed as the raw score standard deviation multiplied by the square root of one minus the reliability.
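In symbols, with $s_x$ the raw score standard deviation and $\hat{\alpha}$ the reliability estimate:

$$\mathrm{SEM} = s_x \sqrt{1 - \hat{\alpha}}\,.$$

For example (hypothetical values, for illustration only), $s_x = 6.0$ and $\hat{\alpha} = .84$ give $\mathrm{SEM} = 6.0\sqrt{.16} = 2.4$.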

16.3 Scale Score Reliability and Conditional Standard Error of Measurement

The ACT Aspire scale was developed under the framework of unidimensional item response theory (IRT) models, which provide statistical models that can be used to obtain estimates of scale score reliabilities and conditional standard errors of measurement (CSEMs).

In unidimensional IRT models, the person trait parameter representing an examinee's proficiency is commonly denoted theta ($\theta$). In addition to this person parameter, IRT models contain item parameters that represent particular characteristics of the items under the specific model. Additional details regarding the IRT models applied to ACT Aspire can be found in Chapter 10. For more information regarding IRT models in general, see Baker and Kim (2004), de Ayala (2009), or Yen and Fitzpatrick (2006).

To compute the CSEM for ACT Aspire scale scores, the Lord-Wingersky recursive algorithm and its extension (Lord & Wingersky, 1984; Hanson, 1994; Kolen & Brennan, 2014) were first used to estimate $f(x \mid \theta)$, the conditional distribution of the expected raw scores $X$ given a theta value $\theta$. With a maximum number of raw score points $K$ and the observed scale score $sc(X)$, the true scale score given $\theta$ is

$$\tau(\theta) = \sum_{x=0}^{K} sc(x)\, f(x \mid \theta),$$

and the CSEM of scale scores given $\theta$ is

$$\mathrm{CSEM}(\theta) = \sqrt{\sum_{x=0}^{K} \left[ sc(x) - \tau(\theta) \right]^2 f(x \mid \theta)}\,.$$

To estimate the scale score reliability, the conditional measurement error variance of scale scores given $\theta$, $\sigma^2_E(\theta)$, is first obtained by squaring the CSEM, and the aggregate error variance can then be calculated as

$$\sigma^2_E = \int \sigma^2_E(\theta)\, \psi(\theta)\, d\theta,$$

where $\psi(\theta)$ is the distribution of $\theta$ in the population. Using a discrete distribution on a finite number of equally spaced points, such as quadrature points and weights, to express $\psi(\theta)$, the aggregate error variance can be approximated by a summation form,

$$\sigma^2_E \approx \sum_{q=1}^{Q} \sigma^2_E(\theta_q)\, W(\theta_q),$$

where $q$ denotes the quadrature points and $Q$ is the maximum number of quadrature points. Finally, letting $\sigma^2_{sc}$ be the variance of observed scale scores, the scale score reliability can be estimated as

$$\rho = 1 - \frac{\sigma^2_E}{\sigma^2_{sc}}\,.$$

For more details regarding the procedures of estimating the CSEM and reliability for scale scores, see Kolen and Brennan (2014).
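A minimal sketch of the recursion and the CSEM computation for dichotomous items follows; the item probabilities and the raw-to-scale conversion are hypothetical, and the operational extension for polytomous items (Hanson, 1994) is not shown.

```python
import numpy as np

def lord_wingersky(p):
    """Conditional raw score distribution f(x | theta) for dichotomous items,
    given each item's probability of a correct response at a fixed theta
    (Lord & Wingersky, 1984). Entry x of the result is Pr(X = x | theta)."""
    f = np.array([1.0])
    for p_i in p:
        g = np.zeros(len(f) + 1)
        g[:-1] += f * (1.0 - p_i)  # item answered incorrectly
        g[1:] += f * p_i           # item answered correctly
        f = g
    return f

# Hypothetical correct-response probabilities for four items at one theta.
f = lord_wingersky([0.8, 0.6, 0.7, 0.5])

# Hypothetical raw-to-scale conversion sc(x) for raw scores 0..4.
sc = np.array([400.0, 405.0, 410.0, 415.0, 420.0])
tau = (sc * f).sum()                           # true scale score at this theta
csem = np.sqrt((((sc - tau) ** 2) * f).sum())  # CSEM at this theta
print(round(tau, 2), round(csem, 2))
```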


ACT Aspire writing scale score reliability and measurement error variance were computed based on a sample of examinees with scores from two raters in the spring 2014 operational administration. Two methods were used to compute reliability and measurement error variance, with one using the G theory method (Brennan, 2001) and the other using the Pearson correlation between scale scores from two raters as an estimate for the inter-rater reliability. Since both methods produced highly similar estimates, only G theory results are reported.

Under the G theory framework, a generalizability (G) study was designed to isolate and estimate various sources of error. A decision (D) study was then conducted that used the variance components estimated in the G study to provide information for making decisions about the objects of measurement (examinees, in this case). In the available dataset, all students took one prompt, and the same prompt was given to different students in two modes: online and paper. A single-facet p × r′ design was used for the G study, where p denotes examinees and r′ denotes ratings from different raters. In the p × r′ design, each examinee's response was graded by two raters, with the pair of raters possibly differing across examinees. The variance components in this design included an examinee effect, a rating effect, and the effect of the interaction between examinees and ratings. Under the G theory framework, reliability is estimated using the following formula, which involves the absolute error variance estimated in the D study:

$$\Phi = \frac{\sigma^2(p)}{\sigma^2(p) + \sigma^2(\Delta)}\,,$$

where $\sigma^2(\Delta)$ is the absolute error variance and $\sigma^2(p)$ is the variance component for examinees (Brennan, 2001, p. 31). $\sigma(\Delta)$ is reported as the standard error of measurement for the writing test.
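The following sketch estimates the p × r′ variance components from an examinee-by-rating score matrix via the usual ANOVA expected mean squares, then forms $\sigma^2(\Delta)$ and the dependability coefficient for a D study. The data are hypothetical, and this is a simplified reading of the procedure in Brennan (2001), not ACT's operational code.

```python
import numpy as np

def g_study_p_by_r(scores):
    """Variance components for a single-facet p x r design (persons by ratings),
    estimated from ANOVA mean squares (following Brennan, 2001).
    scores: (n_p, n_r) array, each examinee rated by n_r raters."""
    n_p, n_r = scores.shape
    grand = scores.mean()
    ss_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_res / ((n_p - 1) * (n_r - 1))
    var_p = max((ms_p - ms_pr) / n_r, 0.0)  # sigma^2(p)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)  # sigma^2(r)
    return var_p, var_r, ms_pr              # ms_pr estimates sigma^2(pr,e)

# Hypothetical scores for five examinees, each rated twice.
scores = np.array([[4, 5], [3, 3], [5, 5], [2, 3], [4, 4]], dtype=float)
var_p, var_r, var_pr = g_study_p_by_r(scores)

# D study: mean over n_r' = 2 ratings; absolute error variance and Phi.
n_r_prime = 2
abs_err = var_r / n_r_prime + var_pr / n_r_prime  # sigma^2(Delta)
phi = var_p / (var_p + abs_err)
print(round(phi, 3))
```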

In addition to scale scores for the five subjects (English, mathematics, reading, science, and writing), ACT Aspire also reports scale scores for three types of combined scores: Composite, ELA, and STEM scores. The Composite score is computed as the average of the English, mathematics, reading, and science scale scores; the ELA score is computed as the average of the English, reading, and writing scale scores; and the STEM score is computed as the average of the mathematics and science scale scores. Assuming that measurement error is independent across the subject tests contributing to a combined score, the standard error of measurement for the combined scores is computed using the formula

$$\mathrm{SEM}_C = \frac{1}{J}\sqrt{\sum_{j=1}^{J} \sigma^2_{E_j}}\,,$$

where $J$ is the number of subject tests contributing to the combined score (i.e., $J = 4$ for Composite scores, $J = 3$ for ELA scores, and $J = 2$ for STEM scores) and $\sigma^2_{E_j}$ refers to the scale score measurement error variance for subject test $j$. Reliability for each of the combined scores is calculated using the following formula:

$$\rho_C = 1 - \frac{\sigma^2_{E_C}}{\sigma^2_{X_C}}\,,$$


where $\sigma^2_{X_C}$ is the observed score variance and $\sigma^2_{E_C}$ is the measurement error variance of the combined score. The reliability for a combined score is expected to be higher than the reliability of each contributing subject score.
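As a check on the formula above, the grade 8 subject SEMs reported in Table 16.6 reproduce the reported grade 8 Composite SEM:

$$\mathrm{SEM}_{\text{Composite}} = \frac{\sqrt{3.53^2 + 2.82^2 + 2.91^2 + 2.74^2}}{4} = \frac{\sqrt{36.39}}{4} \approx 1.51.$$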

The scale score reliabilities and standard errors of measurement for the five subjects and the Composite, ELA, and STEM scores for the base online forms administered operationally in spring 2014 are presented in Table 16.6. Scale score reliabilities were very close to those derived in the 2013 special studies for the four subject tests, which are shown below the raw score reliabilities in Table 16.1. English scale score reliabilities ranged from .76 (grade 4) to .89 (EHS). Mathematics scale score reliabilities ranged from .67 (grade 4) to .88 (grade 8 and EHS). Reading scale score reliabilities ranged from .81 (grade 7) to .87 (EHS). Science scale score reliabilities ranged from .84 (grade 4) to .89 (grade 7).

Scale score reliabilities are useful because they are an estimate of the precision of the scores reported to students. Raw score reliabilities are also useful for obtaining an estimate of score consistency, but raw number-of-point scores are not used for interpreting student performance on ACT Aspire. Therefore, where possible, scale score reliabilities are preferable.

Table 16.6. Scale Score Reliability Coefficients and Standard Errors of Measurement by Grade and Subject from Spring 2014 Operational Data

                            Grade
Subject                     3      4      5      6      7      8      EHS
English      Reliability    0.79   0.76   0.78   0.82   0.82   0.84   0.89
             SEM            2.67   2.97   3.24   3.36   3.70   3.53   3.32
Mathematics  Reliability    0.80   0.67   0.71   0.79   0.84   0.88   0.88
             SEM            1.82   2.34   2.75   2.81   2.74   2.82   2.93
Reading      Reliability    0.85   0.85   0.84   0.84   0.81   0.83   0.87
             SEM            2.04   2.35   2.49   2.60   2.83   2.91   2.73
Science      Reliability    0.88   0.84   0.85   0.88   0.89   0.87   0.87
             SEM            2.04   2.49   2.44   2.40   2.47   2.74   3.02
Writing      Reliability    0.64   0.67   0.68   0.65   0.75   0.67   0.73
             SEM            3.45   3.14   4.11   4.58   3.65   4.52   4.03
Composite*   Reliability                                       0.95   0.96
             SEM                                               1.51   1.50
ELA          Reliability    0.88   0.87   0.87   0.88   0.90   0.88   0.93
             SEM            1.60   1.64   1.93   2.08   1.97   2.15   1.96
STEM         Reliability    0.91   0.86   0.87   0.90   0.92   0.92   0.93
             SEM            1.37   1.71   1.84   1.85   1.84   1.97   2.10

Note. Composite scores were not reported below grade 8.


Chapter 17

ACT Aspire Validity

Validity involves collecting evidence regarding the degree to which score interpretations for proposed uses are supported. Validity evidence for ACT Aspire is organized into six areas, including content-oriented evidence, cognitive processes, internal structure, relationships to other constructs, relationships with criteria, and consequences. The validity evidence regarding each area is addressed in this chapter.

According to the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014), “Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (p. 11). Validation is the process of justifying a particular interpretation or use and may involve logical, empirical, or theoretical components. In the fourth edition of Educational Measurement, Kane (2006) stated that

Measurement uses limited samples of observations to draw general and abstract conclusions about persons or other units (e.g., classes, schools). To validate an interpretation or use of measurements is to evaluate the rationale, or argument, for the claims being made, and this in turn requires a clear statement of the proposed interpretations and uses and a critical evaluation of these interpretations and uses. Ultimately, the need for validation derives from the scientific and social requirement that public claims and decisions be justified (p. 17).

The potential interpretations and uses of ACT Aspire scores are numerous and diverse. Some interpretations and uses are anticipated and others are not, but each needs to be justified by a validity argument. Not every interpretation and use of ACT Aspire scores can be fully validated in a single chapter of a technical manual. For example, evidence and information that can feed validation are contained in each artifact of technical documentation, research study, research report, and piece of documentation related to ACT Aspire. In addition, test users may develop particular interpretations and uses that are not covered explicitly in this validity chapter or this technical manual.

The evidence summarized here should be viewed as necessary for validating interpretations of ACT Aspire scores but may not be sufficient for validating a particular interpretation for a proposed use. Test users should consider their specific interpretations and uses of ACT Aspire scores and may need to develop interpretive and/or validity arguments to support their uses. In addition, as the ACT Aspire assessments mature, validity evidence will continue to be gathered and evaluated as additional evidence becomes available and as the interpretations and uses of ACT Aspire scores evolve.

In this section, two primary and three secondary intended interpretations of ACT Aspire scores are reviewed and evidence regarding the validity of ACT Aspire scores for particular interpretations is described. Validity evidence is organized into the following six areas, as described by The Standards (AERA et al., 2014):

1. content-oriented evidence,

2. evidence regarding cognitive processes,

3. evidence regarding internal structure,

4. evidence regarding relationships with conceptually related constructs,

5. evidence regarding relationships with criteria, and

6. evidence based on consequences of tests.

17.1 Interpretations of ACT Aspire Scores

ACT Aspire scores have two primary and three secondary intended interpretations. The two primary interpretations are to identify students' readiness on (a) a college readiness trajectory and (b) a career readiness trajectory. The three secondary interpretations are to provide instructionally actionable information to educators, empirical data for inferences related to accountability, and empirical support for inferences about international comparisons.

Fundamental to the five interpretations of ACT Aspire scores is the assumption that scores are indicative of performance on a particular set of constructs, specifically the subject areas assessed by ACT Aspire: English, mathematics, reading, science, and writing. Scores on each subject area are intended to provide inferences about students' knowledge and skills (achievement or performance) in these subjects. Scores obtained (or some combination of them) are then interpreted as indicators of readiness for college and career and are intended to be used to identify student status on the path to readiness for college and career. Therefore, one aspect of validation for ACT Aspire is gathering evidence for the foundational interpretation that ACT Aspire scores are indicative of academic performance in English, mathematics, reading, science, and writing. Gathering evidence to validate the five intended interpretations of ACT Aspire scores builds on the validation of this foundational interpretation.

Some uses and interpretations of individual student scores and aggregate score results have been commonly implemented with assessments of student achievement and are likely to continue. At the student level some of these have included (a) student proficiency against pre-defined performance levels, (b) student growth across grade levels within a particular subject, (c) predicting future performance, (d) student level diagnostic information for instructional purposes, and (e) ranking students (e.g., for selection into special programs). Some uses and interpretations of aggregate scores at the group level have included (a) school and district accountability (e.g., percentage of students above a “satisfactory” cut score), (b) school, district, and classroom growth, and (c) evaluating the impact of the curriculum or special programs. These are examples of additional interpretations and uses that may be implemented with ACT Aspire scores. Some may closely align with the primary and secondary intended interpretations of ACT Aspire scores (e.g., predicting future performance, instructionally actionable information, etc.), and others may not (e.g., using scores for evaluating the impact of curriculum). Each interpretation and use should be carefully evaluated against the available evidence.

17.2 Validity Evidence

17.2.1 Content-Oriented Evidence

Content-oriented evidence for validity is based on the appropriateness of test content and the procedures used to specify and create test content. For the ACT Aspire forms, this evidence is based on the justifications for and the connections between

• the content domain,

• the knowledge and skills implied by the ACT Aspire content specifications,

• the development of items,

• the development of forms,

• test administration conditions, and

• item and test scoring.

Most of the descriptions and justifications for content were provided in earlier chapters, but a brief review is included below.

The content domains for ACT Aspire include (a) English, reading, and writing (ELA) and (b) mathematics and science (STEM) across grades 3-10. These content domains include knowledge and skills needed by students for success in school (i.e., academic achievement) and for success in college and career. ACT Aspire test forms in a particular subject area include samples of test items intended to represent content from the domain implied by a subject area (see Chapter 2).


In addition, the ACT College and Career Readiness Standards (grades 8 and above) and the ACT Readiness Standards (grades 3-7) provide statements of what students should know and be able to do to be on-track to attaining college and career readiness in English, mathematics, reading, science, and writing at each grade level. These statements were developed by content and measurement experts based on research that included the ACT National Curriculum Survey. These standards are consistent with many states’ standards that focus on college and career readiness (see Chapter 3).

ACT Aspire content specifications identify the number, type, and depth-of-knowledge (DOK) level of items required when each form was developed. These specifications determine the type and amount of evidence obtained within the content domain. The extent to which the type and selection of content included in the content specifications represent the content domain determines how representative test form content is of the intended content domain (see Chapter 2).

Item writers are carefully chosen from a pool of qualified educators and follow standardized procedures and templates when creating items. All items undergo rigorous internal reviews and external audits to ensure that they are accurate, appropriate, and fair for the intended population of students. Items are pretested with students drawn from the intended ACT Aspire population to verify item quality and item characteristics (see Chapters 2 and 4).

Initial ACT Aspire forms are developed to follow the content specifications. Initial forms are reviewed by content experts to determine the accuracy and appropriateness of the collection of items. If experts determine that adjustments should be made to the initial test forms, items may be swapped that maintain consistency with the content specifications. A form is finalized once it passes these reviews (see Chapter 2).

ACT Aspire is a standardized test administered on paper or online under time limits. The standardization of the ACT Aspire test administration is intended to ensure that test administration characteristics do not contribute adversely to some students' performance compared to others. Documentation of test administration practices is included in the ACT Aspire Room Supervisor Manual. However, where possible, students are allowed some flexibility in the test administration to participate in a way that is consistent with their needs but that does not erode the intended and relevant characteristics of the test or test administration. In addition, a standardized process is followed for documenting and providing students who need particularly strong supports, such as testing accommodations, with the flexibility they need. See Chapter 5 for additional details.

ACT Aspire standardized test administrations on paper and online are established to be as similar as possible. However, test administration conditions are different because the mode of administration is different. ACT Aspire forms across mode are not assumed to be the same, in part due to potential differences in performance due to mode and the availability of technology-enhanced items only in online tests. Chapter 19 addresses performance differences across online and paper forms, and Chapter 20 addresses how scores are equated across online and paper forms so that scale scores are reported on the ACT Aspire scale. Treating forms across mode separately when linking them to the ACT Aspire scale helps ensure that scores from online forms and paper forms are comparable.

Descriptions of item scoring procedures are included in Chapter 4 and test scoring procedures are included in Chapter 9. ACT Aspire item and test scoring is intended to be applied uniformly across students and forms so that scores consistently reflect student performance across various student, item, and test characteristics. For example, rater training and quality control procedures for scoring constructed response items are explained in Chapters 2 and 4.

17.2.2 Evidence Regarding Cognitive Processes

Evidence regarding cognitive processes includes verifying that an intended interpretation assuming a particular psychological process or cognitive operation is indeed used by test takers. Two sources of validity evidence are commonly used to evaluate cognitive processes. The first is to use think aloud or cognitive lab procedures, where students say what they are thinking while working on test materials. This procedure is used to gather evidence that the intended cognitive processes for a test item are consistent with the processes that students verbalize that they use to respond to an item. These procedures are applicable to a variety of artifacts collected and scored as part of an assessment, but can be particularly helpful for evaluating artifacts that require students to construct responses, which often requires stronger assumptions about the cognitive processes used to provide a response. The second source of evidence commonly used to evaluate cognitive processes applies to assessments where student responses are rated. In this case, the extent to which the raters accurately interpret and apply scoring rubrics, rules, and/or criteria contributes to evidence regarding cognitive processes.

As part of piloting constructed response (CR) items for ACT Aspire in 2012 and 2013, students at grades 8, 10, and 11 from several schools were recruited to respond to two or three CR tasks in mathematics or science. Students completed tasks in pairs or small groups using a think aloud protocol that involved discussing with their partners/group their strategies and reasoning processes as they completed the CR items. After completing items, students were asked to fill out a brief survey (e.g., what the student liked, disliked, and found confusing about the items) and engage in a brief group interview regarding their experiences with the items.

These pilots helped verify that ACT Aspire CR items required students to draw on the intended knowledge and skills to produce their answers. Student responses from think aloud tasks, surveys, and interviews indicated that the CR items required the intended knowledge and skills for students to supply an answer. In some cases, students found tasks challenging, but it was apparent that responding to items required students to use the intended knowledge and skills in mathematics or science to respond to each CR item.

Another source of evidence regarding cognitive processes can be found in Chapters 2 and 4 describing the training of raters and scoring of items. This process involves carefully defined procedures, checks, and reviews to ensure the quality and validity of scores from ratings obtained for constructed response items. The procedures and processes include checks of the validity of scores during the scoring process, where initial scores may be rejected if they do not meet quality checks. In addition, during the process of rubric development and preliminary application of the rubric to student responses, the coherence and consistency of rubrics are considered against the content intended to be measured by each item to help ensure that the rubric leads to ratings that indeed reflect the intended knowledge and skills for the item.

17.2.3 Evidence Regarding Internal Structure

Evidence regarding internal structure is based on evaluating intended score interpretations from the perspective of expected characteristics among test items or parts of the test. ACT Aspire is intended to measure academic achievement in English, mathematics, reading, science, and writing. In addition, Composite scores1, ELA scores, and STEM scores are reported for students. Indicators of on-track to college readiness and on-track to career readiness are also provided to students, along with additional scores describing the percentages of points obtained in reporting categories (a sub-set of items within a subject area) and an indicator regarding progress with text complexity.

Dimensionality. The expectation for scores in English, mathematics, reading, science, and writing is that they are essentially unidimensional. Even though reporting category scores (in percentage of points correct) are included in ACT Aspire reports, which would suggest distinct sub-dimensions within a subject area, ACT Aspire was developed to measure academic achievement in English, mathematics, reading, science, and writing, so a dominant dimension would be expected in an empirical analysis of dimensionality.

A study of the dimensionality was conducted to evaluate the extent to which ACT Aspire item performance appeared to be best represented by a single dimension. Analysis included conducting an exploratory factor analysis (EFA)2, which is often used to evaluate the dimensionality of performance on assessments like ACT Aspire (e.g., Sireci, 2009, p. 30). Data included item scores from students participating in the spring 2013 comparability study3 in English, mathematics, reading, and science4.

1 Grade 8 and above only.
2 Mplus software (Muthén & Muthén, 2008) was used to conduct the exploratory factor analysis on the inter-item correlation matrices. Settings used to extract factors included weighted least squares estimation and the Geomin (oblique) factor rotation.
3 For additional details regarding the comparability study, see Chapter 19.
4 Writing was excluded from the dimensionality analysis because the writing test consists of a single writing prompt.


Scree plots of eigenvalues, model fit, and factor loadings were examined to evaluate dimensionality.

Scree plots display factor analysis eigenvalues against the number of extracted factors, and evaluation of scree plots involves identifying the "elbow" in the plot to indicate the number of dimensions to retain (Cattell, 1966). The scree plots for each subject shown in Figure 17.1 indicate that, for all reading forms and the majority of the English, mathematics, and science forms, the "elbow" appears after the first eigenvalue, which is evidence for a single dimension. For a few English forms (paper forms for grades 3 and 6), mathematics forms (paper forms for grades 4 and 6), and science forms (paper forms for grades 3, 4, and 6), the "elbow" appears after the second eigenvalue, indicating that a two-factor model might fit the data better. Tables 17.1–17.4 show the eigenvalue and the proportion of variance accounted for by each factor in forms from different subjects. As shown in these tables, the percentage of variance accounted for by the second factor for these forms did not exceed 10% (except in two cases, where the percentages were 10.1 and 10.3). Hatcher (1994) suggested that factors accounting for less than 10% of the variance should not be retained, so the combination of scree plots and percentages of variance accounted for argues against retaining second factors. The scree plots generally showed evidence that item performance was best represented by a unidimensional model, although in some cases two factors may better represent the data.
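As an illustration of the scree-plot step only: the manual's analysis used Mplus with weighted least squares estimation and Geomin rotation, whereas this simplified sketch just takes eigenvalues of a Pearson inter-item correlation matrix computed from simulated, hypothetical data.

```python
import numpy as np
import matplotlib.pyplot as plt

def scree(corr):
    """Eigenvalues of an inter-item correlation matrix in descending order;
    the 'elbow' in the plot suggests how many factors to retain."""
    eig = np.sort(np.linalg.eigvalsh(corr))[::-1]
    plt.plot(range(1, len(eig) + 1), eig, marker="o")
    plt.xlabel("Factor")
    plt.ylabel("Eigenvalue")
    plt.show()
    return eig

# Simulate 2,000 examinees on 20 dichotomous items driven by one trait.
rng = np.random.default_rng(1)
theta = rng.normal(size=(2000, 1))
items = (theta + rng.normal(scale=1.2, size=(2000, 20)) > 0).astype(float)
eig = scree(np.corrcoef(items, rowvar=False))
print(eig[:3], (eig[:3] / eig.sum() * 100).round(1))  # leading eigenvalues, % variance
```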

Model fit was also evaluated. Based on the scree plot results, which indicated that up to two factors may be justifiable for data in some subjects at certain grade levels, model fit was evaluated by comparing one-factor and two-factor models for each subject and grade-level to determine whether the one-factor or two-factor model appeared to better represent students’ item-level performance on ACT Aspire assessments. The goodness-of-fit for each model was examined using various model fit statistics. The fit statistics included the chi-square statistic (CHISQ) and several other fit statistics, including (along with cutoff values) comparative fit index (CFI; .95), Tucker-Lewis index (TLI; .95), root mean square error of approximation (RMSEA; .06), and standardized root mean square residual (SRMR; .08) (see Hu and Bentler, 2009). The difference in chi-square statistics between one- and two-factor models was also evaluated using a statistical significance test. The correlation between Factors 1 and 2 was used to evaluate how well the two factors, when modeled, could be distinguished from each other. Fit indices were evaluated considering the goodness of fit of the one-factor and two-factor models.

Tables 17.5–17.8 display the fit statistics for the one-factor and two-factor models. The statistical significance of the change in chi-square statistics between the one- and two-factor models for each grade and subject is also included in the tables (i.e., the rows associated with p-values). These chi-square difference tests are statistically significant for all forms, as indicated by the * in the rows associated with chi-square, although this is likely due to the well-known sensitivity of chi-square (e.g., Bollen, 1989), where relatively small changes in chi-square statistics are identified as statistically significant when sample sizes are large. For this reason, dimensionality analysis also typically includes other fit indices. Fit indices were flagged with a * if they showed inadequate fit (i.e., the CFI, TLI, RMSEA, SRMR rows). Forms that showed inadequate fit with the one-factor model generally showed adequate fit with the two-factor model.

Fit statistics for the English, mathematics, reading, and science tests showed evidence for unidimensionality in most cases; the two-factor model did not generally show meaningful improvement in model fit compared to the one-factor model. The most noticeable improvement in fit for two-factor models was observed for mathematics grade 4 paper, mathematics grade 5 paper, and mathematics grade 6 online and paper. Interestingly, for these forms the inter-factor correlations were relatively small compared to many of the other forms, indicating that the second factor for these forms, if meaningful, appeared to be measuring something different from the first factor.

The factor loadings were examined to investigate the substantive meaning of the constructs defined by the two factors. Factor loadings are of particular interest for data from forms showing noticeable improvement in fit under the two-factor model because the patterns of loadings across Factor 1 and Factor 2 may provide an idea of what each factor appears to be measuring (e.g., Gorsuch, 1983, pp. 206-212). The loadings for the two-factor model are shown in Table 17.9 for the forms that showed particularly strong evidence of improvement in fit between the one- and two-factor models (i.e., mathematics grade 4 paper, mathematics grade 5 paper, and mathematics grade 6 online and paper). Given that the items with the largest loadings on the second factor tended to occur toward the end of the test, these data may have exhibited a degree of speededness. This pattern was particularly evident for the mathematics grades 4, 5, and 6 paper forms, although the last two items (and other items in the second half of the test) on the grade 6 online form also had relatively large loadings on a second factor5.

The exploratory factor analysis results provided evidence for unidimensionality with most of the ACT Aspire data. However, grades 4 and 6 mathematics in particular may require monitoring to determine whether a second dimension is observed for future administrations of ACT Aspire.

5 While grade 4 online did not show particularly strong evidence for a two-dimensional solution, the factor loadings under a two-factor solution showed a similar pattern, with the last few items having strong positive loadings on a second factor.

Relationships among scores. Because English, reading, and writing scale scores are combined to obtain ELA scores, and mathematics and science scale scores are combined to obtain STEM scores, the expectation would be to observe larger correlations among scores contributing to ELA and STEM than across these reported scores. For example, the expectation would be that English and reading scores are more highly correlated than mathematics and reading.

Table 17.10 lists the means, standard deviations, and sample sizes for English, mathematics, reading, science, and writing scale scores, and for the combined scores, from all data available from the spring 2013 ACT Aspire special studies. Table 17.11 lists the correlations among ACT Aspire scores for students taking the English, mathematics, reading, science, and writing tests by grade level. Patterns of correlations at each grade are generally consistent with the expectation that English, reading, writing, and ELA show higher correlations with one another, and that mathematics, science, and STEM show higher correlations with one another, with two exceptions. In grades 3, 5, and 6, English and science had correlations similar to or higher than the correlation between English and reading. Also, in grades 3 to 9, the correlation between science and reading was the same as or greater than the correlation between science and mathematics.

The pattern of relationships among scores is generally consistent with the interpretations of scores reflecting performance in English, mathematics, reading, science, writing, ELA, and STEM. In particular, the relatively strong relationships between (a) science and English and (b) science and reading can be explained by the amount of reading required by the science test.


Figure 17.1. Scree plots of ACT Aspire online and paper forms by subject and grade using data from the spring 2013 comparability study

[Figure panels: English, Mathematics, Reading, and Science, each shown for grades 3-8 and Early High School.]


Table 17.1. English ACT Aspire Online and Paper Mode Exploratory Factor Analysis Eigenvalues and Percentage of Variance Explained, by Grade Level Using Data from the Spring 2013 Comparability Study

[For each grade (3-8 and Early High School) and mode (online and paper; sample sizes range from 866 to 1,489), the table lists the eigenvalue (Eig.) and the percentage of variance explained (%) for each extracted factor. Note. Eig. denotes eigenvalue.]


Table 17.2. Mathematics ACT Aspire Online and Paper Mode Exploratory Factor Analysis Eigenvalues and Percentages of Variance Explained, by Grade Level Using Data from the Spring 2013 Comparability Study

[For each grade (3-8 and Early High School) and mode (online and paper; sample sizes range from 896 to 1,539), the table lists the eigenvalue (Eig.) and the percentage of variance explained (%) for each extracted factor. Note. Eig. denotes eigenvalue.]

1.7

0.6

1.6

0.6

1.6

0.6

1.7

22

0.4

1.7

0.4

1.8

0.5

2.1

0.5

2.2

0.5

2.0

0.5

2.1

0.6

1.9

0.6

1.9

0.6

1.7

0.6

1.7

0.6

1.6

0.6

1.6

0.6

1.5

0.6

1.7

23

0.4

1.6

0.4

1.5

0.5

1.9

0.5

2.0

0.5

2.0

0.5

1.8

0.6

1.9

0.6

1.8

0.6

1.6

0.6

1.7

0.6

1.5

0.6

1.5

0.6

1.4

0.6

1.6

24

0.3

1.2

0.4

1.5

0.4

1.7

0.4

1.7

0.5

1.8

0.4

1.7

0.6

1.8

0.6

1.8

0.5

1.5

0.5

1.6

0.6

1.5

0.5

1.4

0.5

1.4

0.5

1.4

25

0.2

0.8

0.3

1.2

0.1

0.6

0.4

1.4

0.4

1.6

0.3

1.0

0.5

1.6

0.6

1.6

0.5

1.4

0.5

1.5

0.5

1.4

0.5

1.4

0.5

1.3

0.5

1.4

Page 249: ACT Aspire® Technical Manual - archchicago.orgocs.archchicago.org/Portals/23/ACT Aspire Technical Manual 2016.pdf · The ACT Aspire Technical Manual documents the collection of validity

aCt aSPIRE ValIDIty

17.21

Fact

or

Gra

de 3

Gra

de 4

Gra

de 5

Gra

de 6

Gra

de 7

Gra

de 8

Ear

ly H

igh

Sch

ool

Onl

ine

Pap

erO

nlin

eP

aper

Onl

ine

Pap

erO

nlin

eP

aper

Onl

ine

Pap

erO

nlin

eP

aper

Onl

ine

Pap

er

(N=

1,48

8)(N

=1,

539)

(N=

1,53

4)(N

=1,

532)

,(N

=1,

450)

(N=

1,42

7)(N

=1,

156)

(N=

1,19

4)(N

=97

4)(N

=1,

010)

(N=

896)

(N=

927)

(N=

1,07

5)(N

=1,

098)

Eig

.%

Eig

.%

Eig

.%

Eig

.%

Eig

.%

Eig

.%

Eig

.%

Eig

.%

Eig

.%

Eig

.%

Eig

.%

Eig

.%

Eig

.%

Eig

.%

26

0.5

1.5

0.5

1.5

0.5

1.4

0.5

1.4

0.5

1.3

0.5

1.3

0.5

1.3

0.5

1.4

27

0.5

1.5

0.5

1.4

0.4

1.3

0.5

1.4

0.5

1.2

0.5

1.2

0.5

1.3

0.5

1.3

28

0.5

1.4

0.5

1.3

0.4

1.2

0.5

1.3

0.4

1.1

0.4

1.2

0.5

1.2

0.5

1.3

29

0.4

1.3

0.4

1.3

0.4

1.2

0.4

1.3

0.4

1.1

0.4

1.1

0.4

1.2

0.4

1.2

30

0.4

1.2

0.4

1.1

0.4

1.0

0.4

1.1

0.4

1.0

0.4

1.0

0.4

1.1

0.4

1.1

31

0.4

1.1

0.4

1.1

0.3

0.9

0.3

0.9

0.4

0.9

0.4

0.9

0.4

1.0

0.4

1.1

32

0.3

0.9

0.3

0.9

0.3

0.9

0.3

0.9

0.3

0.8

0.3

0.9

0.4

1.0

0.4

1.0

33

0.3

0.8

0.3

0.8

0.3

0.8

0.3

0.8

0.3

0.7

0.3

0.9

0.4

0.9

0.4

0.9

34

0.1

0.2

0.2

0.6

0.2

0.6

0.2

0.6

0.3

0.7

0.3

0.8

0.3

0.9

0.3

0.9

35

0.2

0.7

0.3

0.7

0.3

0.8

0.3

0.8

36

0.2

0.5

0.3

0.7

0.3

0.7

0.3

0.8

37

0.2

0.4

0.2

0.5

0.2

0.5

0.3

0.7

38

0.1

0.3

0.2

0.4

0.2

0.4

0.2

0.6

Not

e. E

ig. d

enot

es e

igen

valu

e.

Table 17.2. (continued)

Table 17.3. Reading ACT Aspire Online and Paper Mode Exploratory Factor Analysis Eigenvalues and Percentages of Variance Explained, by Grade Level Using Data from the Spring 2013 Comparability Study

[The table reports the eigenvalue (Eig.) and percentage of variance explained (%) for each factor, through factor 24, for the online and paper forms at each grade. Sample sizes: Grade 3 (N=1,441 online; N=1,513 paper), Grade 4 (N=1,399; N=1,401), Grade 5 (N=1,299; N=1,294), Grade 6 (N=1,004; N=1,038), Grade 7 (N=1,010; N=1,037), Grade 8 (N=800; N=831), Early High School (N=857; N=919).]

Note. Eig. denotes eigenvalue.

Table 17.4. Science ACT Aspire Online and Paper Mode Exploratory Factor Analysis Eigenvalues and Percentages of Variance Explained, by Grade Level Using Data from the Spring 2013 Comparability Study

[The table reports the eigenvalue (Eig.) and percentage of variance explained (%) for each factor, through factor 32, for the online and paper forms at each grade. Sample sizes: Grade 3 (N=1,376 online; N=1,405 paper), Grade 4 (N=1,435; N=1,444), Grade 5 (N=1,378; N=1,379), Grade 6 (N=1,047; N=1,040), Grade 7 (N=1,028; N=1,042), Grade 8 (N=913; N=934), Early High School (N=959; N=1,023).]

Note. Eig. denotes eigenvalue.

Table 17.5. English Test Fit Statistics for One- and Two-Factor Models Using Data from the 2013 ACT Aspire Mode Comparability Study

[For each form (grades 3 through 9, online and paper), the table reports χ2, DF, p-value, CFI, TLI, RMSEA, and SRMR for the one-factor model (1) and the two-factor model (2), the absolute difference between the two models on each index (DIFF), and the correlation between the two factors (CORR); values meeting a flag criterion are marked with an asterisk.]

Note. DF is the degrees of freedom; CFI is the comparative fit index; TLI is the Tucker-Lewis index; RMSEA is the root mean square error of approximation; SRMR is the standardized root mean square residual; and CORR is the correlation between Factor 1 and Factor 2 loadings. Flag criteria included CFI < 0.95, TLI < 0.95, RMSEA > 0.06, SRMR > 0.08, |DIFF CFI| > 0.1, |DIFF TLI| > 0.1, |DIFF RMSEA| > 0.05, |DIFF SRMR| > 0.05, and CORR < 0.3. |DIFF| represents the absolute difference for an index.

Table 17.6. Mathematics Test Fit Statistics for One- and Two-Factor Models Using Data from the 2013 ACT Aspire Mode Comparability Study

[For each form (grades 3 through 9, online and paper), the table reports χ2, DF, p-value, CFI, TLI, RMSEA, and SRMR for the one-factor model (1) and the two-factor model (2), the absolute difference between the two models on each index (DIFF), and the correlation between the two factors (CORR); values meeting a flag criterion are marked with an asterisk.]

Note. DF is the degrees of freedom; CFI is the comparative fit index; TLI is the Tucker-Lewis index; RMSEA is the root mean square error of approximation; SRMR is the standardized root mean square residual; and CORR is the correlation between Factor 1 and Factor 2 loadings. Flag criteria included CFI < 0.95, TLI < 0.95, RMSEA > 0.06, SRMR > 0.08, |DIFF CFI| > 0.1, |DIFF TLI| > 0.1, |DIFF RMSEA| > 0.05, |DIFF SRMR| > 0.05, and CORR < 0.3. |DIFF| represents the absolute difference for an index.

Table 17.7. Reading Test Fit Statistics for One- and Two-Factor Models Using Data from the 2013 ACT Aspire Mode Comparability Study

[For each form (grades 3 through 9, online and paper), the table reports χ2, DF, p-value, CFI, TLI, RMSEA, and SRMR for the one-factor model (1) and the two-factor model (2), the absolute difference between the two models on each index (DIFF), and the correlation between the two factors (CORR); values meeting a flag criterion are marked with an asterisk.]

Note. DF is the degrees of freedom; CFI is the comparative fit index; TLI is the Tucker-Lewis index; RMSEA is the root mean square error of approximation; SRMR is the standardized root mean square residual; and CORR is the correlation between Factor 1 and Factor 2 loadings. Flag criteria included CFI < 0.95, TLI < 0.95, RMSEA > 0.06, SRMR > 0.08, |DIFF CFI| > 0.1, |DIFF TLI| > 0.1, |DIFF RMSEA| > 0.05, |DIFF SRMR| > 0.05, and CORR < 0.3. |DIFF| represents the absolute difference for an index.

Table 17.8. Science Test Fit Statistics for One- and Two-Factor Models Using Data from the 2013 ACT Aspire Mode Comparability Study

[For each form (grades 3 through 9, online and paper), the table reports χ2, DF, p-value, CFI, TLI, RMSEA, and SRMR for the one-factor model (1) and the two-factor model (2), the absolute difference between the two models on each index (DIFF), and the correlation between the two factors (CORR); values meeting a flag criterion are marked with an asterisk.]

Note. DF is the degrees of freedom; CFI is the comparative fit index; TLI is the Tucker-Lewis index; RMSEA is the root mean square error of approximation; SRMR is the standardized root mean square residual; and CORR is the correlation between Factor 1 and Factor 2 loadings. Flag criteria included CFI < 0.95, TLI < 0.95, RMSEA > 0.06, SRMR > 0.08, |DIFF CFI| > 0.1, |DIFF TLI| > 0.1, |DIFF RMSEA| > 0.05, |DIFF SRMR| > 0.05, and CORR < 0.3. |DIFF| represents the absolute difference for an index.
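The flag criteria listed in the notes to Tables 17.5–17.8 are mechanical cutoffs, so they are straightforward to express in code. The following sketch applies them to one form's fit indices; the dictionary layout, function name, and example values are illustrative assumptions, not figures taken from the manual.

def flag_fit(one_factor, two_factor, corr):
    """Apply the flag criteria from the notes to Tables 17.5-17.8."""
    flags = []
    for model, stats in (("1-factor", one_factor), ("2-factor", two_factor)):
        if stats["CFI"] < 0.95:
            flags.append(f"{model} CFI < 0.95")
        if stats["TLI"] < 0.95:
            flags.append(f"{model} TLI < 0.95")
        if stats["RMSEA"] > 0.06:
            flags.append(f"{model} RMSEA > 0.06")
        if stats["SRMR"] > 0.08:
            flags.append(f"{model} SRMR > 0.08")
    # |DIFF| criteria: > 0.1 for CFI/TLI, > 0.05 for RMSEA/SRMR
    for idx in ("CFI", "TLI", "RMSEA", "SRMR"):
        cutoff = 0.1 if idx in ("CFI", "TLI") else 0.05
        if abs(one_factor[idx] - two_factor[idx]) > cutoff:
            flags.append(f"|DIFF {idx}| > {cutoff}")
    if corr < 0.3:
        flags.append("CORR < 0.3")
    return flags

# Illustrative values for a form with no flags (prints an empty list)
print(flag_fit({"CFI": .96, "TLI": .96, "RMSEA": .03, "SRMR": .06},
               {"CFI": .98, "TLI": .97, "RMSEA": .03, "SRMR": .05},
               corr=0.65))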


Table 17.9. Two-Factor Exploratory Factor Model Loadings for Four ACT Aspire Forms Identified as Showing Evidence of Two Factors Based on Analysis of Fit Statistics

Item

Mathematics Grade 4 Paper

Mathematics Grade 5 Paper

Mathematics Grade 6 Online

Mathematics Grade 6 Paper

Factor 1 Factor 2 Factor 1 Factor 2 Factor 1 Factor 2 Factor 1 Factor 2

1 0.32 0.02 0.21 -0.07 0.55 -0.09 0.48 0.09

2 0.33 -0.03 0.57 -0.04 0.43 -0.09 0.52 -0.10

3 0.46 0.00 0.44 -0.12 0.10 -0.03 0.27 -0.01

4 0.59 -0.03 0.19 -0.06 0.14 0.01 -0.18 0.07

5 0.35 -0.08 0.39 0.04 0.58 -0.06 0.60 0.00

6 0.61 0.00 0.14 -0.02 -0.08 0.11 -0.01 -0.02

7 0.46 -0.10 0.15 -0.04 0.55 -0.07 0.59 -0.04

8 0.22 0.02 0.04 -0.07 0.61 -0.05 0.60 0.00

9 0.15 -0.01 0.30 -0.03 0.37 0.00 0.38 -0.06

10 0.22 0.02 0.56 -0.20 0.45 -0.06 0.45 0.01

11 0.09 0.01 0.03 0.09 0.44 -0.05 0.54 -0.08

12 0.52 0.10 0.52 0.00 0.45 -0.08 0.42 0.07

13 0.05 0.15 0.24 0.04 -0.04 0.14 0.54 -0.02

14 0.17 0.14 0.03 0.11 0.41 0.07 0.44 -0.12

15 0.18 0.16 0.28 0.06 0.48 0.05 0.04 -0.03

16 -0.05 0.21 0.46 0.11 0.39 -0.02 0.27 0.04

17 0.32 0.47 0.38 0.27 0.13 0.10 0.44 0.11

18 0.13 0.42 0.26 0.25 0.32 0.12 0.45 0.07

19 0.04 0.44 0.00 0.20 0.52 0.35 0.58 0.24

20 -0.11 0.31 0.23 0.40 0.17 0.28 0.20 0.08

21 -0.01 0.36 0.18 0.40 0.14 0.41 0.22 0.17

22 0.54 0.02 0.39 0.30 0.01 0.65 0.17 0.22

23 0.39 0.20 0.44 0.17 -0.03 0.41 0.34 0.27

24 0.00 0.69 0.25 0.53 0.20 0.33 0.08 0.34

25 -0.18 0.67 -0.02 0.79 0.33 0.02 0.24 0.50

26 0.58 0.03 0.07 0.40

27 0.42 0.22 0.01 0.38

28 0.64 0.00 -0.04 0.73

29 0.47 0.17 -0.06 0.47

30 0.57 0.30 -0.01 0.43

31 0.29 0.26 0.29 0.38

32 0.45 0.19 0.27 0.37

33 0.31 0.42 0.09 0.74

34 -0.06 0.84 -0.21 0.76


Table 17.10. Summary Statistics for ACT Aspire Scale Scores by Grade

Grade Statistic English Mathematics Reading Science Writing ELA STEM Composite

3 Mean 416.9 412.37 412.17 414.27 422.76 417.27 413.57 414.05
3 SD 5.97 3.74 5.27 5.95 6.65 5 4.45 4.61
3 N 5,534 5,534 5,534 5,534 5,534 5,534 5,534 5,534

4 Mean 420 415.19 414.46 417.12 423.09 419.18 416.41 416.82
4 SD 6.07 3.92 5.6 6.49 7 5.15 4.6 4.69
4 N 5,354 5,354 5,354 5,354 5,354 5,354 5,354 5,354

5 Mean 422.33 417.24 416.65 419.27 423.02 420.66 418.5 419
5 SD 6.95 4.53 6.16 6.64 7.03 5.59 5.01 5.19
5 N 5,300 5,300 5,300 5,300 5,300 5,300 5,300 5,300

6 Mean 423.89 419.09 417.88 419.78 426.76 422.84 419.69 420.28
6 SD 7.86 5.64 6.75 7.26 7.81 6.28 5.82 5.98
6 N 5,087 5,087 5,087 5,087 5,087 5,087 5,087 5,087

7 Mean 425.37 420.24 419.26 420.8 425.43 423.36 420.78 421.55
7 SD 8.51 6.39 6.67 7.71 7.09 6.25 6.5 6.39
7 N 5,119 5,119 5,119 5,119 5,119 5,119 5,119 5,119

8 Mean 426.81 422.68 420.36 422.62 423.13 423.44 422.89 423.24
8 SD 8.86 7.36 7.17 7.76 6.59 6.5 7.11 6.94
8 N 3,995 3,995 3,995 3,995 3,995 3,995 3,995 3,995

9 Mean 427.86 424.53 421.1 423.67 423.67 424.21 424.35 424.41
9 SD 10.46 8.21 7.78 8.11 7.43 7.6 7.6 7.73
9 N 2,287 2,287 2,287 2,287 2,287 2,287 2,287 2,287

10 Mean 431.2 427.22 422.49 425.6 425.74 426.47 426.66 426.75
10 SD 11.26 9.1 8.37 8.96 8.07 8.21 8.52 8.51
10 N 1,778 1,778 1,778 1,778 1,778 1,778 1,778 1,778

Note. Sample includes students taking English, mathematics, reading, science, and writing tests from the spring 2013 special studies.


Table 17.11. Correlations among ACT Aspire Scale Scores by Grade

Grade English Mathematics Reading Science Writing ELA STEM

3

English

Mathematics 0.62

Reading 0.72 0.63

Science 0.72 0.67 0.76

Writing 0.47 0.46 0.49 0.45

ELA 0.86 0.67 0.85 0.75 0.80

STEM 0.74 0.86 0.77 0.95 0.49 0.79

Composite 0.89 0.80 0.89 0.91 0.53 0.90 0.94

4

English

Mathematics 0.47

Reading 0.69 0.49

Science 0.67 0.53 0.75

Writing 0.46 0.34 0.45 0.42

ELA 0.85 0.52 0.84 0.72 0.80

STEM 0.67 0.80 0.73 0.93 0.44 0.73

Composite 0.86 0.69 0.88 0.90 0.50 0.88 0.93

5

English

Mathematics 0.49

Reading 0.70 0.53

Science 0.70 0.58 0.75

Writing 0.45 0.36 0.47 0.46

ELA 0.86 0.55 0.85 0.76 0.78

STEM 0.69 0.84 0.74 0.93 0.47 0.75

Composite 0.87 0.73 0.88 0.90 0.52 0.90 0.93

6

English

Mathematics 0.60

Reading 0.70 0.57

Science 0.72 0.62 0.78

Writing 0.50 0.40 0.48 0.48

ELA 0.87 0.62 0.85 0.77 0.80

STEM 0.74 0.87 0.76 0.92 0.49 0.78

Composite 0.88 0.78 0.88 0.90 0.54 0.91 0.94

Note. Sample included students taking English, mathematics, reading, science, and writing tests from the spring 2013 special studies.


Table 17.11. (continued)

Grade English Mathematics Reading Science Writing ELA STEM

7

English

Mathematics 0.62

Reading 0.69 0.64

Science 0.68 0.69 0.74

Writing 0.49 0.45 0.47 0.47

ELA 0.89 0.68 0.85 0.75 0.77

STEM 0.71 0.90 0.75 0.93 0.50 0.78

Composite 0.88 0.83 0.87 0.90 0.54 0.91 0.94

8

English

Mathematics 0.69

Reading 0.72 0.68

Science 0.72 0.76 0.76

Writing 0.54 0.52 0.54 0.53

ELA 0.90 0.74 0.88 0.79 0.78

STEM 0.75 0.94 0.77 0.94 0.56 0.81

Composite 0.89 0.87 0.88 0.91 0.60 0.93 0.95

9

English

Mathematics 0.73

Reading 0.77 0.69

Science 0.73 0.73 0.73

Writing 0.64 0.57 0.62 0.57

ELA 0.93 0.75 0.89 0.77 0.83

STEM 0.79 0.93 0.76 0.93 0.61 0.82

Composite 0.92 0.88 0.88 0.89 0.67 0.94 0.95

10

English

Mathematics 0.74

Reading 0.77 0.72

Science 0.74 0.78 0.75

Writing 0.65 0.60 0.60 0.59

ELA 0.93 0.78 0.89 0.79 0.83

STEM 0.79 0.94 0.78 0.94 0.63 0.83

Composite 0.91 0.90 0.89 0.90 0.68 0.94 0.95

Note. Sample included students taking English, mathematics, reading, science, and writing tests from the spring 2013 special studies.


17.2.4 Evidence Regarding Relationships with Conceptually Related Constructs

Evidence regarding relationships with conceptually related constructs refers to evaluating how scores on a test relate to other variables. Two studies summarized below provide evidence regarding the relationships of ACT Aspire scale scores with scale scores on other assessments of similar constructs. One study compares grade 8 ACT Aspire scores to ACT Explore scores and early high school ACT Aspire scores to ACT Plan scores. A second study compares grades 3 to 8 ACT Aspire scores to scores on a state assessment.

17.2.4.1 Study 1: Comparison of ACT Explore and ACT Plan Scores to ACT Aspire Scores

In this study, the relationships between ACT Explore, ACT Plan, and ACT Aspire English, mathematics, reading, and science scale scores and Composite scores were compared using samples of students taking ACT Explore and ACT Aspire (grades 8 and 9) or ACT Plan and ACT Aspire assessments (grade 10). Samples included students who participated in the special 2013 ACT Aspire studies described in Chapter 12 and who had also taken ACT Plan or ACT Explore separately. The sample sizes and demographics by grade and subject in each sample are listed in Table 17.12. The samples included students from a total of 122 districts and 263 schools.

ACT Explore and ACT Plan are both intended to measure academic achievement, similar to ACT Aspire. In fact, ACT Aspire is intended to serve similar purposes and to support similar score interpretations as ACT Explore and ACT Plan: ACT Aspire is the successor to those assessments. ACT Explore and ACT Plan scale scores are intended to be on the same scale but have different scale ranges: ACT Explore scale scores range from 1 to 25 and ACT Plan scale scores range from 1 to 32. ACT Aspire scores are likewise reported on a single common scale, with scale score ranges that differ across grades (see Table 17.13).


Table 17.12. Percentages of Students in the ACT Explore, ACT Plan, and ACT Aspire Matched Sample by Subject, Grade, and Demographics

[For each subject (English, Mathematics, Reading, Science, Composite) and grade (8, 9, 10), the table reports the percentage of students by gender (not specified, female, male), race/ethnicity (not specified; Black/African American; American Indian/Alaskan Native; White; Hispanic/Latino; Asian; Native Hawaiian/Other Pacific Islander; two or more races), and region (East, Midwest, Southwest, West). Sample sizes: English N=7,574 (grade 8), 1,563 (grade 9), 4,643 (grade 10); Mathematics N=7,803, 1,641, 4,545; Reading N=7,594, 1,484, 4,249; Science N=7,779, 1,575, 4,236; Composite N=6,419, 1,181, 3,154.]

Note. Zero indicates a percentage fewer than 1%. Missing values indicate no students. Percentages within a demographic grouping may not sum up to 100% due to rounding.

.


Table 17.13 lists the mean, standard deviation, minimum, and maximum of scale scores for ACT Explore/ACT Plan and ACT Aspire. The mean scale scores increase for ACT Explore/Plan and ACT Aspire across grades 8 to 10. This pattern is consistent with the argument that students at higher grades have higher achievement and therefore should have higher scores.

Table 17.13. Descriptive Statistics for ACT Explore, ACT Plan, and ACT Aspire Scale Scores

Subject Grade N Assessment Mean SD Minimum Maximum
English 8 7,574 ACT Explore 14.62 4.02 1 25
English 8 7,574 ACT Aspire 426.17 8.72 400 452
English 9 1,563 ACT Explore 15.73 4.38 2 25
English 9 1,563 ACT Aspire 426.87 10.75 400 454
English 10 4,643 ACT Plan 17.37 4.65 1 32
English 10 4,643 ACT Aspire 429.58 11.19 400 456
Mathematics 8 7,803 ACT Explore 14.88 3.29 1 25
Mathematics 8 7,803 ACT Aspire 421.22 7.54 401 449
Mathematics 9 1,641 ACT Explore 16.56 3.76 4 25
Mathematics 9 1,641 ACT Aspire 423.85 8.47 403 451
Mathematics 10 4,545 ACT Plan 17.82 4.70 1 32
Mathematics 10 4,545 ACT Aspire 425.10 9.14 403 455
Reading 8 7,594 ACT Explore 14.60 3.82 1 25
Reading 8 7,594 ACT Aspire 420.15 7.29 401 440
Reading 9 1,484 ACT Explore 15.30 4.20 1 25
Reading 9 1,484 ACT Aspire 420.18 7.74 403 442
Reading 10 4,249 ACT Plan 17.10 4.54 1 30
Reading 10 4,249 ACT Aspire 420.95 8.03 403 442
Science 8 7,779 ACT Explore 16.63 3.21 3 25
Science 8 7,779 ACT Aspire 421.82 7.75 402 446
Science 9 1,575 ACT Explore 17.69 3.54 5 25
Science 9 1,575 ACT Aspire 423.90 8.17 402 446
Science 10 4,236 ACT Plan 18.59 3.89 1 32
Science 10 4,236 ACT Aspire 423.97 8.80 401 449
Composite 8 6,419 ACT Explore 15.35 3.16 3 25
Composite 8 6,419 ACT Aspire 422.62 6.95 406 443
Composite 9 1,181 ACT Explore 16.88 3.44 8 25
Composite 9 1,181 ACT Aspire 424.75 7.80 407 445
Composite 10 3,154 ACT Plan 17.89 3.97 6 31
Composite 10 3,154 ACT Aspire 425.22 8.38 408 447


Table 17.14 lists the correlations between same-subject ACT Explore or ACT Plan scale scores and ACT Aspire scale scores for grades 8 to 10 (sample sizes are listed in Table 17.13). Correlations ranged from .65 (grade 10 reading) to .85 (grade 8 Composite), which are indicative of moderate to strong linear relationships between ACT Explore/ACT Plan and ACT Aspire. From a linear regression perspective, we can say that 42% to 72% of the variance in scale scores is shared between ACT Explore/ACT Plan and ACT Aspire.

Table 17.14. Correlations between ACT Explore/ACT Plan Scale Scores and ACT Aspire Scale Scores

Sample/Grade English Mathematics Reading Science Composite

ACT Explore-ACT Aspire/8 .75 .72 .70 .69 .85

ACT Explore-ACT Aspire/9 .76 .75 .66 .70 .82

ACT Plan-ACT Aspire/10 .78 .77 .65 .69 .84
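The shared-variance figures quoted above are simply squared correlations. A one-line check (illustrative code, not part of the study):

# Shared variance between two tests is the squared correlation; the
# endpoints reported above, r = .65 and r = .85, give the quoted
# 42% and 72% of variance shared.
for r in (0.65, 0.70, 0.85):
    print(f"r = {r:.2f} -> shared variance = {r * r:.0%}")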

Table 17.15 lists the disattenuated correlations between ACT Explore or ACT Plan scale scores and ACT Aspire scale scores (sample sizes are listed in Table 17.13). Disattenuated correlations are estimates of the linear relationships between scores after the reliability of each test was taken into account.6 In classical test theory, disattenuated correlations are referred to as estimates of the relationship between true scores; they provide an estimate of the relationship between ACT Explore/ACT Plan and ACT Aspire as if each contributing score were perfectly reliable. Published or available reliability coefficients for ACT Explore (ACT, 2013a), ACT Plan (ACT, 2013b), and ACT Aspire (see Table 17.16) were used to calculate disattenuated correlations, which ranged from .75 to .91 across subjects, indicative of moderate to strong correlations. From a linear regression perspective, we can say that 56% to 82% of the variance in true scores is shared between ACT Explore/ACT Plan and ACT Aspire.

Table 17.15. Disattenuated Correlations between ACT Explore/ACT Plan Scale Scores and ACT Aspire Scale Scores

Sample/Grade English Mathematics Reading Science Composite

ACT Explore-ACT Aspire/8 .89 .88 .82 .83 .89

ACT Explore-ACT Aspire/9 .86 .89 .76 .82 .85

ACT Plan-ACT Aspire/10 .88 .91 .75 .81 .88

6 Disattenuated correlations are calculated as the correlation divided by the square root of the product of the reliabilities, or r_xy / √(r_xx · r_yy).
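A minimal sketch of the calculation in footnote 6; the function name is ours. Plugging in the grade 8 reading values from Tables 17.14 and 17.16 (observed r = .70; reliabilities .86 for ACT Explore 8 and .85 for ACT Aspire 8) reproduces the .82 reported in Table 17.15.

import math

def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability (footnote 6):
    r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Grade 8 reading: observed .70, reliabilities .86 and .85
print(round(disattenuate(0.70, 0.86, 0.85), 2))  # 0.82, matching Table 17.15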


Table 17.16. Scale Score Reliabilities

Test and Grade Level English Mathematics Reading Science Composite

ACT Aspire 8* .85 .88 .85 .87 .96

ACT Aspire Early High School* .91 .89 .88 .88 .97

ACT Explore 8† .84 .76 .86 .79 .94

ACT Explore 9† .86 .80 .86 .82 .95

ACT Plan 10‡ .87 .80 .85 .82 .95

* Obtained from spring 2013 reliabilities reported above in the Reliability section
† Obtained from the ACT Explore Technical Manual (ACT, 2013a)
‡ Obtained from the ACT Plan Technical Manual (ACT, 2013b)

Disattenuated correlations also provide an estimated upper limit for the observed correlations in Table 17.14. With this in mind, many of the moderate observed correlations in Table 17.14 are not far below their disattenuated values, suggesting that, given the reliability of the contributing tests, they could not have been much higher than observed. For example, the smallest observed correlations in Table 17.14 were between reading tests (.70, .66, and .65), but the disattenuated correlations were roughly .10 higher (.82, .76, and .75).

Figures 17.2–17.6 display the relationships between ACT Explore or ACT Plan scale scores and ACT Aspire scale scores using box plots. Each box represents the distribution of ACT Explore or ACT Plan scale scores for a particular ACT Aspire scale score, with the box itself spanning the 25th to the 75th percentile of ACT Explore or ACT Plan scores, the line in the middle of the box representing the 50th percentile (or median), and the diamond representing the mean. The upper and lower whiskers of the box plots extend 1.5 times the interquartile range (i.e., the difference between the 25th and 75th percentiles) beyond the box, and the circles represent individual scores outside of the range covered by the whiskers. As ACT Aspire scale scores increase (horizontal axis), the boxes representing the ACT Explore or ACT Plan scale score distributions also generally move higher (vertical axis), illustrating the positive relationship between scale scores.
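For readers who prefer the box-plot anatomy just described in computational terms, this short sketch (illustrative, not the plotting code used for the figures) computes the quantities behind each box: the mean, the median, the quartile box, and the ±1.5 × IQR whisker fences beyond which points are drawn as circles.

import numpy as np

def box_plot_summary(scores):
    """Quantities behind each box in Figures 17.2-17.6."""
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    iqr = q3 - q1
    return {
        "mean": scores.mean(),        # plotted as the diamond
        "median": median,             # line inside the box
        "box": (q1, q3),              # 25th to 75th percentile
        "whiskers": (q1 - 1.5 * iqr,  # scores outside these fences
                     q3 + 1.5 * iqr), # appear as outlier circles
    }

print(box_plot_summary(np.array([12, 14, 15, 15, 16, 17, 18, 25])))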


Figure 17.2. Box plots of ACT Explore or ACT Plan English scale scores conditional on ACT Aspire English scale scores

Grade 8

Grade 9


Figure 17.2. (continued)

Grade 10

Figure 17.3. Box plots of ACT Explore or ACT Plan mathematics scale scores conditional on ACT Aspire mathematics scale scores

Grade 8


Figure 17.3. (continued)

Grade 9

Grade 10


Figure 17.4. Box plots of ACT Explore or ACT Plan reading scale scores conditional on ACT Aspire reading scale scores

Grade 8

Grade 9


Figure 17.4. (continued)

Grade 10

Figure 17.5. Box plots of ACT Explore or ACT Plan science scale scores conditional on ACT Aspire science scale scores

Grade 8


Figure 17.5. (continued)

Grade 9

Grade 10


Figure 17.6. Box plots of ACT Explore or ACT Plan Composite scale scores conditional on ACT Aspire Composite scale scores

Grade 8

Grade 9


Figure 17.6. (continued)

Grade 10

Together, the correlations and box plots show that ACT Explore/ACT Plan and ACT Aspire scores are moderately to strongly related. Because ACT Explore/ACT Plan and ACT Aspire are designed to measure student achievement and have similar purposes, we would expect to observe relatively strong, but not perfect, positive relationships between scale scores across assessments, similar to those observed in Tables 17.14 and 17.15. The observed correlations were lower than anticipated, and this appeared to be explained by the reliabilities of the tests tempering the relationships between scores.

In addition to considering the relationships between scale scores across ACT Explore or ACT Plan and ACT Aspire for the same subject, we can consider the pattern of relationships across subjects (in this case, English, mathematics, reading, and science) and assessments (ACT Explore/ACT Plan and ACT Aspire). This analysis is commonly referred to as investigating a multitrait-multimethod matrix (Campbell & Fiske, 1959). This matrix is intended to provide convergent and divergent evidence regarding the subjects (traits) being measured by different assessments (methods).

Table 17.17 contains a multitrait-multimethod matrix that includes ACT Explore/ACT Plan and ACT Aspire English, mathematics, reading, and science scores for grades 8 to 10. Test reliabilities are reported in parentheses when a row and column contain the same subject and same test (for example, ACT Explore reading with ACT Explore reading)7 and the remaining cells contain correlations among scores. Ideally, we would include a variety of assessments measuring performance in different subjects to study effects due to methods and those due to traits. However, availability of traits and methods was limited to ACT Explore/ACT Plan scores and ACT Aspire English, reading, mathematics, and science scores for this analysis.

Table 17.17. Multitrait-Multimethod Matrices for ACT Explore/ACT Plan and ACT Aspire Scale Scores by Grade Level

ACT Explore/ACT Plan ACT Aspire

Grade (valid N) Scale Score English Mathematics Reading Science English Mathematics Reading Science

Grade 8 (6,419)

ACT Explore English (.84)

ACT Explore Mathematics .68 (.76)

ACT Explore Reading .76 .63 (.86)

ACT Explore Science .70 .67 .71 (.79)

ACT Aspire English .76 .62 .69 .63 (.85)

ACT Aspire Mathematics .69 .72 .63 .66 .71 (.88)

ACT Aspire Reading .68 .57 .70 .63 .73 .67 (.85)

ACT Aspire Science .69 .64 .68 .69 .72 .77 .76 (.87)

Grade 9 (1,181)

ACT Explore English (.86)

ACT Explore Mathematics .70 (.80)

ACT Explore Reading .75 .66 (.86)

ACT Explore Science .69 .70 .73 (.82)

ACT Aspire English .75 .62 .70 .64 (.91)

ACT Aspire Mathematics .64 .73 .60 .67 .70 (.89)

ACT Aspire Reading .62 .55 .65 .60 .75 .67 (.88)

ACT Aspire Science .64 .64 .63 .68 .73 .78 .75 (.88)

Grade 10 (3,154)

ACT Plan English (.87)

ACT Plan Mathematics .73 (.80)

ACT Plan Reading .78 .66 (.85)

ACT Plan Science .74 .76 .71 (.82)

ACT Aspire English .77 .63 .68 .68 (.91)

ACT Aspire Mathematics .69 .77 .62 .73 .74 (.89)

ACT Aspire Reading .67 .57 .65 .64 .76 .69 (.88)

ACT Aspire Science .66 .67 .63 .71 .73 .78 .75 (.88)

Note. Cells in parentheses are coefficient alpha reliabilities. Others are correlations.

7 These reliabilities are coefficient alpha.


For convergent evidence that ACT Aspire and ACT Explore/ACT Plan are measuring English, mathematics, reading, and science achievement, we want to see large positive monotrait-heteromethod correlations, which are represented by the correlation between ACT Explore/ACT Plan and ACT Aspire in Table 17.17 (see bold text). For discriminant evidence, we want to see smaller positive heterotrait-monomethod correlations, which are represented by all of the correlations between subjects within a given test (ACT Explore, ACT Plan, or ACT Aspire) in Table 17.17. Finally, we want to see the smallest correlations for heterotrait-heteromethod correlations, which are represented by the correlations between different subjects across different tests. To support interpreting ACT Explore/ACT Plan and ACT Aspire test scores as measuring similar academic achievement in a subject and distinct academic achievements in different subjects, we want to see stronger convergent evidence (monotrait-heteromethod correlations) and weaker discriminant evidence (heterotrait-heteromethod and heterotrait-monomethod correlations).
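The cell taxonomy in the preceding paragraph can be made concrete with a small sketch (function and label names are illustrative): given the trait (subject) and method (assessment) for a pair of scores, it labels the corresponding cell of the multitrait-multimethod matrix.

def mtmm_cell_type(trait_a, method_a, trait_b, method_b):
    """Classify one correlation cell of a multitrait-multimethod matrix."""
    if trait_a == trait_b and method_a != method_b:
        return "monotrait-heteromethod (convergent evidence; want largest)"
    if trait_a != trait_b and method_a == method_b:
        return "heterotrait-monomethod (method effects; want smaller)"
    if trait_a != trait_b and method_a != method_b:
        return "heterotrait-heteromethod (want smallest)"
    return "reliability diagonal"

print(mtmm_cell_type("Reading", "ACT Explore", "Reading", "ACT Aspire"))
print(mtmm_cell_type("Reading", "ACT Aspire", "English", "ACT Aspire"))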

For English, the monotrait-heteromethod correlations (ACT Explore/ACT Plan English and ACT Aspire English) were all relatively large (above .75) and were among the largest in Table 17.17. Within ACT Aspire, the heterotrait-monomethod correlations for English were also relatively large although none were larger than the monotrait-heteromethod correlations. Within ACT Explore and ACT Plan, the heterotrait-monomethod correlations for English were relatively large, and for grades 8 to 10 reading the correlations were the same as or larger than the monotrait-heteromethod correlations. Most of the heterotrait-heteromethod correlations were the lowest but were not that much lower than the heterotrait-monomethod correlations. Higher alphas for ACT Aspire than for ACT Explore/ACT Plan indicated slightly stronger convergent evidence for ACT Aspire than for ACT Explore/ACT Plan. Finally, the large heterotrait-monomethod correlations indicate that the English test was not strongly differentiated from the other subjects.

For mathematics, the monotrait-heteromethod correlations ranged from .72 to .77. Within ACT Aspire, the heterotrait-monomethod correlations for mathematics were moderate to large, ranging from .67 to .78. In particular, the correlations between ACT Aspire science and ACT Aspire mathematics were slightly larger (.77 to .78) than the correlations between ACT Aspire mathematics and reading or English. Within ACT Explore/ACT Plan the heterotrait-monomethod correlations for mathematics were moderate to large but all were smaller than the monotrait-heteromethod correlations. The heterotrait-heteromethod correlations were smaller than the other correlations. Higher alphas for the ACT Aspire suggested that ACT Aspire showed slightly stronger convergent evidence than ACT Explore/ACT Plan. Notably, ACT Aspire mathematics and science were most closely related.


For reading, the monotrait-heteromethod correlations ranged from .65 to .70. For ACT Explore/ACT Plan and ACT Aspire, the heterotrait-monomethod correlations between reading and English and between reading and science were larger than the monotrait-heteromethod correlations. In addition, for ACT Aspire, the correlation between reading and mathematics in grade 10 was larger than the correlation between reading tests for ACT Plan and ACT Aspire. The heterotrait-heteromethod correlations for grade 10 English and reading were larger than the monotrait-heteromethod correlation for grade 10 reading. The ACT Aspire and ACT Explore/ACT Plan reading tests showed similar patterns of relatively weak convergent evidence compared to discriminant evidence; the reading tests showed particularly strong method effects.

For science, the monotrait-heteromethod correlations ranged from .68 to .71. Nearly all of the heterotrait-monomethod correlations were larger than the monotrait-heteromethod correlations. The one exception was the grade 8 ACT Explore mathematics and science correlation, which was .67. The heterotrait-heteromethod correlations were smaller than the heterotrait-monomethod correlations, although for grade 8 the ACT Aspire science and ACT Explore English correlation was the same as the monotrait-heteromethod correlation for science (.69), and for grade 10 the ACT Plan science and ACT Aspire mathematics correlation (.73) was larger than the monotrait-heteromethod correlation for science (.71).

The multitrait-multimethod matrix for ACT Explore/ACT Plan and ACT Aspire provided evidence regarding how well a subject (trait) was measured by different assessments (methods) and how well the subjects were differentiated from one another. This evidence was mixed. English showed the strongest convergent evidence, particularly for ACT Aspire, followed by mathematics, reading, and science. ACT Aspire mathematics and science showed particularly strong relationships. While ACT Explore/ACT Plan and ACT Aspire are intended to measure the same traits, the multitrait-multimethod results are consistent with the argument that the assessments may be systematically different in ways that lead to stronger relationships among scores within each assessment compared to between the assessments, particularly for reading and science. One explanation is that ACT Aspire forms contain a variety of item types, including technology-enhanced and fill-in-the-blank items (online) and constructed-response items. The fact that neither the ACT Explore/ACT Plan nor the ACT Aspire English tests contain constructed-response items may explain why English showed the strongest convergent evidence among the subjects.

While there may be some degree of method effects, the results of this study support the argument that ACT Aspire is measuring achievement in English, mathematics, reading, and science in grades 8, 9, and 10. The correlations of ACT Aspire scores with ACT Explore/ACT Plan tests, which are intended to measure achievement and have been separately validated, are moderate to large. The magnitudes of these correlations are similar to those observed between ACT Explore and ACT Plan and between ACT Explore and the ACT in other technical documentation (e.g., ACT, 2014b, p. 87).


17.2.4.2 Study 2: Comparison of State Assessment Scores to ACT Aspire Scores

In this study, the relationships between Alabama reading and mathematics test scores (plus science; ARMT+) and ACT Aspire scale scores were compared using a sample of Alabama students taking both assessments in spring of 2013 in grades 3–8.8 This sample included students participating in 2013 ACT Aspire special studies described in other chapters. The sample sizes by grade and subject are listed in Table 17.18; this sample included roughly 5% to 15% of the tested population in Alabama.

ARMT+ and ACT Aspire are intended to assess student achievement. ARMT+ was designed to assess students’ mastery of state content standards in mathematics, reading, and science. ACT Aspire is designed to measure students’ academic achievement in English, mathematics, reading, science, and writing. While each test is built using different blueprints and different specifications, the traits intended to be measured by both include mathematics, reading, and science.

Table 17.19 lists the means, standard deviations, minimums, and maximums for ARMT+ and ACT Aspire scale scores by grade and subject. In all but three cases, mean scale scores increase across grades. For grades 6 and 7, the ACT Aspire reading mean scale score decreases slightly from 418.46 to 418.08.9 For grades 5 and 6 mathematics, the ARMT+ mean scale score decreases from 676.56 to 667.36 and for grades 5 and 7 science, ARMT+ mean scale scores decrease from 596.87 to 563.94. However, scores on ARMT+ are not strictly interpreted assuming vertical scale properties. For this reason, grade-to-grade comparisons of ARMT+ and ACT Aspire scores with data from 2013 administrations were not entirely appropriate. In this study, we compared ARMT+ scores and ACT Aspire scores within grades although many tables and figures include multiple grade levels to save space.

8 Alabama science scores were only available for grades 5 and 7.
9 This decrease in mean scale scores is consistent with results from other samples of students, including those included in the scaling study for ACT Aspire.


Table 17.18. Sample Sizes by Grade and Subject for the Sample of Students with Scores on ARMT+ and ACT Aspire in Spring 2013

Grade Mathematics Reading Science Total

3 9,020 9,198 — 9,688

4 9,185 9,249 — 9,849

5 8,188 8,231 7,209 9,011

6 6,257 6,242 — 6,695

7 5,060 5,058 4,702 5,423

8 4,045 4,306 — 4,475

Total 41,755 42,284 11,911 45,141

Table 17.19. Descriptive Statistics for ARMT+ and ACT Aspire Scale Scores

Subject      Grade  N      Assessment   Mean    SD      Minimum  Maximum
Mathematics  3      9,020  Alabama      635.27  39.77   503      781
                           ACT Aspire   412.18  3.91    400      429
Mathematics  4      9,185  Alabama      649.85  44.12   471      865
                           ACT Aspire   414.91  3.87    402      434
Mathematics  5      8,188  Alabama      676.56  38.18   560      846
                           ACT Aspire   416.85  4.78    402      437
Mathematics  6      6,257  Alabama      667.36  34.28   563      823
                           ACT Aspire   418.12  5.53    402      445
Mathematics  7      5,060  Alabama      678.25  42.23   573      873
                           ACT Aspire   419.10  6.59    401      445
Mathematics  8      4,045  Alabama      696.32  33.05   606      835
                           ACT Aspire   420.21  7.58    401      449
Reading      3      9,198  Alabama      634.60  37.59   471      785
                           ACT Aspire   412.28  5.24    401      429
Reading      4      9,249  Alabama      651.80  37.70   513      789
                           ACT Aspire   414.60  5.64    401      431
Reading      5      8,231  Alabama      663.56  35.77   549      849
                           ACT Aspire   416.45  6.19    401      434
Reading      6      6,242  Alabama      671.91  35.70   538      802
                           ACT Aspire   418.46  6.66    401      436
Reading      7      5,058  Alabama      681.49  33.33   554      787
                           ACT Aspire   418.08  6.69    402      438
Reading      8      4,306  Alabama      681.71  31.50   577      818
                           ACT Aspire   420.32  7.19    401      440
Science      5      7,209  Alabama      596.87  103.50  136      999
                           ACT Aspire   418.31  6.43    401      438
Science      7      4,702  Alabama      563.94  106.92  42       999
                           ACT Aspire   418.58  7.42    401      440


Table 17.20 lists the correlations between same-subject ARMT+ scale scores and ACT Aspire scale scores across grades. Correlations ranged from .60 to .81, with higher correlations observed as grade increased for mathematics and science. These correlations are indicative of moderate to strong linear relationships between ARMT+ and ACT Aspire scores across subjects. From a linear regression perspective, we can say that 36% to 66% of the variance in scale scores is shared between ARMT+ and ACT Aspire.10

Table 17.21 lists the disattenuated correlations between ARMT+ and ACT Aspire. Published or available reliability coefficients for ARMT+ and ACT Aspire were used to calculate disattenuated correlations, which ranged from .71 to .90 and are indicative of moderate to strong correlations. From a linear regression perspective, we can say that 50% to 81% of the variance in true scores is shared between ARMT+ and ACT Aspire.
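The disattenuation step is the standard correction for attenuation. As a worked check against the values reported here (grade 3 mathematics: observed r = .60 from Table 17.20, ARMT+ alpha = .92 and ACT Aspire alpha = .77 from Table 17.22):

```latex
\[
  \hat{\rho}_{T_X T_Y} \;=\; \frac{r_{XY}}{\sqrt{r_{XX'}\,r_{YY'}}},
  \qquad
  \frac{.60}{\sqrt{(.92)(.77)}} \approx \frac{.60}{.84} \approx .71 .
\]
```

Squaring either correlation gives the shared-variance figures quoted in the text (e.g., .71² ≈ .50, or 50% of true score variance shared).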

Table 17.20. Correlations between ARMT+ Scale Scores and ACT Aspire Scale Scores

Grade Mathematics Reading Science

3 .60 .73 —

4 .62 .77 —

5 .64 .76 .65

6 .64 .74 —

7 .72 .72 .69

8 .81 .75 —

Table 17.21. Disattenuated Correlations between ARMT+ Scale Scores and ACT Aspire Scale Scores

Grade Mathematics Reading Science

3 .71 .84 —

4 .77 .88 —

5 .79 .87 .73

6 .77 .85 —

7 .84 .84 .76

8 .90 .85 —

10 Percentage of variance shared is calculated by squaring the correlation between scores.


Figures 17.7–17.9 display the relationships between ARMT+ scale scores and ACT Aspire scale scores using box plots. Each box represents the distribution of ARMT+ scale scores for a particular ACT Aspire scale score. The important pattern to observe in Figures 17.7 to 17.9 is that as ACT Aspire scale scores increase (horizontal axis), the boxes representing the ARMT+ scale score distributions also generally rise (vertical axis).
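A minimal sketch of how such conditional box plots can be produced (simulated data and assumed column names, not the actual study data):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Simulate paired scores: one box of ARMT+ scores is drawn for each
# observed ACT Aspire scale score, as in Figures 17.7-17.9.
rng = np.random.default_rng(0)
n = 2000
aspire = rng.integers(400, 430, size=n)                    # ACT Aspire scale scores
armt = 635 + 2.5 * (aspire - 412) + rng.normal(0, 30, n)   # correlated ARMT+ scores

df = pd.DataFrame({"aspire": aspire, "armt": armt})
keys = sorted(df["aspire"].unique())
groups = [g["armt"].values for _, g in df.groupby("aspire")]  # groupby sorts keys

plt.boxplot(groups)
plt.xticks(range(1, len(groups) + 1), keys)
plt.xlabel("ACT Aspire scale score")
plt.ylabel("ARMT+ scale score")
plt.show()
```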

Together, the correlations and box plots show that ARMT+ and ACT Aspire scores are moderately to strongly related. Because ARMT+ and ACT Aspire are both designed to measure student achievement, but each is a distinct assessment built to different test specifications with its own unique scale, we would expect to observe moderate to strong, but not perfect, positive relationships between ARMT+ scale scores and ACT Aspire scale scores.

Table 17.22 contains a multitrait-multimethod matrix which includes scores in mathematics, reading, and science for ARMT+ and ACT Aspire. Test reliabilities are reported in parentheses when a row and a column contain the same subject and same test (e.g., ARMT+ reading with ARMT+ reading) and the remaining cells contain correlations between scores. Ideally, we would include a variety of assessments measuring performance in different subjects to study effects due to methods and due to traits. However, availability of methods and traits was limited to scores in mathematics, reading, and science for ARMT+ and ACT Aspire in this analysis.

Figure 17.7. Box plots of ARMT+ mathematics scale scores conditional on ACT Aspire mathematics scale scores (panels for grades 3–8; plots not reproduced).

Figure 17.8. Box plots of ARMT+ reading scale scores conditional on ACT Aspire reading scale scores (panels for grades 3–8; plots not reproduced).

Figure 17.9. Box plots of ARMT+ science scale scores conditional on ACT Aspire science scale scores (panels for grades 5 and 7; plots not reproduced).


Table 17.22. Multitrait-Multimethod Matrices for ARMT+ and ACT Aspire Scale Scores by Grade Level

Alabama ACT Aspire

Grade (valid N) Scale Score Mathematics Reading Science Mathematics Reading Science

Grade 3 (8,530)

Alabama Mathematics (.92)

Alabama Reading .71 (.90)

ACT Aspire Mathematics .60 .63 (.77)

ACT Aspire Reading .56 .73 .65 (.84)

Grade 4 (8,585)

Alabama Mathematics (.93)

Alabama Reading .71 (.92)

ACT Aspire Mathematics .62 .57 (.69)

ACT Aspire Reading .62 .77 .57 (.84)

Grade 5 (6,543)

Alabama Mathematics (.92)

Alabama Reading .73 (.91)

Alabama Science .70 .73 (.92)

ACT Aspire Mathematics .63 .57 .51 (.71)

ACT Aspire Reading .65 .76 .63 .58 (.83)

ACT Aspire Science .70 .73 .65 .64 .75 (.86)

Grade 6 (5,804)

Alabama Mathematics (.92)

Alabama Reading .73 (.91)

ACT Aspire Mathematics .63 .53 (.76)

ACT Aspire Reading .64 .74 .54 (.84)

Grade 7 (4,345)

Alabama Mathematics (.94)

Alabama Reading .73 (.90)

Alabama Science .75 .74 (.94)

ACT Aspire Mathematics .71 .59 .60 (.78)

ACT Aspire Reading .68 .72 .69 .63 (.82)

ACT Aspire Science .73 .70 .69 .69 .76 (.88)

Grade 8 (3,876)

Alabama Mathematics (.93)

Alabama Reading .75 (.92)

ACT Aspire Mathematics .82 .70 (.88)

ACT Aspire Reading .64 .75 .67 (.85)

Note. Cells in parentheses are coefficient alpha reliabilities. Others are correlations.


For convergent evidence that ACT Aspire and ARMT+ are measuring mathematics, reading, and science achievement, we want to see large positive monotrait-heteromethod correlations, which are represented by the correlation between ARMT+ and ACT Aspire scores in the same subject (e.g., ARMT+ reading and ACT Aspire reading; see bold text in Table 17.22). For discriminant evidence, we want to see smaller positive heterotrait-monomethod correlations, which are represented by all of the correlations among subjects within a given test (ARMT+ or ACT Aspire) in Table 17.22. Finally, we want to see the smallest correlations for heterotrait-heteromethod correlations, which are represented by the correlations between different subjects across different tests. To support interpreting ARMT+ and ACT Aspire test scores as measuring similar academic achievement in a subject and distinct academic achievements in different subjects, we want to see stronger convergent evidence (monotrait-heteromethod correlations) and weaker discriminant evidence (heterotrait-heteromethod and heterotrait-monomethod correlations).

In mathematics, we see some divergent evidence for ARMT+ in all grades except grade 8. For example, in each of grades 3–7, ARMT+ mathematics correlated more highly with ARMT+ reading than with ACT Aspire mathematics (heterotrait-monomethod). For grades 5 and 7, ARMT+ mathematics correlated more highly with ARMT+ science than with ACT Aspire mathematics. In addition, in some cases ARMT+ mathematics correlated more highly with ACT Aspire reading and science than with ACT Aspire mathematics (heterotrait-heteromethod). ACT Aspire mathematics scores also showed some divergent evidence. For example, ACT Aspire grade 3 mathematics scores were correlated more with ACT Aspire reading scores than with ARMT+ mathematics scores.

In reading, we see that the monotrait-heteromethod correlations between ARMT+ and ACT Aspire were relatively large (all are above .7), and for all but grades 7 and 8 these were the largest correlations in the matrices. Moreover, almost all of the heterotrait-monomethod correlations were smaller than the monotrait-heteromethod correlations. The heterotrait-heteromethod correlations were the smallest most of the time. One anomaly in reading was that for grade 7, the correlation between reading test scores on ARMT+ and ACT Aspire was not as large as the correlation between (a) ARMT+ reading and ARMT+ science and (b) ACT Aspire reading and ACT Aspire science. In other words, the method effect is slightly stronger for grade 7 reading than the trait effect, at least when reading and science were considered. In general, though, these patterns of correlations indicate stronger convergent evidence than discriminant evidence across reading tests for ARMT+ and ACT Aspire.

In science, we see that the monotrait-heteromethod correlations between ARMT+ and ACT Aspire were not as large as many of the heterotrait-monomethod correlations. For example, for grade 5, the science correlation between ARMT+ and ACT Aspire was .65, all of the heterotrait-monomethod correlations for ARMT+ (i.e., those in the first three rows for grade 5) were .7 or higher, and the correlation between ACT Aspire reading and science was .75. In science, we see weaker convergent evidence than discriminant evidence, particularly for ARMT+. However, these results may not be surprising given that science is likely to draw on both reading and mathematics.

One possible explanation for the divergent evidence observed in science and mathematics is the effect of the test reliabilities on the correlations among variables. Table 17.23 lists the disattenuated correlations among ARMT+ and ACT Aspire scale scores. As mentioned earlier, these correlations are an estimate of the true score relationships after the reliabilities of the two scores are taken into account. If we consider the disattenuated correlations as an upper limit on the observed correlations between scores (i.e., those in Table 17.22), then the correlations shown in Table 17.23 are quite strong. For example, the correlation between ACT Aspire grade 4 mathematics and reading was .57 with a disattenuated correlation of .75, which was one of the largest differences between observed and disattenuated correlations. In addition, scanning through Table 17.23 reveals that the pattern of convergent evidence is improved and the divergent evidence somewhat weaker using disattenuated correlations, indicating that test reliability may explain part of the stronger divergent evidence for mathematics and science. For example, grade 7 ARMT+ mathematics was most strongly correlated with ARMT+ science and reading in Table 17.22 (divergent evidence), but the largest disattenuated correlation for grade 7 ARMT+ mathematics was between mathematics scores for ARMT+ and ACT Aspire.

To summarize, we see relatively strong positive relationships between ACT Aspire and ARMT+ scores in the same subject, particularly if we account for the reliabilities of scores. In addition, we see stronger convergent evidence that ARMT+ and ACT Aspire are measuring similar traits in reading. In mathematics and science, we see divergent evidence in grades 3–7 indicative of method effects. The results are consistent with an interpretation of mathematics and science subjects as being particularly distinct from one another across the ARMT+ and ACT Aspire tests, but the reliabilities of some tests likely exaggerated such divergence. Test design differences could also explain the divergent evidence observed in science. ARMT+ science only included selected-response items, whereas ACT Aspire also included constructed-response items. However, mathematics and reading tests for ARMT+ and ACT Aspire contained selected-response items and constructed-response items, so this explanation may not apply to what was observed for mathematics tests.


Table 17.23. Disattenuated Correlations between ARMT+ and ACT Aspire Scale Scores by Grade Level

Alabama ACT Aspire

Grade (valid N) Scale Score Mathematics Reading Science Mathematics Reading Science

Grade 3 (8,530)

Alabama Mathematics

Alabama Reading .78

ACT Aspire Mathematics .71 .76

ACT Aspire Reading .64 .84 .81

Grade 4 (8,585)

Alabama Mathematics

Alabama Reading .77

ACT Aspire Mathematics .77 .72

ACT Aspire Reading .70 .88 .75

Grade 5 (6,543)

Alabama Mathematics

Alabama Reading .80

Alabama Science .76 .76

ACT Aspire Mathematics .78 .78 .63

ACT Aspire Reading .74 .87 .72 .76

ACT Aspire Science .79 .83 .73 .82 .89

Grade 6 (5,804)

Alabama Mathematics

Alabama Reading .80

ACT Aspire Mathematics .75 .64

ACT Aspire Reading .73 .85 .68

Grade 7 (4,345)

Alabama Mathematics

Alabama Reading .79

Alabama Science .80 .80

ACT Aspire Mathematics .83 .70 .70

ACT Aspire Reading .77 .84 .79 .79

ACT Aspire Science .80 .79 .76 .83 .89

Grade 8 (3,876)

Alabama Mathematics

Alabama Reading .81

ACT Aspire Mathematics .91 .78

ACT Aspire Reading .72 .85 .77


From a validity perspective, these results support the argument that ACT Aspire and ARMT+ measure academic achievement in mathematics, reading, and science. However, these results also support the argument that ARMT+ and ACT Aspire are not the same and likely to measure different aspects within the broad domain of academic achievement in each subject area.

17.2.5 Evidence Regarding Relationships with Criteria

Evidence regarding relationships with criteria refers to evaluating the relationship between scores on test items compared to one or more criterion variables. One of the primary interpretations of ACT Aspire scores is to identify students' readiness on (a) a college readiness trajectory and (b) a career readiness trajectory. This implies that ACT Aspire scores can successfully predict college readiness and career readiness. Some evidence regarding relationships with college and career readiness is available in Chapters 14 (growth) and 15 (career readiness). Additional evidence using results from growth analysis that includes predictions between ACT Aspire and the ACT scores is included in Allen (2014). However, evidence from longitudinal data connecting performance on ACT Aspire to eventual performance in college and career is not yet available for ACT Aspire.

17.2.6 Evidence Based on Consequences of Tests

Evidence based on consequences of tests refers to evaluating whether unintended consequences are influenced by or attributable to unintended characteristics or under-representation of the intended construct for particular students (or groups of students).

For example, cognitive characteristics like academic achievement, as measured by ACT Aspire, are not the only characteristics indicative of readiness on a college and career readiness trajectory. Other characteristics, such as noncognitive variables (e.g., behaviors, attitudes, and education and career planning skills), have also been shown to contribute to student success (Mattern, Burrus, Camara, O'Connor, Hansen, Gambrell, Casillas, & Bobek, 2014). An unintended consequence of using ACT Aspire performance as indicative of readiness for college and career may be to perpetuate an overemphasis on the cognitive contribution and an underemphasis on important, but frequently difficult to measure, noncognitive contributions.

Providing instructionally actionable information to educators may have unintended consequences. If used in an extreme way, such information could lead to what is often referred to as “teaching to the test”. ACT Aspire is a summative assessment of academic achievement that is built to particular specifications that are based on standards and ultimately sample from broad domains (English, mathematics, reading, science, and writing). However, the collection of items that appear on the ACT Aspire test forms are a sampling of content, as defined by the test specifications, from the broad content domain. There is a risk that if the instructionally actionable information is too specific, it may lead educators to pay less attention to or ignore those aspects of the domain that are less heavily emphasized by the ACT Aspire test specifications.


Another aspect of unintended consequences is potential performance differences across particular groupings or characteristics of students. Where such differences occur, they would need to be carefully considered to ensure that they are not explainable by some unintended characteristic. This will be an ongoing area of inquiry for ACT Aspire.

In each of these examples of unintended consequences, we need to evaluate the extent to which construct underrepresentation or unintended characteristics of the construct contribute to these consequences. At this early point in the usage of ACT Aspire assessments, it is difficult to adequately evaluate consequences because data for evaluating consequences are not always immediately available. However, the entire testing cycle, from determining the content domains to evaluating the results from test administrations, has been established to reduce or eliminate problematic sources of construct underrepresentation or unintended characteristics. Descriptions of these principles, processes, and procedures are included in various chapters of this manual, including test development, test administration, reporting, and scaling.

In addition, unintended consequences of ACT Aspire scores cannot always be anticipated, so the examples and descriptions above should be viewed as a sampling of unintended consequences rather than a comprehensive list. ACT will continue to review and monitor consequences and uses of ACT Aspire performance and collect data that can be used to evaluate unintended consequences.

17.3 Summary

This chapter provided evidence regarding the validity of ACT Aspire scores. The interpretation of ACT Aspire scale scores as measures of academic achievement is foundational to the primary and secondary interpretations of ACT Aspire performance. However, much work on the validation of ACT Aspire score interpretations for proposed uses remains. This is especially true for a relatively new testing program like ACT Aspire, where the body of evidence must be established.

This chapter presented and summarized two studies that provided evidence regarding the relationships of ACT Aspire scores with scores on other assessments. However, several types of evidence need to be addressed directly to support further validation of ACT Aspire scores, including evidence based on test content, evidence based on response processes, evidence based on internal test structure, evidence based on relations to other variables (specifically, test-criterion relationships), and evidence considering the consequences of testing. Evidence will continue to be collected to support the interpretations of ACT Aspire scores.


CHAPTER 18

Fairness

This chapter contains evidence to address the fairness of ACT Aspire assessments. Three aspects of fairness are considered: fairness during the testing process, fairness in test accessibility, and fairness in score interpretations for intended uses.

18.1 Overview

Fairness is a fundamental consideration in the interpretation of test scores for particular uses. However, the term fairness is used in many ways and does not have a single technical definition. In addition, a comprehensive evaluation of fairness would consider the full range of goals implied by a testing program, which could extend beyond the interpretations of scores for particular uses. Such goals could include social goals and legal requirements (AERA et al., 2014; Camilli, 2006).

This chapter adopts the perspective on fairness included in The Standards for Educational and Psychological Testing (AERA et al., 2014), which focuses on tests, testing, and test use when evaluating fairness. Three aspects of fairness are considered: fairness during the testing process, fairness in test accessibility, and fairness in score interpretations for intended uses. Each aspect of fairness is described in more detail in separate sections below.

Similar to validity, evaluation of fairness is ongoing, with evidence being collected and reported as it becomes available. Evidence regarding fairness with ACT Aspire assessments is not limited to this chapter and can be drawn from other chapters of this manual and other ACT Aspire documentation, although we have attempted to present or reference relevant available evidence for evaluating fairness here.


18.2 Fairness During the Testing Process

Fairness during the testing process refers to students being tested in a way that optimizes their capability of showing what they know and can do. In other words, the entire testing process, from test design to scoring, facilitates students being able to perform their best and does not adversely affect the performance of particular students (or groups of students). ACT Aspire design, development, and scoring are conducted in a way that attempts to be sensitive to issues of fairness. Earlier chapters regarding test development, scoring, accessibility, scores, and reporting illustrate this commitment to fairness during the testing process.

18.3 Fairness in Test Accessibility

Accessibility in the context of fairness refers to the extent to which test takers can access the knowledge, skills, and/or abilities intended to be measured by the test without being unduly burdened by aspects of the test or test administration that may affect or limit access. For example, a student with a visual impairment may not be able to appropriately answer questions on a mathematics achievement test if the test material cannot be seen. ACT Aspire includes a variety of accessibility options for students; additional details can be found in Chapter 5.

18.4 Fairness in Score Interpretations for Intended Uses

The validity of score interpretations for the intended population served by an assessment is another consideration for fairness. It involves studying the validity of test score interpretations for different groupings of students but also implies considering individuals, especially when scores are interpreted for individual students.

When evaluating validity in the context of fairness, it is particularly important to consider the extent to which characteristics of the test may be under-represented or irrelevant to score interpretations for particular uses and may differentially affect groups taking the test. These characteristics are often referred to as construct under-representation or construct-irrelevant components (e.g., AERA et al., 2014) and may lead to scores having different meaning, and therefore differential validity, across groups. Evaluation of validity across groups can involve evidence from such analyses as differential item functioning, differential test functioning, and predictive bias (e.g., AERA et al., pp. 51–52). Empirical evidence across groupings of test takers is not yet available for inclusion in this technical manual. However, the development of ACT Aspire tests includes processes and procedures that attempt to reduce characteristics of the tests that could adversely affect the validity of test score interpretations across groups (e.g., see Chapter 2).
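As one illustration of the kind of analysis named above, the sketch below computes a Mantel-Haenszel common odds ratio for differential item functioning on simulated data; this is a generic textbook procedure, not necessarily the procedure ACT uses:

```python
import numpy as np

def mantel_haenszel_odds_ratio(item, group, total):
    """item: 0/1 correct; group: 0 = reference, 1 = focal; total: matching score.
    Returns the MH common odds ratio across score strata (~1.0 means no DIF)."""
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        a = np.sum((item == 1) & (group == 0) & m)  # reference correct
        b = np.sum((item == 0) & (group == 0) & m)  # reference incorrect
        c = np.sum((item == 1) & (group == 1) & m)  # focal correct
        d = np.sum((item == 0) & (group == 1) & m)  # focal incorrect
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    return num / den if den else float("nan")

# Simulated examinees whose item success depends only on total score,
# so the odds ratio should come out near 1 (no DIF).
rng = np.random.default_rng(1)
total = rng.integers(0, 30, 5000)
group = rng.integers(0, 2, 5000)
item = rng.random(5000) < 1 / (1 + np.exp(-(total - 15) / 4))
print(mantel_haenszel_odds_ratio(item.astype(int), group, total))
```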

Evaluating fairness at the individual test score level is particularly important for tests like ACT Aspire, where score interpretations may be applied at the individual level and uses may have implications for individual test takers. There are two aspects of validity at the individual student level. One is whether the test can support interpretations of scores for intended uses that involve the individual student. For example, ACT Aspire scores are intended to provide instructionally actionable information, which could imply making inferences for individual students. Collection of data to evaluate such evidence is ongoing for ACT Aspire, and relevant validity evidence can be found in Chapter 17.

A second aspect of validity at the individual student level is that the interpretation of a score for a particular student is valid for its intended use. For example, test users should consider scores and score interpretations from ACT Aspire in the context of what they know about their students and consider how valid they judge particular interpretations of scores for intended uses to be for each student. It may be that a teacher, parent, student, or other test user knows that scores are not appropriately reflective of a student's achievement or that a student needs additional supports (e.g., test accommodations) for the student's scores to be appropriately interpreted. Ideally, student needs during testing can be addressed so that interpretations of scores for the student are valid for their intended uses, but test users should keep validity in mind when interpreting individual student performance. Best practices suggest using multiple sources of information when making decisions that affect individuals.


CHAPTER 19

ACT Aspire Mode Comparability Study

This chapter describes a mode comparability study comparing ACT Aspire paper and online forms. The study included data from a special administration of ACT Aspire that involved a randomly equivalent group design. Because online and paper test forms are not 100% the same, test forms are statistically linked prior to obtaining scale scores to adjust for differences across mode. This chapter examines potential mode effects for the same items across modes and examines the comparability of scale scores after statistical linking. The results indicate that it is not reasonable to ignore possible mode effects, but after linking, scale scores across modes appeared comparable.

19.1 Introduction

A mode comparability study was undertaken to investigate ACT Aspire performance across online and paper modes. Most test forms included in this study were not identical across modes. This led us to assume that scores from paper and online forms were not interchangeable without first statistically linking test forms across modes.

Two primary purposes of the comparability study included (1) determining whether mode of administration affected student performance on common (i.e., identical) items across modes prior to scaling and (2) investigating the comparability of ACT Aspire scale scores, which were obtained after linking paper and online forms. The overarching question to address with the comparability study was whether the validity of the interpretations of test scores is similar across paper and online test forms. In this chapter we describe comparisons of raw number-of-points scores (on identical items), item-level performance, and scale scores across modes.


Studying the effects of mode on performance of collections of identical items gives us an idea of direction and degree of differences due to mode. This, in turn, can help us determine whether performance on collections of identical items across modes could be considered interchangeable without statistical linking across modes to moderate potential mode effects.

Investigating the comparability of scale scores after linking forms across modes, where items on forms are not 100% the same but may have most of the same items, provides us with evidence regarding (a) the effectiveness of the linking and (b) the apparent interchangeability of scale scores across forms administered in different modes. When a testing program maintains test forms administered in different modes and scores are to be used interchangeably across forms, it is incumbent on those maintaining the test to show that scale scores are indeed comparable across forms (e.g., see standard 5.17 of the Standards for Educational and Psychological Testing; AERA et al., 2014, p. 106).
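The chapter does not spell out the linking method here. As an illustration only, a minimal unsmoothed equipercentile linking of raw scores (simulated data; whether this matches the actual ACT Aspire procedure is not specified) looks like this:

```python
import numpy as np

def equipercentile_link(scores_from, scores_to):
    """Map each raw score on the 'from' form to the 'to' form score with the
    same percentile rank (no presmoothing; a simplified sketch)."""
    grid = np.arange(scores_from.min(), scores_from.max() + 1)
    pct = np.array([np.mean(scores_from <= s) for s in grid])  # percentile ranks
    return grid, np.quantile(scores_to, pct)                   # matched scores

# Simulated raw scores on two 40-point forms, the paper form slightly easier.
rng = np.random.default_rng(3)
online = rng.binomial(40, 0.55, 1500)
paper = rng.binomial(40, 0.60, 1500)
grid, linked = equipercentile_link(online, paper)  # online score -> paper metric
```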

19.2 Method and Results

Test materials included one online and one paper ACT Aspire English, mathematics, reading, science, and writing test form for grades 3–10. At least 80% of test items were the same1 across modes and consisted of multiple-choice and constructed-response item types.2 However, most online test forms also contained a small number of technology-enhanced and fill-in-the-blank items that were not amenable to paper-based presentation. Paper forms contained analogous multiple-choice items covering the same content, but these items were not considered identical across forms due to item-type differences.3 Table 19.1 shows the percentages of items considered different across modes. We handled these nonidentical items differently depending on the purpose of our analysis. For studying mode effects on raw number-of-points scores, we excluded such items and focused on the common (identical) items to gauge mode effects. For studying the comparability of scale scores across forms (by mode) we included the nonidentical items because scale scores are obtained using all operational items on a test form.

Participants in this study included students in grades 3 through 10 from a sample representing 13 states, 72 districts, and 108 schools. Table 19.2 identifies sample sizes for each subject by grade and mode. Sample sizes ranged from 645 (online grade 8 writing) to 1,539 (paper grade 3 mathematics). Additional details regarding the samples are provided in Tables 19.3–19.7, specifically percentages of demographic groupings by grade and mode.

1 Items contained identical content and required a consistent response process (e.g., selecting an option) from examinees.
2 English forms did not contain constructed-response items.
3 A separate study will explore nonoverlapping items across paper and online modes.


Table 19.1. Percentage of Items Different across Online and Paper Forms

Grade

Subject 3 4 5 6 7 8 EHS

English 16% 20% 16% 9% 6% — —

Mathematics 20% 16% 12% 18% 15% 8% 5%

Reading 8% 8% 13% 8% 13% — —

Science 11% 14% 14% 13% 13% 9% 13%

Writing — — — — — — —

Note. Technology-enhanced items and fill-in-the-blank items were included in online forms, with selected-response analogs on paper forms. — = all items identical across modes (writing was based on a single prompt).

Table 19.2. Sample Sizes by Grade and Subject

Grade

Subject

3 4 5 6 7 8 EHS

O P O P O P O P O P O P O P

English 1,359 1,405 1,480 1,489 1,349 1,346 1,129 1,166 1,019 1,033 866 887 1,176 1,209

Mathematics 1,488 1,539 1,534 1,532 1,450 1,427 1,156 1,194 974 1,010 896 927 1,075 1,098

Reading 1,441 1,513 1,399 1,401 1,299 1,294 1,004 1,038 1,010 1,037 800 831 857 919

Science 1,376 1,405 1,435 1,444 1,378 1,379 1,047 1,040 1,028 1,042 913 934 959 1,023

Writing 1,189 1,268 1,260 1,280 1,101 1,102 840 885 825 836 645 658 698 767

Note. O = online form, P = paper form.


19.3 Spring 2013 Comparability Study Demographic Summary Tables

Table 19.3. Percentages of Students Taking English Tests in the Comparability Study by Grade, Mode, and Demographics

[Table values not reproduced. For each grade (3–8 and EHS) and mode (sample sizes as in Table 19.2), the table reported percentages by gender (not specified, female, male), race/ethnicity (not specified, Black/African American, American Indian/Alaskan Native, White, Hispanic/Latino, Asian, Native Hawaiian/Other Pacific Islander, two or more races), and geographic region (East, Midwest, Southwest, West).]

Note. O = online, P = paper. Zero indicates fewer than 0.5%. Missing values indicate no students for a cell. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 19.4. Percentages of Students Taking Mathematics Tests in the Comparability Study by Grade, Mode, and Demographics

[Table values not reproduced. For each grade (3–8 and EHS) and mode (sample sizes as in Table 19.2), the table reported percentages by gender, race/ethnicity, and geographic region, using the same groupings as Table 19.3.]

Note. O = online, P = paper. Zero indicates fewer than 0.5%. Missing values indicate no students for a cell. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 19.5. Percentages of Students Taking Reading Tests in the Comparability Study by Grade, Mode, and Demographics

[Table values not reproduced. For each grade (3–8 and EHS) and mode (sample sizes as in Table 19.2), the table reported percentages by gender, race/ethnicity, and geographic region, using the same groupings as Table 19.3.]

Note. O = online, P = paper. Zero indicates fewer than 0.5%. Missing values indicate no students for a cell. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 19.6. Percentages of Students Taking Science Tests in the Comparability Study by Grade, Mode, and Demographics

[Table values not reproduced. For each grade (3–8 and EHS) and mode (sample sizes as in Table 19.2), the table reported percentages by gender, race/ethnicity, and geographic region, using the same groupings as Table 19.3.]

Note. O = online, P = paper. Zero indicates fewer than 0.5%. Missing values indicate no students for a cell. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 19.7. Percentages of Students Taking Writing Tests in the Comparability Study by Grade, Mode, and Demographics

[Table values not reproduced. For each grade (3–8 and EHS) and mode (sample sizes as in Table 19.2), the table reported percentages by gender, race/ethnicity, and geographic region, using the same groupings as Table 19.3.]

Note. O = online, P = paper. Zero indicates fewer than 0.5%. Missing values indicate no students for a cell. Percentages within a demographic grouping may not add up to 100% due to rounding.


Students were recruited to complete an ACT Aspire test in one or more subjects. Students at each grade within a school were randomly assigned to take either the paper or the online version of the test. This design is called a randomly equivalent groups design and ensured that recruited students had an equal chance of being assigned the online or paper test within a school. If students testing in each mode are equivalent, we can attribute observed differences in performance to differences in testing mode, not to differences in groups of students (or a combination of group and performance differences).

To help ensure our analysis included adequately balanced samples of students testing in each mode, schools were included in the analysis sample only if the ratio of students testing in one mode compared to the other within the school was less than two, implying that fewer than twice as many students tested in one mode versus the other. This data cleaning rule led to excluding fewer than 10% of students from analysis for most grades and subjects.4 Additional details regarding the samples, including demographic characteristics, are included in Tables 19.3–19.7.
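A minimal sketch of this cleaning rule (assumed column names; made-up data): a school is kept only if, within it, the larger mode count is less than twice the smaller one.

```python
import pandas as pd

def balanced_schools(df):
    """Keep schools where the within-school online:paper ratio is below 2."""
    counts = df.groupby(["school", "mode"]).size().unstack(fill_value=0)
    # Schools testing in only one mode (minimum count 0) fail the rule outright.
    keep = counts.index[(counts.min(axis=1) > 0) &
                        (counts.max(axis=1) < 2 * counts.min(axis=1))]
    return df[df["school"].isin(keep)]

df = pd.DataFrame({
    "school": ["A"] * 30 + ["B"] * 30,
    "mode":   ["online"] * 12 + ["paper"] * 18 + ["online"] * 25 + ["paper"] * 5,
})
print(balanced_schools(df)["school"].unique())  # ['A'] -- school B fails the rule
```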

19.4 Mode Effects for Raw Scores on Common Items

We investigated student raw score performance by summing scores across items that were the same across paper and online forms (i.e., excluding those items mentioned earlier that differed in item type across modes). English and writing showed more consistent evidence of mode effects across grades compared to mathematics, reading, and science. But for some subjects and grades, raw scores appeared to differ across modes, and for others they did not. It is apparent from these results that it was not safe to assume raw scores and raw score distributions on identical items across modes were comparable.

Tables 19.8–19.12 summarize the aggregated raw score statistics for the common items across modes, including score moments (mean, standard deviation, skewness, and kurtosis), effect sizes (standardized differences in mean scores), and statistical significance tests from tests of equivalence (for details on this test, see Rogers, Howard, & Vessey, 1993; Serlin & Lapsley, 1985; see also the Kolmogorov-Smirnov test of differences in cumulative distributions by subject).
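A sketch of the two statistical checks on simulated scores; the ±1 raw-score-point equivalence margin follows the note to Tables 19.8–19.10, and scipy is assumed available:

```python
import numpy as np
from scipy import stats

# Simulated common-item raw scores for the two modes (not the study data).
rng = np.random.default_rng(2)
online = rng.normal(11.6, 4.1, 1359)
paper = rng.normal(11.4, 4.0, 1405)

# Kolmogorov-Smirnov test of the two cumulative score distributions.
ks_stat, ks_p = stats.ks_2samp(online, paper)

# Test of equivalence: two one-sided t-tests against a +/-1 point margin.
margin = 1.0
diff = online.mean() - paper.mean()
se = np.sqrt(online.var(ddof=1) / online.size + paper.var(ddof=1) / paper.size)
t_lower = (diff + margin) / se   # H0: diff <= -margin
t_upper = (diff - margin) / se   # H0: diff >= +margin
# If t_lower is significantly positive and t_upper significantly negative,
# the online-paper mean difference lies within (-margin, +margin).
print(ks_stat, ks_p, t_lower, t_upper)
```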

4 Up to 22% of students (five schools) were excluded from analysis. Grade 7 science and Grade 8 writing had more than 20% of students excluded. Most excluded schools only tested in one mode.


Table 19.8. English Common-Item Raw Score Summary for Online and Paper Modes

Grade                     3            4            5            6            7            8            EHS
Number of common items    21           20           21           32           33           35           50
Sample size (O/P)         1,359/1,405  1,480/1,489  1,349/1,346  1,129/1,166  1,019/1,033  866/887      1,176/1,209
Mean (O/P)                11.65/10.07  11.44/11.06  11.21/10.61  16.52/14.84  17.44/16.41  21.91/22.06  25.75/25.54
SD (O/P)                  4.14/4.05    3.50/3.77    4.15/4.04    5.32/5.32    5.92/5.85    5.97/6.52    9.96/9.81
Minimum (O/P)             1/1          1/1          0/0          3/0          4/2          5/3          4/3
Maximum (O/P)             21/20        20/20        21/20        32/30        32/33        34/34        49/50
Skewness (O/P)            0.00/0.36    -0.09/-0.11  0.11/0.09    0.14/0.30    0.02/0.24    -0.32/-0.43  0.20/0.16
Kurtosis (O/P)            -0.68/-0.64  -0.65/-0.57  -0.71/-0.67  -0.34/-0.38  -0.62/-0.49  -0.51/-0.54  -0.82/-0.75
Reliability (O/P)         .79/.78      .75/.78      .79/.78      .80/.80      .81/.81      .83/.86      .91/.90
Effect size               0.39         0.11         0.15         0.32         0.18         -0.02        0.02
Kolmogorov-Smirnov test   4.68*        1.45*        1.66*        3.47*        1.90*        1.12         0.56
t-lower/t-upper†          16.58/3.73   10.35/-4.62* 10.14/-2.54* 12.08/3.08   7.84/0.14    2.84/-3.85*  2.99/-1.96*
(df)                      (2,762)      (2,967)      (2,693)      (2,293)      (2,050)      (1,751)      (2,383)

Note. O = online form, P = paper form.
* Statistically significant at .05.
† Statistical significance on the test of equivalence indicates no difference in means across modes. This test is comprised of two one-sided t-tests: t-lower is the t-statistic for the difference in means greater than -1, and t-upper is the t-statistic for the difference in means less than 1. Together, these two statistics test the null hypothesis that the difference in means is less than -1 or greater than 1. If both t-tests are statistically significant, then the difference between online and paper means is between -1 and 1.


Table 19.9. Mathematics Common-Item Raw Score Summary for Online and Paper Modes

Grade                       3             4             5             6             7             8            EHS
Mode                     O      P      O      P      O      P      O      P      O      P      O      P      O      P
Number of common items  20     20     21     21     22     22     28     28     29     29     35     35     36     36
Sample size          1,488  1,539  1,534  1,532  1,450  1,427  1,156  1,194    974  1,010    896    927  1,075  1,098
Raw score
  Mean               12.42  12.61   9.53   9.48  10.00   9.99  12.27  12.01  13.31  13.91  18.96  19.75  17.60  18.14
  SD                  4.68   4.77   3.46   3.39   3.49   3.34   4.13   4.11   5.51   5.57   7.16   7.18   8.21   8.15
  Minimum                0      1      1      0      1      0      2      1      2      1      1      1      3      1
  Maximum               26     27     25     27     26     23     31     29     34     32     42     39     44     45
  Skewness            0.17   0.16   0.60   0.62   0.58   0.32   0.53   0.36   0.61   0.55   0.25   0.03   0.48   0.45
  Kurtosis           -0.29  -0.41   1.00   1.12   0.69   0.24   0.51   0.25   0.20  -0.03  -0.37  -0.55  -0.47  -0.46
  Reliability          .76    .75    .64    .62    .67    .63    .74    .73    .82    .83    .87    .87    .88    .87
Effect size             -0.04          0.02          0.00          0.06         -0.11         -0.11         -0.07
Stat. test
  Kolmogorov-Smirnov     1.05          0.43          0.96          0.84          1.25          1.47*         0.73
  Test of equivalence
  t-lower/t-upper†   4.74/-6.91*  8.54/-7.63*  7.92/-7.78*  7.42/-4.34*   1.62/-6.41    0.63/-5.32    1.32/-4.38
  (df)                 (3,025)       (3,064)       (2,875)       (2,348)       (1,982)       (1,821)       (2,171)

Note. O = online form, P = paper form.
* Statistically significant at .05.
† See the Table 19.8 note for the construction of the test of equivalence.


Table 19.10. Reading Common-Item Raw Score Summary for Online and Paper Modes

Grade                       3             4             5             6             7             8            EHS
Mode                     O      P      O      P      O      P      O      P      O      P      O      P      O      P
Number of common items  22     22     22     22     21     21     22     22     21     21     24     24     24     24
Sample size          1,441  1,513  1,399  1,401  1,299  1,294  1,004  1,038  1,010  1,037    800    831    857    919
Raw score
  Mean               12.93  13.99  13.30  14.32  12.94  12.82  14.56  14.69  11.96  12.31  16.39  15.96  15.17  15.20
  SD                  5.75   5.68   5.66   5.69   5.13   5.03   5.44   5.15   5.20   5.31   6.25   6.41   6.87   6.99
  Minimum                1      1      1      1      1      1      2      2      1      0      1      1      1      1
  Maximum               26     26     26     26     25     24     26     26     25     24     29     30     30     31
  Skewness            0.09  -0.15   0.10  -0.14   0.05  -0.08  -0.11  -0.21   0.15   0.04  -0.04  -0.16   0.01  -0.02
  Kurtosis           -1.01  -0.88  -0.91  -0.87  -0.82  -0.81  -0.91  -0.82  -0.81  -0.94  -0.88  -0.86  -1.07  -1.10
  Reliability          .85    .84    .83    .83    .83    .83    .83    .81    .80    .81    .83    .84    .87    .88
Effect size             -0.19         -0.18          0.02         -0.03         -0.07          0.07          0.00
Stat. test
  Kolmogorov-Smirnov     2.76*         2.48*         0.52          0.68          0.96          0.73          0.47
  Test of equivalence
  t-lower/t-upper†  -0.31/-9.81  -0.09/-9.42   5.59/-4.44*  3.68/-4.85*  2.79/-5.81*  4.57/-1.81*  2.94/-3.14*
  (df)                 (2,952)       (2,798)       (2,591)       (2,040)       (2,045)       (1,629)       (1,774)

Note. O = online form, P = paper form.
* Statistically significant at .05.
† See the Table 19.8 note for the construction of the test of equivalence.


Table 19.11. Science Common-Item Raw Score Summary for Online and Paper Modes

Grade                       3             4             5             6             7             8            EHS
Mode                     O      P      O      P      O      P      O      P      O      P      O      P      O      P
Number of common items  25     25     24     24     24     24     28     28     28     28     29     29     28     28
Sample size          1,376  1,405  1,435  1,444  1,378  1,379  1,047  1,040  1,028  1,042    913    934    959  1,023
Raw score
  Mean               12.43  12.69  13.94  14.15  15.17  14.97  17.25  17.04  14.95  15.18  16.80  16.93  13.18  14.04
  SD                  7.07   6.77   5.63   5.74   6.28   6.25   7.30   7.30   7.59   7.34   7.11   7.28   6.80   7.20
  Minimum                0      1      1      1      1      1      2      1      1      2      2      2      2      1
  Maximum               32     32     30     30     32     29     35     35     35     33     36     36     35     36
  Skewness            0.52   0.52   0.27   0.23   0.02   0.02   0.21   0.09   0.42   0.35   0.19   0.15   0.76   0.61
  Kurtosis           -0.60  -0.51  -0.54  -0.57  -0.80  -0.84  -0.79  -0.86  -0.80  -0.88  -0.78  -0.71  -0.21  -0.33
  Reliability          .88    .88    .84    .84    .85    .85    .88    .88    .89    .89    .87    .87    .87    .87
Effect size             -0.04         -0.04          0.03          0.03         -0.03         -0.02         -0.12
Stat. test
  Kolmogorov-Smirnov     1.22          0.71          0.54          0.59          0.89          0.59          1.69*
  Test of equivalence
  t-lower/t-upper†   2.81/-4.81*  3.73/-5.71*  5.02/-3.36*  3.77/-2.48*  2.34/-3.75*  2.60/-3.37*   0.46/-5.89
  (df)                 (2,779)       (2,877)       (2,755)       (2,085)       (2,068)       (1,845)       (1,980)

Note. O = online form, P = paper form.
* Statistically significant at .05.
† See the Table 19.8 note for the construction of the test of equivalence.


Table 19.12. Writing Common-Item Raw Score Summary for Online and Paper Modes

Grade                       3             4             5             6             7             8            EHS
Mode                     O      P      O      P      O      P      O      P      O      P      O      P      O      P
Number of common items   1      1      1      1      1      1      1      1      1      1      1      1      1      1
Sample size          1,189  1,268  1,260  1,280  1,101  1,102    840    885    825    836    645    658    698    767
Raw score
  Mean               11.11  11.81  11.18  10.72  11.44  12.21  13.55  14.48  12.38  13.29  12.18  11.94  12.38  12.27
  SD                  3.53   3.01   3.56   3.13   3.74   3.27   3.95   3.44   3.87   3.33   3.5    3.18   3.65   3.57
  Minimum                4      4      4      4      4      4      4      4      4      4      4      4      4      4
  Maximum               20     20     20     20     20     20     23     24     24     24     23     24     24     24
  Skewness            0.21   0.03  -0.07   0.23   0.08  -0.13  -0.14  -0.37   0.27   0.17  -0.20  -0.01   0.02   0.04
  Kurtosis           -0.37  -0.20  -0.52  -0.33  -0.28  -0.16  -0.32  -0.03   0.09   0.21  -0.20   0.31   0.12   0.10
Effect size             -0.21          0.14         -0.22         -0.25         -0.25          0.07          0.03
Stat. test
  Kolmogorov-Smirnov     3.27*         2.40*         2.65*         3.12*         2.64*         1.59*         0.86
  Test of equivalence
  t-lower/t-upper†  2.29/-12.85* 10.92/-4.11*  1.49/-11.87   0.39/-10.84   0.49/-10.86  6.67/-4.14*  5.87/-4.72*
  (df)                 (2,455)       (2,538)       (2,201)       (1,723)       (1,659)       (1,301)       (1,463)

Note. O = online form, P = paper form.
* Statistically significant at .05.
† See the Table 19.8 note for the construction of the test of equivalence.


Figure 19.1 displays the cumulative percentage of students at each raw score by mode for each subject and grade. The solid curve represents the online form and the dashed curve represents the paper form. When one of the plotted cumulative percentage curves is to the left of the other, it indicates that this group scored lower relative to the group to the right (or, conversely, that the group to the right scored higher). If the curves cross, or if their relative position varies across raw scores, the cumulative percentage of students between groups varies across scores. The distance between the two curves indicates the magnitude of the difference between modes (the difference in cumulative percentage in the vertical direction, the difference in raw score in the horizontal direction). For example, for grade 3 English, the online cumulative percentage curve is to the right of the paper curve, which implies that online students scored higher. Reading up from a raw score point of 10, approximately 40% of online students scored 10 or lower, whereas approximately 60% of paper students scored 10 or lower. By subtracting these percentages from 100, one could say that approximately 60% of online students scored above a 10, but about 40% of students testing on paper scored above a 10. A statistical test of the differences in cumulative distributions is provided by the Kolmogorov-Smirnov test listed in Tables 19.8–19.12.
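A minimal sketch of how the cumulative percentage curves and the accompanying distribution test might be computed (Python with SciPy; the variable names are ours, and the manual's tabled Kolmogorov-Smirnov values appear to be on a scaled metric rather than the raw D statistic that ks_2samp returns):

    import numpy as np
    from scipy import stats

    def cumulative_percentages(scores, max_score):
        """Percentage of students scoring at or below each raw-score point 0..max_score."""
        scores = np.asarray(scores)
        points = np.arange(max_score + 1)
        return points, np.array([100.0 * np.mean(scores <= x) for x in points])

    # One curve per mode reproduces a panel of Figure 19.1; the two-sample
    # Kolmogorov-Smirnov test then compares the full distributions:
    # d_stat, p_value = stats.ks_2samp(online_scores, paper_scores)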

Figure 19.1. Cumulative percentage of students for common item raw scores across modes
(Panels: English, mathematics, reading, science, and writing; grades 3–8 and early high school.)

English raw scores and distributions showed evidence of differences across modes in grades 3, 5, 6, and 7, with means differing by more than one point, effect sizes between .15 and .39,⁵ statistically significant differences in score distributions, and tests of equivalence that did not establish similarity of mean scores. Although the evidence was not consistent across statistics, grade 4 did show some sign of mode effects, with a statistically significant Kolmogorov-Smirnov test indicating that score distributions differed (see also Figure 19.1 for a plot of the score distributions). Where differences were evident, students testing online scored higher than students testing on paper.

Mathematics raw scores and distributions showed evidence of differences across modes in grades 7, 8, and EHS, with the paper group scoring higher than the online group. The tests of equivalence did not indicate that mean scores were statistically similar, and score distributions differed statistically at grade 8 (see Figure 19.1, which plots the difference). However, the magnitudes of the differences across modes were relatively small; means differed by less than one point and effect sizes were within ±.15.

Reading raw scores and distributions showed evidence of differences across modes in grades 3 and 4, with the paper group scoring higher than the online group. Means differed by more than one point, effect sizes were −.19 and −.18, differences in score distributions were statistically significant, and similarities in mean scores were not statistically significant.

Science raw scores and distributions did not appear to differ across modes except for the EHS forms, where distributions were statistically different and means were not statistically similar. Even there, the difference was not large; means favored paper by less than one point and the effect size was −.12.

For writing raw scores and distributions, results showed differences across modes for all forms except EHS. Score distributions differed statistically in grades 3–8, and tests of equivalence did not show evidence of similarity of means for grades 3, 4, 5, 6, or 7. However, means differed by less than one point for all but grade 7. Effect sizes were within ±.15 for grades 4, 8, and EHS and were between −.20 and −.30 for grades 3, 5, 6, and 7. Paper tests scored higher than online tests in grades 3, 5, 6, and 7, and online tests scored higher than paper tests in grades 4 and 8.

In many cases, the different methods of checking mode effects for raw number-of-points scores on common items flagged slightly different grades within each subject as showing statistically significant mode effects. Although the effects were typically not large, statistically significant differences across modes were observed in each subject area.

5 We judged effect sizes within ±.15 to be small to negligible.


Our interpretation of the comparisons of raw number-of-points scores based on identical items across modes is that mode effects observed for these ACT Aspire forms should not be ignored. Score differences were not always observed, and when they were observed they were generally not large, but we would argue that it is not reasonable to assume no mode effects across collections of identical items. If two ACT Aspire forms did contain 100% overlapping items, we would recommend additional statistical linking to adjust for mode effects. However, as mentioned earlier, most ACT Aspire online and paper forms are not 100% identical, so regardless of mode effects, best practices would involve linking to place forms on the same scale (Kingston, 2009).

19.5 Item-Level Performance across Modes

To examine the comparability of item-level performance across modes, two sets of results are presented. Plots of item statistics provide an overview of the patterns of item-level performance across modes, and item statistics are then compared across modes to gauge the degree to which individual item statistics were unusually high or low relative to the other items.

Item statistics. Figures 19.2–19.5 display the mean number of points correct⁶ for common items across online (horizontal axis) and paper (vertical axis) forms. For multiple choice items, these statistics are often referred to as p-values. Each letter in the plot represents an item (M = multiple choice, X = constructed response). The plots illustrate patterns in item difficulty across modes. For example, if mean item performance did not differ across modes (i.e., online item mean = paper item mean), items would fall along the diagonal of the plot from lower left to upper right. Or, if one mode were harder than the other, all of the items would cluster above or below an imaginary line drawn through the diagonal (online = paper). We are unlikely to observe either pattern exactly, but these plots provide a picture of the relationship between items across modes and help identify (a) the strength of the relationship in item difficulty across modes and (b) whether particular items might be considered unusually difficult or easy in one mode compared to the other.
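The paired statistics behind these plots might be assembled as in the sketch below (Python; item_scores_online, item_scores_paper, and max_points are hypothetical data structures, not objects defined in this manual):

    import numpy as np

    def mean_proportion_of_points(item_scores, max_points):
        """For multiple choice (0/1) items this is the classical p-value; for
        constructed-response items, the item mean divided by its maximum points."""
        return float(np.mean(item_scores)) / max_points

    # One (online, paper) point per common item, ready to scatter-plot:
    # pairs = {item: (mean_proportion_of_points(item_scores_online[item], max_points[item]),
    #                 mean_proportion_of_points(item_scores_paper[item], max_points[item]))
    #          for item in max_points}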

Looking over the plots for English, mathematics, reading, and science (even though the scales on the English plots have a smaller range), it is clear that item mean differences across modes tended to be larger in English than in each of the other subjects. This is consistent with the raw number-of-points results described above, where English showed more evidence of mode effects. In addition, a few items appeared unusually difficult in one mode, and the most pervasive differences appeared in English, particularly in grades 3 and 6. To look more closely at differential effects across modes at the item level, we used delta plots, which are described next.

6 For multiple choice items, these statistics are commonly referred to as p-values, which correspond to the proportion of students answering an item correctly.


Delta plots. Figure 19.6 displays delta plots of items across online and paper forms. Delta is a transformed item p-value⁷ that can be used to compare item difficulty statistics across examinee groupings (Angoff & Ford, 1973) and to identify items that function differently across groups. In the case of comparability across modes, delta plots can be used to identify items that show unusually high or low performance in one mode relative to the other items common across modes.

Deltas are calculated by first transforming the p-value (item difficulty) to a normal deviate (z) and then scaling z to a mean of 13 and a standard deviation of 4. Delta values can be plotted by group (in this case, online on one axis and paper on the other), and a prediction line (or "delta hat") can be estimated from the observed delta values (see Angoff & Ford, 1973, for details). In addition, confidence bands indicating one and two standard errors above and below the predicted delta value can be estimated and used to identify items that perform statistically differently across modes.

For our purposes here, we flagged items as performing differently when they fell outside the two-standard-error band, which corresponds to roughly a 95% confidence interval. We were primarily interested in the pattern of delta values across items and in the number of items flagged as performing differently across modes. Given that the number of items per form ranged from 20 to 50 (see Table 19.8), we would expect to flag one or two items per plot by chance alone.
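As an illustration, the delta transformation and one common major-axis version of the flagging rule can be written as follows (Python; a sketch assuming the usual Angoff and Ford construction, whose standard-error band details may differ from what the manual's plots use):

    import numpy as np
    from scipy import stats

    def deltas(p_values):
        """Angoff delta: inverse-normal transform of item difficulty, scaled to mean 13, SD 4."""
        p = np.clip(np.asarray(p_values, float), 0.001, 0.999)  # guard against 0/1
        return 13.0 - 4.0 * stats.norm.ppf(p)                   # harder items -> larger delta

    def delta_plot_flags(p_online, p_paper, n_se=2.0):
        """Flag items outside +/- n_se standard errors of the major-axis (principal) line."""
        x, y = deltas(p_online), deltas(p_paper)
        sx2, sy2, sxy = x.var(ddof=1), y.var(ddof=1), np.cov(x, y, ddof=1)[0, 1]
        slope = (sy2 - sx2 + np.sqrt((sy2 - sx2) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
        intercept = y.mean() - slope * x.mean()
        # Perpendicular distance of each item from the major-axis line.
        dist = (y - intercept - slope * x) / np.sqrt(1.0 + slope ** 2)
        return np.abs(dist) > n_se * dist.std(ddof=1)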

In general, all but one or two items fell within plus or minus two standard errors of the predicted delta value. The forms with two flagged items were those with relatively larger numbers of items (the number of items increases with grade level). Therefore, almost all grades and subjects showed little evidence that items were performing differently across modes.

The delta plots for the English early high school forms had four items that were flagged as performing differently across modes. For three of the items, the online version appeared easier (or the paper version harder), and for one item the paper version appeared easier (or the online version harder). To put these differences in perspective, Table 19.13 lists the p-values for the flagged English EHS items.⁸ The item that appeared easier on paper (harder online) came earlier in the test, and the items that appeared easier online (harder on paper) were the last three items on the test. A review of the items indicated that they did indeed appear to be the same, except for differences due to mode of administration. For the last three items, it is possible that the paper EHS English test ended up being slightly more speeded than the online version, which resulted in those items being flagged as unusually difficult on paper.

7 See previous footnote for p-value explanation. For extended/constructed response items, item means were converted to mean proportion of points before converting to delta.

8 The p-values and p-value differences for non-flagged items are not included in Table 19.13, but the differences between online and paper p-values for non-flagged English EHS items ranged from −.10 to .10.


Considering the delta plot results alongside the item statistic plots illustrates two things. First, the differences observed across modes at the raw number-of-points score level were not explainable by specific items performing differentially well (or poorly) in one mode. In fact, with the exception of the English EHS test, few items appeared to perform differentially across modes in the delta plots. Second, although it was not necessarily easy to see in the plots of item-level statistics, there did appear to be a small but consistent mode effect across items for some grades and subjects, as summarized in the earlier tables comparing raw number-of-points scores across modes.

Figure 19.2. English common item mean scores for paper and online forms by item type*

* Note that English has only the multiple choice (MC) item type, which is represented by an "M" in the figure.

Figure 19.3. Mathematics common item mean scores for paper and online forms by item type*

* Note that "M" = multiple choice (MC) and "X" = constructed response (XR) item type in the figure.

Figure 19.4. Reading common item mean scores for paper and online forms by item type*

* Note that "M" = multiple choice (MC) and "X" = constructed response (XR) item type in the figure.

Figure 19.5. Science common item mean scores for paper and online forms by item type*

* Note that "M" = multiple choice (MC) and "X" = constructed response (XR) item type in the figure.


Figure 19.6. Delta plots for online and paper common items with predicted delta and one- and two-standard error bands.

(Panels: English, mathematics, reading, and science; grades 3–8 and early high school.)

Table 19.13. ACT Aspire Spring 2013 Comparability Study p-Values for English Early High School Items Flagged in Delta Plots

Flagged item    Online p-value    Paper p-value    Difference (Online - Paper)
1                    .29               .46                   -.17
2                    .70               .52                    .18
3                    .41               .19                    .22
4                    .50               .35                    .15


19.6 Comparisons of Scale Scores across Modes

We investigated the comparability of ACT Aspire scale scores, which were obtained after linking paper and online forms. Paper forms were linked to online forms using random equivalent groups equipercentile linking (Kolen & Brennan, 2014). Equipercentile methodology has been used extensively with other ACT testing programs, including the operational equating of ACT Aspire (see Chapter 20). The primary purpose of this particular linking was to statistically adjust for differences across forms due to item type and mode.
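The random groups equipercentile link itself reduces to matching percentile ranks. The sketch below (Python) is a bare-bones illustration on integer raw scores; it omits the continuization details and post-smoothing that the operational procedure applies, and it assumes every score point has nonzero frequency.

    import numpy as np

    def percentile_ranks(freqs):
        """Mid-percentile ranks for integer scores 0..K given score frequencies."""
        f = np.asarray(freqs, float) / np.sum(freqs)
        return 100.0 * (np.cumsum(f) - f / 2.0)

    def equipercentile_link(freq_from, freq_to):
        """Map each 'from'-form score to the 'to'-form score with the same percentile rank."""
        pr_from = percentile_ranks(freq_from)
        pr_to = percentile_ranks(freq_to)
        # Interpolate the inverse percentile-rank function of the 'to' form.
        return np.interp(pr_from, pr_to, np.arange(len(freq_to)))

    # paper_to_online = equipercentile_link(paper_freqs, online_freqs)  # one entry per raw score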

Tables 19.14–19.18 summarize scale score statistics, including score moments (mean, standard deviation, skewness, and kurtosis), effect sizes, statistical significance tests from tests of equivalence, and the Kolmogorov-Smirnov test of difference in cumulative distributions for each subject. Figure 19.7 displays the cumulative percentage of students at each scale score by mode for each subject and grade. Figure 19.8 displays box plots of scale scores by mode for each subject and grade.


Table 19.14. English Scale Score Summary for Online and Paper Modes

Grade                       3             4             5             6             7             8            EHS
Mode                     O      P      O      P      O      P      O      P      O      P      O      P      O      P
Number of items         25     25     25     25     25     25     35     35     35     35     35     35     50     50
Sample size          1,359  1,405  1,480  1,489  1,349  1,346  1,129  1,166  1,019  1,033    866    887  1,176  1,209
Scale score
  Mean              416.32 416.38 419.64 419.52 421.79 421.73 424.23 424.22 424.74 424.63 427.61 427.64 428.23 428.05
  SD                  5.87   5.86   6.14   6.13   7.15   7.05   7.73   7.80   8.93   8.95   8.74   8.79  10.75  10.66
  Minimum              401    401    402    405    403    404    402    400    403    400    403    401    400    400
  Maximum              435    434    438    438    442    441    447    446    448    450    448    447    454    456
  Skewness            0.51   0.49   0.08   0.11   0.23   0.25   0.12   0.12  -0.08  -0.07  -0.17  -0.17   0.01   0.01
  Kurtosis           -0.09  -0.05  -0.50  -0.58  -0.44  -0.35  -0.21  -0.24  -0.39  -0.34  -0.40  -0.39  -0.69  -0.60
  Reliability         0.79   0.79   0.76   0.79   0.80   0.79   0.82   0.81   0.83   0.82   0.84   0.86   0.90   0.90
Effect size             -0.01          0.02          0.01          0.00          0.01          0.00          0.02
Stat. test
  Kolmogorov-Smirnov     1.41          1.26          1.42          1.11          1.02          0.94          0.64
  Test of equivalence
  t-lower/t-upper†  -4.79/4.17*  -3.91/4.97*  -3.44/3.87*  -3.07/3.10*  -2.27/2.80*  -2.46/2.32*  -1.87/2.69*
  (df)                 (2,762)       (2,967)       (2,693)       (2,293)       (2,050)       (1,751)       (2,383)

Note. O = online form, P = paper form.
* Statistically significant at .05.
† See the Table 19.8 note for the construction of the test of equivalence.


Table 19.15. Mathematics Scale Score Summary for Online and Paper Modes

Grade                       3             4             5             6             7             8            EHS
Mode                     O      P      O      P      O      P      O      P      O      P      O      P      O      P
Number of items         25     25     25     25     25     25     34     34     34     34     38     38     38     38
Sample size          1,488  1,539  1,534  1,532  1,450  1,427  1,156  1,194    974  1,010    896    927  1,075  1,098
Scale score
  Mean              412.21 412.19 415.25 414.99 417.44 417.37 419.85 419.87 419.73 420.03 422.92 422.65 424.88 425.11
  SD                  3.96   4.04   4.15   4.18   4.83   4.75   5.69   5.68   6.53   6.67   7.92   7.90   8.65   8.73
  Minimum              400    400    403    402    404    403    404    404    404    403    401    403    408    405
  Maximum              425    427    430    433    438    435    445    442    443    442    445    444    451    451
  Skewness            0.06  -0.04   0.09   0.19   0.53   0.39   0.45   0.40   0.46   0.39   0.24   0.32   0.42   0.35
  Kurtosis           -0.02  -0.03   0.26   0.58   0.58   0.36   0.27   0.20   0.08   0.02  -0.37  -0.41  -0.50  -0.56
  Reliability         0.79   0.77   0.68   0.66   0.70   0.69   0.78   0.77   0.84   0.83   0.89   0.88   0.89   0.88
Effect size              0.00          0.06          0.02          0.00         -0.05          0.03         -0.03
Stat. test
  Kolmogorov-Smirnov     1.59*         2.49*         1.95*         1.50*         0.98          0.92          1.03
  Test of equivalence
  t-lower/t-upper†  -6.75/6.99*  -4.94/8.36*  -5.17/6.03*  -4.33/4.20*  -4.39/2.35*  -1.97/3.43*  -3.29/2.07*
  (df)                 (3,025)       (3,064)       (2,875)       (2,348)       (1,982)       (1,821)       (2,171)

Note. O = online form, P = paper form.
* Statistically significant at .05.
† See the Table 19.8 note for the construction of the test of equivalence.


Table 19.16. Reading Scale Score Summary for Online and Paper Modes

Grade                       3             4             5             6             7             8            EHS
Mode                     O      P      O      P      O      P      O      P      O      P      O      P      O      P
Number of items         24     24     24     24     24     24     24     24     24     24     24     24     24     24
Sample size          1,441  1,513  1,399  1,401  1,299  1,294  1,004  1,038  1,010  1,037    800    831    857    919
Scale score
  Mean              411.84 411.78 414.35 414.37 416.27 416.38 418.15 418.19 419.09 419.07 421.18 421.07 421.48 421.51
  SD                  5.28   5.22   5.56   5.62   6.29   6.10   6.68   6.65   6.93   6.86   7.49   7.55   7.87   7.89
  Minimum              401    401    402    401    401    401    403    402    402    402    401    402    403    403
  Maximum              427    429    431    431    434    433    436    436    438    436    437    440    440    442
  Skewness            0.42   0.39   0.26   0.27   0.23   0.28   0.05   0.04   0.04   0.03  -0.05  -0.05  -0.01   0.02
  Kurtosis           -0.49  -0.50  -0.59  -0.56  -0.57  -0.52  -0.68  -0.67  -0.68  -0.71  -0.80  -0.78  -0.90  -0.89
  Reliability         0.85   0.84   0.84   0.84   0.84   0.83   0.84   0.82   0.83   0.83   0.85   0.85   0.88   0.88
Effect size              0.01          0.00         -0.02         -0.01          0.00          0.01          0.00
Stat. test
  Kolmogorov-Smirnov     1.07          1.03          0.76          1.15          0.90          1.03          0.64
  Test of equivalence
  t-lower/t-upper†  -4.90/5.45*  -4.84/4.63*  -4.57/3.65*  -3.53/3.24*  -3.21/3.35*  -2.40/2.97*  -2.74/2.60*
  (df)                 (2,952)       (2,798)       (2,591)       (2,040)       (2,045)       (1,629)       (1,774)

Note. O = online form, P = paper form.
* Statistically significant at .05.
† See the Table 19.8 note for the construction of the test of equivalence.


Table 19.17. Science Scale Score Summary for Online and Paper Modes

Grade                       3             4             5             6             7             8            EHS
Mode                     O      P      O      P      O      P      O      P      O      P      O      P      O      P
Number of items         28     28     28     28     28     28     32     32     32     32     32     32     32     32
Sample size          1,376  1,405  1,435  1,444  1,378  1,379  1,047  1,040  1,028  1,042    913    934    959  1,023
Scale score
  Mean              413.60 413.59 417.02 417.09 418.98 418.83 419.69 419.71 419.70 419.74 422.97 422.80 424.49 424.64
  SD                  5.94   5.93   6.59   6.69   6.55   6.66   7.15   7.04   8.04   8.03   7.90   7.87   8.15   8.09
  Minimum              401    401    401    400    402    401    403    401    401    402    403    403    405    404
  Maximum              431    432    434    435    438    435    438    439    441    438    444    444    447    449
  Skewness            0.32   0.40   0.07   0.02  -0.10  -0.10   0.07   0.10   0.17   0.16  -0.06  -0.02   0.17   0.17
  Kurtosis           -0.60  -0.50  -0.55  -0.77  -0.48  -0.68  -0.65  -0.55  -0.89  -0.93  -0.67  -0.70  -0.69  -0.58
  Reliability         0.89   0.88   0.85   0.85   0.86   0.87   0.89   0.88   0.90   0.89   0.88   0.88   0.86   0.87
Effect size              0.00         -0.01          0.02          0.00          0.00          0.02         -0.02
Stat. test
  Kolmogorov-Smirnov     1.16          1.19          0.79          0.81          0.55          0.86          0.94
  Test of equivalence
  t-lower/t-upper†  -4.41/4.48*  -4.32/3.77*  -3.36/4.58*  -3.29/3.15*  -2.94/2.72*  -2.27/3.18*  -3.15/2.33*
  (df)                 (2,779)       (2,877)       (2,755)       (2,085)       (2,068)       (1,845)       (1,980)

Note. O = online form, P = paper form.
* Statistically significant at .05.
† See the Table 19.8 note for the construction of the test of equivalence.


Table 19.18. Writing Scale Score Summary for Online and Paper Modes

Grade                       3             4             5             6             7             8            EHS
Mode                     O      P      O      P      O      P      O      P      O      P      O      P      O      P
Number of items          1      1      1      1      1      1      1      1      1      1      1      1      1      1
Sample size          1,236  1,295  1,272  1,289  1,106  1,106    847    888    834    839    646    665    708    777
Scale score
  Mean              422.20 422.19 422.35 422.47 422.87 422.88 427.10 427.04 424.76 424.76 424.36 424.37 424.75 424.75
  SD                  7.06   6.91   7.12   6.87   7.48   7.45   7.90   7.61   7.74   7.73   6.99   7.03   7.31   7.29
  Minimum              408    408    408    408    408    408    408    408    408    408    408    408    408    408
  Maximum              440    440    440    440    440    440    446    448    448    448    446    448    448    448
  Skewness            0.21   0.23  -0.07  -0.05   0.08   0.06  -0.14  -0.06   0.27   0.28  -0.20  -0.14   0.02   0.06
  Kurtosis           -0.37  -0.34  -0.52  -0.40  -0.28  -0.33  -0.32  -0.32   0.09  -0.01  -0.20  -0.17   0.12   0.13
Effect size              0.00         -0.02          0.00          0.01          0.00          0.00          0.00
Stat. test
  Kolmogorov-Smirnov     1.70*         3.77*         1.83*         3.76*         1.88*         1.57*         1.22
  Test of equivalence
  t-lower/t-upper†  -3.50/3.60*  -4.02/3.18*  -3.17/3.12*  -2.51/2.84*  -2.65/2.62*  -2.62/2.53*  -2.62/2.62*
  (df)                 (2,455)       (2,538)       (2,201)       (1,723)       (1,659)       (1,301)       (1,463)

Note. O = online form, P = paper form.
* Statistically significant at .05.
† See the Table 19.8 note for the construction of the test of equivalence.


Compared to Tables 19.8–19.12, which contained raw scores on common items across modes, we see that differences in scale scores are small and generally not statistically significant. For example, if we compare the grade 3 English raw score cumulative distributions in Figure 19.1 to the grade 3 English scale score cumulative distributions in Figure 19.7, we see that the cumulative score differences are small or negligible for the scale scores. While the score differences across modes were eliminated for most of the statistics in Tables 19.14–19.18, there were some cases where the Kolmogorov-Smirnov test indicated statistical differences in the cumulative distributions between modes in mathematics and writing. However, these differences appeared to be small, as illustrated by Figure 19.7, and effect sizes were near zero. The box plots in Figure 19.8 further illustrate scale score comparability across modes. The box plots are not identical across modes for each grade, but they do appear similar in most cases, which is indicative of similar score distributions across modes.

Figure 19.7. Cumulative percentage of students for scale scores across modes

(Panels: English, mathematics, reading, science, and writing; grades 3–8 and early high school.)

Figure 19.8. Box plots of scale scores by grade and mode for each subject area
(Panels: English, mathematics, reading, science, and writing.)

Based in part on these results, we argue that the statistical linking was successful in assuring that scale scores across paper and online modes were comparable. In other words, mode effects across online and paper forms were effectively eliminated; scale scores appeared to function similarly across modes.

19.7 Summary

Based on the results comparing raw number-of-points scores on identical items across modes, it is preferable to assume that mode effects exist for the ACT Aspire summative assessments. Not every grade and subject showed evidence of mode effects, but they occurred often enough that assuming no mode effects was not reasonable. When mode effects were observed, they generally appeared relatively small, and in some cases (e.g., writing) they were inconsistent in direction. If some degree of score difference due to mode were not a strong concern, it might be acceptable or even preferable to ignore mode effects on identical items across modes, but that is not the case under the current interpretations of ACT Aspire scores.

As described above, some form of statistical linkage was required across modes due to item-type differences across modes. Statistical linkages appeared to successfully adjust for differences in items and mode across paper and online forms under the random groups design. Scale scores appeared comparable across modes.


Chapter 20

ACT Aspire Equating

This chapter contains two sections. The first section presents the general equating plan used for ACT Aspire. The second section presents the equating process and results from the spring 2014 operational administration.

20.1 Methods and Procedures

Multiple ACT Aspire test forms are developed each year. Despite being constructed to follow the same content and statistical specifications, test forms may differ slightly in difficulty. Equating is used to control for these differences across forms so that scale scores reported to students have the same meaning regardless of the specific form administered. Equating is the process of making statistical adjustments to maintain score interchangeability across test forms (see Holland & Dorans, 2006; Kolen & Brennan, 2014).

ACT Aspire equating typically uses a random equivalent groups design, which involves spiraling the administration of test forms. In this case, test forms are interspersed within a classroom so that forms are distributed equally and randomly equivalent groups of students take each form. Under this design, if groups are indeed randomly equivalent, differences observed in performance across forms can be attributed to differences in form difficulty, and equating methods are applied to adjust for these differences.¹

Each year, a sample of students from an operational administration of ACT Aspire is used as an equating sample. Students in this sample are administered a spiraled set of ACT Aspire test forms that includes new test forms and an anchor test form that has already been equated to previous forms. Spiraling occurs separately for paper and online test forms, but a large sample of students takes each form.

1 This methodology is also used to evaluate the comparability of forms when items are scrambled and when forms contain different pretest items.


Scores on alternate test forms are equated to the ACT Aspire score scale using equipercentile equating methodology (Kolen & Brennan, 2014). In equipercentile equating, scores on different test forms are considered equivalent if they have the same percentile rank in a given group of students. Equipercentile equating is applied to the raw number-of-points scores for each subject test separately. The equipercentile equating results are subsequently smoothed using an analytic method described by Kolen (1984) to establish a smooth curve, and the equivalents are rounded to integers. The conversion tables that result from this process are used to transform raw scores on the new forms to scale scores.
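Stated compactly (our paraphrase of the standard Kolen & Brennan, 2014, definition, not a formula reproduced from this manual), the equipercentile equivalent on Form Y of a Form X raw score x is

    e_Y(x) = G_Y^{-1}( F_X(x) ),

where F_X and G_Y are the (continuized) cumulative distribution functions of Form X and Form Y scores in the same group of examinees. The resulting raw equivalents are then smoothed and rounded to integer scale scores as described above.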

In special cases where slight changes are made to the anchor form in the current year’s equating study compared to its administration in the previous equating study, the revised anchor form is first equated to its original version using a common-item-nonequivalent group design, and then new forms are equated to the revised anchor using a random equivalent groups design.

Composite scores, including STEM and ELA scores, are not directly equated across forms but are calculated based on the separate subject test scale scores. Other scores, such as reporting category scores, are not equated across forms and are calculated based on raw number of points earned.

20.2 Summary from the Spring 2014 Administration

Equating conducted in spring 2014 involved slight changes to some items on the anchor forms administered in spring 2013, upon which the ACT Aspire scales were constructed. Therefore, the 2014 forms were equated to the 2013 anchor forms through common items using a common-item-nonequivalent-groups design. The majority of items remained unchanged from the 2013 to the 2014 administration; a small proportion of items were revised or replaced. For example, some constructed-response items were scored using revised rubrics or were scored by machine instead of by human raters. Items with any content or scoring change from 2013 to 2014 were excluded from the common item set.

Once common items were identified after evaluating content and scoring modifications, statistical analyses were conducted to verify that the statistical characteristics of the common items were similar between the 2013 and 2014 samples. These analyses included delta plots (Angoff & Ford, 1973) and analyses of differences in mean proportion of points earned² on the common items. Items that fell outside the two-standard-error band of the delta plot, or whose absolute difference in mean proportion of points earned was larger than 0.15, were flagged as outliers for further evaluation. The final common item sets had full coverage with respect to both content and statistical specifications, such as mean proportion-of-points-earned and point-biserial distributions on the intact forms, and had a sufficient number of items from each reporting category. The proportions of common items ranged from 52% to 97% across forms.

² Note that for multiple-choice items, where possible scores are 0 or 1, this statistic is often referred to as the p-value. However, since most ACT Aspire forms include items with maximum possible scores greater than 1, it is referred to here as the mean proportion of points earned.
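A sketch of this screening logic follows. The Angoff delta transform is Δ = 13 + 4Φ⁻¹(1 − p), where p is the mean proportion of points earned. For brevity, the two-standard-error band is approximated with ordinary least-squares residuals, whereas delta plots are typically fit with a principal-axis line; treat this as illustrative rather than a reproduction of the operational analysis.

    import numpy as np
    from scipy.stats import norm

    def delta(p):
        """Angoff delta index: 13 + 4 * inverse-normal(1 - p), for p in (0, 1)."""
        return 13 + 4 * norm.ppf(1 - np.asarray(p, dtype=float))

    def flag_common_items(p_2013, p_2014, max_abs_diff=0.15):
        """Flag items whose mean proportion of points earned shifted by more
        than max_abs_diff, or whose delta-plot residual exceeds 2 SEs."""
        p13, p14 = np.asarray(p_2013), np.asarray(p_2014)
        diff_flag = np.abs(p13 - p14) > max_abs_diff
        d13, d14 = delta(p13), delta(p14)
        slope, intercept = np.polyfit(d13, d14, 1)  # OLS stand-in for the
        resid = d14 - (slope * d13 + intercept)     # usual principal-axis fit
        delta_flag = np.abs(resid) > 2 * resid.std(ddof=2)
        return diff_flag | delta_flag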

Classical linear equating methods (e.g., the Tucker method and the Levine observed score method) or frequency estimation equipercentile equating methods were used to conduct the equating, after evaluating the characteristics of group and form differences between the 2013 and 2014 administrations. Equating was conducted separately for the online and paper forms using 2014 operational data; the paper forms had been linked to the online forms in the 2013 comparability study. Pretest items included in test forms were not scored and were not used in the equating process. There were also scrambled versions of each test form in 2014. Operational items in the scrambled forms were identical to those in the 2014 anchor forms but were presented in slightly different orders. Examinees' performance on the scrambled forms was examined, and the evidence indicated that the scrambled forms functioned similarly to the anchor forms.
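As one concrete example of these options, the Tucker linear method forms a synthetic population from the two groups using the regression of each total score on the common-item score, and then matches synthetic-population means and standard deviations (Kolen & Brennan, 2014, chap. 4). A minimal sketch, with the 50/50 synthetic weights as an assumption:

    import numpy as np

    def tucker_linear(x, v1, y, v2, w1=0.5):
        """Tucker linear equating of new-form scores x (group 1, common-item
        scores v1) to old-form scores y (group 2, common-item scores v2)."""
        w2 = 1 - w1
        g1 = np.cov(x, v1)[0, 1] / np.var(v1, ddof=1)  # slope of X on V, group 1
        g2 = np.cov(y, v2)[0, 1] / np.var(v2, ddof=1)  # slope of Y on V, group 2
        dmu = np.mean(v1) - np.mean(v2)
        dvar = np.var(v1, ddof=1) - np.var(v2, ddof=1)
        mu_x = np.mean(x) - w2 * g1 * dmu
        mu_y = np.mean(y) + w1 * g2 * dmu
        var_x = np.var(x, ddof=1) - w2 * g1**2 * dvar + w1 * w2 * (g1 * dmu)**2
        var_y = np.var(y, ddof=1) + w1 * g2**2 * dvar + w1 * w2 * (g2 * dmu)**2
        # Equated score: match the synthetic population's mean and SD
        return lambda score: mu_y + np.sqrt(var_y / var_x) * (score - mu_x)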

20.2.1 Results

Tables 20.1 to 20.8 present the demographic information for the samples used in the equating study. All demographic information was provided by schools. The non-response rates were higher for online forms than for paper forms. More examinees took mathematics and reading than English and science. Males and females were represented in roughly equal proportions. Distributions of race/ethnicity differed slightly by grade and testing mode; approximately half of the sample was White and a quarter was African American.

To evaluate the equating results, plots of students' average scale scores conditional on common item raw scores were reviewed. Students taking the 2013 form and those taking the 2014 form who earned the same common item raw score should have similar scale scores after equating. Figures 20.1–20.4 present these plots for each subject at selected grades. The red circle points represent the group of examinees taking the 2014 forms, and the blue cross points represent the group of examinees taking the 2013 forms. The plots indicate that, for most score points, students with the same common item raw score receive the same scale score regardless of which form they took. Scale score differences tend to be larger at both ends of the score range than in the middle.

20.2.2 Standard Error of Equating

The standard error of equating (SEE) is an index capturing random error in estimating equating relationships from samples of examinees rather than from the population. According to Kolen and Brennan (2014), the SEE is "conceived of as the standard deviation of equated scores over hypothetical replications of an equating procedure in samples from a population or populations of examinees." To compute the SEE for ACT Aspire forms, the bootstrap method was adopted. Two hundred bootstrap samples of the same size as the original sample were drawn with replacement from the original sample. Based on each bootstrap sample, a new conversion table was generated. The SEE was computed as the standard deviation, across all replications, of the unrounded scale scores corresponding to each raw score point on the new form.
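A sketch of the bootstrap computation, reusing the illustrative equipercentile() function shown earlier: each replication resamples both groups and re-equates, and the SEE at each raw score is the SD over replications. For brevity the sketch reports the SD of raw-score equivalents; the operational procedure carries each replication through the raw-to-scale conversion first, so the SD is taken over unrounded scale scores.

    import numpy as np

    def bootstrap_see(new_scores, old_scores, max_new, max_old,
                      n_boot=200, seed=12345):
        """SD of equated equivalents at each new-form raw score across
        bootstrap replications (200 replications, as described above)."""
        rng = np.random.default_rng(seed)
        new_scores = np.asarray(new_scores)
        old_scores = np.asarray(old_scores)
        tables = np.empty((n_boot, max_new + 1))
        for b in range(n_boot):
            ns = rng.choice(new_scores, size=new_scores.size, replace=True)
            os_ = rng.choice(old_scores, size=old_scores.size, replace=True)
            tables[b] = equipercentile(ns, os_, max_new, max_old)
        return tables.std(axis=0, ddof=1)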

Figure 20.5 presents plots of the SEE against raw score points for selected subjects and grades. The magnitudes of the SEEs are affected by the equating sample size: the larger the sample size, the smaller the SEE. The shape of the SEE curves in Figure 20.5 is related to the equating method that was selected. Generally speaking, when a classical linear method is selected, the SEE curve tends to be smooth and follows a U shape. When the equipercentile method is selected, SEEs are low and the curve is flat in the middle section of the raw score scale; SEEs tend to get larger as raw score points move from the middle toward either end, and around the two end points the SEEs become small again due to the linear interpolation used in the equating process.


Table 20.1. Percentages of Students Taking English Online Forms in the Spring 2014 Equating Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8       EHS
(N)                                          (3,342)   (3,243)   (3,346)   (3,743)   (5,463)   (8,979)  (18,116)
Gender
  Not Specified                                  13        13        11        13        15        12        16
  Female                                         43        43        44        42        41        45        42
  Male                                           44        44        44        44        43        43        42
Race/Ethnicity
  Not Specified                                  37        37        41        51        51        46        55
  Black/African American                         24        23        21        14         9         6         2
  American Indian/Alaskan Native                  1         1         0         1         0         1         1
  White                                          29        31        29        27        32        42        36
  Hispanic/Latino                                 6         6         6         5         4         3         3
  Asian                                           1         1         1         1         1         1         1
  Native Hawaiian/Other Pacific Islander          0         0         0         0         0         0         0
  Two or More Races                               1         1         0         0         0         0         0
  Prefer Not to Respond                           0         0         0         0         1         1         1
Geographic Region
  No Response                                     0         0         0         0         0         0         1
  East                                           56        56        57        46        34        18         7
  Midwest                                        15        12        13        26        45        66        79
  Southwest                                       6         8         6         9         6         5         7
  West                                           23        24        24        20        16        10         6

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 20.2. Percentages of Students Taking English Paper Forms in the Spring 2014 Equating Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8       EHS
(N)                                          (5,049)   (5,215)   (5,038)   (5,658)   (5,692)   (6,027)   (4,247)
Gender
  Not Specified                                   1         1         1         2         2         2        16
  Female                                         48        47        49        49        49        49        46
  Male                                           51        52        50        49        50        49        38
Race/Ethnicity
  Not Specified                                   8         9         9        11        11        12        63
  Black/African American                         24        25        24        25        25        25         2
  American Indian/Alaskan Native                  1         1         2         1         1         1         0
  White                                          60        58        60        57        58        56        28
  Hispanic/Latino                                 4         4         4         4         4         4         5
  Asian                                           2         2         2         1         2         2         2
  Native Hawaiian/Other Pacific Islander          0         0         0         0         0         0         0
  Two or More Races                               0         0         0         0         0         0         0
  Prefer Not to Respond
Geographic Region
  No Response                                     0         0         0         0         0         1         0
  East                                           95        96        96        93        92        90        26
  Midwest                                         0         0         0         2         4         4        29
  Southwest                                       5         4         4         4         3         5        18
  West                                            0         0         0         0         0         1        28

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 20.3. Percentages of Students Taking Mathematics Online Forms in the Spring 2014 Equating Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8       EHS
(N)                                          (4,415)   (4,511)   (5,512)   (5,364)   (6,735)  (10,614)  (16,694)
Gender
  Not Specified                                   9         8         6         7        10         8        16
  Female                                         47        46        47        45        44        47        42
  Male                                           44        46        46        48        46        45        41
Race/Ethnicity
  Not Specified                                  25        24        27        31        35        36        56
  Black/African American                         23        23        22        20        15        11         2
  American Indian/Alaskan Native                  0         0         0         0         0         1         1
  White                                          42        44        42        42        43        46        36
  Hispanic/Latino                                 7         6         6         5         5         4         3
  Asian                                           1         2         2         2         1         1         1
  Native Hawaiian/Other Pacific Islander          0         0         0         0         0         0         0
  Two or More Races                               0         0         0         0         0         0         0
  Prefer Not to Respond                           0         0         0         0         1         1         1
Geographic Region
  No Response                                     0         0         0         0         0         0         1
  East                                           72        75        66        68        53        34         7
  Midwest                                        10         8        10        16        31        53        79
  Southwest                                       5         5        12         5         6         5         8
  West                                           14        12        13        11        10         8         5

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 20.4. Percentages of Students Taking Mathematics Paper Forms in the Spring 2014 Equating Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8       EHS
(N)                                         (19,375)  (18,635)  (17,668)  (19,106)  (20,340)  (19,923)   (4,121)
Gender
  Not Specified                                   0         0         0         1         0         1        15
  Female                                         49        48        49        49        49        49        47
  Male                                           50        52        51        51        51        50        38
Race/Ethnicity
  Not Specified                                   5         4         4         5         4         5        62
  Black/African American                         30        31        31        33        33        33         3
  American Indian/Alaskan Native                  1         1         1         1         1         1         0
  White                                          56        56        57        55        55        55        29
  Hispanic/Latino                                 6         6         6         5         5         5         5
  Asian                                           1         1         1         1         1         1         2
  Native Hawaiian/Other Pacific Islander          0         0         0         0         0         0         0
  Two or More Races                               0         0         0         0         0         0         0
  Prefer Not to Respond
Geographic Region
  No Response                                     0         0         0         0         0         0         0
  East                                           99        99        99        98        98        97        26
  Midwest                                         0         0         0         1         1         1        29
  Southwest                                       1         1         1         1         1         2        18
  West                                            0         0         0         0         0         0        26

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 20.5. Percentages of Students Taking Reading Online Forms in the Spring 2014 Equating Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8       EHS
(N)                                          (5,343)   (5,579)   (3,523)   (6,002)   (7,697)  (11,608)  (17,486)
Gender
  Not Specified                                   8         9         5         8        10         9        17
  Female                                         45        44        46        45        43        46        42
  Male                                           46        47        49        47        47        45        42
Race/Ethnicity
  Not Specified                                  24        23        22        31        34        35        57
  Black/African American                         24        23        24        20        16        12         2
  American Indian/Alaskan Native                  1         1         0         0         0         1         1
  White                                          42        44        46        42        43        46        35
  Hispanic/Latino                                 7         7         5         6         5         4         3
  Asian                                           1         2         1         1         1         1         1
  Native Hawaiian/Other Pacific Islander          0         0         0         0         0         0         0
  Two or More Races                               0         0         0         0         0         0         0
  Prefer Not to Respond                           0         0         0         0         1         1         1
Geographic Region
  No Response                                     0         0         0         0         0         0         1
  East                                           73        75        75        69        56        37         7
  Midwest                                        10         9        10        15        29        51        79
  Southwest                                       4         5         1         5         4         4         7
  West                                           13        11        14        11        11         9         6

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 20.6. Percentages of Students Taking Reading Paper Forms in the Spring 2014 Equating Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8       EHS
(N)                                         (10,462)  (13,965)   (6,258)  (11,263)  (10,641)  (12,276)   (4,101)
Gender
  Not Specified                                   1         0         1         1         1         1        16
  Female                                         48        48        48        48        48        49        47
  Male                                           51        51        51        51        51        50        38
Race/Ethnicity
  Not Specified                                   6         5         7         6         6         6        62
  Black/African American                         26        28        26        29        31        30         2
  American Indian/Alaskan Native                  1         1         1         1         1         1         0
  White                                          59        58        59        57        56        57        29
  Hispanic/Latino                                 7         6         6         5         5         5         5
  Asian                                           1         1         2         1         1         1         2
  Native Hawaiian/Other Pacific Islander          0         0         0         0         0         0         0
  Two or More Races                               0         0         0         0         0         0         0
  Prefer Not to Respond
Geographic Region
  No Response                                     0         0         0         0         0         0         0
  East                                           97        98        97        97        96        95        27
  Midwest                                         0         0         0         1         2         2        28
  Southwest                                       3         2         3         2         1         2        18
  West                                            0         0         0         0         0         0        27

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 20.7. Percentages of Students Taking Science Online Forms in the Spring 2014 Equating Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8       EHS
(N)                                          (2,158)   (2,305)   (2,440)   (2,593)   (4,048)   (7,654)  (16,460)
Gender
  Not Specified                                  11        12        13        14        17        12        17
  Female                                         45        44        43        41        41        45        42
  Male                                           44        44        44        44        43        43        42
Race/Ethnicity
  Not Specified                                  37        35        49        56        57        47        57
  Black/African American                         24        23        19        11         6         5         2
  American Indian/Alaskan Native                  1         0         0         0         0         1         1
  White                                          29        33        25        26        30        42        36
  Hispanic/Latino                                 8         7         5         5         4         3         3
  Asian                                           1         1         1         1         1         1         1
  Native Hawaiian/Other Pacific Islander          1         0         1         0         0         0         0
  Two or More Races                               1         0         0         0         1         0         0
  Prefer Not to Respond                           0         0         0         0         2         1         1
Geographic Region
  No Response                                     0         0         0         0         0         0         1
  East                                           53        60        50        39        24        15         7
  Midwest                                        14        11        22        29        53        68        79
  Southwest                                       9        10         7        10         7         7         8
  West                                           23        19        21        22        16        11         5

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table 20.8. Percentages of Students Taking Science Paper Forms in the Spring 2014 Equating Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8       EHS
(N)                                          (4,208)   (4,371)   (4,012)   (4,963)   (5,217)   (5,394)   (4,159)
Gender
  Not Specified                                   1         1         1         2         2         2        15
  Female                                         49        47        48        47        48        49        47
  Male                                           50        53        51        50        50        48        38
Race/Ethnicity
  Not Specified                                  10        10        10        12        11        13        63
  Black/African American                         27        26        32        29        35        27         2
  American Indian/Alaskan Native                  1         1         0         1         0         1         0
  White                                          56        56        52        53        49        54        28
  Hispanic/Latino                                 5         5         5         4         5         4         5
  Asian                                           1         1         1         1         1         1         2
  Native Hawaiian/Other Pacific Islander          0         0         0         0         0         0         0
  Two or More Races                               0         0         0         0         0         0         0
  Prefer Not to Respond
Geographic Region
  No Response                                     0         0         0         0         0         1         0
  East                                           94        95        95        92        92        88        26
  Midwest                                         0         0         0         3         4         4        30
  Southwest                                       6         5         5         5         3         6        18
  West                                            0         0         0         0         0         1        26

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Figure 20.1. Average English scale scores against common item raw scores. Panels: English grade 4 online; English grade 6 paper.


Figure 20.2. Average mathematics scale scores against common item raw scores. Panels: Mathematics grade 5 online; Mathematics grade 7 paper.


Figure 20.3. Average reading scale scores against common item raw scores. Panels: Reading grade 4 online; Reading grade 8 paper.


Figure 20.4. Average science scale scores against common item raw scores. Panels: Science grade 6 online; Science grade 8 paper.


Figure 20.5. Unrounded scale score standard error of equating for selected subjects and grades. Panels: English, Grade 3, Online Form; English, Grade EHS, Paper Form; Mathematics, Grade 7, Online Form; Mathematics, Grade EHS, Paper Form; Reading, Grade 8, Online Form; Reading, Grade EHS, Paper Form; Science, Grade 3, Online Form; Science, Grade 5, Paper Form. Each panel plots the unrounded scale score SEE against the new form raw score.


Appendices

Appendix A: Illustration of the ACT Aspire Mathematics Assessments

Figure A.1. Mathematics test content specifications and flow (from the ACT Aspire Technical Bulletin, p. 41).


Appendix B: The ACT with Constructed-Response Items

While ACT Aspire covers grades 3 through 10, the scaling study also included grade 11. The primary reason for including grade 11 was so that the ACT Aspire vertical scale could be extended to a version of the ACT, providing scale continuity from grades 3 through 11. The implications of this are twofold. First, the scaling test (ST4) included ACT items so that the vertical scale could include grade 11 and be extended to a version of the ACT. Second, a grade 11 on-grade test was included in the scaling study.

The on-grade test covering grade 11 included in the scaling study had two parts: an operationally administered form of the ACT and a separately administered set of constructed-response-only items. These two components were combined and scaled to obtain scores on the ACT Aspire scale. This grade 11 test is referred to as the ACT with constructed-response tests. In some places, such as tables and figures with limited space, this test may be referred to as “ACT” or “the ACT.”

The constructed-response component was included for the mathematics, reading, and science tests. Table B.1 lists the number of constructed-response items included in each component. The ACT writing test was not included in the scaling study.

By including the ACT with constructed-response tests in the scaling study, the ACT Aspire vertical scale was extended to grade 11 and included the ACT combined with constructed-response items, consistent with tests for grades 3–10 that also included constructed-response items at these grades.

Table B.1. Number of Constructed-Response Items and Number of Score Points Included in the On-Grade ACT with Constructed-Response Test Forms in the Scaling Study

Subject        Number of Constructed-Response Items    Raw Number-of-Points Score Range
Mathematics                      6                                   0–24
Reading                          3                                   0–10
Science                         11                                   0–30


Appendix C: Spring 2013 Scaling Study Demographic Summary Tables

Table C.1. Percentages of Students Taking English Tests in the Scaling Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8   Grade 9  Grade 10  Grade 11
(N)                                          (4,277)   (3,825)   (4,301)   (4,683)   (4,762)   (3,609)   (2,415)   (2,258)   (2,240)
Gender
  Not Specified                                   5         6         5         4         4         5         2         1        89
  Female                                         48        47        48        49        49        49        52        54         6
  Male                                           47        47        46        47        47        47        47        45         5
Race/Ethnicity
  Not Specified                                  33        34        31        26        30        30        39        35        93
  Black/African American                         27        27        29        27        24        22        29        28         3
  American Indian/Alaskan Native                  0         0         0         0         0         0         0         0         0
  White                                          29        28        29        26        24        28        17        22         1
  Hispanic/Latino                                 8         8         8        18        17        16        12        11         2
  Asian                                           2         1         2         1         2         2         2         3         1
  Native Hawaiian/Other Pacific Islander          1         1         0         0         0         0         0         0
  Two or More Races                               1         0         1         1         2         1         0         2
Geographic Region
  East                                           35        40        36        31        30        40        25        24        24
  Midwest                                        40        40        39        44        45        39        50        50        41
  Southwest                                      19        13        18        10        13        12        14        20        27
  West                                            6         7         7        16        13         8        11         6         8

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table C.2. Percentages of Students Taking Mathematics Tests in the Scaling Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8   Grade 9  Grade 10  Grade 11
(N)                                          (4,300)   (3,830)   (4,260)   (4,577)   (4,497)   (3,746)   (2,479)   (2,528)   (1,796)
Gender
  Not Specified                                   5         6         5         3         5         5         1         1        92
  Female                                         48        46        49        49        48        48        52        53         4
  Male                                           47        47        46        48        47        47        47        46         4
Race/Ethnicity
  Not Specified                                  33        34        31        26        31        29        38        38        94
  Black/African American                         27        28        29        27        23        24        29        25         3
  American Indian/Alaskan Native                  0         0         0         0         0         0         0         0
  White                                          29        28        29        27        23        28        17        20         1
  Hispanic/Latino                                 8         8         8        17        17        16        12        12         1
  Asian                                           2         1         2         2         2         2         3         3         1
  Native Hawaiian/Other Pacific Islander          1         1         0         0         0         0         0         0
  Two or More Races                               1         0         1         1         3         1         0         2
Geographic Region
  East                                           34        40        36        30        31        39        25        21        27
  Midwest                                        41        40        39        46        44        42        48        53        40
  Southwest                                      18        13        19        10        14        12        13        18        27
  West                                            7         7         7        13        11         7        14         8         6

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table C.3. Percentages of Students Taking Reading Tests in the Scaling Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8   Grade 9  Grade 10  Grade 11
(N)                                          (4,307)   (3,661)   (4,129)   (4,520)   (4,475)   (3,585)   (2,257)   (2,260)   (1,789)
Gender
  Not Specified                                   4         6         6         4         4         5         1         0        90
  Female                                         48        46        48        49        49        48        52        54         5
  Male                                           48        48        46        47        47        47        46        46         4
Race/Ethnicity
  Not Specified                                  32        35        32        27        31        32        41        40        95
  Black/African American                         28        27        27        26        23        23        28        24         3
  American Indian/Alaskan Native                  0         0         0         0         0         0         0         0         0
  White                                          28        28        30        27        24        27        17        21         1
  Hispanic/Latino                                 9         8         8        17        16        16        10        10         1
  Asian                                           2         1         2         2         2         2         2         3         1
  Native Hawaiian/Other Pacific Islander          1         1         0         0         0         0         0         0
  Two or More Races                               1         1         1         1         3         0         0         2
Geographic Region
  East                                           36        41        37        32        30        40        26        23        26
  Midwest                                        40        39        39        44        45        40        51        51        40
  Southwest                                      19        14        18        10        14        13        14        19        29
  West                                            6         6         6        14        11         8         9         7         5

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table C.4. Percentages of Students Taking Science Tests in the Scaling Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8   Grade 9  Grade 10  Grade 11
(N)                                          (4,214)   (3,571)   (3,903)   (4,642)   (4,756)   (3,544)   (2,167)   (2,314)   (1,717)
Gender
  Not Specified                                   5         7         5         4         4         5         2         1        93
  Female                                         48        46        48        49        49        49        51        54         4
  Male                                           47        47        47        47        47        46        47        46         4
Race/Ethnicity
  Not Specified                                  33        37        33        26        29        29        43        41        96
  Black/African American                         26        25        26        25        24        22        25        20         3
  American Indian/Alaskan Native                  0         0         0         0         0         0         0         0
  White                                          29        28        30        28        24        28        19        22         0
  Hispanic/Latino                                 9         8         8        19        17        17        10        11         1
  Asian                                           2         1         2         1         2         2         2         3         1
  Native Hawaiian/Other Pacific Islander          1         1         0         0         0         0         0         0
  Two or More Races                               1         1         1         1         3         1         1         2
Geographic Region
  East                                           37        40        37        31        28        39        28        24        28
  Midwest                                        38        39        36        44        46        40        50        51        39
  Southwest                                      19        14        20        10        13        13        14        19        29
  West                                            7         7         8        16        13         8         9         6         4

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Table C.5. Percentages of Students Taking Writing Tests in the Scaling Study by Grade and Demographics

Demographic Grouping                        Grade 3   Grade 4   Grade 5   Grade 6   Grade 7   Grade 8   Grade 9  Grade 10
(N)                                          (4,109)   (3,439)   (3,935)   (4,512)   (4,666)   (3,522)   (2,117)   (2,258)
Gender
  Not Specified                                   4         5         4         2         3         3         1         1
  Female                                         49        48        49        50        49        50        53        55
  Male                                           46        47        47        48        47        47        47        45
Race/Ethnicity
  Not Specified                                  33        36        32        25        29        29        43        36
  Black/African American                         25        24        25        25        23        22        27        24
  American Indian/Alaskan Native                  0         0         0         0         0         0         0         0
  White                                          30        29        30        28        25        28        16        22
  Hispanic/Latino                                 9         8         9        19        18        17        11        14
  Asian                                           2         1         2         2         2         2         2         3
  Native Hawaiian/Other Pacific Islander          1         1         0         0         0         0         0         0
  Two or More Races                               1         1         1         1         3         1         1         2
Geographic Region
  East                                           37        43        34        31        29        39        28        21
  Midwest                                        37        35        37        42        44        41        50        52
  Southwest                                      19        15        20        10        13        12        15        19
  West                                            7         8         9        17        14         8         7         8

Note. Zero indicates fewer than 0.5%. Missing values indicate no students. Percentages within a demographic grouping may not add up to 100% due to rounding.


Appendix D: Descriptive Statistics and Reliabilities of Reporting Category Raw Scores

Table D.1. English Reporting Category Raw Score Descriptive Statistics and Reliabilities from the Spring 2014 Test Administration

Tested Grade   Mode     Reporting Categoryᵃ   N       Max Points   Mean    SD      Reliability   SEM
3              Online   COE                   3315    15            8.98   3.32    0.74          1.68
3              Online   POW                   3315    10            4.33   2.08    0.52          1.44
3              Paper    COE                   5139    15            9.45   3.44    0.78          1.63
3              Paper    POW                   5139    10            4.79   2.18    0.57          1.43
4              Online   COE                   3217    15            9.05   2.86    0.66          1.66
4              Online   KLA                   3217     3            1.45   0.96    0.32          0.79
4              Online   POW                   3217     7            5.01   1.53    0.53          1.05
4              Paper    COE                   5356    15            9.66   2.91    0.70          1.60
4              Paper    KLA                   5356     3            1.64   0.94    0.31          0.78
4              Paper    POW                   5356     7            5.14   1.50    0.53          1.03
5              Online   COE                   3303    15            8.37   2.92    0.67          1.68
5              Online   KLA                   3303     3            1.88   0.79    0.10          0.75
5              Online   POW                   3303     7            4.07   1.64    0.55          1.10
5              Paper    COE                   5380    15            8.95   2.88    0.68          1.62
5              Paper    KLA                   5380     3            1.98   0.81    0.15          0.75
5              Paper    POW                   5380     7            4.35   1.62    0.56          1.08
6              Online   COE                   3706    20           11.44   3.56    0.72          1.89
6              Online   KLA                   3706     3            1.58   0.93    0.26          0.80
6              Online   POW                   3706    12            5.90   2.59    0.65          1.53
6              Paper    COE                   6055    20           11.69   3.56    0.71          1.90
6              Paper    KLA                   6055     3            1.71   0.93    0.25          0.80
6              Paper    POW                   6055    12            5.76   2.46    0.62          1.53
7              Online   COE                   5413    20           12.29   3.82    0.75          1.93
7              Online   KLA                   5413     5            2.74   1.21    0.31          1.01
7              Online   POW                   5413    10            5.22   2.01    0.48          1.44
7              Paper    COE                   5892    20           12.75   3.84    0.75          1.91
7              Paper    KLA                   5892     5            2.94   1.23    0.35          0.99
7              Paper    POW                   5892    10            5.20   2.03    0.50          1.43


Table D.1. (continued)

Tested Grade   Mode     Reporting Categoryᵃ   N       Max Points   Mean    SD      Reliability   SEM
8              Online   COE                   8921    20           12.14   3.44    0.72          1.81
8              Online   KLA                   8921     5            3.68   1.29    0.53          0.88
8              Online   POW                   8921    10            6.09   2.29    0.66          1.33
8              Paper    COE                   6213    20           12.35   3.61    0.75          1.79
8              Paper    KLA                   6213     5            3.87   1.22    0.53          0.83
8              Paper    POW                   6213    10            6.15   2.25    0.65          1.33
EHSᵇ           Online   COE                  18046    31           17.22   5.81    0.83          2.42
EHSᵇ           Online   KLA                  18046     5            2.74   1.36    0.53          0.93
EHSᵇ           Online   POW                  18046    14            7.99   2.99    0.71          1.60
EHSᵇ           Paper    COE                   4335    31           18.19   6.03    0.84          2.38
EHSᵇ           Paper    KLA                   4335     5            2.95   1.43    0.57          0.93
EHSᵇ           Paper    POW                   4335    14            8.09   3.12    0.75          1.57

ᵃ Reporting category: POW = Production of Writing, KLA = Knowledge of Language, and COE = Conventions of Standard English.
ᵇ Early high school form taken by grades 9 and 10 students.
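The SEM column in these tables is consistent with the classical test theory relation SEM = SD × sqrt(1 − reliability), up to rounding of the tabled values. A quick check against the grade 3 online COE row of Table D.1:

    import math

    sd, reliability = 3.32, 0.74           # grade 3, online, COE row of Table D.1
    sem = sd * math.sqrt(1 - reliability)  # 1.693..., vs. the tabled 1.68
                                           # (inputs are rounded to two decimals)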


Table D.2. Mathematics Reporting Category Raw Score Descriptive Statistics and Reliabilities from the Spring 2014 Test Administration

Tested Grade   Mode     Reporting Categoryᵃ   N       Max Points   Mean    SD      Reliability   SEM
3              Online   FOUNDATION            5467    14            5.74   2.53    0.61          1.57
3              Online   GLP                   5467    23            9.25   3.65    0.68          2.08
3              Online   GLP_G                 5467     3            1.35   0.70    0.17          0.64
3              Online   GLP_MD                5467     3            1.11   0.82    0.06          0.79
3              Online   GLP_NBT               5467     4            1.31   1.05    0.35          0.85
3              Online   GLP_NF                5467     3            1.07   0.91    0.28          0.77
3              Online   GLP_OA                5467     2            1.19   0.70    0.28          0.60
3              Online   JE                    5467    16            6.26   2.57    0.63          1.56
3              Online   MODELING              5467    24            9.53   3.77    0.69          2.10
3              Paper    FOUNDATION           20114    14            6.32   2.54    0.63          1.55
3              Paper    GLP                  20114    23            9.67   3.58    0.65          2.14
3              Paper    GLP_G                20114     3            1.49   0.75    0.12          0.70
3              Paper    GLP_MD               20114     3            1.12   0.81    0.05          0.79
3              Paper    GLP_NBT              20114     4            1.42   1.09    0.33          0.90
3              Paper    GLP_NF               20114     3            0.98   0.89    0.29          0.75
3              Paper    GLP_OA               20114     2            1.30   0.70    0.27          0.60
3              Paper    JE                   20114    16            6.84   2.57    0.64          1.55
3              Paper    MODELING             20114    24            9.90   3.71    0.68          2.09
4              Online   FOUNDATION            5740    14            4.37   1.79    0.38          1.40
4              Online   GLP                   5740    23            6.93   2.88    0.59          1.85
4              Online   GLP_G                 5740     3            0.77   0.84    0.34          0.68
4              Online   GLP_MD                5740     3            0.88   0.87    0.28          0.74
4              Online   GLP_NBT               5740     3            0.86   0.77    0.12          0.72
4              Online   GLP_NF                5740     3            0.99   0.85    0.25          0.74
4              Online   GLP_OA                5740     3            0.93   0.79    0.08          0.76
4              Online   JE                    5740    16            5.00   1.71    0.52          1.18
4              Online   MODELING              5740    20            5.72   2.31    0.47          1.69
4              Paper    FOUNDATION           19366    14            4.45   1.74    0.37          1.38
4              Paper    GLP                  19366    23            6.86   2.95    0.59          1.90
4              Paper    GLP_G                19366     3            0.85   0.84    0.23          0.74
4              Paper    GLP_MD               19366     3            0.86   0.85    0.23          0.74
4              Paper    GLP_NBT              19366     3            0.83   0.78    0.15          0.72
4              Paper    GLP_NF               19366     3            1.02   0.87    0.28          0.74
4              Paper    GLP_OA               19366     3            0.92   0.79    0.07          0.76
4              Paper    JE                   19366    16            4.95   1.67    0.47          1.22
4              Paper    MODELING             19366    20            5.62   2.37    0.47          1.73


Table D.2. (continued)

Tested Grade   Mode     Reporting Categoryᵃ   N       Max Points   Mean    SD      Reliability   SEM
5              Online   FOUNDATION            6536    14            4.64   1.99    0.47          1.46
5              Online   GLP                   6536    23            7.26   3.03    0.61          1.90
5              Online   GLP_G                 6536     3            0.81   0.79    0.13          0.74
5              Online   GLP_MD                6536     3            0.91   0.74    0.07          0.71
5              Online   GLP_NBT               6536     3            1.00   0.87    0.23          0.76
5              Online   GLP_NF                6536     3            1.28   0.92    0.19          0.82
5              Online   GLP_OA                6536     3            1.04   0.75    0.25          0.65
5              Online   JE                    6536    16            4.61   1.81    0.49          1.30
5              Online   MODELING              6536    24            7.39   2.97    0.60          1.87
5              Paper    FOUNDATION           19064    14            4.29   1.84    0.43          1.39
5              Paper    GLP                  19064    23            7.44   3.07    0.59          1.96
5              Paper    GLP_G                19064     3            0.98   0.79    0.08          0.76
5              Paper    GLP_MD               19064     3            0.87   0.75    0.07          0.73
5              Paper    GLP_NBT              19064     3            0.93   0.85    0.20          0.76
5              Paper    GLP_NF               19064     3            1.12   0.90    0.19          0.81
5              Paper    GLP_OA               19064     3            1.36   0.88    0.22          0.77
5              Paper    JE                   19064    16            4.50   1.77    0.49          1.26
5              Paper    MODELING             19064    24            7.57   2.87    0.56          1.91
6              Online   FOUNDATION            6288    18            5.40   2.57    0.61          1.60
6              Online   GLP                   6288    28           10.01   3.84    0.68          2.18
6              Online   GLP_EE                6288     4            1.09   0.98    0.27          0.84
6              Online   GLP_G                 6288     4            1.50   1.03    0.22          0.91
6              Online   GLP_NS                6288     4            1.75   0.97    0.26          0.84
6              Online   GLP_RP                6288     4            1.80   1.00    0.27          0.85
6              Online   GLP_S                 6288     4            1.50   0.93    0.28          0.79
6              Online   JE                    6288    16            4.22   1.78    0.48          1.29
6              Online   MODELING              6288    26            9.47   3.82    0.69          2.13
6              Paper    FOUNDATION           20077    18            4.91   2.48    0.59          1.60
6              Paper    GLP                  20077    28           10.17   3.85    0.67          2.22
6              Paper    GLP_EE               20077     4            1.09   0.95    0.17          0.86
6              Paper    GLP_G                20077     4            1.36   0.99    0.20          0.89
6              Paper    GLP_NS               20077     4            1.83   1.04    0.28          0.89
6              Paper    GLP_RP               20077     4            1.73   0.97    0.25          0.84
6              Paper    GLP_S                20077     4            1.66   0.97    0.27          0.83
6              Paper    JE                   20077    16            4.13   1.86    0.49          1.33
6              Paper    MODELING             20077    26            9.08   3.70    0.68          2.10


Table D.2. (continued)

Tested Grade   Mode     Reporting Categoryᵃ   N       Max Points   Mean    SD      Reliability   SEM
7              Online   FOUNDATION            8195    18            6.41   2.69    0.60          1.70
7              Online   GLP                   8195    28           10.36   4.47    0.78          2.09
7              Online   GLP_EE                8195     4            1.14   1.05    0.38          0.83
7              Online   GLP_G                 8195     4            1.64   1.13    0.39          0.89
7              Online   GLP_NS                8195     4            1.64   1.00    0.38          0.79
7              Online   GLP_RP                8195     4            1.42   1.11    0.38          0.88
7              Online   GLP_S                 8195     4            1.93   1.04    0.44          0.78
7              Online   JE                    8195    16            5.46   1.98    0.57          1.29
7              Online   MODELING              8195    30           10.73   4.30    0.75          2.14
7              Paper    FOUNDATION           20965    18            5.97   2.57    0.57          1.68
7              Paper    GLP                  20965    28            9.73   4.13    0.73          2.15
7              Paper    GLP_EE               20965     4            1.13   1.01    0.31          0.83
7              Paper    GLP_G                20965     4            1.39   1.07    0.33          0.88
7              Paper    GLP_NS               20965     4            1.48   1.00    0.29          0.84
7              Paper    GLP_RP               20965     4            1.39   1.09    0.36          0.88
7              Paper    GLP_S                20965     4            1.80   1.07    0.30          0.89
7              Paper    JE                   20965    16            5.20   1.84    0.52          1.27
7              Paper    MODELING             20965    30            9.95   3.96    0.71          2.13
8              Online   FOUNDATION           12154    20            7.20   2.85    0.65          1.69
8              Online   GLP                  12154    33           14.01   5.66    0.83          2.36
8              Online   GLP_EE               12154     6            2.95   1.53    0.56          1.01
8              Online   GLP_F                12154     4            2.26   1.19    0.44          0.89
8              Online   GLP_G                12154     6            2.91   1.43    0.44          1.07
8              Online   GLP_NS               12154     2            1.00   0.79    0.41          0.61
8              Online   GLP_S                12154     3            1.69   1.00    0.47          0.73
8              Online   JE                   12154    20            5.47   2.41    0.63          1.47
8              Online   MODELING             12154    40           16.04   5.73    0.82          2.45
8              Paper    FOUNDATION           20856    20            7.42   2.91    0.65          1.72
8              Paper    GLP                  20856    33           13.14   5.64    0.82          2.42
8              Paper    GLP_EE               20856     6            2.64   1.45    0.51          1.02
8              Paper    GLP_F                20856     4            1.97   1.18    0.42          0.90
8              Paper    GLP_G                20856     6            2.71   1.43    0.46          1.05
8              Paper    GLP_NS               20856     2            0.92   0.79    0.40          0.61
8              Paper    GLP_S                20856     3            1.49   1.01    0.45          0.75
8              Paper    JE                   20856    20            5.79   2.48    0.61          1.55
8              Paper    MODELING             20856    40           15.21   5.51    0.80          2.47


Table D.2. (continued)

Tested Grade   Mode     Reporting Categoryᵃ   N       Max Points   Mean    SD      Reliability   SEM
EHSᵇ           Online   FOUNDATION           17767    20            8.63   3.78    0.75          1.87
EHSᵇ           Online   GLP                  17767    33           12.21   5.08    0.79          2.32
EHSᵇ           Online   GLP_A                17767     5            2.13   1.22    0.41          0.94
EHSᵇ           Online   GLP_F                17767     4            1.32   1.09    0.41          0.84
EHSᵇ           Online   GLP_G                17767     5            2.24   1.16    0.30          0.97
EHSᵇ           Online   GLP_N                17767     3            1.34   0.96    0.36          0.77
EHSᵇ           Online   GLP_S                17767     4            1.84   1.14    0.38          0.90
EHSᵇ           Online   JE                   17767    20            5.45   2.73    0.70          1.49
EHSᵇ           Online   MODELING             17767    33           12.36   4.97    0.79          2.29
EHSᵇ           Paper    FOUNDATION            4320    20            8.87   4.02    0.77          1.93
EHSᵇ           Paper    GLP                   4320    33           12.62   5.42    0.81          2.38
EHSᵇ           Paper    GLP_A                 4320     5            2.31   1.32    0.44          0.98
EHSᵇ           Paper    GLP_F                 4320     4            1.42   1.14    0.45          0.84
EHSᵇ           Paper    GLP_G                 4320     5            2.21   1.17    0.32          0.96
EHSᵇ           Paper    GLP_N                 4320     3            1.42   0.98    0.39          0.76
EHSᵇ           Paper    GLP_S                 4320     4            1.93   1.17    0.40          0.91
EHSᵇ           Paper    JE                    4320    20            5.63   3.00    0.72          1.58
EHSᵇ           Paper    MODELING              4320    33           12.57   5.41    0.81          2.35

ᵃ Reporting category: GLP = Grade Level Progress, JE = Justification and Explanation, GLP_NF = Number and Operations—Fractions, GLP_NBT = Number and Operations in Base 10, GLP_NS = The Number System, GLP_N = Number and Quantity, GLP_OA = Operations and Algebraic Thinking, GLP_EE = Expressions and Equations, GLP_RP = Ratios and Proportional Relationships, GLP_A = Algebra, GLP_F = Functions, GLP_G = Geometry, GLP_MD = Measurement and Data, and GLP_S = Statistics and Probability.
ᵇ Early high school form taken by grades 9 and 10 students.


Table D.3. Reading Reporting Category Raw Score Descriptive Statistics and Reliabilities from the Spring 2014 Test Administration

Tested Grade   Mode     Reporting Categoryᵃ   N       Max Points   Mean    SD      Reliability   SEM
3              Online   CAS                   6027     8            4.25   1.97    0.58          1.28
3              Online   IOK                   6027     5            1.77   1.38    0.41          1.06
3              Online   KID                   6027    16            8.03   3.69    0.77          1.76
3              Paper    CAS                  20058     8            4.65   1.90    0.55          1.28
3              Paper    IOK                  20058     5            1.92   1.41    0.40          1.09
3              Paper    KID                  20058    16            9.05   3.52    0.74          1.80
4              Online   CAS                   6281     9            5.26   2.32    0.62          1.42
4              Online   IOK                   6281     5            1.46   1.28    0.48          0.92
4              Online   KID                   6281    15            8.31   3.18    0.71          1.70
4              Paper    CAS                  19344     9            5.58   2.19    0.59          1.41
4              Paper    IOK                  19344     5            1.63   1.23    0.39          0.97
4              Paper    KID                  19344    15            8.78   3.11    0.69          1.72
5              Online   CAS                   6345     8            5.45   1.88    0.55          1.26
5              Online   IOK                   6345     5            2.24   1.39    0.41          1.07
5              Online   KID                   6345    16            9.26   3.61    0.75          1.79
5              Paper    CAS                  19405     8            5.41   1.86    0.55          1.24
5              Paper    IOK                  19405     5            2.23   1.34    0.39          1.04
5              Paper    KID                  19405    16            9.43   3.45    0.73          1.80
6              Online   CAS                   6622     8            4.43   1.94    0.56          1.29
6              Online   IOK                   6622     5            2.37   1.58    0.40          1.22
6              Online   KID                   6622    16           10.13   3.51    0.75          1.77
6              Paper    CAS                  20070     8            4.65   1.90    0.54          1.29
6              Paper    IOK                  20070     5            2.39   1.61    0.41          1.24
6              Paper    KID                  20070    16           10.42   3.41    0.73          1.76
7              Online   CAS                   8396     9            4.83   1.91    0.50          1.35
7              Online   IOK                   8396     5            2.07   1.48    0.38          1.17
7              Online   KID                   8396    15            7.69   3.20    0.70          1.76
7              Paper    CAS                  21043     9            4.90   1.92    0.49          1.38
7              Paper    IOK                  21043     5            2.38   1.48    0.37          1.18
7              Paper    KID                  21043    15            7.66   3.18    0.69          1.76


Table D.3. (continued)

Tested Grade   Mode     Reporting Categoryᵃ   N       Max Points   Mean    SD      Reliability   SEM
8              Online   CAS                  12567     6            4.01   1.54    0.53          1.06
8              Online   IOK                  12567     4            1.49   1.25    —ᶜ            —ᶜ
8              Online   KID                  12567    21           12.21   4.21    0.77          2.03
8              Paper    CAS                  20915     6            4.02   1.58    0.52          1.10
8              Paper    IOK                  20915     4            1.72   1.24    —ᶜ            —ᶜ
8              Paper    KID                  20915    21           12.15   4.08    0.75          2.04
EHSᵇ           Online   CAS                  18132     7            4.01   1.80    0.57          1.18
EHSᵇ           Online   IOK                  18132     5            1.75   1.09    0.28          0.92
EHSᵇ           Online   KID                  18132    19           10.62   4.45    0.82          1.89
EHSᵇ           Paper    CAS                   4326     7            4.28   1.82    0.60          1.16
EHSᵇ           Paper    IOK                   4326     5            1.91   1.11    0.21          0.98
EHSᵇ           Paper    KID                   4326    19           11.48   4.36    0.82          1.87

ᵃ Reporting category: KID = Key Ideas and Details, CAS = Craft and Structure, and IOK = Integration of Knowledge and Ideas.
ᵇ Early high school form taken by grades 9 and 10 students.
ᶜ Only one constructed-response item is included in this reporting category; thus reliability and SEM cannot be computed.


Table D.4. Science Reporting Category Raw Score Descriptive Statistics and Reliabilities from the Spring 2014 Test Administration

Tested Grade   Mode     Reporting Categoryᵃ   N       Max Points   Mean    SD      Reliability   SEM
3              Online   EMI                   2715    12            4.08   2.77    0.65          1.64
3              Online   IOD                   2715    14            6.95   3.23    0.76          1.58
3              Online   SIN                   2715    10            3.91   2.31    0.64          1.38
3              Paper    EMI                   4286    12            5.18   2.96    0.68          1.67
3              Paper    IOD                   4286    14            7.41   3.44    0.80          1.55
3              Paper    SIN                   4286    10            5.01   2.53    0.65          1.50
4              Online   EMI                   2915    11            4.10   2.18    0.56          1.44
4              Online   IOD                   2915    15            9.13   3.14    0.75          1.57
4              Online   SIN                   2915    10            3.56   1.85    0.49          1.33
4              Paper    EMI                   4468    11            4.54   2.42    0.60          1.53
4              Paper    IOD                   4468    15            9.60   3.26    0.75          1.62
4              Paper    SIN                   4468    10            4.13   1.93    0.51          1.35
5              Online   EMI                   3006     8            3.78   2.17    0.52          1.50
5              Online   IOD                   3006    14            9.03   2.99    0.69          1.66
5              Online   SIN                   3006    14            5.41   2.64    0.67          1.52
5              Paper    EMI                   4357     8            4.09   1.99    0.45          1.48
5              Paper    IOD                   4357    14            9.20   2.91    0.68          1.65
5              Paper    SIN                   4357    14            5.48   2.71    0.69          1.52
6              Online   EMI                   3034    14            7.57   2.93    0.63          1.78
6              Online   IOD                   3034    14            8.54   3.06    0.72          1.61
6              Online   SIN                   3034    12            5.85   2.72    0.72          1.44
6              Paper    EMI                   5333    14            8.01   3.03    0.66          1.76
6              Paper    IOD                   5333    14            8.94   3.04    0.73          1.57
6              Paper    SIN                   5333    12            5.70   2.77    0.72          1.47
7              Online   EMI                   4688    12            5.83   2.94    0.67          1.69
7              Online   IOD                   4688    20           10.19   4.24    0.81          1.85
7              Online   SIN                   4688     8            4.20   2.10    0.63          1.27
7              Paper    EMI                   5329    12            5.68   2.84    0.63          1.72
7              Paper    IOD                   5329    20           10.09   4.03    0.78          1.90
7              Paper    SIN                   5329     8            3.87   2.03    0.64          1.22


Table D.4. (continued)

Tested Grade   Mode     Reporting Categoryᵃ   N       Max Points   Mean    SD      Reliability   SEM
8              Online   EMI                   8323    13            4.93   2.92    0.68          1.64
8              Online   IOD                   8323    17           10.32   3.36    0.75          1.69
8              Online   SIN                   8323    10            4.61   2.28    0.59          1.45
8              Paper    EMI                   5476    13            5.13   3.08    0.69          1.73
8              Paper    IOD                   5476    17           10.44   3.32    0.75          1.65
8              Paper    SIN                   5476    10            4.84   2.39    0.63          1.45
EHSᵇ           Online   EMI                  17199    13            5.15   3.13    0.73          1.64
EHSᵇ           Online   IOD                  17199    18            7.83   3.63    0.75          1.82
EHSᵇ           Online   SIN                  17199     9            3.82   2.18    0.59          1.40
EHSᵇ           Paper    EMI                   4199    13            5.68   3.41    0.76          1.68
EHSᵇ           Paper    IOD                   4199    18            8.92   3.79    0.76          1.87
EHSᵇ           Paper    SIN                   4199     9            4.10   2.34    0.65          1.39

ᵃ Reporting category: IOD = Interpretation of Data, SIN = Scientific Investigation, and EMI = Evaluation of Models, Inferences, and Experimental Results.
ᵇ Early high school form taken by grades 9 and 10 students.


Appendix E: ACT Aspire ELA Content Framework and Alignment: English

Grade 3: 3 essays, avg. 150 words each
Grade 4: 3 essays, avg. 150 words each
Grade 5: 3 essays, avg. 250 words each
Grade 6: 3 essays, avg. 300 words each
Grade 7: 4 essays, avg. 300 words each
Grade 8: 4 essays, avg. 300 words each
EHS (9-10): 4 essays, avg. 300 words each

• Increasingly complex text
• Increasingly sophisticated language use
• Balance of Expressive Narrative, Informational, and Persuasive essays

Production of Writing
Text Purposes and Topic Development
Demonstrate an understanding of and control over the rhetorical aspects of texts by identifying the purpose of parts of texts, determining whether a text has met its intended goal, and evaluating the relevance of material in terms of a text's focus.
Organization
Use various strategies to ensure that a text is logically organized, flows smoothly, and has an effective introduction and conclusion.

[Knowledge of Language not assessed with Grade 3 items]

Knowledge of Language
Clarity, Style, and Tone
Demonstrate effective language use through ensuring precision and concision in word choice, maintaining consistency in style and tone, and using language appropriate for the context.

Conventions of Standard English
Sentence Structure and Formation
Recognize common problems with sentence structure and formation in a text and make revisions to improve the writing.
Usage
Recognize common problems with standard English usage in a text and make revisions to improve the writing.
Punctuation and Capitalization
Recognize common problems with standard English punctuation and capitalization and make revisions to improve the writing.


Grade 3

• 3 essays, average 150 words each
• Balance of Expressive Narrative and Informational Essays

Production of Writing
Text Purposes and Topic Development
Demonstrate an understanding of and control over the rhetorical aspects of texts by identifying the purpose of parts of texts, determining whether a text has met its intended goal, and evaluating the relevance of material in terms of a text's focus.
• Identify the purpose of a word or phrase.
• Use a word or phrase to accomplish a straightforward rhetorical purpose or effect.
Organization
Use various strategies to ensure that a text is logically organized, flows smoothly, and has an effective introduction and conclusion.
• Organize sentences in logical sequence within a short paragraph.
• Determine the most logical placement for a sentence in a short paragraph.
• Use logical ordering for phrases within a sentence.
• Use an effective introductory or concluding sentence in a short paragraph with a straightforward topic.
• Use effective transition words and phrases to link ideas between sentences (e.g., "For example").

Conventions of Standard English
Sentence Structure and Formation
Recognize common problems with sentence structure and formation in a text and make revisions to improve the writing.
Forming Sentences and Relating Clauses
• Correct faulty coordination in compound sentences and issues with subordination in complex sentences appropriate for 3rd-grade students.
• Use a coordinating conjunction to join two short, simple sentences.
Usage
Recognize common problems with standard English usage in a text and make revisions to improve the writing.
Forming and Using Nouns and Pronouns
• Use the correct pronoun in a sentence.
• Form regular and irregular plural nouns.
Forming and Using Verbs
• Ensure subject-verb agreement in simple sentences.
• Use the correct verb form in a simple sentence (e.g., runs, ran, running) so that it makes sense in context.
• Form regular and irregular verbs in various tenses.
Forming and Using Modifiers
• Form adjectives and adverbs, ensuring the correct degree (comparative, superlative) is used.
• Determine whether to use the adjective or adverb form (ending in -ly) in a sentence.
Forming Possessives
• Correctly form possessive pronouns and adjectives when the possessive relationship is clear.
Observing Usage Conventions
• Correctly use idiomatic language appropriate for 3rd-grade school contexts.


Grade 3 (continued)

Punctuation and Capitalization
Recognize common problems with standard English punctuation and capitalization and make revisions to improve the writing.
Punctuating Sentences and Sentence Elements
• Correct unnecessary punctuation in a short sentence or paragraph.
• Eliminate unnecessary punctuation that causes ambiguity from a sentence (e.g., a comma that separates subject and verb in a simple sentence).
• Use the correct punctuation at the end of a sentence.
Punctuating Possessives
• Use an apostrophe to form a possessive noun.
Capitalization Conventions
• Capitalize proper names.

Grade 4

• 3 essays, average 150 words each
• Balance of Expressive Narrative and Informational Essays

Production of Writing
Text Purposes and Topic Development
Demonstrate an understanding of and control over the rhetorical aspects of texts by identifying the purpose of parts of texts, determining whether a text has met its intended goal, and evaluating the relevance of material in terms of a text's focus.
• Describe how deleting a sentence changes the development of a short paragraph.
• Use a word or phrase to achieve a specific rhetorical purpose or effect (e.g., provide better support for a claim).
• Determine if content is relevant to the topic and focus of a paragraph. Provide a reason to support the decision.
Organization
Use various strategies to ensure that a text is logically organized, flows smoothly, and has an effective introduction and conclusion.
• Organize sentences in logical sequence within a short paragraph.
• Determine the most logical placement for a sentence in a paragraph.
• Use logical ordering for phrases within a sentence.
• Use an effective introductory or concluding sentence in a short paragraph.
• Use effective transition words and phrases to emphasize relationships between sentences.
• Determine where to divide a short paragraph with multiple topics in order to accomplish a specific rhetorical purpose.

Knowledge of Language
Clarity, Style, and Tone
Demonstrate effective language use through ensuring precision and concision in word choice, maintaining consistency in style and tone, and using language appropriate for the context.
• Use words that provide an appropriate level of precision, and express simple relationships within sentences (e.g., conjunctions).
• Use words and phrases appropriate to level of text formality ("effective president" vs. "really cool president").


Grade 4 (continued)

Conventions of Standard English
Sentence Structure and Formation
Recognize common problems with sentence structure and formation in a text and make revisions to improve the writing.
• Correct faulty coordination, subordination, and parallelism in sentences at the appropriate level of complexity for 4th-grade students.
• Use coordinating conjunctions to combine 2–3 short sentences.
Fixing Fragments and Run-ons
• Identify and correct obvious sentence fragments.
• Correct run-on sentences, including comma splices.
Usage
Recognize common problems with standard English usage in a text and make revisions to improve the writing.
Forming and Using Nouns and Pronouns
• Use the correct pronoun in a sentence.
• Form regular and irregular plural nouns.
Forming and Using Verbs
• Correct common mistakes with subject-verb agreement.
• Form regular and irregular common verbs in various tenses (rode, became, watches).
• Use modal auxiliary verbs appropriate to context.
Forming and Using Modifiers
• Form adjectives and adverbs, ensuring the correct degree (comparative, superlative) is used.
• Determine whether to use the adjective or adverb form (ending in -ly) in a sentence.
Forming Possessives
• Correctly form possessive pronouns and adjectives when the possessive relationship is clear.
Observing Usage Conventions
• Correctly use idiomatic language appropriate for 4th-grade school contexts.
• Distinguish between and among words frequently confused by students at the 4th-grade level.
• Use prepositions appropriate to context.

Punctuation and Capitalization
Recognize common problems with standard English punctuation and capitalization and make revisions to improve the writing.
Punctuating Sentences and Sentence Elements
• Correct unnecessary punctuation in a short sentence or paragraph.
• Eliminate unnecessary punctuation that causes ambiguity from a sentence (e.g., a comma that separates subject and verb in a simple sentence).
• Use the correct punctuation at the end of a sentence.
• Use a comma before a coordinating conjunction in compound sentences.
Punctuating Speech and Textual Quotations
• Use quotation marks to punctuate dialogue correctly.
Punctuating Possessives
• Use context to decide between plural and possessive nouns.
Capitalization Conventions
• Observe capitalization conventions for multi-word proper nouns (e.g., Apex Soccer League).


Grade 5

• 3 essays, average 250 words each
• Balance of Expressive Narrative and Informational Essays

Production of Writing
Text Purposes and Topic Development
Demonstrate an understanding of and control over the rhetorical aspects of texts by identifying the purpose of parts of texts, determining whether a text has met its intended goal, and evaluating the relevance of material in terms of a text's focus.
• Describe how deleting a sentence changes the development of a paragraph.
• Determine whether an essay has met a specified rhetorical purpose and provide examples to explain why.
• Use a word or phrase to accomplish a specific rhetorical purpose (e.g., provide better support for a claim).
• Determine if content is relevant to the topic and focus of a paragraph. Provide a reason to support the decision.
Organization
Use various strategies to ensure that a text is logically organized, flows smoothly, and has an effective introduction and conclusion.
• Organize sentences in a logical sequence within a short paragraph; ensure a cohesive flow of ideas.
• Determine the most logical placement for a sentence in a paragraph which has several ideas.
• Use logical ordering for phrases within a sentence.
• Use an effective introductory or concluding sentence that maintains content focus without redundancy in a short paragraph.
• Use effective transition words or phrases appropriate to content and genre (e.g., later vs. second for a narrative).
• Determine where to divide a short paragraph with multiple topics in order to accomplish a specific rhetorical purpose.
• Order paragraphs in a logical sequence.

Knowledge of Language
Clarity, Style, and Tone
Demonstrate effective language use through ensuring precision and concision in word choice, maintaining consistency in style and tone, and using language appropriate for the context.
• Use words that provide an appropriate level of precision, and express simple relationships within sentences (e.g., conjunctions).
• Use words and phrases appropriate to level of text formality.

Conventions of Standard English
Sentence Structure and Formation
Recognize common problems with sentence structure and formation in a text and make revisions to improve the writing.
Forming Sentences and Relating Clauses
• Correct faulty coordination, subordination, and parallelism in sentences at the appropriate level of complexity for 5th-grade students.
Fixing Fragments and Run-ons
• Identify and correct longer sentence fragments.
• Correct run-on sentences, including comma splices.
Maintaining Consistency and Coherence
• Correct inappropriate shifts in verb tense and aspect.


Grade 5 (continued)

Usage
Recognize common problems with standard English usage in a text and make revisions to improve the writing.
Forming and Using Nouns and Pronouns
• Use the correct pronoun in a sentence.
Forming and Using Verbs
• Correct common mistakes with subject-verb agreement.
• Form regular and irregular common verbs in various tenses (rode, became, watches).
• Use modal auxiliary verbs appropriate to context.
Forming and Using Modifiers
• Form adjectives and adverbs, ensuring the correct degree (comparative, superlative) is used.
• Determine whether to use the adjective or adverb form (ending in -ly) in a sentence.
Forming Possessives
• Correctly form possessive adjectives when the possessive relationship is somewhat clear.
Observing Usage Conventions
• Correctly use idiomatic language appropriate for 5th-grade school contexts.
• Distinguish between and among words frequently confused by students at the 5th-grade level.
• Use prepositions appropriate to context.

Punctuation and Capitalization
Recognize common problems with standard English punctuation and capitalization and make revisions to improve the writing.
Punctuating Sentences and Sentence Elements
• Correct unnecessary punctuation in a short sentence or paragraph.
• Eliminate unnecessary punctuation that causes ambiguity from a sentence (e.g., a comma that separates subject and verb in a simple sentence).
• Use a comma before a coordinating conjunction in compound sentences.
• Use a comma to correctly punctuate a phrase at the beginning of a sentence.
• Use a comma to set off tag questions and yes and no.
• Use commas in a sentence to correctly separate items in a series.
• Distinguish between restrictive/nonrestrictive or essential/nonessential sentence elements.
Punctuating Speech and Textual Quotations
• Punctuate dialogue, including exclamations and questions.
Punctuating Possessives
• Form possessive singular and plural nouns and use them in appropriate contexts.
Capitalization Conventions
• Apply conventions of capitalization.


Grade 6

• 3 essays, average 300 words each• Balance of Expressive Narrative and Informational Essays

Production of WritingText Purposes and Topic Development

Demonstrate an understanding of and control over the rhetorical aspects of texts by identifying the purpose of parts of texts, determining whether a text has met its intended goal, and evaluating the relevance of material in terms of a text’s focus.• Determine the writer’s purpose for including a given sentence. • Describe how deleting a phrase or sentence changes the development of a paragraph.• Determine whether an essay has met a specified rhetorical purpose and provide examples to explain why.• Use a word or phrase to accomplish a specific rhetorical purpose (e.g., provide better support for a claim).• Determine if content is relevant to the topic and focus of a paragraph. Provide a reason to support the decision.Organization

Use various strategies to ensure that a text is logically organized, flows smoothly, and has an effective introduction and conclusion.• Organize sentences in a logical sequence within a paragraph; ensure a cohesive flow of ideas.• Add or move sentences in order to improve local cohesion and clarify the paragraph as a whole.• Use logical ordering for phrases within a sentence.• Use an effective introductory, concluding or transition sentence to emphasize relationships within and between shorter

paragraphs.• Use effective transition words or phrases to connect ideas in two adjacent sentences and to connect a new paragraph with

ideas in the previous (e.g., on the other hand).• Determine where to divide a short paragraph with multiple topics in order to accomplish a specific rhetorical purpose.• Order paragraphs in a logical sequence to improve cohesion.

Knowledge of Language
Clarity, Style, and Tone

Demonstrate effective language use through ensuring precision and concision in word choice, maintaining consistency in style and tone, and using language appropriate for the context.
• Revise a redundant phrase to make the sentence more concise.
• Use words and phrases to precisely express ideas and relationships (e.g., subordinating conjunctions, conjunctive adverbs).
• Revise a word or phrase so that the sentence matches the tone of the essay (either too informal or too formal for a given purpose).
Conventions of Standard English
Sentence Structure and Formation

Recognize common problems with sentence structure and formation in a text and make revisions to improve the writing.

Forming Sentences and Relating Clauses
• Correct faulty coordination, subordination, and parallelism in sentences at the appropriate level of complexity for 6th-grade students.
Fixing Fragments and Run-ons
• Identify and correct longer sentence fragments.
• Correct run-on sentences, including comma splices.
Maintaining Consistency and Coherence
• Drawing on the context of a sentence or paragraph, correct an inappropriate shift in pronoun number or person.
• Correct inappropriate shifts in verb tense and aspect, including auxiliary verbs.

Grade 6 (continued)

Usage

Recognize common problems with standard English usage in a text and make revisions to improve the writing.

Forming and Using Nouns and Pronouns
• Correct common mistakes with pronoun-antecedent agreement.
• Correct vague or ambiguous pronouns to improve local coherence.
• Use appropriate pronoun case.
Forming and Using Verbs
• Correct common mistakes with subject-verb agreement.
• Form regular and irregular verbs to correctly express tense and aspect of actions (e.g., she reads, yesterday she had read).
Forming and Using Modifiers
• Form adjectives and adverbs, ensuring the correct degree (comparative, superlative) is used.
• Make distinctions between different comparative constructions (e.g., fewer than, less than).
• Determine whether to use the adjective or adverb form of modifiers in a sentence with pairs of modifiers (e.g., “she spoke quietly but quickly”).
Forming Possessives
• Correctly form possessive adjectives when the possessive relationship is somewhat clear.
Observing Usage Conventions
• Correctly use idiomatic language appropriate for 6th-grade school contexts.
• Distinguish among words frequently confused by students at the 6th-grade level.
• Form and use prepositional phrases to express specific actions (e.g., “comes from outside,” “look up in the tree”).
Punctuation and Capitalization

Recognize common problems with standard English punctuation and capitalization and make revisions to improve the writing.

Punctuating Sentences and Sentence Elements
• Correct unnecessary punctuation from a longer sentence with a relatively complex structure (e.g., a comma used before a coordinating conjunction that does not join independent clauses).
• Eliminate unnecessary punctuation that causes ambiguity from a sentence when the context is more complex.
• Use a comma to correctly punctuate a phrase at the beginning of a sentence.
• Use a comma to correctly indicate direct address.
• Use a comma to set off tag questions and yes and no.
• Use commas in a sentence to correctly separate items in a series.
• Determine if a phrase is restrictive/nonrestrictive and use commas to set it off, if necessary.
Punctuating Possessives
• Form possessive singular and plural nouns and use them in appropriate contexts.

Grade 7

• 4 essays, average 300 words each
• Balance of Expressive Narrative, Informational, and Persuasive Essays

Production of Writing
Text Purposes and Topic Development

Demonstrate an understanding of and control over the rhetorical aspects of texts by identifying the purpose of parts of texts, determining whether a text has met its intended goal, and evaluating the relevance of material in terms of a text’s focus.
• Describe how deleting a phrase or sentence affects the development of a longer paragraph.
• Determine whether an essay has met a specified rhetorical purpose and provide examples to explain why.
• Use a word or phrase to accomplish a specific rhetorical purpose (e.g., provide better support for a claim).
• Determine if content is relevant to the topic and focus of a paragraph and/or to the essay as a whole. Provide a reason to support the decision.
Organization

Use various strategies to ensure that a text is logically organized, flows smoothly, and has an effective introduction and conclusion.
• Organize sentences in a logical sequence within a longer paragraph to accomplish a specific rhetorical purpose; ensure a cohesive flow of ideas.
• Add or move sentences in order to improve cohesion of the essay as a whole.
• Use logical ordering for phrases within a sentence.
• Use an effective introductory, concluding, or transition sentence to emphasize relationships within and between paragraphs.
• Use effective transition words or phrases to emphasize relationships among ideas.
• Determine where to divide a paragraph with multiple topics in order to accomplish a specific rhetorical purpose.
• Order paragraphs in a logical sequence to improve cohesion.

Knowledge of Language
Clarity, Style, and Tone

Demonstrate effective language use through ensuring precision and concision in word choice, maintaining consistency in style and tone, and using language appropriate for the context.
• Revise a redundant phrase to make the sentence more concise.
• Use words and phrases to precisely express ideas and relationships (e.g., subordinating conjunctions, conjunctive adverbs).
• Revise a word or phrase so that the sentence matches the tone of the essay (either too informal or too formal for a given purpose).
Conventions of Standard English
Sentence Structure and Formation

Recognize common problems with sentence structure and formation in a text and make revisions to improve the writing.

Forming Sentences and Relating Clauses
• Correct faulty coordination, subordination, and parallelism in sentences at the appropriate level of complexity for 7th-grade students.
• Correct common problems with misplaced modifiers.
• Correct common problems with squinters and danglers.
Fixing Fragments and Run-ons
• Identify and correct sentence fragments when the fragments are more complex in structure.
• Correct run-on sentences, including comma splices.
Maintaining Consistency and Coherence
• Drawing on the context of a sentence or paragraph, correct an inappropriate shift in pronoun number or person.
• Correct inappropriate shifts in verb tense and aspect, including auxiliary verbs.

Grade 7 (continued)

Usage

Recognize common problems with standard English usage in a text and make revisions to improve the writing.

Forming and Using Nouns and Pronouns
• Correct common mistakes with pronoun-antecedent agreement.
• Correct vague or ambiguous pronouns to improve local coherence.
• Use appropriate pronoun case.
Forming and Using Verbs
• Correct common mistakes with subject-verb agreement in compound and complex sentences.
Forming and Using Modifiers
• Form adjectives and adverbs, ensuring the correct degree (comparative, superlative) is used.
• Make distinctions between different comparative constructions (e.g., fewer than, less than).
• Determine whether to use the adjective or adverb form of modifiers in a sentence with pairs of modifiers (e.g., “she spoke quietly but quickly”).
Forming Possessives
• Correctly form possessive adjectives in sentences with somewhat complicated structures.
Observing Usage Conventions
• Correctly use idiomatic language appropriate for 7th-grade school contexts.
• Distinguish among words frequently confused by students at the 7th-grade level.
Punctuation and Capitalization

Recognize common problems with standard English punctuation and capitalization and make revisions to improve the writing.

Punctuating Sentences and Sentence Elements
• Correct unnecessary punctuation from longer sentences (e.g., those containing prepositional phrases and compound items in a series).
• Eliminate unnecessary punctuation that causes ambiguity from a sentence when the context is more complex.
• Use commas to separate multi-word phrases in a series.
• Determine if a phrase is restrictive/nonrestrictive and use commas to set it off, if necessary.
Punctuating Possessives
• Form possessive singular and plural nouns and use them in appropriate contexts.

Grade 8

• 4 essays, average 300 words each
• Balance of Expressive Narrative and Informational Essays

Production of Writing
Text Purposes and Topic Development

Demonstrate an understanding of and control over the rhetorical aspects of texts by identifying the purpose of parts of texts, determining whether a text has met its intended goal, and evaluating the relevance of material in terms of a text’s focus.
• Describe how deleting a phrase or sentence changes the development of a paragraph or the essay as a whole.
• Determine whether an essay has met a specified rhetorical purpose and provide examples to explain why.
• Use a word or phrase to accomplish a specific rhetorical purpose (e.g., provide better support for a claim).
• Determine if content is relevant to the topic and focus of a paragraph and/or to the essay as a whole when the context is more complex. Provide a reason to support the decision.
Organization

Use various strategies to ensure that a text is logically organized, flows smoothly, and has an effective introduction and conclusion.
• Organize sentences in a logical sequence within a longer paragraph to accomplish a specific rhetorical purpose; ensure a cohesive flow of ideas.
• Add or move sentences in order to improve cohesion of the essay as a whole.
• Use logical ordering for phrases within a sentence.
• Use an effective introductory, concluding, or transition sentence to emphasize relationships within and between paragraphs.
• Use effective transition words or phrases to emphasize relationships among ideas.
• Determine where to divide a more complex paragraph with multiple topics in order to accomplish a specific rhetorical purpose.
• Order paragraphs in a logical sequence to improve cohesion and accomplish a specific rhetorical purpose.

Knowledge of Language
Clarity, Style, and Tone

Demonstrate effective language use through ensuring precision and concision in word choice, maintaining consistency in style and tone, and using language appropriate for the context.
• Revise a redundant phrase to make the sentence or paragraph more concise.
• Use words and phrases to precisely express ideas and relationships (e.g., subordinating conjunctions, conjunctive adverbs).
• Revise a word or phrase so that the sentence matches the tone of the essay (either too informal or too formal for a given purpose).
Conventions of Standard English
Sentence Structure and Formation

Recognize common problems with sentence structure and formation in a text and make revisions to improve the writing.

Forming Sentences and Relating Clauses
• Correct faulty coordination, subordination, and parallelism in sentences at the appropriate level of complexity for 8th-grade students.
• Correct common problems with misplaced modifiers.
• Correct common problems with squinters and danglers.
Fixing Fragments and Run-ons
• Identify and correct sentence fragments when the fragments are more complex in structure.
• Correct run-on sentences, including comma splices.
Maintaining Consistency and Coherence
• Drawing on the context of the paragraph, correct an inappropriate shift in pronoun number or person.
• Correct inappropriate shifts in verb tense and aspect, including auxiliary verbs.
• Correct inappropriate shifts in verb voice.

Grade 8 (continued)

Usage

Recognize common problems with standard English usage in a text and make revisions to improve the writing.

Forming and Using Nouns and Pronouns
• Correct common mistakes with pronoun-antecedent agreement.
• Correct vague or ambiguous pronouns to improve local coherence.
• Use appropriate pronoun case.
Forming and Using Verbs
• Correct more advanced mistakes with subject-verb agreement (e.g., compound subjects) in compound and complex sentences.
Forming and Using Modifiers
• Form adjectives and adverbs, ensuring the correct degree (comparative, superlative) is used.
• Make distinctions between different comparative constructions (e.g., fewer than, less than).
• Determine whether to use the adjective or adverb form of modifiers in a sentence with pairs of modifiers (e.g., “she spoke quietly but quickly”).
Forming Possessives
• Correctly form possessive adjectives in sentences with complicated structures.
Observing Usage Conventions
• Correctly use idiomatic language appropriate for 8th-grade school contexts.
• Distinguish among words frequently confused by students at the 8th-grade level.
Punctuation and Capitalization

Recognize common problems with standard English punctuation and capitalization and make revisions to improve the writing.

Punctuating Sentences and Sentence Elements
• Correct unnecessary punctuation from sentences with relatively complex structure.
• Eliminate unnecessary punctuation that causes ambiguity from a sentence when the context is more complex.
• Use commas to separate items in a series in a complex sentence.
• Determine if a phrase is restrictive/nonrestrictive and use commas to set it off, if necessary.
Punctuating Possessives
• Form possessive singular and plural nouns and use them in appropriate contexts.

Early High School (9-10)

• 4 essays, average 300 words each
• Balance of Expressive Narrative and Informational Essays

Production of Writing
Text Purposes and Topic Development

Demonstrate an understanding of and control over the rhetorical aspects of texts by identifying the purpose of parts of texts, determining whether a text has met its intended goal, and evaluating the relevance of material in terms of a text’s focus.
• Describe how deleting a supporting detail changes the meaning of a complex sentence and the overall development of the essay as a whole.
• Decide whether information should be deleted from an illustrative sentence and explain the rationale for the decision.
• Determine whether an essay has met a complex rhetorical purpose and provide examples to explain why.
• Use a word, phrase, or sentence to accomplish a specific rhetorical purpose (e.g., express an idea with nuance or emotional overtones).
• Determine if content is relevant to the topic and focus of a paragraph and/or to the essay as a whole when the context is more complex. Provide a reason to support the decision.
Organization

Use various strategies to ensure that a text is logically organized, flows smoothly, and has an effective introduction and conclusion.
• Organize sentences in a logical sequence within a long paragraph to accomplish a specific rhetorical purpose; ensure a cohesive flow of ideas.
• Add or move sentences in order to improve cohesion of the essay as a whole when the context is more complex.
• Use logical ordering for phrases within a sentence.
• Use an effective introductory, concluding, or transition sentence to emphasize relationships within and between longer, more complex paragraphs.
• Use effective transition words or phrases to emphasize complex/subtle relationships among ideas in longer paragraphs and across paragraphs.
• Determine where to divide a complex paragraph with multiple topics in order to accomplish a specific rhetorical purpose.
• Order paragraphs in a logical sequence to improve cohesion and accomplish a specific rhetorical purpose.

Knowledge of Language
Clarity, Style, and Tone

Demonstrate effective language use through ensuring precision and concision in word choice, maintaining consistency in style and tone, and using language appropriate for the context.
• Revise a redundant phrase to make a sentence or paragraph more concise when the context is more complex.
• Use words and phrases to precisely express ideas and relationships (e.g., subordinating conjunctions, conjunctive adverbs).
• Revise a word or phrase so that the sentence matches the tone of the essay (either too informal or too formal for a given purpose).

Early High School (9-10) (continued)

Conventions of Standard English
Sentence Structure and Formation

Recognize common problems with sentence structure and formation in a text and make revisions to improve the writing.

Forming Sentences and Relating Clauses
• Correct faulty coordination, subordination, and parallelism in sentences at the appropriate level of complexity for early-high-school students.
• Correct common problems with misplaced modifiers.
• Correct common problems with squinters and danglers.
Fixing Fragments and Run-ons
• Identify and correct sentence fragments when the fragments are more complex in structure.
• Correct run-on sentences, including comma splices.
Maintaining Consistency and Coherence
• Ensure that verb mood is used consistently.
• Drawing on the context of the paragraph or preceding paragraphs, correct an inappropriate shift in pronoun number or person.
• Correct inappropriate shifts in verb tense and aspect, including auxiliary verbs.
• Correct inappropriate shifts in verb voice.
Usage

Recognize common problems with standard English usage in a text and make revisions to improve the writing.

Forming and Using Nouns and Pronouns
• Correct common mistakes with pronoun-antecedent agreement.
• Correct vague or ambiguous pronouns to improve coherence within the essay.
• Use appropriate pronoun case.
Forming and Using Verbs
• Correct more advanced mistakes with subject-verb agreement (e.g., sentences with a number of interceding elements) in compound and complex sentences.
Forming Possessives
• Correctly form possessive adjectives in sentences with complicated structures.
Observing Usage Conventions
• Correctly use idiomatic language appropriate for early-high-school contexts.
• Distinguish among words frequently confused by students at the early-high-school level.
Punctuation and Capitalization

Recognize common problems with standard English punctuation and capitalization and make revisions to improve the writing.

Punctuating Sentences and Sentence Elements
• Correct unnecessary punctuation from longer sentences with complex structures.
• Eliminate unnecessary punctuation that causes ambiguity from a sentence when the context is more complex.
• Use commas to separate items in a series in a complex sentence.
• Determine if a phrase is restrictive/nonrestrictive and use commas to set it off, if necessary.
Punctuating Possessives
• Form possessive singular and plural nouns and use them in appropriate contexts.

Appendix F: ACT Aspire ELA Content Framework and Alignment: Reading

Text complexity levels, in increasing order: Basic, Straightforward, Somewhat Challenging, More Challenging, Complex, Highly Complex

ACT
4 passages: 1 Literary + 3 Informational
Average Word Count: 500, CCSS FK Range: 10.34-14.2

Early High School (Grades 9-10)
3 passages: 1 Literary + 2 Informational
Average Word Count: 500, CCSS FK Range: 8.32-12.12

Grade 8
3 passages: 1 Literary + 2 Informational
Average Word Count: 500, CCSS FK Range: 6.51-10.34

Grade 7
3 passages: 1 Literary + 2 Informational
Average Word Count: 500, CCSS FK Range: 6.51-10.34

Grade 6
3 passages: 1 Literary + 2 Informational
Average Word Count: 500, CCSS FK Range: 6.51-10.34

Grade 5
4 passages: 2 Literary + 2 Informational
Average Word Count: 375, CCSS FK Range: 4.51-7.73

Grade 4
4 passages: 2 Literary + 2 Informational
Average Word Count: 300, CCSS FK Range: 4.51-7.73

Grade 3
4 passages: 2 Literary + 2 Informational
Average Word Count: 250, CCSS FK Range: 1.98-5.34
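
The CCSS FK ranges listed above, and repeated in the grade-level sections that follow, are Flesch-Kincaid grade-level readability ranges. As background only (this is the standard published Flesch-Kincaid formula, not a description of ACT's full passage-selection methodology, which also weighs the qualitative complexity features described in each grade's section), the grade level of a passage is computed from its average sentence length and average syllables per word:

$$ \text{FK grade level} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59 $$

For example, a hypothetical 500-word passage with 35 sentences and 710 syllables would score 0.39(500/35) + 11.8(710/500) - 15.59 ≈ 6.7, within the 6.51-10.34 range specified for grades 6-8.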

Grade 3

4 passages: 2 Literary + 2 Informational

Average Word Count: 250, CCSS FK Range: 1.98-5.34

Key Ideas and Details
Close Reading

Locate and interpret details, paraphrase information and ideas accurately, and draw logical conclusions based on textual evidence.
• Demonstrate understanding of basic and straightforward texts, using the text as the basis for the answers.
Central Ideas, Themes & Summaries

Identify and infer whole-text-level central (main) ideas and themes, identify and infer central ideas of defined parts of texts, and summarize information and ideas accurately.
• Determine the main ideas of basic and straightforward texts and identify key details to support them.
• Summarize basic and straightforward texts.
• Summarize stories with plots in chronological order and simple, external conflicts.
• Determine the central message, lesson, or moral of a story and identify key details to support it.
Relationships

Identify and infer sequences, comparative relationships (comparisons and contrasts), and cause-effect relationships in texts.
• Describe the relationship between a series of historical events, scientific ideas or concepts, or steps in a process.
• Describe and/or compare the traits, motivations, or feelings of a character.
• Explain how characters’ actions contribute to the sequence of events.

Craft and Structure
Word Meanings & Word Choice

Determine the meaning of words and phrases and analyze word choice rhetorically.
• Determine the meaning of general academic and domain-specific words and phrases in basic and straightforward texts relevant to a Grade 3 topic or subject area.
• Determine the meaning of words and phrases as they are used in basic and straightforward stories.
• Determine the meaning of simple, non-literal language and literary devices, such as simple similes and metaphors.
Text Structure

Analyze text structure rhetorically at the whole-text and partial-text levels.
• Use text features to identify the purpose and function of overall and specific text portions.
• Describe how parts of a story affect its overall theme, moral, or lesson.
Purpose & Point of View

Understand the purpose or purposes for which texts are written, understand point of view, evaluate the choice of narration in literary texts, and analyze bias, stereotypes, and persuasive techniques in texts.
• Distinguish the author’s point of view from one’s own point of view.
• Distinguish a narrator’s point of view from the point of view of the author or of other characters.

Grade 3 (continued)

Integration of Knowledge and Ideas
Arguments

Analyze arguments for their claims, counterclaims, reasons, evidence, and soundness of reasoning.
• Distinguish between facts and opinions.
• Describe and identify logical connections between particular sentences and paragraphs of a text.
Multiple Texts

Make connections between two or more texts in the same medium and format.
• Compare and contrast the most important points and key details presented in two texts on the same topic.
• Compare and contrast the themes, settings, and plots of multiple stories.

Text complexity levels: Basic, Straightforward

BASIC texts will have simple purposes and use primarily short and basic sentences. Vocabulary will mostly be familiar and contemporary, with some polysyllabic words and domain-specific language. Most objects and events will be concrete. Ideas and concepts will be few, clear, and often repeated.

Informational texts will have clear focus on one subject and use a simple structure with basic transitions.

Literary texts will be primarily literal, with plots in chronological order and simple, usually external, conflicts. These texts will have a single narrator and generally flat characters. These texts may include a few instances of simple non-literal language and literary devices such as simple similes and metaphors.

STRAIGHTFORWARD texts will have clear purpose and organization with explicit transitions and include a variety of sentence structures. Vocabulary may include some general academic or uncommon words and phrases. These texts may require inferential reading and contain some abstract concepts and diverse perspectives.

Informational texts will exhibit features of the field of study (e.g. science), including some domain-specific language and assumption of broad content knowledge.

Literary texts will have a single narrator and conflicts that are straightforward and usually external. Plots may deviate from chronological order with simple, heavily signaled flash-backs or flash-forwards. These texts may include some non-literal language and simple literary devices. Major and minor characters may show some depth and the text will offer basic insights into characters, people, situations, and events (e.g., simple motives).

Grade 4

4 passages: 2 Literary + 2 Informational

Average Word Count: 300, CCSS FK Range: 4.51-7.73

Key Ideas and Details
Close Reading

Locate and interpret details, paraphrase information and ideas accurately, and draw logical conclusions based on textual evidence.
• Refer to details and examples in basic and straightforward texts when explaining what the text says explicitly and when drawing inferences from the text.
Central Ideas, Themes & Summaries

Identify and infer whole-text-level central (main) ideas and themes, identify and infer central ideas of defined parts of texts, and summarize information and ideas accurately.
• Determine the main idea and themes of a basic or straightforward text and explain how they are supported by key details.
• Summarize the main ideas and themes of basic and straightforward texts.
Relationships

Identify and infer sequences, comparative relationships (comparisons and contrasts), and cause-effect relationships in texts.
• Use specific details from a story, such as a character’s words, thoughts, and actions, to describe a character, setting, or event.
• Draw comparisons between ideas, characters, and problems in stories.
• Explain events, procedures, ideas, or concepts in a historical, scientific, or technical text, including what happened and why, based on specific information in the text.

Craft and Structure
Word Meanings & Word Choice

Determine the meaning of words and phrases and analyze word choice rhetorically.
• Determine the meaning of general academic and domain-specific words and phrases in texts relevant to a Grade 4 topic or subject area.
• Analyze how word choice affects meanings in stories.
Text Structure

Analyze text structure rhetorically at the whole-text and partial-text levels.
• Describe the purpose or function of structures within a story.
• Describe the purpose or function of paragraphs in an informational text as they relate to chronology, comparison, and cause/effect.
Purpose & Point of View

Understand the purpose or purposes for which texts are written, understand point of view, evaluate the choice of narration in literary texts, and analyze bias, stereotypes, and persuasive techniques in texts.
• Identify the point of view from which different stories are narrated, including the difference between first- and third-person narrations.
• Determine the overall purpose of a text.
• Identify bias, stereotypes, unstated assumptions, and persuasive techniques.

Grade 4 (continued)

Integration of Knowledge and Ideas
Arguments

Analyze arguments for their claims, counterclaims, reasons, evidence, and soundness of reasoning.
• Distinguish between facts and opinions and identify how they are used in a text.
• Explain how an author uses reasons and evidence to support particular points in a text.
Multiple Texts

Make connections between two or more texts in the same medium and format.
• Integrate information from two texts on the same topic to provide more knowledge about the topic.
• Compare and contrast the treatment of similar topics and patterns of events in stories.

Text complexity levels: Basic, Straightforward

BASIC texts will have simple purposes and use primarily short and basic sentences. Vocabulary will mostly be familiar and contemporary, with some polysyllabic words and domain-specific language. Most objects and events will be concrete. Ideas and concepts will be few, clear, and often repeated.

Informational texts will have clear focus on one subject and use a simple structure with basic transitions.

Literary texts will be primarily literal, with plots in chronological order and simple, usually external, conflicts. These texts will have a single narrator and generally flat characters. These texts may include a few instances of simple non-literal language and literary devices such as simple similes and metaphors.

STRAIGHTFORWARD texts will have clear purpose and organization with explicit transitions and include a variety of sentence structures. Vocabulary may include some general academic or uncommon words and phrases. These texts may require inferential reading and contain some abstract concepts and diverse perspectives.

Informational texts will exhibit features of the field of study (e.g. science), including some domain-specific language and assumption of broad content knowledge.

Literary texts will have a single narrator and conflicts that are straightforward and usually external. Plots may deviate from chronological order with simple, heavily signaled flash-backs or flash-forwards. These texts may include some non-literal language and simple literary devices. Major and minor characters may show some depth and the text will offer basic insights into characters, people, situations, and events (e.g., simple motives).

Grade 5

4 passages: 2 Literary + 2 Informational

Average Word Count: 375, CCSS FK Range: 4.51-7.73

Key Ideas and Details
Close Reading

Locate and interpret details, paraphrase information and ideas accurately, and draw logical conclusions based on textual evidence.
• Use accurately quoted or paraphrased evidence from basic, straightforward, and somewhat challenging texts to draw inferences.
• Use accurately quoted or paraphrased details from basic, straightforward, and somewhat challenging texts to support conclusions.
Central Ideas, Themes & Summaries

Identify and infer whole-text-level central (main) ideas and themes, identify and infer central ideas of defined parts of texts, and summarize information and ideas accurately.
• Explain how the main idea of a basic, straightforward, or somewhat challenging text is supported by key details.
• Determine a theme of a story from details in the text, including how the characters in a story respond to challenges and/or reflect upon a topic.
Relationships

Identify and infer sequences, comparative relationships (comparisons and contrasts), and cause-effect relationships in texts.
• Explain the relationships or interactions between two or more individuals, events, ideas, or concepts in a historical, scientific, or technical text based on specific information in the text.
• Compare and contrast two or more characters, settings, or events in a story, drawing on specific details in the text.

Craft and Structure
Word Meanings & Word Choice

Determine the meaning of words and phrases and analyze word choice rhetorically.
• Determine the meaning of general academic and domain-specific words and phrases in a text relevant to a grade 5 topic or subject area.
• Determine the meaning of words and phrases as they are used in a text, including figurative language such as metaphors and similes.
Text Structure

Analyze text structure rhetorically at the whole-text and partial-text levels.
• Demonstrate an understanding of different text structures, such as chronology, comparison, cause/effect, and problem/solution.
• Explain the purpose and function of parts of a story in relation to the overall theme, message, or outcome of the story.
Purpose & Point of View

Understand the purpose or purposes for which texts are written, understand point of view, evaluate the choice of narration in literary texts, and analyze bias, stereotypes, and persuasive techniques in texts.
• Recognize how point of view influences the information provided about a topic.
• Describe how a narrator’s point of view influences how events are described.

Grade 5 (continued)

Integration of Knowledge and Ideas
Arguments

Analyze arguments for their claims, counterclaims, reasons, evidence, and soundness of reasoning.
• Distinguish between facts and opinions and identify how they are used in a text.
• Explain how an author uses reasons and evidence to support particular points in a text, identifying which reasons and evidence support which points.
Multiple Texts

Make connections between two or more texts in the same medium and format.
• Synthesize information from two or more texts on the same topic to offer new ideas about that topic.
• Compare and contrast stories in the same genre on their approaches to similar themes and topics.

Text complexity levels: Basic, Straightforward, Somewhat Challenging

BASIC texts will have simple purposes and use primarily short and basic sentences. Vocabulary will mostly be familiar and contemporary, with some polysyllabic words and domain-specific language. Most objects and events will be concrete. Ideas and concepts will be few, clear, and often repeated.

Informational texts will have clear focus on one subject and use a simple structure with basic transitions.

Literary texts will be primarily literal, with plots in chronological order and simple, usually external, conflicts. These texts will have a single narrator and generally flat characters. These texts may include a few instances of simple non-literal language and literary devices such as simple similes and metaphors.

STRAIGHTFORWARD texts will have clear purpose and organization with explicit transitions and include a variety of sentence structures. Vocabulary may include some general academic or uncommon words and phrases. These texts may require inferential reading and contain some abstract concepts and diverse perspectives.

Informational texts will exhibit features of the field of study (e.g. science), including some domain-specific language and assumption of broad content knowledge.

Literary texts will have a single narrator and conflicts that are straightforward and usually external. Plots may deviate from chronological order with simple, heavily signaled flash-backs or flash-forwards. These texts may include some non-literal language and simple literary devices. Major and minor characters may show some depth and the text will offer basic insights into characters, people, situations, and events (e.g., simple motives).

SOMEWHAT CHALLENGING texts will have clear purpose and organization, with variety in sentence styles and transitions. Vocabulary may include some general academic or uncommon words and phrases. These texts will require inferential reading and contain abstract concepts as well as experiences and perspectives that may be different from those of the reader.

Informational texts will exhibit features of the general discipline (e.g. natural science), including domain-specific language and assumption of everyday and content knowledge appropriate to the grade assessed.

Literary texts may have multiple narrators and non-linear plots, with shifts in point of view or time clearly signaled. Characters will be dynamic and experience uncomplicated internal or external conflicts. These texts may include non-literal language and literary devices such as symbolism or irony. Themes and subject matter may be challenging, offering insights into people, situations and events.

Grade 6

3 passages: 1 Literary + 2 Informational

Average Word Count: 500, CCSS FK Range: 6.51-10.34

Key Ideas and Details
Close Reading

Locate and interpret details, paraphrase information and ideas accurately, and draw logical conclusions based on textual evidence.
• Use textual evidence from a basic, straightforward, or somewhat challenging text to support analysis of what the text says explicitly as well as inferences drawn from the text.
Central Ideas, Themes & Summaries

Identify and infer whole-text-level central (main) ideas and themes, identify and infer central ideas of defined parts of texts, and summarize information and ideas accurately.
• Determine a central idea of a basic, straightforward, or somewhat challenging text and identify how this idea is communicated through strategically selected evidence.
• Determine a central theme of a basic, straightforward, or somewhat challenging text and identify how this theme is communicated through strategically selected details.
• Summarize a basic, straightforward, or somewhat challenging text based only on ideas from the text rather than from prior knowledge or personal opinion.
Relationships

Identify and infer sequences, comparative relationships (comparisons and contrasts), and cause-effect relationships in texts.
• Analyze how comparison, cause and effect, and sequence are used to introduce, illustrate, or elaborate upon a key event, idea, or individual.
• Describe how a story’s plot unfolds in a series of events.
• Describe how characters respond or change as a plot moves toward a resolution.

Craft and Structure
Word Meanings & Word Choice

Determine the meaning of words and phrases and analyze word choice rhetorically.
• Determine figurative and connotative meaning.
• Analyze how word choice influences meaning and tone.
Text Structure

Analyze text structure rhetorically at the whole-text and partial-text levels.
• Analyze how a particular sentence or paragraph fits into the overall structure of a text and contributes to the development of ideas.
• Analyze how a particular story event or scene fits into the overall development of a theme, setting, or plot.
Purpose & Point of View

Assess how point of view or purpose shapes the content and style of a text. (CCSS R.6)
• Determine how an author’s point of view or purpose is conveyed in a text.
• Explain how an author develops the point of view of a narrator in a text.

Integration of Knowledge and Ideas
Arguments

Analyze arguments for their claims, counterclaims, reasons, evidence, and soundness of reasoning.
• Distinguish between facts and opinions and identify how they are used in a text.
• Trace and evaluate the argument and specific claims in a text, distinguishing claims that are supported by reasons and evidence from claims that are not.

Grade 6 (continued)

Multiple Texts

Make connections between two or more texts in the same medium and format.
• Compare and contrast one author’s presentation of information with that of another.
• Compare and contrast texts in different forms or genres in terms of their approaches to similar themes and topics.

Text complexity levels: Straightforward, Somewhat Challenging

STRAIGHTFORWARD texts will have clear purpose and organization with explicit transitions and include a variety of sentence structures. Vocabulary may include some general academic or uncommon words and phrases. These texts may require inferential reading and contain some abstract concepts and diverse perspectives.

Informational texts will exhibit features of the field of study (e.g. science), including some domain-specific language and assumption of broad content knowledge.

Literary texts will have a single narrator and conflicts that are straightforward and usually external. Plots may deviate from chronological order with simple, heavily signaled flash-backs or flash-forwards. These texts may include some non-literal language and simple literary devices. Major and minor characters may show some depth and the text will offer basic insights into characters, people, situations, and events (e.g., simple motives).

SOMEWHAT CHALLENGING texts will have clear purpose and organization, with variety in sentence styles and transitions. Vocabulary may include some general academic or uncommon words and phrases. These texts will require inferential reading and contain abstract concepts as well as experiences and perspectives that may be different from those of the reader.

Informational texts will exhibit features of the general discipline (e.g. natural science), including domain-specific language and assumption of everyday and content knowledge appropriate to the grade assessed.

Literary texts may have multiple narrators and non-linear plots, with shifts in point of view or time clearly signaled. Characters will be dynamic and experience uncomplicated internal or external conflicts. These texts may include non-literal language and literary devices such as symbolism or irony. Themes and subject matter may be challenging, offering insights into people, situations and events.

Grade 7

3 passages: 1 Literary + 2 Informational

Average Word Count: 500, CCSS FK Range: 6.51-10.34

Key Ideas and Details
Close Reading

Locate and interpret details, paraphrase information and ideas accurately, and draw logical conclusions based on textual evidence.
• Use textual evidence from straightforward and somewhat challenging texts to support analysis of what the text says explicitly as well as inferences drawn from the text.
Central Ideas, Themes & Summaries

Identify and infer whole-text-level central (main) ideas and themes, identify and infer central ideas of defined parts of texts, and summarize information and ideas accurately.
• Determine a central idea in a straightforward or somewhat challenging text and analyze its development over the course of the text.
• Determine a central theme in a straightforward or somewhat challenging text and analyze its development over the course of the text.
• Provide an objective summary of a straightforward or somewhat challenging text.
Relationships

Identify and infer sequences, comparative relationships (comparisons and contrasts), and cause-effect relationships in texts.
• Analyze the interactions between individuals, events, and ideas in a text.

Grade 7 (continued)

Craft and Structure
Word Meanings & Word Choice

Determine the meaning of words and phrases and analyze word choice rhetorically.
• Determine the meaning of words and phrases as they are used in a text, including figurative, connotative, and technical meanings.
• Analyze the impact of a specific word choice on meaning and tone.
Text Structure

Analyze text structure rhetorically at the whole-text and partial-text levels.
• Analyze the structure an author uses to organize a text, including how the major sections contribute to the whole and to the development of the ideas.
Purpose & Point of View

Understand the purpose or purposes for which texts are written, understand point of view, evaluate the choice of narration in literary texts, and analyze bias, stereotypes, and persuasive techniques in texts.
• Determine an author’s point of view and how the author distinguishes his or her position from that of others.
• Determine an author’s purpose.
• Analyze how an author develops and contrasts the points of view of different characters or narrators in a text.

Integration of Knowledge and Ideas
Arguments

Analyze arguments for their claims, counterclaims, reasons, evidence, and soundness of reasoning.
• Trace and evaluate the argument and specific claims in a text.
• Assess whether evidence is relevant and sufficient to support the claims.
Multiple Texts

Make connections between two or more texts in the same medium and format.
• Analyze how two authors writing about the same topic hone their presentations of key information by emphasizing different evidence or advancing different interpretations of facts.
• Compare and contrast two fictional portrayals of a time, place, or character.

Text complexity levels: Straightforward, Somewhat Challenging

STRAIGHTFORWARD texts will have clear purpose and organization with explicit transitions and include a variety of sentence structures. Vocabulary may include some general academic or uncommon words and phrases. These texts may require inferential reading and contain some abstract concepts and diverse perspectives.

Informational texts will exhibit features of the field of study (e.g. science), including some domain-specific language and assumption of broad content knowledge.

Literary texts will have a single narrator and conflicts that are straightforward and usually external. Plots may deviate from chronological order with simple, heavily signaled flash-backs or flash-forwards. These texts may include some non-literal language and simple literary devices. Major and minor characters may show some depth and the text will offer basic insights into characters, people, situations, and events (e.g., simple motives).

SOMEWHAT CHALLENGING texts will have clear purpose and organization, with variety in sentence styles and transitions. Vocabulary may include some general academic or uncommon words and phrases. These texts will require inferential reading and contain abstract concepts as well as experiences and perspectives that may be different from those of the reader.

Informational texts will exhibit features of the general discipline (e.g. natural science), including domain-specific language and assumption of everyday and content knowledge appropriate to the grade assessed.

Literary texts may have multiple narrators and non-linear plots, with shifts in point of view or time clearly signaled. Characters will be dynamic and experience uncomplicated internal or external conflicts. These texts may include non-literal language and literary devices such as symbolism or irony. Themes and subject matter may be challenging, offering insights into people, situations and events.

Grade 8

3 passages: 1 Literary + 2 Informational

Average Word Count: 500, CCSS FK Range: 6.51-10.34

Key Ideas and Details
Close Reading

Locate and interpret details, paraphrase information and ideas accurately, and draw logical conclusions based on textual evidence.
• Cite the textual evidence from somewhat challenging and more challenging texts that most strongly supports an analysis of what the text says explicitly as well as inferences drawn from the text.
Central Ideas, Themes & Summaries

Identify and infer whole-text-level central (main) ideas and themes, identify and infer central ideas of defined parts of texts, and summarize information and ideas accurately.
• Determine a central idea of a somewhat challenging or more challenging text and analyze its development over the course of the text, including its relationship to supporting ideas.
• Determine a theme of a somewhat challenging or more challenging text and analyze its development over the course of the text, including its relationship to the characters, setting, and plot.
• Provide an objective summary of a somewhat challenging or more challenging text.
Relationships

Identify and infer sequences, comparative relationships (comparisons and contrasts), and cause-effect relationships in texts.
• Analyze how a text makes connections among and distinctions between individuals, ideas, or events.
• Analyze how particular lines of dialogue or incidents in a story cause other story events, reveal traits of a character, or influence a decision.

Craft and Structure
Word Meanings & Word Choice

Determine the meaning of words and phrases and analyze word choice rhetorically.
• Determine the meaning of words and phrases as they are used in a text, including figurative, connotative, and technical meanings.
• Analyze the impact of specific word choices on meaning and tone, including analogies or allusions to other texts.
Text Structure

Analyze text structure rhetorically at the whole-text and partial-text levels.
• Analyze the structure of a specific paragraph in a text, including the role of particular sentences in developing and clarifying a key concept.
• Compare and contrast the structure of two texts and analyze how the differing structure of each text contributes to its meaning and style.
Purpose & Point of View

Understand the purpose or purposes for which texts are written, understand point of view, evaluate the choice of narration in literary texts, and analyze bias, stereotypes, and persuasive techniques in texts.
• Determine an author’s point of view or purpose in a text.
• Analyze how the author acknowledges and responds to conflicting evidence or viewpoints.
• Analyze how differences in the points of view of the characters create such effects as suspense or humor.

Grade 8 (continued)

Integration of Knowledge and Ideas
Arguments

Analyze arguments for their claims, counterclaims, reasons, evidence, and soundness of reasoning.
• Evaluate the argument and specific claims in a text.
• Assess whether evidence to support an argument or claim is relevant and sufficient.
Multiple Texts

Make connections between two or more texts in the same medium and format.
• Analyze how two or more texts provide contrasting information and identify how the new information influences each text.
• Analyze the ways in which two or more texts provide archetypal characters and how the presentations of these archetypes may differ.

Text complexity levels: Straightforward, Somewhat Challenging, More Challenging

STRAIGHTFORWARD texts will have clear purpose and organization with explicit transitions and include a variety of sentence structures. Vocabulary may include some general academic or uncommon words and phrases. These texts may require inferential reading and contain some abstract concepts and diverse perspectives.

Informational texts will exhibit features of the field of study (e.g. science), including some domain-specific language and assumption of broad content knowledge.

Literary texts will have a single narrator and conflicts that are straightforward and usually external. Plots may deviate from chronological order with simple, heavily signaled flash-backs or flash-forwards. These texts may include some non-literal language and simple literary devices. Major and minor characters may show some depth and the text will offer basic insights into characters, people, situations, and events (e.g., simple motives).

SOMEWHAT CHALLENGING texts will have clear purpose and organization, with variety in sentence styles and transitions. Vocabulary may include some general academic or uncommon words and phrases. These texts will require inferential reading and contain abstract concepts as well as experiences and perspectives that may be different from those of the reader.

Informational texts will exhibit features of the general discipline (e.g. natural science), including domain-specific language and assumption of everyday and content knowledge appropriate to the grade assessed.

Literary texts may have multiple narrators and non-linear plots, with shifts in point of view or time clearly signaled. Characters will be dynamic and experience uncomplicated internal or external conflicts. These texts may include non-literal language and literary devices such as symbolism or irony. Themes and subject matter may be challenging, offering insights into people, situations and events.

MORE CHALLENGING texts may have multi-faceted purpose and organization with varied and often complicated sentence structures. Vocabulary will include general academic and uncommon words and phrases. These texts may depict several abstract ideas and concepts and require students to read at interpretive and evaluative levels.

Informational texts will exhibit features of the general discipline (e.g. natural science), including domain-specific language and assumption of discipline-specific content knowledge.

Literary texts may have multiple narrators and well-rounded characters who may experience subtle internal or external conflicts. Plots may be non-linear or parallel and may lack resolution. These texts include non-literal language and literary devices such as symbolism or irony. Themes and subject matter will be challenging, offering deep insights into people, situations and events.

Early High School (Grades 9-10)

3 passages: 1 Literary + 2 Informational

Average Word Count: 500, CCSS FK Range: 8.32-12.12

Key Ideas and Details
Close Reading

Locate and interpret details, paraphrase information and ideas accurately, and draw logical conclusions based on textual evidence.
• Cite strong and thorough textual evidence from somewhat challenging, more challenging, or complex texts to support analysis of what the text says explicitly as well as inferences drawn from the text.
Central Ideas, Themes & Summaries

Identify and infer whole-text-level central (main) ideas and themes, identify and infer central ideas of defined parts of texts, and summarize information and ideas accurately.
• Determine a central idea of a somewhat challenging, more challenging, or complex text and analyze its development over the course of the text, including how it emerges and is shaped and refined by specific details.
• Determine a theme of a somewhat challenging, more challenging, or complex text and analyze its development over the course of the text, including how it emerges and is shaped and refined by specific details.
• Provide an objective summary of a somewhat challenging, more challenging, or complex text.
Relationships

Identify and infer sequences, comparative relationships (comparisons and contrasts), and cause-effect relationships in texts.
• Analyze how the author unfolds an analysis or series of ideas or events, including the order in which the points are made, how they are introduced and developed, and the connections that are drawn between them.
• Analyze how complex characters develop over the course of a text, interact with other characters, and advance the plot or develop the theme.

Craft and StructureWord Meanings & Word Choice

Determine the meaning of words and phrases and to analyze word choice rhetorically.• Determine the meaning of words and phrases as they are used in a text, including figurative, connotative, and technical

meanings• Analyze the cumulative impact of specific word choice on meaning and tone• Analyze how language evokes a sense of time and place • Analyze how language sets a formal or informal toneText Structure

Analyze text structure rhetorically at the whole-text and partial-text levels.• Analyze how an author’s ideas or claims are developed and refined by particular sentences or paragraphs• Analyze how an author’s choices concerning how to order events and manipulate time create such effects as mystery,

tension, or surprisePurpose & Point of View

Understand the purpose or purposes for which texts are written, understand point of view, evaluate the choice of narration in literary texts, and analyze bias, stereotypes, and persuasive techniques in texts.• Determine an author’s point of view and analyze how an author’s use of language advances that point of view• Determine an author’s purpose and analyze how an author’s use of language advances that point of view• Analyze how a cultural experience is reflected in a work of literature


Integration of Knowledge and Ideas

Arguments: Analyze arguments for their claims, counterclaims, reasons, evidence, and soundness of reasoning.
• Evaluate the argument and specific claims in a text, assessing whether the evidence is relevant and sufficient
• Identify false statements and fallacious reasoning

Multiple Texts: Make connections between two or more texts in the same medium and format.
• Compare the themes and concepts of two texts of historical and/or literary significance

Somewhat Challenging | More Challenging | Complex

SOMEWHAT CHALLENGING texts will have clear purpose and organization, with variety in sentence styles and transitions. Vocabulary may include some general academic or uncommon words and phrases. These texts will require inferential reading and contain abstract concepts as well as experiences and perspectives that may be different from those of the reader.

Informational texts will exhibit features of the general discipline (e.g. natural science), including domain-specific language and assumption of everyday and content knowledge appropriate to the grade assessed.

Literary texts may have multiple narrators and non-linear plots, with shifts in point of view or time clearly signaled. Characters will be dynamic and experience uncomplicated internal or external conflicts. These texts may include non-literal language and literary devices such as symbolism or irony. Themes and subject matter may be challenging, offering insights into people, situations and events.

MORE CHALLENGING texts may have multi-faceted purpose and organization with varied and often complicated sentence structures. Vocabulary will include general academic and uncommon words and phrases. These texts may depict several abstract ideas and concepts and require students to read at interpretive and evaluative levels.

Informational texts will exhibit features of the general discipline (e.g. natural science), including domain-specific language and assumption of discipline-specific content knowledge.

Literary texts may have multiple narrators and well-rounded characters who may experience subtle internal or external conflicts. Plots may be non-linear or parallel and may lack resolution. These texts include non-literal language and literary devices such as symbolism or irony. Themes and subject matter will be challenging, offering deep insights into people, situations and events.

COMPLEX texts may have complex structure and purpose that might not be evident upon first reading. Vocabulary and syntax will be sophisticated and may include words or structures uncommon in contemporary texts. Numerous abstract ideas and concepts will require students to read at interpretive and evaluative levels. In these texts, students may encounter experiences, values, and perspectives different from their own.

Informational texts will have high concept density and consistently use domain-specific and general academic vocabulary.

Literary texts may have multiple and/or unreliable narrators and well-rounded characters with complex conflicts that lack easy resolution. These texts may use unconventional language structures (e.g. stream of consciousness) and challenging literary devices (e.g. extended metaphors, satire, parody). Themes and subject matter may be mature, offering profound insights or philosophical reflection.


ACT Aspire Writing – Reflective Narrative Rubric Grade 3

Reflective Narrative | Development | Organization | Language Use

Score: 5 The response engages with the task, and presents a capable reflective narrative. The narrative conveys the significance of the event through thoughtful reflection on the experience and on the meaning of the experience. There is purposeful movement between specific and generalized ideas.

The narrative is capably developed through purposeful conveyance of action, sensory details, and/or character. Reflection on experience and meaning is supported through apt description and/or explanation. Details enhance the story and help to convey its significance.

The response exhibits a purposeful organizational structure, with some logical progression within the story. Transitions within the response clarify the relationships among elements of the reflective narrative.

The response demonstrates the ability to capably convey meaning with clarity. Word choice is usually precise. Sentence structures are clear and often varied. Voice and tone are appropriate for the narrative purpose and are maintained throughout most of the response. While errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate capable skill in writing a reflective narrative.

Score: 4 The response is appropriate to the task, and presents an adequate reflective narrative. The narrative demonstrates recognition of the significance of the event through reflection on the experience and/or on the meaning of the experience. Connections between specific and generalized ideas are mostly clear.

The narrative is adequately developed through conveyance of action, sensory details, and/or character. Reflection on experience and/or meaning is mostly supported through description and explanation. Details may enhance the story and help to convey its significance.

The response exhibits a clear organizational structure, with a discernible logic to the story. Transitions within the response clarify relationships among the elements of the reflective narrative.

The response demonstrates the ability to clearly convey meaning. Word choice is sometimes precise. Sentence structures are occasionally varied and usually clear. Voice and tone are appropriate for the narrative purpose, but may be inconsistently maintained. While errors in grammar, usage, and mechanics are present, they rarely impede understanding.

Responses at this scorepoint demonstrate adequate skill in writing a reflective narrative.

Score: 3 The response demonstrates a limited understanding of the task, and presents a somewhat appropriate reflective narrative. Reflection on the experience or on the meaning of the experience is limited or only somewhat relevant. Specific and generalized ideas are only somewhat connected.

The narrative is somewhat developed. There is some conveyance of action, sensory details, and/or character, but it may be limited or only somewhat relevant. Reflection on the experience and/or meaning is somewhat supported through description and explanation.

Organization is somewhat appropriate to the task, but may be simplistic or may digress at times. Transitions within the response sometimes clarify relationships among the elements of the reflective narrative.

The response demonstrates some ability to convey meaning. Word choice is general and occasionally imprecise. Sentence structures show little variety and are inconsistently clear. Voice and tone are somewhat appropriate for the narrative purpose but are inconsistently maintained. Distracting errors in grammar, usage, and mechanics are present, and they sometimes impede understanding.

Responses at this scorepoint demonstrate some developing skill in writing a reflective narrative.


Score: 2 The response demonstrates a rudimentary understanding of the task, with weak or inconsistent skill in generating a reflective narrative. Reflection on the experience or on the meaning of the experience is unclear or incomplete, or may be irrelevant. If present, connections between specific and generalized ideas are weak or inconsistent.

Development is weak. Elements of the story are reported rather than described. Reflection on the experience and/or meaning through description or explanation is weak, inconsistent, or not clearly relevant.

Organization is rudimentary. The logic of the story may be unclear. Transitions within the response are often misleading or poorly formed.

The response demonstrates a weak ability to convey meaning. Word choice is rudimentary and vague. Sentence structures are often unclear. Voice and tone may not be appropriate for the narrative purpose. Distracting errors in grammar, usage, and mechanics are present, and they impede understanding.

Responses at this scorepoint demonstrate weak or inconsistent skill in writing a reflective narrative.

Score: 1 The response demonstrates little or no understanding of the task, with virtually no narrative, and/or virtually no reflection on the experience or its meaning.

The response is virtually undeveloped, with little or no action, sensory detail, or character, and little or no reflection.

The response shows virtually no evidence of organization. Transitional devices may be present, but they fail to relate elements of the reflective narrative.

The response demonstrates little or no ability to convey meaning. Word choice is imprecise, making ideas difficult to comprehend. Sentence structures are mostly unclear. Voice and tone are not appropriate for the narrative purpose. Errors in grammar, usage, and mechanics are pervasive and significantly impede understanding.

Responses at this scorepoint demonstrate little or no skill in writing a reflective narrative.

unscorable The response is blank, voided, off-topic, illegible, or not written in English.
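As the rubric shows, a scorable grade 3 response receives one score from 1 to 5 in each of the four domains, while an unscorable response receives a condition designation instead. The sketch below illustrates one way such a record could be represented and validated; the class, field names, and condition strings are hypothetical conveniences for illustration, not ACT Aspire’s operational scoring system.

from dataclasses import dataclass
from typing import Dict, Optional

DOMAINS = ("Reflective Narrative", "Development", "Organization", "Language Use")
UNSCORABLE_CONDITIONS = {"blank", "voided", "off-topic", "illegible", "not in English"}

@dataclass
class WritingScoreRecord:
    # One 1-5 rubric score per domain, or None when the response is unscorable.
    domain_scores: Optional[Dict[str, int]]
    condition: Optional[str] = None  # set only for unscorable responses

def record_scores(raw=None, condition=None):
    if condition is not None:
        if condition not in UNSCORABLE_CONDITIONS:
            raise ValueError(f"unknown condition: {condition}")
        return WritingScoreRecord(domain_scores=None, condition=condition)
    if set(raw) != set(DOMAINS):
        raise ValueError("expected exactly one score per rubric domain")
    if not all(1 <= score <= 5 for score in raw.values()):
        raise ValueError("grade 3 scorepoints run from 1 to 5")
    return WritingScoreRecord(domain_scores=dict(raw))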


ACT Aspire Writing – Analytical Expository Rubric Grade 4

Analysis | Development | Organization | Language Use

Score: 5 The response engages with the task, and presents a thoughtful analysis that discusses implications and complications of the subject. There is purposeful movement between specific details and generalized ideas.

Ideas are capably explained, with purposeful use of supporting reasons and/or detailed examples. The writer’s claims and specific support are sometimes integrated.

The response exhibits a purposeful organizational strategy. A logical sequencing of ideas contributes to the effectiveness of the writer’s analysis. Transitions between and within paragraphs consistently clarify the relationships among ideas.

The response demonstrates the ability to capably convey meaning with clarity. Word choice is usually precise. Sentence structures are clear and often varied. Voice and tone are appropriate for the analytical purpose and are maintained throughout most of the response. While errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate capable skill in writing an analytical essay.

Score: 4 The response is appropriate to the task, and presents an analysis that recognizes implications and complications of the subject. There is clear movement between specific details and generalized ideas.

Ideas are adequately explained, with satisfactory use of supporting reasons and/or examples.

The response exhibits a clear but simple organizational structure. Ideas are logically grouped. Transitions between and within paragraphs clarify the relationships among ideas.

The response demonstrates the ability to clearly convey meaning. Word choice is sometimes precise. Sentence structures are occasionally varied and usually clear. Voice and tone are appropriate for the analytical purpose, but may be inconsistently maintained. While errors in grammar, usage, and mechanics are present, they rarely impede understanding.

Responses at this scorepoint demonstrate adequate skill in writing an analytical essay.

Score: 3 The response is somewhat appropriate to the task, with an analysis that is oversimplified or imprecise. Implications or complications are only somewhat clear or relevant. Specific details and generalized ideas are somewhat connected.

Explanations of ideas are limited, but include some use of supporting reasons and/or relevant examples.

The response exhibits some evidence of organizational structure. Some ideas are logically grouped. Transitions between and within paragraphs sometimes clarify the relationships among ideas.

The response demonstrates some developing ability to convey meaning. Word choice is general and occasionally imprecise. Sentence structures show little variety and are sometimes unclear. Voice and tone are somewhat appropriate for the analytical purpose but are inconsistently maintained. Distracting errors in grammar, usage, and mechanics are present, and they sometimes impede understanding.

Responses at this scorepoint demonstrate some developing skill in writing an analytical essay.


Score: 2 The response demonstrates a rudimentary understanding of the task, with weak or inconsistent skill in presenting an analysis. Implications or complications are not clearly relevant. Any connections between specific details and generalized ideas are unclear or incomplete.

Explanations of ideas are unclear or incomplete, with little use of supporting reasons or examples.

The response exhibits only a little evidence of organizational structure. Logical grouping of ideas is inconsistent or unclear. Transitions between and within paragraphs are often missing, misleading, or poorly formed.

The response demonstrates a weak ability to convey meaning. Word choice is rudimentary and frequently imprecise. Sentence structures are often unclear. Voice and tone may not be appropriate for the analytical purpose. Distracting errors in grammar, usage, and mechanics are present, and they impede understanding.

Responses at this scorepoint demonstrate weak or inconsistent skill in writing an analytical essay.

Score: 1 The response demonstrates little or no understanding of the task. The response lacks connections between specific details and generalized ideas.

Ideas lack explanation, with virtually no use of supporting reasons or relevant examples.

The response exhibits no evidence of organizational structure. Ideas are not logically grouped. Transitional devices may be present, but they fail to relate ideas.

The response demonstrates little or no ability to convey meaning. Word choice is imprecise and difficult to comprehend. Voice and tone are not appropriate for the analytical purpose. Sentence structures are mostly unclear. Errors in grammar, usage, and mechanics are pervasive and significantly impede understanding.

Responses at this scorepoint demonstrate little or no skill in writing an analytical essay.

unscorable The response is blank, voided, off-topic, illegible, or not written in English.


ACT Aspire Writing – Persuasive Argument Rubric Grade 5

Argument | Development | Organization | Language Use

Score: 5 The response engages with the task, and presents a thoughtful argument driven by apt reasons. The response addresses implications, complications, and/or counterarguments. There is purposeful movement between specific and generalized ideas.

Ideas are capably explained and supported, with purposeful use of reasoning and/or detailed examples. The writer’s claims and specific support are sometimes integrated.

The response exhibits a purposeful organizational strategy. A logical sequencing of ideas contributes to the effectiveness of the writer’s argument. Transitions between and within paragraphs clarify the relationships among ideas.

The response demonstrates the ability to capably convey meaning. Word choice is usually precise. Sentence structures are clear and often varied. Voice and tone are appropriate for the persuasive purpose and are maintained throughout most of the response. While errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate capable skill in writing a persuasive argumentative essay.

Score: 4 The response is appropriate to the task, and presents a clear argument, with satisfactory reasons for the position. The response demonstrates recognition of implications, complications, and/or counterarguments. There is some movement between specific and generalized ideas.

Ideas are adequately explained and supported, with satisfactory use of reasoning and/or detailed examples. The writer’s claims and specific support may be integrated.

The response exhibits a clear but simple organizational structure. Ideas are logically grouped. Transitions between and within paragraphs clarify the relationships among ideas.

The response demonstrates the ability to clearly convey meaning. Word choice is sometimes precise. Sentence structures are occasionally varied and usually clear. Voice and tone are appropriate for the persuasive purpose, but may be inconsistently maintained. While errors in grammar, usage, and mechanics are present, they rarely impede understanding.

Responses at this scorepoint demonstrate adequate skill in writing a persuasive argumentative essay.

Score: 3 The response is somewhat appropriate to the task, and presents a somewhat clear argument with a vague or oversimplified position. Reasons for the position are somewhat appropriate and/or somewhat relevant. Implications, complications, and counterarguments are oversimplified or not clearly relevant to the purpose. Specific and generalized ideas may be only somewhat connected.

Explanation and support of ideas are limited, but include some use of reasoning and/or examples.

The response exhibits some evidence of organizational structure. Some ideas are logically grouped. Transitions between and within paragraphs sometimes clarify the relationships among ideas.

The response demonstrates some developing ability to convey meaning. Word choice is general and occasionally imprecise. Sentence structures show little variety and are sometimes unclear. Voice and tone are somewhat appropriate for the persuasive purpose but are inconsistently maintained. Distracting errors in grammar, usage, and mechanics are present, and they sometimes impede understanding.

Responses at this scorepoint demonstrate some developing skill in writing a persuasive argumentative essay.


Score: 2 The response demonstrates a rudimentary understanding of the task. The position may be unclear. Reasons for the position are unclear, incomplete, or not clearly relevant. If present, implications, complications, or counterarguments are weak. Any connections between specific and generalized ideas are unclear, incomplete, or irrelevant.

Explanation and support of ideas are unclear or incomplete, with little use of reasoning and/or examples.

The response exhibits only a little evidence of organizational structure. Few ideas are logically grouped. Transitions between and within paragraphs are often missing, poorly formed, or misleading.

The response demonstrates a weak ability to convey meaning. Word choice is rudimentary and frequently imprecise. Sentence structures are often unclear. Voice and tone may not be appropriate for the persuasive purpose. Distracting errors in grammar, usage, and mechanics are present, and they impede understanding.

Responses at this scorepoint demonstrate weak or inconsistent skill in writing a persuasive argumentative essay.

Score: 1 The response demonstrates little or no understanding of the task. If a position is taken, there are virtually no reasons for the position.

Ideas lack explanation and support, with virtually no use of reasoning or examples.

The response exhibits no evidence of organizational structure. Ideas are not logically grouped. Transitional devices may be present, but they fail to relate ideas.

The response demonstrates little or no ability to convey meaning. Word choice is imprecise and difficult to comprehend. Voice and tone are not appropriate for the persuasive purpose. Sentence structures are mostly unclear. Errors in grammar, usage, and mechanics are pervasive and significantly impede understanding.

Responses at this scorepoint demonstrate little or no skill in writing a persuasive argumentative essay.

unscorable The response is blank, voided, off-topic, illegible, or not written in English.


ACT Aspire Writing – Reflective Narrative Rubric Grade 6

Reflective Narrative | Development | Organization | Language Use

Score: 6 The response critically engages with the task, and presents an effective reflective narrative. The narrative conveys the significance of the event through insightful reflection on the experience and on the meaning of the experience. There is skillful movement between specific and generalized ideas.

The narrative is effectively developed through skillful conveyance of action, sensory details, and/or character. Reflection on experience and meaning is well supported through effective description and/or explanation. Details are integral to the story and its significance.

The response exhibits a skillful organizational strategy, with logical progression within the story. Transitions within the response consistently clarify the relationships among the elements of the reflective narrative.

The response demonstrates the ability to effectively convey meaning with clarity. Word choice is precise. Sentence structures are varied and clear. Voice and tone are appropriate for the narrative purpose and are maintained throughout the response. While a few errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate effective skill in writing a reflective narrative.

Score: 5 The response engages with the task, and presents a capable reflective narrative. The narrative conveys the significance of the event through thoughtful reflection on the experience and on the meaning of the experience. There is purposeful movement between specific and generalized ideas.

The narrative is capably developed through purposeful conveyance of action, sensory details, and/or character. Reflection on experience and meaning is supported through apt description and/or explanation. Details enhance the story and help to convey its significance.

The response exhibits a purposeful organizational structure, with some logical progression within the story. Transitions within the response clarify the relationships among elements of the reflective narrative.

The response demonstrates the ability to capably convey meaning with clarity. Word choice is usually precise. Sentence structures are clear and often varied. Voice and tone are appropriate for the narrative purpose and are maintained throughout most of the response. While errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate capable skill in writing a reflective narrative.

Score: 4 The response is appropriate to the task, and presents an adequate reflective narrative. The narrative demonstrates recognition of the significance of the event through reflection on the experience and/or on the meaning of the experience. Connections between specific and generalized ideas are mostly clear.

The narrative is adequately developed through conveyance of action, sensory details, and/or character. Reflection on experience and/or meaning is mostly supported through description and explanation. Details may enhance the story and help to convey its significance.

The response exhibits a clear organizational structure, with a discernible logic to the story. Transitions within the response clarify relationships among the elements of the reflective narrative.

The response demonstrates the ability to clearly convey meaning. Word choice is sometimes precise. Sentence structures are occasionally varied and usually clear. Voice and tone are appropriate for the narrative purpose, but may be inconsistently maintained. While errors in grammar, usage, and mechanics are present, they rarely impede understanding.

Responses at this scorepoint demonstrate adequate skill in writing a reflective narrative.


Score: 3 The response demonstrates a limited understanding of the task, and presents a somewhat appropriate reflective narrative. Reflection on the experience or on the meaning of the experience is limited or only somewhat relevant. Specific and generalized ideas are only somewhat connected.

The narrative is somewhat developed. There is some conveyance of action, sensory details, and/or character, but it may be limited or only somewhat relevant. Reflection on the experience and/or meaning is somewhat supported through description and explanation.

Organization is somewhat appropriate to the task, but may be simplistic or may digress at times. Transitions within the response sometimes clarify relationships among the elements of the reflective narrative.

The response demonstrates some ability to convey meaning. Word choice is general and occasionally imprecise. Sentence structures show little variety and are inconsistently clear. Voice and tone are somewhat appropriate for the narrative purpose but are inconsistently maintained. Distracting errors in grammar, usage, and mechanics are present, and they sometimes impede understanding.

Responses at this scorepoint demonstrate some developing skill in writing a reflective narrative.

Score: 2 The response demonstrates a rudimentary understanding of the task, with weak or inconsistent skill in generating a reflective narrative. Reflection on the experience or on the meaning of the experience is unclear or incomplete, or may be irrelevant. If present, connections between specific and generalized ideas are weak or inconsistent.

Development is weak. Elements of the story are reported rather than described. Reflection on the experience and/or meaning through description or explanation is weak, inconsistent, or not clearly relevant.

Organization is rudimentary. The logic of the story may be unclear. Transitions within the response are often misleading or poorly formed.

The response demonstrates a weak ability to convey meaning. Word choice is rudimentary and vague. Sentence structures are often unclear. Voice and tone may not be appropriate for the narrative purpose. Distracting errors in grammar, usage, and mechanics are present, and they impede understanding.

Responses at this scorepoint demonstrate weak or inconsistent skill in writing a reflective narrative.

Score: 1 The response demonstrates little or no understanding of the task, with virtually no narrative, and/or virtually no reflection on the experience or its meaning.

The response is virtually undeveloped, with little or no action, sensory detail, or character, and little or no reflection.

The response shows virtually no evidence of organization. Transitional devices may be present, but they fail to relate elements of the reflective narrative.

The response demonstrates little or no ability to convey meaning. Word choice is imprecise, making ideas difficult to comprehend. Sentence structures are mostly unclear. Voice and tone are not appropriate for the narrative purpose. Errors in grammar, usage, and mechanics are pervasive and significantly impede understanding.

Responses at this scorepoint demonstrate little or no skill in writing a reflective narrative.

unscorable The response is blank, voided, off-topic, illegible, or not written in English.


ACT Aspire Writing – Analytical Expository Rubric Grade 7

Analysis | Development | Organization | Language Use

Score: 6 The response critically engages with the task, and presents a complex analysis that addresses implications and complications of the subject. There is skillful movement between specific details and generalized ideas.

Ideas are thoroughly explained, with skillful use of supporting reasons and/or detailed examples. The writer’s claims and specific support are well integrated.

The response exhibits a skillful organizational strategy. A logical progression of ideas increases the effectiveness of the writer’s analysis. Transitions between and within paragraphs strengthen the relationships among ideas.

The response demonstrates the ability to effectively convey meaning with clarity. Word choice is precise. Sentence structures are varied and clear. Voice and tone are appropriate for the analytical purpose and are maintained throughout the response. While a few errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate effective skill in writing an analytical essay.

Score: 5 The response engages with the task, and presents a thoughtful analysis that discusses implications and complications of the subject. There is purposeful movement between specific details and generalized ideas.

Ideas are capably explained, with purposeful use of supporting reasons and/or detailed examples. The writer’s claims and specific support are sometimes integrated.

The response exhibits a purposeful organizational strategy. A logical sequencing of ideas contributes to the effectiveness of the writer’s analysis. Transitions between and within paragraphs consistently clarify the relationships among ideas.

The response demonstrates the ability to capably convey meaning with clarity. Word choice is usually precise. Sentence structures are clear and often varied. Voice and tone are appropriate for the analytical purpose and are maintained throughout most of the response. While errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate capable skill in writing an analytical essay.

Score: 4 The response is appropriate to the task, and presents an analysis that recognizes implications and complications of the subject. There is clear movement between specific details and generalized ideas.

Ideas are adequately explained, with satisfactory use of supporting reasons and/or examples.

The response exhibits a clear but simple organizational structure. Ideas are logically grouped. Transitions between and within paragraphs clarify the relationships among ideas.

The response demonstrates the ability to clearly convey meaning. Word choice is sometimes precise. Sentence structures are occasionally varied and usually clear. Voice and tone are appropriate for the analytical purpose, but may be inconsistently maintained. While errors in grammar, usage, and mechanics are present, they rarely impede understanding.

Responses at this scorepoint demonstrate adequate skill in writing an analytical essay.


Score: 3 The response is somewhat appropriate to the task, with an analysis that is oversimplified or imprecise. Implications or complications are only somewhat clear or relevant. Specific details and generalized ideas are somewhat connected.

Explanations of ideas are limited, but include some use of supporting reasons and/or relevant examples.

The response exhibits some evidence of organizational structure. Some ideas are logically grouped. Transitions between and within paragraphs sometimes clarify the relationships among ideas.

The response demonstrates some developing ability to convey meaning. Word choice is general and occasionally imprecise. Sentence structures show little variety and are sometimes unclear. Voice and tone are somewhat appropriate for the analytical purpose but are inconsistently maintained. Distracting errors in grammar, usage, and mechanics are present, and they sometimes impede understanding.

Responses at this scorepoint demonstrate some developing skill in writing an analytical essay.

Score: 2 The response demonstrates a rudimentary understanding of the task, with weak or inconsistent skill in presenting an analysis. Implications or complications are not clearly relevant. Any connections between specific details and generalized ideas are unclear or incomplete.

Explanations of ideas are unclear or incomplete, with little use of supporting reasons or examples.

The response exhibits only a little evidence of organizational structure. Logical grouping of ideas is inconsistent or unclear. Transitions between and within paragraphs are often missing, misleading, or poorly formed.

The response demonstrates a weak ability to convey meaning. Word choice is rudimentary and frequently imprecise. Sentence structures are often unclear. Voice and tone may not be appropriate for the analytical purpose. Distracting errors in grammar, usage, and mechanics are present, and they impede understanding.

Responses at this scorepoint demonstrate weak or inconsistent skill in writing an analytical essay.

Score: 1 The response demonstrates little or no understanding of the task. The response lacks connections between specific details and generalized ideas.

Ideas lack explanation, with virtually no use of supporting reasons or relevant examples.

The response exhibits no evidence of organizational structure. Ideas are not logically grouped. Transitional devices may be present, but they fail to relate ideas.

The response demonstrates little or no ability to convey meaning. Word choice is imprecise and difficult to comprehend. Voice and tone are not appropriate for the analytical purpose. Sentence structures are mostly unclear. Errors in grammar, usage, and mechanics are pervasive and significantly impede understanding.

Responses at this scorepoint demonstrate little or no skill in writing an analytical essay.

unscorable The response is blank, voided, off-topic, illegible, or not written in English.


ACT Aspire Writing – Persuasive Argument Rubric Grade 8

Argument | Development | Organization | Language Use

Score: 6 The response critically engages with the task, and presents a skillful argument driven by insightful reasons. The response critically addresses implications, complications, and/or counterarguments. There is skillful movement between specific and generalized ideas.

Ideas are effectively explained and supported, with skillful use of reasoning and/or detailed examples. The writer’s claims and specific support are well integrated.

The response exhibits a skillful organizational strategy. A logical progression of ideas increases the effectiveness of the writer’s argument. Transitions between and within paragraphs strengthen the relationships among ideas.

The response demonstrates the ability to effectively convey meaning with clarity. Word choice is precise. Sentence structures are varied and clear. Voice and tone are appropriate for the persuasive purpose and are maintained throughout the response. While a few errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate effective skill in writing a persuasive argumentative essay.

Score: 5 The response engages with the task, and presents a thoughtful argument driven by apt reasons. The response addresses implications, complications, and/or counterarguments. There is purposeful movement between specific and generalized ideas.

Ideas are capably explained and supported, with purposeful use of reasoning and/or detailed examples. The writer’s claims and specific support are sometimes integrated.

The response exhibits a purposeful organizational strategy. A logical sequencing of ideas contributes to the effectiveness of the writer’s argument. Transitions between and within paragraphs clarify the relationships among ideas.

The response demonstrates the ability to capably convey meaning. Word choice is usually precise. Sentence structures are clear and often varied. Voice and tone are appropriate for the persuasive purpose and are maintained throughout most of the response. While errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate capable skill in writing a persuasive argumentative essay.

Score: 4 The response is appropriate to the task, and presents a clear argument, with satisfactory reasons for the position. The response demonstrates recognition of implications, complications, and/or counterarguments. There is some movement between specific and generalized ideas.

Ideas are adequately explained and supported, with satisfactory use of reasoning and/or detailed examples. The writer’s claims and specific support may be integrated.

The response exhibits a clear but simple organizational structure. Ideas are logically grouped. Transitions between and within paragraphs clarify the relationships among ideas.

The response demonstrates the ability to clearly convey meaning. Word choice is sometimes precise. Sentence structures are occasionally varied and usually clear. Voice and tone are appropriate for the persuasive purpose, but may be inconsistently maintained. While errors in grammar, usage, and mechanics are present, they rarely impede understanding.

Responses at this scorepoint demonstrate adequate skill in writing a persuasive argumentative essay.


Score: 3 The response is somewhat appropriate to the task, and presents a somewhat clear argument with a vague or oversimplified position. Reasons for the position are somewhat appropriate and/or somewhat relevant. Implications, complications, and counterarguments are oversimplified or not clearly relevant to the purpose. Specific and generalized ideas may be only somewhat connected.

Explanation and support of ideas are limited, but include some use of reasoning and/or examples.

The response exhibits some evidence of organizational structure. Some ideas are logically grouped. Transitions between and within paragraphs sometimes clarify the relationships among ideas.

The response demonstrates some developing ability to convey meaning. Word choice is general and occasionally imprecise. Sentence structures show little variety and are sometimes unclear. Voice and tone are somewhat appropriate for the persuasive purpose but are inconsistently maintained. Distracting errors in grammar, usage, and mechanics are present, and they sometimes impede understanding.

Responses at this scorepoint demonstrate some developing skill in writing a persuasive argumentative essay.

Score: 2 The response demonstrates a rudimentary understanding of the task. The position may be unclear. Reasons for the position are unclear, incomplete, or not clearly relevant. If present, implications, complications, or counterarguments are weak. Any connections between specific and generalized ideas are unclear, incomplete, or irrelevant.

Explanation and support of ideas are unclear or incomplete, with little use of reasoning and/or examples.

The response exhibits only a little evidence of organizational structure. Few ideas are logically grouped. Transitions between and within paragraphs are often missing, poorly formed, or misleading.

The response demonstrates a weak ability to convey meaning. Word choice is rudimentary and frequently imprecise. Sentence structures are often unclear. Voice and tone may not be appropriate for the persuasive purpose. Distracting errors in grammar, usage, and mechanics are present, and they impede understanding.

Responses at this scorepoint demonstrate weak or inconsistent skill in writing a persuasive argumentative essay.

Score: 1 The response demonstrates little or no understanding of the task. If a position is taken, there are virtually no reasons for the position.

Ideas lack explanation and support, with virtually no use of reasoning or examples.

The response exhibits no evidence of organizational structure. Ideas are not logically grouped. Transitional devices may be present, but they fail to relate ideas.

The response demonstrates little or no ability to convey meaning. Word choice is imprecise and difficult to comprehend. Voice and tone are not appropriate for the persuasive purpose. Sentence structures are mostly unclear. Errors in grammar, usage, and mechanics are pervasive and significantly impede understanding.

Responses at this scorepoint demonstrate little or no skill in writing a persuasive argumentative essay.

unscorable The response is blank, voided, off-topic, illegible, or not written in English.


ACT Aspire Writing – Analytical Expository Rubric Early High School

Analysis | Development | Organization | Language Use

Score: 6 The response critically engages with the task, and presents a complex analysis that addresses implications and complications of the subject. There is skillful movement between specific details and generalized ideas.

Ideas are thoroughly explained, with skillful use of supporting reasons and/or detailed examples. The writer’s claims and specific support are well integrated.

The response exhibits a skillful organizational strategy. A logical progression of ideas increases the effectiveness of the writer’s analysis. Transitions between and within paragraphs strengthen the relationships among ideas.

The response demonstrates the ability to effectively convey meaning with clarity. Word choice is precise. Sentence structures are varied and clear. Voice and tone are appropriate for the analytical purpose and are maintained throughout the response. While a few errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate effective skill in writing an analytical essay.

Score: 5 The response engages with the task, and presents a thoughtful analysis that discusses implications and complications of the subject. There is purposeful movement between specific details and generalized ideas.

Ideas are capably explained, with purposeful use of supporting reasons and/or detailed examples. The writer’s claims and specific support are sometimes integrated.

The response exhibits a purposeful organizational strategy. A logical sequencing of ideas contributes to the effectiveness of the writer’s analysis. Transitions between and within paragraphs consistently clarify the relationships among ideas.

The response demonstrates the ability to capably convey meaning with clarity. Word choice is usually precise. Sentence structures are clear and often varied. Voice and tone are appropriate for the analytical purpose and are maintained throughout most of the response. While errors in grammar, usage, and mechanics may be present, they do not impede understanding.

Responses at this scorepoint demonstrate capable skill in writing an analytical essay.

Score: 4 The response is appropriate to the task, and presents an analysis that recognizes implications and complications of the subject. There is clear movement between specific details and generalized ideas.

Ideas are adequately explained, with satisfactory use of supporting reasons and/or examples.

The response exhibits a clear but simple organizational structure. Ideas are logically grouped. Transitions between and within paragraphs clarify the relationships among ideas.

The response demonstrates the ability to clearly convey meaning. Word choice is sometimes precise. Sentence structures are occasionally varied and usually clear. Voice and tone are appropriate for the analytical purpose, but may be inconsistently maintained. While errors in grammar, usage, and mechanics are present, they rarely impede understanding.

Responses at this scorepoint demonstrate adequate skill in writing an analytical essay.


Score: 3 The response is somewhat appropriate to the task, with an analysis that is oversimplified or imprecise. Implications or complications are only somewhat clear or relevant. Specific details and generalized ideas are somewhat connected.

Explanations of ideas are limited, but include some use of supporting reasons and/or relevant examples.

The response exhibits some evidence of organizational structure. Some ideas are logically grouped. Transitions between and within paragraphs sometimes clarify the relationships among ideas.

The response demonstrates some developing ability to convey meaning. Word choice is general and occasionally imprecise. Sentence structures show little variety and are sometimes unclear. Voice and tone are somewhat appropriate for the analytical purpose but are inconsistently maintained. Distracting errors in grammar, usage, and mechanics are present, and they sometimes impede understanding.

Responses at this scorepoint demonstrate some developing skill in writing an analytical essay.

Score: 2 The response demonstrates a rudimentary understanding of the task, with weak or inconsistent skill in presenting an analysis. Implications or complications are not clearly relevant. Any connections between specific details and generalized ideas are unclear or incomplete.

Explanations of ideas are unclear or incomplete, with little use of supporting reasons or examples.

The response exhibits only a little evidence of organizational structure. Logical grouping of ideas is inconsistent or unclear. Transitions between and within paragraphs are often missing, misleading, or poorly formed.

The response demonstrates a weak ability to convey meaning. Word choice is rudimentary and frequently imprecise. Sentence structures are often unclear. Voice and tone may not be appropriate for the analytical purpose. Distracting errors in grammar, usage, and mechanics are present, and they impede understanding.

Responses at this scorepoint demonstrate weak or inconsistent skill in writing an analytical essay.

Score: 1 The response demonstrates little or no understanding of the task. The response lacks connections between specific details and generalized ideas.

Ideas lack explanation, with virtually no use of supporting reasons or relevant examples.

The response exhibits no evidence of organizational structure. Ideas are not logically grouped. Transitional devices may be present, but they fail to relate ideas.

The response demonstrates little or no ability to convey meaning. Word choice is imprecise and difficult to comprehend. Voice and tone are not appropriate for the analytical purpose. Sentence structures are mostly unclear. Errors in grammar, usage, and mechanics are pervasive and significantly impede understanding.

Responses at this scorepoint demonstrate little or no skill in writing an analytical essay.

unscorable The response is blank, voided, off-topic, illegible, or not written in English.


References

Abedi, J. (2008). Linguistic modification: Part I: Language factors in the assessment of English language learners: The theory and principles underlying the linguistic modification approach. Los Angeles: Department of Education: LEP Partnership at the Davis and the Advance Research and Data Analyses Center, University of California. Retrieved from http://ncela.ed.gov/files/uploads/11/abedi_sato.pdf

Abedi, J. (2009). Assessing high school English language learners. In L. M. Pinkus (Ed.), Meaningful measurement: The role of assessments in improving high school education in the twenty-first century (pp. 143–156). Washington, DC: Alliance for Excellent Education. Retrieved from www.ewcupdate.com/userfiles/assessmentnetwork_net/file/MeaningfulMeasurement.pdf

Abedi, J., Courtney, M., & Leon, S. (2003b). Research-supported accommodation for English language learners in NAEP (CSE Technical Report No. 586). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, University of California. Retrieved from http://www.cse.ucla.edu/products/Reports/tR586.pdf

Abedi, J., Courtney, M., Leon, S., Kao, J., & Assam, T. (2006). English language learners and math achievement: A study of opportunity to learn and language accommodation (CSE Technical Report No. 702). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, University of California. Retrieved from http://files.eric.ed.gov/fulltext/ED495848.pdf

Abedi, J., Courtney, M., Mirocha, J., Leon, S., & Goldberg, J. (2005). Language accommodations for English language learners in large-scale assessments: Bilingual dictionaries and linguistic modification (CSE Report No. 666). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, University of California. Retrieved from http://www.cse.ucla.edu/products/reports/r666.pdf


Abedi, J., & Ewers, N. (2013). Accommodations for English language learners and students with disabilities: A research-based decision algorithm. Los Angeles: Smarter Balanced Assessment Consortium, University of California. Retrieved from http://www.smarterbalanced.org/wordpress/wp-content/uploads/2012/08/accomodations-for-under-represented-students.pdf

Abedi, J., & Herman, J. L. (2010). Assessing English language learners’ opportunity to learn mathematics: Issues and limitations. Teachers College Record, 112(3), 723–746.

Abedi, J., Hofstetter, C., Baker, E., & Lord, C. (2001). NAEP math performance and test accommodations: Interactions with student language background (CSE Technical Report No. 536). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing, University of California. Retrieved from http://www.cse.ucla.edu/products/reports/newtr536.pdf

Abedi, J., Hofstetter, C., & Lord, C. (2004). Assessment accommodations for English language learners: Implications for policy-based empirical research. Review of Educational Research, 74(1), 1–28.

Acosta, B., Rivera, C., & Willner, L. S. (2008). Best practices in state assessment policies for accommodating English language learners: A Delphi study. Prepared for the LEP Partnership, U.S. Department of Education. Arlington, VA: The George Washington University Center for Equity and Excellence in Education. Retrieved from http://files.eric.ed.gov/fulltext/ED539759.pdf

ACT. (2006). Ready for college and ready for work: Same or different? Iowa City, IA: ACT. Retrieved from www.act.org/research/policymakers/pdf/ReadinessBrief.pdf

ACT. (2007). National Career Readiness Certificate™: WorkKeys assessments technical bulletin. Iowa City, IA: ACT. Retrieved from okjobmatch.com/ada/customization/Kansas/documents/NCRC_Technical_Bulletin.pdf

ACT. (2013a). The ACT Explore® technical manual: 2013–2014. Iowa City, IA: ACT.

ACT. (2013b). The ACT Plan® technical manual: 2013–2014. Iowa City, IA: ACT.

ACT. (2014a). ACT Aspire® exemplar writing questions. Iowa City, IA: ACT.

ACT. (2014b). The ACT® technical manual. Iowa City, IA: ACT.

Allen, J. (2014). Development of predicted paths for ACT Aspire score reports (ACT Working Paper 2014-06). Iowa City, IA: ACT.

Allen, J., & Sconing, J. (2005). Using ACT Assessment® scores to set benchmarks for college readiness (ACT Research Report 2005-3). Iowa City, IA: ACT.

Almond, P., Winter, P., Cameto, R., Russell, M., Sato, E., Clarke-Midura, J., Torres, C., Haertel, G., Dolan, R., Beddow, P., & Lazarus, S. (2010). Technology-enabled and universally designed assessment: Considering access in measuring the achievement of students with disabilities—A foundation for research. Journal of Technology, Learning, and Assessment, 10(5), 1–52.

American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10(2), 95–106.

Arditi, A. (2016a). Making text legible: Designing for people with partial sight [Brochure]. New York, NY: Lighthouse International. Retrieved from http://li129-107.members.linode.com/accessibility/design/accessible-print-design/making-text-legible/

Arditi, A. (2016b). Effective color contrast: Designing for people with partial sight and color deficiencies [Brochure]. New York, NY: Lighthouse International. Retrieved from http://li129-107.members.linode.com/accessibility/design/accessible-print-design/effective-color-contrast/

Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York, NY: Dekker.

Beddow, P. (2009). TAMI Test Accessibility and Modification Inventory: Quantifying and improving the accessibility of tests and test items. Presented at CCSSO 2009 National Conference on Student Assessment in Los Angeles, CA.

Beddow, P. A., Elliott, S. N., & Kettler, R. J. (2013). Test accessibility: Item reviews and lessons learned from four state assessments. Education Research International, 2013(952704), 1-12. doi: 10.1155/2013/952704.

Beddow, P. A., Elliott, S. N., & Kettler, R. J. (2009). Test Accessibility and Modification Inventory (TAMI). Nashville, TN: Vanderbilt University. Retrieved from http://peabody.vanderbilt.edu/docs/pdf/PRO/tamI_accessibility_Rating_matrix.pdf

Beddow, P. A., Elliott, S. N., & Kettler, R. J. (2009). TAMI Accessibility Rating Matrix (ARM). Nashville, TN: Vanderbilt University. Retrieved from http://peabody.vanderbilt.edu/docs/pdf/PRO/tamI_accessibility_Rating_matrix.pdf

Beddow, P. A., Elliott, S. N., & Kettler, R. J. (n.d.). Accessibility reviews to improve test score validity. Operational Alternate Assessments of Science Inquiry Standards OAASIS Project. Nashville, TN: Vanderbilt University. Retrieved from http://www.cehd.umn.edu/NCEO/Presentations/Posters/6VanderbiltaERa.pdf

Betebenner, D. W., Van Iwaarden, A., Domingue, B., & Shang, Y. (2016). SGP: Student growth percentiles & percentile growth trajectories. [R package version 1.5-0.0]. Retrieved from cran.r-project.org/web/packages/SGP/index.html

Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: Wiley.

Bowles, M., & Stansfield, C. W. (2008). Standards-based assessment in the native language: A practical guide to the issues. NLA-LEP Partnership. Retrieved from www.edweek.org/media/maz-guide%20to%20native%20language%20assessment%20v15-blog.pdf

Brennan, R. L. (2001). Generalizability theory. New York, NY: Springer-Verlag.

Burton, S. J., Sudweeks, R. R., Merrill, P., & Wood, B. (1991). How to prepare better multiple-choice test items: Guidelines for university faculty. Provo, UT: Brigham Young University Testing Services and the Department of Instructional Science. Retrieved from testing.byu.edu/handbooks/betteritems.pdf

Camilli, G. C. (2006). Test fairness. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 221-256). Westport, CT: American Council on Education and Praeger.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105.

CAST. (2011). Universal design for learning guidelines version 2.0. Wakefield, MA: Author. Retrieved from www.udlcenter.org/aboutudl/udlguidelines

Castellano, K. E., & Ho, A. D. (2015). Practical differences among aggregate-level conditional status metrics: From median student growth percentiles to value-added models. Journal of Educational and Behavioral Statistics, 40(1), 35–68.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276.

Cawthon, S., Ho, E., Patel, P., Potvin, D., & Trundt, K. (2009). Multiple constructs and effects of accommodations on accommodated test scores for students with disabilities. Practical Assessment, Research & Evaluation, 14(18), 1–9. Retrieved from pareonline.net/pdf/v14n18.pdf

Cawthon, S., Leppo, R., Carr, T. G., & Kopriva, R. (2013). Towards accessible assessments: Promises and limitations of test item adaptations for students with disabilities and English language learners. Retrieved from http://iiassessment.wceruw.org/research/researchPapers/related/towards_accessible_assessments.pdf

Christensen, L., Rogers, C., & Braam, M. (2012). Concluding external validity report of student accessibility assessment system enhanced assessment grant. Research funded by the U. S. Department of Education in support of Enhanced Assessment Grant, Student Accessibility Assessment System (SAAS), 2009-2011.

Christensen, L. L., Shyyan, V., Rogers, C., & Kincaid, A. (2014). Audio support guidelines for accessible assessments: Insights from cognitive labs. Minneapolis: National Center on Educational Outcomes, U.S. Department of Education, University of Minnesota. Retrieved from http://files.eric.ed.gov/fulltext/ED561901.pdf

Colorado Department of Education. (n.d.). Colorado accommodations guide for English learners 2013-2014. Retrieved from www.cde.state.co.us/sites/default/files/Accommodations%20Guide%20for%20ELs%202013%202014%20final%20V.2.pdf

Critz, M. (2010). Fonts for iPad & iPhone. Retrieved from http://www.michaelcritz.com/2010/04/02/fonts-for-ipad-iphone/

Critz, M. (n.d.). iOS fonts: A place for happy typography. Retrieved from iosfonts.com

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.

Cronbach, L. J., Schönemann, P., & McKie, D. (1965). Alpha coefficients for stratified-parallel tests. Educational and Psychological Measurement, 25(2), 291–312.

Curran, E. (2003, November 25). Correspondence regarding ACT English test format received from Eileen Curran. Topic: Underlining in English tests. Boston, MA: National Braille Press.

de Ayala, R. J. (2009). The theory and practice of item response theory. New York, NY: Guilford Press.

Dolan, R. P., Burling, K., Harms, M., Strain-Seymour, E., Way, W., & Rose, D. H. (2013). A universal design for learning-based framework for designing accessible technology-enhanced assessments. (Research Report). Iowa City, IA: Pearson Education Measurement. Retrieved from http://www.cast.org/our-work/publications/2013/assessment-computer-based-testing-students-disabilities-english-learners-universal-design-udl-dolan.html#.V0kdK03rupo

Duncan, T., Parent, L., Chen, W., Ferrara, S., Johnson, E., Oppler, S., & Shieh, Y-Y. (2005). Study of a dual-language test booklet in eighth-grade mathematics. Applied Measurement in Education, 18(2), 129–161.

Fedorchak, G. (2012). Access by Design—Implications for equity and excellence in education. Internal whitepaper prepared on behalf of the NH Department of Education for the Smarter Balanced Assessment Consortium. (Internal Document – no public link)

Fedorchak, G., Higgins, J., Russell, M., & Thurlow, M. (2011). Increasing accessibility to assessments through computer based tools. [PowerPoint Slides]. Orlando, FL: National Conference on Student Assessment. Retrieved from: https://ccsso.confex.com/ccsso/2011/webprogram/Session2220.html

Fletcher, J., Francis, D., O’Malley, K., Copeland, K., Mehta, P., Caldwell, C., Kalinowski, S., Young, V., & Vaughn, S. (2009). Effects of a bundled accommodations package on high-stakes testing for middle school students with reading disabilities. Exceptional Children, 75(4), 447–463.

Francis, D., Rivera, M., Lesaux, N., Kieffer, M., & Rivera, H. (2006). Practical guidelines for the education of English language learners: Research-based recommendations for the use of accommodations in large-scale assessments (Book 3 of 3). Portsmouth, NH: RMC Research Corporation, Center on Instruction. Retrieved from http://centeroninstruction.org/files/Ell3-assessments.pdf

Freed, G., Gould, B., Hayward, J., Lally, K., Parkin, M., & Rothberg, M. (2015). Item writer guidelines for greater accessibility. Boston, MA: The Carl and Ruth Shapiro Family National Center for Accessible Media at WGBH (NCAM) Educational Foundation.

Geisinger, K. F. (1991). Using standard-setting data to establish cutoff scores. Educational Measurement: Issues and Practice, 10(2), 17–22.

Gilmer, J. S., & Feldt, L. S. (1983). Reliability estimation for a test with parts of unknown lengths. Psychometrika, 48(1), 99–111.

Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Guzman-Orth, D., Laitusis, C., Thurlow, M., & Christensen, L. (2014). Conceptualizing accessibility for English language proficiency assessments. Educational Testing Service (ETS) and National Center on Educational Outcomes (NCEO). Retrieved from www.ets.org/Media/Research/pdf/K12%20EL%20Accessibility%20Paper.pdf

Hafner, A. L. (2001, April). Evaluating the impact of test accommodations on test scores of LEP students and non-LEP students. Paper presented at the annual meeting of the American Educational Research Association in Seattle, WA. Retrieved from files.eric.ed.gov/fulltext/ED455299.pdf

Hanson, B. A. (1994). Extension of Lord-Wingersky algorithm to computing test score distributions for polytomous items. Unpublished research note.

Hatcher, L. (1994). A step-by-step approach to using the SAS® system for factor analysis and structural equation modeling (1st ed.). Cary, NC: SAS Institute Inc.

Higgins, J., Famularo, L., Bowman, T., & Hall, R. (2015). Research and development of audio and American Sign Language guidelines for creating accessible computer-based assessments. Dover, NH: Measured Progress. Retrieved from: https://www.measuredprogress.org/wp-content/uploads/2015/08/gaaP-Research-and-Development-of-audio-and-american-Sign-language-guidelines-for-Creating-accessible-Computer-based-assessments.pdf

Higgins, J., Fedorchak, G., & Katz, M. (2012). Assignment of accessibility tools for digitally delivered assessments: Key findings. Dover, NH: U.S. Department of Education, Measured Progress. (Internal report for the State of New Hampshire, excerpted from the final grant report submitted to the US Department of Education, Office of Elementary and Secondary Education, in support of the Enhanced Assessment Grant: Examining the feasibility, effect and capacity to provide universal access through computer-based testing, 2007–2009.)

Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). Westport, CT: Praeger.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.

Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Praeger.

Kieffer, M., Rivera, M., & Francis, D. (2012). Practical guidelines for the education of English language learners: Research-based recommendations for the use of accommodations in large-scale assessments research and practice recommendations. Portsmouth, NH: RMC Research Corporation, Center on Instruction. Retrieved from http://files.eric.ed.gov/fulltext/ED537635.pdf

Kingston, N. M. (2008). Comparability of computer- and paper-administered multiple-choice tests for K–12 populations: A synthesis. Applied Measurement in Education, 22(1), 22–37. doi:10.1080/08957340802558326

Kitchel, E. (2013). Correspondence received from Dr. Elaine Kitchel, American Printing House for the Blind. Topic: Low Vision Project.

Kitchel, J. E. (2003). Guidelines to prepare text and graphics for enhanced format documents [PowerPoint slides]. Louisville, KY: American Printing House for the Blind (APH). Retrieved from http://slideplayer.com/slide/712571/

Kitchel, J. E. (n.d.). APH guidelines for print document design. Louisville, KY: American Printing House for the Blind (APH). Retrieved from http://www.aph.org/research/design-guidelines/

Koenker, R. (2005). Quantile regression. New York, NY: Cambridge University Press.

Kolen, M. J. (1984). Effectiveness of analytic smoothing in equipercentile equating. Journal of Educational and Behavioral Statistics, 9(1), 25–44.

Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). New York, NY: Springer-Verlag.

Kwon, M., Legge, G. E., & Dibbels, B. R. (2007). Developmental changes in the visual span for reading. Science Direct Vision Research, 47, 2889–2900. Retrieved from www.sciencedirect.com/science/article/pii/S0042698907003422

Laitusis, C., Buzick, H., Stone, E., Hansen, E., & Hakkinen, M. (2012). Literature review of testing accommodations and accessibility tools for students with disabilities. Princeton, NJ: Educational Testing Service, Smarter Balanced Assessment Consortium. Retrieved from https://www.smarterbalanced.org/wp-content/uploads/2015/08/Smarter-balanced-Students-with-Disabilities-literature-Review.pdf

Legge, G. E. (2007). Psychophysics of reading in normal and low vision. Mahwah, NJ: Lawrence Erlbaum Associates.

Legge, G. E., & Bigelow, C. A. (2011). Does print size matter for reading? A review of findings from vision science and typography. Journal of Vision, 11(5):8, 1–22. doi:10.1167/11.5.8

Lister, N., & Shaftel, J. (2014, June). Accessibility for Technology-Enhanced Assessments (ATEA). Presentation at the National Conference on Student Assessment in New Orleans, LA.

Lovett, B. J. (2011). Extended time testing accommodations: What does the research say? National Association of School Psychologists Communique, 39(8). Retrieved from www.questia.com/magazine/1P3-2376997951/extended-time-testing-accommodations-what-does-the

Mace, R. L. (1989). Universal design recommendations report (Project 89). Ronald L. Mace Papers, MC 00260, Special Collections Research Center, North Carolina State University Libraries, Raleigh, NC.

Mansfield, J. S., Legge, G. E., & Bane, M. C. (1996). Psychophysics of reading: XV. Font effects in normal and low vision. Investigative Ophthalmology and Visual Science, 37(8), 1492–1501.

Mattern, K., Burrus, J., Camara, W., O’Connor, R., Hansen, M. A., Gambrell, J., Casillas, A., & Bobek, B. (2014). Broadening the definition of college and career readiness: A holistic approach. Iowa City, IA: ACT.

Mislevy, R. J., Almond, R. G., & Lukas J. (2004). A brief introduction to evidence-centered design (CSE Report 632). Los Angeles, CA: University of California-Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing. Retrieved from files.eric.ed.gov/fulltext/ED483399.pdf

Mislevy, R. J., & Haertel, G. (2006). Implications for evidence-centered design for educational assessment. Educational Measurement: Issues and Practice, 25(4), 6–20.

Muraki, E., & Bock, R. D. (2003). PARSCALE 4 for Windows: IRT based test scoring and item analysis for graded items and rating scales [Computer software]. Skokie, IL: Scientific Software International, Inc.

Muthén, L. K., & Muthén, B. O. (1998-2007). Mplus user’s guide (5th ed.). Los Angeles, CA: Muthén & Muthén.

New England Common Assessment Consortium. (2007). Supplement to the 2007 NECAP Technical Manual, Accommodations allowed in NECAP General Assessment: Are they appropriate? What impact do those accommodations have on the test scores? (Response to Federal Peer Review Item 4.0.2)

O’Brien, B. A., Mansfield, J. S., & Legge, G. E. (2005). The effect of print size on reading speed in dyslexia. Journal of Research in Reading, 28(3), 332–349. doi:10.1111/j.1467-9817.2005.00273.x

Parker, C. E., Louie, J., & O’Dwyer, L. (2009). New measures of English language proficiency and their relationship to performance on large-scale content assessments (Issues & Answers Report, REL 2009-No. 066). Washington, DC: US Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast and Islands. Retrieved from ies.ed.gov/ncee/edlabs/regions/northeast/pdf/REL_2009066.pdf

Passel, J. S. (2011). Demography of immigrant youth: Past, present and future. The Future of Children, 21(1), 19–41. Retrieved from futureofchildren.org/futureofchildren/publications/journals/journal_details/index.xml?journalid=74

Pennock-Roman, M., & Rivera, C. (2011). Mean effects of test accommodations for ELLs and non-ELLs: A meta-analysis of experimental studies. Educational Measurement: Issues and Practice, 30(3), 10–28.

Pennock-Roman, M., & Rivera, C. (2012). Summary of literature on empirical studies of the validity and effectiveness of test accommodations for ELLs: 2005–2012. Prepared for Measured Progress by The George Washington University Center for Equity and Excellence in Education. Retrieved from www.smarterbalanced.org/wp-content/uploads/2015/08/Smarter-Balanced-ELL-Literature-Review.pdf

Popham, W. J. (2012). Assessment bias: How to banish it. Booklet in the series Mastering Assessment: A Self-Service System for Educators (2nd ed.). Boston, MA: Pearson Education, Inc., publishing as Allyn & Bacon. Retrieved from ati.pearson.com/downloads/chapters/Popham_Bias_BK04.pdf

Rogers, C. M., Christian, E. M., & Thurlow, M. L. (2012). A summary of the research on the effects of test accommodations: 2009-2010 (Technical Report 65). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved from www.cehd.umn.edu/NCEO/onlinepubs/Tech65/default.htm

Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113(3), 553–565.

Russell, M. (2002). Testing on computers: A follow-up study comparing performance on computer and on paper. Technology and Assessment Study Collaborative (inTASC). Originally published in Educational Policy Analysis Archives, 7(20), 1999. Retrieved from www.bc.edu/research/intasc/PDF/TestCompFollowUp.pdf

Russell, M., Goldberg, A., & O’Connor, K. (2003). Computer-based testing and validity: A look back and into the future. Technology and Assessment Study Collaborative (inTASC). Chestnut Hill, MA: Boston College. Retrieved from www.bc.edu/research/intasc/PDF/ComputerBasedValidity.pdf

Russell-Minda, E., Jutai, J., Strong, G., Campbell, K., Gold, D., Pretty, L., & Wilmot, L. (2006). An evidence-based review of the research on typeface legibility for readers with low vision. London, ON: Vision Rehabilitation Evidence-Based Review (VREBR) and The Canadian National Institute for the Blind (CNIB), University of Western Ontario.

Russell-Minda, E., Jutai, J., Strong, J. G., Campbell, K., Gold, D., Pretty, L., & Wilmot, L. (2007). The legibility of typefaces for readers with low vision: A research review. Journal of Visual Impairment & Blindness, 101(7), 402–415.

Sato, E., Rabinowitz, S., Gallagher, C., & Huang, C. W. (2010). Accommodations for English language learner students: the effect of linguistic modification of math test item sets (NCEE 2009–4079). Washington, DC: US Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance. Retrieved from ies.ed.gov/ncee/edlabs/regions/west/pdf/REL_20094079.pdf

Serlin, R. C., & Lapsley, D. K. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40(1), 73–83.

Shaftel, J., Belton-Kocher, E., Glasnapp, D., & Poggio, J. (2006). The impact of language characteristics in mathematics test items on the performance of English language learners and students with disabilities. Educational Assessment, 11(2), 105–126.

Sireci, S. G. (2009). Packing and unpacking sources of validity evidence: History repeats itself again. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 19–37). Charlotte, NC: Information Age Publishing.

Solano-Flores, G. (2012). Translation accommodations framework for testing English language learners in mathematics. Boulder: Smarter Balanced Assessment Consortium, University of Colorado. Retrieved from https://www.smarterbalanced.org/wp-content/uploads/2015/08/translation-accommodations-Framework-for-testing-Ell-math.pdf

Thompson, S. J., Johnstone, C. J., & Thurlow, M. L. (2002). Universal design applied to large scale assessments (NCEO Synthesis Report 44). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved from www.cehd.umn.edu/nceo/onlinepubs/Synthesis44.html

Thurlow, M. L., Laitusis, C. C., Dillon, D. R., Cook, L. L., Moen, R. E., Abedi, J., & O’Brien, D. G. (2009). Accessibility principles for reading assessments. Minneapolis, MN: National Accessible Reading Assessment Projects. Retrieved from http://files.eric.ed.gov/fulltext/ED510369.pdf

US Department of Education. (2003). Fact sheet: NCLB provisions ensure flexibility and accountability for limited English proficient students. Retrieved from www2.ed.gov/nclb/accountability/schools/factsheet-english.html. Related information listed in Standards and assessments: Non-regulatory guidance, retrieved from www.ed.gov/policy/elsec/guid/saaguidance03.doc

Wang, M. W., & Stanley, J. C. (1970). Differential weighting: A review of methods and empirical studies. Review of Educational Research, 40(5), 663–705.

Wang, T., Kolen, M. J., & Harris, D. J. (2000). Psychometric properties of scale scores and performance levels for performance assessments using polytomous IRT. Journal of Educational Measurement, 37(2), 141–162.

Wells, C. S., Sireci, S. G., & Bahry, L. M. (2014). The effect of conditioning years on the reliability of SGPs (Center for Educational Assessment Research Report 869). Amherst, MA: University of Massachusetts Amherst, Center for Educational Assessment. Retrieved from www.umass.edu/remp/pdf/WSB_NCME2014.pdf

Wolf, M. K., Herman, J. L., & Dietel, R. (2010). Improving the validity of English language learner assessment systems (Policy Brief 10). Los Angeles, CA: University of California-Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing. Retrieved from www.cse.ucla.edu/products/policy/policybrief10_frep_ellpolicy.pdf

Wolf, M. K., Kim, J., Kao, J. C., & Rivera, N. (2009). Examining the effectiveness and validity of glossary and read-aloud accommodations for English language learners in a math assessment. Los Angeles, CA: University of California-Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing. Retrieved from www.cde.state.co.us/sites/default/files/documents/research/download/pdf/cresstglossaryaccomstudy09.pdf

Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–153). Westport, CT: Praeger.

2016 Updates

Previous chapters in this technical manual include ACT Aspire test development, test administration, and psychometric analyses from the early development stage through the operational administrations. This section presents updated information and statistical analyses based on data collected from recent operational administrations, as well as a few updates that will be implemented starting in spring 2016. Specifically, Section 1 of this chapter presents the updated nationally representative norms used for reporting from the fall 2015 operational administration; Section 2 presents the updated ACT Aspire STEM benchmarks used for reporting from the fall 2015 operational administration; Section 3 presents raw score reliabilities of forms administered in spring 2015; and Section 4 presents a series of program improvements that will be implemented starting with the spring 2016 operational administration.

Section 1: Updating ACT Aspire Norms

ACT Aspire started with the goal of reporting three-year rolling user norms, with equal weight given to each student. Beginning with fall 2015 reporting, the norm data have been statistically weighted to more closely match a national distribution in terms of selected student demographics and achievement.

This section describes the development of ACT Aspire norms based on weighted data from tests administered through spring 2015. First, the inclusion rules for the samples are described, followed by a description of a weighting methodology designed to produce nationally representative norms. Then, the updated ACT Aspire norm tables, summary statistics, and demographic information for the weighted samples are presented.

ACT Aspire Norm Samples

For grades 3–8, the norm samples included students who took ACT Aspire on-grade subject tests in spring 2013, spring 2014, and spring 2015, consistent with the goal of including three years of rolling data.

For grades 9 and 10, the norm samples were restricted to students who took ACT Aspire in consecutive years. The grade 9 samples included students who took grade 8 ACT Aspire and grade 9 ACT Aspire subject tests approximately one year apart. Similarly, the grade 10 samples included students who took ACT Aspire in grade 9 and grade 10 approximately one year apart. As described later, these longitudinal samples were used to anchor the grade 9 and grade 10 norm samples to the grade 8 and grade 9 score distributions, respectively. Table U.1 provides the sample sizes by grade level and subject area.

Table U.1. Sample Sizes Used for Norm Establishment

Grade level   English   Mathematics   Reading   Science   Writing
3             102,213     176,117     176,128    73,337    89,557
4             100,592     173,117     173,383    74,618    94,322
5             100,037     172,635     172,437    99,966    93,023
6              98,838     172,865     172,723    72,081    94,949
7             103,297     179,967     179,501   104,468    99,005
8             112,999     190,450     190,086    89,559   107,996
9               3,875       4,149       3,925     3,877     3,364
10              6,436       6,634       6,600     6,172     4,862

Weighting Methodology for ACT Aspire Norms

Students assessed with ACT Aspire are not representative of the national population of US elementary and secondary school students. To support interpretations of nationally representative norms, weights were assigned to the ACT Aspire-tested samples so that the weighted samples were similar to the national population on school affiliation (public vs. private) and, among public school students, on race/ethnicity and academic achievement.

For grades 3–8, we first estimated the population of US elementary and secondary public school students with respect to race/ethnicity and district mean achievement level. Then, the public school ACT Aspire-tested students were weighted to match on these characteristics. Finally, the weighted public school data were combined with the private school data, with the final weighting reflecting the frequency of school affiliation (public or private) in the population. An important assumption of this approach is that a district’s mean achievement level can be measured by its mean ACT Composite score,1 and that this measure reflects achievement in the lower grade levels. To examine this assumption, correlations between district mean ACT Composite score and district mean ACT Aspire scores in the lower grade levels were examined (Table U.2). For most subject areas and grade levels, district mean ACT Composite score was highly correlated with district mean ACT Aspire score. For writing, the correlations were less consistent across grade levels and were generally lower. Overall, the correlations in Table U.2 suggest that district mean ACT Composite score reflects achievement in the lower grade levels.

1 When a district tests virtually all students with the ACT, as is often done with ACT state and district testing, a district’s mean ACT Composite score measures the mean achievement level of the district’s graduating students.

Table U.2. Correlation of District Mean ACT Composite Score with District Mean ACT Aspire Score

Grade level   English   Mathematics   Reading   Science   Writing
3               0.69       0.61         0.72      0.52      0.51
4               0.61       0.70         0.74      0.57      0.08
5               0.63       0.69         0.73      0.66      0.59
6               0.74       0.65         0.68      0.58      0.62
7               0.70       0.79         0.66      0.71      0.38
8               0.73       0.81         0.58      0.66      0.26
9               0.79       0.79         0.73      0.72      0.61
10              0.72       0.74         0.66      0.70      0.47

Note. Districts with at least 50 ACT Aspire-tested students were included.

For grade 9, a sample of students tested with ACT Aspire in grades 8 and 9 was used. After the norms were developed for grade 8, the longitudinal samples (one for each subject area) were weighted to match the national grade 8 score distribution. The weighted distribution of the grade 9 scores was then calculated. For grade 10, the same approach was used, but with the national grade 9 score distribution serving as the anchor.

The weighting procedure for grades 3–8 is described in detail below. In what follows, target population refers to the population of US elementary and secondary school students. The following steps were taken to weight each ACT Aspire norming sample2:

1. Determine the target population’s relative frequency of students enrolled in public and private schools.3

2. Among public school students within the target population, determine the relative frequency by race/ethnicity and school percent eligible for free or reduced lunch (school FRL). Table U.3 shows the categorizations of these variables and relative frequencies in the public school target population.4

3. Identify all public school districts that administer the ACT test to all 11th grade students; we refer to ACT-tested students within these districts as the ACT state and district testing public school sample. Weight this sample to the public school target population on race/ethnicity and school FRL. Weights were determined by the ratio of the population and sample percentages of each combination of race/ethnicity and school FRL:

We refer to this weighted sample as the ACT public school population. The ACT public school population was nearly identical to the public school target population with respect to relative frequencies of race/ethnicity and school FRL groups (Table U.3).

4. For each student in the ACT public school population, determine the mean ACT Composite score within their district. Categorize district mean ACT score5 and student race/ethnicity, and calculate the relative frequency of each combination of race/ethnicity and district mean ACT Composite score level (DMACT) within the ACT public school population.

5. For each ACT Aspire norm sample, determine public/private school affiliation and, for public school students, race/ethnicity and DMACT. For students enrolled in public schools, calculate the sample percentage for each combination of race/ethnicity and DMACT. Weight the sample to the target population on public/private school affiliation and, for public school students, race/ethnicity and DMACT. Weights were determined by the ratio of the population and sample percentages:

For an example norm sample (grade 5 English), the relative frequencies of the weighting variables are provided in Table U.4 for the target population, the unweighted sample, and the weighted sample. The weighted sample was identical to the target population with respect to school affiliation, and the public school sample was identical to the public school target population on race/ethnicity and district mean ACT Composite score. As shown in Table U.4, the weighted mean ACT Aspire grade 5 English score was higher than the unweighted mean. This was expected because the unweighted district mean ACT Composite score was lower than the target population’s district mean ACT Composite score.

2 There was one norming sample for each grade level and subject area.

3 Estimated from Market Data Retrieval (MDR) data, the target population included 90.5% public-school and 9.5% private-school enrollees.

4 The public school target population relative frequencies for race/ethnicity and school FRL were derived from the Common Core of Public School Universe Survey Data provided by the National Center for Education Statistics.

5 See Table U.3 for the categorization of district mean ACT Composite score.

Table U.3. Target Public School Population and ACT Public School Population

Variable | Population of public U.S. elementary and secondary students | ACT state and district testing public school sample (unweighted) | ACT public school population (weighted)*

Race/ethnicity

Asian 4.7% 3.3% 4.7%

Black/African American 15.5% 14.6% 15.5%

Hispanic 25.0% 13.1% 25.0%

Native American 1.1% 0.7% 1.1%

White 50.6% 55.4% 50.6%

Unknown/Other** 3.1% 12.9% 3.1%

FRL % of enrolled school

0–10% 6.8% 5.7% 6.8%

10–20% 8.9% 12.0% 8.9%

20–35% 14.9% 23.1% 14.9%

35–50% 17.4% 23.0% 17.4%

50–65% 16.8% 18.8% 16.8%

65–80% 15.0% 10.1% 15.0%

80–90% 9.5% 3.6% 9.5%

90–100% 10.0% 3.3% 10.0%

Unknown 0.8% 0.5% 0.8%

District mean ACT score (values shown for the sample and weighted-population columns only)

<17.5 10.1% 16.1%

17.5–18.2 9.2% 9.5%

18.2–18.7 11.3% 15.9%

18.7–19.2 10.3% 9.6%

19.2–19.5 8.6% 7.7%

19.5–20.1 9.9% 8.5%

20.1–20.8 12.4% 9.7%

20.8–21.4 8.3% 6.3%

21.4–22.6 9.3% 7.2%

>22.6 10.6% 9.6%

Mean (SD) district mean ACT score 19.9 (2.5) 19.3 (2.8)

Mean (SD) ACT Composite score 19.9 (5.3) 19.3 (5.3)

Note. Shading represents variables used in weighting.

* ACT state and district testing public school sample weighted to national population of U.S. elementary and secondary students on race/ethnicity and school FRL.

** For the public school target population, the relative frequency for Unknown/Other was obtained by subtracting the sum of the individual race/ethnicity counts from total enrollment counts. For the ACT data, the Unknown/Other category included Native Hawaiian/Other Pacific Islander, two or more races, prefer not to respond, and no response.

Table U.4. One Example of ACT Aspire Norm Sample Weighting: Grade 5 English

Variable | Population of U.S. elementary and secondary students | ACT Aspire norm sample: grade 5 English (unweighted) | ACT Aspire norm sample: grade 5 English (weighted)

Female 49.0% 49.0% 48.5%

Male 51.0% 51.0% 51.5%

School affiliation

Public 90.5% 95.4% 90.5%

Private 9.5% 4.6% 9.5%

Public school frequencies

Race/ethnicity

Asian 4.7% 1.6% 4.7%

Black/African American 15.5% 32.5% 15.5%

Hispanic 25.0% 6.9% 25.0%

Native American 1.1% 0.7% 1.0%

White 50.6% 49.9% 50.6%

Unknown/Other 3.1% 8.4% 3.1%

FRL % of enrolled school

0–10% 6.8% 1.5% 7.6%

10–20% 8.9% 2.8% 3.0%

20–35% 14.9% 10.3% 15.6%

35–50% 17.4% 12.0% 11.6%

50–65% 16.8% 25.6% 27.2%

65–80% 15.0% 19.0% 14.1%

80–90% 9.5% 11.4% 7.9%

90–100% 10.0% 10.4% 6.3%

Unknown 0.8% 7.1% 6.6%

District mean ACT score*

<17.5 16.1% 27.5% 16.1%

17.5-18.2 9.5% 25.8% 9.5%

18.2-18.7 15.9% 11.3% 15.9%

18.7-19.2 9.6% 13.3% 9.6%

Note. Shading represents variables used in weighting.

* Frequencies from ACT public school population.

Table U.4. (continued)

Variable | Population of U.S. elementary and secondary students | ACT Aspire norm sample: grade 5 English (unweighted) | ACT Aspire norm sample: grade 5 English (weighted)

19.2-19.5 7.7% 2.8% 7.7%

19.5-20.1 8.5% 1.5% 8.5%

20.1-20.8 9.7% 6.4% 9.7%

20.8-21.4 6.3% 7.7% 6.3%

21.4-22.6 7.2% 3.0% 7.2%

>22.6 9.6% 0.7% 9.6%

Mean (SD) district mean ACT score 19.3 (2.8) 18.4 (1.7) 19.6 (2.5)

Mean (SD) ACT Aspire score 422.3 (7.1) 423.1 (7.1)

Note. Shading represents variables used in weighting.

* Frequencies from ACT public school population.

Figure U.1 summarizes the weighting procedure for the ACT Aspire norms. Tables U.5–U.9 present the demographic information on the weighted samples for each grade and subject test.

Norm Tables and Summary Statistics

The norm tables include the cumulative percentages of students scoring at or below each scale score in the weighted norm sample. A smoothing process was applied to the empirical cumulative percentages to reduce sampling error and increase the precision of the norms. The norm tables are presented in Tables U.10–U.14 for each subject, respectively. The mean scale score values for students in each grade and subject test are given in Table U.15. These represent estimated mean scores if all students in the target population (all U.S. elementary and secondary school students) were to take the test. For the same target population, Table U.16 also shows the estimated percentages of students meeting the ACT Readiness Benchmark in each grade and subject test.

Figure U.1. Conceptual diagram of chain-linked weighting strategy. The figure shows five samples linked in a chain: the population of U.S. elementary and secondary students, the ACT public school population, the ACT Aspire grade 3–8 norm samples, the ACT Aspire grade 8–9 longitudinal samples, and the ACT Aspire grade 9–10 longitudinal samples. The ACT public school population is matched to the target population on race/ethnicity and school FRL % among public school students; the grade 3–8 norm samples are matched to the target population on school affiliation (public or private) and to the ACT public school population on race/ethnicity and district mean ACT score; the grade 8–9 longitudinal samples are matched to the grade 3–8 norm samples on the grade 8 score distribution; and the grade 9–10 longitudinal samples are matched to the grade 8–9 samples on the grade 9 score distribution, yielding the grade 10 score distribution.

Table U.5. 2015 ACT Aspire Weighted English Norm Group Demographics

Grade (%): 3 (n = 102,213) | 4 (n = 100,592) | 5 (n = 100,037) | 6 (n = 98,838) | 7 (n = 103,297) | 8 (n = 112,999) | 9 (n = 3,875) | 10 (n = 6,436)

Gender

F 48.99 48.56 48.11 48.81 47.89 48.6 44.32 48.96

M 49.91 50.32 51 49.76 50.57 50.32 40.73 48.64

No response 1.10 1.12 0.89 1.43 1.54 1.09 14.96 2.40

State

AL 47.91 49.07 49.33 44.03 36.9 33.92 0.59 -

AR - - - - - - 0.36 0.17

AZ - 0.19 - - - 0.25 - -

CA 1.36 1.50 2.51 2.51 1.37 1.11 - -

CO 2.87 2.36 2.00 1.85 1.93 1.92 1.51 1.17

CT - - - - - - - 3.45

FL 0.28 0.35 0.22 0.16 0.14 0.24 - -

GA - - - 0.05 0.07 0.09 - -

IA 0.17 0.18 0.14 0.15 0.28 1.17 0.89 0.55

IL 0.13 0.13 0.16 0.14 0.12 0.56 8.24 9.59

IN - - - 0.13 0.11 0.20 - -

KS 1.73 1.78 1.89 0.78 0.70 3.03 6.34 3.35

KY 0.04 0.05 0.04 0.22 0.76 0.49 2.69 0.09

LA 6.08 6.07 5.07 4.93 4.57 4.47 11.93 7.93

MI 0.41 0.35 0.37 0.52 6.79 10.17 14.75 13.06

MN - - - - - 0.10 - -

MO - - - - 0.54 0.06 0.08 2.16

MS 0.10 0.14 0.11 0.13 1.13 1.25 1.84 0.59

ND - - - - - 0.21 - 0.58

NE - - - - - 0.00 - 5.02

NJ - - - - - - - 0.02

NV 0.16 0.19 0.18 0.18 0.25 0.19 - -

OH - - 0.02 0.05 0.26 0.50 - 0.25

Note. - indicates no students tested.

Table U.5. (continued)

Grade (%): 3 (n = 102,213) | 4 (n = 100,592) | 5 (n = 100,037) | 6 (n = 98,838) | 7 (n = 103,297) | 8 (n = 112,999) | 9 (n = 3,875) | 10 (n = 6,436)

OK 0.29 0.26 0.28 0.29 0.27 0.25 - -

OR - - - - - 0.01 - -

PA - - - - - - - 3.10

SC 36.72 36.07 36.08 42.27 40.24 36.82 15.08 -

SD 0.03 - 0.02 - 0.02 - - -

TN 0.03 0.05 0.03 0.04 0.03 0.20 2.10 1.17

TX 0.57 0.58 0.48 0.73 0.87 0.75 1.78 0.27

UT 1.05 0.62 0.98 0.59 1.33 0.49 2.30 -

VA - - - - - - 0.03 -

WI 0.06 0.07 0.07 0.27 1.33 1.51 29.20 47.25

No response - - - - - 0.05 0.29 0.26

Race/Ethnicity

Asian 4.38 4.36 4.36 4.37 4.36 4.34 0.87 1.72

Black/African American 14.30 14.30 14.32 14.31 14.36 14.20 2.37 2.93

Hispanic/Latino 22.87 22.78 22.81 22.77 22.92 22.73 6.85 7.32

Native American 0.97 0.97 0.95 0.95 0.96 0.98 0.71 0.76

Other 0.90 0.81 0.86 0.67 0.49 0.37 1.42 1.61

White 47.90 47.53 47.55 47.68 48.30 47.88 48.33 69.42

Unknown 8.68 9.25 9.14 9.24 8.62 9.50 39.46 16.22

Note. - indicates no students tested.

Table U.6. 2015 ACT Aspire Weighted Mathematics Norm Group Demographics

Grade (%): 3 (n = 176,117) | 4 (n = 173,117) | 5 (n = 172,635) | 6 (n = 172,865) | 7 (n = 179,967) | 8 (n = 190,450) | 9 (n = 4,149) | 10 (n = 6,634)

Gender

F 48.72 48.73 48.43 48.38 48.04 48.96 45.46 50.30

M 50.17 50.22 50.71 50.30 50.55 50.05 40.31 47.24

No response 1.11 1.05 0.86 1.32 1.41 0.99 14.24 2.46

State

AL 66.56 66.84 67.00 64.51 61.19 58.80 1.63 -

AR - - - - - - 0.38 0.19

AZ - 0.19 - - - 0.24 - -

CA 1.34 1.48 2.43 2.56 1.46 1.19 - -

CO 1.09 1.06 0.98 0.96 0.94 1.17 1.58 1.15

CT - - - - - - - 3.27

FL 0.27 0.35 0.22 0.16 0.14 0.23 - -

GA - - - 0.05 0.07 0.09 - -

IA 0.17 0.17 0.14 0.15 0.28 0.70 0.88 0.62

IL 0.09 0.09 0.10 0.10 0.10 0.48 7.30 9.80

IN - - - 0.12 0.11 0.18 - -

KS 0.54 0.59 0.55 0.51 0.46 2.27 6.20 3.07

KY 0.04 0.05 0.04 0.22 0.61 0.48 3.05 0.10

LA 5.98 6.00 5.05 4.89 4.62 4.40 11.69 8.47

MI 0.36 0.28 0.39 0.47 2.52 4.15 13.83 12.77

MN - - - - - 0.09 - -

MO - - - - 0.36 0.05 0.06 2.43

MS 0.08 0.12 0.10 0.12 0.51 0.77 1.48 0.53

ND - - - - - 0.13 - 0.51

NE - - - - - 0.00 - 4.69

NJ - - - - - - - 0.02

NV 0.25 0.27 0.26 0.24 0.25 0.19 - -

OH - - 0.02 0.04 0.24 0.34 - 0.26

Note. - indicates no students tested.

Table U.6. (continued)

Grade (%): 3 (n = 176,117) | 4 (n = 173,117) | 5 (n = 172,635) | 6 (n = 172,865) | 7 (n = 179,967) | 8 (n = 190,450) | 9 (n = 4,149) | 10 (n = 6,634)

OK 0.17 0.16 0.16 0.17 0.16 0.14 - -

OR - - - - - 0.01 - -

PA - - - - - - - 2.90

SC 21.58 21.30 21.18 23.45 22.68 21.42 14.30 0.01

SD 0.03 - 0.02 - 0.02 - - -

TN 0.03 0.05 0.03 0.04 0.03 0.20 1.80 1.18

TX 0.31 0.31 0.27 0.37 1.12 0.66 3.54 0.27

UT 1.04 0.62 0.98 0.60 1.33 0.49 2.09 -

VA - - - - - - 0.03 -

WI 0.06 0.07 0.07 0.26 0.81 1.08 29.86 47.51

No response - - - - - 0.05 0.31 0.24

Race/Ethnicity

Asian 4.38 4.36 4.37 4.37 4.36 4.34 1.00 1.60

Black/African American 14.30 14.30 14.32 14.31 14.34 14.19 2.88 2.99

Hispanic/Latino 22.87 22.78 22.81 22.77 22.92 22.73 6.61 7.02

Native American 0.97 0.97 0.97 0.98 0.98 0.99 0.74 0.80

Other 0.58 0.56 0.61 0.47 0.38 0.30 1.29 1.61

White 47.95 47.59 47.61 47.73 48.27 47.99 50.24 69.45

Unknown 8.95 9.44 9.31 9.37 8.75 9.46 37.24 16.52

Note. - indicates no students tested.

Table U.7. 2015 ACT Aspire Weighted Reading Norm Group Demographics

Grade (%): 3 (n = 176,128) | 4 (n = 173,383) | 5 (n = 172,437) | 6 (n = 172,723) | 7 (n = 179,501) | 8 (n = 190,086) | 9 (n = 3,925) | 10 (n = 6,600)

Gender

F 48.74 48.76 48.49 48.39 48.10 49.00 47.15 51.04

M 50.2 50.25 50.71 50.32 50.6 50.00 38.87 46.54

No response 1.05 0.99 0.8 1.29 1.3 1.00 13.98 2.42

State

AL 66.59 66.87 67.07 64.43 61.81 58.72 1.80 -

AR - - - - - - 0.40 0.18

AZ - 0.19 - - - 0.25 - -

CA 1.34 1.48 2.50 2.61 1.48 1.22 - -

CO 1.10 1.07 0.97 0.99 0.95 1.20 1.58 1.05

CT - - - - - - - 3.04

FL 0.28 0.35 0.22 0.16 0.14 0.24 - -

GA - - - 0.05 0.07 0.09 - -

IA 0.17 0.17 0.14 0.15 0.28 0.72 1.02 0.61

IL 0.09 0.09 0.10 0.10 0.10 0.49 7.98 9.83

IN - - - 0.12 0.11 0.18 - -

KS 0.53 0.57 0.54 0.50 0.47 2.26 5.96 2.94

KY 0.04 0.05 0.04 0.22 0.60 0.49 0.20 0.10

LA 5.98 6.01 5.00 4.91 4.60 4.51 13.03 7.99

MI 0.37 0.35 0.39 0.47 2.56 4.13 14.06 12.62

MN - - - - - 0.10 - -

MO - - - - 0.36 0.05 0.08 2.37

MS 0.08 0.12 0.10 0.12 0.52 0.70 1.20 0.47

ND - - - - - 0.13 - 0.59

NE - - - - - 0.00 - 4.64

NJ - - - - - - - 0.02

NV 0.25 0.27 0.26 0.25 0.25 0.19 - -

OH - - 0.02 0.04 0.24 0.34 - 0.25

Note. - indicates no students tested.

Table U.7. (continued)

Grade (%): 3 (n = 176,128) | 4 (n = 173,383) | 5 (n = 172,437) | 6 (n = 172,723) | 7 (n = 179,501) | 8 (n = 190,086) | 9 (n = 3,925) | 10 (n = 6,600)

OK 0.18 0.16 0.16 0.18 - 0.14 - -

OR - - - - - 0.01 - -

PA - - - - - - - 3.09

SC 21.52 21.21 21.15 23.45 22.62 21.48 14.01 -

SD 0.03 - 0.02 - 0.02 - - -

TN 0.03 0.05 0.03 0.04 0.03 0.20 2.15 1.14

TX 0.33 0.31 0.25 0.43 0.64 0.53 2.02 0.29

UT 1.03 0.62 0.98 0.54 1.34 0.49 2.51 -

VA - - - - - - 0.02 -

WI 0.06 0.07 0.07 0.25 0.81 1.09 31.63 48.44

No response - - - - - 0.05 0.36 0.34

Race/Ethnicity

Asian 4.38 4.36 4.37 4.37 4.36 4.34 1.14 1.67

Black/African American 14.3 14.30 14.32 14.31 14.34 14.18 2.45 2.67

Hispanic/Latino 22.87 22.78 22.81 22.76 22.92 22.73 6.2 6.47

Native American 0.97 0.97 0.97 0.98 0.98 0.98 0.69 0.69

Other 0.58 0.57 0.62 0.48 0.38 0.30 1.24 1.58

White 47.94 47.6 47.59 47.69 48.28 47.87 49.42 70.76

Unknown 8.95 9.42 9.32 9.42 8.75 9.60 38.86 16.16

Note. - indicates no students tested.

Table U.8. 2015 ACT Aspire Weighted Science Norm Group Demographics

Grade (%): 3 (n = 73,337) | 4 (n = 74,618) | 5 (n = 99,966) | 6 (n = 72,081) | 7 (n = 104,468) | 8 (n = 89,559) | 9 (n = 3,877) | 10 (n = 6,172)

Gender

F 49.39 48.00 48.34 48.89 48.01 48.63 45.46 50.29

M 49.44 50.87 50.76 49.71 50.45 50.21 39.81 47.14

No response 1.17 1.13 0.90 1.40 1.54 1.15 14.74 2.57

State

AL 61.88 63.4 71.56 61.06 64.81 46.75 0.37 -

AR - - - - - - 0.39 0.18

AZ - 0.19 - - - 0.25 - -

CA 1.37 1.50 2.48 2.63 1.50 1.23 - -

CO 1.65 1.59 1.37 1.53 1.39 1.59 1.47 1.05

CT - - - - - - - 1.26

FL 0.28 0.35 0.22 0.16 0.14 0.24 - -

GA - - - 0.05 0.07 0.09 - -

IA 0.17 0.18 0.14 0.15 0.28 1.06 0.89 0.65

IL 0.14 0.13 0.13 0.15 0.12 0.69 8.30 10.23

IN - - - 0.14 0.12 0.20 - -

KS 2.30 2.38 0.90 1.11 0.79 3.91 6.05 3.03

KY 0.04 0.05 0.04 0.23 0.67 0.48 3.30 0.10

LA 6.11 6.12 5.11 5.03 4.64 4.43 12.63 8.77

MI 0.42 0.26 0.42 0.44 3.78 8.91 12.30 11.75

MN - - - - - 0.09 - -

MO - - - - 0.65 0.07 0.08 2.51

MS 0.06 0.08 0.08 0.09 0.51 1.19 1.15 0.49

ND - - - - - 0.34 - 0.60

NE - - - - - 0.00 - 4.87

NJ - - - - - - - 0.01

NV 0.17 0.19 0.18 0.18 0.25 0.19 - -

OH - - 0.03 0.05 0.27 0.48 - 0.26

Note. - indicates no students tested.

Table U.8. (continued)

Grade (%): 3 (n = 73,337) | 4 (n = 74,618) | 5 (n = 99,966) | 6 (n = 72,081) | 7 (n = 104,468) | 8 (n = 89,559) | 9 (n = 3,877) | 10 (n = 6,172)

OK 0.22 0.22 0.22 0.23 0.24 0.21 - -

OR - - - - - 0.01 - -

PA - - - - - - - 3.32

SC 23.62 22.12 15.56 25.29 16.34 24.05 14.83 -

SD 0.03 - 0.03 - 0.02 - - -

TN 0.03 0.04 0.03 0.04 0.03 0.20 2.10 1.26

TX 0.40 0.49 0.43 0.54 0.77 0.76 3.81 0.28

UT 1.06 0.63 0.99 0.60 1.35 0.48 2.32 -

VA - - - - - - 0.03 -

WI 0.06 0.07 0.07 0.3 1.26 2.05 29.65 49.06

No response - - - - - 0.05 0.34 0.31

Race/Ethnicity

Asian 4.38 4.36 4.36 4.37 4.36 4.34 0.95 1.64

Black/African American 14.30 14.30 14.32 14.32 14.34 14.19 2.37 2.53

Hispanic/Latino 22.88 22.78 22.81 22.78 22.92 22.74 6.53 6.45

Native American 0.97 0.97 0.97 0.92 0.98 0.95 0.73 0.79

Other 0.50 0.49 0.49 0.40 0.26 0.26 1.24 1.66

White 47.9 47.54 47.55 47.72 48.3 47.99 48.77 69.8

Unknown 9.06 9.55 9.50 9.49 8.84 9.53 39.42 17.13

Note. - indicates no students tested.

Table U.9. 2015 ACT Aspire Weighted Writing Norm Group Demographics

Grade (%): 3 (n = 89,557) | 4 (n = 94,322) | 5 (n = 93,023) | 6 (n = 94,949) | 7 (n = 99,005) | 8 (n = 107,996) | 9 (n = 3,364) | 10 (n = 4,862)

Gender

F 50.31 49.46 47.92 49.12 47.84 49.00 46.84 53.53

M 48.87 49.76 51.33 49.82 50.92 49.91 37.50 45.15

No response 0.82 0.78 0.75 1.06 1.25 1.09 15.66 1.31

State

AL 39.35 40.46 40.25 37.92 34.34 30.01 0.59 -

AR - - - - - - 0.44 0.28

AZ - 0.19 - - - 0.31 - -

CA 1.46 1.57 2.71 3.03 1.81 1.60 - -

CO 2.63 2.10 2.16 2.20 2.49 2.25 1.84 1.23

FL 0.28 0.37 0.24 0.18 0.16 0.32 - -

GA - - - 0.06 0.08 0.12 - -

IA 0.15 0.15 0.12 0.13 0.31 0.11 - -

IL 0.25 0.23 0.24 0.17 0.14 0.50 10.02 9.43

IN - - - 0.23 0.19 0.30 - -

KS 8.1 9.34 8.11 0.74 0.68 2.94 6.37 3.97

KY 0.04 0.05 0.04 0.26 0.95 0.64 3.68 0.15

LA 6.05 6.07 4.83 5.32 4.64 3.94 9.73 8.50

MI 0.43 0.27 0.44 0.51 2.95 12.71 13.8 9.14

MN - - - - - 0.12 - -

MO - - - - 0.53 0.07 0.12 3.52

MS 0.02 0.01 0.04 0.03 0.01 0.70 - 0.60

ND - - - - - 0.25 - -

NE - - - - - - - 0.30

NJ - - - - - - - 0.03

NV 0.19 0.20 0.28 0.20 0.30 0.14 - -

OH - - 0.24 - 0.30 0.16 - 0.35

OK - - - 0.28 0.25 0.20 - -

Note. - indicates no students tested.

Table U.9. (continued)

Grade (%): 3 (n = 89,557) | 4 (n = 94,322) | 5 (n = 93,023) | 6 (n = 94,949) | 7 (n = 99,005) | 8 (n = 107,996) | 9 (n = 3,364) | 10 (n = 4,862)

OR - - - - - 0.01 - -

PA - - - - - - - 4.23

SC 39.37 37.71 38.64 47.14 45.66 39.26 15.85 -

SD 0.03 - 0.03 - 0.02 - - -

TN 0.03 0.05 0.03 0.05 0.03 0.03 1.54 0.71

TX 0.41 0.49 0.46 0.58 0.89 0.74 2.13 0.35

UT 1.12 0.65 1.06 0.68 1.58 0.63 2.50 -

VA - - - - - - 0.03 -

WI 0.07 0.08 0.09 0.31 1.69 1.89 31.2 56.89

No response - - - - - 0.07 0.16 0.32

Race/Ethnicity

Asian 3.41 4.36 3.38 3.68 3.66 3.39 0.92 1.38

Black/African American 14.48 14.29 14.46 15.2 15.15 13.97 2.43 3.01

Hispanic/Latino 23.19 22.79 23.06 24.01 24.12 23.08 7.32 5.57

Native American 0.96 0.95 0.96 1.04 1.05 1.01 0.60 0.88

Other 1.01 1.18 1.00 0.79 0.54 0.41 1.44 1.69

White 48.39 47.5 47.92 45.47 45.92 48.42 49.56 72.03

Unknown 8.55 8.93 9.21 9.81 9.56 9.73 37.72 15.43

Note. - indicates no students tested.

Table U.10. 2015 ACT Aspire English Norms: Percentage of Students at or below Each Scale Score

Scale Score | Grades 3, 4, 5, 6, 7, 8, 9, 10

Note. Rows with fewer than eight grade entries omit grades whose cumulative percentage reached 100 at a lower scale score.

400 1 1 1 1 1 1 1 1

401 1 1 1 1 1 1 1 1

402 1 1 1 1 1 1 1 1

403 1 1 1 1 1 1 1 1

404 1 1 1 1 1 1 1 1

405 2 1 1 1 1 1 1 1

406 3 1 1 1 2 1 1 1

407 5 1 1 2 2 1 1 1

408 8 2 1 2 3 1 1 2

409 12 4 2 3 3 2 2 3

410 16 5 3 4 4 2 3 3

411 22 8 4 5 5 3 4 4

412 27 10 6 7 6 4 5 5

413 34 14 9 8 8 5 6 7

414 40 18 11 11 9 7 8 8

415 46 23 15 13 11 8 9 10

416 52 28 18 16 13 10 11 11

417 58 33 22 19 15 12 14 13

418 64 39 27 23 18 15 16 15

419 69 45 32 26 21 18 19 17

420 74 51 37 31 24 21 21 20

421 78 57 42 35 27 24 24 22

422 82 63 48 40 31 28 27 25

423 86 68 53 44 35 31 31 27

424 88 74 58 49 39 36 34 30

425 91 78 63 54 43 40 37 33

426 93 83 68 59 47 44 41 36

427 95 86 73 64 52 48 44 39

428 96 89 77 68 56 53 48 42

Table U.10. (continued)

Scale Score | Grades 3, 4, 5, 6, 7, 8, 9, 10

429 97 92 81 72 61 57 51 46

430 98 94 84 76 65 61 55 49

431 99 96 87 80 69 66 58 52

432 99 97 90 83 73 70 62 56

433 99 98 92 86 77 73 65 59

434 99 99 94 88 80 77 69 63

435 100 99 95 91 83 80 72 66

436 99 97 92 86 83 76 69

437 99 98 94 89 86 79 73

438 100 98 95 91 89 82 76

439 99 96 93 91 84 79

440 99 97 95 93 87 82

441 99 98 96 94 89 85

442 100 99 97 96 92 87

443 99 98 97 93 90

444 99 99 98 95 92

445 99 99 98 96 93

446 99 99 99 97 95

447 99 99 99 98 96

448 100 99 99 99 97

449 99 99 99 98

450 100 99 99 99

451 99 99 99

452 100 99 99

453 99 99

454 99 99

455 99 99

456 100 100

Table U.11. 2015 ACT Aspire Mathematics Norms: Percentage of Students at or below Each Scale Score

Scale Score | Grades 3, 4, 5, 6, 7, 8, 9, 10

Note. Rows with fewer than eight grade entries omit grades whose cumulative percentage reached 100 at a lower scale score.

400 1 1 1 1 1 1 1 1

401 1 1 1 1 1 1 1 1

402 1 1 1 1 1 1 1 1

403 1 1 1 1 1 1 1 1

404 2 1 1 1 1 1 1 1

405 4 1 1 1 1 1 1 1

406 6 1 1 1 1 1 1 1

407 9 2 1 1 2 1 1 1

408 12 3 2 2 3 1 1 1

409 17 5 3 2 5 2 1 1

410 24 8 5 4 6 3 2 2

411 31 13 8 6 9 5 3 3

412 40 19 12 8 12 7 4 4

413 49 27 18 11 15 10 6 6

414 59 36 24 15 20 14 9 8

415 68 46 32 20 24 18 12 10

416 76 55 40 25 29 22 15 13

417 83 65 48 31 35 27 19 16

418 89 73 56 37 40 32 24 20

419 93 79 64 44 46 37 28 24

420 96 85 70 50 52 43 33 28

421 98 89 76 56 57 48 38 32

422 99 92 81 63 62 53 43 36

423 99 94 85 68 67 57 48 40

424 99 96 88 73 72 62 52 45

425 99 97 91 78 76 66 57 49

426 99 98 93 82 79 69 61 53

427 99 99 94 85 83 73 65 57

428 99 99 96 88 85 76 69 61

429 99 99 97 91 88 79 73 65


430 99 99 97 93 90 81 76 68

431 99 99 98 94 92 84 80 72

432 99 99 99 95 93 86 82 75

433 99 99 99 97 95 88 85 79

434 100 99 99 97 96 90 88 82

435 - 99 99 98 97 92 90 85

436 - 99 99 99 98 93 92 88

437 - 99 99 99 98 95 94 90

438 - 99 99 99 99 96 95 92

439 - 99 99 99 99 97 96 94

440 - 100 99 99 99 98 97 96

441 - - 99 99 99 98 98 97

442 - - 99 99 99 99 99 98

443 - - 99 99 99 99 99 99

444 - - 99 99 99 99 99 99

445 - - 99 99 99 99 99 99

446 - - 100 99 99 99 99 99

447 - - - 99 99 99 99 99

448 - - - 99 99 99 99 99

449 - - - 99 99 99 99 99

450 - - - 99 99 99 99 99

451 - - - 100 99 99 99 99

452 - - - - 99 99 99 99

453 - - - - 100 99 99 99

454 - - - - - 99 99 99

455 - - - - - 99 99 99

456 - - - - - 100 99 99

457 - - - - - - 99 99

458 - - - - - - 99 99

459 - - - - - - 99 99

460 - - - - - - 100 100

Note. - indicates a scale score outside the reporting range for that grade.


Table U.12. 2015 ACT Aspire Reading Norms: Percentage of Students at or below Each Scale Score

Scale Score Grade 3 Grade 4 Grade 5 Grade 6 Grade 7 Grade 8 Grade 9 Grade 10

400 1 1 1 1 1 1 1 1

401 1 1 1 1 1 1 1 1

402 1 1 1 1 1 1 1 1

403 2 1 1 1 1 1 1 1

404 4 1 1 1 1 1 1 1

405 8 3 1 1 1 1 1 1

406 13 5 2 2 1 1 1 1

407 19 8 4 3 2 1 1 2

408 25 13 6 5 3 2 2 3

409 32 17 9 7 5 3 4 4

410 38 23 12 9 7 4 5 5

411 44 28 16 12 10 6 7 7

412 50 34 21 15 13 8 10 9

413 56 40 26 19 16 11 13 12

414 62 46 32 22 19 13 16 15

415 67 52 38 26 23 16 19 18

416 73 58 43 30 27 19 23 21

417 78 64 49 35 32 23 26 24

418 83 69 55 40 36 26 30 27

419 87 75 60 45 41 30 34 30

420 91 80 66 50 47 34 38 34

421 93 84 71 56 52 39 42 37

422 96 88 76 62 58 44 47 41

423 97 91 80 67 64 49 51 44

424 98 93 84 73 70 54 55 48

425 99 95 88 78 76 60 59 52

426 99 97 91 83 81 65 64 56

427 99 98 93 87 86 71 68 61

428 99 99 95 90 90 76 73 65


429 100 99 97 93 93 81 77 70

430 - 99 98 95 95 86 82 75

431 - 100 99 97 97 90 86 80

432 - - 99 98 98 93 90 85

433 - - 99 99 99 95 93 90

434 - - 100 99 99 97 95 93

435 - - - 99 99 98 97 96

436 - - - 100 99 99 98 98

437 - - - - 99 99 99 99

438 - - - - 100 99 99 99

439 - - - - - 99 99 99

440 - - - - - 100 99 99

441 - - - - - - 99 99

442 - - - - - - 100 100

Note. - indicates a scale score outside the reporting range for that grade.


Table U.13. 2015 ACT Aspire Science Norms: Percentage of Students at or below Each Scale Score

Scale Score Grade 3 Grade 4 Grade 5 Grade 6 Grade 7 Grade 8 Grade 9 Grade 10

400 1 1 1 1 1 1 1 1

401 1 1 1 1 1 1 1 1

402 1 1 1 1 1 1 1 1

403 1 1 1 1 1 1 1 1

404 3 1 1 1 1 1 1 1

405 5 2 1 1 1 1 1 1

406 8 4 2 1 1 1 1 1

407 12 6 3 2 2 1 1 1

408 17 8 4 3 3 2 1 1

409 22 11 6 5 5 3 2 2

410 27 15 8 6 7 4 3 3

411 33 18 10 8 9 6 4 4

412 38 23 13 11 12 7 5 5

413 43 28 16 14 15 9 7 6

414 49 33 20 17 19 12 8 8

415 54 38 24 20 22 15 11 10

416 59 43 28 24 26 18 13 12

417 64 49 33 28 30 21 16 15

418 69 55 38 33 34 24 19 17

419 74 60 44 37 38 28 22 20

420 79 66 50 42 42 32 26 23

421 83 71 56 47 46 36 29 26

422 87 76 62 53 50 40 33 30

423 91 81 69 58 54 45 37 33

424 94 85 75 64 59 50 42 36

425 96 89 80 69 63 54 46 40

426 98 92 85 74 67 59 50 43

427 99 94 89 79 71 64 54 47

428 99 96 92 83 75 68 59 51


429 99 97 94 87 79 72 63 55

430 99 98 96 90 83 77 67 59

431 99 99 98 93 87 81 71 63

432 99 99 98 95 90 84 75 67

433 100 99 99 97 93 87 79 71

434 - 99 99 98 95 90 82 74

435 - 99 99 99 97 92 85 78

436 - 100 99 99 98 94 88 82

437 - - 99 99 99 96 91 85

438 - - 100 99 99 97 93 88

439 - - - 99 99 98 95 91

440 - - - 100 99 99 97 93

441 - - - - 99 99 98 95

442 - - - - 99 99 99 97

443 - - - - 100 99 99 98

444 - - - - - 99 99 99

445 - - - - - 99 99 99

446 - - - - - 100 99 99

447 - - - - - - 99 99

448 - - - - - - 99 99

449 - - - - - - 100 100

Note. - indicates a scale score outside the reporting range for that grade.


Table U.14. 2015 ACT Aspire Writing Norms: Percentage of Students at or below Each Scale Score

Scale Score Grade 3 Grade 4 Grade 5 Grade 6 Grade 7 Grade 8 Grade 9 Grade 10

408 1 1 1 1 1 1 1 1

409 1 1 1 1 1 1 1 1

410 2 1 2 2 2 2 2 1

411 3 1 3 2 3 2 3 1

412 4 1 4 3 4 3 3 2

413 6 2 5 4 6 4 5 3

414 9 2 7 5 8 6 6 4

415 12 4 9 7 10 7 8 5

416 16 6 12 9 13 10 10 6

417 21 8 15 11 16 13 12 8

418 27 12 20 14 20 16 15 10

419 33 17 25 17 25 20 18 12

420 40 24 30 21 30 26 21 14

421 48 31 36 26 36 32 25 17

422 55 40 43 32 42 38 29 19

423 62 49 50 37 49 45 33 22

424 69 58 57 44 56 52 38 25

425 75 66 63 50 62 60 43 28

426 81 74 70 56 69 67 48 32

427 85 80 75 62 74 73 54 37

428 89 85 80 68 79 78 60 42

429 92 90 84 73 84 83 66 48

430 94 93 87 78 87 87 73 56

431 96 95 90 82 90 90 80 64

432 97 96 92 85 93 93 86 73

433 98 98 93 88 95 95 91 83

434 99 98 95 91 96 96 95 90

435 99 99 96 93 97 97 98 96

436 99 99 97 94 98 98 99 99


437 99 99 98 96 99 98 99 99

438 99 99 98 97 99 99 99 99

439 99 99 99 97 99 99 99 99

440 100 100 100 98 99 99 99 99

441 - - - 98 99 99 99 99

442 - - - 99 99 99 99 99

443 - - - 99 99 99 99 99

444 - - - 99 99 99 99 99

445 - - - 99 99 99 99 99

446 - - - 99 99 99 99 99

447 - - - 99 99 99 99 99

448 - - - 100 100 100 100 100

Note. - indicates a scale score outside the reporting range for that grade.

Table U.15. National Average Scale Scores Using the 2015 Norm Sample

Grade Level English Mathematics Reading Science Writing

3 417 413 413 415 422

4 420 416 415 418 424

5 423 418 418 420 424

6 425 421 420 422 426

7 427 421 421 422 424

8 428 423 423 424 424

9 429 425 423 426 426

10 430 426 424 428 428


Table U.16. Percentage of Students Who Met the Subject Benchmark in the 2015 Norm Sample

Grade English Mathematics Reading Science Writing

3 73 60 38 36 15

4 72 54 42 40 20

5 73 52 40 44 25

6 74 56 50 47 38

7 76 43 42 41 26

8 76 38 51 41 27

9 63 35 45 37 46

10 61 28 39 37 63


Section 2: Updating ACT Aspire STEM Readiness Benchmarks

ACT Aspire STEM scores are computed as the rounded average of the mathematics and science scale scores. When ACT Aspire launched operationally in spring 2014, the ACT® test did not yet report a STEM score, so there was no ACT College Readiness Benchmark for STEM from which to estimate ACT Aspire readiness benchmarks. The original ACT Aspire STEM benchmarks were therefore calculated as the rounded average of the mathematics and science readiness benchmarks, an approach expected to be simple for users and unlikely to cause confusion.
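As a minimal sketch of that arithmetic in Python (the function names are ours, and the manual does not state which tie-breaking rule is used for half-point averages):

def stem_score(math_score, science_score):
    """ACT Aspire STEM score: rounded average of the two scale scores."""
    return round((math_score + science_score) / 2)

def original_stem_benchmark(math_benchmark, science_benchmark):
    """Original (pre-update) STEM benchmark: rounded average of the
    mathematics and science readiness benchmarks."""
    return round((math_benchmark + science_benchmark) / 2)

# Note: Python's round() resolves .5 ties to the nearest even integer;
# the rounding convention for ties is an assumption here, not documented.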

Beginning in September 2015, the ACT began reporting a STEM score, which is likewise computed as the rounded average of the mathematics and science scale scores. The benchmark for the new ACT STEM score is 26 (on the 1–36 scale), representing the minimum level of achievement at which students have a 50% chance of earning a B or higher, and about a 75% chance of earning a C or higher, in STEM-related courses such as calculus, chemistry, biology, physics, or engineering (Mattern, Radunzel, & Westrick, 2015). By comparison, the ACT College Readiness Benchmarks in mathematics and science are 22 and 23, respectively (Allen, 2013), indicating that academic readiness for STEM coursework requires higher mathematics and science achievement than the traditional college readiness benchmarks reflect.

To maintain alignment with the ACT College Readiness Benchmarks, the ACT Readiness Benchmarks for ACT Aspire STEM scores were updated in fall 2015 based on the ACT STEM benchmark. A standardized-score backmapping procedure was applied to the 2013 special-study data that had been used to scale ACT Aspire and to create the ACT Readiness Benchmarks for the other subjects. The standardized score (z-score) corresponding to the ACT STEM benchmark of 26 was computed from the grade 11 students who participated in that special study. This z-score was then converted to an ACT Aspire scale score for each of grades 3 through 10 by multiplying it by the grade-specific scale score standard deviation and adding the grade-specific scale score mean, yielding the STEM benchmark for that grade.
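The backmapping step can be sketched as follows. The variable names are ours, and the grade-level means and standard deviations come from the 2013 special-study data, which are not reproduced here:

def backmap_benchmark(act_benchmark, act_g11_mean, act_g11_sd,
                      aspire_mean, aspire_sd):
    """Map an ACT benchmark onto the ACT Aspire scale for one grade.

    1. Standardize the ACT benchmark within the grade 11 special-study
       distribution of ACT STEM scores.
    2. Re-express that z-score on the grade's ACT Aspire scale score
       distribution and round to the nearest scale score.
    """
    z = (act_benchmark - act_g11_mean) / act_g11_sd
    return round(aspire_mean + z * aspire_sd)

# Applied once per grade (3-10) with that grade's ACT Aspire scale score
# mean and standard deviation, this yields the benchmarks in Table U.17.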

Table U.17 contains the new ACT Aspire STEM benchmarks for grades 3 through 10. The new benchmarks are three to six points higher than the original ones, depending on grade, and are better aligned with the ACT College Readiness Benchmark for STEM. As a result, students may meet the individual mathematics and science readiness benchmarks yet not meet the ACT Aspire STEM benchmark.


Table U.17. New ACT Readiness Benchmarks for ACT Aspire STEM Scores

Tested Grade New Benchmark

3 419

4 422

5 425

6 427

7 429

8 432

9 434

10 437

References

Allen, J. (2013). Updating the ACT College Readiness Benchmarks (ACT Research Report 2013-6). Iowa City, IA: ACT, Inc.

Mattern, K., Radunzel, J., & Westrick, P. (2015). Development of STEM Readiness Benchmarks to Assist Educational and Career Decision Making (ACT Research Report 2015-3). Iowa City, IA: ACT, Inc.


Section 3: Reliability for Spring 2015 ACT Aspire Forms

This section presents Cronbach's alpha reliability estimates for the ACT Aspire test forms administered in spring 2015. Forms sharing the same collection of operational items were combined in the analysis. To obtain a nationally representative sample, data from the spring 2014 and spring 2015 administrations were combined for forms administered in both years. Reliability estimates for each grade and subject are shown in Table U.18.
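For reference, coefficient alpha for a matrix of scored item responses can be computed as below. This is a generic sketch of the standard formula, not ACT's production code:

import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (examinees x items) array of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / var(total score))
    """
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]                              # number of items
    item_variances = x.var(axis=0, ddof=1)      # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)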

Table U.18. Cronbach's Alpha of ACT Aspire Forms Administered in Spring 2015

Grade Subject Online Form 1 Online Form 2 Paper Form 1 Paper Form 2

3 English 0.80 0.73 0.82 0.76

Mathematics 0.80 0.83 0.78 0.82

Reading 0.85 0.84 0.84 0.84

Science 0.88 0.85 0.88 0.85

4 English 0.78 0.78 0.79 0.80

Mathematics 0.69 0.79 0.66 0.80

Reading 0.84 0.82 0.82 0.83

Science 0.84 0.84 0.84 0.84

5 English 0.76 0.79 0.78 0.80

Mathematics 0.70 0.78 0.68 0.78

Reading 0.84 0.81 0.82 0.82

Science 0.84 0.86 0.83 0.86

6 English 0.82 0.81 0.82 0.82

Mathematics 0.79 0.76 0.77 0.73

Reading 0.83 0.83 0.82 0.83

Science 0.87 0.86 0.87 0.86

7 English 0.81 0.86 0.82 0.87

Mathematics 0.83 0.81 0.81 0.78

Reading 0.80 0.81 0.80 0.81

Science 0.88 0.88 0.87 0.86


8 English 0.85 0.86 0.85 0.86

Mathematics 0.87 0.86 0.86 0.85

Reading 0.82 0.84 0.81 0.83

Science 0.87 0.88 0.87 0.86

EHS English 0.89 0.90 0.91 0.91

Mathematics 0.87 0.88 0.88 0.87

Reading 0.86 0.86 0.87 0.85

Science 0.88 0.89 0.89 0.89

Note. EHS = Early High School


Section 4: Changes to the Program for 2016

The ACT Aspire testing program is designed to be an effective and efficient measure of college and career readiness. The grade-level tests are designed to be administered in a short amount of time, avoiding over-testing and preserving time for classroom learning, and the program is built to allow continuous improvement. The goal of ACT Aspire is to promote fluency in the College and Career Readiness Standards (CCRS) while optimizing the testing experience for students.

The following three categories of improvement will be implemented in the spring 2016 operational administration:

1. Testing time

2. Number of test items for grades 3-5 English and mathematics

3. Reporting category name change for mathematics

Testing Time

ACT Aspire assessments are timed: students have a set amount of time in which to complete each test. Virtually all timed tests of academic achievement are speeded to some degree, since students working at a very deliberate pace may feel they lack sufficient time to consider each question carefully.

ACT conducted extensive research to inform adjustments to testing time for future ACT Aspire administrations. Several factors related to speededness were analyzed; after completing analyses based on test mode, item location, item latency, and omit rates, specific adjustments to testing time were made.
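The manual does not reproduce these analyses. As one hypothetical illustration of the omit-rate piece, a check of omit rates by item position (rising omit rates near the end of a test are a common sign of speededness) might look like this; all names here are ours:

def omit_rate_by_position(responses):
    """Percent of examinees omitting each item, in position order.

    responses: one list per examinee of item responses, with None
    marking an omitted item.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        100.0 * sum(resp[i] is None for resp in responses) / n_examinees
        for i in range(n_items)
    ]

# A sharp upward trend across the final item positions suggests that
# some students are running out of time.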

Table U.19 shows the changes in testing time for the spring 2016 operational administration.

Table U.19. ACT Aspire Summative Testing Time Changes (in Minutes)

Grade English (New) English (Old) Math (New) Math (Old) Reading (New) Reading (Old) Science (New) Science (Old)

3 40 30 65 55 65 60 60 55

4 40 30 65 55 65 60 60 55

5 40 30 65 55 65 60 60 55

6 40 35 70 60 65 60 60 55

7 40 35 70 60 65 60 60 55

8 40 35 75 65 65 60 60 55

EHS 45 40 75 65 65 60 60 55

Note. New = assessments administered in 2016; Old = assessments administered in 2015. Assessments for grades 6–EHS did not undergo a design change.


ACT monitors many statistical factors to ensure that ACT Aspire test scores are valid, reliable, and fair. The following changes to the tests will be implemented in the spring 2016 operational administration:

English

• Add 6 multiple-choice (MC) items to grades 3, 4, and 5

Mathematics

• Add 6 MC items to grades 3, 4, and 5

• Remove a single constructed-response (CR) item from grades 3, 4, and 5

• Rename the reporting category "Foundation" to "Integrating Essential Skills" on all score reports

When adding these items, item latency analyses are also performed to ensure that test timing remains controlled and speededness is avoided. In addition, the new items are distributed across reporting categories in proportion to existing coverage, so that sufficient, high-quality evidence of content coverage is collected.
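The exact allocation rule is not specified in the manual. A generic largest-remainder sketch of proportional allocation is shown below; the category labels and point totals in the example are illustrative midpoints taken from Table U.21, not prescribed values:

def allocate_proportionally(n_new_items, category_points):
    """Distribute n_new_items across reporting categories in proportion
    to their current point totals (largest-remainder rounding)."""
    total = sum(category_points.values())
    quotas = {c: n_new_items * p / total for c, p in category_points.items()}
    counts = {c: int(q) for c, q in quotas.items()}
    leftover = n_new_items - sum(counts.values())
    # hand the remaining items to the categories with the largest
    # fractional parts of their quotas
    for c in sorted(quotas, key=lambda c: quotas[c] - counts[c],
                    reverse=True)[:leftover]:
        counts[c] += 1
    return counts

# Example: 6 new English MC items against old point totals of roughly
# POW = 10, KLA = 3, CSE = 15 yields {"POW": 2, "KLA": 1, "CSE": 3}.
print(allocate_proportionally(6, {"POW": 10, "KLA": 3, "CSE": 15}))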

The following tables show content and item-type coverage for the English and mathematics tests:

English

• Table U.20 shows the increase in MC items for English grades 3, 4, and 5 by item type.

• Table U.21 shows the increase in points by reporting category for English MC items in grades 3, 4, and 5.

• Note: Grades 6 through EHS are unchanged for spring 2016 and are included for reference.

Mathematics

• Table U.22 shows the increase in MC items for mathematics grades 3, 4, and 5 by item type, along with the removal of a single CR item in those grades.

• Table U.23 shows the increase in points by reporting category for mathematics MC items in grades 3, 4, and 5, and the decrease in the Justification and Explanation (JE) category that results from removing one CR item (equal to 4 points).

• Note: Grades 6 through EHS are unchanged for spring 2016 and are included for reference.


English

Table U.20. English: Number of Items by Item Type

Number of Items by Grade
Item Type 3 New 3 Old 4 New 4 Old 5 New 5 Old 6 7 8 EHS

MC 27–28 21–22 27–28 21–22 27–28 21–22 31–33 31–33 31–33 46–48

TE 3–4 3–4 3–4 3–4 3–4 3–4 2–4 2–4 2–4 2–4

Total 31 25 31 25 31 25 35 35 35 50

Note. EHS – Early High School (Grades 9 and 10); MC – Multiple Choice; TE – Technology Enhanced. Paper-and-pencil tests do not have TE items; MC items are used in their place.

Table U.21. English: Number of Points by Reporting Category

Number of Points by Grade
Reporting Category 3 New 3 Old 4 New 4 Old 5 New 5 Old 6 7 8 EHS

POW 12–14 9–11 8–10 6–8 8–10 6–8 10–12 9–11 9–11 12–14

KLA 3–5 2–4 3–5 2–4 3–5 2–4 4–6 4–6 4–6 6–8

CSE 17–19 14–16 17–19 14–16 17–19 14–16 19–21 19–21 19–21 29–31

Total 31 25 31 25 31 25 35 35 35 50

Note. POW – Production of Writing; KLA – Knowledge of Language; CSE – Conventions of Standard English

Mathematics

Table U.22. Mathematics: Number of Items by Item Type

Number of Items by Grade
Item Type 3 New 3 Old 4 New 4 Old 5 New 5 Old 6 7 8 EHS

MC 21–23 15–16 21–23 15–16 21–23 15–16 23–25 23–25 27–29 27–29

TE 4–6 5–6 4–6 5–6 4–6 5–6 5–7 5–7 5–7 5–7

CR 3 4 3 4 3 4 4 4 5 5

Total 30 25 30 25 30 25 34 34 38 38

Note. MC – Multiple Choice; TE – Technology Enhanced; CR – Constructed Response. Paper-and-pencil tests do not have TE items; MC items are used in their place.


Table U.23. Mathematics: Number of Points by Reporting Category

Number of Points by Grade
Reporting Category 3 New 3 Old 4 New 4 Old 5 New 5 Old 6 7 8 EHS

Base 10 5–7 5–7 5–8 3–5 5–8 3–5 1–3 1–3 1–3 0–2

Fractions 3–5 2–4 6–8 4–6 6–8 4–6 1–3 1–3 1–3 0–2

NS - - - - - - 3–5 3–5 2–4 1–3

NQ - - - - - - - - - 1–3

OAT 6–8 3–5 4–6 3–5 3–5 3–5 1–3 1–3 0–2 0–2

EE - - - - - - 3–5 3–5 5–7 2–4

RP - - - - - - 3–5 3–5 0–2 1–3

ALG - - - - - - - - - 2–4

Functions - - - - - - - - 3–5 3–5

Meas. - - - - - - 0–2 0–2 1–3 1–3

GEO 3–5 3–5 3–5 3–5 4–6 3–5 5–7 4–6 6–8 5–7

MD 5–7 3–5 3–5 3–5 3–5 3–5 - - - -

Data - - - - - - 0–2 1–3 1–3 1–3

SP - - - - - - 3–5 3–5 4–6 4–7

JE 12 16 12 16 12 16 16 16 20 20

Total 39 37 39 37 39 37 46 46 53 53

Note. - indicates the category is not reported at that grade.

Abbreviation Key: ACT Aspire Reporting Categories

Base 10 – Number & Operations in Base 10
Fractions – Number & Operations–Fractions
NS – The Number System
NQ – Number & Quantity
OAT – Operations & Algebraic Thinking
EE – Expressions & Equations
RP – Ratios & Proportional Relationships
ALG – Algebra
Functions – Functions
Meas. – Measurement
GEO – Geometry
MD – Measurement & Data
Data – Data
SP – Statistics & Probability
JE – Justification & Explanation