CREDO
Hoover Institution, Stanford University
Stanford, CA 94305-6010
http://credo.stanford.edu/

The Future of California’s Academic Performance Index

April 2002


Stephen Fletcher and Margaret Raymond


Project Staff

Margaret E. Raymond, Ph.D.
Stephen H. Fletcher, Ph.D.
Jeanene Harlick
Carissa M. Miller
Kerry A. Philp
Kate M. Chauncey

With editorial assistance from: Cerena Sweetland-Gil

With technical consulting from: Eric A. Hanushek, Ph.D., Principal Investigator


Table of Contents

Acknowledgments
List of Tables
Executive Summary
I. Introduction
II. Research Approach
III. Logistics of Collecting Data on Teacher and Student Attendance Rates and Student Graduation Rates
IV. Essential Performance Characteristics for California’s API
V. STAR – The Foundation of the API
VI. Analysis of the California API
VII. An Alternative Approach to the API
VIII. Summary of Findings
IX. Recommendations
Bibliography
Appendices
Appendix A: Detailed Summary of SB 1552 Changes to PSAA (SB 1X)
Appendix B: The 2001 Base Academic Performance Index (API): Integrating the California Standards Test for English-Language Arts into the API
Appendix C: Literature Review of Candidate Variables
Appendix D: Analysis of the Reliability and Validity of the Current Computational Method for the API


Acknowledgments

CREDO would have been unable to complete this project without the help of several individuals. Their mention here seems a minimal tribute for what, for some, was a substantial investment of time and attention. The thinking on this project benefited greatly from feedback from Robert J. Spurlock and Brian Edwards of California’s Office of the Secretary for Education. Their comments during the early stages of analysis helped the authors to better understand the issues involved in the project, as well as to look at the recommendations from both a short-term and a long-term perspective. Their constructive review of the draft report substantially improved the final product. Linda Lownes of the California Department of Education was willing to give of her time, in person or on the telephone, to help us understand issues related to the STAR program and the API. Her comments were helpful, and her availability is appreciated. Finally, this project could not have been completed without the willingness of personnel at state departments of education throughout the country to answer questions about their states’ accountability activities. These individuals not only told us about current activities, but also shared their insights about the evolution of their accountability systems, including plans for the future. For all the contributions of these people to this project, we are thankful. We absolve all parties, however, of any responsibility for any shortcomings of this report.


List of Tables

Table 1: Classification of States by the Number of Grade Levels Assessed in 2001
Table 2: Classification of States by the Type of Assessments Used in 2001
Table 3: Type of Assessments Being Used in States and the Grade Levels Being Assessed
Table 4: Methods of School Accountability Used by States in 2001
Table 5: Variables Considered for Inclusion in the API
Table 6: Disposition of Candidate Variables
Table 7: Classification of States by the Type of Data Elements Used in School Rating Systems in 2001
Table 8: Candidate Variables by Type
Table 9: Strength of Relationship of Candidate Variable to Student Achievement
Table 10: Pattern of Classification Variables by Strength of Association with Student Achievement


Table 11: Quality Assessment of Candidate Variables
Table 12: Classification of States with Rating Systems by the Number of Grade Levels Assessed in 2001
Table 13: Classification of States by the Type of Analysis Model Used in School Rating Systems in 2001
Table 14: Benefits of the Suggested Changes to the API


Executive Summary

Background

The provisions of the Budget Act of 2000-01 (AB 1740) require an independent analysis of the state’s rating system for elementary and secondary public schools, known as the Academic Performance Index (API). The directive sought to explore the feasibility of expanding the API to include other factors beyond its current use of student scores from the Standardized Testing and Reporting (STAR) program. CREDO, a non-partisan research group at the Hoover Institution of Stanford University, conducted a comprehensive analysis of three variables mandated in the original legislation for inclusion in the API: graduation rates, student attendance rates, and certificated school personnel attendance rates. In addition, the analysis considered 29 other potential factors. Finally, the manner of incorporating STAR results into school scores was compared to the techniques used in other states. This report presents the findings.

Approach

The diversity in school accountability programs across the country provided a rich source of alternative approaches to the common interest in measuring school performance. The comparison revealed the strong standing of California with regard to several Best Practices: the accountability program uses a rating system to quantify and evaluate school performance; it incorporates nine grade levels of student test data based on both norm-referenced and criterion-referenced tests; and it is based on outcome measures of student performance. Study of other systems yielded other potential inputs to the API.

The study also called for examination of the set of information collected regularly from school districts and reported to the California Department of Education (CDE) to identify potential additions. Those sources were augmented through interviews with local school districts in an effort to identify other measures of student performance that might be collected for local purposes. These data were then analyzed against a set of performance criteria, grounded in the provisions of the legislation, to determine their suitability for inclusion in the API.

In addition to studying the input side of the API, the analysis focused on other ways to use test scores to generate the API. The method employed in California was compared to three other techniques used in other states. It was important to ascertain that the methods that CDE uses to calculate the API for schools and disaggregated populations of interest produce fair and accurate results for all schools.

Feasibility of Adding Graduation Rates, Student Attendance Rates, or Certificated School Personnel Attendance Rates to the API

At the time of this report, the three variables that are mandated for inclusion in the API are not defined, measured, or transmitted to the State Department of Education in such a way that they could be incorporated into the API with confidence. Significant changes to both state and local practice would need to occur before consistent and correct measures resulted.

Graduation Rates

The current definition of graduation rates, the percent of students entering 12th grade who graduate, fails to consider the numbers of students of that cohort who drop out, are retained, or graduate before or after the traditional 12 years of schooling. The computation needs a set of common standards for defining the population, for setting time limits for inclusion, and for treating students who do not stay in a high school until the end of twelfth grade. Improved tracking of students over time will be needed.

Student Attendance Rates

As currently reported, student attendance rates (called Average Daily Attendance, or ADA) do not lend themselves to inclusion in the API because they are calculated for groups of grades. Including student attendance rates will require better monitoring of students, both over the course of a day and over the school year. As well, unduplicated and uniform calculations of the rate will be needed, along with consistent application of rules for determining absences.

Certificated School Personnel Attendance Rates

None are reported to the state currently. A common set of definitions is needed, including uniform units of measure (number of periods, half day, full day, etc.) and consistent accounting rules for personal time, sick days, and professional development.

As well, the State Education Department will need to develop a reporting policy for the three variables to assure that districts provide the data in a timely fashion. Use of electronic transmission of reports is encouraged to minimize the sources of potential input error. The required steps to produce these three data elements will be made far easier when a common data system is available throughout the districts of the state. It makes little sense to undertake these steps apart from a larger effort to overhaul information systems.

Suitable Variables for Inclusion in the API

Thirty-two different factors were identified as candidate variables, but all of them failed one or more of the performance criteria. (The three mandated variables were included in the analysis.) Three failed due to lack of data in California. The remaining twenty-nine factors were studied for their congruence with student achievement. Twenty were found at best to be moderately related to student achievement, meaning that their inclusion in the API would reduce the ability to discern differences in school performance. The remaining nine variables showed strong association with student achievement and could add a multi-dimensional basis to the school scores. All nine, however, were eliminated on data quality grounds: insufficient variation in the variable values; consistency problems with collection or calculation; or bias that distorts what is measured. When more uniform data resources are developed in California, the nine variables strongly associated with student outcomes should be revisited for appraisal.

Choice of Aggregation Method for Producing School Scores

In the course of studying the accountability systems of other states, three other methods of computing school scores emerged. The current API is an example of a Static Measure of school performance, of which there are two variations. This approach was studied against the alternative, called a Student Change Model, which looks at changes for individual students over time. California has the chance to improve the performance of the API if it were to alter its choice of aggregation method. The analysis revealed five major ways that the API would benefit from the change from the cross-sectional Status Model to a Revolving Panel Student Change Model. The five benefits (a brief sketch contrasting the two models follows the list) are:

1. With the Student Change Model, measures of school performance for a single year are accurate.

2. With the Student Change Model, measures of progress in school performance are more accurate.

3. The Student Change Model produces a fairer picture of all schools, both in demonstrating improvement and in showing the preservation of gains.

4. With the Student Change Model, all eligible students contribute to the school score with equal weight.

5. Conditioned on an ability to track student moves across districts and certain legislative changes, the Student Change Model would allow for greater numbers of students to be deemed eligible for inclusion in the API every year.
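To make the distinction concrete, the following minimal sketch (Python, with toy data; the report does not prescribe these formulas) contrasts the two aggregation methods when school enrollment turns over:

    def status_score(scores_by_year, year):
        """Cross-sectional (Status Model): mean score of whoever tested in `year`."""
        scores = scores_by_year[year]
        return sum(scores.values()) / len(scores)

    def change_score(scores_by_year, year):
        """Student Change Model: mean gain for students matched across two years."""
        prior, current = scores_by_year[year - 1], scores_by_year[year]
        matched = prior.keys() & current.keys()  # revolving panel: matched students only
        return sum(current[s] - prior[s] for s in matched) / len(matched)

    # Toy data (student id -> score). Enrollment turnover drags the Status Model
    # down even though every matched student gained 10 points.
    scores_by_year = {
        2000: {"a": 600, "b": 640, "c": 700},
        2001: {"a": 610, "b": 650, "d": 500},  # "c" left; low-scoring "d" arrived
    }
    print(round(status_score(scores_by_year, 2001), 1))  # 586.7: looks like decline
    print(change_score(scores_by_year, 2001))            # 10.0: real gains of stayers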

The capability exists today to move to a district-level revolving panel that tracks students for as long as they remain in a district. Once a statewide unique student identifier is available, the basis of the revolving panel model could be expanded to track students as they move within the state. If the change were adopted, the performance of the API could improve: the precision of the API calculation would increase, and the longitudinal approach could support inclusion of a wider segment of the student body than is currently used. The statewide approach would also support a richer analysis of the causal factors of student performance, enabling accountability to focus more clearly on those factors controlled by schools and districts.

Recommendations

Four recommendations complete the report. They are:

1. Until more robust measures are available, the API should continue to consist of STAR scores.

2. Use individual student gain scores on STAR tests to calculate the API.

3. Conduct regular quality assurance reviews of the API.

4. Establish long-term student outcomes to reflect the ultimate success of the PSAA.


I. Introduction

The Office of the Secretary for Education of the Governor’s Office (OSE) and the Department of Finance were charged by the California State Legislature in the Budget Act of 2000-01 (AB 1740) to select a contractor to report on:

1. Data collection and data collection alternatives for potential additional factors to be included in the Academic Performance Index (API);

2. Recommendations regarding the most cost-effective and most feasible methods for including factors, in addition to test scores, in the API; and

3. Options to be presented, upon request, to the State Board of Education, including the fiscal impact of each new factor, specific processes for capturing the new data, and feasible timeframes for inclusion in the API.

The legislature wanted particular attention paid to certificated personnel and student attendance rates and graduation rates because of their reference in the Public School Accountability Act (SB 1X). The OSE and the Department of Finance selected CREDO, an independent, non-partisan research group at Stanford University, to perform the study. This report presents the results.

The enabling legislation for the Public Schools Accountability Act (PSAA) created three components. The API is the tool created to measure objectively and consistently the performance of schools in the state. The other two programs, the Immediate Intervention/Underperforming Schools Program and the Governor's High Achieving/Improving Schools Program, build on the API’s results with remediation and rewards. This report focuses only on the API and not on the two other complementary programs of the Public Schools Accountability Act. The report also does not address the alternate accountability system established for alternative schools.

As indicated in the Budget Act, the primary objective of the study was to examine possible data elements for inclusion in the API, as well as the best way of including the new elements. This examination focused on data collection options, as well as whether the data elements would meet the legislative requirements for the API. It was important that any recommendations for change in the API position California for viable longer-term development of its accountability system; care was needed to avoid evolutionary dead-ends. Our research points to desirable changes both immediate and longer term. We assume that future changes in legislation are feasible, and tie longer-run recommendations to the changes that would be required to make them possible.

The project was especially challenging because public school accountability systems are new. Both policy makers and researchers recognize that the systems adopted to date across the states are experimental and evolving. They are experimental in the sense that they are structured differently in different states, states do not have a long record of experience with them, and no one is certain which approach offers the best picture of school performance. This uncertainty is compounded by the fact that in California and elsewhere, school accountability systems treat student-level test scores in a way that differs from their original intended use. Accountability systems are evolving as states learn how to make them more accurate. In California, this intent is clearly reflected in the legislation.

The analysis was greatly aided by the study of accountability practices in other states. Parallel experience in other states offers the chance to leverage the learning curve elsewhere and keep the California system at the front of the curve. Based on the comparisons, we have derived a general developmental outlook that arrays the systems progressively along a number of dimensions. For each of the features of the API, the developmental approach identifies Minimum, Better, and Best practices. The developmental outlook helps both to locate the current California API and to point the way for its refinement. The comparison revealed the strong standing of California with regard to several Best Practices: the accountability program uses a rating system to quantify and evaluate school performance; it incorporates nine grade levels of student test data based on both norm-referenced and criterion-referenced tests; and it is based on outcome measures of student performance.

The findings show the California API in a positive light. The current construction of the API, which relies on aggregated test scores on the Standardized Testing and Reporting (STAR) exams, yields an empirical picture of the academic performance of students in California schools. The initial decision not to incorporate certificated personnel and student attendance rates and graduation rates into the API is supported by the poor quality of the currently available data, which are collected inconsistently across California districts. The state is in a strong position to enhance the sophistication of the API by migrating to a system that focuses on student-level gains in knowledge and controls for factors outside the schools’ control. The findings support several immediate and near-term modifications to the analysis of API data. Longer-run changes, and the legislative changes necessary to support them, are also identified.

Background of the API

The Public Schools Accountability Act was passed as Senate Bill (SB) 1X in the 1999 Special Legislative Session. The legislation specifically calls for “an immediate and comprehensive accountability system to hold each of the state's public schools accountable for the academic progress and achievement of its pupils within the resources available to schools.” (Section 52050.5 (d)) To that end, the Superintendent of Public Instruction was directed to develop an Academic Performance Index “to be used to measure the performance of schools, especially the academic performance of pupils, and demonstrate comparable improvement in academic achievement by all numerically significant ethnic and socioeconomically disadvantaged subgroups within schools.” (Section 52052 (a))

The legislation was revised with the enactment of SB 1552 in September 2000. The revisions concerning the API strengthened the rules for delineating populations of interest, clarified permissible API exclusions due to student mobility, and provided specific direction for setting improvement targets for schools. A side-by-side comparison of the two bills is included in Appendix A.

The California Department of Education (CDE) established the Advisory Committee for PSAA to translate the intent of the legislation into functional specifications for the API. With those directions, a Technical Design Group consisting of statisticians, assessment experts, and CDE policy staff formulated the API calculations and associated comparative benchmarks. At the point this report was prepared, the API had been in effect for three years in California.

Overview of the Report

This report is presented in the following way: Section II gives a summary of the approach used in this project. In Section III, we review the logistics of collecting student and certificated personnel attendance and student graduation information from schools and districts. In Section IV, we review the multiple performance requirements of the API. These include legislative mandates, current operational constraints, and the practical objectives that the API must satisfy. Since the foundation of California’s accountability system is the STAR program, Section V gives a brief analysis of the current testing program and the planned changes to it. This discussion helps to introduce the differences between the STAR tests and the API. Section VI presents the analysis of the California API using the developmental framework for accountability systems and the performance criteria developed in Section IV. For each of the performance criteria, we define key dimensions of the API and show where in the progression for each dimension California falls. For illustration, we also show other states. In each area, we discuss the opportunities and challenges involved with evolving the API in the future. In Section VII, we review alternate ways to calculate the API and discuss the benefits of changing approaches. A summary of findings appears in Section VIII, and our recommendations for changes to the API and associated systems appear in Section IX.


II. Research Approach

A brief description of the project is included here. The project was conducted during the second half of 2001. Information and data were obtained from a large number of sources. Because school accountability practices are changing so rapidly, even recent publications were at risk of being outdated. We circumvented this problem by augmenting an extensive scan of current publications with in-depth interviews with officials in every state education department in the nation.

To minimize the time asked of busy public servants, a profile of each state was prepared. Information was drawn from the web sites of the state education department, state legislature, state board of education (if separate from the department of education), and various national associations of education organizations. In addition, the Center for Policy Research in Education had compiled reports on state accountability plans in 2000, and they investigated some of the same questions as this effort. Finally, a special report by Education Week published in January 2001 focused in part on accountability systems, and these data were incorporated into our work.

With the assistance of the Secretary for Education, we were able to secure telephone interviews in all 49 other states. The interviews followed a rigorous protocol to obtain consent and lasted between one and three hours. Without the patient assistance of the respondents who agreed to participate, this report would not be as current as it is. Even with their participation, the data have been augmented to reflect recent changes in the systems in several states that occurred after our interviews were completed.

Within California, the study protocol called for an examination of those data and indices that are routinely reported to the California Department of Education. We compiled all of the regular reports that schools and districts file with the CDE. In addition, we interviewed school district personnel, typically in the Assessment or Evaluation units, to see if any districts were collecting data on student outcomes for their own purposes that augmented the measures required by the state.


III. Logistics of Collecting Data on Teacher and Student Attendance Rates and Student Graduation Rates

A key purpose of this report was to explore the feasibility of collecting data from schools or districts on teacher and student attendance rates and student graduation rates. Based on the information garnered in California and other states, data can be collected if the state does the following:

1. Standardizes definitions,
2. Has schools report the information to the state, and
3. Monitors how schools or districts collect the information.

The purpose of this section is to review the practice in other states that use the three variables, the status of how the schools or districts in California collect these elements, and how adoption of the three practices above would allow California to include the variables in the API. This section addresses only the collection and reporting mechanisms that are in use or that could be adopted. In the following section, we examine the question of whether adding any of these factors to the API is desirable.

Based on our findings from other states and from a sample of California school districts, we believe that certificated personnel and student attendance rates and student graduation rates can be collected without the implementation of CSIS, but that collection would require standardizing definitions and monitoring school and district processes. CDE will also need to develop a reporting policy for the three variables to assure that districts provide the data in a timely fashion. Use of electronic transmission of reports is encouraged to minimize the sources of potential input error. The required steps to produce these three data elements will be made far easier when a common data system is available throughout the districts of the state. Given the large amount of effort required to standardize definitions and monitor district processes, it makes little sense to undertake these steps except as a component of a larger effort to overhaul information systems.

Certificated Personnel Attendance Rate

We learned little about how to collect this variable from other states because no state is using certificated personnel attendance in its computational model for accountability. However, two states, Nevada and Pennsylvania, do include the variable in their school profiles, which are similar to California’s School Accountability Report Card (SARC).

In California, districts record certificated personnel attendance for payroll and to make sure each classroom has a teacher. The detail of the information collected varies, ranging from fractions of an hour, to the number of periods away, to whether a person is gone a full day or half day. In addition, districts collect varying levels of detail on why a person is absent (e.g., sick, professional development, doctor/dentist). The information is self-reported and not monitored for correctness. At the end of each month, reports are generated for administrators about overall attendance rates, and individual reports are created for teachers so they can monitor how many available sick days have been used.

Several issues need to be addressed before certificated personnel attendance information can be utilized in the API. First, definitions need to be standardized, particularly those dealing with the units of time being measured. This issue is critical because of the difference between the elementary and secondary school day. Many districts have addressed the issue already because elementary schools and middle schools can be in the same district; consequently, the focus of standardizing the definition needs to be across districts. A related issue concerns variation in the length of a day. At the secondary level some teachers teach more classes than others, so it needs to be decided whether an extra-duty teacher gets credit for more than a full day of work or only one day. This applies to teachers doing in-class instruction as well as teachers involved in extra-curricular activities, like coaching or advising the debate team. A second definitional area that needs to be clarified is what circumstances constitute an absence. For example, consistent accounting would mean that a teacher who is not in the classroom but is mentoring a new teacher or developing a new district curriculum would be treated the same way in all districts. The need for uniformity goes beyond accounting to include different notions of professional development. Standardizing the reasons for absences will allow the state the flexibility to include certain absences in the attendance rate if it so chooses.

Second, the mechanics of collecting attendance information, storing it, and calculating an attendance rate should be monitored. The monitoring process may range from within-year and across-year analyses of the data to actual site visits. This activity may be done by the state or by county offices of education. Monitoring also needs to occur in terms of the reasons given for an absence. Verification could be formal (e.g., completing a form describing the activities participated in) or informal (e.g., a brief presentation at a faculty meeting). However it is done, the reasons for absences given by personnel need to be monitored by districts in order to assure accuracy in the resulting statistics.

Third, the state must decide how the information is to be reported and how often reports are to be sent in. Districts generate a variety of reports each month from the data on certificated personnel attendance, along with a series of yearly reports. We suggest that certificated personnel attendance be reported at the same time as student attendance. Based on current procedures, certificated personnel attendance would be reported three times each year.
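To make the unit-standardization problem concrete, the sketch below (Python; the full-day-equivalent conversions and the set of excluded absence reasons are illustrative assumptions, not state definitions) computes one possible standardized attendance rate from heterogeneous absence records:

    # Assumed conversion of locally reported units to full-day equivalents,
    # here based on a hypothetical six-period secondary day.
    FULL_DAY_EQUIVALENTS = {"full_day": 1.0, "half_day": 0.5, "period": 1.0 / 6}

    # Assumed policy choice: absences for these reasons do not count against the rate.
    EXCLUDED_REASONS = {"professional_development", "mentoring", "curriculum_work"}

    def attendance_rate(absences, staff_count, service_days):
        """absences: list of (units, unit_type, reason) records for one school year."""
        total_days = staff_count * service_days
        absent_days = sum(
            units * FULL_DAY_EQUIVALENTS[unit_type]
            for units, unit_type, reason in absences
            if reason not in EXCLUDED_REASONS
        )
        return 1.0 - absent_days / total_days

    absences = [(2, "full_day", "sick"), (3, "period", "doctor"),
                (1, "full_day", "professional_development")]
    print(round(attendance_rate(absences, staff_count=30, service_days=180), 4))  # 0.9995

Whatever conventions the state chose, the point of the sketch is that the conversion table and the excluded-reason list would have to be identical across districts before the resulting rates could be compared.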


Student Attendance Rate

Ten states are collecting student attendance information for their accountability systems. The method of collection varies from state to state. For example, Kentucky requires that schools calculate an average attendance rate and furnish the result to the state department of education, which monitors school processes through periodic site visits. Alternatively, Louisiana requires schools to send the state attendance information for each student, and then the state makes the calculations. In states with a statewide MIS network (e.g., New Mexico), schools input attendance data and the state calculates the average daily attendance.

In California, student attendance is maintained at the school level and reported to the state three times a year. The reports are unusual for two reasons. First, they overlap in the time frame of interest: the first report covers July 1 to December 31, the second covers July 1 to April 15, and the third covers July 1 to June 30. Because the three reports cover overlapping time frames, comparisons can be made to determine attendance patterns. Second, the reports require schools to calculate the average daily attendance (ADA) for a band of grade levels, not individual grade levels. Specifically, a school sends in the ADA for Grades 1-3, 4-6, 7-8, and 9-12. While the final tabulations may fit a variety of State Education Department purposes, the data as currently reported do not lend themselves to easy integration into the API.

Based on California’s current data collection, three recommendations arise for improving the quality of student attendance data. First, standardized definitions of what constitutes student attendance are needed. The need is acute for secondary schools. Although teachers take attendance during every class, only the attendance from one period is reported to the school office and recorded; variable attendance in the rest of the day is not reported. If student attendance is to be included in the API, we recommend a change in the current method of recording attendance so that daily attendance reflects how many classes are actually attended. The same idea can also be applied to the elementary grades to deal with those cases when a student is in attendance for only part of a day because of illness or a doctor/dentist appointment.

Second, the current level of state monitoring of data collection and calculation, required for fiscal accountability, should be expanded to include more safeguards. One option would be for each teacher to maintain a spreadsheet of student attendance for each class being taught. At the elementary level, each teacher would maintain one spreadsheet, while at the secondary level a teacher would maintain multiple spreadsheets. The spreadsheets could then be sent to the state at specified intervals for calculation of ADA. The advantage of this method is that the state would have control over the calculation of the ADA and could try different methods. In addition, the student attendance data could be used for other purposes (e.g., analyzing STAR data) and could eventually be added to the California School Information System (CSIS) data, when that system becomes available.

Third, the current reports submitted to the state should be modified to improve the quality of the data. The standard of preserving the maximum degree of detail should dictate the level of disaggregation over time. Considering the current level of local data collection and information systems, the ADA should be calculated by grade level, not for clusters of grades, for the time being. Additionally, the report should be modified to include enrollment information, so that the state will have the option of doing additional analysis of the data. When improvements to school and district MIS systems are made, including the use of statewide unique student identifiers, the reports should be modified to require that individual-level attendance data be reported.
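The following sketch (Python; the record layout and the per-period fractional-day convention are illustrative assumptions) shows the direction of these recommendations: ADA computed by individual grade rather than by grade band, with fractional days and an unduplicated enrollment count:

    from collections import defaultdict

    def ada_by_grade(records, days_in_session):
        """records: (student_id, grade, periods_attended, periods_scheduled) per day."""
        attended = defaultdict(float)   # grade -> sum of fractional days attended
        enrolled = defaultdict(set)     # grade -> distinct students (unduplicated)
        for student_id, grade, periods_attended, periods_scheduled in records:
            attended[grade] += periods_attended / periods_scheduled
            enrolled[grade].add(student_id)
        # Returns grade -> (ADA, enrollment), so enrollment travels with the rate.
        return {g: (attended[g] / days_in_session, len(enrolled[g])) for g in attended}

    records = [
        ("s1", 9, 6, 6), ("s2", 9, 3, 6),   # day 1: s2 left after a half day
        ("s1", 9, 6, 6), ("s2", 9, 0, 6),   # day 2: s2 absent
    ]
    print(ada_by_grade(records, days_in_session=2))  # {9: (1.25, 2)}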

Student Graduation Rate

Graduation rates are composite indicators that require the use of two separate figures: the number of graduates and the size of the reference population on which the rate is based. Each of the two figures can be calculated in different ways, and the resulting graduation rates can look dramatically different. This pattern was evident in the study of other states. Five states are using a graduation statistic in their accountability computation model. Four of the states use the same calculation as California (the number of twelfth grade students graduating divided by twelfth grade enrollment from the previous fall). One state, Michigan, uses ninth grade enrollment from four years prior to graduation as the denominator.

Current California practice is guided by the information already collected through the California Basic Education Data Systems (CBEDS). Specifically, schools report the number of twelfth grade graduates from the previous year, which includes the number of summer graduates but does not include students who received high school equivalencies. Schools also report grade-level enrollment for major ethnic groups and student gender. Because of the information collected each year, California could calculate a graduation rate using either ninth or twelfth grade enrollment as the denominator.

As with the attendance variables, our recommendations focus on standardizing definitions, monitoring the process, and getting reports from districts. With respect to definitions, we believe the state has a unique opportunity to improve the graduation rate by making sure it is both a reasonable construct and computationally sound. The current method of calculating the statistic means that it is closer to a twelfth grade completion rate than a graduation rate. A high school graduation rate can be calculated by linking ninth grade student information from three years prior with twelfth grade information. The linkage can be done currently in some districts because they assign students unique identifiers when they enter the district, so student move-ins, move-outs, and dropouts can be tracked from year to year. For districts without unique student identifiers, tracking can still be done by a hand comparison of student enrollments across years. In both cases, students will be included in calculations unless they transfer to another school. This process would allow for inclusion of the graduation rate in the API and help districts progress to the time when CSIS is implemented.

A second definition that needs to be improved is what counts as a graduate. The current definition focuses only on twelfth grade students who graduate. However, students can complete elementary and secondary education in less than twelve years, which means they are not included in the data. There are also students who take longer than twelve years, but they are included only if they complete requirements before the start of the next school year. Consequently, the current data definition ignores students from two important segments of the student population.

With respect to monitoring, we recommend that districts be audited periodically on their data collection procedures for student enrollment and student graduates, the two components of the graduation rate. Monitoring would focus on the time period that districts use to calculate graduates and how they determine whether a student is still enrolled in the district. Monitoring may also include determining whether a district has a unique student identifier, and if not, what steps are being taken to develop one.

In terms of reports, the format of the information currently being given to the state will need to be changed. As indicated, schools and districts submit summary statistics through CBEDS. If the state is going to calculate a graduation rate based on unique student identifiers, schools and districts will have to submit student-level data rather than school-level data. Consequently, CBEDS will need to be modified, or the student-level information can be submitted separately. The other option is for districts to calculate a graduation rate for each school and then submit the results to the state. This option would require that the calculation method and process be monitored for correctness and accuracy.
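To show how the completion-style rate and a cohort rate can diverge, here is a minimal sketch (Python, with toy data; the identifiers and the rule of removing verified transfers from the denominator are our assumptions, not CBEDS definitions):

    def twelfth_grade_rate(graduates, fall_grade12_enrollment):
        """Current California method: closer to a 12th-grade completion rate."""
        return graduates / fall_grade12_enrollment

    def cohort_rate(grade9_cohort, graduate_ids, transferred_out_ids):
        """Cohort method (cf. Michigan): graduates among entering 9th graders,
        with verified transfers removed from the denominator."""
        denominator = grade9_cohort - transferred_out_ids
        return len(denominator & graduate_ids) / len(denominator)

    # Toy cohort: 5 entering 9th graders; 1 transfers out, 1 drops out, 3 graduate.
    grade9 = {"s1", "s2", "s3", "s4", "s5"}
    grads = {"s1", "s2", "s3"}
    moved = {"s5"}
    print(twelfth_grade_rate(3, 3))           # 1.0: the dropout never reaches the count
    print(cohort_rate(grade9, grads, moved))  # 0.75: dropout "s4" stays in the base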


IV. Essential Performance Characteristics for California’s API

The Academic Performance Index was created to give policy makers and the general public an objective and consistent measure of school performance in California elementary and secondary schools. This laudable purpose, however, could be interpreted in many ways and lead to a variety of approaches. Indeed, the diversity of accountability programs across the country is testament to how differently the same intent can be translated into practice. To select the best alternatives from among the many options for the API, CREDO constructed a set of criteria describing what the API must accomplish if it is to provide policy makers with the clearest picture of the schools in the state.

The language of the authorizing legislation provided initial guidance about how the API should be structured. These mandates established some parameters of what is to be measured and who is to be measured. The legislative requirements for the API are reviewed below. As far as it goes, the legislation helps specify the system, but there is still considerable latitude within those boundaries. We have therefore developed other requirements that describe the ways the API must perform if it is to be an effective and accurate tool for policy making. The full set of performance requirements (sometimes called functional requirements in systems analysis) serves as the evaluative criteria in the analysis of the current API and the various options for change.

Legislative Requirements

Education Code Sections 52051-52058 contain the authorization for the Public School Accountability Program. The Academic Performance Index is one of three elements of the Public School Accountability Program; the other two are the Immediate Intervention/Underperforming Schools Program and the Governor's High Achieving/Improving Schools Program. The details of the statute include six requirements that the API must satisfy. These are:

L1. The API should be outcomes-focused.

“It is also the intent of the Legislature that the comprehensive and effective school accountability system primarily focus on increasing academic achievement.” (52050.5 (j))

“By July 1, 1999, the Superintendent of Public Instruction, with approval of the State Board of Education, shall develop an Academic Performance Index (API), to measure the performance of schools, especially the academic performance of pupils…” (SB 1X, 52052.(a1)) (emphasis added)

Paradoxically, although the legislation indicates that the API’s emphasis should be on student outcomes, SB 1X creates a tension between means and ends by requiring that the API include certificated personnel and student attendance rates and student graduation rates.

L2. The API should comprehensively reflect the performance of students.

“It is in the interest of the people and the future of this state to ensure that each child in California receives a high quality education consistent with all statewide content and performance standards, as adopted by the State Board of Education, and with a meaningful assessment system and reporting program requirements.” (SB 1X, 52050.5(b))

[The API should] “demonstrate comparable improvement in academic achievement by all numerically significant ethnic and socioeconomically disadvantaged subgroups within schools.” (SB 1X, 52052.(a1))

Although the emphasis of the API is to make sure all students get a good education, not all students are included in the API. Specifically, students in kindergarten and in grades one and twelve do not participate in the STAR assessment program and so are not included in the API. Additionally, alternative schools do not generally participate in the API.

L3. Specified statewide tests must constitute at least sixty percent of the value of the API. (SB 1X, 52052.(a.3A))

The intent to rely predominantly on standardized tests of student academic achievement is clear. The STAR tests provide a common foundation across schools for assessing the performance of students.
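The following minimal sketch illustrates the constraint (Python; the component names, weights, and the assumption that all components are expressed on a common index scale are ours for illustration, not the actual API formula):

    def composite_api(components, weights, test_components):
        """Weighted composite in which statewide tests must carry >= 60% of the value."""
        test_weight = sum(weights[c] for c in test_components)
        assert test_weight >= 0.60, "statewide tests must carry at least 60% of the API"
        assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to one"
        return sum(components[c] * weights[c] for c in weights)

    # Hypothetical components, already rescaled to a common index scale.
    components = {"star": 720.0, "attendance": 950.0, "graduation": 880.0}
    weights = {"star": 0.60, "attendance": 0.20, "graduation": 0.20}
    print(composite_api(components, weights, test_components={"star"}))  # 798.0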

L4. The accountability system should contain assessment information on students in at least grades 2 through 11.

Although the API legislation does not specify any grade levels, SB 1X does indicate that the API includes the results of the STAR program, and Senate Bill 376 (1997-1998 session) requires that the STAR program include an assessment of students in grades two through eleven. The value of including STAR results in the API is threefold: schools are able to monitor their performance over time; similar schools can be compared to each other; and policy makers and others can track school performance against changes in state policy. Moreover, the legislation is explicit in the expectation that schools be held “accountable for the academic achievement and progress of its pupils”. (SB 1X, 52050.5(d)) (emphasis added) This requirement is also consistent with the new federal law, the Elementary and Secondary Education Act, which requires annual testing of students in grades 3-8.

L5. The statewide accountability system must be easily accessible and understandable to parents and others. (52050.5(h))

The API needs buy-in from affected constituencies. At a minimum, parents and other interested parties should understand the concept behind the API well enough to be able to trust the results. The API is also intended to boost the accountability of schools to parents. Having information on the performance of schools readily available to parents is one of the key justifications for building accountability systems, so accessibility of the results and related documentation is an important requirement.

L6. The API should be a composite index rather than a narrative or single indicator.

“The API shall consist of a variety of indicators currently reported to the State Department of Education” (SB 1X, 52052.(a.1 and 3))

Additional Requirements

The language of the Public School Accountability Act expressed the Act’s legislative intent in terms of the public policy goals the Act sought to further. Those intentions can be translated into criteria that must be satisfied if the API is to function optimally. In addition, the API has several other mathematical requirements to meet if it is to provide a fair and accurate measure of the impact of the legislation. Finally, in recognition of the current budgetary environment, we have added a criterion that recognizes the obligations of the state to pay for legislative mandates. We acknowledge the importance of political factors in shaping the future options for the API, but exclude them from the present analysis. Three additional requirements are presented below; a fourth is presented in Section VII.

A1. The API and its components need to directly and defensibly support the outcomes of interest.

The legislation specifies only one outcome – student academic achievement and progress. Given the detail devoted to student achievement and the complete absence of language about any other outcomes, the only defensible construction of the API under the current legislation pertains to student academic achievement.

A2. Individual components of the API must meet minimum standards for data quality so they can be positive contributors to the results.

Since the API requires combining performance indicators into a single index, there are several practical considerations that must be met in order to make the API work as intended. Elements of the API should exhibit meaningful variation across schools; if every school had the same aggregated STAR score, for example, it would not be useful in differentiating good schools from bad. Each element should be collected and reported consistently across schools. Finally, each element must accurately reflect concepts related to students and schools. Inaccuracy can arise from different sources: items like suspension rates can be skewed by inaccurate base populations on which they are calculated; measures like proportions of certificated personnel can be so general that they include principals, teachers, and school counselors, even though each has a different role in student learning; or a measure may only be relevant for certain grades or schools.
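As an illustration, the screens described above could be operationalized along these lines (Python sketch; the thresholds and example data are assumptions, not values used in this study):

    from statistics import mean, stdev

    def has_meaningful_variation(values, min_cv=0.05):
        """Fails if schools barely differ (coefficient of variation below min_cv)."""
        return stdev(values) / mean(values) >= min_cv

    def is_consistently_reported(values, max_missing_share=0.02):
        """Fails if too many schools did not report the element at all."""
        missing = sum(1 for v in values if v is None)
        return missing / len(values) <= max_missing_share

    # A candidate variable with almost no spread adds noise, not signal.
    suspension_rates = [0.031, 0.030, 0.031, 0.032, 0.031]
    print(has_meaningful_variation(suspension_rates))   # False
    print(is_consistently_reported([0.9, None, 0.95]))  # False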

A3. The API should maximize use of existing data to the extent consistent with the previous criteria.

If the state adds data elements to the API that are not currently being collected and are not specified in the Public Schools Accountability Act, then the state will be responsible for school and district costs associated with legislatively mandated data collection.


V. STAR – The Foundation of the API

Currently in California, tests from the Standardized Testing and Reporting (STAR) program are the only factors included in the API. Consequently, much of the public debate about the API blurs the distinction between STAR and the API. In reviewing STAR in this section, we hope to clarify the differences. As well, the testing system is slated for changes over the coming years; this has important implications for its role as the foundation of the API.

The STAR program was established in 1997 to provide a common method of measuring the achievement of students in grades two through eleven in California. For the norm-referenced assessment, the California State Board of Education chose the Stanford Achievement Test Series, Ninth Edition (SAT-9), Form T, published by Harcourt Educational Measurement.

Students in grades 2 through 8 are tested in the basic skills of reading, spelling, written expression, and mathematics. Students in grades 9 through 11 are tested in reading, language, mathematics, history-social science, and science. Individual student scores and aggregate scores resulting from the administration of the SAT-9 are reported to teachers, administrators, parents, governing boards of school districts, county boards of education, and the State Department of Education. Those test scores and associated national percentile scores are then used to calculate each school’s API.

The STAR program is notable both for its comprehensiveness and its frequency. As shown in Table 1, across the testing practices of the states, only seven other states include as many grades in their state testing programs as California. While virtually all states administer tests each year, unless the grade span is inclusive, it is difficult to capture individuals or groups of students as they move from grade to grade. As will be discussed later, the annual testing feature of STAR provides untapped potential for the state school accountability system.

SAT-9 is a nationally normed standardized test that yields measures of student learning in varying cognitive areas depending on grade level. Tests taken by California students are compared with a nationally representative sample of test-takers to establish national percentile rank scores. States can use the test with confidence that the results are tolerably reliable, stable over time and a fair measure of student knowledge of the test material. The test is constructed using a set of national learning standards. Consequently, to the extent that a state or a district uses a sequence or timing of instruction that differs from the basis of the SAT-9, the questions on the test will be less appropriate. The same holds true for differences in content material, such as the focus in fourth grade on California missions.
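For illustration, the sketch below (Python; the mid-rank tie convention and the tiny norm sample are our assumptions, since STAR relies on the publisher's norms tables) shows the basic mechanics of a percentile rank computed against a norming sample:

    from bisect import bisect_left, bisect_right

    def percentile_rank(score, norm_sample_sorted):
        """Percent of the norming sample scoring below `score`, counting ties at half."""
        below = bisect_left(norm_sample_sorted, score)
        ties = bisect_right(norm_sample_sorted, score) - below
        return 100.0 * (below + 0.5 * ties) / len(norm_sample_sorted)

    # Toy norm sample of ten scaled scores.
    norms = sorted([540, 560, 580, 600, 600, 620, 640, 660, 680, 700])
    print(percentile_rank(600, norms))  # 40.0: 3 below plus half of the 2 ties, of 10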


Table 1
Classification of States by the Number of Grade Levels Assessed in 2001

Minimum (Less than 5 Grade Levels): Connecticut, Georgia, Hawaii, Indiana, Iowa, Maine, Minnesota, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New York, North Dakota, Ohio, Oregon, Wisconsin, Wyoming

Better (5-8 Grade Levels): Alaska, Arkansas, Colorado, Delaware, Florida, Illinois, Kansas, Kentucky, Louisiana, Maryland, Massachusetts, Michigan, Missouri, New Mexico, North Carolina, Oklahoma, Pennsylvania, Rhode Island, South Carolina, Texas, Utah, Virginia, Vermont, Washington

Best (9 or More Grade Levels): Alabama, Arizona, California, Idaho, Mississippi, South Dakota, Tennessee, West Virginia

In 1999, the state expanded the STAR program by adding test sections consisting of items designed to align with state-adopted content standards in mathematics and English/Language Arts.1 In 2001, the results from the English/Language Arts standards-based test, along with the results of the SAT-9, were used to generate API scores for the 2000-01 school year. The state plans to include the results from the 2002 mathematics standards-based test (grades 2-7), the standards-based social science test (grades 9-11), and the high school exit exam to generate API scores for the 2001-02 school year. Other changes to the STAR program since 1998 include the addition (1) in 1999 of a Spanish-language academic achievement test for those students whose native language was Spanish and who had not been enrolled in the state for more than twelve months, (2) in 2001 of a writing test in grades four and seven, and (3) in 2001 of end-of-course assessments in science and history/social science classes in grades nine through eleven. The additional assessments are criterion-referenced tests, in which student performance is judged against a pre-established standard rather than against the performance of other students.

As indicated in Table 2, the pattern of starting with a norm-referenced test like the SAT-9 and adding criterion-referenced tests is one followed by other states as well. Because the use of both types of tests offers the best of both approaches, it constitutes the Best practice. Table 3 (below) displays the use of norm-referenced and criterion-referenced tests by grade for each state. Criterion-referenced tests tend to correlate closely with norm-referenced tests, so we can anticipate good performance of the new elements.

Table 2
Classification of States by the Type of Assessments Used in 2001

Minimum (Norm-Referenced Tests): Hawaii, Illinois, Indiana, Iowa, Maine, Minnesota, Montana, Nebraska, Nevada, New Jersey, New Mexico, North Carolina, North Dakota, South Dakota, Utah

Better (Criterion-Referenced Tests): Arkansas, Colorado, Connecticut, Kansas, Maryland, Massachusetts, Michigan, Missouri, New Hampshire, New York, Ohio, Oregon, Pennsylvania, South Carolina, Texas, Vermont, Wisconsin

Best (Norm- and Criterion-Referenced Tests): Alabama, Alaska, Arizona, California, Delaware, Florida, Georgia, Idaho, Kentucky, Louisiana, Mississippi, Oklahoma, Rhode Island, Tennessee, Virginia, Washington, West Virginia, Wyoming

With the new additions to the STAR program, there are several issues that are currently being addressed by the Technical Design Group for the API and will need attention in the future. First, although the English Language Arts test and the subsequent adoption of other tests will not affect the quality of the core SAT-9, the new tests will have their own performance characteristics and should be studied both on their own and in combination with the SAT-9. It will take several years to accumulate enough testing experience to gauge these effects with certainty. Until more is known about the additions, California is wise to retain a norm-referenced test as the core of the testing system. Decisions concerning the retention of norm-referenced tests in the long run should wait until the performance of the criterion-referenced tests is established.

Second, it is clear that a great deal of attention has been devoted to the quality of STAR, and this is likely to continue. The concern with the precision of the student-level tests has direct bearing on the API. Without minimizing the importance of the work that has


already occurred, it must be noted that the results do not translate directly to the API. The API is a different entity from STAR despite its use of STAR tests. The most important distinction is between the use of STAR as a means to test individual students and the aggregation of STAR scores to discern the performance of a school. Like many other states, California is adapting a testing instrument that is calibrated for individual use to perform a different function. As indicated, the Technical Design Group for the API has identified many of the issues raised here. Their continued attention to the problems associated with producing school scores and subgroup scores will ensure that the best information possible is provided to educators, parents, and legislators.
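The point can be illustrated with a small calculation. The sketch below uses invented scores to show one reason individual-test precision does not carry over automatically to a school aggregate: the sampling error of a school mean depends on how many students are tested, so two schools administering the identical instrument can produce aggregates of very different reliability.

```python
import math

def school_mean_and_se(scores):
    """Mean score for a school and the standard error of that mean."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(variance / n)

small_school = [52, 61, 47, 58, 66, 43, 55, 60, 49, 63]   # 10 tested students
large_school = small_school * 40                          # 400 tested students

for label, scores in [("small", small_school), ("large", large_school)]:
    mean, se = school_mean_and_se(scores)
    print(f"{label} school: mean = {mean:.1f}, standard error = {se:.2f}")
```

With the same student-level spread, the small school's aggregate is several times noisier, which is why school scores require their own reliability study apart from the underlying test.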


Table 3
Type of Assessments Being Used in States and the Grade Levels Being Assessed

State             Norm-referenced    Criterion-referenced
Alabama           3-11               5-7
Alaska            4,5,7,9            3,6,8,10
Arizona           2-11               3,5,8,10,11
Arkansas          5,7,10             4,6,8,11
California        2-11               2-11
Colorado                             3,4,5,7,8
Connecticut                          4,6,8,10
Delaware          3,5,8,10           4,6,8,11
Florida           3-10               3-10
Georgia           4,8                4,6,8,11
Hawaii            3,6,8,10
Idaho             3-8                4,8,9-11
Illinois          3,5,8-12
Indiana           3,6,8,10
Iowa              4,8,11
Kansas                               4-8,10-11
Kentucky          3,6,9              4,7,8,12
Louisiana         3,5,6,7            4,8
Maine             4,8,11
Maryland                             3,5,8,9,11
Massachusetts                        4-8,10-11
Michigan                             4,5,7,8,11
Minnesota         3,5,8,10
Mississippi       5-8                2-12
Missouri                             3-5,7-11
Montana           4,8,11
Nebraska          4,8,11
Nevada            8,10
New Hampshire                        3,6,10
New Jersey        4,5,8,11
New Mexico        3-9
New York                             4,8,12
North Carolina    3-8,10
North Dakota      4,6,8,10
Ohio                                 4,6,9,12
Oklahoma          4,5                5,8,9-12
Oregon                               3,5,8,10
Pennsylvania                         5,6,8,9,11
Rhode Island      4,8,10             3,7,10,11
South Carolina                       3-8
South Dakota      2-11
Tennessee         3-8                9-12
Texas                                3-8
Utah              3,4,5,8,10,11
Vermont                              2,4,6,8,10,11
Virginia          4,6,9              3,5,8
Washington        3,6                4,7,10
West Virginia     3-11               4,7,10
Wisconsin                            4,8,10
Wyoming           4,8,11             4,8,11


VI. Analysis of the California API

This section presents the analysis of the California API. The purpose of the study was to develop recommendations for modification of the API. The analysis tested how well the API satisfies each of the performance requirements introduced in Section IV. The results are presented here in four segments: How user-friendly and efficient is the API? What should the API measure? How should the API be structured? And who should be included in the API?

We constructed a set of tools to examine the API and how well it works. The tools draw on the collective experience in other states; in doing so, it is possible to view California's system in a broader context. Study of other states reveals the diversity of their accountability programs. They differ in many ways, such as the types of measures used in an index or how precise the final measures are. When the systems were compared on a given attribute, we found in most cases that they clustered into discrete and progressively sophisticated groups. The groupings are important because they present differing incentives to schools about their behavior. Thus, if a state could upgrade its system on any of these attributes, it would alter the incentives that schools face and produce a better result. We call these progressions developmental frameworks, and identify Minimum, Better, and Best practices within each one. They are used throughout this section to explain the California API.

The California API already compares favorably with other states' systems in its current organization. Our examination of accountability programs around the country shows that they fall into three groups. The developmental framework found in Table 4 clusters state programs by their complexity.

Simple description of schools provides a minimum level of information. The incentives are minimal in that they serve notice of some degree of review. Two states use single indicators to reflect school performance: Montana uses test scores (Iowa Test of Basic Skills); Utah uses test scores (SAT-9, state-developed criterion-referenced tests, a writing test, and a reading diagnostic test). In both cases, there is no judgment applied to the scores and no associated consequences. This level of activity represents the Minimum practice.

The use of profiles utilizing multiple measures improves on simple description and represents the Better practice. Schools face stronger incentives because more of their activities and results are studied. This advantage is offset by the fact that these profiles do not assimilate the information into a unified judgment about schools. Forty-seven states have developed profiles that compile a set of variables to present a multifaceted view of schools. In California, the profile is known as the School Accountability Report Card (SARC).

The Best practice methods were found in the thirty-one states that utilize a rating system made up of computations and evaluations to ascertain whether schools are performing according to established standards. These states provide schools with the strongest incentives since both judgment and consequences are involved.


Table 4
Methods of School Accountability Used by States in 2001

State             Minimum   Better   Best
Alabama                     X        X
Alaska                      X        X
Arizona                     X
Arkansas                    X        X
California                  X        X
Colorado                    X        X
Connecticut                 X        X
Delaware                    X        X
Florida                     X        X
Georgia                     X        X
Hawaii                      X
Idaho                       X
Illinois                    X
Indiana                     X
Iowa                        X
Kansas                      X
Kentucky                    X        X
Louisiana                   X        X
Maine                       X
Maryland                    X        X
Massachusetts               X        X
Michigan                    X        X
Minnesota                   X
Mississippi                          X
Missouri                    X
Montana           X
Nebraska                    X
Nevada                      X        X
New Hampshire               X        X
New Jersey                  X
New Mexico                  X        X
New York                    X        X
North Carolina              X        X
North Dakota                X
Ohio                        X        X
Oklahoma                    X        X
Oregon                      X        X
Pennsylvania                X
Rhode Island                X        X
South Carolina              X        X
South Dakota                X
Tennessee                   X        X
Texas                       X        X
Utah              X
Vermont                     X        X
Virginia                    X        X
Washington                  X
West Virginia               X        X
Wisconsin                   X        X
Wyoming                     X
Totals            2         47       31


These systems are the most similar to the California API and were chosen as the comparison group for the analysis that follows. Of the thirty-one states, all but one employ both profiles and rating systems.

Analysis: Is the API user-friendly and efficient?

Criterion L5: The API must be easily accessible and understandable to parents and others.

California is unique in having a legislative provision to assure that the API be available and comprehensible to parents and others. Placing performance information in the hands of affected parties is an important objective of the API. If the policy of holding schools accountable for performance as measured by student academic achievement is to be successful, at a minimum parents and other interested parties need to support the goals and accept the results of the program as credible. Public confidence determines political support, and public confidence rests on how open and understandable the accountability program is.

Interviews in other states indicated that accessibility of the school ratings was fairly easy to accomplish, as has been the case in California. Explanations of the program and results are posted on CDE web sites. Schools have copies of their results. Libraries and community resource centers have material. Translation of material into languages other than Spanish was the only barrier to access noted when local districts commented on API accessibility.

Ease of understanding of the API is less certain. This research did not conduct public surveys of parents or other groups, so empirical estimates are unavailable. However, consider the current computation of the API: it involves translating raw scores to national percentile ranks, proportional weighting, and percentage calculations. These concepts are unfamiliar to most people. On the other hand, some people have argued that classifying the API results into deciles has helped parents better understand the results (e.g., a school in the eighth decile is doing better than a school in the fourth decile). Consequently, it seems highly likely that the complexity needed for accuracy of the API will always be in tension with the desire for simplicity.
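To make the computation just described concrete, the sketch below illustrates the general style of calculation involved. The band cut-points, weights, and student scores are hypothetical stand-ins chosen for illustration; the actual API uses values and rules set by the California Department of Education.

```python
# Hypothetical five-band scheme: each student's national percentile rank (NPR)
# falls into a band, each band carries a weight, and the school score is the
# average weight across tested students.
BANDS = [(20, 200), (40, 500), (60, 700), (80, 875), (100, 1000)]  # (NPR below, weight)

def band_weight(npr):
    """Weight for the band containing a student's NPR (1-99)."""
    for upper, weight in BANDS:
        if npr < upper:
            return weight
    return BANDS[-1][1]   # safeguard; an NPR of 99 already falls below 100

def school_index(nprs):
    """Average band weight across all tested students in the school."""
    return sum(band_weight(n) for n in nprs) / len(nprs)

student_nprs = [12, 35, 48, 51, 67, 72, 80, 91]   # invented student results
print(f"school index = {school_index(student_nprs):.0f}")
```

Even this simplified version requires percentile ranks, band assignment, and a weighted average before the decile classification can be made, which underscores the tension between computational accuracy and ease of public understanding.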


Criterion A4: The API should seek to maximize the use of existing data.

Efficiency is a standard public policy goal. In the context of the API, efficiency is taken to mean leveraging to the greatest extent feasible the existing sources of data available to the state. Making use of existing resources avoids the mandated requirement in the California Constitution to underwrite the monetary costs associated with new reporting requirements. Conversely, inclusion of new data elements carries non-monetary costs of coordination, administrative refinement of protocols and procedures, and incremental adjustments in state oversight and quality review.

The current API involves only STAR scores, so it imposes no additional costs on districts and schools. Expansion of the API must consider the constraint of this criterion. The statute requires inclusion in the API of attendance rates for students and certificated personnel and of student graduation rates once the data and data-reporting systems for these variables are found to be dependable. Student graduation and attendance variables are currently reported to CDE, but their inclusion in the API was initially deferred for reasons of data quality. Certificated personnel attendance rates are not currently collected by the state. As mentioned earlier, these variables would all entail new investments in data collection, standardization, and compliance.

One of the prime reasons for this project was to critically assess the suitability of other variables for inclusion in the API. Variables were identified through a number of sources for this purpose. They consisted of:

· Variables currently collected by schools and reported to the California Department of Education in a variety of reports
· Additions to the API proposed in recent California legislation
· Measures routinely collected by local districts for assessment purposes that are not forwarded to the state
· Variables in rating systems of other states or under consideration

We also reviewed variables included in School Accountability Report Cards (SARCs). Each school supplies information to be incorporated into its report card. For example, schools must describe the quality of instructional materials, the qualifications of substitute teachers, the quality of school instruction and leadership, the school safety plan, teachers' professional development, the adequacy of teachers' evaluations, the college admissions test preparation program, and whether students are prepared to enter the workforce. Because the requirements for these elements are general, and the information supplied is largely subjective, schools have a degree of latitude as to what information to include. Consequently, the information is qualitative or opinion-based and not suitable for the API. We excluded these elements from further analysis. We also chose to exclude data from the Physical Fitness Assessment because the API focuses on academic performance. After duplicates and overlaps were excluded, we identified a total of 32 variables as candidates for inclusion in the API; they appear in Table 5. The variables are:

Class size
College entrance test results
Condition of school facilities
Course offerings
Dropout rate
Graduation rate*
Leadership and staffing requirements
Number of computers in a school
Number of fire drills
Number of instructional minutes
Number of non-credentialed teachers
Number of pupil hours in an intensive reading program in grades K-4
Number of pupil hours in an intensive algebra program in grades 7-8
Number of support personnel
Number of students enrolled in advanced classes
Number of high school graduates completing A-G course requirements for admissions to the University of California/California State University system (UC/CSU)
Parent/community satisfaction
Percent of students passing end of course examinations
Percent of students passing high school exit examinations
Percent of students successfully transitioning to post-secondary life
Percent of students taking the state test
Principal mobility
Retention rate
School crime rate
School expenditures
Student attendance rate*
Student mobility
Suspension rate
Teacher attendance*
Teacher mobility
Teacher salaries
Year-round school status

The variables that were explicitly mentioned in the SB 1X legislation are marked with an asterisk.


Table 5
Variables Considered for Inclusion in the API

Construct | Variable | Data Source (1) | States with Rating System that Includes Variable

Included in the Public School Accountability Act
Certificated Personnel Attendance Rate | Teacher Attendance | Local |
Graduation Rate for Students in Secondary Schools | Graduation Rate | CBEDS, SARC | GA, MI, OH, OK, SC, VA
Pupil Attendance Rate | Student Attendance Rate | J-Series, SARC | KY, LA, MD, NV, NM, OH, OR, SC, WV, TX

Proposed in recent California legislation
Access to Technology | Number of Computers in a School | CBEDS |
Condition of School Facilities and Grounds | Condition of School Facilities | Local Districts |
Dropout Rate | Dropout Rate | CBEDS, SARC | KY, LA, MD, MI, NM, NY, NC, OK, OR, SC, TX, WV, VT
Number of Honors and Advanced Placement Courses Offered at the High School Level | Number of Students Enrolled in Advanced Classes | CBEDS (limited), Local Districts |
Number of Students who take the Preliminary Scholastic Aptitude Test and the Scholastic Aptitude Test | College Entrance Test Results | College Board, SARC |
Number of Teachers who are not fully Certified but are assigned to Classroom Teaching | Number of Non-Credentialed Teachers | CBEDS, SARC |
Principal Transiency | Principal Mobility | Local Districts |
Student Transiency | Student Mobility | STAR Header Sheet |
Suspension Rate | Suspension Rate | SARC |
Teacher Transiency | Teacher Mobility | Local Districts |
Whether a School Operates Year Round | Year Round School Status | CBEDS, SARC |

Additional Variables Available in California
Class Size | SARC
Number of Instructional Minutes | SARC
Number of Graduates Completing A-G Course Requirements for UC/CSU | SARC
Number of Pupil Hours in Intensive Reading, Gr. K-4 | J-Series
Number of Pupil Hours in Intensive Algebra, Gr. 7-8 | J-Series
Number of Support Personnel | SARC, CBEDS
School Expenditures | J-Series
Teacher Salaries | SARC

Variables In Use or Proposed in Other States
Course Offerings | Local Districts | VA
Leadership and Staffing Requirements | Unavailable | VA
Number of Fire Drills | Unavailable | VA
Parent/Community Satisfaction | Local Districts | NM
Percent of Students Passing End of Course Examinations | GSE Results | GA
Percent of Students Passing HS Exit Examinations | CDE (in the future) | GA
Percent of Students Successfully Transitioning to Post-Secondary Life | Unavailable | KY
Percent of Students Taking the State Test | STAR data | OR
Retention Rate | Local | KY
School Crime Rate | SARC | NM

(1) Legend for Data Source Acronyms: CBEDS = California Basic Education Data System; SARC = School Accountability Report Card; J-Series = the Principal Apportionment forms; STAR = Standardized Testing and Reporting; GSE = Golden State Exam; CDE = California Department of Education


Table 6
Disposition of Candidate Variables

Candidate variables (32):
1. Class size
2. College entrance test results
3. Condition of school facilities
4. Course offerings
5. Dropout rate
6. Graduation rate
7. Leadership and staffing requirements
8. Number of computers in a school
9. Number of fire drills
10. Number of instructional minutes
11. Number of non-credentialed teachers
12. Number of pupil hours in an intensive reading program in grades K-4
13. Number of pupil hours in an intensive algebra program in grades 7-8
14. Number of support personnel
15. Number of students enrolled in advanced classes
16. Number of HS students completing A-G requirements
17. Parent/community satisfaction
18. Percent of students passing end of course examinations
19. Percent of students passing high school exit examinations
20. Percent of students successfully transitioning to post-secondary life
21. Percent of students taking the state test
22. Principal mobility
23. Retention rate
24. School crime rate
25. School expenditures
26. Student attendance rate
27. Student mobility
28. Suspension rate
29. Teacher attendance
30. Teacher mobility
31. Teacher salaries
32. Year-round school status

Candidate variables currently available in California (29 of 32): all of the above except leadership and staffing requirements, number of fire drills, and percent of students successfully transitioning to post-secondary life.

Candidate variables strongly associated with student achievement (9 of 29): dropout rate; graduation rate; number of students enrolled in advanced classes; number of HS students completing A-G requirements; percent of students passing end of course examinations; percent of students passing high school exit examinations; retention rate; student mobility; suspension rate.

Candidate variables that meet data quality requirements: none.


Table 5 also shows the availability of the candidate variables in California. The California Department of Education already collects 17 of the candidate variables. (In some of the 17 cases, the measure would have to be calculated using two figures that the CDE collects.) Of the remaining 15 variables, 12 are available at the local level but not reported to CDE. If any of these 12 measures are determined to be suitable for inclusion in the API, the costs associated with local district compilation and transmittal would appear to be justified. They were carried forward for study against the remaining criteria.

Three variables were found not to exist at either the state or local level and were excluded from further analysis: leadership and staffing requirements, number of fire drills, and percent of students successfully transitioning to post-secondary life, as displayed in Table 6. The excluded candidates would require original data collections of varying difficulty, with the percent of students successfully transitioning to post-secondary life creating the greatest challenge because it would require tracking students after high school graduation. In addition, leadership and staffing requirements (the number of principals per school site) is related to school size, and the number of fire drills has more to do with school emergency preparedness than student learning. Since both the costs and potential yields for each were uncertain, we dropped them from the rest of the study.

Over the longer term, it is possible that the eventual adoption of the California School Information System (CSIS) by districts could eliminate or minimize the additional costs associated with the variables that were excluded in this step. At such time, the newly available variables should be examined against the remaining criteria.

Analysis: What Should the API Examine?

Criterion L1: The API should be outcomes focused.

The focus of accountability is results, not effort. The legislation is explicit that the API must examine schools in terms of their student achievement outcomes. At the same time, the legislation calls for the inclusion of three variables that are not outcome measures themselves. This criterion does not exclude other non-outcome measures from being incorporated; rather, it requires that the measures be strongly related to the outcomes a school produces. This requirement provides a strong criterion for considering the API's future.

Choosing to look primarily at outcomes is not a foregone conclusion, as the developmental framework in Table 7 shows. We found a significant number of states with rating systems that rely on a mix of process and outcome measures. With such a blend, those states hold schools accountable for the way students are taught in addition to considering the outcome of those efforts. Such an approach is clearly superior to looking at process measures alone, the Minimum approach. However, the incentives of hybrid systems are ambiguous: a school could be rewarded for improving its procedures even if the improvements do not result in additional student achievement.

An outcomes orientation in the API creates incentives for schools or districts to direct resources appropriately to maximize the outcomes being studied. Outcome measures illustrate most clearly the degree to which


schools are achieving the educational goals for their students. The incentives are purer than in the hybrid system, and so an outcomes orientation is considered the Best practice.

Only 10 of the 29 remaining candidate variables are outcome measures. Nine are measures of school activities and so are classified as process variables. Ten are measures of inputs. The breakdown of these measures appears in Table 8.

Table 7
Classification of States by the Type of Data Elements Used in School Rating Systems in 2001

Minimum (Process Measures): none

Better (Hybrid Process & Outcome Measures): Georgia, Kentucky, Louisiana, Maryland, New Mexico, North Carolina, Ohio, Oklahoma, Oregon, Texas, West Virginia, Vermont

Best (Outcome Measures): Alabama, Alaska, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Massachusetts, Michigan, Mississippi, New Hampshire, New York, Nevada, Rhode Island, South Carolina, Tennessee, Virginia, Wisconsin


Table 8
Candidate Variables by Type

Inputs: Teacher Attendance Rate; Class Size; Condition of School Facilities & Grounds; Course Offerings; Number of Computers in a School; Number of Non-Credentialed Teachers; Number of Support Personnel; School Crime Rate; School Expenditures; Teacher Salaries

Process: Student Attendance Rate; Number of Instructional Minutes; Number of Pupil Hours in an Intensive Reading Program in Grades K-4; Number of Pupil Hours in an Intensive Algebra Program in Grades 7-8; Percent of Students Taking State Test; Principal Mobility; Student Mobility; Teacher Mobility; Year Round School Status

Outcomes: College Entrance Exam Scores; Drop-out Rate; Graduation Rate; Number of Students in Advanced Courses; Number of Graduates Completing A-G Course Requirements; Parent/Community Satisfaction; Percent of Students Passing End of Course Exams; Percent of Students Passing the High School Exit Exam; Retention Rate; Suspension Rate

Because the API at present consists only of the STAR scores of students in each school, it is exclusively outcomes focused. It could be expanded with other outcome measures and still satisfy this criterion and retain clear incentives for schools. However, if the API were modified to include input or process measures, the strength of the association between the new measure and student achievement would determine the degree to which the incentives are dulled. If the relationship between a new factor and student achievement is strong, the combination would be less compromised than if the relationship were weak. It is this alignment that is addressed in the following criterion.

Criterion A1: The API and its components need directly and defensibly to support the outcomes of interest.

Criterion A1 requires that any additional factor in the API have a clear and strong relationship to student achievement, the fundamental outcome of the API. The justification parallels the discussion of incentives above. The criterion seeks to minimize the dilution of the intrinsic incentives in an outcomes-oriented accountability system. The best systems are those that give schools an undistorted signal about the effect of their efforts. As is clear from Table 5, states make use of other factors. This fact raises the question of how closely aligned the candidate variables are to student achievement, California's existing outcome. We scanned the education literature for studies that measured the strength of the relationship of each variable to student achievement.


Where possible, multiple articles were identified for each candidate variable. The full summary of the literature review is presented in Appendix C. Here, we provide a brief summary of the findings.

We classified each candidate variable into one of three groups based on the strength of the relationship and the weight of empirical evidence. If the relationship has not been studied, or the evidence is weak or inconclusive, we considered the variable to have a Weak association with student achievement. If there was conclusive evidence but the correlation was less than or equal to r = .5, we rated the strength of the association Moderate. If the conclusive research showed a close association, the variable was designated Strong. The results appear in Table 9.

Input Variables

Non-credentialed teachers. The percentage of uncertified teachers in a school may not by itself be a good indicator of the quality of instruction in a school. Although some researchers argue that improving student achievement necessitates having a properly trained teacher in each classroom, others argue that the issue is subject-area training, not necessarily credential status.1

Number of support personnel. As we were unable to find any studies that looked specifically at the relationship between the number of support personnel (counselors, librarians, psychologists, social workers, nurses, speech/language/hearing specialists, and non-teaching resource specialists) in a school and student achievement, this appears to be a weak relationship. However, some literature indicates that support personnel can play a role in improving student achievement. For example, counselors can intervene through study skill groups, time management training, and achievement motivation groups.2

School crime rate. The relationship between school crime rate and student achievement is weak. Eighty percent of schools report less than one incident of serious crime a month. The rarity of crime indicates that a school crime rate would be an ineffective indicator for the API because there is minimal variation across schools.3

School expenditures. Some states use the ratio of school expenditures to school scores to summarize whether school funding has an effect on school learning. The disadvantage of the indicator, though, is that the link between school funding and student learning has not been established: some studies show that a relationship may exist, and other studies indicate that there is no correlation.4

Teacher salaries. Studies that directly correlate teacher salaries to student achievement appear to be limited. One study substituted number of graduate education hours and years of experience for teacher salary and found that they were unrelated to student achievement. Another study used teacher salaries, but substituted principals' evaluations for student achievement and also found a weak relationship.5

1 Darling-Hammond 1999; Goldhaber and Brewer 2000; Miller et al. 1998. 2 Brown 1999. 3 Kaufman et al. 1999. 4 Grissmer 2000; Krueger 1998. 5 Ballou et al. 1997; Hanushek 1971. Note: All citations are referenced in the bibliography.


Table 9
Strength of Relationship of Candidate Variables to Student Achievement

Weak: College Entrance Exam Scores; Course Offerings; Number of Instructional Minutes; Number of Computers; Number of Non-Credentialed Teachers; Number of Pupil Hours in an Intensive Reading Program in Grades K-4; Number of Pupil Hours in an Intensive Algebra Program in Grades 7-8; Number of Support Personnel; Parent/Community Satisfaction; Principal Mobility; School Crime Rate; Teacher Mobility; Teacher Salaries; School Expenditures

Moderate: Class Size; Condition of School Facilities & Grounds; Percentage of Students Taking State Test; Student Attendance Rate; Teacher Attendance; Year Round School Status

Strong: Drop-out Rate; Graduation Rate; Number of Students in Advanced Courses; Number of Graduates Completing A-G Course Requirements; Percent of Students Passing End of Course Exams; Percent of Students Passing High School Exit Exam; Retention Rate; Student Mobility; Suspension Rate

The resulting classification shows an interesting pattern, as revealed in Table 10. The input variables were found largely to be Weak. Process variables have more varied relationships to student achievement, suggesting that it would be imprudent to reject a process measure out of hand. Only student mobility is strongly related. Student and teacher attendance rates are only moderately related, and were therefore dismissed.


Process Variables

Student attendance rate. Research in Louisiana on attendance rates and student achievement found a relationship only for a small group of students in a particular type of school. Consequently, a more informative analysis would use student-level data and then aggregate the results to the school level.1

Number of instructional minutes. Although there is a correlation between time spent in class and student learning, the important factor is how the time is spent, not simply the minutes of instruction. Good teaching and bad teaching can take the same amount of time.2

Number of pupil hours in an intensive reading program in grades K-4; number of pupil hours in an intensive algebra program in grades 7-8. No work on the relationship between after-school reading and math programs and student achievement has been conducted in the last seven years. Although some research has been done on the effectiveness of after-school programs in general, the results are questionable because the after-school program was not the only change occurring within a school. Any changes in student achievement when several variables are introduced should not be attributed to a particular feature of the reform, but to the reform as a whole.3

1 Caldas 1993; Crone et al. 1993. 2 Berliner and Rosenshine 1977. 3 Molnar et al. 1999. Note: All citations are referenced in the bibliography.


Table 10
Pattern of Classification of Variables by Strength of Association with Student Achievement
(Value is the number of variables)

Rating      Inputs   Process   Outcomes
Weak        7        5         2
Moderate    3        3         0
Strong      0        1         8

Of the three types of variables, the outcome variables showed the strongest association with student achievement, with two exceptions. Parent/Community Satisfaction and College Entrance Exam Scores fall down for much the same reason: there is insufficient variation in each variable for it to be strongly correlated with any other variables. Public opinion research has documented a constant positive regard by parents for their schools despite actual differences.2 College Entrance Exam Scores are self-selective and reflect only a segment of the student body of a school.

We continued the analysis for the nine candidate variables that showed a strong relationship with student achievement: dropout rate; graduation rate; number of students enrolled in advanced classes; number of graduates completing A-G course requirements; percent of students passing end of course examinations; percent of students passing the high school exit examination; retention rate; student mobility; and suspension rate.

Process Variables (continued)

Percent of students taking the state test. This descriptor is related to student achievement, but its effectiveness in the API may be limited by legislative requirements. In most states, the general student population is required to take the state assessment. Consequently, there is little variation in the descriptor across schools, and so it would not help the API's ability to distinguish effective from non-effective schools.

Principal mobility rate. We were not able to find any studies that looked at the relationship between principal transiency and student achievement.

Student mobility. Although the conventional wisdom is that student achievement and student mobility are related, the extant literature is not as conclusive. Studies found that the relationship between achievement and mobility is moderated by schools with high student turnover rates, high percentages of economically disadvantaged students, student English proficiency, student poverty, a student's absence rate, and the age of the student (mobility appears to be more critical in the early years).1

Teacher mobility rate. Studies suggested that a relationship between teacher mobility and student achievement may exist. For example, two studies indicate that teachers move to improve their working environment by transferring to higher-SES schools. However, there is no indication that teacher mobility has an effect on student achievement.2

Year round school status. A review of the year round school literature indicates there is a moderate relationship to student achievement. Nine studies found a positive relationship with student achievement, four studies found that enrollment in a year round school made no significant difference in student achievement, and one found that attending a year round school had a negative relationship to achievement for students with low IQ (less than 100). However, a study comparing the achievement of students attending year round and traditional schools in a large urban district found no significant difference.3

1 Heinlein and Shin 2000; Jennings et al. 2000; Mao, Whitsett and Mellor 1997; Rumberger et al. 1999. 2 Greenberg and McCall 1974; Hanushek, Kain, and Rivkin 1999; Uehara 1999. 3 Shields and Oberg 2000. Note: All citations are referenced in the bibliography.


Of the three variables mandated by legislation, certificated personnel and student attendance rates did not show sufficient strength of association to remain potential candidates.

Analysis: How should the API be structured?

Criterion L3: Specified statewide tests must constitute at least sixty percent of the value of the API.

Criterion L6: The API should be a composite index rather than a profile or single descriptor.

The importance of Criterion L3 is that it fixes the majority of the API's focus on measurable student achievement. On a stand-alone basis, this criterion would permit STAR results to be the sole factor, since the remaining allocation of the API is not specified. Criterion L6, on the other hand, seeks to create a complex measure of school performance. The legislation required student and certificated staff attendance rates and student graduation rates, but did not exclude other factors. Absent other criteria, the new factors could range from input measures, such as the proportion of certified teachers in a school, to process measures, such as the number of instructional minutes, to the other outcome measures just discussed.

There is an argument for having a composite index. Having multiple dimensions of performance creates strong incentives for schools to take the results seriously. It is harder to dismiss results when they have multiple foundations, because the judgment is more balanced than with a single factor. Like the two previous criteria, the developmental focus employed here was the strength of the incentives created by the API. Evaluating a school's outcomes against established standards makes clear to schools what their primary purpose is and clearly establishes expectations for results. In order to align Criterion L6 with an outcomes focus (Criterion L1 and Criterion A1), the additional elements of the API should themselves be outcome oriented. The states that have outcome-based accountability systems use test scores as a single measure of outcome; California joins 18 other states in this regard. However, consistent with both Criteria L3 and L6, it is conceivable that other outcomes can be incorporated into the API to create a more rounded set of expectations.

California satisfies Criterion L3 at present but not Criterion L6. Both could be met if the legislative mandate to add attendance rates or graduation rates were implemented. Alternatively, any number of additional factors could be added as long as their collective weight in the API was limited to 40 percent. Other measures strongly correlated with student achievement would jointly satisfy this criterion and meet the foregoing criteria related to outcomes. A possible example would be to combine absolute achievement levels with progress measures.


The result would continue to send clear signals to schools about performance while providing multiple points of focus.

Criterion A2: Individual components of the API must meet minimum standards for data quality so they can be positive contributors to the results.

This criterion concerns the technical attributes that any component of the API must have for good data quality. While these considerations may appear arcane, the value of the API could be compromised if it were found that its ingredients introduced confusion into the results. The legislature in fact recognized the general problem of data quality when it permitted the deferral of graduation rates and student and certificated personnel attendance rates until those measures were technically adequate. Each element must add to the ability of the API to reveal differences in school performance or else it is not worth including. We analyzed three conditions of data quality: variation across schools, consistent data collection and reporting, and unbiased measurement. The results of applying these conditions to the remaining variables appear in Table 11.

Table 11
Quality Assessment of Candidate Variables

Variable                                          Variation Across Schools    Consistent Measurement   Unbiased
Dropout Rate                                      Yes                         No                       Unknown
Graduation Rate                                   Yes                         No                       No
Number of Students in Advanced Classes            Limited                     No                       No
Number of Graduates Completing A-G Requirements   Yes                         No                       No
Percent of Students Passing End of Course Exams   Yes                         Yes                      No
Percent of Students Passing HS Exit Exams         Yes, expected to diminish   N/A, presumed yes        N/A, presumed yes
Retention Rate                                    Yes                         No                       No
Student Mobility                                  Yes                         No                       No
Suspension Rate                                   Yes                         Yes                      No


Regardless of the range of possible scores, a variable must show variation in actual scores across schools; schools must spread apart from each other to enable comparison. If the majority of schools show nearly the same value on a given measure, it offers little value as a differentiator. We found one variable that lacked variation: Number of Students Enrolled in Advanced Classes. Although we listed Percent of Students Passing the High School Exit Exams as having variation across schools, this variation is an artifact, since the test has so far been administered only at the ninth grade level. Like the graduation rate data, it is likely that the statistic will have little variation across schools once it takes twelfth grade students into account.

Consistent measurement is required so that all schools are examined on an equal footing. As an example, if some schools count student maternity leaves as dropping out and others do not, the numbers are not comparable. At this time in California, data consistency is generally acknowledged to affect many variables, including several in our study: graduation rates, number of students in advanced classes, number of graduates completing A-G requirements, student mobility, and dropout rates.3
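The variation condition lends itself to a simple screen. The sketch below, with invented school-level values and an arbitrary threshold, flags variables that take nearly the same value at every school and therefore cannot differentiate among them.

```python
import statistics

def has_useful_variation(values, min_cv=0.05):
    """True if the coefficient of variation across schools clears a threshold."""
    mean = statistics.mean(values)
    if mean == 0:
        return False
    return statistics.stdev(values) / abs(mean) >= min_cv

pct_taking_state_test = [99.1, 98.7, 99.4, 98.9, 99.2]   # nearly identical everywhere
dropout_rate = [2.1, 11.4, 5.8, 0.9, 7.6]                # spreads schools apart

print(has_useful_variation(pct_taking_state_test))   # False: poor differentiator
print(has_useful_variation(dropout_rate))            # True
```

Any real screen would also need the consistency and bias checks described above; low variation alone is merely the easiest condition to test mechanically.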

Outcome Variables

Parent/community satisfaction. This descriptor is weak because of the limited variation in parent satisfaction. In one study, seventy-five percent of parents whose children scored in the lowest quartile in mathematics, and eighty-seven percent of parents whose children scored in the highest quartile, were satisfied with their schools. Another study found that the percentage of parents rating their children's school as an A or a B varied only from seventy-one percent in 1985 to sixty-six percent in 1999. The lack of variation indicates that adding a descriptor related to parent satisfaction to the API would not improve its ability to discriminate between a good school and a poor school.1

Percent of students passing end of course exams. Without reviewing the literature, we know the relationship between student achievement and this descriptor is strong because the descriptor is itself a measure of student achievement.

Retention rate. Studies indicate there is a strong negative relationship between retention rate and student achievement. Specifically, retained students, on average, did worse when they were promoted than students of similar ability who were not retained, and dropouts were five times more likely to have repeated a grade than high school graduates. One study, using the extensive database on student testing and class size in Tennessee, found that students who were retained were behind their peers after the next year, even when the retained students were placed in smaller classes.2

Suspension rate. Suspension rate is a common indicator of student behavior used to evaluate the effectiveness of educational programs. For example, one study of a mentoring program found that student attendance increased and suspension rates decreased, but no changes in student achievement occurred. Other studies have found a relationship between suspension rate and achievement, but this is probably due to non-suspended students attending school more often than suspended students.3

1 Elam, Rose and Gallup 1994; Elam and Gallup 1989; Gallup 1985; Rose and Gallup 1999. 2 Dill 1993; Harvey 1994; Shepard and Smith 1990. 3 Alspaugh 1996; Powers and McConner 1997. Note: All citations are referenced in the bibliography.


The variables in the API should be unbiased so as to provide an undistorted measure of reality. It is important for the integrity of the API that each factor capture the complete and honest circumstance. We found that many variables did, or easily could be made to, distort the underlying reality of a school. For example, the "Percent of Students Passing End of Course Exams" sounds reasonable until one realizes that the Golden State Exams are voluntary; any student at risk of receiving a low score simply will not take the test. Any variable that is vulnerable to self-selection or manipulation is unsuitable for the API. The idea is to create real incentives, not incentives to game the system.

The legislative requirement to use graduation rates in the API intensified the focus on their measurement characteristics. The graduation rate fails on multiple grounds. Graduation rate, as it is currently calculated, is the number of students graduating in June divided by the number of twelfth grade students enrolled in the previous fall. In other words, the graduation rate is an indicator of twelfth grade completion rather than an indicator of a student's success in the K-12 education program. In addition, the current graduation rate excludes students who drop out of school, transfer in or out, are expelled, or graduate prior to June of their twelfth grade year. There is also a concern about consistency of reporting; schools may relax the rules to include "ever graduated" students and show higher rates than those using a narrower calculation.

Even if the population base were corrected, the graduation rate would still fall short. Since the graduation rate is based on a dichotomy (graduated/did not graduate), it is a blunt measure that cannot reveal anything about the individuals within the group, such as their underlying achievement. Two schools with equivalent graduation rates may in fact be very different: one might have very high performing students while the other does not. Factoring in graduation rates would reward them as though they were alike.
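A small numerical sketch, using hypothetical enrollment figures, makes the completion-versus-cohort distinction concrete: the current calculation can report a high rate for a school that lost a large share of its students before the twelfth grade ever began.

```python
def grade12_completion_rate(june_graduates, fall_grade12_enrollment):
    """The current definition: June graduates over fall twelfth-grade enrollment."""
    return june_graduates / fall_grade12_enrollment

def cohort_graduation_rate(june_graduates, original_grade9_cohort):
    """A stricter alternative: June graduates over the original ninth-grade cohort."""
    return june_graduates / original_grade9_cohort

# Invented school: 300 ninth graders, 100 of whom leave before grade 12;
# 180 of the remaining 200 graduate in June.
print(f"reported rate: {grade12_completion_rate(180, 200):.0%}")   # 90%
print(f"cohort rate:   {cohort_graduation_rate(180, 300):.0%}")    # 60%
```

The gap between the two figures is exactly the population of dropouts, transfers, and expulsions that the current definition never sees.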

Outcome Variables (continued)

College entrance exam scores. There is little research on the relationship between the number of students taking college entrance examinations and student achievement. It is likely, though, that the relationship is weak because of the restricted sample of the student population taking the examinations.1

Drop-out rate. The extant literature tends to view the student dropout rate as an outcome of an educational program. The challenge with using the drop-out rate is how to define it and how to measure it. The event drop-out rate is the number of students completing grade 12 divided by the number of students enrolled at the beginning of grade 12. The status dropout rate is determined by taking a sample of 16-24 year olds' responses, then adjusting statistically to reflect the dropout rate for the nation.2

Graduation rate; number of graduates completing A-G requirements. The effectiveness of graduation rate as a descriptor is questionable because of the multiple ways it can be defined (a 1997 study indicated that states calculated graduation rates using three different methods) and the limited research on how it relates to student achievement. A 1984 and a 1992 study in the Dallas Independent School District tried to use graduation rate to identify effective schools but found it had little value when used with test scores.3

Number of students in advanced courses. Little research has been done on the relationship between student achievement and enrollment in advanced mathematics and science classes. The only work in the last five years looked at enrollment trends in higher-level science and mathematics classes but did not look at changes in student achievement, either generally or with respect to science and mathematics.4

1 Webster, Mendro and Almaguer 1993. 2 Guryan 2001; Lillard and DeCicca 2001. 3 Clements and Blank 1997; Webster, Mendro and Almaguer 1993. 4 Mayer et al. 2001. Note: All citations are referenced in the bibliography.


For California, the API is currently based on strong measures, as was discussed in Section V. The state's efforts to minimize variation across schools in how the STAR tests are administered, collected, scored, and analyzed are commendable. Unfortunately, none of the nine remaining candidate variables met the conditions of this criterion at the present time. We excluded them from further review. Consequently, there are no candidate variables that are suitable for addition to the API according to our review criteria.

The topic could be revisited if variation across schools, uniformity of measurement, or definitions improve. The steps to evaluate the suitability of variables as factors in the API could be repeated whenever changes are made to the way variables are collected or tabulated. Other measures will no doubt be proposed over time as well. This task would appear to fall well within the scope of activities of the Technical Design Group.

The state of California's student and school data resources has been explored elsewhere.4 The findings of this portion of the analysis speak clearly to the need for better data, not only for consideration in this matter, but to inform other important questions as well. It is hard to imagine successfully influencing school performance without the ability to better document the effects.

Analysis: Whom should the API include?

Criterion L4: The accountability system should contain assessment information on students in at least grades 2 through 11.

One of the primary aims of the API is to provide comprehensive information about school performance. The legislation dictated ten grades to be covered by the STAR program, establishing a criterion of "every student every year" for ten of the thirteen years of public education. (We recognize that alternative schools work under an alternative accountability system and that a few very small schools are excluded entirely.) The incentive created for schools is that they must assure sound education in each grade each year. The longer the gap between grades tested, the less certainty exists about school performance as a whole, since other factors such as student population changes could have affected the results. (We return to this point later in the analysis.) Taking annual measures for a large set of grades circumvents this problem to a considerable degree. The criterion also provides the foundation for schools to refine their analysis of the API results to isolate grade levels where results might be out of line. Such stratified information can support schools' efforts to track the effects of changes in their resource allocations.


Table 12
Classification of States with Rating Systems by the Number of Grade Levels Assessed in 2001

Less than 5 Grade Levels: Connecticut, Georgia, Nevada, New Hampshire, New York, Ohio, Oregon, Wisconsin

5-8 Grade Levels: Alaska, Arkansas, Colorado, Delaware, Florida, Kentucky, Louisiana, Maryland, Massachusetts, Michigan, New Mexico, North Carolina, Oklahoma, Rhode Island, South Carolina, Texas, Virginia, Vermont

9 or More Grade Levels: Alabama, California, Mississippi, Tennessee, West Virginia

California is one of only five states whose accountability program qualifies as Best practice on the basis of grades tested. Table 12 presents the comparison for the 31 states with rating systems. Recall that Minimum practice means that states test a few grades, none of them contiguous. The incentives for schools are uneven, since schools will be inclined to allocate resources to maximize performance in those grades that are tested. The Better practice improves on the incentives by adding grade levels.5 The Best practice tests nine or more grade levels. California meets the Best practice for this criterion. Conveniently, California also meets the new federal requirements for testing adopted in the recent reauthorization of the Elementary and Secondary Education Act.

Criterion L2: The API should comprehensively reflect the performance of students.

The API legislation is clear in its intent to hold schools accountable for the education of all California students. As a practical matter, there will always be students who are unable to take the test because of illness, language barriers, recent transfers, or parental waivers. However, this criterion sets the expectation that schools should seek to maximize the number of students whose scores are incorporated into the API.

Maximizing inclusion in the API creates incentives for schools to be concerned with the education of every student. Intentionally excluding some students counteracts this incentive. It also may legitimate marginalizing groups that do not affect the scores for a school. As an example, the API includes Special Education students who are physically able to take the test under standardized conditions. Their inclusion is consistent with other California policy that supports the education of Special Education students with special emphasis.

All students should be included for another reason: exclusion of a group could statistically bias the API result for a school if the group differs in any meaningful way from the students who are included.


excluded from the API, and curly haired students were naturally more gifted academically than others, then the API would be biased downward by their exclusion. The largest concern, of course, is that schools have an incentive to try to manipulate their scores by affecting who is included. There are countervailing arguments for excluding the scores of some students from the API. Schools argue that they should be held accountable for those students they have taught, but not for students who have not been in the school a significant amount of time. Students who were not enrolled in the district the previous year are not included in schools' API scores. But the problem cuts two ways -- the high rate of transition in some schools means that upwards of forty percent of the students are not reflected in the API.6 The criterion rests on how representative the measures in the API are of the entire student body. The exclusion problem arises because the API is calculated from snapshots of achievement levels instead of progress. A better solution for this dilemma exists after the API is calculated for all students: when the results are evaluated as satisfactory or unsatisfactory, consideration could be given to schools with exceptional circumstances. It should be noted that if the API incorporated progress measures, the problem would be minimized. Focusing on how much each student has improved strengthens the incentive to attend to every student.


VII. An Alternative Approach to the API

In the course of surveying the accountability systems in other states, several alternate ways to aggregate test scores into school scores were identified. We considered these other approaches as important to evaluate as the candidate factors for addition to the API. This section presents a recommendation for a different approach to compiling measures of school performance from individual student test results. The survey of states showed that no state has compared the statistical performance of its rating system against other approaches. While several -- including California -- have been diligent about studying the precision and appropriateness of the individual assessment tests, their accountability systems are sufficiently new that attention has not yet been devoted to the system as a whole. Thus the analysis presented here places California in a leadership position in undertaking self-assessment of the system and its results. The significance of this self-assessment can be appreciated by considering past legal challenges regarding student assessments and subgroups.7 Since courts have been concerned in the past about the fairness of individual assessments for all students, it is reasonable to anticipate an extension of that thinking to include accountability systems. Reviewing the performance of the API now enables the state to better position itself in the event of such a challenge. Based on our analysis of the various approaches currently in use in other states, there is considerable reason, from both statistical and administrative standpoints, to shift from the current method of calculating the API to one that is based on measures of student progress. By tracking the progress of individual students over time and incorporating those progress measures into a single score for schools, California can provide a clearer and more complete picture of how schools are performing. The capability exists today to track students over time within districts. With this ability, the state can follow students as they move through schools in a district to identify year-over-year gains or losses in learning. In the future, when a statewide unique student identifier is adopted in California, the capacity will exist to track students even if they change districts within the state. Both approaches are superior to the current approach, as will be discussed below, and move the state forward in its commitment to hold schools accountable for the performance of their students.

Description of the Alternatives

The survey of state practices revealed four ways that states aggregate test scores into school scores. There are two primary approaches, each with two variations. These are described briefly below:


Static Measures of Schools. One basic approach is to take point-in-time snapshots of school performance. The scores of all eligible students are incorporated into the measure, with all schools using the same method of calculating the aggregate score. In this vein, there are two choices:

Status Model -- A single school score is compiled from all eligible students. This approach is the one used in California. With the Status Model, it is not possible to factor out year-to-year changes in student body composition or grade-to-grade changes in instructional design or teacher quality. The result is used both as a measure of school performance in a single year and as the base for change in scores in subsequent years.

Grade Level Cross-Sectional Model -- This approach extends the Status Model a step further and examines changes in grade level performance across years. For example, a school might examine the performance of 4th grade in successive years. The technique is the same as the Status Model but stratified by grade. With the Grade Level Cross-Sectional Model, shifts in instructional design or teachers across grades will not affect results, an improvement over the Status Model. Shifts in the types of students from year to year will still influence results. In addition, there are fewer students in each grade than in the whole school; therefore, the grade level scores are more vulnerable to swings from year to year.8

Student Change Models. The Student Change models capture student achievement and progress in a manner that preserves the greatest amount of detail about a school and its students. These models isolate the effects of changes in student body composition, teacher differences, and instructional design shifts across grades; the results describe most clearly the performance of schools. Because the unit of measure is improvement, it can be applied to students regardless of their actual level of initial performance.9 Within the Student Change model, two variations exist:

Cohort Student Change Model -- This approach follows the same students from year to year. The scores of students in a school are matched year to year and the extent of improvement or loss is calculated. These measures of change, referred to as gain scores, are then aggregated to a school score. The disadvantage of the cohort model is that students who enter a school after the first year of the cohort are not included.

Revolving Panel Student Change Model -- This analysis improves on cohort models because it analyzes data at the student level and can include all students with gain scores, not just the students in the original group. Today, some districts in California have the capacity to track students within their boundaries. We call this a District-based Revolving Panel. At such time as students can be tracked with unique identifiers as they move within the state, the revolving panel can be even more inclusive. This approach is named the Statewide Revolving Panel.
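The differences among these models are easy to see in a toy computation. The sketch below is illustrative only -- the student records, score scale, and function names are hypothetical, and it mirrors none of the actual API arithmetic:

    # Hypothetical records: student_id -> {year: score}. The point is the
    # aggregation logic of each model, not the real API formula.
    scores = {
        "s1": {2000: 48, 2001: 55, 2002: 60},
        "s2": {2000: 61, 2001: 66, 2002: 69},
        "s3": {2000: 30},                    # left the school after 2000
        "s4": {2001: 72, 2002: 78},          # arrived in 2001
    }

    def status(year):
        """Status Model: point-in-time mean of everyone tested that year."""
        vals = [s[year] for s in scores.values() if year in s]
        return sum(vals) / len(vals)

    def cohort_gain(base_year, year):
        """Cohort Model: mean gain for the original base-year cohort only."""
        gains = [s[year] - s[base_year] for s in scores.values()
                 if base_year in s and year in s]
        return sum(gains) / len(gains)

    def revolving_panel_gain(year):
        """Revolving Panel: mean one-year gain for every student with scores
        in this year and the prior year -- newcomers such as s4 join the
        panel once a second score exists."""
        gains = [s[year] - s[year - 1] for s in scores.values()
                 if year in s and year - 1 in s]
        return sum(gains) / len(gains)

    print(status(2002))                # 69.0 -- snapshot; history and mobility ignored
    print(cohort_gain(2000, 2002))     # 10.0 -- s1 and s2 only; s4 never enters
    print(revolving_panel_gain(2002))  # about 4.67 -- s1, s2 and s4 all contribute

Note how the cohort model silently drops s4 forever, while the revolving panel picks the student up in the second year -- precisely the inclusiveness argument made above.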


Table 13 displays the states with rating systems by the analytic model used to calculate their school scores. The progression from Status to Grade Level Change to Student Change is associated with greater precision in the measures and greater detail about the real impacts of school activity, as will be discussed in the remainder of this section. Accordingly, the incentives for schools are improved because the results do a better job of explaining the real state of schools without confounding influences mixed in. Consequently, the Revolving Panel Student Change model was designated as the Best, and the Status model as the Minimum.

Table 13
Classification of States by the Type of Analysis Model Used in School Rating Systems in 2001

Minimum: Status model
Better: Grade Level Change (Cross-Sectional) model
Best: Student Change models (Cohort and Revolving Panel variants)

States with rating systems classified in 2001: Alabama, Alaska, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Kentucky, Louisiana, Maryland, Massachusetts, Michigan, Mississippi, Nevada, New Hampshire, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Rhode Island, South Carolina, Tennessee, Texas, Vermont, Virginia, West Virginia, and Wisconsin.

Performance Requirements for Computing the API

As was the case in the preceding analysis of potential additions to the API, the examination of alternate ways to aggregate test scores was grounded in objective performance requirements.

Criterion: The API as a whole must be cogent and accurate for the school overall and for the subgroups that are to be examined separately.

For the API to best serve its public policy goals, the steps involved in calculating school scores and the resulting measure should be impartial and true. The API should provide confidence that it can correctly and consistently identify the schools that are producing a higher level of achievement in educating their students and those that are not. There must be assurance that the API is measuring the right things and measuring them accurately. In short, the API must be based on sound methodology for schools as a whole and for any subgroups of special interest. The subgroups may be culturally, socio-


economically or educationally defined. SB 1552, which revised SB 1X, defines subgroups as those either (a) having more than 30 persons and constituting 15 percent of the student body or (b) having 100 persons regardless of proportion.
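Stated as a rule, the definition is simple enough to express directly. The sketch below is a hypothetical paraphrase of the statutory thresholds (the inclusive reading of "15 percent" and "100 persons" is our assumption), not official CDE code:

    def is_numerically_significant(group_size, school_enrollment):
        """Subgroup rule per SB 1552: more than 30 pupils constituting at least
        15 percent of the student body, or 100 pupils regardless of proportion."""
        share = group_size / school_enrollment
        return (group_size > 30 and share >= 0.15) or group_size >= 100

    # Examples: a 40-student group in a school of 200 (20 percent) qualifies;
    # the same group in a school of 1,000 (4 percent) does not.
    print(is_numerically_significant(40, 200))    # True
    print(is_numerically_significant(40, 1000))   # False
    print(is_numerically_significant(120, 1000))  # True (100-pupil rule)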

In terms of measuring the right things, the intent of SB 1X is that the API would be indicative of a school’s ability to “…provide for the academic development of each pupil…” and help each student “…become a lifelong learner, equipped to live and succeed within the economic and societal complexities of the 21st century.”(SB 1X, 52050.5a) This intent means that the API needs to measure the things a school is currently doing that are likely to have the most effect on student learning. It is easy to stipulate that student achievement is a good measure of school performance. Does the API capture student achievement in the best possible way?

In addition to considering what the API should include, it is equally important to know what it should exclude. The API should exclude, to the extent possible, those factors outside a school's control that affect student-level achievement. One important test is whether the API eliminates the effects of having different groups of students in each grade from year to year. As an example, if a school draws a particularly bright class of students, the API score will bump up as they pass through. After they leave, the API will drop. The bump occurs even if the schooling is constant. Shifts in the API score ought only to reflect changes in school performance, not variation in the students coming through. Conversely, regardless of students' starting points, schools should be held responsible for creating additional gains. The legislative intent of SB 1X also means that the API needs to reflect how well schools prepare students for the future. In an ideal world, there would be some external benchmark against which we could gauge the overall performance of the API. We can think of several candidate measures that might apply to K-12 education (juvenile justice indicators, college admission, employability, income, voter registration, evidence of good citizenry, etc.), but none are routinely linked to student K-12 performance at the present time. External benchmarks would allow for periodic tests that the API is still accurate in explaining school performance.

The API also needs to be accurate. If the API is accurate, then two schools with equivalent academic performance will receive the same score. The reverse must also apply -- two schools with the same score must have equivalent academic performance. The same requirement applies to subgroup-level measures; they too must exhibit good measurement characteristics. In addition, to be sure that the API is an accurate measure, it is also necessary to consider whether it captures the full experience of schools. If the computational steps used to create a single score for a school ignore groups of students or fail to recognize differences in scores between two years, then the bias introduced into the API results diminishes its value as a policy device.


Why Student Gain Score Approaches are Superior

For the purpose of the following analysis, we assume that the STAR scores are perfectly accurate.10 This assumption enables a clearer description of the effects of the API. Here, the focus is on the computations that produce the API and the resulting indicator of performance for the entire school and for designated student subgroups. The analysis of the different analytic models reveals that Student Change Models produce a more accurate measurement of school performance. While adoption of the Statewide Revolving Panel Model would be optimal, we recognize the current lack of a unique statewide student identifier. A second choice -- a District-based Revolving Panel Model -- will realize many of the same benefits and will set the stage for eventual adoption of the Statewide Revolving Panel design. Both significantly improve on the current Status model. A more technical treatment of these issues is included as an appendix to this report. The presentation of benefits here is focused for the non-technical reader.

With the Student Change Model, measures of school performance for a single year are accurate. For each year, differences across students in a given school are preserved. This is because the Student Change models incorporate the actual level of performance for every student, both for the base year and for the calculation of progress. Schools and policy makers alike will have a clearer read of how schools are doing. This is not the case with the current method of calculating the API. Its use of performance bands groups together, as indistinguishable, all students who score within each band. With today's API, the calculations in effect substitute the lowest score in the band in which the student's score falls. Every student whose score is placed in the band is weighted the same, so the lowest NPR in the band becomes the de facto score for each student in the band. So if a student scores at the 58th NPR, he falls into the third band and has the same impact as a student at the 50th NPR. The same is true for all the students in every band. Unless the student happens to score at the lower boundary, the difference between the true score and the lowest band score is left uncounted. The difference in approaches has an obvious effect on the accuracy of the results. Consider the hypothetical case where two schools each have all their students score within the range that is currently aggregated in a single band. If one set of scores clusters at the low end of the range and the other at the high end, the difference could be substantial, but it would not influence the current API.
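A small sketch makes the banding effect concrete. The 200-500-700-875-1000 weights are the band weights given in Appendix B; the five 20-percentile NPR bands are our assumption for illustration, chosen to match the 58th/50th NPR example above. The school score distributions are hypothetical:

    # API-style band weights; we assume NPR bands of 1-19, 20-39, 40-59,
    # 60-79, and 80-99 for illustration.
    BAND_WEIGHTS = [200, 500, 700, 875, 1000]

    def band_weight(npr):
        """Map a National Percentile Rank to the weight of its band."""
        return BAND_WEIGHTS[min(npr // 20, 4)]

    def banded_school_score(nprs):
        """Mean band weight across students -- the band-based aggregation."""
        return sum(band_weight(n) for n in nprs) / len(nprs)

    # Two hypothetical schools whose students all fall in the 40-59 band:
    low_cluster = [41, 42, 43, 44, 45]     # clustered at the bottom of the band
    high_cluster = [55, 56, 57, 58, 59]    # clustered at the top of the band

    print(banded_school_score(low_cluster))   # 700.0
    print(banded_school_score(high_cluster))  # 700.0 -- a 14-point NPR gap vanishes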


With the Student Change Model, measures of progress in school performance are more accurate. Student Change Models track the change in learning by student, in the same unit of analysis in which learning occurs. Each student's actual change is captured and incorporated into the school's score. The shift in focus from point-in-time achievement to progress provides an equalizing basis for comparison across students -- the emphasis is on progress regardless of starting point. In this regard, the technique is blind to the starting endowments of students, and schools are judged on the basis of their ability to take students forward from whatever point they start. This feature offers better incentives for schools to work toward improvement with every student.

The District-based Revolving Panel Model provides a reasonable basis for adopting this approach. As students progress through a school, their year-to-year scores are tracked and the changes folded into a school score. As long as students stay in the same district, their performance will contribute to their schools' scores. Since newcomers will still be tested, their performance could be incorporated after the second year. The Statewide Revolving Panel Student Change Model allows students who change districts to be included after one year, since the unique statewide identifier could bring forward a score from the prior year.

This feature contrasts sharply with the current Status Model. Moving across years, it is easy to see that the use of bands in the Status Model API creates significant problems. The incentives created by the API motivate schools to work to improve the academic achievement of their students. However, lumping scores into bands means that real scores can improve significantly without the improvement being reflected in the API. This point is especially meaningful when one considers that individual test score moves of ten percentile rankings are considered exceptional. The only scores that would affect the current Status Model API are those that moved across bands from the prior year, since they would affect the proportion of students in each band.11 That two students with equivalent improvement could contribute to the API differently solely on the basis of their starting points highlights the sensitivity problems that currently exist. The Student Change Model produces a fairer picture of all schools, both in demonstrating improvement and in showing the preservation of gains.

The Student Change Model provides a self-weighted measure of the effect that schools have on their students. It is self-weighting because each student included in the school score acts as his own "pre- and post-" test score. This feature means that the annual averaging of all eligible gain scores isolates the contributions of the school to student learning and that all schools will have the same chance to demonstrate their impacts on student learning. Every degree of change is captured and amassed in the school score. The result is in two ways a better reflection of school performance than is possible with the current API. First, today's API produces a more stable measure for large


schools than for others, because the computations are affected by both the proportion and the absolute numbers of students in each band. Small schools are likely to experience more fluctuation from year to year than large schools. This occurs because of the greater chance in any given year that small schools will have students in each band clustered at the top or bottom of the range -- and therefore susceptible to jumping bands, for better or worse, in the API results. Second, it will be harder for large schools to show improvement over time, all other factors being equal. Scores for the larger numbers of students in each band will tend to cluster more tightly around the average score within each range than those for smaller schools, suggesting that larger gains for more students will be needed to register improvement on the API compared to small schools.

The statistical problem just mentioned becomes especially acute when school populations are disaggregated into subgroups.12 California is one of 24 states with rating systems that examine the results on a disaggregated basis. Even if a minimum of 100 students in a subgroup is met, spreading their scores across five bands leaves small numbers in each band, and therefore a greater likelihood that they are distributed in a manner that is poised for band shifts in the coming year. Thus the well-intentioned provision of the API may in fact be creating biased figures of group behavior over time. Since subgroup results serve to measure status and progress on important social and educational policies, the precision of the API takes on additional import.

The Student Change Model controls better for changes in a school's student composition. Because the Student Change Model only incorporates scores of students who have been in the school for two or more years, the approach controls for shifts in student populations better than the Status Model used in the API. By removing these effects from the model, a major source of bias is eliminated, with the desirable result of more accurate measures of school impacts on students.
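The size-related volatility described above is easy to demonstrate by simulation. In the sketch below, every "school" draws its students from the same NPR distribution in both years, so any year-to-year movement in the band-based score is pure noise; the uniform draws, school sizes, and trial counts are all hypothetical choices for illustration:

    import random

    BAND_WEIGHTS = [200, 500, 700, 875, 1000]

    def banded_score(nprs):
        return sum(BAND_WEIGHTS[min(n // 20, 4)] for n in nprs) / len(nprs)

    def mean_swing(n_students, trials=2000, seed=0):
        """Average absolute year-to-year change in the band-based score for a
        school whose students come from an identical distribution every year."""
        rng = random.Random(seed)
        total = 0.0
        for _ in range(trials):
            year1 = [rng.randint(1, 99) for _ in range(n_students)]
            year2 = [rng.randint(1, 99) for _ in range(n_students)]
            total += abs(banded_score(year1) - banded_score(year2))
        return total / trials

    print(mean_swing(50))    # small school: sizable swings with no real change
    print(mean_swing(1000))  # large school: far smaller apparent movement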

The problem of blurring changes in student populations with changes in school performance is common to any rating system based on cross-sectional snapshots of performance. Over time a school has groups of students moving through its grades. At any one point in time, it may have a collection of students who are better or worse than the typical student, a situation known as student variation. Those better or worse students' test scores are factored into the API and will elevate or depress the API independent of the efforts of the school. As currently structured, the API cannot discriminate between student variation and true improvement in school performance. A few states have recognized this problem and have moved to correct their systems. Massachusetts has chosen to average school scores over three years to dilute the effect of student variation (although its approach creates its own set of problems when trying to attribute changes in index scores to activities a school undertook to improve student achievement). Others have elected to move to Student Change Models in the next few years.


With the Student Change Model, all eligible students contribute to the school score with equal weight. The Student Change Model is composed of the gain scores of all eligible students, so each has an equal chance to affect the result. The benefit described here goes beyond the inter-student accuracy discussed in the first benefit (which concerned points of measurement); here the concern is with the method of calculation using those points. With the Student Change Model, every eligible gain score receives equivalent weight in the calculation of a school score. It may surprise some to know that the current API does not achieve the same result. The only scores that affect the change in a school's API are those that, by moving into a different band, change the proportions within each band, and thus the school score. Such scores represent a subset of all the scores for a school. What happens to those scores determines the fate of the school's API. Thus the API is based on a sample of students that may not be representative of the students as a whole. The problem is even more acute when disaggregating school scores by ethnicity and socio-economic status.

Conditioned on an ability to track student moves across districts and on certain legislative changes, the Student Change Model would allow greater numbers of students to be deemed eligible for inclusion in the API every year. The Student Change Model calculates the measure of learning that each student has achieved and incorporates it into a school-level score. The current legislation allows the scores of students who are in a district for the first year to be excluded from a school's API. However, if the state attains the capacity to identify students uniquely throughout the state, it would be possible to bring a student's score from a prior district forward and calculate the gain score at the end of the first year. There may be other concerns behind the exclusion rules, but a final API that does not reflect the full experience of all students gives a selective picture of a school. Comparing the 2000 statistics for STAR and the API shows that statewide 14 percent of students taking the STAR were excluded from the API. (Students excluded from the API for non-standard test administration would also have been excluded from the STAR results.) They represent a significant group of students who benefit from significant efforts by a school, but they are not reflected in the API results.

Conclusions

The Student Change Model produces a more accurate, comprehensive and sensitive measure of the performance of schools than is currently available with the California API. Beyond the computational benefits described above, adoption of a Student Change Model for the API is attractive for other policy reasons. Because the data on students are preserved at the level of greatest detail, it becomes possible to perform richer analyses to


support California's accountability policy. The use of gain scores also permits non-school factors such as transience or special status to be largely controlled for. Looking forward, the adoption of gain scores will enable even stronger refinements of accountability. The state will be able to better parse the contributions of schools, curricula, different staff mixes, or other factors so it may learn how best to maximize student achievement. To move from the Status model to a better model, several things need to occur. First, a Student Change model requires following individual students from one year to the next. Linking student data across testing periods means each student must have a unique identifier. If the identifier is district-specific, then school-level API scores could be calculated for students within a district. However, the district-level alternative means that student information would be lost when students transfer from an elementary district to a high school district or when students move within the state. A better identifier would be state-specific, so that as long as students stay within the state, they could be included in the API calculations. Second, all data elements included in the API would need to be collected at the student level. The availability of student-level data on the individual student answer sheets makes this feasible in the short term. In the longer run, if variables reported to CDE in aggregated form are desirable for new analyses, it would not impose costs on districts to alter the reporting requirement to require the disaggregated figures instead.
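Mechanically, the linkage the gain-score model requires is a join of two years of student-level records on the identifier. A minimal sketch follows; the identifiers, scores, and use of pandas are hypothetical illustration, not the state's actual data pipeline:

    import pandas as pd

    # Hypothetical student-level files for two testing periods; "sid" stands
    # in for whatever unique identifier a district (or the state) assigns.
    y2000 = pd.DataFrame({"sid": ["a1", "a2", "a3"], "score_2000": [48, 61, 30]})
    y2001 = pd.DataFrame({"sid": ["a1", "a2", "a4"], "score_2001": [55, 66, 72]})

    # An inner join on the identifier yields matched records from which gain
    # scores can be computed; unmatched students drop out of the panel.
    panel = y2000.merge(y2001, on="sid", how="inner")
    panel["gain"] = panel["score_2001"] - panel["score_2000"]
    print(panel)  # a1: +7, a2: +5; a3 and a4 lack a match this year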


VIII. Summary of Findings

Applying the ten performance criteria to the API provided a broad base of comparison for current or potential elements and for the analytic processes that produce school scores. In the future, the criteria could be used to re-examine the form and function of the API, thereby providing continuity in the base of comparison. The developmental frameworks that support the criteria show that the current API is already favorably positioned when compared to the accountability systems of other states. The analysis revealed the following strengths of the API:

1. California has a rating system to evaluate schools with an objective and critical index.
2. California includes grades 2-11 in the API, making the focus comprehensive.
3. California has a solid focus on outcomes through the use of student test scores.
4. California studies the API impacts for special populations of interest to promote equivalent results for all students.
5. California has not to date incorporated elements into the API that dampen the clarity and strength of incentives to schools.

The developmental frameworks also provide a clear rationale for making changes to the API in order to improve the picture of school performance. The analysis supports the following steps to improve the API:

1. Formalize minimum performance standards for any factor of the API.
2. Establish a regular quality review of API performance.
3. Improve the diagnostic capability of the API to detect changes in performance.
4. Achieve greater precision of results by expanding the time frame of analysis.
5. Contingent on adopting a unique student identifier and establishing practices for allocating accountability among schools for students who attend multiple schools in one year, increase the percentage of students in each school who are included in the API.

To bring these suggested remedies down to a concrete level, Table 14 presents the list of potential changes discussed in the analysis.


Table 14

Benefits of Suggested Changes to the API: Changes to Current API Structure

Change: Prevent variables that are not strongly associated with student achievement from being added to the API
Benefit: Preserves incentives
Action: Change legislation to eliminate certificated personnel and student attendance rates

Change: Add other outcomes as data quality permits
Benefit: Adds dimension to the result
Action: Standardize data definitions; expand state monitoring of district data collection; modify current reports to preserve maximum detail

Change: Increase focus on special groups
Benefit: Directs focus of schools
Action: Add factor weight for group improvement

Change: Increase percentage of students in the API
Benefit: Decreases selectivity
Action: Change legislation on API inclusion rules to reflect policies (to be developed) on allocating accountability for students who move across districts

Change: Change analytic model to a district-wide Revolving Panel
Benefit: Increases precision and inclusivity
Action: All districts must have a student identifier

Change: Change analytic model to a statewide Revolving Panel
Benefit: Maximizes measurement precision and inclusivity
Action: State must assign a unique student identifier

The changes suggested by the analysis underlie the recommendations presented in the final section of this report.


IX. Recommendations

The preceding analysis gives sound reasons for the four key recommendations of this report. Some of the specific actions are predicated on prior activity. Accordingly, we assigned general time frames to each step.

Recommendation: Until more robust data resources are available, the API should continue to consist of STAR scores.

None of the factors proposed for inclusion in the API was found to be suitable after extensive review. Until some of the steps outlined below are in place, modification of the API factors will neither maintain nor improve its effectiveness. Retaining the API's current reliance on STAR scores maximizes its utility for the time being. This recommendation has four associated steps:

a. The legislature should develop and adopt criteria for assessment and selection of future factors for the API. The criteria used here could be employed or modified to establish a priori standards that are objective, measurable and related to the policy goals of the law.

b. The legislature should alter the language of the PSAA to reflect the use of standards for including new factors into the API. This step also involves the deletion of the requirement to include certificated personnel and student attendance rates as factors, since the analysis showed them to be poor candidates.

c. The California Department of Education should improve the quality of student, school, district and state data. Particular attention is needed in the areas of consistency of data collection and reporting, variable definition and true measurement.

d. In order to improve the data quality of the state education systems, it will be necessary to develop unique identifiers for students. While some improvement in the API and other data is feasible without them, identifiers are needed in order to make the most beneficial changes to the API. A unique statewide student identifier can be added to the STAR header sheet for each student so data can be linked across years. The details of how to develop a unique statewide student identifier that is compliant with federal and state law have already been worked out by the CSIS group.

Recommendation: Change the computations for the API.

By adopting a longitudinal approach to the API, sounder results will be obtained. The near-term strategy is to calculate district-level gain scores in a revolving panel design. Student identification in the short term can be done using a district-level identifier. A district identifier should be a short-term placeholder, though, because of the information


lost on the significant proportions of students transferring between districts. Even so, the near-term results will be better than school-wide averages, and even those can be further improved over time. Five action steps are linked to this recommendation:

a. In the next year calculate both the school average and a panel based gain score. The dual analysis will be necessary to create a transition in which districts identify each student in a unique fashion and follow them regardless of which school the student attends. It will also provide a basis for transitioning the API scores for each school. For the one-year transition, districts will use the existing exclusion rules in the legislation.

b. In the second year, implement gain-score based API scores.

c. When the state student identifier is in place, evolve the analysis to utilize a statewide revolving panel design to include the maximum number of students in the analysis.

d. When the remaining identifiers are in place, extend the revolving panel analysis to conduct value-added analysis of the API gain scores to differentiate school versus non-school factors affecting student achievement. This step can take advantage of the demographic and program information on the STAR header sheet to improve the model.

e. Alter the header sheet of the STAR tests to capture teacher turnover, teacher experience, and classroom student composition, to eventually enhance the value-added analysis.

Recommendation: Conduct regular quality assurance reviews of the API.

The performance of the API needs to be assessed annually. A typical quality assurance process for the API would track its utility as a predictive instrument. A key part of this process would be to identify an external benchmark against which the validity of the API can be gauged. The trend information will provide important baseline information to the CDE as additional changes are made to STAR and as other factors are raised for consideration as possible additions to the API. There are two steps involved in this recommendation:

a. Augment legislation to require regular reviews of the API structure and performance.

b. Create and adopt a procedure to conduct regular analytic reviews of the API and its performance.


Recommendation: Establish long-term student outcomes to reflect the ultimate success of the PSAA.

The truest test of the API and of student achievement is the ability of California students to pursue successful life courses after secondary school. The CDE should investigate longer-run outcomes for California's students, in conjunction with the Department of Labor or through CDE and the state system of higher education. Since the ultimate outcomes of public education are reaped after students complete their high school years, obtaining information on the longer-run experience of students can provide separate mechanisms for testing the reliability and validity of the API as an accountability system. Other states, like Tennessee, can serve as role models for the integration of information systems to enable this kind of analysis.

Notes

1. As indicated on the CDE website (http://www.cde.ca.gov/ststetests/star.qanda/augqa9906.pdf), the STAR augmentation tests were added to the program to improve the alignment of the assessment system with the state curriculum.
2. Rose, Lowell C. and Alec M. Gallup, "The Thirty-Second Annual Poll of the Public's Attitudes towards the Public Schools," Phi Delta Kappan, vol. 82 (September 2000), pp. 41-58. See performance comparisons in Moe, Terry M., Schools, Vouchers and the American People, Washington, DC: Brookings Institution Press, 2001.
3. Inconsistency in the measure (number of graduates completing A-G requirements) results from course completion being defined as a student receiving a C or better and from grading practices varying across schools.
4. "Report to the Governor and Legislature, Establishing School-Level Graduation and Attendance Rates, For Implementation of School Accountability (As Required by the Public Schools Accountability Act of 1999)," California Department of Education (September 1999).
5. This group is admittedly less homogeneous than the Minimum and Best groups because it includes states that test students every year for a fewer number of grades as well as those that test non-sequential grades. The incentives for both sub-groups still exceed those of the Minimum, although those for states with non-sequential grades are smaller than those for states with blocks of grades. Both sub-groups have weaker incentives than the Best practice states.
6. Rumberger, Russell W., Katherine A. Larson, Robert K. Ream and Gregory J. Palardy, "The Educational Consequences of Mobility for California Students and Schools," ERIC ED441040.
7. Diana v. State Board of Education, Civil Action No. C-70-37 RFP (N.D. Cal., Jan. 7, 1970; June 18, 1973); Larry P. v. Riles, Civil Action No. C-71-2270, 343 F. Supp. 1306 (N.D. Cal. 1972).
8. This result occurs because there are a smaller number of observations; therefore each one makes a greater (and potentially large) contribution to the measurement error and to the variance.
9. We recognize that National Percentile Rankings cannot be manipulated in this way. However, Normal Curve Equivalents -- a different transformation of raw scores -- will enable calculations directly from scores.
10. In fact we know that this assumption is unfounded, and that the error inherent in individual scores affects the likelihood of changes over time. This point is addressed in Kane, Thomas J. and Douglas O. Staiger, "Improving School Accountability Measures," NBER Working Paper No. W8156 (March 2001).
11. This point was illustrated for CREDO by John Chubb. We are grateful for the insight.
12. Kane, Thomas J. and Douglas O. Staiger, "Improving School Accountability Measures," NBER Working Paper No. W8156 (March 2001).


Bibliography

Alspaugh, J. W. (1996). "The Longitudinal Effects of Socioeconomic Status on Elementary School Achievement." ERIC (ED397120).

Caldas, S. J. (1993). "Reexamination of Input and Process Factor Effects on Public School Achievement." Journal of Educational Research 86(4): 206-14.

Clements, B. S. and R. K. Blank (1997). "What Do We Know about Education in the States: Education Indicators in State Reports." ERIC (ED414314).

Cohen, D. K. (1990). "A Revolution in One Classroom: The Case of Mrs. Oublier." Educational Evaluation and Policy Analysis 12(3): 311-329.

Cohen, D. K. and D. Loewenberg Ball (1990). "Policy and Practice: An Overview." Educational Evaluation and Policy Analysis 12(3): 233-239.

Cohen, D. K. and D. Loewenberg Ball (1990). "Relations between Policy and Practice: A Commentary." Educational Evaluation and Policy Analysis 12(3): 331-338.

Crone, L. J., C. H. Glascock, B. J. Franklin and S. E. Kochan (1993). An Examination of Attendance in Louisiana Schools. Louisiana.

Darling-Hammond, L. (1999). Teacher Quality and Student Achievement: A Review of State Policy Evidence. Center for the Study of Teaching and Policy.

Darling-Hammond, L. (1990). "Instructional Policy Into Practice: 'The Power of the Bottom Over the Top.'" Educational Evaluation and Policy Analysis 12(3): 339-347.

Dill, V. S. (1993). "Closing the Gap: Acceleration vs. Remediation and the Impact of Retention in Grade on Student Achievement." ERIC (ED364938).

Earthman, G. I. and L. Lemasters (1996). "Review of Research on the Relationship between School Buildings, Student Achievement, and Student Behavior." ERIC (ED416666).

Ehrenberg, R. G., R. S. Ehrenberg, D. I. Rees and E. L. Ehrenberg (1991). "School District Leave Policies, Teacher Absenteeism, and Student Achievement." Journal of Human Resources 26(1): 72-105.

Elam, S. M., L. C. Rose and A. M. Gallup (1994). "The 26th Annual Gallup Poll of the Public's Attitude toward the Public Schools." Phi Delta Kappan 76(1): 41-56.

Elam, S. M. and A. M. Gallup (1989). "The 21st Annual Gallup Poll of the Public's Attitudes toward the Public Schools." Phi Delta Kappan 71(1): 41-54.

Franklin, B. J. and L. J. Crone (1992). "School Accountability: Predictors and Indicators of Louisiana School Effectiveness." ERIC (ED354261).

Gallup, A. M. (1985). "The 17th Annual Gallup Poll of the Public's Attitudes toward the Public Schools." Phi Delta Kappan 67(1): 32-47.

Goldhaber, D. D. and D. J. Brewer (2000). "Does Teacher Certification Matter? High School Teacher Certification Status and Student Achievement." Educational Evaluation and Policy Analysis 22(2): 129-45.

Greenberg, D. and J. McCall (1974). "Teacher Mobility and Allocation." ERIC (ED102160).

Guryan, J. (2001). "Desegregation and Black Dropout Rates." National Bureau of Economic Research Working Paper No. 8345.

Hanushek, E. A., J. F. Kain and S. G. Rivkin (1999). Do Higher Salaries Buy Better Teachers? National Bureau of Economic Research Working Paper, Cambridge, MA.

Harvey, B. H. (1994). "To Retain or Not? There Is No Question." ERIC (ED369177).

Heinlein, L. M. and M. Shinn (2000). "School Mobility and Student Achievement in an Urban Setting." Psychology in the Schools 37(4): 349-57.

Jennings, T. A., T. M. Kovalski and J. T. Behrens (2000). "Predicting Academic Achievement Using Archival Mobility Data." ERIC (ED449181).

Kaufman, P., X. Chen, S. P. Choy, K. A. Chandler, C. D. Chapman, M. R. Rand and C. Ringel (1999). "Indicators of School Crime and Safety, 1998." Education Statistics Quarterly 1(1): 42-45.

Lillard, D. R. and P. P. DeCicca (2001). "Higher Standards, More Dropouts? Evidence Within and Across Time." Economics of Education Review 20(5): 459-473.

Loewenberg Ball, D. (1990). "Reflections and Deflections of Policy: The Case of Carol Turner." Educational Evaluation and Policy Analysis 12(3): 247-259.

Mao, M. X., M. D. Whitsett and L. T. Mellor (1997). "Student Mobility, Academic Performance, and School Accountability." ERIC (ED409380).

Mayer, D. P., et al. (2001). "Monitoring School Quality: An Indicators Report." Education Statistics Quarterly 3(1): 38-44.

Miller, J. W., B. A. McKenna, et al. (1998). "A Comparison of Alternatively and Traditionally Prepared Teachers." Journal of Teacher Education 49(3): 165-76.

Office of Educational Research and Improvement (1992). Parental Satisfaction with Schools and the Need for Standards. Washington, DC: U.S. Department of Education.

Peterson, P. L. (1990). "Doing More in the Same Amount of Time: Cathy Swift." Educational Evaluation and Policy Analysis 12(3): 261-280.

Peterson, P. L. (1990). "The California Study of Elementary Mathematics." Educational Evaluation and Policy Analysis 12(3): 241-245.

Pitkiff, E. (1993). "Teacher Absenteeism: What Administrators Can Do." NASSP Bulletin 77(551): 39-45.

Powers, S. and S. McConner (1997). "Project SOAR 1996-1997. Evaluation Report." ERIC (ED412269).

Richards, C. E. and T. M. Sheu (1992). "The South Carolina School Incentive Reward Program." Economics of Education Review 11(1): 71-86.

Rose, L. C. and A. M. Gallup (1999). "The 31st Annual Gallup Poll of the Public's Attitudes toward the Public Schools." Phi Delta Kappan 81(1): 41-56.

Rowland, C. (2000). "Teacher Use of Computers and the Internet in Public Schools." Education Statistics Quarterly 2(2): 72-75.

Rumberger, R. W., K. A. Larson, R. K. Ream and G. J. Palardy (1999). "The Educational Consequences of Mobility for California Students and Schools. Research Series." ERIC (ED441040).

Schacter, J. and C. Fagnano (1999). "Does Computer Technology Improve Student Learning and Achievement? How, When, and under What Conditions?" Journal of Educational Computing Research 20(4): 329-43.

Shepard, L. A. and M. L. Smith (1990). "Synthesis of Research on Grade Retention." Educational Leadership 47(8): 84-88.

Shields, C. M. and S. L. Oberg (2000). Year-Round Schooling: Promises and Pitfalls. Lanham, MD: Scarecrow Press.

Sykes, G. (1990). "Organizing Policy Into Practice: Reactions to the Cases." Educational Evaluation and Policy Analysis 12(3): 349-353.

Uehara, D. L. (1999). "Where Are the Teachers? A Policy Report on Teacher Attendance in the Pacific Region." ERIC (ED440925).

Webster, W. J., R. L. Mendro and T. O. Almaguer (1993). "Effectiveness Indices: The Major Component of an Equitable Accountability System." ERIC (ED358130).

Wenglinsky, H. (1998). "Does It Compute? The Relationship between Educational Technology and Student Achievement in Mathematics." ERIC (ED425191).

Wiemers, N. J. (1990). "Transformation and Accommodation: A Case Study of Joe Scott." Educational Evaluation and Policy Analysis 12(3): 281-292.

Wilson, S. M. (1990). "A Conflict of Interests: The Case of Mark Black." Educational Evaluation and Policy Analysis 12(3): 293-310.


Appendix A



Appendix B


The 2001 Base Academic Performance Index (API): Integrating the California Standards Test for English-Language Arts into the API

November 13, 2001


California Department of Education Policy and Evaluation Division


2001 Base Academic Performance Index (API): Integrating the Results from the California Standards Test in English-Language Arts (CST ELA) into the API

On September 5, 2001, the State Board of Education approved a methodology for integrating the results from the California Standards Test in English-Language Arts (CST ELA) into the 2001 Base Academic Performance Index (API), which the California Department of Education will release in January 2002. This paper:

• Explores the legal and policy background for the incorporation of results from the CST ELA into the API and describes the guiding principle of continuity
• Reviews step-by-step the methodology for incorporating the CST ELA results into the API
• Concludes with graphic illustrations of how to calculate the 2001 Base API

Background

Legal Requirements

The Public Schools Accountability Act (PSAA) of 1999 (Ch. 3 of the Statutes of 1999) requires the inclusion of results from the standards-based component of the Standardized Testing and Reporting (STAR) examination in the API [Education Code, Section 52052(a)(3)]. This becomes possible only when the State Board of Education (SBE) defines performance levels for the standards-based tests. This has already occurred for the CST ELA, beginning with the administration of the spring 2001 test.

Standards-Based Tests and the API

The present API methodology of aggregating individual norm-referenced results into five performance bands will easily accommodate standards-based reporting conventions. This is not an accident. The API was originally designed with precisely this eventuality in mind.

2001 Base API. The results of the CST ELA are reported at the school level in terms of the percentage of pupils scoring at certain performance levels. Following the terminology of the National Assessment of Educational Progress (NAEP), these levels were initially considered by the SBE to be below basic, basic, proficient, and advanced. After further review, the State Board decided that the below basic performance level should be further subdivided into two: below basic and far below basic. This subdivision results in five performance levels, making the API more sensitive to gains by low achievers on the CST ELA. It establishes a precedent for the future as other standards-based tests are incorporated into the API.

2002 Base API. It is anticipated that the 2002 results from the performance-based writing test in grades 4 and 7, as well as the 2002 standards-based results in mathematics, will be integrated into the 2002 Base API. The results from the writing test will be used along with results from the CST ELA to determine an individual student's ELA performance level. Therefore, it will not be necessary to introduce the writing test into the API as a separate component. Along with writing


and mathematics, it is also possible that results of the science and history/social science tests may be available for incorporation into the API in 2002.

2003 Base API. In 2003 the exact configuration of STAR may change with the possible introduction of a new norm-referenced test.

Guiding Principle: Continuity

In approving a methodology, the SBE accorded overriding importance to the principle of continuity. The present system of APIs and targets has now been in place for almost two years. It has created a culture, along with a set of expectations on the part of local educational agencies (LEAs), as to what constitutes significant growth and a high level of performance. Therefore, features of the present API system should be preserved to the greatest extent possible. In particular, the present API scale of 200 to 1000 and the performance target of 800 will be maintained.1 The performance level weighting factors for the new CST ELA indicator will be equivalent to those used for the Stanford 9 results. Finally, a Scale Calibration Factor (SCF) will ensure that the statewide average 2001 Base APIs for elementary, middle, and high schools will equal the statewide average 2001 Growth APIs by school type.

Steps in Calculating the 2001 Base API

Step #1: Apply the Performance Level Weighting Factors

In order to incorporate results from the CST ELA into the API, it is necessary to calculate a summary number for these results. Following the existing methodology for summarizing norm-referenced results, this number will be derived by first multiplying the percentage of students scoring at each performance level by a weighting factor and then summing the results of these calculations into a single number. This number represents a summary score for the CST ELA ("indicator score"). The system of weighting factors for summarizing the CST ELA results will be the same as for summarizing norm-referenced results (1000-875-700-500-200).

1 With the adoption of the Scale Calibration Factor (see Step #3 below), it is theoretically possible for a school to have an API in excess of 1000. However, it is likely that all of the attained scores on the 2001 Base API will fall between 200 and 1000.

California Standards Test, English Language Arts

      (A) Performance Level   (B) Weighting Factor   (C) Percent of Pupils   (D) Weighted Score
                                                          in Each Level          in Each Level (B x C)
  5   Advanced                      1000                     9%                      90.00
  4   Proficient                     875                    22%                     192.50
  3   Basic                          700                    33%                     231.00
  2   Below Basic                    500                    22%                     110.00
  1   Far Below Basic                200                    14%                      28.00

  a   Indicator Score                                                               651.50
  b   Indicator Weight                                                                 36%
  c   Total Weighted Score for Indicator (a x b)                                    234.54
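To make Step #1 concrete, the following minimal sketch in Python reproduces the table above. The function and variable names are our own illustration, not part of any CDE specification:

    # Step #1 as a calculation: multiply the percent of pupils at each
    # performance level (column C) by the weighting factor (column B),
    # then sum the products to get the indicator score (row a).

    WEIGHTING_FACTORS = {
        "Advanced": 1000,
        "Proficient": 875,
        "Basic": 700,
        "Below Basic": 500,
        "Far Below Basic": 200,
    }

    def indicator_score(percent_at_level):
        """Summary score for one test: sum of (weighting factor x percent)."""
        return sum(WEIGHTING_FACTORS[level] * pct
                   for level, pct in percent_at_level.items())

    cst_ela = indicator_score({
        "Advanced": 0.09, "Proficient": 0.22, "Basic": 0.33,
        "Below Basic": 0.22, "Far Below Basic": 0.14,
    })
    print(round(cst_ela, 2))         # 651.5  (row a, Indicator Score)
    print(round(cst_ela * 0.36, 2))  # 234.54 (row c, at a 36% indicator weight)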


Step #2: Integrate the CST ELA Indicator Score into the API

Content area weights. Once an indicator score for the CST ELA is calculated, it is integrated with the indicator scores for the norm-referenced tests (NRTs) in order to arrive at an API. According to the methodology adopted by the SBE, the current division of the API into content areas will be maintained. The CST ELA indicator score will therefore constitute a portion of the English language arts component of the API, which currently is 60% of the API in grades 2-8 (reading, language, and spelling) and 40% in grades 9-11 (reading and language). The relative weight of language arts versus mathematics remains the same as for the 2001 Growth API.

NRT and CST weights. Within the English language arts content area, the SBE has approved a weight of 60% CST results to 40% NRT results. This ratio will be applied fully in the 2001 Base API, not phased in as some have proposed. The following tables summarize the specific proportion that each content area will constitute, including the proportional split of CST and NRT results, for grades 2-8 and 9-11:

Elementary and Middle Schools, Grades 2-8

  Content Area                 % of API
  Math NRT (Stanford 9)          40%
  ELA NRT (Stanford 9)           24%
    Reading                     (12%)
    Language                     (6%)
    Spelling                     (6%)
  CST ELA                        36%


High Schools, Grades 9-11

  Content Area                 % of API
  Math NRT (Stanford 9)          20%
  ELA NRT (Stanford 9)           16%
    Reading                      (8%)
    Language                     (8%)
  CST ELA                        24%
  Science NRT (Stanford 9)       20%
  Social Science NRT             20%

Step #3: Application of the Scale Calibration Factor (SCF)

It is probable that the statewide average indicator score of a new API component will not coincide with the existing statewide average API. The integration of new components into the API is therefore likely to cause unintentional fluctuations between the same year's statewide average Growth and Base APIs. This type of fluctuation is counterintuitive, since both the Growth and Base API reflect performance by exactly the same students at exactly the same time. In order to eliminate these fluctuations, and thereby enhance the interpretability of the API, the SBE has approved the application of a neutral introduction factor, henceforth referred to as the Scale Calibration Factor (SCF), in the calculation of the Base API.

The SCF is an additive constant. It may be either a positive or a negative number, depending upon the impact of new components of the API. The 2001 Base API will mark the first use of the SCF. Simply put, the SCF is the difference between the statewide average 2000-2001 Growth API and the initial statewide average 2001 Base API by school type, as derived from Steps #1 and #2 above. The appropriate SCF will be added to or subtracted from each school's initial 2001 Base API in order to arrive at the school's final 2001 Base API.

Charts Illustrating How to Calculate the 2001 Base API

The summary charts that follow (Charts 1, 2, and 3) illustrate how to calculate the 2001 Base API, including the application of SCFs, for three grade span types (2-6, 7-8, and 9-11). As noted, the exact value of the SCFs will be determined only after the generation of the final 2000-2001 API Growth File in December 2001 and the preliminary 2001 API Base File in January 2002.
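Putting Steps #1 through #3 together, a school's Base API is a weighted sum of its indicator scores plus the SCF. The sketch below, our own illustration rather than CDE code, reproduces the grades 2-8 example worked out in Chart 1; the SCF value of 1.64 is the illustrative figure from that chart:

    # Steps #2 and #3 for a grades 2-8 school: weight each indicator score
    # by its share of the API, sum, and add the Scale Calibration Factor.

    CONTENT_WEIGHTS_GRADES_2_8 = {
        "CST ELA": 0.36,
        "Stanford 9 Reading": 0.12,
        "Stanford 9 Language": 0.06,
        "Stanford 9 Spelling": 0.06,
        "Stanford 9 Mathematics": 0.40,
    }

    def base_api(indicator_scores, weights, scf):
        """Initial Base API (Step #2) plus the Scale Calibration Factor (Step #3)."""
        initial = sum(weights[name] * score
                      for name, score in indicator_scores.items())
        return round(initial + scf)

    api = base_api(
        {
            "CST ELA": 651.50,             # from Step #1
            "Stanford 9 Reading": 644.00,
            "Stanford 9 Language": 678.00,
            "Stanford 9 Spelling": 656.25,
            "Stanford 9 Mathematics": 712.50,
        },
        CONTENT_WEIGHTS_GRADES_2_8,
        scf=1.64,                          # illustrative value only; see Chart 1
    )
    print(api)                             # 679, matching Chart 1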


CHART 1
How to Calculate the 2001 Base API for an Elementary School (Grades 2-6)

California Standards Test, English Language Arts

      (A) Performance Level   (B) Weighting Factor   (C) % of Pupils   (D) Weighted Score (B x C)
  5   Advanced                      1000                    9%                 90.00
  4   Proficient                     875                   22%                192.50
  3   Basic                          700                   33%                231.00
  2   Below Basic                    500                   22%                110.00
  1   Far Below Basic                200                   14%                 28.00

  a   Indicator Score                                                         651.50
  b   Indicator Weight                                                           36%
  c   Total Weighted Score for Indicator (a x b)                              234.54

Content area weights: ELA NRT 24%, ELA CST 36%, Math NRT 40% (portion of API: ELA 60%, Math 40%).

Stanford 9

      Performance Band   Weighting   Reading          Language         Spelling         Mathematics
                          Factor     %      B x %     %      B x %     %      B x %     %      B x %
  5   80-99th NPR          1000      13%    130.00    17%    170.00    12%    120.00    19%    190.00
  4   60-79th NPR           875      20%    175.00    20%    175.00    19%    166.25    30%    262.50
  3   40-59th NPR           700      29%    203.00    30%    210.00    32%    224.00    22%    154.00
  2   20-39th NPR           500      20%    100.00    19%     95.00    24%    120.00    16%     80.00
  1   1-19th NPR            200      18%     36.00    14%     28.00    13%     26.00    13%     26.00

  a   Indicator Score                       644.00           678.00           656.25           712.50
  b   Indicator Weight                       12%               6%               6%              40%
  c   Total Weighted Score (a x b)           77.28            40.68            39.38           285.00

2001 Base API = 234.54 + 77.28 + 40.68 + 39.38 + 285.00 + 1.64 (Scale Calibration Factor*) = 679

* This Scale Calibration Factor (SCF) is for illustrative purposes only. The exact value of the SCF will be available only after the generation of the final 2000-2001 API Growth data file in December 2001 and the preliminary 2001 API Base data file in January 2002.


CHART 2
How to Calculate the 2001 Base API for a Middle School (Grades 7-8)

California Standards Test, English Language Arts

      (A) Performance Level   (B) Weighting Factor   (C) % of Pupils   (D) Weighted Score (B x C)
  5   Advanced                      1000                    9%                 90.00
  4   Proficient                     875                   23%                201.25
  3   Basic                          700                   34%                238.00
  2   Below Basic                    500                   20%                100.00
  1   Far Below Basic                200                   14%                 28.00

  a   Indicator Score                                                         657.25
  b   Indicator Weight                                                           36%
  c   Total Weighted Score for Indicator (a x b)                              236.61

Content area weights: ELA NRT 24%, ELA CST 36%, Math NRT 40% (portion of API: ELA 60%, Math 40%).

Stanford 9

      Performance Band   Weighting   Reading          Language         Spelling         Mathematics
                          Factor     %      B x %     %      B x %     %      B x %     %      B x %
  5   80-99th NPR          1000       6%     60.00    17%    170.00    11%    110.00    16%    160.00
  4   60-79th NPR           875      26%    227.50    23%    201.25    23%    201.25    25%    218.75
  3   40-59th NPR           700      33%    231.00    28%    196.00    24%    168.00    22%    154.00
  2   20-39th NPR           500      20%    100.00    19%     95.00    20%    100.00    21%    105.00
  1   1-19th NPR            200      15%     30.00    13%     26.00    22%     44.00    16%     32.00

  a   Indicator Score                       648.50           688.25           623.25           669.75
  b   Indicator Weight                       12%               6%               6%              40%
  c   Total Weighted Score (a x b)           77.82            41.30            37.40           267.90

2001 Base API = 236.61 + 77.82 + 41.30 + 37.40 + 267.90 + (-1.22) (Scale Calibration Factor*) = 660

* This Scale Calibration Factor (SCF) is for illustrative purposes only. The exact value of the SCF will be available only after the generation of the final 2000-2001 API Growth data file in December 2001 and the preliminary 2001 API Base data file in January 2002.


CHART 3
How to Calculate the 2001 Base API for a High School (Grades 9-11)

California Standards Test, English Language Arts

      (A) Performance Level   (B) Weighting Factor   (C) % of Pupils   (D) Weighted Score (B x C)
  5   Advanced                      1000                    9%                 90.00
  4   Proficient                     875                   20%                175.00
  3   Basic                          700                   32%                224.00
  2   Below Basic                    500                   23%                115.00
  1   Far Below Basic                200                   16%                 32.00

  a   Indicator Score                                                         636.00
  b   Indicator Weight                                                           24%
  c   Total Weighted Score for Indicator (a x b)                              152.64

Content area weights: ELA NRT 16%, ELA CST 24%, Math NRT 20%, Science NRT 20%, Social Science NRT 20% (portion of API: ELA 40%, Math 20%, Science 20%, Social Science 20%).

Stanford 9

      Performance Band   Weighting   Reading         Language        Mathematics     Science         Social Science
                          Factor     %     B x %     %     B x %     %     B x %     %     B x %     %     B x %
  5   80-99th NPR          1000       9%    90.00    12%   120.00    21%   210.00    14%   140.00    11%   110.00
  4   60-79th NPR           875      17%   148.75    26%   227.50    21%   183.75    22%   192.50    24%   210.00
  3   40-59th NPR           700      23%   161.00    23%   161.00    20%   140.00    22%   154.00    28%   196.00
  2   20-39th NPR           500      23%   115.00    22%   110.00    19%    95.00    21%   105.00    19%    95.00
  1   1-19th NPR            200      28%    56.00    17%    34.00    19%    38.00    21%    42.00    18%    36.00

  a   Indicator Score                      570.75          652.50          666.75          633.50          647.00
  b   Indicator Weight                      8%               8%              20%             20%             20%
  c   Total Weighted Score (a x b)          45.66           52.20          133.35          126.70          129.40

2001 Base API = 152.64 + 45.66 + 52.20 + 133.35 + 126.70 + 129.40 + (-3.90) (Scale Calibration Factor*) = 636

* This Scale Calibration Factor (SCF) is for illustrative purposes only. The exact value of the SCF will be available only after the generation of the final 2000-2001 API Growth data file in December 2001 and the preliminary 2001 API Base data file in January 2002.


Appendix C


Review of Possible API Data Elements

Class size. The review of the literature by Glass and Smith (1979) concluded that reducing class size to fewer than 20 students has an impact on student achievement. The review was part of the basis for the Tennessee class size reduction experiment, in which some students were assigned to classes of fewer than 20 students and others were assigned to larger classes that included a teacher's aide. Nye, Hedges et al. (1999) found that K-3 students assigned to the smaller classes learned more than students assigned to the larger classes, even after they returned to regular-size classes. Hanushek, Kain et al. (1999) question the Tennessee findings because of student mobility and doubts about whether the appropriate statistical analysis was used. In evaluating California's class size reduction program, Bohrnstedt and Stecher (1999) found that while the student-to-teacher ratio was reduced, instruction did not change, and not every teacher working in the reduced-size classes had a teaching credential. The California results are also questionable because class size reduction started before the evaluators could collect pre-implementation data. Consequently, the debate on the effectiveness of class size reduction in improving students' learning is still ongoing.

College entrance examinations: Participation and average score. There is little research on the relationship between the number of students taking college entrance examinations and student achievement. The one study we were able to find was done by Webster (1994), who was trying to develop a model of school effectiveness. The author found that the best model was based on standardized test results, and that no significant improvement occurred when college entrance results were added to the model. Consequently, the evidence leaves open whether factors like college entrance examination rates are important to include in a school effectiveness model.

Condition of school facilities. Earthman and Lemasters (1996) reviewed the existing research on the relationship between the condition of school facilities and student achievement. The authors found that while early studies showed a weak relationship, later studies found significant correlations between student achievement and thermal environment, building color and interior painting, physical plant size, age of the buildings, and building maintenance. The authors, although thorough in their review, tended to ignore the possibility that the correlations existed because of external factors. For example, schools in higher SES neighborhoods are likely to be better maintained than schools in lower SES neighborhoods, and data from California's STAR program indicate that schools in high SES neighborhoods are likely to have higher test scores than schools in low SES neighborhoods. Combining these two pieces of information, we could conclude that students attending well-maintained schools are likely to have higher test scores. What this conclusion ignores, though, is the third factor, SES. Consequently, Earthman and Lemasters' review is a good start, but correlation does not mean causality.

Page 98: The Future of California’s Academic Performance Index · 2018. 4. 2. · Project Staff Margaret E. Raymond, Ph.D. Stephen H. Fletcher, Ph.D. Jeanene Harlick ... Their constructive

CREDO

Course offerings. The relationship between course offerings and student achievement is weak because course offerings and achievement are only indirectly linked. The Fall 1990 issue of Educational Evaluation and Policy Analysis was devoted to showing that having state standards does not mean students are receiving the same content. During interviews about the California Mathematics Framework, researchers from Michigan State found that teachers were informed about the state standards and said they were implementing them in their classrooms. However, when the researchers observed classroom instruction, they found that instruction did not correspond to the standards. If the link between curriculum and student achievement runs through instruction, then the relationship between curriculum and achievement is weak because of the variation in instruction.

Dropout rate. The extant literature tends to view student dropout rate as an outcome of an educational program, whether the program is desegregation (Guryan, 2001) or higher curriculum and performance standards (Lillard and DeCicca, 2001). The challenge with using a dropout rate is how to define it and how to measure it. The US Department of Education has two dropout rates, event and status (http://www.nces.ed.gov/fastfacts/display.asp?id=16). An event dropout rate is similar to the calculation used by California, in that the rate is defined by the number of students completing grade 12 divided by the number of students enrolled at the beginning of grade 12. The event rate does not take into account the number of students moving in and out during grade 12 or the number of students who left school before grade 12. Although we used one year to illustrate the event rate, a four-year event rate could also be calculated by comparing the number of high school graduates to the ninth grade enrollment four years earlier. The status dropout rate is calculated by surveying people between the ages of 16 and 24, asking them whether they are currently enrolled in high school, graduated, or dropped out, and dividing the number of dropouts by the sample total. This rate can then be adjusted statistically to reflect the status dropout rate for the nation.

Graduation rate. The concern with using graduation rate as a descriptor is based on the variety of ways of calculating a graduation rate and the limited research on the relationship between student achievement and graduation rate. First, Clements and Blank (1997) surveyed states and found graduation rates were calculated using three distinct methods (illustrated in the sketch following the list):

1. Cohort: Total number of graduates divided by the number of ninth grade students enrolled four years earlier.

2. Leaver: Total number of graduates divided by the number of graduates plus the number of dropouts over four years.

3. Twelfth Grade Event: Number of graduates divided by the twelfth grade enrollment the previous fall.
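The three definitions can produce quite different rates for the same school. The sketch below is purely illustrative: the counts are invented and the function names are ours.

    # The same graduating class can yield three different "graduation rates"
    # depending on the definition chosen. All counts below are hypothetical.

    def cohort_rate(graduates, ninth_grade_enrollment_4_years_ago):
        """Method 1 (Cohort): graduates / ninth grade enrollment four years earlier."""
        return graduates / ninth_grade_enrollment_4_years_ago

    def leaver_rate(graduates, dropouts_over_4_years):
        """Method 2 (Leaver): graduates / (graduates + dropouts over four years)."""
        return graduates / (graduates + dropouts_over_4_years)

    def event_rate(graduates, twelfth_grade_fall_enrollment):
        """Method 3 (Twelfth Grade Event): graduates / prior-fall 12th grade enrollment."""
        return graduates / twelfth_grade_fall_enrollment

    print(round(cohort_rate(400, 550), 2))   # 0.73
    print(round(leaver_rate(400, 90), 2))    # 0.82
    print(round(event_rate(400, 430), 2))    # 0.93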

In terms of usage, fourteen states used the first method, six states used the second, seven states used the third, and fourteen made no calculation at all. Consequently, there is little consensus across states about how to calculate a graduation rate. Second, the research on the relationship between student achievement and graduation rate is limited. In 1984, Webster (1994) tried to develop a method of identifying effective schools in terms of student achievement for the Dallas Independent School District. The 1984 analysis used only student scores from a norm-referenced test. In 1992, the authors did a similar analysis, but this time used seven descriptors:

1. The norm-referenced test scores,
2. Scores on a state criterion-referenced test that included a writing sample,
3. 143 course-specific criterion-referenced tests,
4. Student promotion rate,
5. Graduation rate,
6. Attendance rate, and
7. The percentage of students taking the Scholastic Aptitude Test and their average scores.

The initial comparison of the 1992 results with the 1984 results found significant similarity, meaning that the additional six descriptors did not provide any additional information for identifying effective schools. Further comparisons were later stopped, though, because of concerns about schools that had changed principals between 1984 and 1992. Based on these studies, the usefulness of graduation rate is questionable because of the multiple ways it can be defined and the limited research on how it relates to student achievement.

Number of computers in a school. Although there have been no studies in the last five years looking at the relationship between the number of computers in a school and student achievement, there have been some studies that suggest factors in the relationship. For example, Rowland (2000) found that ninety-nine percent of schools have computers, but Wenglinsky (1998) found that fourth and eighth grade African American students are less likely to have a computer in their classroom than Caucasian students. Wenglinsky also found that non-poor and suburban students are more likely to use computers for higher order thinking activities than poor, urban, and rural students. After reviewing several large-scale studies of the use of computers in classrooms, Schacter and Fagnano (1999) concluded that computers in a classroom will not necessarily improve student learning unless the software being used by students is based on sound principles of instructional design and student learning. Based on these studies, if an indicator of students' access to technology is to be included in the API, it would need to be defined in terms of the ratio of students to computers and be associated with how the computers are used.

Number of instructional minutes. Currently the state of California collects data on how many minutes of instruction a student receives during a school year. Although there is a correlation between time spent in class and student learning, a review by Berliner and Rosenshine (1977) found that the important factor is how the time is spent, not just the number of minutes of instruction. For example, the authors reviewed one study that found that students learned more if the instructional time was focused on one task, rather than a teacher switching from individual instruction to reading an announcement from the principal to starting a large group discussion. Consequently, instructional minutes collected in the aggregate would not be a useful element in the API.

Number of non-credentialed teachers. Darling-Hammond (1999) has argued that improving student achievement requires having a properly trained teacher in each classroom. The argument is supported by work such as that of Franklin and Crone (1992), who found, in studying teachers and students in 1,336 public schools in Louisiana, that student achievement was correlated with the percentage of certified teachers within a school. Other researchers, though, have not found evidence to support the certification claim. For example, Miller, McKenna et al. (1998) found no difference between a group of 41 traditionally prepared teachers and 41 alternatively certified teachers in terms of teaching behaviors or student achievement after each group had been teaching for three years. Goldhaber and Brewer (2000) found that student achievement in mathematics depended more on whether a teacher had subject-specific training than on whether the teacher had a teaching credential. The conclusion, then, is that the percentage of uncertified teachers in a school may not, by itself, be a good indicator of the quality of instruction in a school.

Number of pupil hours in after school programs. By definition, the Elementary School Intensive Reading Program and the Intensive Algebra Instruction Academies are before and after school programs. We conducted a literature review on the relationship between after school reading and math programs and student achievement, and found no work in the last seven years. We expanded our search to after school programs in general and, although some research has been done on their effectiveness, the results can be questioned. The general problem with the studies is that the after school program was not the only change occurring within a school. For example, Molnar (1999), studying the Student Achievement Guarantee in Education (SAGE) program in Wisconsin, found that the program included before and after school activities, class size reduction, changes in the curriculum, and teacher professional development. Any changes in student achievement, then, cannot be attributed to a particular feature of the program, but only to the program as a whole. With respect to the particular California programs, we found a lack of consistency in implementation throughout the state. Discussions with CDE specialists indicated that the reading program does not have a definition of a poor reader, and the algebra program is implemented in only a few districts. Consequently, even if the literature review were more promising, the data from the programs would have limited use for the API. Based on the literature review and how the programs operate in the state, we judge the relationship between after school programs and student achievement to be weak.

Number of support personnel. The SARC defines support personnel as counselors, librarians, psychologists, social workers, nurses, speech/language/hearing specialists, and non-teaching resource specialists (http://www.cde.ca.gov/ope/sarc/template/blanktemp.asp). A review of the existing literature indicates that work has been done on how support personnel can play a role in improving student achievement. For example, Brown (1999) presented several interventions that could be carried out by counselors, including study skill groups, time management training, and achievement motivation groups. However, we were not able to find any studies that looked specifically at the relationship between the number of support personnel in a school and student achievement. Consequently, we consider this variable to have a weak relationship to student achievement.

Number of students enrolled in advanced classes. Little research has been done on the relationship between student achievement and enrollment in advanced mathematics and science classes. The only work in the last five years was done by Mayer, Mullens et al. (2001), who looked at enrollment trends in higher-level science and mathematics classes but did not look at changes in student achievement, either generally or with respect to science and mathematics.

Parent/community satisfaction. This descriptor is weak because of the limited variation in parent satisfaction. Research done through the US Department of Education (Office of Educational Research and Improvement, 1992) found little variation in parents' level of satisfaction with a school in terms of students' mathematics achievement. Specifically, seventy-five percent of parents whose children scored in the lowest quartile were satisfied with their schools, and eighty-seven percent of parents whose children scored in the highest quartile were satisfied. The annual satisfaction survey conducted by Phi Delta Kappa (Gallup, 1985; Elam, 1989; Elam, 1994; Rose, 1999) found that the percentage of parents rating their children's school as an A or a B varied only from seventy-one percent in 1985 to sixty-six percent in 1999. The lack of variation indicates that if a descriptor related to parent satisfaction were added to the API, it would not help the rating system discriminate between a good school and a poor school.

Percent of students passing end of course examinations. Without reviewing the literature, we know the relationship between student achievement and this descriptor is strong, because the descriptor is itself a measure of student achievement.

Percent of students taking the state test. This descriptor is related to student achievement, but its usefulness in the API may be limited by the legislative requirement to test all pupils. This requirement leads to little variation among schools in the percentage of students taking the test, so it would not improve the API's ability to distinguish effective from non-effective schools.

Principal mobility rate. We were not able to find any studies that looked at the relationship between principal transiency and student achievement.

Retention rate. The extant literature generally argues that retention of students has negative effects on student achievement. Shepard (1990) found that retained students, on average, did worse when they were promoted than students of similar ability who were not retained, and that dropouts were five times more likely to have repeated a grade than high school graduates. Dill (1993) reviewed the existing literature and found that research does not support retaining students to improve their achievement. Dill concluded that retention had negative emotional and academic consequences for students, and that retained students were more likely to eventually drop out of school. Harvey (1994), using the extensive database on student testing and class size in Tennessee, found that students who were retained were behind their peers after the next year, even if the retainees were in smaller classes. The results led Harvey to argue against retention and in favor of other methods of remediation for struggling students. These studies, and their reviews, indicate that there is a strong negative relationship between retention rate and student achievement. If a retention rate descriptor is included in the API, the negative direction of the relationship will need to be taken into account.

School crime rate. The relationship between school crime rate and student achievement is weak. Data collected by Kaufman (1999) in 1996-1997 indicate that forty-three percent of schools reported no incidents of serious crime and thirty-seven percent reported only one to five incidents; in other words, eighty percent of schools reported fewer than one incident of serious crime a month. The rarity of crime indicates that a school crime rate would be an ineffective indicator for the API, because there is minimal variation across schools.

School expenditures. The weakness of this indicator is that the link between school funding and student learning has not been established. Krueger (1998) has shown a relationship between funding and achievement, but the study has been criticized for including funding that is not directly related to students and for aggregating test scores across grade levels. Grissmer, Flanagan et al. (2000) found that a relationship between funding and achievement may exist, but that the relationship varies with how much funding was available to a school prior to the increase. Still other studies have shown no relationship between funding and achievement. For example, Hanushek (1989) found that while school funding increased during the 1980s and 1990s, student scores on the National Assessment of Educational Progress examinations decreased, particularly at the middle and high school levels.

Student attendance rate. Within the last ten years, research on the relationship between attendance rates and student achievement has been done by the Louisiana State Department of Education. The research was conducted because the public wanted a student attendance rate included in the school report cards, but the State Department of Education could not find evidence in the extant literature to support inclusion of the measure. Caldas (1993) and Crone (1993) then used school-level data to compare attendance rates with percent-passing rates on the state assessments in grades 3, 5, and 7 and the passing rate on the Graduate Exit Examination. The researchers found that the strongest correlation between attendance rate and passing rate occurred for the group consisting of Caucasian students in metropolitan high schools; the authors did note that Caucasian students were a minority in these types of schools. It should also be noted that there was a variation of more than fifteen percentage points in attendance rates between low SES metropolitan schools and high SES schools, and this variation was not apparent until attendance rates were disaggregated by SES and location. The analysis was only effective because researchers were able to disaggregate attendance data by school location, SES, and student ethnicity.

Although these studies suggest that student attendance rate could be used as a data element in a rating system, we have several concerns about the results. First, the researchers hypothesized that how well a student did on a test was related to the percentage of days she/he attended school. The data, though, only supported the hypothesis for a subgroup of students in a particular type of school. This result implies that if you are an African American student attending a predominantly African American high school, it matters less what percentage of days you attend school than if you were attending a predominantly Latino or Caucasian high school. We find this result raises more questions than it answers. Second, the authors performed their analysis on school-level attendance and assessment data. The analysis therefore ignored any variation around the average, thereby losing information about students. We believe a more informative approach would have been to conduct the analysis with student-level data and then aggregate the results to the school level.

Student mobility. Common knowledge suggests that student achievement and student mobility are related. Rumberger (1999), using the National Educational Longitudinal Survey, found that students who move two times or more in high school are more likely not to graduate (graduation rates of 59% for movers versus 93% for non-movers). Mao (1997) used data from the Texas Public Education Information Management System and the Texas Assessment of Academic Skills to follow students from 1992 to 1996 (starting in grade 1). The authors found that student mobility was related to student achievement and that the relationship became stronger in schools with high student turnover rates or high percentages of economically disadvantaged students. Similar findings came from Jennings (2000), using data from Arizona elementary schools; the authors found that the relationship between achievement and mobility was moderated by English proficiency, poverty, and a student's absence rate. Research also indicates that the relationship between student mobility and achievement may be more critical at the early stages of learning. For example, Heinlein and Shinn (2000) found a relationship between student mobility between grades 3 and 6 and sixth grade achievement, but it disappeared once they controlled for third grade achievement. This finding suggests that mobility may be more critical in the early grades, when basic skills are being learned. Taken together, these findings indicate a strong relationship between student achievement and mobility, especially for students who are low income and/or limited English proficient; the relationship may be even stronger in the early elementary grades.

Suspension rate. While standardized test scores are used to evaluate the academic effectiveness of an educational program, suspension rates are used to evaluate its behavioral effectiveness. For example, Powers (1997) used suspension rates in an evaluation of Project SOAR in the Tucson Unified School District, a project designed to improve student achievement and behavior by providing mentors to at-risk students. The authors found that students' academic indicators did not change as a result of program participation, but their attendance rates improved and suspension rates decreased. Alspaugh (1996) compared student achievement in mathematics and reading across ten high and ten low socioeconomic elementary schools in the Midwest. The author found a -.66 correlation between suspension rate and reading achievement and a -.65 correlation between suspension rate and mathematics achievement. Although the academic indicators in these studies can be questioned, the results imply that suspension rate is a common indicator of whether a program improves student behavior. The Alspaugh study suggests that suspension rate may also be strongly related to student achievement.

Teacher attendance. The relationship between teacher attendance and student achievement is moderate. In a bulletin from the National Association of Secondary School Principals, Pitkiff (1993) indicated that teacher absenteeism is likely to be higher in schools with lower student achievement, in schools with high percentages of minority and poor students, and in urban school districts. However, when the achievement of all students is studied, the relationship is not as clear. For example, Richards (1992), in studying South Carolina's School Incentive Reward Program, found some gains in student achievement but little change in teacher attendance. Ehrenberg et al. (1991) found a minimal correlation between the number of days of leave used by teachers and student achievement (.02 to .08, depending on the grade level). These studies were done with students in a variety of schools.

Teacher mobility. The relationship between teacher transiency and student achievement may reflect not what happens in a classroom but teachers' desire to improve their working environment. We found two studies supporting the argument that teachers move to improve their working environment. Greenberg and McCall (1974) studied teacher movement within the San Diego school system and found that new teachers were most likely to be assigned to low SES schools; as teachers gained seniority, they transferred to higher SES schools. Hanushek, Kain et al. (1999) support the Greenberg and McCall findings, as they found that teacher transiency was related to student characteristics such as income, race, and achievement. These studies suggest that a relationship between teacher transiency and student achievement may exist, but the results are inconclusive because of the limited research.

Teacher salaries. The extant literature on the relationship between teacher salaries and student achievement is limited. The studies that we did find made substitutions for one or the other of the two variables. For example, Hanushek (1971) used hours of graduate education and years of experience as a substitute for teacher salary, and found that they were unrelated to student achievement. Ballou (1997) did use teacher salaries in their analysis, but substituted principals' evaluations for


student achievement, and found little relationship between the variables. Consequently, we define the relationship as weak.

Year-round school status. A review of the year-round school literature indicates that a descriptor for this variable has a moderate relationship to student achievement. Shields (2000) reviewed the existing literature on year-round schools and found that nine studies reported a positive relationship with student achievement, four studies found that enrollment in a year-round school made no significant difference to student achievement, and one found that attending a year-round school had a negative relationship to achievement for students with low IQ (less than 100). The researchers also conducted their own study comparing the achievement of students attending year-round and traditional schools in a large urban district and found no significant difference. Based on these findings, we can only conclude that year-round school status and student achievement have a moderate relationship.

References

Alspaugh, J. W. (1996). "The Longitudinal Effects of Socioeconomic Status on Elementary School Achievement." ERIC (ED397120): 16.

Ballou, D. and M. Podgursky (1997). Teacher Pay and Teacher Quality. Kalamazoo, MI: Upjohn Institute for Employment Research.

Berliner, D. C. and B. Rosenshine (1977). "The Acquisition of Knowledge in the Classroom." In R. J. Spiro, W. E. Montague and R. C. Anderson (eds.), Schooling and the Acquisition of Knowledge. Hillsdale, NJ: Lawrence Erlbaum Associates: 375-396.

Bohrnstedt, G. W. and B. M. Stecher (1999). Class Size Reduction in California 1996-1998: Early Findings Signal Promise and Concerns.

Brown, D. (1999). "Improving Student Achievement: What School Counselors Can Do." ERIC (ED435895).

Caldas, S. J. (1993). "Reexamination of Input and Process Factor Effects on Public School Achievement." Journal of Educational Research 86(4): 206-214.

Clements, B. S. and R. K. Blank (1997). "What Do We Know about Education in the States: Education Indicators in State Reports." ERIC (ED414314).

Crone, L. J., C. H. Glascock, B. J. Franklin and S. E. Kochan (1993). "An Examination of Attendance in Louisiana Schools." Annual Meeting of the Mid-South Educational Research Association, New Orleans, LA, November 10-12, 1993.

Darling-Hammond, L. (1999). Teacher Quality and Student Achievement: A Review of State Policy Evidence. Center for the Study of Teaching and Policy.

Dill, V. S. (1993). "Closing the Gap: Acceleration vs. Remediation and the Impact of Retention in Grade on Student Achievement." ERIC (ED364938).

Earthman, G. I. and L. Lemasters (1996). "Review of Research on the Relationship between School Buildings, Student Achievement, and Student Behavior." ERIC (ED416666).

Ehrenberg, R. G., R. A. Ehrenberg, D. I. Rees and E. L. Ehrenberg (1991). "School District Leave Policies, Teacher Absenteeism, and Student Achievement." Journal of Human Resources 26(1): 72-105.

Elam, S. M., L. C. Rose and A. M. Gallup (1994). "The 26th Annual Gallup Poll of the Public's Attitudes toward the Public Schools." Phi Delta Kappan 76(1): 41-56.

Elam, S. M. and A. M. Gallup (1989). "The 21st Annual Gallup Poll of the Public's Attitudes toward the Public Schools." Phi Delta Kappan 71(1): 41-54.

Franklin, B. J. and L. J. Crone (1992). "School Accountability: Predictors and Indicators of Louisiana School Effectiveness." ERIC (ED354261).

Gallup, A. M. (1985). "The 17th Annual Gallup Poll of the Public's Attitudes toward the Public Schools." Phi Delta Kappan 67(1): 32-47.

Glass, G. V. and M. L. Smith (1979). "Meta-analysis of Research on Class Size and Achievement." Educational Evaluation and Policy Analysis 1(1): 2-16.

Goldhaber, D. D. and D. J. Brewer (2000). "Does Teacher Certification Matter? High School Teacher Certification Status and Student Achievement." Educational Evaluation and Policy Analysis 22(2): 129-145.

Greenberg, D. and J. McCall (1974). "Teacher Mobility and Allocation." Journal of Human Resources 9(4): 480-502.

Grissmer, D., A. Flanagan, et al. (2000). Improving Student Achievement: What State NAEP Test Scores Tell Us. Santa Monica, California: RAND.

Guryan, J. (2001). Desegregation and Black Dropout Rates. National Bureau of Economic Research.

Hanushek, E. A. (1971). "Teacher Characteristics and Gains in Student Achievement: Estimation Using Micro Data." American Economic Review 60(2): 280-288.

Hanushek, E. A. (1989). "The Impact of Differential Expenditures on School Performance." Educational Researcher 18(4): 45-51.

Hanushek, E. A., J. F. Kain, et al. (1999). Do Higher Salaries Buy Better Teachers?

Harvey, B. H. (1994). "To Retain or Not? There Is No Question." ERIC (ED369177).

Heinlein, L. M. and M. Shinn (2000). "School Mobility and Student Achievement in an Urban Setting." Psychology in the Schools 37(4): 349-357.

Jennings, T. A., T. M. Kovalski and J. T. Behrens (2000). "Predicting Academic Achievement Using Archival Mobility Data." ERIC (ED449181).

Kaufman, P., X. Chen, S. P. Choy, K. A. Chandler, C. D. Chapman, M. R. Rand and C. Ringel (1999). "Indicators of School Crime and Safety, 1998." Education Statistics Quarterly 1(1). http://nces.ed.gov/pubs99/quarterly/spring/4-elementary/4-esq11-e.html

Krueger, A. B. (1998). "Reassessing the View that American Schools Are Broken." FRBNY Economic Policy Review.

Lillard, D. R. and P. P. DeCicca (2001). "Higher Standards, More Dropouts? Evidence Within and Across Time." Economics of Education Review 20(5): 459-473.

Mao, M. X., M. D. Whitsett and L. T. Mellor (1997). "Student Mobility, Academic Performance, and School Accountability." ERIC (ED409380).

Mayer, D. P., J. E. Mullens, et al. (2001). "Monitoring School Quality: An Indicators Report." Education Statistics Quarterly 3(1): 38-44.

Miller, J. W., B. A. McKenna, et al. (1998). "A Comparison of Alternatively and Traditionally Prepared Teachers." Journal of Teacher Education 49(3): 165-176.

Molnar, A., P. Smith and J. Zahorik (1999). Evaluation Results of the Student Achievement Guarantee in Education (SAGE) Program.

Nye, B., L. V. Hedges, et al. (1999). "The Long-Term Effects of Small Classes: A Five-Year Follow-Up of the Tennessee Class Size Experiment." Educational Evaluation and Policy Analysis 21(2): 127-142.

Office of Educational Research and Improvement (1992). Parental Satisfaction with Schools and the Need for Standards. Washington, DC: U.S. Department of Education.

Pitkiff, E. (1993). "Teacher Absenteeism: What Administrators Can Do." NASSP Bulletin 77(551): 39-45.

Powers, S. and S. McConner (1997). "Project SOAR 1996-1997. Evaluation Report." ERIC (ED412269).

Richards, C. C. and Tian Ming Sheu (1992). "The South Carolina School Incentive Reward Program: A Policy Analysis." Economics of Education Review 11(1): 71-86.

Rose, L. C. and A. M. Gallup (1999). "The 31st Annual Gallup Poll of the Public's Attitudes toward the Public Schools." Phi Delta Kappan 81(1): 41-56.

Rowland, C. (2000). "Teacher Use of Computers and the Internet in Public Schools." Education Statistics Quarterly 2(2): 72-75.

Rumberger, R. W., K. A. Larson, R. K. Ream and G. J. Palardy (1999). "The Educational Consequences of Mobility for California Students and Schools. Research Series." ERIC (ED441040).

Schacter, J. and C. Fagnano (1999). "Does Computer Technology Improve Student Learning and Achievement? How, When, and under What Conditions?" Journal of Educational Computing Research 20(4): 329-343.

Shepard, L. A. and M. L. Smith (1990). "Synthesis of Research on Grade Retention." Educational Leadership 47(8): 84-88.

Shields, C. M. and S. L. Oberg (2000). Year-Round Schooling: Promises and Pitfalls. Lanham, MD: Scarecrow Press.

Webster, W. J. (1994). "Effectiveness Indices: A 'Value Added' Approach to Measuring School Effect." Studies in Educational Evaluation 20(1): 113-145.

Wenglinsky, H. (1998). "Does It Compute? The Relationship between Educational Technology and Student Achievement in Mathematics." ERIC (ED42519).


Appendix D


Analysis of the Reliability and Validity of the Current Computational Method for the API

This section draws upon the steps involved in calculating the API for a school, presented in Appendix B.

Reliability

Reliability asks whether a given measure is accurate. One could think of using a 32-inch yardstick to measure a room: the final measures would appear sound, but they would be erroneous by a considerable degree. We considered this problem, called measurement error, in the steps taken to calculate the API. Since the inputs to the API, the STAR scores, already carry a degree of error like every test, it is important that the API add as little to the existing error as possible. We also looked at the calculations to see whether they treat the academic performance of students impartially and whether the results fairly map the real state of school performance; these latter points are referred to as sampling error. The API should be structured to minimize both types of problems. We find that the API currently contains considerable amounts of both errors, and thus the results are not as reliable as they could be.

Measurement error is created by the rules used to calculate the API. To explain it, consider a school in which all students take the STAR exam and all qualify for inclusion. (We will relax this assumption later.) The rules for creating a school score are the same for all the test parts, so only one test need be considered. The students' test scores are expressed in National Percentile Rankings (NPR) and grouped into five performance bands from low to high. The proportion of the school's students who score in each band is then multiplied by a weighting factor, and these products for all the bands are added together for the school score on that test.

The wide quintile bands impose a degree of measurement error on the API. Regardless of the NPR score actually earned by the student, the calculations in effect substitute the lowest score in the band in which the student's score falls. Every student placed in the band is weighted the same, so the lowest NPR in the band becomes the de facto score for each student in the band. So if a student scores a 58 NPR, he falls into the third band and has the same impact as a student at the 50th NPR. The same is true for all the students in every band. Unless the student happens to score at the lower boundary, the difference between the true score and the lowest band score goes uncounted. This is the first source of measurement error.

This source of measurement error arises directly from the use of performance bands: what start as 100 NPR categories of performance become 5 performance bands. The implications for measurement loss are clear. The wider the band, the larger the potential distortion in the accounting of individual student achievement.
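The band-substitution effect described above is easy to demonstrate. The sketch below is our own construction using the API's published band boundaries and weighting factors; it is illustrative, not CDE code:

    # Map a National Percentile Rank (1-99) to its band's weighting factor.
    # Every NPR inside a band contributes identically, so within-band
    # differences (and within-band improvement) vanish from the API.

    BANDS = [(80, 1000), (60, 875), (40, 700), (20, 500), (1, 200)]  # (lowest NPR, weight)

    def band_weight(npr):
        """Return the weighting factor for a student's NPR score."""
        for lowest_npr, weight in BANDS:
            if npr >= lowest_npr:
                return weight
        raise ValueError("NPR must be between 1 and 99")

    print(band_weight(58) == band_weight(40))  # True: a 58 counts exactly like a 40
    print(band_weight(59), band_weight(60))    # 700 875: one point across a boundary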


Moving across years, it is easy to see that the use of bands in the API creates significant problems with measurement error. If the incentives created by the API are effective, schools will work to improve the academic achievement of their students. Lumping scores into bands gives greater import to performance at the edges of the bands: real scores can improve significantly within a band and never be revealed in the API. This point is especially meaningful when one considers that individual test score moves of ten percentile rankings are considered exceptional. The only scores that affect the API are those that move across bands from the prior year, since they change the proportion of students in each band.i That two students with equivalent improvement could contribute to the API differently, solely on the basis of their starting points, highlights the measurement problems that exist.

Two important implications arise from these findings. First, small schools are likely to experience more fluctuation from year to year than large schools (a simulation sketch below illustrates the point). This result is due to the greater chance that in any given year a small school will have students in each band clustered at the top or bottom of the range, and therefore susceptible to jumping bands for better or worse in the API results. Second, it will be harder for large schools to show improvement over time, all other factors being equal. Having larger numbers of students in each band creates an expected average score closer to the midpoint of the range, with narrower variance than smaller schools, suggesting that the degree of change will have to be larger to register improvement on the API.

This problem becomes especially acute when school populations are disaggregated into subgroups.ii Even if a minimum of 100 students in a subgroup is met, spreading their scores across five bands leaves few students in each band, and therefore a greater likelihood that they are distributed in a manner that is poised for band shifts in the coming year. Thus this well-intentioned provision of the API may in fact be creating biased figures of group behavior over time.

As currently structured, the API is too blunt an instrument to capture the full extent of student improvement. On a stand-alone basis, the measurement error could be diminished by increasing the number of bands. At the limit, there would be one hundred bands, one for each NPR; each would have its associated weight, and the remaining calculations would continue intact. At that point, however, the assumption of no individual measurement error breaks down and becomes influential in the accuracy of the API. Wide bands made it likely that the majority of scores in a band belonged there even if each was inaccurate to a degree. With a band for every NPR score, the within-band likelihood of a score belonging in its band equals the individual measurement error of the test score itself.
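As promised above, here is a minimal simulation of the small-school point. The uniform NPR distribution assumed here is purely hypothetical, so only the qualitative contrast between the two school sizes is meaningful:

    # Year-to-year variability of a banded school score, for a small school
    # (100 students) versus a large one (2,000), with student NPRs drawn
    # uniformly at random each "year."

    import random
    import statistics

    BANDS = [(80, 1000), (60, 875), (40, 700), (20, 500), (1, 200)]

    def school_score(n_students):
        """Banded average score for one simulated school-year."""
        def weight(npr):
            return next(w for lowest, w in BANDS if npr >= lowest)
        return statistics.mean(weight(random.randint(1, 99))
                               for _ in range(n_students))

    random.seed(2)
    for n in (100, 2000):
        yearly_scores = [school_score(n) for _ in range(200)]
        print(n, round(statistics.stdev(yearly_scores), 1))
    # The 100-student school's scores swing far more from year to year.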


Reliability also considers how representative the final API score is of the true academic achievement of the students in a school, measurement error aside. One might consider the analogy of taking the height measurements of 100 individuals and using that set of measures to guess the average height of a country's population. Similar problems arise with the API.

One type of problem is common to any rating system based on cross-sectional snapshots of performance. Over time a school has groups of students moving through its grades. At any one point in time, it may have a collection of students who are better or worse than the typical student, a situation known as student variation. Those better or worse students' test scores are factored into the API and will elevate or depress it independent of the efforts of the school. As currently structured, the API cannot discriminate between student variation and true improvement in school performance. This is the first source of sampling error.

A second source of sampling error arises from the practice of averaging scores within each percentile band, discussed above. Since the only scores that affect the change in a school's API are those that, by moving into a different band, change the proportions within each band, they represent a subset of all the scores for a school. What happens to those scores determines the fate of the school's API. Thus the API is based on a sample of students that is not representative of the students as a whole. This effect is even more acute when disaggregating school scores by ethnicity and socio-economic status.

Finally, the rules about exclusion of test scores create a third source of sampling error. There may be valid concerns behind many of the exclusion rules, but a final API that does not reflect the full experience of all students suffers from sampling error. An example of this form of sampling error is the exclusion of students who were not enrolled in the district in the previous year. Comparing the 2000 statistics for STAR and the API indicates that 14 percent of students taking the STAR were excluded from the API. (Students excluded from the API for non-standard test administration would also have been excluded from the STAR results.) They represent a significant group of students who benefit from significant efforts by the school, but they are excluded from the API calculation.

These points show that the API has several reliability problems. It inadvertently creates sampling error through its calculation and exclusion rules. In addition, the method of averaging employed by the API creates a distorted picture of the performance of schools. Below, consideration is given to whether averaging in any form is the best means of capturing school performance.

Validity

Validity considers the degree to which the current API is measuring the right thing. Student achievement on STAR has been chosen to reflect school performance; the choice of the legislature is taken as a given. The question is whether the current API represents the best measure of student academic achievement. This analysis revealed that it does not.

Even if the sources of measurement error were eliminated, the API calculations are based on averaged student test scores. There are two problems with averaging. Figure 1 lays out a simplified example of the difficulties inherent in averaging test scores each year. Ten test scores are collected for each of Schools A, B, and C.

These points show that the API has several reliability problems. It inadvertently creates sampling error through its calculation and exclusion rules. In addition, the method of averaging employed by the API creates a distorted picture of the performance of schools. Below, consideration is given to whether averaging in any form is the best means of capturing school performance.

Validity

Validity considers the degree to which the current API is measuring the right thing. Student achievement on STAR has been chosen to reflect school performance; that choice of the legislature is taken as a given. The question is whether the current API represents the best measure of student academic achievement. This analysis revealed that it does not.

Even if the sources of measurement error were eliminated, the API calculations are based on averaged student test scores, and averaging introduces two problems. Figure 1 lays out a simplified example of the difficulties inherent in averaging test scores each year. Ten test scores are collected for each of Schools A, B, and C.

Figure 1: Simulated Scores for Three Schools

             Column 1     Column 2     Column 3     Column 4
Student      School A     School B     School C     Gain Score (C - B)
   1            18           20           42              22
   2            25           22           45              23
   3            26           25           47              22
   4            35           28           49              21
   5            40           30           50              20
   6            50           70           50             -20
   7            60           72           51             -21
   8            75           75           53             -22
   9            82           78           55             -23
  10            89           80           58             -22
Total          500          500          500
Average         50           50           50
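The summary rows of Figure 1, the spread of each distribution, and the gain scores in Column 4 can be reproduced with the short script below. The scores are taken directly from the figure; the script is illustrative and is not part of the API methodology.

```python
# Reproduce Figure 1: identical averages can hide different spreads,
# and pairwise gain scores recover movement the averages conceal.
import statistics

schools = {
    "A": [18, 25, 26, 35, 40, 50, 60, 75, 82, 89],
    "B": [20, 22, 25, 28, 30, 70, 72, 75, 78, 80],
    "C": [42, 45, 47, 49, 50, 50, 51, 53, 55, 58],
}

for name, scores in schools.items():
    print(f"School {name}: total {sum(scores)}, "
          f"average {statistics.mean(scores)}, "
          f"spread (std. dev.) {statistics.pstdev(scores):.1f}")

# Treat B and C as the same school in two successive years:
gains = [c - b for b, c in zip(schools["B"], schools["C"])]
print("gain scores (C - B):", gains)
print("average gain:", statistics.mean(gains))  # 0, yet every student moved
```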

[Figure 1, continued: graphs of the score distributions for Schools A, B, and C]

For each set of scores, the average score is 50. This approach is called a Status Model, as it captures the school’s condition at a single point in time. As can be seen from the graphs beneath each set of figures, those averages hide real differences in the distribution of scores. School C has its scores more closely clustered around its average, suggesting that its performance is more consistent than that of Schools A and B. This example shows the first problem with averaging: it fails to account for the variance of the scores, and thereby loses a considerable amount of information. The Status Model also carries with it the problem of sampling error due to student variation, discussed above.

A modified version of the Status Model is the Grade Level Change Model, which examines changes in grade-level performance across years. For example, a school might examine 4th-grade performance in successive years. The technique is the same as the Status Model, stratified by grade; because stratification leaves fewer observations in each comparison, the variance is increased.[iii] This approach is possible on a cross-sectional basis, since the comparison is year-to-year change by grade level, but it presents the same problems of information loss and sampling error due to student variation as the Status Model.

A second problem with averaging becomes evident when examining changes in scores over time. Still using Figure 1, consider that B and C are the same school in two different years. Comparing the averages of the two years, one would conclude that no change in performance occurred. However, if each pair of observations is compared, the individual student differences show that five students had lower scores and five showed improvement, as displayed in Column 4 of Figure 1. Even though the average performance of the school did not change, the use of gain scores preserved more of the real experience in the school.

A final comment concerns a different type of validity: the value of a measure in predicting larger outcomes of interest. In our conception of school accountability, student academic achievement has been selected as the outcome of interest. Of course, performance on standardized tests is not the only way to conceive of student achievement, but it is the only measure that can perform the functions needed for rigorous comparisons. We are concerned with student achievement partly in its own right, but also because of its relationship to larger outcomes later in life: better chances for higher education, improved employment, higher income, smoother transitions to adulthood, and so on. To date, there has been no attempt to connect California student achievement with any of these other outcomes. However, a few other states have started to track the association, and it should be considered here as well.

Notes

[i] This point was illustrated for CREDO by John Chubb. We are grateful for the insight.

[ii] Kane, Thomas J., and Douglas O. Staiger, “Improving School Accountability Measures,” NBER Working Paper No. W8156 (March 2001).

[iii] This result occurs because there are a smaller number of observations; since the variance of an average falls as the number of observations grows, each remaining score makes a greater (and potentially large) contribution to the measurement error and to the variance.