The Effect of Advanced Placement Science on Students’ Skills, Confidence and Stress

Dylan Conger

Alec I. Kennedy

Mark C. Long

Raymond McGhee Jr.

ABSTRACT

The AP program has been widely adopted by secondary schools, yet the evidence on the impacts of taking AP courses has been entirely observational. We report results from the first experimental study of AP, focusing on whether AP endows students with greater human capital than other regular and honors courses. We find suggestive evidence that taking an AP science course increases students’ science skill and their interest in pursuing a STEM major in college. AP course-takers also have lower confidence in their ability to succeed in college science, higher levels of stress, and worse grades than their control counterparts.

____________

Dylan Conger is a professor of public policy at the George Washington University. Alec I. Kennedy is a doctoral student at the University of Washington. Mark C. Long is a professor of public policy and governance and adjunct professor of economics at the University of Washington. Raymond McGhee Jr. is a senior director at Equal Measure. The authors thank Nicole Bateman, Kerry Beldoff, Grant H. Blume, Jordan Brown, Sarah Coffey, Bonnee Groover, Josette Arevalo Gross, Hernando Grueso Hurtado, Jessica Mislevy, Kelsey Rote, Massiel Sepulveda, and Mariam Zameer for excellent research assistance. They also appreciate the guidance and insights provided by Del Harnisch, Michal Kurlaender, Richard Murnane, Helen Quinn, and Aaron Rogat. The authors are grateful for comments from three anonymous referees. The College Board staff provided answers to the study team’s questions about the AP program and general feedback on the research design, but the College Board did not provide financial support and was otherwise not involved in the production of this research. The research was funded by the National Science Foundation (Award 1220092) and is registered in the American Economic Association’s Registry for RCTs (ID 000140). The data used to produce the empirical findings in this paper are available from the Inter-university Consortium for Political and Social Research at http://doi.org/############.

Online Appendix can be found at http://jhr.uwpress.org.

Corresponding author email: [email protected].

JEL codes: I20, J24


I. Introduction

The Advanced Placement (AP) program, a set of college-level courses and exams offered at the high school level, has become a centerpiece of efforts to strengthen the transition to postsecondary training and boost human capital. Many colleges and universities treat students’ enrollment in AP courses and scores on AP exams as a signal of quality in admissions and grant college credit or course waivers to students who receive high AP exam scores (Geiser and Santelices 2004). These incentives have prompted a substantial increase in the number of students taking AP courses and exams in recent decades, with more than five times as many AP exams taken in 2018 (over five million) as in 1996 (less than one million) (College Board 2018). At the program’s inception in the mid-1950s, AP courses were found in a handful of elite private schools; today, AP is offered in nearly 70 percent of public schools in the United States (Thomas et al. 2013).

Much of the expansion has been driven by federal and state policies designed to increase access to AP, including offering subsidies to pay for exams, building AP course offerings into school accountability requirements, and requiring public postsecondary institutions to offer credit for AP exam scores (Adelman 2006; Dounay Zinth 2016; Holstead et al. 2010). For almost 20 years, for instance, the U.S. Department of Education has provided states with funds to offset the cost of AP exams for low-income students.¹ Despite the program’s popularity among many, AP also has its critics. Some researchers and educators claim that the program’s effectiveness has been oversold and that there is no real evidence that AP endows students with greater skill or subject-matter interest than other high school courses (Berger 2006; Drew 2011; Klopfenstein and Thomas 2010, 2009; Tai 2008; Tierney 2012). Others worry that the pressure of AP courses causes students undue stress and confidence loss (Hopkins 2012; Kim 2015; Steinberg 2009). The expansion of AP to less-resourced schools has also raised concerns that many of the students now taking the courses are academically underprepared, such that the monetary and psychic costs of the investment may outweigh the potential benefits (Bowie 2013; Dougherty and Mellor 2009; Duffett and Farkas 2009; Smith, Hurwitz, and Avery 2017; Tierney 2012). Up to now, researchers’ ability to generate causal evidence on any of the claims made by proponents and opponents has been substantially limited by the nonrandom sorting of students into AP classes. As a result, all of the prior research on AP impacts has been observational.

In this paper, we provide the first experimental evidence on AP program impacts. We focus on AP science courses, which have been endorsed by educators and policymakers as a key strategy for increasing American students’ skill and interest in Science, Technology, Engineering, and Mathematics (STEM) and strengthening the STEM workforce (e.g., Adelman 2006; Bush 2006; House 2016). With participation from 23 schools and over 1,800 students from across the United States, we randomly offered students enrollment into newly launched AP Biology or Chemistry courses in their schools. To directly evaluate whether AP endows students with higher levels of skill than other science courses, we designed and validated an instrument to measure students’ scientific inquiry abilities (e.g., the ability to analyze data and make scientific arguments). We also collected administrative data and surveyed students to assess AP impacts on their interest in pursuing a STEM degree in college, confidence in completing a college science course, high school grades, and stress levels. In addition to generating impact estimates, we report on the courses that AP crowds out, along with the contrast between treatment and control students in the content and rigor of their science courses.

The results suggest that there is some truth in the claims made by both advocates and critics of AP. Consistent with the goals of an AP course, treatment group students report that their courses are more challenging and inquiry-based than control group students do. These views are shared by teachers, who report a higher level of rigor in their AP science courses compared to their other science courses. We find suggestive evidence that this academic challenge leads to increases in skill: AP course-takers score 0.23 standard deviations higher than control group compliers on the end-of-year assessment of scientific skill. Though our precision prevents us from ruling out zero treatment effects at traditional levels of statistical inference (p-value = 0.14), this large point estimate suggests genuine productivity gains for students who take AP science, over and above the gains experienced by students who enroll in other high school courses. We also find suggestive evidence of an AP science boost to students’ interest in pursuing a STEM degree should they enroll in college. Together, these results fail to support the concern that the AP program’s impact on human capital has been oversold.

At the same time, our results confirm that the workload and expectations of an AP science class cause students to lose confidence in their ability to succeed in college-level science, gain stress, and earn lower grades (prior to the weights that are often attached to AP grades by secondary and postsecondary institutions). The confidence levels among study participants are quite high, with 92 percent of control group compliers reporting that they are “somewhat” or “extremely” confident in their ability to succeed in a college science course. AP course-takers report a 10-percentage-point lower estimation of their ability. Students in the AP course are also more than twice as likely as control group compliers to report that the course negatively affected their physical or emotional health (our measure of stress). And comparisons of transcripts reveal that treatment group students earned lower preweighted grades in science and other subjects during the year that they took the AP class.
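
The comparisons to "control group compliers" above imply that the effect of a randomized offer is being scaled to the students whose course-taking responds to that offer. The passage does not state the estimator, so the following is only a sketch of one standard approach, the Wald ratio (the offer's effect on the outcome divided by its effect on take-up). The function name `wald_estimate` and the toy data are hypothetical and are not drawn from the study.

```python
from statistics import mean


def wald_estimate(outcome, offered, took_ap):
    """Sketch of a Wald (ratio) estimate of the effect for compliers.

    outcome : list of outcome values (e.g., standardized scores)
    offered : list of 0/1 flags, 1 if the student was randomly offered a seat
    took_ap : list of 0/1 flags, 1 if the student actually took the course
    All inputs are assumed to be aligned by student.
    """
    # Intent-to-treat effect of the offer on the outcome.
    y_offered = mean(y for y, z in zip(outcome, offered) if z == 1)
    y_not_offered = mean(y for y, z in zip(outcome, offered) if z == 0)

    # Effect of the offer on course-taking (the take-up difference).
    d_offered = mean(d for d, z in zip(took_ap, offered) if z == 1)
    d_not_offered = mean(d for d, z in zip(took_ap, offered) if z == 0)

    take_up_gap = d_offered - d_not_offered
    if take_up_gap == 0:
        raise ValueError("The offer did not change take-up; ratio is undefined.")
    return (y_offered - y_not_offered) / take_up_gap


# Hypothetical toy data: 4 students offered a seat (3 enroll), 4 not offered.
offered = [1, 1, 1, 1, 0, 0, 0, 0]
took_ap = [1, 1, 1, 0, 0, 0, 0, 0]
score = [0.5, 0.3, 0.4, 0.0, 0.1, 0.0, 0.2, 0.1]
print(round(wald_estimate(score, offered, took_ap), 3))
```

In this toy case the offer raises the mean score by 0.2 and take-up by 0.75, so the ratio is about 0.267; because the offer effect is divided by a take-up gap below one, the complier effect is larger than the raw offered-versus-not-offered difference.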

Our study contributes to a small research base on the effects of the AP program.² Using a regression discontinuity design, Smith, Hurwitz, and Avery (2017) show that students who barely earn a college-credit-equivalent score on the AP exam (e.g., scoring just above the threshold necessary to receive a 3 on the exam, out of 5) are more likely to complete their bachelor’s degrees in four years than students who fall just below that threshold. In a related paper that relies on the same data and design, Avery et al. (2018) demonstrate that AP exam scores also influence students’ college major choices. These compelling results demonstrate that students take advantage of postsecondary AP credit policies to waive out of intro courses and that receiving a higher AP exam score may serve as a signal of skill to both institutions and students. These two studies, however, do not show that AP courses per se led to skill development, as they focus solely on differences in behavior for AP exam-takers who fall just below and just above the score thresholds. Jackson (2010, 2014) evaluates the impacts of the AP Incentive Program, which offers cash incentives to teachers and students for passing scores on AP exams, as well as funds for training teachers and convening teams of teachers to align pre-AP curriculum with the needs of the AP class. Jackson identifies impact from variation in the timing of program implementation across high schools in Texas and finds large positive treatment effects on AP courses and exams (2010). The AP Incentive Program also increased students’ college going and persistence, as well as their labor market earnings (Jackson 2010, 2014). These two studies indicate that the AP Incentive Program increased AP participation and subsequent educational attainment and labor market performance. However, it is not clear whether these results would hold in the absence of the Incentive Program.

We build on these findings and inform policy and practice in several ways. Most important, we directly test one of the main mechanisms through which AP is expected to influence students’ attainment and earnings: by increasing their skill and interest in the subject matter. We determine whether skill and interest gains, as distinct from college admissions and credit-granting policies, are key drivers behind AP’s impact on later outcomes. This distinction is important given that less than half of AP course-takers earn a credit-granting score on the AP exam, either because they do not take the exam or because they obtain low scores (National Research Council 2002; College Board 2018). Many selective colleges are also increasingly making it difficult for students to receive credit for their AP exam scores. Most top institutions restrict the number of AP subject areas that are eligible, only offer credit or waivers for very high scores on the exams, or cap the total amount of AP credit that a student can receive (Weinstein 2016). In 2012, Dartmouth College announced that it would no longer grant credit for any AP exam score, a policy shared by several other selective institutions, including Amherst College, Brown University, and the California Institute of Technology (Weinstein 2016). Our results, which generalize to a newly offered AP course, suggest that AP endows students with human capital even if it does not grant them the opportunity to earn credit at their preferred college. For college admissions officers, the findings also suggest that AP course-taking offers a reasonable signal of students’ skill and subject-matter interest. Our estimated effects on skill and STEM interest are somewhat limited by insufficient precision, yet they represent the first and most credible evidence to date on the impact of AP on these key outcomes.

Our study is also among the first known to us to quantify the AP impact on students’ grades. We find that students who take an AP science course earn lower grades in science (by 0.29 grade points) and lower grades in their other courses (by 0.18 grade points). The lower grades in science are driven by the lower grade received in the AP class, a negative effect that many secondary and postsecondary institutions offset by upweighting AP grades. The estimates suggest that students’ AP science grade would have to be inflated by a factor of 1.46 (e.g., a C would have to be converted to approximately a B+) to remove the net negative on overall grade point average (GPA). While many high schools, including those that participated in our study, weight students’ GPAs to adjust for the academic difficulty of the courses, practices vary substantially across institutions (Sadler and Tai 2007; Klopfenstein and Lively 2016). In a recent survey of Texas high schools, for instance, Klopfenstein and Lively (2016) find that most schools with AP courses used weights, but that they ranged from 0.5 to 1 point (with a small number assigning more than 1 extra point). Our findings suggest that the current practices at many institutions underadjust for the grade penalty from AP courses. In addition, attaching weight to AP grades cannot undo the learning loss that may occur when students shift their effort away from non-AP coursework.
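
To make the weighting arithmetic concrete, the sketch below assumes an additive bonus on each AP grade (as in the point ranges cited above) and a simple unweighted mean across courses. The six-course schedule, the grades, and the function names `gpa` and `bonus_needed` are hypothetical illustrations, not study data, and the sketch does not attempt to reproduce the inflation factor reported above.

```python
def gpa(courses, ap_bonus=0.0):
    """Mean grade points across courses, adding ap_bonus to each AP grade.

    courses : list of (grade_points, is_ap) tuples, e.g. (3.0, False)
    ap_bonus: additive weight applied to AP courses only (assumed scheme)
    """
    if not courses:
        raise ValueError("At least one course is required.")
    total = sum(grade + (ap_bonus if is_ap else 0.0) for grade, is_ap in courses)
    return total / len(courses)


def bonus_needed(gpa_gap, n_courses, n_ap=1):
    """Additive bonus per AP course that raises the overall GPA by gpa_gap."""
    return gpa_gap * n_courses / n_ap


# Hypothetical student: a C (2.0) in one AP course and a B (3.0) in five others.
schedule = [(2.0, True)] + [(3.0, False)] * 5
print(round(gpa(schedule), 3))          # unweighted GPA
print(gpa(schedule, ap_bonus=1.0))      # GPA with a 1-point AP bonus
print(bonus_needed(0.25, n_courses=6))  # bonus that closes a 0.25-point GPA gap
```

Under these assumptions a one-point bonus on a single AP course moves a six-course GPA by only one-sixth of a point, which shows why a fixed bonus can fall short when grades in other courses also decline.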

We also contribute to other strands of literature on the relationship between students’ academic achievement and their perceptions of their own confidence and stress. Prior literature on the relationship between students’ confidence in their ability and their true ability is rife with mixed results (Boekaerts and Rozendaal 2010; Stankov and Crawford 1996; Stankov 2013). Psychologists have also documented an inverted U-shaped relationship between perceived pressure and performance, where some amount of stress is necessary to increase achievement, yet too much stress can reduce students’ ability to gain knowledge (Anderson 1976; Davis 2014; Yerkes and Dodson 1908). We find that students taking an AP science class experience cognitive gains concurrent with losses in their academic confidence. This finding is consistent with evidence that many U.S. students are highly confident in their skills and that this noncognitive belief often interferes with their ability to learn (Chiu and Klassen 2010; Stankov and Lee 2014). The AP course appears to reduce students’ estimation of their own ability, either by changing the standard to which they compare themselves or by making them more aware of the challenges they might face in a college course. Whether these changes in perceived confidence persist and how they influence later outcomes is uncertain. Students with expectation levels that match the real demands of college courses might eventually perform better in those courses. Some students might also use the insights they gain from a challenging AP science class to shift away from difficult science courses in college (or entire majors) that could delay or hinder their college completion. Our results also suggest that AP causes a significant amount of stress for students, but we do not find evidence that the added pressure substantially limits their knowledge gains in science.

II. AP Science and Conceptual Framework

A. AP and Other Rigorous Secondary School Courses

The AP program is an appealing option for high school administrators who seek to offer college-level courses to their students. AP course descriptions and assignments are designed to match those offered in introductory college courses in each subject and thus to prepare students for the rigor of college coursework. The College Board (the “Board” for brevity) is a not-for-profit organization that administers AP and provides professional development for teachers, reviews of course syllabi, and extensive curricular materials (e.g., sample syllabi, sample lab experiments).³ The Board also offers standardized AP exams in the spring of each year that are graded by external examiners and provide an externally validated measure of student learning. Most exams include both an essay or problem-solving component and multiple-choice questions, all of which are aligned with the course descriptions. The exam is one of the key features of the AP program and is used by high school and postsecondary educators to evaluate the depth of students’ skill independently of teacher bias.

In addition to AP courses, high school students typically have three alternative options for advanced coursework. Most high schools offer “honors” courses, which are intended to provide a more rigorous curriculum than the regular course in the same subject. The content and rigor of honors courses vary across high schools, and there is no standardized honors exam offered to students in these courses. A second option is the International Baccalaureate (IB) program, which was originally designed for students in international schools and aims to develop students’ critical thinking skills and their knowledge of international affairs. The IB program is offered worldwide but remains relatively uncommon in the United States, with less than 5 percent of high schools offering IB in 2016 (The IB Programme 2016). A final option is for students to take a course at a nearby college (or online) or, for some, a course that is taught at their high school by an instructor who has been approved as college-level. These “dual enrollment” or “dual credit” courses are meant to provide students with the opportunity to simultaneously earn high school and college credit. In the most recent national survey, high schools reported approximately two million enrollments in dual credit courses (Thomas et al. 2013). There is limited information on the colleges that accept dual enrollment credits. Most courses are offered through collaborations between high schools and local community and public postsecondary institutions, suggesting that credits are generally accepted at these institutions and less often accepted at other institutions. Comparisons of AP science classes to regular and honors-level science classes reveal that students receive much more homework and work harder in their AP classes (Sadler et al. 2014). To our knowledge, there have been no comparisons of the workload or effort in AP science courses compared to IB or dual enrollment science courses.

B. Conceptual Framework

There are several channels through which an AP science class is expected to influence students’ cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to provide rigorous content and a substantial workload, be taught by teachers who have high expectations, and consist of students who are driven to succeed. These inputs (course rigor, teacher expectations, and peer motivation) are often thought of as the main characteristics that distinguish AP courses from other high school courses.

Yet AP science classes are also intended to offer an inquiry-based approach to science that, when combined with a high level of rigor, provides an additional causal pathway to change. Specifically, a well-implemented AP science course should encourage students to ask questions, gather and interpret data, arrive at explanations grounded in scientific principles, and communicate their observations to one another under the guidance of teachers (College Board 2011a, 2011b).⁴ This student-led, inquiry-based approach differs from many traditional secondary school science classrooms, where the goal is often for students to memorize content and replicate laboratory experiments that demonstrate the content (National Research Council 2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world practices of science and the skills that form the basis of scientific inquiry by focusing more on big-picture concepts and small-group experimentation, with students directing the inquiry. The curriculum also encourages teachers to move away from lecture-based pedagogy and multiple-choice quizzes and to increase their use of technology to help students analyze data, draw interpretations, and communicate findings (College Board 2011a, 2011b).

AP science classes are expected to increase students’ ability to ask research questions, design experiments, analyze data, and draw conclusions. In the process of gaining these scientific inquiry skills, the new curriculum is intended to spur greater interest in the practice of science because it becomes more enjoyable and more accessible to students for whom rote memorization and execution of prefabricated lab experiments might have diminished enthusiasm in the subject (National Research Council 2012). Science experts posit that inquiry-based science courses will be particularly successful in generating greater interest and skill among women and among students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis, Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler, Samuelson, and Lorah 2014).

While the rigor and expectations of a college course may be appropriate for some students, they can be too demanding for others. Students often report high levels of stress and burnout from taking AP courses, particularly if they perceive that they are not prepared for the challenge of college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact, cause students to lose confidence in their ability to complete college science courses. A number of mechanisms could cause students to lose confidence, including exposure to stronger peers, inability to successfully complete assignments, or simply receiving lower grades than they received in their non-AP courses.⁵ The AP effect on confidence will likely matter differently for students with different levels of initial confidence. For students who are overconfident in their ability to succeed in college science courses, taking a challenging AP course in high school might cause them to revise their expectations to be more in line with the higher demands of college-level work.

Taking a more strenuous AP course is also likely to affect students’ time allocation. Students’ performance in each class will be determined by their subject-specific ability as well as the amount of time they devote to their coursework versus other activities, including work, extracurricular activities, and leisure. If AP courses are more demanding than other courses, students solving a time allocation problem may shift more effort into their AP course, away from other pursuits. The impact of this change in time allocation on students’ performance in AP and other courses will depend upon whether they shift effort away from other courses and on the degree of complementarity between their AP science course and their other courses. Study time devoted to an AP science course could improve student performance in other math and science classes (where the skills, tasks, and knowledge are similar) even if students spend less time on those courses. For courses that require students to perform tasks that are not complementary with AP science (e.g., courses in the humanities), taking AP science concurrently with these courses could decrease student performance in both courses. Of course, students taking an AP course could choose to reduce time spent on alternative (non-academic) activities. If these other activities have no causal impact on performance in school, then the impact on overall achievement could be negligible.

Some students report concerns about their time allocation as they weigh the decision to enroll in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these concerns have increased over time as the courses have become more accessible to students who previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to students with high grades in prerequisite classes, and the courses were only offered in schools with substantial resources. The Board has made efforts to increase access with, for instance, a policy statement that encourages schools to open AP to all students who are “willing to accept the challenge” and remove all barriers that restrict access (College Board 2002).⁶ In a 2008 survey of a nationally representative sample, 65 percent of secondary school teachers reported that their schools encourage as many students as possible to take AP, and 69 percent reported that AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009). These open access policies have led to complaints that students who enroll with less preparation will be less able to engage in the material (and perhaps become more discouraged by the difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg 2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students through negative peer effects or through teachers removing content and slowing the pace of course delivery.

III. AP Science Impact Study

A. Overview

We recruited 23 schools from across the United States and offered monetary compensation to pay for equipment and teacher training and as an incentive to secure participation.⁷ Eligible schools included ones that had not offered AP Biology or AP Chemistry in recent years, were willing to add such a course and comply with study protocol, and had more eligible students than could be served in one class so as to supply a sufficiently sized control group.⁸ Of the 23 schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both courses. We recruited two waves of schools (those that offered the course for the first time in 2013 and those that offered it for the first time in 2014); both waves were asked to field the course for two years, and the earlier-joining schools had the option of fielding the course for three years. The study includes 47 school-by-cohort groups.

Each participating school identified students that the school deemed eligible to take the new AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students who assented to participate in the study and who obtained consent from their parent or guardian as study participants. Upon receipt of signed consent/assent forms, we randomly offered enrollment in the newly launched course to a subset of participating students.⁹ The study includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students per AP class).
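
The random offer of seats described above can be sketched as a seeded lottery run separately within each group of consenting students. This is a minimal illustration under stated assumptions: the passage does not describe the mechanics of the draw, so treating each school-by-cohort group as its own lottery block, the fixed number of seats per block, and the names `assign_offers`, `students_by_block`, and `seats_per_block` are all hypothetical.

```python
import random


def assign_offers(students_by_block, seats_per_block, seed=0):
    """Randomly offer seats within each block (assumed: one school-cohort).

    students_by_block: dict mapping a block label to a list of student IDs
    seats_per_block  : number of seats offered in every block (assumption)
    seed             : fixed seed so the draw can be reproduced and audited
    Returns a dict mapping each student ID to True (offered) or False.
    """
    rng = random.Random(seed)
    offers = {}
    # Iterate in sorted order so the result does not depend on dict ordering.
    for block in sorted(students_by_block):
        students = sorted(students_by_block[block])
        n_seats = min(seats_per_block, len(students))
        winners = set(rng.sample(students, n_seats))
        for student in students:
            offers[student] = student in winners
    return offers


# Hypothetical consenting students in two school-by-cohort blocks.
blocks = {
    "school_A_2013": ["a1", "a2", "a3", "a4", "a5"],
    "school_B_2014": ["b1", "b2", "b3"],
}
offers = assign_offers(blocks, seats_per_block=2, seed=1)
print(sum(offers.values()), "offers among", len(offers), "students")
```

Drawing within blocks keeps the comparison between offered and non-offered students inside the same school and year, so differences between schools do not masquerade as effects of the offer.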

Figure 1 shows the geographic distribution of the 11 participating districts, which are primarily concentrated in the western, southern, and eastern regions of the country.¹⁰ The underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern region has experienced less competition over the years in access to selective postsecondary institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long 2009). Relative to districts across the nation, those participating in the study tend to be in neighborhoods with lower levels of socioeconomic status and to educate students who score below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools tend to be larger than other schools and more likely to educate students who are eligible for free or reduced-price lunch, Black, and Hispanic (Panel A of Table 1).

There are two reasons for this overrepresentation of larger schools serving less economically prosperous communities. First, AP courses are already offered in the majority of the nation’s public high schools, and schools that serve students from high-income families tend to offer more AP subjects than schools that serve students from lower-income families (Malkus 2016; Theokas and Saaris 2013). Given that our research design only allowed for schools that had not recently offered an AP science course, the population of schools from which we recruited tended to be those in settings with fewer resources. Second, participating schools were required to state that they believed they would have 60 or more students who were qualified to take the AP science course, and this requirement tended to disqualify smaller high schools.

Reflecting the school demographics, participating teachers are slightly younger, less experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity than U.S. high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our study teachers have less than or equal to five (two) years of teaching experience, which is more than double (triple) the rate of U.S. high school science teachers. Study teachers are more likely to hold an undergraduate major in a STEM field than other high school science teachers, yet far less likely to hold a master’s degree and slightly less likely to have earned a teaching credential in science. Most of the participating teachers had previously taught a higher-level course (mostly honors), yet only 47 percent of them had previously taught an AP course. Our research consequently applies to a population of teachers who are relatively new to the AP science curriculum and who have generally not received graduate training.¹¹ Assuming AP courses improve with teacher preparation, our results likely capture the effect of a less-than-ideal version of AP and may result in less positive treatment effects than when AP is delivered by teachers with more training and experience (Clotfelter, Ladd, and Vigdor 2010).

B. Data and Student Descriptive Statistics

We rely on three primary and secondary data sources for impact estimates. The first is an assessment, developed and validated by the research team, that measures students’ scientific inquiry skills. We administered this assessment to students in both treatment and control groups and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items that rely on science disciplinary knowledge that is taught in middle school, specifically material from Life Sciences and Physical Sciences. The assessment, which we administered to all study participants during a 45-minute period, measures students’ skills in data analysis, scientific explanation, and scientific argument.¹² Participating teachers were not provided copies of the instrument in advance; therefore, teachers were unable to teach any content material prior to test administration.

The second source is a questionnaire that we administered concurrently with the assessment and that asks students a number of questions about their most recent science class and their plans after high school. The assessment and questionnaire were completed together and administered outside of class (henceforth, we refer to these instruments as the “survey”). The third data source is students’ high school transcripts, which contain data on demographic and socioeconomic background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high school completion. We use these data to determine the balance of randomization on pre-treatment covariates, estimate the effect of randomization on course-taking (including compliance), improve the precision of our estimates with statistical controls, and estimate treatment effects on students’ grades.

Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences during the dates scheduled for survey administration and communication lapses between school coordinators and students. Students who were randomly assigned to treatment have a 9-percentage-point higher survey response rate. Given the possibility of nonrandom sample attrition, we weight all regressions by the inverse of the probability of completing the survey, conditional on student characteristics.¹⁴ We implement a variety of robustness checks as additional means to account for nonresponse. These include multiple imputation of missing outcome variables, excluding one high school that had a low response rate, and using the Lee (2009) technique to provide bounds on the estimated effects. These methods and results are discussed below.
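The inverse-probability weighting described in this paragraph can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes response probabilities are estimated as simple response rates within hypothetical cells of student characteristics (this section does not specify the response model), and every name and data value below is invented for illustration.

```python
from collections import defaultdict


def inverse_probability_weights(cells, responded):
    """Return one weight per student: 1 / (cell response rate) for
    respondents and 0.0 for nonrespondents.

    `cells` is a hypothetical label grouping students with similar
    characteristics; `responded` holds True/False survey completion flags.
    """
    n_total = defaultdict(int)
    n_resp = defaultdict(int)
    for cell, r in zip(cells, responded):
        n_total[cell] += 1
        n_resp[cell] += int(r)
    return [n_total[c] / n_resp[c] if r else 0.0
            for c, r in zip(cells, responded)]


def weighted_mean(values, weights):
    """Weighted mean over observations that carry a positive weight."""
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


# Toy data (illustrative only): students in the "low" cell respond half
# the time, so each respondent there stands in for two students.
cells = ["low", "low", "low", "low", "high", "high"]
responded = [True, False, True, False, True, True]
outcome = [1.0, None, 3.0, None, 5.0, 7.0]

weights = inverse_probability_weights(cells, responded)
ipw_mean = weighted_mean(outcome, weights)
```

In this toy example the unweighted respondent mean is 4.0, while the weighted mean is pulled toward the under-responding cell. A regression-based version would pass the same weights to a weighted least squares routine.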

We supplement these data with surveys that we administered online to teachers of the new AP courses at the conclusion of the course. The teacher survey includes questions about their educational background, professional experiences, and professional development; past and present instructional practices, generally and around science specifically; participation in the College Board AP training; ability to cover the content of the AP course; and coaching, mentoring, and other professional community supports received from the school, district, and education community.

Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the survey respondents, conditional on school by cohort fixed effects.¹⁵ Most of the estimated differences between treatment and control group students on pre-treatment observed characteristics are small, with some notable exceptions. In the full and survey samples, treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher, respectively, than those of control group students, both with p-values below 0.05. The magnitude of the treatment-control difference was slightly lower and less precisely estimated in math, yet also favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student covariates as predictors of outcomes in the models, and in the robustness checks, we exclude these covariates.¹⁷

Table 2 also shows the extent of differences between control group compliers and non-compliers. We find that non-compliers are generally much more academically prepared for AP science: they have higher pre-treatment reading and math test scores and are more likely to have completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian American and female.¹⁸

IV. Empirical Strategy

We estimate the effect of taking the AP science course with a standard instrumental variable specification:

(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$

(2) $AP_{ij} = \delta_j + Offered_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$

where $AP_{ij} = 1$ if student i enrolled in the AP science course in school-by-cohort stratum j; $\widehat{AP}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2); $Offered_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates (including age; math and reading exam scores from 8th and 10th grade, standardized and averaged for math and reading separately; cumulative GPA prior to the year when the AP science course was offered; and indicator variables for female, racial group (Asian American, Black or Hispanic, Native American, or Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price lunch, home language is not English, and took recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.¹⁹ We use two-stage least squares to estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given by $\beta$.

The intent-to-treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $Offered_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on $Offered_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new AP science course and is a weighted average of effects on those who do and do not choose to enroll in the course:

(3) $Y_{ij} = \zeta_j + Offered_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
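The relationship between the ITT and LATE estimates can be illustrated with a stripped-down sketch. It assumes a single binary offer, no covariates, no fixed effects, and no weights, in which case two-stage least squares reduces to the Wald ratio of the ITT to the first stage. The function name and the toy data are hypothetical, and the paper's actual models include the controls listed above.

```python
def mean(xs):
    return sum(xs) / len(xs)


def wald_estimates(y, took_ap, offered):
    """Return (ITT, first stage, LATE) from differences in means by offer
    status. With one binary instrument and no controls, the 2SLS
    coefficient equals ITT / first stage.
    """
    y1 = [v for v, z in zip(y, offered) if z == 1]
    y0 = [v for v, z in zip(y, offered) if z == 0]
    d1 = [d for d, z in zip(took_ap, offered) if z == 1]
    d0 = [d for d, z in zip(took_ap, offered) if z == 0]
    itt = mean(y1) - mean(y0)            # effect of the offer on the outcome
    first_stage = mean(d1) - mean(d0)    # effect of the offer on enrollment
    return itt, first_stage, itt / first_stage


# Toy data (illustrative only): 3 of 4 offered students enroll,
# and 1 of 4 control students crosses over.
offered = [1, 1, 1, 1, 0, 0, 0, 0]
took_ap = [1, 1, 1, 0, 1, 0, 0, 0]
y = [2.0, 2.0, 2.0, 0.0, 1.0, 0.0, 0.0, 1.0]

itt, first_stage, late = wald_estimates(y, took_ap, offered)

# Ratio of the reported ITT on science skill (0.09 s.d.) to the reported
# survey-sample first stage (0.39).
reported_ratio = 0.09 / 0.39
```

Because the model is just-identified and uses the same controls in both stages, the same ratio logic carries over to the full specification: the reported ITT of 0.09 s.d. divided by the survey-sample first stage of 0.39 gives roughly 0.23 s.d., in line with the LATE reported for science skill.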

For outcomes that are obtained from the survey, we weight regressions by the inverse of the estimated probability of completing the survey.²⁰ The results are similar without using these weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student characteristics as a result of either missing student transcripts or certain data elements not collected by the district, we use multiple imputation by chained equations, creating 10 imputed datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of treatment assignment (school by cohort) in our analysis of main effects. In the analysis of robustness, we report permutation standard errors, robust standard errors (for comparison to permutations), and the statistical significance of the LATE estimates after adjusting our tests of significance for multiple comparisons.
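The step of combining results across imputed datasets can be sketched as follows. This section does not state the combining rule, so the sketch assumes the conventional Rubin's rules: the pooled estimate is the average across imputations, and the pooled variance adds the within-imputation variance to an inflated between-imputation variance. The function name and the three toy estimates are hypothetical (the paper uses 10 imputed datasets).

```python
import math


def pool_rubin(estimates, std_errors):
    """Pool point estimates and standard errors from m imputed datasets
    using Rubin's rules (assumed here, not confirmed by the source).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                                   # pooled estimate
    within = sum(se ** 2 for se in std_errors) / m               # mean sampling variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1) # variance across imputations
    total_var = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total_var)


# Toy inputs (illustrative only): one coefficient estimated on 3 imputed datasets.
estimates = [0.10, 0.08, 0.09]
std_errors = [0.05, 0.05, 0.05]

pooled, pooled_se = pool_rubin(estimates, std_errors)
```

The pooled standard error is slightly larger than any single-dataset standard error because it also reflects uncertainty about the imputed values.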

V. Results

A. Course-Taking and Treatment Contrast

Table 3 provides estimated effects of the randomized offer of enrollment on AP science course enrollment and share of credits in all courses for the full sample and the survey samples. The first-stage estimates indicate that the offer substantially increased the likelihood of the student taking the AP science course, by 38 percentage points in the full sample and 39 percentage points in the survey sample. As we expected, compliance with randomization was imperfect, with 42 percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team.
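The compliance rates in this paragraph can be related to the first-stage estimate with a short calculation. The sketch below assumes the standard no-defiers (monotonicity) condition, which this section does not state explicitly, and uses raw enrollment rates only. Table 3's estimates also condition on fixed effects and covariates, so approximate rather than exact agreement should be expected. The function name is hypothetical.

```python
def compliance_shares(enroll_rate_offered, enroll_rate_control):
    """Split the sample into always-takers, never-takers, and compliers,
    assuming no student would enroll only if NOT offered (no defiers).
    """
    return {
        "always_takers": enroll_rate_control,       # enroll even without an offer
        "never_takers": 1.0 - enroll_rate_offered,  # decline even with an offer
        "compliers": enroll_rate_offered - enroll_rate_control,
    }


# Rates reported in the text: 42 percent of offered students did not enroll,
# and 19 percent of control students enrolled.
shares = compliance_shares(enroll_rate_offered=1 - 0.42, enroll_rate_control=0.19)
```

The implied complier share of about 39 percent is close to the reported first-stage estimates of 38 and 39 percentage points.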

The remaining rows in Table 3 shed light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage-point increase in the share of total credits in the full sample). Treatment group students’ share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).²²

Approximately 78 percent of the control group compliers took any science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%), and Anatomy/Physiology (9%).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 s.d., p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 s.d., p-value = 0.06). Our results also suggest that AP course-takers’ classrooms were more likely to use technology (up 0.28 s.d., p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that, while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.²⁴ This contrast between students’ reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.²⁵

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate that, for the typical complier, taking AP science raises objectively measured scientific inquiry skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent, with, again, more suggestive than definitive results at traditional levels of statistical inference (p-value = 0.16).

Table 5 provides stronger evidence of negative treatment effects on students’ confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology by signing the study assent forms. Taking AP science substantially lowered participants’ likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students’ self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students’ self-reported confidence and stress levels. We find that taking AP science increases students’ likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05) above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group students saw their grades suffer. We estimate that taking AP science reduced students’ grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students’ grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students’ share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 × 0.76)).

With our estimates in hand, we can easily compute the adjustment that would leave the student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).²⁷
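The grade arithmetic in the two preceding paragraphs can be reproduced from the rounded figures quoted in the text. The variable names below are hypothetical. Because the inputs are rounded, the offsetting boost comes out at roughly 1.5 rather than exactly 1.46, which presumably reflects the authors' use of unrounded estimates (an assumption, since the unrounded values are not shown here).

```python
# Rounded estimates quoted in the text.
science_effect = -0.29   # effect on science course grades (grade points)
other_effect = -0.18     # effect on non-science course grades (grade points)
science_share = 0.24     # share of credits in science for AP science takers

# Credit-weighted average effect on overall grades during the study year.
overall_effect = (science_effect * science_share
                  + other_effect * (1 - science_share))

# Share of classes in any AP science subject (0.02 + 0.12 from Table 3).
ap_science_share = 0.02 + 0.12

# Grade-point boost on AP science courses that would offset the 0.21 decline.
offsetting_boost = 0.21 / ap_science_share
```

Here `overall_effect` rounds to -0.21, matching the text. The boost is scaled by the AP science share alone because only grades in those courses would be upweighted.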

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and Column (4) reports the results of a permutation test where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.²⁹
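The permutation test described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it uses a raw difference in means instead of the regression-adjusted ITT, and it reshuffles assignment across the whole sample, whereas the study randomized within school-by-cohort strata. Function names, the seed, and the toy data are hypothetical.

```python
import random


def mean_difference(y, d):
    """Difference in mean outcomes between units with d == 1 and d == 0."""
    treated = [v for v, t in zip(y, d) if t == 1]
    control = [v for v, t in zip(y, d) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def permutation_p_value(y, d, n_permutations=1000, seed=0):
    """Share of pseudo-treatment assignments whose absolute estimated
    effect exceeds the absolute value of the actual estimate.
    """
    rng = random.Random(seed)
    observed = abs(mean_difference(y, d))
    pseudo = list(d)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pseudo)  # randomly reassign the pseudo treatment
        if abs(mean_difference(y, pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations


# Toy data (illustrative only): six treated and six control outcomes.
y = [5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6]
d = [1] * 6 + [0] * 6

p_value = permutation_p_value(y, d)
```

A stratified version would shuffle the pseudo treatment separately within each stratum, which preserves the original randomization design.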

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply-imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 s.d.), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution should be considered in evaluating the effects from outcomes based on the study survey.³¹
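The trimming logic behind the Lee (2009) bounds can be sketched as follows. The sketch covers only the case described in the text, where the treatment group has the higher response rate. It omits covariates, survey weights, and the Imbens and Manski (2004) confidence intervals, and its function name and toy data are hypothetical.

```python
def lee_bounds(y_treat, y_control, n_treat_assigned, n_control_assigned):
    """Return (lower, upper) bounds on the treatment effect among
    respondents, trimming the treatment group so that its response rate
    matches the control group's.

    y_treat and y_control hold outcomes for survey respondents only.
    """
    rate_t = len(y_treat) / n_treat_assigned
    rate_c = len(y_control) / n_control_assigned
    if rate_t < rate_c:
        raise ValueError("sketch assumes the treatment group responds more often")
    trim_share = (rate_t - rate_c) / rate_t   # excess respondents to trim
    k = round(trim_share * len(y_treat))      # number of observations to drop
    ordered = sorted(y_treat)
    control_mean = sum(y_control) / len(y_control)
    kept_for_lower = ordered[:len(ordered) - k]  # drop the k highest outcomes
    kept_for_upper = ordered[k:]                 # drop the k lowest outcomes
    lower = sum(kept_for_lower) / len(kept_for_lower) - control_mean
    upper = sum(kept_for_upper) / len(kept_for_upper) - control_mean
    return lower, upper


# Toy data (illustrative only): 8 of 10 treated and 6 of 10 control
# students respond, so a quarter of treated respondents are trimmed.
y_treat = [1, 2, 3, 4, 5, 6, 7, 8]
y_control = [2, 3, 4, 5, 6, 7]

lower, upper = lee_bounds(y_treat, y_control, 10, 10)
```

In the toy example the untrimmed difference in means is zero, while the bounds span from -1.0 to 1.0, which shows how differential response widens the range of effects consistent with the data.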

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science skill as the outcome. We find that the point estimates at every quantile are not significantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.³²

VI. Conclusion

Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools already offered AP science classes, and they tended to be in relatively higher-income communities, disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest above the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.³³ We also find that AP science classes substantially increase students’ stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult given that few schools document per-course expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation’s high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance, on college admissions decisions as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U Stress-Performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.

Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University. Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.
Imbens, G., and C. F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein Kristin and M Kathleen Thomas 2009 ldquoThe Link Between Advanced Placement

Experience and Early College Successrdquo Southern Economic Journal 75 (3) 873ndash 891

__________ 2010 ldquoAdvanced Placement Participation Evaluating the Policies of States and

Collegesrdquo In AP A Critical Examination of the Advanced Placement Program eds

Philip M Sadler Gerhard Sonnert Robert H Tai and Kristin Klopfenstein 167ndash188

Cambridge Harvard Education Press

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.

Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts: Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                  Participating   Others
Average Enrollment                                1,409           723
Free or Reduced-Price Lunch                       0.700           0.438
Asian                                             0.055           0.050
Black                                             0.349           0.154
Hispanic                                          0.410           0.221
White                                             0.164           0.537
Adjusted Cohort Graduation Rate                   0.843           0.802
District Instruction Expenditures Per Pupil       $6,561          $5,636
District Student Services Expenditures Per Pupil  $3,787          $3,385

Panel B: Teachers                                 Participating   Others
Age: Under 30                                     0.407           0.160
Age: 30-49                                        0.432           0.553
Age: 50 or over                                   0.161           0.287
Female                                            0.630           0.536
Hispanic or Latino                                0.111           0.051
Race: American Indian or Alaska Native            0.000           0.009
Race: Asian American                              0.111           0.041
Race: Black                                       0.111           0.060
Race: Native Hawaiian or other Pacific Islander   0.000           0.004
Race: White                                       0.778           0.896
Years of Experience                               10.3            13.2
Years of Experience <= 2                          0.290           0.085
Years of Experience <= 5                          0.481           0.234
Hold a Teaching Certificate                       0.926           0.945
Undergraduate Major in STEM                       0.944           0.747
Single Subject Credential in Science              0.630           0.823
Master’s Degree or Higher                         0.356           0.615
Previously Taught AP Course                       0.469           NA
Previously Taught AP, IB, or Honors Course        0.796           NA
Number of Professional Development Trainings      3.09            NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd; EDFacts, https://www2.ed.gov/about/inits/ed/edfacts/index.html. “Others” in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey, https://nces.ed.gov/surveys/sass. “Others” in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers

                                            Full Sample                Survey Sample
Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1819                       1417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 3

First Stage: Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4): Control Group Mean
(2), (5): ITT
(3), (6): LATE

                                            Full Sample                Survey Respondents
Outcome                                     (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment              0.19     0.38              0.24     0.39
                                                     (0.05)                     (0.06)
                                                     [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                0.03     0.04     0.11     0.03     0.04     0.10
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                    0.13     0.04     0.11     0.14     0.04     0.10
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                    0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                        0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                           0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                               0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                     (0.01)   (0.03)            (0.01)   (0.03)
                                                     [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                      1819                       1417

Notes: The ITT results in the “AP Treatment Course Enrollment” row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information was collected from student transcripts. “Control Group Mean” uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
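The inverse-probability weighting mentioned in these notes can be illustrated with a minimal sketch. It assumes that a probit model of survey completion has already been fitted and that each respondent's fitted index is available; the function names and the example indices below are hypothetical and are not taken from the authors' code.

```python
from statistics import NormalDist

# Standard normal distribution; its CDF plays the role of Phi in a probit model.
_STD_NORMAL = NormalDist()


def response_probability(probit_index):
    """Predicted probability of completing the survey, Phi(index)."""
    return _STD_NORMAL.cdf(probit_index)


def inverse_probability_weight(probit_index):
    """Weight = 1 / Phi(index); always greater than 1 for a finite index."""
    return 1.0 / response_probability(probit_index)


def weighted_mean(values, weights):
    """Weighted average, as used when respondents stand in for nonrespondents."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical fitted indices for three respondents (illustrative only).
indices = [0.0, 1.0, 2.0]
weights = [inverse_probability_weight(z) for z in indices]
```

Respondents whose characteristics predict a low completion probability receive the largest weights, so they count for more in the weighted regressions.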


Table 4

Treatment Contrast (Composite Variables)

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                         (1)      (2)      (3)
Academically Challenging Curriculum             -0.33    0.31     0.80
                                                         (0.10)   (0.24)
                                                         [0.00]   [0.00]
Project-Based Independent Classroom Activities  -0.06    0.13     0.33
                                                         (0.07)   (0.17)
                                                         [0.07]   [0.06]
Integrated Use of Technology                    -0.11    0.11     0.28
                                                         (0.08)   (0.19)
                                                         [0.19]   [0.14]
Number of Observations                          1417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
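The composite-variable construction described in these notes can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the five-point scale labels, the use of the population standard deviation, the unweighted averaging, and the example responses are all hypothetical choices.

```python
from statistics import mean, pstdev

# Hypothetical five-point agreement scale, ordered from lowest to highest.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, scale=SCALE):
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return scale.index(response) / (len(scale) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        raise ValueError("cannot standardize a composite with no variation")
    return [(x - mu) / sd for x in raw]


# Made-up responses: three students, two component items each.
students = [
    ["strongly agree", "agree"],
    ["neutral", "neutral"],
    ["disagree", "strongly disagree"],
]
scores = composite_scores(students)
```

After standardization the composite has mean 0 and standard deviation 1 in the sample used, so the ITT and LATE columns can be read in standard-deviation units.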


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                         (1)      (2)      (3)
Science Skill                   -0.19    0.09     0.23
                                         (0.06)   (0.16)
                                         [0.15]   [0.14]
STEM Interest                   0.62     0.04     0.09
                                         (0.02)   (0.07)
                                         [0.16]   [0.16]
Confidence in College Science   0.92     -0.04    -0.10
                                         (0.02)   (0.05)
                                         [0.11]   [0.06]
Stress                          0.12     0.07     0.17
                                         (0.03)   (0.07)
                                         [0.02]   [0.01]
Grades in Science Courses       2.80     -0.12    -0.29
                                         (0.07)   (0.16)
                                         [0.08]   [0.07]
Grades in Other Courses         3.14     -0.07    -0.18
                                         (0.02)   (0.06)
                                         [0.00]   [0.00]
Number of Observations          1819 for grades; 1417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
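As a rough check on how the ITT and LATE columns relate, the sketch below applies the Wald ratio: with a single binary offer as the instrument, the LATE is approximately the ITT divided by the first-stage effect of the offer on course enrollment. This is an approximation for illustration only. The paper estimates LATE from variants of Equation (1) with covariates and fixed effects, the inputs here are rounded table entries (read as an ITT of 0.09 for science skill and a first stage of 0.39 among survey respondents), and the function name is hypothetical.

```python
def wald_late(itt, first_stage):
    """Wald estimator: intent-to-treat effect scaled by the first-stage take-up effect."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage


# Rounded values read from Tables 3 and 5 (science skill, survey respondents).
science_skill_late = wald_late(itt=0.09, first_stage=0.39)
```

The ratio lands close to the reported LATE for science skill; for other outcomes, small gaps between this ratio and the reported value are to be expected given rounding and the regression adjustments.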

Table 6

Robustness Checks of Main ITT Results

(1): Control Group Complier Mean
(2): Main Results
(3): Robust SE
(4): p-value (permutation test)
(5): Excluding High School 56
(6): Including Imputation of Missing Outcome Variables
(7): Excluding Covariates
(8): Excluding High School 23
(9): Lee Lower Bound
(10): Lee Upper Bound
(11): 95% Confidence Interval from Lee Bounds
(12): Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                        (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)            (12)
Science Skill                  -0.19    0.09                       0.10     0.11     0.20     0.07     0.03     0.39     [-0.09, 0.51]   2.0
                                        (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                        [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                  0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12     [-0.03, 0.18]   1.9
                                        (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                        [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science  0.92     -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06    0.05     [-0.09, 0.10]   2.0
                                        (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                        [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                         0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11     [-0.05, 0.15]   1.6
                                        (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                        [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses      2.80     -0.12                      -0.06    -0.10    -0.07    n/a      n/a      n/a      n/a             n/a
                                        (0.07)   (0.04)            (0.00)   (0.00)   (0.00)
                                        [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses        3.14     -0.07                      -0.07    -0.06    -0.03    n/a      n/a      n/a      n/a             n/a
                                        (0.02)   (0.03)            (0.00)   (0.00)   (0.00)
                                        [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]

n/a: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
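The permutation test described for column (4) can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it uses a raw difference in means instead of the regression-based ITT estimate, it shuffles labels across the whole sample instead of within School x Cohort cells, and the function names, seed, and example data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy so the caller's assignment is not shuffled
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # pseudo treatment: same group sizes, random membership
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Made-up data in which treated units have uniformly higher outcomes.
outcomes = [10, 11, 12, 13, 0, 1, 2, 3]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = permutation_p_value(outcomes, treated)
```

In this toy example no reshuffling can produce a larger gap than the observed one, so the p-value is zero; with real data the share is typically strictly between 0 and 1.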


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. When the standard to which students compare themselves rises, their confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these

prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10. Participating districts include: Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree, but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be

endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues, but found it infeasible to obtain the necessary test score information prior to randomization given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60 percent of the treated students chose Biology, then we randomly assign 60 percent of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90 percent of students receiving a weight less than 2.0.

21. We impute with the full dataset, yet only estimate regressions on the sample for which we observe each outcome variable. This follows a “multiple imputation, then deletion” strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including

imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers

received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.


I. Introduction

The Advanced Placement (AP) program, a set of college-level courses and exams offered at the high school level, has become a centerpiece of efforts to strengthen the transition to postsecondary training and boost human capital. Many colleges and universities treat students’ enrollment in AP courses and scores on AP exams as a signal of quality in admissions and grant college credit or course waivers to students who receive high AP exam scores (Geiser and Santelices 2004). These incentives have prompted a substantial increase in the number of students taking AP courses and exams in recent decades, with more than five times as many AP exams taken in 2018 (over five million) as in 1996 (less than one million) (College Board 2018). At the program’s inception in the mid-1950s, AP courses were found in a handful of elite private schools; today, AP is offered in nearly 70 percent of public schools in the United States (Thomas et al. 2013).

Much of the expansion has been driven by federal and state policies designed to increase access to AP, including offering subsidies to pay for exams, building AP course offerings into school accountability requirements, and requiring public postsecondary institutions to offer credit for AP exam scores (Adelman 2006; Dounay Zinth 2016; Holstead et al. 2010). For almost 20 years, for instance, the US Department of Education has provided states with funds to offset the cost of AP exams for low-income students.¹ Despite the program’s popularity among many, AP also has its critics. Some researchers and educators claim that the program’s effectiveness has been oversold and that there is no real evidence that AP endows students with greater skill or subject-matter interest than other high school courses (Berger 2006; Drew 2011; Klopfenstein and Thomas 2010, 2009; Tai 2008; Tierney 2012). Others worry that the pressure of AP courses causes students undue stress and confidence loss (Hopkins 2012; Kim 2015; Steinberg 2009). The expansion of AP to less-resourced schools has also raised concerns that many of the students now taking the courses are academically underprepared, such that the monetary and psychic costs of the investment may outweigh the potential benefits (Bowie 2013; Dougherty and Mellor 2009; Duffett and Farkas 2009; Smith, Hurwitz, and Avery 2017; Tierney 2012). Up to now, researchers’ ability to generate causal evidence on any of the claims made by proponents and opponents has been substantially limited by the nonrandom sorting of students into AP classes. As a result, all of the prior research on AP impacts has been observational.

In this paper, we provide the first experimental evidence on AP program impacts. We focus on AP science courses, which have been endorsed by educators and policymakers as a key strategy for increasing American students’ skill and interest in Science, Technology, Engineering, and Mathematics (STEM) and strengthening the STEM workforce (e.g., Adelman 2006; Bush 2006; House 2016). With participation from 23 schools and over 1,800 students from across the United States, we randomly offered students enrollment into newly launched AP Biology or Chemistry courses in their schools. To directly evaluate whether AP endows students with higher levels of skill than other science courses, we designed and validated an instrument to measure students’ scientific inquiry abilities (e.g., the ability to analyze data and make scientific arguments). We also collected administrative data and surveyed students to assess AP impacts on their interest in pursuing a STEM degree in college, confidence in completing a college science course, high school grades, and stress levels. In addition to generating impact estimates, we report on the courses that AP crowds out, along with the contrast between treatment and control students in the content and rigor of their science courses.

The results suggest that there is some truth in the claims made by both advocates and critics of AP. Consistent with the goals of an AP course, treatment group students report that their courses are more challenging and inquiry-based than control group students do. These views are shared by teachers, who report a higher level of rigor in their AP science courses compared to their other science courses. We find suggestive evidence that this academic challenge leads to increases in skill: AP course-takers score 0.23 standard deviations higher than control group compliers on the end-of-year assessment of scientific skill. Though our precision prevents us from ruling out zero treatment effects at traditional levels of statistical inference (p-value = 0.14), this large point estimate suggests genuine productivity gains for students who take AP science, over and above the gains experienced by students who enroll in other high school courses. We also find suggestive evidence of an AP science boost to students’ interest in pursuing a STEM degree should they enroll in college. Together, these results fail to support the concern that the AP program’s impact on human capital has been oversold.

At the same time, our results confirm that the workload and expectations of an AP science class cause students to lose confidence in their ability to succeed in college-level science, gain stress, and earn lower grades (prior to the weights that are often attached to AP grades by secondary and postsecondary institutions). The confidence levels among study participants are quite high, with 92 percent of control group compliers reporting that they are “somewhat” or “extremely” confident in their ability to succeed in a college science course. AP course-takers report a 10-percentage-point lower estimation of their ability. Students in the AP course are also more than twice as likely as control group compliers to report that the course negatively affected their physical or emotional health (our measure of stress). And comparisons of transcripts reveal that treatment group students earned lower preweighted grades in science and other subjects during the year that they took the AP class.

Our study contributes to a small research base on the effects of the AP program.² Using a regression discontinuity design, Smith, Hurwitz, and Avery (2017) show that students who barely earn a college-credit equivalent score on the AP exam (e.g., scoring just above the threshold necessary to receive a 3 on the exam (out of 5)) are more likely to complete their bachelor’s degrees in four years than students who fall just below that threshold. In a related paper that relies on the same data and design, Avery et al. (2018) demonstrate that AP exam scores also influence students’ college major choices. These compelling results demonstrate that students take advantage of postsecondary AP credit policies to waive out of intro courses and that receiving a higher AP exam score may serve as a signal of skill to both institutions and students. These two studies, however, do not show that AP courses per se led to skill development, as they focus solely on differences in behavior for AP exam-takers who fall just below and just above the score thresholds. Jackson (2010, 2014) evaluates the impacts of the AP Incentive Program, which offers cash incentives to teachers and students for passing scores on AP exams, as well as funds for training teachers and convening teams of teachers to align pre-AP curriculum with the needs of the AP class. Jackson identifies impact from variation in the timing of program implementation across high schools in Texas and finds large positive treatment effects on AP courses and exams (2010). The AP Incentive Program also increased students’ college going and persistence, as well as their labor market earnings (Jackson 2010, 2014). These two studies indicate that the AP Incentive Program increased AP participation and subsequent educational attainment and labor market performance. However, it is not clear whether these results would hold in the absence of the Incentive Program.

We build on these findings and inform policy and practice in several ways. Most important, we directly test one of the main mechanisms through which AP is expected to influence students’ attainment and earnings: by increasing their skill and interest in the subject matter. We determine whether skill and interest gains, as distinct from college admissions and credit-granting policies, are key drivers behind AP’s impact on later outcomes. This distinction is important given that less than half of AP course-takers earn a credit-granting score on the AP exam, either because they do not take the exam or because they obtain low scores (National Research Council 2002; College Board 2018). Many selective colleges are also increasingly making it difficult for students to receive credit for their AP exam scores. Most top institutions restrict the number of AP subject areas that are eligible, only offer credit or waivers for very high scores on the exams, or cap the total amount of AP credit that a student can receive (Weinstein 2016). In 2012, Dartmouth College announced that it would no longer grant credit for any AP exam score, a policy shared by several other selective institutions, including Amherst College, Brown University, and the California Institute of Technology (Weinstein 2016). Our results, which generalize to a newly offered AP course, suggest that AP endows students with human capital even if it does not grant them the opportunity to earn credit at their preferred college. For college admissions officers, the findings also suggest that AP course-taking offers a reasonable signal of students’ skill and subject-matter interest. Our estimated effects on skill and STEM interest are somewhat limited by insufficient precision, yet they represent the first and most credible evidence to date on the impact of AP on these key outcomes.

Our study is also among the first known to us that quantifies the AP impact on students’ grades. We find that students who take an AP science course earn lower grades in science (by 0.29 grade points) and lower grades in their other courses (by 0.18 grade points). The lower grades in science are driven by the lower grade received in the AP class, a negative effect that many secondary and postsecondary institutions offset by upweighting AP grades. The estimates suggest that students’ AP science grade would have to be inflated by a factor of 1.46 (e.g., a C would have to be converted to approximately a B+) to remove the net negative on overall grade point average (GPA). While many high schools, including those that participated in our study, weight students’ GPAs to adjust for the academic difficulty of the courses, practices vary substantially across institutions (Sadler and Tai 2007; Klopfenstein and Lively 2016). In a recent survey of Texas high schools, for instance, Klopfenstein and Lively (2016) find that most schools with AP courses used weights but that they ranged from 0.5 to 1 point (with a small number assigning more than 1 extra point). Our findings suggest that the current practices at many institutions under-adjust for the grade penalty from AP courses. In addition, attaching weight to AP grades cannot undo the learning loss that may occur when students shift their effort away from non-AP coursework.

We also contribute to other strands of literature on the relationship between students’ academic achievement and their perceptions of their own confidence and stress. Prior literature on the relationship between students’ confidence in their ability and their true ability is rife with mixed results (Boekaerts and Rozendaal 2010; Stankov and Crawford 1996; Stankov 2013). Psychologists have also documented an inverted U-shaped relationship between perceived pressure and performance, where some amount of stress is necessary to increase achievement, yet too much stress can reduce students’ ability to gain knowledge (Anderson 1976; Davis 2014; Yerkes and Dodson 1908). We find that students taking an AP science class experience cognitive gains concurrent with losses in their academic confidence. This finding is consistent with evidence that many US students are highly confident in their skills and that this noncognitive belief often interferes with their ability to learn (Chiu and Klassen 2010; Stankov and Lee 2014). The AP course appears to reduce students’ estimation of their own ability, either by changing the standard to which they compare themselves or by making them more aware of the challenges they might face in a college course. Whether these changes in perceived confidence persist, and how they influence later outcomes, is uncertain. Students with expectation levels that match the real demands of college courses might eventually perform better in those courses. Some students might also use the insights they gain from a challenging AP science class to shift away from difficult science courses in college (or entire majors) that could delay or hinder their college completion. Our results also suggest that AP causes a significant amount of stress for students, but we do not find evidence that the added pressure substantially limits their knowledge gains in science.

II. AP Science and Conceptual Framework

A. AP and Other Rigorous Secondary School Courses

The AP program is an appealing option for high school administrators who seek to offer college-level courses to their students. AP course descriptions and assignments are designed to match those offered in introductory college courses in each subject and thus to prepare students for the rigor of college coursework. The College Board (the “Board” for brevity) is a not-for-profit organization that administers AP and provides professional development for teachers, reviews of course syllabi, and extensive curricular materials (e.g., sample syllabi, sample lab experiments).³ The Board also offers standardized AP exams in the spring of each year that are graded by external examiners and provide an externally-validated measure of student learning. Most exams include both an essay or problem-solving component and multiple-choice questions, all of which are aligned with the course descriptions. The exam is one of the key features of the AP program and is used by high school and postsecondary educators to evaluate the depth of students’ skill independently of teacher bias.

In addition to AP courses, high school students typically have three alternative options for advanced coursework. Most high schools offer “honors” courses, which are intended to provide a more rigorous curriculum than the regular course in the same subject. The content and rigor of honors courses vary across high schools, and there is no standardized honors exam offered to students in these courses. A second option is the International Baccalaureate (IB) program, which was originally designed for students in international schools and aims to develop students’ critical thinking skills and their knowledge of international affairs. The IB program is offered worldwide but remains relatively uncommon in the United States, with less than 5 percent of high schools offering IB in 2016 (The IB Programme 2016). A final option is for students to take a course at a nearby college (or online) or, for some, a course that is taught at their high school by an instructor who has been approved as college-level. These “dual enrollment” or “dual credit” courses are meant to provide students with the opportunity to simultaneously earn high school and college credit. In the most recent national survey, high schools reported approximately two million enrollments in dual credit courses (Thomas et al. 2013). There is limited information on the colleges that accept dual enrollment credits. Most courses are offered through collaborations between high schools and local community and public postsecondary institutions, suggesting that credits are generally accepted at these institutions and less often accepted at other institutions. Comparisons of AP science classes to regular and honors level science classes reveal that students receive much more homework and work harder in their AP classes (Sadler et al. 2014). To our knowledge, there have been no comparisons of the workload or effort in AP science courses compared to IB or dual enrollment science courses.


B. Conceptual Framework

There are several channels through which an AP science class is expected to influence students’ cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to provide rigorous content and a substantial workload, be taught by teachers who have high expectations, and consist of students who are driven to succeed. These inputs (course rigor, teacher expectations, and peer motivation) are often thought of as the main characteristics that distinguish AP courses from other high school courses.

Yet AP science classes are also intended to offer an inquiry-based approach to science that, when combined with a high level of rigor, provides an additional causal pathway to change. Specifically, a well-implemented AP science course should encourage students to ask questions, gather and interpret data, arrive at explanations grounded in scientific principles, and communicate their observations to one another under the guidance of teachers (College Board 2011a, 2011b).⁴ This student-led, inquiry-based approach differs from many traditional secondary school science classrooms, where the goal is often for students to memorize content and replicate laboratory experiments that demonstrate the content (National Research Council 2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world practices of science and the skills that form the basis of scientific inquiry by focusing more on big picture concepts and small group experimentation, with students directing the inquiry. The curriculum also encourages teachers to move away from lecture-based pedagogy and multiple-choice quizzes and to increase their use of technology to help students analyze data, draw interpretations, and communicate findings (College Board 2011a, 2011b).

AP science classes are expected to increase students’ ability to ask research questions, design experiments, analyze data, and draw conclusions. In the process of gaining these scientific inquiry skills, the new curriculum is intended to spur greater interest in the practice of science because it becomes more enjoyable and more accessible to students for whom rote memorization and execution of prefabricated lab experiments might have diminished enthusiasm in the subject (National Research Council 2012). Science experts posit that inquiry-based science courses will be particularly successful in generating greater interest and skill among women and among students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis, Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler, Samuelson, and Lorah 2014).

While the rigor and expectations of a college course may be appropriate for some students, they can be too demanding for others. Students often report high levels of stress and burnout from taking AP courses, particularly if they perceive that they are not prepared for the challenge of college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact, cause students to lose confidence in their ability to complete college science courses. A number of mechanisms could cause students to lose confidence, including exposure to stronger peers, inability to successfully complete assignments, or simply receiving lower grades than they received in their non-AP courses.⁵ The AP effect on confidence will likely matter differently for students with different levels of initial confidence. For students who are over-confident in their ability to succeed in college science courses, taking a challenging AP course in high school might cause them to revise their expectations to be more in line with the higher demands of college-level work.

Taking a more strenuous AP course is also likely to affect students’ time allocation. Students’ performance in each class will be determined by their subject-specific ability as well as the amount of time they devote to their coursework versus other activities, including work, extracurricular, and leisure. If AP courses are more demanding than other courses, students solving a time allocation problem may shift more effort into their AP course, away from other pursuits. The impact of this change in time allocation on students’ performance in AP and other courses will depend upon whether they shift effort away from other courses and on the degree of complementarity between their AP science course and their other courses. Study time devoted to an AP science course could improve student performance in other math and science classes (where the skills, tasks, and knowledge are similar), even if students spend less time on those courses. For courses that require students to perform tasks that are not complementary with AP science (e.g., courses in the humanities), taking AP science concurrently with these courses could decrease student performance in both courses. Of course, students taking an AP course could choose to reduce time spent on alternative (non-academic) activities. If these other activities have no causal impact on performance in school, then the impact on overall achievement could be negligible.

Some students report concerns about their time allocation as they weigh the decision to enroll in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these concerns have increased over time as the courses have become more accessible to students who previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to students with high grades in prerequisite classes, and the courses were only offered in schools with substantial resources. The Board has made efforts to increase access with, for instance, a policy statement that encourages schools to open AP to all students who are “willing to accept the challenge” and remove all barriers that restrict access (College Board 2002).⁶ In a 2008 survey of a nationally-representative sample, 65 percent of secondary school teachers reported that their schools encourage as many students as possible to take AP, and 69 percent reported that AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009). These open access policies have led to complaints that students who enroll with less preparation will be less able to engage in the material (and perhaps become more discouraged by the difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg 2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students through negative peer effects or through teachers removing content and slowing the pace of course delivery.

III. AP Science Impact Study

A. Overview

We recruited 23 schools from across the United States and offered monetary compensation to pay for equipment and teacher training and as an incentive to secure participation.⁷ Eligible schools included ones that had not offered AP Biology or AP Chemistry in recent years, were willing to add such a course and comply with study protocol, and had more eligible students than could be served in one class so as to supply a sufficiently-sized control group.⁸ Of the 23 schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both courses. We recruited two waves of schools (those that offered the course for the first time in 2013 and those that offered it for the first time in 2014); both waves were asked to field the course for two years, and the earlier-joining schools had the option of fielding the course for three years. The study includes 47 school-by-cohort groups.

Each participating school identified students that the school deemed eligible to take the new AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students who assented to participate in the study and who obtained consent from their parent or guardian as study participants. Upon receipt of signed consent/assent forms, we randomly offered enrollment in the newly launched course to a subset of participating students.⁹ The study includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students per AP class).
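The random offer within each school and cohort can be illustrated with a short sketch. It is an illustration only, not the study's actual procedure: the function name `assign_offers`, the stratum labels, the fixed seed, and the 50 percent offer share are all assumptions made for the example.

```python
import random
from collections import defaultdict


def assign_offers(students, offer_share=0.5, seed=0):
    """Randomly offer enrollment within each school-by-cohort stratum.

    students: list of (student_id, stratum) pairs (hypothetical layout).
    offer_share: assumed fraction of each stratum that is offered a seat.
    Returns a dict mapping student_id -> 1 if offered, else 0.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    by_stratum = defaultdict(list)
    for student_id, stratum in students:
        by_stratum[stratum].append(student_id)

    offered = {}
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])
        rng.shuffle(ids)  # random order within the stratum
        n_offer = round(len(ids) * offer_share)
        for rank, student_id in enumerate(ids):
            offered[student_id] = 1 if rank < n_offer else 0
    return offered


# Hypothetical roster: two school-by-cohort strata of 40 and 60 students.
roster = [(f"s{k}", "school1_2013") for k in range(40)] + \
         [(f"t{k}", "school2_2014") for k in range(60)]
offers = assign_offers(roster)
```

Randomizing separately within each stratum is what makes stratum fixed effects a natural control in the later regressions, since the probability of an offer can differ across strata.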

Figure 1 shows the geographic distribution of the 11 participating districts, which are primarily concentrated in the western, southern, and eastern regions of the country.¹⁰ The underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern region has experienced less competition over the years in access to selective postsecondary institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long 2009). Relative to districts across the nation, those participating in the study tend to be in neighborhoods with lower levels of socioeconomic status and to educate students who score below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools tend to be larger and more likely to educate students who are eligible for free or reduced-price lunch, Black, and Hispanic than other schools (Panel A of Table 1).

There are two reasons for this over-representation of larger schools serving less economically prosperous communities. First, AP courses are already offered in the majority of the nation’s public high schools, and schools that serve students from high-income families tend to offer more AP subjects than schools that serve students from lower-income families (Malkus 2016; Theokas and Saaris 2013). Given that our research design only allowed for schools that had not recently offered an AP science course, the population of schools from which we recruited tended to be those in settings with fewer resources. Second, participating schools were required to state that they believed they would have 60 or more students who were qualified to take the AP science course, and this requirement tended to disqualify smaller high schools.

Reflecting the school demographics, participating teachers are slightly younger, less experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our study teachers have less than or equal to five (two) years of teaching experience, which is more than double (triple) the rate of US high school science teachers. Study teachers are more likely to hold an undergraduate major in a STEM field than other high school science teachers, yet far less likely to hold a master’s degree and slightly less likely to have earned a teaching credential in science. Most of the participating teachers had previously taught a higher-level course (mostly honors), yet only 47 percent of them had previously taught an AP course. Our research consequently applies to a population of teachers who are relatively new to the AP science curriculum and who have generally not received graduate training.¹¹ Assuming AP courses improve with teacher preparation, our results likely capture the effect of a less-than-ideal version of AP and may result in less positive treatment effects than when AP is delivered by teachers with more training and experience (Clotfelter, Ladd, and Vigdor 2010).

B. Data and Student Descriptive Statistics

We rely on three primary and secondary data sources for impact estimates. The first is an assessment, developed and validated by the research team, that measures students’ scientific inquiry skills. We administered this assessment to students in both treatment and control groups and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items that rely on science disciplinary knowledge that is taught in middle school, specifically material from Life Sciences and Physical Sciences. The assessment, which we administered to all study participants during a 45-minute period, measures students’ skills in data analysis, scientific explanation, and scientific argument.¹² Participating teachers were not provided copies of the instrument in advance; therefore, teachers were unable to teach any content material prior to test administration.

The second source is a questionnaire that we administered concurrently with the assessment and that asks students a number of questions about their most recent science class and their plans after high school. The assessment and questionnaire were completed together and administered outside of class (henceforth, we refer to these instruments as the “survey”). The third data source is students’ high school transcripts, which contain data on demographic and socioeconomic background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high school completion. We use these data to determine the balance of randomization on pre-treatment covariates, estimate the effect of randomization on course-taking (including compliance), improve the precision of our estimates with statistical controls, and estimate treatment effects on students’ grades.

Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences during the dates scheduled for survey administration and communication lapses between school coordinators and students. Students who were randomly assigned to treatment have a 9-percentage-point higher survey response rate. Given the possibility of nonrandom sample attrition, we weight all regressions by the inverse of the probability of completing the survey, conditional on student characteristics.¹⁴ We implement a variety of robustness checks as additional means to account for nonresponse. These include multiple imputation of missing outcome variables, excluding one high school that had a low response rate, and using the Lee (2009) technique to provide bounds on the estimated effects. These methods and results are discussed below.
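The logic of weighting by the inverse of the response probability can be shown with a small sketch. It substitutes a deliberately simple estimator, the response rate within cells of student characteristics, for whatever response model the study actually fits; the cell labels, field names, and toy data are hypothetical.

```python
from collections import defaultdict


def inverse_probability_weights(records):
    """Compute nonresponse weights from cell-level response rates.

    records: list of dicts with keys 'cell' (a label built from student
    characteristics; hypothetical) and 'responded' (True/False).
    Returns one weight per record: 1 / P(respond | cell) for respondents
    and 0.0 for nonrespondents, who contribute no outcome data.
    """
    total = defaultdict(int)
    responded = defaultdict(int)
    for r in records:
        total[r["cell"]] += 1
        responded[r["cell"]] += int(r["responded"])

    weights = []
    for r in records:
        p = responded[r["cell"]] / total[r["cell"]]
        weights.append(1.0 / p if r["responded"] else 0.0)
    return weights


def weighted_mean(values, weights):
    """Weighted average over observations with positive weight."""
    num = sum(v * w for v, w in zip(values, weights) if w > 0)
    den = sum(w for w in weights if w > 0)
    return num / den


# Toy data: half of the "low" cell responds, all of the "high" cell responds.
sample = (
    [{"cell": "low", "responded": True}] * 2
    + [{"cell": "low", "responded": False}] * 2
    + [{"cell": "high", "responded": True}] * 4
)
outcomes = [1.0, 3.0, None, None, 5.0, 5.0, 5.0, 5.0]
ipw = inverse_probability_weights(sample)
adjusted = weighted_mean(outcomes, ipw)
```

In this toy data the unweighted respondent mean is 4.0, while the weighted mean is 3.5, because respondents from the low-response cell are counted twice to stand in for their missing peers. The adjustment only removes bias if nonresponse is unrelated to outcomes within each cell, which is why bounds and imputation are useful as additional checks.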

We supplement these data with surveys that we administered online to teachers of the new AP courses at the conclusion of the course. The teacher survey includes questions about their educational background, professional experiences, and professional development; past and present instructional practices, generally and around science specifically; participation in the College Board AP training; ability to cover the content of the AP course; and coaching, mentoring, and other professional community supports received from the school, district, and education community.

Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the survey respondents, conditional on school-by-cohort fixed effects.¹⁵ Most of the estimated differences between treatment and control group students on pre-treatment observed characteristics are small, with some notable exceptions. In both the full and survey samples, treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher than control group students’, both at p-values below 0.05. The magnitude of the treatment-control difference was slightly lower and less precisely estimated in math, yet also favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student covariates as predictors of outcomes in the models, and in the robustness checks, we exclude these covariates.¹⁷

Table 2 also shows the extent of differences between control group compliers and non-compliers. We find that non-compliers are generally much more academically prepared for AP science: they have higher pre-treatment reading and math test scores and are more likely to have completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian American and female.[18]


IV. Empirical Strategy

We estimate the effect of taking the AP science course with a standard instrumental variable specification:

(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$

(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$

where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school x cohort stratum $j$; $\widehat{AP}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2); $\mathit{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates (including age; math and reading exam scores from 8th and 10th grade, standardized and averaged for math and reading separately; cumulative GPA prior to the year when the AP science course was offered; and indicator variables for female, racial group (Asian American, Black or Hispanic, Native American or Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price lunch, home language is not English, and took recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.[19] We use two-stage least squares to estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given by $\beta$.

The intent to treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new AP science course and is a weighted average of effects on those who do and do not choose to enroll in the course.

(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
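The relationship between the ITT and LATE estimates can be sketched in a few lines. The example below is a simplified illustration only: it omits the covariates, the school by cohort fixed effects, the survey weights, and the clustered standard errors, and the data and function names are hypothetical. In this stripped-down case with a single binary instrument, two-stage least squares reduces to the ratio of the ITT to the first-stage effect.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def itt_and_late(y, ap, offered):
    """Return (ITT, first stage, LATE) using an intercept and no other controls."""
    y, ap, offered = (np.asarray(v, dtype=float) for v in (y, ap, offered))
    Z = np.column_stack([np.ones_like(offered), offered])
    itt = ols(Z, y)[1]             # outcome on offer, as in Equation (3)
    first_stage = ols(Z, ap)[1]    # enrollment on offer, as in Equation (2)
    ap_hat = Z @ ols(Z, ap)        # fitted enrollment from the first stage
    second = np.column_stack([np.ones_like(ap_hat), ap_hat])
    late = ols(second, y)[1]       # outcome on fitted enrollment, as in Equation (1)
    return itt, first_stage, late

# Hypothetical data: 75% of offered and 25% of non-offered students enroll.
offered = [1, 1, 1, 1, 0, 0, 0, 0]
ap      = [1, 1, 1, 0, 1, 0, 0, 0]
y       = [3, 2, 2, 1, 2, 1, 1, 0]
itt, first_stage, late = itt_and_late(y, ap, offered)
```

As a rough check against the reported estimates, dividing the 0.09 ITT estimate for science skill by the 0.39 first-stage effect in the survey sample gives about 0.23, in line with the reported LATE. The match is only approximate in principle, because the actual models include covariates, fixed effects, and weights.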

For outcomes that are obtained from the survey, we weight regressions by the inverse of the estimated probability of completing the survey.[20] The results are similar without using these weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student characteristics, as a result of either missing student transcripts or certain data elements not collected by the district, we use multiple imputation by chained equations, creating 10 imputed datasets, and combine the results.[21] For inference, we cluster standard errors at the level of treatment assignment (school by cohort) in our analysis of main effects. In the analysis of robustness, we report permutation standard errors, robust standard errors (for comparison to permutations), and the statistical significance of the LATE estimates after adjusting our tests of significance for multiple comparisons.

V. Results

A. Course-Taking and Treatment Contrast

Table 3 provides estimated effects of the randomized offer of enrollment on AP science course enrollment and share of credits in all courses for the full sample and the survey samples. The first-stage estimates indicate that the offer substantially increased the likelihood of the student taking the AP science course, by 38 percentage points in the full sample and 39 percentage points in the survey sample. As we expected, compliance with randomization was imperfect, with 42 percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team.

The remaining rows in Table 3 shed light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage point increase in the share of total credits in the full sample). Treatment group students’ share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).[22]

Approximately 78 percent of the control group compliers took any science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%), and Anatomy/Physiology (9%).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 s.d., p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 s.d., p-value = 0.06). Our results also suggest that AP course-takers’ classrooms were more likely to use technology (up 0.28 s.d., p-value = 0.14).[23] Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.[24] This contrast between students’ reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.[25]

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate that, for the typical complier, taking AP science raises objectively measured scientific inquiry skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent, with, again, more suggestive than definitive results at traditional levels of statistical inference (p-value = 0.16).

Table 5 provides stronger evidence of negative treatment effects on students’ confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology by signing the study assent forms. Taking AP science substantially lowered participants’ likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students’ self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students’ self-reported confidence and stress levels. We find that taking AP science increases students’ likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05) above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group students saw their grades suffer. We estimate that taking AP science reduced students’ grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior year) from around a B- to a C+.[26] This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students’ grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students’ share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 × 0.76)).

With our estimates in hand, we can easily compute the adjustment that would leave the student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).[27]
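The two calculations in this subsection, the credit-weighted overall GPA effect and the offsetting grade boost, can be reproduced with simple arithmetic. The sketch below uses the rounded figures quoted in the text; the variable names are illustrative. With rounded inputs the implied boost comes out at roughly 1.47 rather than 1.46, a small gap that presumably reflects the use of unrounded estimates in the original calculation.

```python
# Rounded estimates quoted in the text (grade points on a 4-point scale).
effect_science = -0.29   # effect on grades in science courses
effect_other = -0.18     # effect on grades in non-science courses
science_share = 0.24     # share of credits in science if taking AP science

# Credit-weighted effect on the overall GPA during the study year.
overall_effect = (effect_science * science_share
                  + effect_other * (1 - science_share))

# Predicted share of classes in any AP science subject (0.02 + 0.12).
ap_science_share = 0.02 + 0.12

# Boost to AP science grades that would leave the overall GPA unchanged.
offsetting_boost = -overall_effect / ap_science_share

print(round(overall_effect, 2))    # about -0.21
print(round(offsetting_boost, 2))  # about 1.47 with these rounded inputs
```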

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and Column (4) reports the results of a permutation test where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).[28] The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.[29]
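The permutation procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: the effect is a raw difference in means rather than the regression-adjusted ITT estimate, pseudo treatment is reshuffled across the whole sample rather than within school by cohort strata, and the function name and data are hypothetical.

```python
import numpy as np

def permutation_p_value(y, treated, n_perm=1000, seed=0):
    """Share of pseudo-treatment effects larger in absolute value than the observed one."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    def diff_in_means(t):
        return y[t].mean() - y[~t].mean()

    observed = diff_in_means(treated)
    exceed = 0
    for _ in range(n_perm):
        # Reassign treatment labels at random, keeping group sizes fixed.
        pseudo = rng.permutation(treated)
        if abs(diff_in_means(pseudo)) > abs(observed):
            exceed += 1
    return observed, exceed / n_perm

# Hypothetical data with a very large treatment-control gap.
y = [10, 11, 12, 13, 0, 1, 2, 3]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
observed, p_value = permutation_p_value(y, treated)
```

In this toy example no reshuffled assignment can produce a gap larger than the observed one, so the permutation p-value is zero; with real data the share typically falls strictly between zero and one.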

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply-imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.[30] Column (8) shows the results when we exclude all of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 s.d.), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution should be considered in evaluating the effects from outcomes based on the study survey.[31]
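The trimming logic behind the bounds can be sketched in a few lines. This is a minimal illustration, assuming unweighted differences in means with no covariates or strata, and it does not implement the Imbens and Manski (2004) confidence intervals; the function name and data are hypothetical.

```python
import numpy as np

def lee_bounds(y_treat, y_ctrl, n_treat, n_ctrl):
    """Trimming bounds on a difference in means under differential response.

    y_treat, y_ctrl: outcomes of survey respondents in each arm.
    n_treat, n_ctrl: number of students randomized to each arm.
    Assumes the treatment arm has the higher response rate.
    """
    y_treat = np.sort(np.asarray(y_treat, dtype=float))
    y_ctrl = np.asarray(y_ctrl, dtype=float)
    r_treat = len(y_treat) / n_treat
    r_ctrl = len(y_ctrl) / n_ctrl
    if r_treat < r_ctrl:
        raise ValueError("this sketch only trims the treatment arm")
    # Number of treated respondents to drop so the response rates match.
    k = int(round(len(y_treat) * (r_treat - r_ctrl) / r_treat))
    keep = len(y_treat) - k
    lower = y_treat[:keep].mean() - y_ctrl.mean()  # drop the k highest outcomes
    upper = y_treat[k:].mean() - y_ctrl.mean()     # drop the k lowest outcomes
    return lower, upper

# Hypothetical data: 10 of 10 treated students and 8 of 10 controls respond.
lower, upper = lee_bounds(range(1, 11), [4] * 8, n_treat=10, n_ctrl=10)
```

Here two treated respondents are trimmed from the top for the lower bound and from the bottom for the upper bound, which brings the treatment response rate down to the control rate of 80 percent.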

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science skill as the outcome. We find that the point estimates at every quantile are insignificantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.[32]

VI. Conclusion

Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools already offered AP science classes, and they tended to be in relatively higher-income communities, disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest above the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion, as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.[33] We also find that AP science classes substantially increase students’ stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult given that few schools document per-course expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation’s high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance, on college admissions decisions as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.


Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University. Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.


Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                         Participating   Others
Average Enrollment                                       1,409           723
Free or Reduced-Price Lunch                              0.700           0.438
Asian                                                    0.055           0.050
Black                                                    0.349           0.154
Hispanic                                                 0.410           0.221
White                                                    0.164           0.537
Adjusted Cohort Graduation Rate                          0.843           0.802
District Instruction Expenditures Per Pupil              $6,561          $5,636
District Student Services Expenditures Per Pupil         $3,787          $3,385

Panel B: Teachers                                        Participating   Others
Age: Under 30                                            0.407           0.160
Age: 30-49                                               0.432           0.553
Age: 50 or over                                          0.161           0.287
Female                                                   0.630           0.536
Hispanic or Latino                                       0.111           0.051
Race: American Indian or Alaska Native                   0.000           0.009
Race: Asian American                                     0.111           0.041
Race: Black                                              0.111           0.060
Race: Native Hawaiian or other Pacific Islander          0.000           0.004
Race: White                                              0.778           0.896
Years of Experience                                      10.3            13.2
Years of Experience <= 2                                 0.290           0.085
Years of Experience <= 5                                 0.481           0.234
Hold a Teaching Certificate                              0.926           0.945
Undergraduate Major in STEM                              0.944           0.747
Single Subject Credential in Science                     0.630           0.823
Master’s Degree or Higher                                0.356           0.615
Previously Taught AP Course                              0.469           NA
Previously Taught AP, IB, or Honors Course               0.796           NA
Number of Professional Development Trainings
  in the Past 5 years (0-5)                              3.09            NA

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1) and (4): Control Group Mean
Columns (2) and (5): Difference Between Treated and Controls
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers

                                            Full Sample                   Survey Sample
Pre-Treatment Characteristic                (1)       (2)       (3)       (4)       (5)       (6)
Age as of October of 11th Grade             16.6      -0.03     -0.07     16.6      -0.01     -0.01
                                                      (0.02)    (0.07)              (0.03)    (0.09)
                                                      [0.19]    [0.35]              [0.65]    [0.94]
Math Exam Score                             0.38      0.08      0.25      0.44      0.07      0.30
                                                      (0.04)    (0.10)              (0.05)    (0.16)
                                                      [0.08]    [0.02]              [0.17]    [0.06]
Reading Exam Score                          0.29      0.10      0.18      0.36      0.09      0.17
                                                      (0.03)    (0.12)              (0.04)    (0.17)
                                                      [0.00]    [0.14]              [0.02]    [0.31]
HS Grade Point Average                      3.16      0.05      0.20      3.23      0.06      0.13
                                                      (0.03)    (0.08)              (0.03)    (0.10)
                                                      [0.14]    [0.02]              [0.06]    [0.20]
Female                                      0.59      0.00      0.10      0.61      -0.01     0.11
                                                      (0.03)    (0.06)              (0.04)    (0.07)
                                                      [0.99]    [0.10]              [0.73]    [0.12]
Asian American                              0.12      0.02      0.10      0.12      0.03      0.10
                                                      (0.02)    (0.05)              (0.01)    (0.07)
                                                      [0.27]    [0.06]              [0.07]    [0.12]
Black                                       0.32      -0.02     -0.06     0.27      0.00      -0.05
                                                      (0.02)    (0.06)              (0.02)    (0.05)
                                                      [0.29]    [0.28]              [0.88]    [0.40]
Hispanic, Native American, or Multiracial   0.31      0.01      0.05      0.33      0.01      0.05
                                                      (0.02)    (0.06)              (0.02)    (0.07)
                                                      [0.55]    [0.41]              [0.81]    [0.51]
Disabled                                    0.02      0.00      -0.01     0.01      0.00      -0.01
                                                      (0.01)    (0.01)              (0.01)    (0.01)
                                                      [0.93]    [0.24]              [0.57]    [0.5]
Gifted                                      0.13      0.03      0.00      0.14      0.02      0.01
                                                      (0.02)    (0.05)              (0.02)    (0.09)
                                                      [0.06]    [1.00]              [0.25]    [0.89]
English Language Learner                    0.05      0.01      0.02      0.04      0.01      0.04
                                                      (0.01)    (0.02)              (0.01)    (0.03)
                                                      [0.41]    [0.39]              [0.54]    [0.22]
Eligible for Free or Reduced-Price Lunch    0.53      0.01      0.02      0.51      0.01      0.07
                                                      (0.02)    (0.07)              (0.03)    (0.09)
                                                      [0.66]    [0.77]              [0.72]    [0.45]
Language Other than English Spoken at Home  0.34      0.02      0.03      0.35      0.01      0.04
                                                      (0.02)    (0.07)              (0.02)    (0.07)
                                                      [0.32]    [0.73]              [0.59]    [0.56]
Took Recommended Prerequisite Courses       0.79      0.00      0.09      0.79      0.02      0.05
                                                      (0.02)    (0.04)              (0.02)    (0.05)
                                                      [0.84]    [0.04]              [0.43]    [0.31]
Number of Observations                      1,819                         1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.


Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

                                  Full Sample                   Survey Respondents
                                  Control                       Control
                                  Group                         Group
Outcome                           Mean      ITT       LATE      Mean      ITT       LATE
                                  (1)       (2)       (3)       (4)       (5)       (6)
AP Treatment Course Enrollment    0.19      0.38                0.24      0.39
                                            (0.05)                        (0.06)
                                            [0.00]                        [0.00]
Share of Credits During Study Year in:
  AP Science                      0.03      0.04      0.11      0.03      0.04      0.10
                                            (0.01)    (0.01)              (0.01)    (0.01)
                                            [0.00]    [0.00]              [0.00]    [0.00]
  All AP                          0.13      0.04      0.11      0.14      0.04      0.10
                                            (0.01)    (0.02)              (0.01)    (0.02)
                                            [0.00]    [0.00]              [0.00]    [0.00]
  Other Advanced Science          0.06      -0.01     -0.02     0.06      -0.01     -0.02
                                            (0.01)    (0.02)              (0.01)    (0.02)
                                            [0.24]    [0.23]              [0.20]    [0.20]
  All Other Advanced              0.25      -0.01     -0.03     0.25      -0.01     -0.03
                                            (0.01)    (0.02)              (0.01)    (0.03)
                                            [0.23]    [0.23]              [0.30]    [0.30]
  Regular Science                 0.06      -0.01     -0.02     0.06      -0.01     -0.02
                                            (0.01)    (0.02)              (0.01)    (0.02)
                                            [0.24]    [0.20]              [0.24]    [0.19]
  All Regular                     0.62      -0.03     -0.09     0.61      -0.03     -0.07
                                            (0.01)    (0.03)              (0.01)    (0.03)
                                            [0.02]    [0.00]              [0.07]    [0.03]
Number of Observations            1,819                         1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.


Table 4

Treatment Contrast (Composite Variables)

                                                  Control
                                                  Group
                                                  Complier
Outcome                                           Mean        ITT       LATE
                                                  (1)         (2)       (3)
Academically Challenging Curriculum               -0.33       0.31      0.80
                                                              (0.10)    (0.24)
                                                              [0.00]    [0.00]
Project-Based, Independent Classroom Activities   -0.06       0.13      0.33
                                                              (0.07)    (0.17)
                                                              [0.07]    [0.06]
Integrated Use of Technology                      -0.11       0.11      0.28
                                                              (0.08)    (0.19)
                                                              [0.19]    [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
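The construction described in the Table 4 notes can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the five-point scale, the function names, and the three made-up students are assumptions, and the notes do not say whether standardization uses the population or sample standard deviation (the sketch uses the population SD and ignores survey weights).

```python
from statistics import mean, pstdev

# Hypothetical five-point agreement scale, ordered from lowest to highest.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, scale=SCALE):
    """Map an ordered category to [0, 1] with evenly spaced steps."""
    return scale.index(response) / (len(scale) - 1)


def composite_scores(students):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in items) for items in students]
    mu, sd = mean(raw), pstdev(raw)  # assumption: population SD, unweighted
    return [(x - mu) / sd for x in raw]


# Made-up responses for three students on three component items.
students = [
    ["strongly agree", "agree", "agree"],
    ["neutral", "neutral", "disagree"],
    ["strongly disagree", "disagree", "neutral"],
]
scores = composite_scores(students)
```

By construction the resulting scores have mean 0 and standard deviation 1 within the sample used for standardization, which is why the control group complier means in column (1) can be negative.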


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                Control
                                Group
                                Complier
Outcome                         Mean        ITT       LATE
                                (1)         (2)       (3)
Science Skill                   -0.19       0.09      0.23
                                            (0.06)    (0.16)
                                            [0.15]    [0.14]
STEM Interest                   0.62        0.04      0.09
                                            (0.02)    (0.07)
                                            [0.16]    [0.16]
Confidence in College Science   0.92        -0.04     -0.10
                                            (0.02)    (0.05)
                                            [0.11]    [0.06]
Stress                          0.12        0.07      0.17
                                            (0.03)    (0.07)
                                            [0.02]    [0.01]
Grades in Science Courses       2.80        -0.12     -0.29
                                            (0.07)    (0.16)
                                            [0.08]    [0.07]
Grades in Other Courses         3.14        -0.07     -0.18
                                            (0.02)    (0.06)
                                            [0.00]    [0.00]
Number of Observations          1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
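As a rough consistency check on how the ITT and LATE columns relate, the LATE for a survey outcome should be close to the ITT divided by the first-stage effect on AP enrollment. The sketch below assumes the Table 3 and Table 5 entries read as 0.39 (first stage among survey respondents), 0.09, 0.04, -0.04, and 0.07 (ITT), and so on. It is only an approximation: the published LATEs presumably come from a covariate-adjusted instrumental variables regression on unrounded inputs, so a ratio of rounded table entries can differ by about 0.01.

```python
# First-stage ITT on AP course enrollment, survey respondents (Table 3, column 5).
FIRST_STAGE = 0.39

# (ITT, reported LATE) for the survey outcomes in Table 5, columns (2) and (3).
TABLE_5 = {
    "Science Skill": (0.09, 0.23),
    "STEM Interest": (0.04, 0.09),
    "Confidence in College Science": (-0.04, -0.10),
    "Stress": (0.07, 0.17),
}


def wald_ratio(itt, first_stage=FIRST_STAGE):
    """Approximate the LATE as the ITT scaled by the first-stage effect."""
    return itt / first_stage


for outcome, (itt, reported_late) in TABLE_5.items():
    approx = wald_ratio(itt)
    print(f"{outcome}: ITT / first stage = {approx:.2f}; reported LATE = {reported_late:.2f}")
```

The ratio reproduces the reported science skill and confidence LATEs to two decimals and lands within roughly 0.01 of the STEM interest and stress entries.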

Table 6

Robustness Checks of Main ITT Results

Column headings:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                         (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)            (12)
Science Skill                   -0.19    0.09                       0.10     0.11     0.20     0.07     0.03     0.39     -0.09 to 0.51   2.0
                                         (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                         [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                   0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12     -0.03 to 0.18   1.9
                                         (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                         [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science   0.92     -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06    0.05     -0.09 to 0.10   2.0
                                         (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                         [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                          0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11     -0.05 to 0.15   1.6
                                         (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                         [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses       2.80     -0.12                      -0.06    -0.10    -0.07    n/a      n/a      n/a      n/a             n/a
                                         (0.07)   (0.04)            (0.00)   (0.00)   (0.00)
                                         [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses         3.14     -0.07                      -0.07    -0.06    -0.03    n/a      n/a      n/a      n/a             n/a
                                         (0.02)   (0.03)            (0.00)   (0.00)   (0.00)
                                         [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]

n/a: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (X_i) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
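The permutation test described for column (4) can be illustrated with a deliberately simplified sketch. The assumptions here are substantial and are not the authors' procedure: the effect is a raw difference in means rather than the regression with School x Cohort fixed effects and covariates, pseudo treatment is reshuffled across the whole sample rather than within randomization blocks, and the function names and data are made up.

```python
import random


def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-treatment draws whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy so the caller's assignment is not shuffled
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign pseudo treatment, keeping group sizes fixed
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm


# Made-up outcomes for eight students, four treated and four controls.
outcomes = [0.4, 1.1, 0.9, 1.6, 0.2, 0.7, 0.5, 0.3]
treated = [True, True, True, True, False, False, False, False]
p_value = permutation_p_value(outcomes, treated)
```

Because the reference distribution is built from reshuffled assignments, this p-value does not rely on the clustered or robust standard error formulas used in columns (2) and (3).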


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students’ confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10. Participating districts include: Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i \rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i \hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
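The weighting step in note 20 can be sketched as follows: once a probit has been fit, each survey respondent's weight is the reciprocal of the predicted completion probability. This is a minimal illustration under stated assumptions, not the authors' implementation. The fixed effect, covariate values, and coefficients below are hypothetical placeholders standing in for fitted probit estimates, and the fitting step itself is omitted.

```python
from statistics import NormalDist


def completion_probability(fixed_effect, covariates, coefficients):
    """Predicted probability of survey completion from a fitted probit index."""
    index = fixed_effect + sum(x * b for x, b in zip(covariates, coefficients))
    return NormalDist().cdf(index)  # standard normal CDF, i.e. Phi(index)


def inverse_probability_weight(fixed_effect, covariates, coefficients):
    """Weight for a respondent: 1 / Phi(mu_j + X_i * rho)."""
    return 1.0 / completion_probability(fixed_effect, covariates, coefficients)


# Hypothetical fitted values: one school-by-cohort effect and two covariates
# (say, prior GPA and an indicator for taking prerequisite courses).
mu_j = 0.2
rho_hat = [0.3, 0.5]

likely_responder = inverse_probability_weight(mu_j, [3.5, 1.0], rho_hat)
unlikely_responder = inverse_probability_weight(mu_j, [2.0, 0.0], rho_hat)
```

Respondents who resemble nonrespondents receive the larger weights, and because a probability cannot exceed 1, no weight falls below 1.0, which matches the lower end of the range reported in note 20.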


of AP. Consistent with the goals of an AP course, treatment group students report that their courses are more challenging and inquiry-based than control group students do. These views are shared by teachers, who report a higher level of rigor in their AP science courses compared to their other science courses. We find suggestive evidence that this academic challenge leads to increases in skill: AP course-takers score 0.23 standard deviations higher than control group compliers on the end-of-year assessment of scientific skill. Though our limited precision prevents us from ruling out zero treatment effects at traditional levels of statistical inference (p-value = 0.14), this large point estimate suggests genuine productivity gains for students who take AP science, over and above the gains experienced by students who enroll in other high school courses. We also find suggestive evidence of an AP science boost to students’ interest in pursuing a STEM degree should they enroll in college. Together, these results fail to support the concern that the AP program’s impact on human capital has been oversold.

At the same time, our results confirm that the workload and expectations of an AP science class cause students to lose confidence in their ability to succeed in college-level science, gain stress, and earn lower grades (prior to the weights that are often attached to AP grades by secondary and postsecondary institutions). The confidence levels among study participants are quite high, with 92 percent of control group compliers reporting that they are “somewhat” or “extremely” confident in their ability to succeed in a college science course. AP course-takers are 10 percentage points less likely to report this level of confidence. Students in the AP course are also more than twice as likely as control group compliers to report that the course negatively affected their physical or emotional health (our measure of stress). And comparisons of transcripts reveal that treatment group students earned lower preweighted grades in science and other subjects during the year that they took the AP class.

Our study contributes to a small research base on the effects of the AP program.² Using a regression discontinuity design, Smith, Hurwitz, and Avery (2017) show that students who barely earn a college-credit equivalent score on the AP exam (e.g., scoring just above the threshold necessary to receive a 3, out of 5, on the exam) are more likely to complete their bachelor’s degrees in four years than students who fall just below that threshold. In a related paper that relies on the same data and design, Avery et al. (2018) demonstrate that AP exam scores also influence students’ college major choices. These compelling results demonstrate that students take advantage of postsecondary AP credit policies to waive out of intro courses and that receiving a higher AP exam score may serve as a signal of skill to both institutions and students. These two studies, however, do not show that AP courses per se led to skill development, as they focus solely on differences in behavior for AP exam-takers who fall just below and just above the score thresholds. Jackson (2010, 2014) evaluates the impacts of the AP Incentive Program, which offers cash incentives to teachers and students for passing scores on AP exams as well as funds for training teachers and convening teams of teachers to align pre-AP curriculum with the needs of the AP class. Jackson identifies impact from variation in the timing of program implementation across high schools in Texas and finds large positive treatment effects on AP courses and exams (2010). The AP Incentive Program also increased students’ college going and persistence as well as their labor market earnings (Jackson 2010, 2014). These two studies indicate that the AP Incentive Program increased AP participation and subsequent educational attainment and labor market performance. However, it is not clear whether these results would hold in the absence of the Incentive Program.

We build on these findings and inform policy and practice in several ways. Most important, we directly test one of the main mechanisms through which AP is expected to influence students’ attainment and earnings: by increasing their skill and interest in the subject matter. We determine whether skill and interest gains, as distinct from college admissions and credit-granting policies, are key drivers behind AP’s impact on later outcomes. This distinction is important given that less than half of AP course-takers earn a credit-granting score on the AP exam, either because they do not take the exam or because they obtain low scores (National Research Council 2002; College Board 2018). Many selective colleges are also increasingly making it difficult for students to receive credit for their AP exam scores. Most top institutions restrict the number of AP subject areas that are eligible, only offer credit or waivers for very high scores on the exams, or cap the total amount of AP credit that a student can receive (Weinstein 2016). In 2012, Dartmouth College announced that it would no longer grant credit for any AP exam score, a policy shared by several other selective institutions, including Amherst College, Brown University, and the California Institute of Technology (Weinstein 2016). Our results, which generalize to a newly offered AP course, suggest that AP endows students with human capital even if it does not grant them the opportunity to earn credit at their preferred college. For college admissions officers, the findings also suggest that AP course-taking offers a reasonable signal of students’ skill and subject-matter interest. Our estimated effects on skill and STEM interest are somewhat limited by insufficient precision, yet they represent the first and most credible evidence to date on the impact of AP on these key outcomes.

Our study is also among the first known to us that quantifies the AP impact on students’ grades. We find that students who take an AP science course earn lower grades in science (by 0.29 grade points) and lower grades in their other courses (by 0.18 grade points). The lower grades in science are driven by the lower grade received in the AP class, a negative effect that many secondary and postsecondary institutions offset by upweighting AP grades. The estimates suggest that students’ AP science grade would have to be inflated by a factor of 1.46 (e.g., a C would have to be converted to approximately a B+) to remove the net negative on overall grade point average (GPA). While many high schools, including those that participated in our study, weight students’ GPAs to adjust for the academic difficulty of the courses, practices vary substantially across institutions (Sadler and Tai 2007; Klopfenstein and Lively 2016). In a recent survey of Texas high schools, for instance, Klopfenstein and Lively (2016) find that most schools with AP courses used weights, but that they ranged from 0.5 to 1 point (with a small number assigning more than 1 extra point). Our findings suggest that the current practices at many institutions under-adjust for the grade penalty from AP courses. In addition, attaching weight to AP grades cannot undo the learning loss that may occur when students shift their effort away from non-AP coursework.

We also contribute to other strands of literature on the relationship between students’ academic achievement and their perceptions of their own confidence and stress. Prior literature on the relationship between students’ confidence in their ability and their true ability is rife with mixed results (Boekaerts and Rozendaal 2010; Stankov and Crawford 1996; Stankov 2013). Psychologists have also documented an inverted U-shaped relationship between perceived pressure and performance, where some amount of stress is necessary to increase achievement, yet too much stress can reduce students’ ability to gain knowledge (Anderson 1976; Davis 2014; Yerkes and Dodson 1908). We find that students taking an AP science class experience cognitive gains concurrent with losses in their academic confidence. This finding is consistent with evidence that many US students are highly confident in their skills and that this noncognitive belief often interferes with their ability to learn (Chiu and Klassen 2010; Stankov and Lee 2014). The AP course appears to reduce students’ estimation of their own ability, either by changing the standard to which they compare themselves or by making them more aware of the challenges they might face in a college course. Whether these changes in perceived confidence persist and how they influence later outcomes is uncertain. Students with expectation levels that match the real demands of college courses might eventually perform better in those courses. Some students might also use the insights they gain from a challenging AP science class to shift away from difficult science courses in college (or entire majors) that could delay or hinder their college completion. Our results also suggest that AP causes a significant amount of stress for students, but we do not find evidence that the added pressure substantially limits their knowledge gains in science.

II. AP Science and Conceptual Framework

A. AP and Other Rigorous Secondary School Courses

The AP program is an appealing option for high school administrators who seek to offer college-level courses to their students. AP course descriptions and assignments are designed to match those offered in introductory college courses in each subject and thus to prepare students for the rigor of college coursework. The College Board (the “Board” for brevity) is a not-for-profit organization that administers AP and provides professional development for teachers, reviews of course syllabi, and extensive curricular materials (e.g., sample syllabi, sample lab experiments).³ The Board also offers standardized AP exams in the spring of each year that are graded by external examiners and provide an externally-validated measure of student learning. Most exams include both an essay or problem-solving component and multiple-choice questions, all of which are aligned with the course descriptions. The exam is one of the key features of the AP program and is used by high school and postsecondary educators to evaluate the depth of students’ skill independently of teacher bias.

In addition to AP courses, high school students typically have three alternative options for advanced coursework. Most high schools offer “honors” courses, which are intended to provide a more rigorous curriculum than the regular course in the same subject. The content and rigor of honors courses vary across high schools, and there is no standardized honors exam offered to students in these courses. A second option is the International Baccalaureate (IB) program, which was originally designed for students in international schools and aims to develop students’ critical thinking skills and their knowledge of international affairs. The IB program is offered worldwide but remains relatively uncommon in the United States, with less than 5 percent of high schools offering IB in 2016 (The IB Programme 2016). A final option is for students to take a course at a nearby college (or online) or, for some, a course that is taught at their high school by an instructor who has been approved as college-level. These “dual enrollment” or “dual credit” courses are meant to provide students with the opportunity to simultaneously earn high school and college credit. In the most recent national survey, high schools reported approximately two million enrollments in dual credit courses (Thomas et al. 2013). There is limited information on the colleges that accept dual enrollment credits. Most courses are offered through collaborations between high schools and local community and public postsecondary institutions, suggesting that credits are generally accepted at these institutions and less often accepted at other institutions. Comparisons of AP science classes to regular and honors level science classes reveal that students receive much more homework and work harder in their AP classes (Sadler et al. 2014). To our knowledge, there have been no comparisons of the workload or effort in AP science courses compared to IB or dual enrollment science courses.

B. Conceptual Framework

There are several channels through which an AP science class is expected to influence students’ cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to provide rigorous content and a substantial workload, be taught by teachers who have high expectations, and consist of students who are driven to succeed. These inputs (course rigor, teacher expectations, and peer motivation) are often thought of as the main characteristics that distinguish AP courses from other high school courses.

Yet AP science classes are also intended to offer an inquiry-based approach to science that, when combined with a high level of rigor, provides an additional causal pathway to change. Specifically, a well-implemented AP science course should encourage students to ask questions, gather and interpret data, arrive at explanations grounded in scientific principles, and communicate their observations to one another under the guidance of teachers (College Board 2011a, 2011b).⁴ This student-led, inquiry-based approach differs from many traditional secondary school science classrooms, where the goal is often for students to memorize content and replicate laboratory experiments that demonstrate the content (National Research Council 2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world practices of science and the skills that form the basis of scientific inquiry by focusing more on big picture concepts and small group experimentation, with students directing the inquiry. The curriculum also encourages teachers to move away from lecture-based pedagogy and multiple-choice quizzes and to increase their use of technology to help students analyze data, draw interpretations, and communicate findings (College Board 2011a, 2011b).

AP science classes are expected to increase students’ ability to ask research questions, design experiments, analyze data, and draw conclusions. In the process of gaining these scientific inquiry skills, the new curriculum is intended to spur greater interest in the practice of science because it becomes more enjoyable and more accessible to students for whom rote memorization and execution of prefabricated lab experiments might have diminished enthusiasm in the subject (National Research Council 2012). Science experts posit that inquiry-based science courses will be particularly successful in generating greater interest and skill among women and among students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis, Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler, Samuelson, and Lorah 2014).

While the rigor and expectations of a college course may be appropriate for some students, it can be too demanding for others. Students often report high levels of stress and burnout from taking AP courses, particularly if they perceive that they are not prepared for the challenge of college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact, cause students to lose confidence in their ability to complete college science courses. A number of mechanisms could cause students to lose confidence, including exposure to stronger peers, inability to successfully complete assignments, or simply receiving lower grades than they received in their non-AP courses.⁵ The AP effect on confidence will likely matter differently for students with different levels of initial confidence. For students who are over-confident in their ability to succeed in college science courses, taking a challenging AP course in high school might cause them to revise their expectations to be more in line with the higher demands of college-level work.

Taking a more strenuous AP course is also likely to affect students’ time allocation. Students’ performance in each class will be determined by their subject-specific ability as well as the amount of time they devote to their coursework versus other activities, including work, extracurricular activities, and leisure. If AP courses are more demanding than other courses, students solving a time allocation problem may shift more effort into their AP course, away from other pursuits. The impact of this change in time allocation on students’ performance in AP and other courses will depend upon whether they shift effort away from other courses and on the degree of complementarity between their AP science course and their other courses. Study time devoted to an AP science course could improve student performance in other math and science classes (where the skills, tasks, and knowledge are similar), even if students spend less time on those courses. For courses that require students to perform tasks that are not complementary with AP science (e.g., courses in the humanities), taking AP science concurrently with these courses could decrease student performance in both courses. Of course, students taking an AP course could choose to reduce time spent on alternative (non-academic) activities. If these other activities have no causal impact on performance in school, then the impact on overall achievement could be negligible.

Some students report concerns about their time allocation as they weigh the decision to enroll in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these concerns have increased over time as the courses have become more accessible to students who previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to students with high grades in prerequisite classes, and the courses were only offered in schools with substantial resources. The Board has made efforts to increase access with, for instance, a policy statement that encourages schools to open AP to all students who are “willing to accept the challenge” and remove all barriers that restrict access (College Board 2002).⁶ In a 2008 survey of a nationally-representative sample, 65 percent of secondary school teachers reported that their schools encourage as many students as possible to take AP, and 69 percent reported that AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009). These open access policies have led to complaints that students who enroll with less preparation will be less able to engage with the material (and perhaps become more discouraged by the difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg 2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students through negative peer effects or through teachers removing content and slowing the pace of course delivery.

III. AP Science Impact Study

A. Overview

We recruited 23 schools from across the United States and offered monetary compensation to pay for equipment and teacher training and as an incentive to secure participation.⁷ Eligible schools included ones that had not offered AP Biology or AP Chemistry in recent years, were willing to add such a course and comply with study protocol, and had more eligible students than could be served in one class, so as to supply a sufficiently-sized control group.⁸ Of the 23 schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both courses. We recruited two waves of schools (those that offered the course for the first time in 2013 and those that offered it for the first time in 2014); both waves were asked to field the course for two years, and the earlier-joining schools had the option of fielding the course for three years. The study includes 47 school-by-cohort groups.

Each participating school identified students that the school deemed eligible to take the new AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students who assented to participate in the study and who obtained consent from their parent or guardian as study participants. Upon receipt of signed consent/assent forms, we randomly offered enrollment in the newly launched course to a subset of participating students.⁹ The study includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students per AP class).

Figure 1 shows the geographic distribution of the 11 participating districts, which are primarily concentrated in the western, southern, and eastern regions of the country.¹⁰ The underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern region has experienced less competition over the years in access to selective postsecondary institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long 2009). Relative to districts across the nation, those participating in the study tend to be in neighborhoods with lower levels of socioeconomic status and to educate students who score below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools tend to be larger than other schools and more likely to educate students who are eligible for free or reduced-price lunch, Black, or Hispanic (Panel A of Table 1).

There are two reasons for this over-representation of larger schools serving less economically prosperous communities. First, AP courses are already offered in the majority of the nation’s public high schools, and schools that serve students from high-income families tend to offer more AP subjects than schools that serve students from lower-income families (Malkus 2016; Theokas and Saaris 2013). Given that our research design only allowed for schools that had not recently offered an AP science course, the population of schools from which we recruited tended to be those in settings with fewer resources. Second, participating schools were required to state that they believed they would have 60 or more students who were qualified to take the AP science course, and this requirement tended to disqualify smaller high schools.

Reflecting the school demographics, participating teachers are slightly younger, less experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our study teachers have less than or equal to five (two) years of teaching experience, which is more than double (triple) the rate of US high school science teachers. Study teachers are more likely to hold an undergraduate major in a STEM field than other high school science teachers, yet far less likely to hold a master’s degree and slightly less likely to have earned a teaching credential in science. Most of the participating teachers had previously taught a higher-level course (mostly honors), yet only 47 percent of them had previously taught an AP course. Our research consequently applies to a population of teachers who are relatively new to the AP science curriculum and who have generally not received graduate training.¹¹ Assuming AP courses improve with teacher preparation, our results likely capture the effect of a less-than-ideal version of AP and may result in less positive treatment effects than when AP is delivered by teachers with more training and experience (Clotfelter, Ladd, and Vigdor 2010).

B. Data and Student Descriptive Statistics

We rely on three primary and secondary data sources for impact estimates. The first is an assessment, developed and validated by the research team, that measures students’ scientific inquiry skills. We administered this assessment to students in both treatment and control groups and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items that rely on science disciplinary knowledge that is taught in middle school, specifically material from Life Sciences and Physical Sciences. The assessment, which we administered to all study participants during a 45-minute period, measures students’ skills in data analysis, scientific explanation, and scientific argument.¹² Participating teachers were not provided copies of the instrument in advance; therefore, teachers were unable to teach any content material prior to test administration.

The second source is a questionnaire that we administered concurrently with the assessment and that asks students a number of questions about their most recent science class and their plans after high school. The assessment and questionnaire were completed together and administered outside of class (henceforth, we refer to these instruments as the “survey”). The third data source is students’ high school transcripts, which contain data on demographic and socioeconomic background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high school completion. We use these data to determine the balance of randomization on pre-treatment covariates, estimate the effect of randomization on course-taking (including compliance), improve the precision of our estimates with statistical controls, and estimate treatment effects on students’ grades.

Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences during the dates scheduled for survey administration and communication lapses between school coordinators and students. Students who were randomly assigned to treatment have a 9-percentage point higher survey response rate. Given the possibility of nonrandom sample attrition, we weight all regressions by the inverse of the probability of completing the survey, conditional on student characteristics.¹⁴ We implement a variety of robustness checks as additional means to account for nonresponse. These include multiple imputation of missing outcome variables, excluding one high school that had a low response rate, and using the Lee (2009) technique to provide bounds on the estimated effects. These methods and results are discussed below.
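To make the nonresponse weighting concrete, the following is a minimal sketch, not the authors' procedure. It assumes response probabilities are estimated as simple response rates within cells of student characteristics; the paper conditions on student characteristics but does not specify the model here, and a logit or probit would be a common alternative. The function name `inverse_probability_weights` and the variable `cell` are hypothetical.

```python
import numpy as np


def inverse_probability_weights(responded, cell):
    """Weight each respondent by 1 / (estimated response probability).

    Assumption: the response probability is the observed response rate
    within the student's cell (a stand-in for a fitted response model).
    Nonrespondents receive weight 0 because they contribute no outcome data.
    """
    responded = np.asarray(responded, dtype=bool)
    cell = np.asarray(cell)
    weights = np.zeros(len(cell), dtype=float)
    for c in np.unique(cell):
        in_cell = cell == c
        rate = responded[in_cell].mean()
        if rate > 0:  # skip cells with no respondents at all
            weights[in_cell & responded] = 1.0 / rate
    return weights


# Hypothetical example: cell 0 has a 50 percent response rate, cell 1 has 75 percent.
responded = [1, 1, 0, 0, 1, 1, 1, 0]
cell = [0, 0, 0, 0, 1, 1, 1, 1]
weights = inverse_probability_weights(responded, cell)
print(weights)
```

Respondents from cells with low response rates are weighted up, so the weighted respondent sample resembles the full randomized sample on the characteristics that define the cells.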

We supplement these data with surveys that we administered online to teachers of the new AP courses at the conclusion of the course. The teacher survey includes questions about their educational background, professional experiences, and professional development; past and present instructional practices, generally and around science specifically; participation in the College Board AP training; ability to cover the content of the AP course; and coaching, mentoring, and other professional community supports received from the school district and education community.

Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the survey respondents, conditional on school by cohort fixed effects.¹⁵ Most of the estimated differences between treatment and control group students on pre-treatment observed characteristics are small, with some notable exceptions. In both the full and survey samples, treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher than control group students, both at p-values below 0.05. The magnitude of the treatment-control difference was slightly lower and less precisely-estimated in math, yet also favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student covariates as predictors of outcomes in the models, and in the robustness checks we exclude these covariates.¹⁷

Table 2 also shows the extent of differences between control group compliers and non-compliers. We find that non-compliers are generally much more academically prepared for AP science: they have higher pre-treatment reading and math test scores and are more likely to have completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian American and female.¹⁸

IV. Empirical Strategy

We estimate the effect of taking the AP science course with a standard instrumental variable specification:

(1) $Y_{ij} = \alpha_j + A_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$

(2) $AP_{ij} = \delta_j + \text{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$

where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school-by-cohort stratum $j$; $A_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2); $\text{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates (including age; math and reading exam scores from 8th and 10th grade, standardized and averaged for math and reading separately; cumulative GPA prior to the year when the AP science course was offered; and indicator variables for female, racial group (Asian American, Black or Hispanic, Native American, or Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price lunch, home language is not English, and took recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.¹⁹ We use two-stage least squares to estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given by $\beta$.
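As an illustration of how Equations (1) and (2) fit together, here is a minimal two-stage least squares sketch on simulated data. It is not the authors' code: the data-generating process, the function names, and the single covariate are hypothetical, and the sketch omits the survey weights, multiple imputation, and clustered standard errors used in the paper. It returns a point estimate only, since standard errors taken from a manually run second stage would be incorrect.

```python
import numpy as np


def two_stage_least_squares(y, enrolled, offered, covariates, strata):
    """Return the 2SLS coefficient on enrollment (beta in Equation (1))."""
    # One dummy per stratum stands in for the school-by-cohort fixed effects.
    dummies = (strata[:, None] == np.unique(strata)[None, :]).astype(float)
    exog = np.column_stack([dummies, covariates])

    # First stage (Equation (2)): enrollment on the offer plus controls.
    z = np.column_stack([offered, exog])
    first_stage, *_ = np.linalg.lstsq(z, enrolled, rcond=None)
    fitted = z @ first_stage

    # Second stage (Equation (1)): outcome on fitted enrollment plus controls.
    x = np.column_stack([fitted, exog])
    second_stage, *_ = np.linalg.lstsq(x, y, rcond=None)
    return second_stage[0]


def simulate(n=50_000, true_effect=0.25, seed=0):
    """Hypothetical data with an unobserved confounder and imperfect compliance."""
    rng = np.random.default_rng(seed)
    strata = rng.integers(0, 5, size=n)
    covariates = rng.normal(size=(n, 1))
    offered = rng.integers(0, 2, size=n).astype(float)
    ability = rng.normal(size=n)  # unobserved; drives both enrollment and outcome
    latent = -0.6 + 1.2 * offered + ability + rng.normal(size=n)
    enrolled = (latent > 0).astype(float)
    y = (true_effect * enrolled + 0.5 * covariates[:, 0]
         + 0.1 * strata + ability + rng.normal(size=n))
    return y, enrolled, offered, covariates, strata


y, enrolled, offered, covariates, strata = simulate()
estimate = two_stage_least_squares(y, enrolled, offered, covariates, strata)
print(f"2SLS estimate of beta: {estimate:.3f}")
```

Because the offer is randomized, it is unrelated to the unobserved `ability` term, which is why instrumenting enrollment with the offer can recover the simulated effect even though enrollment itself is confounded.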

The intent to treat (ITT) estimate is obtained by replacing $A_{ij}$ with $\text{Offered}_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on $\text{Offered}_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new AP science course and is a weighted average of effects on those who do and do not choose to enroll in the course.

(3) $Y_{ij} = \zeta_j + \text{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
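The link between the ITT and LATE estimates is easiest to see in the simplest case. With a single binary instrument and no covariates or fixed effects, two-stage least squares reduces to the Wald ratio: the ITT effect on the outcome divided by the effect of the offer on enrollment. The sketch below uses hypothetical data and a hypothetical function name, and it ignores the covariates and school-by-cohort fixed effects in Equations (1) through (3).

```python
import numpy as np


def itt_and_late(y, offered, enrolled):
    """Return (ITT, first stage, Wald LATE) from raw group means."""
    y = np.asarray(y, dtype=float)
    offered = np.asarray(offered, dtype=bool)
    enrolled = np.asarray(enrolled, dtype=float)
    # ITT: difference in mean outcomes by randomized offer.
    itt = y[offered].mean() - y[~offered].mean()
    # First stage: difference in enrollment rates by randomized offer.
    first_stage = enrolled[offered].mean() - enrolled[~offered].mean()
    # Wald ratio: the ITT scaled up by the share of students the offer moved.
    return itt, first_stage, itt / first_stage


# Hypothetical data: 3 of 4 offered students enroll; 1 of 4 controls crosses over.
offered = [1, 1, 1, 1, 0, 0, 0, 0]
enrolled = [1, 1, 1, 0, 1, 0, 0, 0]
y = [2.0, 2.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0]
itt, first_stage, late = itt_and_late(y, offered, enrolled)
print(itt, first_stage, late)
```

The weaker the first stage, the more the ITT understates the effect on students who actually change their enrollment in response to the offer.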

For outcomes that are obtained from the survey, we weight regressions by the inverse of the estimated probability of completing the survey.²⁰ The results are similar without using these weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student characteristics as a result of either missing student transcripts or certain data elements not collected by the district, we use multiple imputation by chained equations, creating 10 imputed datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of treatment assignment (school by cohort) in our analysis of main effects. In the analysis of robustness, we report permutation standard errors, robust standard errors (for comparison to permutations), and the statistical significance of the LATE estimates after adjusting our tests of significance for multiple comparisons.
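The text does not say how results from the imputed datasets are combined. The conventional approach is Rubin's rules, sketched below under that assumption; the function name and the example numbers are hypothetical.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and squared standard errors.

    Assumption: Rubin's rules, the usual way to pool multiply imputed results.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()            # pooled point estimate
    within = variances.mean()            # average within-imputation variance
    between = estimates.var(ddof=1)      # variance of estimates across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, np.sqrt(total)


# Hypothetical estimates and variances from three imputed datasets.
pooled, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(pooled, se)
```

The between-imputation term inflates the pooled standard error to reflect uncertainty about the missing values, which a single imputed dataset would hide.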

V. Results

A. Course-Taking and Treatment Contrast

Table 3 provides estimated effects of the randomized offer of enrollment on AP science course enrollment and share of credits in all courses for the full sample and the survey samples. The first-stage estimates indicate that the offer substantially increased the likelihood of the student taking the AP science course, by 38 percentage points in the full sample and 39 percentage points in the survey sample. As we expected, compliance with randomization was imperfect, with 42 percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team.

The remaining rows in Table 3 shine light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage point increase in the share of total credits in the full sample). Treatment group students’ share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).²²

Approximately 78 percent of the control group compliers took any science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%), and Anatomy/Physiology (9%).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our results also suggest that AP course-takers’ classrooms were more likely to use technology (up 0.28 sd, p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that, while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.²⁴ This contrast between students’ reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.²⁵

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with again more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
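As a rough check on how the complier estimates relate to the intent-to-treat (ITT) estimates, the sketch below applies the standard Wald ratio: with a single binary instrument, the complier effect equals the ITT effect divided by the first-stage effect on take-up when both come from the same specification. The function name `wald_late` is hypothetical, and the inputs are the rounded science skill ITT from Table 5 and the rounded survey-respondent first stage from Table 3. Because the inputs are rounded, other outcomes reproduce the reported values only approximately.

```python
def wald_late(itt_effect, first_stage):
    """Ratio of the ITT effect to the first-stage effect on take-up.

    Hypothetical helper for illustration; it is not the authors' code.
    """
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage


# Rounded point estimates for science skill among survey respondents:
# an ITT of 0.09 sd (Table 5) and a 0.39 increase in AP enrollment (Table 3).
late_science_skill = wald_late(0.09, 0.39)
print(round(late_science_skill, 2))  # 0.23
```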

Table 5 provides stronger evidence of negative treatment effects on students’ confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology by signing the
study assent forms. Taking AP science substantially lowered participants’ likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students’
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students’ self-reported confidence and stress levels. We find that taking AP science increases
students’ likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group
students saw their grades suffer. We estimate that taking AP science reduced students’ grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior
year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students’ grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students’ share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).

With our estimates in hand, we can easily compute the adjustment that would leave the
student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).²⁷
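The grade arithmetic in the two preceding paragraphs can be retraced with a few lines of code. This is a minimal sketch using only the rounded figures quoted in the text; the variable and function names are hypothetical. With rounded inputs the offsetting boost comes out at about 1.5 rather than the reported 1.46, which presumably reflects unrounded estimates.

```python
# Rounded point estimates quoted in the text (effects in grade points).
effect_science = -0.29   # effect on grades in science courses
effect_other = -0.18     # effect on grades in non-science courses
share_science = 0.24     # share of study-year credits in science

# Credit-weighted effect on the overall study-year GPA.
overall_effect = (effect_science * share_science
                  + effect_other * (1 - share_science))

# Predicted share of classes in any AP science subject (0.02 + 0.12).
share_ap_science = 0.02 + 0.12


def offsetting_boost(gpa_effect, ap_share):
    """Grade-point boost on AP science courses that cancels gpa_effect."""
    return -gpa_effect / ap_share


boost = offsetting_boost(round(overall_effect, 2), share_ap_science)
print(round(overall_effect, 2))  # -0.21
print(round(boost, 2))           # 1.5
```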

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.²⁹
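The permutation test described above can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses a raw difference in means rather than the regression-adjusted ITT estimate, it shuffles labels across the whole sample rather than within school-by-cohort cells, and the toy data and function names are hypothetical.

```python
import random


def diff_in_means(outcomes, assignment):
    """Mean outcome of units assigned 1 minus mean outcome of units assigned 0."""
    treated = [y for y, d in zip(outcomes, assignment) if d == 1]
    control = [y for y, d in zip(outcomes, assignment) if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def permutation_p_value(outcomes, assignment, n_permutations=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, assignment))
    pseudo = list(assignment)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pseudo)  # randomly reassign the treatment labels
        if abs(diff_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical toy data, not taken from the study.
toy_outcomes = [0.2, -0.1, 0.5, 0.9, 0.4, -0.3, 0.1, 0.0, 0.7, -0.2]
toy_assignment = [1, 0, 1, 1, 1, 0, 0, 0, 1, 0]
p_value = permutation_p_value(toy_outcomes, toy_assignment)
```

Shuffling the labels keeps the number of treated units fixed, so each pseudo-assignment has the same treatment share as the real one.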

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all
of the X_i covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower
and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.³¹
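The trimming step behind these bounds can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: it uses unweighted differences in means with no covariates or fixed effects, it assumes the treatment group has the higher response rate, and it omits the Imbens and Manski (2004) confidence interval. The function name and toy numbers are hypothetical.

```python
from statistics import mean


def lee_bounds(treated_outcomes, control_outcomes,
               n_treated_assigned, n_control_assigned):
    """Lower and upper trimming bounds on a difference in means."""
    rate_t = len(treated_outcomes) / n_treated_assigned
    rate_c = len(control_outcomes) / n_control_assigned
    if rate_t < rate_c:
        raise ValueError("sketch assumes the treatment group responds more")
    # Share of responding treated units to drop so response rates match.
    trim_share = (rate_t - rate_c) / rate_t
    n_trim = int(round(trim_share * len(treated_outcomes)))
    ordered = sorted(treated_outcomes)
    n_keep = len(ordered) - n_trim
    control_mean = mean(control_outcomes)
    lower = mean(ordered[:n_keep]) - control_mean   # drop the highest values
    upper = mean(ordered[n_trim:]) - control_mean   # drop the lowest values
    return lower, upper


# Hypothetical toy data: 8 of 10 treated and 6 of 10 controls responded.
lower_bound, upper_bound = lee_bounds(
    [1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6], 10, 10)
print(lower_bound, upper_bound)  # 0.0 2.0
```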

Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression, conditional on X_i, with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals
fail to rule out large positives and negatives. Additional heterogeneity results can be found in the
Online Appendix.³²

VI. Conclusion

Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.³³ We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult given that few schools document per-course
expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course
assignments offers a partial cost analysis. Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.

References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.

Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer, Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research & Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, Guido W., and Charles F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee, Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures, Version 1.0.” Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.

Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee, Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11, First Look” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.

Figure 1

Geographic Distribution of Participating Districts

Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve, created using
Stata’s default settings, and roughly shows the predicted test score as a function of the
neighborhood’s SES.

Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.

Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age: Under 30                                       0.407           0.160
Age: 30-49                                          0.432           0.553
Age: 50 or over                                     0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <=2                             0.290           0.085
Years of Experience <=5                             0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           N/A
Previously Taught AP, IB, or Honors Course          0.796           N/A
Number of Professional Development Trainings        3.09            N/A
  in the Past 5 years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd/) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public
high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass/). “Others” in Panel B refers to public and private high school
teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.

Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
Columns (1) and (4): Control Group Mean.
Columns (2) and (5): Difference Between Treated and Controls.
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers.

Pre-Treatment Characteristic                        (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                    16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                          (0.02)   (0.07)            (0.03)   (0.09)
                                                          [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                    0.38     0.08     0.25     0.44     0.07     0.30
                                                          (0.04)   (0.10)            (0.05)   (0.16)
                                                          [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                                 0.29     0.10     0.18     0.36     0.09     0.17
                                                          (0.03)   (0.12)            (0.04)   (0.17)
                                                          [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                             3.16     0.05     0.20     3.23     0.06     0.13
                                                          (0.03)   (0.08)            (0.03)   (0.10)
                                                          [0.14]   [0.02]            [0.06]   [0.20]
Female                                             0.59     0.00     0.10     0.61    -0.01     0.11
                                                          (0.03)   (0.06)            (0.04)   (0.07)
                                                          [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                     0.12     0.02     0.10     0.12     0.03     0.10
                                                          (0.02)   (0.05)            (0.01)   (0.07)
                                                          [0.27]   [0.06]            [0.07]   [0.12]
Black                                              0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                          (0.02)   (0.06)            (0.02)   (0.05)
                                                          [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial          0.31     0.01     0.05     0.33     0.01     0.05
                                                          (0.02)   (0.06)            (0.02)   (0.07)
                                                          [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                           0.02     0.00    -0.01     0.01     0.00    -0.01
                                                          (0.01)   (0.01)            (0.01)   (0.01)
                                                          [0.93]   [0.24]            [0.57]    [0.5]
Gifted                                             0.13     0.03     0.00     0.14     0.02     0.01
                                                          (0.02)   (0.05)            (0.02)   (0.09)
                                                          [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                           0.05     0.01     0.02     0.04     0.01     0.04
                                                          (0.01)   (0.02)            (0.01)   (0.03)
                                                          [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch           0.53     0.01     0.02     0.51     0.01     0.07
                                                          (0.02)   (0.07)            (0.03)   (0.09)
                                                          [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home         0.34     0.02     0.03     0.35     0.01     0.04
                                                          (0.02)   (0.07)            (0.02)   (0.07)
                                                          [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses              0.79     0.00     0.09     0.79     0.02     0.05
                                                          (0.02)   (0.04)            (0.02)   (0.05)
                                                          [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                            1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.

Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
Columns (1) and (4): Control Group Mean. Columns (2) and (5): ITT. Columns (3) and (6): LATE.

Outcome                                 (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment         0.19     0.38              0.24     0.39
                                              (0.05)                     (0.06)
                                              [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                           0.03     0.04     0.11     0.03     0.04     0.10
                                              (0.01)   (0.01)            (0.01)   (0.01)
                                              [0.00]   [0.00]            [0.00]   [0.00]
  All AP                               0.13     0.04     0.11     0.14     0.04     0.10
                                              (0.01)   (0.02)            (0.01)   (0.02)
                                              [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science               0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                              (0.01)   (0.02)            (0.01)   (0.02)
                                              [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                   0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                              (0.01)   (0.02)            (0.01)   (0.03)
                                              [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                      0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                              (0.01)   (0.02)            (0.01)   (0.02)
                                              [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                          0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                              (0.01)   (0.03)            (0.01)   (0.03)
                                              [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.

Table 4

Treatment Contrast (Composite Variables)

                                                (1)       (2)       (3)
Outcome                                         Control   ITT       LATE
                                                Group
                                                Complier
                                                Mean

Academically Challenging Curriculum             -0.33     0.31      0.80
                                                          (0.10)    (0.24)
                                                          [0.00]    [0.00]
Project-Based Independent Classroom Activities  -0.06     0.13      0.33
                                                          (0.07)    (0.17)
                                                          [0.07]    [0.06]
Integrated Use of Technology                    -0.11     0.11      0.28
                                                          (0.08)    (0.19)
                                                          [0.19]    [0.14]

Number of Observations                          1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
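The construction of the composite variables described in the notes to Table 4 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the ordering of the five-level response scale, and the sample responses are assumptions, and the sketch standardizes with the unweighted population standard deviation, a detail the notes do not specify.

```python
from statistics import mean, pstdev

# Hypothetical five-level agreement scale, ordered from lowest to highest.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, scale=SCALE):
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0,
    with the remaining categories evenly spaced in between."""
    return scale.index(response) / (len(scale) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled component items, then standardize
    the averages to mean 0 and standard deviation 1 across students."""
    averages = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sigma = mean(averages), pstdev(averages)
    return [(a - mu) / sigma for a in averages]


# Made-up responses for three students on two component items.
students = [
    ["strongly agree", "agree"],
    ["neutral", "neutral"],
    ["disagree", "strongly disagree"],
]
scores = composite_scores(students)
```

Because the composites are standardized, the ITT and LATE estimates in Table 4 can be read in standard deviation units of each composite.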


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                (1)       (2)       (3)
Outcome                         Control   ITT       LATE
                                Group
                                Complier
                                Mean

Science Skill                   -0.19     0.09      0.23
                                          (0.06)    (0.16)
                                          [0.15]    [0.14]
STEM Interest                   0.62      0.04      0.09
                                          (0.02)    (0.07)
                                          [0.16]    [0.16]
Confidence in College Science   0.92      -0.04     -0.10
                                          (0.02)    (0.05)
                                          [0.11]    [0.06]
Stress                          0.12      0.07      0.17
                                          (0.03)    (0.07)
                                          [0.02]    [0.01]
Grades in Science Courses       2.80      -0.12     -0.29
                                          (0.07)    (0.16)
                                          [0.08]    [0.07]
Grades in Other Courses         3.14      -0.07     -0.18
                                          (0.02)    (0.06)
                                          [0.00]    [0.00]

Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
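One rough way to see how the ITT and LATE columns of Table 5 relate: with a single binary offer of enrollment, the LATE is approximately the ITT effect divided by the first-stage effect of the offer on AP course enrollment. The sketch below is a back-of-the-envelope check only. It assumes the table entries read as rounded two-decimal values and takes first-stage effects of 0.39 (survey respondents) and 0.38 (full sample) from Table 3. The reported LATE estimates come from regressions with covariates and fixed effects, so these ratios need not reproduce them exactly. The function and variable names are illustrative.

```python
def wald_ratio(itt, first_stage):
    """Wald estimator: the ITT (reduced-form) effect divided by the
    first-stage effect of assignment on take-up."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage


# Rounded first-stage effects as read from Table 3 (assumed values).
FIRST_STAGE_SURVEY = 0.39  # survey-based outcomes
FIRST_STAGE_FULL = 0.38    # transcript-based outcomes (grades)

# Rounded ITT estimates as read from Table 5, paired with their first stage.
itt_estimates = {
    "Science Skill": (0.09, FIRST_STAGE_SURVEY),
    "STEM Interest": (0.04, FIRST_STAGE_SURVEY),
    "Confidence in College Science": (-0.04, FIRST_STAGE_SURVEY),
    "Stress": (0.07, FIRST_STAGE_SURVEY),
    "Grades in Science Courses": (-0.12, FIRST_STAGE_FULL),
    "Grades in Other Courses": (-0.07, FIRST_STAGE_FULL),
}

approx_late = {
    name: round(wald_ratio(itt, first_stage), 2)
    for name, (itt, first_stage) in itt_estimates.items()
}

for name, value in approx_late.items():
    print(f"{name}: approximate LATE = {value:+.2f}")
```

Small gaps between these ratios and the reported LATE column are expected, since both inputs are rounded to two decimals.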

Table 6

Robustness Checks of Main ITT Results

Columns:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                        (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)            (12)

Science Skill                  -0.19    0.09                       0.10     0.11     0.20     0.07     0.03     0.39     -0.09, 0.51     2.0
                                        (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                        [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                  0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12     -0.03, 0.18     1.9
                                        (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                        [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science  0.92     -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06    0.05     -0.09, 0.10     2.0
                                        (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                        [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                         0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11     -0.05, 0.15     1.6
                                        (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                        [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses      2.80     -0.12                      -0.06    -0.10    -0.07    n/a      n/a      n/a      n/a             n/a
                                        (0.07)   (0.04)            (0.00)   (0.00)   (0.00)
                                        [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses        3.14     -0.07                      -0.07    -0.06    -0.03    n/a      n/a      n/a      n/a             n/a
                                        (0.02)   (0.03)            (0.00)   (0.00)   (0.00)
                                        [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]

n/a: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to derive confidence interval for the treatment effect itself).
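The permutation test described for column (4) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: it uses a plain difference in means in place of the regression in Equation (1), it shuffles the treatment indicator across all students rather than within School x Cohort cells, and the data, function names, and seed are made up.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of the treated minus mean outcome of the controls."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-treatment assignments whose absolute estimated
    effect exceeds the absolute value of the observed effect."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    pseudo = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pseudo)  # keeps the number of pseudo-treated students fixed
        if abs(diff_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations


# Made-up standardized outcomes for eight students, four of them treated.
outcomes = [0.4, 1.1, -0.3, 0.9, -0.2, 0.6, -0.5, 0.0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = permutation_p_value(outcomes, treated)
```

A small p-value indicates that few random reassignments produce an effect as large as the one observed, which is the sense in which column (4) complements the clustered and robust standard errors.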


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students' confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course and classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10. Participating districts include: Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master's degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree, but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues, but found it infeasible to obtain the necessary test score information prior to randomization given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset, yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet, we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
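The inverse probability weights described in note 20 can be sketched as follows. This is a minimal illustration, assuming the probit has already been estimated: the fitted index values (standing in for the estimated school by cohort fixed effect plus covariates times coefficients) are made up, and the function names are hypothetical rather than taken from the authors' code.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_probability_weight(fitted_index):
    """Weight = 1 / Phi(fitted index), i.e., one over the predicted
    probability that the student completed the survey."""
    p = normal_cdf(fitted_index)
    if p <= 0.0:
        raise ValueError("predicted completion probability is zero")
    return 1.0 / p


# Hypothetical fitted probit index values for four survey completers.
fitted_index = [2.0, 1.0, 0.0, -1.0]
weights = [inverse_probability_weight(z) for z in fitted_index]
```

Students with a low predicted probability of completing the survey receive the largest weights, which matches the note's description of giving more weight to respondents who resemble nonrespondents. Every weight is at least 1 because the predicted probability cannot exceed 1.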


attainment and earnings by increasing their skill and interest in the subject matter. We determine whether skill and interest gains, as distinct from college admissions and credit-granting policies, are key drivers behind AP's impact on later outcomes. This distinction is important given that less than half of AP course-takers earn a credit-granting score on the AP exam, either because they do not take the exam or because they obtain low scores (National Research Council 2002; College Board 2018). Many selective colleges are also increasingly making it difficult for students to receive credit for their AP exam scores. Most top institutions restrict the number of AP subject areas that are eligible, only offer credit or waivers for very high scores on the exams, or cap the total amount of AP credit that a student can receive (Weinstein 2016). In 2012, Dartmouth College announced that it would no longer grant credit for any AP exam score, a policy shared by several other selective institutions, including Amherst College, Brown University, and the California Institute of Technology (Weinstein 2016). Our results, which generalize to a newly offered AP course, suggest that AP endows students with human capital even if it does not grant them the opportunity to earn credit at their preferred college. For college admissions officers, the findings also suggest that AP course-taking offers a reasonable signal of students' skill and subject-matter interest. Our estimated effects on skill and STEM interest are somewhat limited by insufficient precision, yet they represent the first and most credible evidence to date on the impact of AP on these key outcomes.

Our study is also among the first known to us that quantifies the AP impact on students' grades. We find that students who take an AP science course earn lower grades in science (by 0.29 grade points) and lower grades in their other courses (by 0.18 grade points). The lower grades in science are driven by the lower grade received in the AP class, a negative effect that many secondary and postsecondary institutions offset by upweighting AP grades. The estimates suggest that students' AP science grade would have to be inflated by a factor of 1.46 (e.g., a C would have to be converted to approximately a B+) to remove the net negative on overall grade point average (GPA). While many high schools, including those that participated in our study, weight students' GPAs to adjust for the academic difficulty of the courses, practices vary substantially across institutions (Sadler and Tai 2007; Klopfenstein and Lively 2016). In a recent survey of Texas high schools, for instance, Klopfenstein and Lively (2016) find that most schools with AP courses used weights but that they ranged from 0.5 to 1 point (with a small number assigning more than 1 extra point). Our findings suggest that the current practices at many institutions under-adjust for the grade penalty from AP courses. In addition, attaching weight to AP grades cannot undo the learning loss that may occur when students shift their effort away from non-AP coursework.

We also contribute to other strands of literature on the relationship between students' academic achievement and their perceptions of their own confidence and stress. Prior literature on the relationship between students' confidence in their ability and their true ability is rife with mixed results (Boekaerts and Rozendaal 2010; Stankov and Crawford 1996; Stankov 2013). Psychologists have also documented an inverted U-shaped relationship between perceived pressure and performance, where some amount of stress is necessary to increase achievement, yet too much stress can reduce students' ability to gain knowledge (Anderson 1976; Davis 2014; Yerkes and Dodson 1908). We find that students taking an AP science class experience cognitive gains concurrent with losses in their academic confidence. This finding is consistent with evidence that many U.S. students are highly confident in their skills and that this noncognitive belief often interferes with their ability to learn (Chiu and Klassen 2010; Stankov and Lee 2014). The AP course appears to reduce students' estimation of their own ability, either by changing the standard to which they compare themselves or by making them more aware of the challenges they might face in a college course. Whether these changes in perceived confidence persist and how they influence later outcomes is uncertain. Students with expectation levels that match the real demands of college courses might eventually perform better in those courses. Some students might also use the insights they gain from a challenging AP science class to shift away from difficult science courses in college (or entire majors) that could delay or hinder their college completion. Our results also suggest that AP causes a significant amount of stress for students, but we do not find evidence that the added pressure substantially limits their knowledge gains in science.

II. AP Science and Conceptual Framework

A. AP and Other Rigorous Secondary School Courses

The AP program is an appealing option for high school administrators who seek to offer college-level courses to their students. AP course descriptions and assignments are designed to match those offered in introductory college courses in each subject, and thus to prepare students for the rigor of college coursework. The College Board (the "Board" for brevity) is a not-for-profit organization that administers AP and provides professional development for teachers, reviews of course syllabi, and extensive curricular materials (e.g., sample syllabi, sample lab experiments).3 The Board also offers standardized AP exams in the spring of each year that are graded by external examiners and provide an externally-validated measure of student learning. Most exams include both an essay or problem-solving component and multiple-choice questions, all of which are aligned with the course descriptions. The exam is one of the key features of the AP program and is used by high school and postsecondary educators to evaluate the depth of students' skill independently of teacher bias.

In addition to AP courses, high school students typically have three alternative options for advanced coursework. Most high schools offer "honors" courses, which are intended to provide a more rigorous curriculum than the regular course in the same subject. The content and rigor of honors courses vary across high schools, and there is no standardized honors exam offered to students in these courses. A second option is the International Baccalaureate (IB) program, which was originally designed for students in international schools and aims to develop students' critical thinking skills and their knowledge of international affairs. The IB program is offered worldwide but remains relatively uncommon in the United States, with less than 5 percent of high schools offering IB in 2016 (The IB Programme 2016). A final option is for students to take a course at a nearby college (or online) or, for some, a course that is taught at their high school by an instructor who has been approved as college-level. These "dual enrollment" or "dual credit" courses are meant to provide students with the opportunity to simultaneously earn high school and college credit. In the most recent national survey, high schools reported approximately two million enrollments in dual credit courses (Thomas et al. 2013). There is limited information on the colleges that accept dual enrollment credits. Most courses are offered through collaborations between high schools and local community and public postsecondary institutions, suggesting that credits are generally accepted at these institutions and less often accepted at other institutions. Comparisons of AP science classes to regular and honors level science classes reveal that students receive much more homework and work harder in their AP classes (Sadler et al. 2014). To our knowledge, there have been no comparisons of the workload or effort in AP science courses compared to IB or dual enrollment science courses.


B. Conceptual Framework

There are several channels through which an AP science class is expected to influence students' cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to provide rigorous content and a substantial workload, be taught by teachers who have high expectations, and consist of students who are driven to succeed. These inputs (course rigor, teacher expectations, and peer motivation) are often thought of as the main characteristics that distinguish AP courses from other high school courses.

Yet AP science classes are also intended to offer an inquiry-based approach to science that, when combined with a high level of rigor, provides an additional causal pathway to change. Specifically, a well-implemented AP science course should encourage students to ask questions, gather and interpret data, arrive at explanations grounded in scientific principles, and communicate their observations to one another under the guidance of teachers (College Board 2011a, 2011b).4 This student-led, inquiry-based approach differs from many traditional secondary school science classrooms, where the goal is often for students to memorize content and replicate laboratory experiments that demonstrate the content (National Research Council 2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world practices of science and the skills that form the basis of scientific inquiry by focusing more on big picture concepts and small group experimentation, with students directing the inquiry. The curriculum also encourages teachers to move away from lecture-based pedagogy and multiple-choice quizzes and to increase their use of technology to help students analyze data, draw interpretations, and communicate findings (College Board 2011a, 2011b).

AP science classes are expected to increase students' ability to ask research questions, design experiments, analyze data, and draw conclusions. In the process of gaining these scientific inquiry skills, the new curriculum is intended to spur greater interest in the practice of science because it becomes more enjoyable and more accessible to students for whom rote memorization and execution of prefabricated lab experiments might have diminished enthusiasm in the subject (National Research Council 2012). Science experts posit that inquiry-based science courses will be particularly successful in generating greater interest and skill among women and among students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis, Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler, Samuelson, and Lorah 2014).

While the rigor and expectations of a college course may be appropriate for some students, they can be too demanding for others. Students often report high levels of stress and burnout from taking AP courses, particularly if they perceive that they are not prepared for the challenge of college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact, cause students to lose confidence in their ability to complete college science courses. A number of mechanisms could cause students to lose confidence, including exposure to stronger peers, inability to successfully complete assignments, or simply receiving lower grades than they received in their non-AP courses.5 The AP effect on confidence will likely matter differently for students with different levels of initial confidence. For students who are over-confident in their ability to succeed in college science courses, taking a challenging AP course in high school might cause them to revise their expectations to be more in line with the higher demands of college-level work.

Taking a more strenuous AP course is also likely to affect students’ time allocation. Students’ performance in each class will be determined by their subject-specific ability as well as the amount of time they devote to their coursework versus other activities, including work, extracurriculars, and leisure. If AP courses are more demanding than other courses, students solving a time allocation problem may shift more effort into their AP course, away from other pursuits. The impact of this change in time allocation on students’ performance in AP and other courses will depend upon whether they shift effort away from other courses and on the degree of complementarity between their AP science course and their other courses. Study time devoted to an AP science course could improve student performance in other math and science classes (where the skills, tasks, and knowledge are similar), even if students spend less time on those courses. For courses that require students to perform tasks that are not complementary with AP science (e.g., courses in the humanities), taking AP science concurrently with these courses could decrease student performance in both courses. Of course, students taking an AP course could choose to reduce time spent on alternative (non-academic) activities. If these other activities have no causal impact on performance in school, then the impact on overall achievement could be negligible.

Some students report concerns about their time allocation as they weigh the decision to enroll in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these concerns have increased over time as the courses have become more accessible to students who previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to students with high grades in prerequisite classes, and the courses were only offered in schools with substantial resources. The Board has made efforts to increase access with, for instance, a policy statement that encourages schools to open AP to all students who are “willing to accept the challenge” and remove all barriers that restrict access (College Board 2002).⁶ In a 2008 survey of a nationally representative sample, 65 percent of secondary school teachers reported that their schools encourage as many students as possible to take AP, and 69 percent reported that AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009). These open access policies have led to complaints that students who enroll with less preparation will be less able to engage with the material (and perhaps become more discouraged by the difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg 2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students through negative peer effects or through teachers removing content and slowing the pace of course delivery.

III. AP Science Impact Study

A. Overview

We recruited 23 schools from across the United States and offered monetary compensation to pay for equipment and teacher training and as an incentive to secure participation.⁷ Eligible schools included ones that had not offered AP Biology or AP Chemistry in recent years, were willing to add such a course and comply with study protocol, and had more eligible students than could be served in one class, so as to supply a sufficiently sized control group.⁸ Of the 23 schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both courses. We recruited two waves of schools (those that offered the course for the first time in 2013 and those that offered it for the first time in 2014); both waves were asked to field the course for two years, and the earlier-joining schools had the option of fielding the course for three years. The study includes 47 school-by-cohort groups.

Each participating school identified students that the school deemed eligible to take the new AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students who assented to participate in the study and who obtained consent from their parent or guardian as study participants. Upon receipt of signed consent/assent forms, we randomly offered enrollment in the newly launched course to a subset of participating students.⁹ The study includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students per AP class).

Figure 1 shows the geographic distribution of the 11 participating districts, which are primarily concentrated in the western, southern, and eastern regions of the country.¹⁰ The underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern region has experienced less competition over the years in access to selective postsecondary institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long 2009). Relative to districts across the nation, those participating in the study tend to be in neighborhoods with lower levels of socioeconomic status and to educate students who score below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools tend to be larger than other schools and more likely to educate students who are Black, Hispanic, or eligible for free or reduced-price lunch (Panel A of Table 1).

There are two reasons for this over-representation of larger schools serving less economically prosperous communities. First, AP courses are already offered in the majority of the nation’s public high schools, and schools that serve students from high-income families tend to offer more AP subjects than schools that serve students from lower-income families (Malkus 2016; Theokas and Saaris 2013). Given that our research design only allowed for schools that had not recently offered an AP science course, the population of schools from which we recruited tended to be those in settings with fewer resources. Second, participating schools were required to state that they believed they would have 60 or more students who were qualified to take the AP science course, and this requirement tended to disqualify smaller high schools.

Reflecting the school demographics, participating teachers are slightly younger, less experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our study teachers have five (two) or fewer years of teaching experience, which is more than double (triple) the rate of US high school science teachers. Study teachers are more likely to hold an undergraduate major in a STEM field than other high school science teachers, yet far less likely to hold a master’s degree and slightly less likely to have earned a teaching credential in science. Most of the participating teachers had previously taught a higher-level course (mostly honors), yet only 47 percent of them had previously taught an AP course. Our research consequently applies to a population of teachers who are relatively new to the AP science curriculum and who have generally not received graduate training.¹¹ Assuming AP courses improve with teacher preparation, our results likely capture the effect of a less-than-ideal version of AP and may show less positive treatment effects than when AP is delivered by teachers with more training and experience (Clotfelter, Ladd, and Vigdor 2010).

B. Data and Student Descriptive Statistics

We rely on three primary and secondary data sources for impact estimates. The first is an assessment, developed and validated by the research team, that measures students’ scientific inquiry skills. We administered this assessment to students in both treatment and control groups and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items that rely on science disciplinary knowledge that is taught in middle school, specifically material from Life Sciences and Physical Sciences. The assessment, which we administered to all study participants during a 45-minute period, measures students’ skills in data analysis, scientific explanation, and scientific argument.¹² Participating teachers were not provided copies of the instrument in advance; therefore, teachers were unable to teach any content material prior to test administration.

The second source is a questionnaire that we administered concurrently with the assessment and that asks students a number of questions about their most recent science class and their plans after high school. The assessment and questionnaire were completed together and administered outside of class (henceforth, we refer to these instruments as the “survey”). The third data source is students’ high school transcripts, which contain data on demographic and socioeconomic background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high school completion. We use these data to determine the balance of randomization on pre-treatment covariates, estimate the effect of randomization on course-taking (including compliance), improve the precision of our estimates with statistical controls, and estimate treatment effects on students’ grades.

Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences during the dates scheduled for survey administration and communication lapses between school coordinators and students. Students who were randomly assigned to treatment have a 9-percentage-point higher survey response rate. Given the possibility of nonrandom sample attrition, we weight all regressions by the inverse of the probability of completing the survey, conditional on student characteristics.¹⁴ We implement a variety of robustness checks as additional means to account for nonresponse. These include multiple imputation of missing outcome variables, excluding one high school that had a low response rate, and using the Lee (2009) technique to provide bounds on the estimated effects. These methods and results are discussed below.
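To make the weighting step concrete, the following is a minimal sketch of inverse-probability weighting for survey nonresponse. The response model itself is not described in this section, so the sketch makes a simplifying assumption: response probabilities are estimated as simple response rates within cells of a single student characteristic. The function name and the cell labels are hypothetical.

```python
from collections import defaultdict

def inverse_probability_weights(cells, responded):
    """Weight each respondent by 1 / (response rate in their cell).

    cells:     one label per student (hypothetical, e.g. a prior-score band)
    responded: 1 if the student completed the survey, else 0
    Nonrespondents get weight 0 because they contribute no survey outcome.
    """
    n_students = defaultdict(int)
    n_responded = defaultdict(int)
    for cell, resp in zip(cells, responded):
        n_students[cell] += 1
        n_responded[cell] += int(resp)
    # 1 / (n_responded / n_students) simplifies to n_students / n_responded
    return [
        n_students[cell] / n_responded[cell] if resp else 0.0
        for cell, resp in zip(cells, responded)
    ]

# Illustrative data: half of the "low" cell responds, all of the "high" cell does.
cells = ["low", "low", "low", "low", "high", "high"]
responded = [1, 1, 0, 0, 1, 1]
weights = inverse_probability_weights(cells, responded)
```

Respondents from cells with low response rates are up-weighted so that the weighted respondent sample resembles the full sample on the characteristic that defines the cells. A model-based probability (for example, from a logit on several covariates) would replace the cell rates without changing the rest of the logic.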

We supplement these data with surveys that we administered online to teachers of the new AP courses at the conclusion of the course. The teacher survey includes questions about their educational background, professional experiences, and professional development; past and present instructional practices, generally and around science specifically; participation in the College Board AP training; ability to cover the content of the AP course; and coaching, mentoring, and other professional community supports received from the school, district, and education community.

Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the survey respondents, conditional on school-by-cohort fixed effects.¹⁵ Most of the estimated differences between treatment and control group students on pre-treatment observed characteristics are small, with some notable exceptions. In the full and survey samples, treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher, respectively, than those of control group students, both with p-values below 0.05. The magnitude of the treatment-control difference was slightly lower and less precisely estimated in math, yet it also favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student covariates as predictors of outcomes in the models, and in the robustness checks we exclude these covariates.¹⁷

Table 2 also shows the extent of differences between control group compliers and non-compliers. We find that non-compliers are generally much more academically prepared for AP science: they have higher pre-treatment reading and math test scores and are more likely to have completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian American and female.¹⁸


IV. Empirical Strategy

We estimate the effect of taking the AP science course with a standard instrumental variable specification:

(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$

(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$

where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$; $\widehat{AP}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2); $\mathit{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates (including age; math and reading exam scores from 8th and 10th grade, standardized and averaged for math and reading separately; cumulative GPA prior to the year when the AP science course was offered; and indicator variables for female, racial group (Asian American, Black or Hispanic, Native American or Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price lunch, home language is not English, and took recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school-by-cohort fixed effects.¹⁹ We use two-stage least squares to estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given by $\beta$.
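The estimator in Equations (1) and (2) can be illustrated with a small sketch. It is a simplification under stated assumptions, not the estimation code used in the study: the covariate vector is omitted, no survey weights are applied, and the stratum fixed effects are absorbed by demeaning each variable within stratum. With a single instrument, the two-stage least squares coefficient then equals the reduced-form slope divided by the first-stage slope. All function and variable names are hypothetical.

```python
from collections import defaultdict

def demean_within(values, strata):
    """Subtract each stratum's mean, which absorbs stratum fixed effects."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, s in zip(values, strata):
        totals[s] += v
        counts[s] += 1
    return [v - totals[s] / counts[s] for v, s in zip(values, strata)]

def iv_late(y, took_ap, offered, strata):
    """Just-identified IV estimate of beta with stratum fixed effects.

    Returns (beta, first_stage, reduced_form), where
      first_stage  = slope of took_ap on offered (theta in Equation 2),
      reduced_form = slope of y on offered,
      beta         = reduced_form / first_stage.
    Covariates X are omitted here for brevity (an assumption of this sketch).
    """
    y_t = demean_within(y, strata)
    d_t = demean_within(took_ap, strata)
    z_t = demean_within(offered, strata)
    zz = sum(z * z for z in z_t)
    first_stage = sum(z * d for z, d in zip(z_t, d_t)) / zz
    reduced_form = sum(z * yv for z, yv in zip(z_t, y_t)) / zz
    return reduced_form / first_stage, first_stage, reduced_form

# Illustrative data: two strata, imperfect compliance, true effect of 2.0.
strata = ["a"] * 4 + ["b"] * 4
offered = [1, 1, 0, 0, 1, 1, 0, 0]
took_ap = [1, 0, 0, 0, 1, 1, 1, 0]
base = {"a": 10.0, "b": 20.0}
y = [base[s] + 2.0 * d for s, d in zip(strata, took_ap)]
beta, first_stage, reduced_form = iv_late(y, took_ap, offered, strata)
```

In the illustrative data the offer raises enrollment by 0.5, and the ratio recovers the effect of 2.0 despite never-takers in the offered group and crossovers in the control group.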

The intent-to-treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new AP science course and is a weighted average of effects on those who do and do not choose to enroll in the course.

(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$

For outcomes that are obtained from the survey, we weight regressions by the inverse of the estimated probability of completing the survey.²⁰ The results are similar without using these weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student characteristics as a result of either missing student transcripts or certain data elements not collected by the district, we use multiple imputation by chained equations, creating 10 imputed datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of treatment assignment (school by cohort) in our analysis of main effects. In the analysis of robustness, we report permutation standard errors, robust standard errors (for comparison to permutations), and the statistical significance of the LATE estimates after adjusting our tests of significance for multiple comparisons.
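The text does not state how results from the 10 imputed datasets are combined. A common choice is Rubin's rules, and the sketch below assumes that rule purely for illustration: the pooled estimate is the mean of the per-dataset estimates, and the pooled variance adds the average within-imputation variance to an inflated between-imputation variance. The function name and inputs are hypothetical.

```python
def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules (assumed here).

    estimates: coefficient estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled, total_variance ** 0.5

# Illustrative values for three imputed datasets (the study creates 10).
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

The between-imputation term is what makes the pooled standard error larger than any single dataset's standard error when the imputations disagree.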

V. Results

A. Course-Taking and Treatment Contrast

Table 3 provides estimated effects of the randomized offer of enrollment on AP science course enrollment and share of credits in all courses for the full sample and the survey samples. The first-stage estimates indicate that the offer substantially increased the likelihood of the student taking the AP science course, by 38 percentage points in the full sample and 39 percentage points in the survey sample. As we expected, compliance with randomization was imperfect, with 42 percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team.

The remaining rows in Table 3 shed light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage-point increase in the share of total credits in the full sample). Treatment group students’ share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).²²

Approximately 78 percent of the control group compliers took a science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13 percent), Physics (12 percent), AP Environmental Science (11 percent), Biology (10 percent), Honors Biology (9 percent), and Anatomy/Physiology (9 percent).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our results also suggest that AP course-takers’ classrooms were more likely to use technology (up 0.28 sd, p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.²⁴ This contrast between students’ reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.²⁵

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate that, for the typical complier, taking AP science raises objectively measured scientific inquiry skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent, with, again, more suggestive than definitive results at traditional levels of statistical inference (p-value = 0.16).
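As a rough consistency check on the science skill estimate, the LATE can be related to the intent-to-treat effect through the Wald ratio: with a single instrument, the LATE equals the ITT effect divided by the first-stage effect of the offer on enrollment. The sketch below uses the rounded figures reported in the paper (an ITT of 0.09 sd on science skill and a 39-percentage-point first stage in the survey sample); the function name is hypothetical, and rounding means the ratio need not match the reported LATE exactly.

```python
def wald_ratio(itt, first_stage):
    """LATE implied by an ITT effect and a first-stage take-up effect."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage

# ITT on science skill (sd) and first stage in the survey sample.
implied_late = wald_ratio(itt=0.09, first_stage=0.39)
print(round(implied_late, 2))  # 0.23
```

The implied value agrees with the 0.23 sd LATE reported for scientific inquiry skills.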

Table 5 provides stronger evidence of negative treatment effects on students’ confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology by signing the study assent forms. Taking AP science substantially lowered participants’ likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students’ self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students’ self-reported confidence and stress levels. We find that taking AP science increases students’ likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05) above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group students saw their grades suffer. We estimate that taking AP science reduced students’ grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students’ grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students’ share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students’ overall grades by 0.21 during the year ((−0.29 × 0.24) + (−0.18 × 0.76)).

With our estimates in hand, we can easily compute the adjustment that would leave the student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).²⁷
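The grade arithmetic above can be reproduced with a short sketch that uses only the rounded figures reported in the text. The function names are hypothetical. With rounded inputs the offsetting boost comes out at about 1.5 rather than 1.46; the reported figure presumably reflects unrounded estimates, which is an assumption here rather than something the text states.

```python
def overall_grade_effect(effect_science, effect_other, share_science):
    """Credit-weighted average of the science and non-science grade effects."""
    return effect_science * share_science + effect_other * (1 - share_science)

def offsetting_boost(overall_effect, share_ap_science):
    """Grade boost in AP science courses that would leave overall GPA unchanged."""
    return -overall_effect / share_ap_science

# Reported effects: -0.29 (science), -0.18 (other); science credit share 0.24.
overall = overall_grade_effect(-0.29, -0.18, 0.24)
# Predicted share of classes in any AP science subject: 0.02 + 0.12.
boost = offsetting_boost(round(overall, 2), 0.02 + 0.12)
print(round(overall, 2), round(boost, 2))
```

The first number reproduces the 0.21 decline in overall grades; the second shows the size of the AP science grade weight needed to offset it under these rounded inputs.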

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and Column (4) reports the results of a permutation test, where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.²⁹
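The permutation test described above can be sketched as follows. This is an illustration under simplifying assumptions: the treatment effect is a raw difference in means rather than the regression-adjusted ITT estimate, and pseudo treatment labels are shuffled across the whole sample, since the details of the reassignment scheme are not given in this section. Function names and data are hypothetical.

```python
import random

def diff_in_means(outcome, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcome, treated) if d]
    c = [y for y, d in zip(outcome, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)

def permutation_p_value(outcome, treated, n_perm=1000, seed=0):
    """Share of pseudo assignments whose |effect| exceeds the actual |effect|."""
    rng = random.Random(seed)
    actual = abs(diff_in_means(outcome, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the number of treated units fixed
        if abs(diff_in_means(outcome, labels)) > actual:
            exceed += 1
    return exceed / n_perm

# Illustrative data in which treatment and control outcomes do not overlap.
outcome = [10, 11, 12, 13, 0, 1, 2, 3]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = permutation_p_value(outcome, treated)
```

Because shuffling preserves the number of treated units, both groups are always nonempty. In the illustrative data no reassignment can produce a larger absolute difference than the actual one, so the share is zero.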

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution should be considered in evaluating the effects from outcomes based on the study survey.³¹
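The trimming logic behind the bounds can be sketched in a few lines. The version below is a simplified illustration, not the procedure used for Table 6: it compares raw means without covariates, fixed effects, or weights, and it does not compute the Imbens and Manski (2004) confidence interval. The function name and data are hypothetical.

```python
def lee_bounds(y_treat, y_ctrl, n_treat_assigned, n_ctrl_assigned):
    """Trimming bounds on a difference in means under differential response.

    y_treat, y_ctrl: outcomes of survey respondents in each group
    n_*_assigned:    number of students assigned to each group
    Assumes the treatment group has the higher response rate.
    """
    rate_t = len(y_treat) / n_treat_assigned
    rate_c = len(y_ctrl) / n_ctrl_assigned
    if rate_t < rate_c:
        raise ValueError("sketch assumes treatment responds at the higher rate")
    trim_share = (rate_t - rate_c) / rate_t
    k = int(round(trim_share * len(y_treat)))  # treated respondents to drop
    ordered = sorted(y_treat)

    def mean(values):
        return sum(values) / len(values)

    lower = mean(ordered[: len(ordered) - k]) - mean(y_ctrl)  # drop highest k
    upper = mean(ordered[k:]) - mean(y_ctrl)                  # drop lowest k
    return lower, upper

# Illustrative data: 8 of 10 treated and 6 of 10 control students respond.
lower, upper = lee_bounds(
    y_treat=[1, 2, 3, 4, 5, 6, 7, 8],
    y_ctrl=[3, 3, 3, 3, 3, 3],
    n_treat_assigned=10,
    n_ctrl_assigned=10,
)
```

Here a quarter of the treated respondents are trimmed so that the effective response rates match. Dropping the highest outcomes yields the lower bound and dropping the lowest yields the upper bound.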

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science skill as the outcome. We find that the point estimates at every quantile are not significantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.³²

VI. Conclusion

Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of US high schools already offered AP science classes, and they tended to be in relatively higher-income communities, disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest relative to the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.³³ We also find that AP science classes substantially increase students’ stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult given that few schools document per-course expenditures. One recent analysis of a US district that relied on teacher salaries and course assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation’s high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance on college admissions decisions, as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: US Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery Christopher Oded Gurantz Michael Hurwitz and Jonathan Smith 2018 ldquoShifting

College Majors in Response to Advanced Placement Exam Scoresrdquo Journal of Human

Resources 53 (4) 918ndash956

Benjamini Yoav and Yosef Hochberg 1995 ldquoControlling the False Discovery Rate A Practical

and Powerful Approach to Multiple Testingrdquo Journal of the Royal Statistical Society 57

(1) 289ndash300

Bennett J S Hogarth F Lubben B Campbell and A Robinson 2010 ldquoTalking Science The

Research Evidence on the Use of Small Group Discussions in Science Teachingrdquo

International Journal of Science Education 32 (1) 69ndash95

Berger Joe 2006 ldquoDemoting Advanced Placementrdquo The New York Times October 4

Boekaerts Monique and Jeroen S Rozendaal 2010 ldquoUsing Multiple Calibration Indices in

Order to Capture the Complex Picture of What Affects Studentsrsquo Accuracy of Feeling of

Confidencerdquo Learning and Instruction 20 (5) 372ndash382

Bound John Brad Hershbein and Bridget Terry Long 2009 ldquoPlaying the Admissions Game

Student Reactions to Increasing College Competitionrdquo The Journal of Economic

Perspectives 23 (4) 119ndash146

Bowie Liz 2013 ldquoMaryland Schools have been Leader in Advanced Placement but Results are

Mixedrdquo The Baltimore Sun August 17

Bush George W 2006 ldquoState of the Union Address by the Presidentrdquo Washington DC The

White House

Chiu Ming Ming and Robert M Klassen 2010 ldquoRelations of Mathematics Self-Concept and its

Calibration with Mathematics Achievement Cultural Differences among Fifteen-year-

olds in 34 Countriesrdquo Learning and Instruction 20 (1) 2ndash17

Clotfelter Charles T Helen F Ladd and Jacob L Vigdor 2010 ldquoTeacher Credentials and

Student Achievement in High School Across-Subject Analysis with Student Fixed

Effectsrdquo Journal of Human Resources 45 (3) 655ndash681

College Board 2002 Equity Policy Statement New York NY

__________ 2011a AP Biology Curriculum Framework 2012-2013 New York NY

__________ 2011b AP Chemistry Curriculum Framework 2013-2014 New York NY

__________ 2017a AP Course and Exam Redesign New York NY

__________ 2017b AP Course Audit New York NY

__________ 2018 AP Program Participation and Performance Data 2018 New York NY


Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement. Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times. January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute. Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research & Occasional Paper Series: CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University. Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report. May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and C. F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times. September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute. Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews. May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates.” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.


Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times. April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust. June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11, First Look.” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic. October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute. Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.
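The SES measure in Figure 2 is described as a first principal component score. A minimal sketch of that kind of index, on simulated data and with only three of the six measures, might look like the following. The function names, the simulated inputs, and the use of power iteration are illustrative assumptions; this is not the procedure used by Reardon, Kalogrides, and Shores (2016).

```python
import random
import statistics

def standardize(values):
    """Rescale a list to mean 0 and (population) SD 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def first_pc_scores(columns, iters=200):
    """Standardized scores on the first principal component of the given measures."""
    z = [standardize(c) for c in columns]
    n, k = len(z[0]), len(z)
    # Correlation matrix of the standardized measures.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    w = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    # Sign convention (an assumption): the first measure loads positively.
    if w[0] < 0:
        w = [-v for v in w]
    raw = [sum(w[j] * z[j][i] for j in range(k)) for i in range(n)]
    return standardize(raw)

# Hypothetical districts: one latent SES factor drives three noisy measures.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(200)]
income = [x + 0.3 * rng.gauss(0, 1) for x in latent]
bachelors = [x + 0.3 * rng.gauss(0, 1) for x in latent]
poverty = [-x + 0.3 * rng.gauss(0, 1) for x in latent]
ses = first_pc_scores([income, bachelors, poverty])
```

Because poverty moves opposite to income, it receives a negative loading, so higher scores correspond to higher-SES neighborhoods under the sign convention chosen here.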


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age: Under 30                                       0.407           0.160
Age: 30-49                                          0.432           0.553
Age: 50 or over                                     0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <=2                             0.290           0.085
Years of Experience <=5                             0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd/) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass/). “Others” in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)–(3): Full Sample. Columns (4)–(6): Survey Sample.
(1), (4) Control Group Mean
(2), (5) Difference Between Treated and Controls
(3), (6) Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)–(3): Full Sample. Columns (4)–(6): Survey Respondents.
(1), (4) Control Group Mean
(2), (5) ITT
(3), (6) LATE

Outcome                                     (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment              0.19     0.38              0.24     0.39
                                                     (0.05)                     (0.06)
                                                     [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                0.03     0.04     0.11     0.03     0.04     0.10
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                    0.13     0.04     0.11     0.14     0.04     0.10
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                    0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                        0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                           0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                               0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                     (0.01)   (0.03)            (0.01)   (0.03)
                                                     [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                      1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 4

Treatment Contrast (Composite Variables)

(1) Control Group Complier Mean
(2) ITT
(3) LATE

Outcome                                         (1)       (2)       (3)
Academically Challenging Curriculum             -0.33     0.31      0.80
                                                          (0.10)    (0.24)
                                                          [0.00]    [0.00]
Project-Based Independent Classroom Activities  -0.06     0.13      0.33
                                                          (0.07)    (0.17)
                                                          [0.07]    [0.06]
Integrated Use of Technology                    -0.11     0.11      0.28
                                                          (0.08)    (0.19)
                                                          [0.19]    [0.14]
Number of Observations                          1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
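The composite construction described in the notes to Table 4 can be illustrated with a short sketch. The category labels, the three hypothetical students, and the use of the population standard deviation are assumptions made for illustration; the notes do not specify which standard deviation was used or the exact wording of each item.

```python
import statistics

# Ordered response categories, lowest to highest (illustrative labels).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def rescale(response, categories=LIKERT):
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return categories.index(response) / (len(categories) - 1)

def composite(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    means = [statistics.fmean([rescale(r) for r in items])
             for items in responses_by_student]
    mu = statistics.fmean(means)
    sd = statistics.pstdev(means)  # population SD is an assumption
    return [(m - mu) / sd for m in means]

# Hypothetical responses of three students to two component items.
students = [
    ["agree", "strongly agree"],
    ["neutral", "neutral"],
    ["disagree", "strongly disagree"],
]
scores = composite(students)
```

With five categories, the rescaled values are 0.0, 0.25, 0.5, 0.75, and 1.0, so the middle student here lands exactly at the sample mean and receives a standardized score of zero.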


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1) Control Group Complier Mean
(2) ITT
(3) LATE

Outcome                                         (1)       (2)       (3)
Science Skill                                   -0.19     0.09      0.23
                                                          (0.06)    (0.16)
                                                          [0.15]    [0.14]
STEM Interest                                   0.62      0.04      0.09
                                                          (0.02)    (0.07)
                                                          [0.16]    [0.16]
Confidence in College Science                   0.92      -0.04     -0.10
                                                          (0.02)    (0.05)
                                                          [0.11]    [0.06]
Stress                                          0.12      0.07      0.17
                                                          (0.03)    (0.07)
                                                          [0.02]    [0.01]
Grades in Science Courses                       2.80      -0.12     -0.29
                                                          (0.07)    (0.16)
                                                          [0.08]    [0.07]
Grades in Other Courses                         3.14      -0.07     -0.18
                                                          (0.02)    (0.06)
                                                          [0.00]    [0.00]
Number of Observations                          1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
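As a rough plausibility check on how the ITT and LATE columns relate, one can compare each LATE with the ratio of the ITT to the first-stage effect on AP enrollment from Table 3. The sketch below assumes a simple Wald ratio with no covariates, which is our simplification; the reported LATEs come from the paper's own estimating equations and from unrounded inputs, so the ratio only approximates them.

```python
def wald_late(itt, first_stage):
    """Wald ratio: intent-to-treat effect divided by the first-stage take-up effect."""
    return itt / first_stage

# Science skill: ITT and survey-sample first stage, as read from Tables 5 and 3.
science_skill = wald_late(0.09, 0.39)

# Grades in other courses use the full sample, so the full-sample first stage applies.
other_grades = wald_late(-0.07, 0.38)

print(round(science_skill, 2), round(other_grades, 2))
```

Both ratios round to the LATEs shown in Table 5 for these two outcomes. Other rows differ by a few hundredths, which is what one would expect from rounded inputs and a covariate-adjusted estimator.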

Table 6

Robustness Checks of Main ITT Results

Columns (1)–(12), in the order of the table header:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Values for each outcome are listed in reading order: estimates, then standard errors in parentheses, then p-values in brackets.

Science Skill                   -0.19  0.09  0.10  0.11  0.20  0.07  0.03  0.39  -0.09, 0.51  2.0
                                (0.06) (0.05) (0.00) (0.00) (0.00) (0.00) (0.07) (0.07)
                                [0.15] [0.06] [0.06] [0.20] [0.11] [0.01] [0.24] [0.72] [0.00]

STEM Interest                   0.62  0.04  0.05  0.03  0.03  0.03  0.02  0.12  -0.03, 0.18  1.9
                                (0.02) (0.03) (0.00) (0.00) (0.00) (0.00) (0.03) (0.04)
                                [0.16] [0.19] [0.20] [0.09] [0.29] [0.27] [0.19] [0.60] [0.00]

Confidence in College Science   0.92  -0.04  -0.03  -0.06  -0.06  -0.04  -0.06  0.05  -0.09, 0.10  2.0
                                (0.02) (0.02) (0.00) (0.00) (0.00) (0.00) (0.02) (0.03)
                                [0.11] [0.05] [0.07] [0.37] [0.02] [0.03] [0.10] [0.00] [0.17]

Stress                          0.12  0.07  0.05  0.06  0.08  0.07  0.01  0.11  -0.05, 0.15  1.6
                                (0.03) (0.02) (0.00) (0.00) (0.00) (0.00) (0.03) (0.02)
                                [0.02] [0.00] [0.00] [0.14] [0.07] [0.02] [0.02] [0.79] [0.00]

Grades in Science Courses       2.80  -0.12  -0.06  -0.10  -0.07
                                (0.07) (0.04) (0.00) (0.00) (0.00)
                                [0.08] [0.01] [0.01] [0.31] [0.16] [0.30]

Grades in Other Courses         3.14  -0.07  -0.07  -0.06  -0.03
                                (0.02) (0.03) (0.00) (0.00) (0.00)
                                [0.00] [0.01] [0.01] [0.00] [0.01] [0.38]

For the two grades rows, the remaining columns are marked: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
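The interval described for column (11) can be illustrated with a sketch of the Imbens and Manski (2004) construction, which widens the estimated bounds by a critical value c chosen so that the interval covers the treatment effect itself. The function name and the bisection solver are our illustrative choices. The inputs in the example are the Science Skill Lee bounds and their standard errors as we read them from Table 6; because those inputs are rounded, reproduction of the reported intervals is only approximate for some rows.

```python
from statistics import NormalDist

def imbens_manski_ci(lower, upper, se_lower, se_upper, level=0.95):
    """Confidence interval for the parameter itself, given estimated bounds."""
    phi = NormalDist().cdf
    gap = (upper - lower) / max(se_lower, se_upper)
    # Solve phi(c + gap) - phi(-c) = level for c by bisection.
    lo, hi = 0.0, 10.0
    for _ in range(100):
        c = (lo + hi) / 2
        if phi(c + gap) - phi(-c) < level:
            lo = c
        else:
            hi = c
    c = (lo + hi) / 2
    return lower - c * se_lower, upper + c * se_upper

# Science Skill: Lee bounds 0.03 and 0.39, each with standard error 0.07.
ci_low, ci_high = imbens_manski_ci(0.03, 0.39, 0.07, 0.07)
print(round(ci_low, 2), round(ci_high, 2))
```

When the bounds are far apart relative to their standard errors, c approaches the one-sided value of about 1.645. When the bounds coincide, it returns the familiar two-sided 1.96.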


1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the

Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the

effect of advanced high school courses more generally often without distinctions between AP

and other rigorous course options Nearly all of these nonexperimental studies find large positive

effects of rigorous secondary school courses particularly those in math and science on studentsrsquo

high school postsecondary and labor market performance (eg Altonji 1995 Attewell and

Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long

Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004) 3 The process by which a course is designated AP involves two steps Teachers who plan to offer

an AP course are encouraged (though not required) to attend a professional development

training The Board and other independent agencies offer several workshops with the most

extensive training being the AP summer institute a week-long training that is led by an

experienced AP instructor Teachers are then expected to develop their syllabi for the course and

submit them to the Board for review A team of auditors at the Board review each syllabus and

grant permission to a school to label the course as AP on course catalogs and student transcripts

once the syllabus has been approved Teachers are also able to revise and resubmit syllabi if they

do not meet the requirements upon original submission College Board (2017b) contains a

discussion of the annual ldquoAP Course Auditrdquo This review of syllabi is the only mechanism for

assessment (ie course delivery and student performance are not assessed by the Board) In

order to effectively run an AP Biology or Chemistry course teachers require access to a well-

equipped classroom and laboratory including all supplies necessary to engage in

experimentation (eg beakers solutions microscopes measuring equipment) Most of the

teachers in our study reported that their classrooms were well-stocked with supplies 4 In 2012 the Board began redesigning several AP courses and exams to prioritize depth of

learning over breadth of coverage and to place ldquogreater emphasis on discipline-specific inquiry

reasoning and communication skillsrdquo (College Board 2017a) The revisions to science courses

were based upon recommendations from the National Science Foundation the National Research

Council and science educators across the United States (National Research Council 2002) 5 Students perception of their own confidence and other personality traits are inherently

influenced by their frames of reference in ways that other assessments of these traits (eg

external observations) may be less influenced By increasing the standard to which they compare

themselves studentsrsquo confidence may decrease This feature of most self-assessments could be

considered a measurement error that biases treatment effects (Dobbie and Fryer Jr 2015 West et

al 2016) Whether this is considered a bias or a true treatment effect on a meaningful outcome

depends to some extent on how these changes in perceived ability influence other behaviors

such as effort and academic motivation 6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and

Biology I and Chemistry I for AP Biology with no additional requirements beyond these

prerequisites 7 We offered each participating school $10000 to pay for teacher attendance at a one-week

training course classroom supplies (eg lab materials textbooks) and to compensate schools

for the staff time required for study administration efforts We also offered $1000 compensation

for an individual selected by the school to serve as a liaison between the study team and the

31

school to assist in collecting consentassent forms and data 8 In our original research design we planned to recruit 40 schools with up to two cohorts of

students which would have powered the study to detect effect sizes smaller than those detected

here We faced several challenges in recruiting schools to participate even with the monetary

incentives Some schools were uncomfortable with randomization across classrooms while

others were simply unable to commit to offering a new course the following school year 9 Most of the assignments were made in one batch in the spring of the year prior to when the

course would be offered We also made some assignments on a rolling basis as additional

consentassent forms were submitted We have no information on the students who were deemed

eligible by the school to take the new AP science course but who did not sign the consent form

to participate As these students did not participate we do not have permission to obtain

information on their characteristics (eg via transcripts) and for most schools we do not know

the number of such students 10 Participating districts include Anaheim Union High School District California East Side

Union High School District California Lynwood Unified School District California Jefferson

Parish Louisiana Education Achievement Authority Michigan Charlotte-Mecklenburg

Schools North Carolina Winston-SalemForsyth Schools North Carolina Cranston Public

Schools Rhode Island El Paso Independent School District Texas Metropolitan Nashville

Public Schools Tennessee and Richmond Public Schools Virginia 11 Though the study teachers are less likely to hold a masterrsquos degree many of the graduate

degrees held by teachers nationally are likely to be in education (not STEM) Thus the study

teachers are less likely to have a graduate degree but not necessarily less likely to have STEM

training We also did not survey teachers regarding their Teach for America (TFA) experience

but it is possible that the relatively high share of STEM undergraduate degrees could be driven

by higher representation of TFA teachers 12 We developed the instrument over a two-year period and pilot tested it three times (the last

pilot test included 140 students) prior to administering the tool to study participants Reliability

metrics are high (inter-item reliability = 099 person reliability= 071) Further documentation of

the development of the assessment instrument in the survey can be found in Seeratan et al

(2017) 13 Each year in the spring semester our team administered and collected the participant surveys

during the school day in classrooms set aside for survey administration 14 Unweighted results which are similar are contained in the Online Appendix tables However

if study participants who did not take the survey differ in unobserved ways then our reweighting

based on observed characteristics will not eliminate nonignorable nonresponse bias 15 Online Appendix Table 1 presents the balance between treatment and control group membersrsquo

characteristics before imputation of missing values (as described below) these results are very

similar to those shown in Table 2 16 Given the high degree of correlation between studentsrsquo 8th and 10th grade scores (and the fact

that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of

enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues, but found it infeasible to obtain the necessary test score information prior to randomization given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by

Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly

offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a

probit regression: $\Pr(CompletedSurvey_{ij} = 1) = \Phi(\mu_j + \mathbf{X}_i\rho + \epsilon_{ij})$, where $CompletedSurvey_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $\mathbf{X}_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + \mathbf{X}_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset, yet only estimate regressions on the sample for which we

observe each outcome variable. This follows a multiple imputation then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects

that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple

outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes

(including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.


standard to which they compare themselves or by making them more aware of the challenges they might face in a college course. Whether these changes in perceived confidence persist and how they influence later outcomes is uncertain. Students with expectation levels that match the real demands of college courses might eventually perform better in those courses. Some students might also use the insights they gain from a challenging AP science class to shift away from difficult science courses in college (or entire majors) that could delay or hinder their college completion. Our results also suggest that AP causes a significant amount of stress for students, but we do not find evidence that the added pressure substantially limits their knowledge gains in science.

II. AP Science and Conceptual Framework

A. AP and Other Rigorous Secondary School Courses

The AP program is an appealing option for high school administrators who seek to offer college-level courses to their students. AP course descriptions and assignments are designed to match those offered in introductory college courses in each subject, and thus to prepare students for the rigor of college coursework. The College Board (the “Board” for brevity) is a not-for-profit organization that administers AP and provides professional development for teachers, reviews of course syllabi, and extensive curricular materials (e.g., sample syllabi, sample lab experiments).³ The Board also offers standardized AP exams in the spring of each year that are graded by external examiners and provide an externally validated measure of student learning. Most exams include both an essay or problem-solving component and multiple-choice questions, all of which are aligned with the course descriptions. The exam is one of the key features of the AP program and is used by high school and postsecondary educators to evaluate the depth of students’ skill independently of teacher bias.

In addition to AP courses, high school students typically have three alternative options for advanced coursework. Most high schools offer “honors” courses, which are intended to provide a more rigorous curriculum than the regular course in the same subject. The content and rigor of honors courses vary across high schools, and there is no standardized honors exam offered to students in these courses. A second option is the International Baccalaureate (IB) program, which was originally designed for students in international schools and aims to develop students’ critical thinking skills and their knowledge of international affairs. The IB program is offered worldwide but remains relatively uncommon in the United States, with fewer than 5 percent of high schools offering IB in 2016 (The IB Programme 2016). A final option is for students to take a course at a nearby college (or online) or, for some, a course that is taught at their high school by an instructor who has been approved as college-level. These “dual enrollment” or “dual credit” courses are meant to provide students with the opportunity to simultaneously earn high school and college credit. In the most recent national survey, high schools reported approximately two million enrollments in dual credit courses (Thomas et al. 2013). There is limited information on the colleges that accept dual enrollment credits. Most courses are offered through collaborations between high schools and local community and public postsecondary institutions, suggesting that credits are generally accepted at these institutions and less often accepted at other institutions. Comparisons of AP science classes to regular and honors level science classes reveal that students receive much more homework and work harder in their AP classes (Sadler et al. 2014). To our knowledge, there have been no comparisons of the workload or effort in AP science courses with those in IB or dual enrollment science courses.

B. Conceptual Framework

There are several channels through which an AP science class is expected to influence students’ cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to provide rigorous content and a substantial workload, be taught by teachers who have high expectations, and consist of students who are driven to succeed. These inputs (course rigor, teacher expectations, and peer motivation) are often thought of as the main characteristics that distinguish AP courses from other high school courses.

Yet AP science classes are also intended to offer an inquiry-based approach to science that, when combined with a high level of rigor, provides an additional causal pathway to change. Specifically, a well-implemented AP science course should encourage students to ask questions, gather and interpret data, arrive at explanations grounded in scientific principles, and communicate their observations to one another under the guidance of teachers (College Board 2011a, 2011b).⁴ This student-led, inquiry-based approach differs from many traditional secondary school science classrooms, where the goal is often for students to memorize content and replicate laboratory experiments that demonstrate the content (National Research Council 2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world practices of science and the skills that form the basis of scientific inquiry by focusing more on big picture concepts and small group experimentation, with students directing the inquiry. The curriculum also encourages teachers to move away from lecture-based pedagogy and multiple-choice quizzes, and to increase their use of technology to help students analyze data, draw interpretations, and communicate findings (College Board 2011a, 2011b).

AP science classes are expected to increase students’ ability to ask research questions, design experiments, analyze data, and draw conclusions. In the process of gaining these scientific inquiry skills, the new curriculum is intended to spur greater interest in the practice of science because it becomes more enjoyable and more accessible to students for whom rote memorization and execution of prefabricated lab experiments might have diminished enthusiasm for the subject (National Research Council 2012). Science experts posit that inquiry-based science courses will be particularly successful in generating greater interest and skill among women and among students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis, Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler, Samuelson, and Lorah 2014).

While the rigor and expectations of a college course may be appropriate for some students, they can be too demanding for others. Students often report high levels of stress and burnout from taking AP courses, particularly if they perceive that they are not prepared for the challenge of college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact, cause students to lose confidence in their ability to complete college science courses. A number of mechanisms could cause students to lose confidence, including exposure to stronger peers, inability to successfully complete assignments, or simply receiving lower grades than they received in their non-AP courses.⁵ The AP effect on confidence will likely matter differently for students with different levels of initial confidence. For students who are over-confident in their ability to succeed in college science courses, taking a challenging AP course in high school might cause them to revise their expectations to be more in line with the higher demands of college-level work.

Taking a more strenuous AP course is also likely to affect students’ time allocation. Students’ performance in each class will be determined by their subject-specific ability as well as the amount of time they devote to their coursework versus other activities, including work, extracurriculars, and leisure. If AP courses are more demanding than other courses, students solving a time allocation problem may shift more effort into their AP course, away from other pursuits. The impact of this change in time allocation on students’ performance in AP and other courses will depend upon whether they shift effort away from other courses and on the degree of complementarity between their AP science course and their other courses. Study time devoted to an AP science course could improve student performance in other math and science classes (where the skills, tasks, and knowledge are similar) even if students spend less time on those courses. For courses that require students to perform tasks that are not complementary with AP science (e.g., courses in the humanities), taking AP science concurrently with these courses could decrease student performance in both courses. Of course, students taking an AP course could choose to reduce time spent on alternative (non-academic) activities. If these other activities have no causal impact on performance in school, then the impact on overall achievement could be negligible.

Some students report concerns about their time allocation as they weigh the decision to enroll in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these concerns have increased over time as the courses have become more accessible to students who previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to students with high grades in prerequisite classes, and the courses were only offered in schools with substantial resources. The Board has made efforts to increase access with, for instance, a policy statement that encourages schools to open AP to all students who are “willing to accept the challenge” and remove all barriers that restrict access (College Board 2002).⁶ In a 2008 survey of a nationally representative sample, 65 percent of secondary school teachers reported that their schools encourage as many students as possible to take AP, and 69 percent reported that AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009). These open access policies have led to complaints that students who enroll with less preparation will be less able to engage in the material (and perhaps become more discouraged by the difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg 2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students through negative peer effects or through teachers removing content and slowing the pace of course delivery.

III. AP Science Impact Study

A. Overview

We recruited 23 schools from across the United States and offered monetary compensation to pay for equipment and teacher training and as an incentive to secure participation.⁷ Eligible schools included ones that had not offered AP Biology or AP Chemistry in recent years, were willing to add such a course and comply with study protocol, and had more eligible students than could be served in one class so as to supply a sufficiently sized control group.⁸ Of the 23 schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both courses. We recruited two waves of schools (those that offered the course for the first time in 2013 and those that offered it for the first time in 2014); both waves were asked to field the course for two years, and the earlier-joining schools had the option of fielding the course for three years. The study includes 47 school-by-cohort groups.

Each participating school identified students that the school deemed eligible to take the new AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students who assented to participate in the study and who obtained consent from their parent or guardian as study participants. Upon receipt of signed consent/assent forms, we randomly offered enrollment in the newly launched course to a subset of participating students.⁹ The study includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students per AP class).

Figure 1 shows the geographic distribution of the 11 participating districts, which are primarily concentrated in the western, southern, and eastern regions of the country.¹⁰ The underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern region has experienced less competition over the years in access to selective postsecondary institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long 2009). Relative to districts across the nation, those participating in the study tend to be in neighborhoods with lower levels of socioeconomic status and to educate students who score below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools tend to be larger and more likely than other schools to educate students who are eligible for free or reduced-price lunch, Black, and Hispanic (Panel A of Table 1).

There are two reasons for this over-representation of larger schools serving less economically prosperous communities. First, AP courses are already offered in the majority of the nation’s public high schools, and schools that serve students from high-income families tend to offer more AP subjects than schools that serve students from lower-income families (Malkus 2016; Theokas and Saaris 2013). Given that our research design only allowed for schools that had not recently offered an AP science course, the population of schools from which we recruited tended to be those in settings with fewer resources. Second, participating schools were required to state that they believed they would have 60 or more students who were qualified to take the AP science course, and this requirement tended to disqualify smaller high schools.

Reflecting the school demographics, participating teachers are slightly younger, less experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity than U.S. high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our study teachers have less than or equal to five (two) years of teaching experience, which is more than double (triple) the rate of U.S. high school science teachers. Study teachers are more likely to hold an undergraduate major in a STEM field than other high school science teachers, yet far less likely to hold a master’s degree and slightly less likely to have earned a teaching credential in science. Most of the participating teachers had previously taught a higher-level course (mostly honors), yet only 47 percent of them had previously taught an AP course. Our research consequently applies to a population of teachers who are relatively new to the AP science curriculum and who have generally not received graduate training.¹¹ Assuming AP courses improve with teacher preparation, our results likely capture the effect of a less-than-ideal version of AP and may result in less positive treatment effects than when AP is delivered by teachers with more training and experience (Clotfelter, Ladd, and Vigdor 2010).

B. Data and Student Descriptive Statistics

We rely on three primary and secondary data sources for impact estimates. The first is an assessment, developed and validated by the research team, that measures students’ scientific inquiry skills. We administered this assessment to students in both treatment and control groups and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items that rely on science disciplinary knowledge that is taught in middle school, specifically material from Life Sciences and Physical Sciences. The assessment, which we administered to all study participants during a 45-minute period, measures students’ skills in data analysis, scientific explanation, and scientific argument.¹² Participating teachers were not provided copies of the instrument in advance; therefore, teachers were unable to teach any content material prior to test administration.

The second source is a questionnaire that we administered concurrently with the assessment and that asks students a number of questions about their most recent science class and their plans after high school. The assessment and questionnaire were completed together and administered outside of class (henceforth, we refer to these instruments as the “survey”). The third data source is students’ high school transcripts, which contain data on demographic and socioeconomic background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high school completion. We use these data to determine the balance of randomization on pre-treatment covariates, estimate the effect of randomization on course-taking (including compliance), improve the precision of our estimates with statistical controls, and estimate treatment effects on students’ grades.

Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences during the dates scheduled for survey administration and communication lapses between school coordinators and students. Students who were randomly assigned to treatment have a 9 percentage point higher survey response rate. Given the possibility of nonrandom sample attrition, we weight all regressions by the inverse of the probability of completing the survey conditional on student characteristics.¹⁴ We implement a variety of robustness checks as additional means to account for nonresponse. These include multiple imputation of missing outcome variables, excluding one high school that had a low response rate, and using the Lee (2009) technique to provide bounds on the estimated effects. These methods and results are discussed below.
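The inverse-probability weighting described in this paragraph can be illustrated with a minimal sketch. It is not the study's code: the completion probabilities below are hypothetical values standing in for predictions from a fitted response model, and the function names are introduced here only for illustration.

```python
import numpy as np

def inverse_probability_weights(p_complete):
    """Return 1 / p for each respondent's estimated completion probability."""
    p = np.asarray(p_complete, dtype=float)
    if np.any((p <= 0) | (p > 1)):
        raise ValueError("probabilities must lie in (0, 1]")
    return 1.0 / p

def weighted_mean(values, weights):
    """Weighted average of an outcome among survey respondents."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * values) / np.sum(weights))

# Hypothetical respondents: estimated completion probability and an outcome.
p_hat = [0.9, 0.8, 0.5, 0.25]
outcome = [1.0, 2.0, 3.0, 4.0]

w = inverse_probability_weights(p_hat)
ipw_mean = weighted_mean(outcome, w)   # up-weights unlikely respondents
raw_mean = float(np.mean(outcome))     # unweighted comparison
```

Respondents who resemble nonrespondents (low estimated completion probability) receive larger weights, so in this toy example the weighted mean is pulled toward their outcomes. As the notes caution, such reweighting cannot correct for differences on unobserved characteristics.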

We supplement these data with surveys that we administered online to teachers of the new AP courses at the conclusion of the course. The teacher survey includes questions about their educational background, professional experiences, and professional development; past and present instructional practices, generally and around science specifically; participation in the College Board AP training; ability to cover the content of the AP course; and coaching, mentoring, and other professional community supports received from the school district and education community.

Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the survey respondents, conditional on school by cohort fixed effects.¹⁵ Most of the estimated differences between treatment and control group students on pre-treatment observed characteristics are small, with some notable exceptions. In the full and survey samples, treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher, respectively, than those of control group students, both at p-values below 0.05. The magnitude of the treatment-control difference was slightly lower and less precisely estimated in math, yet also favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student covariates as predictors of outcomes in the models, and in the robustness checks we exclude these covariates.¹⁷

Table 2 also shows the extent of differences between control group compliers and non-compliers. We find that non-compliers are generally much more academically prepared for AP science: they have higher pre-treatment reading and math test scores and are more likely to have completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian American and female.¹⁸

IV. Empirical Strategy

We estimate the effect of taking the AP science course with a standard instrumental variable specification:

(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$

(2) $AP_{ij} = \delta_j + Offered_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$

where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$; $\widehat{AP}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2); $Offered_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates (including age; math and reading exam scores from 8th and 10th grade (standardized and averaged for math and reading separately); cumulative GPA prior to the year when the AP science course was offered; and indicator variables for female, racial group (Asian American, Black or Hispanic, Native American or Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price lunch, home language is not English, and took recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.¹⁹ We use two-stage least squares to estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given by $\beta$.
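As a rough illustration of the two-stage logic, the sketch below simulates a randomized offer with imperfect compliance and checks that two-stage least squares with a single binary instrument reproduces the ratio of the reduced-form difference to the first-stage difference. It is a simplified stand-in, not the authors' specification: it omits the covariates, school by cohort fixed effects, survey weights, and clustered standard errors, and the data, the assumed effect of 0.2, and the function names are all hypothetical. The take-up rates are set to the shares reported in the results section (58 percent of offered students and 19 percent of control students enrolling) purely for flavor.

```python
import numpy as np

def wald_estimates(y, enrolled, offered):
    """Return (ITT, first stage, LATE) from simple differences in means."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(enrolled, dtype=float)
    z = np.asarray(offered, dtype=int)
    itt = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return itt, first_stage, itt / first_stage

def two_stage_least_squares(y, enrolled, offered):
    """2SLS slope on enrollment, instrumenting with the randomized offer."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(enrolled, dtype=float)
    z = np.asarray(offered, dtype=float)
    instruments = np.column_stack([np.ones_like(z), z])
    # Stage 1: regress enrollment on the offer and keep the fitted values.
    stage1, *_ = np.linalg.lstsq(instruments, d, rcond=None)
    d_hat = instruments @ stage1
    # Stage 2: regress the outcome on the fitted enrollment.
    regressors = np.column_stack([np.ones_like(d_hat), d_hat])
    stage2, *_ = np.linalg.lstsq(regressors, y, rcond=None)
    return float(stage2[1])

# Simulated (hypothetical) data with imperfect compliance.
rng = np.random.default_rng(0)
n = 2000
offered = rng.integers(0, 2, size=n)
take_up = np.where(offered == 1, 0.58, 0.19)
enrolled = (rng.random(n) < take_up).astype(float)
y = 0.2 * enrolled + rng.normal(size=n)  # assumed effect of 0.2, illustration only

itt, first_stage, late = wald_estimates(y, enrolled, offered)
tsls = two_stage_least_squares(y, enrolled, offered)
```

With one binary instrument and no covariates, `late` and `tsls` coincide algebraically; adding covariates and fixed effects, as in Equations (1) and (2), changes the estimates but not the underlying logic.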

The intent to treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $Offered_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on $Offered_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new AP science course and is a weighted average of effects on those who do and do not choose to enroll in the course.

(3) $Y_{ij} = \zeta_j + Offered_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$

For outcomes that are obtained from the survey, we weight regressions by the inverse of the estimated probability of completing the survey.²⁰ The results are similar without using these weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student characteristics as a result of either missing student transcripts or certain data elements not collected by the district, we use multiple imputation by chained equations, creating 10 imputed datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of treatment assignment (school by cohort) in our analysis of main effects. In the analysis of robustness, we report permutation standard errors, robust standard errors (for comparison to permutations), and the statistical significance of the LATE estimates after adjusting our tests of significance for multiple comparisons.
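The multiple-comparison adjustment mentioned here is, per the notes, the Benjamini-Hochberg procedure. A minimal sketch of that step-up rule follows; the p-values are hypothetical placeholders rather than the study's estimates, and the function name is introduced only for illustration.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking hypotheses rejected by the BH step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * rank / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = int(np.max(np.nonzero(passed)[0]))     # largest rank meeting its threshold
        reject[order[: k + 1]] = True              # reject all hypotheses up to that rank
    return reject

# Hypothetical p-values for six outcomes (not taken from the paper).
example_p = [0.001, 0.012, 0.02, 0.30, 0.45, 0.80]
reject = benjamini_hochberg(example_p, alpha=0.05)
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets its (looser) threshold, as with the pair `[0.03, 0.04]` at a level of 0.05.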

V. Results

A. Course-Taking and Treatment Contrast

Table 3 provides estimated effects of the randomized offer of enrollment on AP science course enrollment and share of credits in all courses for the full sample and the survey samples. The first-stage estimates indicate that the offer substantially increased the likelihood of the student taking the AP science course, by 38 percentage points in the full sample and 39 percentage points in the survey sample. As we expected, compliance with randomization was imperfect, with 42 percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team.

The remaining rows in Table 3 shed light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage point increase in the share of total credits in the full sample). Treatment group students’ share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).²²

Approximately 78 percent of the control group compliers took any science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%), and Anatomy/Physiology (9%).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our results also suggest that AP course-takers’ classrooms were more likely to use technology (up 0.28 sd, p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that, while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.²⁴ This contrast between students’ reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.²⁵

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate that, for the typical complier, taking AP science raises objectively measured scientific inquiry skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent, with, again, more suggestive than definitive results at traditional levels of statistical inference (p-value = 0.16).
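As a rough check on how a complier effect of this size relates to the intent-to-treat (ITT) effect, the simplest instrumental-variables estimator (the Wald ratio) divides the ITT effect by the first-stage effect of the offer on enrollment. The sketch below applies it to the rounded figures given in the text: a 0.09 sd ITT effect on science skill and a 39-percentage-point first stage in the survey sample. The function name is illustrative, and the reported estimates also condition on covariates and fixed effects, so this is an approximation rather than the authors' procedure.

```python
def wald_late(itt_effect, first_stage):
    """Wald ratio: ITT effect divided by the first-stage effect on take-up.

    Assumption: no covariates, weights, or clustering; a back-of-the-envelope
    version of the two-stage least squares estimate.
    """
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage


# Rounded inputs taken from the text.
late_skill = wald_late(itt_effect=0.09, first_stage=0.39)
print(round(late_skill, 2))  # 0.23
```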

Table 5 provides stronger evidence of negative treatment effects on students’ confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology by signing the study assent forms. Taking AP science substantially lowered participants’ likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students’ self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students’ self-reported confidence and stress levels. We find that taking AP science increases students’ likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05), above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group students’ grades suffered. We estimate that taking AP science reduced students’ grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students’ grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students’ share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 × 0.76)).
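The credit-weighted average can be reproduced directly from the rounded figures in the paragraph above. The snippet is a minimal sketch; the variable names are illustrative and the inputs are the rounded values from the text rather than the underlying estimates.

```python
# Rounded inputs from the text (assumption: unrounded estimates may differ slightly).
science_share = 0.24    # share of study-year credits in science
effect_science = -0.29  # estimated effect on science grades
effect_other = -0.18    # estimated effect on non-science grades

# Credit-weighted average of the two grade effects.
overall_effect = (science_share * effect_science
                  + (1 - science_share) * effect_other)
print(round(overall_effect, 2))  # -0.21
```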

With our estimates in hand, we can easily compute the adjustment that would leave the student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).²⁷
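The logic of this adjustment is that a boost applied only to AP science grades raises the overall GPA by the boost times the AP science share of classes, so the offsetting boost is the overall GPA decline divided by that share. The sketch below uses the rounded figures from the text and a hypothetical function name; with rounded inputs the ratio comes out at about 1.5, so the reported 1.46 presumably reflects unrounded estimates.

```python
def offsetting_boost(gpa_decline, ap_science_share):
    """Grade boost in AP science courses that offsets a given GPA decline.

    Assumption: the boost applies only to AP science grades, which make up
    `ap_science_share` of the student's classes.
    """
    if ap_science_share <= 0:
        raise ValueError("AP science share must be positive")
    return gpa_decline / ap_science_share


boost = offsetting_boost(gpa_decline=0.21, ap_science_share=0.14)
print(round(boost, 2))  # 1.5
```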

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and Column (4) reports the results of a permutation test where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.²⁹
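The permutation test described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it uses a simple difference in means in place of the regression-adjusted ITT estimate, and it assumes that the pseudo treatment is reshuffled within randomization blocks (school by cohort), which the text does not spell out. All function and variable names are hypothetical.

```python
import random


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, blocks, n_perm=1000, seed=0):
    """Share of permutations whose |pseudo effect| exceeds the |observed effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))

    # Group unit indices by randomization block (assumed: school by cohort).
    by_block = {}
    for i, b in enumerate(blocks):
        by_block.setdefault(b, []).append(i)

    exceed = 0
    for _ in range(n_perm):
        pseudo = list(treated)
        # Reshuffle treatment labels within each block, keeping group sizes fixed.
        for idx in by_block.values():
            labels = [treated[i] for i in idx]
            rng.shuffle(labels)
            for i, lab in zip(idx, labels):
                pseudo[i] = lab
        if abs(diff_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_perm
```

Shuffling within blocks keeps the number of pseudo-treated students in each block equal to the actual number, which mirrors a design in which assignment happens separately within each school and cohort.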

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply-imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all of the Xᵢ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution should be considered in evaluating the effects from outcomes based on the study survey.³¹
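The trimming step behind these bounds can be illustrated with a short sketch. It assumes, as in the paragraph above, that the treatment group has the higher survey response rate, and it compares simple means; it leaves out covariates, survey weights, and the Imbens and Manski (2004) confidence interval. The function and argument names are hypothetical.

```python
from statistics import mean


def lee_bounds(treated_y, control_y, n_treated_assigned, n_control_assigned):
    """Return (lower, upper) trimming bounds on the treated-control mean difference.

    treated_y and control_y hold outcomes for survey respondents only; the
    n_* arguments are the numbers of students originally assigned to each arm.
    """
    rate_t = len(treated_y) / n_treated_assigned
    rate_c = len(control_y) / n_control_assigned
    if rate_t < rate_c:
        raise ValueError("sketch only handles excess response in the treatment arm")

    # Share of treated respondents to trim so that response rates match.
    trim_share = (rate_t - rate_c) / rate_t
    k = round(trim_share * len(treated_y))

    ordered = sorted(treated_y)
    control_mean = mean(control_y)
    lower = mean(ordered[:len(ordered) - k]) - control_mean  # drop the k highest values
    upper = mean(ordered[k:]) - control_mean                 # drop the k lowest values
    return lower, upper
```

The width of the resulting interval grows with the gap in response rates, which is one reason bounds of this kind tend to be much less precise than the point estimates they accompany.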

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression, conditional on Xᵢ, with science skill as the outcome. We find that the point estimates at every quantile are not significantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.³²

VI. Conclusion

Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools already offered AP science classes, and they tended to be in relatively higher-income communities disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest above the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.³³ We also find that AP science classes substantially increase students’ stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult given that few schools document per-course expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation’s high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance, on college admissions decisions as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.


Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education Research & Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University. Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates.” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.


Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look.” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F. O. Gabrieli, and John D. E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                      Participating   Others
Average Enrollment                                    1,409           723
Free or Reduced-Price Lunch                           0.700           0.438
Asian                                                 0.055           0.050
Black                                                 0.349           0.154
Hispanic                                              0.410           0.221
White                                                 0.164           0.537
Adjusted Cohort Graduation Rate                       0.843           0.802
District Instruction Expenditures Per Pupil           $6,561          $5,636
District Student Services Expenditures Per Pupil      $3,787          $3,385

Panel B: Teachers                                     Participating   Others
Age: Under 30                                         0.407           0.160
Age: 30-49                                            0.432           0.553
Age: 50 or over                                       0.161           0.287
Female                                                0.630           0.536
Hispanic or Latino                                    0.111           0.051
Race: American Indian or Alaska Native                0.000           0.009
Race: Asian American                                  0.111           0.041
Race: Black                                           0.111           0.060
Race: Native Hawaiian or other Pacific Islander       0.000           0.004
Race: White                                           0.778           0.896
Years of Experience                                   10.3            13.2
Years of Experience <= 2                              0.290           0.085
Years of Experience <= 5                              0.481           0.234
Hold a Teaching Certificate                           0.926           0.945
Undergraduate Major in STEM                           0.944           0.747
Single Subject Credential in Science                  0.630           0.823
Master’s Degree or Higher                             0.356           0.615
Previously Taught AP Course                           0.469           NA
Previously Taught AP, IB, or Honors Course            0.796           NA
Number of Professional Development Trainings          3.09            NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd, and EDFacts, https://www2.ed.gov/about/inits/ed/edfacts/index.html. “Others” in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey, https://nces.ed.gov/surveys/sass. “Others” in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

                                            Full Sample                                  Survey Sample
                                            (1)            (2)            (3)            (4)            (5)            (6)
                                            Control        Difference     Difference     Control        Difference     Difference
                                            Group          Between        Between        Group          Between        Between
                                            Mean           Treated and    Control Group  Mean           Treated and    Control Group
                                                           Controls       Non-Compliers                 Controls       Non-Compliers
Pre-Treatment Characteristic                                              and Compliers                                and Compliers

Age as of October of 11th Grade             16.6           -0.03          -0.07          16.6           -0.01          -0.01
                                                           (0.02)         (0.07)                        (0.03)         (0.09)
                                                           [0.19]         [0.35]                        [0.65]         [0.94]
Math Exam Score                             0.38           0.08           0.25           0.44           0.07           0.30
                                                           (0.04)         (0.10)                        (0.05)         (0.16)
                                                           [0.08]         [0.02]                        [0.17]         [0.06]
Reading Exam Score                          0.29           0.10           0.18           0.36           0.09           0.17
                                                           (0.03)         (0.12)                        (0.04)         (0.17)
                                                           [0.00]         [0.14]                        [0.02]         [0.31]
HS Grade Point Average                      3.16           0.05           0.20           3.23           0.06           0.13
                                                           (0.03)         (0.08)                        (0.03)         (0.10)
                                                           [0.14]         [0.02]                        [0.06]         [0.20]
Female                                      0.59           0.00           0.10           0.61           -0.01          0.11
                                                           (0.03)         (0.06)                        (0.04)         (0.07)
                                                           [0.99]         [0.10]                        [0.73]         [0.12]
Asian American                              0.12           0.02           0.10           0.12           0.03           0.10
                                                           (0.02)         (0.05)                        (0.01)         (0.07)
                                                           [0.27]         [0.06]                        [0.07]         [0.12]
Black                                       0.32           -0.02          -0.06          0.27           0.00           -0.05
                                                           (0.02)         (0.06)                        (0.02)         (0.05)
                                                           [0.29]         [0.28]                        [0.88]         [0.40]
Hispanic, Native American, or Multiracial   0.31           0.01           0.05           0.33           0.01           0.05
                                                           (0.02)         (0.06)                        (0.02)         (0.07)
                                                           [0.55]         [0.41]                        [0.81]         [0.51]
Disabled                                    0.02           0.00           -0.01          0.01           0.00           -0.01
                                                           (0.01)         (0.01)                        (0.01)         (0.01)
                                                           [0.93]         [0.24]                        [0.57]         [0.5]
Gifted                                      0.13           0.03           0.00           0.14           0.02           0.01
                                                           (0.02)         (0.05)                        (0.02)         (0.09)
                                                           [0.06]         [1.00]                        [0.25]         [0.89]
English Language Learner                    0.05           0.01           0.02           0.04           0.01           0.04
                                                           (0.01)         (0.02)                        (0.01)         (0.03)
                                                           [0.41]         [0.39]                        [0.54]         [0.22]
Eligible for Free or Reduced-Price Lunch    0.53           0.01           0.02           0.51           0.01           0.07
                                                           (0.02)         (0.07)                        (0.03)         (0.09)
                                                           [0.66]         [0.77]                        [0.72]         [0.45]
Language Other than English Spoken at Home  0.34           0.02           0.03           0.35           0.01           0.04
                                                           (0.02)         (0.07)                        (0.02)         (0.07)
                                                           [0.32]         [0.73]                        [0.59]         [0.56]
Took Recommended Prerequisite Courses       0.79           0.00           0.09           0.79           0.02           0.05
                                                           (0.02)         (0.04)                        (0.02)         (0.05)
                                                           [0.84]         [0.04]                        [0.43]         [0.31]

Number of Observations                      1,819                                        1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4) Control Group Mean; (2), (5) ITT; (3), (6) LATE.

Outcome                                             (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment                     0.19     0.38              0.24     0.39
                                                         (0.05)                     (0.06)
                                                         [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                       0.03     0.04     0.11     0.03     0.04     0.10
                                                         (0.01)   (0.01)            (0.01)   (0.01)
                                                         [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                           0.13     0.04     0.11     0.14     0.04     0.10
                                                         (0.01)   (0.02)            (0.01)   (0.02)
                                                         [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                           0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                         (0.01)   (0.02)            (0.01)   (0.02)
                                                         [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                               0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                                         (0.01)   (0.02)            (0.01)   (0.03)
                                                         [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                                  0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                         (0.01)   (0.02)            (0.01)   (0.02)
                                                         [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                                      0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                                         (0.01)   (0.03)            (0.01)   (0.03)
                                                         [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                            1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information was collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
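As a rough illustration of how the ITT and LATE columns relate: with a single binary offer and no covariates, the LATE reduces to the Wald ratio of the ITT effect on an outcome to the ITT effect on course enrollment. The sketch below assumes that simplified case. The function name and the input values are hypothetical, and this is not the estimation code behind Equation (1), which also conditions on covariates and fixed effects.

```python
def wald_late(itt_outcome: float, itt_first_stage: float) -> float:
    """Wald ratio: ITT effect on the outcome divided by ITT effect on take-up."""
    if itt_first_stage == 0:
        raise ValueError("first stage is zero; the LATE is not identified")
    return itt_outcome / itt_first_stage


# Hypothetical inputs: the offer shifts an outcome by 0.04 and raises
# enrollment in the treatment course by 0.38.
late = wald_late(itt_outcome=0.04, itt_first_stage=0.38)
print(f"Implied LATE: {late:.3f}")
```

Because the first stage is well below one, the implied LATE is a scaled-up version of the ITT; a regression-based LATE with covariates need not match this ratio exactly.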


Table 4

Treatment Contrast (Composite Variables)

(1) Control Group Complier Mean; (2) ITT; (3) LATE.

Outcome                                                 (1)      (2)      (3)
Academically Challenging Curriculum                   -0.33     0.31     0.80
                                                             (0.10)   (0.24)
                                                             [0.00]   [0.00]
Project-Based Independent Classroom Activities        -0.06     0.13     0.33
                                                             (0.07)   (0.17)
                                                             [0.07]   [0.06]
Integrated Use of Technology                          -0.11     0.11     0.28
                                                             (0.08)   (0.19)
                                                             [0.19]   [0.14]
Number of Observations                                1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
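The construction of the composite variables described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the category labels, the function names, the unweighted standardization, and the use of the population standard deviation are all assumptions, and missing responses are not handled.

```python
import numpy as np

# Assumed ordering of response categories, lowest to highest.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(responses, categories=LIKERT):
    """Map ordered categories to evenly spaced values from 0.0 (lowest) to 1.0 (highest)."""
    step = 1.0 / (len(categories) - 1)
    lookup = {category: rank * step for rank, category in enumerate(categories)}
    return np.array([lookup[r] for r in responses], dtype=float)


def composite(items):
    """Average the rescaled component items per student, then standardize (mean 0, SD 1)."""
    scaled = np.column_stack([rescale(item) for item in items])
    average = scaled.mean(axis=1)
    return (average - average.mean()) / average.std(ddof=0)


# Hypothetical responses from four students on two component items.
item_a = ["strongly agree", "agree", "neutral", "disagree"]
item_b = ["agree", "agree", "disagree", "strongly disagree"]
scores = composite([item_a, item_b])
print(scores)
```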


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1) Control Group Complier Mean; (2) ITT; (3) LATE.

Outcome                                 (1)      (2)      (3)
Science Skill                         -0.19     0.09     0.23
                                             (0.06)   (0.16)
                                             [0.15]   [0.14]
STEM Interest                          0.62     0.04     0.09
                                             (0.02)   (0.07)
                                             [0.16]   [0.16]
Confidence in College Science          0.92    -0.04    -0.10
                                             (0.02)   (0.05)
                                             [0.11]   [0.06]
Stress                                 0.12     0.07     0.17
                                             (0.03)   (0.07)
                                             [0.02]   [0.01]
Grades in Science Courses              2.80    -0.12    -0.29
                                             (0.07)   (0.16)
                                             [0.08]   [0.07]
Grades in Other Courses                3.14    -0.07    -0.18
                                             (0.02)   (0.06)
                                             [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
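The inverse probability weighting mentioned in these notes can be illustrated with a small sketch. It assumes the probit index for each survey completer has already been estimated (the notes to the paper describe the probit itself); the function names and index values below are hypothetical and are not estimates from the study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_probability_weight(index: float) -> float:
    """Weight for a survey completer: 1 / (predicted probability of completion)."""
    return 1.0 / normal_cdf(index)


def weighted_mean(values, weights):
    """Mean of values in which each observation counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical fitted probit indices for three completers.
indices = [1.5, 0.0, -1.0]
weights = [inverse_probability_weight(z) for z in indices]

# Hypothetical outcomes; the completer least likely to respond counts the most.
outcomes = [0.2, 0.5, 0.9]
print(weights, weighted_mean(outcomes, weights))
```

Completers who resemble non-respondents (low predicted completion probability) receive larger weights, so the weighted sample looks more like the full set of study participants on observed characteristics.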

Table 6

Robustness Checks of Main ITT Results

(1) Control Group Complier Mean; (2) Main Results; (3) Robust SE; (4) p-value (permutation test); (5) Excluding High School 56; (6) Including Imputation of Missing Outcome Variables; (7) Excluding Covariates; (8) Excluding High School 23; (9) Lee Lower Bound; (10) Lee Upper Bound; (11) 95% Confidence Interval from Lee Bounds; (12) Ratio of 95% CI in (11) to 95% CI in (7).

Outcome                              (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)     (10)             (11)     (12)
Science Skill                      -0.19     0.09                       0.10     0.11     0.20     0.07     0.03     0.39    -0.09 to 0.51      2.0
                                          (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                          [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                       0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12    -0.03 to 0.18      1.9
                                          (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                          [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science       0.92    -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06     0.05    -0.09 to 0.10      2.0
                                          (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                          [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                              0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11    -0.05 to 0.15      1.6
                                          (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                          [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses           2.80    -0.12                      -0.06    -0.10    -0.07
                                          (0.07)   (0.04)            (0.00)   (0.00)   (0.00)
                                          [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses             3.14    -0.07                      -0.07    -0.06    -0.03
                                          (0.02)   (0.03)            (0.00)   (0.00)   (0.00)
                                          [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]
Remaining columns for the two grades outcomes: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP Biology and AP Chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (X_i) from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to derive a confidence interval for the treatment effect itself).
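The permutation test described for column (4) can be sketched in a few lines. This is a simplified illustration, not the authors' procedure: it assumes the treatment effect is a raw difference in means (the paper's estimates come from regressions with covariates and School x Cohort fixed effects), it reshuffles the pseudo treatment across the whole sample, and the function name, seed, and data are hypothetical.

```python
import numpy as np


def permutation_p_value(outcome, treated, n_perm=1000, seed=0):
    """Share of random reassignments whose |pseudo effect| exceeds the |actual effect|."""
    rng = np.random.default_rng(seed)
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    def mean_difference(assignment):
        return outcome[assignment].mean() - outcome[~assignment].mean()

    actual = abs(mean_difference(treated))
    exceed = 0
    for _ in range(n_perm):
        pseudo = rng.permutation(treated)  # reshuffle who counts as "treated"
        if abs(mean_difference(pseudo)) > actual:
            exceed += 1
    return exceed / n_perm


# Hypothetical data: 40 students, half offered the course.
rng = np.random.default_rng(1)
treated = np.array([True] * 20 + [False] * 20)
outcome = rng.normal(size=40) + 0.5 * treated
print(permutation_p_value(outcome, treated))
```

A design that randomizes within schools would more naturally permute the pseudo treatment within each school-by-cohort group; the sketch omits that step for brevity.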


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (US Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer

an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. When the standard to which students compare themselves rises, their confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and

Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course and classroom supplies (e.g., lab materials, textbooks) and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side

Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master's degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact

that some students did not have 10th grade scores), we created one reading and one math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One school in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60 percent of the treated students chose Biology, then we randomly assign 60 percent of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a

probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school-by-cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school-by-cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and with 90 percent of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a "multiple imputation, then deletion" strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum, with more homework, than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet, we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
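For readers unfamiliar with the Benjamini-Hochberg correction referred to in note 29, the step-up procedure can be sketched as below. This is a generic textbook implementation offered as an illustration, not the authors' code; the function name and the p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first

    # Find the largest rank k such that p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values for four outcomes.
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.50]))
```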


B. Conceptual Framework

There are several channels through which an AP science class is expected to influence students' cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to provide rigorous content and a substantial workload, be taught by teachers who have high expectations, and consist of students who are driven to succeed. These inputs (course rigor, teacher expectations, and peer motivation) are often thought of as the main characteristics that distinguish AP courses from other high school courses.

Yet, AP science classes are also intended to offer an inquiry-based approach to science that, when combined with a high level of rigor, provides an additional causal pathway to change. Specifically, a well-implemented AP science course should encourage students to ask questions, gather and interpret data, arrive at explanations grounded in scientific principles, and communicate their observations to one another under the guidance of teachers (College Board 2011a, 2011b).4 This student-led, inquiry-based approach differs from many traditional secondary school science classrooms, where the goal is often for students to memorize content and replicate laboratory experiments that demonstrate the content (National Research Council 2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world practices of science and the skills that form the basis of scientific inquiry by focusing more on big picture concepts and small group experimentation, with students directing the inquiry. The curriculum also encourages teachers to move away from lecture-based pedagogy and multiple-choice quizzes and to increase their use of technology to help students analyze data, draw interpretations, and communicate findings (College Board 2011a, 2011b).

AP science classes are expected to increase students' ability to ask research questions, design experiments, analyze data, and draw conclusions. As students gain these scientific inquiry skills, the new curriculum is intended to spur greater interest in the practice of science by making it more enjoyable and more accessible to students for whom rote memorization and execution of prefabricated lab experiments might have diminished enthusiasm for the subject (National Research Council 2012). Science experts posit that inquiry-based science courses will be particularly successful in generating greater interest and skill among women and among students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis, Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler, Samuelson, and Lorah 2014).

While the rigor and expectations of a college course may be appropriate for some students, they can be too demanding for others. Students often report high levels of stress and burnout from taking AP courses, particularly if they perceive that they are not prepared for the challenge of college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact, cause students to lose confidence in their ability to complete college science courses. A number of mechanisms could cause students to lose confidence, including exposure to stronger peers, inability to successfully complete assignments, or simply receiving lower grades than they received in their non-AP courses.5 The AP effect on confidence will likely matter differently for students with different levels of initial confidence. For students who are over-confident in their ability to succeed in college science courses, taking a challenging AP course in high school might cause them to revise their expectations to be more in line with the higher demands of college-level work.

Taking a more strenuous AP course is also likely to affect students' time allocation. Students' performance in each class will be determined by their subject-specific ability as well as the amount of time they devote to their coursework versus other activities, including work, extracurricular activities, and leisure. If AP courses are more demanding than other courses, students solving a time allocation problem may shift more effort into their AP course and away from other pursuits. The impact of this change in time allocation on students' performance in AP and other courses will depend upon whether they shift effort away from other courses and on the degree of complementarity between their AP science course and their other courses. Study time devoted to an AP science course could improve student performance in other math and science classes (where the skills, tasks, and knowledge are similar), even if students spend less time on those courses. For courses that require students to perform tasks that are not complementary with AP science (e.g., courses in the humanities), taking AP science concurrently with these courses could decrease student performance in both courses. Of course, students taking an AP course could choose to reduce time spent on alternative (non-academic) activities. If these other activities have no causal impact on performance in school, then the impact on overall achievement could be negligible.

Some students report concerns about their time allocation as they weigh the decision to enroll in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these concerns have increased over time as the courses have become more accessible to students who previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to students with high grades in prerequisite classes, and the courses were only offered in schools with substantial resources. The Board has made efforts to increase access with, for instance, a policy statement that encourages schools to open AP to all students who are "willing to accept the challenge" and remove all barriers that restrict access (College Board 2002).6 In a 2008 survey of a nationally representative sample, 65 percent of secondary school teachers reported that their schools encourage as many students as possible to take AP, and 69 percent reported that AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009). These open access policies have led to complaints that students who enroll with less preparation will be less able to engage in the material (and perhaps become more discouraged by the difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg 2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students through negative peer effects or through teachers removing content and slowing the pace of course delivery.

III. AP Science Impact Study

A. Overview

We recruited 23 schools from across the United States and offered monetary compensation to pay for equipment and teacher training and as an incentive to secure participation.7 Eligible schools included ones that had not offered AP Biology or AP Chemistry in recent years, were willing to add such a course and comply with study protocol, and had more eligible students than could be served in one class so as to supply a sufficiently sized control group.8 Of the 23 schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both courses. We recruited two waves of schools (those that offered the course for the first time in 2013 and those that offered it for the first time in 2014); both waves were asked to field the course for two years, and the earlier-joining schools had the option of fielding the course for three years. The study includes 47 school-by-cohort groups.

In the spring of the prior year, each participating school identified students that it deemed eligible to take the new AP Biology or Chemistry course. We treated all eligible students who assented to participate in the study and who obtained consent from their parent or guardian as study participants. Upon receipt of signed consent/assent forms, we randomly offered enrollment in the newly launched course to a subset of participating students.9 The study includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students per AP class).

Figure 1 shows the geographic distribution of the 11 participating districts, which are primarily concentrated in the western, southern, and eastern regions of the country.10 The underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern region has experienced less competition over the years in access to selective postsecondary institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long 2009). Relative to districts across the nation, those participating in the study tend to be in neighborhoods with lower levels of socioeconomic status and to educate students who score below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools tend to be larger and more likely to educate students who are eligible for free or reduced-price lunch, Black, and Hispanic than other schools (Panel A of Table 1).

There are two reasons for this over-representation of larger schools serving less economically prosperous communities. First, AP courses are already offered in the majority of the nation’s public high schools, and schools that serve students from high-income families tend to offer more AP subjects than schools that serve students from lower-income families (Malkus 2016; Theokas and Saaris 2013). Given that our research design only allowed for schools that had not recently offered an AP science course, the population of schools from which we recruited tended to be those in settings with fewer resources. Second, participating schools were required to state that they believed they would have 60 or more students who were qualified to take the AP science course, and this requirement tended to disqualify smaller high schools.

Reflecting the school demographics, participating teachers are slightly younger, less experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity than U.S. high school science teachers generally (Panel B of Table 1). Nearly half of our study teachers have five or fewer years of teaching experience and a third have two or fewer, rates that are more than double and more than triple, respectively, those of U.S. high school science teachers. Study teachers are more likely to hold an undergraduate major in a STEM field than other high school science teachers, yet far less likely to hold a master’s degree and slightly less likely to have earned a teaching credential in science. Most of the participating teachers had previously taught a higher-level course (mostly honors), yet only 47 percent of them had previously taught an AP course. Our research consequently applies to a population of teachers who are relatively new to the AP science curriculum and who have generally not received graduate training.¹¹ Assuming AP courses improve with teacher preparation, our results likely capture the effect of a less-than-ideal version of AP and may show less positive treatment effects than would arise when AP is delivered by teachers with more training and experience (Clotfelter, Ladd, and Vigdor 2010).

B. Data and Student Descriptive Statistics

We rely on three primary and secondary data sources for impact estimates. The first is an assessment, developed and validated by the research team, that measures students’ scientific inquiry skills. We administered this assessment to students in both treatment and control groups and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items that rely on science disciplinary knowledge that is taught in middle school, specifically material from Life Sciences and Physical Sciences. The assessment, which we administered to all study participants during a 45-minute period, measures students’ skills in data analysis, scientific explanation, and scientific argument.¹² Participating teachers were not provided copies of the instrument in advance; therefore, teachers were unable to teach any content material prior to test administration.

The second source is a questionnaire that we administered concurrently with the assessment and that asks students a number of questions about their most recent science class and their plans after high school. The assessment and questionnaire were completed together and administered outside of class (henceforth, we refer to these instruments as the “survey”). The third data source is students’ high school transcripts, which contain data on demographic and socioeconomic background, grades, courses, standardized exams taken in the 8th and 10th grades, and high school completion. We use these data to determine the balance of randomization on pre-treatment covariates, estimate the effect of randomization on course-taking (including compliance), improve the precision of our estimates with statistical controls, and estimate treatment effects on students’ grades.

Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences during the dates scheduled for survey administration and communication lapses between school coordinators and students. Students who were randomly assigned to treatment have a 9-percentage-point higher survey response rate. Given the possibility of nonrandom sample attrition, we weight all regressions by the inverse of the probability of completing the survey, conditional on student characteristics.¹⁴ We implement a variety of robustness checks as additional means to account for nonresponse. These include multiple imputation of missing outcome variables, excluding one high school that had a low response rate, and using the Lee (2009) technique to provide bounds on the estimated effects. These methods and results are discussed below.
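To make the weighting idea concrete, the following is a minimal sketch of inverse-probability-of-response weights. It is not the authors' estimation code: it assumes the response probability is estimated as the response rate within cells of a single hypothetical categorical characteristic (`cells`), whereas the study conditions on a fuller set of student characteristics. All function and variable names are illustrative.

```python
from collections import defaultdict


def inverse_probability_weights(cells, responded):
    """Return one weight per student: 1 / P(respond | cell) for respondents, 0 otherwise.

    `cells` is a hypothetical categorical characteristic for each student;
    `responded` is 1 if the student completed the survey and 0 if not.
    """
    totals = defaultdict(int)
    completes = defaultdict(int)
    for cell, r in zip(cells, responded):
        totals[cell] += 1
        completes[cell] += r
    # Estimated response probability within each cell.
    p_hat = {cell: completes[cell] / totals[cell] for cell in totals}
    # Nonrespondents get weight 0, so a cell with no respondents never divides by zero.
    return [1.0 / p_hat[c] if r else 0.0 for c, r in zip(cells, responded)]


def weighted_mean(values, weights):
    """Weighted average of `values` using `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data (assumed): cell "a" responds fully, cell "b" responds at 50 percent.
cells = ["a"] * 4 + ["b"] * 4
responded = [1, 1, 1, 1, 1, 1, 0, 0]
is_b = [0, 0, 0, 0, 1, 1, 1, 1]

weights = inverse_probability_weights(cells, responded)
share_b_weighted = weighted_mean(is_b, weights)
```

In this toy example the unweighted share of cell "b" among respondents is one third, while the weighted share returns to the full-sample value of one half, which is the sense in which the weights restore the composition of the randomized sample on the characteristics used to build them.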

We supplement these data with surveys that we administered online to teachers of the new AP courses at the conclusion of the course. The teacher survey includes questions about their educational background; professional experiences and professional development, past and present; instructional practices, generally and around science specifically; participation in the College Board AP training; ability to cover the content of the AP course; and coaching, mentoring, and other professional community supports received from the school, district, and education community.

Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the survey respondents, conditional on school-by-cohort fixed effects.¹⁵ Most of the estimated differences between treatment and control group students on pre-treatment observed characteristics are small, with some notable exceptions. In the full and survey samples, treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher, respectively, than control group students’, both at p-values below 0.05. The magnitude of the treatment-control difference was slightly lower and less precisely estimated in math, yet also favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student covariates as predictors of outcomes in the models, and in the robustness checks, we exclude these covariates.¹⁷

Table 2 also shows the extent of differences between control group compliers and non-compliers. We find that non-compliers are generally much more academically prepared for AP science: they have higher pre-treatment reading and math test scores and are more likely to have completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian American and female.¹⁸


IV. Empirical Strategy

We estimate the effect of taking the AP science course with a standard instrumental variable specification:

(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$

(2) $AP_{ij} = \delta_j + Offered_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$

where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$; $\widehat{AP}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2); $Offered_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates (including age, math and reading exam scores from 8th and 10th grade (standardized and averaged for math and reading separately), cumulative GPA prior to the year when the AP science course was offered, and indicator variables for female, racial group (Asian American, Black, or Hispanic, Native American, or Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price lunch, home language is not English, and took recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school-by-cohort fixed effects.¹⁹ We use two-stage least squares to estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given by $\beta$.

The intent-to-treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $Offered_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on $Offered_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new AP science course and is a weighted average of effects on those who do and do not choose to enroll in the course.

(3) $Y_{ij} = \zeta_j + Offered_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
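The relationship between the ITT and LATE estimates can be illustrated with the simplest version of the instrumental variable estimator, the Wald ratio, which divides the ITT effect on the outcome by the first-stage effect of the offer on enrollment. The sketch below is an illustration under stated assumptions, not the study's two-stage least squares model: it omits covariates and fixed effects, and it builds a hypothetical population whose enrollment shares mirror the compliance rates reported in Section V.A (19 percent of control students enroll, 42 percent of offered students decline). The baseline outcomes by type and the 0.23 complier effect plugged in are assumed inputs.

```python
def mean(xs):
    return sum(xs) / len(xs)


def wald_estimates(offered, took_ap, outcome):
    """Return (itt, first_stage, late) from three parallel lists."""
    y1 = [y for z, y in zip(offered, outcome) if z == 1]
    y0 = [y for z, y in zip(offered, outcome) if z == 0]
    d1 = [d for z, d in zip(offered, took_ap) if z == 1]
    d0 = [d for z, d in zip(offered, took_ap) if z == 0]
    itt = mean(y1) - mean(y0)            # effect of the offer on the outcome
    first_stage = mean(d1) - mean(d0)    # effect of the offer on enrollment
    return itt, first_stage, itt / first_stage


# Hypothetical population: 100 students per arm, split into three types.
TRUE_EFFECT = 0.23                                          # assumed effect of taking AP
BASELINE = {"always": 0.5, "complier": 0.0, "never": -0.3}  # assumed baselines
COUNTS = {"always": 19, "complier": 39, "never": 42}

offered, took_ap, outcome = [], [], []
for z in (0, 1):
    for kind, n in COUNTS.items():
        # Always-takers enroll regardless; compliers enroll only if offered.
        d = 1 if kind == "always" or (kind == "complier" and z == 1) else 0
        for _ in range(n):
            offered.append(z)
            took_ap.append(d)
            outcome.append(BASELINE[kind] + TRUE_EFFECT * d)

itt, first_stage, late = wald_estimates(offered, took_ap, outcome)
```

In this constructed example the first stage is 0.39, the ITT is 0.39 × 0.23, and the ratio recovers the assumed complier effect. The same arithmetic links the paper's own figures: an ITT of 0.09 sd on science skill divided by a first stage of 0.39 in the survey sample gives roughly 0.23 sd.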

For outcomes that are obtained from the survey, we weight regressions by the inverse of the estimated probability of completing the survey.²⁰ The results are similar without using these weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student characteristics as a result of either missing student transcripts or certain data elements not collected by the district, we use multiple imputation by chained equations, creating 10 imputed datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of treatment assignment (school by cohort) in our analysis of main effects. In the analysis of robustness, we report permutation standard errors, robust standard errors (for comparison to permutations), and the statistical significance of the LATE estimates after adjusting our tests of significance for multiple comparisons.
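As a sketch of how results from several imputed datasets are commonly combined, the snippet below applies Rubin's rules to a list of per-dataset coefficient estimates and their squared standard errors. Using Rubin's rules here is an assumption, since this passage does not state the combining rule, and the function name and toy inputs are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    `estimates` are the per-dataset point estimates and `variances` are the
    per-dataset squared standard errors. Returns (pooled_estimate, pooled_se).
    """
    m = len(estimates)
    pooled = mean(estimates)          # average point estimate
    within = mean(variances)          # average sampling variance
    between = variance(estimates)     # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Toy inputs (assumed): three imputations instead of the study's ten.
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The pooled standard error is larger than any single-dataset standard error whenever the estimates differ across imputations, which is how the procedure reflects uncertainty about the missing values.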

V. Results

A. Course-Taking and Treatment Contrast

Table 3 provides estimated effects of the randomized offer of enrollment on AP science course enrollment and share of credits in all courses for the full sample and the survey sample. The first-stage estimates indicate that the offer substantially increased the likelihood of the student taking the AP science course, by 38 percentage points in the full sample and 39 percentage points in the survey sample. As we expected, compliance with randomization was imperfect, with 42 percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team.

The remaining rows in Table 3 shed light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage-point increase in the share of total credits in the full sample). Treatment group students’ share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).²²

Approximately 78 percent of the control group compliers took any science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%), and Anatomy/Physiology (9%).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our results also suggest that AP course-takers’ classrooms were more likely to use technology (up 0.28 sd, p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that, while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.²⁴ This contrast between students’ reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.²⁵

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate that, for the typical complier, taking AP science raises objectively measured scientific inquiry skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent, with, again, more suggestive than definitive results at traditional levels of statistical inference (p-value = 0.16).

Table 5 provides stronger evidence of negative treatment effects on students’ confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology by signing the study assent forms. Taking AP science substantially lowered participants’ likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students’ self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students’ self-reported confidence and stress levels. We find that taking AP science increases students’ likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05) above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group students’ grades suffered. We estimate that taking AP science reduced students’ grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students’ grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students’ share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 × 0.76)).

With our estimates in hand, we can easily compute the adjustment that would leave the student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).²⁷
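The two calculations above can be reproduced from the rounded figures reported in the text. The short sketch below uses only those rounded inputs; the variable names are illustrative.

```python
# Rounded estimates as reported in the text.
SCIENCE_EFFECT = -0.29     # effect on grades in science courses
OTHER_EFFECT = -0.18       # effect on grades in non-science courses
SCIENCE_SHARE = 0.24       # share of credits in science when taking AP science
AP_SCIENCE_SHARE = 0.14    # share of classes in any AP science subject

# Credit-weighted average effect on overall grades during the study year.
overall_effect = SCIENCE_EFFECT * SCIENCE_SHARE + OTHER_EFFECT * (1 - SCIENCE_SHARE)

# Grade boost in AP science courses that would exactly offset that decline.
offsetting_boost = -overall_effect / AP_SCIENCE_SHARE

print(round(overall_effect, 2))    # -0.21
print(round(offsetting_boost, 2))  # 1.47
```

With rounded inputs the offsetting boost comes out at about 1.47 rather than the 1.46 reported, a gap that is presumably due to the paper working from unrounded estimates.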

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and Column (4) reports the results of a permutation test, where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.²⁹
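The permutation procedure described above can be sketched as follows. This is a simplified illustration, not the study's implementation: it uses a raw difference in means rather than the regression-adjusted ITT estimate, it assumes pseudo treatments are reshuffled within school-by-cohort strata, and it counts ties as exceedances. All names and the toy data are hypothetical.

```python
import random


def diff_in_means(treated, outcome):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for d, y in zip(treated, outcome) if d == 1]
    c = [y for d, y in zip(treated, outcome) if d == 0]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(strata, treated, outcome, n_perm=1000, seed=0):
    """Share of pseudo assignments whose |effect| is at least the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(treated, outcome))
    # Group row indices by stratum so labels are only shuffled within a stratum.
    by_stratum = {}
    for i, s in enumerate(strata):
        by_stratum.setdefault(s, []).append(i)
    hits = 0
    for _ in range(n_perm):
        pseudo = list(treated)
        for idx in by_stratum.values():
            labels = [treated[i] for i in idx]
            rng.shuffle(labels)
            for i, lab in zip(idx, labels):
                pseudo[i] = lab
        if abs(diff_in_means(pseudo, outcome)) >= observed:
            hits += 1
    return hits / n_perm


# Toy data (assumed): two strata, five treated and five control students each.
strata = [0] * 10 + [1] * 10
treated = ([1] * 5 + [0] * 5) * 2
separated = [10.0, 11.0, 12.0, 13.0, 14.0, 0.0, 1.0, 2.0, 3.0, 4.0] * 2
constant = [1.0] * 20

p_separated = permutation_p_value(strata, treated, separated)
p_constant = permutation_p_value(strata, treated, constant)
```

When treated outcomes are far above control outcomes, almost no reshuffled assignment produces a gap as large as the observed one, so the p-value is close to zero; when the outcome is constant, every reshuffle ties the observed gap of zero and the p-value is one.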

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply-imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until its response rate matches that of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution should be considered in evaluating the effects from outcomes based on the study survey.³¹
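The trimming logic behind these bounds can be sketched in a few lines. The function below is an illustration under simplifying assumptions, not the study's implementation: it ignores covariates, fixed effects, and survey weights, it rounds the trimmed share to a whole number of observations, it handles only the case where the treatment group responds at the higher rate, and it does not compute the Imbens and Manski (2004) confidence interval. Names and toy data are hypothetical.

```python
def lee_bounds(treat_outcomes, n_treat_assigned, control_outcomes, n_control_assigned):
    """Return (lower, upper) trimming bounds on the treatment-control difference in means.

    `treat_outcomes` and `control_outcomes` hold outcomes for survey respondents only;
    the `n_*_assigned` arguments count everyone randomized into each arm.
    """
    r_t = len(treat_outcomes) / n_treat_assigned
    r_c = len(control_outcomes) / n_control_assigned
    if r_t < r_c:
        raise ValueError("this sketch assumes the treatment group has the higher response rate")
    # Share of treatment respondents to trim so the response rates match.
    trim_share = (r_t - r_c) / r_t
    n_trim = round(trim_share * len(treat_outcomes))
    ordered = sorted(treat_outcomes)
    control_mean = sum(control_outcomes) / len(control_outcomes)
    kept_for_lower = ordered[: len(ordered) - n_trim]  # drop the highest values
    kept_for_upper = ordered[n_trim:]                  # drop the lowest values
    lower = sum(kept_for_lower) / len(kept_for_lower) - control_mean
    upper = sum(kept_for_upper) / len(kept_for_upper) - control_mean
    return lower, upper


# Toy data (assumed): all 10 treatment students respond, 8 of 10 control students respond.
treat = [float(v) for v in range(1, 11)]
control = [2.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0]
lower, upper = lee_bounds(treat, 10, control, 10)
```

Here two treatment observations are trimmed: removing the two highest gives the lower bound and removing the two lowest gives the upper bound, so the width of the interval grows with the gap in response rates.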

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science skill as the outcome. We find that the point estimates at every quantile are not significantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.³²

VI. Conclusion

Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools already offered AP science classes, and they tended to be in relatively higher-income communities, disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest above the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.³³ We also find that AP science classes substantially increase students’ stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult, given that few schools document per-course expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation’s high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance, on college admissions decisions as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.

Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The medium-term impacts of high-achieving charter schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education Research & Occasional Paper Series: CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8(1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and C. F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018 (NCES 2018-144): Public High School Graduation Rates.” (NCES 2018-144). US Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in US High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.

Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in US Public High Schools: 2010-11, First Look.” (NCES 2013-001). US Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

US Department of Education. 2014. “Education Department Awards 40 States, DC, and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other US High Schools and High School Science Teachers

Panel A: Schools                                          Participating   Others
Average Enrollment                                        1,409           723
Free or Reduced-Price Lunch                               0.700           0.438
Asian                                                     0.055           0.050
Black                                                     0.349           0.154
Hispanic                                                  0.410           0.221
White                                                     0.164           0.537
Adjusted Cohort Graduation Rate                           0.843           0.802
District Instruction Expenditures Per Pupil               $6,561          $5,636
District Student Services Expenditures Per Pupil          $3,787          $3,385

Panel B: Teachers                                         Participating   Others
Age: Under 30                                             0.407           0.160
Age: 30-49                                                0.432           0.553
Age: 50 or over                                           0.161           0.287
Female                                                    0.630           0.536
Hispanic or Latino                                        0.111           0.051
Race: American Indian or Alaska Native                    0.000           0.009
Race: Asian American                                      0.111           0.041
Race: Black                                               0.111           0.060
Race: Native Hawaiian or other Pacific Islander           0.000           0.004
Race: White                                               0.778           0.896
Years of Experience                                       10.3            13.2
Years of Experience <=2                                   0.290           0.085
Years of Experience <=5                                   0.481           0.234
Hold a Teaching Certificate                               0.926           0.945
Undergraduate Major in STEM                               0.944           0.747
Single Subject Credential in Science                      0.630           0.823
Master’s Degree or Higher                                 0.356           0.615
Previously Taught AP Course                               0.469           NA
Previously Taught AP, IB, or Honors Course                0.796           NA
Number of Professional Development Trainings
  in the Past 5 Years (0-5)                               3.09            NA

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). Others in Panel A refers to other public high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). Others in Panel B refers to public and private high school teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1) and (4): Control Group Mean
Columns (2) and (5): Difference Between Treated and Controls
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers

                                            Full Sample                Survey Sample
Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.


Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1) and (4): Control Group Mean
Columns (2) and (5): ITT
Columns (3) and (6): LATE

                                      Full Sample                Survey Respondents
Outcome                               (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment        0.19     0.38              0.24     0.39
                                               (0.05)                     (0.06)
                                               [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                          0.03     0.04     0.11     0.03     0.04     0.10
                                               (0.01)   (0.01)            (0.01)   (0.01)
                                               [0.00]   [0.00]            [0.00]   [0.00]
  All AP                              0.13     0.04     0.11     0.14     0.04     0.10
                                               (0.01)   (0.02)            (0.01)   (0.02)
                                               [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science              0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                               (0.01)   (0.02)            (0.01)   (0.02)
                                               [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                  0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                               (0.01)   (0.02)            (0.01)   (0.03)
                                               [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                     0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                               (0.01)   (0.02)            (0.01)   (0.02)
                                               [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                         0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                               (0.01)   (0.03)            (0.01)   (0.03)
                                               [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
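
As a rough arithmetic check on how the ITT and LATE columns relate, the sketch below divides an ITT estimate by the first-stage effect on AP course enrollment (the Wald ratio). This is an illustration under stated assumptions, not the authors' estimation code: the paper obtains LATE from variants of Equation (1), the function name `wald_late` is hypothetical, and the inputs read the table entries as 0.04 (ITT on share of credits in AP science) and 0.38 (first stage), which assumes the decimal placement of the extracted values.

```python
def wald_late(itt: float, first_stage: float) -> float:
    """Return the Wald ratio: ITT effect divided by the first-stage effect on take-up.

    Hypothetical helper for illustration; with a single instrument this ratio
    matches the just-identified IV estimate when both stages share controls.
    """
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage


# Assumed readings of the table: ITT = 0.04, first stage = 0.38 (full sample).
late = wald_late(itt=0.04, first_stage=0.38)
print(round(late, 2))
```

Under these assumed readings the ratio rounds to the LATE reported in the same row for the full sample.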


Table 4

Treatment Contrast (Composite Variables)

Columns: (1) Control Group Complier Mean; (2) ITT; (3) LATE

Outcome                                           (1)      (2)      (3)
Academically Challenging Curriculum               -0.33    0.31     0.80
                                                           (0.10)   (0.24)
                                                           [0.00]   [0.00]
Project-Based Independent Classroom Activities    -0.06    0.13     0.33
                                                           (0.07)   (0.17)
                                                           [0.07]   [0.06]
Integrated Use of Technology                      -0.11    0.11     0.28
                                                           (0.08)   (0.19)
                                                           [0.19]   [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
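
The composite construction described in these notes can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, responses are assumed to be coded as integers from 0 (lowest category) to n-1 (highest), and the population standard deviation is used for standardization because the notes do not say which variant was applied.

```python
from statistics import mean, pstdev


def rescale_likert(code: int, n_categories: int) -> float:
    """Map an ordinal code 0..n-1 onto evenly spaced points between 0.0 and 1.0."""
    return code / (n_categories - 1)


def composite_scores(responses: list[list[int]], n_categories: int = 5) -> list[float]:
    """Average each student's rescaled items, then standardize across students.

    Assumes at least two students with differing averages (nonzero SD).
    """
    raw = [mean(rescale_likert(c, n_categories) for c in row) for row in responses]
    mu, sd = mean(raw), pstdev(raw)  # population SD is an assumption
    return [(r - mu) / sd for r in raw]


# Three hypothetical students answering three five-category items.
scores = composite_scores([[4, 4, 3], [2, 2, 2], [0, 1, 0]])
print([round(s, 2) for s in scores])
```

By construction the resulting scores have mean 0 and standard deviation 1 in the sample used for standardization.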


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Columns: (1) Control Group Complier Mean; (2) ITT; (3) LATE

Outcome                            (1)      (2)      (3)
Science Skill                      -0.19    0.09     0.23
                                            (0.06)   (0.16)
                                            [0.15]   [0.14]
STEM Interest                      0.62     0.04     0.09
                                            (0.02)   (0.07)
                                            [0.16]   [0.16]
Confidence in College Science      0.92     -0.04    -0.10
                                            (0.02)   (0.05)
                                            [0.11]   [0.06]
Stress                             0.12     0.07     0.17
                                            (0.03)   (0.07)
                                            [0.02]   [0.01]
Grades in Science Courses          2.80     -0.12    -0.29
                                            (0.07)   (0.16)
                                            [0.08]   [0.07]
Grades in Other Courses            3.14     -0.07    -0.18
                                            (0.02)   (0.06)
                                            [0.00]   [0.00]
Number of Observations             1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
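
The survey weights mentioned in these notes are inverse predicted probabilities from a probit model of survey completion (described in the endnote on weighting). The sketch below shows only the final step, turning a fitted probit index into a weight. It is an assumption-laden illustration: the function names are hypothetical, and the index values passed in are made up rather than taken from the paper's fitted model.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def survey_weight(linear_index: float) -> float:
    """Inverse of the predicted probit probability of completing the survey.

    linear_index stands in for the fitted fixed effect plus covariate index;
    it is a hypothetical input here, not a value from the paper.
    """
    return 1.0 / normal_cdf(linear_index)


# A respondent predicted to complete with probability 0.5 gets weight 2.0;
# one who is almost certain to complete gets a weight close to 1.0.
print(survey_weight(0.0), round(survey_weight(3.0), 4))
```

Respondents who resemble nonrespondents on observed characteristics have lower predicted completion probabilities and therefore receive larger weights.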

Table 6

Robustness Checks of Main ITT Results

Columns:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                        (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)             (12)
Science Skill                  -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09 to 0.51    2.0
                                       (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                       [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                  0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03 to 0.18    1.9
                                       (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                       [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science  0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09 to 0.10    2.0
                                       (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                       [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                         0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05 to 0.15    1.6
                                       (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                       [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses      2.80    -0.12                   -0.06   -0.10   -0.07
                                       (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                       [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses        3.14    -0.07                   -0.07   -0.06   -0.03
                                       (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                       [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
Remaining columns for the two grade outcomes: not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
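
The permutation test described in these notes can be sketched in a few lines. This is a simplified illustration rather than the authors' procedure: it uses a raw difference in means in place of the regression-adjusted estimate, shuffles labels across the whole sample rather than within School x Cohort cells, and all function names and data are hypothetical.

```python
import random


def diff_in_means(outcomes, treated):
    """Mean outcome of the treated minus mean outcome of the controls."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy so the caller's assignment is untouched
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign a pseudo treatment at random
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm


# Hypothetical data: eight students, alternating assignment.
p = permutation_p_value([1, 2, 3, 4, 5, 6, 7, 8], [1, 0, 1, 0, 1, 0, 1, 0])
print(p)
```

Shuffling keeps the number of pseudo-treated units fixed, so each draw mimics a re-run of the original lottery under the null of no effect.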


1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (US Department of Education 2014).

2 A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3 The process by which a course is designated AP involves two steps. Teachers who plan to offer

an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of

learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5 Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students’ confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7 We offered each participating school $10,000 to pay for teacher attendance at a one-week

training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school, to assist in collecting consent/assent forms and data.

8 In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9 Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools, we do not know the number of such students.

10 Participating districts include: Anaheim Union High School District, California; East Side

Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11 Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree, but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12 We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13 Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14 Unweighted results, which are similar, are contained in the Online Appendix tables. However,

if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15 Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16 Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17 The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by

Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19 One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20 To compute these weights, we first estimate the parameters of the following equation using a

probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi()$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21 We impute with the full dataset, yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22 Detailed course-by-course impacts are available from the authors.

23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24 Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum, with more homework, than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25 In addition to the teacher spillover effects, peer interactions could cause contamination effects

that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27 Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29 To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30 This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32 We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the

suggestive (016) to noisy (034) range 33 We are currently in the process of gathering records from the National Student Clearinghouse

on all three cohorts of study participants Once data collection is complete we will have the

ability to examine the effect of AP science on college enrollment college selectivity and college

completion


extracurricular, and leisure. If AP courses are more demanding than other courses, students solving a time allocation problem may shift more effort into their AP course, away from other pursuits. The impact of this change in time allocation on students’ performance in AP and other courses will depend upon whether they shift effort away from other courses and on the degree of complementarity between their AP science course and their other courses. Study time devoted to an AP science course could improve student performance in other math and science classes (where the skills, tasks, and knowledge are similar), even if students spend less time on those courses. For courses that require students to perform tasks that are not complementary with AP science (e.g., courses in the humanities), taking AP science concurrently with these courses could decrease student performance in both courses. Of course, students taking an AP course could choose to reduce time spent on alternative (non-academic) activities. If these other activities have no causal impact on performance in school, then the impact on overall achievement could be negligible.

Some students report concerns about their time allocation as they weigh the decision to enroll in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these concerns have increased over time as the courses have become more accessible to students who previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to students with high grades in prerequisite classes, and the courses were only offered in schools with substantial resources. The Board has made efforts to increase access with, for instance, a policy statement that encourages schools to open AP to all students who are “willing to accept the challenge” and remove all barriers that restrict access (College Board 2002).⁶ In a 2008 survey of a nationally representative sample, 65 percent of secondary school teachers reported that their schools encourage as many students as possible to take AP, and 69 percent reported that AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009). These open access policies have led to complaints that students who enroll with less preparation will be less able to engage in the material (and perhaps become more discouraged by the difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg 2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students through negative peer effects or through teachers removing content and slowing the pace of course delivery.

III. AP Science Impact Study

A. Overview

We recruited 23 schools from across the United States and offered monetary compensation to pay for equipment and teacher training and as an incentive to secure participation.⁷ Eligible schools included ones that had not offered AP Biology or AP Chemistry in recent years, were willing to add such a course and comply with study protocol, and had more eligible students than could be served in one class, so as to supply a sufficiently sized control group.⁸ Of the 23 schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both courses. We recruited two waves of schools (those that offered the course for the first time in 2013 and those that offered it for the first time in 2014); both waves were asked to field the course for two years, and the earlier-joining schools had the option of fielding the course for three years. The study includes 47 school-by-cohort groups.

Each participating school identified students that the school deemed eligible to take the new AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students who assented to participate in the study and who obtained consent from their parent or guardian as study participants. Upon receipt of signed consent/assent forms, we randomly offered enrollment in the newly launched course to a subset of participating students.⁹ The study includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students per AP class).

Figure 1 shows the geographic distribution of the 11 participating districts, which are primarily concentrated in the western, southern, and eastern regions of the country.¹⁰ The underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern region has experienced less competition over the years in access to selective postsecondary institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long 2009). Relative to districts across the nation, those participating in the study tend to be in neighborhoods with lower levels of socioeconomic status and to educate students who score below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools tend to be larger and more likely than other schools to educate students who are Black, Hispanic, or eligible for free or reduced-price lunch (Panel A of Table 1).

There are two reasons for this over-representation of larger schools serving less economically prosperous communities. First, AP courses are already offered in the majority of the nation’s public high schools, and schools that serve students from high-income families tend to offer more AP subjects than schools that serve students from lower-income families (Malkus 2016; Theokas and Saaris 2013). Given that our research design only allowed for schools that had not recently offered an AP science course, the population of schools from which we recruited tended to be those in settings with fewer resources. Second, participating schools were required to state that they believed they would have 60 or more students who were qualified to take the AP science course, and this requirement tended to disqualify smaller high schools.

Reflecting the school demographics, participating teachers are slightly younger, less experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our study teachers have less than or equal to five (two) years of teaching experience, which is more than double (triple) the rate of US high school science teachers. Study teachers are more likely to hold an undergraduate major in a STEM field than other high school science teachers, yet far less likely to hold a master’s degree and slightly less likely to have earned a teaching credential in science. Most of the participating teachers had previously taught a higher-level course (mostly honors), yet only 47 percent of them had previously taught an AP course. Our research consequently applies to a population of teachers who are relatively new to the AP science curriculum and who have generally not received graduate training.¹¹ Assuming AP courses improve with teacher preparation, our results likely capture the effect of a less-than-ideal version of AP and may result in less positive treatment effects than when AP is delivered by teachers with more training and experience (Clotfelter, Ladd, and Vigdor 2010).

B. Data and Student Descriptive Statistics

We rely on three primary and secondary data sources for impact estimates. The first is an assessment, developed and validated by the research team, that measures students’ scientific inquiry skills. We administered this assessment to students in both treatment and control groups and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items that rely on science disciplinary knowledge that is taught in middle school, specifically material from Life Sciences and Physical Sciences. The assessment, which we administered to all study participants during a 45-minute period, measures students’ skills in data analysis, scientific explanation, and scientific argument.¹² Participating teachers were not provided copies of the instrument in advance; therefore, teachers were unable to teach any content material prior to test administration.

The second source is a questionnaire that we administered concurrently with the assessment and that asks students a number of questions about their most recent science class and their plans after high school. The assessment and questionnaire were completed together and administered outside of class (henceforth, we refer to these instruments as the “survey”). The third data source is students’ high school transcripts, which contain data on demographic and socioeconomic background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high school completion. We use these data to determine the balance of randomization on pre-treatment covariates, estimate the effect of randomization on course-taking (including compliance), improve the precision of our estimates with statistical controls, and estimate treatment effects on students’ grades.

Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences during the dates scheduled for survey administration and communication lapses between school coordinators and students. Students who were randomly assigned to treatment have a 9-percentage-point higher survey response rate. Given the possibility of nonrandom sample attrition, we weight all regressions by the inverse of the probability of completing the survey, conditional on student characteristics.¹⁴ We implement a variety of robustness checks as additional means to account for nonresponse. These include multiple imputation of missing outcome variables, excluding one high school that had a low response rate, and using the Lee (2009) technique to provide bounds on the estimated effects. These methods and results are discussed below.
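To illustrate the inverse probability weighting described above, the following is a minimal sketch and not the study's procedure. It assumes the response probability is estimated as the raw response rate within discrete covariate cells, whereas the text conditions on student characteristics in a model that is not specified here. The function name, the cell variable, and the data are hypothetical.

```python
import numpy as np

def inverse_response_weights(responded, cell):
    """Return 1 / (cell response rate) for respondents and 0 for nonrespondents.

    Assumption: response probabilities are estimated by cell-level response
    rates, a simple stand-in for a model-based probability.
    """
    responded = np.asarray(responded)
    cell = np.asarray(cell)
    weights = np.zeros(len(responded), dtype=float)
    for c in np.unique(cell):
        in_cell = cell == c
        rate = responded[in_cell].mean()  # estimated response probability in this cell
        if rate > 0:
            weights[in_cell & (responded == 1)] = 1.0 / rate
    return weights

# Hypothetical data: 8 students in two covariate cells.
responded = np.array([1, 1, 0, 1, 0, 0, 1, 1])
cell = np.array([0, 0, 0, 0, 1, 1, 1, 1])
weights = inverse_response_weights(responded, cell)
```

Under this sketch, respondents in cells with low response rates receive larger weights, so the weighted respondent sample resembles the full randomized sample on the characteristics that define the cells.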

We supplement these data with surveys that we administered online to teachers of the new AP courses at the conclusion of the course. The teacher survey includes questions about their educational background, professional experiences, and professional development; past and present instructional practices, generally and around science specifically; participation in the College Board AP training; ability to cover the content of the AP course; and coaching, mentoring, and other professional community supports received from the school, district, and education community.

Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the survey respondents, conditional on school-by-cohort fixed effects.¹⁵ Most of the estimated differences between treatment and control group students on pre-treatment observed characteristics are small, with some notable exceptions. In the full and survey samples, treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher, respectively, than those of control group students, both at p-values below 0.05. The magnitude of the treatment-control difference was slightly lower and less precisely estimated in math, yet also favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student covariates as predictors of outcomes in the models, and in the robustness checks we exclude these covariates.¹⁷

Table 2 also shows the extent of differences between control group compliers and non-compliers. We find that non-compliers are generally much more academically prepared for AP science: they have higher pre-treatment reading and math test scores and are more likely to have completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian American and female.¹⁸

IV. Empirical Strategy

We estimate the effect of taking the AP science course with a standard instrumental variable specification:

(1) $Y_{ij} = \alpha_j + \hat{A}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$

(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$

where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$; $\hat{A}_{ij}$ is the fitted value of $AP_{ij}$ based on the estimates of the parameters in Equation (2); $\mathit{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates (including age, math and reading exam scores from 8th and 10th grade (standardized and averaged for math and reading separately), cumulative GPA prior to the year when the AP science course was offered, and indicator variables for female, racial group (Asian American; Black or Hispanic; Native American or Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price lunch, home language is not English, and took recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school-by-cohort fixed effects.¹⁹ We use two-stage least squares to estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given by $\beta$.

The intent to treat (ITT) estimate is obtained by replacing $\hat{A}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new AP science course and is a weighted average of effects on those who do and do not choose to enroll in the course.

(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$

For outcomes that are obtained from the survey, we weight regressions by the inverse of the estimated probability of completing the survey.²⁰ The results are similar without using these weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student characteristics, as a result of either missing student transcripts or certain data elements not collected by the district, we use multiple imputation by chained equations, creating 10 imputed datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of treatment assignment (school by cohort) in our analysis of main effects. In the analysis of robustness, we report permutation standard errors, robust standard errors (for comparison to permutations), and the statistical significance of the LATE estimates after adjusting our tests of significance for multiple comparisons.
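The relationship among the first stage, the ITT, and the LATE can be illustrated with a minimal sketch on simulated data. This is not the study's code: the data-generating values, the single covariate, the five strata, and the function names are all hypothetical, and the sketch omits survey weights, multiple imputation, and clustered standard errors. Point estimates only are computed, since standard errors from a manual second stage would not be valid.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def itt_and_late(y, took_ap, offered, controls):
    """Return (ITT, first-stage, LATE) point estimates for one outcome.

    `controls` holds covariates plus stratum dummies (one stratum omitted).
    """
    n = len(y)
    W = np.column_stack([np.ones(n), controls])
    Z = np.column_stack([offered, W])
    first_stage = ols(took_ap, Z)                    # Equation (2)
    fitted = Z @ first_stage                         # fitted AP enrollment
    late = ols(y, np.column_stack([fitted, W]))[0]   # Equation (1)
    itt = ols(y, Z)[0]                               # Equation (3)
    return itt, first_stage[0], late

# Simulated data (all values hypothetical).
rng = np.random.default_rng(0)
n = 4000
stratum = rng.integers(0, 5, n)
x = rng.normal(size=n)
offered = rng.integers(0, 2, n).astype(float)
u = rng.random(n)
always_taker = u < 0.19                      # enrolls regardless of the offer
complier = (u >= 0.19) & (u < 0.58)          # enrolls only if offered
took_ap = (always_taker | (complier & (offered == 1))).astype(float)
y = 0.23 * took_ap + 0.5 * x + 0.1 * stratum + rng.normal(size=n)

dummies = (stratum[:, None] == np.arange(1, 5)).astype(float)  # stratum 0 omitted
controls = np.column_stack([x, dummies])
itt, first_stage, late = itt_and_late(y, took_ap, offered, controls)
```

With one instrument and the same controls in every equation, the two-stage least squares estimate equals the ITT coefficient divided by the first-stage coefficient, which is why an ITT effect is scaled up by the inverse of the enrollment difference to obtain the LATE.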

V. Results

A. Course-Taking and Treatment Contrast

Table 3 provides estimated effects of the randomized offer of enrollment on AP science course enrollment and share of credits in all courses for the full sample and the survey samples. The first-stage estimates indicate that the offer substantially increased the likelihood of the student taking the AP science course, by 38 percentage points in the full sample and 39 percentage points in the survey sample. As we expected, compliance with randomization was imperfect, with 42 percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team.

The remaining rows in Table 3 shed light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage-point increase in the share of total credits in the full sample). Treatment group students’ share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).²²

Approximately 78 percent of the control group compliers took any science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13 percent), Physics (12 percent), AP Environmental Science (11 percent), Biology (10 percent), Honors Biology (9 percent), and Anatomy/Physiology (9 percent).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 s.d., p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 s.d., p-value = 0.06). Our results also suggest that AP course-takers’ classrooms were more likely to use technology (up 0.28 s.d., p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.²⁴ This contrast between students’ reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.²⁵

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate that, for the typical complier, taking AP science raises objectively measured scientific inquiry skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent, with, again, more suggestive than definitive results at traditional levels of statistical inference (p-value = 0.16).

Table 5 provides stronger evidence of negative treatment effects on students’ confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology by signing the study assent forms. Taking AP science substantially lowered participants’ likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students’ self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students’ self-reported confidence and stress levels. We find that taking AP science increases students’ likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05), above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group students saw their grades suffer. We estimate that taking AP science reduced students’ grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students’ grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students’ share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 × 0.76)).

With our estimates in hand, we can easily compute the adjustment that would leave the student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).²⁷
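The grade arithmetic above can be reproduced as a short sketch using the rounded values quoted in the text; the variable names are hypothetical. With rounded inputs the offsetting boost comes out near 1.47, slightly different from the 1.46 reported, which presumably reflects unrounded estimates.

```python
# Rounded estimates quoted in the text (Tables 3 and 5).
effect_science_gpa = -0.29   # effect on grades in science courses
effect_other_gpa = -0.18     # effect on grades in non-science courses
share_science = 0.24         # share of credits in science when taking AP science
share_ap_science = 0.14      # share of classes in any AP science subject (0.02 + 0.12)

# Credit-weighted average effect on overall grades during the study year.
overall_effect = (effect_science_gpa * share_science
                  + effect_other_gpa * (1 - share_science))

# Grade boost on AP science courses that would offset the overall decline.
offsetting_boost = -overall_effect / share_ap_science
```

Here `overall_effect` rounds to -0.21, matching the figure in the text.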

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and Column (4) reports the results of a permutation test, where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.²⁹
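The permutation test described above can be sketched as follows. This is an illustration under stated assumptions and not the study's implementation: it uses a simple difference in means as the effect estimate, where the text re-estimates the ITT regression, and it assumes pseudo treatment is reshuffled within school-by-cohort strata. The function name and the simulated data are hypothetical.

```python
import numpy as np

def permutation_p_value(outcome, treated, stratum, n_perm=1000, seed=0):
    """Share of permutations whose |pseudo effect| exceeds the |observed effect|."""
    rng = np.random.default_rng(seed)

    def effect(t):
        # Assumption: difference in means stands in for the regression estimate.
        return outcome[t == 1].mean() - outcome[t == 0].mean()

    observed = abs(effect(treated))
    exceed = 0
    for _ in range(n_perm):
        pseudo = treated.copy()
        for s in np.unique(stratum):
            idx = np.flatnonzero(stratum == s)
            pseudo[idx] = rng.permutation(treated[idx])  # reshuffle within stratum
        if abs(effect(pseudo)) > observed:
            exceed += 1
    return exceed / n_perm

# Simulated data with a large true effect (values hypothetical).
rng = np.random.default_rng(1)
stratum = np.repeat(np.arange(4), 100)
treated = rng.integers(0, 2, stratum.size)
outcome = 1.0 * treated + rng.normal(size=stratum.size)
p_value = permutation_p_value(outcome, treated, stratum)
```

Because the reference distribution is built from reshuffled assignments, the resulting p-value does not rely on an asymptotic approximation for the standard errors.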

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 s.d.), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution should be considered in evaluating the effects from outcomes based on the study survey.³¹
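The trimming step behind these bounds can be sketched as follows. This is a simplified illustration, assuming no covariates or weights and a treatment group that responds at the higher rate; it does not compute the Imbens and Manski (2004) confidence interval. The function name and the toy data are hypothetical.

```python
import numpy as np

def lee_bounds(y_treat, y_ctrl, rate_treat, rate_ctrl):
    """Lower and upper trimming bounds on the treatment effect among respondents.

    y_treat and y_ctrl hold outcomes for survey respondents only.
    """
    if rate_treat < rate_ctrl:
        raise ValueError("this sketch assumes the treatment group responds at the higher rate")
    trim_share = (rate_treat - rate_ctrl) / rate_treat   # excess respondents to trim
    k = int(round(trim_share * len(y_treat)))
    ordered = np.sort(np.asarray(y_treat, dtype=float))
    ctrl_mean = float(np.mean(y_ctrl))
    lower = ordered[: len(ordered) - k].mean() - ctrl_mean  # drop the k highest outcomes
    upper = ordered[k:].mean() - ctrl_mean                  # drop the k lowest outcomes
    return lower, upper

# Toy data: treatment responds at 80 percent, control at 60 percent.
y_treat = np.arange(1.0, 9.0)           # respondent outcomes 1, 2, ..., 8
y_ctrl = np.array([3.0, 4.0, 5.0])
lower, upper = lee_bounds(y_treat, y_ctrl, rate_treat=0.8, rate_ctrl=0.6)
```

The wider the gap in response rates, the more observations are trimmed and the further apart the two bounds fall.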

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science skill as the outcome. We find that the point estimates at every quantile are insignificantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.³²

VI. Conclusion

Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of US high schools already offered AP science classes, and they tended to be in relatively higher-income communities, disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest relative to the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.³³ We also find that AP science classes substantially increase students’ stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult, given that few schools document per-course expenditures. One recent analysis of a US district that relied on teacher salaries and course assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation’s high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance, on college admissions decisions as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.


Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education Research Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8(1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and C. F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.


Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1
Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age: Under 30                                       0.407           0.160
Age: 30-49                                          0.432           0.553
Age: 50 or over                                     0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <= 2                            0.290           0.085
Years of Experience <= 5                            0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd; EDFacts, https://www2.ed.gov/about/inits/ed/edfacts/index.html. Others in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey, https://nces.ed.gov/surveys/sass. Others in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4): Control Group Mean. (2), (5): Difference Between Treated and Controls. (3), (6): Difference Between Control Group Non-Compliers and Compliers.

Pre-Treatment Characteristic                      (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                  16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                  0.38     0.08     0.25     0.44     0.07     0.30
                                                        (0.04)   (0.10)            (0.05)   (0.16)
                                                        [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                               0.29     0.10     0.18     0.36     0.09     0.17
                                                        (0.03)   (0.12)            (0.04)   (0.17)
                                                        [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                           3.16     0.05     0.20     3.23     0.06     0.13
                                                        (0.03)   (0.08)            (0.03)   (0.10)
                                                        [0.14]   [0.02]            [0.06]   [0.20]
Female                                           0.59     0.00     0.10     0.61    -0.01     0.11
                                                        (0.03)   (0.06)            (0.04)   (0.07)
                                                        [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                   0.12     0.02     0.10     0.12     0.03     0.10
                                                        (0.02)   (0.05)            (0.01)   (0.07)
                                                        [0.27]   [0.06]            [0.07]   [0.12]
Black                                            0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                        (0.02)   (0.06)            (0.02)   (0.05)
                                                        [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial        0.31     0.01     0.05     0.33     0.01     0.05
                                                        (0.02)   (0.06)            (0.02)   (0.07)
                                                        [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                         0.02     0.00    -0.01     0.01     0.00    -0.01
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.93]   [0.24]            [0.57]    [0.5]
Gifted                                           0.13     0.03     0.00     0.14     0.02     0.01
                                                        (0.02)   (0.05)            (0.02)   (0.09)
                                                        [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                         0.05     0.01     0.02     0.04     0.01     0.04
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch         0.53     0.01     0.02     0.51     0.01     0.07
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home       0.34     0.02     0.03     0.35     0.01     0.04
                                                        (0.02)   (0.07)            (0.02)   (0.07)
                                                        [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses            0.79     0.00     0.09     0.79     0.02     0.05
                                                        (0.02)   (0.04)            (0.02)   (0.05)
                                                        [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                                   1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4): Control Group Mean. (2), (5): ITT. (3), (6): LATE.

Outcome                                       (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment               0.19     0.38              0.24     0.39
                                                    (0.05)                     (0.06)
                                                    [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                 0.03     0.04     0.11     0.03     0.04     0.10
                                                    (0.01)   (0.01)            (0.01)   (0.01)
                                                    [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                     0.13     0.04     0.11     0.14     0.04     0.10
                                                    (0.01)   (0.02)            (0.01)   (0.02)
                                                    [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                     0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                    (0.01)   (0.02)            (0.01)   (0.02)
                                                    [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                         0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                                    (0.01)   (0.02)            (0.01)   (0.03)
                                                    [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                            0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                    (0.01)   (0.02)            (0.01)   (0.02)
                                                    [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                                0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                                    (0.01)   (0.03)            (0.01)   (0.03)
                                                    [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                               1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
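
As a rough illustration of how the ITT and LATE columns relate, the sketch below computes a Wald-style ratio: the ITT effect on an outcome divided by the ITT effect on AP treatment course enrollment. This is an assumption about the general logic only. The estimates in the table come from Equations (1) and (2), which include covariates and School x Cohort fixed effects, so a ratio of rounded table entries reproduces the reported LATE only approximately. The function name `wald_late` is hypothetical, and the inputs read the full-sample entries as 0.04 (ITT on the share of credits in AP science) and 0.38 (ITT on AP treatment course enrollment).

```python
def wald_late(itt_outcome: float, itt_takeup: float) -> float:
    """Wald-style ratio: ITT effect on the outcome divided by the
    ITT effect on take-up (the first stage). Illustrative only."""
    if itt_takeup == 0:
        raise ValueError("first-stage effect must be nonzero")
    return itt_outcome / itt_takeup


# Full-sample entries as read from the table (rounded to two decimals).
late_ap_science = wald_late(itt_outcome=0.04, itt_takeup=0.38)
print(round(late_ap_science, 2))  # prints 0.11
```

The ratio is close to the LATE reported for the share of credits in AP science, which is what one would expect if roughly 38 percent of those offered the course were induced to take it.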


Table 4
Treatment Contrast (Composite Variables)

(1): Control Group Complier Mean. (2): ITT. (3): LATE.

Outcome                                                 (1)      (2)      (3)
Academically Challenging Curriculum                   -0.33     0.31     0.80
                                                              (0.10)   (0.24)
                                                              [0.00]   [0.00]
Project-Based Independent Classroom Activities        -0.06     0.13     0.33
                                                              (0.07)   (0.17)
                                                              [0.07]   [0.06]
Integrated Use of Technology                          -0.11     0.11     0.28
                                                              (0.08)   (0.19)
                                                              [0.19]   [0.14]
Number of Observations                                         1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
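
The composite construction described in these notes can be sketched as follows. This is a minimal illustration and not the study's own code: the five-category response scale, the function names `rescale` and `composite`, and the use of the population standard deviation when standardizing are all assumptions, and the survey weights are omitted.

```python
from statistics import mean, pstdev

# Assumed response scale, ordered from lowest to highest category.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map a response to [0, 1]: lowest category -> 0.0, highest -> 1.0,
    remaining categories evenly spaced in between."""
    position = categories.index(response)
    return position / (len(categories) - 1)


def composite(responses_by_student):
    """Average each student's rescaled component items, then standardize
    the averages to mean 0 and SD 1 across students."""
    averaged = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sd = mean(averaged), pstdev(averaged)
    return [(a - mu) / sd for a in averaged]


# Hypothetical data: three students answering two component items each.
scores = composite([
    ["agree", "agree"],
    ["disagree", "neutral"],
    ["strongly agree", "agree"],
])
print([round(s, 2) for s in scores])
```

Because the final step standardizes across students, the ITT and LATE entries for the composites can be read in standard deviation units.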


Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1): Control Group Complier Mean. (2): ITT. (3): LATE.

Outcome                               (1)      (2)      (3)
Science Skill                       -0.19     0.09     0.23
                                            (0.06)   (0.16)
                                            [0.15]   [0.14]
STEM Interest                        0.62     0.04     0.09
                                            (0.02)   (0.07)
                                            [0.16]   [0.16]
Confidence in College Science        0.92    -0.04    -0.10
                                            (0.02)   (0.05)
                                            [0.11]   [0.06]
Stress                               0.12     0.07     0.17
                                            (0.03)   (0.07)
                                            [0.02]   [0.01]
Grades in Science Courses            2.80    -0.12    -0.29
                                            (0.07)   (0.16)
                                            [0.08]   [0.07]
Grades in Other Courses              3.14    -0.07    -0.18
                                            (0.02)   (0.06)
                                            [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 6
Robustness Checks of Main ITT Results

Columns: (1) Control Group Complier Mean; (2) Main Results; (3) Robust SE; (4) p-value (permutation test); (5) Excluding High School 56; (6) Including Imputation of Missing Outcome Variables; (7) Excluding Covariates; (8) Excluding High School 23; (9) Lee Lower Bound; (10) Lee Upper Bound; (11) 95% Confidence Interval from Lee Bounds; (12) Ratio of 95% CI in (11) to 95% CI in (7).

Outcome                            (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)           (11)  (12)
Science Skill                    -0.19    0.09                    0.10    0.11    0.20    0.07    0.03    0.39  [-0.09, 0.51]   2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                     0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12  [-0.03, 0.18]   1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science     0.92   -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06    0.05  [-0.09, 0.10]   2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                            0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11  [-0.05, 0.15]   1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses         2.80   -0.12                   -0.06   -0.10   -0.07
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses           3.14   -0.07                   -0.07   -0.06   -0.03
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]

For the two grade outcomes, columns (8)-(12) are not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Column (8) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
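
The permutation test reported in column (4) can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure as implemented in the study: it uses an unadjusted difference in means rather than the regression in Equation (1), it reshuffles assignment across the whole sample rather than within School x Cohort cells, and the function names `diff_in_means` and `permutation_p_value` are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of random pseudo-treatment assignments whose absolute
    estimated effect exceeds the absolute actual estimate."""
    rng = random.Random(seed)
    actual = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # shuffling keeps the number treated fixed
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(diff_in_means(outcomes, labels)) > actual:
            exceed += 1
    return exceed / n_perm


# Hypothetical data: eight students, four assigned to treatment.
y = [0.1, -0.3, 0.4, 0.0, 0.6, 0.2, 0.9, 0.5]
d = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_p_value(y, d))
```

A small p-value from this procedure indicates that few random reassignments produce an effect as large as the one estimated under the actual assignment.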


1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the

Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the

effect of advanced high school courses more generally often without distinctions between AP

and other rigorous course options Nearly all of these nonexperimental studies find large positive

effects of rigorous secondary school courses particularly those in math and science on studentsrsquo

high school postsecondary and labor market performance (eg Altonji 1995 Attewell and

Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long

Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004) 3 The process by which a course is designated AP involves two steps Teachers who plan to offer

an AP course are encouraged (though not required) to attend a professional development

training The Board and other independent agencies offer several workshops with the most

extensive training being the AP summer institute a week-long training that is led by an

experienced AP instructor Teachers are then expected to develop their syllabi for the course and

submit them to the Board for review A team of auditors at the Board review each syllabus and

grant permission to a school to label the course as AP on course catalogs and student transcripts

once the syllabus has been approved Teachers are also able to revise and resubmit syllabi if they

do not meet the requirements upon original submission College Board (2017b) contains a

discussion of the annual ldquoAP Course Auditrdquo This review of syllabi is the only mechanism for

assessment (ie course delivery and student performance are not assessed by the Board) In

order to effectively run an AP Biology or Chemistry course teachers require access to a well-

equipped classroom and laboratory including all supplies necessary to engage in

experimentation (eg beakers solutions microscopes measuring equipment) Most of the

teachers in our study reported that their classrooms were well-stocked with supplies 4 In 2012 the Board began redesigning several AP courses and exams to prioritize depth of

learning over breadth of coverage and to place ldquogreater emphasis on discipline-specific inquiry

reasoning and communication skillsrdquo (College Board 2017a) The revisions to science courses

were based upon recommendations from the National Science Foundation the National Research

Council and science educators across the United States (National Research Council 2002) 5 Students perception of their own confidence and other personality traits are inherently

influenced by their frames of reference in ways that other assessments of these traits (eg

external observations) may be less influenced By increasing the standard to which they compare

themselves studentsrsquo confidence may decrease This feature of most self-assessments could be

considered a measurement error that biases treatment effects (Dobbie and Fryer Jr 2015 West et

al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends to some extent on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7 We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8 In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9 Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10 Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11 Though the study teachers are less likely to hold a master's degree, many of the graduate

degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12 We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13 Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14 Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15 Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16 Given the high degree of correlation between students' 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17 The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by

Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19 One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20 To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(CompletedSurvey_{ij} = 1) = \Phi(\mu_j + \mathbf{X}_i\rho + \epsilon_{ij})$, where $CompletedSurvey_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $\mathbf{X}_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + \mathbf{X}_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and with 90% of students receiving a weight less than 2.0.

21 We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a "multiple imputation, then deletion" strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22 Detailed course-by-course impacts are available from the authors.

23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24 Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum, with more homework, than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25 In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27 Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29 To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30 This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32 We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33 We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.

who assented to participate in the study and who obtained consent from their parent or guardian as study participants. Upon receipt of signed consent/assent forms, we randomly offered enrollment in the newly launched course to a subset of participating students.9 The study includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students per AP class).

Figure 1 shows the geographic distribution of the 11 participating districts, which are primarily concentrated in the western, southern, and eastern regions of the country.10 The underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern region has experienced less competition over the years in access to selective postsecondary institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long 2009). Relative to districts across the nation, those participating in the study tend to be in neighborhoods with lower levels of socioeconomic status and to educate students who score below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools tend to be larger and more likely to educate students who are eligible for free or reduced-price lunch, Black, and Hispanic than other schools (Panel A of Table 1).

There are two reasons for this over-representation of larger schools serving less economically prosperous communities. First, AP courses are already offered in the majority of the nation's public high schools, and schools that serve students from high-income families tend to offer more AP subjects than schools that serve students from lower-income families (Malkus 2016; Theokas and Saaris 2013). Given that our research design only allowed for schools that had not recently offered an AP science course, the population of schools from which we recruited tended to be those in settings with fewer resources. Second, participating schools were required to state that they believed they would have 60 or more students who were qualified to take the AP science course, and this requirement tended to disqualify smaller high schools.

Reflecting the school demographics, participating teachers are slightly younger, less experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our study teachers have five (two) or fewer years of teaching experience, which is more than double (triple) the rate of US high school science teachers. Study teachers are more likely to hold an undergraduate major in a STEM field than other high school science teachers, yet far less likely to hold a master's degree and slightly less likely to have earned a teaching credential in science. Most of the participating teachers had previously taught a higher-level course (mostly honors), yet only 47 percent of them had previously taught an AP course. Our research consequently applies to a population of teachers who are relatively new to the AP science curriculum and who have generally not received graduate training.11 Assuming AP courses improve with teacher preparation, our results likely capture the effect of a less-than-ideal version of AP and may result in less positive treatment effects than when AP is delivered by teachers with more training and experience (Clotfelter, Ladd, and Vigdor 2010).

B. Data and Student Descriptive Statistics

We rely on three primary and secondary data sources for impact estimates. The first is an assessment, developed and validated by the research team, that measures students' scientific inquiry skills. We administered this assessment to students in both treatment and control groups and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items that rely on science disciplinary knowledge that is taught in middle school, specifically material from Life Sciences and Physical Sciences. The assessment, which we administered to all study participants during a 45-minute period, measures students' skills in data analysis, scientific explanation, and scientific argument.12 Participating teachers were not provided copies of the instrument in advance; therefore, teachers were unable to teach any content material prior to test administration.

The second source is a questionnaire that we administered concurrently with the assessment and that asks students a number of questions about their most recent science class and their plans after high school. The assessment and questionnaire were completed together and administered outside of class (henceforth, we refer to these instruments as the "survey"). The third data source is students' high school transcripts, which contain data on demographic and socioeconomic background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high school completion. We use these data to determine the balance of randomization on pre-treatment covariates, estimate the effect of randomization on course-taking (including compliance), improve the precision of our estimates with statistical controls, and estimate treatment effects on students' grades.

Our survey response rate was 78 percent.13 Attrition can be attributed to student absences during the dates scheduled for survey administration and communication lapses between school coordinators and students. Students who were randomly assigned to treatment have a 9-percentage point higher survey response rate. Given the possibility of nonrandom sample attrition, we weight all regressions by the inverse of the probability of completing the survey, conditional on student characteristics.14 We implement a variety of robustness checks as additional means to account for nonresponse. These include multiple imputation of missing outcome variables, excluding one high school that had a low response rate, and using the Lee (2009) technique to provide bounds on the estimated effects. These methods and results are discussed below.
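
The inverse probability weighting described in this paragraph can be illustrated with a minimal sketch. It is not the study's estimation code: it assumes the response probabilities have already been estimated (the paper does this with a probit model), and the function names and toy numbers are hypothetical.

```python
def inverse_probability_weights(response_probs):
    """Return 1/p for each estimated probability of completing the survey."""
    weights = []
    for p in response_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
        weights.append(1.0 / p)
    return weights


def weighted_mean(values, weights):
    """Weighted average of an outcome among survey respondents."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical respondents: a binary outcome and each respondent's
# estimated probability of completing the survey (assumed values).
outcomes = [1.0, 0.0, 1.0, 1.0]
probs = [0.9, 0.5, 0.8, 0.25]

weights = inverse_probability_weights(probs)
ipw_mean = weighted_mean(outcomes, weights)
```

Respondents who resemble nonrespondents (low estimated response probability) receive larger weights, so they count for more in the weighted average than in a simple mean.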

We supplement these data with surveys that we administered online to teachers of the new AP courses at the conclusion of the course. The teacher survey includes questions about their educational background, professional experiences, and professional development; past and present instructional practices, generally and around science specifically; participation in the College Board AP training; ability to cover the content of the AP course; and coaching, mentoring, and other professional community supports received from the school, district, and education community.

Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the survey respondents, conditional on school by cohort fixed effects.15 Most of the estimated differences between treatment and control group students on pre-treatment observed characteristics are small, with some notable exceptions. In both the full and survey samples, treatment group students' reading exam scores were 0.10 and 0.09 standard deviations higher than those of control group students, both at p-values below 0.05. The magnitude of the treatment-control difference was slightly lower and less precisely estimated in math, yet also favored treatment group students.16 To adjust for these chance imbalances, we include all student covariates as predictors of outcomes in the models, and in the robustness checks, we exclude these covariates.17

Table 2 also shows the extent of differences between control group compliers and non-compliers. We find that non-compliers are generally much more academically prepared for AP science: they have higher pre-treatment reading and math test scores and are more likely to have completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian American and female.18

IV. Empirical Strategy

We estimate the effect of taking the AP science course with a standard instrumental variable specification:

(1) $Y_{ij} = \alpha_j + \hat{A}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$

(2) $AP_{ij} = \delta_j + Offered_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$

where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$; $\hat{A}_{ij}$ is the fitted value of $AP_{ij}$ based on the estimates of the parameters in Equation (2); $Offered_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates (including age; math and reading exam scores from 8th and 10th grade, standardized and averaged for math and reading separately; cumulative GPA prior to the year when the AP science course was offered; and indicator variables for female, racial group (Asian American, Black or Hispanic, Native American or Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price lunch, home language is not English, and took recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.19 We use two-stage least squares to estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given by $\beta$.

The intent to treat (ITT) estimate is obtained by replacing $\hat{A}_{ij}$ with $Offered_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on $Offered_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new AP science course and is a weighted average of effects on those who do and do not choose to enroll in the course:

(3) $Y_{ij} = \zeta_j + Offered_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
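
To make the relationship between the ITT and LATE estimates concrete, the sketch below computes both in the simplest possible setting: one binary instrument (the offer), no covariates, and no fixed effects. In that special case two-stage least squares reduces to the Wald ratio of the ITT effect to the first-stage effect. This is a simplification for illustration, not the covariate-adjusted specification used in the paper, and the data and function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def wald_estimates(outcome, enrolled, offered):
    """Return (ITT, first stage, LATE) for a binary offer with no covariates."""
    y_offered = [y for y, z in zip(outcome, offered) if z == 1]
    y_control = [y for y, z in zip(outcome, offered) if z == 0]
    d_offered = [d for d, z in zip(enrolled, offered) if z == 1]
    d_control = [d for d, z in zip(enrolled, offered) if z == 0]

    itt = mean(y_offered) - mean(y_control)          # effect of the offer
    first_stage = mean(d_offered) - mean(d_control)  # effect of offer on enrollment
    if first_stage == 0:
        raise ValueError("the offer does not shift enrollment; LATE is undefined")
    return itt, first_stage, itt / first_stage


# Hypothetical data: 4 students offered the course, 4 not offered.
offered = [1, 1, 1, 1, 0, 0, 0, 0]
enrolled = [1, 1, 1, 0, 0, 0, 0, 1]   # one no-show and one crossover
outcome = [3.0, 2.0, 3.0, 0.0, 1.0, 1.0, 1.0, 1.0]

itt, first_stage, late = wald_estimates(outcome, enrolled, offered)
```

Because compliance is imperfect, the first stage is below one and the LATE is larger in magnitude than the ITT.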

For outcomes that are obtained from the survey, we weight regressions by the inverse of the estimated probability of completing the survey.20 The results are similar without using these weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student characteristics as a result of either missing student transcripts or certain data elements not collected by the district, we use multiple imputation by chained equations, creating 10 imputed datasets, and combine the results.21 For inference, we cluster standard errors at the level of treatment assignment (school by cohort) in our analysis of main effects. In the analysis of robustness, we report permutation standard errors, robust standard errors (for comparison to permutations), and the statistical significance of the LATE estimates after adjusting our tests of significance for multiple comparisons.
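
The multiple-comparisons adjustment named in the notes is the Benjamini-Hochberg procedure. A minimal sketch of that step-up procedure follows; the p-values are hypothetical and are not the study's estimates, and the function name is an assumption for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected under BH."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four outcomes.
decisions = benjamini_hochberg([0.01, 0.02, 0.03, 0.50], alpha=0.05)
```

The procedure controls the false discovery rate rather than the familywise error rate, so it is less conservative than a Bonferroni adjustment.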

V. Results

A. Course-Taking and Treatment Contrast

Table 3 provides estimated effects of the randomized offer of enrollment on AP science course enrollment and share of credits in all courses for the full sample and the survey samples. The first-stage estimates indicate that the offer substantially increased the likelihood of the student taking the AP science course, by 38 percentage points in the full sample and 39 percentage points in the survey sample. As we expected, compliance with randomization was imperfect, with 42 percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team.

The remaining rows in Table 3 shed light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage point increase in the share of total credits in the full sample). Treatment group students' share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).22

Approximately 78 percent of the control group compliers took any science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%), and Anatomy/Physiology (9%).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our results also suggest that AP course-takers' classrooms were more likely to use technology (up 0.28 sd, p-value = 0.14).23 Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that, while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.24 This contrast between students' reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.25

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate that, for the typical complier, taking AP science raises objectively measured scientific inquiry skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers' interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent. These results are again more suggestive than definitive at traditional levels of statistical inference (p-value = 0.16).

Table 5 provides stronger evidence of negative treatment effects on students' confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology by signing the study assent forms. Taking AP science substantially lowered participants' likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students' self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students' self-reported confidence and stress levels. We find that taking AP science increases students' likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05) above the control group complier mean of 2 percent.

In addition to the loss in confidence and the increase in stress, treatment group students' grades suffered. We estimate that taking AP science reduced students' grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students' science GPAs during the study year (usually their junior year) from around a B- to a C+.26 This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students' grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students' share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students' overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 × 0.76)).

With our estimates in hand, we can easily compute the adjustment that would leave the student's GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students' grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).27
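
The grade arithmetic in the two preceding paragraphs can be reproduced with a short worked example. It uses only the rounded figures quoted in the text, so the resulting boost (about 1.47) differs slightly from the paper's 1.46, which presumably comes from unrounded estimates; the variable names are hypothetical.

```python
# Rounded estimates quoted in the text.
science_effect = -0.29   # effect on grades in science courses
other_effect = -0.18     # effect on grades in non-science courses
science_share = 0.24     # share of credits in science when taking AP science

# Credit-weighted average effect on overall grades during the study year.
overall_effect = (science_effect * science_share
                  + other_effect * (1 - science_share))

# Share of classes in any AP science subject (0.02 + 0.12 from Table 3).
ap_science_share = 0.02 + 0.12

# Grade boost on AP science courses that would offset the overall decline.
required_boost = -overall_effect / ap_science_share

print(round(overall_effect, 2), round(required_boost, 2))
```

The boost is large relative to the overall decline because it applies only to the roughly 14 percent of coursework that is AP science.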

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and Column (4) reports the results of a permutation test, where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).28 The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.29
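
The logic of the permutation test can be sketched as follows. This is a simplified illustration, not the study's procedure: it uses a raw difference in means as the effect estimate (the paper uses regression-based ITT estimates) and shuffles labels across the whole sample rather than within randomization strata. The function names, seed, and data are hypothetical.

```python
import random


def diff_in_means(outcome, treated):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    y1 = [y for y, t in zip(outcome, treated) if t == 1]
    y0 = [y for y, t in zip(outcome, treated) if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def permutation_p_value(outcome, treated, n_permutations=1000, seed=0):
    """Share of pseudo-treatment effects larger in magnitude than the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcome, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # reassign a pseudo treatment at random
        if abs(diff_in_means(outcome, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical data in which treated outcomes are far above control outcomes.
outcome = [10, 11, 12, 13, 0, 1, 2, 3]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
p_separated = permutation_p_value(outcome, treated)
```

Shuffling preserves the number of treated and control units, so each permutation is a draw from the assignments that could have occurred under the same design.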

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply-imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.30 Column (8) shows the results when we exclude all of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution should be considered in evaluating the effects from outcomes based on the study survey.31
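
The trimming step of the Lee (2009) bounds can be sketched as below. The sketch assumes the treatment group has the higher response rate (as in this study), uses unadjusted means with no covariates or weights, and rounds the trimmed share to a whole number of observations, which is a simplification. It does not compute the Imbens and Manski confidence intervals. All names and numbers are hypothetical.

```python
def lee_bounds(treated_outcomes, control_outcomes,
               treated_response_rate, control_response_rate):
    """Return (lower, upper) trimming bounds on the treatment effect."""
    if treated_response_rate < control_response_rate:
        raise ValueError("this sketch assumes the treated respond at a higher rate")

    # Share of treated respondents to trim so response rates match.
    trim_share = ((treated_response_rate - control_response_rate)
                  / treated_response_rate)
    n_trim = int(round(trim_share * len(treated_outcomes)))

    ordered = sorted(treated_outcomes)
    n_keep = len(ordered) - n_trim
    lower_sample = ordered[:n_keep]   # drop the highest outcomes
    upper_sample = ordered[n_trim:]   # drop the lowest outcomes

    control_mean = sum(control_outcomes) / len(control_outcomes)
    lower = sum(lower_sample) / len(lower_sample) - control_mean
    upper = sum(upper_sample) / len(upper_sample) - control_mean
    return lower, upper


# Hypothetical respondents and response rates.
lower, upper = lee_bounds(
    treated_outcomes=[1, 2, 3, 4, 5],
    control_outcomes=[2, 2, 3, 3],
    treated_response_rate=0.80,
    control_response_rate=0.64,
)
```

The gap between the two bounds grows with the difference in response rates, which is why differential attrition widens the range of effects consistent with the data.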

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science skill as the outcome. We find that the point estimates at every quantile are insignificantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.32

VI. Conclusion

Most admissions committees at bachelor's degree-granting institutions rely on applicants' AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students' skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools already offered AP science classes, and they tended to be in relatively higher-income communities disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest relative to the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion, as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.³³ We also find that AP science classes substantially increase students' stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult given that few schools document per-course expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates of the impact of a curriculum that is now offered in the majority of the nation's high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence that supports some of the claims made about the AP program and refutes others. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes, and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance, on college admissions decisions as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. "When Should You Adjust Standard Errors for Clustering?" NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. "Psychological Insights for Improved Physics Teaching." Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. "The Effects of High School Curriculum on Education and Labor Market Outcomes." The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. "Coping Behaviors as Intervening Mechanisms in the Inverted-U Stress-Performance Relationship." Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. "Raising the Bar: Curricular Intensity and Academic Performance." Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. "Shifting College Majors in Response to Advanced Placement Exam Scores." Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. "Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching." International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. "Demoting Advanced Placement." The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. "Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students' Accuracy of Feeling of Confidence." Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. "Playing the Admissions Game: Student Reactions to Increasing College Competition." The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. "Maryland Schools have been Leader in Advanced Placement, but Results are Mixed." The Baltimore Sun, August 17.

Bush, George W. 2006. "State of the Union Address by the President." Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. "Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries." Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. "Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects." Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.


Davis, Jennifer R. 2014. "A Little Goes a Long Way: Pressure for College Students to Succeed." Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. "The Medium-Term Impacts of High-Achieving Charter Schools." Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. "Preparation Matters." National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. "50-State Comparison: Advanced Placement Policies." Education Commission of the States.

Drew, Christopher. 2011. "Rethinking Advanced Placement." The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. "Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?" Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. "Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit." PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. "Students' Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs." Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. "The Role of Advanced Placement and Honors Courses in College Admissions." Center for Studies in Higher Education Research and Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. "The Labor of Division: Returns to Compulsory Math Coursework." Unpublished Manuscript.

Harel, O. 2009. "The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation." Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. "Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data." Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. "The Impact of Advanced Placement Incentive Programs." Center for Evaluation and Education Policy, Indiana University. Education Policy Brief 8 (1).

Hopkins, Katy. 2012. "Weigh the Benefits, Stress of AP Courses for Your Student." U.S. News & World Report, May 10.

Huber, Martin. 2013. "A Simple Test for the Ignorability of Non-compliance in Experiments." Economics Letters 120 (3): 389–391.

Imbens, Guido W., and Charles F. Manski. 2004. "Confidence Intervals for Partially Identified Parameters." Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. "A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program." Journal of Human Resources 45 (3): 591–639.

__________. 2014. "Do College-Preparatory Programs Improve Long-Term Outcomes?" Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. "Is there a Causal Effect of High School Math on Labor Market Outcomes?" Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. "AP Classes often Translate to Advanced Pressure." Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. "Do Grade Weights Promote More Advanced Course-Taking?" Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. "The Link Between Advanced Placement Experience and Early College Success." Southern Economic Journal 75 (3): 873–891.

__________. 2010. "Advanced Placement Participation: Evaluating the Policies of States and Colleges." In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. "The Case of Carla: Dilemmas of Helping All Students to Understand Science." Science Education 86 (3): 287–313.

Lee, David S. 2009. "Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects." The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. "Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines." Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. "The Benefit of Additional High School Math and Science Classes for Young Men and Women." Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. "Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender." Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. "Effects of High School Course-taking on Secondary and Postsecondary Success." American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. "Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?" Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. "AP at Scale: Public School Students in Advanced Placement, 1990–2013." American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. "Are AP Courses Worth the Stress?" WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. "The Condition of Education 2018 (NCES 2018-144): Public High School Graduation Rates." U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. "Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools." Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. "STEM for All." February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. "Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures." Version 1.0. Stanford University.

Rose, Heather. 2004. "Has Curriculum Closed the Test Score Gap in Math?" Topics in Economic Analysis & Policy 4 (1): 1–30.


Rose, Heather, and Julian R. Betts. 2004. "The Effect of High School Courses on Earnings." The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. "Breaking Down School Budgets." Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. "The Role of Advanced High School Coursework in Increasing STEM Career Interest." Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. "Accounting for Advanced High School Coursework in College Admission Decisions." College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. "Measuring Students' Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation." Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. "Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes." Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. "Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence." Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. "Confidence Judgments in Studies of Individual Differences." Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. "Overconfidence Across World Regions." Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. "Many Teachers in Advanced Placement Voice Concern at its Rapid Growth." The New York Times, April 29.

Tai, Robert H. 2008. "Posing Tougher Questions about the Advanced Placement Program." Liberal Education 94 (3): 38–43.

The IB Programme. 2016. "The IB Diploma Programme Statistical Bulletin."

Theokas, Christina, and Reid Saaris. 2013. "Finding America's Missing AP and IB Students." Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. "Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look." (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. "AP Classes are a Scam." Atlantic, October 13.

Tucker, Jill. 2012. "Stressful AP Courses: A Push for a Cap." SF Gate.

U.S. Department of Education. 2014. "Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests." Washington, DC.

Weinstein, Paul. 2016. "Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement." Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. "Promise and Paradox: Measuring Students' Non-Cognitive Skills and the Impact of Schooling." Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. "The Relation of Strength of Stimulus to Rapidity of Habit-Formation." Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts' Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district's neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor's degree or higher, poverty rate, SNAP rate, single-mother-headed household rate, and unemployment rate. Y-axis is the district's average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district's enrollment. The dashed line is a lowess curve created using Stata's default settings and roughly shows the predicted test score as a function of the neighborhood's SES.


Figure 3

Intent-to-Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pre-treatment characteristics and school-by-cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                  Participating    Others
Average Enrollment                                        1,409       723
Free or Reduced-Price Lunch                               0.700     0.438
Asian                                                     0.055     0.050
Black                                                     0.349     0.154
Hispanic                                                  0.410     0.221
White                                                     0.164     0.537
Adjusted Cohort Graduation Rate                           0.843     0.802
District Instruction Expenditures Per Pupil              $6,561    $5,636
District Student Services Expenditures Per Pupil         $3,787    $3,385

Panel B: Teachers                                 Participating    Others
Age: Under 30                                             0.407     0.160
Age: 30-49                                                0.432     0.553
Age: 50 or over                                           0.161     0.287
Female                                                    0.630     0.536
Hispanic or Latino                                        0.111     0.051
Race: American Indian or Alaska Native                    0.000     0.009
Race: Asian American                                      0.111     0.041
Race: Black                                               0.111     0.060
Race: Native Hawaiian or other Pacific Islander           0.000     0.004
Race: White                                               0.778     0.896
Years of Experience                                        10.3      13.2
Years of Experience <=2                                   0.290     0.085
Years of Experience <=5                                   0.481     0.234
Hold a Teaching Certificate                               0.926     0.945
Undergraduate Major in STEM                               0.944     0.747
Single Subject Credential in Science                      0.630     0.823
Master's Degree or Higher                                 0.356     0.615
Previously Taught AP Course                               0.469        NA
Previously Taught AP, IB, or Honors Course                0.796        NA
Number of Professional Development Trainings               3.09        NA
  in the Past 5 years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). "Others" in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and the 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). "Others" in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
Columns (1) and (4): Control Group Mean. Columns (2) and (5): Difference Between Treated and Controls. Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers.

Pre-Treatment Characteristic                      (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                  16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                  0.38     0.08     0.25     0.44     0.07     0.30
                                                        (0.04)   (0.10)            (0.05)   (0.16)
                                                        [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                               0.29     0.10     0.18     0.36     0.09     0.17
                                                        (0.03)   (0.12)            (0.04)   (0.17)
                                                        [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                           3.16     0.05     0.20     3.23     0.06     0.13
                                                        (0.03)   (0.08)            (0.03)   (0.10)
                                                        [0.14]   [0.02]            [0.06]   [0.20]
Female                                           0.59     0.00     0.10     0.61    -0.01     0.11
                                                        (0.03)   (0.06)            (0.04)   (0.07)
                                                        [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                   0.12     0.02     0.10     0.12     0.03     0.10
                                                        (0.02)   (0.05)            (0.01)   (0.07)
                                                        [0.27]   [0.06]            [0.07]   [0.12]
Black                                            0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                        (0.02)   (0.06)            (0.02)   (0.05)
                                                        [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial        0.31     0.01     0.05     0.33     0.01     0.05
                                                        (0.02)   (0.06)            (0.02)   (0.07)
                                                        [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                         0.02     0.00    -0.01     0.01     0.00    -0.01
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.93]   [0.24]            [0.57]    [0.5]
Gifted                                           0.13     0.03     0.00     0.14     0.02     0.01
                                                        (0.02)   (0.05)            (0.02)   (0.09)
                                                        [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                         0.05     0.01     0.02     0.04     0.01     0.04
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch         0.53     0.01     0.02     0.51     0.01     0.07
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home       0.34     0.02     0.03     0.35     0.01     0.04
                                                        (0.02)   (0.07)            (0.02)   (0.07)
                                                        [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses            0.79     0.00     0.09     0.79     0.02     0.05
                                                        (0.02)   (0.04)            (0.02)   (0.05)
                                                        [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                          1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
Columns (1) and (4): Control Group Mean. Columns (2) and (5): ITT. Columns (3) and (6): LATE.

Outcome                                           (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment                   0.19     0.38              0.24     0.39
                                                        (0.05)                     (0.06)
                                                        [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                     0.03     0.04     0.11     0.03     0.04     0.10
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                         0.13     0.04     0.11     0.14     0.04     0.10
                                                        (0.01)   (0.02)            (0.01)   (0.02)
                                                        [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                         0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                        (0.01)   (0.02)            (0.01)   (0.02)
                                                        [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                             0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                                0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                        (0.01)   (0.02)            (0.01)   (0.02)
                                                        [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                                    0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                                        (0.01)   (0.03)            (0.01)   (0.03)
                                                        [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                          1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information was collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 4

Treatment Contrast (Composite Variables)

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                                               (1)      (2)      (3)
Academically Challenging Curriculum                 -0.33     0.31     0.80
                                                            (0.10)   (0.24)
                                                            [0.00]   [0.00]
Project-Based Independent Classroom Activities      -0.06     0.13     0.33
                                                            (0.07)   (0.17)
                                                            [0.07]   [0.06]
Integrated Use of Technology                        -0.11     0.11     0.28
                                                            (0.08)   (0.19)
                                                            [0.19]   [0.14]
Number of Observations                              1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
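The composite construction described in these notes can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the five-point agreement scale is taken from the example in the notes, every component is assumed to use that same scale, and standardization is assumed to use the population standard deviation across the students supplied. The names `LIKERT`, `rescale`, and `composite_scores` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List

# Assumed response scale, ordered from lowest to highest category.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response: str, categories: List[str] = LIKERT) -> float:
    """Map a response to [0, 1]: lowest category 0.0, highest 1.0,
    remaining categories evenly spaced in between."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses: List[List[str]]) -> List[float]:
    """Average each student's rescaled components, then standardize the
    averages to mean 0 and SD 1 across students."""
    raw = [mean(rescale(r) for r in student) for student in responses]
    mu, sigma = mean(raw), pstdev(raw)
    if sigma == 0:
        raise ValueError("no variation across students; cannot standardize")
    return [(x - mu) / sigma for x in raw]
```

For example, a student answering "strongly agree" and "agree" on two components has a raw average of 0.875 before standardization.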


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                                               (1)      (2)      (3)
Science Skill                                       -0.19     0.09     0.23
                                                            (0.06)   (0.16)
                                                            [0.15]   [0.14]
STEM Interest                                        0.62     0.04     0.09
                                                            (0.02)   (0.07)
                                                            [0.16]   [0.16]
Confidence in College Science                        0.92    -0.04    -0.10
                                                            (0.02)   (0.05)
                                                            [0.11]   [0.06]
Stress                                               0.12     0.07     0.17
                                                            (0.03)   (0.07)
                                                            [0.02]   [0.01]
Grades in Science Courses                            2.80    -0.12    -0.29
                                                            (0.07)   (0.16)
                                                            [0.08]   [0.07]
Grades in Other Courses                              3.14    -0.07    -0.18
                                                            (0.02)   (0.06)
                                                            [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 6

Robustness Checks of Main ITT Results

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)

Column headings (in the order extracted): Outcome; Control Group Complier Mean; Main Results; Robust SE; p-value (permutation test); Excluding High School 56; Including Imputation of Missing Outcome Variables; Excluding Covariates; Excluding High School 23; Lee Lower Bound; Lee Upper Bound; 95 Percent Confidence Interval from Lee Bounds; Ratio of 95 Percent CI in (11) to 95 Percent CI in (7)

Science Skill -019 009 010 011 020 007 003 039

-

009

05

1 20

(006) (005) (000) (000) (000) (000) (007) (007)

[015] [006] [006] [020] [011] [001] [024] [072] [000]

STEM Interest 062 004 005 003 003 003 002 012

-

003

01

8 19

(002) (003) (000) (000) (000) (000) (003) (004)

[016] [019] [020] [009] [029] [027] [019] [060] [000] Confidence in College

Science 092 -004 -003 -006 -006 -004 -006 005

-

009

01

0 20

(002) (002) (000) (000) (000) (000) (002) (003)

[011] [005] [007] [037] [002] [003] [010] [000] [017]

Stress 012 007 005 006 008 007 001 011

-

005

01

5 16

(003) (002) (000) (000) (000) (000) (003) (002)

[002] [000] [000] [014] [007] [002] [002] [079] [000]

Grades in Science Courses 280 -012 -006 -010 -007 |

(007) (004) (000) (000) (000)

[008] [001] [001] [031] [016] [030] Not applicable as grades are based on transcripts

Grades in Other Courses 314 -007 -007 -006 -003 rather than student survey

(002) (003) (000) (000) (000) |

[000] [001] [001] [000] [001] [038]

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP Biology and AP Chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95 percent confidence interval from Lee bounds (applying Imbens and Manski's (2004) method to derive a confidence interval for the treatment effect itself).
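The permutation test described in the notes for column (4) can be sketched in a few lines. The sketch rests on simplifying assumptions that differ from the paper's specification: the treatment effect is a raw difference in means rather than the regression estimate with covariates and School x Cohort fixed effects, and the pseudo treatment is reshuffled across the whole sample rather than within randomization blocks. The function names and the `seed` parameter are hypothetical.

```python
import random
from typing import List


def diff_in_means(outcome: List[float], treat: List[int]) -> float:
    """Mean outcome of units with treat == 1 minus mean of units with treat == 0."""
    treated = [y for y, t in zip(outcome, treat) if t == 1]
    control = [y for y, t in zip(outcome, treat) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def permutation_p_value(
    outcome: List[float],
    treat: List[int],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Share of pseudo-treatment draws whose absolute estimated effect
    exceeds the absolute value of the actual estimated effect."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcome, treat))
    pseudo = list(treat)  # copy so the caller's assignment is not mutated
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pseudo)  # reassign treatment labels at random
        if abs(diff_in_means(outcome, pseudo)) > observed:
            exceed += 1
    return exceed / n_perm
```

Shuffling the existing labels keeps the number of treated and control units fixed in every draw, which is one common way to generate a pseudo treatment; the paper does not state which scheme was used.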


1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the

Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the

effect of advanced high school courses more generally often without distinctions between AP

and other rigorous course options Nearly all of these nonexperimental studies find large positive

effects of rigorous secondary school courses particularly those in math and science on studentsrsquo

high school postsecondary and labor market performance (eg Altonji 1995 Attewell and

Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long

Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004) 3 The process by which a course is designated AP involves two steps Teachers who plan to offer

an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of

learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5 Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students' confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7 We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the


school to assist in collecting consent/assent forms and data.

8 In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9 Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10 Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11 Though the study teachers are less likely to hold a master's degree, many of the graduate

degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12 We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13 Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14 Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15 Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16 Given the high degree of correlation between students' 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17 The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather


than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19 One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20 To compute these weights, we first estimate the parameters of the following equation using a

probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + \mathbf{X}_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1 if student i in school by cohort j completed any part of the end-of-year survey, $\mathbf{X}_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + \mathbf{X}_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21 We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22 Detailed course-by-course impacts are available from the authors.

23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24 Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum, with more homework, than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did


treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.

25 In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27 Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29 To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30 This school had a 27 percent response rate, partially as a result of all of the completed surveys

from the first cohort of administration being lost in transit after they were completed.

31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32 We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33 We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.


participants during a 45-minute period, measures students' skills in data analysis, scientific explanation, and scientific argument.¹² Participating teachers were not provided copies of the instrument in advance; therefore, teachers were unable to teach any content material prior to test administration.

The second source is a questionnaire that we administered concurrently with the assessment and that asks students a number of questions about their most recent science class and their plans after high school. The assessment and questionnaire were completed together and administered outside of class (henceforth, we refer to these instruments as the "survey"). The third data source is students' high school transcripts, which contain data on demographic and socioeconomic background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high school completion. We use these data to determine the balance of randomization on pre-treatment covariates, estimate the effect of randomization on course-taking (including compliance), improve the precision of our estimates with statistical controls, and estimate treatment effects on students' grades.

Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences during the dates scheduled for survey administration and communication lapses between school coordinators and students. Students who were randomly assigned to treatment have a 9-percentage point higher survey response rate. Given the possibility of nonrandom sample attrition, we weight all regressions by the inverse of the probability of completing the survey, conditional on student characteristics.¹⁴ We implement a variety of robustness checks as additional means to account for nonresponse. These include multiple imputation of missing outcome variables, excluding one high school that had a low response rate, and using the Lee (2009) technique to provide bounds on the estimated effects. These methods and results are discussed below.

We supplement these data with surveys that we administered online to teachers of the new AP courses at the conclusion of the course. The teacher survey includes questions about their educational background, professional experiences, and professional development; past and present instructional practices, generally and around science specifically; participation in the College Board AP training; ability to cover the content of the AP course; and coaching, mentoring, and other professional community supports received from the school, district, and education community.

Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the survey respondents, conditional on school by cohort fixed effects.¹⁵ Most of the estimated differences between treatment and control group students on pre-treatment observed characteristics are small, with some notable exceptions. In both the full and survey samples, treatment group students' reading exam scores were 0.10 and 0.09 standard deviations higher than control group students, both at p-values below 0.05. The magnitude of the treatment-control difference was slightly lower and less precisely estimated in math, yet also favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student covariates as predictors of outcomes in the models, and in the robustness checks we exclude these covariates.¹⁷

Table 2 also shows the extent of differences between control group compliers and non-compliers. We find that non-compliers are generally much more academically prepared for AP science: they have higher pre-treatment reading and math test scores and are more likely to have completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian American and female.¹⁸


IV. Empirical Strategy

We estimate the effect of taking the AP science course with a standard instrumental variable specification:

(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$

(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$

where $AP_{ij} = 1$ if student i enrolled in the AP science course in school-by-cohort stratum j; $\widehat{AP}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2); $\mathit{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates (including age; math and reading exam scores from 8th and 10th grade (standardized and averaged for math and reading separately); cumulative GPA prior to the year when the AP science course was offered; and indicator variables for female, racial group (Asian American, Black or Hispanic, Native American or Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price lunch, home language is not English, and took recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.¹⁹ We use two-stage least squares to estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given by $\beta$.

The intent to treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new AP science course and is a weighted average of effects on those who do and do not choose to enroll in the course.

(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
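To illustrate how the ITT and LATE estimates relate, the following minimal sketch computes the ITT, the first stage, and their ratio on a small invented dataset. It assumes a binary offer, no covariates, no fixed effects, and no survey weights, in which case two-stage least squares reduces to the Wald ratio. The function name `wald_late` and the toy data are illustrative assumptions, not the estimation code used in the study.

```python
from statistics import fmean


def wald_late(outcomes, took_ap, offered):
    """Return (itt, first_stage, late) for a binary offer with no covariates.

    Hypothetical helper for illustration only: it ignores the covariates,
    school-by-cohort fixed effects, and weights used in the paper.
    """
    def group_mean(values, flag):
        return fmean(v for v, z in zip(values, offered) if z == flag)

    # ITT: effect of being offered the course on the outcome.
    itt = group_mean(outcomes, 1) - group_mean(outcomes, 0)
    # First stage: effect of the offer on actually taking the course.
    first_stage = group_mean(took_ap, 1) - group_mean(took_ap, 0)
    # With a single binary instrument, 2SLS equals this ratio.
    return itt, first_stage, itt / first_stage


# Invented data: four students offered the course, four not offered.
offered = [1, 1, 1, 1, 0, 0, 0, 0]
took_ap = [1, 1, 1, 0, 1, 0, 0, 0]   # one no-show, one crossover
outcomes = [3.0, 3.0, 3.0, 1.0, 3.0, 1.0, 1.0, 1.0]

itt, first_stage, late = wald_late(outcomes, took_ap, offered)
print(itt, first_stage, late)  # 1.0 0.5 2.0
```

With covariates and school by cohort fixed effects included, the estimates would come from a full two-stage least squares routine rather than this simple ratio.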

For outcomes that are obtained from the survey, we weight regressions by the inverse of the estimated probability of completing the survey.²⁰ The results are similar without using these weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student characteristics as a result of either missing student transcripts or certain data elements not collected by the district, we use multiple imputation by chained equations, creating 10 imputed datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of treatment assignment (school by cohort) in our analysis of main effects. In the analysis of robustness, we report permutation standard errors, robust standard errors (for comparison to permutations), and the statistical significance of the LATE estimates after adjusting our tests of significance for multiple comparisons.
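The weighting step can be sketched as follows. The sketch assumes that a fitted probit index is already available for each survey respondent (the values below are invented) and shows only how an index is turned into an inverse probability weight and then used in a weighted mean. The helper names are hypothetical, and fitting the probit itself is outside the scope of the example.

```python
from statistics import NormalDist


def inverse_probability_weights(index_values):
    """Convert fitted probit index values into weights 1 / Phi(index)."""
    phi = NormalDist().cdf  # standard normal cumulative distribution
    return [1.0 / phi(v) for v in index_values]


def weighted_mean(values, weights):
    """Mean of `values` in which each observation counts `weights[i]` times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Invented fitted index values for three respondents.
fitted_index = [0.0, 1.0, 2.0]
weights = inverse_probability_weights(fitted_index)

# A respondent with a 50 percent predicted response probability
# (index 0) receives a weight of 2; likelier respondents receive less.
outcomes = [1.0, 2.0, 3.0]
print(weights[0], weighted_mean(outcomes, weights))
```

Respondents who resemble nonrespondents on observed characteristics have lower predicted response probabilities and therefore count more heavily, which is the intent of the reweighting.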

V. Results

A. Course-Taking and Treatment Contrast

Table 3 provides estimated effects of the randomized offer of enrollment on AP science course enrollment and share of credits in all courses for the full sample and the survey samples. The first-stage estimates indicate that the offer substantially increased the likelihood of the student taking the AP science course, by 38 percentage points in the full sample and 39 percentage points in the survey sample. As we expected, compliance with randomization was imperfect, with 42


percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team.

The remaining rows in Table 3 shed light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage point increase in the share of total credits in the full sample). Treatment group students' share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).²²

Approximately 78 percent of the control group compliers took any science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%), and Anatomy/Physiology (9%).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our results also suggest that AP course-takers' classrooms were more likely to use technology (up 0.28 sd, p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that, while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.²⁴ This contrast between students' reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.²⁵

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate that, for the typical complier, taking AP science raises objectively measured scientific inquiry skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with


conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers' interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent, with again more suggestive than definitive results at traditional levels of statistical inference (p-value = 0.16).

Table 5 provides stronger evidence of negative treatment effects on students' confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology as a result of signing the study assent forms. Taking AP science substantially lowered participants' likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students' self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students' self-reported confidence and stress levels. We find that taking AP science increases students' likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05), above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group students' grades suffered. We estimate that taking AP science reduced students' grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students' science GPAs during the study year (usually their junior year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students' grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students' share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students' overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 × 0.76)).

With our estimates in hand, we can easily compute the adjustment that would leave the student's GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students' grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).²⁷
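The arithmetic behind the overall grade effect and the offsetting boost can be checked directly. The snippet below uses only the rounded estimates quoted in the text, so the implied boost (about 1.47) differs slightly from the reported 1.46; the likely explanation, which is an assumption rather than something the text states, is that the reported figure rests on unrounded estimates. Variable names are illustrative.

```python
# Rounded estimates quoted in the text.
science_effect = -0.29   # effect on grades in science courses
other_effect = -0.18     # effect on grades in non-science courses
science_share = 0.24     # share of credits in science when taking AP science

# Credit-weighted average effect on overall grades during the study year.
overall_effect = (science_effect * science_share
                  + other_effect * (1 - science_share))

# Share of classes in any AP science subject (0.02 + 0.12 from Table 3).
ap_science_share = 0.02 + 0.12

# Grade boost on AP science courses that would leave the GPA unchanged.
required_boost = -overall_effect / ap_science_share

print(round(overall_effect, 2))   # -0.21
print(round(required_boost, 2))   # 1.47
```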

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and


Column (4) reports the results of a permutation test, where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.²⁹
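The logic of the permutation test can be sketched in a few lines. This is a simplified illustration under stated assumptions: the test statistic is a raw difference in means rather than a regression coefficient, pseudo treatment is reshuffled across all students rather than within school by cohort strata, and the data are invented. The function names are hypothetical.

```python
import random
from statistics import fmean


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d == 1]
    c = [y for y, d in zip(outcomes, treated) if d == 0]
    return fmean(t) - fmean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of permutations whose |pseudo effect| exceeds |observed effect|."""
    rng = random.Random(seed)          # seeded for reproducibility
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)             # copy so the input is not modified
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)            # assign a pseudo treatment
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Invented data in which treated outcomes are clearly higher.
outcomes = [5.0, 6.0, 7.0, 8.0, 9.0, 0.0, 1.0, 2.0, 3.0, 4.0]
treated = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(permutation_p_value(outcomes, treated))
```

Because shuffling preserves the number of treated units, each pseudo assignment has the same group sizes as the actual assignment.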

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply-imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution should be considered in evaluating the effects from outcomes based on the study survey.³¹
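The trimming step of the Lee (2009) bounds can be illustrated with a short sketch. It assumes, as in the case described above, that the treatment group has the higher response rate, and it ignores covariates, survey weights, and the Imbens and Manski (2004) confidence interval. The function name and the data are invented for illustration.

```python
from statistics import fmean


def lee_bounds(treat_outcomes, control_outcomes,
               n_treat_assigned, n_control_assigned):
    """Return (lower, upper) trimmed differences in mean outcomes.

    `treat_outcomes` and `control_outcomes` hold respondents only; the
    `n_*_assigned` arguments count everyone randomized into each group.
    """
    rate_t = len(treat_outcomes) / n_treat_assigned
    rate_c = len(control_outcomes) / n_control_assigned
    if rate_t < rate_c:
        raise ValueError("this sketch only trims the treatment group")

    # Share of treatment respondents to trim so response rates match.
    trim_share = (rate_t - rate_c) / rate_t
    n = len(treat_outcomes)
    k = round(trim_share * n)

    ordered = sorted(treat_outcomes)
    control_mean = fmean(control_outcomes)
    lower = fmean(ordered[: n - k]) - control_mean  # drop the k highest
    upper = fmean(ordered[k:]) - control_mean       # drop the k lowest
    return lower, upper


# Invented example: all 10 treated students respond, 8 of 10 controls do.
treat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
control = [1, 2, 3, 4, 5, 6, 7, 8]
print(lee_bounds(treat, control, 10, 10))  # (0.0, 2.0)
```

In this toy case, two treatment respondents are trimmed; removing the two highest values gives the lower bound and removing the two lowest gives the upper bound.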

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression conditional on $\mathbf{X}_i$ with science skill as the outcome. We find that the point estimates at every quantile are not significantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95% confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.³²

VI. Conclusion

Most admissions committees at bachelor's degree-granting institutions rely on applicants' AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students' skills,

14

interests and beliefs We recruited 23 schools that had not previously offered AP Biology or

Chemistry and were willing to permit us to randomize student access to the newly offered

course At the time of our school recruitment an estimated 50 percent of US high schools

already offered AP science classes and they tended to be in relatively higher-income

communities disproportionately serving White students (Malkus 2016) Our study drew from the

remaining population of schools where teachers had lower levels of training than science

teachers nationally and students were disproportionately non-White and poor Consequently our

results on AP impacts best generalize to schools like these that are on the cusp of deciding

whether to offer an AP science course

The estimates suggest that AP science led to improvements in science skill and STEM interest above the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.33 We also find that AP science classes substantially increase students’ stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult given that few schools document per-course expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course assignments offers a partial cost analysis. Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation’s high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance, on college admissions decisions as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.

References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should you Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.

Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research & Occasional Paper Series: CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.

Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11, First Look” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F. O. Gabrieli, and John D. E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.

Figure 1

Geographic Distribution of Participating Districts

Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.

Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.

Table 1
Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                      Participating   Others
Average Enrollment                                    1,409           723
Free or Reduced-Price Lunch                           0.700           0.438
Asian                                                 0.055           0.050
Black                                                 0.349           0.154
Hispanic                                              0.410           0.221
White                                                 0.164           0.537
Adjusted Cohort Graduation Rate                       0.843           0.802
District Instruction Expenditures Per Pupil           $6,561          $5,636
District Student Services Expenditures Per Pupil      $3,787          $3,385

Panel B: Teachers                                     Participating   Others
Age: Under 30                                         0.407           0.160
Age: 30-49                                            0.432           0.553
Age: 50 or over                                       0.161           0.287
Female                                                0.630           0.536
Hispanic or Latino                                    0.111           0.051
Race: American Indian or Alaska Native                0.000           0.009
Race: Asian American                                  0.111           0.041
Race: Black                                           0.111           0.060
Race: Native Hawaiian or other Pacific Islander       0.000           0.004
Race: White                                           0.778           0.896
Years of Experience                                   10.3            13.2
Years of Experience <=2                               0.290           0.085
Years of Experience <=5                               0.481           0.234
Hold a Teaching Certificate                           0.926           0.945
Undergraduate Major in STEM                           0.944           0.747
Single Subject Credential in Science                  0.630           0.823
Master’s Degree or Higher                             0.356           0.615
Previously Taught AP Course                           0.469           NA
Previously Taught AP, IB, or Honors Course            0.796           NA
Number of Professional Development Trainings          3.09            NA
  in the Past 5 years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd/; EDFacts, https://www2.ed.gov/about/inits/ed/edfacts/index.html. “Others” in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey, https://nces.ed.gov/surveys/sass/. “Others” in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.

Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                     (1)     (2)     (3)     (4)     (5)     (6)
Age as of October of 11th Grade                 16.6   -0.03   -0.07    16.6   -0.01   -0.01
                                                      (0.02)  (0.07)          (0.03)  (0.09)
                                                      [0.19]  [0.35]          [0.65]  [0.94]
Math Exam Score                                 0.38    0.08    0.25    0.44    0.07    0.30
                                                      (0.04)  (0.10)          (0.05)  (0.16)
                                                      [0.08]  [0.02]          [0.17]  [0.06]
Reading Exam Score                              0.29    0.10    0.18    0.36    0.09    0.17
                                                      (0.03)  (0.12)          (0.04)  (0.17)
                                                      [0.00]  [0.14]          [0.02]  [0.31]
HS Grade Point Average                          3.16    0.05    0.20    3.23    0.06    0.13
                                                      (0.03)  (0.08)          (0.03)  (0.10)
                                                      [0.14]  [0.02]          [0.06]  [0.20]
Female                                          0.59    0.00    0.10    0.61   -0.01    0.11
                                                      (0.03)  (0.06)          (0.04)  (0.07)
                                                      [0.99]  [0.10]          [0.73]  [0.12]
Asian American                                  0.12    0.02    0.10    0.12    0.03    0.10
                                                      (0.02)  (0.05)          (0.01)  (0.07)
                                                      [0.27]  [0.06]          [0.07]  [0.12]
Black                                           0.32   -0.02   -0.06    0.27    0.00   -0.05
                                                      (0.02)  (0.06)          (0.02)  (0.05)
                                                      [0.29]  [0.28]          [0.88]  [0.40]
Hispanic, Native American, or Multiracial       0.31    0.01    0.05    0.33    0.01    0.05
                                                      (0.02)  (0.06)          (0.02)  (0.07)
                                                      [0.55]  [0.41]          [0.81]  [0.51]
Disabled                                        0.02    0.00   -0.01    0.01    0.00   -0.01
                                                      (0.01)  (0.01)          (0.01)  (0.01)
                                                      [0.93]  [0.24]          [0.57]   [0.5]
Gifted                                          0.13    0.03    0.00    0.14    0.02    0.01
                                                      (0.02)  (0.05)          (0.02)  (0.09)
                                                      [0.06]  [1.00]          [0.25]  [0.89]
English Language Learner                        0.05    0.01    0.02    0.04    0.01    0.04
                                                      (0.01)  (0.02)          (0.01)  (0.03)
                                                      [0.41]  [0.39]          [0.54]  [0.22]
Eligible for Free or Reduced-Price Lunch        0.53    0.01    0.02    0.51    0.01    0.07
                                                      (0.02)  (0.07)          (0.03)  (0.09)
                                                      [0.66]  [0.77]          [0.72]  [0.45]
Language Other than English Spoken at Home      0.34    0.02    0.03    0.35    0.01    0.04
                                                      (0.02)  (0.07)          (0.02)  (0.07)
                                                      [0.32]  [0.73]          [0.59]  [0.56]
Took Recommended Prerequisite Courses           0.79    0.00    0.09    0.79    0.02    0.05
                                                      (0.02)  (0.04)          (0.02)  (0.05)
                                                      [0.84]  [0.04]          [0.43]  [0.31]
Number of Observations                         1,819                   1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.

Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4): Control Group Mean
(2), (5): ITT
(3), (6): LATE

Outcome                                          (1)     (2)     (3)     (4)     (5)     (6)
AP Treatment Course Enrollment                  0.19    0.38            0.24    0.39
                                                      (0.05)                  (0.06)
                                                      [0.00]                  [0.00]
Share of Credits During Study Year in:
  AP Science                                    0.03    0.04    0.11    0.03    0.04    0.10
                                                      (0.01)  (0.01)          (0.01)  (0.01)
                                                      [0.00]  [0.00]          [0.00]  [0.00]
  All AP                                        0.13    0.04    0.11    0.14    0.04    0.10
                                                      (0.01)  (0.02)          (0.01)  (0.02)
                                                      [0.00]  [0.00]          [0.00]  [0.00]
  Other Advanced Science                        0.06   -0.01   -0.02    0.06   -0.01   -0.02
                                                      (0.01)  (0.02)          (0.01)  (0.02)
                                                      [0.24]  [0.23]          [0.20]  [0.20]
  All Other Advanced                            0.25   -0.01   -0.03    0.25   -0.01   -0.03
                                                      (0.01)  (0.02)          (0.01)  (0.03)
                                                      [0.23]  [0.23]          [0.30]  [0.30]
  Regular Science                               0.06   -0.01   -0.02    0.06   -0.01   -0.02
                                                      (0.01)  (0.02)          (0.01)  (0.02)
                                                      [0.24]  [0.20]          [0.24]  [0.19]
  All Regular                                   0.62   -0.03   -0.09    0.61   -0.03   -0.07
                                                      (0.01)  (0.03)          (0.01)  (0.03)
                                                      [0.02]  [0.00]          [0.07]  [0.03]
Number of Observations                         1,819                   1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.

Table 4
Treatment Contrast (Composite Variables)

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                              (1)     (2)     (3)
Academically Challenging Curriculum                -0.33    0.31    0.80
                                                          (0.10)  (0.24)
                                                          [0.00]  [0.00]
Project-Based, Independent Classroom Activities    -0.06    0.13    0.33
                                                          (0.07)  (0.17)
                                                          [0.07]  [0.06]
Integrated Use of Technology                       -0.11    0.11    0.28
                                                          (0.08)  (0.19)
                                                          [0.19]  [0.14]
Number of Observations                             1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
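The construction described in the notes above can be sketched in Python as follows. This is a minimal illustration, assuming that responses are already coded as integers from 0 (lowest category) to k-1 (highest category), that no items are missing, and that standardization uses the population standard deviation; none of these details are stated in the notes, and the function name `composite_score` is hypothetical.

```python
import numpy as np


def composite_score(responses, n_categories):
    """Build a standardized composite from ordinal survey items.

    responses: 2-D array-like (students x items) of integer codes,
    0 = lowest category, n_categories - 1 = highest category.
    Assumes complete data (an assumption, not from the source).
    """
    codes = np.asarray(responses, dtype=float)

    # Rescale so the lowest category is 0.0, the highest is 1.0,
    # and the remaining categories are evenly spaced in between.
    scaled = codes / (n_categories - 1)

    # Average the rescaled items for each student.
    averaged = scaled.mean(axis=1)

    # Standardize to mean 0 and SD 1 (population SD, an assumption).
    return (averaged - averaged.mean()) / averaged.std()
```

With a five-category item, the rescaled values are 0.0, 0.25, 0.5, 0.75, and 1.0, so each step on the response scale contributes equally to the composite.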

Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                              (1)     (2)     (3)
Science Skill                                      -0.19    0.09    0.23
                                                          (0.06)  (0.16)
                                                          [0.15]  [0.14]
STEM Interest                                       0.62    0.04    0.09
                                                          (0.02)  (0.07)
                                                          [0.16]  [0.16]
Confidence in College Science                       0.92   -0.04   -0.10
                                                          (0.02)  (0.05)
                                                          [0.11]  [0.06]
Stress                                              0.12    0.07    0.17
                                                          (0.03)  (0.07)
                                                          [0.02]  [0.01]
Grades in Science Courses                           2.80   -0.12   -0.29
                                                          (0.07)  (0.16)
                                                          [0.08]  [0.07]
Grades in Other Courses                             3.14   -0.07   -0.18
                                                          (0.02)  (0.06)
                                                          [0.00]  [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.

Table 6
Robustness Checks of Main ITT Results

Column headers, in the order printed: (1) Control Group Complier Mean; (2) Main Results; (3) Robust SE; (4) p-value (permutation test); (5) Excluding High School 56; (6) Including Imputation of Missing Outcome Variables; (7) Excluding Covariates; (8) Excluding High School 23; (9) Lee Lower Bound; (10) Lee Upper Bound; (11) 95% Confidence Interval from Lee Bounds; (12) Ratio of 95% CI in (11) to 95% CI in (7).

For each outcome, the first line lists the control group complier mean, the point estimates, the 95% confidence interval from Lee bounds, and the ratio; the second line lists standard errors (in parentheses); the third line lists p-values (in brackets). Values appear in the order printed.

Science Skill                   -0.19   0.09   0.10   0.11   0.20   0.07   0.03   0.39   -0.09, 0.51   2.0
                                (0.06) (0.05) (0.00) (0.00) (0.00) (0.00) (0.07) (0.07)
                                [0.15] [0.06] [0.06] [0.20] [0.11] [0.01] [0.24] [0.72] [0.00]
STEM Interest                    0.62   0.04   0.05   0.03   0.03   0.03   0.02   0.12   -0.03, 0.18   1.9
                                (0.02) (0.03) (0.00) (0.00) (0.00) (0.00) (0.03) (0.04)
                                [0.16] [0.19] [0.20] [0.09] [0.29] [0.27] [0.19] [0.60] [0.00]
Confidence in College Science    0.92  -0.04  -0.03  -0.06  -0.06  -0.04  -0.06   0.05   -0.09, 0.10   2.0
                                (0.02) (0.02) (0.00) (0.00) (0.00) (0.00) (0.02) (0.03)
                                [0.11] [0.05] [0.07] [0.37] [0.02] [0.03] [0.10] [0.00] [0.17]
Stress                           0.12   0.07   0.05   0.06   0.08   0.07   0.01   0.11   -0.05, 0.15   1.6
                                (0.03) (0.02) (0.00) (0.00) (0.00) (0.00) (0.03) (0.02)
                                [0.02] [0.00] [0.00] [0.14] [0.07] [0.02] [0.02] [0.79] [0.00]
Grades in Science Courses        2.80  -0.12  -0.06  -0.10  -0.07
                                (0.07) (0.04) (0.00) (0.00) (0.00)
                                [0.08] [0.01] [0.01] [0.31] [0.16] [0.30]
Grades in Other Courses          3.14  -0.07  -0.07  -0.06  -0.03
                                (0.02) (0.03) (0.00) (0.00) (0.00)
                                [0.00] [0.01] [0.01] [0.00] [0.01] [0.38]
Remaining columns for the two grade outcomes: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95 percent confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive a confidence interval for the treatment effect itself).
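The permutation test reported in column (4) can be sketched in Python as follows. This is an illustration under simplifying assumptions: the treatment effect is a raw difference in means rather than the regression estimate from Equation 1, pseudo treatment is reshuffled across all students rather than within school-by-cohort cells (the notes do not say which was used), and the function name `permutation_p_value` is hypothetical.

```python
import numpy as np


def permutation_p_value(y, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose |effect| exceeds the observed |effect|.

    y: outcome values; treated: boolean treatment indicators.
    The effect here is a simple difference in means (an assumption).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    def effect(assignment):
        # Difference in mean outcomes between treated and control units.
        return y[assignment].mean() - y[~assignment].mean()

    observed = abs(effect(treated))

    exceed = 0
    for _ in range(n_permutations):
        # Reshuffle the labels, keeping the number of treated units fixed.
        pseudo = rng.permutation(treated)
        if abs(effect(pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations
```

Because the p-value is a share of random draws, its resolution is limited by the number of permutations; with 1,000 draws the smallest nonzero value is 0.001.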

1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the

Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the

effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. Raising the standard to which students compare themselves may decrease their confidence. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course and classroom supplies (e.g., lab materials, textbooks) and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master's degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + \mathbf{X}_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1 if student i in school by cohort j completed any part of the end-of-year survey, $\mathbf{X}_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + \mathbf{X}_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum, with more homework, than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.

IV. Empirical Strategy

We estimate the effect of taking the AP science course with a standard instrumental variable specification:

(1) $Y_{ij} = \alpha_j + \hat{A}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$

(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$

where $AP_{ij} = 1$ if student i enrolled in the AP science course in school × cohort stratum j; $\hat{A}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2); $\mathit{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates (including age; math and reading exam scores from 8th and 10th grade (standardized and averaged for math and reading separately); cumulative GPA prior to the year when the AP science course was offered; and indicator variables for female, racial group (Asian American, Black or Hispanic, Native American or Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price lunch, home language is not English, and took recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.[19] We use two-stage least squares to estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given by $\beta$.

The intent-to-treat (ITT) estimate is obtained by replacing $\hat{A}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new AP science course and is a weighted average of effects on those who do and do not choose to enroll in the course:

(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
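
To make the relationship among the ITT, first-stage, and LATE quantities concrete, the following minimal Python sketch covers the simplest case with no covariates or fixed effects, where two-stage least squares with a single binary instrument reduces to the Wald ratio (the ITT effect divided by the first-stage effect). The function name `wald_late` and the toy data are hypothetical and are not drawn from the study; the estimates reported in the paper additionally condition on the covariates and school by cohort fixed effects.

```python
from statistics import mean


def wald_late(y, took_ap, offered):
    """Return (ITT, first stage, LATE) for a binary offer with no covariates.

    ITT         = mean outcome of offered minus mean outcome of not offered
    first stage = take-up rate of offered minus take-up rate of not offered
    LATE        = ITT / first stage (the Wald ratio)
    """
    y_offered = [yi for yi, z in zip(y, offered) if z == 1]
    y_control = [yi for yi, z in zip(y, offered) if z == 0]
    d_offered = [di for di, z in zip(took_ap, offered) if z == 1]
    d_control = [di for di, z in zip(took_ap, offered) if z == 0]

    itt = mean(y_offered) - mean(y_control)
    first_stage = mean(d_offered) - mean(d_control)
    if first_stage == 0:
        raise ValueError("the offer does not shift take-up; LATE is undefined")
    return itt, first_stage, itt / first_stage


# Hypothetical toy data: 4 students offered the course, 4 not offered.
offered = [1, 1, 1, 1, 0, 0, 0, 0]
took_ap = [1, 1, 1, 0, 1, 0, 0, 0]   # imperfect compliance in both arms
outcome = [3.0, 2.0, 4.0, 1.0, 2.0, 1.0, 1.0, 2.0]

itt, first_stage, late = wald_late(outcome, took_ap, offered)
print(itt, first_stage, late)
```

In this toy example the offer raises take-up by 0.5 and the mean outcome by 1.0, so the implied effect for compliers is 2.0.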

For outcomes that are obtained from the survey, we weight regressions by the inverse of the estimated probability of completing the survey.[20] The results are similar without using these weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student characteristics, as a result of either missing student transcripts or certain data elements not collected by the district, we use multiple imputation by chained equations, creating 10 imputed datasets, and combine the results.[21] For inference, we cluster standard errors at the level of treatment assignment (school by cohort) in our analysis of main effects. In the analysis of robustness, we report permutation standard errors, robust standard errors (for comparison to permutations), and the statistical significance of the LATE estimates after adjusting our tests of significance for multiple comparisons.
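
As an illustration of the survey weighting step, the sketch below turns a fitted probit index into an inverse probability weight, 1 divided by the standard normal CDF of the index, and uses the weights in a weighted mean. It assumes the probit index has already been estimated elsewhere; the function names `ipw_weight` and `weighted_mean` and the numbers are hypothetical, and the sketch is not the authors' code.

```python
from statistics import NormalDist


def ipw_weight(probit_index):
    """Inverse of the probit-predicted probability of completing the survey."""
    prob = NormalDist().cdf(probit_index)  # standard normal CDF, Phi(index)
    if prob <= 0.0:
        raise ValueError("predicted response probability must be positive")
    return 1.0 / prob


def weighted_mean(values, weights):
    """Weighted average of outcomes among survey respondents."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical fitted indices for three respondents. A lower index means the
# respondent resembles nonrespondents more closely and is weighted up.
indices = [0.0, 1.0, 2.0]
weights = [ipw_weight(ix) for ix in indices]
outcomes = [1.0, 2.0, 3.0]
reweighted = weighted_mean(outcomes, weights)
print(weights, reweighted)
```

A respondent with a predicted response probability of one half receives a weight of 2, while respondents who were nearly certain to respond receive weights close to 1.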

V. Results

A. Course-Taking and Treatment Contrast

Table 3 provides estimated effects of the randomized offer of enrollment on AP science course enrollment and share of credits in all courses for the full sample and the survey samples. The first-stage estimates indicate that the offer substantially increased the likelihood of the student taking the AP science course: by 38 percentage points in the full sample and 39 percentage points in the survey sample. As we expected, compliance with randomization was imperfect, with 42 percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team.

The remaining rows in Table 3 shed light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage point increase in the share of total credits in the full sample). Treatment group students' share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).[22]

Approximately 78 percent of the control group compliers took any science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%), and Anatomy/Physiology (9%).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our results also suggest that AP course-takers' classrooms were more likely to use technology (up 0.28 sd, p-value = 0.14).[23] Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.[24] This contrast between students' reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.[25]

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate that, for the typical complier, taking AP science raises objectively measured scientific inquiry skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers' interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent, with, again, more suggestive than definitive results at traditional levels of statistical inference (p-value = 0.16).

Table 5 provides stronger evidence of negative treatment effects on students' confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology by signing the study assent forms. Taking AP science substantially lowered participants' likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students' self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students' self-reported confidence and stress levels. We find that taking AP science increases students' likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05), above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group students saw their grades suffer. We estimate that taking AP science reduced students' grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students' science GPAs during the study year (usually their junior year) from around a B- to a C+.[26] This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students' grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students' share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students' overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 × 0.76)).

With our estimates in hand, we can easily compute the adjustment that would leave the student's GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students' grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).[27]
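
The two calculations above can be retraced with the rounded figures quoted in the text. The short Python sketch below does so; the variable names are hypothetical. With rounded inputs the offsetting boost comes out at about 1.5, so the 1.46 reported in the text presumably reflects unrounded estimates, which is an assumption on our part.

```python
# Rounded estimates quoted in the text (grade points on a 4-point scale).
science_effect = -0.29   # effect on grades in science courses
other_effect = -0.18     # effect on grades in non-science courses
science_share = 0.24     # share of credits in science if taking AP science

# Credit-weighted average effect on overall grades during the study year.
overall_effect = (science_effect * science_share
                  + other_effect * (1 - science_share))

# Share of classes in any AP science subject for those induced to take AP.
ap_science_share = 0.02 + 0.12

# Boost to AP science grades that would offset the rounded overall decline.
offsetting_boost = 0.21 / ap_science_share

print(round(overall_effect, 2), round(offsetting_boost, 2))
```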

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and Column (4) reports the results of a permutation test where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).[28] The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.[29]
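
The permutation test described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it uses a raw difference in means rather than the regression-adjusted ITT estimate, and it shuffles treatment labels across the whole sample rather than within school by cohort strata. The function names and toy data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(y, z):
    """Mean outcome of the treated (z == 1) minus mean of controls (z == 0)."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return mean(treated) - mean(control)


def permutation_p_value(y, z, n_perm=1000, seed=0):
    """Share of pseudo-treatment draws whose absolute effect exceeds the
    absolute value of the observed effect."""
    rng = random.Random(seed)          # seeded for reproducibility
    observed = abs(diff_in_means(y, z))
    labels = list(z)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)            # randomly reassign pseudo treatment
        if abs(diff_in_means(y, labels)) > observed:
            exceed += 1
    return exceed / n_perm


# Hypothetical toy data in which treated outcomes are far above controls.
y = [10, 11, 12, 13, 14, 0, 1, 2, 3, 4]
z = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
p_separated = permutation_p_value(y, z)
print(p_separated)
```

Because the toy groups are perfectly separated, no reshuffling can produce a larger absolute difference, so the permutation p-value here is zero; with noisier data the share falls between 0 and 1.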

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply-imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.[30] Column (8) shows the results when we exclude all of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution should be considered in evaluating the effects from outcomes based on the study survey.[31]
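
The trimming logic behind the bounds can be illustrated with a minimal Python sketch. It assumes the treatment group has the higher response rate, ignores covariates and survey weights, and rounds the number of trimmed observations to a whole number, all of which are simplifications relative to the Lee (2009) estimator; the function name `lee_bounds` and the toy data are hypothetical. The confidence interval step is not shown.

```python
from statistics import mean


def lee_bounds(y_treat, y_control, n_treat_assigned, n_control_assigned):
    """Return (lower, upper) trimming bounds on the treatment effect.

    y_treat and y_control hold outcomes for survey respondents only; the
    n_*_assigned arguments are the numbers originally randomized to each arm.
    """
    rate_treat = len(y_treat) / n_treat_assigned
    rate_control = len(y_control) / n_control_assigned
    if rate_treat < rate_control:
        raise ValueError("sketch assumes the treatment arm responds more")

    # Share of treated respondents to trim so the response rates match.
    trim_share = (rate_treat - rate_control) / rate_treat
    n_trim = round(trim_share * len(y_treat))

    ordered = sorted(y_treat)
    keep = len(ordered) - n_trim
    # Lower bound: drop the highest treated outcomes.
    lower = mean(ordered[:keep]) - mean(y_control)
    # Upper bound: drop the lowest treated outcomes.
    upper = mean(ordered[n_trim:]) - mean(y_control)
    return lower, upper


# Hypothetical toy data: 10 students assigned to each arm; 8 treated and
# 6 control students answered the survey.
treated_outcomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
control_outcomes = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
lower, upper = lee_bounds(treated_outcomes, control_outcomes, 10, 10)
print(lower, upper)
```

In this toy example two treated respondents are trimmed, giving bounds of -1.0 and 1.0 around an untrimmed difference in means of 0.0.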

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science skill as the outcome. We find that the point estimates at every quantile are insignificantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.[32]

VI. Conclusion

Most admissions committees at bachelor's degree-granting institutions rely on applicants' AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students' skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools already offered AP science classes, and they tended to be in relatively higher-income communities, disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest above the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion, as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.[33] We also find that AP science classes substantially increase students' stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult, given that few schools document per-course expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course assignments offers a partial cost analysis. Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation's high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance on college admissions decisions, as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.

References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should you Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U Stress-Performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.


Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education Research and Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and C. F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge, MA: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018 (NCES 2018-144): Public High School Graduation Rates.” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures, Version 1.0.” Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.


Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11, First Look.” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, DC, and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school-by-cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age Under 30                                        0.407           0.160
Age 30-49                                           0.432           0.553
Age 50 or over                                      0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <=2                             0.290           0.085
Years of Experience <=5                             0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                  (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade               16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                               0.38     0.08     0.25     0.44     0.07     0.30
                                                       (0.04)   (0.10)            (0.05)   (0.16)
                                                       [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                            0.29     0.10     0.18     0.36     0.09     0.17
                                                       (0.03)   (0.12)            (0.04)   (0.17)
                                                       [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                        3.16     0.05     0.20     3.23     0.06     0.13
                                                       (0.03)   (0.08)            (0.03)   (0.10)
                                                       [0.14]   [0.02]            [0.06]   [0.20]
Female                                        0.59     0.00     0.10     0.61     -0.01    0.11
                                                       (0.03)   (0.06)            (0.04)   (0.07)
                                                       [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                0.12     0.02     0.10     0.12     0.03     0.10
                                                       (0.02)   (0.05)            (0.01)   (0.07)
                                                       [0.27]   [0.06]            [0.07]   [0.12]
Black                                         0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                       (0.02)   (0.06)            (0.02)   (0.05)
                                                       [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial     0.31     0.01     0.05     0.33     0.01     0.05
                                                       (0.02)   (0.06)            (0.02)   (0.07)
                                                       [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                      0.02     0.00     -0.01    0.01     0.00     -0.01
                                                       (0.01)   (0.01)            (0.01)   (0.01)
                                                       [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                        0.13     0.03     0.00     0.14     0.02     0.01
                                                       (0.02)   (0.05)            (0.02)   (0.09)
                                                       [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                      0.05     0.01     0.02     0.04     0.01     0.04
                                                       (0.01)   (0.02)            (0.01)   (0.03)
                                                       [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch      0.53     0.01     0.02     0.51     0.01     0.07
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home    0.34     0.02     0.03     0.35     0.01     0.04
                                                       (0.02)   (0.07)            (0.02)   (0.07)
                                                       [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses         0.79     0.00     0.09     0.79     0.02     0.05
                                                       (0.02)   (0.04)            (0.02)   (0.05)
                                                       [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                        1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 3

First Stage: Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4): Control Group Mean
(2), (5): ITT
(3), (6): LATE

Outcome                           (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment    0.19     0.38              0.24     0.39
                                           (0.05)                     (0.06)
                                           [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                      0.03     0.04     0.11     0.03     0.04     0.10
                                           (0.01)   (0.01)            (0.01)   (0.01)
                                           [0.00]   [0.00]            [0.00]   [0.00]
  All AP                          0.13     0.04     0.11     0.14     0.04     0.10
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science          0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced              0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                           (0.01)   (0.02)            (0.01)   (0.03)
                                           [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                 0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                     0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                           (0.01)   (0.03)            (0.01)   (0.03)
                                           [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations            1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 4

Treatment Contrast (Composite Variables)

Columns: (1) Control Group Complier Mean; (2) ITT; (3) LATE.

Outcome                                           (1)      (2)      (3)
Academically Challenging Curriculum               -0.33    0.31     0.80
                                                           (0.10)   (0.24)
                                                           [0.00]   [0.00]
Project-Based Independent Classroom Activities    -0.06    0.13     0.33
                                                           (0.07)   (0.17)
                                                           [0.07]   [0.06]
Integrated Use of Technology                      -0.11    0.11     0.28
                                                           (0.08)   (0.19)
                                                           [0.19]   [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
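
The composite construction described in the notes to Table 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the names `LIKERT`, `rescale`, and `composite_scores` are hypothetical, the example responses are invented, and standardizing with the population standard deviation is an assumption, since the notes do not say which version was used.

```python
import statistics

# Ordered response categories, lowest to highest (assumed five-point scale).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map an ordered category to an evenly spaced value in [0.0, 1.0]."""
    position = categories.index(response)
    return position / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students.

    Uses the population standard deviation (an assumption of this sketch).
    """
    raw = [
        statistics.mean(rescale(item) for item in items)
        for items in responses_by_student
    ]
    center = statistics.mean(raw)
    spread = statistics.pstdev(raw)
    return [(value - center) / spread for value in raw]


# Hypothetical responses from three students to two component items.
example = [
    ["agree", "strongly agree"],
    ["neutral", "disagree"],
    ["strongly disagree", "neutral"],
]
scores = composite_scores(example)
```

In this sketch the first student, who gives the most favorable responses, receives the highest standardized score, and the resulting scores have mean 0 and standard deviation 1 within the example sample.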


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Columns: (1) Control Group Complier Mean; (2) ITT; (3) LATE.

Outcome                         (1)      (2)      (3)
Science Skill                   -0.19    0.09     0.23
                                         (0.06)   (0.16)
                                         [0.15]   [0.14]
STEM Interest                   0.62     0.04     0.09
                                         (0.02)   (0.07)
                                         [0.16]   [0.16]
Confidence in College Science   0.92     -0.04    -0.10
                                         (0.02)   (0.05)
                                         [0.11]   [0.06]
Stress                          0.12     0.07     0.17
                                         (0.03)   (0.07)
                                         [0.02]   [0.01]
Grades in Science Courses       2.80     -0.12    -0.29
                                         (0.07)   (0.16)
                                         [0.08]   [0.07]
Grades in Other Courses         3.14     -0.07    -0.18
                                         (0.02)   (0.06)
                                         [0.00]   [0.00]
Number of Observations          1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 6

Robustness Checks of Main ITT Results

Columns:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                        (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)            (12)
Science Skill                  -0.19    0.09                       0.10     0.11     0.20     0.07     0.03     0.39     -0.09, 0.51     2.0
                                        (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                        [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                  0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12     -0.03, 0.18     1.9
                                        (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                        [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science  0.92     -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06    0.05     -0.09, 0.10     2.0
                                        (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                        [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                         0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11     -0.05, 0.15     1.6
                                        (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                        [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses      2.80     -0.12                      -0.06    -0.10    -0.07    Not applicable, as grades are based on
                                        (0.07)   (0.04)            (0.00)   (0.00)   (0.00)   transcripts rather than student survey
                                        [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses        3.14     -0.07                      -0.07    -0.06    -0.03    Not applicable, as grades are based on
                                        (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   transcripts rather than student survey
                                        [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
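
The permutation test described for column (4) can be sketched as follows. This is a simplified illustration, not the study's procedure: it uses a raw difference in means in place of the regression with covariates and School x Cohort fixed effects, it reshuffles the observed assignment labels across the whole sample (which keeps the number of treated units fixed), and the names `diff_in_means` and `permutation_p_value`, the seed, and the example data are all hypothetical.

```python
import random


def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    treated_y = [y for y, d in zip(outcomes, treated) if d]
    control_y = [y for y, d in zip(outcomes, treated) if not d]
    return sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy so the caller's assignment is not modified
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # pseudo treatment: reshuffle the assignment labels
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm


# Hypothetical data: eight students, four assigned to treatment.
outcomes = [0.4, 1.1, -0.2, 0.9, 0.1, -0.5, 0.3, -0.1]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = permutation_p_value(outcomes, treated)
```

A small p-value from this procedure indicates that few random reassignments produce an effect as large in absolute value as the one observed. A closer analogue to the study's design would presumably reassign the pseudo treatment within each School x Cohort cell and re-estimate the full regression each time.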


1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the

Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the

effect of advanced high school courses more generally often without distinctions between AP

and other rigorous course options Nearly all of these nonexperimental studies find large positive

effects of rigorous secondary school courses particularly those in math and science on studentsrsquo

high school postsecondary and labor market performance (eg Altonji 1995 Attewell and

Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long

Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004) 3 The process by which a course is designated AP involves two steps Teachers who plan to offer

an AP course are encouraged (though not required) to attend a professional development

training The Board and other independent agencies offer several workshops with the most

extensive training being the AP summer institute a week-long training that is led by an

experienced AP instructor Teachers are then expected to develop their syllabi for the course and

submit them to the Board for review A team of auditors at the Board review each syllabus and

grant permission to a school to label the course as AP on course catalogs and student transcripts

once the syllabus has been approved Teachers are also able to revise and resubmit syllabi if they

do not meet the requirements upon original submission College Board (2017b) contains a

discussion of the annual ldquoAP Course Auditrdquo This review of syllabi is the only mechanism for

assessment (ie course delivery and student performance are not assessed by the Board) In

order to effectively run an AP Biology or Chemistry course teachers require access to a well-

equipped classroom and laboratory including all supplies necessary to engage in

experimentation (eg beakers solutions microscopes measuring equipment) Most of the

teachers in our study reported that their classrooms were well-stocked with supplies 4 In 2012 the Board began redesigning several AP courses and exams to prioritize depth of

learning over breadth of coverage and to place ldquogreater emphasis on discipline-specific inquiry

reasoning and communication skillsrdquo (College Board 2017a) The revisions to science courses

were based upon recommendations from the National Science Foundation the National Research

Council and science educators across the United States (National Research Council 2002) 5 Students perception of their own confidence and other personality traits are inherently

influenced by their frames of reference in ways that other assessments of these traits (eg

external observations) may be less influenced By increasing the standard to which they compare

themselves studentsrsquo confidence may decrease This feature of most self-assessments could be

considered a measurement error that biases treatment effects (Dobbie and Fryer Jr 2015 West et

al 2016) Whether this is considered a bias or a true treatment effect on a meaningful outcome

depends to some extent on how these changes in perceived ability influence other behaviors

such as effort and academic motivation 6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and

Biology I and Chemistry I for AP Biology with no additional requirements beyond these

prerequisites 7 We offered each participating school $10000 to pay for teacher attendance at a one-week

training course classroom supplies (eg lab materials textbooks) and to compensate schools

for the staff time required for study administration efforts We also offered $1000 compensation

for an individual selected by the school to serve as a liaison between the study team and the

31

school to assist in collecting consentassent forms and data 8 In our original research design we planned to recruit 40 schools with up to two cohorts of

students which would have powered the study to detect effect sizes smaller than those detected

here We faced several challenges in recruiting schools to participate even with the monetary

incentives Some schools were uncomfortable with randomization across classrooms while

others were simply unable to commit to offering a new course the following school year 9 Most of the assignments were made in one batch in the spring of the year prior to when the

course would be offered We also made some assignments on a rolling basis as additional

consentassent forms were submitted We have no information on the students who were deemed

eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10 Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11 Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12 We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13 Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14 Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15 Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16 Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17 The study’s Principal Investigator personally handled the randomization of offers of

enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19 One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20 To compute these weights, we first estimate the parameters of the following equation using a

probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school-by-cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school-by-cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and with 90% of students receiving a weight less than 2.0.

21 We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22 Detailed course-by-course impacts are available from the authors.

23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24 Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.

25 In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27 Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29 To address the possibility of a heightened probability of Type I error given the multiple

outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30 This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32 We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33 We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
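As an illustration of the survey-response weights described in footnote 20, the following minimal sketch converts a fitted probit index into an inverse probability weight. It assumes the index (the estimated school-by-cohort effect plus the covariate term) has already been obtained from a separately fitted probit; the function name and the example index values are hypothetical and are not taken from the study data.

```python
from statistics import NormalDist


def inverse_probability_weight(probit_index: float) -> float:
    """Return 1 / Phi(index), the inverse of the predicted response probability."""
    response_probability = NormalDist().cdf(probit_index)
    return 1.0 / response_probability


# Hypothetical fitted index values for three survey completers (not study data).
example_indices = [2.0, 0.5, -1.0]
weights = [inverse_probability_weight(z) for z in example_indices]
```

Completers with a low predicted probability of responding (a low index) receive larger weights, and no weight can fall below 1, which is consistent with the reported minimum weight of 1.0.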


percent of the students who received an offer choosing not to enroll and 19 percent of the control students enrolling. Nearly all of these latter crossovers reflected decisions by the district to violate the study protocol and let control group students into the course, while a few of these came from hardship exemptions that were requested by the school and granted by the study team. The remaining rows in Table 3 shed light on the courses that were crowded out by the newly offered AP science course. Mechanically, treatment group students took more credits in AP science (an 11-percentage point increase in the share of total credits in the full sample). Treatment group students’ share of courses in any AP also increased by 11 percentage points, indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP science appears to have crowded out regular courses (down 9 percentage points), including regular science courses (down 2 percentage points).²²

Approximately 78 percent of the control group compliers took any science course, with 34 percent taking a non-AP advanced science course (almost entirely honors courses) during the study year. The control students who did not take AP Biology or Chemistry took a variety of alternative science courses, with the most commonly reported courses including Chemistry (13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%), and Anatomy/Physiology (9%).

Table 4 provides the contrast in treatment and control group complier reports on the content and rigor of their science courses for three composite variables. We find that taking AP science yielded a substantially more academically challenging curriculum (up 0.80 s.d., p-value < 0.01) and raised the extent of inquiry-based classroom activities (up 0.33 s.d., p-value = 0.06). Our results also suggest that AP course-takers’ classrooms were more likely to use technology (up 0.28 s.d., p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the component variables used in constructing the composite variables. We find that while AP classrooms were more inquiry-based than other science classrooms using our composite measure, some of the core components of the inquiry approach that were intended by the Board (e.g., applying knowledge to solve a new problem) were not more prevalent in AP science classes than other science classes.²⁴ This contrast between students’ reports of the content and rigor of their AP science course relative to other courses available to them offers one measure of the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation of implementation fidelity (the degree to which the courses were implemented as intended by the Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results that are consistent with the finding that most teachers were able to implement a rigorous AP science classroom, yet they also struggled with the inquiry-based approach and integrating technology into the classroom.

These reported differences between treatment and control group classrooms also hold despite the fact that many of the teachers selected to teach AP also teach the other science courses taken by control group students. In fact, almost 67 percent of AP teachers reported using some of their AP science strategies and lessons in their non-AP classes. These within-school spillovers likely attenuate observed differences in outcomes between treatment and control group students in the same school.²⁵

B. AP Impact on Outcomes

Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate that, for the typical complier, taking AP science raises objectively measured scientific inquiry skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent, with again more suggestive than definitive results at traditional levels of statistical inference (p-value = 0.16).

Table 5 provides stronger evidence of negative treatment effects on students’ confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology by signing the study assent forms. Taking AP science substantially lowered participants’ likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students’ self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students’ self-reported confidence and stress levels. We find that taking AP science increases students’ likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05) above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group students’ grades suffered. We estimate that taking AP science reduced students’ grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students’ grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students’ share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 × 0.76)).
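The credit-weighted average in the preceding paragraph can be reproduced with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in the text; the variable names are illustrative.

```python
science_effect = -0.29   # estimated effect on science grades (Table 5)
other_effect = -0.18     # estimated effect on grades in non-science courses (Table 5)
science_share = 0.24     # share of credits in science, assuming the student takes AP science

# Weight each effect by the share of credits it applies to.
overall_effect = (science_effect * science_share
                  + other_effect * (1 - science_share))

print(round(overall_effect, 2))
```

Because non-science courses make up roughly three quarters of credits, the smaller non-science effect contributes about two thirds of the overall decline.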

With our estimates in hand, we can easily compute the adjustment that would leave the student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).²⁷
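The offsetting grade boost is simply the overall grade decline divided by the share of classes that would receive the boost. The sketch below repeats that calculation with the rounded figures printed in the text, which give a ratio of about 1.5; the reported 1.46 presumably reflects unrounded estimates that are not shown here. Variable names are illustrative.

```python
overall_gpa_loss = 0.21          # rounded overall grade decline reported in the text
ap_science_share = 0.02 + 0.12   # predicted share of classes in any AP science (Table 3)

# Boost per AP science grade point that exactly offsets the overall loss.
required_boost = overall_gpa_loss / ap_science_share

# Check: applying the boost to the AP science share of classes restores the GPA.
net_change = -overall_gpa_loss + required_boost * ap_science_share
```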

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and Column (4) reports the results of a permutation test where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.²⁹
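The logic of the permutation test can be illustrated with a simplified sketch. It assumes a plain difference in means as the treatment effect estimate, whereas the estimates in Table 6 come from regressions with covariates and school-by-cohort fixed effects; it also shuffles assignment across the whole sample rather than within randomization blocks. The function names and data are hypothetical.

```python
import random
from statistics import mean


def difference_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated_values = [y for y, t in zip(outcomes, treated) if t]
    control_values = [y for y, t in zip(outcomes, treated) if not t]
    return mean(treated_values) - mean(control_values)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-treatment effects that exceed the observed effect in absolute value."""
    rng = random.Random(seed)
    observed = abs(difference_in_means(outcomes, treated))
    pseudo = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pseudo)  # reassign a pseudo treatment, keeping group sizes fixed
        if abs(difference_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical outcomes for eight students, four of them treated (not study data).
outcomes = [3.1, 2.9, 3.4, 3.0, 2.2, 2.5, 2.4, 2.1]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = permutation_p_value(outcomes, treated)
```

A small p-value indicates that few random reassignments produce an effect as large as the one observed, so the estimate is unlikely to be an artifact of chance assignment.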

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply-imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all of the $X_i$ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 s.d.), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution should be considered in evaluating the effects from outcomes based on the study survey.³¹
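The trimming step behind the Lee bounds can be sketched as follows. This is a simplified illustration that assumes the treatment group has the higher response rate, uses unadjusted differences in means, and omits covariates, weights, and the Imbens and Manski confidence intervals. The function name and data are hypothetical.

```python
import math
from statistics import mean


def lee_bounds(treated_outcomes, control_outcomes,
               treated_response_rate, control_response_rate):
    """Trimming bounds when the treatment group responds at the higher rate."""
    if treated_response_rate < control_response_rate:
        raise ValueError("this sketch only handles a higher treatment response rate")
    # Share of treated responders to drop so that response rates match.
    trim_share = (treated_response_rate - control_response_rate) / treated_response_rate
    # Whole observations to drop; the tiny tolerance guards against float rounding.
    n_trim = math.floor(trim_share * len(treated_outcomes) + 1e-9)
    ordered = sorted(treated_outcomes)
    keep = len(ordered) - n_trim
    control_mean = mean(control_outcomes)
    lower = mean(ordered[:keep]) - control_mean    # drop the highest treated outcomes
    upper = mean(ordered[n_trim:]) - control_mean  # drop the lowest treated outcomes
    return lower, upper


# Hypothetical data: 8 of 10 treated and 6 of 10 control students responded.
treated_outcomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
control_outcomes = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
lower, upper = lee_bounds(treated_outcomes, control_outcomes, 0.8, 0.6)
```

In this example a quarter of the treated responders (two observations) are trimmed, and the gap between the two bounds shows how sensitive the estimate is to which students failed to respond.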

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression conditional on $X_i$ with science skill as the outcome. We find that the point estimates at every quantile are insignificantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.³²

VI. Conclusion

Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools already offered AP science classes, and they tended to be in relatively higher-income communities disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest above the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion, as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.³³ We also find that AP science classes substantially increase students’ stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult, given that few schools document per-course expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course assignments offers a partial cost analysis. Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation’s high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance, on college admissions decisions as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.


Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and C. F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018 (NCES 2018-144): Public High School Graduation Rates.” (NCES 2018-144). US Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in US High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.

Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in US Public High Schools: 2010-11, First Look.” (NCES 2013-001). US Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

US Department of Education. 2014. “Education Department Awards 40 States, DC, and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.

Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts: Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.

Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.

Table 1

Participating Schools and Teachers Compared to Other US High Schools and High School Science Teachers

Panel A: Schools                                        Participating   Others
Average Enrollment                                      1,409           723
Free or Reduced-Price Lunch                             0.700           0.438
Asian                                                   0.055           0.050
Black                                                   0.349           0.154
Hispanic                                                0.410           0.221
White                                                   0.164           0.537
Adjusted Cohort Graduation Rate                         0.843           0.802
District Instruction Expenditures Per Pupil             $6,561          $5,636
District Student Services Expenditures Per Pupil        $3,787          $3,385

Panel B: Teachers                                       Participating   Others
Age: Under 30                                           0.407           0.160
Age: 30-49                                              0.432           0.553
Age: 50 or over                                         0.161           0.287
Female                                                  0.630           0.536
Hispanic or Latino                                      0.111           0.051
Race: American Indian or Alaska Native                  0.000           0.009
Race: Asian American                                    0.111           0.041
Race: Black                                             0.111           0.060
Race: Native Hawaiian or other Pacific Islander         0.000           0.004
Race: White                                             0.778           0.896
Years of Experience                                     10.3            13.2
Years of Experience <=2                                 0.290           0.085
Years of Experience <=5                                 0.481           0.234
Hold a Teaching Certificate                             0.926           0.945
Undergraduate Major in STEM                             0.944           0.747
Single Subject Credential in Science                    0.630           0.823
Master’s Degree or Higher                               0.356           0.615
Previously Taught AP Course                             0.469           NA
Previously Taught AP, IB, or Honors Course              0.796           NA
Number of Professional Development Trainings
  in the Past 5 years (0-5)                             3.09            NA

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.

Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
Columns (1) and (4): Control Group Mean.
Columns (2) and (5): Difference Between Treated and Controls.
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers.

Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 3

First Stage: Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
Columns (1) and (4): Control Group Mean. Columns (2) and (5): ITT. Columns (3) and (6): LATE.

Outcome                                     (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment              0.19     0.38              0.24     0.39
                                                     (0.05)                     (0.06)
                                                     [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                0.03     0.04     0.11     0.03     0.04     0.10
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                    0.13     0.04     0.11     0.14     0.04     0.10
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                    0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                        0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                           0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                               0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                     (0.01)   (0.03)            (0.01)   (0.03)
                                                     [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                      1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 4

Treatment Contrast (Composite Variables)

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                                           (1)      (2)      (3)
Academically Challenging Curriculum               -0.33    0.31     0.80
                                                           (0.10)   (0.24)
                                                           [0.00]   [0.00]
Project-Based Independent Classroom Activities    -0.06    0.13     0.33
                                                           (0.07)   (0.17)
                                                           [0.07]   [0.06]
Integrated Use of Technology                      -0.11    0.11     0.28
                                                           (0.08)   (0.19)
                                                           [0.19]   [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
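The composite construction described in the notes to Table 4 can be sketched in a few lines. This is only an illustration under stated assumptions: items are coded 1 (lowest category) through 5 (highest), standardization uses the sample standard deviation, and the function names (`rescale_item`, `composite_scores`) and the toy responses are hypothetical rather than the study's own code or data.

```python
from statistics import mean, stdev

def rescale_item(response, n_categories=5):
    """Map an ordinal response coded 1..n_categories onto evenly spaced values in [0, 1]."""
    return (response - 1) / (n_categories - 1)

def composite_scores(responses_by_student, n_categories=5):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale_item(r, n_categories) for r in items)
           for items in responses_by_student]
    mu, sd = mean(raw), stdev(raw)  # sample SD is an assumption; the notes do not specify
    return [(x - mu) / sd for x in raw]

# Hypothetical responses: four students, three component items each.
toy_responses = [[5, 4, 4], [3, 3, 2], [1, 2, 2], [4, 5, 5]]
scores = composite_scores(toy_responses)
```

By construction the resulting scores have mean 0 and standard deviation 1 in the sample they are computed on, so the ITT and LATE entries in Table 4 can be read in standard deviation units.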


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                            (1)      (2)      (3)
Science Skill                      -0.19    0.09     0.23
                                            (0.06)   (0.16)
                                            [0.15]   [0.14]
STEM Interest                      0.62     0.04     0.09
                                            (0.02)   (0.07)
                                            [0.16]   [0.16]
Confidence in College Science      0.92     -0.04    -0.10
                                            (0.02)   (0.05)
                                            [0.11]   [0.06]
Stress                             0.12     0.07     0.17
                                            (0.03)   (0.07)
                                            [0.02]   [0.01]
Grades in Science Courses          2.80     -0.12    -0.29
                                            (0.07)   (0.16)
                                            [0.08]   [0.07]
Grades in Other Courses            3.14     -0.07    -0.18
                                            (0.02)   (0.06)
                                            [0.00]   [0.00]
Number of Observations             1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
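As a rough check on how the ITT and LATE columns of Table 5 relate, the sketch below applies the simple Wald ratio: the ITT effect divided by the first-stage effect of the offer on AP course enrollment. It assumes the table entries are read with two decimals (ITT of 0.09, -0.04, and 0.07; first stage of 0.39 for survey respondents in Table 3) and ignores covariates, fixed effects, and weights, so it only approximates the reported regression estimates. The function name `wald_late` is hypothetical.

```python
def wald_late(itt, first_stage):
    """Wald ratio: intent-to-treat effect divided by the first-stage effect."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage

# Values read from Tables 3 and 5 (survey respondents), assuming two decimals.
FIRST_STAGE = 0.39
ITT = {"science_skill": 0.09, "confidence": -0.04, "stress": 0.07}

# Approximate LATE for each outcome, rounded as in the tables.
late = {name: round(wald_late(effect, FIRST_STAGE), 2) for name, effect in ITT.items()}
```

Small gaps between these ratios and the LATE column are to be expected, since the inputs are rounded and the reported estimates are regression-adjusted.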

Table 6

Robustness Checks of Main ITT Results

Column (1): Control Group Complier Mean. Column (2): Main Results. Column (3): Robust SE. Column (4): p-value (permutation test). Column (5): Excluding High School 56. Column (6): Including Imputation of Missing Outcome Variables. Column (7): Excluding Covariates. Column (8): Excluding High School 23. Column (9): Lee Lower Bound. Column (10): Lee Upper Bound. Column (11): 95% Confidence Interval from Lee Bounds. Column (12): Ratio of 95% CI in (11) to 95% CI in (7).

Outcome                        (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)             (12)
Science Skill                  -0.19    0.09                       0.10     0.11     0.20     0.07     0.03     0.39     -0.09 to 0.51    2.0
                                        (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                        [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                  0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12     -0.03 to 0.18    1.9
                                        (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                        [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science  0.92     -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06    0.05     -0.09 to 0.10    2.0
                                        (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                        [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                         0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11     -0.05 to 0.15    1.6
                                        (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                        [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses      2.80     -0.12                      -0.06    -0.10    -0.07
                                        (0.07)   (0.04)            (0.00)   (0.00)   (0.00)
                                        [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses        3.14     -0.07                      -0.07    -0.06    -0.03
                                        (0.02)   (0.03)            (0.00)   (0.00)   (0.00)
                                        [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]

Remaining columns for the two grades rows: not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
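The permutation test described for column (4) can be illustrated with a minimal sketch. It makes simplifying assumptions that differ from the paper's procedure: the effect is a raw difference in means rather than a regression estimate, and the pseudo treatment is shuffled across all observations rather than within School x Cohort cells. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)

def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # randomly reassign the pseudo treatment
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm

# Hypothetical data: five treated and five control observations.
toy_outcomes = [0.4, 0.1, 0.9, 0.3, 0.7, 0.2, 0.0, 0.5, 0.1, 0.3]
toy_treated = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
p_value = permutation_p_value(toy_outcomes, toy_treated)
```

Shuffling the labels keeps the number of pseudo-treated units fixed, so each draw is a valid reassignment under the null of no effect.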


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (US Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board review each syllabus and grant permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of

learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students’ confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends to some extent on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of

students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools, we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys

during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a

probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum with more homework than non-AP complier students Treatment group

students are also more likely to report that the students in their class were driven to succeed and

that the teacher set high standards The AP science class also involved more student-led projects

or experiments hands on learning and small group work all activities that are deemed to be

essential to an inquiry-based classroom (Bennett et al 2010 National Research Council 2012)

Yet we do not find strong evidence that students in AP classes were more likely to present what

they learned apply their knowledge to solve a new problem or work independently and none of

the component measures of technology usage were statistically significantly affected Nor did

33

treatment teachers rely less on lecturing and multiple choice quizzes Thus teachers appear

better able to implement the academic rigor expected of an AP science class than some of the

inquiry-based approaches that the College Board intends for AP science We do not find

evidence that taking AP science led students to be more likely to report that they found their

course more interesting which may reflect the inability of the teachers to fully implement a

creative inquiry-based environment 25 In addition to the teacher spillover effects peer interactions could cause contamination effects

that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg correction for multiple comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
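As an illustration of the Benjamini-Hochberg correction mentioned in note 29, the following is a minimal sketch of the step-up procedure. The function name and the example p-values are hypothetical and are not taken from the study's results.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (illustrative only).
example_p = [0.001, 0.008, 0.012, 0.20, 0.35, 0.60]
print(benjamini_hochberg(example_p))
```

Because the procedure is step-up, a hypothesis with a p-value above its own threshold can still be rejected when a larger p-value meets its threshold.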


conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a STEM degree, should they enroll in college, by 9 percentage points, up from a control group complier mean of 62 percent, with again more suggestive than definitive results at traditional levels of statistical inference (p-value = 0.16).

Table 5 provides stronger evidence of negative treatment effects on students’ confidence in their ability to succeed in a college science course. Among control group compliers, 92 percent express that they are at least somewhat confident in their ability to succeed in a college science course. These high levels of confidence are perhaps not surprising, since all of our sample participants demonstrated interest in taking AP Chemistry or Biology by signing the study assent forms. Taking AP science substantially lowered participants’ likelihood of being at least somewhat confident in their ability to complete college courses in science (down 10 percentage points, p-value = 0.06). We also find large effects of the AP course on students’ self-reported stress levels. Among control group compliers, 12 percent stated that their most recent science class had a negative or strong negative impact on their stress levels (where a negative impact indicates more stress). Taking AP science more than doubles this rate, raising the likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results available from the authors, we also examine the effect of taking AP on the full distribution of students’ self-reported confidence and stress levels. We find that taking AP science increases students’ likelihood of reporting strong negative impacts on stress by 5 percentage points (p-value = 0.05) above the control group complier mean of 2 percent.

In addition to experiencing a loss in confidence and an increase in stress, treatment group students saw their grades suffer. We estimate that taking AP science reduced students’ grades in their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of 2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that use a weighted grade point average to upweight grades from AP courses. The last row of Table 5 provides our estimated effects of AP science on students’ grades in other courses. AP science takers score approximately 0.18 grade points lower than control group compliers in non-science courses during the study year (p-value below 0.01). These results suggest that students may be shifting their effort away from their non-AP classes in order to meet the demands of the challenging AP course. An average of these impacts, weighted by students’ share of credits in science during the study year assuming that they take AP science (0.24), suggests that taking AP science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 × 0.76)).
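The weighted average above can be checked with a few lines of arithmetic. This is only a sketch using the rounded figures quoted in the text; the variable names are illustrative.

```python
# Rounded point estimates as quoted in the text (variable names are illustrative).
effect_science = -0.29  # estimated change in science course grades
effect_other = -0.18    # estimated change in grades in other courses
share_science = 0.24    # share of credits in science, assuming AP science is taken

# Credit-weighted average of the two grade effects.
overall_effect = (effect_science * share_science
                  + effect_other * (1 - share_science))
print(round(overall_effect, 2))  # -0.21
```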

With our estimates in hand, we can easily compute the adjustment that would leave the student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry as a result of this experiment, the share of their classes in any AP science subject is predicted to be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices documented in Klopfenstein and Lively (2016).²⁷
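The offsetting boost is the overall GPA loss divided by the share of credits that would receive the boost. The sketch below uses a hypothetical helper and the rounded inputs from the text. With those rounded inputs the ratio is 1.5, so the 1.46 reported above presumably rests on unrounded estimates, which is an assumption on our part.

```python
def offsetting_grade_boost(overall_gpa_loss, ap_credit_share):
    """Grade-point boost per AP course needed to offset an overall GPA loss.

    Hypothetical helper: both arguments are positive magnitudes.
    """
    if ap_credit_share <= 0:
        raise ValueError("ap_credit_share must be positive")
    return overall_gpa_loss / ap_credit_share


# Rounded inputs from the text: a 0.21 GPA loss and a 14 percent AP science share.
boost = offsetting_grade_boost(0.21, 0.14)

# Sanity check: applying the boost to 14 percent of credits restores the loss.
restored = boost * 0.14
print(boost, restored)
```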

C. Robustness Checks

Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes. The first two columns of this table repeat the findings previously shown in Table 5. Columns (3) and (4) present alternate methods for inference. Column (3) reports robust standard errors, and Column (4) reports the results of a permutation test, in which we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.²⁹
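The logic of the permutation test can be sketched as follows. This is a deliberately simplified illustration, not the study's procedure: it uses a raw difference in means rather than a regression with covariates and school-by-cohort fixed effects, and it shuffles treatment labels across the whole sample rather than within randomization blocks. All names are hypothetical.

```python
import random


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-treatment draws whose absolute effect exceeds the
    absolute value of the effect estimated under the actual assignment."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    pseudo = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pseudo)  # reassign a pseudo treatment at random
        if abs(diff_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations
```

Shuffling the labels keeps the number of treated units fixed, so each draw is a valid reassignment under the null hypothesis of no effect.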

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply-imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all of the $X_i$ covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 SD), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution is warranted in evaluating the effects on outcomes based on the study survey.³¹
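The trimming step of the Lee (2009) bounds can be sketched as follows. This is a minimal, unweighted illustration that assumes the treatment group responds at a higher rate than the control group. It omits covariates, survey weights, and the Imbens and Manski (2004) confidence intervals, and the function and argument names are hypothetical.

```python
def lee_bounds(treat_outcomes, control_outcomes, n_treat_assigned, n_control_assigned):
    """Trimming bounds on a treatment effect under differential response.

    treat_outcomes and control_outcomes hold outcomes for respondents only;
    the n_*_assigned arguments count everyone assigned to each group.
    """
    rate_t = len(treat_outcomes) / n_treat_assigned
    rate_c = len(control_outcomes) / n_control_assigned
    if rate_t < rate_c:
        raise ValueError("this sketch assumes the treatment group responds more often")
    # Share of treatment respondents to trim so response rates match.
    trim_share = (rate_t - rate_c) / rate_t
    n_trim = int(round(trim_share * len(treat_outcomes)))
    ordered = sorted(treat_outcomes)
    control_mean = sum(control_outcomes) / len(control_outcomes)
    kept_low = ordered[:len(ordered) - n_trim]  # highest values trimmed
    kept_high = ordered[n_trim:]                # lowest values trimmed
    lower = sum(kept_low) / len(kept_low) - control_mean
    upper = sum(kept_high) / len(kept_high) - control_mean
    return lower, upper


# Hypothetical data: 8 of 10 treated and 6 of 10 controls respond.
lower, upper = lee_bounds([1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 6, 7], 10, 10)
print(lower, upper)
```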

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression, conditional on $X_i$, with science skill as the outcome. We find that the point estimates at every quantile are not significantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals fail to rule out large positives and negatives. Additional heterogeneity results can be found in the Online Appendix.³²

VI. Conclusion

Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools already offered AP science classes, and they tended to be in relatively higher-income communities disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest above the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.³³ We also find that AP science classes substantially increase students’ stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult, given that few schools document per-course expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course assignments offers a partial cost analysis. Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation’s high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance on college admissions decisions, as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should you Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U Stress-Performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.

Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research & Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.

Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11, First Look” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, DC, and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F. O. Gabrieli, and John D. E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.

Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the standardized socioeconomic status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single-mother-headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.

Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school-by-cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.

Table 1
Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                      Participating   Others
Average Enrollment                                    1,409           723
Free or Reduced-Price Lunch                           0.700           0.438
Asian                                                 0.055           0.050
Black                                                 0.349           0.154
Hispanic                                              0.410           0.221
White                                                 0.164           0.537
Adjusted Cohort Graduation Rate                       0.843           0.802
District Instruction Expenditures Per Pupil           $6,561          $5,636
District Student Services Expenditures Per Pupil      $3,787          $3,385

Panel B: Teachers                                     Participating   Others
Age: Under 30                                         0.407           0.160
Age: 30-49                                            0.432           0.553
Age: 50 or over                                       0.161           0.287
Female                                                0.630           0.536
Hispanic or Latino                                    0.111           0.051
Race: American Indian or Alaska Native                0.000           0.009
Race: Asian American                                  0.111           0.041
Race: Black                                           0.111           0.060
Race: Native Hawaiian or other Pacific Islander       0.000           0.004
Race: White                                           0.778           0.896
Years of Experience                                   10.3            13.2
Years of Experience <= 2                              0.290           0.085
Years of Experience <= 5                              0.481           0.234
Hold a Teaching Certificate                           0.926           0.945
Undergraduate Major in STEM                           0.944           0.747
Single Subject Credential in Science                  0.630           0.823
Master’s Degree or Higher                             0.356           0.615
Previously Taught AP Course                           0.469           NA
Previously Taught AP, IB, or Honors Course            0.796           NA
Number of Professional Development Trainings
  in the Past 5 Years (0-5)                           3.09            NA

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and the 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.

Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3) use the full sample; columns (4)-(6) use the survey sample.
Columns (1) and (4): Control Group Mean.
Columns (2) and (5): Difference Between Treated and Controls.
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers.

Pre-Treatment Characteristic                     (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                 16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                 0.38     0.08     0.25     0.44     0.07     0.30
                                                        (0.04)   (0.10)            (0.05)   (0.16)
                                                        [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                              0.29     0.10     0.18     0.36     0.09     0.17
                                                        (0.03)   (0.12)            (0.04)   (0.17)
                                                        [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                          3.16     0.05     0.20     3.23     0.06     0.13
                                                        (0.03)   (0.08)            (0.03)   (0.10)
                                                        [0.14]   [0.02]            [0.06]   [0.20]
Female                                          0.59     0.00     0.10     0.61    -0.01     0.11
                                                        (0.03)   (0.06)            (0.04)   (0.07)
                                                        [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                  0.12     0.02     0.10     0.12     0.03     0.10
                                                        (0.02)   (0.05)            (0.01)   (0.07)
                                                        [0.27]   [0.06]            [0.07]   [0.12]
Black                                           0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                        (0.02)   (0.06)            (0.02)   (0.05)
                                                        [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial       0.31     0.01     0.05     0.33     0.01     0.05
                                                        (0.02)   (0.06)            (0.02)   (0.07)
                                                        [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                        0.02     0.00    -0.01     0.01     0.00    -0.01
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                          0.13     0.03     0.00     0.14     0.02     0.01
                                                        (0.02)   (0.05)            (0.02)   (0.09)
                                                        [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                        0.05     0.01     0.02     0.04     0.01     0.04
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch        0.53     0.01     0.02     0.51     0.01     0.07
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home      0.34     0.02     0.03     0.35     0.01     0.04
                                                        (0.02)   (0.07)            (0.02)   (0.07)
                                                        [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses           0.79     0.00     0.09     0.79     0.02     0.05
                                                        (0.02)   (0.04)            (0.02)   (0.05)
                                                        [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                         1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

                                           Full Sample                  Survey Respondents
                                     (1)       (2)      (3)       (4)       (5)      (6)
                                   Control                       Control
                                    Group                         Group
Outcome                             Mean      ITT      LATE       Mean      ITT      LATE
AP Treatment Course Enrollment      0.19      0.38                0.24      0.39
                                             (0.05)                        (0.06)
                                             [0.00]                        [0.00]
Share of Credits During Study Year in:
AP Science                          0.03      0.04     0.11       0.03      0.04     0.10
                                             (0.01)   (0.01)               (0.01)   (0.01)
                                             [0.00]   [0.00]               [0.00]   [0.00]
All AP                              0.13      0.04     0.11       0.14      0.04     0.10
                                             (0.01)   (0.02)               (0.01)   (0.02)
                                             [0.00]   [0.00]               [0.00]   [0.00]
Other Advanced Science              0.06     -0.01    -0.02       0.06     -0.01    -0.02
                                             (0.01)   (0.02)               (0.01)   (0.02)
                                             [0.24]   [0.23]               [0.20]   [0.20]
All Other Advanced                  0.25     -0.01    -0.03       0.25     -0.01    -0.03
                                             (0.01)   (0.02)               (0.01)   (0.03)
                                             [0.23]   [0.23]               [0.30]   [0.30]
Regular Science                     0.06     -0.01    -0.02       0.06     -0.01    -0.02
                                             (0.01)   (0.02)               (0.01)   (0.02)
                                             [0.24]   [0.20]               [0.24]   [0.19]
All Regular                         0.62     -0.03    -0.09       0.61     -0.03    -0.07
                                             (0.01)   (0.03)               (0.01)   (0.03)
                                             [0.02]   [0.00]               [0.07]   [0.03]
Number of Observations             1,819                         1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 4

Treatment Contrast (Composite Variables)

                                                      (1)      (2)      (3)
                                                  Control
                                                    Group
                                                 Complier
Outcome                                              Mean      ITT     LATE

Academically Challenging Curriculum                 -0.33     0.31     0.80
                                                            (0.10)   (0.24)
                                                            [0.00]   [0.00]
Project-Based Independent Classroom Activities      -0.06     0.13     0.33
                                                            (0.07)   (0.17)
                                                            [0.07]   [0.06]
Integrated Use of Technology                        -0.11     0.11     0.28
                                                            (0.08)   (0.19)
                                                            [0.19]   [0.14]

Number of Observations                              1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories were evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
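The composite construction described in these notes can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names, the integer coding of the five response categories as 0 (lowest) through 4 (highest), and the use of the population standard deviation are all assumptions, and the survey weights and any handling of missing items are left out.

```python
from statistics import mean, pstdev


def rescale(level, n_levels):
    """Map an ordinal category 0..n_levels-1 (lowest..highest) onto [0, 1], evenly spaced."""
    return level / (n_levels - 1)


def composite_scores(responses, n_levels=5):
    """responses: one list of ordinal item answers per student (hypothetical layout).

    Returns one standardized composite (mean 0, SD 1 across students) per student.
    """
    # Average the rescaled component items within each student.
    averages = [mean(rescale(x, n_levels) for x in row) for row in responses]
    # Standardize across students (population SD is an assumption of this sketch).
    mu, sd = mean(averages), pstdev(averages)
    return [(a - mu) / sd for a in averages]
```

With this coding, a five-category item maps to 0.0, 0.25, 0.5, 0.75, and 1.0 before averaging.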


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                                      (1)      (2)      (3)
                                                  Control
                                                    Group
                                                 Complier
Outcome                                              Mean      ITT     LATE

Science Skill                                       -0.19     0.09     0.23
                                                            (0.06)   (0.16)
                                                            [0.15]   [0.14]
STEM Interest                                        0.62     0.04     0.09
                                                            (0.02)   (0.07)
                                                            [0.16]   [0.16]
Confidence in College Science                        0.92    -0.04    -0.10
                                                            (0.02)   (0.05)
                                                            [0.11]   [0.06]
Stress                                               0.12     0.07     0.17
                                                            (0.03)   (0.07)
                                                            [0.02]   [0.01]
Grades in Science Courses                            2.80    -0.12    -0.29
                                                            (0.07)   (0.16)
                                                            [0.08]   [0.07]
Grades in Other Courses                              3.14    -0.07    -0.18
                                                            (0.02)   (0.06)
                                                            [0.00]   [0.00]

Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had a strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during the study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
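As a rough arithmetic check on how the ITT and LATE columns relate, the sketch below assumes the LATE is the usual instrumental-variables ratio of the ITT estimate to the first-stage effect of the randomized offer on AP enrollment. The symbol for the first stage and its value (0.39 for survey respondents, read from Table 3 with decimal points restored) are assumptions made for illustration. Small gaps relative to the reported LATE values would be expected because the inputs are rounded to two decimals.

```latex
\[
\widehat{\mathrm{LATE}} \;=\; \frac{\widehat{\mathrm{ITT}}}{\widehat{\pi}},
\qquad
\text{Science Skill: } \frac{0.09}{0.39} \approx 0.23,
\qquad
\text{Stress: } \frac{0.07}{0.39} \approx 0.18 \ \ (\text{reported: } 0.17).
\]
```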

Table 6

Robustness Checks of Main ITT Results

Columns:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome

Science Skill -019 009 010 011 020 007 003 039

-

009

05

1 20

(006) (005) (000) (000) (000) (000) (007) (007)

[015] [006] [006] [020] [011] [001] [024] [072] [000]

STEM Interest 062 004 005 003 003 003 002 012

-

003

01

8 19

(002) (003) (000) (000) (000) (000) (003) (004)

[016] [019] [020] [009] [029] [027] [019] [060] [000] Confidence in College

Science 092 -004 -003 -006 -006 -004 -006 005

-

009

01

0 20

(002) (002) (000) (000) (000) (000) (002) (003)

[011] [005] [007] [037] [002] [003] [010] [000] [017]

Stress 012 007 005 006 008 007 001 011

-

005

01

5 16

(003) (002) (000) (000) (000) (000) (003) (002)

[002] [000] [000] [014] [007] [002] [002] [079] [000]

Grades in Science Courses 280 -012 -006 -010 -007 |

(007) (004) (000) (000) (000)

[008] [001] [001] [031] [016] [030] Not applicable as grades are based on transcripts

Grades in Other Courses 314 -007 -007 -006 -003 rather than student survey

(002) (003) (000) (000) (000) |

[000] [001] [001] [000] [001] [038]

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP Biology and AP Chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee bounds (applying Imbens and Manski's (2004) method to derive a confidence interval for the treatment effect itself).


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer

an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of

learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students' confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends to some extent on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and

Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of

students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side

Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master's degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last

pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study's Principal Investigator personally handled the randomization of offers of

enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly

offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a

probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student i in school by cohort j completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we

observe each outcome variable. This follows a "multiple imputation, then deletion" strategy suggested by von Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects

that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg correction for multiple comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
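The Benjamini-Hochberg correction mentioned in note 29 can be illustrated with a short Python sketch of the standard step-up procedure. The function name and the example p-values in the usage note are hypothetical; they are not the study's estimates, and nothing here is taken from the authors' code.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at false discovery rate alpha."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

For example, `benjamini_hochberg([0.001, 0.04, 0.03, 0.5])` rejects only the first hypothesis, because 0.03 exceeds its threshold of 2/4 x 0.05 = 0.025 and no larger p-value falls under its own threshold.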


Column (4) reports the results of a permutation test where we randomly assign a pseudo treatment and compute the share of 1,000 permutations where the absolute value of the estimated pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in Column (2).[28] The resulting p-values from this permutation test are similar to the results using robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values of less than 0.10.[29]
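The logic of the permutation test can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the effect is a raw difference in means rather than the regression-adjusted estimate from Equation (1), pseudo treatment is reshuffled across the whole sample rather than within School x Cohort cells, and the function names are hypothetical.

```python
import random


def mean_diff(outcomes, treated):
    """Difference in mean outcome between treated and untreated units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(mean_diff(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        # Pseudo treatment: same number of treated units, randomly reassigned.
        rng.shuffle(labels)
        if abs(mean_diff(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm
```

Because the comparison is strict, a pseudo assignment that merely ties the observed effect does not count toward the p-value, which mirrors the "exceeds" wording above.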

Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one high school that offered both AP Biology and AP Chemistry as part of the study, (b) including observations with multiply-imputed missing outcome variables, and (c) excluding the high school with the lowest survey response rate.[30] Column (8) shows the results when we exclude all of the Xi covariates, where we find much larger estimated positive effects on scientific inquiry skills and smaller estimated negative effects on grades. The differences in the treatment effects on the remaining three outcomes are modest. These results likely reflect the fact that students who were randomly assigned into the treatment group have higher pre-treatment grades and reading and math test scores, all covariates that strongly correlate with science skill and future grades.

Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our estimates due to potential nonresponse bias in the student survey used for the first four outcomes. This method trims particular observations from the treatment group (in this case) until it matches the response rate of the control group. The lower (upper) bound estimate trims the treatment observations with the highest (lowest) values of the outcome. Using these lower and upper bound estimates, we compute the 95 percent confidence interval for the treatment effect itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower and upper bound point estimates are positive for science skill (0.03 and 0.39 SD), interest in pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points). However, the 95 percent confidence intervals include zero in all cases and are roughly double the size of the ordinary confidence intervals. These results suggest that some additional caution is warranted in evaluating the effects on outcomes based on the study survey.[31]
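The trimming step behind the Lee bounds can be sketched in Python. This is a minimal version under stated assumptions: it compares unadjusted means (no covariates, fixed effects, or survey weights), it assumes the treatment group responds at the higher rate, it rounds the number of retained treatment observations to the nearest integer, and it omits the Imbens and Manski confidence interval. The function and argument names are hypothetical.

```python
def lee_bounds(treated_outcomes, control_outcomes, n_treated, n_control):
    """Lower and upper Lee-style bounds on the treatment effect.

    treated_outcomes / control_outcomes: outcomes observed for survey respondents.
    n_treated / n_control: number of students assigned to each arm, including nonrespondents.
    """
    n_t_resp, n_c_resp = len(treated_outcomes), len(control_outcomes)
    # Compare response rates without floating-point division.
    if n_t_resp * n_control < n_c_resp * n_treated:
        raise ValueError("sketch assumes the treatment group has the higher response rate")
    # Number of treated respondents to keep so both arms have the same response rate.
    keep = round(n_c_resp * n_treated / n_control)
    n_trim = n_t_resp - keep
    ordered = sorted(treated_outcomes)
    control_mean = sum(control_outcomes) / n_c_resp
    lower = sum(ordered[:keep]) / keep - control_mean    # trim the highest outcomes
    upper = sum(ordered[n_trim:]) / keep - control_mean  # trim the lowest outcomes
    return lower, upper
```

With 8 of 10 treated students and 6 of 10 control students responding, two treated respondents are trimmed from the top for the lower bound and two from the bottom for the upper bound.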

Finally, we would have liked to report the results of theoretically motivated heterogeneity analyses, yet we lack the statistical power needed to test heterogeneity with a high level of confidence. For example, Figure 3 shows a quantile regression, conditional on Xi, with science skill as the outcome. We find that the point estimates at every quantile are not significantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95% confidence intervals fail to rule out large positive and negative effects. Additional heterogeneity results can be found in the Online Appendix.[32]

VI. Conclusion

Most admissions committees at bachelor's degree-granting institutions rely on applicants' AP course and exam participation as signals of subject-matter skill and interest, rendering the relationship between AP uptake and college enrollment somewhat deterministic. There has been almost no empirical work to support the theory that AP disproportionately endows high school students with greater human capital than the other courses available to them. Many students, educators, and parents have also complained that the rigor of the AP program causes students to lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with experimental evidence on the impact of AP Biology and Chemistry courses on students' skills, interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools already offered AP science classes, and they tended to be in relatively higher-income communities disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest above the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion, as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.[33] We also find that AP science classes substantially increase students' stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult given that few schools document per-course expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation's high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance on college admissions decisions, as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. "When Should You Adjust Standard Errors for Clustering?" NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. "Psychological Insights for Improved Physics Teaching." Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. "The Effects of High School Curriculum on Education and Labor Market Outcomes." The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. "Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship." Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. "Raising the Bar: Curricular Intensity and Academic Performance." Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. "Shifting College Majors in Response to Advanced Placement Exam Scores." Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. "Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching." International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. "Demoting Advanced Placement." The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. "Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students' Accuracy of Feeling of Confidence." Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. "Playing the Admissions Game: Student Reactions to Increasing College Competition." The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. "Maryland Schools have been Leader in Advanced Placement, but Results are Mixed." The Baltimore Sun, August 17.

Bush, George W. 2006. "State of the Union Address by the President." Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. "Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries." Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. "Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects." Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.


Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research and Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, Guido W., and Charles F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates.” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures, Version 1.0.” Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.

Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11, First Look.” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.

Figure 1

Geographic Distribution of Participating Districts

Figure 2

Participating Districts’ Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.

Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.

Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                                          Participating   Others
Average Enrollment                                                        1,409           723
Free or Reduced-Price Lunch                                               0.700           0.438
Asian                                                                     0.055           0.050
Black                                                                     0.349           0.154
Hispanic                                                                  0.410           0.221
White                                                                     0.164           0.537
Adjusted Cohort Graduation Rate                                           0.843           0.802
District Instruction Expenditures Per Pupil                               $6,561          $5,636
District Student Services Expenditures Per Pupil                          $3,787          $3,385

Panel B: Teachers                                                         Participating   Others
Age: Under 30                                                             0.407           0.160
Age: 30-49                                                                0.432           0.553
Age: 50 or over                                                           0.161           0.287
Female                                                                    0.630           0.536
Hispanic or Latino                                                        0.111           0.051
Race: American Indian or Alaska Native                                    0.000           0.009
Race: Asian American                                                      0.111           0.041
Race: Black                                                               0.111           0.060
Race: Native Hawaiian or other Pacific Islander                           0.000           0.004
Race: White                                                               0.778           0.896
Years of Experience                                                       10.3            13.2
Years of Experience <= 2                                                  0.290           0.085
Years of Experience <= 5                                                  0.481           0.234
Hold a Teaching Certificate                                               0.926           0.945
Undergraduate Major in STEM                                               0.944           0.747
Single Subject Credential in Science                                      0.630           0.823
Master’s Degree or Higher                                                 0.356           0.615
Previously Taught AP Course                                               0.469           NA
Previously Taught AP, IB, or Honors Course                                0.796           NA
Number of Professional Development Trainings in the Past 5 Years (0-5)    3.09            NA

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd/, and EDFacts, https://www2.ed.gov/about/inits/ed/edfacts/index.html. “Others” in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey, https://nces.ed.gov/surveys/sass/. “Others” in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.

Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3) use the Full Sample; columns (4)-(6) use the Survey Sample.
(1), (4) Control Group Mean
(2), (5) Difference Between Treated and Controls
(3), (6) Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                  (1)     (2)     (3)     (4)     (5)     (6)
Age as of October of 11th Grade               16.6    -0.03   -0.07   16.6    -0.01   -0.01
                                                      (0.02)  (0.07)          (0.03)  (0.09)
                                                      [0.19]  [0.35]          [0.65]  [0.94]
Math Exam Score                               0.38    0.08    0.25    0.44    0.07    0.30
                                                      (0.04)  (0.10)          (0.05)  (0.16)
                                                      [0.08]  [0.02]          [0.17]  [0.06]
Reading Exam Score                            0.29    0.10    0.18    0.36    0.09    0.17
                                                      (0.03)  (0.12)          (0.04)  (0.17)
                                                      [0.00]  [0.14]          [0.02]  [0.31]
HS Grade Point Average                        3.16    0.05    0.20    3.23    0.06    0.13
                                                      (0.03)  (0.08)          (0.03)  (0.10)
                                                      [0.14]  [0.02]          [0.06]  [0.20]
Female                                        0.59    0.00    0.10    0.61    -0.01   0.11
                                                      (0.03)  (0.06)          (0.04)  (0.07)
                                                      [0.99]  [0.10]          [0.73]  [0.12]
Asian American                                0.12    0.02    0.10    0.12    0.03    0.10
                                                      (0.02)  (0.05)          (0.01)  (0.07)
                                                      [0.27]  [0.06]          [0.07]  [0.12]
Black                                         0.32    -0.02   -0.06   0.27    0.00    -0.05
                                                      (0.02)  (0.06)          (0.02)  (0.05)
                                                      [0.29]  [0.28]          [0.88]  [0.40]
Hispanic, Native American, or Multiracial     0.31    0.01    0.05    0.33    0.01    0.05
                                                      (0.02)  (0.06)          (0.02)  (0.07)
                                                      [0.55]  [0.41]          [0.81]  [0.51]
Disabled                                      0.02    0.00    -0.01   0.01    0.00    -0.01
                                                      (0.01)  (0.01)          (0.01)  (0.01)
                                                      [0.93]  [0.24]          [0.57]  [0.5]
Gifted                                        0.13    0.03    0.00    0.14    0.02    0.01
                                                      (0.02)  (0.05)          (0.02)  (0.09)
                                                      [0.06]  [1.00]          [0.25]  [0.89]
English Language Learner                      0.05    0.01    0.02    0.04    0.01    0.04
                                                      (0.01)  (0.02)          (0.01)  (0.03)
                                                      [0.41]  [0.39]          [0.54]  [0.22]
Eligible for Free or Reduced-Price Lunch      0.53    0.01    0.02    0.51    0.01    0.07
                                                      (0.02)  (0.07)          (0.03)  (0.09)
                                                      [0.66]  [0.77]          [0.72]  [0.45]
Language Other than English Spoken at Home    0.34    0.02    0.03    0.35    0.01    0.04
                                                      (0.02)  (0.07)          (0.02)  (0.07)
                                                      [0.32]  [0.73]          [0.59]  [0.56]
Took Recommended Prerequisite Courses         0.79    0.00    0.09    0.79    0.02    0.05
                                                      (0.02)  (0.04)          (0.02)  (0.05)
                                                      [0.84]  [0.04]          [0.43]  [0.31]
Number of Observations                        1,819                   1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 3

First Stage: Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3) use the Full Sample; columns (4)-(6) use Survey Respondents.
(1), (4) Control Group Mean
(2), (5) ITT
(3), (6) LATE

Outcome                             (1)     (2)     (3)     (4)     (5)     (6)
AP Treatment Course Enrollment      0.19    0.38            0.24    0.39
                                            (0.05)                  (0.06)
                                            [0.00]                  [0.00]
Share of Credits During Study Year in:
  AP Science                        0.03    0.04    0.11    0.03    0.04    0.10
                                            (0.01)  (0.01)          (0.01)  (0.01)
                                            [0.00]  [0.00]          [0.00]  [0.00]
  All AP                            0.13    0.04    0.11    0.14    0.04    0.10
                                            (0.01)  (0.02)          (0.01)  (0.02)
                                            [0.00]  [0.00]          [0.00]  [0.00]
  Other Advanced Science            0.06    -0.01   -0.02   0.06    -0.01   -0.02
                                            (0.01)  (0.02)          (0.01)  (0.02)
                                            [0.24]  [0.23]          [0.20]  [0.20]
  All Other Advanced                0.25    -0.01   -0.03   0.25    -0.01   -0.03
                                            (0.01)  (0.02)          (0.01)  (0.03)
                                            [0.23]  [0.23]          [0.30]  [0.30]
  Regular Science                   0.06    -0.01   -0.02   0.06    -0.01   -0.02
                                            (0.01)  (0.02)          (0.01)  (0.02)
                                            [0.24]  [0.20]          [0.24]  [0.19]
  All Regular                       0.62    -0.03   -0.09   0.61    -0.03   -0.07
                                            (0.01)  (0.03)          (0.01)  (0.03)
                                            [0.02]  [0.00]          [0.07]  [0.03]
Number of Observations              1,819                   1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
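As a rough check on how the ITT and LATE columns relate, the sketch below applies the Wald ratio, which divides an ITT effect by the first-stage effect of the offer on AP enrollment. It is an illustrative simplification and not the authors' estimation code: the paper estimates variants of Equation (1), the function name `wald_late` is hypothetical, and the inputs assume the table's rounded entries read as 0.04 (ITT on the AP Science share of credits) and 0.38 or 0.39 (first-stage ITT on enrollment).

```python
def wald_late(itt_effect: float, first_stage: float) -> float:
    """Scale an intent-to-treat effect by the first-stage effect on take-up.

    Hypothetical helper for illustration; with a single binary instrument
    this ratio coincides with the usual instrumental-variables estimate.
    """
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage


# Rounded values as read from the table (an assumption about the garbled entries).
late_full_sample = wald_late(itt_effect=0.04, first_stage=0.38)
late_survey_sample = wald_late(itt_effect=0.04, first_stage=0.39)

print(round(late_full_sample, 3))    # about 0.105, near the reported 0.11
print(round(late_survey_sample, 3))  # about 0.103, near the reported 0.10
```

Because the inputs are already rounded to two decimals, the ratio only approximates the reported LATE values.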

Table 4

Treatment Contrast (Composite Variables)

(1) Control Group Complier Mean
(2) ITT
(3) LATE

Outcome                                           (1)     (2)     (3)
Academically Challenging Curriculum               -0.33   0.31    0.80
                                                          (0.10)  (0.24)
                                                          [0.00]  [0.00]
Project-Based Independent Classroom Activities    -0.06   0.13    0.33
                                                          (0.07)  (0.17)
                                                          [0.07]  [0.06]
Integrated Use of Technology                      -0.11   0.11    0.28
                                                          (0.08)  (0.19)
                                                          [0.19]  [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
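The composite construction described in these notes (rescale each ordered response to the unit interval, average within student, then standardize) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names, the five-category scale, the sample responses, and the use of the population standard deviation are all hypothetical choices.

```python
from statistics import mean, pstdev

# Assumed ordering, lowest to highest; the actual component items may differ.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map an ordered category to [0, 1] with evenly spaced steps."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(students):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in responses) for responses in students]
    mu, sd = mean(raw), pstdev(raw)  # population SD is an assumption
    return [(x - mu) / sd for x in raw]


# Hypothetical responses: three students, three component items each.
students = [
    ["agree", "strongly agree", "neutral"],
    ["disagree", "neutral", "neutral"],
    ["strongly agree", "agree", "agree"],
]
z_scores = composite_scores(students)
```

The resulting scores have mean 0 and standard deviation 1 across the students supplied, which is what makes composite effects readable in standard deviation units.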

Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1) Control Group Complier Mean
(2) ITT
(3) LATE

Outcome                           (1)     (2)     (3)
Science Skill                     -0.19   0.09    0.23
                                          (0.06)  (0.16)
                                          [0.15]  [0.14]
STEM Interest                     0.62    0.04    0.09
                                          (0.02)  (0.07)
                                          [0.16]  [0.16]
Confidence in College Science     0.92    -0.04   -0.10
                                          (0.02)  (0.05)
                                          [0.11]  [0.06]
Stress                            0.12    0.07    0.17
                                          (0.03)  (0.07)
                                          [0.02]  [0.01]
Grades in Science Courses         2.80    -0.12   -0.29
                                          (0.07)  (0.16)
                                          [0.08]  [0.07]
Grades in Other Courses           3.14    -0.07   -0.18
                                          (0.02)  (0.06)
                                          [0.00]  [0.00]
Number of Observations            1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
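To illustrate what weighting by the inverse probability of completing the survey does, the sketch below computes weights from response probabilities and uses them in a weighted mean. It assumes the probabilities have already been estimated (the paper obtains them from a probit regression, which is not reproduced here); the function names and the sample values are hypothetical.

```python
def inverse_probability_weights(response_probs):
    """Return 1/p for each estimated probability of completing the survey."""
    weights = []
    for p in response_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
        weights.append(1.0 / p)
    return weights


def weighted_mean(values, weights):
    """Weighted average of the respondents' outcomes."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical respondents: a binary outcome and an assumed response probability.
outcomes = [1.0, 0.0, 1.0]
response_probs = [1.0, 0.5, 0.8]

weights = inverse_probability_weights(response_probs)
ipw_mean = weighted_mean(outcomes, weights)
unweighted_mean = sum(outcomes) / len(outcomes)
```

Respondents who resemble non-respondents (low response probability) count for more, so the weighted mean here is pulled toward the second student's outcome relative to the unweighted mean.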

Table 6

Robustness Checks of Main ITT Results

(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                         (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)            (12)
Science Skill                   -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09 to 0.51   2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                   0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03 to 0.18   1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science   0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09 to 0.10   2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                          0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05 to 0.15   1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses       2.80    -0.12                   -0.06   -0.10   -0.07
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses         3.14    -0.07                   -0.07   -0.06   -0.03
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
Columns (8)-(12) for the two grades outcomes: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Column (8) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
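The permutation test described in these notes can be sketched as below. This is a simplified illustration, not the authors' procedure: it uses a raw difference in means instead of the regression in Equation (1), it reshuffles labels across the whole sample rather than within School x Cohort cells, and the function names, seed, and sample data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # keeps the number of treated units fixed
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical data: treated outcomes are all far above control outcomes,
# so no reshuffled assignment can produce a strictly larger gap.
outcomes = [10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5]
treated = [1] * 6 + [0] * 6
p_value = permutation_p_value(outcomes, treated)
```

A small share means that few random reassignments produce an effect as large as the observed one, which is the sense in which the column (4) p-values are read.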

1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. When a course raises the standard to which students compare themselves, their confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends to some extent on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course and classroom supplies (e.g., lab materials, textbooks) and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i \rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i \hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the

inquiry-based approaches that the College Board intends for AP science We do not find

evidence that taking AP science led students to be more likely to report that they found their

course more interesting which may reflect the inability of the teachers to fully implement a

creative inquiry-based environment 25 In addition to the teacher spillover effects peer interactions could cause contamination effects

that might render our estimated effects smaller A research design with randomization both

across and within schools would allow for estimation of spillover effects but such a design was

infeasible due to the challenge and added cost of recruiting control schools 26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers

received a grade of C (ie 20) or lower in their most recent science class 27 Grade weights may also affect AP participation which is one of the main purposes of the

weights (Klopfenstein and Lively 2016) 28 See Abadie et al (2017) for a discussion of the use of robust versus clustered standard errors

in models with fixed effects 29 To address the possibility of a heightened probability of Type I error given the multiple

outcomes we also apply the Benjamini-Hochberg Correction for Multiple Comparisons

(Benjamini and Hochberg 1995) At a preferred critical p-value of 005 we find that the same

three outcomes that reach statistical significance without applying the correction (shown in

Column (2) of Table 6 remain statistically significant after applying the correction 30 This school had a 27 percent response rate partially as a result of all of the completed surveys

from the first cohort of administration being lost in transit after they were completed 31 We are probably also a bit overly conservative in computing the Lee bounds estimates as we

have included the students from cohort 1 of high school number 23 where nonresponse was due

mostly to the surveys being lost after administration 32 We tested for heterogeneity in treatment effects along seven student and teacher attributes

(including student prior academic preparation raceethnicity gender and teacher preparation)

We also conducted ITT quantile regressions for the continuous outcomes (science skill grades in

science and grades in other courses) Some of the differences in the point estimates were quite

large yet so too were the standard errors For instance five of the seven estimated differential

treatment effects on science skill exceed 025 standard deviations with p-values that fall in the

suggestive (016) to noisy (034) range 33 We are currently in the process of gathering records from the National Student Clearinghouse

on all three cohorts of study participants Once data collection is complete we will have the

ability to examine the effect of AP science on college enrollment college selectivity and college

completion
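The Benjamini-Hochberg step-up procedure cited in note 29 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study's tables.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if its null is rejected under BH.

    Hypothetical helper for illustration; controls the false discovery
    rate at level `alpha` (Benjamini and Hochberg 1995).
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (not the study's estimates).
example = [0.001, 0.012, 0.020, 0.150, 0.300, 0.600]
print(benjamini_hochberg(example))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ordering meets its threshold.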


interests and beliefs. We recruited 23 schools that had not previously offered AP Biology or Chemistry and were willing to permit us to randomize student access to the newly offered course. At the time of our school recruitment, an estimated 50 percent of US high schools already offered AP science classes, and they tended to be in relatively higher-income communities disproportionately serving White students (Malkus 2016). Our study drew from the remaining population of schools, where teachers had lower levels of training than science teachers nationally and students were disproportionately non-White and poor. Consequently, our results on AP impacts best generalize to schools like these that are on the cusp of deciding whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM interest above the courses that these students would otherwise take. Prior research points to longer-run benefits of AP, including a higher likelihood of college enrollment and completion, as well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term effects are at least partially driven by genuine increases in skill and not due solely to postsecondary admissions and credit-granting policies.33 We also find that AP science classes substantially increase students’ stress levels and reduce their confidence in completing a college science course. Students who take AP science also receive lower grades in science and in other (non-science) courses. The cognitive gains from AP science are consistent with evidence that higher levels of pressure and a lower level of confidence cause students to learn more than they would otherwise. And some of the negative effect on grades can be offset by upwardly weighting grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for students or society, our evidence suggests that schools and districts are not making unwise or costly investments in AP. Calculating the differential cost to deliver an AP course versus another level course in the same subject is difficult given that few schools document per-course expenditures. One recent analysis of a US district that relied on teacher salaries and course assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more senior teachers in AP. This cost does not factor in the time that teachers spend retraining themselves to teach the new curriculum. At the same time, relative to other policies aimed at increasing human capital in high school that are often more costly to implement (such as reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered in the majority of the nation’s high schools and used by most postsecondary institutions to assess applicant potential. Our findings offer evidence to support and refute some of the claims made about the AP program. At the same time, many important questions remain about differential AP course impacts along student, teacher, and school attributes and on different parts of the outcome distributions. What are the general equilibrium effects of AP expansion, for instance, on college admissions decisions as AP expands into schools with fewer resources? Do AP courses generate spillover effects on non-AP course-takers via changes in peer interactions and changes in how teachers teach their non-AP classes? These are all questions that warrant further research.


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U Stress-Performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.

Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education Research & Occasional Paper Series: CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University. Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.

Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F. O. Gabrieli, and John D. E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other US High Schools and High School Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age: Under 30                                       0.407           0.160
Age: 30-49                                          0.432           0.553
Age: 50 or over                                     0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <= 2                            0.290           0.085
Years of Experience <= 5                            0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and the 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3) use the Full Sample; columns (4)-(6) use the Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3) use the Full Sample; columns (4)-(6) use Survey Respondents.
(1), (4): Control Group Mean
(2), (5): ITT
(3), (6): LATE

Outcome                                   (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment            0.19     0.38              0.24     0.39
                                                   (0.05)                     (0.06)
                                                   [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                              0.03     0.04     0.11     0.03     0.04     0.10
                                                   (0.01)   (0.01)            (0.01)   (0.01)
                                                   [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                  0.13     0.04     0.11     0.14     0.04     0.10
                                                   (0.01)   (0.02)            (0.01)   (0.02)
                                                   [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                  0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                   (0.01)   (0.02)            (0.01)   (0.02)
                                                   [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                      0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                   (0.01)   (0.02)            (0.01)   (0.03)
                                                   [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                         0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                   (0.01)   (0.02)            (0.01)   (0.02)
                                                   [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                             0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                   (0.01)   (0.03)            (0.01)   (0.03)
                                                   [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                    1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
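As a rough check on how the ITT and LATE columns relate, the sketch below applies the simple Wald ratio: the ITT effect on an outcome divided by the ITT effect on AP course enrollment (the first stage). This is an approximation under stated assumptions, not the authors' procedure, since the reported LATE estimates come from regressions with covariates and School x Cohort fixed effects. The function name `wald_late` is hypothetical, and the inputs read the full-sample AP Science ITT as 0.04 and the first stage as 0.38.

```python
def wald_late(itt_outcome, itt_takeup):
    """Wald ratio: ITT on the outcome divided by ITT on take-up.

    Hypothetical helper for illustration only; it ignores covariates,
    fixed effects, and weights used in the paper's regressions.
    """
    if itt_takeup == 0:
        raise ValueError("The first stage is zero, so the ratio is undefined.")
    return itt_outcome / itt_takeup


# Assumed inputs: AP Science share of credits (ITT) and AP enrollment (first stage).
late = wald_late(0.04, 0.38)
print(round(late, 2))  # 0.11
```

The ratio is sensitive to rounding in the published ITT values, so small discrepancies from the reported LATE column are to be expected for other rows.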


Table 4

Treatment Contrast (Composite Variables)

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                          (1)      (2)      (3)
Academically Challenging Curriculum              -0.33    0.31     0.80
                                                          (0.10)   (0.24)
                                                          [0.00]   [0.00]
Project-Based Independent Classroom Activities   -0.06    0.13     0.33
                                                          (0.07)   (0.17)
                                                          [0.07]   [0.06]
Integrated Use of Technology                     -0.11    0.11     0.28
                                                          (0.08)   (0.19)
                                                          [0.19]   [0.14]
Number of Observations                           1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
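The composite-variable construction described in these notes can be sketched as below. This is an illustration under assumptions rather than the authors' code: the category labels, the helper names `rescale` and `composite_scores`, and the example responses are hypothetical, and the sketch assumes that components are averaged within student first and then standardized with the population standard deviation (the notes do not specify either choice).

```python
from statistics import mean, pstdev

# Assumed ordering of response categories, lowest to highest.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map a category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled components, then standardize across students."""
    raw = [mean(rescale(r) for r in responses) for responses in responses_by_student]
    mu, sigma = mean(raw), pstdev(raw)
    return [(x - mu) / sigma for x in raw]


# Hypothetical survey responses for three students on two component items.
students = [
    ["agree", "strongly agree"],
    ["neutral", "disagree"],
    ["strongly disagree", "neutral"],
]
print(composite_scores(students))
```

With five categories the rescaled values are 0.0, 0.25, 0.5, 0.75, and 1.0; an item with a different number of categories would pass its own ordered list as `categories`.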


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                          (1)      (2)      (3)
Science Skill                    -0.19    0.09     0.23
                                          (0.06)   (0.16)
                                          [0.15]   [0.14]
STEM Interest                    0.62     0.04     0.09
                                          (0.02)   (0.07)
                                          [0.16]   [0.16]
Confidence in College Science    0.92     -0.04    -0.10
                                          (0.02)   (0.05)
                                          [0.11]   [0.06]
Stress                           0.12     0.07     0.17
                                          (0.03)   (0.07)
                                          [0.02]   [0.01]
Grades in Science Courses        2.80     -0.12    -0.29
                                          (0.07)   (0.16)
                                          [0.08]   [0.07]
Grades in Other Courses          3.14     -0.07    -0.18
                                          (0.02)   (0.06)
                                          [0.00]   [0.00]
Number of Observations           1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
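The inverse probability weights referenced in these notes are computed in the paper as one over the predicted probability of survey completion from a probit model. The sketch below shows only that last step, assuming the fitted probit index for each respondent is already available; the helper names and the example index values are hypothetical, and the probit estimation itself is not reproduced.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_probability_weight(probit_index):
    """Weight = 1 / Phi(index), where Phi is the standard normal CDF.

    `probit_index` stands for the fitted linear index (fixed effect plus
    covariates times coefficients) for a survey respondent.
    """
    return 1.0 / STANDARD_NORMAL.cdf(probit_index)


def weighted_mean(values, weights):
    """Weighted average of an outcome among survey respondents."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical fitted indices and outcomes for three respondents.
indices = [0.0, 1.0, 2.0]
outcomes = [1.0, 0.0, 1.0]
weights = [inverse_probability_weight(z) for z in indices]
print(weights)
print(weighted_mean(outcomes, weights))
```

A respondent with a low predicted completion probability receives a large weight, so the weighted sample more closely resembles the full set of study participants; every weight is at least 1 because the probability cannot exceed 1.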

Table 6

Robustness Checks of Main ITT Results

(1): Control Group Complier Mean
(2): Main Results
(3): Robust SE
(4): p-value (permutation test)
(5): Excluding High School 56
(6): Including Imputation of Missing Outcome Variables
(7): Excluding Covariates
(8): Excluding High School 23
(9): Lee Lower Bound
(10): Lee Upper Bound
(11): 95% Confidence Interval from Lee Bounds
(12): Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                         (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)          (12)
Science Skill                   -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09, 0.51   2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                   0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03, 0.18   1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science   0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09, 0.10   2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                          0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05, 0.15   1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses       2.80    -0.12                   -0.06   -0.10   -0.07   Not applicable, as grades are based on
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)  transcripts rather than student survey
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses         3.14    -0.07                   -0.07   -0.06   -0.03   Not applicable, as grades are based on
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  transcripts rather than student survey
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive a confidence interval for the treatment effect itself).
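The permutation test described for column (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it uses a simple difference in means as the effect estimate (the paper estimates a regression with covariates and School x Cohort fixed effects), it reshuffles labels across the whole sample rather than within any blocks, and all names and data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Difference in mean outcome between treated and control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy so the caller's assignment is not reordered
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # pseudo treatment; keeps the number treated fixed
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm


# Hypothetical toy data: eight students, four offered the course.
y = [0.1, 0.4, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3]
d = [True, True, False, True, False, False, True, False]
p = permutation_p_value(y, d)
```

Shuffling the labels keeps the number of treated units fixed, so both groups are non-empty in every permutation as long as they are non-empty in the original assignment.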


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students’ confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools, we do not know the number of such students.

10. Participating districts include: Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree, but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues, but found it infeasible to obtain the necessary test score information prior to randomization given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset, yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet, we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
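The Benjamini-Hochberg correction mentioned in note 29 can be sketched as a step-up procedure over the sorted p-values. The example below is a generic illustration of that procedure, not the authors' code; the function name and the p-values are hypothetical and are not taken from the paper's tables.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= k * alpha / m, then reject ranks 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (illustration only).
example_p = [0.001, 0.012, 0.03, 0.04, 0.2, 0.6]
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
```

With six outcomes, the smallest p-value is compared against 0.05/6, the next against 2 x 0.05/6, and so on, so the correction is less strict than comparing every p-value against 0.05/6.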


References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should you Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.

Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: U.S. Department of Education.

Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.

Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.

Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.

Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.

Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.

Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.

Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.

Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.

Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.

Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.

Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.

Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.

Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.

Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: Across-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.

College Board. 2002. Equity Policy Statement. New York, NY.

__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.

__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.

__________. 2017a. AP Course and Exam Redesign. New York, NY.

__________. 2017b. AP Course Audit. New York, NY.

__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.


Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, G., and F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates.” (NCES 2018-144). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.


Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11, First Look.” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts’ Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age Under 30                                        0.407           0.160
Age 30-49                                           0.432           0.553
Age 50 or over                                      0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience ≤2                              0.290           0.085
Years of Experience ≤5                              0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). Others in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and the 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). Others in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1) and (4): Control Group Mean. Columns (2) and (5): Difference Between Treated and Controls. Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers.

                                            Full Sample                Survey Sample
Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1) and (4): Control Group Mean. Columns (2) and (5): ITT. Columns (3) and (6): LATE.

                                  Full Sample                Survey Respondents
Outcome                           (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment    0.19     0.38              0.24     0.39
                                           (0.05)                     (0.06)
                                           [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                      0.03     0.04     0.11     0.03     0.04     0.10
                                           (0.01)   (0.01)            (0.01)   (0.01)
                                           [0.00]   [0.00]            [0.00]   [0.00]
  All AP                          0.13     0.04     0.11     0.14     0.04     0.10
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science          0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced              0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                           (0.01)   (0.02)            (0.01)   (0.03)
                                           [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                 0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                     0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                           (0.01)   (0.03)            (0.01)   (0.03)
                                           [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations            1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information was collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 4

Treatment Contrast (Composite Variables)

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                                         (1)      (2)      (3)
Academically Challenging Curriculum             -0.33    0.31     0.80
                                                         (0.10)   (0.24)
                                                         [0.00]   [0.00]
Project-Based Independent Classroom Activities  -0.06    0.13     0.33
                                                         (0.07)   (0.17)
                                                         [0.07]   [0.06]
Integrated Use of Technology                    -0.11    0.11     0.28
                                                         (0.08)   (0.19)
                                                         [0.19]   [0.14]
Number of Observations                          1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
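The composite construction described in the Table 4 notes can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the five-point scale, the function names, the use of the population standard deviation, and the omission of survey weights are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Response categories ordered from lowest to highest (an assumed five-point scale).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map a response to [0, 1]: lowest category -> 0.0, highest -> 1.0, evenly spaced."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean([rescale(r) for r in items]) for items in responses_by_student]
    center, spread = mean(raw), pstdev(raw)  # population SD is an assumption
    return [(score - center) / spread for score in raw]
```

With a five-category item, the converted values are 0.0, 0.25, 0.5, 0.75, and 1.0, and the resulting composite has mean 0 and standard deviation 1 across the students passed in.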


Table 5

AP Course Impact on Science Skill STEM Interest Confidence Stress and Grades

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                         (1)      (2)      (3)
Science Skill                   -0.19    0.09     0.23
                                         (0.06)   (0.16)
                                         [0.15]   [0.14]
STEM Interest                   0.62     0.04     0.09
                                         (0.02)   (0.07)
                                         [0.16]   [0.16]
Confidence in College Science   0.92     -0.04    -0.10
                                         (0.02)   (0.05)
                                         [0.11]   [0.06]
Stress                          0.12     0.07     0.17
                                         (0.03)   (0.07)
                                         [0.02]   [0.01]
Grades in Science Courses       2.80     -0.12    -0.29
                                         (0.07)   (0.16)
                                         [0.08]   [0.07]
Grades in Other Courses         3.14     -0.07    -0.18
                                         (0.02)   (0.06)
                                         [0.00]   [0.00]
Number of Observations          1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
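As a rough consistency check on how the ITT and LATE columns relate, the LATE for an outcome can be read as the ITT effect on that outcome scaled by the ITT effect on AP course enrollment (the first stage in Table 3). The sketch below assumes a single binary instrument and the same sample, covariates, and weights in both regressions; the figures are the rounded values reported in Tables 3 and 5 for the survey sample, so the match is only approximate.

```latex
\widehat{\mathrm{LATE}}_{Y}
  = \frac{\widehat{\mathrm{ITT}}_{Y}}{\widehat{\mathrm{ITT}}_{D}},
\qquad
\text{Science Skill: } \frac{0.09}{0.39} \approx 0.23,
\qquad
\text{Stress: } \frac{0.07}{0.39} \approx 0.18 \ (\text{reported: } 0.17).
```

Here $Y$ denotes the outcome and $D$ denotes enrollment in the AP treatment course; both symbols are introduced for this illustration. Small gaps between the ratio and the reported LATE are consistent with rounding of the published ITT estimates.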

Table 6

Robustness Checks of Main ITT Results

Column key: (1) Control Group Complier Mean; (2) Main Results; (3) Robust SE; (4) p-value (permutation test); (5) Excluding High School 56; (6) Including Imputation of Missing Outcome Variables; (7) Excluding Covariates; (8) Excluding High School 23; (9) Lee Lower Bound; (10) Lee Upper Bound; (11) 95% Confidence Interval from Lee Bounds; (12) Ratio of 95% CI in (11) to 95% CI in (7).

Outcome                         (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)            (12)
Science Skill                   -0.19    0.09                       0.10     0.11     0.20     0.07     0.03     0.39     -0.09 to 0.51   2.0
                                         (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                         [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                   0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12     -0.03 to 0.18   1.9
                                         (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                         [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science   0.92     -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06    0.05     -0.09 to 0.10   2.0
                                         (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                         [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                          0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11     -0.05 to 0.15   1.6
                                         (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                         [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses       2.80     -0.12                      -0.06    -0.10    -0.07    n/a      n/a      n/a      n/a             n/a
                                         (0.07)   (0.04)            (0.00)   (0.00)   (0.00)
                                         [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses         3.14     -0.07                      -0.07    -0.06    -0.03    n/a      n/a      n/a      n/a             n/a
                                         (0.02)   (0.03)            (0.00)   (0.00)   (0.00)
                                         [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]

n/a: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
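The permutation test summarized for column (4) can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' procedure: a plain difference in means stands in for the regression-based treatment effect, and the sketch ignores the School x Cohort fixed effects, covariates, and survey weights. The function names and the seed argument are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes (stand-in for the ITT estimate)."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-treatment draws whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy so the caller's assignment is left untouched
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign pseudo treatment, keeping group sizes fixed
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm
```

Shuffling the labels keeps the number of treated and control students fixed, so each draw is a valid pseudo assignment. A version closer to the study design would presumably shuffle within each School x Cohort group and re-run the full regression on every draw.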


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. When the standard to which students compare themselves rises, their confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the

school, to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools, we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather

than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school-by-cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school-by-cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did

treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
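The Benjamini-Hochberg correction mentioned in note 29 is a step-up procedure: sort the m p-values, find the largest rank k whose p-value is at most (k/m) times the critical level, and reject the k smallest. A minimal Python sketch follows. The function name is hypothetical, and the p-values in the usage note are illustrative inputs, not the study's estimates.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans (in input order): True where the null is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank that clears its threshold.
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected
```

For example, with six hypothetical p-values 0.001, 0.012, 0.020, 0.09, 0.15, and 0.30 and a critical level of 0.05, the thresholds are 0.0083, 0.0167, 0.025, 0.0333, 0.0417, and 0.05, so the first three hypotheses are rejected and the last three are not.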


Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.

Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.

Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.

Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.

Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.

Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.

Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.

Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.

Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research & Occasional Paper Series CSHE.4.04.

Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.

Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.

Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.

Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8 (1).

Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.

Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.

Imbens, Guido W., and Charles F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.

Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.

__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.

Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is There a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.

Kim, Emily. 2015. “AP Classes Often Translate to Advanced Pressure.” Los Angeles Times, September 22.

Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein Kristin and M Kathleen Thomas 2009 ldquoThe Link Between Advanced Placement

Experience and Early College Successrdquo Southern Economic Journal 75 (3) 873ndash 891

__________ 2010 ldquoAdvanced Placement Participation Evaluating the Policies of States and

Collegesrdquo In AP A Critical Examination of the Advanced Placement Program eds

Philip M Sadler Gerhard Sonnert Robert H Tai and Kristin Klopfenstein 167ndash188

Cambridge Harvard Education Press

Kurth Lori A Charles W Anderson and Annemarie S Palincsar 2002 ldquoThe Case of Carla

Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018 (NCES 2018-144): Public High School Graduation Rates.” (NCES 2018-144). U.S. Department of Education, Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA) Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.


Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look.” (NCES 2013-001). U.S. Department of Education, Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts’ Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

| Panel A: Schools | Participating | Others |
|---|---|---|
| Average Enrollment | 1,409 | 723 |
| Free or Reduced-Price Lunch | 0.700 | 0.438 |
| Asian | 0.055 | 0.050 |
| Black | 0.349 | 0.154 |
| Hispanic | 0.410 | 0.221 |
| White | 0.164 | 0.537 |
| Adjusted Cohort Graduation Rate | 0.843 | 0.802 |
| District Instruction Expenditures Per Pupil | $6,561 | $5,636 |
| District Student Services Expenditures Per Pupil | $3,787 | $3,385 |

| Panel B: Teachers | Participating | Others |
|---|---|---|
| Age: Under 30 | 0.407 | 0.160 |
| Age: 30-49 | 0.432 | 0.553 |
| Age: 50 or over | 0.161 | 0.287 |
| Female | 0.630 | 0.536 |
| Hispanic or Latino | 0.111 | 0.051 |
| Race: American Indian or Alaska Native | 0.000 | 0.009 |
| Race: Asian American | 0.111 | 0.041 |
| Race: Black | 0.111 | 0.060 |
| Race: Native Hawaiian or other Pacific Islander | 0.000 | 0.004 |
| Race: White | 0.778 | 0.896 |
| Years of Experience | 10.3 | 13.2 |
| Years of Experience <= 2 | 0.290 | 0.085 |
| Years of Experience <= 5 | 0.481 | 0.234 |
| Hold a Teaching Certificate | 0.926 | 0.945 |
| Undergraduate Major in STEM | 0.944 | 0.747 |
| Single Subject Credential in Science | 0.630 | 0.823 |
| Master’s Degree or Higher | 0.356 | 0.615 |
| Previously Taught AP Course | 0.469 | NA |
| Previously Taught AP, IB, or Honors Course | 0.796 | NA |
| Number of Professional Development Trainings in the Past 5 years (0-5) | 3.09 | NA |

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1) to (3) use the Full Sample; columns (4) to (6) use the Survey Sample. Each difference is shown as: estimate (standard error) [p-value].

| Pre-Treatment Characteristic | (1) Control Group Mean | (2) Difference Between Treated and Controls | (3) Difference Between Control Group Non-Compliers and Compliers | (4) Control Group Mean | (5) Difference Between Treated and Controls | (6) Difference Between Control Group Non-Compliers and Compliers |
|---|---|---|---|---|---|---|
| Age as of October of 11th Grade | 16.6 | -0.03 (0.02) [0.19] | -0.07 (0.07) [0.35] | 16.6 | -0.01 (0.03) [0.65] | -0.01 (0.09) [0.94] |
| Math Exam Score | 0.38 | 0.08 (0.04) [0.08] | 0.25 (0.10) [0.02] | 0.44 | 0.07 (0.05) [0.17] | 0.30 (0.16) [0.06] |
| Reading Exam Score | 0.29 | 0.10 (0.03) [0.00] | 0.18 (0.12) [0.14] | 0.36 | 0.09 (0.04) [0.02] | 0.17 (0.17) [0.31] |
| HS Grade Point Average | 3.16 | 0.05 (0.03) [0.14] | 0.20 (0.08) [0.02] | 3.23 | 0.06 (0.03) [0.06] | 0.13 (0.10) [0.20] |
| Female | 0.59 | 0.00 (0.03) [0.99] | 0.10 (0.06) [0.10] | 0.61 | -0.01 (0.04) [0.73] | 0.11 (0.07) [0.12] |
| Asian American | 0.12 | 0.02 (0.02) [0.27] | 0.10 (0.05) [0.06] | 0.12 | 0.03 (0.01) [0.07] | 0.10 (0.07) [0.12] |
| Black | 0.32 | -0.02 (0.02) [0.29] | -0.06 (0.06) [0.28] | 0.27 | 0.00 (0.02) [0.88] | -0.05 (0.05) [0.40] |
| Hispanic, Native American, or Multiracial | 0.31 | 0.01 (0.02) [0.55] | 0.05 (0.06) [0.41] | 0.33 | 0.01 (0.02) [0.81] | 0.05 (0.07) [0.51] |
| Disabled | 0.02 | 0.00 (0.01) [0.93] | -0.01 (0.01) [0.24] | 0.01 | 0.00 (0.01) [0.57] | -0.01 (0.01) [0.5] |
| Gifted | 0.13 | 0.03 (0.02) [0.06] | 0.00 (0.05) [1.00] | 0.14 | 0.02 (0.02) [0.25] | 0.01 (0.09) [0.89] |
| English Language Learner | 0.05 | 0.01 (0.01) [0.41] | 0.02 (0.02) [0.39] | 0.04 | 0.01 (0.01) [0.54] | 0.04 (0.03) [0.22] |
| Eligible for Free or Reduced-Price Lunch | 0.53 | 0.01 (0.02) [0.66] | 0.02 (0.07) [0.77] | 0.51 | 0.01 (0.03) [0.72] | 0.07 (0.09) [0.45] |
| Language Other than English Spoken at Home | 0.34 | 0.02 (0.02) [0.32] | 0.03 (0.07) [0.73] | 0.35 | 0.01 (0.02) [0.59] | 0.04 (0.07) [0.56] |
| Took Recommended Prerequisite Courses | 0.79 | 0.00 (0.02) [0.84] | 0.09 (0.04) [0.04] | 0.79 | 0.02 (0.02) [0.43] | 0.05 (0.05) [0.31] |

Number of Observations: 1,819 (Full Sample); 1,417 (Survey Sample)

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1) to (3) use the Full Sample; columns (4) to (6) use Survey Respondents. Each ITT and LATE entry is shown as: estimate (standard error) [p-value].

| Outcome | (1) Control Group Mean | (2) ITT | (3) LATE | (4) Control Group Mean | (5) ITT | (6) LATE |
|---|---|---|---|---|---|---|
| AP Treatment Course Enrollment | 0.19 | 0.38 (0.05) [0.00] | | 0.24 | 0.39 (0.06) [0.00] | |
| Share of Credits During Study Year in: | | | | | | |
| AP Science | 0.03 | 0.04 (0.01) [0.00] | 0.11 (0.01) [0.00] | 0.03 | 0.04 (0.01) [0.00] | 0.10 (0.01) [0.00] |
| All AP | 0.13 | 0.04 (0.01) [0.00] | 0.11 (0.02) [0.00] | 0.14 | 0.04 (0.01) [0.00] | 0.10 (0.02) [0.00] |
| Other Advanced Science | 0.06 | -0.01 (0.01) [0.24] | -0.02 (0.02) [0.23] | 0.06 | -0.01 (0.01) [0.20] | -0.02 (0.02) [0.20] |
| All Other Advanced | 0.25 | -0.01 (0.01) [0.23] | -0.03 (0.02) [0.23] | 0.25 | -0.01 (0.01) [0.30] | -0.03 (0.03) [0.30] |
| Regular Science | 0.06 | -0.01 (0.01) [0.24] | -0.02 (0.02) [0.20] | 0.06 | -0.01 (0.01) [0.24] | -0.02 (0.02) [0.19] |
| All Regular | 0.62 | -0.03 (0.01) [0.02] | -0.09 (0.03) [0.00] | 0.61 | -0.03 (0.01) [0.07] | -0.07 (0.03) [0.03] |

Number of Observations: 1,819 (Full Sample); 1,417 (Survey Respondents)

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 4

Treatment Contrast (Composite Variables)

Each ITT and LATE entry is shown as: estimate (standard error) [p-value].

| Outcome | (1) Control Group Complier Mean | (2) ITT | (3) LATE |
|---|---|---|---|
| Academically Challenging Curriculum | -0.33 | 0.31 (0.10) [0.00] | 0.80 (0.24) [0.00] |
| Project-Based Independent Classroom Activities | -0.06 | 0.13 (0.07) [0.07] | 0.33 (0.17) [0.06] |
| Integrated Use of Technology | -0.11 | 0.11 (0.08) [0.19] | 0.28 (0.19) [0.14] |

Number of Observations: 1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
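The composite construction described in the notes to Table 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the five-point scale labels, the function names, and the use of the population standard deviation for standardizing are all assumptions made here, and the survey weights are ignored.

```python
from statistics import mean, pstdev

# Hypothetical five-point response scale, ordered from lowest to highest.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, scale=SCALE):
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0,
    with the remaining categories evenly spaced in between."""
    return scale.index(response) / (len(scale) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled component items, then standardize
    the averages across students (mean 0, SD 1).

    Assumption: standardization uses the unweighted population SD.
    """
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sd = mean(raw), pstdev(raw)
    return [(x - mu) / sd for x in raw]
```

A call such as `composite_scores([["agree", "neutral"], ["strongly agree", "agree"]])` returns one standardized score per student, on the same scale as the outcomes reported in the table.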


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Each ITT and LATE entry is shown as: estimate (standard error) [p-value].

| Outcome | (1) Control Group Complier Mean | (2) ITT | (3) LATE |
|---|---|---|---|
| Science Skill | -0.19 | 0.09 (0.06) [0.15] | 0.23 (0.16) [0.14] |
| STEM Interest | 0.62 | 0.04 (0.02) [0.16] | 0.09 (0.07) [0.16] |
| Confidence in College Science | 0.92 | -0.04 (0.02) [0.11] | -0.10 (0.05) [0.06] |
| Stress | 0.12 | 0.07 (0.03) [0.02] | 0.17 (0.07) [0.01] |
| Grades in Science Courses | 2.80 | -0.12 (0.07) [0.08] | -0.29 (0.16) [0.07] |
| Grades in Other Courses | 3.14 | -0.07 (0.02) [0.00] | -0.18 (0.06) [0.00] |

Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
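As a rough consistency check on the ITT and LATE columns, the sketch below divides an ITT estimate by the first-stage effect on AP course enrollment from Table 3. It assumes the LATE is a standard instrumental-variables estimate, so that the simple Wald ratio should land near the reported value; the reported LATEs come from the authors' full specification with covariates and fixed effects, so small gaps from rounding are expected. The function name is hypothetical.

```python
def wald_late(itt_effect, first_stage):
    """Wald ratio: ITT effect on the outcome divided by the ITT effect
    of the offer on AP course enrollment (the first stage)."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage


# First-stage ITT for survey respondents, Table 3, column (5).
FIRST_STAGE_SURVEY = 0.39

# ITT estimates for two survey outcomes, Table 5, column (2).
itt_estimates = {
    "Science Skill": 0.09,
    "Confidence in College Science": -0.04,
}

for outcome, effect in itt_estimates.items():
    print(outcome, round(wald_late(effect, FIRST_STAGE_SURVEY), 2))
```

For these two outcomes the ratios round to the LATEs reported in column (3) of Table 5 (0.23 and -0.10). The grades outcomes use the full sample, so the corresponding first stage would be the 0.38 in Table 3, column (2).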

Table 6

Robustness Checks of Main ITT Results

Entries are shown as: estimate (standard error) [p-value]. Column (3) reports only a standard error and p-value, and column (4) reports only a p-value.

| Outcome | (1) Control Group Complier Mean | (2) Main Results | (3) Robust SE | (4) p-value (permutation test) | (5) Excluding High School 56 | (6) Including Imputation of Missing Outcome Variables | (7) Excluding Covariates | (8) Excluding High School 23 | (9) Lee Lower Bound | (10) Lee Upper Bound | (11) 95% Confidence Interval from Lee Bounds | (12) Ratio of 95% CI in (11) to 95% CI in (7) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Science Skill | -0.19 | 0.09 (0.06) [0.15] | (0.05) [0.06] | [0.06] | 0.10 (0.00) [0.20] | 0.11 (0.00) [0.11] | 0.20 (0.00) [0.01] | 0.07 (0.00) [0.24] | 0.03 (0.07) [0.72] | 0.39 (0.07) [0.00] | -0.09 to 0.51 | 2.0 |
| STEM Interest | 0.62 | 0.04 (0.02) [0.16] | (0.03) [0.19] | [0.20] | 0.05 (0.00) [0.09] | 0.03 (0.00) [0.29] | 0.03 (0.00) [0.27] | 0.03 (0.00) [0.19] | 0.02 (0.03) [0.60] | 0.12 (0.04) [0.00] | -0.03 to 0.18 | 1.9 |
| Confidence in College Science | 0.92 | -0.04 (0.02) [0.11] | (0.02) [0.05] | [0.07] | -0.03 (0.00) [0.37] | -0.06 (0.00) [0.02] | -0.06 (0.00) [0.03] | -0.04 (0.00) [0.10] | -0.06 (0.02) [0.00] | 0.05 (0.03) [0.17] | -0.09 to 0.10 | 2.0 |
| Stress | 0.12 | 0.07 (0.03) [0.02] | (0.02) [0.00] | [0.00] | 0.05 (0.00) [0.14] | 0.06 (0.00) [0.07] | 0.08 (0.00) [0.02] | 0.07 (0.00) [0.02] | 0.01 (0.03) [0.79] | 0.11 (0.02) [0.00] | -0.05 to 0.15 | 1.6 |
| Grades in Science Courses | 2.80 | -0.12 (0.07) [0.08] | (0.04) [0.01] | [0.01] | -0.06 (0.00) [0.31] | -0.10 (0.00) [0.16] | -0.07 (0.00) [0.30] | NA | NA | NA | NA | NA |
| Grades in Other Courses | 3.14 | -0.07 (0.02) [0.00] | (0.03) [0.01] | [0.01] | -0.07 (0.00) [0.00] | -0.06 (0.00) [0.01] | -0.03 (0.00) [0.38] | NA | NA | NA | NA | NA |

NA: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
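The permutation test described for column (4) can be sketched as below. This is a simplified illustration under stated assumptions: the treatment effect is estimated as a raw difference in means rather than with the authors' regression (which includes covariates, fixed effects, and survey weights), pseudo treatment is reshuffled across the whole sample rather than within school-by-cohort groups, and the function names and seed are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Difference in mean outcome between treated and control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-treatment draws whose absolute estimated effect
    exceeds the absolute value of the actual estimated effect."""
    rng = random.Random(seed)
    actual = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        # Reassign pseudo treatment while keeping the number treated fixed.
        rng.shuffle(labels)
        if abs(diff_in_means(outcomes, labels)) > actual:
            exceed += 1
    return exceed / n_perm
```

Shuffling the labels keeps the number of treated students fixed. A version closer to the study design would presumably shuffle within each School x Cohort group, since that is the level at which offers were randomized.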


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board review each syllabus and grant permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students’ confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools, we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
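The survey-completion weights described in note 20 can be illustrated with a short sketch. It takes the probit parameters as already estimated and only shows the final step, turning a fitted index into an inverse probability weight. The function name and the numeric inputs in the usage note are hypothetical; the estimated coefficients themselves are in the authors' Online Appendix Table 2 and are not reproduced here.

```python
from statistics import NormalDist


def inverse_probability_weight(fixed_effect, covariates, coefficients):
    """Weight for a survey completer: 1 / Phi(mu_j + X_i * rho).

    fixed_effect  -- estimated school-by-cohort effect (mu_j)
    covariates    -- the student's pre-treatment characteristics (X_i)
    coefficients  -- estimated probit coefficients (rho), same order as X_i
    """
    index = fixed_effect + sum(x * b for x, b in zip(covariates, coefficients))
    completion_probability = NormalDist().cdf(index)  # standard normal CDF
    return 1.0 / completion_probability
```

With made-up inputs, `inverse_probability_weight(0.0, [1.0], [0.0])` gives a weight of 2.0, because the predicted completion probability is one half. Students predicted to be very likely to complete the survey receive weights close to 1, which matches the reported median weight of 1.2.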

Page 17: Alec I. Kennedy Raymond McGhee Jr. ABSTRACT · Raymond McGhee Jr. is a senior director at Equal Measure. The authors thank Nicole Bateman, Kerry Beldoff, Grant H. Blume, Jordan Brown,

17

Course-Taking?” Education Finance and Policy 11 (3): 310–324.

Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.

__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.

Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.

Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.

Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.

Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.

Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.

Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.

Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.

Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.

McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018 (NCES 2018-144): Public High School Graduation Rates.” (NCES 2018-144). U.S. Department of Education, Washington, DC: National Center for Education Statistics.

National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies Press.

__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Obama White House. 2016. “STEM for All.” February 11. Washington, DC.

Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.

Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.


Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look.” (NCES 2013-001). U.S. Department of Education, Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts


Figure 2

Participating Districts: Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.


Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.


Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age: Under 30                                       0.407           0.160
Age: 30-49                                          0.432           0.553
Age: 50 or over                                     0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <=2                             0.290           0.085
Years of Experience <=5                             0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd, and EDFacts, https://www2.ed.gov/about/inits/ed/edfacts/index.html. Others in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey, https://nces.ed.gov/surveys/sass. Others in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1) and (4): Control Group Mean. Columns (2) and (5): Difference Between Treated and Controls. Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers.

                                            Full Sample             Survey Sample
Pre-Treatment Characteristic                (1)     (2)     (3)     (4)     (5)     (6)
Age as of October of 11th Grade             16.6    -0.03   -0.07   16.6    -0.01   -0.01
                                                    (0.02)  (0.07)          (0.03)  (0.09)
                                                    [0.19]  [0.35]          [0.65]  [0.94]
Math Exam Score                             0.38    0.08    0.25    0.44    0.07    0.30
                                                    (0.04)  (0.10)          (0.05)  (0.16)
                                                    [0.08]  [0.02]          [0.17]  [0.06]
Reading Exam Score                          0.29    0.10    0.18    0.36    0.09    0.17
                                                    (0.03)  (0.12)          (0.04)  (0.17)
                                                    [0.00]  [0.14]          [0.02]  [0.31]
HS Grade Point Average                      3.16    0.05    0.20    3.23    0.06    0.13
                                                    (0.03)  (0.08)          (0.03)  (0.10)
                                                    [0.14]  [0.02]          [0.06]  [0.20]
Female                                      0.59    0.00    0.10    0.61    -0.01   0.11
                                                    (0.03)  (0.06)          (0.04)  (0.07)
                                                    [0.99]  [0.10]          [0.73]  [0.12]
Asian American                              0.12    0.02    0.10    0.12    0.03    0.10
                                                    (0.02)  (0.05)          (0.01)  (0.07)
                                                    [0.27]  [0.06]          [0.07]  [0.12]
Black                                       0.32    -0.02   -0.06   0.27    0.00    -0.05
                                                    (0.02)  (0.06)          (0.02)  (0.05)
                                                    [0.29]  [0.28]          [0.88]  [0.40]
Hispanic, Native American, or Multiracial   0.31    0.01    0.05    0.33    0.01    0.05
                                                    (0.02)  (0.06)          (0.02)  (0.07)
                                                    [0.55]  [0.41]          [0.81]  [0.51]
Disabled                                    0.02    0.00    -0.01   0.01    0.00    -0.01
                                                    (0.01)  (0.01)          (0.01)  (0.01)
                                                    [0.93]  [0.24]          [0.57]  [0.5]
Gifted                                      0.13    0.03    0.00    0.14    0.02    0.01
                                                    (0.02)  (0.05)          (0.02)  (0.09)
                                                    [0.06]  [1.00]          [0.25]  [0.89]
English Language Learner                    0.05    0.01    0.02    0.04    0.01    0.04
                                                    (0.01)  (0.02)          (0.01)  (0.03)
                                                    [0.41]  [0.39]          [0.54]  [0.22]
Eligible for Free or Reduced-Price Lunch    0.53    0.01    0.02    0.51    0.01    0.07
                                                    (0.02)  (0.07)          (0.03)  (0.09)
                                                    [0.66]  [0.77]          [0.72]  [0.45]
Language Other than English Spoken at Home  0.34    0.02    0.03    0.35    0.01    0.04
                                                    (0.02)  (0.07)          (0.02)  (0.07)
                                                    [0.32]  [0.73]          [0.59]  [0.56]
Took Recommended Prerequisite Courses       0.79    0.00    0.09    0.79    0.02    0.05
                                                    (0.02)  (0.04)          (0.02)  (0.05)
                                                    [0.84]  [0.04]          [0.43]  [0.31]
Number of Observations                      1,819                   1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.


Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1) and (4): Control Group Mean. Columns (2) and (5): ITT. Columns (3) and (6): LATE.

                                            Full Sample             Survey Respondents
Outcome                                     (1)     (2)     (3)     (4)     (5)     (6)
AP Treatment Course Enrollment              0.19    0.38            0.24    0.39
                                                    (0.05)                  (0.06)
                                                    [0.00]                  [0.00]
Share of Credits During Study Year in:
  AP Science                                0.03    0.04    0.11    0.03    0.04    0.10
                                                    (0.01)  (0.01)          (0.01)  (0.01)
                                                    [0.00]  [0.00]          [0.00]  [0.00]
  All AP                                    0.13    0.04    0.11    0.14    0.04    0.10
                                                    (0.01)  (0.02)          (0.01)  (0.02)
                                                    [0.00]  [0.00]          [0.00]  [0.00]
  Other Advanced Science                    0.06    -0.01   -0.02   0.06    -0.01   -0.02
                                                    (0.01)  (0.02)          (0.01)  (0.02)
                                                    [0.24]  [0.23]          [0.20]  [0.20]
  All Other Advanced                        0.25    -0.01   -0.03   0.25    -0.01   -0.03
                                                    (0.01)  (0.02)          (0.01)  (0.03)
                                                    [0.23]  [0.23]          [0.30]  [0.30]
  Regular Science                           0.06    -0.01   -0.02   0.06    -0.01   -0.02
                                                    (0.01)  (0.02)          (0.01)  (0.02)
                                                    [0.24]  [0.20]          [0.24]  [0.19]
  All Regular                               0.62    -0.03   -0.09   0.61    -0.03   -0.07
                                                    (0.01)  (0.03)          (0.01)  (0.03)
                                                    [0.02]  [0.00]          [0.07]  [0.03]
Number of Observations                      1,819                   1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
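To make the relationship between the ITT and LATE columns concrete, the following sketch computes a Wald-type ratio of an ITT estimate to the first-stage effect of the offer on AP course enrollment. It is a simplified illustration under stated assumptions: a single binary instrument (the enrollment offer) and no covariates, fixed effects, or weights, so it is not expected to reproduce the authors' regression estimates exactly. The function name `wald_late` is hypothetical, and the inputs are the rounded full-sample values from Table 3.

```python
def wald_late(itt_effect: float, first_stage: float) -> float:
    """Ratio of the ITT effect on an outcome to the ITT effect on take-up."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage


# Rounded full-sample values from Table 3 (inputs assumed for illustration).
first_stage = 0.38   # ITT effect of the offer on AP treatment course enrollment
itt_ap_share = 0.04  # ITT effect on the share of credits in AP science

late_ap_share = wald_late(itt_ap_share, first_stage)
print(f"{late_ap_share:.3f}")  # prints 0.105, close to the reported LATE of 0.11
```

Because the inputs are rounded to two decimals, small gaps between this ratio and the reported LATE values should be expected.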


Table 4

Treatment Contrast (Composite Variables)

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                                           (1)     (2)     (3)
Academically Challenging Curriculum               -0.33   0.31    0.80
                                                          (0.10)  (0.24)
                                                          [0.00]  [0.00]
Project-Based Independent Classroom Activities    -0.06   0.13    0.33
                                                          (0.07)  (0.17)
                                                          [0.07]  [0.06]
Integrated Use of Technology                      -0.11   0.11    0.28
                                                          (0.08)  (0.19)
                                                          [0.19]  [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
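The composite construction described in these notes can be sketched as follows. This is one illustrative reading rather than the authors' code: the five-point scale, the sample responses, and the helper names (`rescale`, `composite_scores`) are hypothetical, and the sketch standardizes with the population standard deviation, a detail the notes do not specify.

```python
from statistics import mean, pstdev

# Hypothetical five-point scale, ordered from lowest to highest category.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, ordered_categories=LIKERT):
    """Map a category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    position = ordered_categories.index(response)
    return position / (len(ordered_categories) - 1)


def composite_scores(students):
    """Average each student's rescaled components, then standardize across students."""
    raw = [mean(rescale(r) for r in responses) for responses in students]
    center = mean(raw)
    spread = pstdev(raw)  # assumption: population SD; the notes do not say which
    return [(value - center) / spread for value in raw]


# Hypothetical responses: one inner list per student, one entry per component item.
students = [
    ["strongly agree", "agree"],
    ["neutral", "disagree"],
    ["strongly disagree", "neutral"],
]
scores = composite_scores(students)
print([round(s, 2) for s in scores])
```

By construction the standardized scores have mean 0 and standard deviation 1 in the sample used to compute them.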


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                         (1)     (2)     (3)
Science Skill                   -0.19   0.09    0.23
                                        (0.06)  (0.16)
                                        [0.15]  [0.14]
STEM Interest                   0.62    0.04    0.09
                                        (0.02)  (0.07)
                                        [0.16]  [0.16]
Confidence in College Science   0.92    -0.04   -0.10
                                        (0.02)  (0.05)
                                        [0.11]  [0.06]
Stress                          0.12    0.07    0.17
                                        (0.03)  (0.07)
                                        [0.02]  [0.01]
Grades in Science Courses       2.80    -0.12   -0.29
                                        (0.07)  (0.16)
                                        [0.08]  [0.07]
Grades in Other Courses         3.14    -0.07   -0.18
                                        (0.02)  (0.06)
                                        [0.00]  [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
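The notes above state that survey-based results are weighted by the inverse probability of completing the survey. The sketch below shows how such weights enter a simple weighted mean. It assumes that each respondent's completion probability has already been estimated elsewhere (the paper uses a probit for this step); the probabilities, outcomes, and function names are hypothetical and serve only to illustrate the arithmetic.

```python
def ipw_weights(completion_probs):
    """Inverse probability weights: 1 / estimated probability of completing the survey."""
    for p in completion_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
    return [1.0 / p for p in completion_probs]


def weighted_mean(values, weights):
    """Weighted average of the outcome among survey respondents."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical respondents: estimated completion probability and a binary outcome.
probs = [0.9, 0.5, 0.8]
outcomes = [1.0, 0.0, 1.0]

weights = ipw_weights(probs)
print([round(w, 2) for w in weights])
print(round(weighted_mean(outcomes, weights), 3))
```

Respondents who resemble non-respondents (low estimated completion probability) receive larger weights, so they count for more in the weighted estimate.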

Table 6

Robustness Checks of Main ITT Results

Column (1): Control Group Complier Mean. Column (2): Main Results. Column (3): Robust SE. Column (4): p-value (permutation test). Column (5): Excluding High School 56. Column (6): Including Imputation of Missing Outcome Variables. Column (7): Excluding Covariates. Column (8): Excluding High School 23. Column (9): Lee Lower Bound. Column (10): Lee Upper Bound. Column (11): 95% Confidence Interval from Lee Bounds. Column (12): Ratio of 95% CI in (11) to 95% CI in (7).

Outcome                         (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)            (12)
Science Skill                   -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09 to 0.51   2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                   0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03 to 0.18   1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science   0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09 to 0.10   2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                          0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05 to 0.15   1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses       2.80    -0.12                   -0.06   -0.10   -0.07
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses         3.14    -0.07                   -0.07   -0.06   -0.03
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
Remaining columns for the two grades outcomes: Not applicable as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
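The permutation test described for column (4) can be illustrated with a short sketch. This is a simplified stand-in, not the authors' procedure: it uses a raw difference in means instead of the regression with covariates and School x Cohort fixed effects, it reshuffles treatment labels across the whole sample rather than within schools, and the data and function names are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated_y = [y for y, d in zip(outcomes, treated) if d]
    control_y = [y for y, d in zip(outcomes, treated) if not d]
    return mean(treated_y) - mean(control_y)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # pseudo treatment: same number treated, random order
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical outcomes and treatment indicators for eight students.
outcomes = [0.2, -0.1, 0.5, 0.9, -0.4, 0.3, 0.0, 0.7]
treated = [1, 0, 1, 1, 0, 0, 0, 1]
print(permutation_p_value(outcomes, treated))
```

A small p-value indicates that few random reassignments produce an effect as large in magnitude as the one observed.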


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board review each syllabus and grant permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. When the standard to which students compare themselves rises, their confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\mathrm{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i \rho + \epsilon_{ij})$, where $\mathrm{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i \hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.


Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.

Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).

Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.

Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.

Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.

Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.

Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.

Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.

Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.

Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.

Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.

The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”

Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.

Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look.” (NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for Education Statistics.

Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.

Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.

U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.

Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.

West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.


Figure 1

Geographic Distribution of Participating Districts

Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.

Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.

Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                        Participating     Others
Average Enrollment                                              1,409        723
Free or Reduced-Price Lunch                                     0.700      0.438
Asian                                                           0.055      0.050
Black                                                           0.349      0.154
Hispanic                                                        0.410      0.221
White                                                           0.164      0.537
Adjusted Cohort Graduation Rate                                 0.843      0.802
District Instruction Expenditures Per Pupil                    $6,561     $5,636
District Student Services Expenditures Per Pupil               $3,787     $3,385

Panel B: Teachers                                       Participating     Others
Age: Under 30                                                   0.407      0.160
Age: 30-49                                                      0.432      0.553
Age: 50 or over                                                 0.161      0.287
Female                                                          0.630      0.536
Hispanic or Latino                                              0.111      0.051
Race: American Indian or Alaska Native                          0.000      0.009
Race: Asian American                                            0.111      0.041
Race: Black                                                     0.111      0.060
Race: Native Hawaiian or other Pacific Islander                 0.000      0.004
Race: White                                                     0.778      0.896
Years of Experience                                              10.3       13.2
Years of Experience <= 2                                        0.290      0.085
Years of Experience <= 5                                        0.481      0.234
Hold a Teaching Certificate                                     0.926      0.945
Undergraduate Major in STEM                                     0.944      0.747
Single Subject Credential in Science                            0.630      0.823
Master’s Degree or Higher                                       0.356      0.615
Previously Taught AP Course                                     0.469        N/A
Previously Taught AP, IB, or Honors Course                      0.796        N/A
Number of Professional Development Trainings
  in the Past 5 years (0-5)                                      3.09        N/A

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd/, and EDFacts, https://www2.ed.gov/about/inits/ed/edfacts/index.html. Others in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey, https://nces.ed.gov/surveys/sass/. Others in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.

Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
Columns (1) and (4): Control Group Mean.
Columns (2) and (5): Difference Between Treated and Controls.
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers.

Pre-Treatment Characteristic                        (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                    16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                          (0.02)   (0.07)            (0.03)   (0.09)
                                                          [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                    0.38     0.08     0.25     0.44     0.07     0.30
                                                          (0.04)   (0.10)            (0.05)   (0.16)
                                                          [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                                 0.29     0.10     0.18     0.36     0.09     0.17
                                                          (0.03)   (0.12)            (0.04)   (0.17)
                                                          [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                             3.16     0.05     0.20     3.23     0.06     0.13
                                                          (0.03)   (0.08)            (0.03)   (0.10)
                                                          [0.14]   [0.02]            [0.06]   [0.20]
Female                                             0.59     0.00     0.10     0.61    -0.01     0.11
                                                          (0.03)   (0.06)            (0.04)   (0.07)
                                                          [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                     0.12     0.02     0.10     0.12     0.03     0.10
                                                          (0.02)   (0.05)            (0.01)   (0.07)
                                                          [0.27]   [0.06]            [0.07]   [0.12]
Black                                              0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                          (0.02)   (0.06)            (0.02)   (0.05)
                                                          [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial          0.31     0.01     0.05     0.33     0.01     0.05
                                                          (0.02)   (0.06)            (0.02)   (0.07)
                                                          [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                           0.02     0.00    -0.01     0.01     0.00    -0.01
                                                          (0.01)   (0.01)            (0.01)   (0.01)
                                                          [0.93]   [0.24]            [0.57]    [0.5]
Gifted                                             0.13     0.03     0.00     0.14     0.02     0.01
                                                          (0.02)   (0.05)            (0.02)   (0.09)
                                                          [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                           0.05     0.01     0.02     0.04     0.01     0.04
                                                          (0.01)   (0.02)            (0.01)   (0.03)
                                                          [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch           0.53     0.01     0.02     0.51     0.01     0.07
                                                          (0.02)   (0.07)            (0.03)   (0.09)
                                                          [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home         0.34     0.02     0.03     0.35     0.01     0.04
                                                          (0.02)   (0.07)            (0.02)   (0.07)
                                                          [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses              0.79     0.00     0.09     0.79     0.02     0.05
                                                          (0.02)   (0.04)            (0.02)   (0.05)
                                                          [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                            1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.

Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
Columns (1) and (4): Control Group Mean. Columns (2) and (5): ITT. Columns (3) and (6): LATE.

Outcome                                     (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment             0.19     0.38              0.24     0.39
                                                  (0.05)                     (0.06)
                                                  [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                               0.03     0.04     0.11     0.03     0.04     0.10
                                                  (0.01)   (0.01)            (0.01)   (0.01)
                                                  [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                   0.13     0.04     0.11     0.14     0.04     0.10
                                                  (0.01)   (0.02)            (0.01)   (0.02)
                                                  [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                   0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                  (0.01)   (0.02)            (0.01)   (0.02)
                                                  [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                       0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                                  (0.01)   (0.02)            (0.01)   (0.03)
                                                  [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                          0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                  (0.01)   (0.02)            (0.01)   (0.02)
                                                  [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                              0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                                  (0.01)   (0.03)            (0.01)   (0.03)
                                                  [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                    1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.

Table 4

Treatment Contrast (Composite Variables)

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                                             (1)      (2)      (3)
Academically Challenging Curriculum               -0.33     0.31     0.80
                                                          (0.10)   (0.24)
                                                          [0.00]   [0.00]
Project-Based, Independent Classroom Activities   -0.06     0.13     0.33
                                                          (0.07)   (0.17)
                                                          [0.07]   [0.06]
Integrated Use of Technology                      -0.11     0.11     0.28
                                                          (0.08)   (0.19)
                                                          [0.19]   [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
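The composite construction described in the Table 4 notes can be sketched as follows. This is a minimal illustration in Python, not the authors' code. The function names (`rescale_likert`, `composite_scores`), the five-point response labels, and the toy responses are assumptions made for the example. The sketch also assumes that items are averaged within student before standardizing and that the population standard deviation is used, neither of which the notes specify.

```python
from statistics import mean, pstdev

# Hypothetical five-point scale, ordered from lowest to highest category.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def rescale_likert(response, scale=SCALE):
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return scale.index(response) / (len(scale) - 1)

def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale_likert(r) for r in items) for items in responses_by_student]
    mu, sigma = mean(raw), pstdev(raw)
    return [(x - mu) / sigma for x in raw]

# Toy data: three students, two component items each (illustrative only).
students = [
    ["strongly agree", "agree"],
    ["neutral", "neutral"],
    ["disagree", "strongly disagree"],
]
scores = composite_scores(students)
```

After standardization the composite has mean 0 and standard deviation 1 in the sample, so the ITT and LATE estimates for these outcomes read as standard deviation units.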

Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                                 (1)      (2)      (3)
Science Skill                         -0.19     0.09     0.23
                                              (0.06)   (0.16)
                                              [0.15]   [0.14]
STEM Interest                          0.62     0.04     0.09
                                              (0.02)   (0.07)
                                              [0.16]   [0.16]
Confidence in College Science          0.92    -0.04    -0.10
                                              (0.02)   (0.05)
                                              [0.11]   [0.06]
Stress                                 0.12     0.07     0.17
                                              (0.03)   (0.07)
                                              [0.02]   [0.01]
Grades in Science Courses              2.80    -0.12    -0.29
                                              (0.07)   (0.16)
                                              [0.08]   [0.07]
Grades in Other Courses                3.14    -0.07    -0.18
                                              (0.02)   (0.06)
                                              [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
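As a rough check on how the ITT and LATE columns relate, a LATE can be approximated by dividing the ITT by the first-stage effect of the offer on AP enrollment. The sketch below assumes the simple Wald ratio without covariates or fixed effects, which is not necessarily how the estimates in the table were produced. It reads the survey-sample first stage in Table 3 as 0.39 and the Science Skill ITT as 0.09, and `wald_late` is a hypothetical helper name.

```python
def wald_late(itt, first_stage):
    """Wald ratio: intent-to-treat effect scaled by the share induced to enroll."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage

# Science Skill: ITT of 0.09 and a first stage of 0.39 (values as read from Tables 5 and 3).
science_skill_late = wald_late(0.09, 0.39)
print(round(science_skill_late, 2))  # 0.23
```

The ratio agrees with the Science Skill LATE in the table up to rounding. Other rows can differ slightly because the published figures are rounded and the reported estimates condition on covariates.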

Table 6

Robustness Checks of Main ITT Results

Columns:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                          (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)            (11)   (12)
Science Skill                  -0.19    0.09                    0.10    0.11    0.20    0.07    0.03    0.39   -0.09 to 0.51    2.0
                                      (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                      [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                   0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12   -0.03 to 0.18    1.9
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                      [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science   0.92   -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06    0.05   -0.09 to 0.10    2.0
                                      (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                      [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                          0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11   -0.05 to 0.15    1.6
                                      (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                      [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses       2.80   -0.12                   -0.06   -0.10   -0.07
                                      (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                      [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses         3.14   -0.07                   -0.07   -0.06   -0.03
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                      [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]

For the two grades outcomes, the remaining columns are not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
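The permutation test described for Column (4) can be illustrated with a short sketch. It is a simplified version under stated assumptions: the treatment effect is a raw difference in means rather than the regression estimate from Equation (1), the pseudo treatment is reshuffled across the whole sample rather than within School x Cohort cells, and the function names, seed, and toy data are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(outcomes, treated):
    """Mean outcome of the treated minus mean outcome of the controls."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)

def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # randomly reassign a pseudo treatment
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm

# Toy data (illustrative only): every treated outcome lies above every control outcome.
outcomes = [1.2, 0.8, 1.5, 1.1, 0.9, 1.4, 0.1, 0.3, -0.2, 0.0, 0.4, 0.2]
treated = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = permutation_p_value(outcomes, treated)
```

Shuffling the labels keeps the number of pseudo-treated units fixed, so each draw mimics a re-randomization of the same experiment.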

30

1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the

Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the

effect of advanced high school courses more generally often without distinctions between AP

and other rigorous course options Nearly all of these nonexperimental studies find large positive

effects of rigorous secondary school courses particularly those in math and science on studentsrsquo

high school postsecondary and labor market performance (eg Altonji 1995 Attewell and

Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long

Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004) 3 The process by which a course is designated AP involves two steps Teachers who plan to offer

an AP course are encouraged (though not required) to attend a professional development

training The Board and other independent agencies offer several workshops with the most

extensive training being the AP summer institute a week-long training that is led by an

experienced AP instructor Teachers are then expected to develop their syllabi for the course and

submit them to the Board for review A team of auditors at the Board review each syllabus and

grant permission to a school to label the course as AP on course catalogs and student transcripts

once the syllabus has been approved Teachers are also able to revise and resubmit syllabi if they

do not meet the requirements upon original submission College Board (2017b) contains a

discussion of the annual ldquoAP Course Auditrdquo This review of syllabi is the only mechanism for

assessment (ie course delivery and student performance are not assessed by the Board) In

order to effectively run an AP Biology or Chemistry course teachers require access to a well-

equipped classroom and laboratory including all supplies necessary to engage in

experimentation (eg beakers solutions microscopes measuring equipment) Most of the

teachers in our study reported that their classrooms were well-stocked with supplies 4 In 2012 the Board began redesigning several AP courses and exams to prioritize depth of

learning over breadth of coverage and to place ldquogreater emphasis on discipline-specific inquiry

reasoning and communication skillsrdquo (College Board 2017a) The revisions to science courses

were based upon recommendations from the National Science Foundation the National Research

Council and science educators across the United States (National Research Council 2002) 5 Students perception of their own confidence and other personality traits are inherently

influenced by their frames of reference in ways that other assessments of these traits (eg

external observations) may be less influenced By increasing the standard to which they compare

themselves studentsrsquo confidence may decrease This feature of most self-assessments could be

considered a measurement error that biases treatment effects (Dobbie and Fryer Jr 2015 West et

al 2016) Whether this is considered a bias or a true treatment effect on a meaningful outcome

depends to some extent on how these changes in perceived ability influence other behaviors

such as effort and academic motivation 6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and

Biology I and Chemistry I for AP Biology with no additional requirements beyond these

prerequisites 7 We offered each participating school $10000 to pay for teacher attendance at a one-week

training course classroom supplies (eg lab materials textbooks) and to compensate schools

for the staff time required for study administration efforts We also offered $1000 compensation

for an individual selected by the school to serve as a liaison between the study team and the

31

school to assist in collecting consentassent forms and data 8 In our original research design we planned to recruit 40 schools with up to two cohorts of

students which would have powered the study to detect effect sizes smaller than those detected

here We faced several challenges in recruiting schools to participate even with the monetary

incentives Some schools were uncomfortable with randomization across classrooms while

others were simply unable to commit to offering a new course the following school year 9 Most of the assignments were made in one batch in the spring of the year prior to when the

course would be offered We also made some assignments on a rolling basis as additional

consentassent forms were submitted We have no information on the students who were deemed

eligible by the school to take the new AP science course but who did not sign the consent form

to participate As these students did not participate we do not have permission to obtain

information on their characteristics (eg via transcripts) and for most schools we do not know

the number of such students 10 Participating districts include Anaheim Union High School District California East Side

Union High School District California Lynwood Unified School District California Jefferson

Parish Louisiana Education Achievement Authority Michigan Charlotte-Mecklenburg

Schools North Carolina Winston-SalemForsyth Schools North Carolina Cranston Public

Schools Rhode Island El Paso Independent School District Texas Metropolitan Nashville

Public Schools Tennessee and Richmond Public Schools Virginia 11 Though the study teachers are less likely to hold a masterrsquos degree many of the graduate

degrees held by teachers nationally are likely to be in education (not STEM) Thus the study

teachers are less likely to have a graduate degree but not necessarily less likely to have STEM

training We also did not survey teachers regarding their Teach for America (TFA) experience

but it is possible that the relatively high share of STEM undergraduate degrees could be driven

by higher representation of TFA teachers 12 We developed the instrument over a two-year period and pilot tested it three times (the last

pilot test included 140 students) prior to administering the tool to study participants Reliability

metrics are high (inter-item reliability = 099 person reliability= 071) Further documentation of

the development of the assessment instrument in the survey can be found in Seeratan et al

(2017) 13 Each year in the spring semester our team administered and collected the participant surveys

during the school day in classrooms set aside for survey administration 14 Unweighted results which are similar are contained in the Online Appendix tables However

if study participants who did not take the survey differ in unobserved ways then our reweighting

based on observed characteristics will not eliminate nonignorable nonresponse bias 15 Online Appendix Table 1 presents the balance between treatment and control group membersrsquo

characteristics before imputation of missing values (as described below) these results are very

similar to those shown in Table 2 16 Given the high degree of correlation between studentsrsquo 8th and 10th grade scores (and the fact

that some students did not have 10th grade scores) we created one reading and math score for

each student that is the average of both scores or just the 8th grade score For the 23 participating

students who were in 10th grade during the year in which the AP course was offered to their

cohort we only use the studentrsquos 8th grade test scores as their 10th grade test scores would be

endogenous 17 The studyrsquos Principal Investigator personally handled the randomization of offers of

enrollment in the course so the lack of balance is simply due to unlucky randomization rather

32

than manipulation by school administrators We considered implementing a randomized block

design to avoid such issues but found it infeasible to obtain the necessary test score information

prior to randomization given the schoolsrsquo needs for speed in deciding whether the student was

allowed to register for the new class We added an entire planning year to our study design to

avoid these issues but many schools were unable to plan ahead 18 To evaluate whether this non-compliance is ignorable we conducted the test recommended by

Huber (2013) for each of our six main outcomes (ie those subsequently shown in Table 5) We

find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these

six outcomes which suggests that generalizing our estimated treatment effects to the full control

group may not be unreasonable 19 One district in our study offered both AP courses Students in this school were randomly

offered enrollment in an AP course and then given the option of Chemistry or Biology To

account for the two courses offered we treat the school as two separate groups School-

Chemistry and School-Biology For those students who were not offered an AP course we

randomly assign them to one of two control groups proportional to the number of treated

students who chose each course For example if 60 of the treated students chose Biology then

we randomly assign 60 of the control students to the School-Biology control group In Section

VC we show that our results are also robust to dropping this school entirely 20 To compute these weights we first estimate the parameters of the following equation using a

probit regression Pr(CompletedSurveyij = 1) = Φ(microj + Xiρ+ ϵij) where CompletedSurveyij equals 1

if student i in school by cohort j completed any part of the end-of-year survey Xi is the same

vector of pre-treatment characteristics in the previous equations microj are school by cohort fixed

effects and Φ() is the cumulative normal distribution function The results of this regression are

included in Online Appendix Table 2 Students who had higher pre-treatment grades Black

students those who were not disabled and those who took prerequisite courses were more likely

to complete the survey The inverse probability weight is computed as 1Φ(microj + Xi ) and gives

more weight in the regression to study participants who completed the survey and yet had pre-

study characteristics that were similar to those study participants who did not complete the

survey These weights range from 10 to 155 with a median (mean) weight of i 12 (15) and

with 90 of students receiving a weight less than 20 21 We impute with the full dataset yet only estimate regressions on the sample for which we

observe each outcome variable This follows a multiple imputation then deletion strategy

suggested by Hippel (2007) which improves efficiency while protecting against problematic

imputed outcome values As a robustness check Section VC provides results including

imputation of missing outcome variables 22 Detailed course-by-course impacts are available from the authors 23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4 24 Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum with more homework than non-AP complier students Treatment group

students are also more likely to report that the students in their class were driven to succeed and

that the teacher set high standards The AP science class also involved more student-led projects

or experiments hands on learning and small group work all activities that are deemed to be

essential to an inquiry-based classroom (Bennett et al 2010 National Research Council 2012)

Yet we do not find strong evidence that students in AP classes were more likely to present what

they learned apply their knowledge to solve a new problem or work independently and none of

the component measures of technology usage were statistically significantly affected Nor did

33

treatment teachers rely less on lecturing and multiple choice quizzes Thus teachers appear

better able to implement the academic rigor expected of an AP science class than some of the

inquiry-based approaches that the College Board intends for AP science We do not find

evidence that taking AP science led students to be more likely to report that they found their

course more interesting which may reflect the inability of the teachers to fully implement a

creative inquiry-based environment 25 In addition to the teacher spillover effects peer interactions could cause contamination effects

that might render our estimated effects smaller A research design with randomization both

across and within schools would allow for estimation of spillover effects but such a design was

infeasible due to the challenge and added cost of recruiting control schools 26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers

received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg correction for multiple comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
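
As an illustration of the Benjamini-Hochberg correction mentioned in note 29, the step-up rule can be sketched as follows. This is a minimal sketch, not the authors' code: the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the paper's tables.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (illustrative only).
example = [0.001, 0.012, 0.020, 0.30, 0.45, 0.80]
print(benjamini_hochberg(example))
```

With these illustrative inputs the three smallest p-values fall below their rank-specific thresholds, so those three hypotheses are rejected and the rest are not.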


Figure 1

Geographic Distribution of Participating Districts

Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district's neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor's degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district's average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district's enrollment. The dashed line is a lowess curve created using Stata's default settings and roughly shows the predicted test score as a function of the neighborhood's SES.

Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.

Table 1

Participating Schools and Teachers Compared to Other US High Schools and High School Science Teachers

                                                          Participating     Others
Panel A: Schools
Average Enrollment                                                1,409        723
Free or Reduced-Price Lunch                                       0.700      0.438
Asian                                                             0.055      0.050
Black                                                             0.349      0.154
Hispanic                                                          0.410      0.221
White                                                             0.164      0.537
Adjusted Cohort Graduation Rate                                   0.843      0.802
District Instruction Expenditures Per Pupil                      $6,561     $5,636
District Student Services Expenditures Per Pupil                 $3,787     $3,385

Panel B: Teachers
Age: Under 30                                                     0.407      0.160
Age: 30-49                                                        0.432      0.553
Age: 50 or over                                                   0.161      0.287
Female                                                            0.630      0.536
Hispanic or Latino                                                0.111      0.051
Race: American Indian or Alaska Native                            0.000      0.009
Race: Asian American                                              0.111      0.041
Race: Black                                                       0.111      0.060
Race: Native Hawaiian or other Pacific Islander                   0.000      0.004
Race: White                                                       0.778      0.896
Years of Experience                                                10.3       13.2
Years of Experience <=2                                           0.290      0.085
Years of Experience <=5                                           0.481      0.234
Hold a Teaching Certificate                                       0.926      0.945
Undergraduate Major in STEM                                       0.944      0.747
Single Subject Credential in Science                              0.630      0.823
Master's Degree or Higher                                         0.356      0.615
Previously Taught AP Course                                       0.469         NA
Previously Taught AP, IB, or Honors Course                        0.796         NA
Number of Professional Development Trainings
  in the Past 5 years (0-5)                                        3.09         NA

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). "Others" in Panel A refers to other public high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). "Others" in Panel B refers to public and private high school teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.

Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                      (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                  16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                  0.38     0.08     0.25     0.44     0.07     0.30
                                                        (0.04)   (0.10)            (0.05)   (0.16)
                                                        [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                               0.29     0.10     0.18     0.36     0.09     0.17
                                                        (0.03)   (0.12)            (0.04)   (0.17)
                                                        [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                           3.16     0.05     0.20     3.23     0.06     0.13
                                                        (0.03)   (0.08)            (0.03)   (0.10)
                                                        [0.14]   [0.02]            [0.06]   [0.20]
Female                                           0.59     0.00     0.10     0.61    -0.01     0.11
                                                        (0.03)   (0.06)            (0.04)   (0.07)
                                                        [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                   0.12     0.02     0.10     0.12     0.03     0.10
                                                        (0.02)   (0.05)            (0.01)   (0.07)
                                                        [0.27]   [0.06]            [0.07]   [0.12]
Black                                            0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                        (0.02)   (0.06)            (0.02)   (0.05)
                                                        [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial        0.31     0.01     0.05     0.33     0.01     0.05
                                                        (0.02)   (0.06)            (0.02)   (0.07)
                                                        [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                         0.02     0.00    -0.01     0.01     0.00    -0.01
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.93]   [0.24]            [0.57]    [0.5]
Gifted                                           0.13     0.03     0.00     0.14     0.02     0.01
                                                        (0.02)   (0.05)            (0.02)   (0.09)
                                                        [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                         0.05     0.01     0.02     0.04     0.01     0.04
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch         0.53     0.01     0.02     0.51     0.01     0.07
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home       0.34     0.02     0.03     0.35     0.01     0.04
                                                        (0.02)   (0.07)            (0.02)   (0.07)
                                                        [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses            0.79     0.00     0.09     0.79     0.02     0.05
                                                        (0.02)   (0.04)            (0.02)   (0.05)
                                                        [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                          1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4): Control Group Mean
(2), (5): ITT
(3), (6): LATE

Outcome                                           (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment                   0.19     0.38              0.24     0.39
                                                        (0.05)                     (0.06)
                                                        [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                     0.03     0.04     0.11     0.03     0.04     0.10
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                         0.13     0.04     0.11     0.14     0.04     0.10
                                                        (0.01)   (0.02)            (0.01)   (0.02)
                                                        [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                         0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                        (0.01)   (0.02)            (0.01)   (0.02)
                                                        [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                             0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                                0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                        (0.01)   (0.02)            (0.01)   (0.02)
                                                        [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                                    0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                                        (0.01)   (0.03)            (0.01)   (0.03)
                                                        [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                          1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 4

Treatment Contrast (Composite Variables)

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                               (1)      (2)      (3)
Academically Challenging Curriculum                 -0.33     0.31     0.80
                                                            (0.10)   (0.24)
                                                            [0.00]   [0.00]
Project-Based Independent Classroom Activities      -0.06     0.13     0.33
                                                            (0.07)   (0.17)
                                                            [0.07]   [0.06]
Integrated Use of Technology                        -0.11     0.11     0.28
                                                            (0.08)   (0.19)
                                                            [0.19]   [0.14]
Number of Observations                              1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
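
The construction described in the Table 4 notes (rescale each ordered response to the unit interval, average across component items, then standardize) might be sketched as below. The category labels, the function names, and the use of the population standard deviation are assumptions made for illustration; the notes do not specify these details.

```python
from statistics import mean, pstdev

# Ordered response categories, lowest to highest (assumed five-point scale).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def to_unit_interval(response, categories=LIKERT):
    """Map an ordered category to [0, 1]: lowest = 0.0, highest = 1.0, evenly spaced."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's converted items, then standardize across students."""
    raw = [mean(to_unit_interval(r) for r in items) for items in responses_by_student]
    mu = mean(raw)
    sd = pstdev(raw)  # population SD is an assumption, not stated in the source
    return [(x - mu) / sd for x in raw]


# Hypothetical responses from three students on two component items.
students = [
    ["agree", "strongly agree"],
    ["neutral", "neutral"],
    ["disagree", "strongly disagree"],
]
print(composite_scores(students))
```

The resulting scores have mean 0 and standard deviation 1 across the students in the list, which matches the sense in which the composites in Table 4 are reported in standardized units.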


Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                (1)      (2)      (3)
Science Skill                        -0.19     0.09     0.23
                                             (0.06)   (0.16)
                                             [0.15]   [0.14]
STEM Interest                         0.62     0.04     0.09
                                             (0.02)   (0.07)
                                             [0.16]   [0.16]
Confidence in College Science         0.92    -0.04    -0.10
                                             (0.02)   (0.05)
                                             [0.11]   [0.06]
Stress                                0.12     0.07     0.17
                                             (0.03)   (0.07)
                                             [0.02]   [0.01]
Grades in Science Courses             2.80    -0.12    -0.29
                                             (0.07)   (0.16)
                                             [0.08]   [0.07]
Grades in Other Courses               3.14    -0.07    -0.18
                                             (0.02)   (0.06)
                                             [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
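
As a rough check on how the ITT and LATE columns relate, the LATE is approximately the ITT effect on the outcome divided by the ITT effect on AP course enrollment (the first stage in Table 3). The sketch below uses this simple Wald ratio with the rounded values reported in the tables. It is only a back-of-the-envelope approximation under that assumption: the paper's estimates come from Equations (1) and (2) with covariates and fixed effects, which are not reproduced here, and the function name `wald_late` is hypothetical.

```python
def wald_late(itt_effect, first_stage):
    """Ratio of the ITT effect on the outcome to the ITT effect on take-up."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage


# Rounded values for survey respondents, taken from Tables 3 and 5.
first_stage = 0.39        # ITT on AP Treatment Course Enrollment, Table 3 column (5)
itt_science_skill = 0.09  # ITT on Science Skill, Table 5 column (2)

print(round(wald_late(itt_science_skill, first_stage), 2))
```

For science skill the ratio of the rounded inputs agrees with the reported LATE of 0.23; for other rows, rounding of the inputs and the covariate adjustment can produce small differences from the reported values.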

Table 6

Robustness Checks of Main ITT Results

(1): Control Group Complier Mean
(2): Main Results
(3): Robust SE
(4): p-value (permutation test)
(5): Excluding High School 56
(6): Including Imputation of Missing Outcome Variables
(7): Excluding Covariates
(8): Excluding High School 23
(9): Lee Lower Bound
(10): Lee Upper Bound
(11): 95% Confidence Interval from Lee Bounds
(12): Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                            (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)            (11)  (12)
Science Skill                    -0.19    0.09                    0.10    0.11    0.20    0.07    0.03    0.39   (-0.09, 0.51)   2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                     0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12   (-0.03, 0.18)   1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science     0.92   -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06    0.05   (-0.09, 0.10)   2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                            0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11   (-0.05, 0.15)   1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses         2.80   -0.12                   -0.06   -0.10   -0.07      NA      NA      NA              NA    NA
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses           3.14   -0.07                   -0.07   -0.06   -0.03      NA      NA      NA              NA    NA
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]

NA: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to derive confidence interval for the treatment effect itself).
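
The permutation test described for column (4) can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses an unadjusted difference in means where the paper estimates a regression with covariates and fixed effects, it reshuffles the treatment labels across all observations (keeping the number treated fixed) rather than within school-by-cohort cells, and the function names and seed are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # pseudo treatment: same number treated, random order
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical data: six students, three of them assigned to treatment.
y = [1.0, 2.0, 1.5, 2.5, 1.2, 2.2]
d = [1, 0, 1, 0, 1, 0]
print(permutation_p_value(y, d))
```

A small share means that few random reassignments produce an effect as large as the one observed, which is the sense in which column (4) reports a p-value.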


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (US Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students' confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master's degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset, yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet, we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg correction for multiple comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
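
The inverse probability weight described in note 20 is the reciprocal of each respondent's fitted probability of completing the survey. The sketch below shows only that last step, assuming the fitted probit index for each respondent is already available; the function name and the example index values are hypothetical, and the probit estimation itself is not shown.

```python
from statistics import NormalDist


def inverse_probability_weight(probit_index):
    """Weight = 1 / Phi(index), where index is a respondent's fitted probit index."""
    response_probability = NormalDist().cdf(probit_index)
    return 1.0 / response_probability


# Hypothetical fitted indices for three survey respondents (illustrative only).
for index in (2.0, 0.0, -1.0):
    print(index, round(inverse_probability_weight(index), 2))
```

A respondent with a fitted response probability of one half (index 0) receives a weight of 2, while respondents who resemble nonrespondents (low fitted probabilities) receive larger weights and near-certain respondents receive weights close to 1.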

Page 20: Alec I. Kennedy Raymond McGhee Jr. ABSTRACT · Raymond McGhee Jr. is a senior director at Equal Measure. The authors thank Nicole Bateman, Kerry Beldoff, Grant H. Blume, Jordan Brown,

20

Figure 2

Participating Districts Neighborhood Socioeconomic Status and School Test Scores

Notes Data from Reardon Kalogrides and Shores (2016) Each circle represents one school

district in the United States X-axis is the Standardized Socioeconomic Status of the districtrsquos

neighborhood defined as the first principal component factor score based on measures of median

income percent with a bachelorrsquos degree or higher poverty rate SNAP rate single mother headed

household rate and unemployment rate Y-axis is the districtrsquos average test score in grade

equivalents based on the averaged spring math and English scores for students in grades 3-8 for

2009-2013 with the expected level of achievement standardized to zero The size of each circle

is proportional to the districtrsquos enrollment The dashed line is a lowess curve created using

Statarsquos default settings and roughly shows the predicted test score as a function of the

neighborhoodrsquos SES

21

Figure 3

Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile

Notes Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects

Corresponding OLS estimate shown by the dashed horizontal line Science skill has been

standardized to have a mean of 0 and SD of 1 for the full sample of participating students

Results are weighted by the inverse probability of completing the survey

22

Table 1

Participating Schools and Teachers Compared to Other US High Schools and High School

Science Teachers Panel A Schools Participating Others

Average Enrollment 1409 723

Free or Reduced-Price Lunch 0700 0438

Asian 0055 0050

Black 0349 0154

Hispanic 0410 0221

White 0164 0537

Adjusted Cohort Graduation Rate 0843 0802

District Instruction Expenditures Per Pupil $6561 $5636

District Student Services Expenditures Per Pupil $3787 $3385

Panel B Teachers Participating Others

Age Under 30 0407 0160

Age 30-49 0432 0553

Age 50 or over 0161 0287

Female 0630 0536

Hispanic or Latino 0111 0051

Race American Indian or Alaska Native 0000 0009

Race Asian American 0111 0041

Race Black 0111 0060

Race Native Hawaiian or other Pacific Islander 0000 0004

Race White 0778 0896

Years of Experience 103 132

Years of Experience lt=2 0290 0085

Years of Experience lt=5 0481 0234

Hold a Teaching Certificate 0926 0945

Undergraduate Major in STEM 0944 0747

Single Subject Credential in Science 0630 0823

Masterrsquos Degree or Higher 0356 0615

Previously Taught AP Course 0469 NA

Previously Taught AP IB or Honors Course 0796 NA

Number of Professional Development Trainings 309 NA

in the Past 5 years (0-5)

Notes Panel A source is the 2013-14 Common Core Data httpsncesedgovccd EDFacts

httpswww2edgovaboutinitsededfactsindexhtml Others in Panel A refers to other public

high schools in the US Adjusted Cohort Graduation Rate is the percentage of the students in a

9th grade cohort who graduate within four years (McFarland et al 2018) Panel B source is the

Teacher Survey (N= 27) and 2011-12 Schools and Staffing Survey

httpsncesedgovsurveyssass Others in Panel B refers to public and private high school

teachers in the US High school science teachers are defined as teachers of grades 9-12 whose

main teaching assignment is in the natural sciences

Table 2

Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3) use the full sample; columns (4)-(6) use the survey sample. Cells in the difference columns show the estimate, the standard error in parentheses, and the p-value in brackets.

| Pre-Treatment Characteristic | (1) Control Group Mean | (2) Difference Between Treated and Controls | (3) Difference Between Control Group Non-Compliers and Compliers | (4) Control Group Mean | (5) Difference Between Treated and Controls | (6) Difference Between Control Group Non-Compliers and Compliers |
|---|---|---|---|---|---|---|
| Age as of October of 11th Grade | 16.6 | -0.03 (0.02) [0.19] | -0.07 (0.07) [0.35] | 16.6 | -0.01 (0.03) [0.65] | -0.01 (0.09) [0.94] |
| Math Exam Score | 0.38 | 0.08 (0.04) [0.08] | 0.25 (0.10) [0.02] | 0.44 | 0.07 (0.05) [0.17] | 0.30 (0.16) [0.06] |
| Reading Exam Score | 0.29 | 0.10 (0.03) [0.00] | 0.18 (0.12) [0.14] | 0.36 | 0.09 (0.04) [0.02] | 0.17 (0.17) [0.31] |
| HS Grade Point Average | 3.16 | 0.05 (0.03) [0.14] | 0.20 (0.08) [0.02] | 3.23 | 0.06 (0.03) [0.06] | 0.13 (0.10) [0.20] |
| Female | 0.59 | 0.00 (0.03) [0.99] | 0.10 (0.06) [0.10] | 0.61 | -0.01 (0.04) [0.73] | 0.11 (0.07) [0.12] |
| Asian American | 0.12 | 0.02 (0.02) [0.27] | 0.10 (0.05) [0.06] | 0.12 | 0.03 (0.01) [0.07] | 0.10 (0.07) [0.12] |
| Black | 0.32 | -0.02 (0.02) [0.29] | -0.06 (0.06) [0.28] | 0.27 | 0.00 (0.02) [0.88] | -0.05 (0.05) [0.40] |
| Hispanic, Native American, or Multiracial | 0.31 | 0.01 (0.02) [0.55] | 0.05 (0.06) [0.41] | 0.33 | 0.01 (0.02) [0.81] | 0.05 (0.07) [0.51] |
| Disabled | 0.02 | 0.00 (0.01) [0.93] | -0.01 (0.01) [0.24] | 0.01 | 0.00 (0.01) [0.57] | -0.01 (0.01) [0.5] |
| Gifted | 0.13 | 0.03 (0.02) [0.06] | 0.00 (0.05) [1.00] | 0.14 | 0.02 (0.02) [0.25] | 0.01 (0.09) [0.89] |
| English Language Learner | 0.05 | 0.01 (0.01) [0.41] | 0.02 (0.02) [0.39] | 0.04 | 0.01 (0.01) [0.54] | 0.04 (0.03) [0.22] |
| Eligible for Free or Reduced-Price Lunch | 0.53 | 0.01 (0.02) [0.66] | 0.02 (0.07) [0.77] | 0.51 | 0.01 (0.03) [0.72] | 0.07 (0.09) [0.45] |
| Language Other than English Spoken at Home | 0.34 | 0.02 (0.02) [0.32] | 0.03 (0.07) [0.73] | 0.35 | 0.01 (0.02) [0.59] | 0.04 (0.07) [0.56] |
| Took Recommended Prerequisite Courses | 0.79 | 0.00 (0.02) [0.84] | 0.09 (0.04) [0.04] | 0.79 | 0.02 (0.02) [0.43] | 0.05 (0.05) [0.31] |

Number of Observations: 1,819 (full sample); 1,417 (survey sample)

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3) use the full sample; columns (4)-(6) use survey respondents. Cells in the ITT and LATE columns show the estimate, the standard error in parentheses, and the p-value in brackets.

| Outcome | (1) Control Group Mean | (2) ITT | (3) LATE | (4) Control Group Mean | (5) ITT | (6) LATE |
|---|---|---|---|---|---|---|
| AP Treatment Course Enrollment | 0.19 | 0.38 (0.05) [0.00] | | 0.24 | 0.39 (0.06) [0.00] | |
| Share of Credits During Study Year in: | | | | | | |
| AP Science | 0.03 | 0.04 (0.01) [0.00] | 0.11 (0.01) [0.00] | 0.03 | 0.04 (0.01) [0.00] | 0.10 (0.01) [0.00] |
| All AP | 0.13 | 0.04 (0.01) [0.00] | 0.11 (0.02) [0.00] | 0.14 | 0.04 (0.01) [0.00] | 0.10 (0.02) [0.00] |
| Other Advanced Science | 0.06 | -0.01 (0.01) [0.24] | -0.02 (0.02) [0.23] | 0.06 | -0.01 (0.01) [0.20] | -0.02 (0.02) [0.20] |
| All Other Advanced | 0.25 | -0.01 (0.01) [0.23] | -0.03 (0.02) [0.23] | 0.25 | -0.01 (0.01) [0.30] | -0.03 (0.03) [0.30] |
| Regular Science | 0.06 | -0.01 (0.01) [0.24] | -0.02 (0.02) [0.20] | 0.06 | -0.01 (0.01) [0.24] | -0.02 (0.02) [0.19] |
| All Regular | 0.62 | -0.03 (0.01) [0.02] | -0.09 (0.03) [0.00] | 0.61 | -0.03 (0.01) [0.07] | -0.07 (0.03) [0.03] |

Number of Observations: 1,819 (full sample); 1,417 (survey respondents)

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information was collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
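The inverse-probability weighting mentioned in these notes (and described in the endnote on survey weights) can be sketched as follows. This is a minimal illustration, assuming the probit index for each respondent has already been estimated; the index values and student labels below are hypothetical and are not taken from the study data.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_probability_weight(index: float) -> float:
    """Weight = 1 / Phi(index), where index is a fitted probit index."""
    return 1.0 / normal_cdf(index)


# Hypothetical fitted probit indices for three survey respondents.
fitted_index = {"student_a": 2.0, "student_b": 0.5, "student_c": -0.5}

# Respondents who resemble non-respondents (low index) get larger weights.
weights = {name: inverse_probability_weight(z) for name, z in fitted_index.items()}
```

Because the predicted completion probability is at most 1, every weight is at least 1, and respondents with a low predicted probability of completing the survey count for more in the weighted regressions.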

Table 4

Treatment Contrast (Composite Variables)

Cells in the ITT and LATE columns show the estimate, the standard error in parentheses, and the p-value in brackets.

| Outcome | (1) Control Group Complier Mean | (2) ITT | (3) LATE |
|---|---|---|---|
| Academically Challenging Curriculum | -0.33 | 0.31 (0.10) [0.00] | 0.80 (0.24) [0.00] |
| Project-Based Independent Classroom Activities | -0.06 | 0.13 (0.07) [0.07] | 0.33 (0.17) [0.06] |
| Integrated Use of Technology | -0.11 | 0.11 (0.08) [0.19] | 0.28 (0.19) [0.14] |

Number of Observations: 1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
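The composite construction described in these notes can be sketched as follows. This is an illustration only: the responses are hypothetical, and the choice to standardize with the population standard deviation over all students is an assumption, since the notes do not say which reference group or formula was used.

```python
from statistics import mean, pstdev

# Ordered from lowest to highest category, as in the notes' example scale.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response: str, categories=LIKERT) -> float:
    """Map ordered categories onto evenly spaced values from 0.0 to 1.0."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in items) for items in responses]
    mu, sd = mean(raw), pstdev(raw)  # assumption: population SD over all students
    return [(x - mu) / sd for x in raw]


# Hypothetical responses: three students, two component items each.
students = [
    ["strongly agree", "agree"],
    ["neutral", "neutral"],
    ["disagree", "strongly disagree"],
]
scores = composite_scores(students)
```

After standardization the composite has mean 0 and standard deviation 1, which is why the ITT and LATE estimates in the table can be read in standard deviation units.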

Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Cells in the ITT and LATE columns show the estimate, the standard error in parentheses, and the p-value in brackets.

| Outcome | (1) Control Group Complier Mean | (2) ITT | (3) LATE |
|---|---|---|---|
| Science Skill | -0.19 | 0.09 (0.06) [0.15] | 0.23 (0.16) [0.14] |
| STEM Interest | 0.62 | 0.04 (0.02) [0.16] | 0.09 (0.07) [0.16] |
| Confidence in College Science | 0.92 | -0.04 (0.02) [0.11] | -0.10 (0.05) [0.06] |
| Stress | 0.12 | 0.07 (0.03) [0.02] | 0.17 (0.07) [0.01] |
| Grades in Science Courses | 2.80 | -0.12 (0.07) [0.08] | -0.29 (0.16) [0.07] |
| Grades in Other Courses | 3.14 | -0.07 (0.02) [0.00] | -0.18 (0.06) [0.00] |

Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
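As a rough guide to how the ITT and LATE columns relate, the simplest instrumental-variables case (one binary offer, no covariates) gives the Wald ratio: the ITT effect divided by the first-stage effect of the offer on enrollment. The sketch below applies that ratio to rounded values as printed (science skill ITT from this table, survey-respondent first stage from Table 3). It is a simplified check, not the estimation behind the table, which uses variants of Equation (1); ratios of rounded values need not match the printed LATE exactly.

```python
def wald_late(itt: float, first_stage: float) -> float:
    """Wald estimator: ITT effect scaled by the first-stage enrollment effect."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage


# Rounded inputs as printed in the tables (illustrative only).
itt_science_skill = 0.09      # ITT effect on science skill, in SD units
first_stage_survey = 0.39     # effect of the offer on AP course enrollment

late_science_skill = wald_late(itt_science_skill, first_stage_survey)
```

The ratio is about 0.23, which is close to the LATE printed for science skill; small discrepancies for other outcomes are expected from rounding and from the covariates and fixed effects in the actual models.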

Table 6

Robustness Checks of Main ITT Results

Cells show the estimate, the standard error in parentheses, and the p-value in brackets, where reported.

| Outcome | (1) Control Group Complier Mean | (2) Main Results | (3) Robust SE | (4) p-value (permutation test) | (5) Excluding High School 56 | (6) Including Imputation of Missing Outcome Variables | (7) Excluding Covariates | (8) Excluding High School 23 | (9) Lee Lower Bound | (10) Lee Upper Bound | (11) 95% Confidence Interval from Lee Bounds | (12) Ratio of 95% CI in (11) to 95% CI in (7) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Science Skill | -0.19 | 0.09 (0.06) [0.15] | (0.05) [0.06] | [0.06] | 0.10 (0.00) [0.20] | 0.11 (0.00) [0.11] | 0.20 (0.00) [0.01] | 0.07 (0.00) [0.24] | 0.03 (0.07) [0.72] | 0.39 (0.07) [0.00] | -0.09, 0.51 | 2.0 |
| STEM Interest | 0.62 | 0.04 (0.02) [0.16] | (0.03) [0.19] | [0.20] | 0.05 (0.00) [0.09] | 0.03 (0.00) [0.29] | 0.03 (0.00) [0.27] | 0.03 (0.00) [0.19] | 0.02 (0.03) [0.60] | 0.12 (0.04) [0.00] | -0.03, 0.18 | 1.9 |
| Confidence in College Science | 0.92 | -0.04 (0.02) [0.11] | (0.02) [0.05] | [0.07] | -0.03 (0.00) [0.37] | -0.06 (0.00) [0.02] | -0.06 (0.00) [0.03] | -0.04 (0.00) [0.10] | -0.06 (0.02) [0.00] | 0.05 (0.03) [0.17] | -0.09, 0.10 | 2.0 |
| Stress | 0.12 | 0.07 (0.03) [0.02] | (0.02) [0.00] | [0.00] | 0.05 (0.00) [0.14] | 0.06 (0.00) [0.07] | 0.08 (0.00) [0.02] | 0.07 (0.00) [0.02] | 0.01 (0.03) [0.79] | 0.11 (0.02) [0.00] | -0.05, 0.15 | 1.6 |
| Grades in Science Courses | 2.80 | -0.12 (0.07) [0.08] | (0.04) [0.01] | [0.01] | -0.06 (0.00) [0.31] | -0.10 (0.00) [0.16] | -0.07 (0.00) [0.30] | N/A | N/A | N/A | N/A | N/A |
| Grades in Other Courses | 3.14 | -0.07 (0.02) [0.00] | (0.03) [0.01] | [0.01] | -0.07 (0.00) [0.00] | -0.06 (0.00) [0.01] | -0.03 (0.00) [0.38] | N/A | N/A | N/A | N/A | N/A |

N/A: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to derive a confidence interval for the treatment effect itself).
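The permutation test described in the notes for column (4) can be sketched as follows. This is a simplified illustration: it uses a plain difference in means on hypothetical data and shuffles assignment across all units, whereas the reported estimates come from regressions with covariates and School x Cohort fixed effects. The function names, seed, and data are assumptions made for the example.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Difference between treated and control outcome means."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    shuffled = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # keeps the number of treated units fixed
        if abs(diff_in_means(outcomes, shuffled)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical data: six treated units with clearly higher outcomes.
outcomes = [5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
treated = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_value = permutation_p_value(outcomes, treated)
```

With such a stark hypothetical contrast, no reshuffled assignment produces a larger gap, so the p-value is very small; with real data the share of exceedances is what column (4) reports.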

1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. When the standard to which students compare themselves rises, their confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends to some extent on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course and classroom supplies (e.g., lab materials, textbooks) and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools, we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master's degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree, but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset, yet only estimate regressions on the sample for which we observe each outcome variable. This follows a "multiple imputation, then deletion" strategy suggested by von Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple-choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
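The Benjamini-Hochberg step-up procedure cited in note 29 can be sketched as follows. The inputs are the rounded p-values printed under the "Robust SE" heading of Table 6, used here only to illustrate the mechanics; because they are rounded to two decimals, this sketch is not a reproduction of the authors' computation, and the function name is an assumption.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the names of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ordered, start=1):
        # Keep the largest rank whose p-value is below its BH threshold.
        if p <= rank / m * alpha:
            cutoff_rank = rank
    return [name for name, _ in ordered[:cutoff_rank]]


# Rounded p-values as printed (illustrative inputs only).
robust_p = {
    "Science Skill": 0.06,
    "STEM Interest": 0.19,
    "Confidence in College Science": 0.05,
    "Stress": 0.00,
    "Grades in Science Courses": 0.01,
    "Grades in Other Courses": 0.01,
}
rejected = benjamini_hochberg(robust_p, alpha=0.05)
```

The rule compares the k-th smallest p-value with k/m times the critical value and rejects every hypothesis up to the largest k that passes, which controls the false discovery rate across the six outcomes.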


Figure 3

Intent to Treat Effect of AP Course on Science Skill, by Conditional Quantile

Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. The corresponding OLS estimate is shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.

Table 1

Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

| Panel A: Schools | Participating | Others |
|---|---|---|
| Average Enrollment | 1,409 | 723 |
| Free or Reduced-Price Lunch | 0.700 | 0.438 |
| Asian | 0.055 | 0.050 |
| Black | 0.349 | 0.154 |
| Hispanic | 0.410 | 0.221 |
| White | 0.164 | 0.537 |
| Adjusted Cohort Graduation Rate | 0.843 | 0.802 |
| District Instruction Expenditures Per Pupil | $6,561 | $5,636 |
| District Student Services Expenditures Per Pupil | $3,787 | $3,385 |

| Panel B: Teachers | Participating | Others |
|---|---|---|
| Age: Under 30 | 0.407 | 0.160 |
| Age: 30-49 | 0.432 | 0.553 |
| Age: 50 or over | 0.161 | 0.287 |
| Female | 0.630 | 0.536 |
| Hispanic or Latino | 0.111 | 0.051 |
| Race: American Indian or Alaska Native | 0.000 | 0.009 |
| Race: Asian American | 0.111 | 0.041 |
| Race: Black | 0.111 | 0.060 |
| Race: Native Hawaiian or other Pacific Islander | 0.000 | 0.004 |
| Race: White | 0.778 | 0.896 |
| Years of Experience | 10.3 | 13.2 |
| Years of Experience ≤ 2 | 0.290 | 0.085 |
| Years of Experience ≤ 5 | 0.481 | 0.234 |
| Hold a Teaching Certificate | 0.926 | 0.945 |
| Undergraduate Major in STEM | 0.944 | 0.747 |
| Single Subject Credential in Science | 0.630 | 0.823 |
| Master's Degree or Higher | 0.356 | 0.615 |
| Previously Taught AP Course | 0.469 | NA |
| Previously Taught AP, IB, or Honors Course | 0.796 | NA |
| Number of Professional Development Trainings in the Past 5 Years (0-5) | 3.09 | NA |

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd/) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). "Others" in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and the 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass/). "Others" in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.

23

Table 2

TreatmentControl Balance and ComplierNon-Complier Differences on Pre-Treatment Characteristics

(1) (2) (3) (4) (5) (6)

Full Sample Survey Sample

Pre-Treatment Characteristic

Control

Group

Mean

Difference

Between

Treated and

Controls

Difference

Between

Control Group

Non-Compliers

and Compliers

Control

Group

Mean

Difference

Between

Treated and

Controls

Difference

Between

Control Group

Non-Compliers

and Compliers

Age as of October of 11th Grade 166 -003 -007 166 -001 -001

(002) (007) (003) (009)

[019] [035] [065] [094]

Math Exam Score 038 008 025 044 007 030

(004) (010) (005) (016)

[008] [002] [017] [006]

Reading Exam Score 029 010 018 036 009 017

(003) (012) (004) (017)

[000] [014] [002] [031]

HS Grade Point Average 316 005 020 323 006 013

(003) (008) (003) (010)

[014] [002] [006] [020]

Female 059 000 010 061 -001 011

(003) (006) (004) (007)

[099] [010] [073] [012]

Asian American 012 002 010 012 003 010

(002) (005) (001) (007)

[027] [006] [007] [012]

Black 032 -002 -006 027 000 -005

(002) (006) (002) (005)

[029] [028] [088] [040]

Hispanic Native American or Multiracial 031 001 005 033 001 005

24

(002) (006) (002) (007)

[055] [041] [081] [051]

Disabled 002 000 -001 001 000 -001

(001) (001) (001) (001)

[093] [024] [057] [05]

Gifted 013 003 000 014 002 001

(002) (005) (002) (009)

[006] [100] [025] [089]

English Language Learner 005 001 002 004 001 004

(001) (002) (001) (003)

[041] [039] [054] [022]

Eligible for Free or Reduced-Price Lunch 053 001 002 051 001 007

(002) (007) (003) (009)

[066] [077] [072] [045]

Language Other than English Spoken at Home 034 002 003 035 001 004

(002) (007) (002) (007)

[032] [073] [059] [056]

Took Recommended Prerequisite Courses 079 000 009 079 002 005

(002) (004) (002) (005)

[084] [004] [043] [031]

Number of Observations 1819 1417

Notes Differences in columns (2) (3) (5) and (6) are conditional on School x Cohort fixed effects Standard errors clustered by

School x Cohort are in parentheses and p-values are in brackets

25

Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

(1) (2) (3) (4) (5) (6)

Full Sample Survey Respondents

Outcome

Control

Group

Mean

ITT

LATE

Control

Group

Mean

ITT

LATE

AP Treatment Course Enrollment 019 038 024 039

(005) (006)

[000] [000] Share of Credits During Study Year in

AP Science 003 004 011 003 004 010

(001) (001) (001) (001)

[000] [000] [000] [000]

All AP 013 004 011 014 004 010

(001) (002) (001) (002)

[000] [000] [000] [000]

Other Advanced Science 006 -001 -002 006 -001 -002

(001) (002) (001) (002)

[024] [023] [020] [020]

All Other Advanced 025 -001 -003 025 -001 -003

(001) (002) (001) (003)

[023] [023] [030] [030]

Regular Science 006 -001 -002 006 -001 -002

(001) (002) (001) (002)

[024] [020] [024] [019]

All Regular 062 -003 -009 061 -003 -007

(001) (003) (001) (003)

[002] [000] [007] [003]

Number of Observations 1819 1417

Notes The ITT results in the AP Treatment Course Enrollment row are found by estimating

Equation (2) and subsequent ITT and LATE results are found by estimating variants of Equation

(1) Course-taking information collected from student transcripts Control Group Mean uses the

full control group for the first outcome (ie AP Treatment Course Enrollment) and those control

group members who complied with their assignment (ie those who did not take the AP

Treatment Course) for the subsequent outcomes Results in columns (4) (5) and (6) are

weighted by the inverse probability of completing the survey Standard errors clustered by School

x Cohort are in parentheses and p-values are in brackets

26

Table 4

Treatment Contrast (Composite Variables)

(1) (2) (3)

Outcome

Control

Group

Complier

Mean

ITT LATE

Academically Challenging Curriculum -033 031 080

(010) (024)

[000] [000]

Project-Based Independent Classroom

Activities -006 013 033

(007) (017)

[007] [006]

Integrated Use of Technology

-011 011 028

(008) (019)

[019] [014]

Number of Observations 1417

Notes To construct these composite variables we first converted the values on each component

variable (eg strongly agree agree neutral disagree or strongly disagree) so that the highest

category was set to 10 the lowest to 00 and the remaining categories evenly spaced between

00 and 10 We then averaged and standardized these converted values Results are weighted by

the inverse probability of completing the survey Online Appendix Table 5 provides the list of

component variables Standard errors clustered by School x Cohort are in parentheses and p-

values are in brackets

27

Table 5

AP Course Impact on Science Skill STEM Interest Confidence Stress and Grades

(1) (2) (3)

Outcome

Control

Group

Complier

Mean

ITT LATE

Science Skill -019 009 023

(006) (016)

[015] [014]

STEM Interest 062 004 009

(002) (007)

[016] [016]

Confidence in College

Science 092 -004 -010

(002) (005)

[011] [006]

Stress 012 007 017

(003) (007)

[002] [001]

Grades in Science Courses 280 -012 -029

(007) (016)

[008] [007]

Grades in Other Courses 314 -007 -018

(002) (006)

[000] [000]

Number of Observations 1819 for grades 1417 for other

outcomes

Notes Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of

participating students STEM interest =1 if high or some interest in pursuing a STEM degree or

=0 if no interest Confidence in college science = 1 if extremely or somewhat confident in ability to

complete a college science course or =0 if somewhat not confident or not at all confident Stress=

1 if most recent science course had strong negative or negative impact on physical or emotional

health or =0 if strong positive impact positive impact or no impact Grades in science and other

courses are obtained from student transcripts and measure grades during the study year

Results with the exception of grades during study year are weighted by the inverse probability of

completing the survey Standard errors clustered by School x Cohort are in parentheses and p-

values are in brackets

Table 6
Robustness Checks of Main ITT Results

Columns: (1) Control Group Complier Mean; (2) Main Results; (3) Robust SE; (4) p-value (permutation test); (5) Excluding High School 56; (6) Including Imputation of Missing Outcome Variables; (7) Excluding Covariates; (8) Excluding High School 23; (9) Lee Lower Bound; (10) Lee Upper Bound; (11) 95% Confidence Interval from Lee Bounds; (12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                        (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)             (12)
Science Skill                  -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09 to 0.51    2.0
                                       (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                       [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                  0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03 to 0.18    1.9
                                       (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                       [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science  0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09 to 0.10    2.0
                                       (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                       [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                         0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05 to 0.15    1.6
                                       (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                       [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses      2.80    -0.12                   -0.06   -0.10   -0.07   Not applicable, as grades are based on
                                       (0.07)  (0.04)          (0.00)  (0.00)  (0.00)  transcripts rather than student survey
                                       [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses        3.14    -0.07                   -0.07   -0.06   -0.03   Not applicable, as grades are based on
                                       (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  transcripts rather than student survey
                                       [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to derive confidence interval for the treatment effect itself).
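The permutation test described for column (4) can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses a raw difference in means as the "treatment effect" (the paper's estimates come from regressions with covariates and School x Cohort fixed effects), it reshuffles labels across the whole sample rather than within schools, and the function names and toy data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Difference between treated and control mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # shuffling keeps the number of treated units fixed
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Toy data (hypothetical): eight students, four of them offered the course.
toy_outcomes = [0.2, -0.1, 0.5, 0.9, 0.1, 0.7, -0.3, 0.4]
toy_treated = [False, False, True, True, False, True, False, True]
print(permutation_p_value(toy_outcomes, toy_treated))
```

Fixing the seed makes the sketch reproducible; a real replication would also need to respect the blocking by school and cohort.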


1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board review each syllabus and grant permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of

learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students' confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends to some extent on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of

students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools, we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master's degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,

if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly

offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi()$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
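The Benjamini-Hochberg correction mentioned in endnote 29 can be illustrated with a short sketch. It implements the standard step-up procedure; the six inputs are the rounded Column (3) p-values from Table 6, used here purely as illustrative values, since the endnote does not give enough detail to reproduce the authors' calculation. The function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected under BH."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


outcomes = ["Science Skill", "STEM Interest", "Confidence", "Stress",
            "Grades (Science)", "Grades (Other)"]
illustrative_p = [0.06, 0.19, 0.05, 0.00, 0.01, 0.01]  # rounded, illustrative only

for name, reject in zip(outcomes, benjamini_hochberg(illustrative_p)):
    print(f"{name}: {'reject' if reject else 'do not reject'}")
```

With these rounded inputs, three outcomes (stress and the two grade measures) are rejected at the 0.05 level, which matches the count of outcomes reported in the endnote.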


Table 1
Participating Schools and Teachers Compared to Other U.S. High Schools and High School Science Teachers

Panel A: Schools                                      Participating   Others
Average Enrollment                                    1,409           723
Free or Reduced-Price Lunch                           0.700           0.438
Asian                                                 0.055           0.050
Black                                                 0.349           0.154
Hispanic                                              0.410           0.221
White                                                 0.164           0.537
Adjusted Cohort Graduation Rate                       0.843           0.802
District Instruction Expenditures Per Pupil           $6,561          $5,636
District Student Services Expenditures Per Pupil      $3,787          $3,385

Panel B: Teachers                                     Participating   Others
Age: Under 30                                         0.407           0.160
Age: 30-49                                            0.432           0.553
Age: 50 or over                                       0.161           0.287
Female                                                0.630           0.536
Hispanic or Latino                                    0.111           0.051
Race: American Indian or Alaska Native                0.000           0.009
Race: Asian American                                  0.111           0.041
Race: Black                                           0.111           0.060
Race: Native Hawaiian or other Pacific Islander       0.000           0.004
Race: White                                           0.778           0.896
Years of Experience                                   10.3            13.2
Years of Experience <= 2                              0.290           0.085
Years of Experience <= 5                              0.481           0.234
Hold a Teaching Certificate                           0.926           0.945
Undergraduate Major in STEM                           0.944           0.747
Single Subject Credential in Science                  0.630           0.823
Master's Degree or Higher                             0.356           0.615
Previously Taught AP Course                           0.469           NA
Previously Taught AP, IB, or Honors Course            0.796           NA
Number of Professional Development Trainings
  in the Past 5 Years (0-5)                           3.09            NA

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). "Others" in Panel A refers to other public high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). "Others" in Panel B refers to public and private high school teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.


Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1) to (3): Full Sample. Columns (4) to (6): Survey Sample.
(1), (4) Control Group Mean; (2), (5) Difference Between Treated and Controls; (3), (6) Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                   (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                0.38     0.08     0.25     0.44     0.07     0.30
                                                        (0.04)   (0.10)            (0.05)   (0.16)
                                                        [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                             0.29     0.10     0.18     0.36     0.09     0.17
                                                        (0.03)   (0.12)            (0.04)   (0.17)
                                                        [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                         3.16     0.05     0.20     3.23     0.06     0.13
                                                        (0.03)   (0.08)            (0.03)   (0.10)
                                                        [0.14]   [0.02]            [0.06]   [0.20]
Female                                         0.59     0.00     0.10     0.61     -0.01    0.11
                                                        (0.03)   (0.06)            (0.04)   (0.07)
                                                        [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                 0.12     0.02     0.10     0.12     0.03     0.10
                                                        (0.02)   (0.05)            (0.01)   (0.07)
                                                        [0.27]   [0.06]            [0.07]   [0.12]
Black                                          0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                        (0.02)   (0.06)            (0.02)   (0.05)
                                                        [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial      0.31     0.01     0.05     0.33     0.01     0.05
                                                        (0.02)   (0.06)            (0.02)   (0.07)
                                                        [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                       0.02     0.00     -0.01    0.01     0.00     -0.01
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                         0.13     0.03     0.00     0.14     0.02     0.01
                                                        (0.02)   (0.05)            (0.02)   (0.09)
                                                        [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                       0.05     0.01     0.02     0.04     0.01     0.04
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch       0.53     0.01     0.02     0.51     0.01     0.07
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home     0.34     0.02     0.03     0.35     0.01     0.04
                                                        (0.02)   (0.07)            (0.02)   (0.07)
                                                        [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses          0.79     0.00     0.09     0.79     0.02     0.05
                                                        (0.02)   (0.04)            (0.02)   (0.05)
                                                        [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                         1,819 (Full Sample)        1,417 (Survey Sample)

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1) to (3): Full Sample. Columns (4) to (6): Survey Respondents.
(1), (4) Control Group Mean; (2), (5) ITT; (3), (6) LATE

Outcome                                 (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment          0.19     0.38              0.24     0.39
                                                 (0.05)                     (0.06)
                                                 [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                            0.03     0.04     0.11     0.03     0.04     0.10
                                                 (0.01)   (0.01)            (0.01)   (0.01)
                                                 [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                0.13     0.04     0.11     0.14     0.04     0.10
                                                 (0.01)   (0.02)            (0.01)   (0.02)
                                                 [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                 (0.01)   (0.02)            (0.01)   (0.02)
                                                 [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                    0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                 (0.01)   (0.02)            (0.01)   (0.03)
                                                 [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                       0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                 (0.01)   (0.02)            (0.01)   (0.02)
                                                 [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                           0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                 (0.01)   (0.03)            (0.01)   (0.03)
                                                 [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                  1,819 (Full Sample)        1,417 (Survey Respondents)

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
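The inverse probability weights referred to in these notes are described in endnote 20 as the reciprocal of a probit-predicted probability of completing the survey. The sketch below shows only that final step, assuming the probit coefficients have already been estimated elsewhere; the function name and all numeric inputs are hypothetical and chosen purely for illustration.

```python
from statistics import NormalDist


def inverse_probability_weight(fixed_effect, covariates, coefficients):
    """Weight = 1 / Phi(mu_j + X_i * rho), with Phi the standard normal CDF."""
    index = fixed_effect + sum(x * b for x, b in zip(covariates, coefficients))
    probability = NormalDist().cdf(index)  # predicted chance of completing the survey
    return 1.0 / probability


# Hypothetical student whose linear index is exactly zero: predicted
# completion probability 0.5, so the weight is 2.0.
print(inverse_probability_weight(0.0, [1.0, 0.5], [0.2, -0.4]))

# Hypothetical student with a high predicted completion probability:
# the weight is close to its lower limit of 1.
print(inverse_probability_weight(1.5, [1.0, 0.5], [0.2, -0.4]))
```

Because a probability never exceeds 1, the weights never fall below 1, which is consistent with the range reported in endnote 20.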


Table 4
Treatment Contrast (Composite Variables)

                                                  (1)              (2)        (3)
Outcome                                           Control Group    ITT        LATE
                                                  Complier Mean
Academically Challenging Curriculum               -0.33            0.31       0.80
                                                                   (0.10)     (0.24)
                                                                   [0.00]     [0.00]
Project-Based Independent Classroom Activities    -0.06            0.13       0.33
                                                                   (0.07)     (0.17)
                                                                   [0.07]     [0.06]
Integrated Use of Technology                      -0.11            0.11       0.28
                                                                   (0.08)     (0.19)
                                                                   [0.19]     [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
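The composite construction described in these notes can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: it treats "strongly agree" as the highest category, averages each student's rescaled answers with equal weight, and standardizes with the population standard deviation. The function names and the three example students are hypothetical.

```python
from statistics import mean, pstdev

# Ordered from lowest to highest category (assumed ordering).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in responses) for responses in responses_by_student]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        raise ValueError("no variation across students; cannot standardize")
    return [(x - mu) / sd for x in raw]


students = [
    ["strongly agree", "agree"],
    ["neutral", "neutral"],
    ["disagree", "strongly disagree"],
]
print([round(score, 2) for score in composite_scores(students)])
```

The standardized scores have mean 0 and standard deviation 1 across the listed students, which is what lets the estimates in the table be read in standard deviation units.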


Parish Louisiana Education Achievement Authority Michigan Charlotte-Mecklenburg

Schools North Carolina Winston-SalemForsyth Schools North Carolina Cranston Public

Schools Rhode Island El Paso Independent School District Texas Metropolitan Nashville

Public Schools Tennessee and Richmond Public Schools Virginia 11 Though the study teachers are less likely to hold a masterrsquos degree many of the graduate

degrees held by teachers nationally are likely to be in education (not STEM) Thus the study

teachers are less likely to have a graduate degree but not necessarily less likely to have STEM

training We also did not survey teachers regarding their Teach for America (TFA) experience

but it is possible that the relatively high share of STEM undergraduate degrees could be driven

by higher representation of TFA teachers 12 We developed the instrument over a two-year period and pilot tested it three times (the last

pilot test included 140 students) prior to administering the tool to study participants Reliability

metrics are high (inter-item reliability = 099 person reliability= 071) Further documentation of

the development of the assessment instrument in the survey can be found in Seeratan et al

(2017) 13 Each year in the spring semester our team administered and collected the participant surveys

during the school day in classrooms set aside for survey administration 14 Unweighted results which are similar are contained in the Online Appendix tables However

if study participants who did not take the survey differ in unobserved ways then our reweighting

based on observed characteristics will not eliminate nonignorable nonresponse bias 15 Online Appendix Table 1 presents the balance between treatment and control group membersrsquo

characteristics before imputation of missing values (as described below) these results are very

similar to those shown in Table 2 16 Given the high degree of correlation between studentsrsquo 8th and 10th grade scores (and the fact

that some students did not have 10th grade scores) we created one reading and math score for

each student that is the average of both scores or just the 8th grade score For the 23 participating

students who were in 10th grade during the year in which the AP course was offered to their

cohort we only use the studentrsquos 8th grade test scores as their 10th grade test scores would be

endogenous 17 The studyrsquos Principal Investigator personally handled the randomization of offers of

enrollment in the course so the lack of balance is simply due to unlucky randomization rather

32

than manipulation by school administrators We considered implementing a randomized block

design to avoid such issues but found it infeasible to obtain the necessary test score information

prior to randomization given the schoolsrsquo needs for speed in deciding whether the student was

allowed to register for the new class We added an entire planning year to our study design to

avoid these issues but many schools were unable to plan ahead 18 To evaluate whether this non-compliance is ignorable we conducted the test recommended by

Huber (2013) for each of our six main outcomes (ie those subsequently shown in Table 5) We

find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these

six outcomes which suggests that generalizing our estimated treatment effects to the full control

group may not be unreasonable 19 One district in our study offered both AP courses Students in this school were randomly

offered enrollment in an AP course and then given the option of Chemistry or Biology To

account for the two courses offered we treat the school as two separate groups School-

Chemistry and School-Biology For those students who were not offered an AP course we

randomly assign them to one of two control groups proportional to the number of treated

students who chose each course For example if 60 of the treated students chose Biology then

we randomly assign 60 of the control students to the School-Biology control group In Section

VC we show that our results are also robust to dropping this school entirely 20 To compute these weights we first estimate the parameters of the following equation using a

probit regression Pr(CompletedSurveyij = 1) = Φ(microj + Xiρ+ ϵij) where CompletedSurveyij equals 1

if student i in school by cohort j completed any part of the end-of-year survey Xi is the same

vector of pre-treatment characteristics in the previous equations microj are school by cohort fixed

effects and Φ() is the cumulative normal distribution function The results of this regression are

included in Online Appendix Table 2 Students who had higher pre-treatment grades Black

students those who were not disabled and those who took prerequisite courses were more likely

to complete the survey The inverse probability weight is computed as 1Φ(microj + Xi ) and gives

more weight in the regression to study participants who completed the survey and yet had pre-

study characteristics that were similar to those study participants who did not complete the

survey These weights range from 10 to 155 with a median (mean) weight of i 12 (15) and

with 90 of students receiving a weight less than 20 21 We impute with the full dataset yet only estimate regressions on the sample for which we

observe each outcome variable This follows a multiple imputation then deletion strategy

suggested by Hippel (2007) which improves efficiency while protecting against problematic

imputed outcome values As a robustness check Section VC provides results including

imputation of missing outcome variables 22 Detailed course-by-course impacts are available from the authors 23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4 24 Online Appendix Table 5 shows that AP science students report a much more intellectually

challenging curriculum with more homework than non-AP complier students Treatment group

students are also more likely to report that the students in their class were driven to succeed and

that the teacher set high standards The AP science class also involved more student-led projects

or experiments hands on learning and small group work all activities that are deemed to be

essential to an inquiry-based classroom (Bennett et al 2010 National Research Council 2012)

Yet we do not find strong evidence that students in AP classes were more likely to present what

they learned apply their knowledge to solve a new problem or work independently and none of

the component measures of technology usage were statistically significantly affected Nor did

33

treatment teachers rely less on lecturing and multiple choice quizzes Thus teachers appear

better able to implement the academic rigor expected of an AP science class than some of the

inquiry-based approaches that the College Board intends for AP science We do not find

evidence that taking AP science led students to be more likely to report that they found their

course more interesting which may reflect the inability of the teachers to fully implement a

creative inquiry-based environment 25 In addition to the teacher spillover effects peer interactions could cause contamination effects

that might render our estimated effects smaller A research design with randomization both

across and within schools would allow for estimation of spillover effects but such a design was

infeasible due to the challenge and added cost of recruiting control schools 26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers

received a grade of C (ie 20) or lower in their most recent science class 27 Grade weights may also affect AP participation which is one of the main purposes of the

weights (Klopfenstein and Lively 2016) 28 See Abadie et al (2017) for a discussion of the use of robust versus clustered standard errors

in models with fixed effects 29 To address the possibility of a heightened probability of Type I error given the multiple

outcomes we also apply the Benjamini-Hochberg Correction for Multiple Comparisons

(Benjamini and Hochberg 1995) At a preferred critical p-value of 005 we find that the same

three outcomes that reach statistical significance without applying the correction (shown in

Column (2) of Table 6 remain statistically significant after applying the correction 30 This school had a 27 percent response rate partially as a result of all of the completed surveys

from the first cohort of administration being lost in transit after they were completed 31 We are probably also a bit overly conservative in computing the Lee bounds estimates as we

have included the students from cohort 1 of high school number 23 where nonresponse was due

mostly to the surveys being lost after administration 32 We tested for heterogeneity in treatment effects along seven student and teacher attributes

(including student prior academic preparation raceethnicity gender and teacher preparation)

We also conducted ITT quantile regressions for the continuous outcomes (science skill grades in

science and grades in other courses) Some of the differences in the point estimates were quite

large yet so too were the standard errors For instance five of the seven estimated differential

treatment effects on science skill exceed 025 standard deviations with p-values that fall in the

suggestive (016) to noisy (034) range 33 We are currently in the process of gathering records from the National Student Clearinghouse

on all three cohorts of study participants Once data collection is complete we will have the

ability to examine the effect of AP science on college enrollment college selectivity and college

completion
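The Benjamini-Hochberg step-up procedure named in note 29 can be sketched as follows. This is a minimal illustration of the general procedure, not the authors' code: the function name `benjamini_hochberg` is hypothetical, and the example p-values are made up for demonstration and are not the study's results.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is
    rejected under the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis whose rank is at most that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (illustration only).
example_p = [0.001, 0.012, 0.021, 0.20, 0.35, 0.60]
print(benjamini_hochberg(example_p))
```

Because the procedure is step-up, a p-value above its own rank threshold can still be rejected when a larger p-value clears a later threshold, which is why the code searches for the largest qualifying rank rather than stopping at the first failure.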


Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4) Control Group Mean
(2), (5) Difference Between Treated and Controls
(3), (6) Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                      (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                  16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                  0.38     0.08     0.25     0.44     0.07     0.30
                                                        (0.04)   (0.10)            (0.05)   (0.16)
                                                        [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                               0.29     0.10     0.18     0.36     0.09     0.17
                                                        (0.03)   (0.12)            (0.04)   (0.17)
                                                        [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                           3.16     0.05     0.20     3.23     0.06     0.13
                                                        (0.03)   (0.08)            (0.03)   (0.10)
                                                        [0.14]   [0.02]            [0.06]   [0.20]
Female                                           0.59     0.00     0.10     0.61    -0.01     0.11
                                                        (0.03)   (0.06)            (0.04)   (0.07)
                                                        [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                   0.12     0.02     0.10     0.12     0.03     0.10
                                                        (0.02)   (0.05)            (0.01)   (0.07)
                                                        [0.27]   [0.06]            [0.07]   [0.12]
Black                                            0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                        (0.02)   (0.06)            (0.02)   (0.05)
                                                        [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial        0.31     0.01     0.05     0.33     0.01     0.05
                                                        (0.02)   (0.06)            (0.02)   (0.07)
                                                        [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                         0.02     0.00    -0.01     0.01     0.00    -0.01
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.93]   [0.24]            [0.57]    [0.5]
Gifted                                           0.13     0.03     0.00     0.14     0.02     0.01
                                                        (0.02)   (0.05)            (0.02)   (0.09)
                                                        [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                         0.05     0.01     0.02     0.04     0.01     0.04
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch         0.53     0.01     0.02     0.51     0.01     0.07
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home       0.34     0.02     0.03     0.35     0.01     0.04
                                                        (0.02)   (0.07)            (0.02)   (0.07)
                                                        [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses            0.79     0.00     0.09     0.79     0.02     0.05
                                                        (0.02)   (0.04)            (0.02)   (0.05)
                                                        [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                          1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4) Control Group Mean
(2), (5) ITT
(3), (6) LATE

Outcome                                           (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment                   0.19     0.38              0.24     0.39
                                                        (0.05)                     (0.06)
                                                        [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                     0.03     0.04     0.11     0.03     0.04     0.10
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                         0.13     0.04     0.11     0.14     0.04     0.10
                                                        (0.01)   (0.02)            (0.01)   (0.02)
                                                        [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                         0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                        (0.01)   (0.02)            (0.01)   (0.02)
                                                        [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                             0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                                0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                        (0.01)   (0.02)            (0.01)   (0.02)
                                                        [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                                    0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                                        (0.01)   (0.03)            (0.01)   (0.03)
                                                        [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                          1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information is collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
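As a rough plausibility check on how the ITT and LATE columns relate: with a single randomized offer as the instrument and no covariates, the LATE equals the ITT effect on the outcome divided by the ITT effect on enrollment (the first stage). The sketch below applies that ratio to rounded values read from the full-sample columns of Table 3 (an ITT of 0.04 on the AP Science credit share and a first stage of 0.38). It is only an approximation, since the reported estimates come from regressions with covariates and fixed effects, and the function name `wald_late` is hypothetical.

```python
def wald_late(itt_outcome, itt_enrollment):
    """Ratio (Wald) estimate of the local average treatment effect:
    ITT effect on the outcome divided by the first-stage ITT effect
    of the offer on course enrollment."""
    if itt_enrollment == 0:
        raise ValueError("first stage is zero; the ratio is undefined")
    return itt_outcome / itt_enrollment


# Rounded full-sample inputs as read from Table 3 (assumed reading).
late_ap_science_share = wald_late(itt_outcome=0.04, itt_enrollment=0.38)
print(round(late_ap_science_share, 2))  # 0.11
```

Small gaps between this ratio and a reported LATE are expected, because the inputs here are rounded to two decimals.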

Table 4
Treatment Contrast (Composite Variables)

(1) Control Group Complier Mean
(2) ITT
(3) LATE

Outcome                                               (1)      (2)      (3)
Academically Challenging Curriculum                 -0.33     0.31     0.80
                                                            (0.10)   (0.24)
                                                            [0.00]   [0.00]
Project-Based Independent Classroom Activities      -0.06     0.13     0.33
                                                            (0.07)   (0.17)
                                                            [0.07]   [0.06]
Integrated Use of Technology                        -0.11     0.11     0.28
                                                            (0.08)   (0.19)
                                                            [0.19]   [0.14]
Number of Observations                              1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
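The composite construction described in these notes (rescale each ordered response to the 0-1 range, average across components, then standardize) can be sketched as below. This is an illustrative sketch under stated assumptions: the names `LIKERT`, `rescale`, and `composite_scores` are hypothetical, the three example students are made up, and standardizing with the population standard deviation over an unweighted sample is an assumption, since the notes do not say which variant was used.

```python
from statistics import mean, pstdev

# Ordered response categories, lowest to highest (assumed 5-point scale).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0,
    with the remaining categories evenly spaced in between."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled component items, then standardize
    the averages to mean 0 and standard deviation 1 across students.
    Assumes the averages are not all identical (nonzero spread)."""
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sigma = mean(raw), pstdev(raw)
    return [(x - mu) / sigma for x in raw]


# Hypothetical responses for three students on two component items.
example = [
    ["agree", "agree"],
    ["neutral", "disagree"],
    ["strongly agree", "agree"],
]
print(composite_scores(example))
```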

Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1) Control Group Complier Mean
(2) ITT
(3) LATE

Outcome                                               (1)      (2)      (3)
Science Skill                                       -0.19     0.09     0.23
                                                            (0.06)   (0.16)
                                                            [0.15]   [0.14]
STEM Interest                                        0.62     0.04     0.09
                                                            (0.02)   (0.07)
                                                            [0.16]   [0.16]
Confidence in College Science                        0.92    -0.04    -0.10
                                                            (0.02)   (0.05)
                                                            [0.11]   [0.06]
Stress                                               0.12     0.07     0.17
                                                            (0.03)   (0.07)
                                                            [0.02]   [0.01]
Grades in Science Courses                            2.80    -0.12    -0.29
                                                            (0.07)   (0.16)
                                                            [0.08]   [0.07]
Grades in Other Courses                              3.14    -0.07    -0.18
                                                            (0.02)   (0.06)
                                                            [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
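The survey weights referenced in these notes are inverse probabilities of survey completion from a probit model (described in note 20). A minimal sketch of the final weighting step follows, assuming the probit index for each student (the school-by-cohort fixed effect plus the covariate contribution) has already been estimated elsewhere. The function names and the example index values are hypothetical, and fitting the probit itself is outside this sketch.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_probability_weight(probit_index):
    """Weight for a survey respondent: 1 / Phi(index), where the index is
    the fitted probit linear predictor for completing the survey."""
    return 1.0 / normal_cdf(probit_index)


# Hypothetical fitted indices for three respondents (illustration only).
for index in (1.5, 0.5, 0.0):
    print(index, inverse_probability_weight(index))
```

A respondent with a low predicted completion probability receives a larger weight, so respondents who resemble nonrespondents on observed characteristics count for more in the weighted regressions; every weight is at least 1 because the predicted probability cannot exceed 1.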

Table 6
Robustness Checks of Main ITT Results

(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                            (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)            (11)  (12)
Science Skill                    -0.19    0.09                    0.10    0.11    0.20    0.07    0.03    0.39   [-0.09, 0.51]   2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                     0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12   [-0.03, 0.18]   1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science     0.92   -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06    0.05   [-0.09, 0.10]   2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                            0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11   [-0.05, 0.15]   1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses         2.80   -0.12                   -0.06   -0.10   -0.07
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses           3.14   -0.07                   -0.07   -0.06   -0.03
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]

Remaining columns for the two grades rows: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95 percent confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
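The permutation test described for column (4) can be sketched in simplified form as follows. This is not the authors' implementation: it uses a plain difference in means rather than the regression in Equation (1), it shuffles labels across the whole sample rather than within any school or cohort grouping, and the names `mean_difference` and `permutation_p_value` as well as the example data are hypothetical.

```python
import random


def mean_difference(outcomes, treated):
    """Difference in mean outcome between treated and control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of random pseudo-treatment assignments whose absolute effect
    strictly exceeds the absolute value of the observed effect."""
    rng = random.Random(seed)
    observed = abs(mean_difference(outcomes, treated))
    labels = list(treated)  # copy so the caller's list is not shuffled
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # keeps the number of treated units fixed
        if abs(mean_difference(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical outcomes and offer indicators (illustration only).
y = [0.4, 1.1, -0.3, 0.9, 0.2, -0.5, 0.7, 0.0]
d = [1, 1, 0, 1, 0, 0, 1, 0]
print(permutation_p_value(y, d))
```

Shuffling the existing label vector, rather than redrawing each label independently, holds the treated share constant across permutations, which mirrors re-running the original assignment lottery.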


in models with fixed effects 29 To address the possibility of a heightened probability of Type I error given the multiple

outcomes we also apply the Benjamini-Hochberg Correction for Multiple Comparisons

(Benjamini and Hochberg 1995) At a preferred critical p-value of 005 we find that the same

three outcomes that reach statistical significance without applying the correction (shown in

Column (2) of Table 6 remain statistically significant after applying the correction 30 This school had a 27 percent response rate partially as a result of all of the completed surveys

from the first cohort of administration being lost in transit after they were completed 31 We are probably also a bit overly conservative in computing the Lee bounds estimates as we

have included the students from cohort 1 of high school number 23 where nonresponse was due

mostly to the surveys being lost after administration 32 We tested for heterogeneity in treatment effects along seven student and teacher attributes

(including student prior academic preparation raceethnicity gender and teacher preparation)

We also conducted ITT quantile regressions for the continuous outcomes (science skill grades in

science and grades in other courses) Some of the differences in the point estimates were quite

large yet so too were the standard errors For instance five of the seven estimated differential

treatment effects on science skill exceed 025 standard deviations with p-values that fall in the

suggestive (016) to noisy (034) range 33 We are currently in the process of gathering records from the National Student Clearinghouse

on all three cohorts of study participants Once data collection is complete we will have the

ability to examine the effect of AP science on college enrollment college selectivity and college

completion


                                                          (0.02)    (0.06)              (0.02)    (0.07)
                                                          [0.55]    [0.41]              [0.81]    [0.51]
Disabled                                         0.02      0.00     -0.01      0.01      0.00     -0.01
                                                          (0.01)    (0.01)              (0.01)    (0.01)
                                                          [0.93]    [0.24]              [0.57]    [0.5]
Gifted                                           0.13      0.03      0.00      0.14      0.02      0.01
                                                          (0.02)    (0.05)              (0.02)    (0.09)
                                                          [0.06]    [1.00]              [0.25]    [0.89]
English Language Learner                         0.05      0.01      0.02      0.04      0.01      0.04
                                                          (0.01)    (0.02)              (0.01)    (0.03)
                                                          [0.41]    [0.39]              [0.54]    [0.22]
Eligible for Free or Reduced-Price Lunch         0.53      0.01      0.02      0.51      0.01      0.07
                                                          (0.02)    (0.07)              (0.03)    (0.09)
                                                          [0.66]    [0.77]              [0.72]    [0.45]
Language Other than English Spoken at Home       0.34      0.02      0.03      0.35      0.01      0.04
                                                          (0.02)    (0.07)              (0.02)    (0.07)
                                                          [0.32]    [0.73]              [0.59]    [0.56]
Took Recommended Prerequisite Courses            0.79      0.00      0.09      0.79      0.02      0.05
                                                          (0.02)    (0.04)              (0.02)    (0.05)
                                                          [0.84]    [0.04]              [0.43]    [0.31]
Number of Observations                                    1,819                         1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.

Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

                                      (1)       (2)       (3)       (4)       (5)       (6)
                                          Full Sample               Survey Respondents
                                  Control                       Control
                                    Group                         Group
Outcome                              Mean       ITT      LATE      Mean       ITT      LATE
AP Treatment Course Enrollment       0.19      0.38                0.24      0.39
                                              (0.05)                        (0.06)
                                              [0.00]                        [0.00]
Share of Credits During Study Year in:
  AP Science                         0.03      0.04      0.11      0.03      0.04      0.10
                                              (0.01)    (0.01)              (0.01)    (0.01)
                                              [0.00]    [0.00]              [0.00]    [0.00]
  All AP                             0.13      0.04      0.11      0.14      0.04      0.10
                                              (0.01)    (0.02)              (0.01)    (0.02)
                                              [0.00]    [0.00]              [0.00]    [0.00]
  Other Advanced Science             0.06     -0.01     -0.02      0.06     -0.01     -0.02
                                              (0.01)    (0.02)              (0.01)    (0.02)
                                              [0.24]    [0.23]              [0.20]    [0.20]
  All Other Advanced                 0.25     -0.01     -0.03      0.25     -0.01     -0.03
                                              (0.01)    (0.02)              (0.01)    (0.03)
                                              [0.23]    [0.23]              [0.30]    [0.30]
  Regular Science                    0.06     -0.01     -0.02      0.06     -0.01     -0.02
                                              (0.01)    (0.02)              (0.01)    (0.02)
                                              [0.24]    [0.20]              [0.24]    [0.19]
  All Regular                        0.62     -0.03     -0.09      0.61     -0.03     -0.07
                                              (0.01)    (0.03)              (0.01)    (0.03)
                                              [0.02]    [0.00]              [0.07]    [0.03]
Number of Observations                        1,819                         1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information was collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
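
One way to read the ITT and LATE columns together is through the Wald ratio: with a single randomized offer as the instrument, an instrumental-variables LATE equals the ITT effect on the outcome divided by the ITT effect on AP enrollment. The sketch below assumes that relationship and plugs in the rounded full-sample values printed in Table 3. The function name `wald_late` is illustrative and not taken from the paper, and because the inputs are rounded, the ratio only approximately reproduces the reported LATE values.

```python
def wald_late(itt_outcome: float, itt_first_stage: float) -> float:
    """Wald ratio: ITT effect on the outcome divided by the ITT effect on take-up."""
    if itt_first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_outcome / itt_first_stage


# Rounded full-sample values from Table 3: share of credits in AP science (ITT)
# and AP treatment course enrollment (first stage).
late_ap_science = wald_late(itt_outcome=0.04, itt_first_stage=0.38)
print(round(late_ap_science, 2))  # 0.11
```

Rows where the rounded ITT is small relative to its rounding error (for example, All Regular) will not match the table exactly under this back-of-the-envelope calculation.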

Table 4
Treatment Contrast (Composite Variables)

                                                      (1)       (2)       (3)
                                                  Control
                                                    Group
                                                 Complier
Outcome                                              Mean       ITT      LATE
Academically Challenging Curriculum                 -0.33      0.31      0.80
                                                              (0.10)    (0.24)
                                                              [0.00]    [0.00]
Project-Based Independent Classroom Activities      -0.06      0.13      0.33
                                                              (0.07)    (0.17)
                                                              [0.07]    [0.06]
Integrated Use of Technology                        -0.11      0.11      0.28
                                                              (0.08)    (0.19)
                                                              [0.19]    [0.14]
Number of Observations                                        1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
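
The composite construction described in these notes can be sketched as follows. This is a minimal illustration, assuming five ordered response categories, an unweighted average across component items, and standardization with the population standard deviation; the paper does not state which standard deviation it uses, and the names `LIKERT`, `rescale`, and `composite_scores` are hypothetical.

```python
from statistics import mean, pstdev

# Ordered from lowest to highest category (assumed ordering).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response: str, categories=LIKERT) -> float:
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        raise ValueError("no variation across students; cannot standardize")
    return [(x - mu) / sd for x in raw]


# Hypothetical responses for three students on two component items.
example = [
    ["agree", "strongly agree"],
    ["neutral", "disagree"],
    ["strongly disagree", "neutral"],
]
z_scores = composite_scores(example)
```

By construction the standardized scores have mean 0 and standard deviation 1 within the sample passed in, which is why the control group complier means in the table can be negative.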

Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                      (1)       (2)       (3)
                                  Control
                                    Group
                                 Complier
Outcome                              Mean       ITT      LATE
Science Skill                       -0.19      0.09      0.23
                                              (0.06)    (0.16)
                                              [0.15]    [0.14]
STEM Interest                        0.62      0.04      0.09
                                              (0.02)    (0.07)
                                              [0.16]    [0.16]
Confidence in College Science        0.92     -0.04     -0.10
                                              (0.02)    (0.05)
                                              [0.11]    [0.06]
Stress                               0.12      0.07      0.17
                                              (0.03)    (0.07)
                                              [0.02]    [0.01]
Grades in Science Courses            2.80     -0.12     -0.29
                                              (0.07)    (0.16)
                                              [0.08]    [0.07]
Grades in Other Courses              3.14     -0.07     -0.18
                                              (0.02)    (0.06)
                                              [0.00]    [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
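
The notes above refer to weighting by the inverse probability of completing the survey, which the paper describes elsewhere as one over the fitted probit probability of survey completion. The sketch below illustrates only that final step, assuming the probit index (fixed effect plus covariates times coefficients) has already been estimated. The function names and the coefficient values in the example are hypothetical and are not the paper's estimates.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_index(fixed_effect: float, covariates, coefficients) -> float:
    """Fitted index: school-by-cohort fixed effect plus X_i times rho-hat."""
    return fixed_effect + sum(x * b for x, b in zip(covariates, coefficients))


def inverse_probability_weight(index: float) -> float:
    """Weight for a survey respondent: 1 / Phi(index)."""
    return 1.0 / normal_cdf(index)


# Hypothetical respondent: fixed effect 0.4, two covariates, made-up coefficients.
idx = probit_index(0.4, covariates=[1.0, 0.0], coefficients=[0.3, -0.2])
weight = inverse_probability_weight(idx)
```

Because a probability cannot exceed 1, every weight is at least 1, and respondents who resemble non-respondents (low fitted completion probability) receive the largest weights.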

Table 6
Robustness Checks of Main ITT Results

Columns:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                             (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)     (10)           (11)   (12)
Science Skill                     -0.19     0.09                       0.10     0.11     0.20     0.07     0.03     0.39    -0.09, 0.51    2.0
                                           (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                           [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                      0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12    -0.03, 0.18    1.9
                                           (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                           [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science      0.92    -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06     0.05    -0.09, 0.10    2.0
                                           (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                           [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                             0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11    -0.05, 0.15    1.6
                                           (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                           [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses          2.80    -0.12                      -0.06    -0.10    -0.07      n/a      n/a      n/a            n/a    n/a
                                           (0.07)   (0.04)            (0.00)   (0.00)   (0.00)
                                           [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses            3.14    -0.07                      -0.07    -0.06    -0.03      n/a      n/a      n/a            n/a    n/a
                                           (0.02)   (0.03)            (0.00)   (0.00)   (0.00)
                                           [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]

n/a: Not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
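
The permutation test described for column (4) can be sketched roughly as below. This is a simplified illustration rather than the paper's procedure: it uses a raw difference in means instead of the regression-adjusted ITT estimate, and it shuffles treatment labels across the whole sample rather than within school-by-cohort groups. The function names, the default seed, and any data passed in are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-treatment effects whose absolute value exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy so the caller's assignment is not reordered
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign a pseudo treatment, keeping group sizes fixed
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm
```

Shuffling labels keeps the number of treated and control units fixed, so each draw is a reassignment that could have occurred under the same design if the offer had no effect.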

1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students’ confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
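
The Benjamini-Hochberg correction cited in note 29 is a step-up procedure: sort the p-values, find the largest rank k whose p-value is at most (k / m) times the chosen level, and reject every hypothesis ranked at or below k. The sketch below is a generic illustration of that rule, not the authors' code. The function name `benjamini_hochberg` and the p-values in the example are hypothetical and are not taken from the paper's tables.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected under BH."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    # Largest 1-based rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four outcomes.
print(benjamini_hochberg([0.2, 0.001, 0.03, 0.5]))  # [False, True, False, False]
```

In the example, 0.03 would be significant at 0.05 on its own but fails its rank-2 threshold of 0.025, which shows how the correction can be stricter than unadjusted testing.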


Table 3

First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

(1) (2) (3) (4) (5) (6)

Full Sample Survey Respondents

Outcome

Control

Group

Mean

ITT

LATE

Control

Group

Mean

ITT

LATE

AP Treatment Course Enrollment 019 038 024 039

(005) (006)

[000] [000] Share of Credits During Study Year in

AP Science 003 004 011 003 004 010

(001) (001) (001) (001)

[000] [000] [000] [000]

All AP 013 004 011 014 004 010

(001) (002) (001) (002)

[000] [000] [000] [000]

Other Advanced Science 006 -001 -002 006 -001 -002

(001) (002) (001) (002)

[024] [023] [020] [020]

All Other Advanced 025 -001 -003 025 -001 -003

(001) (002) (001) (003)

[023] [023] [030] [030]

Regular Science 006 -001 -002 006 -001 -002

(001) (002) (001) (002)

[024] [020] [024] [019]

All Regular 062 -003 -009 061 -003 -007

(001) (003) (001) (003)

[002] [000] [007] [003]

Number of Observations 1819 1417

Notes The ITT results in the AP Treatment Course Enrollment row are found by estimating

Equation (2) and subsequent ITT and LATE results are found by estimating variants of Equation

(1) Course-taking information collected from student transcripts Control Group Mean uses the

full control group for the first outcome (ie AP Treatment Course Enrollment) and those control

group members who complied with their assignment (ie those who did not take the AP

Treatment Course) for the subsequent outcomes Results in columns (4) (5) and (6) are

weighted by the inverse probability of completing the survey Standard errors clustered by School

x Cohort are in parentheses and p-values are in brackets

26

Table 4

Treatment Contrast (Composite Variables)

(1) (2) (3)

Outcome

Control

Group

Complier

Mean

ITT LATE

Academically Challenging Curriculum -033 031 080

(010) (024)

[000] [000]

Project-Based Independent Classroom

Activities -006 013 033

(007) (017)

[007] [006]

Integrated Use of Technology

-011 011 028

(008) (019)

[019] [014]

Number of Observations 1417

Notes To construct these composite variables we first converted the values on each component

variable (eg strongly agree agree neutral disagree or strongly disagree) so that the highest

category was set to 10 the lowest to 00 and the remaining categories evenly spaced between

00 and 10 We then averaged and standardized these converted values Results are weighted by

the inverse probability of completing the survey Online Appendix Table 5 provides the list of

component variables Standard errors clustered by School x Cohort are in parentheses and p-

values are in brackets

27

Table 5

AP Course Impact on Science Skill STEM Interest Confidence Stress and Grades

(1) (2) (3)

Outcome

Control

Group

Complier

Mean

ITT LATE

Science Skill -019 009 023

(006) (016)

[015] [014]

STEM Interest 062 004 009

(002) (007)

[016] [016]

Confidence in College

Science 092 -004 -010

(002) (005)

[011] [006]

Stress 012 007 017

(003) (007)

[002] [001]

Grades in Science Courses 280 -012 -029

(007) (016)

[008] [007]

Grades in Other Courses 314 -007 -018

(002) (006)

[000] [000]

Number of Observations 1819 for grades 1417 for other

outcomes

Notes Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of

participating students STEM interest =1 if high or some interest in pursuing a STEM degree or

=0 if no interest Confidence in college science = 1 if extremely or somewhat confident in ability to

complete a college science course or =0 if somewhat not confident or not at all confident Stress=

1 if most recent science course had strong negative or negative impact on physical or emotional

health or =0 if strong positive impact positive impact or no impact Grades in science and other

courses are obtained from student transcripts and measure grades during the study year

Results with the exception of grades during study year are weighted by the inverse probability of

completing the survey Standard errors clustered by School x Cohort are in parentheses and p-

values are in brackets

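Table 6

Robustness Checks of Main ITT Results

| Outcome | (1) Control Group Complier Mean | (2) Main Results | (3) Robust SE | (4) p-value (permutation test) | (5) Excluding High School 56 | (6) Including Imputation of Missing Outcome Variables | (7) Excluding Covariates | (8) Excluding High School 23 | (9) Lee Lower Bound | (10) Lee Upper Bound | (11) 95% Confidence Interval from Lee Bounds | (12) Ratio of 95% CI in (11) to 95% CI in (7) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Science Skill | -0.19 | 0.09 | | | 0.10 | 0.11 | 0.20 | 0.07 | 0.03 | 0.39 | -0.09, 0.51 | 2.0 |
| | | (0.06) | (0.05) | | (0.00) | (0.00) | (0.00) | (0.00) | (0.07) | (0.07) | | |
| | | [0.15] | [0.06] | [0.06] | [0.20] | [0.11] | [0.01] | [0.24] | [0.72] | [0.00] | | |
| STEM Interest | 0.62 | 0.04 | | | 0.05 | 0.03 | 0.03 | 0.03 | 0.02 | 0.12 | -0.03, 0.18 | 1.9 |
| | | (0.02) | (0.03) | | (0.00) | (0.00) | (0.00) | (0.00) | (0.03) | (0.04) | | |
| | | [0.16] | [0.19] | [0.20] | [0.09] | [0.29] | [0.27] | [0.19] | [0.60] | [0.00] | | |
| Confidence in College Science | 0.92 | -0.04 | | | -0.03 | -0.06 | -0.06 | -0.04 | -0.06 | 0.05 | -0.09, 0.10 | 2.0 |
| | | (0.02) | (0.02) | | (0.00) | (0.00) | (0.00) | (0.00) | (0.02) | (0.03) | | |
| | | [0.11] | [0.05] | [0.07] | [0.37] | [0.02] | [0.03] | [0.10] | [0.00] | [0.17] | | |
| Stress | 0.12 | 0.07 | | | 0.05 | 0.06 | 0.08 | 0.07 | 0.01 | 0.11 | -0.05, 0.15 | 1.6 |
| | | (0.03) | (0.02) | | (0.00) | (0.00) | (0.00) | (0.00) | (0.03) | (0.02) | | |
| | | [0.02] | [0.00] | [0.00] | [0.14] | [0.07] | [0.02] | [0.02] | [0.79] | [0.00] | | |
| Grades in Science Courses | 2.80 | -0.12 | | | -0.06 | -0.10 | -0.07 | n/a | n/a | n/a | n/a | n/a |
| | | (0.07) | (0.04) | | (0.00) | (0.00) | (0.00) | | | | | |
| | | [0.08] | [0.01] | [0.01] | [0.31] | [0.16] | [0.30] | | | | | |
| Grades in Other Courses | 3.14 | -0.07 | | | -0.07 | -0.06 | -0.03 | n/a | n/a | n/a | n/a | n/a |
| | | (0.02) | (0.03) | | (0.00) | (0.00) | (0.00) | | | | | |
| | | [0.00] | [0.01] | [0.01] | [0.00] | [0.01] | [0.38] | | | | | |

n/a: Not applicable, as grades are based on transcripts rather than student survey.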

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP Biology and AP Chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to derive a confidence interval for the treatment effect itself).
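The permutation test described for column (4) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: it uses a plain difference in means instead of the regression in Equation 1, reshuffles the pseudo treatment across the whole sample rather than within school-by-cohort groups, and runs on made-up data. The function names and the data are hypothetical.

```python
import random


def diff_in_means(outcomes, treated):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    pseudo = list(treated)  # shuffling keeps the number of treated units fixed
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pseudo)
        if abs(diff_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_perm


# Made-up data: 10 treated and 10 control students with a large gap in outcomes.
treated = [1] * 10 + [0] * 10
outcomes = [1.0 + 0.01 * i for i in range(10)] + [0.01 * i for i in range(10)]
p_value = permutation_p_value(outcomes, treated)
print(f"permutation p-value: {p_value:.3f}")
```

A small p-value means that few random reassignments produce an effect as large as the one observed, so the result is unlikely to arise from the assignment draw alone.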

1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. When the standard to which students compare themselves rises, their confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course and classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools, we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master's degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset, yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet, we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
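Note 29 refers to the Benjamini-Hochberg correction for multiple comparisons. A minimal sketch of the standard step-up procedure is given below for readers unfamiliar with it. The p-values in the example are hypothetical and are not taken from the study's tables, and the function name `benjamini_hochberg` is an assumption made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where the null is rejected.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (not the study's estimates).
hypothetical_p = [0.001, 0.012, 0.020, 0.090, 0.150, 0.400]
print(benjamini_hochberg(hypothetical_p))
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value meets a later threshold.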


Table 4

Treatment Contrast (Composite Variables)

| Outcome | (1) Control Group Complier Mean | (2) ITT | (3) LATE |
| --- | --- | --- | --- |
| Academically Challenging Curriculum | -0.33 | 0.31 | 0.80 |
| | | (0.10) | (0.24) |
| | | [0.00] | [0.00] |
| Project-Based Independent Classroom Activities | -0.06 | 0.13 | 0.33 |
| | | (0.07) | (0.17) |
| | | [0.07] | [0.06] |
| Integrated Use of Technology | -0.11 | 0.11 | 0.28 |
| | | (0.08) | (0.19) |
| | | [0.19] | [0.14] |
| Number of Observations | 1,417 | | |

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
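The construction of the composite variables described in the notes to Table 4 can be sketched as below. This is an illustration under stated assumptions: the five-point scale, the example responses, and the function names are hypothetical, and standardization here uses the population standard deviation of the example sample, which may differ from the authors' choice.

```python
from statistics import mean, pstdev

# Assumed response scale, ordered from lowest to highest category.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, scale=SCALE):
    """Map a category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return scale.index(response) / (len(scale) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean([rescale(r) for r in answers]) for answers in responses_by_student]
    mu, sd = mean(raw), pstdev(raw)
    return [(x - mu) / sd for x in raw]


# Hypothetical answers from three students to three component items.
students = [
    ["agree", "strongly agree", "neutral"],
    ["disagree", "neutral", "neutral"],
    ["strongly disagree", "disagree", "agree"],
]
scores = composite_scores(students)
print([round(s, 2) for s in scores])
```

After standardization, the composite has mean 0 and standard deviation 1 in the sample used, so the ITT and LATE estimates in Table 4 can be read in standard deviation units.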

Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

| Outcome | (1) Control Group Complier Mean | (2) ITT | (3) LATE |
| --- | --- | --- | --- |
| Science Skill | -0.19 | 0.09 | 0.23 |
| | | (0.06) | (0.16) |
| | | [0.15] | [0.14] |
| STEM Interest | 0.62 | 0.04 | 0.09 |
| | | (0.02) | (0.07) |
| | | [0.16] | [0.16] |
| Confidence in College Science | 0.92 | -0.04 | -0.10 |
| | | (0.02) | (0.05) |
| | | [0.11] | [0.06] |
| Stress | 0.12 | 0.07 | 0.17 |
| | | (0.03) | (0.07) |
| | | [0.02] | [0.01] |
| Grades in Science Courses | 2.80 | -0.12 | -0.29 |
| | | (0.07) | (0.16) |
| | | [0.08] | [0.07] |
| Grades in Other Courses | 3.14 | -0.07 | -0.18 |
| | | (0.02) | (0.06) |
| | | [0.00] | [0.00] |

Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.


essential to an inquiry-based classroom (Bennett et al 2010 National Research Council 2012)

Yet we do not find strong evidence that students in AP classes were more likely to present what

they learned apply their knowledge to solve a new problem or work independently and none of

the component measures of technology usage were statistically significantly affected Nor did

33

treatment teachers rely less on lecturing and multiple choice quizzes Thus teachers appear

better able to implement the academic rigor expected of an AP science class than some of the

inquiry-based approaches that the College Board intends for AP science We do not find

evidence that taking AP science led students to be more likely to report that they found their

course more interesting which may reflect the inability of the teachers to fully implement a

creative inquiry-based environment 25 In addition to the teacher spillover effects peer interactions could cause contamination effects

that might render our estimated effects smaller A research design with randomization both

across and within schools would allow for estimation of spillover effects but such a design was

infeasible due to the challenge and added cost of recruiting control schools 26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers

received a grade of C (ie 20) or lower in their most recent science class 27 Grade weights may also affect AP participation which is one of the main purposes of the

weights (Klopfenstein and Lively 2016) 28 See Abadie et al (2017) for a discussion of the use of robust versus clustered standard errors

in models with fixed effects 29 To address the possibility of a heightened probability of Type I error given the multiple

outcomes we also apply the Benjamini-Hochberg Correction for Multiple Comparisons

(Benjamini and Hochberg 1995) At a preferred critical p-value of 005 we find that the same

three outcomes that reach statistical significance without applying the correction (shown in

Column (2) of Table 6 remain statistically significant after applying the correction 30 This school had a 27 percent response rate partially as a result of all of the completed surveys

from the first cohort of administration being lost in transit after they were completed 31 We are probably also a bit overly conservative in computing the Lee bounds estimates as we

have included the students from cohort 1 of high school number 23 where nonresponse was due

mostly to the surveys being lost after administration 32 We tested for heterogeneity in treatment effects along seven student and teacher attributes

(including student prior academic preparation raceethnicity gender and teacher preparation)

We also conducted ITT quantile regressions for the continuous outcomes (science skill grades in

science and grades in other courses) Some of the differences in the point estimates were quite

large yet so too were the standard errors For instance five of the seven estimated differential

treatment effects on science skill exceed 025 standard deviations with p-values that fall in the

suggestive (016) to noisy (034) range 33 We are currently in the process of gathering records from the National Student Clearinghouse

on all three cohorts of study participants Once data collection is complete we will have the

ability to examine the effect of AP science on college enrollment college selectivity and college

completion

Table 5

AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Outcome                          (1) Control Group    (2) ITT    (3) LATE
                                     Complier Mean

Science Skill                        -0.19             0.09       0.23
                                                      (0.06)     (0.16)
                                                      [0.15]     [0.14]

STEM Interest                         0.62             0.04       0.09
                                                      (0.02)     (0.07)
                                                      [0.16]     [0.16]

Confidence in College Science         0.92            -0.04      -0.10
                                                      (0.02)     (0.05)
                                                      [0.11]     [0.06]

Stress                                0.12             0.07       0.17
                                                      (0.03)     (0.07)
                                                      [0.02]     [0.01]

Grades in Science Courses             2.80            -0.12      -0.29
                                                      (0.07)     (0.16)
                                                      [0.08]     [0.07]

Grades in Other Courses               3.14            -0.07      -0.18
                                                      (0.02)     (0.06)
                                                      [0.00]     [0.00]

Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
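Columns (2) and (3) can be read together with a back-of-envelope check. Under a simple Wald (instrumental-variable) ratio with no covariates, the LATE equals the ITT divided by the treatment-control difference in AP take-up, so ITT/LATE gives a rough implied take-up gap. The sketch below assumes the Table 5 entries carry two decimal places (for example, ITT 0.09 and LATE 0.23 for science skill). The function name is hypothetical, and because the reported estimates come from weighted, covariate-adjusted regressions, the ratios are only approximate.

```python
def implied_first_stage(itt: float, late: float) -> float:
    """Wald relation: LATE = ITT / first_stage, so first_stage = ITT / LATE."""
    if late == 0:
        raise ValueError("LATE must be nonzero")
    return itt / late


# (ITT, LATE) pairs read from Table 5, assuming two implied decimal places.
table5 = {
    "Science Skill": (0.09, 0.23),
    "STEM Interest": (0.04, 0.09),
    "Confidence in College Science": (-0.04, -0.10),
    "Stress": (0.07, 0.17),
    "Grades in Science Courses": (-0.12, -0.29),
    "Grades in Other Courses": (-0.07, -0.18),
}

# Implied take-up gap for each outcome row.
ratios = {
    name: implied_first_stage(itt, late) for name, (itt, late) in table5.items()
}

for name, ratio in ratios.items():
    print(f"{name}: implied take-up gap of about {ratio:.2f}")
```

Read this way, the six rows point to a take-up gap of roughly 0.4, although rounding in the table makes each individual ratio imprecise.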

Table 6

Robustness Checks of Main ITT Results

Column headings, in order (1) through (12): Control Group Complier Mean | Main Results | Robust SE | p-value (permutation test) | Excluding High School 56 | Including Imputation of Missing Outcome Variables | Excluding Covariates | Excluding High School 23 | Lee Lower Bound | Lee Upper Bound | 95% Confidence Interval from Lee Bounds | Ratio of 95% CI in (11) to 95% CI in (7)

Science Skill
  -0.19  0.09  0.10  0.11  0.20  0.07  0.03  0.39  -0.09, 0.51  2.0
  (0.06) (0.05) (0.00) (0.00) (0.00) (0.00) (0.07) (0.07)
  [0.15] [0.06] [0.06] [0.20] [0.11] [0.01] [0.24] [0.72] [0.00]

STEM Interest
  0.62  0.04  0.05  0.03  0.03  0.03  0.02  0.12  -0.03, 0.18  1.9
  (0.02) (0.03) (0.00) (0.00) (0.00) (0.00) (0.03) (0.04)
  [0.16] [0.19] [0.20] [0.09] [0.29] [0.27] [0.19] [0.60] [0.00]

Confidence in College Science
  0.92  -0.04  -0.03  -0.06  -0.06  -0.04  -0.06  0.05  -0.09, 0.10  2.0
  (0.02) (0.02) (0.00) (0.00) (0.00) (0.00) (0.02) (0.03)
  [0.11] [0.05] [0.07] [0.37] [0.02] [0.03] [0.10] [0.00] [0.17]

Stress
  0.12  0.07  0.05  0.06  0.08  0.07  0.01  0.11  -0.05, 0.15  1.6
  (0.03) (0.02) (0.00) (0.00) (0.00) (0.00) (0.03) (0.02)
  [0.02] [0.00] [0.00] [0.14] [0.07] [0.02] [0.02] [0.79] [0.00]

Grades in Science Courses
  2.80  -0.12  -0.06  -0.10  -0.07
  (0.07) (0.04) (0.00) (0.00) (0.00)
  [0.08] [0.01] [0.01] [0.31] [0.16] [0.30]

Grades in Other Courses
  3.14  -0.07  -0.07  -0.06  -0.03
  (0.02) (0.03) (0.00) (0.00) (0.00)
  [0.00] [0.01] [0.01] [0.00] [0.01] [0.38]

For the two grades rows, the remaining columns are not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test, whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to derive confidence interval for the treatment effect itself).
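The permutation test described for Column (4) can be illustrated with a minimal sketch. It makes two simplifying assumptions that differ from the paper's setup: the effect is estimated as a raw difference in means rather than by the regression in Equation 1, and treatment labels are reshuffled across the whole sample rather than within school-by-cohort groups. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy so the caller's assignment is untouched
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # pseudo treatment with the same number treated
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm


# Toy data: ten treated units with much higher outcomes than ten controls.
outcomes = [10.0 + 0.1 * i for i in range(10)] + [0.1 * i for i in range(10)]
treated = [1] * 10 + [0] * 10
p_strong = permutation_p_value(outcomes, treated)
print(p_strong)
```

A small p-value means few random reassignments produce an effect as large as the one observed; in practice the reshuffling would respect the original randomization strata.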

1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (U.S. Department of Education 2014).

2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).

3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students' confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools, we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master's degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree, but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i \rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i \hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a "multiple imputation, then deletion" strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
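The weighting step in note 20 can be illustrated with a short sketch. It assumes the probit has already been fitted and takes the fitted index (the school-by-cohort effect plus the covariate term) as given. The index values below are hypothetical, as is the function name, and no attempt is made to reproduce the paper's estimation.

```python
from statistics import NormalDist


def inverse_probability_weight(index: float) -> float:
    """Return 1 / Phi(index), where index stands for mu_hat_j + X_i * rho_hat."""
    p = NormalDist().cdf(index)  # predicted probability of completing the survey
    if p <= 0.0:
        raise ValueError("predicted response probability is zero")
    return 1.0 / p


# Hypothetical fitted probit indices for four survey respondents,
# ordered from most to least likely to respond.
fitted_index = [2.0, 1.0, 0.0, -1.5]
weights = [inverse_probability_weight(z) for z in fitted_index]

for z, w in zip(fitted_index, weights):
    print(f"index {z:+.1f} -> weight {w:.2f}")
```

Respondents who resemble nonrespondents (low predicted response probability) receive the largest weights, and every weight is at least 1 because the predicted probability cannot exceed 1.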

30

1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the

Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the

effect of advanced high school courses more generally often without distinctions between AP

and other rigorous course options Nearly all of these nonexperimental studies find large positive

effects of rigorous secondary school courses particularly those in math and science on studentsrsquo

high school postsecondary and labor market performance (eg Altonji 1995 Attewell and

Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long

Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004) 3 The process by which a course is designated AP involves two steps Teachers who plan to offer

an AP course are encouraged (though not required) to attend a professional development

training The Board and other independent agencies offer several workshops with the most

extensive training being the AP summer institute a week-long training that is led by an

experienced AP instructor Teachers are then expected to develop their syllabi for the course and

submit them to the Board for review A team of auditors at the Board review each syllabus and

grant permission to a school to label the course as AP on course catalogs and student transcripts

once the syllabus has been approved Teachers are also able to revise and resubmit syllabi if they

do not meet the requirements upon original submission College Board (2017b) contains a

discussion of the annual ldquoAP Course Auditrdquo This review of syllabi is the only mechanism for

assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).

5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students’ confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends to some extent on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools, we do not know the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: \(\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})\), where \(\text{CompletedSurvey}_{ij}\) equals 1 if student \(i\) in school by cohort \(j\) completed any part of the end-of-year survey, \(X_i\) is the same vector of pre-treatment characteristics in the previous equations, \(\mu_j\) are school by cohort fixed effects, and \(\Phi(\cdot)\) is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as \(1/\Phi(\hat{\mu}_j + X_i\hat{\rho})\) and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset, yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
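
As an illustration of the Benjamini-Hochberg step-up procedure referred to in note 29, the following is a minimal sketch and not the authors' code. The function name and the example p-values are hypothetical and are not taken from the study; the procedure rejects the hypotheses with the k smallest p-values, where k is the largest rank whose p-value is at or below (rank / m) times the chosen level.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null is rejected.

    Implements the Benjamini-Hochberg (1995) step-up rule at level alpha.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (illustrative only).
example_p = [0.004, 0.012, 0.020, 0.210, 0.450, 0.800]
print(benjamini_hochberg(example_p))
```

Because the rule is step-up, a p-value above its own threshold can still be rejected when a larger p-value meets its threshold, which is why the sketch searches for the largest qualifying rank rather than stopping at the first failure.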


the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (\(X_i\)) from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
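
The trimming idea behind the bounds in Columns (9) and (10) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' estimation code: it uses unweighted differences in means with no covariates or fixed effects, it assumes the treatment group has the higher survey response rate, and the function name and inputs are hypothetical.

```python
def lee_bounds(treat_outcomes, n_treat_assigned, control_outcomes, n_control_assigned):
    """Lee (2009)-style trimming bounds on a difference in means.

    treat_outcomes, control_outcomes: outcomes observed for respondents only.
    n_treat_assigned, n_control_assigned: students assigned to each arm,
    whether or not they responded.
    """
    n_t_resp = len(treat_outcomes)
    n_c_resp = len(control_outcomes)

    # This sketch only handles the case where the treatment arm responds
    # at least as often as the control arm (compared with integer math).
    if n_t_resp * n_control_assigned < n_c_resp * n_treat_assigned:
        raise ValueError("treatment response rate is below control response rate")

    # Number of treatment respondents to keep so that the response rates
    # match: ceil(control response rate * treatment assigned).
    keep = (n_c_resp * n_treat_assigned + n_control_assigned - 1) // n_control_assigned
    n_trim = n_t_resp - keep

    ys = sorted(treat_outcomes)
    control_mean = sum(control_outcomes) / n_c_resp

    # Lower bound: drop the highest treatment outcomes.
    lower = sum(ys[:keep]) / keep - control_mean
    # Upper bound: drop the lowest treatment outcomes.
    upper = sum(ys[n_trim:]) / keep - control_mean
    return lower, upper


# Hypothetical data: 8 of 10 treated and 6 of 10 control students responded.
print(lee_bounds([1, 2, 3, 4, 5, 6, 7, 8], 10, [2, 3, 4, 5, 6, 7], 10))
```

When the two response rates are equal, nothing is trimmed and the two bounds coincide with the simple difference in respondent means.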


Page 30: Alec I. Kennedy Raymond McGhee Jr. ABSTRACT · Raymond McGhee Jr. is a senior director at Equal Measure. The authors thank Nicole Bateman, Kerry Beldoff, Grant H. Blume, Jordan Brown,

30

1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the

Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the

effect of advanced high school courses more generally often without distinctions between AP

and other rigorous course options Nearly all of these nonexperimental studies find large positive

effects of rigorous secondary school courses particularly those in math and science on studentsrsquo

high school postsecondary and labor market performance (eg Altonji 1995 Attewell and

Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long

Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004) 3 The process by which a course is designated AP involves two steps Teachers who plan to offer

an AP course are encouraged (though not required) to attend a professional development

training The Board and other independent agencies offer several workshops with the most

extensive training being the AP summer institute a week-long training that is led by an

experienced AP instructor Teachers are then expected to develop their syllabi for the course and

submit them to the Board for review A team of auditors at the Board review each syllabus and

grant permission to a school to label the course as AP on course catalogs and student transcripts

once the syllabus has been approved Teachers are also able to revise and resubmit syllabi if they

do not meet the requirements upon original submission College Board (2017b) contains a

discussion of the annual ldquoAP Course Auditrdquo This review of syllabi is the only mechanism for

assessment (ie course delivery and student performance are not assessed by the Board) In

order to effectively run an AP Biology or Chemistry course teachers require access to a well-

equipped classroom and laboratory including all supplies necessary to engage in

experimentation (eg beakers solutions microscopes measuring equipment) Most of the

teachers in our study reported that their classrooms were well-stocked with supplies 4 In 2012 the Board began redesigning several AP courses and exams to prioritize depth of

learning over breadth of coverage and to place ldquogreater emphasis on discipline-specific inquiry

reasoning and communication skillsrdquo (College Board 2017a) The revisions to science courses

were based upon recommendations from the National Science Foundation the National Research

Council and science educators across the United States (National Research Council 2002) 5 Students perception of their own confidence and other personality traits are inherently

influenced by their frames of reference in ways that other assessments of these traits (eg

external observations) may be less influenced By increasing the standard to which they compare

themselves studentsrsquo confidence may decrease This feature of most self-assessments could be

considered a measurement error that biases treatment effects (Dobbie and Fryer Jr 2015 West et

al 2016) Whether this is considered a bias or a true treatment effect on a meaningful outcome

depends to some extent on how these changes in perceived ability influence other behaviors

such as effort and academic motivation 6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and

Biology I and Chemistry I for AP Biology with no additional requirements beyond these

prerequisites 7 We offered each participating school $10000 to pay for teacher attendance at a one-week

training course classroom supplies (eg lab materials textbooks) and to compensate schools

for the staff time required for study administration efforts We also offered $1000 compensation

for an individual selected by the school to serve as a liaison between the study team and the

31

school to assist in collecting consentassent forms and data 8 In our original research design we planned to recruit 40 schools with up to two cohorts of

students which would have powered the study to detect effect sizes smaller than those detected

here We faced several challenges in recruiting schools to participate even with the monetary

incentives Some schools were uncomfortable with randomization across classrooms while

others were simply unable to commit to offering a new course the following school year 9 Most of the assignments were made in one batch in the spring of the year prior to when the

course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.

10 Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11 Though the study teachers are less likely to hold a master’s degree, many of the graduate

degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.

12 We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).

13 Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.

14 Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.

15 Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.

16 Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact

that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.

17 The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.

18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.

19 One district in our study offered both AP courses. Students in this school were randomly

offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.

20 To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school-by-cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school-by-cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.

21 We impute with the full dataset yet only estimate regressions on the sample for which we

observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.

22 Detailed course-by-course impacts are available from the authors.

23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24 Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.

25 In addition to the teacher spillover effects, peer interactions could cause contamination effects

that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.

26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.

27 Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).

28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.

29 To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.

30 This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.

31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we

have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.

32 We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.

33 We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
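
As an illustration of the control-group assignment described in note 19, the following Python sketch splits control students between the two course-specific control groups in proportion to the course choices of treated students. It is a minimal sketch under stated assumptions, not the authors’ procedure: the function name, the seed, the student identifiers, and the choice of an exact proportional split of a shuffled list (rather than independent draws per student) are all hypothetical.

```python
import random

def assign_control_groups(control_ids, biology_share, seed=0):
    """Randomly split control students between two control groups so that
    about `biology_share` of them land in the School-Biology group.

    The seed and the exact-split rule are illustrative assumptions.
    """
    rng = random.Random(seed)          # seeded so the split is reproducible
    shuffled = list(control_ids)
    rng.shuffle(shuffled)              # random order of control students
    n_biology = round(biology_share * len(shuffled))
    return {
        "School-Biology": shuffled[:n_biology],
        "School-Chemistry": shuffled[n_biology:],
    }

# Hypothetical example: 10 control students, 60% of treated chose Biology.
groups = assign_control_groups(range(1, 11), biology_share=0.6)
```

With ten control students and a 60% Biology share among treated students, six control students are placed in the School-Biology group and four in the School-Chemistry group.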

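
As an illustration of the weighting described in note 20, the following Python sketch converts a fitted probit index into an inverse probability weight, 1/Φ(index). It is a minimal sketch, not the authors’ code: the function name and the index values are hypothetical, and it assumes the probit coefficients have already been estimated elsewhere.

```python
from statistics import NormalDist

def inverse_probability_weight(fitted_index: float) -> float:
    """Return 1 / Phi(fitted_index).

    `fitted_index` stands in for the estimated school-by-cohort effect plus
    the covariate index from an already-fitted probit (an assumption here).
    """
    response_probability = NormalDist().cdf(fitted_index)  # standard normal CDF
    return 1.0 / response_probability

# Hypothetical fitted index values for three survey respondents.
example_indices = [0.0, 1.0, 2.0]
weights = [inverse_probability_weight(z) for z in example_indices]
```

A respondent with a predicted response probability of 0.5 receives a weight of 2, while respondents who were very likely to respond receive weights close to 1. This matches the note’s description that respondents resembling nonrespondents count more heavily.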


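
The Benjamini-Hochberg procedure mentioned in note 29 can be sketched in Python as follows. This is a generic implementation of the step-up rule from Benjamini and Hochberg (1995), not the authors’ code; the function name is hypothetical and the p-values are invented for illustration, not study results.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis whose rank is at most largest_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected

# Made-up p-values for six outcomes (illustrative only, not study results).
example_p = [0.001, 0.008, 0.020, 0.300, 0.450, 0.700]
example_rejected = benjamini_hochberg(example_p)
```

In this invented example, the three smallest p-values fall below their rank-specific thresholds, so those three hypotheses are rejected and the other three are not.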