Post on 19-Oct-2020
The Effect of Advanced Placement Science on Students' Skills, Confidence, and Stress
Dylan Conger
Alec I. Kennedy
Mark C. Long
Raymond McGhee Jr.
ABSTRACT
The AP program has been widely adopted by secondary schools, yet the evidence on the impacts
of taking AP courses has been entirely observational. We report results from the first
experimental study of AP, focusing on whether AP endows students with greater human capital
than other regular and honors courses. We find suggestive evidence that taking an AP science
course increases students' science skill and their interest in pursuing a STEM major in college.
AP course-takers also have lower confidence in their ability to succeed in college science, higher
levels of stress, and worse grades than their control counterparts.
____________
Dylan Conger is a professor of public policy at the George Washington University. Alec I.
Kennedy is a doctoral student at the University of Washington. Mark C. Long is a professor of
public policy and governance and adjunct professor of economics at the University of
Washington. Raymond McGhee Jr. is a senior director at Equal Measure. The authors thank
Nicole Bateman, Kerry Beldoff, Grant H. Blume, Jordan Brown, Sarah Coffey, Bonnee Groover,
Josette Arevalo Gross, Hernando Grueso Hurtado, Jessica Mislevy, Kelsey Rote, Massiel
Sepulveda, and Mariam Zameer for excellent research assistance. They also appreciate the
guidance and insights provided by Del Harnisch, Michal Kurlaender, Richard Murnane, Helen
Quinn, and Aaron Rogat. The authors are grateful for comments from three anonymous referees.
The College Board staff provided answers to the study team's questions about the AP program
and general feedback on the research design, but the College Board did not provide financial
support and was otherwise not involved in the production of this research. The research was
funded by the National Science Foundation (Award 1220092) and is registered in the American
Economic Association's Registry for RCTs (ID 000140). The data used to produce the empirical
findings in this paper are available from the Inter-university Consortium for Political and Social
Research at httpdoiorg
Online Appendix can be found at http://jhr.uwpress.org/
Corresponding author email: marklong@uw.edu
JEL codes: I20, J24
I. Introduction
The Advanced Placement (AP) program, a set of college-level courses and exams offered at the
high school level, has become a centerpiece of efforts to strengthen the transition to
postsecondary training and boost human capital. Many colleges and universities treat students'
enrollment in AP courses and scores on AP exams as a signal of quality in admissions and grant
college credit or course waivers to students who receive high AP exam scores (Geiser and
Santelices 2004). These incentives have prompted a substantial increase in the number of
students taking AP courses and exams in recent decades, with more than five times as many AP
exams taken in 2018 (over five million) as in 1996 (less than one million) (College Board 2018).
At the program's inception in the mid-1950s, AP courses were found in a handful of elite private
schools; today, AP is offered in nearly 70 percent of public schools in the United States (Thomas
et al. 2013).
Much of the expansion has been driven by federal and state policies designed to increase
access to AP, including offering subsidies to pay for exams, building AP course offerings into
school accountability requirements, and requiring public postsecondary institutions to offer
credit for AP exam scores (Adelman 2006; Dounay Zinth 2016; Holstead et al. 2010). For almost
20 years, for instance, the US Department of Education has provided states with funds to offset
the cost of AP exams for low-income students.1 Despite the program's popularity among many,
AP also has its critics. Some researchers and educators claim that the program's effectiveness
has been oversold and that there is no real evidence that AP endows students with greater skill or
subject-matter interest than other high school courses (Berger 2006; Drew 2011; Klopfenstein
and Thomas 2010, 2009; Tai 2008; Tierney 2012). Others worry that the pressure of AP courses
causes students undue stress and confidence loss (Hopkins 2012; Kim 2015; Steinberg 2009).
The expansion of AP to less-resourced schools has also raised concerns that many of the students
now taking the courses are academically underprepared, such that the monetary and psychic costs
of the investment may outweigh the potential benefits (Bowie 2013; Dougherty and Mellor 2009;
Duffett and Farkas 2009; Smith, Hurwitz, and Avery 2017; Tierney 2012). Up to now,
researchers' ability to generate causal evidence on any of the claims made by proponents and
opponents has been substantially limited by the nonrandom sorting of students into AP classes.
As a result, all of the prior research on AP impacts has been observational.
In this paper, we provide the first experimental evidence on AP program impacts. We focus
on AP science courses, which have been endorsed by educators and policymakers as a key
strategy for increasing American students' skill and interest in Science, Technology,
Engineering, and Mathematics (STEM) and strengthening the STEM workforce (e.g., Adelman
2006; Bush 2006; House 2016). With participation from 23 schools and over 1,800 students from
across the United States, we randomly offered students enrollment into newly launched AP
Biology or Chemistry courses in their schools. To directly evaluate whether AP endows students
with higher levels of skill than other science courses, we designed and validated an instrument to
measure students' scientific inquiry abilities (e.g., the ability to analyze data and make scientific
arguments). We also collected administrative data and surveyed students to assess AP impacts on
their interest in pursuing a STEM degree in college, confidence in completing a college science
course, high school grades, and stress levels. In addition to generating impact estimates, we
report on the courses that AP crowds out, along with the contrast between treatment and control
students in the content and rigor of their science courses.
The results suggest that there is some truth in the claims made by both advocates and critics
of AP. Consistent with the goals of an AP course, treatment group students report that their
courses are more challenging and inquiry-based than control group students do. These views are
shared by teachers, who report a higher level of rigor in their AP science courses compared to
their other science courses. We find suggestive evidence that this academic challenge leads to
increases in skill: AP course-takers score 0.23 standard deviations higher than control group
compliers on the end-of-year assessment of scientific skill. Though our precision prevents us
from ruling out zero treatment effects at traditional levels of statistical inference (p-value = 0.14),
this large point estimate suggests genuine productivity gains for students who take AP science,
over and above the gains experienced by students who enroll in other high school courses. We
also find suggestive evidence of an AP science boost to students' interest in pursuing a STEM
degree should they enroll in college. Together, these results fail to support the concern that the
AP program's impact on human capital has been oversold.
At the same time, our results confirm that the workload and expectations of an AP science
class cause students to lose confidence in their ability to succeed in college-level science, gain
stress, and earn lower grades (prior to the weights that are often attached to AP grades by
secondary and postsecondary institutions). The confidence levels among study participants are
quite high, with 92 percent of control group compliers reporting that they are "somewhat" or
"extremely" confident in their ability to succeed in a college science course. AP course-takers
report a 10-percentage-point lower estimation of their ability. Students in the AP course are also
more than twice as likely as control group compliers to report that the course negatively affected
their physical or emotional health (our measure of stress). And comparisons of transcripts reveal
that treatment group students earned lower preweighted grades in science and other subjects
during the year that they took the AP class.
Our study contributes to a small research base on the effects of the AP program.2 Using a
regression discontinuity design, Smith, Hurwitz, and Avery (2017) show that students who
barely earn a college-credit equivalent score on the AP exam (e.g., scoring just above the
threshold necessary to receive a 3 out of 5 on the exam) are more likely to complete their
bachelor's degrees in four years than students who fall just below that threshold. In a related
paper that relies on the same data and design, Avery et al. (2018) demonstrate that AP exam
scores also influence students' college major choices. These compelling results demonstrate that
students take advantage of postsecondary AP credit policies to waive out of intro courses and
that receiving a higher AP exam score may serve as a signal of skill to both institutions and
students. These two studies, however, do not show that AP courses per se led to skill
development, as they focus solely on differences in behavior for AP exam-takers who fall just
below and just above the score thresholds. Jackson (2010, 2014) evaluates the impacts of the AP
Incentive Program, which offers cash incentives to teachers and students for passing scores on
AP exams as well as funds for training teachers and convening teams of teachers to align pre-AP
curriculum with the needs of the AP class. Jackson identifies impact from variation in the timing
of program implementation across high schools in Texas and finds large positive treatment
effects on AP courses and exams (2010). The AP Incentive Program also increased students'
college going and persistence, as well as their labor market earnings (Jackson 2010, 2014). These
two studies indicate that the AP Incentive Program increased AP participation and subsequent
educational attainment and labor market performance. However, it is not clear whether these
results would hold in the absence of the Incentive Program.
We build on these findings and inform policy and practice in several ways. Most important,
we directly test one of the main mechanisms through which AP is expected to influence students'
attainment and earnings: by increasing their skill and interest in the subject matter. We determine
whether skill and interest gains, as distinct from college admissions and credit-granting policies,
are key drivers behind AP's impact on later outcomes. This distinction is important given that
less than half of AP course-takers earn a credit-granting score on the AP exam, either because
they do not take the exam or because they obtain low scores (National Research Council 2002;
College Board 2018). Many selective colleges are also increasingly making it difficult for
students to receive credit for their AP exam scores. Most top institutions restrict the number of
AP subject areas that are eligible, only offer credit or waivers for very high scores on the exams,
or cap the total amount of AP credit that a student can receive (Weinstein 2016). In 2012,
Dartmouth College announced that it would no longer grant credit for any AP exam score, a
policy shared by several other selective institutions, including Amherst College, Brown
University, and the California Institute of Technology (Weinstein 2016). Our results, which
generalize to a newly offered AP course, suggest that AP endows students with human capital
even if it does not grant them the opportunity to earn credit at their preferred college. For college
admissions officers, the findings also suggest that AP course-taking offers a reasonable signal of
students' skill and subject-matter interest. Our estimated effects on skill and STEM interest are
somewhat limited by insufficient precision, yet they represent the first and most credible
evidence to date on the impact of AP on these key outcomes.
Our study is also among the first known to us that quantifies the AP impact on students'
grades. We find that students who take an AP science course earn lower grades in science (by
0.29 grade points) and lower grades in their other courses (by 0.18 grade points). The lower
grades in science are driven by the lower grade received in the AP class, a negative effect that
many secondary and postsecondary institutions offset by upweighting AP grades. The estimates
suggest that students' AP science grade would have to be inflated by a factor of 1.46 (e.g., a C
would have to be converted to approximately a B+) to remove the net negative on overall grade
point average (GPA). While many high schools, including those that participated in our study,
weight students' GPAs to adjust for the academic difficulty of the courses, practices vary
substantially across institutions (Sadler and Tai 2007; Klopfenstein and Lively 2016). In a recent
survey of Texas high schools, for instance, Klopfenstein and Lively (2016) find that most
schools with AP courses used weights but that they ranged from 0.5 to 1 point (with a small
number assigning more than 1 extra point). Our findings suggest that the current practices at
many institutions under-adjust for the grade penalty from AP courses. In addition, attaching
weight to AP grades cannot undo the learning loss that may occur when students shift their effort
away from non-AP coursework.
We also contribute to other strands of literature on the relationship between students'
academic achievement and their perceptions of their own confidence and stress. Prior literature
on the relationship between students' confidence in their ability and their true ability is rife with
mixed results (Boekaerts and Rozendaal 2010; Stankov and Crawford 1996; Stankov 2013).
Psychologists have also documented an inverted U-shaped relationship between perceived
pressure and performance, where some amount of stress is necessary to increase achievement,
yet too much stress can reduce students' ability to gain knowledge (Anderson 1976; Davis 2014;
Yerkes and Dodson 1908). We find that students taking an AP science class experience cognitive
gains concurrent with losses in their academic confidence. This finding is consistent with
evidence that many US students are highly confident in their skills and that this noncognitive
belief often interferes with their ability to learn (Chiu and Klassen 2010; Stankov and Lee 2014).
The AP course appears to reduce students' estimation of their own ability, either by changing the
standard to which they compare themselves or by making them more aware of the challenges
they might face in a college course. Whether these changes in perceived confidence persist and
how they influence later outcomes is uncertain. Students with expectation levels that match the
real demands of college courses might eventually perform better in those courses. Some students
might also use the insights they gain from a challenging AP science class to shift away from
difficult science courses in college (or entire majors) that could delay or hinder their college
completion. Our results also suggest that AP causes a significant amount of stress for students,
but we do not find evidence that the added pressure substantially limits their knowledge gains in
science.
II. AP Science and Conceptual Framework
A. AP and Other Rigorous Secondary School Courses
The AP program is an appealing option for high school administrators who seek to offer
college-level courses to their students. AP course descriptions and assignments are designed to
match those offered in introductory college courses in each subject and thus to prepare students
for the rigor of college coursework. The College Board (the "Board" for brevity) is a not-for-profit
organization that administers AP and provides professional development for teachers, reviews of
course syllabi, and extensive curricular materials (e.g., sample syllabi, sample lab experiments).3
The Board also offers standardized AP exams in the spring of each year that are graded by
external examiners and provide an externally validated measure of student learning. Most exams
include both an essay or problem-solving component and multiple-choice questions, all of which
are aligned with the course descriptions. The exam is one of the key features of the AP program
and is used by high school and postsecondary educators to evaluate the depth of students' skill
independently of teacher bias.
In addition to AP courses, high school students typically have three alternative options for
advanced coursework. Most high schools offer "honors" courses, which are intended to provide a
more rigorous curriculum than the regular course in the same subject. The content and rigor of
honors courses vary across high schools, and there is no standardized honors exam offered to
students in these courses. A second option is the International Baccalaureate (IB) program,
which was originally designed for students in international schools and aims to develop
students' critical thinking skills and their knowledge of international affairs. The IB program is
offered worldwide but remains relatively uncommon in the United States, with less than 5 percent
of high schools offering IB in 2016 (The IB Programme 2016). A final option is for students to
take a course at a nearby college (or online) or, for some, a course that is taught at their high
school by an instructor who has been approved as college-level. These "dual enrollment" or
"dual credit" courses are meant to provide students with the opportunity to simultaneously earn
high school and college credit. In the most recent national survey, high schools reported
approximately two million enrollments in dual credit courses (Thomas et al. 2013). There is
limited information on the colleges that accept dual enrollment credits. Most courses are offered
through collaborations between high schools and local community and public postsecondary
institutions, suggesting that credits are generally accepted at these institutions and less often
accepted at other institutions. Comparisons of AP science classes to regular and honors level
science classes reveal that students receive much more homework and work harder in their AP
classes (Sadler et al. 2014). To our knowledge, there have been no comparisons of the workload
or effort in AP science courses compared to IB or dual enrollment science courses.
B. Conceptual Framework
There are several channels through which an AP science class is expected to influence students'
cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to
provide rigorous content and a substantial workload, be taught by teachers who have high
expectations, and consist of students who are driven to succeed. These inputs (course rigor,
teacher expectations, and peer motivation) are often thought of as the main characteristics that
distinguish AP courses from other high school courses.
Yet AP science classes are also intended to offer an inquiry-based approach to science that,
when combined with a high level of rigor, provides an additional causal pathway to change.
Specifically, a well-implemented AP science course should encourage students to ask questions,
gather and interpret data, arrive at explanations grounded in scientific principles, and
communicate their observations to one another under the guidance of teachers (College Board
2011a, 2011b).4 This student-led, inquiry-based approach differs from many traditional
secondary school science classrooms, where the goal is often for students to memorize content
and replicate laboratory experiments that demonstrate the content (National Research Council
2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world
practices of science and the skills that form the basis of scientific inquiry by focusing more on
big picture concepts and small group experimentation, with students directing the inquiry. The
curriculum also encourages teachers to move away from lecture-based pedagogy and
multiple-choice quizzes and to increase their use of technology to help students analyze data, draw
interpretations, and communicate findings (College Board 2011a, 2011b).
AP science classes are expected to increase students' ability to ask research questions, design
experiments, analyze data, and draw conclusions. In the process of gaining these scientific
inquiry skills, the new curriculum is intended to spur greater interest in the practice of science
because it becomes more enjoyable and more accessible to students for whom rote memorization
and execution of prefabricated lab experiments might have diminished enthusiasm in the subject
(National Research Council 2012). Science experts posit that inquiry-based science courses will
be particularly successful in generating greater interest and skill among women and among
students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis,
Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler,
Samuelson, and Lorah 2014).
While the rigor and expectations of a college course may be appropriate for some students, they
can be too demanding for others. Students often report high levels of stress and burnout from
taking AP courses, particularly if they perceive that they are not prepared for the challenge of
college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact,
cause students to lose confidence in their ability to complete college science courses. A number
of mechanisms could cause students to lose confidence, including exposure to stronger peers,
inability to successfully complete assignments, or simply receiving lower grades than they
received in their non-AP courses.5 The AP effect on confidence will likely matter differently for
students with different levels of initial confidence. For students who are over-confident in their
ability to succeed in college science courses, taking a challenging AP course in high school
might cause them to revise their expectations to be more in line with the higher demands of
college-level work.
Taking a more strenuous AP course is also likely to affect students' time allocation.
Students' performance in each class will be determined by their subject-specific ability as well as
the amount of time they devote to their coursework versus other activities, including work,
extracurricular, and leisure. If AP courses are more demanding than other courses, students
solving a time allocation problem may shift more effort into their AP course, away from other
pursuits. The impact of this change in time allocation on students' performance in AP and other
courses will depend upon whether they shift effort away from other courses and on the degree of
complementarity between their AP science course and their other courses. Study time devoted to
an AP science course could improve student performance in other math and science classes
(where the skills, tasks, and knowledge are similar), even if students spend less time on those
courses. For courses that require students to perform tasks that are not complementary with AP
science (e.g., courses in the humanities), taking AP science concurrently with these courses
could decrease student performance in both courses. Of course, students taking an AP course
could choose to reduce time spent on alternative (non-academic) activities. If these other
activities have no causal impact on performance in school, then the impact on overall
achievement could be negligible.
Some students report concerns about their time allocation as they weigh the decision to enroll
in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these
concerns have increased over time as the courses have become more accessible to students who
previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to
students with high grades in prerequisite classes, and the courses were only offered in schools
with substantial resources. The Board has made efforts to increase access with, for instance, a
policy statement that encourages schools to open AP to all students who are "willing to accept
the challenge" and remove all barriers that restrict access (College Board 2002).6 In a 2008
survey of a nationally representative sample, 65 percent of secondary school teachers reported
that their schools encourage as many students as possible to take AP, and 69 percent reported that
AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009).
These open access policies have led to complaints that students who enroll with less preparation
will be less able to engage in the material (and perhaps become more discouraged by the
difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg
2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students
through negative peer effects or through teachers removing content and slowing the pace of
course delivery.
III. AP Science Impact Study
A. Overview
We recruited 23 schools from across the United States and offered monetary compensation to
pay for equipment and teacher training and as an incentive to secure participation.7 Eligible
schools included ones that had not offered AP Biology or AP Chemistry in recent years, were
willing to add such a course and comply with study protocol, and had more eligible students than
could be served in one class so as to supply a sufficiently sized control group.8 Of the 23
schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both
courses. We recruited two waves of schools (those that offered the course for the first time in
2013 and those that offered it for the first time in 2014); both waves were asked to field the
course for two years, and the earlier-joining schools had the option of fielding the course for
three years. The study includes 47 school-by-cohort groups.
Each participating school identified students that the school deemed eligible to take the new
AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students
who assented to participate in the study and who obtained consent from their parent or guardian
as study participants. Upon receipt of signed consent/assent forms, we randomly offered
enrollment in the newly launched course to a subset of participating students.9 The study
includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students
per AP class).
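The random offer of enrollment described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the lottery is run separately within each school-by-cohort group and that a fixed number of seats is offered per group; the function name, the seat count, and the toy roster are illustrative assumptions, not details taken from the study.

```python
import random
from collections import defaultdict

def assign_within_blocks(students, seats_per_block, seed=0):
    """Randomly offer enrollment within each block.

    students: list of (student_id, block) pairs, where block is a
    hypothetical (school, cohort) label.
    Returns a dict mapping student_id to 1 (offered) or 0 (not offered).
    """
    rng = random.Random(seed)
    by_block = defaultdict(list)
    for student_id, block in students:
        by_block[block].append(student_id)

    offer = {}
    for block, ids in by_block.items():
        ids = sorted(ids)      # fixed starting order, so the seed determines the draw
        rng.shuffle(ids)       # random ordering within the block
        n_offers = min(seats_per_block, len(ids))
        for rank, student_id in enumerate(ids):
            offer[student_id] = 1 if rank < n_offers else 0
    return offer

# Hypothetical roster: two school-by-cohort groups with more eligible
# students than seats, so each group also supplies a control group.
students = [(f"s{i}", ("school_A", 2013)) for i in range(60)] + \
           [(f"t{i}", ("school_B", 2014)) for i in range(40)]
offer = assign_within_blocks(students, seats_per_block=25, seed=42)
```

Running the lottery within groups keeps the share of offers from depending on which school or cohort a student belongs to, which is why later comparisons can condition on the group.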
Figure 1 shows the geographic distribution of the 11 participating districts, which are
primarily concentrated in the western, southern, and eastern regions of the country.10 The
underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern
region has experienced less competition over the years in access to selective postsecondary
institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long
2009). Relative to districts across the nation, those participating in the study tend to be in
neighborhoods with lower levels of socioeconomic status and to educate students who score
below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools
tend to be larger and more likely than other schools to educate students who are Black,
Hispanic, or eligible for free or reduced-price lunch (Panel A of Table 1).
There are two reasons for this over-representation of larger schools serving less economically
prosperous communities. First, AP courses are already offered in the majority of the nation's
public high schools, and schools that serve students from high-income families tend to offer
more AP subjects than schools that serve students from lower-income families (Malkus 2016;
Theokas and Saaris 2013). Given that our research design only allowed for schools that had not
recently offered an AP science course, the population of schools from which we recruited tended
to be those in settings with fewer resources. Second, participating schools were required to state
that they believed they would have 60 or more students who were qualified to take the AP
science course, and this requirement tended to disqualify smaller high schools.
Reflecting the school demographics, participating teachers are slightly younger, less
experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity
than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our
study teachers have less than or equal to five (two) years of teaching experience, which is more
than double (triple) the rate of US high school science teachers. Study teachers are more likely to
hold an undergraduate major in a STEM field than other high school science teachers, yet far less
likely to hold a master's degree and slightly less likely to have earned a teaching credential in
science. Most of the participating teachers had previously taught a higher-level course (mostly
honors), yet only 47 percent of them had previously taught an AP course. Our research
consequently applies to a population of teachers who are relatively new to the AP science
curriculum and who have generally not received graduate training.11 Assuming AP courses
improve with teacher preparation, our results likely capture the effect of a less-than-ideal version
of AP and may result in less positive treatment effects than when AP is delivered by teachers
with more training and experience (Clotfelter, Ladd, and Vigdor 2010).
B. Data and Student Descriptive Statistics
We rely on three primary and secondary data sources for impact estimates. The first is an
assessment, developed and validated by the research team, that measures students' scientific
inquiry skills. We administered this assessment to students in both treatment and control groups
and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific
content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items
that rely on science disciplinary knowledge that is taught in middle school, specifically material
from Life Sciences and Physical Sciences. The assessment, which we administered to all study
participants during a 45-minute period, measures students' skills in data analysis, scientific
explanation, and scientific argument.12 Participating teachers were not provided copies of the
instrument in advance; therefore, teachers were unable to teach any content material prior to test
administration.
The second source is a questionnaire that we administered concurrently with the assessment
and that asks students a number of questions about their most recent science class and their plans
after high school. The assessment and questionnaire were completed together and administered
outside of class (henceforth, we refer to these instruments as the "survey"). The third data source
is students' high school transcripts, which contain data on demographic and socioeconomic
background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high
school completion. We use these data to determine the balance of randomization on
pre-treatment covariates, estimate the effect of randomization on course-taking (including
compliance), improve the precision of our estimates with statistical controls, and estimate
treatment effects on students' grades.
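Because not every student who is offered a seat enrolls, the effect of the offer on course-taking matters for interpreting effects among compliers. The following is a minimal sketch of the standard Wald (instrumental variables) ratio, which divides the effect of the offer on an outcome by the effect of the offer on AP take-up. It is an illustration under simplifying assumptions, not the estimation procedure used here: it omits school-by-cohort fixed effects, covariates, weights, and standard errors, and all data values are hypothetical.

```python
from statistics import mean

def wald_estimate(assigned, took_ap, outcome):
    """Wald estimator: intent-to-treat effect on the outcome divided by
    the intent-to-treat effect on AP take-up.

    assigned: 1 if the student was randomly offered enrollment, else 0
    took_ap:  1 if the student actually took the AP course, else 0
    outcome:  numeric outcome (for example, a standardized skill score)
    """
    def group_mean(values, z):
        return mean(v for v, a in zip(values, assigned) if a == z)

    itt_outcome = group_mean(outcome, 1) - group_mean(outcome, 0)
    itt_takeup = group_mean(took_ap, 1) - group_mean(took_ap, 0)
    return itt_outcome / itt_takeup

# Hypothetical toy data: one offered student declines the course and
# one control student takes it anyway.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
took_ap = [1, 1, 1, 0, 0, 0, 0, 1]
outcome = [0.6, 0.4, 0.5, 0.1, 0.0, 0.2, 0.1, 0.5]
effect = wald_estimate(assigned, took_ap, outcome)
```

In this toy example the offer raises the mean outcome by 0.2 and take-up by 0.5, so the implied effect for compliers is 0.4.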
Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences
during the dates scheduled for survey administration and communication lapses between school
coordinators and students. Students who were randomly assigned to treatment have a
9-percentage-point higher survey response rate. Given the possibility of nonrandom sample
attrition, we weight all regressions by the inverse of the probability of completing the survey
conditional on student characteristics.¹⁴ We implement a variety of robustness checks as
additional means to account for nonresponse. These include multiple imputation of missing
outcome variables, excluding one high school that had a low response rate, and using the Lee
(2009) technique to provide bounds on the estimated effects. These methods and results are
discussed below.
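The inverse-probability weighting described in this paragraph can be sketched in a few lines. This is only an illustration under stated assumptions: the response probability is taken to be a simple response rate within hypothetical covariate cells, whereas the paper estimates it conditional on student characteristics by a method not detailed in this section. The function name and toy data are invented for the example.

```python
from collections import defaultdict

def inverse_probability_weights(cells, responded):
    """Return 1 / P(respond | cell) for respondents and None for nonrespondents.

    cells: hypothetical covariate cell label for each student
    responded: 1 if the student completed the survey, else 0
    """
    n_total = defaultdict(int)
    n_resp = defaultdict(int)
    for cell, r in zip(cells, responded):
        n_total[cell] += 1
        n_resp[cell] += r
    # Estimated response probability in each cell.
    rate = {cell: n_resp[cell] / n_total[cell] for cell in n_total}
    # Respondents from low-response cells receive larger weights.
    return [1.0 / rate[cell] if r else None for cell, r in zip(cells, responded)]

# Toy data (not from the study): 3 of 4 respond in cell "a", 2 of 4 in cell "b".
cells = ["a", "a", "a", "a", "b", "b", "b", "b"]
responded = [1, 1, 1, 0, 1, 1, 0, 0]
weights = inverse_probability_weights(cells, responded)
weighted_n = sum(w for w in weights if w is not None)
```

In this toy case the respondents' weights sum to the full sample size of 8 (up to floating-point rounding), which is the sense in which respondents stand in for similar nonrespondents.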
We supplement these data with surveys that we administered online to teachers of the new
AP courses at the conclusion of the course. The teacher survey includes questions about their
educational background, professional experiences, and professional development; past and
present instructional practices, generally and around science specifically; participation in the
College Board AP training; ability to cover the content of the AP course; and coaching,
mentoring, and other professional community supports received from the school district and
education community.
Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the
survey respondents, conditional on school by cohort fixed effects.¹⁵ Most of the estimated
differences between treatment and control group students on pre-treatment observed
characteristics are small, with some notable exceptions. In both the full and survey samples,
treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher
than control group students, both at p-values below 0.05. The magnitude of the treatment-control
difference was slightly lower and less precisely estimated in math, yet also favored treatment
group students.¹⁶ To adjust for these chance imbalances, we include all student covariates as
predictors of outcomes in the models, and in the robustness checks, we exclude these
covariates.¹⁷
Table 2 also shows the extent of differences between control group compliers and non-compliers.
We find that non-compliers are generally much more academically prepared for AP
science: they have higher pre-treatment reading and math test scores and are more likely to have
completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian
American and female.¹⁸
IV. Empirical Strategy
We estimate the effect of taking the AP science course with a standard instrumental variable
specification:
(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$
(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$
where $AP_{ij} = 1$ if student i enrolled in the AP science course in school × cohort stratum j;
$\widehat{AP}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2);
$\mathit{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a
vector of pre-treatment covariates
(including age; math and reading exam scores from 8th and 10th grade (standardized and
averaged for math and reading separately); cumulative GPA prior to the year when the AP
science course was offered; and indicator variables for female, racial group (Asian American,
Black or Hispanic, Native American or Multiracial), disability, gifted, English Language
Learner, eligible for free or reduced-price lunch, home language is not English, and took
recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.¹⁹ We use
two-stage least squares to estimate the model for all outcomes. The local average treatment effect
(LATE) estimate is given by $\beta$.
The intent to treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1),
as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of
being offered enrollment in the new AP science course and is a weighted average of effects on
those who do and do not choose to enroll in the course.
(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
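With a single binary instrument, the LATE in Equation (1) equals the ITT effect divided by the first-stage effect when both are estimated with the same covariates and weights. A rough check using rounded figures reported in the Results section (an ITT of 0.09 sd on science skill and a first stage of 0.39 in the survey sample) gives a ratio of about 0.23, in line with the reported LATE. The sketch below computes the same ratio from raw differences in means on invented toy data, without covariates, fixed effects, or weights; the function name is hypothetical.

```python
from statistics import mean

def wald_late(y, took_ap, offered):
    """Return (ITT, first stage, LATE) from raw differences in means."""
    y1 = [v for v, z in zip(y, offered) if z == 1]
    y0 = [v for v, z in zip(y, offered) if z == 0]
    d1 = [d for d, z in zip(took_ap, offered) if z == 1]
    d0 = [d for d, z in zip(took_ap, offered) if z == 0]
    itt = mean(y1) - mean(y0)          # effect of the offer on the outcome
    first_stage = mean(d1) - mean(d0)  # effect of the offer on enrollment
    return itt, first_stage, itt / first_stage

# Toy data (not from the study): five offered students, five control students.
offered = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
took_ap = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
y = [1.0, 0.8, 0.6, 0.2, 0.4, 0.7, 0.1, 0.3, 0.2, 0.2]
itt, first_stage, late = wald_late(y, took_ap, offered)

# Rounded figures from the paper: ITT of 0.09 sd, first stage of 0.39.
paper_ratio = 0.09 / 0.39
```

Because the paper's figures are rounded, the ratio is only an approximate reproduction of the two-stage least squares estimate.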
For outcomes that are obtained from the survey, we weight regressions by the inverse of the
estimated probability of completing the survey.²⁰ The results are similar without using these
weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student
characteristics as a result of either missing student transcripts or certain data elements not
collected by the district, we use multiple imputation by chained equations, creating 10 imputed
datasets and combining the results.²¹ For inference, we cluster standard errors at the level of
treatment assignment (school by cohort) in our analysis of main effects. In the analysis of
robustness, we report permutation standard errors, robust standard errors (for comparison to
permutations), and the statistical significance of the LATE estimates after adjusting our tests of
significance for multiple comparisons.
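One common way to combine estimates across imputed datasets is Rubin's rules. The paper states only that results from the 10 imputed datasets are combined, so treating Rubin's rules as the combining step is an assumption here, and the function name and numbers are illustrative.

```python
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool point estimates and standard errors from m imputed datasets."""
    m = len(estimates)
    pooled = mean(estimates)                      # pooled point estimate
    within = mean([se ** 2 for se in std_errors]) # average sampling variance
    between = variance(estimates)                 # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5

# Toy inputs (not from the study): three imputations of one coefficient.
pooled, pooled_se = pool_rubin([0.20, 0.25, 0.30], [0.10, 0.10, 0.10])
```

The pooled standard error exceeds the per-dataset value of 0.10 because it also carries the disagreement between imputations.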
V. Results
A. Course-Taking and Treatment Contrast
Table 3 provides estimated effects of the randomized offer of enrollment on AP science course
enrollment and share of credits in all courses for the full sample and the survey samples. The
first-stage estimates indicate that the offer substantially increased the likelihood of the student
taking the AP science course, by 38 percentage points in the full sample and 39 percentage points
in the survey sample. As we expected, compliance with randomization was imperfect, with 42
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling. Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course, while a few of these
came from hardship exemptions that were requested by the school and granted by the study team.
The remaining rows in Table 3 shine light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage-point increase in the share of total credits in the full sample).
Treatment group students’ share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).²²
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%),
and Anatomy/Physiology (9%).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our
results also suggest that AP course-takers’ classrooms were more likely to use technology (up
0.28 sd, p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than other science classes.²⁴ This contrast between students’ reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.²⁵
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with, again, more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
Table 5 provides stronger evidence of negative treatment effects on students’ confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology by signing the
study assent forms. Taking AP science substantially lowered participants’ likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students’
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students’ self-reported confidence and stress levels. We find that taking AP science increases
students’ likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students’ grades suffered. We estimate that taking AP science reduced students’ grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior
year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students’ grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students’ share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
With our estimates in hand, we can easily compute the adjustment that would leave the
student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).²⁷
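The grade arithmetic in the two preceding paragraphs can be retraced from the rounded figures given in the text. This is a check on the reported numbers, not the authors' code, and the variable names are invented. Because the inputs are rounded, the implied boost comes out near 1.47 rather than exactly the reported 1.46; the gap presumably reflects unrounded inputs, though that is a guess.

```python
# Rounded LATE estimates on grades reported in the text.
effect_science = -0.29  # grade points, science courses
effect_other = -0.18    # grade points, non-science courses
share_science = 0.24    # share of credits in science when taking AP science

# Credit-weighted effect on overall grades during the study year.
overall = effect_science * share_science + effect_other * (1 - share_science)

# Predicted share of classes in any AP science subject (0.02 + 0.12).
share_ap_science = 0.02 + 0.12

# Boost to AP science grades that would leave the year's GPA unchanged.
boost = -overall / share_ap_science

print(round(overall, 2))  # -0.21
print(round(boost, 2))    # 1.47
```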
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test, where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.²⁹
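The permutation test reported in Column (4) can be sketched as follows. This simplified version compares raw means and shuffles treatment labels across the whole sample; it ignores the covariates, weights, and school-by-cohort strata used in the paper, and all names and data are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(y, z):
    """Mean outcome of the treated minus mean outcome of the controls."""
    treated = [v for v, t in zip(y, z) if t == 1]
    control = [v for v, t in zip(y, z) if t == 0]
    return mean(treated) - mean(control)

def permutation_p_value(y, z, n_perm=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(y, z))
    labels = list(z)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # pseudo treatment with the same group sizes
        if abs(diff_in_means(y, labels)) > observed:
            exceed += 1
    return exceed / n_perm

# Toy data (not from the study).
z = [0] * 10 + [1] * 10
p_separated = permutation_p_value([0.0] * 10 + [1.0] * 10, z)  # groups fully separated
p_mixed = permutation_p_value([0.0, 1.0] * 10, z)              # no observed difference
```

With fully separated groups, no shuffle can produce a larger gap, so the p-value is zero; with no observed difference, any nonzero pseudo effect counts as more extreme.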
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all
of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the upper
and lower bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.³¹
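The trimming step behind the Lee (2009) bounds can be sketched as below, for the case in which the treatment group has the higher response rate. The sketch omits covariates, weights, and the Imbens and Manski (2004) confidence intervals; the function name and toy data are invented for illustration.

```python
from statistics import mean

def lee_bounds(y_treat, y_control, resp_rate_treat, resp_rate_control):
    """Lower and upper bounds on the treatment effect among respondents.

    y_treat, y_control: outcomes of survey respondents in each arm
    resp_rate_treat, resp_rate_control: survey response rate in each arm
    """
    if resp_rate_treat <= resp_rate_control:
        raise ValueError("sketch assumes a higher response rate under treatment")
    # Share of treated respondents to trim so that response rates match.
    trim_share = (resp_rate_treat - resp_rate_control) / resp_rate_treat
    k = round(trim_share * len(y_treat))
    ordered = sorted(y_treat)
    control_mean = mean(y_control)
    lower = mean(ordered[:len(ordered) - k]) - control_mean  # drop the k highest
    upper = mean(ordered[k:]) - control_mean                 # drop the k lowest
    return lower, upper

# Toy data (not from the study): 10 treated respondents, 3 control respondents.
bounds = lee_bounds(list(range(1, 11)), [4, 5, 6], 1.0, 0.8)
print(bounds)  # (-0.5, 1.5)
```

Here 20 percent of treated respondents are trimmed, so the bounds bracket the untrimmed difference in means of 0.5.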
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals
fail to rule out large positives and negatives. Additional heterogeneity results can be found in the
Online Appendix.³²
VI. Conclusion
Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.³³ We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult given that few schools document per-course
expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course
assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: U.S. Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-
performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among
Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving
Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education Research
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished Manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, G., and C. F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.”
Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1):
171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student
STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher
Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018: Public High School Graduation Rates.” (NCES 2018-144).
U.S. Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.”
Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.”
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.”
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and
Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look.”
(NCES 2013-001). U.S. Department of Education. Washington, DC: National Center for
Education Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district's
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor's degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district's average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district's enrollment. The dashed line is a lowess curve created using
Stata's default settings and roughly shows the predicted test score as a function of the
neighborhood's SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                      Participating   Others
Average Enrollment                                    1,409           723
Free or Reduced-Price Lunch                           0.700           0.438
Asian                                                 0.055           0.050
Black                                                 0.349           0.154
Hispanic                                              0.410           0.221
White                                                 0.164           0.537
Adjusted Cohort Graduation Rate                       0.843           0.802
District Instruction Expenditures Per Pupil           $6,561          $5,636
District Student Services Expenditures Per Pupil      $3,787          $3,385

Panel B: Teachers                                     Participating   Others
Age Under 30                                          0.407           0.160
Age 30-49                                             0.432           0.553
Age 50 or over                                        0.161           0.287
Female                                                0.630           0.536
Hispanic or Latino                                    0.111           0.051
Race: American Indian or Alaska Native                0.000           0.009
Race: Asian American                                  0.111           0.041
Race: Black                                           0.111           0.060
Race: Native Hawaiian or other Pacific Islander       0.000           0.004
Race: White                                           0.778           0.896
Years of Experience                                   10.3            13.2
Years of Experience <=2                               0.290           0.085
Years of Experience <=5                               0.481           0.234
Hold a Teaching Certificate                           0.926           0.945
Undergraduate Major in STEM                           0.944           0.747
Single Subject Credential in Science                  0.630           0.823
Master's Degree or Higher                             0.356           0.615
Previously Taught AP Course                           0.469           NA
Previously Taught AP, IB, or Honors Course            0.796           NA
Number of Professional Development Trainings
  in the Past 5 years (0-5)                           3.09            NA

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd; EDFacts,
https://www2.ed.gov/about/inits/ed/edfacts/index.html. Others in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey,
https://nces.ed.gov/surveys/sass. Others in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4) Control Group Mean
(2), (5) Difference Between Treated and Controls
(3), (6) Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.

                                            (1)      (2)      (3)      (4)      (5)      (6)
                                            Control                    Control
                                            Group                      Group
Outcome                                     Mean     ITT      LATE     Mean     ITT      LATE
AP Treatment Course Enrollment              0.19     0.38              0.24     0.39
                                                     (0.05)                     (0.06)
                                                     [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                0.03     0.04     0.11     0.03     0.04     0.10
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                    0.13     0.04     0.11     0.14     0.04     0.10
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                    0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                        0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                           0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                               0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                     (0.01)   (0.03)            (0.01)   (0.03)
                                                     [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                      1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses and p-values are in brackets.
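As a rough illustration of how the ITT and LATE columns relate, the sketch below applies the simple Wald ratio: the ITT effect on an outcome divided by the ITT effect on AP treatment course enrollment (the first stage). This is a simplification and not the paper's estimation procedure, which is based on its Equations (1) and (2) with covariates and fixed effects; the function name `wald_late` is hypothetical, and the inputs are the rounded AP Science credit-share and first-stage entries of Table 3, read as 0.04, 0.38, and 0.39.

```python
def wald_late(itt, first_stage):
    """Wald ratio: ITT effect on the outcome scaled by the ITT effect on take-up."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage

# Rounded Table 3 entries (assumed decimal placement): ITT on the share of
# credits in AP Science, and the first-stage effect on AP course enrollment.
late_full = wald_late(itt=0.04, first_stage=0.38)    # full sample
late_survey = wald_late(itt=0.04, first_stage=0.39)  # survey respondents

print(round(late_full, 2), round(late_survey, 2))
```

With these rounded inputs the ratios land close to the reported LATE entries for AP Science; other rows can differ in the second decimal because the published ITT and first-stage values are themselves rounded.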
Table 4
Treatment Contrast (Composite Variables)

                                                  (1)        (2)      (3)
                                                  Control
                                                  Group
                                                  Complier
Outcome                                           Mean       ITT      LATE
Academically Challenging Curriculum               -0.33      0.31     0.80
                                                             (0.10)   (0.24)
                                                             [0.00]   [0.00]
Project-Based Independent Classroom Activities    -0.06      0.13     0.33
                                                             (0.07)   (0.17)
                                                             [0.07]   [0.06]
Integrated Use of Technology                      -0.11      0.11     0.28
                                                             (0.08)   (0.19)
                                                             [0.19]   [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses and
p-values are in brackets.
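The composite construction described in the notes (rescale each item to the unit interval, average, then standardize) can be sketched as follows. This is a minimal illustration under stated assumptions: the five-point scale labels, the function names, and the three example students are hypothetical, and standardizing with the population standard deviation across the listed students is an assumption, since the notes do not say which reference sample or SD formula was used.

```python
from statistics import mean, pstdev

# Hypothetical five-point response scale, ordered from lowest to highest category.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def rescale(response, scale=SCALE):
    """Map a category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return scale.index(response) / (len(scale) - 1)

def composite(students):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in items) for items in students]
    mu, sd = mean(raw), pstdev(raw)  # assumption: population SD over this sample
    return [(x - mu) / sd for x in raw]

# Made-up responses for three students on two component items.
scores = composite([
    ["agree", "strongly agree"],
    ["neutral", "disagree"],
    ["strongly disagree", "neutral"],
])
```

By construction the resulting scores have mean 0 and standard deviation 1 over the students supplied, which is what makes the Table 4 effects readable in standard deviation units.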
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                                  (1)        (2)      (3)
                                                  Control
                                                  Group
                                                  Complier
Outcome                                           Mean       ITT      LATE
Science Skill                                     -0.19      0.09     0.23
                                                             (0.06)   (0.16)
                                                             [0.15]   [0.14]
STEM Interest                                     0.62       0.04     0.09
                                                             (0.02)   (0.07)
                                                             [0.16]   [0.16]
Confidence in College Science                     0.92       -0.04    -0.10
                                                             (0.02)   (0.05)
                                                             [0.11]   [0.06]
Stress                                            0.12       0.07     0.17
                                                             (0.03)   (0.07)
                                                             [0.02]   [0.01]
Grades in Science Courses                         2.80       -0.12    -0.29
                                                             (0.07)   (0.16)
                                                             [0.08]   [0.07]
Grades in Other Courses                           3.14       -0.07    -0.18
                                                             (0.02)   (0.06)
                                                             [0.00]   [0.00]
Number of Observations                            1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses and
p-values are in brackets.
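The survey weights referred to in these notes are described in endnote 20 as the inverse of a probit-predicted probability of completing the survey. The sketch below shows only that last step, turning a fitted probit index into a weight and using the weights in a weighted mean. The function names and the index and outcome values are made up for illustration; the probit fit itself (school by cohort effects plus pre-treatment covariates) is not reproduced here.

```python
from statistics import NormalDist

def survey_weight(index):
    """Inverse probability weight 1 / Phi(index) for a survey completer.

    `index` stands for a fitted probit index; here it is a hypothetical
    input, not an estimate from the study data.
    """
    p = NormalDist().cdf(index)  # predicted probability of completing the survey
    return 1.0 / p

def weighted_mean(values, weights):
    """Weighted average of outcomes among survey completers."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Made-up fitted index values and outcomes for three completers.
indexes = [2.0, 0.5, -1.0]
outcomes = [0.3, 0.1, -0.4]
weights = [survey_weight(z) for z in indexes]
estimate = weighted_mean(outcomes, weights)
```

Completers who looked unlikely to respond (low index) receive the largest weights, so they stand in for similar students who did not complete the survey; every weight is at least 1 because a probability cannot exceed 1.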
Table 6
Robustness Checks of Main ITT Results

Columns:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                       (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)            (12)
Science Skill                 -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09 to 0.51   2.0
                                      (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                      [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                 0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03 to 0.18   1.9
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                      [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science 0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09 to 0.10   2.0
                                      (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                      [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                        0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05 to 0.15   1.6
                                      (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                      [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses     2.80    -0.12                   -0.06   -0.10   -0.07   Not applicable, as grades are based on
                                      (0.07)  (0.04)          (0.00)  (0.00)  (0.00)  transcripts rather than student survey
                                      [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses       3.14    -0.07                   -0.07   -0.06   -0.03   Not applicable, as grades are based on
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  transcripts rather than student survey
                                      [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95 percent confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to
derive confidence interval for the treatment effect itself).
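The permutation test described for column (4) can be sketched in a few lines. This is a simplified stand-in rather than the authors' procedure: it uses a plain difference in means as the "estimated effect" instead of the regression with covariates and fixed effects, it shuffles treatment labels across the whole sample rather than within school by cohort groups, and the function names and data are made up.

```python
import random
from statistics import mean

def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)

def permutation_p_value(outcomes, treated, draws=1000, seed=0):
    """Share of pseudo-treatment draws whose |effect| exceeds the actual |effect|."""
    rng = random.Random(seed)
    actual = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(draws):
        rng.shuffle(labels)  # pseudo treatment: same number treated, random order
        if abs(diff_in_means(outcomes, labels)) > actual:
            exceed += 1
    return exceed / draws

# Made-up outcomes: the four treated students all score above the four controls.
y = [1.2, 0.9, 1.5, 1.1, 0.2, 0.1, 0.4, 0.3]
d = [1, 1, 1, 1, 0, 0, 0, 0]
p = permutation_p_value(y, d)
```

In this toy sample no reshuffling can produce a larger absolute gap than the actual assignment, so the share of exceedances is zero; with real data the share indicates how unusual the estimated effect is relative to random relabelings.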
1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students'
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3 The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5 Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students' confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7 We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8 In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9 Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10 Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11 Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12 We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13 Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14 Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15 Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16 Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.
17 The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19 One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups:
School-Chemistry and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60 percent of the treated students chose Biology,
then we randomly assign 60 percent of the control students to the School-Biology control group.
In Section V.C, we show that our results are also robust to dropping this school entirely.
20 To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$,
where $\text{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as
$1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and
with 90 percent of students receiving a weight less than 2.0.
21 We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22 Detailed course-by-course impacts are available from the authors.
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative, inquiry-based environment.
25 In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27 Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29 To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30 This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
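The Benjamini-Hochberg correction mentioned in endnote 29 follows a standard step-up rule, sketched below for readers unfamiliar with it. The function name is hypothetical and the p-values are invented for illustration; they are not the study's estimates.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected

# Invented p-values for five hypothetical outcomes.
decisions = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6])
```

In this invented example two p-values that would pass an uncorrected 0.05 threshold (0.039 and 0.041) are no longer rejected, which is the kind of change the correction is designed to reveal when many outcomes are tested at once.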
I. Introduction
The Advanced Placement (AP) program, a set of college-level courses and exams offered at the
high school level, has become a centerpiece of efforts to strengthen the transition to
postsecondary training and boost human capital. Many colleges and universities treat students'
enrollment in AP courses and scores on AP exams as a signal of quality in admissions and grant
college credit or course waivers to students who receive high AP exam scores (Geiser and
Santelices 2004). These incentives have prompted a substantial increase in the number of
students taking AP courses and exams in recent decades, with more than five times as many AP
exams taken in 2018 (over five million) as in 1996 (less than one million) (College Board 2018).
At the program's inception in the mid-1950s, AP courses were found in a handful of elite private
schools; today, AP is offered in nearly 70 percent of public schools in the United States (Thomas
et al. 2013).
Much of the expansion has been driven by federal and state policies designed to increase
access to AP, including offering subsidies to pay for exams, building AP course offerings into
school accountability requirements, and requiring public postsecondary institutions to offer
credit for AP exam scores (Adelman 2006; Dounay Zinth 2016; Holstead et al. 2010). For almost
20 years, for instance, the US Department of Education has provided states with funds to offset
the cost of AP exams for low-income students.1 Despite the program's popularity among many,
AP also has its critics. Some researchers and educators claim that the program's effectiveness
has been oversold and that there is no real evidence that AP endows students with greater skill or
subject-matter interest than other high school courses (Berger 2006; Drew 2011; Klopfenstein
and Thomas 2010, 2009; Tai 2008; Tierney 2012). Others worry that the pressure of AP courses
causes students undue stress and confidence loss (Hopkins 2012; Kim 2015; Steinberg 2009).
The expansion of AP to less-resourced schools has also raised concerns that many of the students
now taking the courses are academically underprepared, such that the monetary and psychic costs
of the investment may outweigh the potential benefits (Bowie 2013; Dougherty and Mellor 2009;
Duffett and Farkas 2009; Smith, Hurwitz, and Avery 2017; Tierney 2012). Up to now,
researchers' ability to generate causal evidence on any of the claims made by proponents and
opponents has been substantially limited by the nonrandom sorting of students into AP classes.
As a result, all of the prior research on AP impacts has been observational.
In this paper, we provide the first experimental evidence on AP program impacts. We focus
on AP science courses, which have been endorsed by educators and policymakers as a key
strategy for increasing American students' skill and interest in Science, Technology,
Engineering, and Mathematics (STEM) and strengthening the STEM workforce (e.g., Adelman
2006; Bush 2006; House 2016). With participation from 23 schools and over 1,800 students from
across the United States, we randomly offered students enrollment into newly launched AP
Biology or Chemistry courses in their schools. To directly evaluate whether AP endows students
with higher levels of skill than other science courses, we designed and validated an instrument to
measure students' scientific inquiry abilities (e.g., the ability to analyze data and make scientific
arguments). We also collected administrative data and surveyed students to assess AP impacts on
their interest in pursuing a STEM degree in college, confidence in completing a college science
course, high school grades, and stress levels. In addition to generating impact estimates, we
report on the courses that AP crowds out, along with the contrast between treatment and control
students in the content and rigor of their science courses.
The results suggest that there is some truth in the claims made by both advocates and critics
of AP. Consistent with the goals of an AP course, treatment group students report that their
courses are more challenging and inquiry-based than control group students do. These views are
shared by teachers, who report a higher level of rigor in their AP science courses compared to
their other science courses. We find suggestive evidence that this academic challenge leads to
increases in skill: AP course-takers score 0.23 standard deviations higher than control group
compliers on the end-of-year assessment of scientific skill. Though our precision prevents us
from ruling out zero treatment effects at traditional levels of statistical inference (p-value = 0.14),
this large point estimate suggests genuine productivity gains for students who take AP science,
over and above the gains experienced by students who enroll in other high school courses. We
also find suggestive evidence of an AP science boost to students' interest in pursuing a STEM
degree should they enroll in college. Together, these results fail to support the concern that the
AP program's impact on human capital has been oversold.
At the same time, our results confirm that the workload and expectations of an AP science
class cause students to lose confidence in their ability to succeed in college-level science, gain
stress, and earn lower grades (prior to the weights that are often attached to AP grades by
secondary and postsecondary institutions). The confidence levels among study participants are
quite high, with 92 percent of control group compliers reporting that they are "somewhat" or
"extremely" confident in their ability to succeed in a college science course. AP course-takers
report a 10-percentage-point lower estimation of their ability. Students in the AP course are also
more than twice as likely as control group compliers to report that the course negatively affected
their physical or emotional health (our measure of stress). And comparisons of transcripts reveal
that treatment group students earned lower preweighted grades in science and other subjects
during the year that they took the AP class.
Our study contributes to a small research base on the effects of the AP program.² Using a
regression discontinuity design, Smith, Hurwitz, and Avery (2017) show that students who
barely earn a college-credit equivalent score on the AP exam (e.g., scoring just above the
threshold necessary to receive a 3, out of 5, on the exam) are more likely to complete their
bachelor's degrees in four years than students who fall just below that threshold. In a related
paper that relies on the same data and design, Avery et al. (2018) demonstrate that AP exam
scores also influence students' college major choices. These compelling results demonstrate that
students take advantage of postsecondary AP credit policies to waive out of intro courses and
that receiving a higher AP exam score may serve as a signal of skill to both institutions and
students. These two studies, however, do not show that AP courses per se led to skill
development, as they focus solely on differences in behavior for AP exam-takers who fall just
below and just above the score thresholds. Jackson (2010, 2014) evaluates the impacts of the AP
Incentive Program, which offers cash incentives to teachers and students for passing scores on
AP exams, as well as funds for training teachers and convening teams of teachers to align pre-AP
curriculum with the needs of the AP class. Jackson identifies impact from variation in the timing
of program implementation across high schools in Texas and finds large positive treatment
effects on AP courses and exams (2010). The AP Incentive Program also increased students'
college going and persistence, as well as their labor market earnings (Jackson 2010, 2014). These
two studies indicate that the AP Incentive Program increased AP participation and subsequent
educational attainment and labor market performance. However, it is not clear whether these
results would hold in the absence of the Incentive Program.
We build on these findings and inform policy and practice in several ways. Most important,
we directly test one of the main mechanisms through which AP is expected to influence students'
attainment and earnings: by increasing their skill and interest in the subject matter. We determine
whether skill and interest gains, as distinct from college admissions and credit-granting policies,
are key drivers behind AP's impact on later outcomes. This distinction is important given that
less than half of AP course-takers earn a credit-granting score on the AP exam, either because
they do not take the exam or because they obtain low scores (National Research Council 2002;
College Board 2018). Many selective colleges are also increasingly making it difficult for
students to receive credit for their AP exam scores. Most top institutions restrict the number of
AP subject areas that are eligible, only offer credit or waivers for very high scores on the exams,
or cap the total amount of AP credit that a student can receive (Weinstein 2016). In 2012,
Dartmouth College announced that it would no longer grant credit for any AP exam score, a
policy shared by several other selective institutions, including Amherst College, Brown
University, and the California Institute of Technology (Weinstein 2016). Our results, which
generalize to a newly offered AP course, suggest that AP endows students with human capital
even if it does not grant them the opportunity to earn credit at their preferred college. For college
admissions officers, the findings also suggest that AP course-taking offers a reasonable signal of
students' skill and subject-matter interest. Our estimated effects on skill and STEM interest are
somewhat limited by insufficient precision, yet they represent the first and most credible
evidence to date on the impact of AP on these key outcomes.
Our study is also among the first known to us that quantifies the AP impact on students'
grades. We find that students who take an AP science course earn lower grades in science (by
0.29 grade points) and lower grades in their other courses (by 0.18 grade points). The lower
grades in science are driven by the lower grade received in the AP class, a negative effect that
many secondary and postsecondary institutions offset by upweighting AP grades. The estimates
suggest that students' AP science grade would have to be inflated by a factor of 1.46 (e.g., a C
would have to be converted to approximately a B+) to remove the net negative on overall grade
point average (GPA). While many high schools, including those that participated in our study,
weight students' GPAs to adjust for the academic difficulty of the courses, practices vary
substantially across institutions (Sadler and Tai 2007; Klopfenstein and Lively 2016). In a recent
survey of Texas high schools, for instance, Klopfenstein and Lively (2016) find that most
schools with AP courses used weights but that they ranged from 0.5 to 1 point (with a small
number assigning more than 1 extra point). Our findings suggest that the current practices at
many institutions under-adjust for the grade penalty from AP courses. In addition, attaching
weight to AP grades cannot undo the learning loss that may occur when students shift their effort
away from non-AP coursework.
We also contribute to other strands of literature on the relationship between students'
academic achievement and their perceptions of their own confidence and stress. Prior literature
on the relationship between students' confidence in their ability and their true ability is rife with
mixed results (Boekaerts and Rozendaal 2010; Stankov and Crawford 1996; Stankov 2013).
Psychologists have also documented an inverted U-shaped relationship between perceived
pressure and performance, where some amount of stress is necessary to increase achievement,
yet too much stress can reduce students' ability to gain knowledge (Anderson 1976; Davis 2014;
Yerkes and Dodson 1908). We find that students taking an AP science class experience cognitive
gains concurrent with losses in their academic confidence. This finding is consistent with
evidence that many US students are highly confident in their skills and that this noncognitive
belief often interferes with their ability to learn (Chiu and Klassen 2010; Stankov and Lee 2014).
The AP course appears to reduce students' estimation of their own ability, either by changing the
standard to which they compare themselves or by making them more aware of the challenges
they might face in a college course. Whether these changes in perceived confidence persist and
how they influence later outcomes is uncertain. Students with expectation levels that match the
real demands of college courses might eventually perform better in those courses. Some students
might also use the insights they gain from a challenging AP science class to shift away from
difficult science courses in college (or entire majors) that could delay or hinder their college
completion. Our results also suggest that AP causes a significant amount of stress for students,
but we do not find evidence that the added pressure substantially limits their knowledge gains in
science.
II. AP Science and Conceptual Framework
A. AP and Other Rigorous Secondary School Courses
The AP program is an appealing option for high school administrators who seek to offer
college-level courses to their students. AP course descriptions and assignments are designed to match
those offered in introductory college courses in each subject and thus to prepare students for the
rigor of college coursework. The College Board (the "Board" for brevity) is a not-for-profit
organization that administers AP and provides professional development for teachers, reviews of
course syllabi, and extensive curricular materials (e.g., sample syllabi, sample lab experiments).³
The Board also offers standardized AP exams in the spring of each year that are graded by
external examiners and provide an externally validated measure of student learning. Most exams
include both an essay or problem-solving component and multiple-choice questions, all of which
are aligned with the course descriptions. The exam is one of the key features of the AP program
and is used by high school and postsecondary educators to evaluate the depth of students' skill
independently of teacher bias.
In addition to AP courses, high school students typically have three alternative options for
advanced coursework. Most high schools offer "honors" courses, which are intended to provide a
more rigorous curriculum than the regular course in the same subject. The content and rigor of
honors courses vary across high schools, and there is no standardized honors exam offered to
students in these courses. A second option is the International Baccalaureate (IB) program,
which was originally designed for students in international schools and aims to develop
students' critical thinking skills and their knowledge of international affairs. The IB program is
offered worldwide but remains relatively uncommon in the United States, with less than 5 percent
of high schools offering IB in 2016 (The IB Programme 2016). A final option is for students to
take a course at a nearby college (or online) or, for some, a course that is taught at their high
school by an instructor who has been approved as college-level. These "dual enrollment" or
"dual credit" courses are meant to provide students with the opportunity to simultaneously earn
high school and college credit. In the most recent national survey, high schools reported
approximately two million enrollments in dual credit courses (Thomas et al. 2013). There is
limited information on the colleges that accept dual enrollment credits. Most courses are offered
through collaborations between high schools and local community and public postsecondary
institutions, suggesting that credits are generally accepted at these institutions and less often
accepted at other institutions. Comparisons of AP science classes to regular and honors-level
science classes reveal that students receive much more homework and work harder in their AP
classes (Sadler et al. 2014). To our knowledge, there have been no comparisons of the workload
or effort in AP science courses compared to IB or dual enrollment science courses.
B. Conceptual Framework
There are several channels through which an AP science class is expected to influence students'
cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to
provide rigorous content and a substantial workload, be taught by teachers who have high
expectations, and consist of students who are driven to succeed. These inputs (course rigor,
teacher expectations, and peer motivation) are often thought of as the main characteristics that
distinguish AP courses from other high school courses.
Yet AP science classes are also intended to offer an inquiry-based approach to science that,
when combined with a high level of rigor, provides an additional causal pathway to change.
Specifically, a well-implemented AP science course should encourage students to ask questions,
gather and interpret data, arrive at explanations grounded in scientific principles, and
communicate their observations to one another under the guidance of teachers (College Board
2011a, 2011b).⁴ This student-led, inquiry-based approach differs from many traditional
secondary school science classrooms, where the goal is often for students to memorize content
and replicate laboratory experiments that demonstrate the content (National Research Council
2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world
practices of science and the skills that form the basis of scientific inquiry by focusing more on
big-picture concepts and small-group experimentation, with students directing the inquiry. The
curriculum also encourages teachers to move away from lecture-based pedagogy and
multiple-choice quizzes and to increase their use of technology to help students analyze data, draw
interpretations, and communicate findings (College Board 2011a, 2011b).
AP science classes are expected to increase students' ability to ask research questions, design
experiments, analyze data, and draw conclusions. In the process of gaining these scientific
inquiry skills, the new curriculum is intended to spur greater interest in the practice of science
because it becomes more enjoyable and more accessible to students for whom rote memorization
and execution of prefabricated lab experiments might have diminished enthusiasm in the subject
(National Research Council 2012). Science experts posit that inquiry-based science courses will
be particularly successful in generating greater interest and skill among women and among
students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis,
Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler,
Samuelson, and Lorah 2014).
While the rigor and expectations of a college course may be appropriate for some students, they
can be too demanding for others. Students often report high levels of stress and burnout from
taking AP courses, particularly if they perceive that they are not prepared for the challenge of
college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact,
cause students to lose confidence in their ability to complete college science courses. A number
of mechanisms could cause students to lose confidence, including exposure to stronger peers,
inability to successfully complete assignments, or simply receiving lower grades than they
received in their non-AP courses.⁵ The AP effect on confidence will likely matter differently for
students with different levels of initial confidence. For students who are over-confident in their
ability to succeed in college science courses, taking a challenging AP course in high school
might cause them to revise their expectations to be more in line with the higher demands of
college-level work.
Taking a more strenuous AP course is also likely to affect students' time allocation.
Students' performance in each class will be determined by their subject-specific ability as well as
the amount of time they devote to their coursework versus other activities, including work,
extracurricular, and leisure. If AP courses are more demanding than other courses, students
solving a time allocation problem may shift more effort into their AP course, away from other
pursuits. The impact of this change in time allocation on students' performance in AP and other
courses will depend upon whether they shift effort away from other courses and on the degree of
complementarity between their AP science course and their other courses. Study time devoted to
an AP science course could improve student performance in other math and science classes
(where the skills, tasks, and knowledge are similar) even if students spend less time on those
courses. For courses that require students to perform tasks that are not complementary with AP
science (e.g., courses in the humanities), taking AP science concurrently with these courses
could decrease student performance in both courses. Of course, students taking an AP course
could choose to reduce time spent on alternative (non-academic) activities. If these other
activities have no causal impact on performance in school, then the impact on overall
achievement could be negligible.
Some students report concerns about their time allocation as they weigh the decision to enroll
in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these
concerns have increased over time as the courses have become more accessible to students who
previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to
students with high grades in prerequisite classes, and the courses were only offered in schools
with substantial resources. The Board has made efforts to increase access with, for instance, a
policy statement that encourages schools to open AP to all students who are "willing to accept
the challenge" and remove all barriers that restrict access (College Board 2002).⁶ In a 2008
survey of a nationally representative sample, 65 percent of secondary school teachers reported
that their schools encourage as many students as possible to take AP, and 69 percent reported that
AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009).
These open access policies have led to complaints that students who enroll with less preparation
will be less able to engage in the material (and perhaps become more discouraged by the
difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg
2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students
through negative peer effects or through teachers removing content and slowing the pace of
course delivery.
III. AP Science Impact Study
A. Overview
We recruited 23 schools from across the United States and offered monetary compensation to
pay for equipment and teacher training and as an incentive to secure participation.⁷ Eligible
schools included ones that had not offered AP Biology or AP Chemistry in recent years, were
willing to add such a course and comply with study protocol, and had more eligible students than
could be served in one class, so as to supply a sufficiently sized control group.⁸ Of the 23
schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both
courses. We recruited two waves of schools (those that offered the course for the first time in
2013 and those that offered it for the first time in 2014); both waves were asked to field the
course for two years, and the earlier-joining schools had the option of fielding the course for
three years. The study includes 47 school-by-cohort groups.
Each participating school identified students that the school deemed eligible to take the new
AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students
who assented to participate in the study and who obtained consent from their parent or guardian
as study participants. Upon receipt of signed consent/assent forms, we randomly offered
enrollment in the newly launched course to a subset of participating students.⁹ The study
includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students
per AP class).
Figure 1 shows the geographic distribution of the 11 participating districts, which are
primarily concentrated in the western, southern, and eastern regions of the country.¹⁰ The
underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern
region has experienced less competition over the years in access to selective postsecondary
institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long
2009). Relative to districts across the nation, those participating in the study tend to be in
neighborhoods with lower levels of socioeconomic status and to educate students who score
below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools
tend to be larger and more likely than other schools to educate students who are eligible for free
or reduced-price lunch, Black, and Hispanic (Panel A of Table 1).
There are two reasons for this over-representation of larger schools serving less economically
prosperous communities. First, AP courses are already offered in the majority of the nation's
public high schools, and schools that serve students from high-income families tend to offer
more AP subjects than schools that serve students from lower-income families (Malkus 2016;
Theokas and Saaris 2013). Given that our research design only allowed for schools that had not
recently offered an AP science course, the population of schools from which we recruited tended
to be those in settings with fewer resources. Second, participating schools were required to state
that they believed they would have 60 or more students who were qualified to take the AP
science course, and this requirement tended to disqualify smaller high schools.
Reflecting the school demographics, participating teachers are slightly younger, less
experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity
than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our
study teachers have less than or equal to five (two) years of teaching experience, which is more
than double (triple) the rate of US high school science teachers. Study teachers are more likely to
hold an undergraduate major in a STEM field than other high school science teachers, yet far less
likely to hold a master's degree and slightly less likely to have earned a teaching credential in
science. Most of the participating teachers had previously taught a higher-level course (mostly
honors), yet only 47 percent of them had previously taught an AP course. Our research
consequently applies to a population of teachers who are relatively new to the AP science
curriculum and who have generally not received graduate training.¹¹ Assuming AP courses
improve with teacher preparation, our results likely capture the effect of a less-than-ideal version
of AP and may result in less positive treatment effects than when AP is delivered by teachers
with more training and experience (Clotfelter, Ladd, and Vigdor 2010).
B. Data and Student Descriptive Statistics
We rely on three primary and secondary data sources for impact estimates. The first is an
assessment, developed and validated by the research team, that measures students' scientific
inquiry skills. We administered this assessment to students in both treatment and control groups
and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific
content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items
that rely on science disciplinary knowledge that is taught in middle school, specifically material
from Life Sciences and Physical Sciences. The assessment, which we administered to all study
participants during a 45-minute period, measures students' skills in data analysis, scientific
explanation, and scientific argument.¹² Participating teachers were not provided copies of the
instrument in advance; therefore, teachers were unable to teach any content material prior to test
administration.
The second source is a questionnaire that we administered concurrently with the assessment
and that asks students a number of questions about their most recent science class and their plans
after high school. The assessment and questionnaire were completed together and administered
outside of class (henceforth, we refer to these instruments as the "survey"). The third data source
is students' high school transcripts, which contain data on demographic and socioeconomic
background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high
school completion. We use these data to determine the balance of randomization on
pre-treatment covariates, estimate the effect of randomization on course-taking (including
compliance), improve the precision of our estimates with statistical controls, and estimate
treatment effects on students' grades.
Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences
during the dates scheduled for survey administration and communication lapses between school
coordinators and students. Students who were randomly assigned to treatment have a
9-percentage-point higher survey response rate. Given the possibility of nonrandom sample
attrition, we weight all regressions by the inverse of the probability of completing the survey,
conditional on student characteristics.¹⁴ We implement a variety of robustness checks as
additional means to account for nonresponse. These include multiple imputation of missing
outcome variables, excluding one high school that had a low response rate, and using the Lee
(2009) technique to provide bounds on the estimated effects. These methods and results are
discussed below.
We supplement these data with surveys that we administered online to teachers of the new
AP courses at the conclusion of the course. The teacher survey includes questions about their
educational background, professional experiences, and professional development; past and
present instructional practices, generally and around science specifically; participation in the
College Board AP training; ability to cover the content of the AP course; and coaching,
mentoring, and other professional community supports received from the school, district, and
education community.
Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the
survey respondents, conditional on school-by-cohort fixed effects.¹⁵ Most of the estimated
differences between treatment and control group students on pre-treatment observed
characteristics are small, with some notable exceptions. In the full and survey samples,
treatment group students' reading exam scores were 0.10 and 0.09 standard deviations higher,
respectively, than those of control group students, both at p-values below 0.05. The magnitude of
the treatment-control difference was slightly lower and less precisely estimated in math, yet also
favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student
covariates as predictors of outcomes in the models, and in the robustness checks, we exclude
these covariates.¹⁷
Table 2 also shows the extent of differences between control group compliers and
non-compliers. We find that non-compliers are generally much more academically prepared for AP
science: they have higher pre-treatment reading and math test scores and are more likely to have
completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian
American and female.¹⁸
IV. Empirical Strategy
We estimate the effect of taking the AP science course with a standard instrumental variable
specification:
(1) $Y_{ij} = \alpha_j + \hat{A}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$
(2) $AP_{ij} = \delta_j + \text{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$
where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school-by-cohort stratum $j$;
$\hat{A}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2);
$\text{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a
vector of pre-treatment covariates (including age; math and reading exam scores from 8th and 10th
grade, standardized and averaged for math and reading separately; cumulative GPA prior to the year
when the AP science course was offered; and indicator variables for female, racial group (Asian
American, Black or Hispanic, Native American or Multiracial), disability, gifted, English Language
Learner, eligible for free or reduced-price lunch, home language is not English, and took
recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school-by-cohort fixed
effects.¹⁹ We use two-stage least squares to estimate the model for all outcomes. The local average
treatment effect (LATE) estimate is given by $\beta$.
The intent-to-treat (ITT) estimate is obtained by replacing $\hat{A}_{ij}$ with $\text{Offered}_{ij}$
in Equation (1), as shown in Equation (3). The coefficient on $\text{Offered}_{ij}$ in Equation (3)
provides the effect of being offered enrollment in the new AP science course and is a weighted
average of effects on those who do and do not choose to enroll in the course.
(3) $Y_{ij} = \zeta_j + \text{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
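The relationship between Equations (1) through (3) can be sketched in code on simulated data. This is an illustration under stated assumptions, not the study's estimation code: the data-generating process, the 0.23 "true" effect, the take-up shares, and all function names are hypothetical; school-by-cohort fixed effects are omitted for brevity (they would enter as dummy columns in `controls`); and the manual second stage yields correct point estimates but not correct standard errors.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage_least_squares(y, ap, offered, controls):
    """Return (LATE estimate, first-stage coefficient on the offer).

    `controls` must already contain a constant column.
    """
    instruments = np.column_stack([offered, controls])
    first_stage = ols(ap, instruments)            # Equation (2)
    ap_hat = instruments @ first_stage            # fitted enrollment
    second_stage = ols(y, np.column_stack([ap_hat, controls]))  # Equation (1)
    return second_stage[0], first_stage[0]

def intent_to_treat(y, offered, controls):
    """Coefficient on the offer in Equation (3)."""
    return ols(y, np.column_stack([offered, controls]))[0]

def simulate(n=200_000, true_effect=0.23, seed=0):
    """Hypothetical data: 19% always-takers, 39% compliers, 42% never-takers."""
    rng = np.random.default_rng(seed)
    offered = rng.integers(0, 2, n).astype(float)
    u = rng.random(n)                             # latent propensity to enroll
    ap = np.where(u < 0.19, 1.0, np.where(u < 0.58, offered, 0.0))
    ability = 1.0 - 2.0 * u + rng.normal(size=n)  # unobserved confounder
    x = rng.normal(size=n)                        # observed covariate
    y = true_effect * ap + 0.5 * ability + 0.3 * x + rng.normal(size=n)
    controls = np.column_stack([np.ones(n), x])
    return y, ap, offered, controls

y, ap, offered, controls = simulate()
late, first_stage = two_stage_least_squares(y, ap, offered, controls)
itt = intent_to_treat(y, offered, controls)
print(f"first stage={first_stage:.3f}  ITT={itt:.3f}  LATE={late:.3f}")
```

Because the model is just-identified with the same controls in every equation, the ITT coefficient equals the LATE estimate multiplied by the first-stage coefficient, which is why a modest ITT effect scales up when take-up is incomplete.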
For outcomes that are obtained from the survey, we weight regressions by the inverse of the
estimated probability of completing the survey.²⁰ The results are similar without using these
weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student
characteristics as a result of either missing student transcripts or certain data elements not
collected by the district, we use multiple imputation by chained equations, creating 10 imputed
datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of
treatment assignment (school by cohort) in our analysis of main effects. In the analysis of
robustness, we report permutation standard errors, robust standard errors (for comparison to
permutations), and the statistical significance of the LATE estimates after adjusting our tests of
significance for multiple comparisons.
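For the multiple-imputation step, results from the imputed datasets are conventionally combined with Rubin's rules. The text does not state which combining rule was used, so the sketch below is an assumption, and the function name and example numbers are hypothetical.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and squared standard errors.

    Returns (pooled estimate, pooled standard error) using Rubin's rules:
    total variance = within + (1 + 1/m) * between.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.shape[0]
    pooled = q.mean()
    within = u.mean()                 # average sampling variance
    between = q.var(ddof=1)           # variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return float(pooled), float(np.sqrt(total))

# Two hypothetical imputations, each with a standard error of 0.1.
pooled_estimate, pooled_se = pool_rubin([0.2, 0.3], [0.01, 0.01])
print(pooled_estimate, pooled_se)
```

The pooled standard error exceeds the per-imputation standard error because it also reflects uncertainty about the imputed values themselves.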
V. Results
A. Course-Taking and Treatment Contrast
Table 3 provides estimated effects of the randomized offer of enrollment on AP science course
enrollment and share of credits in all courses for the full sample and the survey samples. The
first-stage estimates indicate that the offer substantially increased the likelihood of the student
taking the AP science course, by 38 percentage points in the full sample and 39 percentage points
in the survey sample. As we expected, compliance with randomization was imperfect, with 42
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling. Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course, while a few of these
came from hardship exemptions that were requested by the school and granted by the study team.
The remaining rows in Table 3 shed light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage-point increase in the share of total credits in the full sample).
Treatment group students' share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).[22]
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%),
and Anatomy/Physiology (9%).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our
results also suggest that AP course-takers' classrooms were more likely to use technology (up
0.28 sd, p-value = 0.14).[23] Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than other science classes.[24] This contrast between students' reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.[25]
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers' interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with, again, more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
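The complier effect on science skill can be related to the intent-to-treat estimate through the Wald ratio, which divides the effect of the offer by the effect of the offer on enrollment. The sketch below is a simplification under that assumption: it uses the 0.09 ITT point estimate for science skill cited in the robustness discussion and the 39-percentage-point first stage for the survey sample, and it ignores the covariates and weights used in the actual estimation. The function name is hypothetical.

```python
def wald_late(itt_effect, first_stage):
    """Complier effect as the ratio of the offer's effect on the outcome
    to the offer's effect on AP enrollment (the first stage)."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage


# ITT on science skill (sd units) and first stage in the survey sample,
# both taken from the text; rounding in these inputs carries through.
late_science_skill = wald_late(itt_effect=0.09, first_stage=0.39)
print(round(late_science_skill, 2))  # 0.23
```

With these rounded inputs the ratio reproduces the reported 0.23 standard deviations.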
Table 5 provides stronger evidence of negative treatment effects on students' confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology as a result of signing the
study assent forms. Taking AP science substantially lowered participants' likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students'
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students' self-reported confidence and stress levels. We find that taking AP science increases
students' likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students' grades suffered. We estimate that taking AP science reduced students' grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students' science GPAs during the study year (usually their junior
year) from around a B- to a C+.[26] This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students' grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students' share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students' overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
With our estimates in hand, we can easily compute the adjustment that would leave the
student's GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students' grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).[27]
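The two calculations above can be retraced with the rounded figures given in the text. The sketch below is only an arithmetic check; the variable names are hypothetical. With rounded inputs the offsetting boost comes out between about 1.47 and 1.50, so the reported 1.46 presumably reflects unrounded estimates that are not shown in the text.

```python
# Rounded inputs taken from the text.
science_grade_effect = -0.29      # effect on science course grades
other_grade_effect = -0.18        # effect on non-science course grades
science_credit_share = 0.24       # share of credits in science for AP takers

# Credit-weighted average effect on overall grades during the study year.
overall_grade_effect = (science_grade_effect * science_credit_share
                        + other_grade_effect * (1 - science_credit_share))

# Share of classes in any AP science subject for compliers (0.02 + 0.12).
ap_science_share = 0.02 + 0.12

# Grade boost in AP science courses that would offset the overall decline.
boost_from_rounded_total = 0.21 / ap_science_share
boost_from_weighted_sum = -overall_grade_effect / ap_science_share

print(round(overall_grade_effect, 2))   # -0.21
```

The offsetting boost is simply the overall decline divided by the share of classes to which the boost would apply, which is why a small share of AP science classes requires a large per-course weight.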
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test, where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).[28] The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.[29]
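The permutation test described above can be sketched as follows. This is a simplified illustration, not the study's procedure: it uses a raw difference in means rather than the regression-adjusted estimate, and it reshuffles the treatment labels across the whole sample rather than within school-by-cohort blocks. The function names and data are hypothetical.

```python
import random


def diff_in_means(y, d):
    """Mean outcome of units with d == 1 minus mean outcome of units with d == 0."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def permutation_p_value(y, d, n_perm=1000, seed=0):
    """Share of pseudo-treatment draws whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(y, d))
    labels = list(d)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # pseudo treatment: same group sizes, random membership
        if abs(diff_in_means(y, labels)) > observed:
            exceed += 1
    return exceed / n_perm


# Hypothetical outcomes for 6 treated and 6 control students.
d = [1] * 6 + [0] * 6
y = [0.9, 1.2, 0.4, 1.1, 0.8, 1.0, 0.3, 0.7, 0.2, 0.9, 0.5, 0.1]
p_value = permutation_p_value(y, d)
```

Shuffling the labels keeps the number of treated and control units fixed, so every pseudo assignment is one that the randomization could have produced in this simplified design.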
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.[30] Column (8) shows the results when we exclude all
of the X_i covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the upper
and lower bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.[31]
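The trimming step can be sketched as below for the case described here, in which the treatment group responds at a higher rate than the control group. This is a bare-bones illustration of the Lee (2009) trimming logic with unadjusted means; it omits covariates, weights, and the Imbens and Manski (2004) confidence interval, and the function name and data are hypothetical.

```python
def lee_bounds(y_treat, n_treat_assigned, y_control, n_control_assigned):
    """Trimming bounds on the treatment effect among respondents.

    y_treat and y_control hold outcomes of survey respondents only;
    the n_*_assigned arguments count everyone assigned to each arm.
    """
    rate_treat = len(y_treat) / n_treat_assigned
    rate_control = len(y_control) / n_control_assigned
    if rate_treat < rate_control:
        raise ValueError("this sketch only trims the treatment group")

    # Share of treated respondents to drop so response rates match.
    trim_share = (rate_treat - rate_control) / rate_treat
    k = int(round(trim_share * len(y_treat)))
    ordered = sorted(y_treat)
    keep = len(ordered) - k
    control_mean = sum(y_control) / len(y_control)

    lower = sum(ordered[:keep]) / keep - control_mean  # drop the k highest outcomes
    upper = sum(ordered[k:]) / keep - control_mean     # drop the k lowest outcomes
    return lower, upper


# Hypothetical data: 8 of 10 treated and 6 of 10 control students responded.
lower_bound, upper_bound = lee_bounds([1, 2, 3, 4, 5, 6, 7, 8], 10,
                                      [2, 3, 4, 5, 6, 7], 10)
```

In the toy data, a quarter of the treated respondents (two students) are trimmed, and the bounds bracket the untrimmed difference in means.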
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression, conditional on X_i, with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals
fail to rule out large positives and negatives. Additional heterogeneity results can be found in the
Online Appendix.[32]
VI. Conclusion
Most admissions committees at bachelor's degree-granting institutions rely on applicants' AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students' skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.[33] We also find that AP science classes
substantially increase students' stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult, given that few schools document per-course
expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course
assignments offers a partial cost analysis. Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation's high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. "When Should
you Adjust Standard Errors for Clustering?" NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: U.S. Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. "Psychological Insights for Improved
Physics Teaching." Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. "The Effects of High School Curriculum on Education and Labor
Market Outcomes." The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. "Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-
performance Relationship." Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. "Raising the Bar: Curricular Intensity and
Academic Performance." Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. "Shifting
College Majors in Response to Advanced Placement Exam Scores." Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. "Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. "Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching."
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. "Demoting Advanced Placement." The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. "Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students' Accuracy of Feeling of
Confidence." Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. "Playing the Admissions Game:
Student Reactions to Increasing College Competition." The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. "Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed." The Baltimore Sun, August 17.
Bush, George W. 2006. "State of the Union Address by the President." Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. "Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-
olds in 34 Countries." Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. "Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects." Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. "A Little Goes a Long Way: Pressure for College Students to Succeed."
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. "The medium-term impacts of high-achieving
charter schools." Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. "Preparation Matters." National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. "50-State Comparison: Advanced Placement Policies." Education
Commission of the States.
Drew, Christopher. 2011. "Rethinking Advanced Placement." The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. "Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?" Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. "Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit." PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. "Students'
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs." Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. "The Role of Advanced Placement and Honors
Courses in College Admissions." Center for Studies in Higher Education Research
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. "The Labor of Division: Returns to Compulsory Math
Coursework." Unpublished Manuscript.
Harel, O. 2009. "The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation." Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. "Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data." Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
"The Impact of Advanced Placement Incentive Programs." Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. "Weigh the Benefits, Stress of AP Courses for Your Student." U.S. News
& World Report, May 10.
Huber, Martin. 2013. "A Simple Test for the Ignorability of Non-compliance in Experiments."
Economics Letters 120 (3): 389–391.
Imbens, Guido W., and Charles F. Manski. 2004. "Confidence Intervals for Partially Identified Parameters."
Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. "A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program." Journal of Human Resources 45 (3): 591–639.
__________. 2014. "Do College-Preparatory Programs Improve Long-Term Outcomes?"
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. "Is there a Causal Effect of High
School Math on Labor Market Outcomes?" Journal of Human Resources 44 (1): 171–198.
Kim, Emily. 2015. "AP Classes often Translate to Advanced Pressure." Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. "Do Grade Weights Promote More Advanced
Course-Taking?" Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. "The Link Between Advanced Placement
Experience and Early College Success." Southern Economic Journal 75 (3): 873–891.
__________. 2010. "Advanced Placement Participation: Evaluating the Policies of States and
Colleges." In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. "The Case of Carla:
Dilemmas of Helping All Students to Understand Science." Science Education 86 (3):
287–313.
Lee, David S. 2009. "Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects." The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. "Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines." Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. "The Benefit of Additional High school Math
and Science Classes for Young Men and Women." Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. "Breaking it Down: Engineering Student
STEM Confidence at the Intersection of Race/Ethnicity and Gender." Research in Higher
Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. "Effects of High School Course-taking
on Secondary and Postsecondary Success." American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. "Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?" Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. "AP at Scale: Public School Students in Advanced Placement, 1990–2013."
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. "Are AP Courses Worth the Stress?" WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. "The Condition of
Education 2018: Public High School Graduation Rates" (NCES 2018-144).
U.S. Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. "Learning and Understanding: Improving Advanced Study of
Mathematics and Science in U.S. High Schools." Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. "STEM for All." February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. "Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures."
Version 1.0. Stanford University.
Rose, Heather. 2004. "Has Curriculum Closed the Test Score Gap in Math?" Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. "The Effect of High School Courses on Earnings." The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. "Breaking Down School Budgets." Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. "The Role of
Advanced High School Coursework in Increasing STEM Career Interest." Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. "Accounting for Advanced High School Coursework
in College Admission Decisions." College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. "Measuring Students' Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation."
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. "Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes." Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. "Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence." Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. "Confidence Judgments in Studies of Individual
differences." Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. "Overconfidence Across World Regions." Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. "Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth." The New York Times, April 29.
Tai, Robert H. 2008. "Posing Tougher Questions about the Advanced Placement Program."
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. "The IB Diploma Programme Statistical Bulletin."
Theokas, Christina, and Reid Saaris. 2013. "Finding America's Missing AP and IB Students."
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. "Dual Credit and
Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look" (NCES 2013-
001). U.S. Department of Education. Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. "AP Classes are a Scam." Atlantic, October 13.
Tucker, Jill. 2012. "Stressful AP Courses: A Push for a Cap." SF Gate.
U.S. Department of Education. 2014. "Education Department Awards 40 States, DC, and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests." Washington, DC.
Weinstein, Paul. 2016. "Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement." Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. "Promise and Paradox: Measuring
Students' Non-Cognitive Skills and the Impact of Schooling." Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. "The Relation of Strength of Stimulus to Rapidity
of Habit-Formation." Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district's
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor's degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district's average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district's enrollment. The dashed line is a lowess curve created using
Stata's default settings and roughly shows the predicted test score as a function of the
neighborhood's SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other U.S. High Schools and High School
Science Teachers

Panel A: Schools                                    Participating    Others
Average Enrollment                                          1,409       723
Free or Reduced-Price Lunch                                 0.700     0.438
Asian                                                       0.055     0.050
Black                                                       0.349     0.154
Hispanic                                                    0.410     0.221
White                                                       0.164     0.537
Adjusted Cohort Graduation Rate                             0.843     0.802
District Instruction Expenditures Per Pupil                $6,561    $5,636
District Student Services Expenditures Per Pupil           $3,787    $3,385

Panel B: Teachers                                   Participating    Others
Age: Under 30                                               0.407     0.160
Age: 30-49                                                  0.432     0.553
Age: 50 or over                                             0.161     0.287
Female                                                      0.630     0.536
Hispanic or Latino                                          0.111     0.051
Race: American Indian or Alaska Native                      0.000     0.009
Race: Asian American                                        0.111     0.041
Race: Black                                                 0.111     0.060
Race: Native Hawaiian or other Pacific Islander             0.000     0.004
Race: White                                                 0.778     0.896
Years of Experience                                          10.3      13.2
Years of Experience <=2                                     0.290     0.085
Years of Experience <=5                                     0.481     0.234
Hold a Teaching Certificate                                 0.926     0.945
Undergraduate Major in STEM                                 0.944     0.747
Single Subject Credential in Science                        0.630     0.823
Master's Degree or Higher                                   0.356     0.615
Previously Taught AP Course                                 0.469        NA
Previously Taught AP, IB, or Honors Course                  0.796        NA
Number of Professional Development Trainings
  in the Past 5 years (0-5)                                  3.09        NA

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). "Others" in Panel A refers to other public
high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). "Others" in Panel B refers to public and private high school
teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics
Columns (1) and (4): Control Group Mean
Columns (2) and (5): Difference Between Treated and Controls
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers
                                              Full Sample                Survey Sample
Pre-Treatment Characteristic                  (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade               16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                               0.38     0.08     0.25     0.44     0.07     0.30
                                                       (0.04)   (0.10)            (0.05)   (0.16)
                                                       [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                            0.29     0.10     0.18     0.36     0.09     0.17
                                                       (0.03)   (0.12)            (0.04)   (0.17)
                                                       [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                        3.16     0.05     0.20     3.23     0.06     0.13
                                                       (0.03)   (0.08)            (0.03)   (0.10)
                                                       [0.14]   [0.02]            [0.06]   [0.20]
Female                                        0.59     0.00     0.10     0.61     -0.01    0.11
                                                       (0.03)   (0.06)            (0.04)   (0.07)
                                                       [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                0.12     0.02     0.10     0.12     0.03     0.10
                                                       (0.02)   (0.05)            (0.01)   (0.07)
                                                       [0.27]   [0.06]            [0.07]   [0.12]
Black                                         0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                       (0.02)   (0.06)            (0.02)   (0.05)
                                                       [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial     0.31     0.01     0.05     0.33     0.01     0.05
                                                       (0.02)   (0.06)            (0.02)   (0.07)
                                                       [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                      0.02     0.00     -0.01    0.01     0.00     -0.01
                                                       (0.01)   (0.01)            (0.01)   (0.01)
                                                       [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                        0.13     0.03     0.00     0.14     0.02     0.01
                                                       (0.02)   (0.05)            (0.02)   (0.09)
                                                       [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                      0.05     0.01     0.02     0.04     0.01     0.04
                                                       (0.01)   (0.02)            (0.01)   (0.03)
                                                       [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch      0.53     0.01     0.02     0.51     0.01     0.07
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home    0.34     0.02     0.03     0.35     0.01     0.04
                                                       (0.02)   (0.07)            (0.02)   (0.07)
                                                       [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses         0.79     0.00     0.09     0.79     0.02     0.05
                                                       (0.02)   (0.04)            (0.02)   (0.05)
                                                       [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                        1,819                      1,417
Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed
effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in
brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment
                                        Full Sample                Survey Respondents
                                        (1)      (2)      (3)      (4)      (5)      (6)
                                        Control                    Control
                                        Group                      Group
Outcome                                 Mean     ITT      LATE     Mean     ITT      LATE
AP Treatment Course Enrollment          0.19     0.38              0.24     0.39
                                                 (0.05)                     (0.06)
                                                 [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                            0.03     0.04     0.11     0.03     0.04     0.10
                                                 (0.01)   (0.01)            (0.01)   (0.01)
                                                 [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                0.13     0.04     0.11     0.14     0.04     0.10
                                                 (0.01)   (0.02)            (0.01)   (0.02)
                                                 [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                 (0.01)   (0.02)            (0.01)   (0.02)
                                                 [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                    0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                 (0.01)   (0.02)            (0.01)   (0.03)
                                                 [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                       0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                 (0.01)   (0.02)            (0.01)   (0.02)
                                                 [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                           0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                 (0.01)   (0.03)            (0.01)   (0.03)
                                                 [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                  1,819                      1,417
Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
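As a rough illustration of how the ITT and LATE columns relate, the sketch below applies the standard Wald ratio: with a single randomized offer as the instrument, the LATE equals the ITT effect on the outcome divided by the ITT effect on AP course enrollment (the first stage). This is an assumption about the general estimator, not a reproduction of Equations (1) and (2), which also include covariates and fixed effects. The inputs read the Table 3 entries as 0.04 (ITT on the AP Science credit share) and 0.38 (first stage, full sample); because both are rounded, the ratio only approximately matches the reported LATE.

```python
def wald_late(itt: float, first_stage: float) -> float:
    """Ratio of the ITT effect on the outcome to the ITT effect on take-up."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage


# Rounded values read from Table 3, full sample (assumed decimal placement).
itt_ap_science_share = 0.04   # ITT on share of credits in AP Science
first_stage_full = 0.38       # ITT on AP Treatment Course Enrollment

late = wald_late(itt_ap_science_share, first_stage_full)
print(round(late, 2))  # prints 0.11
```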
Table 4
Treatment Contrast (Composite Variables)
                                                (1)      (2)      (3)
                                                Control
                                                Group
                                                Complier
Outcome                                         Mean     ITT      LATE
Academically Challenging Curriculum             -0.33    0.31     0.80
                                                         (0.10)   (0.24)
                                                         [0.00]   [0.00]
Project-Based Independent Classroom Activities  -0.06    0.13     0.33
                                                         (0.07)   (0.17)
                                                         [0.07]   [0.06]
Integrated Use of Technology                    -0.11    0.11     0.28
                                                         (0.08)   (0.19)
                                                         [0.19]   [0.14]
Number of Observations                          1,417
Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
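The composite construction described in the notes to Table 4 can be sketched as follows. This is a minimal illustration under stated assumptions: responses are coded as integers from 0 (lowest category) to 4 (highest), the standardization uses the population standard deviation across all students in the toy sample, and the function names and data are hypothetical. The authors' exact coding and reference sample for standardization are not given in the notes.

```python
from statistics import mean, pstdev


def rescale(response: int, n_categories: int) -> float:
    """Map an ordinal response coded 0..n_categories-1 to evenly spaced values in [0, 1]."""
    return response / (n_categories - 1)


def composite_scores(responses, n_categories=5):
    """Average each student's rescaled items, then standardize across students."""
    averages = [mean(rescale(r, n_categories) for r in row) for row in responses]
    mu, sigma = mean(averages), pstdev(averages)  # assumption: population SD
    return [(a - mu) / sigma for a in averages]


# Hypothetical data: three students answering three five-point items.
toy = [[4, 3, 4], [2, 2, 1], [0, 1, 0]]
scores = composite_scores(toy)
print([round(s, 2) for s in scores])
```

By construction the resulting scores have mean 0 and standard deviation 1 in the sample used for standardization, which is why the control group complier means in Table 4 can be negative.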
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades
                                (1)      (2)      (3)
                                Control
                                Group
                                Complier
Outcome                         Mean     ITT      LATE
Science Skill                   -0.19    0.09     0.23
                                         (0.06)   (0.16)
                                         [0.15]   [0.14]
STEM Interest                   0.62     0.04     0.09
                                         (0.02)   (0.07)
                                         [0.16]   [0.16]
Confidence in College Science   0.92     -0.04    -0.10
                                         (0.02)   (0.05)
                                         [0.11]   [0.06]
Stress                          0.12     0.07     0.17
                                         (0.03)   (0.07)
                                         [0.02]   [0.01]
Grades in Science Courses       2.80     -0.12    -0.29
                                         (0.07)   (0.16)
                                         [0.08]   [0.07]
Grades in Other Courses         3.14     -0.07    -0.18
                                         (0.02)   (0.06)
                                         [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes
Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability
to complete a college science course, or = 0 if somewhat not confident or not at all confident.
Stress = 1 if most recent science course had strong negative or negative impact on physical or
emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science
and other courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
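The inverse probability weighting mentioned in these notes (and defined in note 20 as one over the fitted probit probability of completing the survey) can be sketched as below. This is only an illustration: the fitted probit index values are hypothetical, the probit itself is not estimated here, and a simple weighted mean stands in for the weighted regressions the authors actually run.

```python
from statistics import NormalDist


def ipw_weight(probit_index: float) -> float:
    """Inverse of the fitted survey-completion probability, 1 / Phi(index)."""
    return 1.0 / NormalDist().cdf(probit_index)


def weighted_mean(values, weights):
    """Weighted average of values using the supplied weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical fitted probit indexes for four survey respondents.
indexes = [1.5, 0.8, 0.0, -0.3]
weights = [ipw_weight(z) for z in indexes]

# Hypothetical binary outcome (for example, reporting stress) for the same respondents.
outcomes = [1, 0, 1, 1]
print(round(weighted_mean(outcomes, weights), 3))
```

Respondents whose characteristics made survey completion less likely (lower index) receive larger weights, so they stand in for similar students who did not respond.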
Table 6
Robustness Checks of Main ITT Results
Columns (1)-(12), left to right: Control Group Complier Mean; Main Results; Robust SE; p-value
(permutation test); Excluding High School 56; Including Imputation of Missing Outcome
Variables; Excluding Covariates; Excluding High School 23; Lee Lower Bound; Lee Upper Bound;
95% Confidence Interval from Lee Bounds; Ratio of 95% CI in (11) to 95% CI in (7)
Outcome
Science Skill                  -0.19  0.09  0.10  0.11  0.20  0.07  0.03  0.39  -0.09 to 0.51  2.0
                               (0.06) (0.05) (0.00) (0.00) (0.00) (0.00) (0.07) (0.07)
                               [0.15] [0.06] [0.06] [0.20] [0.11] [0.01] [0.24] [0.72] [0.00]
STEM Interest                  0.62  0.04  0.05  0.03  0.03  0.03  0.02  0.12  -0.03 to 0.18  1.9
                               (0.02) (0.03) (0.00) (0.00) (0.00) (0.00) (0.03) (0.04)
                               [0.16] [0.19] [0.20] [0.09] [0.29] [0.27] [0.19] [0.60] [0.00]
Confidence in College Science  0.92  -0.04  -0.03  -0.06  -0.06  -0.04  -0.06  0.05  -0.09 to 0.10  2.0
                               (0.02) (0.02) (0.00) (0.00) (0.00) (0.00) (0.02) (0.03)
                               [0.11] [0.05] [0.07] [0.37] [0.02] [0.03] [0.10] [0.00] [0.17]
Stress                         0.12  0.07  0.05  0.06  0.08  0.07  0.01  0.11  -0.05 to 0.15  1.6
                               (0.03) (0.02) (0.00) (0.00) (0.00) (0.00) (0.03) (0.02)
                               [0.02] [0.00] [0.00] [0.14] [0.07] [0.02] [0.02] [0.79] [0.00]
Grades in Science Courses      2.80  -0.12  -0.06  -0.10  -0.07
                               (0.07) (0.04) (0.00) (0.00) (0.00)
                               [0.08] [0.01] [0.01] [0.31] [0.16] [0.30]
Grades in Other Courses        3.14  -0.07  -0.07  -0.06  -0.03
                               (0.02) (0.03) (0.00) (0.00) (0.00)
                               [0.00] [0.01] [0.01] [0.00] [0.01] [0.38]
Remaining columns for the two grades rows: Not applicable, as grades are based on transcripts
rather than student survey
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to
derive confidence interval for the treatment effect itself).
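The permutation test described for column (4) can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses a raw difference in means as the treatment effect (the paper's estimates come from regressions with covariates and School x Cohort fixed effects), it shuffles labels across the whole sample rather than within schools, and the data are hypothetical. The p-value is the share of pseudo assignments whose absolute effect exceeds the absolute observed effect, as in the notes.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments with a larger absolute effect than observed."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # reassign a pseudo treatment, keeping group sizes fixed
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical outcomes for four treated and four control students.
outcomes = [5.1, 4.8, 6.0, 5.5, 3.9, 4.2, 4.0, 4.4]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
p = permutation_p_value(outcomes, treated)
print(p)
```

In this toy sample every treated outcome exceeds every control outcome, so no pseudo assignment can produce a larger absolute difference and the p-value is zero.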
1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students'
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3 The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5 Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students' confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7 We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8 In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9 Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10 Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11 Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12 We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13 Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14 Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15 Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16 Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.
17 The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19 One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-
Chemistry and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20 To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where
$\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the
end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous
equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal
distribution function. The results of this regression are included in Online Appendix Table 2.
Students who had higher pre-treatment grades, Black students, those who were not disabled, and
those who took prerequisite courses were more likely to complete the survey. The inverse
probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives more weight in the regression
to study participants who completed the survey and yet had pre-study characteristics that were
similar to those study participants who did not complete the survey. These weights range from
1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a
weight less than 2.0.
21 We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22 Detailed course-by-course impacts are available from the authors.
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum, with more homework, than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative, inquiry-based environment.
25 In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27 Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29 To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30 This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
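The Benjamini-Hochberg correction referenced in note 29 follows a standard step-up rule: sort the p-values, find the largest rank k whose p-value is at most (k / m) times the chosen level, and reject the k smallest. The sketch below illustrates that rule only; the p-values are hypothetical and are not the study's, and the function name is an assumption made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that the k-th smallest p-value <= (k / m) * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis whose rank is at or below the cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected


# Hypothetical p-values for six outcomes.
example = [0.001, 0.012, 0.03, 0.2, 0.5, 0.8]
print(benjamini_hochberg(example))
# prints [True, True, False, False, False, False]
```

Because the rule is step-up, a p-value that fails its own threshold can still be rejected when a larger p-value passes a later threshold.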
of AP. Consistent with the goals of an AP course, treatment group students report that their
courses are more challenging and inquiry-based than control group students. These views are
shared by teachers, who report a higher level of rigor in their AP science courses compared to
their other science courses. We find suggestive evidence that this academic challenge leads to
increases in skill: AP course-takers score 0.23 standard deviations higher than control group
compliers on the end-of-year assessment of scientific skill. Though our precision prevents us
from ruling out zero treatment effects at traditional levels of statistical inference (p-value = 0.14),
this large point estimate suggests genuine productivity gains for students who take AP science
over and above the gains experienced by students who enroll in other high school courses. We
also find suggestive evidence of an AP science boost to students' interest in pursuing a STEM
degree should they enroll in college. Together, these results fail to support the concern that the
AP program's impact on human capital has been oversold.
At the same time, our results confirm that the workload and expectations of an AP science
class cause students to lose confidence in their ability to succeed in college-level science, gain
stress, and earn lower grades (prior to the weights that are often attached to AP grades by
secondary and postsecondary institutions). The confidence levels among study participants are
quite high, with 92 percent of control group compliers reporting that they are "somewhat" or
"extremely" confident in their ability to succeed in a college science course. AP course-takers
report a 10-percentage-point lower estimation of their ability. Students in the AP course are also
more than twice as likely as control group compliers to report that the course negatively affected
their physical or emotional health (our measure of stress). And comparisons of transcripts reveal
that treatment group students earned lower preweighted grades in science and other subjects
during the year that they took the AP class.
Our study contributes to a small research base on the effects of the AP program.² Using a
regression discontinuity design, Smith, Hurwitz, and Avery (2017) show that students who
barely earn a college-credit equivalent score on the AP exam (e.g., scoring just above the
threshold necessary to receive a 3 on the exam (out of 5)) are more likely to complete their
bachelor's degrees in four years than students who fall just below that threshold. In a related
paper that relies on the same data and design, Avery et al. (2018) demonstrate that AP exam
scores also influence students' college major choices. These compelling results demonstrate that
students take advantage of postsecondary AP credit policies to waive out of intro courses and
that receiving a higher AP exam score may serve as a signal of skill to both institutions and
students. These two studies, however, do not show that AP courses per se led to skill
development, as they focus solely on differences in behavior for AP exam-takers who fall just
below and just above the score thresholds. Jackson (2010, 2014) evaluates the impacts of the AP
Incentive Program, which offers cash incentives to teachers and students for passing scores on
AP exams, as well as funds for training teachers and convening teams of teachers to align pre-AP
curriculum with the needs of the AP class. Jackson identifies impact from variation in the timing
of program implementation across high schools in Texas and finds large positive treatment
effects on AP courses and exams (2010). The AP Incentive Program also increased students'
college going and persistence, as well as their labor market earnings (Jackson 2010, 2014). These
two studies indicate that the AP Incentive Program increased AP participation and subsequent
educational attainment and labor market performance. However, it is not clear whether these
results would hold in the absence of the Incentive Program.
We build on these findings and inform policy and practice in several ways. Most important,
we directly test one of the main mechanisms through which AP is expected to influence students'
attainment and earnings: by increasing their skill and interest in the subject matter. We determine
whether skill and interest gains, as distinct from college admissions and credit-granting policies,
are key drivers behind AP's impact on later outcomes. This distinction is important given that
less than half of AP course-takers earn a credit-granting score on the AP exam, either because
they do not take the exam or because they obtain low scores (National Research Council 2002;
College Board 2018). Many selective colleges are also increasingly making it difficult for
students to receive credit for their AP exam scores. Most top institutions restrict the number of
AP subject areas that are eligible, only offer credit or waivers for very high scores on the exams,
or cap the total amount of AP credit that a student can receive (Weinstein 2016). In 2012,
Dartmouth College announced that it would no longer grant credit for any AP exam score, a
policy shared by several other selective institutions, including Amherst College, Brown
University, and the California Institute of Technology (Weinstein 2016). Our results, which
generalize to a newly offered AP course, suggest that AP endows students with human capital
even if it does not grant them the opportunity to earn credit at their preferred college. For college
admissions officers, the findings also suggest that AP course-taking offers a reasonable signal of
students' skill and subject-matter interest. Our estimated effects on skill and STEM interest are
somewhat limited by insufficient precision, yet they represent the first and most credible
evidence to date on the impact of AP on these key outcomes.
Our study is also among the first known to us that quantifies the AP impact on students'
grades. We find that students who take an AP science course earn lower grades in science (by
0.29 grade points) and lower grades in their other courses (by 0.18 grade points). The lower
grades in science are driven by the lower grade received in the AP class, a negative effect that
many secondary and postsecondary institutions offset by upweighting AP grades. The estimates
suggest that students' AP science grade would have to be inflated by a factor of 1.46 (e.g., a C
would have to be converted to approximately a B+) to remove the net negative on overall grade
point average (GPA). While many high schools, including those that participated in our study,
weight students' GPAs to adjust for the academic difficulty of the courses, practices vary
substantially across institutions (Sadler and Tai 2007; Klopfenstein and Lively 2016). In a recent
survey of Texas high schools, for instance, Klopfenstein and Lively (2016) find that most
schools with AP courses used weights but that they ranged from 0.5 to 1 point (with a small
number assigning more than 1 extra point). Our findings suggest that the current practices at
many institutions under-adjust for the grade penalty from AP courses. In addition, attaching
weight to AP grades cannot undo the learning loss that may occur when students shift their effort
away from non-AP coursework.
We also contribute to other strands of literature on the relationship between students'
academic achievement and their perceptions of their own confidence and stress. Prior literature
on the relationship between students' confidence in their ability and their true ability is rife with
mixed results (Boekaerts and Rozendaal 2010; Stankov and Crawford 1996; Stankov 2013).
Psychologists have also documented an inverted U-shaped relationship between perceived
pressure and performance, where some amount of stress is necessary to increase achievement,
yet too much stress can reduce students' ability to gain knowledge (Anderson 1976; Davis 2014;
Yerkes and Dodson 1908). We find that students taking an AP science class experience cognitive
gains concurrent with losses in their academic confidence. This finding is consistent with
evidence that many US students are highly confident in their skills and that this noncognitive
belief often interferes with their ability to learn (Chiu and Klassen 2010; Stankov and Lee 2014).
The AP course appears to reduce students' estimation of their own ability, either by changing the
standard to which they compare themselves or by making them more aware of the challenges
they might face in a college course. Whether these changes in perceived confidence persist and
how they influence later outcomes is uncertain. Students with expectation levels that match the
real demands of college courses might eventually perform better in those courses. Some students
might also use the insights they gain from a challenging AP science class to shift away from
difficult science courses in college (or entire majors) that could delay or hinder their college
completion. Our results also suggest that AP causes a significant amount of stress for students,
but we do not find evidence that the added pressure substantially limits their knowledge gains in
science.
II. AP Science and Conceptual Framework
A. AP and Other Rigorous Secondary School Courses
The AP program is an appealing option for high school administrators who seek to offer
college-level courses to their students. AP course descriptions and assignments are designed to match
those offered in introductory college courses in each subject and thus to prepare students for the
rigor of college coursework. The College Board (the "Board" for brevity) is a not-for-profit
organization that administers AP and provides professional development for teachers, reviews of
course syllabi, and extensive curricular materials (e.g., sample syllabi, sample lab experiments).³
The Board also offers standardized AP exams in the spring of each year that are graded by
external examiners and provide an externally validated measure of student learning. Most exams
include both an essay or problem-solving component and multiple-choice questions, all of which
are aligned with the course descriptions. The exam is one of the key features of the AP program
and is used by high school and postsecondary educators to evaluate the depth of students' skill
independently of teacher bias.
In addition to AP courses, high school students typically have three alternative options for
advanced coursework. Most high schools offer "honors" courses, which are intended to provide a
more rigorous curriculum than the regular course in the same subject. The content and rigor of
honors courses vary across high schools, and there is no standardized honors exam offered to
students in these courses. A second option is the International Baccalaureate (IB) program,
which was originally designed for students in international schools and aims to develop
students' critical thinking skills and their knowledge of international affairs. The IB program is
offered worldwide but remains relatively uncommon in the United States, with less than 5 percent
of high schools offering IB in 2016 (The IB Programme 2016). A final option is for students to
take a course at a nearby college (or online) or, for some, a course that is taught at their high
school by an instructor who has been approved as college-level. These "dual enrollment" or
"dual credit" courses are meant to provide students with the opportunity to simultaneously earn
high school and college credit. In the most recent national survey, high schools reported
approximately two million enrollments in dual credit courses (Thomas et al. 2013). There is
limited information on the colleges that accept dual enrollment credits. Most courses are offered
through collaborations between high schools and local community and public postsecondary
institutions, suggesting that credits are generally accepted at these institutions and less often
accepted at other institutions. Comparisons of AP science classes to regular and honors-level
science classes reveal that students receive much more homework and work harder in their AP
classes (Sadler et al. 2014). To our knowledge, there have been no comparisons of the workload
or effort in AP science courses compared to IB or dual enrollment science courses.
B. Conceptual Framework
There are several channels through which an AP science class is expected to influence students'
cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to
provide rigorous content and a substantial workload, be taught by teachers who have high
expectations, and consist of students who are driven to succeed. These inputs (course rigor,
teacher expectations, and peer motivation) are often thought of as the main characteristics that
distinguish AP courses from other high school courses.
Yet AP science classes are also intended to offer an inquiry-based approach to science that,
when combined with a high level of rigor, provides an additional causal pathway to change.
Specifically, a well-implemented AP science course should encourage students to ask questions,
gather and interpret data, arrive at explanations grounded in scientific principles, and
communicate their observations to one another under the guidance of teachers (College Board
2011a, 2011b).⁴ This student-led, inquiry-based approach differs from many traditional
secondary school science classrooms, where the goal is often for students to memorize content
and replicate laboratory experiments that demonstrate the content (National Research Council
2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world
practices of science and the skills that form the basis of scientific inquiry by focusing more on
big-picture concepts and small-group experimentation, with students directing the inquiry. The
curriculum also encourages teachers to move away from lecture-based pedagogy and
multiple-choice quizzes and to increase their use of technology to help students analyze data, draw
interpretations, and communicate findings (College Board 2011a, 2011b).
AP science classes are expected to increase students' ability to ask research questions, design
experiments, analyze data, and draw conclusions. In the process of gaining these scientific
inquiry skills, the new curriculum is intended to spur greater interest in the practice of science
because it becomes more enjoyable and more accessible to students for whom rote memorization
and execution of prefabricated lab experiments might have diminished enthusiasm in the subject
(National Research Council 2012). Science experts posit that inquiry-based science courses will
be particularly successful in generating greater interest and skill among women and among
students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis,
Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler,
Samuelson, and Lorah 2014).
While the rigor and expectations of a college course may be appropriate for some students, they
can be too demanding for others. Students often report high levels of stress and burnout from
taking AP courses, particularly if they perceive that they are not prepared for the challenge of
college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact,
cause students to lose confidence in their ability to complete college science courses. A number
of mechanisms could cause students to lose confidence, including exposure to stronger peers,
inability to successfully complete assignments, or simply receiving lower grades than they
received in their non-AP courses.⁵ The AP effect on confidence will likely matter differently for
students with different levels of initial confidence. For students who are over-confident in their
ability to succeed in college science courses, taking a challenging AP course in high school
might cause them to revise their expectations to be more in line with the higher demands of
college-level work.
Taking a more strenuous AP course is also likely to affect students' time allocation.
Students' performance in each class will be determined by their subject-specific ability as well as
the amount of time they devote to their coursework versus other activities, including work,
extracurriculars, and leisure. If AP courses are more demanding than other courses, students
solving a time allocation problem may shift more effort into their AP course, away from other
pursuits. The impact of this change in time allocation on students' performance in AP and other
courses will depend upon whether they shift effort away from other courses and on the degree of
complementarity between their AP science course and their other courses. Study time devoted to
an AP science course could improve student performance in other math and science classes
(where the skills, tasks, and knowledge are similar) even if students spend less time on those
courses. For courses that require students to perform tasks that are not complementary with AP
science (e.g., courses in the humanities), taking AP science concurrently with these courses
could decrease student performance in both courses. Of course, students taking an AP course
could choose to reduce time spent on alternative (non-academic) activities. If these other
activities have no causal impact on performance in school, then the impact on overall
achievement could be negligible.
Some students report concerns about their time allocation as they weigh the decision to enroll
in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these
concerns have increased over time as the courses have become more accessible to students who
previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to
students with high grades in prerequisite classes, and the courses were only offered in schools
with substantial resources. The Board has made efforts to increase access with, for instance, a
policy statement that encourages schools to open AP to all students who are "willing to accept
the challenge" and remove all barriers that restrict access (College Board 2002).⁶ In a 2008
survey of a nationally representative sample, 65 percent of secondary school teachers reported
that their schools encourage as many students as possible to take AP, and 69 percent reported that
AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009).
These open access policies have led to complaints that students who enroll with less preparation
will be less able to engage with the material (and perhaps become more discouraged by the
difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg
2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students
through negative peer effects or through teachers removing content and slowing the pace of
course delivery.
III. AP Science Impact Study
A. Overview
We recruited 23 schools from across the United States and offered monetary compensation to
pay for equipment and teacher training and as an incentive to secure participation.⁷ Eligible
schools included ones that had not offered AP Biology or AP Chemistry in recent years, were
willing to add such a course and comply with study protocol, and had more eligible students than
could be served in one class, so as to supply a sufficiently sized control group.⁸ Of the 23
schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both
courses. We recruited two waves of schools (those that offered the course for the first time in
2013 and those that offered it for the first time in 2014); both waves were asked to field the
course for two years, and the earlier-joining schools had the option of fielding the course for
three years. The study includes 47 school-by-cohort groups.
Each participating school identified students that the school deemed eligible to take the new
AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students
who assented to participate in the study and who obtained consent from their parent or guardian
as study participants. Upon receipt of signed consent/assent forms, we randomly offered
enrollment in the newly launched course to a subset of participating students.⁹ The study
includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students
per AP class).
Figure 1 shows the geographic distribution of the 11 participating districts, which are
primarily concentrated in the western, southern, and eastern regions of the country.¹⁰ The
underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern
region has experienced less competition over the years in access to selective postsecondary
institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long
2009). Relative to districts across the nation, those participating in the study tend to be in
neighborhoods with lower levels of socioeconomic status and to educate students who score
below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools
tend to be larger and more likely to educate students who are eligible for free or reduced-price
lunch, Black, and Hispanic than other schools (Panel A of Table 1).
There are two reasons for this over-representation of larger schools serving less economically
prosperous communities. First, AP courses are already offered in the majority of the nation's
public high schools, and schools that serve students from high-income families tend to offer
more AP subjects than schools that serve students from lower-income families (Malkus 2016;
Theokas and Saaris 2013). Given that our research design only allowed for schools that had not
recently offered an AP science course, the population of schools from which we recruited tended
to be those in settings with fewer resources. Second, participating schools were required to state
that they believed they would have 60 or more students who were qualified to take the AP
science course, and this requirement tended to disqualify smaller high schools.
Reflecting the school demographics, participating teachers are slightly younger, less
experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity
than U.S. high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our
study teachers have less than or equal to five (two) years of teaching experience, which is more
than double (triple) the rate of U.S. high school science teachers. Study teachers are more likely to
hold an undergraduate major in a STEM field than other high school science teachers, yet far less
likely to hold a master's degree and slightly less likely to have earned a teaching credential in
science. Most of the participating teachers had previously taught a higher-level course (mostly
honors), yet only 47 percent of them had previously taught an AP course. Our research
consequently applies to a population of teachers who are relatively new to the AP science
curriculum and who have generally not received graduate training.¹¹ Assuming AP courses
improve with teacher preparation, our results likely capture the effect of a less-than-ideal version
of AP and may result in less positive treatment effects than when AP is delivered by teachers
with more training and experience (Clotfelter, Ladd, and Vigdor 2010).
B. Data and Student Descriptive Statistics
We rely on three primary and secondary data sources for impact estimates. The first is an
assessment, developed and validated by the research team, that measures students' scientific
inquiry skills. We administered this assessment to students in both treatment and control groups
and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific
content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items
that rely on science disciplinary knowledge that is taught in middle school, specifically material
from Life Sciences and Physical Sciences. The assessment, which we administered to all study
participants during a 45-minute period, measures students' skills in data analysis, scientific
explanation, and scientific argument.¹² Participating teachers were not provided copies of the
instrument in advance; therefore, teachers were unable to teach any content material prior to test
administration.
The second source is a questionnaire that we administered concurrently with the assessment
and that asks students a number of questions about their most recent science class and their plans
after high school. The assessment and questionnaire were completed together and administered
outside of class (henceforth, we refer to these instruments as the "survey"). The third data source
is students' high school transcripts, which contain data on demographic and socioeconomic
background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high
school completion. We use these data to determine the balance of randomization on
pre-treatment covariates, estimate the effect of randomization on course-taking (including
compliance), improve the precision of our estimates with statistical controls, and estimate
treatment effects on students' grades.
Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences
during the dates scheduled for survey administration and communication lapses between school
coordinators and students. Students who were randomly assigned to treatment have a
9-percentage-point higher survey response rate. Given the possibility of nonrandom sample
attrition, we weight all regressions by the inverse of the probability of completing the survey
conditional on student characteristics.¹⁴ We implement a variety of robustness checks as
additional means to account for nonresponse. These include multiple imputation of missing
outcome variables, excluding one high school that had a low response rate, and using the Lee
(2009) technique to provide bounds on the estimated effects. These methods and results are
discussed below.
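The inverse-probability weighting idea can be illustrated with a minimal Python sketch. It is not the study's procedure: the response probability here is estimated as the simple response share within a hypothetical grouping of students (`cell`), whereas the text describes a probability conditional on student characteristics whose exact model is not given in this section. The roster and cell labels are made up for illustration.

```python
from collections import defaultdict


def inverse_probability_weights(students):
    """Return one weight per survey respondent: 1 / P(respond | cell).

    `students` is a list of (cell, responded) pairs. `cell` is a hypothetical
    grouping of student characteristics; estimating the response probability
    as the within-cell response share is an assumption of this sketch.
    """
    totals = defaultdict(int)
    responders = defaultdict(int)
    for cell, responded in students:
        totals[cell] += 1
        if responded:
            responders[cell] += 1

    weights = []
    for cell, responded in students:
        if responded:
            response_rate = responders[cell] / totals[cell]
            # Respondents from low-response cells count for more.
            weights.append(1.0 / response_rate)
    return weights


# Made-up roster: half of the first cell responds, all of the second cell does.
roster = [
    ("cell_a", True), ("cell_a", False), ("cell_a", True), ("cell_a", False),
    ("cell_b", True), ("cell_b", True), ("cell_b", True), ("cell_b", True),
]
weights = inverse_probability_weights(roster)
```

In this toy roster the weighted respondents add back up to the full roster size, which is the sense in which the weights let respondents stand in for similar nonrespondents.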
We supplement these data with surveys that we administered online to teachers of the new
AP courses at the conclusion of the course. The teacher survey includes questions about their
educational background; professional experiences and professional development; past and
present instructional practices, generally and around science specifically; participation in the
College Board AP training; ability to cover the content of the AP course; and coaching,
mentoring, and other professional community supports received from the school district and
education community.
Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the
survey respondents, conditional on school-by-cohort fixed effects.¹⁵ Most of the estimated
differences between treatment and control group students on pre-treatment observed
characteristics are small, with some notable exceptions. In the full and survey samples,
treatment group students' reading exam scores were 0.10 and 0.09 standard deviations higher,
respectively, than those of control group students, both at p-values below 0.05. The magnitude of
the treatment-control difference was slightly lower and less precisely estimated in math, yet also
favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student
covariates as predictors of outcomes in the models, and in the robustness checks we exclude these
covariates.¹⁷
Table 2 also shows the extent of differences between control group compliers and
non-compliers. We find that non-compliers are generally much more academically prepared for AP
science: they have higher pre-treatment reading and math test scores and are more likely to have
completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian
American and female.¹⁸
IV. Empirical Strategy
We estimate the effect of taking the AP science course with a standard instrumental variable
specification:
(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$
(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$
where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$;
$\widehat{AP}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2);
$\mathit{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a
vector of pre-treatment covariates (including age; math and reading exam scores from 8th and 10th
grade (standardized and averaged for math and reading separately); cumulative GPA prior to the
year when the AP science course was offered; and indicator variables for female, racial group
(Asian American, Black or Hispanic, Native American or Multiracial), disability, gifted, English
Language Learner, eligible for free or reduced-price lunch, home language is not English, and took
recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school-by-cohort fixed effects.¹⁹ We use
two-stage least squares to estimate the model for all outcomes. The local average treatment effect
(LATE) estimate is given by $\beta$.
The intent-to-treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $\mathit{Offered}_{ij}$ in
Equation (1), as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the
effect of being offered enrollment in the new AP science course and is a weighted average of
effects on those who do and do not choose to enroll in the course:
(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
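With a single binary instrument, the two-stage least squares coefficient equals the ITT coefficient divided by the first-stage coefficient (the Wald ratio), provided both are estimated with the same controls. The Python sketch below illustrates that ratio under simplifying assumptions: it drops the covariates, fixed effects, and survey weights, and the function names and toy records are hypothetical. The last line plugs in two figures quoted elsewhere in the paper (the 0.09 ITT estimate for science skill and the 39-percentage-point first stage in the survey sample).

```python
def mean(values):
    return sum(values) / len(values)


def wald_late(itt_effect, first_stage):
    """LATE implied by an ITT effect and a first-stage enrollment effect."""
    if first_stage == 0:
        raise ValueError("first stage is zero; the offer did not move enrollment")
    return itt_effect / first_stage


def wald_from_records(records):
    """Compute the Wald ratio from (offered, enrolled, outcome) tuples.

    Simplification (an assumption of this sketch): plain differences in
    means stand in for the regressions in Equations (2) and (3).
    """
    offered = [r for r in records if r[0] == 1]
    control = [r for r in records if r[0] == 0]
    first_stage = mean([r[1] for r in offered]) - mean([r[1] for r in control])
    itt = mean([r[2] for r in offered]) - mean([r[2] for r in control])
    return wald_late(itt, first_stage)


# Toy data: half of the offered students enroll, and enrolling adds 1.0.
toy_records = [
    (1, 1, 2.0), (1, 1, 2.0), (1, 0, 1.0), (1, 0, 1.0),
    (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
]
toy_late = wald_from_records(toy_records)

# Figures quoted in the paper: ITT of 0.09 sd and a 0.39 first stage.
science_skill_late = wald_late(0.09, 0.39)
```

The ratio of the quoted figures is about 0.23, in line with the LATE for scientific inquiry skills reported in the results section.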
For outcomes that are obtained from the survey, we weight regressions by the inverse of the
estimated probability of completing the survey.²⁰ The results are similar without using these
weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student
characteristics as a result of either missing student transcripts or certain data elements not
collected by the district, we use multiple imputation by chained equations, creating 10 imputed
datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of
treatment assignment (school by cohort) in our analysis of main effects. In the analysis of
robustness, we report permutation standard errors, robust standard errors (for comparison to
permutations), and the statistical significance of the LATE estimates after adjusting our tests of
significance for multiple comparisons.
V. Results
A. Course-Taking and Treatment Contrast
Table 3 provides estimated effects of the randomized offer of enrollment on AP science course
enrollment and share of credits in all courses for the full sample and the survey samples. The
first-stage estimates indicate that the offer substantially increased the likelihood of the student
taking the AP science course, by 38 percentage points in the full sample and 39 percentage points
in the survey sample. As we expected, compliance with randomization was imperfect, with 42
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling. Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course, while a few of these
came from hardship exemptions that were requested by the school and granted by the study team.
The remaining rows in Table 3 shed light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage-point increase in the share of total credits in the full sample).
Treatment group students' share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).²²
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13 percent), Physics (12 percent), AP Environmental Science (11 percent), Biology (10 percent),
Honors Biology (9 percent), and Anatomy/Physiology (9 percent).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our
results also suggest that AP course-takers' classrooms were more likely to use technology (up
0.28 sd, p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that, while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than other science classes.²⁴ This contrast between students' reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.²⁵
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers' interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with, again, more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
Table 5 provides stronger evidence of negative treatment effects on students' confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology by signing the
study assent forms. Taking AP science substantially lowered participants' likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students'
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students' self-reported confidence and stress levels. We find that taking AP science increases
students' likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students' grades suffered. We estimate that taking AP science reduced students' grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students' science GPAs during the study year (usually their junior
year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students' grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students' share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students' overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
With our estimates in hand, we can easily compute the adjustment that would leave the
student's GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students' grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).²⁷
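The grade arithmetic in the two preceding paragraphs can be reproduced with a short Python sketch. The constants are the rounded figures quoted in the text; the function and variable names are hypothetical. With rounded inputs the offsetting boost comes out near 1.5 rather than exactly 1.46, so the reported 1.46 presumably rests on unrounded estimates (an assumption, since those are not shown in this section).

```python
# Rounded estimates quoted in the text.
SCIENCE_GRADE_EFFECT = -0.29    # effect on science course grades
OTHER_GRADE_EFFECT = -0.18      # effect on non-science course grades
SCIENCE_CREDIT_SHARE = 0.24     # share of credits in science for AP takers
AP_SCIENCE_CLASS_SHARE = 0.14   # share of classes in any AP science subject


def overall_grade_effect(science_effect, other_effect, science_share):
    """Credit-share-weighted average of the two grade effects."""
    return science_effect * science_share + other_effect * (1 - science_share)


def offsetting_ap_boost(overall_effect, ap_share):
    """Grade points added to AP science grades that would cancel the GPA drop."""
    return -overall_effect / ap_share


overall = overall_grade_effect(
    SCIENCE_GRADE_EFFECT, OTHER_GRADE_EFFECT, SCIENCE_CREDIT_SHARE
)
# Boost using the overall effect rounded to two decimals, as in the text.
boost_from_rounded = offsetting_ap_boost(round(overall, 2), AP_SCIENCE_CLASS_SHARE)
# Boost using the overall effect before rounding.
boost_from_unrounded = offsetting_ap_boost(overall, AP_SCIENCE_CLASS_SHARE)
```

The weighted average works out to about -0.206, which rounds to the -0.21 reported above.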
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.²⁹
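The logic of the permutation test can be sketched in a few lines of Python. This is a simplified illustration, not the study's procedure: it reshuffles treatment labels across the whole sample rather than within school-by-cohort strata, uses a raw difference in means in place of the regression-adjusted ITT estimate, and counts ties as at least as extreme. The data and function names are made up.

```python
import random


def mean(values):
    return sum(values) / len(values)


def difference_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated_y = [y for y, d in zip(outcomes, treated) if d]
    control_y = [y for y, d in zip(outcomes, treated) if not d]
    return mean(treated_y) - mean(control_y)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-assignments with an effect at least as large in
    absolute value as the observed one."""
    rng = random.Random(seed)
    observed = abs(difference_in_means(outcomes, treated))
    pseudo = list(treated)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling keeps the number of treated units fixed.
        rng.shuffle(pseudo)
        if abs(difference_in_means(outcomes, pseudo)) >= observed:
            extreme += 1
    return extreme / n_permutations


# Made-up outcomes with a large gap between treated and control units.
outcomes = [5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4, 5]
treated = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_value = permutation_p_value(outcomes, treated)
```

A stratified version would shuffle labels separately within each randomization block so that every pseudo-assignment respects the original design.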
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all
of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower
and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.³¹
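The trimming step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the estimator used in the study: it ignores covariates, fixed effects, and survey weights, it assumes the treatment group has the higher response rate, and it omits the Imbens and Manski (2004) confidence interval. The function name and example numbers are hypothetical.

```python
from statistics import mean


def lee_bounds(treat_outcomes, control_outcomes, n_treat_assigned, n_control_assigned):
    """Return (lower, upper) trimming bounds on the difference in means.

    treat_outcomes and control_outcomes hold outcomes for survey respondents
    only; n_*_assigned are the numbers originally randomized to each arm.
    """
    rate_t = len(treat_outcomes) / n_treat_assigned
    rate_c = len(control_outcomes) / n_control_assigned
    if rate_t < rate_c:
        raise ValueError("sketch assumes treatment response rate >= control rate")
    # Share of treatment respondents to trim so the response rates match.
    trim_share = (rate_t - rate_c) / rate_t
    n_trim = int(round(trim_share * len(treat_outcomes)))
    ordered = sorted(treat_outcomes)
    kept_for_lower = ordered[: len(ordered) - n_trim]  # drop the highest values
    kept_for_upper = ordered[n_trim:]                  # drop the lowest values
    control_mean = mean(control_outcomes)
    return mean(kept_for_lower) - control_mean, mean(kept_for_upper) - control_mean


if __name__ == "__main__":
    # Hypothetical data: 8 of 10 treated and 6 of 10 controls responded.
    lower, upper = lee_bounds([1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 6, 7], 10, 10)
    print(lower, upper)
```

The wider the gap in response rates, the more observations are trimmed and the further apart the two bounds fall.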
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression conditional on Xi with science
skill as the outcome. We find that the point estimates at every quantile are not significantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence
intervals fail to rule out large positive and negative effects. Additional heterogeneity results can
be found in the Online Appendix.³²
VI. Conclusion
Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of US high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.³³ We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult given that few schools document per-course
expenditures. One recent analysis of a US district that relied on teacher salaries and course
assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: US Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the
Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among
Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving
Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education Research
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished Manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” US News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, G., and C. F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.”
Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1):
171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering
Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in
Higher Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018 (NCES 2018-144): Public High School Graduation Rates.” (NCES
2018-144). US Department of Education, Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in US High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.”
Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.”
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.”
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and
Exam-Based Courses in US Public High Schools: 2010-11. First Look.” (NCES
2013-001). US Department of Education, Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
US Department of Education. 2014. “Education Department Awards 40 States, DC, and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age: Under 30                                       0.407           0.160
Age: 30-49                                          0.432           0.553
Age: 50 or over                                     0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <= 2                            0.290           0.085
Years of Experience <= 5                            0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). Others in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). Others in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4): Control Group Mean
(2), (5): ITT
(3), (6): LATE

Outcome                                     (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment              0.19     0.38              0.24     0.39
                                                     (0.05)                     (0.06)
                                                     [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                0.03     0.04     0.11     0.03     0.04     0.10
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                    0.13     0.04     0.11     0.14     0.04     0.10
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                    0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                        0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                           0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                               0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                     (0.01)   (0.03)            (0.01)   (0.03)
                                                     [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                      1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
Table 4
Treatment Contrast (Composite Variables)

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                         (1)      (2)      (3)
Academically Challenging Curriculum             -0.33    0.31     0.80
                                                         (0.10)   (0.24)
                                                         [0.00]   [0.00]
Project-Based Independent Classroom Activities  -0.06    0.13     0.33
                                                         (0.07)   (0.17)
                                                         [0.07]   [0.06]
Integrated Use of Technology                    -0.11    0.11     0.28
                                                         (0.08)   (0.19)
                                                         [0.19]   [0.14]
Number of Observations                          1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
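The composite-variable construction described in these notes can be sketched in Python as follows. This is an illustrative reading of the notes, not the study's code: the integer coding of response categories, the use of the population standard deviation, the omission of survey weights, and the function names are all assumptions.

```python
from statistics import mean, pstdev


def rescale(code, n_categories):
    """Map an ordinal code 0..n-1 (0 = lowest category) to evenly spaced 0.0..1.0."""
    return code / (n_categories - 1)


def composite_scores(responses, n_categories=5):
    """Average each student's rescaled component items, then standardize.

    responses: one list of ordinal codes per student. Assumes the averaged
    scores are not all identical (otherwise the standard deviation is zero).
    """
    raw = [mean([rescale(c, n_categories) for c in student]) for student in responses]
    mu = mean(raw)
    sigma = pstdev(raw)  # assumption: population SD; the notes do not say which
    return [(r - mu) / sigma for r in raw]


if __name__ == "__main__":
    # Hypothetical coding: strongly disagree = 0, ..., strongly agree = 4.
    students = [[0, 1, 0], [2, 2, 3], [4, 4, 3]]
    print(composite_scores(students))
```

Standardizing after averaging puts each composite on a common scale, so the ITT and LATE columns can be read in standard deviation units.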
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                         (1)      (2)      (3)
Science Skill                                   -0.19    0.09     0.23
                                                         (0.06)   (0.16)
                                                         [0.15]   [0.14]
STEM Interest                                   0.62     0.04     0.09
                                                         (0.02)   (0.07)
                                                         [0.16]   [0.16]
Confidence in College Science                   0.92     -0.04    -0.10
                                                         (0.02)   (0.05)
                                                         [0.11]   [0.06]
Stress                                          0.12     0.07     0.17
                                                         (0.03)   (0.07)
                                                         [0.02]   [0.01]
Grades in Science Courses                       2.80     -0.12    -0.29
                                                         (0.07)   (0.16)
                                                         [0.08]   [0.07]
Grades in Other Courses                         3.14     -0.07    -0.18
                                                         (0.02)   (0.06)
                                                         [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
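As a rough check on how the ITT and LATE columns relate, the Python sketch below applies the Wald ratio, in which the LATE is approximately the ITT effect on the outcome divided by the ITT effect on AP course enrollment. This is an assumption about the relationship rather than the authors' estimation code; the reported LATE values come from estimating variants of Equation (1), so ratios of rounded table entries will match only approximately. The function name is hypothetical.

```python
def wald_late(itt_outcome, itt_takeup):
    """Wald ratio: ITT effect on the outcome divided by ITT effect on take-up."""
    return itt_outcome / itt_takeup


if __name__ == "__main__":
    # Science skill ITT from Table 5 (0.09) and the first-stage ITT for
    # survey respondents from Table 3 (0.39).
    print(round(wald_late(0.09, 0.39), 2))  # 0.23
```

For science skill the ratio reproduces the reported LATE of 0.23; for other outcomes, rounding of the published ITT estimates can produce small discrepancies.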
Table 6
Robustness Checks of Main ITT Results
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
Column headings, in the order printed:
Outcome
Control Group Complier Mean
Main Results
Robust SE
p-value (permutation test)
Excluding High School 56
Including Imputation of Missing Outcome Variables
Excluding Covariates
Excluding High School 23
Lee Lower Bound
Lee Upper Bound
95% Confidence Interval from Lee Bounds
Ratio of 95% CI in (11) to 95% CI in (7)
Science Skill                  -0.19  0.09  0.10  0.11  0.20  0.07  0.03  0.39  -0.09 to 0.51  2.0
(0.06) (0.05) (0.00) (0.00) (0.00) (0.00) (0.07) (0.07)
[0.15] [0.06] [0.06] [0.20] [0.11] [0.01] [0.24] [0.72] [0.00]
STEM Interest                  0.62  0.04  0.05  0.03  0.03  0.03  0.02  0.12  -0.03 to 0.18  1.9
(0.02) (0.03) (0.00) (0.00) (0.00) (0.00) (0.03) (0.04)
[0.16] [0.19] [0.20] [0.09] [0.29] [0.27] [0.19] [0.60] [0.00]
Confidence in College Science  0.92  -0.04  -0.03  -0.06  -0.06  -0.04  -0.06  0.05  -0.09 to 0.10  2.0
(0.02) (0.02) (0.00) (0.00) (0.00) (0.00) (0.02) (0.03)
[0.11] [0.05] [0.07] [0.37] [0.02] [0.03] [0.10] [0.00] [0.17]
Stress                         0.12  0.07  0.05  0.06  0.08  0.07  0.01  0.11  -0.05 to 0.15  1.6
(0.03) (0.02) (0.00) (0.00) (0.00) (0.00) (0.03) (0.02)
[0.02] [0.00] [0.00] [0.14] [0.07] [0.02] [0.02] [0.79] [0.00]
Grades in Science Courses      2.80  -0.12  -0.06  -0.10  -0.07
(0.07) (0.04) (0.00) (0.00) (0.00)
[0.08] [0.01] [0.01] [0.31] [0.16] [0.30]
Grades in Other Courses        3.14  -0.07  -0.07  -0.06  -0.03
(0.02) (0.03) (0.00) (0.00) (0.00)
[0.00] [0.01] [0.01] [0.00] [0.01] [0.38]
(Remaining columns for the two grade outcomes: not applicable, as grades are based on transcripts
rather than student survey.)
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP Biology and AP Chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95 percent confidence interval from Lee bounds (applying Imbens and Manski’s (2004) method to
derive a confidence interval for the treatment effect itself).
1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the
Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally often without distinctions between AP
and other rigorous course options Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses particularly those in math and science on studentsrsquo
high school postsecondary and labor market performance (eg Altonji 1995 Attewell and
Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long
Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004) 3 The process by which a course is designated AP involves two steps Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. If the course raises the standard to which students
compare themselves, their confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course and classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading score and one math
score for each student, equal to the average of the 8th and 10th grade scores or, where the 10th
grade score is missing, the 8th grade score alone. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-
Chemistry and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60 percent of the treated students chose Biology, then
we randomly assign 60 percent of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i \rho + \epsilon_{ij})$,
where $\text{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school-by-cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school-by-cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\mu_j + X_i \rho)$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90 percent of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple-choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science made students more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation; encouraging participation is one of the main
purposes of the weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg correction for multiple comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
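As an illustration of the survey nonresponse weights described in note 20, the following minimal Python sketch computes a single inverse probability weight from an already-fitted probit index. It is not the authors' code: the function name, arguments, and example values are hypothetical, and fitting the probit regression itself is outside the sketch.

```python
from statistics import NormalDist


def inverse_probability_weight(school_cohort_effect, covariates, coefficients):
    """Return 1 / Phi(mu_j + X_i * rho) for one student.

    school_cohort_effect: fitted fixed effect mu_j (assumed given).
    covariates: the student's pre-treatment characteristics X_i.
    coefficients: fitted probit coefficients rho, in the same order.
    """
    index = school_cohort_effect + sum(
        x * r for x, r in zip(covariates, coefficients)
    )
    # Phi is the standard normal cumulative distribution function.
    response_probability = NormalDist().cdf(index)
    return 1.0 / response_probability


# Hypothetical student with a predicted response probability of 0.5.
weight = inverse_probability_weight(0.0, [1.0, 0.0], [0.0, 0.3])
```

A respondent whose characteristics predict a low chance of completing the survey receives a large weight, while a respondent who was almost certain to respond receives a weight close to 1.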
attainment and earnings by increasing their skill and interest in the subject matter. We determine
whether skill and interest gains, as distinct from college admissions and credit-granting policies,
are key drivers behind AP's impact on later outcomes. This distinction is important given that
less than half of AP course-takers earn a credit-granting score on the AP exam, either because
they do not take the exam or because they obtain low scores (National Research Council 2002;
College Board 2018). Many selective colleges are also increasingly making it difficult for
students to receive credit for their AP exam scores. Most top institutions restrict the number of
AP subject areas that are eligible, only offer credit or waivers for very high scores on the exams,
or cap the total amount of AP credit that a student can receive (Weinstein 2016). In 2012,
Dartmouth College announced that it would no longer grant credit for any AP exam score, a
policy shared by several other selective institutions, including Amherst College, Brown
University, and the California Institute of Technology (Weinstein 2016). Our results, which
generalize to a newly offered AP course, suggest that AP endows students with human capital
even if it does not grant them the opportunity to earn credit at their preferred college. For college
admissions officers, the findings also suggest that AP course-taking offers a reasonable signal of
students' skill and subject-matter interest. Our estimated effects on skill and STEM interest are
somewhat limited by insufficient precision, yet they represent the first and most credible
evidence to date on the impact of AP on these key outcomes.
Our study is also among the first known to us that quantifies the AP impact on students'
grades. We find that students who take an AP science course earn lower grades in science (by
0.29 grade points) and lower grades in their other courses (by 0.18 grade points). The lower
grades in science are driven by the lower grade received in the AP class, a negative effect that
many secondary and postsecondary institutions offset by upweighting AP grades. The estimates
suggest that students' AP science grade would have to be inflated by a factor of 1.46 (e.g., a C
would have to be converted to approximately a B+) to remove the net negative effect on overall grade
point average (GPA). While many high schools, including those that participated in our study,
weight students' GPAs to adjust for the academic difficulty of the courses, practices vary
substantially across institutions (Sadler and Tai 2007; Klopfenstein and Lively 2016). In a recent
survey of Texas high schools, for instance, Klopfenstein and Lively (2016) find that most
schools with AP courses used weights but that they ranged from 0.5 to 1 point (with a small
number assigning more than 1 extra point). Our findings suggest that the current practices at
many institutions under-adjust for the grade penalty from AP courses. In addition, attaching
weight to AP grades cannot undo the learning loss that may occur when students shift their effort
away from non-AP coursework.
We also contribute to other strands of literature on the relationship between students'
academic achievement and their perceptions of their own confidence and stress. Prior literature
on the relationship between students' confidence in their ability and their true ability is rife with
mixed results (Boekaerts and Rozendaal 2010; Stankov and Crawford 1996; Stankov 2013).
Psychologists have also documented an inverted U-shaped relationship between perceived
pressure and performance, where some amount of stress is necessary to increase achievement,
yet too much stress can reduce students' ability to gain knowledge (Anderson 1976; Davis 2014;
Yerkes and Dodson 1908). We find that students taking an AP science class experience cognitive
gains concurrent with losses in their academic confidence. This finding is consistent with
evidence that many US students are highly confident in their skills and that this noncognitive
belief often interferes with their ability to learn (Chiu and Klassen 2010; Stankov and Lee 2014).
The AP course appears to reduce students' estimation of their own ability, either by changing the
standard to which they compare themselves or by making them more aware of the challenges
they might face in a college course. Whether these changes in perceived confidence persist, and
how they influence later outcomes, is uncertain. Students with expectation levels that match the
real demands of college courses might eventually perform better in those courses. Some students
might also use the insights they gain from a challenging AP science class to shift away from
difficult science courses in college (or entire majors) that could delay or hinder their college
completion. Our results also suggest that AP causes a significant amount of stress for students,
but we do not find evidence that the added pressure substantially limits their knowledge gains in
science.
II. AP Science and Conceptual Framework
A. AP and Other Rigorous Secondary School Courses
The AP program is an appealing option for high school administrators who seek to offer
college-level courses to their students. AP course descriptions and assignments are designed to match
those offered in introductory college courses in each subject and thus to prepare students for the
rigor of college coursework. The College Board (the "Board" for brevity) is a not-for-profit
organization that administers AP and provides professional development for teachers, reviews of
course syllabi, and extensive curricular materials (e.g., sample syllabi, sample lab experiments).3
The Board also offers standardized AP exams in the spring of each year that are graded by
external examiners and provide an externally validated measure of student learning. Most exams
include both an essay or problem-solving component and multiple-choice questions, all of which
are aligned with the course descriptions. The exam is one of the key features of the AP program
and is used by high school and postsecondary educators to evaluate the depth of students' skill
independently of teacher bias.
In addition to AP courses, high school students typically have three alternative options for
advanced coursework. Most high schools offer "honors" courses, which are intended to provide a
more rigorous curriculum than the regular course in the same subject. The content and rigor of
honors courses vary across high schools, and there is no standardized honors exam offered to
students in these courses. A second option is the International Baccalaureate (IB) program,
which was originally designed for students in international schools and aims to develop
students' critical thinking skills and their knowledge of international affairs. The IB program is
offered worldwide but remains relatively uncommon in the United States, with less than 5 percent
of high schools offering IB in 2016 (The IB Programme 2016). A final option is for students to
take a course at a nearby college (or online) or, for some, a course that is taught at their high
school by an instructor who has been approved as college-level. These "dual enrollment" or
"dual credit" courses are meant to provide students with the opportunity to simultaneously earn
high school and college credit. In the most recent national survey, high schools reported
approximately two million enrollments in dual credit courses (Thomas et al. 2013). There is
limited information on the colleges that accept dual enrollment credits. Most courses are offered
through collaborations between high schools and local community and public postsecondary
institutions, suggesting that credits are generally accepted at these institutions and less often
accepted at other institutions. Comparisons of AP science classes to regular and honors level
science classes reveal that students receive much more homework and work harder in their AP
classes (Sadler et al. 2014). To our knowledge, there have been no comparisons of the workload
or effort in AP science courses with that in IB or dual enrollment science courses.
B. Conceptual Framework
There are several channels through which an AP science class is expected to influence students'
cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to
provide rigorous content and a substantial workload, be taught by teachers who have high
expectations, and consist of students who are driven to succeed. These inputs (course rigor,
teacher expectations, and peer motivation) are often thought of as the main characteristics that
distinguish AP courses from other high school courses.
Yet AP science classes are also intended to offer an inquiry-based approach to science that,
when combined with a high level of rigor, provides an additional causal pathway to change.
Specifically, a well-implemented AP science course should encourage students to ask questions,
gather and interpret data, arrive at explanations grounded in scientific principles, and
communicate their observations to one another under the guidance of teachers (College Board
2011a, 2011b).4 This student-led, inquiry-based approach differs from many traditional
secondary school science classrooms, where the goal is often for students to memorize content
and replicate laboratory experiments that demonstrate the content (National Research Council
2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world
practices of science and the skills that form the basis of scientific inquiry by focusing more on
big-picture concepts and small group experimentation, with students directing the inquiry. The
curriculum also encourages teachers to move away from lecture-based pedagogy and
multiple-choice quizzes and to increase their use of technology to help students analyze data, draw
interpretations, and communicate findings (College Board 2011a, 2011b).
AP science classes are expected to increase students' ability to ask research questions, design
experiments, analyze data, and draw conclusions. As students gain these scientific
inquiry skills, the new curriculum is intended to spur greater interest in the practice of science,
which becomes more enjoyable and more accessible to students for whom rote memorization
and execution of prefabricated lab experiments might have diminished enthusiasm for the subject
(National Research Council 2012). Science experts posit that inquiry-based science courses will
be particularly successful in generating greater interest and skill among women and among
students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis,
Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler,
Samuelson, and Lorah 2014).
While the rigor and expectations of a college course may be appropriate for some students, they
can be too demanding for others. Students often report high levels of stress and burnout from
taking AP courses, particularly if they perceive that they are not prepared for the challenge of
college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact,
cause students to lose confidence in their ability to complete college science courses. A number
of mechanisms could cause students to lose confidence, including exposure to stronger peers,
inability to successfully complete assignments, or simply receiving lower grades than they
received in their non-AP courses.5 The AP effect on confidence will likely matter differently for
students with different levels of initial confidence. For students who are over-confident in their
ability to succeed in college science courses, taking a challenging AP course in high school
might cause them to revise their expectations to be more in line with the higher demands of
college-level work.
Taking a more strenuous AP course is also likely to affect students' time allocation.
Students' performance in each class will be determined by their subject-specific ability as well as
the amount of time they devote to their coursework versus other activities, including work,
extracurricular activities, and leisure. If AP courses are more demanding than other courses, students
solving a time allocation problem may shift more effort into their AP course and away from other
pursuits. The impact of this change in time allocation on students' performance in AP and other
courses will depend upon whether they shift effort away from other courses and on the degree of
complementarity between their AP science course and their other courses. Study time devoted to
an AP science course could improve student performance in other math and science classes
(where the skills, tasks, and knowledge are similar) even if students spend less time on those
courses. For courses that require students to perform tasks that are not complementary with AP
science (e.g., courses in the humanities), taking AP science concurrently with these courses
could decrease student performance in both courses. Of course, students taking an AP course
could choose to reduce time spent on alternative (non-academic) activities. If these other
activities have no causal impact on performance in school, then the impact on overall
achievement could be negligible.
Some students report concerns about their time allocation as they weigh the decision to enroll
in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these
concerns have increased over time as the courses have become more accessible to students who
previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to
students with high grades in prerequisite classes, and the courses were only offered in schools
with substantial resources. The Board has made efforts to increase access with, for instance, a
policy statement that encourages schools to open AP to all students who are "willing to accept
the challenge" and remove all barriers that restrict access (College Board 2002).6 In a 2008
survey of a nationally representative sample, 65 percent of secondary school teachers reported
that their schools encourage as many students as possible to take AP, and 69 percent reported that
AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009).
These open access policies have led to complaints that students who enroll with less preparation
will be less able to engage with the material (and perhaps become more discouraged by the
difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg
2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students
through negative peer effects or through teachers removing content and slowing the pace of
course delivery.
III. AP Science Impact Study
A. Overview
We recruited 23 schools from across the United States and offered monetary compensation to
pay for equipment and teacher training and as an incentive to secure participation.7 Eligible
schools included ones that had not offered AP Biology or AP Chemistry in recent years, were
willing to add such a course and comply with study protocol, and had more eligible students than
could be served in one class, so as to supply a sufficiently sized control group.8 Of the 23
schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both
courses. We recruited two waves of schools (those that offered the course for the first time in
2013 and those that offered it for the first time in 2014); both waves were asked to field the
course for two years, and the earlier-joining schools had the option of fielding the course for
three years. The study includes 47 school-by-cohort groups.
In the spring of the prior year, each participating school identified the students it deemed
eligible to take the new AP Biology or Chemistry course. We treated all eligible students
who assented to participate in the study and who obtained consent from their parent or guardian
as study participants. Upon receipt of signed consent/assent forms, we randomly offered
enrollment in the newly launched course to a subset of participating students.9 The study
includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students
per AP class).
Figure 1 shows the geographic distribution of the 11 participating districts, which are
primarily concentrated in the western, southern, and eastern regions of the country.10 The
underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern
region has experienced less competition over the years in access to selective postsecondary
institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long
2009). Relative to districts across the nation, those participating in the study tend to be in
neighborhoods with lower levels of socioeconomic status and to educate students who score
below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools
tend to be larger and more likely than other schools to educate students who are Black, Hispanic,
or eligible for free or reduced-price lunch (Panel A of Table 1).
There are two reasons for this over-representation of larger schools serving less economically
prosperous communities. First, AP courses are already offered in the majority of the nation's
public high schools, and schools that serve students from high-income families tend to offer
more AP subjects than schools that serve students from lower-income families (Malkus 2016;
Theokas and Saaris 2013). Given that our research design only allowed for schools that had not
recently offered an AP science course, the population of schools from which we recruited tended
to be those in settings with fewer resources. Second, participating schools were required to state
that they believed they would have 60 or more students who were qualified to take the AP
science course, and this requirement tended to disqualify smaller high schools.
Reflecting the school demographics, participating teachers are slightly younger, less
experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity
than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our
study teachers have less than or equal to five (two) years of teaching experience, which is more
than double (triple) the rate of US high school science teachers. Study teachers are more likely to
hold an undergraduate major in a STEM field than other high school science teachers, yet far less
likely to hold a master's degree and slightly less likely to have earned a teaching credential in
science. Most of the participating teachers had previously taught a higher-level course (mostly
honors), yet only 47 percent of them had previously taught an AP course. Our research
consequently applies to a population of teachers who are relatively new to the AP science
curriculum and who have generally not received graduate training.11 Assuming AP courses
improve with teacher preparation, our results likely capture the effect of a less-than-ideal version
of AP and may result in less positive treatment effects than when AP is delivered by teachers
with more training and experience (Clotfelter, Ladd, and Vigdor 2010).
B. Data and Student Descriptive Statistics
We rely on three primary and secondary data sources for impact estimates. The first is an
assessment, developed and validated by the research team, that measures students’ scientific
inquiry skills. We administered this assessment to students in both treatment and control groups
and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific
content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items
that rely on science disciplinary knowledge that is taught in middle school, specifically material
from Life Sciences and Physical Sciences. The assessment, which we administered to all study
participants during a 45-minute period, measures students’ skills in data analysis, scientific
explanation, and scientific argument.¹² Participating teachers were not provided copies of the
instrument in advance; therefore, teachers were unable to teach any content material prior to test
administration.
The second source is a questionnaire that we administered concurrently with the assessment
and that asks students a number of questions about their most recent science class and their plans
after high school. The assessment and questionnaire were completed together and administered
outside of class (henceforth, we refer to these instruments as the “survey”). The third data source
is students’ high school transcripts, which contain data on demographic and socioeconomic
background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high
school completion. We use these data to determine the balance of randomization on
pre-treatment covariates, estimate the effect of randomization on course-taking (including
compliance), improve the precision of our estimates with statistical controls, and estimate
treatment effects on students’ grades.
Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences
during the dates scheduled for survey administration and communication lapses between school
coordinators and students. Students who were randomly assigned to treatment have a
9-percentage-point higher survey response rate. Given the possibility of nonrandom sample
attrition, we weight all regressions by the inverse of the probability of completing the survey,
conditional on student characteristics.¹⁴ We implement a variety of robustness checks as
additional means to account for nonresponse. These include multiple imputation of missing
outcome variables, excluding one high school that had a low response rate, and using the Lee
(2009) technique to provide bounds on the estimated effects. These methods and results are
discussed below.
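The weighting step described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the response probability is estimated as the response rate within cells of a single hypothetical characteristic (`prior_score_band`). The study conditions on a richer set of student characteristics, and its estimation model is not reproduced here. All data are synthetic.

```python
from collections import Counter

def response_weights(cells, responded):
    """Weight each respondent by 1 / (response rate in its cell).

    cells      -- hypothetical grouping variable, one label per student
    responded  -- True if the student completed the survey
    Non-respondents get weight 0.0 because they contribute no outcome.
    """
    assigned = Counter(cells)
    answered = Counter(c for c, r in zip(cells, responded) if r)
    return [assigned[c] / answered[c] if r else 0.0
            for c, r in zip(cells, responded)]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Synthetic example: 4 "low" and 4 "high" students; low-band students respond less often.
prior_score_band = ["low"] * 4 + ["high"] * 4
responded = [True, True, False, False, True, True, True, True]
outcome = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]  # full-sample mean is 2.0

weights = response_weights(prior_score_band, responded)
unweighted = weighted_mean(outcome, [1.0 if r else 0.0 for r in responded])
reweighted = weighted_mean(outcome, weights)
```

In this toy example the respondent-only mean overstates the full-sample mean because the higher-scoring cell responds more often, while the reweighted mean recovers it. The correction works only to the extent that nonresponse is explained by the characteristics used to form the weights.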
We supplement these data with surveys that we administered online to teachers of the new
AP courses at the conclusion of the course. The teacher survey includes questions about their
educational background, professional experiences, and professional development; past and
present instructional practices, generally and around science specifically; participation in the
College Board AP training; ability to cover the content of the AP course; and coaching,
mentoring, and other professional community supports received from the school, district, and
education community.
Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the
survey respondents, conditional on school by cohort fixed effects.¹⁵ Most of the estimated
differences between treatment and control group students on pre-treatment observed
characteristics are small, with some notable exceptions. In the full and survey samples,
treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher,
respectively, than those of control group students, both at p-values below 0.05. The magnitude of
the treatment-control difference was slightly lower and less precisely estimated in math, yet also
favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student
covariates as predictors of outcomes in the models, and in the robustness checks we exclude these
covariates.¹⁷
Table 2 also shows the extent of differences between control group compliers and
non-compliers. We find that non-compliers are generally much more academically prepared for AP
science; they have higher pre-treatment reading and math test scores and are more likely to have
completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian
American and female.¹⁸
IV. Empirical Strategy
We estimate the effect of taking the AP science course with a standard instrumental variable
specification:
(1) $Y_{ij} = \alpha_j + \widehat{\mathit{AP}}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$
(2) $\mathit{AP}_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$
where $\mathit{AP}_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$;
$\widehat{\mathit{AP}}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2);
$\mathit{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of
pre-treatment covariates (including age; math and reading exam scores from 8th and 10th grade
(standardized and averaged for math and reading separately); cumulative GPA prior to the year
when the AP science course was offered; and indicator variables for female, racial group (Asian
American, Black or Hispanic, Native American or Multiracial), disability, gifted, English Language
Learner, eligible for free or reduced-price lunch, home language is not English, and took
recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.¹⁹ We use
two-stage least squares to estimate the model for all outcomes. The local average treatment effect
(LATE) estimate is given by $\beta$.
The intent-to-treat (ITT) estimate is obtained by replacing $\widehat{\mathit{AP}}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1),
as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of
being offered enrollment in the new AP science course and is a weighted average of effects on
those who do and do not choose to enroll in the course:
(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
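The relationship between the ITT, the first stage, and the LATE can be sketched in Python for the simplest case. The sketch below drops the covariates, fixed effects, and weights, so two-stage least squares reduces to the Wald ratio of two differences in means. The function names and the data are hypothetical: the enrollment rates echo the compliance figures reported in the Results section (58 percent of offered students and 19 percent of control students enrolling), while the constant effect of 0.5 is an arbitrary assumption.

```python
from statistics import mean

def diff_by_offer(values, offered):
    """Mean of `values` among offered students minus the mean among the rest."""
    treated = [v for v, z in zip(values, offered) if z]
    control = [v for v, z in zip(values, offered) if not z]
    return mean(treated) - mean(control)

def itt_first_stage_late(outcome, enrolled, offered):
    itt = diff_by_offer(outcome, offered)           # analogue of tau in Equation (3)
    first_stage = diff_by_offer(enrolled, offered)  # analogue of theta in Equation (2)
    return itt, first_stage, itt / first_stage      # Wald ratio, analogue of beta

# Synthetic data: 100 offered students (58 enroll) and 100 controls (19 enroll),
# with an assumed constant effect of 0.5 for every student who enrolls.
offered = [1] * 100 + [0] * 100
enrolled = [1] * 58 + [0] * 42 + [1] * 19 + [0] * 81
outcome = [0.5 * d for d in enrolled]

itt, first_stage, late = itt_first_stage_late(outcome, enrolled, offered)
```

With a single instrument and the same covariates, weights, and sample in every equation, the two-stage least squares coefficient equals the ITT divided by the first stage. The reported estimates are roughly consistent with this identity: the 0.09 ITT for science skill divided by the 0.39 first stage in the survey sample is about 0.23, the LATE reported in Table 5.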
For outcomes that are obtained from the survey, we weight regressions by the inverse of the
estimated probability of completing the survey.²⁰ The results are similar without using these
weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student
characteristics as a result of either missing student transcripts or certain data elements not
collected by the district, we use multiple imputation by chained equations, creating 10 imputed
datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of
treatment assignment (school by cohort) in our analysis of main effects. In the analysis of
robustness, we report permutation standard errors, robust standard errors (for comparison to
permutations), and the statistical significance of the LATE estimates after adjusting our tests of
significance for multiple comparisons.
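The text does not spell out how results from the 10 imputed datasets are combined. A common choice is Rubin's rules, sketched below in Python as an assumption rather than as the authors' documented procedure. The function name and the input values are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and squared standard errors (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)                # pooled point estimate
    within = mean(variances)                # average within-imputation variance
    between = variance(estimates)           # sample variance across imputations
    total = within + (1 + 1 / m) * between  # total variance
    return pooled, total ** 0.5             # estimate and pooled standard error

# Hypothetical coefficient estimates and variances from three imputed datasets.
pooled, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what distinguishes this from simply averaging standard errors: it inflates the pooled variance when the imputed datasets disagree about the estimate.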
V. Results
A. Course-Taking and Treatment Contrast
Table 3 provides estimated effects of the randomized offer of enrollment on AP science course
enrollment and share of credits in all courses for the full sample and the survey sample. The
first-stage estimates indicate that the offer substantially increased the likelihood of the student
taking the AP science course, by 38 percentage points in the full sample and 39 percentage points
in the survey sample. As we expected, compliance with randomization was imperfect, with 42
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling. Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course, while a few of these
came from hardship exemptions that were requested by the school and granted by the study team.
The remaining rows in Table 3 shed light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage-point increase in the share of total credits in the full sample).
Treatment group students’ share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).²²
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13 percent), Physics (12 percent), AP Environmental Science (11 percent), Biology (10 percent),
Honors Biology (9 percent), and Anatomy/Physiology (9 percent).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our
results also suggest that AP course-takers’ classrooms were more likely to use technology (up
0.28 sd, p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than other science classes.²⁴ This contrast between students’ reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.²⁵
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with, again, more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
Table 5 provides stronger evidence of negative treatment effects on students’ confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology by signing the
study assent forms. Taking AP science substantially lowered participants’ likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students’
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students’ self-reported confidence and stress levels. We find that taking AP science increases
students’ likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students’ grades suffered. We estimate that taking AP science reduced students’ grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior
year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students’ grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students’ share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
With our estimates in hand, we can easily compute the adjustment that would leave the
student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).²⁷
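The arithmetic in the two paragraphs above can be retraced with a short Python check. It uses only the rounded values quoted in the text, and the variable names are introduced here for illustration.

```python
# Values as reported (rounded) in the text.
effect_science = -0.29           # estimated effect on science grades
effect_other = -0.18             # estimated effect on grades in non-science courses
share_science = 0.24             # share of credits in science when taking AP science
share_ap_science = 0.02 + 0.12   # share of classes in any AP science subject (Table 3)

# Credit-weighted average effect on overall grades during the study year.
overall_effect = effect_science * share_science + effect_other * (1 - share_science)

# Boost to AP science grades that would offset the overall decline.
break_even_boost = -overall_effect / share_ap_science
boost_from_rounded = 0.21 / share_ap_science
```

The weighted average comes out at about -0.21, matching the text. With these rounded inputs the break-even boost lands between roughly 1.47 and 1.50, slightly above the reported 1.46. The gap presumably reflects rounding in the published figures, although the text does not say so.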
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test, where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.²⁹
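The permutation test described above can be sketched in Python as follows. The sketch makes two simplifying assumptions that are not taken from the paper: the effect is a raw difference in means rather than a regression coefficient with covariates and fixed effects, and pseudo treatments are reassigned by shuffling labels within each stratum. All names and data are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(outcomes, treated):
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)

def permutation_p_value(outcomes, treated, strata, n_perm=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    # Group unit indices by stratum so labels are only shuffled within a stratum.
    by_stratum = {}
    for i, s in enumerate(strata):
        by_stratum.setdefault(s, []).append(i)
    exceed = 0
    for _ in range(n_perm):
        pseudo = list(treated)
        for idx in by_stratum.values():
            labels = [treated[i] for i in idx]
            rng.shuffle(labels)
            for i, lab in zip(idx, labels):
                pseudo[i] = lab
        # Strict inequality, following the wording "exceeds" in the text.
        if abs(diff_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_perm

# Synthetic data: two strata, five treated and five control students in each.
strata = ["A"] * 10 + ["B"] * 10
treated = ([1] * 5 + [0] * 5) * 2
outcomes = [1.0, 1.1, 1.2, 0.9, 1.3, 0.0, 0.1, 0.2, -0.1, 0.3] * 2
p_value = permutation_p_value(outcomes, treated, strata)
```

Shuffling within strata keeps the number of treated students in each stratum fixed, which mirrors a design in which assignment is randomized separately within each school and cohort.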
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all
of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower
and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
is warranted in evaluating the effects from outcomes based on the study survey.³¹
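The trimming logic behind these bounds can be sketched in Python. This is a simplified illustration under stated assumptions: it uses raw differences in means, trims a whole number of observations, ignores covariates and survey weights, and does not compute the Imbens and Manski (2004) confidence interval. The function name and data are hypothetical.

```python
from statistics import mean

def lee_bounds(y_treat, n_treat, y_ctrl, n_ctrl):
    """Trimming bounds when the treatment group responds at the higher rate.

    y_treat, y_ctrl -- outcomes of survey respondents in each arm
    n_treat, n_ctrl -- number of students assigned to each arm
    """
    if len(y_treat) / n_treat < len(y_ctrl) / n_ctrl:
        raise ValueError("this sketch only trims the treatment group")
    # Number of treated respondents that would match the control response rate.
    keep = round(n_treat * len(y_ctrl) / n_ctrl)
    ranked = sorted(y_treat)
    ctrl_mean = mean(y_ctrl)
    lower = mean(ranked[:keep]) - ctrl_mean    # drop the highest treated outcomes
    upper = mean(ranked[-keep:]) - ctrl_mean   # drop the lowest treated outcomes
    return lower, upper

# Synthetic example: 8 of 10 treated and 6 of 10 control students respond.
y_treat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_ctrl = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lower, upper = lee_bounds(y_treat, 10, y_ctrl, 10)
```

In the example, two treated respondents are trimmed so that both arms have a 60 percent response rate. The untrimmed difference in means (1.0) lies between the two bounds (0.0 and 2.0), which shows how wide the range can become even with a modest gap in response rates.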
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals
fail to rule out large positives and negatives. Additional heterogeneity results can be found in the
Online Appendix.³²
VI. Conclusion
Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities, disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.³³ We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult, given that few schools document per-course
expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course
assignments offers a partial cost analysis. Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
you Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: U.S. Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the
Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among
Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving
Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education Research &
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished Manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8(1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, Guido W., and Charles F. Manski. 2004. “Confidence Intervals for Partially Identified
Parameters.” Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1):
171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering
Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in
Higher Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018 (NCES 2018-144): Public High School Graduation Rates.”
U.S. Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.”
Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in US Public High Schools: 2010-11. First Look.” (NCES 2013-001). US Department of Education, Washington, DC: National Center for Education Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
US Department of Education. 2014. “Education Department Awards 40 States, DC, and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts: Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s neighborhood, defined as the first principal component factor score based on measures of median income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed household rate, and unemployment rate. Y-axis is the district’s average test score in grade equivalents, based on the averaged spring math and English scores for students in grades 3-8 for 2009-2013, with the expected level of achievement standardized to zero. The size of each circle is proportional to the district’s enrollment. The dashed line is a lowess curve created using Stata’s default settings and roughly shows the predicted test score as a function of the neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill, by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School Science Teachers

Panel A: Schools                                      Participating   Others
Average Enrollment                                    1,409           723
Free or Reduced-Price Lunch                           0.700           0.438
Asian                                                 0.055           0.050
Black                                                 0.349           0.154
Hispanic                                              0.410           0.221
White                                                 0.164           0.537
Adjusted Cohort Graduation Rate                       0.843           0.802
District Instruction Expenditures Per Pupil           $6,561          $5,636
District Student Services Expenditures Per Pupil      $3,787          $3,385

Panel B: Teachers                                     Participating   Others
Age: Under 30                                         0.407           0.160
Age: 30-49                                            0.432           0.553
Age: 50 or over                                       0.161           0.287
Female                                                0.630           0.536
Hispanic or Latino                                    0.111           0.051
Race: American Indian or Alaska Native                0.000           0.009
Race: Asian American                                  0.111           0.041
Race: Black                                           0.111           0.060
Race: Native Hawaiian or other Pacific Islander       0.000           0.004
Race: White                                           0.778           0.896
Years of Experience                                   10.3            13.2
Years of Experience <=2                               0.290           0.085
Years of Experience <=5                               0.481           0.234
Hold a Teaching Certificate                           0.926           0.945
Undergraduate Major in STEM                           0.944           0.747
Single Subject Credential in Science                  0.630           0.823
Master’s Degree or Higher                             0.356           0.615
Previously Taught AP Course                           0.469           NA
Previously Taught AP, IB, or Honors Course            0.796           NA
Number of Professional Development Trainings
in the Past 5 Years (0-5)                             3.09            NA

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). Others in Panel A refers to other public high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). Others in Panel B refers to public and private high school teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4) Control Group Mean; (2), (5) Difference Between Treated and Controls; (3), (6) Difference Between Control Group Non-Compliers and Compliers.

Pre-Treatment Characteristic                  (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade               16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                               0.38     0.08     0.25     0.44     0.07     0.30
                                                       (0.04)   (0.10)            (0.05)   (0.16)
                                                       [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                            0.29     0.10     0.18     0.36     0.09     0.17
                                                       (0.03)   (0.12)            (0.04)   (0.17)
                                                       [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                        3.16     0.05     0.20     3.23     0.06     0.13
                                                       (0.03)   (0.08)            (0.03)   (0.10)
                                                       [0.14]   [0.02]            [0.06]   [0.20]
Female                                        0.59     0.00     0.10     0.61     -0.01    0.11
                                                       (0.03)   (0.06)            (0.04)   (0.07)
                                                       [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                0.12     0.02     0.10     0.12     0.03     0.10
                                                       (0.02)   (0.05)            (0.01)   (0.07)
                                                       [0.27]   [0.06]            [0.07]   [0.12]
Black                                         0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                       (0.02)   (0.06)            (0.02)   (0.05)
                                                       [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial     0.31     0.01     0.05     0.33     0.01     0.05
                                                       (0.02)   (0.06)            (0.02)   (0.07)
                                                       [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                      0.02     0.00     -0.01    0.01     0.00     -0.01
                                                       (0.01)   (0.01)            (0.01)   (0.01)
                                                       [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                        0.13     0.03     0.00     0.14     0.02     0.01
                                                       (0.02)   (0.05)            (0.02)   (0.09)
                                                       [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                      0.05     0.01     0.02     0.04     0.01     0.04
                                                       (0.01)   (0.02)            (0.01)   (0.03)
                                                       [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch      0.53     0.01     0.02     0.51     0.01     0.07
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home    0.34     0.02     0.03     0.35     0.01     0.04
                                                       (0.02)   (0.07)            (0.02)   (0.07)
                                                       [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses         0.79     0.00     0.09     0.79     0.02     0.05
                                                       (0.02)   (0.04)            (0.02)   (0.05)
                                                       [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                        1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4) Control Group Mean; (2), (5) ITT; (3), (6) LATE.

Outcome                                       (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment                0.19     0.38              0.24     0.39
                                                       (0.05)                     (0.06)
                                                       [0.00]                     [0.00]
Share of Credits During Study Year in:
AP Science                                    0.03     0.04     0.11     0.03     0.04     0.10
                                                       (0.01)   (0.01)            (0.01)   (0.01)
                                                       [0.00]   [0.00]            [0.00]   [0.00]
All AP                                        0.13     0.04     0.11     0.14     0.04     0.10
                                                       (0.01)   (0.02)            (0.01)   (0.02)
                                                       [0.00]   [0.00]            [0.00]   [0.00]
Other Advanced Science                        0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                       (0.01)   (0.02)            (0.01)   (0.02)
                                                       [0.24]   [0.23]            [0.20]   [0.20]
All Other Advanced                            0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                       (0.01)   (0.02)            (0.01)   (0.03)
                                                       [0.23]   [0.23]            [0.30]   [0.30]
Regular Science                               0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                       (0.01)   (0.02)            (0.01)   (0.02)
                                                       [0.24]   [0.20]            [0.24]   [0.19]
All Regular                                   0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                       (0.01)   (0.03)            (0.01)   (0.03)
                                                       [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                        1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
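The relationship between the ITT and LATE columns can be illustrated with a short sketch. It assumes a single binary instrument (the randomized offer) and the same controls in both stages, in which case the two-stage least squares estimate equals the reduced-form (ITT) coefficient divided by the first-stage coefficient. The function name `wald_late` is hypothetical, and the inputs are the rounded Table 3 entries read as decimals, so the ratio only approximates the reported LATE values.

```python
def wald_late(itt: float, first_stage: float) -> float:
    """Wald ratio: reduced-form (ITT) effect divided by the first-stage effect."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage


# Rounded entries for the AP Science credit share, read as decimals
# (an assumption about the flattened table; rounding limits precision).
full_sample = wald_late(itt=0.04, first_stage=0.38)
survey_sample = wald_late(itt=0.04, first_stage=0.39)

print(round(full_sample, 3), round(survey_sample, 3))
```

Both ratios land close to the LATE entries for the AP Science row, which is what the ratio interpretation would predict if the published estimates were computed this way.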
Table 4
Treatment Contrast (Composite Variables)

                                                   (1)              (2)      (3)
                                                   Control Group
Outcome                                            Complier Mean    ITT      LATE
Academically Challenging Curriculum                -0.33            0.31     0.80
                                                                    (0.10)   (0.24)
                                                                    [0.00]   [0.00]
Project-Based Independent Classroom Activities     -0.06            0.13     0.33
                                                                    (0.07)   (0.17)
                                                                    [0.07]   [0.06]
Integrated Use of Technology                       -0.11            0.11     0.28
                                                                    (0.08)   (0.19)
                                                                    [0.19]   [0.14]
Number of Observations                             1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
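The composite construction described in the notes can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each component is coded as an integer from 1 (lowest category) to 5 (highest), that the average is taken within student before standardizing across students, and that the population standard deviation is used. The function names and the sample responses are hypothetical.

```python
from statistics import mean, pstdev


def rescale_likert(response: int, n_categories: int = 5) -> float:
    """Map an ordinal response coded 1..n_categories to evenly spaced values in [0, 1]."""
    return (response - 1) / (n_categories - 1)


def composite_scores(responses_by_student):
    """Average the rescaled components per student, then standardize across students."""
    raw = [mean(rescale_likert(r) for r in responses)
           for responses in responses_by_student]
    mu, sd = mean(raw), pstdev(raw)
    return [(x - mu) / sd for x in raw]


# Hypothetical responses: three students, three component items each.
scores = composite_scores([[5, 4, 4], [3, 3, 2], [1, 2, 1]])
print([round(s, 2) for s in scores])
```

After standardization the scores have mean 0 and standard deviation 1 across students, so ITT and LATE estimates on a composite read as effects in standard deviation units.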
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                    (1)              (2)      (3)
                                    Control Group
Outcome                             Complier Mean    ITT      LATE
Science Skill                       -0.19            0.09     0.23
                                                     (0.06)   (0.16)
                                                     [0.15]   [0.14]
STEM Interest                       0.62             0.04     0.09
                                                     (0.02)   (0.07)
                                                     [0.16]   [0.16]
Confidence in College Science       0.92             -0.04    -0.10
                                                     (0.02)   (0.05)
                                                     [0.11]   [0.06]
Stress                              0.12             0.07     0.17
                                                     (0.03)   (0.07)
                                                     [0.02]   [0.01]
Grades in Science Courses           2.80             -0.12    -0.29
                                                     (0.07)   (0.16)
                                                     [0.08]   [0.07]
Grades in Other Courses             3.14             -0.07    -0.18
                                                     (0.02)   (0.06)
                                                     [0.00]   [0.00]
Number of Observations              1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-values are in brackets.
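The inverse probability weighting mentioned in the notes can be sketched in a few lines. The paper's endnotes describe the weight as one over the fitted probit probability of completing the survey. The sketch below takes the fitted probit index as given (fitting the probit itself is not shown), and the index values, outcomes, and function names are hypothetical.

```python
from statistics import NormalDist


def inverse_probability_weight(linear_index: float) -> float:
    """Weight = 1 / Phi(index), where Phi is the standard normal CDF."""
    p_complete = NormalDist().cdf(linear_index)
    return 1.0 / p_complete


def weighted_mean(values, weights):
    """Mean of `values` where each observation counts in proportion to its weight."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Hypothetical fitted indices (fixed effect plus covariates times coefficients)
# and outcomes for four survey respondents.
fitted_index = [1.5, 0.8, 0.0, -0.5]
outcomes = [0.2, -0.1, 0.4, 0.9]

weights = [inverse_probability_weight(z) for z in fitted_index]
print([round(w, 2) for w in weights])
print(round(weighted_mean(outcomes, weights), 3))
```

Respondents with a low predicted probability of completing the survey receive larger weights, so they stand in for similar students who did not respond. As the paper notes, this corrects only for differences in observed characteristics.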
Table 6
Robustness Checks of Main ITT Results

Columns: (1) Control Group Complier Mean; (2) Main Results; (3) Robust SE; (4) p-value (permutation test); (5) Excluding High School 56; (6) Including Imputation of Missing Outcome Variables; (7) Excluding Covariates; (8) Excluding High School 23; (9) Lee Lower Bound; (10) Lee Upper Bound; (11) 95% Confidence Interval from Lee Bounds; (12) Ratio of 95% CI in (11) to 95% CI in (7).

Outcome                        (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)            (12)
Science Skill                  -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09 to 0.51   2.0
                                       (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                       [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                  0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03 to 0.18   1.9
                                       (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                       [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science  0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09 to 0.10   2.0
                                       (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                       [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                         0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05 to 0.15   1.6
                                       (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                       [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses      2.80    -0.12                   -0.06   -0.10   -0.07   Not applicable, as grades are based on transcripts
                                       (0.07)  (0.04)          (0.00)  (0.00)  (0.00)  rather than student survey
                                       [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses        3.14    -0.07                   -0.07   -0.06   -0.03   Not applicable, as grades are based on transcripts
                                       (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  rather than student survey
                                       [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to derive confidence interval for the treatment effect itself).
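The permutation test described for column (4) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a raw difference in means as the test statistic and reshuffles treatment labels across the whole sample, whereas the paper's estimates come from a regression with covariates and School x Cohort fixed effects. The function names and example data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy, so the caller's assignment is left untouched
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # keeps the number of treated units fixed
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical outcomes: five treated units followed by five controls.
y = [0.9, 0.4, 1.1, 0.7, 0.8, 0.2, 0.5, 0.1, 0.6, 0.3]
d = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(permutation_p_value(y, d))
```

A small share means that few random reassignments produce an effect as large as the one observed, which is the sense in which the permutation p-value supports the main result without relying on a distributional assumption for the standard errors.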
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (US Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).
5. Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students’ confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends to some extent on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).
13. Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.
16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.
17. The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative, inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
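The Benjamini-Hochberg correction that the notes apply to the multiple outcomes can be sketched as a step-up procedure. The p-values below are hypothetical and are not taken from the paper's tables, and the function name is an assumption for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes.
print(benjamini_hochberg([0.001, 0.02, 0.03, 0.2, 0.5, 0.9]))
```

Because the procedure is step-up, a hypothesis with a p-value above its own threshold can still be rejected when a higher-ranked p-value meets its threshold, which makes the correction less conservative than a Bonferroni adjustment.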
standard to which they compare themselves or by making them more aware of the challenges they might face in a college course. Whether these changes in perceived confidence persist and how they influence later outcomes is uncertain. Students with expectation levels that match the real demands of college courses might eventually perform better in those courses. Some students might also use the insights they gain from a challenging AP science class to shift away from difficult science courses in college (or entire majors) that could delay or hinder their college completion. Our results also suggest that AP causes a significant amount of stress for students, but we do not find evidence that the added pressure substantially limits their knowledge gains in science.
II. AP Science and Conceptual Framework
A. AP and Other Rigorous Secondary School Courses
The AP program is an appealing option for high school administrators who seek to offer
college-level courses to their students. AP course descriptions and assignments are designed to match
those offered in introductory college courses in each subject and thus to prepare students for the
rigor of college coursework. The College Board (the "Board" for brevity) is a not-for-profit
organization that administers AP and provides professional development for teachers, reviews of
course syllabi, and extensive curricular materials (e.g., sample syllabi, sample lab experiments).[3]
The Board also offers standardized AP exams in the spring of each year that are graded by
external examiners and provide an externally validated measure of student learning. Most exams
include both an essay or problem-solving component and multiple-choice questions, all of which
are aligned with the course descriptions. The exam is one of the key features of the AP program
and is used by high school and postsecondary educators to evaluate the depth of students' skill
independently of teacher bias.
In addition to AP courses, high school students typically have three alternative options for
advanced coursework. Most high schools offer "honors" courses, which are intended to provide a
more rigorous curriculum than the regular course in the same subject. The content and rigor of
honors courses vary across high schools, and there is no standardized honors exam offered to
students in these courses. A second option is the International Baccalaureate (IB) program,
which was originally designed for students in international schools and aims to develop
students' critical thinking skills and their knowledge of international affairs. The IB program is
offered worldwide but remains relatively uncommon in the United States, with less than 5 percent
of high schools offering IB in 2016 (The IB Programme 2016). A final option is for students to
take a course at a nearby college (or online) or, for some, a course that is taught at their high
school by an instructor who has been approved as college-level. These "dual enrollment" or
"dual credit" courses are meant to provide students with the opportunity to simultaneously earn
high school and college credit. In the most recent national survey, high schools reported
approximately two million enrollments in dual credit courses (Thomas et al. 2013). There is
limited information on the colleges that accept dual enrollment credits. Most courses are offered
through collaborations between high schools and local community and public postsecondary
institutions, suggesting that credits are generally accepted at these institutions and less often
accepted at other institutions. Comparisons of AP science classes to regular and honors level
science classes reveal that students receive much more homework and work harder in their AP
classes (Sadler et al. 2014). To our knowledge, there have been no comparisons of the workload
or effort in AP science courses compared to IB or dual enrollment science courses.
B. Conceptual Framework
There are several channels through which an AP science class is expected to influence students'
cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to
provide rigorous content and a substantial workload, be taught by teachers who have high
expectations, and consist of students who are driven to succeed. These inputs (course rigor,
teacher expectations, and peer motivation) are often thought of as the main characteristics that
distinguish AP courses from other high school courses.
Yet AP science classes are also intended to offer an inquiry-based approach to science that,
when combined with a high level of rigor, provides an additional causal pathway to change.
Specifically, a well-implemented AP science course should encourage students to ask questions,
gather and interpret data, arrive at explanations grounded in scientific principles, and
communicate their observations to one another under the guidance of teachers (College Board
2011a, 2011b).[4] This student-led, inquiry-based approach differs from many traditional
secondary school science classrooms, where the goal is often for students to memorize content
and replicate laboratory experiments that demonstrate the content (National Research Council
2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world
practices of science and the skills that form the basis of scientific inquiry by focusing more on
big picture concepts and small group experimentation, with students directing the inquiry. The
curriculum also encourages teachers to move away from lecture-based pedagogy and
multiple-choice quizzes and to increase their use of technology to help students analyze data, draw
interpretations, and communicate findings (College Board 2011a, 2011b).
AP science classes are expected to increase students' ability to ask research questions, design
experiments, analyze data, and draw conclusions. In the process of gaining these scientific
inquiry skills, the new curriculum is intended to spur greater interest in the practice of science
because it becomes more enjoyable and more accessible to students for whom rote memorization
and execution of prefabricated lab experiments might have diminished enthusiasm in the subject
(National Research Council 2012). Science experts posit that inquiry-based science courses will
be particularly successful in generating greater interest and skill among women and among
students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis,
Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler,
Samuelson, and Lorah 2014).
While the rigor and expectations of a college course may be appropriate for some students, it
can be too demanding for others. Students often report high levels of stress and burnout from
taking AP courses, particularly if they perceive that they are not prepared for the challenge of
college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact,
cause students to lose confidence in their ability to complete college science courses. A number
of mechanisms could cause students to lose confidence, including exposure to stronger peers,
inability to successfully complete assignments, or simply receiving lower grades than they
received in their non-AP courses.[5] The AP effect on confidence will likely matter differently for
students with different levels of initial confidence. For students who are over-confident in their
ability to succeed in college science courses, taking a challenging AP course in high school
might cause them to revise their expectations to be more in line with the higher demands of
college-level work.
Taking a more strenuous AP course is also likely to affect students' time allocation.
Students' performance in each class will be determined by their subject-specific ability as well as
the amount of time they devote to their coursework versus other activities, including work,
extracurricular activities, and leisure. If AP courses are more demanding than other courses, students
solving a time allocation problem may shift more effort into their AP course, away from other
pursuits. The impact of this change in time allocation on students' performance in AP and other
courses will depend upon whether they shift effort away from other courses and on the degree of
complementarity between their AP science course and their other courses. Study time devoted to
an AP science course could improve student performance in other math and science classes
(where the skills, tasks, and knowledge are similar) even if students spend less time on those
courses. For courses that require students to perform tasks that are not complementary with AP
science (e.g., courses in the humanities), taking AP science concurrently with these courses
could decrease student performance in both courses. Of course, students taking an AP course
could choose to reduce time spent on alternative (non-academic) activities. If these other
activities have no causal impact on performance in school, then the impact on overall
achievement could be negligible.
Some students report concerns about their time allocation as they weigh the decision to enroll
in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these
concerns have increased over time as the courses have become more accessible to students who
previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to
students with high grades in prerequisite classes, and the courses were only offered in schools
with substantial resources. The Board has made efforts to increase access with, for instance, a
policy statement that encourages schools to open AP to all students who are "willing to accept
the challenge" and remove all barriers that restrict access (College Board 2002).[6] In a 2008
survey of a nationally representative sample, 65 percent of secondary school teachers reported
that their schools encourage as many students as possible to take AP, and 69 percent reported that
AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009).
These open access policies have led to complaints that students who enroll with less preparation
will be less able to engage with the material (and perhaps become more discouraged by the
difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg
2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students
through negative peer effects or through teachers removing content and slowing the pace of
course delivery.
III. AP Science Impact Study
A. Overview
We recruited 23 schools from across the United States and offered monetary compensation to
pay for equipment and teacher training and as an incentive to secure participation.[7] Eligible
schools included ones that had not offered AP Biology or AP Chemistry in recent years, were
willing to add such a course and comply with study protocol, and had more eligible students than
could be served in one class, so as to supply a sufficiently sized control group.[8] Of the 23
schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both
courses. We recruited two waves of schools (those that offered the course for the first time in
2013 and those that offered it for the first time in 2014); both waves were asked to field the
course for two years, and the earlier-joining schools had the option of fielding the course for
three years. The study includes 47 school-by-cohort groups.
Each participating school identified students that the school deemed eligible to take the new
AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students
who assented to participate in the study and who obtained consent from their parent or guardian
as study participants. Upon receipt of signed consent/assent forms, we randomly offered
enrollment in the newly launched course to a subset of participating students.[9] The study
includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students
per AP class).
Figure 1 shows the geographic distribution of the 11 participating districts, which are
primarily concentrated in the western, southern, and eastern regions of the country.[10] The
underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern
region has experienced less competition over the years in access to selective postsecondary
institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long
2009). Relative to districts across the nation, those participating in the study tend to be in
neighborhoods with lower levels of socioeconomic status and to educate students who score
below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools
tend to be larger and more likely to educate students who are eligible for free or reduced-price
lunch, Black, and Hispanic than other schools (Panel A of Table 1).
There are two reasons for this over-representation of larger schools serving less economically
prosperous communities. First, AP courses are already offered in the majority of the nation's
public high schools, and schools that serve students from high-income families tend to offer
more AP subjects than schools that serve students from lower-income families (Malkus 2016;
Theokas and Saaris 2013). Given that our research design only allowed for schools that had not
recently offered an AP science course, the population of schools from which we recruited tended
to be those in settings with fewer resources. Second, participating schools were required to state
that they believed they would have 60 or more students who were qualified to take the AP
science course, and this requirement tended to disqualify smaller high schools.
Reflecting the school demographics, participating teachers are slightly younger, less
experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity
than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our
study teachers have less than or equal to five (two) years of teaching experience, which is more
than double (triple) the rate of US high school science teachers. Study teachers are more likely to
hold an undergraduate major in a STEM field than other high school science teachers, yet far less
likely to hold a master's degree and slightly less likely to have earned a teaching credential in
science. Most of the participating teachers had previously taught a higher-level course (mostly
honors), yet only 47 percent of them had previously taught an AP course. Our research
consequently applies to a population of teachers who are relatively new to the AP science
curriculum and who have generally not received graduate training.[11] Assuming AP courses
improve with teacher preparation, our results likely capture the effect of a less-than-ideal version
of AP and may result in less positive treatment effects than when AP is delivered by teachers
with more training and experience (Clotfelter, Ladd, and Vigdor 2010).
B. Data and Student Descriptive Statistics
We rely on three primary and secondary data sources for impact estimates. The first is an
assessment developed and validated by the research team that measures students' scientific
inquiry skills. We administered this assessment to students in both treatment and control groups
and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific
content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items
that rely on science disciplinary knowledge that is taught in middle school, specifically material
from Life Sciences and Physical Sciences. The assessment, which we administered to all study
participants during a 45-minute period, measures students' skills in data analysis, scientific
explanation, and scientific argument.[12] Participating teachers were not provided copies of the
instrument in advance; therefore, teachers were unable to teach any content material prior to test
administration.
The second source is a questionnaire that we administered concurrently with the assessment
and that asks students a number of questions about their most recent science class and their plans
after high school. The assessment and questionnaire were completed together and administered
outside of class (henceforth, we refer to these instruments as the "survey"). The third data source
is students' high school transcripts, which contain data on demographic and socioeconomic
background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high
school completion. We use these data to determine the balance of randomization on
pre-treatment covariates, estimate the effect of randomization on course-taking (including
compliance), improve the precision of our estimates with statistical controls, and estimate
treatment effects on students' grades.
Our survey response rate was 78 percent.[13] Attrition can be attributed to student absences
during the dates scheduled for survey administration and communication lapses between school
coordinators and students. Students who were randomly assigned to treatment have a
9-percentage-point higher survey response rate. Given the possibility of nonrandom sample
attrition, we weight all regressions by the inverse of the probability of completing the survey,
conditional on student characteristics.[14] We implement a variety of robustness checks as
additional means to account for nonresponse. These include multiple imputation of missing
outcome variables, excluding one high school that had a low response rate, and using the Lee
(2009) technique to provide bounds on the estimated effects. These methods and results are
discussed below.
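The inverse-probability weighting described above can be illustrated with a minimal sketch. It assumes a simple cell-based estimate of the response probability (the share of students in each covariate cell who completed the survey), which may differ from the model the authors actually fit; the function name, field names, and toy records are hypothetical.

```python
from collections import defaultdict


def inverse_probability_weights(records, cell_keys):
    """Weight each respondent by 1 / (response rate in the respondent's cell).

    Nonrespondents get weight 0.0 because they contribute no survey outcome.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [students assigned, students responding]
    for rec in records:
        cell = tuple(rec[k] for k in cell_keys)
        counts[cell][0] += 1
        counts[cell][1] += int(rec["responded"])

    weights = []
    for rec in records:
        n_all, n_resp = counts[tuple(rec[k] for k in cell_keys)]
        weights.append(n_all / n_resp if rec["responded"] else 0.0)
    return weights


# Toy data (hypothetical): two cells defined by offer status and prior achievement.
records = (
    [{"offered": 1, "high_prior": 1, "responded": True}] * 3
    + [{"offered": 1, "high_prior": 1, "responded": False}]
    + [{"offered": 0, "high_prior": 1, "responded": True}]
    + [{"offered": 0, "high_prior": 1, "responded": False}]
)
weights = inverse_probability_weights(records, ["offered", "high_prior"])
```

In this toy example the respondents' weights sum to the number of students originally assigned, so the weighted respondent sample stands in for the full sample within each cell.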
We supplement these data with surveys that we administered online to teachers of the new
AP courses at the conclusion of the course. The teacher survey includes questions about their
educational background; professional experiences and professional development; past and
present instructional practices, generally and around science specifically; participation in the
College Board AP training; ability to cover the content of the AP course; and coaching,
mentoring, and other professional community supports received from the school district and
education community.
Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the
survey respondents, conditional on school-by-cohort fixed effects.[15] Most of the estimated
differences between treatment and control group students on pre-treatment observed
characteristics are small, with some notable exceptions. In the full and survey samples,
treatment group students' reading exam scores were 0.10 and 0.09 standard deviations higher,
respectively, than those of control group students, both at p-values below 0.05. The magnitude of
the treatment-control difference was slightly lower and less precisely estimated in math, yet also
favored treatment group students.[16] To adjust for these chance imbalances, we include all student
covariates as predictors of outcomes in the models, and in the robustness checks we exclude these
covariates.[17]
Table 2 also shows the extent of differences between control group compliers and
non-compliers. We find that non-compliers are generally much more academically prepared for AP
science: they have higher pre-treatment reading and math test scores and are more likely to have
completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian
American and female.[18]
IV. Empirical Strategy
We estimate the effect of taking the AP science course with a standard instrumental variable
specification:
(1) $Y_{ij} = \alpha_j + \hat{A}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$
(2) $\mathit{AP}_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$
where $\mathit{AP}_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$;
$\hat{A}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2);
$\mathit{Offered}_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of
pre-treatment covariates (including age, math and reading exam scores from 8th and 10th grade (standardized and
averaged for math and reading separately), cumulative GPA prior to the year when the AP
science course was offered, and indicator variables for female, racial group (Asian American,
Black, or Hispanic, Native American, or Multiracial), disability, gifted, English Language
Learner, eligible for free or reduced-price lunch, home language is not English, and took
recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school-by-cohort fixed effects.[19] We use
two-stage least squares to estimate the model for all outcomes. The local average treatment effect
(LATE) estimate is given by $\beta$.
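The two-stage logic of Equations (1) and (2) can be sketched in a few lines. This is a simplified illustration, not the authors' estimation code: it keeps only the stratum fixed effects (handled by demeaning within stratum), omits the covariate vector and the survey weights, and uses hypothetical function names and toy data.

```python
from collections import defaultdict


def demean_within(values, strata):
    """Subtract each stratum's mean, which absorbs stratum fixed effects."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, s in zip(values, strata):
        sums[s] += v
        counts[s] += 1
    return [v - sums[s] / counts[s] for v, s in zip(values, strata)]


def iv_estimates(y, took_ap, offered, strata):
    """Return (first stage, ITT, LATE) for one endogenous regressor and one instrument."""
    y_t = demean_within(y, strata)
    d_t = demean_within(took_ap, strata)
    z_t = demean_within(offered, strata)
    szz = sum(z * z for z in z_t)
    first_stage = sum(z * d for z, d in zip(z_t, d_t)) / szz  # effect of offer on enrollment
    itt = sum(z * yy for z, yy in zip(z_t, y_t)) / szz         # effect of offer on outcome
    late = itt / first_stage                                   # 2SLS coefficient on enrollment
    return first_stage, itt, late


# Toy data (hypothetical): two strata, imperfect compliance in stratum "b",
# and an outcome built so that enrolling raises it by exactly 2.
strata = ["a"] * 4 + ["b"] * 4
offered = [1, 1, 0, 0, 1, 1, 0, 0]
took_ap = [1, 1, 0, 0, 1, 0, 0, 0]
y = [2.0 * d + (1.0 if s == "a" else 5.0) for d, s in zip(took_ap, strata)]
first_stage, itt, late = iv_estimates(y, took_ap, offered, strata)
```

With a single instrument, regressing the outcome on the fitted enrollment value gives the same coefficient as dividing the ITT slope by the first-stage slope, which is why the sketch computes the LATE as a ratio.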
The intent to treat (ITT) estimate is obtained by replacing $\hat{A}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1),
as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of
being offered enrollment in the new AP science course and is a weighted average of effects on
those who do and do not choose to enroll in the course.
(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
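Because Equations (1) and (2) use a single instrument, the 2SLS estimate can be read as the ITT coefficient scaled by the first-stage coefficient, provided both are estimated on the same sample with the same covariates and weights. As a rough check, using the rounded figures this paper reports for science skill (an ITT estimate of 0.09 standard deviations and a first-stage estimate of 39 percentage points in the survey sample):

```latex
\[
\hat{\beta} \;=\; \frac{\hat{\tau}}{\hat{\theta}} \;\approx\; \frac{0.09}{0.39} \;\approx\; 0.23
\]
```

This agrees, up to rounding, with the 0.23 standard deviation LATE estimate reported for scientific inquiry skills.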
For outcomes that are obtained from the survey, we weight regressions by the inverse of the
estimated probability of completing the survey.[20] The results are similar without using these
weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student
characteristics as a result of either missing student transcripts or certain data elements not
collected by the district, we use multiple imputation by chained equations, creating 10 imputed
datasets, and combine the results.[21] For inference, we cluster standard errors at the level of
treatment assignment (school by cohort) in our analysis of main effects. In the analysis of
robustness, we report permutation standard errors, robust standard errors (for comparison to
permutations), and the statistical significance of the LATE estimates after adjusting our tests of
significance for multiple comparisons.
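Combining results across imputed datasets is commonly done with Rubin's combining rules. The sketch below assumes those rules; the text here does not say which combining rule the authors used, and the function name and input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)                     # average point estimate
    within = mean(se ** 2 for se in std_errors)  # average sampling variance
    between = variance(estimates)                # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical estimates and standard errors from two imputed datasets.
pooled_est, pooled_se = pool_rubin([1.0, 3.0], [1.0, 1.0])
```

The between-imputation term inflates the standard error to reflect uncertainty about the imputed values; it vanishes when every imputed dataset yields the same estimate.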
V. Results
A. Course-Taking and Treatment Contrast
Table 3 provides estimated effects of the randomized offer of enrollment on AP science course
enrollment and share of credits in all courses for the full sample and the survey samples. The
first-stage estimates indicate that the offer substantially increased the likelihood of the student
taking the AP science course, by 38 percentage points in the full sample and 39 percentage points
in the survey sample. As we expected, compliance with randomization was imperfect, with 42
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling. Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course, while a few of these
came from hardship exemptions that were requested by the school and granted by the study team.
The remaining rows in Table 3 shine light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage-point increase in the share of total credits in the full sample).
Treatment group students' share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).[22]
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13 percent), Physics (12 percent), AP Environmental Science (11 percent), Biology (10 percent),
Honors Biology (9 percent), and Anatomy/Physiology (9 percent).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our
results also suggest that AP course-takers' classrooms were more likely to use technology (up
0.28 sd, p-value = 0.14).[23] Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than other science classes.[24] This contrast between students' reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.[25]
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers' interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with again more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
Table 5 provides stronger evidence of negative treatment effects on students' confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology as a result of signing the
study assent forms. Taking AP science substantially lowered participants' likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students'
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students' self-reported confidence and stress levels. We find that taking AP science increases
students' likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students' grades suffered. We estimate that taking AP science reduced students' grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students' science GPAs during the study year (usually their junior
year) from around a B- to a C+.[26] This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students' grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students' share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students' overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
With our estimates in hand, we can easily compute the adjustment that would leave the
student's GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students' grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).[27]
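The grade arithmetic above can be reproduced from the rounded figures quoted in the text. The variable names below are illustrative. Note that the rounded inputs give a boost of roughly 1.47 to 1.5 rather than exactly 1.46; the reported 1.46 presumably rests on unrounded estimates.

```python
# Rounded estimates quoted in the text.
science_effect = -0.29   # effect on grades in science courses
other_effect = -0.18     # effect on grades in non-science courses
science_share = 0.24     # share of credits in science for AP science takers

# Credit-weighted average effect on overall grades during the study year.
overall_effect = science_effect * science_share + other_effect * (1 - science_share)

# Share of classes in any AP science subject for students induced to take the course.
ap_science_share = 0.02 + 0.12

# Grade boost on AP science courses that would offset the overall decline.
offsetting_boost = -overall_effect / ap_science_share

print(round(overall_effect, 2), round(offsetting_boost, 2))
```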
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test, where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).[28] The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.[29]
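A permutation test of this kind can be sketched as follows. The sketch makes two assumptions that the text does not confirm: pseudo treatment labels are reshuffled within each school-by-cohort stratum, and the effect is a raw difference in means rather than the covariate-adjusted ITT regression. The function names and toy data are hypothetical.

```python
import random
from collections import defaultdict


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, z in zip(outcomes, treated) if z]
    c = [y for y, z in zip(outcomes, treated) if not z]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, strata, n_perm=1000, seed=0):
    """Share of permutations whose |pseudo effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    by_stratum = defaultdict(list)
    for i, s in enumerate(strata):
        by_stratum[s].append(i)

    exceed = 0
    for _ in range(n_perm):
        pseudo = list(treated)
        for idx in by_stratum.values():
            labels = [treated[i] for i in idx]
            rng.shuffle(labels)  # reassign labels within the stratum only
            for i, lab in zip(idx, labels):
                pseudo[i] = lab
        if abs(diff_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_perm


# Toy data (hypothetical): a large effect that no reshuffling can exceed.
strata = ["a"] * 6 + ["b"] * 6
treated = [1, 1, 1, 0, 0, 0] * 2
outcomes = [10, 10, 10, 0, 0, 0] * 2
p_strong = permutation_p_value(outcomes, treated, strata)
```

Shuffling within strata keeps the number of pseudo-treated students in each stratum equal to the number actually offered the course, mirroring the way the offer was randomized.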
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.[30] Column (8) shows the results when we exclude all
of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until its
response rate matches that of the control group. The lower (upper) bound estimate trims the
treatment observations with the highest (lowest) values of the outcome. Using these lower and
upper bound estimates, we compute the 95 percent confidence interval for the treatment effect
itself by applying the Imbens and Manski (2004) method. Consistent with our main findings, the
lower and upper bound point estimates are positive for science skill (0.03 and 0.39 SD), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.31
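The trimming logic behind the Lee (2009) bounds can be sketched in a few lines of Python. This is an illustrative simplification, not the study code: it assumes unweighted differences in means with no covariates or fixed effects, it only handles the case described above in which the treatment group responds at the higher rate, and it omits the Imbens and Manski (2004) confidence interval step. The function name and the example numbers are hypothetical.

```python
from statistics import fmean


def lee_bounds(treated_outcomes, control_outcomes,
               n_treated_assigned, n_control_assigned):
    """Return (lower, upper) trimmed difference-in-means bounds.

    treated_outcomes / control_outcomes hold outcomes for survey respondents
    only; the n_*_assigned arguments are the numbers originally randomized.
    """
    control_rate = len(control_outcomes) / n_control_assigned
    treated_rate = len(treated_outcomes) / n_treated_assigned
    if treated_rate < control_rate:
        raise ValueError("this sketch only trims the treatment group")

    # Number of treated respondents to keep so that response rates match.
    n_keep = round(control_rate * n_treated_assigned)
    if n_keep <= 0:
        raise ValueError("no treated observations would remain after trimming")
    n_trim = len(treated_outcomes) - n_keep

    ordered = sorted(treated_outcomes)
    control_mean = fmean(control_outcomes)
    lower = fmean(ordered[:n_keep]) - control_mean  # drop the highest values
    upper = fmean(ordered[n_trim:]) - control_mean  # drop the lowest values
    return lower, upper


if __name__ == "__main__":
    # Hypothetical example: 10 of 10 treated students respond, 8 of 10 controls.
    print(lee_bounds([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [5] * 8, 10, 10))
```

The gap between the two bounds widens as the difference in response rates grows, which is why bounded estimates are less precise than the main estimates.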
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression, conditional on Xi, with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence
intervals fail to rule out large positives and negatives. Additional heterogeneity results can be
found in the Online Appendix.32
VI. Conclusion
Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of US high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.33 We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult, given that few schools document per-course
expenditures. One recent analysis of a US district that relied on teacher salaries and course
assignments offers a partial cost analysis. Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
you Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: US Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the
Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among
Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The medium-term impacts of high-achieving
charter schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education Research
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished Manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8(1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” US News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, G., and C. F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.”
Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High school Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering
Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in
Higher Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018 (NCES 2018-144): Public High School Graduation Rates” (NCES
2018-144). US Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in US High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.”
Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.”
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.”
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and
Exam-Based Courses in US Public High Schools: 2010-11. First Look” (NCES
2013-001). US Department of Education. Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
US Department of Education. 2014. “Education Department Awards 40 States, DC, and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F. O. Gabrieli, and John D. E. Gabrieli. 2016. “Promise and Paradox: Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts: Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill, by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                  Participating  Others
Average Enrollment                                1,409          723
Free or Reduced-Price Lunch                       0.700          0.438
Asian                                             0.055          0.050
Black                                             0.349          0.154
Hispanic                                          0.410          0.221
White                                             0.164          0.537
Adjusted Cohort Graduation Rate                   0.843          0.802
District Instruction Expenditures Per Pupil       $6,561         $5,636
District Student Services Expenditures Per Pupil  $3,787         $3,385

Panel B: Teachers                                 Participating  Others
Age: Under 30                                     0.407          0.160
Age: 30-49                                        0.432          0.553
Age: 50 or over                                   0.161          0.287
Female                                            0.630          0.536
Hispanic or Latino                                0.111          0.051
Race: American Indian or Alaska Native            0.000          0.009
Race: Asian American                              0.111          0.041
Race: Black                                       0.111          0.060
Race: Native Hawaiian or other Pacific Islander   0.000          0.004
Race: White                                       0.778          0.896
Years of Experience                               10.3           13.2
Years of Experience <=2                           0.290          0.085
Years of Experience <=5                           0.481          0.234
Hold a Teaching Certificate                       0.926          0.945
Undergraduate Major in STEM                       0.944          0.747
Single Subject Credential in Science              0.630          0.823
Master’s Degree or Higher                         0.356          0.615
Previously Taught AP Course                       0.469          NA
Previously Taught AP, IB, or Honors Course        0.796          NA
Number of Professional Development Trainings      3.09           NA
in the Past 5 years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd; EDFacts,
https://www2.ed.gov/about/inits/ed/edfacts/index.html. Others in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey,
https://nces.ed.gov/surveys/sass. Others in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4) Control Group Mean
(2), (5) Difference Between Treated and Controls
(3), (6) Difference Between Control Group Non-Compliers and Compliers

                                            Full Sample                Survey Sample
Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed
effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in
brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

                                  Full Sample                Survey Respondents
                                  (1)      (2)      (3)      (4)      (5)      (6)
                                  Control                    Control
                                  Group                      Group
Outcome                           Mean     ITT      LATE     Mean     ITT      LATE
AP Treatment Course Enrollment    0.19     0.38              0.24     0.39
                                           (0.05)                     (0.06)
                                           [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                      0.03     0.04     0.11     0.03     0.04     0.10
                                           (0.01)   (0.01)            (0.01)   (0.01)
                                           [0.00]   [0.00]            [0.00]   [0.00]
  All AP                          0.13     0.04     0.11     0.14     0.04     0.10
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science          0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced              0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                           (0.01)   (0.02)            (0.01)   (0.03)
                                           [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                 0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                     0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                           (0.01)   (0.03)            (0.01)   (0.03)
                                           [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations            1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
Table 4
Treatment Contrast (Composite Variables)

                                                (1)       (2)       (3)
                                                Control
                                                Group
                                                Complier
Outcome                                         Mean      ITT       LATE
Academically Challenging Curriculum             -0.33     0.31      0.80
                                                          (0.10)    (0.24)
                                                          [0.00]    [0.00]
Project-Based Independent Classroom Activities  -0.06     0.13      0.33
                                                          (0.07)    (0.17)
                                                          [0.07]    [0.06]
Integrated Use of Technology                    -0.11     0.11      0.28
                                                          (0.08)    (0.19)
                                                          [0.19]    [0.14]
Number of Observations                          1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
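The composite construction described in the notes to Table 4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study code: responses are assumed to be coded as integers from 1 (lowest category) to 5 (highest category), missing items are not handled, survey weights are ignored, and the population standard deviation is used for standardizing. All names are hypothetical.

```python
from statistics import fmean, pstdev


def rescale(response, n_categories=5):
    """Map an ordinal response coded 1..n_categories onto evenly spaced values in [0, 1]."""
    return (response - 1) / (n_categories - 1)


def composite_scores(responses, n_categories=5):
    """Average each student's rescaled items, then standardize across students.

    `responses` is a list with one inner list of item responses per student.
    """
    averaged = [fmean(rescale(r, n_categories) for r in row) for row in responses]
    center = fmean(averaged)
    spread = pstdev(averaged)  # assumption: population SD
    if spread == 0:
        raise ValueError("all students have the same average; cannot standardize")
    return [(a - center) / spread for a in averaged]


if __name__ == "__main__":
    # Hypothetical responses from three students on two five-point items.
    print(composite_scores([[1, 2], [3, 3], [5, 4]]))
```

Because the composite is standardized, the ITT and LATE estimates in Table 4 can be read in standard deviation units.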
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                (1)       (2)       (3)
                                Control
                                Group
                                Complier
Outcome                         Mean      ITT       LATE
Science Skill                   -0.19     0.09      0.23
                                          (0.06)    (0.16)
                                          [0.15]    [0.14]
STEM Interest                   0.62      0.04      0.09
                                          (0.02)    (0.07)
                                          [0.16]    [0.16]
Confidence in College Science   0.92      -0.04     -0.10
                                          (0.02)    (0.05)
                                          [0.11]    [0.06]
Stress                          0.12      0.07      0.17
                                          (0.03)    (0.07)
                                          [0.02]    [0.01]
Grades in Science Courses       2.80      -0.12     -0.29
                                          (0.07)    (0.16)
                                          [0.08]    [0.07]
Grades in Other Courses         3.14      -0.07     -0.18
                                          (0.02)    (0.06)
                                          [0.00]    [0.00]
Number of Observations          1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability
to complete a college science course, or = 0 if somewhat not confident or not at all confident.
Stress = 1 if most recent science course had strong negative or negative impact on physical or
emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science
and other courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
Table 6
Robustness Checks of Main ITT Results
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
Outcome
Control Group Complier Mean
Main Results
Robust SE
p-value (permutation test)
Excluding High School 56
Including Imputation of Missing Outcome Variables
Excluding Covariates
Excluding High School 23
Lee Lower Bound
Lee Upper Bound
95% Confidence Interval from Lee Bounds
Ratio of 95% CI in (11) to 95% CI in (7)
Science Skill -0.19 0.09 0.10 0.11 0.20 0.07 0.03 0.39 -0.09, 0.51 2.0
(0.06) (0.05) (0.00) (0.00) (0.00) (0.00) (0.07) (0.07)
[0.15] [0.06] [0.06] [0.20] [0.11] [0.01] [0.24] [0.72] [0.00]
STEM Interest 0.62 0.04 0.05 0.03 0.03 0.03 0.02 0.12 -0.03, 0.18 1.9
(0.02) (0.03) (0.00) (0.00) (0.00) (0.00) (0.03) (0.04)
[0.16] [0.19] [0.20] [0.09] [0.29] [0.27] [0.19] [0.60] [0.00]
Confidence in College Science 0.92 -0.04 -0.03 -0.06 -0.06 -0.04 -0.06 0.05 -0.09, 0.10 2.0
(0.02) (0.02) (0.00) (0.00) (0.00) (0.00) (0.02) (0.03)
[0.11] [0.05] [0.07] [0.37] [0.02] [0.03] [0.10] [0.00] [0.17]
Stress 0.12 0.07 0.05 0.06 0.08 0.07 0.01 0.11 -0.05, 0.15 1.6
(0.03) (0.02) (0.00) (0.00) (0.00) (0.00) (0.03) (0.02)
[0.02] [0.00] [0.00] [0.14] [0.07] [0.02] [0.02] [0.79] [0.00]
Grades in Science Courses 2.80 -0.12 -0.06 -0.10 -0.07
(0.07) (0.04) (0.00) (0.00) (0.00)
[0.08] [0.01] [0.01] [0.31] [0.16] [0.30]
Grades in Other Courses 3.14 -0.07 -0.07 -0.06 -0.03
(0.02) (0.03) (0.00) (0.00) (0.00)
[0.00] [0.01] [0.01] [0.00] [0.01] [0.38]
Not applicable, as grades are based on transcripts rather than student survey
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to
derive confidence interval for the treatment effect itself).
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students’
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry,
reasoning, and communication skills” (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students’ perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students’ confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends, to some extent, on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master’s degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members’
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study’s Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools’ needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups:
School-Chemistry and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + \mathbf{X}_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1
if student i in school-by-cohort j completed any part of the end-of-year survey, $\mathbf{X}_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school-by-cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + \mathbf{X}_i\hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and
with 90% of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum, with more homework, than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple-choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg correction for multiple comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
B. Conceptual Framework
There are several channels through which an AP science class is expected to influence students’
cognitive and noncognitive skills. Much like the ideal college course, AP science is designed to
provide rigorous content and a substantial workload, be taught by teachers who have high
expectations, and consist of students who are driven to succeed. These inputs (course rigor,
teacher expectations, and peer motivation) are often thought of as the main characteristics that
distinguish AP courses from other high school courses.
Yet AP science classes are also intended to offer an inquiry-based approach to science that,
when combined with a high level of rigor, provides an additional causal pathway to change.
Specifically, a well-implemented AP science course should encourage students to ask questions,
gather and interpret data, arrive at explanations grounded in scientific principles, and
communicate their observations to one another under the guidance of teachers (College Board
2011a, 2011b).⁴ This student-led, inquiry-based approach differs from many traditional
secondary school science classrooms, where the goal is often for students to memorize content
and replicate laboratory experiments that demonstrate the content (National Research Council
2002, 2012). The AP science course, in contrast, seeks to expose students to the real-world
practices of science and the skills that form the basis of scientific inquiry by focusing more on
big-picture concepts and small group experimentation, with students directing the inquiry. The
curriculum also encourages teachers to move away from lecture-based pedagogy and
multiple-choice quizzes and to increase their use of technology to help students analyze data, draw
interpretations, and communicate findings (College Board 2011a, 2011b).
AP science classes are expected to increase students’ ability to ask research questions, design
experiments, analyze data, and draw conclusions. In the process of gaining these scientific
inquiry skills, the new curriculum is intended to spur greater interest in the practice of science
because it becomes more enjoyable and more accessible to students for whom rote memorization
and execution of prefabricated lab experiments might have diminished enthusiasm in the subject
(National Research Council 2012). Science experts posit that inquiry-based science courses will
be particularly successful in generating greater interest and skill among women and among
students from underrepresented minority groups (Aguilar, Walton, and Wieman 2014; Ellis,
Fosdick, and Rasmussen 2016; Kurth, Anderson, and Palincsar 2002; Leslie et al. 2015; Litzler,
Samuelson, and Lorah 2014).
While the rigor and expectations of a college course may be appropriate for some students, they
can be too demanding for others. Students often report high levels of stress and burnout from
taking AP courses, particularly if they perceive that they are not prepared for the challenge of
college coursework (Kim 2015; Marx 2014; Tucker 2012). A strenuous AP course could, in fact,
cause students to lose confidence in their ability to complete college science courses. A number
of mechanisms could cause students to lose confidence, including exposure to stronger peers,
inability to successfully complete assignments, or simply receiving lower grades than they
received in their non-AP courses.⁵ The AP effect on confidence will likely matter differently for
students with different levels of initial confidence. For students who are over-confident in their
ability to succeed in college science courses, taking a challenging AP course in high school
might cause them to revise their expectations to be more in line with the higher demands of
college-level work.
Taking a more strenuous AP course is also likely to affect students’ time allocation.
Students’ performance in each class will be determined by their subject-specific ability as well as
the amount of time they devote to their coursework versus other activities, including work,
extracurricular, and leisure. If AP courses are more demanding than other courses, students
solving a time allocation problem may shift more effort into their AP course, away from other
pursuits. The impact of this change in time allocation on students’ performance in AP and other
courses will depend upon whether they shift effort away from other courses and on the degree of
complementarity between their AP science course and their other courses. Study time devoted to
an AP science course could improve student performance in other math and science classes
(where the skills, tasks, and knowledge are similar), even if students spend less time on those
courses. For courses that require students to perform tasks that are not complementary with AP
science (e.g., courses in the humanities), taking AP science concurrently with these courses
could decrease student performance in both courses. Of course, students taking an AP course
could choose to reduce time spent on alternative (non-academic) activities. If these other
activities have no causal impact on performance in school, then the impact on overall
achievement could be negligible.
Some students report concerns about their time allocation as they weigh the decision to enroll
in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these
concerns have increased over time as the courses have become more accessible to students who
previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to
students with high grades in prerequisite classes, and the courses were only offered in schools
with substantial resources. The Board has made efforts to increase access with, for instance, a
policy statement that encourages schools to open AP to all students who are “willing to accept
the challenge” and remove all barriers that restrict access (College Board 2002).⁶ In a 2008
survey of a nationally representative sample, 65 percent of secondary school teachers reported
that their schools encourage as many students as possible to take AP, and 69 percent reported that
AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009).
These open access policies have led to complaints that students who enroll with less preparation
will be less able to engage in the material (and perhaps become more discouraged by the
difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg
2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students
through negative peer effects or through teachers removing content and slowing the pace of
course delivery.
III. AP Science Impact Study
A. Overview
We recruited 23 schools from across the United States and offered monetary compensation to
pay for equipment and teacher training and as an incentive to secure participation.⁷ Eligible
schools included ones that had not offered AP Biology or AP Chemistry in recent years, were
willing to add such a course and comply with study protocol, and had more eligible students than
could be served in one class, so as to supply a sufficiently sized control group.⁸ Of the 23
schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both
courses. We recruited two waves of schools (those that offered the course for the first time in
2013 and those that offered it for the first time in 2014); both waves were asked to field the
course for two years, and the earlier-joining schools had the option of fielding the course for
three years. The study includes 47 school-by-cohort groups.
Each participating school identified students that the school deemed eligible to take the new
AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students
who assented to participate in the study and who obtained consent from their parent or guardian
as study participants. Upon receipt of signed consent/assent forms, we randomly offered
enrollment in the newly launched course to a subset of participating students.⁹ The study
includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students
per AP class).
Figure 1 shows the geographic distribution of the 11 participating districts, which are
primarily concentrated in the western, southern, and eastern regions of the country.¹⁰ The
underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern
region has experienced less competition over the years in access to selective postsecondary
institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long
2009). Relative to districts across the nation, those participating in the study tend to be in
neighborhoods with lower levels of socioeconomic status and to educate students who score
below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools
tend to be larger than other schools and more likely to educate students who are eligible for free
or reduced-price lunch, Black, and Hispanic (Panel A of Table 1).
There are two reasons for this over-representation of larger schools serving less economically
prosperous communities. First, AP courses are already offered in the majority of the nation’s
public high schools, and schools that serve students from high-income families tend to offer
more AP subjects than schools that serve students from lower-income families (Malkus 2016;
Theokas and Saaris 2013). Given that our research design only allowed for schools that had not
recently offered an AP science course, the population of schools from which we recruited tended
to be those in settings with fewer resources. Second, participating schools were required to state
that they believed they would have 60 or more students who were qualified to take the AP
science course, and this requirement tended to disqualify smaller high schools.
Reflecting the school demographics, participating teachers are slightly younger, less
experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity
than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our
study teachers have less than or equal to five (two) years of teaching experience, which is more
than double (triple) the rate of US high school science teachers. Study teachers are more likely to
hold an undergraduate major in a STEM field than other high school science teachers, yet far less
likely to hold a master’s degree and slightly less likely to have earned a teaching credential in
science. Most of the participating teachers had previously taught a higher-level course (mostly
honors), yet only 47 percent of them had previously taught an AP course. Our research
consequently applies to a population of teachers who are relatively new to the AP science
curriculum and who have generally not received graduate training.¹¹ Assuming AP courses
improve with teacher preparation, our results likely capture the effect of a less-than-ideal version
of AP and may result in less positive treatment effects than when AP is delivered by teachers
with more training and experience (Clotfelter, Ladd, and Vigdor 2010).
B. Data and Student Descriptive Statistics
We rely on three primary and secondary data sources for impact estimates. The first is an
assessment, developed and validated by the research team, that measures students’ scientific
inquiry skills. We administered this assessment to students in both treatment and control groups
and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific
content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items
that rely on science disciplinary knowledge that is taught in middle school, specifically material
from Life Sciences and Physical Sciences. The assessment, which we administered to all study
participants during a 45-minute period, measures students’ skills in data analysis, scientific
explanation, and scientific argument.¹² Participating teachers were not provided copies of the
instrument in advance; therefore, teachers were unable to teach any content material prior to test
administration.
The second source is a questionnaire that we administered concurrently with the assessment
and that asks students a number of questions about their most recent science class and their plans
after high school. The assessment and questionnaire were completed together and administered
outside of class (henceforth, we refer to these instruments as the “survey”). The third data source
is students’ high school transcripts, which contain data on demographic and socioeconomic
background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high
school completion. We use these data to determine the balance of randomization on
pre-treatment covariates, estimate the effect of randomization on course-taking (including
compliance), improve the precision of our estimates with statistical controls, and estimate
treatment effects on students’ grades.
Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences
during the dates scheduled for survey administration and communication lapses between school
coordinators and students. Students who were randomly assigned to treatment have a
9-percentage-point higher survey response rate. Given the possibility of nonrandom sample
attrition, we weight all regressions by the inverse of the probability of completing the survey,
conditional on student characteristics.¹⁴ We implement a variety of robustness checks as
additional means to account for nonresponse. These include multiple imputation of missing
outcome variables, excluding one high school that had a low response rate, and using the Lee
(2009) technique to provide bounds on the estimated effects. These methods and results are
discussed below.
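
The reweighting step can be illustrated with a short sketch. It assumes that a fitted probit index is already in hand for each survey respondent; the index and outcome values below are hypothetical and are not study data, and the function names are illustrative only. The weight for each respondent is the inverse of the standard normal CDF evaluated at that index.

```python
from statistics import NormalDist


def inverse_probability_weights(linear_index):
    """Return 1 / Phi(index) for each respondent's fitted probit index."""
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    return [1.0 / phi(z) for z in linear_index]


def weighted_mean(values, weights):
    """Weighted average of values using the supplied weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical fitted indices and outcomes for four respondents (not study data).
index = [1.5, 0.8, 0.0, -0.5]
outcomes = [0.2, 0.1, -0.3, -0.6]

weights = inverse_probability_weights(index)
print([round(w, 2) for w in weights])
print(round(weighted_mean(outcomes, weights), 3))
```

Respondents whose characteristics predict a low chance of responding receive larger weights, so they stand in for similar students who did not complete the survey. This adjusts only for observed characteristics.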
We supplement these data with surveys that we administered online to teachers of the new
AP courses at the conclusion of the course. The teacher survey includes questions about their
educational background; professional experiences and professional development; past and
present instructional practices, generally and around science specifically; participation in the
College Board AP training; ability to cover the content of the AP course; and coaching,
mentoring, and other professional community supports received from the school, district, and
education community.
Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the
survey respondents, conditional on school-by-cohort fixed effects.¹⁵ Most of the estimated
differences between treatment and control group students on pre-treatment observed
characteristics are small, with some notable exceptions. In the full and survey samples,
treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher,
respectively, than those of control group students, both at p-values below 0.05. The magnitude of
the treatment-control difference was slightly lower and less precisely estimated in math, yet also
favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all student
covariates as predictors of outcomes in the models, and in the robustness checks we exclude these
covariates.¹⁷
Table 2 also shows the extent of differences between control group compliers and
non-compliers. We find that non-compliers are generally much more academically prepared for AP
science: they have higher pre-treatment reading and math test scores and are more likely to have
completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian
American and female.¹⁸
IV. Empirical Strategy
We estimate the effect of taking the AP science course with a standard instrumental variable
specification:
(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$
(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$
where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$; $\widehat{AP}_{ij}$ is
the fitted value based on the estimates of the parameters in Equation (2); $\mathit{Offered}_{ij} = 1$ if the
student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates
(including age; math and reading exam scores from 8th and 10th grade (standardized and
averaged for math and reading separately); cumulative GPA prior to the year when the AP
science course was offered; and indicator variables for female, racial group (Asian American
Black or Hispanic Native American or Multiracial), disability, gifted, English Language
Learner, eligible for free or reduced-price lunch, home language is not English, and took
recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school-by-cohort fixed effects.¹⁹ We use
two-stage least squares to estimate the model for all outcomes. The local average treatment effect
(LATE) estimate is given by $\beta$.
The intent-to-treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1),
as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of
being offered enrollment in the new AP science course and is a weighted average of effects on
those who do and do not choose to enroll in the course.
(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
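
The relationship between the ITT and LATE estimates can be sketched in a simplified setting. The example below assumes no covariates and no school-by-cohort fixed effects, which is a deliberate simplification of Equations (1) to (3). In that case, two-stage least squares with a single binary instrument reduces to the Wald ratio of the ITT to the first stage. The data and the function name `wald_estimates` are hypothetical.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def wald_estimates(offered, enrolled, outcome):
    """Return (ITT, first stage, Wald/IV ratio) for a binary offer with no covariates."""
    y1 = [y for z, y in zip(offered, outcome) if z == 1]
    y0 = [y for z, y in zip(offered, outcome) if z == 0]
    d1 = [d for z, d in zip(offered, enrolled) if z == 1]
    d0 = [d for z, d in zip(offered, enrolled) if z == 0]
    itt = mean(y1) - mean(y0)          # effect of the offer on the outcome
    first_stage = mean(d1) - mean(d0)  # effect of the offer on enrollment
    return itt, first_stage, itt / first_stage


# Hypothetical toy data (not study data): eight students, four offered enrollment.
offered = [1, 1, 1, 1, 0, 0, 0, 0]
enrolled = [1, 1, 1, 0, 1, 0, 0, 0]
outcome = [0.6, 0.4, 0.5, 0.1, 0.5, 0.0, 0.1, 0.2]

itt, first_stage, late = wald_estimates(offered, enrolled, outcome)
print(itt, first_stage, late)
```

Because the first stage is below one when compliance is imperfect, the LATE estimate is larger in magnitude than the ITT estimate.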
For outcomes that are obtained from the survey, we weight regressions by the inverse of the
estimated probability of completing the survey.²⁰ The results are similar without using these
weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student
characteristics as a result of either missing student transcripts or certain data elements not
collected by the district, we use multiple imputation by chained equations, creating 10 imputed
datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of
treatment assignment (school by cohort) in our analysis of main effects. In the analysis of
robustness, we report permutation standard errors, robust standard errors (for comparison to
permutations), and the statistical significance of the LATE estimates after adjusting our tests of
significance for multiple comparisons.
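
The multiple-comparison adjustment named in the notes is the Benjamini-Hochberg correction. A minimal sketch of that step-up procedure follows; the p-values are hypothetical and are not the study's estimates, and the function name is illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value is at most (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (not the study's estimates).
p_values = [0.001, 0.012, 0.020, 0.30, 0.45, 0.60]
print(benjamini_hochberg(p_values))
```

The procedure is step-up: a p-value that misses its own threshold is still rejected when a larger p-value meets a later threshold.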
V. Results
A. Course-Taking and Treatment Contrast
Table 3 provides estimated effects of the randomized offer of enrollment on AP science course
enrollment and share of credits in all courses for the full sample and the survey samples. The
first-stage estimates indicate that the offer substantially increased the likelihood of the student
taking the AP science course, by 38 percentage points in the full sample and 39 percentage points
in the survey sample. As we expected, compliance with randomization was imperfect, with 42
11
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course while a few of these
came from hardship exemptions that were requested by the school and granted by the study team
The remaining rows in Table 3 shed light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage-point increase in the share of total credits in the full sample).
Treatment group students’ share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).²²
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%),
and Anatomy/Physiology (9%).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our
results also suggest that AP course-takers’ classrooms were more likely to use technology (up
0.28 sd, p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that, while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than other science classes.²⁴ This contrast between students’ reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.²⁵
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with, again, more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
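With a single instrument (the offer) and a single endogenous regressor (AP enrollment), a two-stage least squares estimate equals the ITT estimate divided by the first-stage coefficient when both regressions use the same covariates and weights. The sketch below applies that ratio to the rounded figures quoted in this paper: the 0.09 sd ITT on science skill (cited in the robustness discussion) and the 0.39 survey-sample first stage. The function name `wald_late` is hypothetical, and because the published estimates pool imputed datasets, they need not match the ratio exactly.

```python
def wald_late(itt: float, first_stage: float) -> float:
    """Return the just-identified IV (Wald) estimate: reduced form / first stage."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage

# Rounded inputs quoted in the text, so the ratio is only approximate.
late_skill = wald_late(itt=0.09, first_stage=0.39)
print(round(late_skill, 2))  # 0.23
```

The ratio also makes clear why the LATE estimates are roughly two and a half times the ITT estimates: only about 39 percent of offers changed enrollment.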
Table 5 provides stronger evidence of negative treatment effects on students’ confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology by signing the
study assent forms. Taking AP science substantially lowered participants’ likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students’
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students’ self-reported confidence and stress levels. We find that taking AP science increases
students’ likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students’ grades suffered. We estimate that taking AP science reduced students’ grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior
year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students’ grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students’ share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
With our estimates in hand, we can easily compute the adjustment that would leave the
student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).²⁷
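The grade arithmetic in the two preceding paragraphs can be retraced with a short sketch. The function names are hypothetical, and the inputs are the rounded figures quoted in the text. With rounded inputs the offsetting boost comes out near 1.5 rather than the reported 1.46, which presumably reflects unrounded estimates that are not shown here.

```python
def overall_gpa_effect(science_effect, other_effect, science_share):
    """Credit-weighted average of the effects on science and non-science grades."""
    return science_effect * science_share + other_effect * (1 - science_share)

def offsetting_boost(overall_effect, ap_science_share):
    """Grade-point boost on AP science courses that would cancel the overall GPA drop."""
    return -overall_effect / ap_science_share

# Rounded values from the text: -0.29 (science), -0.18 (other), 0.24 science share.
overall = overall_gpa_effect(-0.29, -0.18, 0.24)
print(round(overall, 2))  # -0.21

# AP science share of classes: 0.02 + 0.12 = 0.14, as quoted from Table 3.
boost = offsetting_boost(round(overall, 2), 0.02 + 0.12)
print(round(boost, 2))  # 1.5 with rounded inputs; the paper reports 1.46
```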
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.²⁹
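The permutation procedure described above can be sketched as follows. This is a simplified illustration, not the study's implementation: it assumes a raw difference in means with treatment labels shuffled across the whole sample, whereas the paper's estimates come from regressions with covariates and school-by-cohort fixed effects and may permute within assignment blocks. The function name `permutation_p_value` and its arguments are hypothetical.

```python
import numpy as np

def permutation_p_value(y, treated, n_perm=1000, seed=0):
    """Share of pseudo-treatment effects whose absolute value exceeds the observed one."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    observed = y[treated].mean() - y[~treated].mean()
    exceed = 0
    for _ in range(n_perm):
        # Shuffle the labels to create a pseudo treatment with the same group sizes.
        pseudo = rng.permutation(treated)
        effect = y[pseudo].mean() - y[~pseudo].mean()
        if abs(effect) > abs(observed):
            exceed += 1
    return exceed / n_perm

# Simulated data for illustration only: 100 students, true effect of 0.5.
rng = np.random.default_rng(1)
labels = np.repeat([False, True], 50)
outcome = rng.normal(size=100) + 0.5 * labels
print(permutation_p_value(outcome, labels))
```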
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all
of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower
and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.³¹
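The trimming step of the Lee (2009) bounds can be sketched as follows. This is a minimal illustration under stated assumptions: a simple difference in means with no covariates, fixed effects, or survey weights, and a treatment group that responds at a higher rate than the control group. The function name `lee_bounds` and its arguments are hypothetical, and the sketch does not compute the Imbens and Manski (2004) confidence intervals.

```python
import numpy as np

def lee_bounds(y_treat, y_ctrl, n_treat_assigned, n_ctrl_assigned):
    """Return (lower, upper) trimming bounds on the treatment effect.

    y_treat and y_ctrl hold outcomes for respondents only; the n_*_assigned
    arguments count everyone assigned to each arm, including nonrespondents.
    """
    y_treat = np.sort(np.asarray(y_treat, dtype=float))
    y_ctrl = np.asarray(y_ctrl, dtype=float)
    rate_t = len(y_treat) / n_treat_assigned
    rate_c = len(y_ctrl) / n_ctrl_assigned
    if rate_t < rate_c:
        raise ValueError("this sketch assumes the treatment group responds more often")
    # Share of treated respondents to drop so the two response rates match.
    trim_share = (rate_t - rate_c) / rate_t
    n_trim = int(np.floor(trim_share * len(y_treat)))
    keep = len(y_treat) - n_trim
    lower = y_treat[:keep].mean() - y_ctrl.mean()    # drop the highest outcomes
    upper = y_treat[n_trim:].mean() - y_ctrl.mean()  # drop the lowest outcomes
    return lower, upper

# Toy data: all 4 treated students respond, but only 2 of 4 controls do.
print(lee_bounds([1, 2, 3, 4], [2, 2], 4, 4))
```

In the toy call, half of the treated respondents are trimmed, so the bounds compare the two lowest and the two highest treated outcomes with the control mean.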
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression conditional on $\mathbf{X}_i$ with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals
fail to rule out large positives and negatives. Additional heterogeneity results can be found in the
Online Appendix.³²
VI. Conclusion
Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of US high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.³³ We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult, given that few schools document per-course
expenditures. One recent analysis of a US district that relied on teacher salaries and course
assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: US Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the
Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among
Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving
Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education, Research &
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished Manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, G., and F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.”
Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering
Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in
Higher Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018: Public High School Graduation Rates.” (NCES 2018-144).
US Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in US High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.”
Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.”
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.”
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and
Exam-Based Courses in US Public High Schools: 2010-11. First Look.” (NCES
2013-001). US Department of Education. Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
US Department of Education. 2014. “Education Department Awards 40 States, DC, and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F. O. Gabrieli, and John D. E. Gabrieli. 2016. “Promise and Paradox: Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age: Under 30                                       0.407           0.160
Age: 30-49                                          0.432           0.553
Age: 50 or over                                     0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <= 2                            0.290           0.085
Years of Experience <= 5                            0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). Others in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). Others in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1) and (4): Control Group Mean. Columns (2) and (5): Difference Between Treated and
Controls. Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers.

                                              Full Sample                Survey Sample
Pre-Treatment Characteristic                  (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade               16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                               0.38     0.08     0.25     0.44     0.07     0.30
                                                       (0.04)   (0.10)            (0.05)   (0.16)
                                                       [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                            0.29     0.10     0.18     0.36     0.09     0.17
                                                       (0.03)   (0.12)            (0.04)   (0.17)
                                                       [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                        3.16     0.05     0.20     3.23     0.06     0.13
                                                       (0.03)   (0.08)            (0.03)   (0.10)
                                                       [0.14]   [0.02]            [0.06]   [0.20]
Female                                        0.59     0.00     0.10     0.61     -0.01    0.11
                                                       (0.03)   (0.06)            (0.04)   (0.07)
                                                       [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                0.12     0.02     0.10     0.12     0.03     0.10
                                                       (0.02)   (0.05)            (0.01)   (0.07)
                                                       [0.27]   [0.06]            [0.07]   [0.12]
Black                                         0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                       (0.02)   (0.06)            (0.02)   (0.05)
                                                       [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial     0.31     0.01     0.05     0.33     0.01     0.05
                                                       (0.02)   (0.06)            (0.02)   (0.07)
                                                       [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                      0.02     0.00     -0.01    0.01     0.00     -0.01
                                                       (0.01)   (0.01)            (0.01)   (0.01)
                                                       [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                        0.13     0.03     0.00     0.14     0.02     0.01
                                                       (0.02)   (0.05)            (0.02)   (0.09)
                                                       [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                      0.05     0.01     0.02     0.04     0.01     0.04
                                                       (0.01)   (0.02)            (0.01)   (0.03)
                                                       [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch      0.53     0.01     0.02     0.51     0.01     0.07
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home    0.34     0.02     0.03     0.35     0.01     0.04
                                                       (0.02)   (0.07)            (0.02)   (0.07)
                                                       [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses         0.79     0.00     0.09     0.79     0.02     0.05
                                                       (0.02)   (0.04)            (0.02)   (0.05)
                                                       [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                        1,819                      1,417
Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment
                                  Full Sample                Survey Respondents
                                  (1)      (2)      (3)      (4)      (5)      (6)
                                  Control                    Control
                                  Group                      Group
Outcome                           Mean     ITT      LATE     Mean     ITT      LATE
AP Treatment Course Enrollment    0.19     0.38              0.24     0.39
                                           (0.05)                     (0.06)
                                           [0.00]                     [0.00]
Share of Credits During Study Year in
  AP Science                      0.03     0.04     0.11     0.03     0.04     0.10
                                           (0.01)   (0.01)            (0.01)   (0.01)
                                           [0.00]   [0.00]            [0.00]   [0.00]
  All AP                          0.13     0.04     0.11     0.14     0.04     0.10
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science          0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced              0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                           (0.01)   (0.02)            (0.01)   (0.03)
                                           [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                 0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                     0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                           (0.01)   (0.03)            (0.01)   (0.03)
                                           [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations            1,819                      1,417
Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
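
As a rough illustration of how an ITT estimate and a LATE estimate relate, the sketch below computes a Wald ratio: the ITT effect on an outcome divided by the ITT effect on AP course enrollment (the first stage). This is an assumption about the simplest no-covariate analog; the estimates in this table come from the paper's Equations (1) and (2), which include fixed effects and covariates, so a simple ratio will only approximate them. The function name and the input values are hypothetical and are not taken from the study.

```python
def wald_late(itt_effect: float, first_stage: float) -> float:
    """Scale an intent-to-treat effect by the first-stage change in take-up.

    itt_effect:  effect of being OFFERED the course on the outcome
    first_stage: effect of being OFFERED the course on actually enrolling
    """
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage


# Illustrative inputs only (hypothetical, not results from the study):
# an offer that raises enrollment by 40 points and the outcome by 0.10.
example_late = wald_late(itt_effect=0.10, first_stage=0.40)
print(round(example_late, 2))
```

The ratio makes the scaling explicit: the smaller the share of students whose enrollment changes because of the offer, the larger the implied effect per student who actually takes the course.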
Table 4
Treatment Contrast (Composite Variables)
                                                (1)        (2)      (3)
                                                Control
                                                Group
                                                Complier
Outcome                                         Mean       ITT      LATE
Academically Challenging Curriculum             -0.33      0.31     0.80
                                                           (0.10)   (0.24)
                                                           [0.00]   [0.00]
Project-Based Independent Classroom Activities  -0.06      0.13     0.33
                                                           (0.07)   (0.17)
                                                           [0.07]   [0.06]
Integrated Use of Technology                    -0.11      0.11     0.28
                                                           (0.08)   (0.19)
                                                           [0.19]   [0.14]
Number of Observations                          1,417
Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
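
The construction described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the five-point category list, the function names, and the use of the population standard deviation for standardizing are all hypothetical choices, since the notes do not specify them.

```python
from statistics import mean, pstdev

# Assumed ordering of a five-point agreement scale, lowest to highest.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map an ordered category to [0, 1] with evenly spaced steps."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled items, then z-score across students.

    Assumption: standardization uses the population SD of the averaged items.
    """
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sigma = mean(raw), pstdev(raw)
    return [(x - mu) / sigma for x in raw]


# Two hypothetical students answering two component items each.
scores = composite_scores([["agree", "agree"], ["disagree", "neutral"]])
print(scores)
```

Rescaling before averaging puts component items with different numbers of response categories on a common 0 to 1 range, so no single item dominates the composite.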
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades
                                (1)        (2)      (3)
                                Control
                                Group
                                Complier
Outcome                         Mean       ITT      LATE
Science Skill                   -0.19      0.09     0.23
                                           (0.06)   (0.16)
                                           [0.15]   [0.14]
STEM Interest                   0.62       0.04     0.09
                                           (0.02)   (0.07)
                                           [0.16]   [0.16]
Confidence in College Science   0.92       -0.04    -0.10
                                           (0.02)   (0.05)
                                           [0.11]   [0.06]
Stress                          0.12       0.07     0.17
                                           (0.03)   (0.07)
                                           [0.02]   [0.01]
Grades in Science Courses       2.80       -0.12    -0.29
                                           (0.07)   (0.16)
                                           [0.08]   [0.07]
Grades in Other Courses         3.14       -0.07    -0.18
                                           (0.02)   (0.06)
                                           [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes
Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
Table 6
Robustness Checks of Main ITT Results
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
Outcome | Control Group Complier Mean | Main Results | Robust SE | p-value (permutation test) | Excluding High School 56 | Including Imputation of Missing Outcome Variables | Excluding Covariates | Excluding High School 23 | Lee Lower Bound | Lee Upper Bound | 95% Confidence Interval from Lee Bounds | Ratio of 95% CI in (11) to 95% CI in (7)
Science Skill -019 009 010 011 020 007 003 039
-
009
05
1 20
(006) (005) (000) (000) (000) (000) (007) (007)
[015] [006] [006] [020] [011] [001] [024] [072] [000]
STEM Interest 062 004 005 003 003 003 002 012
-
003
01
8 19
(002) (003) (000) (000) (000) (000) (003) (004)
[016] [019] [020] [009] [029] [027] [019] [060] [000] Confidence in College
Science 092 -004 -003 -006 -006 -004 -006 005
-
009
01
0 20
(002) (002) (000) (000) (000) (000) (002) (003)
[011] [005] [007] [037] [002] [003] [010] [000] [017]
Stress 012 007 005 006 008 007 001 011
-
005
01
5 16
(003) (002) (000) (000) (000) (000) (003) (002)
[002] [000] [000] [014] [007] [002] [002] [079] [000]
Grades in Science Courses 280 -012 -006 -010 -007 |
(007) (004) (000) (000) (000)
[008] [001] [001] [031] [016] [030] Not applicable as grades are based on transcripts
Grades in Other Courses 314 -007 -007 -006 -003 rather than student survey
(002) (003) (000) (000) (000) |
[000] [001] [001] [000] [001] [038]
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (X_i)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95 percent confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to
derive a confidence interval for the treatment effect itself).
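
The permutation test described for Column (4) can be sketched as below. This is a simplified illustration under stated assumptions: it uses a plain difference in means as the "estimated effect" and reshuffles treatment labels across the whole sample, whereas the paper's estimates come from a regression with School x Cohort fixed effects and covariates. The function names, the seed, and the toy data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # pseudo treatment: same group sizes, random members
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm


# Toy data (hypothetical): eight students, four of them treated.
p = permutation_p_value([0, 0, 0, 0, 10, 10, 10, 10], [0, 0, 0, 0, 1, 1, 1, 1])
print(p)
```

Because the reshuffled labels keep the original number of treated students, each pseudo-assignment is a draw from the same kind of lottery that produced the real assignment, which is what lets the share of larger pseudo effects be read as a p-value.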
1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students'
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3 The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5 Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students' confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends, to some extent, on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7 We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8 In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9 Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10 Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11 Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12 We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13 Each year, in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14 Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15 Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16 Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.
17 The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19 One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups:
School-Chemistry and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60 percent of the treated students chose Biology, then
we randomly assign 60 percent of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20 To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and
with 90 percent of students receiving a weight less than 2.0.
21 We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22 Detailed course-by-course impacts are available from the authors.
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25 In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27 Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29 To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30 This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
extracurricular and leisure. If AP courses are more demanding than other courses, students
solving a time allocation problem may shift more effort into their AP course, away from other
pursuits. The impact of this change in time allocation on students' performance in AP and other
courses will depend upon whether they shift effort away from other courses and on the degree of
complementarity between their AP science course and their other courses. Study time devoted to
an AP science course could improve student performance in other math and science classes
(where the skills, tasks, and knowledge are similar) even if students spend less time on those
courses. For courses that require students to perform tasks that are not complementary with AP
science (e.g., courses in the humanities), taking AP science concurrently with these courses
could decrease student performance in both courses. Of course, students taking an AP course
could choose to reduce time spent on alternative (non-academic) activities. If these other
activities have no causal impact on performance in school, then the impact on overall
achievement could be negligible.
Some students report concerns about their time allocation as they weigh the decision to enroll
in AP (Foust, Hertberg-Davis, and Callahan 2009; Hopkins 2012; Kim 2015). Many of these
concerns have increased over time as the courses have become more accessible to students who
previously faced barriers to enrollment. Traditionally, teachers only recommended AP courses to
students with high grades in prerequisite classes, and the courses were only offered in schools
with substantial resources. The Board has made efforts to increase access with, for instance, a
policy statement that encourages schools to open AP to all students who are "willing to accept
the challenge" and remove all barriers that restrict access (College Board 2002).6 In a 2008
survey of a nationally-representative sample, 65 percent of secondary school teachers reported
that their schools encourage as many students as possible to take AP, and 69 percent reported that
AP courses are generally open to any student who wants to enroll (Duffett and Farkas 2009).
These open access policies have led to complaints that students who enroll with less preparation
will be less able to engage in the material (and perhaps become more discouraged by the
difficulty of the course) than students with more prior preparation (Hopkins 2012; Steinberg
2009; Duffett and Farkas 2009). Open access could also adversely affect more prepared students
through negative peer effects or through teachers removing content and slowing the pace of
course delivery.
III. AP Science Impact Study
A. Overview
We recruited 23 schools from across the United States and offered monetary compensation to
pay for equipment and teacher training and as an incentive to secure participation.7 Eligible
schools included ones that had not offered AP Biology or AP Chemistry in recent years, were
willing to add such a course and comply with study protocol, and had more eligible students than
could be served in one class so as to supply a sufficiently-sized control group.8 Of the 23
schools, 12 schools added AP Chemistry, 10 schools added AP Biology, and 1 school added both
courses. We recruited two waves of schools (those that offered the course for the first time in
2013 and those that offered it for the first time in 2014); both waves were asked to field the
course for two years, and the earlier-joining schools had the option of fielding the course for
three years. The study includes 47 school-by-cohort groups.
Each participating school identified students that the school deemed eligible to take the new
AP Biology or Chemistry course in the spring of the prior year. We treated all eligible students
who assented to participate in the study and who obtained consent from their parent or guardian
as study participants. Upon receipt of signed consent/assent forms, we randomly offered
enrollment in the newly launched course to a subset of participating students.9 The study
includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students
per AP class).
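
The offer lottery described above can be sketched as a simple random draw within one school-by-cohort group. This is only an illustration under stated assumptions: the text does not describe the assignment algorithm, so the fixed number of seats, the single-batch draw, the function name, and the seed are hypothetical.

```python
import random


def assign_offers(student_ids, n_seats, seed=None):
    """Randomly offer enrollment to n_seats of the consenting students.

    Returns a dict mapping each student id to True (offered) or False (control).
    Assumption: one batch per school-by-cohort group, equal odds for everyone.
    """
    ids = list(student_ids)
    rng = random.Random(seed)
    offered = set(rng.sample(ids, n_seats))
    return {sid: (sid in offered) for sid in ids}


# Hypothetical group: 10 consenting students competing for 4 seats.
offers = assign_offers(range(10), n_seats=4, seed=1)
print(sum(offers.values()), "offered;", len(offers) - sum(offers.values()), "control")
```

Drawing within each group, rather than across the pooled sample, is what makes comparisons conditional on school-by-cohort fixed effects the natural way to analyze the resulting data.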
Figure 1 shows the geographic distribution of the 11 participating districts, which are
primarily concentrated in the western, southern, and eastern regions of the country.10 The
underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern
region has experienced less competition over the years in access to selective postsecondary
institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long
2009). Relative to districts across the nation, those participating in the study tend to be in
neighborhoods with lower levels of socioeconomic status and to educate students who score
below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools
tend to be larger and more likely to educate students who are eligible for free or reduced-price
lunch, Black, and Hispanic than other schools (Panel A of Table 1).
There are two reasons for this over-representation of larger schools serving less economically
prosperous communities. First, AP courses are already offered in the majority of the nation's
public high schools, and schools that serve students from high-income families tend to offer
more AP subjects than schools that serve students from lower-income families (Malkus 2016;
Theokas and Saaris 2013). Given that our research design only allowed for schools that had not
recently offered an AP science course, the population of schools from which we recruited tended
to be those in settings with fewer resources. Second, participating schools were required to state
that they believed they would have 60 or more students who were qualified to take the AP
science course, and this requirement tended to disqualify smaller high schools.
Reflecting the school demographics, participating teachers are slightly younger, less
experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity
than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our
study teachers have less than or equal to five (two) years of teaching experience, which is more
than double (triple) the rate of US high school science teachers. Study teachers are more likely to
hold an undergraduate major in a STEM field than other high school science teachers, yet far less
likely to hold a master's degree and slightly less likely to have earned a teaching credential in
science. Most of the participating teachers had previously taught a higher-level course (mostly
honors), yet only 47 percent of them had previously taught an AP course. Our research
consequently applies to a population of teachers who are relatively new to the AP science
curriculum and who have generally not received graduate training.11 Assuming AP courses
improve with teacher preparation, our results likely capture the effect of a less-than-ideal version
of AP and may result in less positive treatment effects than when AP is delivered by teachers
with more training and experience (Clotfelter, Ladd, and Vigdor 2010).
B. Data and Student Descriptive Statistics
We rely on three primary and secondary data sources for impact estimates. The first is an
assessment developed and validated by the research team that measures students' scientific
inquiry skills. We administered this assessment to students in both treatment and control groups
and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific
content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items
that rely on science disciplinary knowledge that is taught in middle school, specifically material
from Life Sciences and Physical Sciences. The assessment, which we administered to all study
participants during a 45-minute period, measures students' skills in data analysis, scientific
explanation, and scientific argument.12 Participating teachers were not provided copies of the
instrument in advance; therefore, teachers were unable to teach any content material prior to test
administration.
The second source is a questionnaire that we administered concurrently with the assessment
and that asks students a number of questions about their most recent science class and their plans
after high school. The assessment and questionnaire were completed together and administered
outside of class (henceforth, we refer to these instruments as the "survey"). The third data source
is students' high school transcripts, which contain data on demographic and socioeconomic
background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high
school completion. We use these data to determine the balance of randomization on
pre-treatment covariates, estimate the effect of randomization on course-taking (including
compliance), improve the precision of our estimates with statistical controls, and estimate
treatment effects on students' grades.
Our survey response rate was 78 percent.¹³ Attrition can be attributed to student absences
during the dates scheduled for survey administration and communication lapses between school
coordinators and students. Students who were randomly assigned to treatment have a
9-percentage-point higher survey response rate. Given the possibility of nonrandom sample
attrition, we weight all regressions by the inverse of the probability of completing the survey
conditional on student characteristics.¹⁴ We implement a variety of robustness checks as
additional means to account for nonresponse. These include multiple imputation of missing
outcome variables, excluding one high school that had a low response rate, and using the Lee
(2009) technique to provide bounds on the estimated effects. These methods and results are
discussed below.
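As an illustration of the inverse-probability weighting described above, the following minimal Python sketch estimates response probabilities within cells and assigns each respondent the inverse of the response rate in that cell. It is a simplified stand-in, not the study's procedure: the text does not specify the response model, and the function `response_weights`, the `roster` data, and the use of a single `offered` cell variable are all hypothetical.

```python
from collections import defaultdict


def response_weights(students, cell_keys):
    """Inverse-probability-of-response weights, estimated within cells.

    students  : list of dicts, each with a boolean "responded" field
    cell_keys : names of the characteristics that define a cell
    Returns one weight per student (None for nonrespondents).
    """
    totals = defaultdict(int)
    responders = defaultdict(int)
    for s in students:
        cell = tuple(s[k] for k in cell_keys)
        totals[cell] += 1
        responders[cell] += int(s["responded"])

    weights = []
    for s in students:
        if not s["responded"]:
            weights.append(None)  # no survey outcome to weight
            continue
        cell = tuple(s[k] for k in cell_keys)
        p_hat = responders[cell] / totals[cell]  # estimated response probability
        weights.append(1.0 / p_hat)
    return weights


# Hypothetical roster: 3 of 4 offered students respond, 2 of 4 control students respond.
roster = (
    [{"offered": 1, "responded": True}] * 3
    + [{"offered": 1, "responded": False}]
    + [{"offered": 0, "responded": True}] * 2
    + [{"offered": 0, "responded": False}] * 2
)
weights = response_weights(roster, cell_keys=["offered"])
```

In this toy roster, respondents in the lower-response control cell receive larger weights, so the weighted respondents again represent all eight students who were randomized.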
We supplement these data with surveys that we administered online to teachers of the new
AP courses at the conclusion of the course. The teacher survey includes questions about their
educational background, professional experiences and professional development, past and
present instructional practices (generally and around science specifically), participation in the
College Board AP training, ability to cover the content of the AP course, and coaching,
mentoring, and other professional community supports received from the school, district, and
education community.
Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the
survey respondents, conditional on school-by-cohort fixed effects.¹⁵ Most of the estimated
differences between treatment and control group students on pre-treatment observed
characteristics are small, with some notable exceptions. In the full and survey samples,
treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher,
respectively, than those of control group students, both with p-values below 0.05. The magnitude
of the treatment-control difference was slightly lower and less precisely estimated in math, yet it
also favored treatment group students.¹⁶ To adjust for these chance imbalances, we include all
student covariates as predictors of outcomes in the models, and in the robustness checks we
exclude these covariates.¹⁷
Table 2 also shows the extent of differences between control group compliers and
non-compliers. We find that non-compliers are generally much more academically prepared for AP
science: they have higher pre-treatment reading and math test scores and are more likely to have
completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian
American and female.¹⁸
IV. Empirical Strategy
We estimate the effect of taking the AP science course with a standard instrumental variable
specification:
(1) $Y_{ij} = \alpha_j + \hat{A}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$
(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$
where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school-by-cohort stratum
$j$; $\hat{A}_{ij}$ is the fitted value of $AP_{ij}$ based on the estimates of the parameters in
Equation (2); $\mathit{Offered}_{ij} = 1$ if the student is randomized into the treatment group;
$\mathbf{X}_i$ is a vector of pre-treatment covariates (including age, math and reading exam
scores from 8th and 10th grade (standardized and averaged for math and reading separately),
cumulative GPA prior to the year when the AP science course was offered, and indicator
variables for female, racial group (Asian American, Black or Hispanic, Native American or
Multiracial), disability, gifted, English Language Learner, eligible for free or reduced-price
lunch, home language is not English, and took recommended prerequisite courses); and
$\alpha_j$ and $\delta_j$ are school-by-cohort fixed effects.¹⁹ We use two-stage least squares to
estimate the model for all outcomes. The local average treatment effect (LATE) estimate is given
by $\beta$.
The intent-to-treat (ITT) estimate is obtained by replacing $\hat{A}_{ij}$ with
$\mathit{Offered}_{ij}$ in Equation (1), as shown in Equation (3). The coefficient on
$\mathit{Offered}_{ij}$ in Equation (3) provides the effect of being offered enrollment in the new
AP science course and is a weighted average of effects on those who do and do not choose to
enroll in the course:
(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
For outcomes that are obtained from the survey, we weight regressions by the inverse of the
estimated probability of completing the survey.²⁰ The results are similar without using these
weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student
characteristics as a result of either missing student transcripts or certain data elements not
collected by the district, we use multiple imputation by chained equations, creating 10 imputed
datasets, and combine the results.²¹ For inference, we cluster standard errors at the level of
treatment assignment (school by cohort) in our analysis of main effects. In the analysis of
robustness, we report permutation standard errors, robust standard errors (for comparison to
permutations), and the statistical significance of the LATE estimates after adjusting our tests of
significance for multiple comparisons.
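To make the link between the ITT and LATE estimates concrete, the sketch below computes the Wald ratio (ITT divided by the first stage), which is what two-stage least squares reduces to with one binary instrument and no covariates or fixed effects. It is an illustration only: the data are synthetic, the complier, always-taker, and never-taker shares are chosen to mimic a 39-percentage-point first stage, the 0.23 effect is imposed by construction, and the function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def wald_estimates(rows):
    """rows: list of (offered, took_ap, outcome) tuples.

    Returns (ITT, first stage, LATE), where LATE = ITT / first stage.
    """
    y1 = [y for z, d, y in rows if z == 1]
    y0 = [y for z, d, y in rows if z == 0]
    d1 = [d for z, d, y in rows if z == 1]
    d0 = [d for z, d, y in rows if z == 0]
    itt = mean(y1) - mean(y0)            # effect of the offer on the outcome
    first_stage = mean(d1) - mean(d0)    # effect of the offer on AP enrollment
    return itt, first_stage, itt / first_stage


def synthetic_sample(effect=0.23):
    """Hypothetical sample: 100 students per arm, fixed compliance types."""
    # (type, count per arm, baseline outcome); all values are made up.
    types = [("always", 19, 0.5), ("never", 42, -0.2), ("complier", 39, 0.0)]
    rows = []
    for offered in (0, 1):
        for name, count, baseline in types:
            took_ap = 1 if name == "always" or (name == "complier" and offered) else 0
            rows += [(offered, took_ap, baseline + effect * took_ap)] * count
    return rows


itt, first_stage, late = wald_estimates(synthetic_sample())
```

Because only compliers change their enrollment in response to the offer, the ITT in this sample is the complier effect scaled down by the complier share, and dividing by the first stage recovers the imposed effect. The estimates in this paper additionally include covariates, fixed effects, and weights, so the ratio of the reported ITT and first-stage coefficients should be read as approximately, not exactly, the reported LATE.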
V. Results
A. Course-Taking and Treatment Contrast
Table 3 provides estimated effects of the randomized offer of enrollment on AP science course
enrollment and share of credits in all courses for the full sample and the survey sample. The
first-stage estimates indicate that the offer substantially increased the likelihood of the student
taking the AP science course, by 38 percentage points in the full sample and 39 percentage points
in the survey sample. As we expected, compliance with randomization was imperfect, with 42
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling. Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course, while a few of these
came from hardship exemptions that were requested by the school and granted by the study team.
The remaining rows in Table 3 shed light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage-point increase in the share of total credits in the full sample).
Treatment group students’ share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).²²
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%),
and Anatomy/Physiology (9%).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our
results also suggest that AP course-takers’ classrooms were more likely to use technology (up
0.28 sd, p-value = 0.14).²³ Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that, while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than in other science classes.²⁴ This contrast between students’ reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.²⁵
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent; these results are again more suggestive than definitive at
traditional levels of statistical inference (p-value = 0.16).
Table 5 provides stronger evidence of negative treatment effects on students’ confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology by signing the
study assent forms. Taking AP science substantially lowered participants’ likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students’
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students’ self-reported confidence and stress levels. We find that taking AP science increases
students’ likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students’ grades suffered. We estimate that taking AP science reduced students’ grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior
year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students’ grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students’ share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
With our estimates in hand, we can easily compute the adjustment that would leave the
student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to
be 14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).²⁷
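The arithmetic in the two preceding paragraphs can be retraced from the rounded figures quoted in the text. The short Python sketch below is only an illustration with hypothetical variable names. Because it starts from rounded inputs, it returns a boost of about 1.47 rather than the 1.46 reported above, which presumably reflects unrounded estimates.

```python
# Rounded point estimates quoted in the text (Table 5 and Table 3).
science_grade_effect = -0.29   # effect on grades in science courses
other_grade_effect = -0.18     # effect on grades in non-science courses
science_credit_share = 0.24    # share of credits in science when taking AP science

# Credit-weighted average effect on overall grades during the study year.
overall_effect = (science_grade_effect * science_credit_share
                  + other_grade_effect * (1 - science_credit_share))

# Share of classes in any AP science subject for students induced to take AP.
ap_science_share = 0.02 + 0.12

# Grade boost on AP science courses that would exactly offset the overall decline.
offsetting_boost = -overall_effect / ap_science_share

print(round(overall_effect, 2))    # -0.21
print(round(offsetting_boost, 2))  # 1.47
```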
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.²⁹
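The logic of the permutation test in Column (4) can be sketched as follows. This is a simplified illustration rather than the study's implementation: it uses a raw difference in means instead of the regression-adjusted ITT estimate, it shuffles assignment across the whole sample rather than within school-by-cohort strata, and the function names and seed are hypothetical.

```python
import random


def diff_in_means(outcomes, treated):
    t = [y for y, z in zip(outcomes, treated) if z]
    c = [y for y, z in zip(outcomes, treated) if not z]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-treatment effects whose absolute value exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    pseudo = list(treated)  # shuffled copy keeps the number of treated units fixed
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pseudo)
        if abs(diff_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_perm
```

Calling `permutation_p_value(outcomes, treated)` with a list of outcomes and a matching list of 0/1 assignment indicators returns the permutation p-value for the difference in means.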
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all
of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on
scientific inquiry skills and smaller estimated negative effects on grades. The differences in the
treatment effects on the remaining three outcomes are modest. These results likely reflect the
fact that students who were randomly assigned into the treatment group have higher
pre-treatment grades and reading and math test scores, all covariates that strongly correlate with
science skill and future grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower
and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.³¹
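The trimming step behind the Lee (2009) bounds can be illustrated with a minimal Python sketch. It assumes, as in this study, that the treatment group has the higher response rate, and it compares simple means without covariates, weights, or fixed effects. The function name and inputs are hypothetical, and the Imbens and Manski (2004) confidence interval is not implemented here.

```python
def mean(xs):
    return sum(xs) / len(xs)


def lee_bounds(treat_outcomes, control_outcomes, n_treat, n_control):
    """Lower and upper trimming bounds on the treatment effect.

    treat_outcomes, control_outcomes : outcomes observed for survey respondents
    n_treat, n_control               : number of students randomized to each arm
    """
    r_t = len(treat_outcomes) / n_treat
    r_c = len(control_outcomes) / n_control
    if r_t < r_c:
        raise ValueError("sketch assumes the treatment group responds at the higher rate")

    # Share of treated respondents to drop so both arms have equal response rates.
    trim_share = (r_t - r_c) / r_t
    n_trim = int(round(trim_share * len(treat_outcomes)))

    ordered = sorted(treat_outcomes)
    keep = len(ordered) - n_trim
    control_mean = mean(control_outcomes)
    lower = mean(ordered[:keep]) - control_mean    # drop the highest treated outcomes
    upper = mean(ordered[n_trim:]) - control_mean  # drop the lowest treated outcomes
    return lower, upper
```

For example, if all 10 treated students respond with outcomes 1 through 10 and 8 of 10 control students respond with outcomes 1 through 8, the sketch trims two treated observations and returns bounds of 0.0 and 2.0.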
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression conditional on $\mathbf{X}_i$ with
science skill as the outcome. We find that the point estimates at every quantile are not
significantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent
confidence intervals fail to rule out large positives and negatives. Additional heterogeneity
results can be found in the Online Appendix.³²
VI. Conclusion
Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of US high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities, disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.³³ We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult, given that few schools document per-course
expenditures. One recent analysis of a US district that relied on teacher salaries and course
assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance on college
admissions decisions, as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: US Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-
performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-
olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving
Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education, Research &
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” US News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, G., and F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.”
Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering
Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in
Higher Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018: Public High School Graduation Rates.” (NCES 2018-144).
US Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in US High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.”
Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov Lazar and Jihyun Lee 2014 ldquoOverconfidence Across World Regionsrdquo Journal of
Cross-Cultural Psychology 45 (5) 821ndash837
Steinberg Jacques 2009 ldquoMany Teachers in Advanced Placement Voice Concern at its Rapid
Growthrdquo The New York Times April 29
Tai Robert H 2008 ldquoPosing Tougher Questions about the Advanced Placement Programrdquo
Liberal Education 94 (3) 38ndash43
The IB Programme 2016 ldquoThe IB Diploma Programme Statistical Bulletinrdquo
Theokas Christina and Reid Saaris 2013 ldquoFinding Americarsquos Missing AP and IB Studentsrdquo
Education Trust June 5
Thomas Nina Stephanie Marken Lucinda Gray and Laurie Lewis 2013 ldquoDual Credit and
Exam-Based Courses in US Public High Schools 2010-11 First Lookrdquo (NCES 2013-
001) US Department of Education Washington DC National Center for Education
Statistics
Tierney John 2012 ldquoAP Classes are a Scamrdquo Atlantic October 13
Tucker Jill 2012 ldquoStressful AP Courses A Push for a Caprdquo SF Gate
US Department of Education 2014 ldquoEducation Department Awards 40 States DC and the
Virgin Islands $284 Million in Grants to Help Low-Income Students Take Advanced
Placement Testsrdquo Washington DC
Weinstein Paul 2016 ldquoDiminishing Credit How Colleges and Universities Restrict the Use of
Advanced Placementrdquo Progressive Policy Institute Washington DC
West Martin R Matthew A Kraft Amy S Finn Rebecca E Martin Angela L Duckworth
Christopher FO Gabrieli and John DE Gabrieli 2016 ldquoPromise and Paradox Measuring
Studentsrsquo Non-Cognitive Skills and the Impact of Schoolingrdquo Educational Evaluation
and Policy Analysis 38 (1) 148ndash170
Yerkes Robert M and John D Dodson 1908 ldquoThe Relation of Strength of Stimulus to Rapidity
of Habit-Formationrdquo Journal of Comparative Neurology 18 (5) 459ndash482
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                        Participating   Others
Average Enrollment                                              1,409      723
Free or Reduced-Price Lunch                                     0.700    0.438
Asian                                                           0.055    0.050
Black                                                           0.349    0.154
Hispanic                                                        0.410    0.221
White                                                           0.164    0.537
Adjusted Cohort Graduation Rate                                 0.843    0.802
District Instruction Expenditures Per Pupil                    $6,561   $5,636
District Student Services Expenditures Per Pupil               $3,787   $3,385

Panel B: Teachers                                       Participating   Others
Age: Under 30                                                   0.407    0.160
Age: 30-49                                                      0.432    0.553
Age: 50 or over                                                 0.161    0.287
Female                                                          0.630    0.536
Hispanic or Latino                                              0.111    0.051
Race: American Indian or Alaska Native                          0.000    0.009
Race: Asian American                                            0.111    0.041
Race: Black                                                     0.111    0.060
Race: Native Hawaiian or other Pacific Islander                 0.000    0.004
Race: White                                                     0.778    0.896
Years of Experience                                              10.3     13.2
Years of Experience <= 2                                        0.290    0.085
Years of Experience <= 5                                        0.481    0.234
Hold a Teaching Certificate                                     0.926    0.945
Undergraduate Major in STEM                                     0.944    0.747
Single Subject Credential in Science                            0.630    0.823
Master’s Degree or Higher                                       0.356    0.615
Previously Taught AP Course                                     0.469       NA
Previously Taught AP, IB, or Honors Course                      0.796       NA
Number of Professional Development Trainings
  in the Past 5 years (0-5)                                      3.09       NA
Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). Others in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). Others in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                        (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                    16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                          (0.02)   (0.07)            (0.03)   (0.09)
                                                          [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                    0.38     0.08     0.25     0.44     0.07     0.30
                                                          (0.04)   (0.10)            (0.05)   (0.16)
                                                          [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                                 0.29     0.10     0.18     0.36     0.09     0.17
                                                          (0.03)   (0.12)            (0.04)   (0.17)
                                                          [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                             3.16     0.05     0.20     3.23     0.06     0.13
                                                          (0.03)   (0.08)            (0.03)   (0.10)
                                                          [0.14]   [0.02]            [0.06]   [0.20]
Female                                             0.59     0.00     0.10     0.61    -0.01     0.11
                                                          (0.03)   (0.06)            (0.04)   (0.07)
                                                          [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                     0.12     0.02     0.10     0.12     0.03     0.10
                                                          (0.02)   (0.05)            (0.01)   (0.07)
                                                          [0.27]   [0.06]            [0.07]   [0.12]
Black                                              0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                          (0.02)   (0.06)            (0.02)   (0.05)
                                                          [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial          0.31     0.01     0.05     0.33     0.01     0.05
                                                          (0.02)   (0.06)            (0.02)   (0.07)
                                                          [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                           0.02     0.00    -0.01     0.01     0.00    -0.01
                                                          (0.01)   (0.01)            (0.01)   (0.01)
                                                          [0.93]   [0.24]            [0.57]    [0.5]
Gifted                                             0.13     0.03     0.00     0.14     0.02     0.01
                                                          (0.02)   (0.05)            (0.02)   (0.09)
                                                          [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                           0.05     0.01     0.02     0.04     0.01     0.04
                                                          (0.01)   (0.02)            (0.01)   (0.03)
                                                          [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch           0.53     0.01     0.02     0.51     0.01     0.07
                                                          (0.02)   (0.07)            (0.03)   (0.09)
                                                          [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home         0.34     0.02     0.03     0.35     0.01     0.04
                                                          (0.02)   (0.07)            (0.02)   (0.07)
                                                          [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses              0.79     0.00     0.09     0.79     0.02     0.05
                                                          (0.02)   (0.04)            (0.02)   (0.05)
                                                          [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                            1,819                      1,417
Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment
Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4): Control Group Mean; (2), (5): ITT; (3), (6): LATE

Outcome                                   (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment           0.19     0.38              0.24     0.39
                                                (0.05)                     (0.06)
                                                [0.00]                     [0.00]
Share of Credits During Study Year in:
AP Science                               0.03     0.04     0.11     0.03     0.04     0.10
                                                (0.01)   (0.01)            (0.01)   (0.01)
                                                [0.00]   [0.00]            [0.00]   [0.00]
All AP                                   0.13     0.04     0.11     0.14     0.04     0.10
                                                (0.01)   (0.02)            (0.01)   (0.02)
                                                [0.00]   [0.00]            [0.00]   [0.00]
Other Advanced Science                   0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                (0.01)   (0.02)            (0.01)   (0.02)
                                                [0.24]   [0.23]            [0.20]   [0.20]
All Other Advanced                       0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                                (0.01)   (0.02)            (0.01)   (0.03)
                                                [0.23]   [0.23]            [0.30]   [0.30]
Regular Science                          0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                (0.01)   (0.02)            (0.01)   (0.02)
                                                [0.24]   [0.20]            [0.24]   [0.19]
All Regular                              0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                                (0.01)   (0.03)            (0.01)   (0.03)
                                                [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                  1,819                      1,417
Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
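As a rough arithmetic check on how the ITT and LATE columns relate, the sketch below assumes the textbook result that, with a single binary instrument, the instrumental-variables estimate equals the ITT effect on the outcome divided by the ITT effect on take-up (the first row of the table). It ignores covariates, weights, and rounding in the published figures, so it approximates rather than reproduces the estimation in Equations (1) and (2). The function name `wald_late` is illustrative, and the inputs are the table entries read as 0.38, 0.39, and 0.04.

```python
def wald_late(itt_effect: float, first_stage: float) -> float:
    """Ratio of the ITT effect on the outcome to the ITT effect on take-up.

    Assumption: a single binary instrument (the randomized offer), so the
    IV estimate reduces to this ratio. Covariates and weights are ignored.
    """
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage


# First-stage ITT effects on AP treatment course enrollment (Table 3).
first_stage_full = 0.38
first_stage_survey = 0.39

# ITT effect on the share of study-year credits taken in AP science (Table 3).
itt_ap_science = 0.04

late_full = wald_late(itt_ap_science, first_stage_full)
late_survey = wald_late(itt_ap_science, first_stage_survey)

print(f"Full sample:        {late_full:.2f}")
print(f"Survey respondents: {late_survey:.2f}")
```

Because the published ITT estimates are rounded to two decimals, ratios computed this way can differ from the reported LATE by about 0.01 in other rows.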
Table 4
Treatment Contrast (Composite Variables)
(1): Control Group Complier Mean; (2): ITT; (3): LATE

Outcome                                               (1)      (2)      (3)
Academically Challenging Curriculum                 -0.33     0.31     0.80
                                                            (0.10)   (0.24)
                                                            [0.00]   [0.00]
Project-Based Independent Classroom Activities      -0.06     0.13     0.33
                                                            (0.07)   (0.17)
                                                            [0.07]   [0.06]
Integrated Use of Technology                        -0.11     0.11     0.28
                                                            (0.08)   (0.19)
                                                            [0.19]   [0.14]
Number of Observations                              1,417
Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
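The composite construction described in these notes can be sketched as follows. This is a minimal illustration, not the authors' code: the five-category scale, the student responses, and the function names are hypothetical, and it assumes that component values are averaged within student first and then standardized across students using the population standard deviation (the notes do not specify either detail).

```python
from statistics import mean, pstdev

# Hypothetical five-point scale, ordered from lowest to highest category.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map a category to [0, 1]: lowest = 0.0, highest = 1.0, evenly spaced."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled components, then standardize.

    Assumption: standardization uses the mean and population SD of the
    student-level averages, giving a composite with mean 0 and SD 1.
    """
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    center, spread = mean(raw), pstdev(raw)
    if spread == 0:
        raise ValueError("composite has no variation to standardize")
    return [(value - center) / spread for value in raw]


# Hypothetical responses: three students, two component items each.
students = [
    ["agree", "strongly agree"],
    ["neutral", "disagree"],
    ["strongly disagree", "neutral"],
]
for student, score in zip(students, composite_scores(students)):
    print(student, round(score, 2))
```

Under this construction a composite of 0 marks the sample average, which is why the control group complier means in column (1) can be negative.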
Table 5
AP Course Impact on Science Skill STEM Interest Confidence Stress and Grades
(1): Control Group Complier Mean; (2): ITT; (3): LATE

Outcome                               (1)      (2)      (3)
Science Skill                       -0.19     0.09     0.23
                                            (0.06)   (0.16)
                                            [0.15]   [0.14]
STEM Interest                        0.62     0.04     0.09
                                            (0.02)   (0.07)
                                            [0.16]   [0.16]
Confidence in College Science        0.92    -0.04    -0.10
                                            (0.02)   (0.05)
                                            [0.11]   [0.06]
Stress                               0.12     0.07     0.17
                                            (0.03)   (0.07)
                                            [0.02]   [0.01]
Grades in Science Courses            2.80    -0.12    -0.29
                                            (0.07)   (0.16)
                                            [0.08]   [0.07]
Grades in Other Courses              3.14    -0.07    -0.18
                                            (0.02)   (0.06)
                                            [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes
Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
Table 6
Robustness Checks of Main ITT Results
(1): Control Group Complier Mean
(2): Main Results
(3): Robust SE
(4): p-value (permutation test)
(5): Excluding High School 56
(6): Including Imputation of Missing Outcome Variables
(7): Excluding Covariates
(8): Excluding High School 23
(9): Lee Lower Bound
(10): Lee Upper Bound
(11): 95% Confidence Interval from Lee Bounds
(12): Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                          (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)            (11)  (12)
Science Skill                  -0.19    0.09                    0.10    0.11    0.20    0.07    0.03    0.39   [-0.09, 0.51]   2.0
                                      (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                      [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                   0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12   [-0.03, 0.18]   1.9
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                      [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science   0.92   -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06    0.05   [-0.09, 0.10]   2.0
                                      (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                      [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                          0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11   [-0.05, 0.15]   1.6
                                      (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                      [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses       2.80   -0.12                   -0.06   -0.10   -0.07   Not applicable, as grades are based on
                                      (0.07)  (0.04)          (0.00)  (0.00)  (0.00)   transcripts rather than student survey
                                      [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses         3.14   -0.07                   -0.07   -0.06   -0.03   Not applicable, as grades are based on
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)   transcripts rather than student survey
                                      [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (X_i)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to
derive confidence interval for the treatment effect itself).
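The permutation test described for column (4) can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' procedure: the treatment effect is a plain difference in means rather than the regression in Equation (1), labels are shuffled across the whole sample rather than within School x Cohort cells, and the data and function names are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-treatment draws whose absolute effect exceeds the
    absolute value of the observed effect."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy, so the caller's assignment is untouched
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign a pseudo treatment at random
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm


# Hypothetical data: 20 treated and 20 control students.
data_rng = random.Random(1)
treated = [1] * 20 + [0] * 20
outcomes = [data_rng.gauss(0.3 * d, 1.0) for d in treated]

print(permutation_p_value(outcomes, treated))
```

Shuffling the labels keeps the number of treated observations fixed, so each pseudo assignment has the same group sizes as the actual one.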
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students’
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry,
reasoning, and communication skills” (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students’ perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students’ confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends, to some extent, on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course and classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools, we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master’s degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members’
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study’s Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools’ needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups:
School-Chemistry and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i \rho + \epsilon_{ij})$,
where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any
part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the
previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative
normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as
$1/\Phi(\hat{\mu}_j + X_i \hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90% of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative, inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
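The weighting step described in note 20 can be sketched as follows. The sketch covers only the last step, turning an already fitted probit index into an inverse probability weight; it does not estimate the probit, and the index values and function names are hypothetical rather than taken from the study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_probability_weight(index: float) -> float:
    """Weight for a survey completer: 1 / Phi(fitted probit index).

    `index` stands for the fitted value of the school-by-cohort fixed
    effect plus the covariate term for one student.
    """
    return 1.0 / normal_cdf(index)


# Hypothetical fitted indices for four students who completed the survey.
fitted_indices = [1.5, 0.8, 0.0, -1.0]
for z in fitted_indices:
    p = normal_cdf(z)
    w = inverse_probability_weight(z)
    print(f"index={z:+.1f}  Pr(complete)={p:.3f}  weight={w:.2f}")
```

A completer whose characteristics predict a low chance of completing the survey receives a large weight, so the weighted sample more closely resembles the full set of study participants on observed characteristics.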
who assented to participate in the study and who obtained consent from their parent or guardian
as study participants. Upon receipt of signed consent/assent forms, we randomly offered
enrollment in the newly launched course to a subset of participating students.⁹ The study
includes a total of 27 teachers and 1,819 students (with an average of approximately 19 students
per AP class).
Figure 1 shows the geographic distribution of the 11 participating districts, which are
primarily concentrated in the western, southern, and eastern regions of the country.¹⁰ The
underrepresentation of districts in the Midwest is consistent with evidence that the Midwestern
region has experienced less competition over the years in access to selective postsecondary
institutions and a corresponding lag in AP participation rates (Bound, Hershbein, and Long
2009). Relative to districts across the nation, those participating in the study tend to be in
neighborhoods with lower levels of socioeconomic status and to educate students who score
below average on tests in earlier grades (see Figure 2). Correspondingly, participating schools
tend to be larger and more likely to educate students who are eligible for free or reduced-price
lunch, Black, and Hispanic than other schools (Panel A of Table 1).
There are two reasons for this over-representation of larger schools serving less economically
prosperous communities. First, AP courses are already offered in the majority of the nation’s
public high schools, and schools that serve students from high-income families tend to offer
more AP subjects than schools that serve students from lower-income families (Malkus 2016;
Theokas and Saaris 2013). Given that our research design only allowed for schools that had not
recently offered an AP science course, the population of schools from which we recruited tended
to be those in settings with fewer resources. Second, participating schools were required to state
that they believed they would have 60 or more students who were qualified to take the AP
science course, and this requirement tended to disqualify smaller high schools.
Reflecting the school demographics, participating teachers are slightly younger, less
experienced, and more likely to be female, Black, Asian American, and of Hispanic ethnicity
than US high school science teachers generally (Panel B of Table 1). Nearly half (a third) of our
study teachers have five (two) or fewer years of teaching experience, which is more
than double (triple) the rate of US high school science teachers. Study teachers are more likely to
hold an undergraduate major in a STEM field than other high school science teachers, yet far less
likely to hold a master’s degree and slightly less likely to have earned a teaching credential in
science. Most of the participating teachers had previously taught a higher-level course (mostly
honors), yet only 47 percent of them had previously taught an AP course. Our research
consequently applies to a population of teachers who are relatively new to the AP science
curriculum and who have generally not received graduate training.¹¹ Assuming AP courses
improve with teacher preparation, our results likely capture the effect of a less-than-ideal version
of AP and may result in less positive treatment effects than when AP is delivered by teachers
with more training and experience (Clotfelter, Ladd, and Vigdor 2010).
B. Data and Student Descriptive Statistics
We rely on three primary and secondary data sources for impact estimates. The first is an
assessment developed and validated by the research team that measures students’ scientific
inquiry skills. We administered this assessment to students in both treatment and control groups
and designed it to measure general inquiry skills (e.g., how to analyze data) rather than specific
content knowledge in Biology or Chemistry. To that end, the assessment tool includes nine items
that rely on science disciplinary knowledge that is taught in middle school, specifically material
from Life Sciences and Physical Sciences. The assessment, which we administered to all study
participants during a 45-minute period, measures students’ skills in data analysis, scientific
explanation, and scientific argument.[12] Participating teachers were not provided copies of the
instrument in advance; therefore, teachers were unable to teach any content material prior to test
administration.
The second source is a questionnaire that we administered concurrently with the assessment
and that asks students a number of questions about their most recent science class and their plans
after high school. The assessment and questionnaire were completed together and administered
outside of class (henceforth, we refer to these instruments as the “survey”). The third data source
is students’ high school transcripts, which contain data on demographic and socioeconomic
background, grades, courses, standardized exams taken in the 8th and 10th grades, as well as high
school completion. We use these data to determine the balance of randomization on pre-treatment
covariates, estimate the effect of randomization on course-taking (including
compliance), improve the precision of our estimates with statistical controls, and estimate
treatment effects on students’ grades.
Our survey response rate was 78 percent.[13] Attrition can be attributed to student absences
during the dates scheduled for survey administration and communication lapses between school
coordinators and students. Students who were randomly assigned to treatment have a
9-percentage-point higher survey response rate. Given the possibility of nonrandom sample
attrition, we weight all regressions by the inverse of the probability of completing the survey,
conditional on student characteristics.[14] We implement a variety of robustness checks as
additional means to account for nonresponse. These include multiple imputation of missing
outcome variables, excluding one high school that had a low response rate, and using the Lee
(2009) technique to provide bounds on the estimated effects. These methods and results are
discussed below.
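The weighting idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the response probability is estimated as the response rate within coarse cells of student characteristics; the text does not specify the estimator used, and the function name, cell labels, and toy data below are hypothetical.

```python
import numpy as np

def inverse_probability_weights(responded, cells):
    """Weight each respondent by 1 / (response rate in their cell).

    Assumption: response probability is approximated by the cell-level
    response rate. Nonrespondents receive a weight of zero.
    """
    responded = np.asarray(responded, dtype=bool)
    cells = np.asarray(cells)
    weights = np.zeros(responded.shape, dtype=float)
    for c in np.unique(cells):
        in_cell = cells == c
        rate = responded[in_cell].mean()
        if rate > 0:
            weights[in_cell & responded] = 1.0 / rate
    return weights

# Hypothetical toy data: the "low" cell responds at 50%, the "high" cell at 100%.
responded = np.array([1, 0, 1, 0, 1, 1], dtype=bool)
cells = np.array(["low", "low", "low", "low", "high", "high"])
outcome = np.array([1.0, np.nan, 3.0, np.nan, 10.0, 12.0])

w = inverse_probability_weights(responded, cells)
weighted_mean = np.sum(w[responded] * outcome[responded]) / np.sum(w[responded])
unweighted_mean = outcome[responded].mean()
print(weighted_mean, unweighted_mean)
```

In this toy example the weighted mean pulls the estimate back toward the under-responding cell, which is the purpose of the adjustment.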
We supplement these data with surveys that we administered online to teachers of the new
AP courses at the conclusion of the course. The teacher survey includes questions about their
educational background, professional experiences, and professional development; past and
present instructional practices, generally and around science specifically; participation in the
College Board AP training; ability to cover the content of the AP course; and coaching,
mentoring, and other professional community supports received from the school, district, and
education community.
Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the
survey respondents, conditional on school by cohort fixed effects.[15] Most of the estimated
differences between treatment and control group students on pre-treatment observed
characteristics are small, with some notable exceptions. In the full and survey samples,
treatment group students’ reading exam scores were 0.10 and 0.09 standard deviations higher,
respectively, than control group students’, both at p-values below 0.05. The magnitude of the
treatment-control difference was slightly lower and less precisely estimated in math, yet also
favored treatment group students.[16] To adjust for these chance imbalances, we include all
student covariates as predictors of outcomes in the models, and in the robustness checks we
exclude these covariates.[17]
Table 2 also shows the extent of differences between control group compliers and non-compliers.
We find that non-compliers are generally much more academically prepared for AP
science: they have higher pre-treatment reading and math test scores and are more likely to have
completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian
American and female.[18]
IV. Empirical Strategy
We estimate the effect of taking the AP science course with a standard instrumental variable
specification:
(1) $Y_{ij} = \alpha_j + \widehat{A}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$
(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$
where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$; $\widehat{A}_{ij}$ is
the fitted value based on the estimates of the parameters in Equation (2); $\mathit{Offered}_{ij} = 1$ if the
student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates
(including age; math and reading exam scores from 8th and 10th grade (standardized and
averaged for math and reading separately); cumulative GPA prior to the year when the AP
science course was offered; and indicator variables for female, racial group (Asian American,
Black or Hispanic, Native American or Multiracial), disability, gifted, English Language
Learner, eligible for free or reduced-price lunch, home language is not English, and took
recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.[19] We use
two-stage least squares to estimate the model for all outcomes. The local average treatment effect
(LATE) estimate is given by $\beta$.
The intent to treat (ITT) estimate is obtained by replacing $\widehat{A}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1),
as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of
being offered enrollment in the new AP science course and is a weighted average of effects on
those who do and do not choose to enroll in the course.
(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
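As a rough illustration of how Equations (1) through (3) fit together, the sketch below estimates the first stage, the LATE, and the ITT on simulated data. Everything in it is hypothetical: the data-generating process, the single covariate, the compliance shares, and the function names are assumptions made for the example, and it omits the survey weights, multiple imputation, and clustered standard errors used in the actual analysis.

```python
import numpy as np

def ols(X, y):
    """Return least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate(n_strata=20, n_per=500, beta=0.25, seed=0):
    """Simulated experiment with imperfect compliance (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    n = n_strata * n_per
    stratum = np.repeat(np.arange(n_strata), n_per)
    x = rng.normal(size=n)                                # one pre-treatment covariate
    offered = rng.integers(0, 2, size=n).astype(float)    # randomized offer
    u = rng.random(n)
    always = u < 0.2                                      # enroll regardless of the offer
    complier = (u >= 0.2) & (u < 0.6)                     # enroll only if offered
    ap = (always | (complier & (offered == 1))).astype(float)
    alpha = rng.normal(size=n_strata)[stratum]            # school-by-cohort effects
    # Always-takers get a bonus unrelated to AP, so naive OLS on `ap` would be biased.
    y = alpha + beta * ap + 0.5 * x + 0.5 * always + rng.normal(scale=0.5, size=n)
    dummies = (stratum[:, None] == np.arange(n_strata)).astype(float)
    return y, ap, offered, x, dummies

def estimate(y, ap, offered, x, dummies):
    controls = np.column_stack([x, dummies])              # X_i plus stratum fixed effects
    Z = np.column_stack([offered, controls])
    first = ols(Z, ap)                                    # Equation (2): first stage
    ap_hat = Z @ first                                    # fitted enrollment
    late = ols(np.column_stack([ap_hat, controls]), y)[0] # Equation (1): second stage
    itt = ols(Z, y)[0]                                    # Equation (3): reduced form
    return {"first_stage": first[0], "late": late, "itt": itt}

results = estimate(*simulate())
print(results)
```

With a single instrument and the same controls in every equation, the 2SLS coefficient equals the ITT estimate divided by the first-stage coefficient, which is why a modest ITT can correspond to a larger LATE when compliance is imperfect.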
For outcomes that are obtained from the survey, we weight regressions by the inverse of the
estimated probability of completing the survey.[20] The results are similar without using these
weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student
characteristics as a result of either missing student transcripts or certain data elements not
collected by the district, we use multiple imputation by chained equations, creating 10 imputed
datasets, and combine the results.[21] For inference, we cluster standard errors at the level of
treatment assignment (school by cohort) in our analysis of main effects. In the analysis of
robustness, we report permutation standard errors, robust standard errors (for comparison to
permutations), and the statistical significance of the LATE estimates after adjusting our tests of
significance for multiple comparisons.
V. Results
A. Course-Taking and Treatment Contrast
Table 3 provides estimated effects of the randomized offer of enrollment on AP science course
enrollment and share of credits in all courses for the full sample and the survey samples. The
first-stage estimates indicate that the offer substantially increased the likelihood of the student
taking the AP science course, by 38 percentage points in the full sample and 39 percentage points
in the survey sample. As we expected, compliance with randomization was imperfect, with 42
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling. Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course, while a few of these
came from hardship exemptions that were requested by the school and granted by the study team.
The remaining rows in Table 3 shed light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage-point increase in the share of total credits in the full sample).
Treatment group students’ share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).[22]
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13), Physics (12), AP Environmental Science (11), Biology (10), Honors Biology (9),
and Anatomy/Physiology (9).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 s.d., p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 s.d., p-value = 0.06). Our
results also suggest that AP course-takers’ classrooms were more likely to use technology (up
0.28 s.d., p-value = 0.14).[23] Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than other science classes.[24] This contrast between students’ reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.[25]
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with, again, more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
Table 5 provides stronger evidence of negative treatment effects on students’ confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology by signing the
study assent forms. Taking AP science substantially lowered participants’ likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students’
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students’ self-reported confidence and stress levels. We find that taking AP science increases
students’ likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students’ grades suffered. We estimate that taking AP science reduced students’ grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior
year) from around a B- to a C+.[26] This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students’ grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students’ share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
With our estimates in hand, we can easily compute the adjustment that would leave the
student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).[27]
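The arithmetic behind the overall grade effect and the offsetting grade weight can be retraced with a few lines of code. The sketch uses only the rounded figures quoted in the text; the variable names are ours. With these rounded inputs the implied boost comes out near 1.47 rather than the reported 1.46, a gap that presumably reflects rounding of the underlying estimates.

```python
# Point estimates as quoted in the text (rounded as reported).
effect_science = -0.29            # effect on grades in science courses
effect_other = -0.18              # effect on grades in non-science courses
share_science = 0.24              # share of credits in science for AP science takers
share_ap_science = 0.02 + 0.12    # predicted share of classes in any AP science subject

# Credit-weighted average effect on overall grades during the study year.
overall = effect_science * share_science + effect_other * (1 - share_science)

# Grade boost on AP science courses that would leave the overall GPA unchanged.
boost = -overall / share_ap_science

print(round(overall, 2))   # -0.21
print(round(boost, 2))     # 1.47
```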
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test, where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).[28] The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.[29]
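A minimal sketch of such a permutation test follows. It makes two simplifying assumptions that are not taken from the text: the pseudo treatment is reshuffled within strata (mirroring randomization within school by cohort), and the effect is a raw difference in means rather than the covariate-adjusted ITT regression. The function names and simulated data are hypothetical.

```python
import numpy as np

def diff_in_means(y, t):
    """Difference in mean outcomes between treated (t == 1) and control (t == 0)."""
    return y[t == 1].mean() - y[t == 0].mean()

def permutation_p_value(y, treated, strata, n_perm=1000, seed=0):
    """Share of permutations whose |pseudo effect| exceeds the |observed effect|."""
    rng = np.random.default_rng(seed)
    observed = abs(diff_in_means(y, treated))
    exceed = 0
    for _ in range(n_perm):
        pseudo = treated.copy()
        for s in np.unique(strata):
            idx = np.flatnonzero(strata == s)
            pseudo[idx] = rng.permutation(treated[idx])   # reshuffle within stratum
        if abs(diff_in_means(y, pseudo)) > observed:
            exceed += 1
    return exceed / n_perm

# Hypothetical data: 10 strata of 40 students, half treated in each stratum.
rng = np.random.default_rng(1)
strata = np.repeat(np.arange(10), 40)
treated = np.tile(np.repeat([0, 1], 20), 10)
noise = rng.normal(size=strata.size)

p_effect = permutation_p_value(noise + 1.0 * treated, treated, strata)  # true effect of 1 s.d.
p_null = permutation_p_value(noise, treated, strata)                    # no true effect
print(p_effect, p_null)
```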
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.[30] Column (8) shows the results when we exclude all
of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower
and upper bound point estimates are positive for science skill (0.03 and 0.39 s.d.), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.[31]
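The trimming step can be sketched as follows. This is a simplified illustration of Lee-style bounds on a difference in means, assuming the treatment group has the higher response rate; it does not include covariates, weights, or the Imbens and Manski (2004) confidence interval, and the function name and toy numbers are hypothetical.

```python
import numpy as np

def lee_bounds(y_treat, y_ctrl, n_treat_assigned, n_ctrl_assigned):
    """Trimming bounds on a difference in means under differential response.

    y_treat, y_ctrl: outcomes for survey respondents only.
    n_*_assigned: number of students originally assigned to each arm.
    """
    y_treat = np.sort(np.asarray(y_treat, dtype=float))
    y_ctrl = np.asarray(y_ctrl, dtype=float)
    # Integer comparison of response rates avoids floating-point edge cases.
    if y_treat.size * n_ctrl_assigned < y_ctrl.size * n_treat_assigned:
        raise ValueError("sketch only handles excess response in the treatment group")
    # Number of treated respondents to keep so that response rates match.
    keep = (y_ctrl.size * n_treat_assigned) // n_ctrl_assigned
    n_trim = y_treat.size - keep
    lower = y_treat[:keep].mean() - y_ctrl.mean()     # trim the highest treated outcomes
    upper = y_treat[n_trim:].mean() - y_ctrl.mean()   # trim the lowest treated outcomes
    return lower, upper

# Hypothetical example: 8 of 10 treated students respond, 6 of 10 controls respond.
lower, upper = lee_bounds(
    y_treat=[1, 2, 3, 4, 5, 6, 7, 8],
    y_ctrl=[2, 3, 4, 5, 6, 7],
    n_treat_assigned=10,
    n_ctrl_assigned=10,
)
print(lower, upper)   # -1.0 1.0
```

In the toy example two treated respondents are trimmed, so the bounds bracket the untrimmed difference in means of zero.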
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression conditional on $\mathbf{X}_i$, with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals
fail to rule out large positives and negatives. Additional heterogeneity results can be found in the
Online Appendix.[32]
VI. Conclusion
Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities, disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.[33] We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult, given that few schools document per-course
expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course
assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: U.S. Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the
Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds
in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving
Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education Research &
Occasional Paper Series: CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished Manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, Guido W., and Charles F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.”
Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein Kristin and Kit Lively 2016 ldquoDo Grade Weights Promote More Advanced
17
Course-Takingrdquo Education Finance and Policy 11 (3) 310ndash324
Klopfenstein Kristin and M Kathleen Thomas 2009 ldquoThe Link Between Advanced Placement
Experience and Early College Successrdquo Southern Economic Journal 75 (3) 873ndash 891
__________ 2010 ldquoAdvanced Placement Participation Evaluating the Policies of States and
Collegesrdquo In AP A Critical Examination of the Advanced Placement Program eds
Philip M Sadler Gerhard Sonnert Robert H Tai and Kristin Klopfenstein 167ndash188
Cambridge Harvard Education Press
Kurth Lori A Charles W Anderson and Annemarie S Palincsar 2002 “The Case of Carla
Dilemmas of Helping All Students to Understand Science” Science Education 86 (3)
287–313
Lee David S 2009 “Training Wages and Sample Selection Estimating Sharp Bounds on
Treatment Effects” The Review of Economic Studies 76 (3) 1071–1102
Leslie Sarah-Jane Andrei Cimpian Meredith Meyer and Edward Freeland 2015 “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines” Science 347
(6219) 262–265
Levine Phillip B and David J Zimmerman 1995 “The Benefit of Additional High school Math
and Science Classes for Young Men and Women” Journal of Business & Economic
Statistics 13 (2) 137–149
Litzler Elizabeth Cate C Samuelson and Julie A Lorah 2014 “Breaking it Down Engineering Student
STEM Confidence at the Intersection of RaceEthnicity and Gender” Research in Higher
Education 55 (8) 810–832
Long Mark C Dylan Conger and Patrice Iatarola 2012 “Effects of High School Course-taking
on Secondary and Postsecondary Success” American Educational Research Journal 49
(2) 285–322
Long Mark C Dylan Conger and Raymond McGhee Jr 2018 “Life on the Frontier of AP
Expansion Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses” Conditionally accepted by Educational
Researcher
Malkus Nat 2016 “AP at Scale Public School Students in Advanced Placement 1990–2013”
American Enterprise Institute Washington DC
Marx Gabby 2014 “Are AP Courses Worth the Stress” WiscNews May 23
McFarland Joel Bill Hussar Xiaolei Wang Jijun Zhang Ke Wang Amy Rathbun Amy
Barmer Emily Forrest Cataldi and Farrah Bullock Mann 2018 “The Condition of
Education 2018 (NCES 2018-144) Public High School Graduation Rates” (NCES 2018-144)
US Department of Education Washington DC National Center for Education
Statistics
National Research Council 2002 “Learning and Understanding Improving Advanced Study of
Mathematics and Science in US High Schools” Washington DC National Academies
Press
__________ 2012 A Framework for K-12 Science Education Practices Crosscutting
Concepts and Core Ideas Washington DC The National Academies Press
Obama White House 2016 “STEM for All” February 11 Washington DC
Reardon Sean F Demetra Kalogrides and Kenneth Shores 2016 “Stanford Education Data
Archive (SEDA) Summary of Data Cleaning Estimation and Scaling Procedures
Version 10 Stanford University
Rose Heather 2004 “Has Curriculum Closed the Test Score Gap in Math” Topics in Economic
Analysis & Policy 4 (1) 1–30
Rose Heather and Julian R Betts 2004 “The Effect of High School Courses on Earnings” The
Review of Economics and Statistics 86 (2) 497–513
Roza Marguerite 2009 “Breaking Down School Budgets” Education Next 9 (3)
Sadler Philip M Gerhard Sonnert Zahra Hazari and Robert H Tai 2014 “The Role of
Advanced High School Coursework in Increasing STEM Career Interest” Science
Educator 23 (1) 1–13
Sadler Philip M and Robert H Tai 2007 “Accounting for Advanced High School Coursework
in College Admission Decisions” College and University 82 (4) 7–14
Seeratan Kavita L Kevin W McElhaney Jessica Mislevy Raymond McGhee Jr Dylan
Conger and Mark C Long 2017 “Measuring Students’ Ability to Engage in Scientific
Inquiry A New Instrument to Assess Data Analysis Explanation and Argumentation”
Educational Measurement Forthcoming
Smith Jonathan Michael Hurwitz and Christopher Avery 2017 “Giving College Credit Where
it is Due Advanced Placement Exam Scores and College Outcomes” Journal of Labor
Economics 35 (1) 67–147
Stankov Lazar 2013 “Noncognitive Predictors of Intelligence and Academic Achievement An
Important Role of Confidence” Personality and Individual Differences 55 (7) 727–732
Stankov Lazar and John D Crawford 1996 “Confidence Judgments in Studies of Individual
Differences” Personality and Individual Differences 21 (6) 971–986
Stankov Lazar and Jihyun Lee 2014 “Overconfidence Across World Regions” Journal of
Cross-Cultural Psychology 45 (5) 821–837
Steinberg Jacques 2009 “Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth” The New York Times April 29
Tai Robert H 2008 “Posing Tougher Questions about the Advanced Placement Program”
Liberal Education 94 (3) 38–43
The IB Programme 2016 “The IB Diploma Programme Statistical Bulletin”
Theokas Christina and Reid Saaris 2013 “Finding America’s Missing AP and IB Students”
Education Trust June 5
Thomas Nina Stephanie Marken Lucinda Gray and Laurie Lewis 2013 “Dual Credit and
Exam-Based Courses in US Public High Schools 2010-11 First Look” (NCES 2013-001)
US Department of Education Washington DC National Center for Education
Statistics
Tierney John 2012 “AP Classes are a Scam” Atlantic October 13
Tucker Jill 2012 “Stressful AP Courses A Push for a Cap” SF Gate
US Department of Education 2014 “Education Department Awards 40 States DC and the
Virgin Islands $284 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests” Washington DC
Weinstein Paul 2016 “Diminishing Credit How Colleges and Universities Restrict the Use of
Advanced Placement” Progressive Policy Institute Washington DC
West Martin R Matthew A Kraft Amy S Finn Rebecca E Martin Angela L Duckworth
Christopher FO Gabrieli and John DE Gabrieli 2016 “Promise and Paradox Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling” Educational Evaluation
and Policy Analysis 38 (1) 148–170
Yerkes Robert M and John D Dodson 1908 “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation” Journal of Comparative Neurology 18 (5) 459–482
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes Data from Reardon Kalogrides and Shores (2016) Each circle represents one school
district in the United States X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood defined as the first principal component factor score based on measures of median
income percent with a bachelor’s degree or higher poverty rate SNAP rate single mother headed
household rate and unemployment rate Y-axis is the district’s average test score in grade
equivalents based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013 with the expected level of achievement standardized to zero The size of each circle
is proportional to the district’s enrollment The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects
Corresponding OLS estimate shown by the dashed horizontal line Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students
Results are weighted by the inverse probability of completing the survey
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School Science Teachers
Panel A Schools Participating Others
Average Enrollment 1409 723
Free or Reduced-Price Lunch 0700 0438
Asian 0055 0050
Black 0349 0154
Hispanic 0410 0221
White 0164 0537
Adjusted Cohort Graduation Rate 0843 0802
District Instruction Expenditures Per Pupil $6561 $5636
District Student Services Expenditures Per Pupil $3787 $3385
Panel B Teachers Participating Others
Age Under 30 0407 0160
Age 30-49 0432 0553
Age 50 or over 0161 0287
Female 0630 0536
Hispanic or Latino 0111 0051
Race American Indian or Alaska Native 0000 0009
Race Asian American 0111 0041
Race Black 0111 0060
Race Native Hawaiian or other Pacific Islander 0000 0004
Race White 0778 0896
Years of Experience 103 132
Years of Experience <=2 0290 0085
Years of Experience <=5 0481 0234
Hold a Teaching Certificate 0926 0945
Undergraduate Major in STEM 0944 0747
Single Subject Credential in Science 0630 0823
Master’s Degree or Higher 0356 0615
Previously Taught AP Course 0469 NA
Previously Taught AP IB or Honors Course 0796 NA
Number of Professional Development Trainings in the Past 5 years (0-5) 309 NA
Notes Panel A source is the 2013-14 Common Core Data https://nces.ed.gov/ccd EDFacts
https://www2.ed.gov/about/inits/ed/edfacts/index.html Others in Panel A refers to other public
high schools in the US Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al 2018) Panel B source is the
Teacher Survey (N= 27) and 2011-12 Schools and Staffing Survey
https://nces.ed.gov/surveys/sass Others in Panel B refers to public and private high school
teachers in the US High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences
Table 2
TreatmentControl Balance and ComplierNon-Complier Differences on Pre-Treatment Characteristics
Columns (1) to (3) Full Sample, columns (4) to (6) Survey Sample
(1) and (4) Control Group Mean
(2) and (5) Difference Between Treated and Controls
(3) and (6) Difference Between Control Group Non-Compliers and Compliers
Pre-Treatment Characteristic (1) (2) (3) (4) (5) (6)
Age as of October of 11th Grade 166 -003 -007 166 -001 -001
(002) (007) (003) (009)
[019] [035] [065] [094]
Math Exam Score 038 008 025 044 007 030
(004) (010) (005) (016)
[008] [002] [017] [006]
Reading Exam Score 029 010 018 036 009 017
(003) (012) (004) (017)
[000] [014] [002] [031]
HS Grade Point Average 316 005 020 323 006 013
(003) (008) (003) (010)
[014] [002] [006] [020]
Female 059 000 010 061 -001 011
(003) (006) (004) (007)
[099] [010] [073] [012]
Asian American 012 002 010 012 003 010
(002) (005) (001) (007)
[027] [006] [007] [012]
Black 032 -002 -006 027 000 -005
(002) (006) (002) (005)
[029] [028] [088] [040]
Hispanic Native American or Multiracial 031 001 005 033 001 005
(002) (006) (002) (007)
[055] [041] [081] [051]
Disabled 002 000 -001 001 000 -001
(001) (001) (001) (001)
[093] [024] [057] [05]
Gifted 013 003 000 014 002 001
(002) (005) (002) (009)
[006] [100] [025] [089]
English Language Learner 005 001 002 004 001 004
(001) (002) (001) (003)
[041] [039] [054] [022]
Eligible for Free or Reduced-Price Lunch 053 001 002 051 001 007
(002) (007) (003) (009)
[066] [077] [072] [045]
Language Other than English Spoken at Home 034 002 003 035 001 004
(002) (007) (002) (007)
[032] [073] [059] [056]
Took Recommended Prerequisite Courses 079 000 009 079 002 005
(002) (004) (002) (005)
[084] [004] [043] [031]
Number of Observations 1819 1417
Notes Differences in columns (2) (3) (5) and (6) are conditional on School x Cohort fixed effects Standard errors clustered by
School x Cohort are in parentheses and p-values are in brackets
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment
Columns (1) to (3) Full Sample, columns (4) to (6) Survey Respondents
(1) and (4) Control Group Mean
(2) and (5) ITT
(3) and (6) LATE
Outcome (1) (2) (3) (4) (5) (6)
AP Treatment Course Enrollment 019 038 024 039
(005) (006)
[000] [000]
Share of Credits During Study Year in
AP Science 003 004 011 003 004 010
(001) (001) (001) (001)
[000] [000] [000] [000]
All AP 013 004 011 014 004 010
(001) (002) (001) (002)
[000] [000] [000] [000]
Other Advanced Science 006 -001 -002 006 -001 -002
(001) (002) (001) (002)
[024] [023] [020] [020]
All Other Advanced 025 -001 -003 025 -001 -003
(001) (002) (001) (003)
[023] [023] [030] [030]
Regular Science 006 -001 -002 006 -001 -002
(001) (002) (001) (002)
[024] [020] [024] [019]
All Regular 062 -003 -009 061 -003 -007
(001) (003) (001) (003)
[002] [000] [007] [003]
Number of Observations 1819 1417
Notes The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2) and subsequent ITT and LATE results are found by estimating variants of Equation
(1) Course-taking information collected from student transcripts Control Group Mean uses the
full control group for the first outcome (ie AP Treatment Course Enrollment) and those control
group members who complied with their assignment (ie those who did not take the AP
Treatment Course) for the subsequent outcomes Results in columns (4) (5) and (6) are
weighted by the inverse probability of completing the survey Standard errors clustered by School
x Cohort are in parentheses and p-values are in brackets
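The relationship between the ITT and LATE columns can be illustrated with a simple Wald ratio: the ITT effect on the outcome divided by the ITT effect on enrollment (the first stage). The sketch below is a simplification and not the estimation used in the study: it omits the pre-treatment covariates, the School x Cohort fixed effects, and the survey weights. The function names `itt` and `wald_late` and the toy data are hypothetical.

```python
from statistics import mean

def itt(outcomes, assigned):
    """Difference in mean outcome between students offered the course and controls."""
    offered = [y for y, z in zip(outcomes, assigned) if z == 1]
    control = [y for y, z in zip(outcomes, assigned) if z == 0]
    return mean(offered) - mean(control)

def wald_late(outcomes, enrolled, assigned):
    """ITT on the outcome scaled by the first stage (ITT on enrollment)."""
    first_stage = itt(enrolled, assigned)
    if first_stage == 0:
        raise ValueError("first stage is zero; the ratio is undefined")
    return itt(outcomes, assigned) / first_stage

# Toy data (hypothetical): four students offered the course, four not offered.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
enrolled = [1, 1, 1, 0, 1, 0, 0, 0]   # imperfect compliance in both groups
outcomes = [2.0, 1.0, 3.0, 0.0, 1.0, 0.0, 1.0, 0.0]

print(itt(outcomes, assigned))                  # ITT on the outcome
print(itt(enrolled, assigned))                  # first stage
print(wald_late(outcomes, enrolled, assigned))  # Wald ratio
```

Because the first stage is below one when compliance is imperfect, the ratio is larger in magnitude than the ITT, which is the pattern visible across the ITT and LATE columns.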
Table 4
Treatment Contrast (Composite Variables)
(1) Control Group Complier Mean
(2) ITT
(3) LATE
Outcome (1) (2) (3)
Academically Challenging Curriculum -033 031 080
(010) (024)
[000] [000]
Project-Based Independent Classroom Activities -006 013 033
(007) (017)
[007] [006]
Integrated Use of Technology -011 011 028
(008) (019)
[019] [014]
Number of Observations 1417
Notes To construct these composite variables we first converted the values on each component
variable (eg strongly agree agree neutral disagree or strongly disagree) so that the highest
category was set to 10 the lowest to 00 and the remaining categories evenly spaced between
00 and 10 We then averaged and standardized these converted values Results are weighted by
the inverse probability of completing the survey Online Appendix Table 5 provides the list of
component variables Standard errors clustered by School x Cohort are in parentheses and p-
values are in brackets
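The composite construction described in these notes can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the endpoints are read as 0 and 1, the five-category scale in `LIKERT` is taken from the example in the notes, the population standard deviation is used for standardizing, and the names `rescale` and `composite_scores` and the sample responses are hypothetical.

```python
from statistics import mean, pstdev

# Ordered from lowest to highest category (assumed five-point scale).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def rescale(response, categories=LIKERT):
    """Map an ordered category to [0, 1]: lowest -> 0, highest -> 1, evenly spaced."""
    return categories.index(response) / (len(categories) - 1)

def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sigma = mean(raw), pstdev(raw)
    return [(x - mu) / sigma for x in raw]

# Hypothetical responses: three students, two component items each.
students = [
    ["strongly agree", "agree"],
    ["neutral", "neutral"],
    ["disagree", "strongly disagree"],
]
scores = composite_scores(students)
print(scores)
```

After standardizing, the composite has mean 0 and standard deviation 1 in the sample used, so ITT and LATE estimates on it read as effects in standard deviation units.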
Table 5
AP Course Impact on Science Skill STEM Interest Confidence Stress and Grades
(1) Control Group Complier Mean
(2) ITT
(3) LATE
Outcome (1) (2) (3)
Science Skill -019 009 023
(006) (016)
[015] [014]
STEM Interest 062 004 009
(002) (007)
[016] [016]
Confidence in College Science 092 -004 -010
(002) (005)
[011] [006]
Stress 012 007 017
(003) (007)
[002] [001]
Grades in Science Courses 280 -012 -029
(007) (016)
[008] [007]
Grades in Other Courses 314 -007 -018
(002) (006)
[000] [000]
Number of Observations 1819 for grades 1417 for other outcomes
Notes Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students STEM interest =1 if high or some interest in pursuing a STEM degree or
=0 if no interest Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course or =0 if somewhat not confident or not at all confident Stress=
1 if most recent science course had strong negative or negative impact on physical or emotional
health or =0 if strong positive impact positive impact or no impact Grades in science and other
courses are obtained from student transcripts and measure grades during the study year
Results with the exception of grades during study year are weighted by the inverse probability of
completing the survey Standard errors clustered by School x Cohort are in parentheses and p-
values are in brackets
Table 6
Robustness Checks of Main ITT Results
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
Column headings in the order printed
Outcome
Control Group Complier Mean
Main Results
Robust SE
p-value (permutation test)
Excluding High School 56
Including Imputation of Missing Outcome Variables
Excluding Covariates
Excluding High School 23
Lee Lower Bound
Lee Upper Bound
95% Confidence Interval from Lee Bounds
Ratio of 95% CI in (11) to 95% CI in (7)
Science Skill -019 009 010 011 020 007 003 039 -009 to 051 20
(006) (005) (000) (000) (000) (000) (007) (007)
[015] [006] [006] [020] [011] [001] [024] [072] [000]
STEM Interest 062 004 005 003 003 003 002 012 -003 to 018 19
(002) (003) (000) (000) (000) (000) (003) (004)
[016] [019] [020] [009] [029] [027] [019] [060] [000]
Confidence in College Science 092 -004 -003 -006 -006 -004 -006 005 -009 to 010 20
(002) (002) (000) (000) (000) (000) (002) (003)
[011] [005] [007] [037] [002] [003] [010] [000] [017]
Stress 012 007 005 006 008 007 001 011 -005 to 015 16
(003) (002) (000) (000) (000) (000) (003) (002)
[002] [000] [000] [014] [007] [002] [002] [079] [000]
Grades in Science Courses 280 -012 -006 -010 -007
(007) (004) (000) (000) (000)
[008] [001] [001] [031] [016] [030]
Grades in Other Courses 314 -007 -007 -006 -003
(002) (003) (000) (000) (000)
[000] [001] [001] [000] [001] [038]
Remaining columns for the two grades rows Not applicable as grades are based on transcripts rather than student survey
Notes Columns (1) and (2) repeat the main results previously shown in Table 5 Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1000 times The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2) Column (5)
reestimates the main results with high school 56 excluded This school offered both AP biology and AP chemistry as part of the
experiment Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed Column (7) reestimates the main results excluding high school 23 which had a low rate of student survey completion and
where surveys from cohort 1 were lost Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1 Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (ie trimming off those
treatment observations with the highestlowest values of the outcome until the survey response rates are equal across treatment and
control groups) Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to
derive confidence interval for the treatment effect itself)
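The permutation test described in these notes can be sketched as below. This is a minimal illustration, not the study's procedure: it uses a raw difference in means in place of the regression-based estimate, and it reshuffles labels across the whole sample instead of respecting the School x Cohort structure. The function names, the seed, and the toy data are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)

def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)          # copy so the caller's list is not shuffled
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)         # randomly reassign the pseudo treatment
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm

# Toy data (hypothetical): a large gap between groups, then no gap at all.
treated = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
big_gap = [10, 11, 12, 13, 14, 0, 1, 2, 3, 4]
print(permutation_p_value(big_gap, treated))

no_gap_treated = [1, 1, 1, 1, 0, 0, 0, 0]
no_gap = [1, 2, 3, 4, 1, 2, 3, 4]
print(permutation_p_value(no_gap, no_gap_treated))
```

A small share indicates that few random reassignments produce an effect as large as the one observed, which is the logic behind the p-values reported in column (4).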
1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the
Virgin Islands in 2014 (US Department of Education 2014)
2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally often without distinctions between AP
and other rigorous course options Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses particularly those in math and science on students’
high school postsecondary and labor market performance (eg Altonji 1995 Attewell and
Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long
Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004)
3 The process by which a course is designated AP involves two steps Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training The Board and other independent agencies offer several workshops with the most
extensive training being the AP summer institute a week-long training that is led by an
experienced AP instructor Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review A team of auditors at the Board review each syllabus and
grant permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission College Board (2017b) contains a
discussion of the annual “AP Course Audit” This review of syllabi is the only mechanism for
assessment (ie course delivery and student performance are not assessed by the Board) In
order to effectively run an AP Biology or Chemistry course teachers require access to a well-
equipped classroom and laboratory including all supplies necessary to engage in
experimentation (eg beakers solutions microscopes measuring equipment) Most of the
teachers in our study reported that their classrooms were well-stocked with supplies
4 In 2012 the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry
reasoning and communication skills” (College Board 2017a) The revisions to science courses
were based upon recommendations from the National Science Foundation the National Research
Council and science educators across the United States (National Research Council 2002)
5 Students’ perception of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (eg
external observations) may be less influenced By increasing the standard to which they compare
themselves students’ confidence may decrease This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr 2015 West et
al 2016) Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors
such as effort and academic motivation
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and
Biology I and Chemistry I for AP Biology with no additional requirements beyond these
prerequisites
7 We offered each participating school $10000 to pay for teacher attendance at a one-week
training course classroom supplies (eg lab materials textbooks) and to compensate schools
for the staff time required for study administration efforts We also offered $1000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consentassent forms and data
8 In our original research design we planned to recruit 40 schools with up to two cohorts of
students which would have powered the study to detect effect sizes smaller than those detected
here We faced several challenges in recruiting schools to participate even with the monetary
incentives Some schools were uncomfortable with randomization across classrooms while
others were simply unable to commit to offering a new course the following school year
9 Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered We also made some assignments on a rolling basis as additional
consentassent forms were submitted We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate As these students did not participate we do not have permission to obtain
information on their characteristics (eg via transcripts) and for most schools we do not know
the number of such students
10 Participating districts include Anaheim Union High School District California East Side
Union High School District California Lynwood Unified School District California Jefferson
Parish Louisiana Education Achievement Authority Michigan Charlotte-Mecklenburg
Schools North Carolina Winston-SalemForsyth Schools North Carolina Cranston Public
Schools Rhode Island El Paso Independent School District Texas Metropolitan Nashville
Public Schools Tennessee and Richmond Public Schools Virginia
11 Though the study teachers are less likely to hold a master’s degree many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM) Thus the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training We also did not survey teachers regarding their Teach for America (TFA) experience
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers
12 We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants Reliability
metrics are high (inter-item reliability = 099 person reliability= 071) Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al
(2017)
13 Each year in the spring semester our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration
14 Unweighted results which are similar are contained in the Online Appendix tables However
if study participants who did not take the survey differ in unobserved ways then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias
15 Online Appendix Table 1 presents the balance between treatment and control group members’
characteristics before imputation of missing values (as described below) these results are very
similar to those shown in Table 2
16 Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores) we created one reading and math score for
each student that is the average of both scores or just the 8th grade score For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort we only use the student’s 8th grade test scores as their 10th grade test scores would be
endogenous
17 The study’s Principal Investigator personally handled the randomization of offers of
enrollment in the course so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization given the schools’ needs for speed in deciding whether the student was
allowed to register for the new class We added an entire planning year to our study design to
avoid these issues but many schools were unable to plan ahead
18 To evaluate whether this non-compliance is ignorable we conducted the test recommended by
Huber (2013) for each of our six main outcomes (ie those subsequently shown in Table 5) We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable
19 One district in our study offered both AP courses Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology To
account for the two courses offered we treat the school as two separate groups School-
Chemistry and School-Biology For those students who were not offered an AP course we
randomly assign them to one of two control groups proportional to the number of treated
students who chose each course For example if 60% of the treated students chose Biology then
we randomly assign 60% of the control students to the School-Biology control group In Section
VC we show that our results are also robust to dropping this school entirely
20 To compute these weights we first estimate the parameters of the following equation using a
probit regression $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$ where $\mathit{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school by cohort $j$ completed any part of the end-of-year survey $X_i$ is the same
vector of pre-treatment characteristics in the previous equations $\mu_j$ are school by cohort fixed
effects and $\Phi()$ is the cumulative normal distribution function The results of this regression are
included in Online Appendix Table 2 Students who had higher pre-treatment grades Black
students those who were not disabled and those who took prerequisite courses were more likely
to complete the survey The inverse probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives
more weight in the regression to study participants who completed the survey and yet had pre-
study characteristics that were similar to those study participants who did not complete the
survey These weights range from 10 to 155 with a median (mean) weight of 12 (15) and
with 90% of students receiving a weight less than 20
21 We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable This follows a multiple imputation then deletion strategy
suggested by Hippel (2007) which improves efficiency while protecting against problematic
imputed outcome values As a robustness check Section VC provides results including
imputation of missing outcome variables
22 Detailed course-by-course impacts are available from the authors
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4
24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards The AP science class also involved more student-led projects
or experiments hands on learning and small group work all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al 2010 National Research Council 2012)
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned apply their knowledge to solve a new problem or work independently and none of
the component measures of technology usage were statistically significantly affected Nor did
treatment teachers rely less on lecturing and multiple choice quizzes Thus teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment
25 In addition to the teacher spillover effects peer interactions could cause contamination effects
that might render our estimated effects smaller A research design with randomization both
across and within schools would allow for estimation of spillover effects but such a design was
infeasible due to the challenge and added cost of recruiting control schools
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (ie 20) or lower in their most recent science class
27 Grade weights may also affect AP participation which is one of the main purposes of the
weights (Klopfenstein and Lively 2016)
28 See Abadie et al (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects
29 To address the possibility of a heightened probability of Type I error given the multiple
outcomes we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995) At a preferred critical p-value of 005 we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction
30 This school had a 27 percent response rate partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed
31 We are probably also a bit overly conservative in computing the Lee bounds estimates as we
have included the students from cohort 1 of high school number 23 where nonresponse was due
mostly to the surveys being lost after administration
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation raceethnicity gender and teacher preparation)
We also conducted ITT quantile regressions for the continuous outcomes (science skill grades in
science and grades in other courses) Some of the differences in the point estimates were quite
large yet so too were the standard errors For instance five of the seven estimated differential
treatment effects on science skill exceed 025 standard deviations with p-values that fall in the
suggestive (016) to noisy (034) range
33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants Once data collection is complete we will have the
ability to examine the effect of AP science on college enrollment college selectivity and college
completion
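The Benjamini-Hochberg correction mentioned in note 29 can be sketched as a step-up procedure. This is a generic illustration of the standard procedure, not the authors' code. The function name `benjamini_hochberg` is hypothetical, and the p-values in the example are made up and are not the study's values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where the null is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below that k (the step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for three outcomes.
print(benjamini_hochberg([0.2, 0.01, 0.04]))
print(benjamini_hochberg([0.03, 0.04, 0.045]))
```

The second call shows the step-up behavior: the two smaller p-values miss their own thresholds, but all three are rejected because the largest one meets its threshold.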
participants during a 45-minute period measures students’ skills in data analysis scientific
explanation and scientific argument¹² Participating teachers were not provided copies of the
instrument in advance therefore teachers were unable to teach any content material prior to test
administration
The second source is a questionnaire that we administered concurrently with the assessment
and that asks students a number of questions about their most recent science class and their plans
after high school The assessment and questionnaire were completed together and administered
outside of class (henceforth we refer to these instruments as the ldquosurveyrdquo) The third data source
are studentsrsquo high school transcripts which contain data on demographic and socioeconomic
background grades courses standardized exams taken in the 8th and 10th grades as well as high
school completion We use these data to determine the balance of randomization on pre-
treatment covariates estimate the effect of randomization on course-taking (including
compliance) improve the precision of our estimates with statistical controls and estimate
treatment effects on studentsrsquo grades
Our survey response rate was 78 percent.[13] Attrition can be attributed to student absences
during the dates scheduled for survey administration and communication lapses between school
coordinators and students. Students who were randomly assigned to treatment have a
9-percentage point higher survey response rate. Given the possibility of nonrandom sample
attrition, we weight all regressions by the inverse of the probability of completing the survey
conditional on student characteristics.[14] We implement a variety of robustness checks as
additional means to account for nonresponse. These include multiple imputation of missing
outcome variables, excluding one high school that had a low response rate, and using the Lee
(2009) technique to provide bounds on the estimated effects. These methods and results are
discussed below.
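The inverse-probability weighting described above can be sketched as follows. This is only an illustration under a simplifying assumption: the response probability is taken to be the observed response rate within discrete cells of student characteristics, whereas the paper estimates it conditional on student characteristics with a model that is not reproduced here. The function name, the cell labels, and the toy data are hypothetical.

```python
from collections import defaultdict

def inverse_probability_weights(cells, responded):
    """Weight each respondent by 1 / (response rate in the student's cell).

    cells:     one hashable label per student (hypothetical covariate cell)
    responded: 1 if the student completed the survey, else 0
    Nonrespondents get weight 0 because they contribute no survey outcome.
    """
    n_in_cell = defaultdict(int)
    n_responded = defaultdict(int)
    for cell, resp in zip(cells, responded):
        n_in_cell[cell] += 1
        n_responded[cell] += int(resp)

    weights = []
    for cell, resp in zip(cells, responded):
        if resp:
            # Estimated response probability is n_responded / n_in_cell.
            weights.append(n_in_cell[cell] / n_responded[cell])
        else:
            weights.append(0.0)
    return weights

# Toy data: cell "a" has a 50 percent response rate, cell "b" 100 percent.
toy_cells = ["a", "a", "a", "a", "b", "b"]
toy_responded = [1, 1, 0, 0, 1, 1]
toy_weights = inverse_probability_weights(toy_cells, toy_responded)
```

In this toy example the weights of the respondents sum to the full sample size, which is the sense in which respondents from low-response cells "stand in" for similar nonrespondents.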
We supplement these data with surveys that we administered online to teachers of the new
AP courses at the conclusion of the course. The teacher survey includes questions about their
educational background, professional experiences, and professional development; past and
present instructional practices, generally and around science specifically; participation in the
College Board AP training; ability to cover the content of the AP course; and coaching,
mentoring, and other professional community supports received from the school, district, and
education community.
Table 2 provides balancing tests on pre-treatment characteristics for the full sample and the
survey respondents, conditional on school by cohort fixed effects.[15] Most of the estimated
differences between treatment and control group students on pre-treatment observed
characteristics are small, with some notable exceptions. In the full and survey samples,
treatment group students' reading exam scores were 0.10 and 0.09 standard deviations higher,
respectively, than those of control group students, both at p-values below 0.05. The magnitude
of the treatment-control difference was slightly lower and less precisely estimated in math, yet
also favored treatment group students.[16] To adjust for these chance imbalances, we include all
student covariates as predictors of outcomes in the models, and in the robustness checks we
exclude these covariates.[17]
Table 2 also shows the extent of differences between control group compliers and
non-compliers. We find that non-compliers are generally much more academically prepared for AP
science: they have higher pre-treatment reading and math test scores and are more likely to have
completed the prerequisite courses. On demographics, non-compliers are more likely to be Asian
American and female.[18]
IV. Empirical Strategy
We estimate the effect of taking the AP science course with a standard instrumental variable
specification:
(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$
(2) $AP_{ij} = \delta_j + Offered_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$
where $AP_{ij} = 1$ if student $i$ enrolled in the AP science course in school × cohort stratum $j$;
$\widehat{AP}_{ij}$ is the fitted value based on the estimates of the parameters in Equation (2);
$Offered_{ij} = 1$ if the student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of
pre-treatment covariates (including age; math and reading exam scores from 8th and 10th grade
(standardized and averaged for math and reading separately); cumulative GPA prior to the year
when the AP science course was offered; and indicator variables for female, racial group (Asian
American, Black or Hispanic, Native American or Multiracial), disability, gifted, English Language
Learner, eligible for free or reduced-price lunch, home language is not English, and took
recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.[19] We use
two-stage least squares to estimate the model for all outcomes. The local average treatment effect
(LATE) estimate is given by $\beta$.
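To illustrate the logic of the instrumental variable specification, the sketch below computes the IV estimate in the simplest possible case: a binary offer, no covariates, and no fixed effects. In that case two-stage least squares reduces to the Wald ratio of the reduced-form difference in outcomes to the first-stage difference in enrollment. The function name and the toy data are hypothetical; the models in the paper additionally include the covariate vector, school by cohort fixed effects, survey weights, and clustered standard errors, none of which appear here.

```python
from statistics import mean

def wald_late(y, took_ap, offered):
    """Wald (IV) estimate with one binary instrument and no covariates.

    y:       outcome for each student
    took_ap: 1 if the student enrolled in the AP course, else 0
    offered: 1 if the student was randomized into the offer, else 0
    Returns (reduced form, first stage, LATE).
    """
    y_offered = [yi for yi, z in zip(y, offered) if z == 1]
    y_control = [yi for yi, z in zip(y, offered) if z == 0]
    d_offered = [d for d, z in zip(took_ap, offered) if z == 1]
    d_control = [d for d, z in zip(took_ap, offered) if z == 0]

    reduced_form = mean(y_offered) - mean(y_control)   # effect of the offer on Y
    first_stage = mean(d_offered) - mean(d_control)    # effect of the offer on enrollment
    if first_stage == 0:
        raise ValueError("the offer does not shift enrollment; LATE is undefined")
    return reduced_form, first_stage, reduced_form / first_stage

# Toy data with imperfect compliance on both sides (not the study data).
toy_offered = [1, 1, 1, 1, 0, 0, 0, 0]
toy_took_ap = [1, 1, 1, 0, 1, 0, 0, 0]
toy_y = [2.0, 1.0, 3.0, 0.0, 1.0, 0.0, 1.0, 0.0]
toy_rf, toy_fs, toy_late = wald_late(toy_y, toy_took_ap, toy_offered)
```

In the toy data the offer raises the outcome by 1.0 and enrollment by 0.5, so the implied effect for compliers is 2.0: the reduced form is scaled up by the inverse of the compliance rate.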
The intent to treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $Offered_{ij}$ in Equation (1),
as shown in Equation (3). The coefficient on $Offered_{ij}$ in Equation (3) provides the effect of
being offered enrollment in the new AP science course and is a weighted average of effects on
those who do and do not choose to enroll in the course.
(3) $Y_{ij} = \zeta_j + Offered_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
For outcomes that are obtained from the survey, we weight regressions by the inverse of the
estimated probability of completing the survey.[20] The results are similar without using these
weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student
characteristics as a result of either missing student transcripts or certain data elements not
collected by the district, we use multiple imputation by chained equations, creating 10 imputed
datasets, and combine the results.[21] For inference, we cluster standard errors at the level of
treatment assignment (school by cohort) in our analysis of main effects. In the analysis of
robustness, we report permutation standard errors, robust standard errors (for comparison to
permutations), and the statistical significance of the LATE estimates after adjusting our tests of
significance for multiple comparisons.
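The multiple-comparisons adjustment is identified elsewhere in the paper as the Benjamini-Hochberg correction (Benjamini and Hochberg 1995). A minimal sketch of that step-up procedure is given below; the function name and the p-values are hypothetical and are not the estimates reported in the tables.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * q, and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for four outcomes.
toy_p = [0.001, 0.04, 0.03, 0.5]
toy_rejected = benjamini_hochberg(toy_p, q=0.05)
```

With these toy values only the first hypothesis survives the correction, even though three of the four unadjusted p-values fall below 0.05.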
V. Results
A. Course-Taking and Treatment Contrast
Table 3 provides estimated effects of the randomized offer of enrollment on AP science course
enrollment and share of credits in all courses for the full sample and the survey samples. The
first-stage estimates indicate that the offer substantially increased the likelihood of the student
taking the AP science course, by 38 percentage points in the full sample and 39 percentage points
in the survey sample. As we expected, compliance with randomization was imperfect, with 42
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling. Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course, while a few of these
came from hardship exemptions that were requested by the school and granted by the study team.
The remaining rows in Table 3 shed light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage point increase in the share of total credits in the full sample).
Treatment group students' share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).[22]
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13 percent), Physics (12 percent), AP Environmental Science (11 percent), Biology (10 percent),
Honors Biology (9 percent), and Anatomy/Physiology (9 percent).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our
results also suggest that AP course-takers' classrooms were more likely to use technology (up
0.28 sd, p-value = 0.14).[23] Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than other science classes.[24] This contrast between students' reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.[25]
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers' interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with again more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
Table 5 provides stronger evidence of negative treatment effects on students' confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology as a result of signing the
study assent forms. Taking AP science substantially lowered participants' likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students'
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students' self-reported confidence and stress levels. We find that taking AP science increases
students' likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students' grades suffered. We estimate that taking AP science reduced students' grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students' science GPAs during the study year (usually their junior
year) from around a B- to a C+.[26] This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students' grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students' share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students' overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
With our estimates in hand, we can easily compute the adjustment that would leave the
student's GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students' grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).[27]
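The arithmetic behind the overall grade effect and the offsetting grade weight can be reproduced from the rounded figures quoted in the text. The sketch below does only that; the variable names are hypothetical. Because it uses rounded inputs, it gives a boost of roughly 1.47 rather than exactly 1.46, a gap that presumably reflects the unrounded estimates used in the paper.

```python
# Rounded point estimates as quoted in the text (Tables 3 and 5).
effect_science_grades = -0.29    # effect on grades in science courses
effect_other_grades = -0.18      # effect on grades in non-science courses
share_science_credits = 0.24     # share of credits in science when taking AP science
share_ap_science = 0.02 + 0.12   # predicted share of classes in any AP science subject

# Credit-weighted average effect on overall study-year grades.
overall_effect = (effect_science_grades * share_science_credits
                  + effect_other_grades * (1 - share_science_credits))

# Boost to AP science grades that would leave the study-year GPA unchanged:
# boost * share_ap_science must equal the size of the overall decline.
offsetting_boost = -overall_effect / share_ap_science

print(round(overall_effect, 2))    # -0.21
print(round(offsetting_boost, 2))  # 1.47
```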
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).[28] The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.[29]
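A permutation test of this kind can be sketched as follows. The sketch is deliberately simplified and is an assumption about form rather than the authors' implementation: it uses a raw difference in means as the test statistic and shuffles the offer indicator across all students, whereas the estimates in the paper condition on covariates and on school by cohort strata. The function names, the seed, and the toy data are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(y, z):
    """Difference in mean outcome between z == 1 and z == 0."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return mean(treated) - mean(control)

def permutation_p_value(y, z, n_perm=1000, seed=0):
    """Share of permutations whose |pseudo effect| exceeds the |observed effect|."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    observed = abs(diff_in_means(y, z))
    labels = list(z)                   # copy; shuffling keeps the number treated fixed
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)            # randomly assign a pseudo treatment
        if abs(diff_in_means(y, labels)) > observed:
            exceed += 1
    return exceed / n_perm

# Toy data in which the treated outcomes are the four largest values.
toy_y = [5, 6, 7, 8, 1, 2, 3, 4]
toy_z = [1, 1, 1, 1, 0, 0, 0, 0]
toy_p = permutation_p_value(toy_y, toy_z)
```

In the toy data no reshuffling can produce a larger absolute difference than the observed one, so the share of exceedances is zero.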
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.[30] Column (8) shows the results when we exclude all
of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower
and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.[31]
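The trimming step of the Lee (2009) bounds can be sketched as follows for the case described here, in which the treatment group has the higher response rate. This is a simplified illustration: it uses unweighted differences in means with no covariates, and it does not compute the Imbens and Manski (2004) confidence interval. The function name, the rounding of the number of trimmed observations, and the toy data are assumptions made for the example.

```python
from statistics import mean

def lee_bounds(y_treat, y_ctrl, n_treat_assigned, n_ctrl_assigned):
    """Lower and upper Lee bounds when treatment responds more often.

    y_treat, y_ctrl: outcomes of survey respondents in each arm
    n_*_assigned:    number of students randomized into each arm
    """
    rate_treat = len(y_treat) / n_treat_assigned
    rate_ctrl = len(y_ctrl) / n_ctrl_assigned
    if rate_treat < rate_ctrl:
        raise ValueError("this sketch only handles excess response in the treatment arm")

    # Share of treated respondents to trim so the response rates match.
    trim_share = (rate_treat - rate_ctrl) / rate_treat
    n_trim = int(round(trim_share * len(y_treat)))

    ordered = sorted(y_treat)
    kept_for_lower = ordered[:len(ordered) - n_trim]  # drop the highest outcomes
    kept_for_upper = ordered[n_trim:]                 # drop the lowest outcomes

    lower = mean(kept_for_lower) - mean(y_ctrl)
    upper = mean(kept_for_upper) - mean(y_ctrl)
    return lower, upper

# Toy data: all 10 treated students respond, 8 of 10 control students respond.
toy_treat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_ctrl = [4, 4, 4, 4, 4, 4, 4, 4]
toy_lower, toy_upper = lee_bounds(toy_treat, toy_ctrl, 10, 10)
```

In the toy data two treated observations are trimmed, giving bounds of 0.5 and 2.5 around an untrimmed difference of 1.5.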
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals
fail to rule out large positives and negatives. Additional heterogeneity results can be found in the
Online Appendix.[32]
VI. Conclusion
Most admissions committees at bachelor's degree-granting institutions rely on applicants' AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students' skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion, as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.[33] We also find that AP science classes
substantially increase students' stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult, given that few schools document per-course
expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course
assignments offers a partial cost analysis. Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation's high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. "When Should
You Adjust Standard Errors for Clustering?" NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: U.S. Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. "Psychological Insights for Improved
Physics Teaching." Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. "The Effects of High School Curriculum on Education and Labor
Market Outcomes." The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. "Coping Behaviors as Intervening Mechanisms in the Inverted-U
Stress-Performance Relationship." Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. "Raising the Bar: Curricular Intensity and
Academic Performance." Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. "Shifting
College Majors in Response to Advanced Placement Exam Scores." Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. "Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. "Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching."
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. "Demoting Advanced Placement." The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. "Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students' Accuracy of Feeling of
Confidence." Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. "Playing the Admissions Game:
Student Reactions to Increasing College Competition." The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. "Maryland Schools Have Been Leader in Advanced Placement, but Results Are
Mixed." The Baltimore Sun, August 17.
Bush, George W. 2006. "State of the Union Address by the President." Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. "Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds
in 34 Countries." Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. "Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects." Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. "A Little Goes a Long Way: Pressure for College Students to Succeed."
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. "The Medium-Term Impacts of High-Achieving
Charter Schools." Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. "Preparation Matters." National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. "50-State Comparison: Advanced Placement Policies." Education
Commission of the States.
Drew, Christopher. 2011. "Rethinking Advanced Placement." The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. "Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?" Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. "Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit." PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. "Students'
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs." Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. "The Role of Advanced Placement and Honors
Courses in College Admissions." Center for Studies in Higher Education, Research &
Occasional Paper Series: CSHE.4.04.
Goodman, Joshua Samuel. 2012. "The Labor of Division: Returns to Compulsory Math
Coursework." Unpublished Manuscript.
Harel, O. 2009. "The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation." Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. "Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data." Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
"The Impact of Advanced Placement Incentive Programs." Center for Evaluation and
Education Policy, Indiana University. Education Policy Brief 8 (1).
Hopkins, Katy. 2012. "Weigh the Benefits, Stress of AP Courses for Your Student." U.S. News
& World Report, May 10.
Huber, Martin. 2013. "A Simple Test for the Ignorability of Non-compliance in Experiments."
Economics Letters 120 (3): 389–391.
Imbens, Guido W., and Charles F. Manski. 2004. "Confidence Intervals for Partially Identified
Parameters." Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. "A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program." Journal of Human Resources 45 (3): 591–639.
__________. 2014. "Do College-Preparatory Programs Improve Long-Term Outcomes?"
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. "Is there a Causal Effect of High
School Math on Labor Market Outcomes?" Journal of Human Resources 44 (1):
171–198.
Kim, Emily. 2015. "AP Classes Often Translate to Advanced Pressure." Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. "Do Grade Weights Promote More Advanced
Course-Taking?" Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. "The Link Between Advanced Placement
Experience and Early College Success." Southern Economic Journal 75 (3): 873–891.
__________. 2010. "Advanced Placement Participation: Evaluating the Policies of States and
Colleges." In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. "The Case of Carla:
Dilemmas of Helping All Students to Understand Science." Science Education 86 (3):
287–313.
Lee, David S. 2009. "Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects." The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. "Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines." Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. "The Benefit of Additional High School Math
and Science Classes for Young Men and Women." Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. "Breaking it Down: Engineering Student
STEM Confidence at the Intersection of Race/Ethnicity and Gender." Research in Higher
Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. "Effects of High School Course-taking
on Secondary and Postsecondary Success." American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. "Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?" Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. "AP at Scale: Public School Students in Advanced Placement, 1990–2013."
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. "Are AP Courses Worth the Stress?" WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. "The Condition of
Education 2018 (NCES 2018-144): Public High School Graduation Rates." (NCES
2018-144). U.S. Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. "Learning and Understanding: Improving Advanced Study of
Mathematics and Science in U.S. High Schools." Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. "STEM for All." February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. "Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures."
Version 1.0. Stanford University.
Rose, Heather. 2004. "Has Curriculum Closed the Test Score Gap in Math?" Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.”
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.”
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and
Exam-Based Courses in US Public High Schools: 2010-11. First Look.” (NCES 2013-001).
US Department of Education. Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
US Department of Education. 2014. “Education Department Awards 40 States, DC, and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts: Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill, by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers
Panel A: Schools                                        Participating    Others
Average Enrollment                                      1,409            723
Free or Reduced-Price Lunch                             0.700            0.438
Asian                                                   0.055            0.050
Black                                                   0.349            0.154
Hispanic                                                0.410            0.221
White                                                   0.164            0.537
Adjusted Cohort Graduation Rate                         0.843            0.802
District Instruction Expenditures Per Pupil             $6,561           $5,636
District Student Services Expenditures Per Pupil        $3,787           $3,385
Panel B: Teachers                                       Participating    Others
Age: Under 30                                           0.407            0.160
Age: 30-49                                              0.432            0.553
Age: 50 or over                                         0.161            0.287
Female                                                  0.630            0.536
Hispanic or Latino                                      0.111            0.051
Race: American Indian or Alaska Native                  0.000            0.009
Race: Asian American                                    0.111            0.041
Race: Black                                             0.111            0.060
Race: Native Hawaiian or other Pacific Islander         0.000            0.004
Race: White                                             0.778            0.896
Years of Experience                                     10.3             13.2
Years of Experience <=2                                 0.290            0.085
Years of Experience <=5                                 0.481            0.234
Hold a Teaching Certificate                             0.926            0.945
Undergraduate Major in STEM                             0.944            0.747
Single Subject Credential in Science                    0.630            0.823
Master’s Degree or Higher                               0.356            0.615
Previously Taught AP Course                             0.469            NA
Previously Taught AP, IB, or Honors Course              0.796            NA
Number of Professional Development Trainings
  in the Past 5 years (0-5)                             3.09             NA
Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd/) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass/). “Others” in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics
Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
Columns (1) and (4): Control Group Mean.
Columns (2) and (5): Difference Between Treated and Controls.
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers.
Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417
Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment
Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
Columns (1) and (4): Control Group Mean. Columns (2) and (5): ITT. Columns (3) and (6): LATE.
Outcome                                     (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment              0.19     0.38              0.24     0.39
                                                     (0.05)                     (0.06)
                                                     [0.00]                     [0.00]
Share of Credits During Study Year in:
AP Science                                  0.03     0.04     0.11     0.03     0.04     0.10
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
All AP                                      0.13     0.04     0.11     0.14     0.04     0.10
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
Other Advanced Science                      0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.23]            [0.20]   [0.20]
All Other Advanced                          0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.23]   [0.23]            [0.30]   [0.30]
Regular Science                             0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.20]            [0.24]   [0.19]
All Regular                                 0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                     (0.01)   (0.03)            (0.01)   (0.03)
                                                     [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                      1,819                      1,417
Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses and p-values are in brackets.
Table 4
Treatment Contrast (Composite Variables)
Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.
Outcome                                           (1)      (2)      (3)
Academically Challenging Curriculum               -0.33    0.31     0.80
                                                           (0.10)   (0.24)
                                                           [0.00]   [0.00]
Project-Based Independent Classroom Activities    -0.06    0.13     0.33
                                                           (0.07)   (0.17)
                                                           [0.07]   [0.06]
Integrated Use of Technology                      -0.11    0.11     0.28
                                                           (0.08)   (0.19)
                                                           [0.19]   [0.14]
Number of Observations                            1,417
Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses and
p-values are in brackets.
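The composite construction described in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-point scale labels, the example responses, and the use of the population standard deviation are hypothetical choices, not details taken from the study.

```python
from statistics import mean, pstdev

# Hypothetical five-point agreement scale, ordered from lowest to highest category.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def rescale(response, scale=SCALE):
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return scale.index(response) / (len(scale) - 1)

def composite(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sd = mean(raw), pstdev(raw)  # assumption: population SD used for standardizing
    return [(x - mu) / sd for x in raw]

# Hypothetical responses for three students on two component items.
students = [
    ["strongly agree", "agree"],
    ["neutral", "disagree"],
    ["strongly disagree", "neutral"],
]
scores = composite(students)
```

The resulting scores have mean 0 and SD 1 across the students in the example, which is the scale on which the Table 4 estimates are expressed.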
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades
Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.
Outcome                                 (1)      (2)      (3)
Science Skill                           -0.19    0.09     0.23
                                                 (0.06)   (0.16)
                                                 [0.15]   [0.14]
STEM Interest                           0.62     0.04     0.09
                                                 (0.02)   (0.07)
                                                 [0.16]   [0.16]
Confidence in College Science           0.92     -0.04    -0.10
                                                 (0.02)   (0.05)
                                                 [0.11]   [0.06]
Stress                                  0.12     0.07     0.17
                                                 (0.03)   (0.07)
                                                 [0.02]   [0.01]
Grades in Science Courses               2.80     -0.12    -0.29
                                                 (0.07)   (0.16)
                                                 [0.08]   [0.07]
Grades in Other Courses                 3.14     -0.07    -0.18
                                                 (0.02)   (0.06)
                                                 [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes
Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses and
p-values are in brackets.
Table 6
Robustness Checks of Main ITT Results
Columns: (1) Control Group Complier Mean; (2) Main Results; (3) Robust SE; (4) p-value
(permutation test); (5) Excluding High School 56; (6) Including Imputation of Missing Outcome
Variables; (7) Excluding Covariates; (8) Excluding High School 23; (9) Lee Lower Bound;
(10) Lee Upper Bound; (11) 95% Confidence Interval from Lee Bounds; (12) Ratio of 95% CI in
(11) to 95% CI in (7).
Outcome                       (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)            (12)
Science Skill                 -0.19    0.09                       0.10     0.11     0.20     0.07     0.03     0.39     [-0.09, 0.51]   2.0
                                       (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                       [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                 0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12     [-0.03, 0.18]   1.9
                                       (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                       [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science 0.92     -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06    0.05     [-0.09, 0.10]   2.0
                                       (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                       [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                        0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11     [-0.05, 0.15]   1.6
                                       (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                       [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses     2.80     -0.12                      -0.06    -0.10    -0.07
                                       (0.07)   (0.04)            (0.00)   (0.00)   (0.00)
                                       [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses       3.14     -0.07                      -0.07    -0.06    -0.03
                                       (0.02)   (0.03)            (0.00)   (0.00)   (0.00)
                                       [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]
Columns (8)-(12) for the two grades outcomes: Not applicable, as grades are based on transcripts
rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to
derive confidence interval for the treatment effect itself).
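The permutation test described for column (4) can be illustrated with a simplified Python sketch. It is not the study's procedure: it uses a raw difference in means instead of the regression with covariates and School x Cohort fixed effects, shuffles labels across the whole sample instead of within strata, and runs on simulated data. The function names and the seed are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)

def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # pseudo treatment: same number treated, random membership
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm

# Simulated example (not study data): 50 treated, 50 controls, true effect of 2 SD.
sim = random.Random(1)
treated = [1] * 50 + [0] * 50
outcomes = [2.0 * d + sim.gauss(0, 1) for d in treated]
p_value = permutation_p_value(outcomes, treated)
```

With an effect this large, almost no pseudo-assignment produces a larger absolute difference, so the p-value is close to zero; with no true effect it would be spread between 0 and 1.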
1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students’
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3 The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry,
reasoning, and communication skills” (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5 Students’ perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students’ confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends, to some extent, on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7 We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course and classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8 In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9 Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10 Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11 Though the study teachers are less likely to hold a master’s degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12 We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13 Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14 Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15 Online Appendix Table 1 presents the balance between treatment and control group members’
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16 Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be
endogenous.
17 The study’s Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools’ needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19 One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-
Chemistry and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20 To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(CompletedSurvey_{ij} = 1) = \Phi(\mu_j + \mathbf{X}_i\rho + \epsilon_{ij})$, where $CompletedSurvey_{ij}$ equals 1
if student i in school by cohort j completed any part of the end-of-year survey, $\mathbf{X}_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + \mathbf{X}_i\hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90% of students receiving a weight less than 2.0.
21 We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22 Detailed course-by-course impacts are available from the authors.
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative, inquiry-based environment.
25 In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27 Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29 To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30 This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
IV. Empirical Strategy
We estimate the effect of taking the AP science course with a standard instrumental variable
specification:
(1) $Y_{ij} = \alpha_j + \widehat{AP}_{ij}\beta + \mathbf{X}_i\gamma + \epsilon_{ij}$
(2) $AP_{ij} = \delta_j + \mathit{Offered}_{ij}\theta + \mathbf{X}_i\mu + \epsilon_{ij}$
where $AP_{ij} = 1$ if student i enrolled in the AP science course in school x cohort stratum j; $\widehat{AP}_{ij}$ is
the fitted value based on the estimates of the parameters in Equation (2); $\mathit{Offered}_{ij} = 1$ if the
student is randomized into the treatment group; $\mathbf{X}_i$ is a vector of pre-treatment covariates
(including age; math and reading exam scores from 8th and 10th grade (standardized and
averaged for math and reading separately); cumulative GPA prior to the year when the AP
science course was offered; and indicator variables for female, racial group (Asian American;
Black; or Hispanic, Native American, or Multiracial), disability, gifted, English Language
Learner, eligible for free or reduced-price lunch, home language is not English, and took
recommended prerequisite courses); and $\alpha_j$ and $\delta_j$ are school by cohort fixed effects.19 We use
two-stage least squares to estimate the model for all outcomes. The local average treatment effect
(LATE) estimate is given by $\beta$.
The intent to treat (ITT) estimate is obtained by replacing $\widehat{AP}_{ij}$ with $\mathit{Offered}_{ij}$ in Equation (1),
as shown in Equation (3). The coefficient on $\mathit{Offered}_{ij}$ in Equation (3) provides the effect of
being offered enrollment in the new AP science course and is a weighted average of effects on
those who do and do not choose to enroll in the course:
(3) $Y_{ij} = \zeta_j + \mathit{Offered}_{ij}\tau + \mathbf{X}_i\lambda + \epsilon_{ij}$
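Because the model has a single instrument and the same covariates in every equation, the two-stage least squares estimate of the LATE equals the ITT coefficient divided by the first-stage coefficient when both are estimated on the same sample. The short Python sketch below illustrates this ratio with the rounded survey-sample estimates reported in Tables 3 and 5; the function and variable names are hypothetical, and because the inputs are rounded to two decimals the ratios only approximate the reported LATE values (0.23, 0.09, and 0.17).

```python
# Rounded survey-sample estimates: first stage from Table 3, ITT from Table 5.
first_stage = 0.39
itt = {"science_skill": 0.09, "stem_interest": 0.04, "stress": 0.07}

def wald_late(itt_effect, first_stage_effect):
    """Ratio of the reduced-form (ITT) effect to the first-stage effect."""
    return itt_effect / first_stage_effect

# Approximate LATE implied by the rounded inputs, one value per outcome.
late = {name: round(wald_late(effect, first_stage), 2) for name, effect in itt.items()}
```

For science skill, for example, 0.09 / 0.39 is about 0.23, which matches the LATE shown in Table 5.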
For outcomes that are obtained from the survey, we weight regressions by the inverse of the
estimated probability of completing the survey.20 The results are similar without using these
weights (see Online Appendix Tables 3, 4, and 6). Since we have some missingness in student
characteristics as a result of either missing student transcripts or certain data elements not
collected by the district, we use multiple imputation by chained equations, creating 10 imputed
datasets, and combine the results.21 For inference, we cluster standard errors at the level of
treatment assignment (school by cohort) in our analysis of main effects. In the analysis of
robustness, we report permutation standard errors, robust standard errors (for comparison to
permutations), and the statistical significance of the LATE estimates after adjusting our tests of
significance for multiple comparisons.
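As an illustration of how estimates from several imputed datasets can be combined, the sketch below pools point estimates and standard errors with Rubin's rules. This is an assumption: the text states only that results from the 10 imputed datasets are combined, not which pooling rule is used, and the function name and the numbers are hypothetical.

```python
import math

def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    pooled = sum(estimates) / m
    # Average sampling variance within each imputed dataset.
    within = sum(se ** 2 for se in std_errors) / m
    # Variance of the point estimates across imputed datasets.
    between = sum((b - pooled) ** 2 for b in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical coefficients and standard errors from 10 imputed datasets.
estimates = [0.21, 0.25, 0.22, 0.24, 0.23, 0.20, 0.26, 0.23, 0.22, 0.24]
std_errors = [0.15, 0.16, 0.15, 0.15, 0.16, 0.15, 0.16, 0.15, 0.15, 0.16]
pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(pooled_estimate, pooled_se)
```

The pooled standard error is never smaller than the average within-imputation standard error, reflecting the extra uncertainty due to the missing data.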
V. Results
A. Course-Taking and Treatment Contrast
Table 3 provides estimated effects of the randomized offer of enrollment on AP science course
enrollment and share of credits in all courses for the full sample and the survey samples. The
first-stage estimates indicate that the offer substantially increased the likelihood of the student
taking the AP science course: by 38 percentage points in the full sample and 39 percentage points
in the survey sample. As we expected, compliance with randomization was imperfect, with 42
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling. Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course, while a few of these
came from hardship exemptions that were requested by the school and granted by the study team.
The remaining rows in Table 3 shed light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage point increase in the share of total credits in the full sample).
Treatment group students’ share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).22
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%),
and Anatomy/Physiology (9%).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 sd, p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 sd, p-value = 0.06). Our
results also suggest that AP course-takers’ classrooms were more likely to use technology (up
0.28 sd, p-value = 0.14).23 Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that, while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than other science classes.24 This contrast between students’ reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.25
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with, again, more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
Table 5 provides stronger evidence of negative treatment effects on students’ confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology as a result of signing the
study assent forms. Taking AP science substantially lowered participants’ likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students’
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students’ self-reported confidence and stress levels. We find that taking AP science increases
students’ likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students’ grades suffered. We estimate that taking AP science reduced students’ grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior
year) from around a B- to a C+.26 This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students’ grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students’ share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
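The weighted average above can be reproduced directly from the figures quoted in the text. The variable names are illustrative only.

```python
# Estimated effects on grades, taken from the text.
science_effect = -0.29   # effect on science course grades
other_effect = -0.18     # effect on grades in non-science courses
science_share = 0.24     # share of credits in science for AP science takers

# Credit-weighted average of the two effects.
overall_effect = science_effect * science_share + other_effect * (1 - science_share)
print(round(overall_effect, 2))  # -0.21
```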
With our estimates in hand, we can easily compute the adjustment that would leave the
student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).27
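The offsetting grade boost can be sketched from the rounded figures quoted in the text. Using these rounded inputs gives roughly 1.5 rather than the reported 1.46; the difference presumably arises because the reported figure uses unrounded estimates, which is an assumption, since the unrounded values are not shown here. The variable names are illustrative only.

```python
overall_gpa_loss = 0.21          # rounded overall effect on study-year GPA
ap_science_share = 0.02 + 0.12   # predicted share of classes in AP science (Table 3)

# Boost to AP science grades that would leave the study-year GPA unchanged.
required_boost = overall_gpa_loss / ap_science_share
print(round(required_boost, 2))
```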
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test, where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).28 The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.29
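The permutation test described above can be sketched as follows. This is a simplified illustration under stated assumptions: the effect is a raw difference in means rather than the covariate-adjusted regression estimate, the pseudo treatment is reshuffled within strata to mimic randomization within school by cohort, and the function name and simulated data are hypothetical.

```python
import numpy as np

def permutation_p_value(y, treated, stratum, n_perm=1000, seed=0):
    """Share of pseudo-treatment effects larger in absolute value than the observed one."""
    rng = np.random.default_rng(seed)

    def effect(t):
        return y[t == 1].mean() - y[t == 0].mean()

    observed = effect(treated)
    exceed = 0
    for _ in range(n_perm):
        pseudo = treated.copy()
        # Reshuffle treatment labels separately inside each stratum.
        for s in np.unique(stratum):
            idx = np.flatnonzero(stratum == s)
            pseudo[idx] = rng.permutation(treated[idx])
        if abs(effect(pseudo)) > abs(observed):
            exceed += 1
    return exceed / n_perm

# Simulated example: four strata, half of each offered the course, true effect of 0.5.
rng = np.random.default_rng(1)
stratum = np.repeat(np.arange(4), 100)
treated = np.tile([0, 1], 200)
y = 0.5 * treated + rng.normal(size=400)
p_value = permutation_p_value(y, treated, stratum)
print(p_value)
```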
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.30 Column (8) shows the results when we exclude all
of the $\mathbf{X}_i$ covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the upper
and lower bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.31
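The trimming step can be sketched as below. This is a minimal illustration without covariates, survey weights, or the Imbens and Manski confidence intervals, and it assumes the treatment group has the higher response rate, as in the case described above. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def lee_bounds(y_treated, y_control, response_treated, response_control):
    """Lower and upper trimming bounds on the difference in mean outcomes."""
    if response_treated < response_control:
        raise ValueError("this sketch only trims the treatment group")
    y_sorted = np.sort(np.asarray(y_treated, dtype=float))
    # Share of treated respondents to drop so the response rates match.
    trim_share = (response_treated - response_control) / response_treated
    k = int(round(trim_share * len(y_sorted)))
    control_mean = float(np.mean(y_control))
    lower = y_sorted[: len(y_sorted) - k].mean() - control_mean  # drop the highest outcomes
    upper = y_sorted[k:].mean() - control_mean                   # drop the lowest outcomes
    return lower, upper

# Toy data: 10 treated respondents, response rates of 100 and 80 percent.
lower, upper = lee_bounds(np.arange(1, 11), [4, 5, 6], 1.0, 0.8)
print(lower, upper)
```

In this toy case two treated observations are trimmed from each end in turn, so the bounds bracket the untrimmed difference in means.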
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression, conditional on $\mathbf{X}_i$, with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals
fail to rule out large positives and negatives. Additional heterogeneity results can be found in the
Online Appendix.32
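As a consistency check, the 0.09 ITT estimate for science skill and the 0.23 sd LATE estimate quoted earlier can be linked through the ratio of the ITT to the first stage. The sketch assumes that the survey-sample first stage (39 percentage points) is the relevant one for this outcome; the variable names are illustrative only.

```python
itt_science_skill = 0.09    # ITT estimate for science skill, in sd
first_stage_survey = 0.39   # effect of the offer on AP enrollment, survey sample

# With one instrument, the LATE is the ITT scaled up by the first stage.
implied_late = itt_science_skill / first_stage_survey
print(round(implied_late, 2))  # 0.23
```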
VI. Conclusion
Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of US high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.33 We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult, given that few schools document per-course
expenditures. One recent analysis of a US district that relied on teacher salaries and course
assignments offers a partial cost-analysis: Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003. Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School Through College. Washington, DC: US Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting College Majors in Response to Advanced Placement Exam Scores.” Journal of Human Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57 (1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The Research Evidence on the Use of Small Group Discussions in Science Teaching.” International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game: Student Reactions to Increasing College Competition.” The Journal of Economic Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and Student Achievement in High School: A Cross-Subject Analysis with Student Fixed Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.” Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’ Perceptions of the Non-academic Advantages and Disadvantages of Participation in Advanced Placement Courses and International Baccalaureate Programs.” Adolescence 44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors Courses in College Admissions.” Center for Studies in Higher Education, Research & Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math Coursework.” Unpublished Manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8(1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News & World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.
Imbens, Guido W., and Charles F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High school Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018 (NCES 2018-144): Public High School Graduation Rates.” (NCES 2018-144). US Department of Education. Washington, DC: National Center for Education Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in US High Schools.” Washington, DC: National Academies Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement, Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in US Public High Schools: 2010-11, First Look.” (NCES 2013-001). US Department of Education. Washington, DC: National Center for Education Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
US Department of Education. 2014. “Education Department Awards 40 States, DC, and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F. O. Gabrieli, and John D. E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers
Panel A: Schools                                      Participating     Others
Average Enrollment                                            1,409        723
Free or Reduced-Price Lunch                                   0.700      0.438
Asian                                                         0.055      0.050
Black                                                         0.349      0.154
Hispanic                                                      0.410      0.221
White                                                         0.164      0.537
Adjusted Cohort Graduation Rate                               0.843      0.802
District Instruction Expenditures Per Pupil                  $6,561     $5,636
District Student Services Expenditures Per Pupil             $3,787     $3,385
Panel B: Teachers                                     Participating     Others
Age: Under 30                                                 0.407      0.160
Age: 30-49                                                    0.432      0.553
Age: 50 or over                                               0.161      0.287
Female                                                        0.630      0.536
Hispanic or Latino                                            0.111      0.051
Race: American Indian or Alaska Native                        0.000      0.009
Race: Asian American                                          0.111      0.041
Race: Black                                                   0.111      0.060
Race: Native Hawaiian or other Pacific Islander               0.000      0.004
Race: White                                                   0.778      0.896
Years of Experience                                            10.3       13.2
Years of Experience <=2                                       0.290      0.085
Years of Experience <=5                                       0.481      0.234
Hold a Teaching Certificate                                   0.926      0.945
Undergraduate Major in STEM                                   0.944      0.747
Single Subject Credential in Science                          0.630      0.823
Master’s Degree or Higher                                     0.356      0.615
Previously Taught AP Course                                   0.469         NA
Previously Taught AP, IB, or Honors Course                    0.796         NA
Number of Professional Development Trainings
  in the Past 5 years (0-5)                                    3.09         NA
Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd; EDFacts,
https://www2.ed.gov/about/inits/ed/edfacts/index.html. “Others” in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey,
https://nces.ed.gov/surveys/sass. “Others” in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1) and (4): Control Group Mean
Columns (2) and (5): Difference Between Treated and Controls
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers

                                             Full Sample                Survey Sample
Pre-Treatment Characteristic                 (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade              16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                      (0.02)   (0.07)            (0.03)   (0.09)
                                                      [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                              0.38     0.08     0.25     0.44     0.07     0.30
                                                      (0.04)   (0.10)            (0.05)   (0.16)
                                                      [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                           0.29     0.10     0.18     0.36     0.09     0.17
                                                      (0.03)   (0.12)            (0.04)   (0.17)
                                                      [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                       3.16     0.05     0.20     3.23     0.06     0.13
                                                      (0.03)   (0.08)            (0.03)   (0.10)
                                                      [0.14]   [0.02]            [0.06]   [0.20]
Female                                       0.59     0.00     0.10     0.61     -0.01    0.11
                                                      (0.03)   (0.06)            (0.04)   (0.07)
                                                      [0.99]   [0.10]            [0.73]   [0.12]
Asian American                               0.12     0.02     0.10     0.12     0.03     0.10
                                                      (0.02)   (0.05)            (0.01)   (0.07)
                                                      [0.27]   [0.06]            [0.07]   [0.12]
Black                                        0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                      (0.02)   (0.06)            (0.02)   (0.05)
                                                      [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial    0.31     0.01     0.05     0.33     0.01     0.05
                                                      (0.02)   (0.06)            (0.02)   (0.07)
                                                      [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                     0.02     0.00     -0.01    0.01     0.00     -0.01
                                                      (0.01)   (0.01)            (0.01)   (0.01)
                                                      [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                       0.13     0.03     0.00     0.14     0.02     0.01
                                                      (0.02)   (0.05)            (0.02)   (0.09)
                                                      [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                     0.05     0.01     0.02     0.04     0.01     0.04
                                                      (0.01)   (0.02)            (0.01)   (0.03)
                                                      [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch     0.53     0.01     0.02     0.51     0.01     0.07
                                                      (0.02)   (0.07)            (0.03)   (0.09)
                                                      [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home   0.34     0.02     0.03     0.35     0.01     0.04
                                                      (0.02)   (0.07)            (0.02)   (0.07)
                                                      [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses        0.79     0.00     0.09     0.79     0.02     0.05
                                                      (0.02)   (0.04)            (0.02)   (0.05)
                                                      [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                       1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed
effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in
brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1) and (4): Control Group Mean
Columns (2) and (5): ITT
Columns (3) and (6): LATE

                                             Full Sample                Survey Respondents
Outcome                                      (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment               0.19     0.38              0.24     0.39
                                                      (0.05)                     (0.06)
                                                      [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                 0.03     0.04     0.11     0.03     0.04     0.10
                                                      (0.01)   (0.01)            (0.01)   (0.01)
                                                      [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                     0.13     0.04     0.11     0.14     0.04     0.10
                                                      (0.01)   (0.02)            (0.01)   (0.02)
                                                      [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                     0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                      (0.01)   (0.02)            (0.01)   (0.02)
                                                      [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                         0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                      (0.01)   (0.02)            (0.01)   (0.03)
                                                      [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                            0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                      (0.01)   (0.02)            (0.01)   (0.02)
                                                      [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                                0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                      (0.01)   (0.03)            (0.01)   (0.03)
                                                      [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                       1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information was collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
Table 4
Treatment Contrast (Composite Variables)

Column (1): Control Group Complier Mean
Column (2): ITT
Column (3): LATE

Outcome                                           (1)      (2)      (3)
Academically Challenging Curriculum               -0.33    0.31     0.80
                                                           (0.10)   (0.24)
                                                           [0.00]   [0.00]
Project-Based Independent Classroom Activities    -0.06    0.13     0.33
                                                           (0.07)   (0.17)
                                                           [0.07]   [0.06]
Integrated Use of Technology                      -0.11    0.11     0.28
                                                           (0.08)   (0.19)
                                                           [0.19]   [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
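The construction described in the notes can be illustrated with a minimal sketch. The scale labels, the response data, and the use of the population standard deviation are assumptions made for illustration only; the actual component variables are those listed in Online Appendix Table 5, and the survey weights are not applied here.

```python
from statistics import mean, pstdev

# Hypothetical five-point agreement scale, ordered from lowest to highest category.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def rescale(response, scale=SCALE):
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return scale.index(response) / (len(scale) - 1)

def composite(students):
    """Average each student's rescaled components, then standardize across students.

    students: one list of component responses per student.
    Returns scores with mean 0 and (population) SD 1.
    """
    raw = [mean(rescale(r) for r in responses) for responses in students]
    mu, sd = mean(raw), pstdev(raw)
    return [(x - mu) / sd for x in raw]

# Made-up responses for three students on two component items (not study data).
scores = composite([
    ["strongly agree", "agree"],
    ["neutral", "neutral"],
    ["disagree", "strongly disagree"],
])
```

With these made-up responses, the student who mostly agrees receives a positive composite score and the student who mostly disagrees receives a negative one.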
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Column (1): Control Group Complier Mean
Column (2): ITT
Column (3): LATE

Outcome                                           (1)      (2)      (3)
Science Skill                                     -0.19    0.09     0.23
                                                           (0.06)   (0.16)
                                                           [0.15]   [0.14]
STEM Interest                                     0.62     0.04     0.09
                                                           (0.02)   (0.07)
                                                           [0.16]   [0.16]
Confidence in College Science                     0.92     -0.04    -0.10
                                                           (0.02)   (0.05)
                                                           [0.11]   [0.06]
Stress                                            0.12     0.07     0.17
                                                           (0.03)   (0.07)
                                                           [0.02]   [0.01]
Grades in Science Courses                         2.80     -0.12    -0.29
                                                           (0.07)   (0.16)
                                                           [0.08]   [0.07]
Grades in Other Courses                           3.14     -0.07    -0.18
                                                           (0.02)   (0.06)
                                                           [0.00]   [0.00]
Number of Observations                            1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability
to complete a college science course, or = 0 if somewhat not confident or not at all confident.
Stress = 1 if most recent science course had strong negative or negative impact on physical or
emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science
and other courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during the study year, are weighted by the inverse probability
of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
Table 6
Robustness Checks of Main ITT Results

Column (1): Control Group Complier Mean
Column (2): Main Results
Column (3): Robust SE
Column (4): p-value (permutation test)
Column (5): Excluding High School 56
Column (6): Including Imputation of Missing Outcome Variables
Column (7): Excluding Covariates
Column (8): Excluding High School 23
Column (9): Lee Lower Bound
Column (10): Lee Upper Bound
Column (11): 95% Confidence Interval from Lee Bounds
Column (12): Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                         (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)           (12)
Science Skill                   -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09, 0.51    2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                   0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03, 0.18    1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science   0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09, 0.10    2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                          0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05, 0.15    1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses       2.80    -0.12                   -0.06   -0.10   -0.07
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses         3.14    -0.07                   -0.07   -0.06   -0.03
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
Remaining columns for the two grades outcomes: Not applicable, as grades are based on transcripts
rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP Biology and AP Chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95 percent confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to
derive a confidence interval for the treatment effect itself).
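As a rough illustration of the trimming behind the Lee (2009) bound columns, the sketch below computes unweighted bounds for a simple difference in means. It assumes that the treatment group has the higher response rate, omits covariates, fixed effects, and survey weights, and uses made-up data; the function name and its arguments are hypothetical.

```python
from statistics import mean

def lee_bounds(treated_y, control_y, treated_rate, control_rate):
    """Trim treated respondents until response rates match, then difference the means.

    treated_y, control_y: outcomes for survey respondents in each group.
    treated_rate, control_rate: survey response rates (assumes treated_rate >= control_rate).
    Returns (lower_bound, upper_bound).
    """
    trim_share = (treated_rate - control_rate) / treated_rate
    n_trim = round(trim_share * len(treated_y))
    ordered = sorted(treated_y)
    keep = len(ordered) - n_trim
    lower = mean(ordered[:keep]) - mean(control_y)    # drop the highest treated outcomes
    upper = mean(ordered[n_trim:]) - mean(control_y)  # drop the lowest treated outcomes
    return lower, upper

# Made-up example: 80 percent of treated and 64 percent of controls responded.
lower, upper = lee_bounds(list(range(1, 11)), list(range(1, 9)), 0.80, 0.64)
```

In this toy example, one fifth of the treated respondents are trimmed from the top for the lower bound and from the bottom for the upper bound.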
1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students'
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3 The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5 Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. When the standard to which students compare
themselves rises, their confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7 We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course and classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8 In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9 Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10 Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11 Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12 We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13 Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14 Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15 Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16 Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.
17 The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19 One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60 percent of the treated students chose Biology,
then we randomly assign 60 percent of the control students to the School-Biology control group.
In Section V.C, we show that our results are also robust to dropping this school entirely.
20 To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(CompletedSurvey_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where
$CompletedSurvey_{ij}$ equals 1 if student i in school-by-cohort j completed any part of the
end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous
equations, $\mu_j$ are school-by-cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal
distribution function. The results of this regression are included in Online Appendix Table 2.
Students who had higher pre-treatment grades, Black students, those who were not disabled, and
those who took prerequisite courses were more likely to complete the survey. The inverse
probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90 percent of students receiving a weight less than 2.0.
21 We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22 Detailed course-by-course impacts are available from the authors.
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25 In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27 Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29 To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30 This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
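The inverse probability weighting step described in footnote 20 can be sketched as follows. The index values below are hypothetical stand-ins for the fitted probit index (the estimated school-by-cohort effect plus the covariate terms); the probit regression itself is not reproduced, and the function name is illustrative.

```python
from statistics import NormalDist

def inverse_probability_weight(index):
    """Return 1 / Phi(index), where Phi is the standard normal CDF.

    index: assumed fitted probit index for one student who completed the survey.
    """
    return 1.0 / NormalDist().cdf(index)

# Hypothetical fitted indices for three survey completers (not study values).
fitted_indices = [1.5, 0.0, -1.0]
weights = [inverse_probability_weight(z) for z in fitted_indices]
```

A completer with a predicted completion probability of one half receives a weight of 2, and completers who resemble non-completers (low predicted probability) receive larger weights.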
percent of the students who received an offer choosing not to enroll and 19 percent of the control
students enrolling. Nearly all of these latter crossovers reflected decisions by the district to
violate the study protocol and let control group students into the course, while a few of these
came from hardship exemptions that were requested by the school and granted by the study team.
The remaining rows in Table 3 shed light on the courses that were crowded out by the newly
offered AP science course. Mechanically, treatment group students took more credits in AP
science (an 11-percentage point increase in the share of total credits in the full sample).
Treatment group students' share of courses in any AP also increased by 11 percentage points,
indicating that they chose not to reduce enrollment in other AP courses. Instead, taking AP
science appears to have crowded out regular courses (down 9 percentage points), including
regular science courses (down 2 percentage points).22
Approximately 78 percent of the control group compliers took any science course, with 34
percent taking a non-AP advanced science course (almost entirely honors courses) during the
study year. The control students who did not take AP Biology or Chemistry took a variety of
alternative science courses, with the most commonly reported courses including Chemistry
(13%), Physics (12%), AP Environmental Science (11%), Biology (10%), Honors Biology (9%),
and Anatomy/Physiology (9%).
Table 4 provides the contrast in treatment and control group complier reports on the content
and rigor of their science courses for three composite variables. We find that taking AP science
yielded a substantially more academically challenging curriculum (up 0.80 s.d., p-value < 0.01)
and raised the extent of inquiry-based classroom activities (up 0.33 s.d., p-value = 0.06). Our
results also suggest that AP course-takers' classrooms were more likely to use technology (up
0.28 s.d., p-value = 0.14).23 Online Appendix Table 5 shows estimated impacts on each of the
component variables used in constructing the composite variables. We find that while AP
classrooms were more inquiry-based than other science classrooms using our composite
measure, some of the core components of the inquiry approach that were intended by the Board
(e.g., applying knowledge to solve a new problem) were not more prevalent in AP science
classes than other science classes.24 This contrast between students' reports of the content and
rigor of their AP science course relative to other courses available to them offers one measure of
the relative quality of the treatment. In a companion manuscript, we provide a detailed evaluation
of implementation fidelity (the degree to which the courses were implemented as intended by the
Board) through teacher surveys, course syllabi, student transcripts, and interviews with teachers
and school administrators (Long, Conger, and McGhee 2018). In that manuscript, we find results
that are consistent with the finding that most teachers were able to implement a rigorous AP
science classroom, yet they also struggled with the inquiry-based approach and integrating
technology into the classroom.
These reported differences between treatment and control group classrooms also hold despite
the fact that many of the teachers selected to teach AP also teach the other science courses taken
by control group students. In fact, almost 67 percent of AP teachers reported using some of their
AP science strategies and lessons in their non-AP classes. These within-school spillovers likely
attenuate observed differences in outcomes between treatment and control group students in the
same school.25
B. AP Impact on Outcomes
Table 5 reports estimated impacts of AP science on the key outcomes of interest. We estimate
that, for the typical complier, taking AP science raises objectively measured scientific inquiry
skills by 0.23 standard deviations. We are unable to rule out zero treatment impacts with
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers' interest in pursuing a
STEM degree should they enroll in college by 9 percentage points, up from a control group
complier mean of 62 percent, with, again, more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
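One way to see how the ITT and LATE columns relate is the Wald ratio, which divides the ITT estimate by the first-stage effect of the offer on AP enrollment. The sketch below assumes this simple ratio and uses the rounded values reported in Tables 3 and 5, so it reproduces the reported LATE only approximately; it is an illustration, not the estimation procedure itself, and the function and variable names are hypothetical.

```python
def wald_late(itt, first_stage):
    """Scale an intent-to-treat estimate by the first-stage effect on take-up."""
    return itt / first_stage

# Table 3, column (5): ITT effect of the offer on AP course enrollment (survey respondents).
FIRST_STAGE_SURVEY = 0.39

# Table 5, column (2): rounded ITT estimates for two survey-based outcomes.
itt_estimates = {"science_skill": 0.09, "confidence": -0.04}

late = {name: round(wald_late(itt, FIRST_STAGE_SURVEY), 2)
        for name, itt in itt_estimates.items()}
```

Because the inputs are rounded to two decimals, ratios for other outcomes can differ from the reported LATE in the second decimal place.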
Table 5 provides stronger evidence of negative treatment effects on students' confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology by signing the
study assent forms. Taking AP science substantially lowered participants' likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points, p-value = 0.06). We also find large effects of the AP course on students'
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students' self-reported confidence and stress levels. We find that taking AP science increases
students' likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05), above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students saw their grades suffer. We estimate that taking AP science reduced students' grades in
their science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students' science GPAs during the study year (usually their junior
year) from around a B- to a C+.26 This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students' grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students' share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students' overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
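The credit-weighted average above can be checked with a short calculation. The inputs are the rounded figures quoted in the text and Table 5; the function and constant names are illustrative.

```python
SCIENCE_SHARE = 0.24   # share of credits in science during the study year (from the text)
LATE_SCIENCE = -0.29   # Table 5, LATE on grades in science courses
LATE_OTHER = -0.18     # Table 5, LATE on grades in other courses

def overall_grade_impact(science_effect, other_effect, science_share):
    """Credit-weighted average of the science and non-science grade effects."""
    return science_effect * science_share + other_effect * (1 - science_share)

impact = overall_grade_impact(LATE_SCIENCE, LATE_OTHER, SCIENCE_SHARE)
```

With these inputs the weighted average is about -0.206, which rounds to the 0.21 decline reported in the text.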
With our estimates in hand, we can easily compute the adjustment that would leave the
student's GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students' grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).[27]
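The arithmetic behind the overall grade effect and the offsetting grade boost can be retraced in a few lines. The sketch below is only an illustration: it plugs in the rounded figures quoted in the text, and the variable names are illustrative rather than taken from the study's code. With rounded inputs the implied boost comes out near 1.47 rather than the reported 1.46, presumably because the reported figure rests on unrounded estimates.

```python
# Rounded figures as quoted in the text; names are illustrative assumptions.
science_effect = -0.29   # LATE on grades in science courses
other_effect = -0.18     # LATE on grades in other courses
science_share = 0.24     # share of study-year credits in science
ap_science_share = 0.14  # predicted share of classes in any AP science subject

# Credit-weighted average of the two grade effects.
overall_effect = (science_effect * science_share
                  + other_effect * (1 - science_share))

# Boost to AP science grades that would leave the study-year GPA unchanged.
boost = -overall_effect / ap_science_share

print(round(overall_effect, 2))  # -0.21
print(round(boost, 2))           # 1.47
```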
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test in which we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).[28] The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.[29]
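The logic of the permutation test can be sketched as follows. This is a simplified illustration, not the study's procedure: it swaps in a plain difference in means for the regression-adjusted ITT estimate, ignores the School x Cohort structure of the randomization, and runs on synthetic data. All function and variable names are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(outcomes, treated):
    """Treatment-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)

def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # randomly reassign a pseudo treatment
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations

# Synthetic example: 100 treated and 100 control students.
data_rng = random.Random(1)
treated = [i % 2 == 0 for i in range(200)]
outcomes = [data_rng.gauss(0.5 if d else 0.0, 1.0) for d in treated]
p_value = permutation_p_value(outcomes, treated)
print(p_value)
```

A faithful replication would permute assignment within each School x Cohort block and re-estimate the full regression for every draw.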
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.[30] Column (8) shows the results when we exclude all
of the Xi covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower
and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.[31]
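The trimming step behind the Lee bounds can be illustrated with a short sketch. It rests on simplifying assumptions that are not the study's: no covariates, no survey weights, a plain difference in means, and a treatment group that responds at the higher rate. It does not compute the Imbens and Manski confidence interval. The function name and the toy data are hypothetical.

```python
from statistics import mean

def lee_bounds(treated_outcomes, control_outcomes, n_treated, n_control):
    """Lower and upper bounds on the difference in means after trimming.

    The outcome lists hold respondents only; n_treated and n_control are the
    numbers of students originally assigned to each group.
    """
    rate_t = len(treated_outcomes) / n_treated
    rate_c = len(control_outcomes) / n_control
    if rate_t < rate_c:
        raise ValueError("sketch assumes the treatment group responds at the higher rate")
    # Share of treated respondents to drop so the response rates match.
    trim_share = (rate_t - rate_c) / rate_t
    n_trim = int(round(trim_share * len(treated_outcomes)))
    ordered = sorted(treated_outcomes)
    keep = len(ordered) - n_trim
    control_mean = mean(control_outcomes)
    lower = mean(ordered[:keep]) - control_mean    # drop the highest outcomes
    upper = mean(ordered[n_trim:]) - control_mean  # drop the lowest outcomes
    return lower, upper

# Toy data: 8 of 10 treated students respond, 6 of 10 controls respond.
treated = [1, 2, 3, 4, 5, 6, 7, 8]
control = [2, 3, 4, 5, 6, 7]
print(lee_bounds(treated, control, 10, 10))  # (-1.0, 1.0)
```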
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression conditional on Xi with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals
fail to rule out large positives and negatives. Additional heterogeneity results can be found in the
Online Appendix.[32]
VI. Conclusion
Most admissions committees at bachelor's degree-granting institutions rely on applicants' AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students' skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of US high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.[33] We also find that AP science classes
substantially increase students' stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult given that few schools document per-course
expenditures. One recent analysis of a US district that relied on teacher salaries and course
assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation's high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance on college
admissions decisions, as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. "When Should
You Adjust Standard Errors for Clustering?" NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: US Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. "Psychological Insights for Improved
Physics Teaching." Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. "The Effects of High School Curriculum on Education and Labor
Market Outcomes." The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. "Coping Behaviors as Intervening Mechanisms in the
Inverted-U-stress-performance Relationship." Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. "Raising the Bar: Curricular Intensity and
Academic Performance." Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. "Shifting
College Majors in Response to Advanced Placement Exam Scores." Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. "Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. "Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching."
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. "Demoting Advanced Placement." The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. "Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students' Accuracy of Feeling of
Confidence." Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. "Playing the Admissions Game:
Student Reactions to Increasing College Competition." The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. "Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed." The Baltimore Sun, August 17.
Bush, George W. 2006. "State of the Union Address by the President." Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. "Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among
Fifteen-year-olds in 34 Countries." Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. "Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects." Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. "A Little Goes a Long Way: Pressure for College Students to Succeed."
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer, Jr. 2015. "The Medium-Term Impacts of High-Achieving
Charter Schools." Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. "Preparation Matters." National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. "50-State Comparison: Advanced Placement Policies." Education
Commission of the States.
Drew, Christopher. 2011. "Rethinking Advanced Placement." The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. "Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?" Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. "Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit." PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. "Students'
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs." Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. "The Role of Advanced Placement and Honors
Courses in College Admissions." Center for Studies in Higher Education Research &
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. "The Labor of Division: Returns to Compulsory Math
Coursework." Unpublished Manuscript.
Harel, O. 2009. "The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation." Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. "Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data." Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
"The Impact of Advanced Placement Incentive Programs." Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. "Weigh the Benefits, Stress of AP Courses for Your Student." US News
& World Report, May 10.
Huber, Martin. 2013. "A Simple Test for the Ignorability of Non-compliance in Experiments."
Economics Letters 120 (3): 389–391.
Imbens, Guido W., and Charles F. Manski. 2004. "Confidence Intervals for Partially Identified
Parameters." Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. "A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program." Journal of Human Resources 45 (3): 591–639.
__________. 2014. "Do College-Preparatory Programs Improve Long-Term Outcomes?"
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. "Is there a Causal Effect of High
School Math on Labor Market Outcomes?" Journal of Human Resources 44 (1):
171–198.
Kim, Emily. 2015. "AP Classes often Translate to Advanced Pressure." Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. "Do Grade Weights Promote More Advanced
Course-Taking?" Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. "The Link Between Advanced Placement
Experience and Early College Success." Southern Economic Journal 75 (3): 873–891.
__________. 2010. "Advanced Placement Participation: Evaluating the Policies of States and
Colleges." In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. "The Case of Carla:
Dilemmas of Helping All Students to Understand Science." Science Education 86 (3):
287–313.
Lee, David S. 2009. "Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects." The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. "Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines." Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. "The Benefit of Additional High School Math
and Science Classes for Young Men and Women." Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. "Breaking it Down: Engineering
Student STEM Confidence at the Intersection of Race/Ethnicity and Gender." Research in
Higher Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. "Effects of High School Course-taking
on Secondary and Postsecondary Success." American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee, Jr. 2018. "Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?" Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. "AP at Scale: Public School Students in Advanced Placement, 1990–2013."
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. "Are AP Courses Worth the Stress?" WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. "The Condition of
Education 2018: Public High School Graduation Rates." (NCES 2018-144).
US Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. "Learning and Understanding: Improving Advanced Study of
Mathematics and Science in US High Schools." Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. "STEM for All." February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. "Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures."
Version 1.0. Stanford University.
Rose, Heather. 2004. "Has Curriculum Closed the Test Score Gap in Math?" Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. "The Effect of High School Courses on Earnings." The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. "Breaking Down School Budgets." Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. "The Role of
Advanced High School Coursework in Increasing STEM Career Interest." Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. "Accounting for Advanced High School Coursework
in College Admission Decisions." College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee, Jr., Dylan
Conger, and Mark C. Long. 2017. "Measuring Students' Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation."
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. "Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes." Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. "Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence." Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. "Confidence Judgments in Studies of Individual
Differences." Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. "Overconfidence Across World Regions." Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. "Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth." The New York Times, April 29.
Tai, Robert H. 2008. "Posing Tougher Questions about the Advanced Placement Program."
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. "The IB Diploma Programme Statistical Bulletin."
Theokas, Christina, and Reid Saaris. 2013. "Finding America's Missing AP and IB Students."
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. "Dual Credit and
Exam-Based Courses in US Public High Schools: 2010-11, First Look." (NCES
2013-001). US Department of Education. Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. "AP Classes are a Scam." Atlantic, October 13.
Tucker, Jill. 2012. "Stressful AP Courses: A Push for a Cap." SF Gate.
US Department of Education. 2014. "Education Department Awards 40 States, DC, and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests." Washington, DC.
Weinstein, Paul. 2016. "Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement." Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. "Promise and Paradox: Measuring
Students' Non-Cognitive Skills and the Impact of Schooling." Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. "The Relation of Strength of Stimulus to Rapidity
of Habit-Formation." Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district's
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor's degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district's average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district's enrollment. The dashed line is a lowess curve created using
Stata's default settings and roughly shows the predicted test score as a function of the
neighborhood's SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                    Participating Others
Average Enrollment                                  1,409         723
Free or Reduced-Price Lunch                         0.700         0.438
Asian                                               0.055         0.050
Black                                               0.349         0.154
Hispanic                                            0.410         0.221
White                                               0.164         0.537
Adjusted Cohort Graduation Rate                     0.843         0.802
District Instruction Expenditures Per Pupil         $6,561        $5,636
District Student Services Expenditures Per Pupil    $3,787        $3,385

Panel B: Teachers                                   Participating Others
Age Under 30                                        0.407         0.160
Age 30-49                                           0.432         0.553
Age 50 or over                                      0.161         0.287
Female                                              0.630         0.536
Hispanic or Latino                                  0.111         0.051
Race: American Indian or Alaska Native              0.000         0.009
Race: Asian American                                0.111         0.041
Race: Black                                         0.111         0.060
Race: Native Hawaiian or other Pacific Islander     0.000         0.004
Race: White                                         0.778         0.896
Years of Experience                                 10.3          13.2
Years of Experience <=2                             0.290         0.085
Years of Experience <=5                             0.481         0.234
Hold a Teaching Certificate                         0.926         0.945
Undergraduate Major in STEM                         0.944         0.747
Single Subject Credential in Science                0.630         0.823
Master's Degree or Higher                           0.356         0.615
Previously Taught AP Course                         0.469         NA
Previously Taught AP, IB, or Honors Course          0.796         NA
Number of Professional Development Trainings        3.09          NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). "Others" in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). "Others" in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4) Control Group Mean
(2), (5) Difference Between Treated and Controls
(3), (6) Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                  (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade               16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                               0.38     0.08     0.25     0.44     0.07     0.30
                                                       (0.04)   (0.10)            (0.05)   (0.16)
                                                       [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                            0.29     0.10     0.18     0.36     0.09     0.17
                                                       (0.03)   (0.12)            (0.04)   (0.17)
                                                       [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                        3.16     0.05     0.20     3.23     0.06     0.13
                                                       (0.03)   (0.08)            (0.03)   (0.10)
                                                       [0.14]   [0.02]            [0.06]   [0.20]
Female                                        0.59     0.00     0.10     0.61     -0.01    0.11
                                                       (0.03)   (0.06)            (0.04)   (0.07)
                                                       [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                0.12     0.02     0.10     0.12     0.03     0.10
                                                       (0.02)   (0.05)            (0.01)   (0.07)
                                                       [0.27]   [0.06]            [0.07]   [0.12]
Black                                         0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                       (0.02)   (0.06)            (0.02)   (0.05)
                                                       [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial     0.31     0.01     0.05     0.33     0.01     0.05
                                                       (0.02)   (0.06)            (0.02)   (0.07)
                                                       [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                      0.02     0.00     -0.01    0.01     0.00     -0.01
                                                       (0.01)   (0.01)            (0.01)   (0.01)
                                                       [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                        0.13     0.03     0.00     0.14     0.02     0.01
                                                       (0.02)   (0.05)            (0.02)   (0.09)
                                                       [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                      0.05     0.01     0.02     0.04     0.01     0.04
                                                       (0.01)   (0.02)            (0.01)   (0.03)
                                                       [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch      0.53     0.01     0.02     0.51     0.01     0.07
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home    0.34     0.02     0.03     0.35     0.01     0.04
                                                       (0.02)   (0.07)            (0.02)   (0.07)
                                                       [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses         0.79     0.00     0.09     0.79     0.02     0.05
                                                       (0.02)   (0.04)            (0.02)   (0.05)
                                                       [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                        1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4) Control Group Mean
(2), (5) ITT
(3), (6) LATE

Outcome                               (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment        0.19     0.38              0.24     0.39
                                               (0.05)                     (0.06)
                                               [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                          0.03     0.04     0.11     0.03     0.04     0.10
                                               (0.01)   (0.01)            (0.01)   (0.01)
                                               [0.00]   [0.00]            [0.00]   [0.00]
  All AP                              0.13     0.04     0.11     0.14     0.04     0.10
                                               (0.01)   (0.02)            (0.01)   (0.02)
                                               [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science              0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                               (0.01)   (0.02)            (0.01)   (0.02)
                                               [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                  0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                               (0.01)   (0.02)            (0.01)   (0.03)
                                               [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                     0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                               (0.01)   (0.02)            (0.01)   (0.02)
                                               [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                         0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                               (0.01)   (0.03)            (0.01)   (0.03)
                                               [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses and p-values are in brackets.
Table 4
Treatment Contrast (Composite Variables)

(1) Control Group Complier Mean
(2) ITT
(3) LATE

Outcome                                           (1)      (2)      (3)
Academically Challenging Curriculum               -0.33    0.31     0.80
                                                           (0.10)   (0.24)
                                                           [0.00]   [0.00]
Project-Based Independent Classroom Activities    -0.06    0.13     0.33
                                                           (0.07)   (0.17)
                                                           [0.07]   [0.06]
Integrated Use of Technology                      -0.11    0.11     0.28
                                                           (0.08)   (0.19)
                                                           [0.19]   [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses and
p-values are in brackets.
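The construction of the composite variables described in the notes above can be sketched as follows. This is an illustration under stated assumptions: a five-point agreement scale, an unweighted average across component items, and standardization with the population standard deviation over the students supplied. The scale labels, function names, and example responses are hypothetical, and the study's actual component items are listed in its Online Appendix.

```python
from statistics import mean, pstdev

# Assumed response scale, ordered from lowest to highest category.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def rescale(response, scale=SCALE):
    """Map a response to [0, 1]: lowest category 0.0, highest 1.0, evenly spaced."""
    return scale.index(response) / (len(scale) - 1)

def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sigma = mean(raw), pstdev(raw)  # sigma must be nonzero
    return [(x - mu) / sigma for x in raw]

# Three hypothetical students answering two component items each.
example = [
    ["agree", "strongly agree"],
    ["neutral", "disagree"],
    ["strongly disagree", "neutral"],
]
scores = composite_scores(example)
print(scores)
```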
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1) Control Group Complier Mean
(2) ITT
(3) LATE

Outcome                           (1)      (2)      (3)
Science Skill                     -0.19    0.09     0.23
                                           (0.06)   (0.16)
                                           [0.15]   [0.14]
STEM Interest                     0.62     0.04     0.09
                                           (0.02)   (0.07)
                                           [0.16]   [0.16]
Confidence in College Science     0.92     -0.04    -0.10
                                           (0.02)   (0.05)
                                           [0.11]   [0.06]
Stress                            0.12     0.07     0.17
                                           (0.03)   (0.07)
                                           [0.02]   [0.01]
Grades in Science Courses         2.80     -0.12    -0.29
                                           (0.07)   (0.16)
                                           [0.08]   [0.07]
Grades in Other Courses           3.14     -0.07    -0.18
                                           (0.02)   (0.06)
                                           [0.00]   [0.00]
Number of Observations            1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses and
p-values are in brackets.
Table 6
Robustness Checks of Main ITT Results

(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                       (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)            (12)
Science Skill                 -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    (-0.09, 0.51)   2.0
                                      (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                      [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                 0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    (-0.03, 0.18)   1.9
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                      [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science 0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    (-0.09, 0.10)   2.0
                                      (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                      [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                        0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    (-0.05, 0.15)   1.6
                                      (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                      [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses     2.80    -0.12                   -0.06   -0.10   -0.07   n/a     n/a     n/a     n/a             n/a
                                      (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                      [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses       3.14    -0.07                   -0.07   -0.06   -0.03   n/a     n/a     n/a     n/a             n/a
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                      [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
n/a: Not applicable, as grades are based on transcripts rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95 percent confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to
derive the confidence interval for the treatment effect itself).
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (U.S. Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students’
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry,
reasoning, and communication skills” (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students’ perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students’ confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends, to some extent, on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include: Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master’s degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members’
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study’s Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools’ needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60 percent of the treated students chose Biology, then
we randomly assign 60 percent of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90 percent of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by von Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative, inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
conventionally high levels of confidence (p-value = 0.14) and consequently refer to these results
as more suggestive than definitive. AP science also increased compliers’ interest in pursuing a
STEM degree, should they enroll in college, by 9 percentage points, up from a control group
complier mean of 62 percent, with, again, more suggestive than definitive results at traditional
levels of statistical inference (p-value = 0.16).
Table 5 provides stronger evidence of negative treatment effects on students’ confidence in
their ability to succeed in a college science course. Among control group compliers, 92 percent
express that they are at least somewhat confident in their ability to succeed in a college science
course. These high levels of confidence are perhaps not surprising, since all of our sample
participants demonstrated interest in taking AP Chemistry or Biology as a result of signing the
study assent forms. Taking AP science substantially lowered participants’ likelihood of being at
least somewhat confident in their ability to complete college courses in science (down 10
percentage points; p-value = 0.06). We also find large effects of the AP course on students’
self-reported stress levels. Among control group compliers, 12 percent stated that their most recent
science class had a negative or strong negative impact on their stress levels (where a negative
impact indicates more stress). Taking AP science more than doubles this rate, raising the
likelihood of stating a negative impact by 17 percentage points (p-value = 0.01). In results
available from the authors, we also examine the effect of taking AP on the full distribution of
students’ self-reported confidence and stress levels. We find that taking AP science increases
students’ likelihood of reporting strong negative impacts on stress by 5 percentage points
(p-value = 0.05) above the control group complier mean of 2 percent.
In addition to experiencing a loss in confidence and an increase in stress, treatment group
students’ grades suffered. We estimate that taking AP science reduced students’ grades in their
science courses by 0.29 points (p-value = 0.07). Relative to a control group complier mean of
2.80, taking AP science lowers students’ science GPAs during the study year (usually their junior
year) from around a B- to a C+.²⁶ This decline is addressed to some degree by high schools that
use a weighted grade point average to upweight grades from AP courses. The last row of Table 5
provides our estimated effects of AP science on students’ grades in other courses. AP science
takers score approximately 0.18 grade points lower than control group compliers in non-science
courses during the study year (p-value below 0.01). These results suggest that students may be
shifting their effort away from their non-AP classes in order to meet the demands of the
challenging AP course. An average of these impacts, weighted by students’ share of credits in
science during the study year assuming that they take AP science (0.24), suggests that taking AP
science lowers students’ overall grades by 0.21 during the year ((-0.29 × 0.24) + (-0.18 ×
0.76)).
With our estimates in hand, we can easily compute the adjustment that would leave the
student’s GPA during the study year unaffected. For students who took AP Biology or Chemistry
as a result of this experiment, the share of their classes in any AP science subject is predicted to be
14 percent (i.e., 0.02 + 0.12 from Table 3). If these students’ grades in AP science courses were
boosted by 1.46 (0.21/0.14), their GPAs during the study year would be unaffected by their
enrollment in these AP courses. This 1.46 boost is close to the higher end of the practices
documented in Klopfenstein and Lively (2016).²⁷
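The two calculations above can be reproduced with a short sketch. It uses the rounded figures quoted in the text (decimal points restored from context); the function and variable names are illustrative and are not the authors'. With these rounded inputs the overall effect comes to about -0.21 and the offsetting boost to about 1.47, slightly above the reported 1.46, which presumably reflects unrounded inputs.

```python
# Figures quoted in the text; names below are hypothetical, for illustration only.
effect_science = -0.29          # estimated effect on grades in science courses
effect_other = -0.18            # estimated effect on grades in other courses
share_science = 0.24            # share of credits in science during the study year
share_ap_science = 0.02 + 0.12  # predicted share of classes in any AP science subject


def overall_gpa_effect(effect_science, effect_other, share_science):
    """Credit-share-weighted average of the science and non-science grade effects."""
    return effect_science * share_science + effect_other * (1 - share_science)


def offsetting_boost(overall_effect, share_ap):
    """Grade points added to each AP science grade that would leave GPA unchanged."""
    return -overall_effect / share_ap


overall = overall_gpa_effect(effect_science, effect_other, share_science)
boost = offsetting_boost(overall, share_ap_science)
print(round(overall, 2), round(boost, 2))
```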
C. Robustness Checks
Table 6 presents a variety of robustness checks of the ITT estimates on our six main outcomes.
The first two columns of this table repeat the findings previously shown in Table 5. Columns (3)
and (4) present alternate methods for inference. Column (3) reports robust standard errors, and
Column (4) reports the results of a permutation test where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.²⁹
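A minimal sketch of the permutation logic described for Column (4) follows. It is a simplification under stated assumptions: the treatment effect is taken as a raw difference in means, whereas the paper's estimates come from a regression with covariates and School x Cohort fixed effects, and the pseudo treatment here is a simple reshuffle of the observed assignment. The function name and arguments are hypothetical.

```python
import numpy as np


def permutation_p_value(outcome, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose |effect| exceeds the observed |effect|.

    Assumption: the effect is a plain difference in means, not the paper's
    regression-adjusted ITT estimate.
    """
    rng = np.random.default_rng(seed)
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    def diff_in_means(assignment):
        return outcome[assignment].mean() - outcome[~assignment].mean()

    observed = abs(diff_in_means(treated))
    exceed = 0
    for _ in range(n_permutations):
        # Shuffle the labels, keeping the number of treated units fixed.
        pseudo = rng.permutation(treated)
        if abs(diff_in_means(pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations
```

A small p-value from this procedure indicates that few random reassignments produce an effect as large in magnitude as the one observed.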
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all
of the X_i covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the lower
and upper bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
should be considered in evaluating the effects from outcomes based on the study survey.³¹
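The trimming step can be illustrated with a simplified sketch. It assumes no covariates or weights, compares raw means, and only handles the case described in the text, where the treatment group has the higher response rate and is therefore the group trimmed. It does not implement the Imbens and Manski (2004) confidence interval. The function name and arguments are hypothetical.

```python
import numpy as np


def lee_bounds(y_treat, y_control, n_treat_assigned, n_control_assigned):
    """Lower and upper Lee-style bounds on a difference in means.

    y_treat, y_control: outcomes observed for survey respondents only.
    n_*_assigned: group sizes including nonrespondents.
    """
    y_treat = np.sort(np.asarray(y_treat, dtype=float))
    y_control = np.asarray(y_control, dtype=float)
    rate_t = len(y_treat) / n_treat_assigned
    rate_c = len(y_control) / n_control_assigned
    if rate_t < rate_c:
        raise ValueError("sketch assumes the treatment group responds at the higher rate")

    # Share of treated respondents to drop so the response rates match.
    trim_share = (rate_t - rate_c) / rate_t
    n_trim = int(np.floor(trim_share * len(y_treat) + 1e-9))
    keep = len(y_treat) - n_trim

    control_mean = y_control.mean()
    lower = y_treat[:keep].mean() - control_mean    # drop the highest outcomes
    upper = y_treat[n_trim:].mean() - control_mean  # drop the lowest outcomes
    return lower, upper
```

The gap between the two bounds widens as the difference in response rates grows, which is one reason the resulting intervals are wider than the ordinary confidence intervals.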
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression, conditional on X_i, with science
skill as the outcome. We find that the point estimates at every quantile are not statistically
significantly different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent
confidence intervals fail to rule out large positives and negatives. Additional heterogeneity
results can be found in the Online Appendix.³²
VI. Conclusion
Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion, as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.³³ We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult, given that few schools document per-course
expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course
assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: U.S. Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U-stress-
performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools Have Been Leader in Advanced Placement, but Results Are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-
olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis Jennifer R 2014 ldquoA Little Goes a Long Way Pressure for College Students to Succeedrdquo
Journal of Undergraduate Research 12 (1) 1ndash9
Dobbie Will and Roland G Fryer Jr 2015 ldquoThe medium-term impacts of high-achieving
charter schoolsrdquo Journal of Political Economy 123 (5) 985ndash1037
Dougherty Chrys and Lynn Mellor 2009 ldquoPreparation Mattersrdquo National Center for
Educational Achievement Washington DC
Dounay Zinth Jennifer 2016 ldquo50-State Comparison Advanced Placement Policiesrdquo Education
Commission of the States
Drew Christopher 2011 ldquoRethinking Advanced Placementrdquo The New York Times January 7
Duffett Ann and Steve Farkas 2009 ldquoGrowing Pains in the Advanced Placement Program Do
Tough Trade-offs Lie Aheadrdquo Thomas B Fordham Institute Washington DC
Ellis Jessica Bailey K Fosdick and Chris Rasmussen 2016 ldquoWomen 15 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men Lack of Mathematical
Confidence a Potential Culpritrdquo PLOS ONE 11 (7) 1ndash14
Foust Regan Clark Holly Hertberg-Davis and Carolyn M Callahan 2009 ldquoStudentsrsquo
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programsrdquo Adolescence
44 (174) 289ndash312
Geiser Saul and Veronica Santelices 2004 ldquoThe Role of Advanced Placement and Honors
Courses in College Admissionsrdquo Center for Studies in Higher Education Research
Occasional Paper Series CSHE404
Goodman Joshua Samuel 2012 ldquoThe Labor of Division Returns to Compulsory Math
Courseworkrdquo Unpublished Manuscript
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010. “The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” US News & World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.” Economics Letters 120 (3): 389–391.
Imbens, G., and F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?” Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times, September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds. Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188. Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla: Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3): 287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347 (6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math and Science Classes for Young Men and Women.” Journal of Business & Economic Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking on Secondary and Postsecondary Success.” American Educational Research Journal 49 (2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP Expansion: Can Schools in Less-Resourced Communities Successfully Implement Advanced Placement Science Courses?” Conditionally accepted by Educational Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.” American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of Education 2018: Public High School Graduation Rates.” (NCES 2018-144). US Department of Education, Washington, DC: National Center for Education Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of Mathematics and Science in US High Schools.” Washington, DC: National Academies Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.” Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of Advanced High School Coursework in Increasing STEM Career Interest.” Science Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.” Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.” Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.” Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and Exam-Based Courses in US Public High Schools: 2010-11, First Look.” (NCES 2013-001). US Department of Education, Washington, DC: National Center for Education Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
US Department of Education. 2014. “Education Department Awards 40 States, DC, and the Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth, Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts: Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill, by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                          Participating   Others
Average Enrollment                                        1,409           723
Free or Reduced-Price Lunch                               0.700           0.438
Asian                                                     0.055           0.050
Black                                                     0.349           0.154
Hispanic                                                  0.410           0.221
White                                                     0.164           0.537
Adjusted Cohort Graduation Rate                           0.843           0.802
District Instruction Expenditures Per Pupil               $6,561          $5,636
District Student Services Expenditures Per Pupil          $3,787          $3,385

Panel B: Teachers                                         Participating   Others
Age: Under 30                                             0.407           0.160
Age: 30-49                                                0.432           0.553
Age: 50 or over                                           0.161           0.287
Female                                                    0.630           0.536
Hispanic or Latino                                        0.111           0.051
Race: American Indian or Alaska Native                    0.000           0.009
Race: Asian American                                      0.111           0.041
Race: Black                                               0.111           0.060
Race: Native Hawaiian or other Pacific Islander           0.000           0.004
Race: White                                               0.778           0.896
Years of Experience                                       10.3            13.2
Years of Experience <=2                                   0.290           0.085
Years of Experience <=5                                   0.481           0.234
Hold a Teaching Certificate                               0.926           0.945
Undergraduate Major in STEM                               0.944           0.747
Single Subject Credential in Science                      0.630           0.823
Master’s Degree or Higher                                 0.356           0.615
Previously Taught AP Course                               0.469           NA
Previously Taught AP, IB, or Honors Course                0.796           NA
Number of Professional Development Trainings
  in the Past 5 years (0-5)                               3.09            NA

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and the 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3) use the Full Sample; columns (4)-(6) use the Survey Sample.
(1), (4) Control Group Mean
(2), (5) Difference Between Treated and Controls
(3), (6) Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                  (1)     (2)     (3)     (4)     (5)     (6)
Age as of October of 11th Grade               16.6    -0.03   -0.07   16.6    -0.01   -0.01
                                                      (0.02)  (0.07)          (0.03)  (0.09)
                                                      [0.19]  [0.35]          [0.65]  [0.94]
Math Exam Score                               0.38    0.08    0.25    0.44    0.07    0.30
                                                      (0.04)  (0.10)          (0.05)  (0.16)
                                                      [0.08]  [0.02]          [0.17]  [0.06]
Reading Exam Score                            0.29    0.10    0.18    0.36    0.09    0.17
                                                      (0.03)  (0.12)          (0.04)  (0.17)
                                                      [0.00]  [0.14]          [0.02]  [0.31]
HS Grade Point Average                        3.16    0.05    0.20    3.23    0.06    0.13
                                                      (0.03)  (0.08)          (0.03)  (0.10)
                                                      [0.14]  [0.02]          [0.06]  [0.20]
Female                                        0.59    0.00    0.10    0.61    -0.01   0.11
                                                      (0.03)  (0.06)          (0.04)  (0.07)
                                                      [0.99]  [0.10]          [0.73]  [0.12]
Asian American                                0.12    0.02    0.10    0.12    0.03    0.10
                                                      (0.02)  (0.05)          (0.01)  (0.07)
                                                      [0.27]  [0.06]          [0.07]  [0.12]
Black                                         0.32    -0.02   -0.06   0.27    0.00    -0.05
                                                      (0.02)  (0.06)          (0.02)  (0.05)
                                                      [0.29]  [0.28]          [0.88]  [0.40]
Hispanic, Native American, or Multiracial     0.31    0.01    0.05    0.33    0.01    0.05
                                                      (0.02)  (0.06)          (0.02)  (0.07)
                                                      [0.55]  [0.41]          [0.81]  [0.51]
Disabled                                      0.02    0.00    -0.01   0.01    0.00    -0.01
                                                      (0.01)  (0.01)          (0.01)  (0.01)
                                                      [0.93]  [0.24]          [0.57]  [0.5]
Gifted                                        0.13    0.03    0.00    0.14    0.02    0.01
                                                      (0.02)  (0.05)          (0.02)  (0.09)
                                                      [0.06]  [1.00]          [0.25]  [0.89]
English Language Learner                      0.05    0.01    0.02    0.04    0.01    0.04
                                                      (0.01)  (0.02)          (0.01)  (0.03)
                                                      [0.41]  [0.39]          [0.54]  [0.22]
Eligible for Free or Reduced-Price Lunch      0.53    0.01    0.02    0.51    0.01    0.07
                                                      (0.02)  (0.07)          (0.03)  (0.09)
                                                      [0.66]  [0.77]          [0.72]  [0.45]
Language Other than English Spoken at Home    0.34    0.02    0.03    0.35    0.01    0.04
                                                      (0.02)  (0.07)          (0.02)  (0.07)
                                                      [0.32]  [0.73]          [0.59]  [0.56]
Took Recommended Prerequisite Courses         0.79    0.00    0.09    0.79    0.02    0.05
                                                      (0.02)  (0.04)          (0.02)  (0.05)
                                                      [0.84]  [0.04]          [0.43]  [0.31]
Number of Observations                        1,819                   1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3) use the Full Sample; columns (4)-(6) use Survey Respondents.
(1), (4) Control Group Mean; (2), (5) ITT; (3), (6) LATE

Outcome                                       (1)     (2)     (3)     (4)     (5)     (6)
AP Treatment Course Enrollment                0.19    0.38            0.24    0.39
                                                      (0.05)                  (0.06)
                                                      [0.00]                  [0.00]
Share of Credits During Study Year in:
AP Science                                    0.03    0.04    0.11    0.03    0.04    0.10
                                                      (0.01)  (0.01)          (0.01)  (0.01)
                                                      [0.00]  [0.00]          [0.00]  [0.00]
All AP                                        0.13    0.04    0.11    0.14    0.04    0.10
                                                      (0.01)  (0.02)          (0.01)  (0.02)
                                                      [0.00]  [0.00]          [0.00]  [0.00]
Other Advanced Science                        0.06    -0.01   -0.02   0.06    -0.01   -0.02
                                                      (0.01)  (0.02)          (0.01)  (0.02)
                                                      [0.24]  [0.23]          [0.20]  [0.20]
All Other Advanced                            0.25    -0.01   -0.03   0.25    -0.01   -0.03
                                                      (0.01)  (0.02)          (0.01)  (0.03)
                                                      [0.23]  [0.23]          [0.30]  [0.30]
Regular Science                               0.06    -0.01   -0.02   0.06    -0.01   -0.02
                                                      (0.01)  (0.02)          (0.01)  (0.02)
                                                      [0.24]  [0.20]          [0.24]  [0.19]
All Regular                                   0.62    -0.03   -0.09   0.61    -0.03   -0.07
                                                      (0.01)  (0.03)          (0.01)  (0.03)
                                                      [0.02]  [0.00]          [0.07]  [0.03]
Number of Observations                        1,819                   1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses and p-values are in brackets.
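
As a rough illustration of how the ITT and LATE columns relate, the sketch below applies the simple Wald ratio (ITT divided by the first-stage effect on AP course enrollment) to the rounded full-sample figures for the AP Science credit share. This is only an approximation under stated assumptions: the function name `wald_late` is hypothetical, the table's LATE estimates come from regressions with covariates and fixed effects, and ratios of rounded numbers will not reproduce every cell.

```python
def wald_late(itt_effect: float, first_stage: float) -> float:
    """Approximate a LATE as the ITT effect scaled by the first-stage take-up effect."""
    if first_stage == 0:
        raise ValueError("first stage must be non-zero")
    return itt_effect / first_stage


# Rounded full-sample values: ITT on the AP Science credit share (0.04)
# and the ITT on AP treatment course enrollment (0.38).
approx_late = wald_late(0.04, 0.38)
print(round(approx_late, 2))
```

The ratio makes explicit why the LATE is larger in magnitude than the ITT: an offer raised AP enrollment by well under one, so the per-enrollee effect is the per-offer effect scaled up accordingly.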
Table 4
Treatment Contrast (Composite Variables)

(1) Control Group Complier Mean; (2) ITT; (3) LATE

Outcome                                           (1)     (2)     (3)
Academically Challenging Curriculum               -0.33   0.31    0.80
                                                          (0.10)  (0.24)
                                                          [0.00]  [0.00]
Project-Based Independent Classroom Activities    -0.06   0.13    0.33
                                                          (0.07)  (0.17)
                                                          [0.07]  [0.06]
Integrated Use of Technology                      -0.11   0.11    0.28
                                                          (0.08)  (0.19)
                                                          [0.19]  [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses and p-
values are in brackets.
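
The composite construction described in these notes can be sketched as follows. This is a minimal illustration, not the authors' code: the category labels, the function names `rescale` and `composite_scores`, the toy responses, and the use of the population standard deviation for standardizing are all assumptions.

```python
from statistics import mean, pstdev

# Assumed five-point scale, ordered from lowest to highest category.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response: str, categories=LIKERT) -> float:
    """Map a response to [0, 1]: lowest category 0.0, highest 1.0, others evenly spaced."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sd = mean(raw), pstdev(raw)
    return [(x - mu) / sd for x in raw]


# Hypothetical responses from three students on two component items.
students = [
    ["agree", "strongly agree"],
    ["neutral", "disagree"],
    ["strongly disagree", "neutral"],
]
scores = composite_scores(students)
```

Because the composites are standardized, the ITT and LATE entries in the table read as effects in standard deviation units of each composite.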
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1) Control Group Complier Mean; (2) ITT; (3) LATE

Outcome                            (1)     (2)     (3)
Science Skill                      -0.19   0.09    0.23
                                           (0.06)  (0.16)
                                           [0.15]  [0.14]
STEM Interest                      0.62    0.04    0.09
                                           (0.02)  (0.07)
                                           [0.16]  [0.16]
Confidence in College Science      0.92    -0.04   -0.10
                                           (0.02)  (0.05)
                                           [0.11]  [0.06]
Stress                             0.12    0.07    0.17
                                           (0.03)  (0.07)
                                           [0.02]  [0.01]
Grades in Science Courses          2.80    -0.12   -0.29
                                           (0.07)  (0.16)
                                           [0.08]  [0.07]
Grades in Other Courses            3.14    -0.07   -0.18
                                           (0.02)  (0.06)
                                           [0.00]  [0.00]
Number of Observations             1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses and p-
values are in brackets.
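
The inverse-probability weighting mentioned in these notes is defined in the endnotes as one over the probit-predicted probability of completing the survey. The sketch below shows only that final step, assuming the probit index (fixed effect plus covariates times coefficients) has already been estimated. The index values are hypothetical, not taken from the study, and the function names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_probability_weight(probit_index: float) -> float:
    """Weight = 1 / Phi(index), where Phi(index) is the predicted completion probability."""
    return 1.0 / normal_cdf(probit_index)


def weighted_mean(values, weights):
    """Weighted average of an outcome among survey completers."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical fitted indexes for three survey completers (not from the study).
indexes = [1.5, 0.0, -0.5]
weights = [inverse_probability_weight(z) for z in indexes]
```

Respondents with a low predicted probability of completing the survey receive the largest weights, so they stand in for similar students who did not respond.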
Table 6
Robustness Checks of Main ITT Results

(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                       (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)            (12)
Science Skill                 -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09 to 0.51   2.0
                                      (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                      [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                 0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03 to 0.18   1.9
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                      [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science 0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09 to 0.10   2.0
                                      (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                      [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                        0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05 to 0.15   1.6
                                      (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                      [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses     2.80    -0.12                   -0.06   -0.10   -0.07
                                      (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                      [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses       3.14    -0.07                   -0.07   -0.06   -0.03
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                      [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
For both grades rows, the remaining columns are not applicable, as grades are based on transcripts rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to
derive confidence interval for the treatment effect itself).
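
The permutation test behind column (4) can be sketched in simplified form. The version below is an assumption-laden illustration: it uses a raw difference in means rather than the regression-adjusted estimate, it reshuffles labels across the whole sample rather than within school-by-cohort groups, and the function names and toy data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # randomly reassign the pseudo treatment
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Toy data: ten control and ten treated outcomes (not from the study).
y = [0.1, -0.3, 0.2, 0.0, -0.1, 0.4, -0.2, 0.3, 0.1, -0.4,
     0.5, 0.2, 0.6, 0.1, 0.3, 0.7, 0.0, 0.4, 0.2, 0.5]
d = [False] * 10 + [True] * 10
p_value = permutation_p_value(y, d)
```

Because the p-value is a share of simulated assignments, it relies on no distributional assumption about the error term, which is why it serves as a check on the clustered standard errors.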
1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (US Department of Education 2014).
2 A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students’ high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3 The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.
4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry, reasoning, and communication skills” (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).
5 Students’ perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. If AP raises the standard against which students compare themselves, their confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends to some extent on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.
7 We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8 In our original research design, we planned to recruit 40 schools with up to two cohorts of students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.
9 Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.
10 Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11 Though the study teachers are less likely to hold a master’s degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.
12 We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).
13 Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.
14 Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.
15 Online Appendix Table 1 presents the balance between treatment and control group members’ characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.
16 Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be endogenous.
17 The study’s Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools’ needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.
18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.
19 One school in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.
20 To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and with 90% of students receiving a weight less than 2.0.
21 We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.
22 Detailed course-by-course impacts are available from the authors.
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24 Online Appendix Table 5 shows that AP science students report a much more intellectually challenging curriculum, with more homework, than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.
25 In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.
27 Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).
28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.
29 To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.
30 This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.
31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation)
We also conducted ITT quantile regressions for the continuous outcomes (science skill grades in
science and grades in other courses) Some of the differences in the point estimates were quite
large yet so too were the standard errors For instance five of the seven estimated differential
treatment effects on science skill exceed 025 standard deviations with p-values that fall in the
suggestive (016) to noisy (034) range 33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants Once data collection is complete we will have the
ability to examine the effect of AP science on college enrollment college selectivity and college
completion
Column (4) reports the results of a permutation test, where we randomly assign a pseudo
treatment and compute the share of 1,000 permutations where the absolute value of the estimated
pseudo treatment effect exceeds the absolute value of the estimated treatment effect shown in
Column (2).²⁸ The resulting p-values from this permutation test are similar to the results using
robust standard errors (shown in Column (3)), resulting in five of the six outcomes with p-values
of less than 0.10.²⁹
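The logic of the permutation test can be illustrated with a short sketch. It is a simplified stand-in, not the authors' procedure: it uses a raw difference in means rather than the covariate-adjusted regression estimate, it ignores the School x Cohort blocking and the survey weights, and the function name `permutation_p_value` and its arguments are hypothetical.

```python
import random


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|.

    outcomes: list of numeric outcomes, one per student.
    treated:  list of booleans, True if assigned to treatment.
    Assumption: the "effect" here is a simple difference in group means.
    """
    rng = random.Random(seed)

    def abs_diff(flags):
        t = [y for y, f in zip(outcomes, flags) if f]
        c = [y for y, f in zip(outcomes, flags) if not f]
        return abs(sum(t) / len(t) - sum(c) / len(c))

    observed = abs_diff(treated)
    pseudo = list(treated)  # shuffling keeps the group sizes fixed
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pseudo)
        if abs_diff(pseudo) > observed:
            exceed += 1
    return exceed / n_perm
```

A small share means that few random reassignments produce an effect as large as the one observed, which is the sense in which the permutation p-values in Column (4) can be compared with those based on robust standard errors.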
Columns (5) through (7) of Table 6 show that the results are robust to (a) dropping the one
high school that offered both AP Biology and AP Chemistry as part of the study, (b) including
observations with multiply-imputed missing outcome variables, and (c) excluding the high
school with the lowest survey response rate.³⁰ Column (8) shows the results when we exclude all
of the $X_i$ covariates, where we find much larger estimated positive effects on scientific inquiry
skills and smaller estimated negative effects on grades. The differences in the treatment effects
on the remaining three outcomes are modest. These results likely reflect the fact that students
who were randomly assigned into the treatment group have higher pre-treatment grades and
reading and math test scores, all covariates that strongly correlate with science skill and future
grades.
Columns (9) through (12) of Table 6 use the Lee (2009) method to place bounds on our
estimates due to potential nonresponse bias in the student survey used for the first four outcomes.
This method trims particular observations from the treatment group (in this case) until it matches
the response rate of the control group. The lower (upper) bound estimate trims the treatment
observations with the highest (lowest) values of the outcome. Using these lower and upper bound
estimates, we compute the 95 percent confidence interval for the treatment effect itself by
applying the Imbens and Manski (2004) method. Consistent with our main findings, the upper
and lower bound point estimates are positive for science skill (0.03 and 0.39 sd), interest in
pursuing a STEM degree (2 and 12 percentage points), and stress (1 and 11 percentage points).
However, the 95 percent confidence intervals overlap zero in all cases and are roughly double the
size of the ordinary confidence intervals. These results suggest that some additional caution
is warranted in evaluating the effects on outcomes based on the study survey.³¹
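The trimming step behind the Lee (2009) bounds can be sketched as follows. This is an illustrative simplification under stated assumptions: it compares unweighted means with no covariates or fixed effects, it assumes the treatment group is the one with the higher response rate (as in this study), and it omits the Imbens and Manski (2004) confidence interval. The function name `lee_bounds` and its arguments are hypothetical.

```python
import math


def lee_bounds(y_treat, y_ctrl, n_treat_assigned, n_ctrl_assigned):
    """Return (lower, upper) bounds on the treatment effect after trimming.

    y_treat, y_ctrl: observed outcomes of survey respondents in each group.
    n_*_assigned:    number of students originally assigned to each group.
    """
    rate_t = len(y_treat) / n_treat_assigned
    rate_c = len(y_ctrl) / n_ctrl_assigned
    if rate_t < rate_c:
        raise ValueError("sketch assumes the treatment group responds more")

    # Share of treatment respondents to trim so response rates match.
    trim_share = (rate_t - rate_c) / rate_t
    n_trim = math.floor(trim_share * len(y_treat))

    ordered = sorted(y_treat)
    keep = len(ordered) - n_trim
    ctrl_mean = sum(y_ctrl) / len(y_ctrl)

    # Lower bound: drop the highest treatment outcomes.
    lower = sum(ordered[:keep]) / keep - ctrl_mean
    # Upper bound: drop the lowest treatment outcomes.
    upper = sum(ordered[n_trim:]) / keep - ctrl_mean
    return lower, upper
```

Because the two bounds discard opposite tails of the treatment distribution, the gap between them widens as the difference in response rates grows, which is why the resulting intervals are wider than the ordinary ones.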
Finally, we would have liked to report the results of theoretically motivated heterogeneity
analyses, yet we lack the statistical power needed to test heterogeneity with a high level of
confidence. For example, Figure 3 shows a quantile regression conditional on $X_i$ with science
skill as the outcome. We find that the point estimates at every quantile are insignificantly
different from the 0.09 ITT point estimate reported in Table 5, yet the 95 percent confidence intervals
fail to rule out large positives and negatives. Additional heterogeneity results can be found in the
Online Appendix.³²
VI. Conclusion
Most admissions committees at bachelor’s degree-granting institutions rely on applicants’ AP
course and exam participation as signals of subject-matter skill and interest, rendering the
relationship between AP uptake and college enrollment somewhat deterministic. There has been
almost no empirical work to support the theory that AP disproportionately endows high school
students with greater human capital than the other courses available to them. Many students,
educators, and parents have also complained that the rigor of the AP program causes students to
lose confidence, gain stress, and perform poorly in other courses. We evaluate these claims with
experimental evidence on the impact of AP Biology and Chemistry courses on students’ skills,
interests, and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.
The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.³³ We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.
Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult given that few schools document per-course
expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course
assignments offers a partial cost analysis: Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.
This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
you Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: U.S. Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the
Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among
Fifteen-year-olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving
Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education Research
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished Manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, G., and C. F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.”
Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1):
171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering
Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in
Higher Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018 (NCES 2018-144): Public High School Graduation Rates.”
U.S. Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.”
Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.”
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.”
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and
Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look” (NCES
2013-001). U.S. Department of Education. Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
U.S. Department of Education. 2014. “Education Department Awards 40 States, DC, and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts’ Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. The x-axis is the standardized socioeconomic status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single-mother-headed
household rate, and unemployment rate. The y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school-by-cohort fixed effects.
The corresponding OLS estimate is shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other U.S. High Schools and High School
Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age: Under 30                                       0.407           0.160
Age: 30-49                                          0.432           0.553
Age: 50 or over                                     0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <=2                             0.290           0.085
Years of Experience <=5                             0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public
high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and the 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school
teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3) use the Full Sample; columns (4)-(6) use the Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3) use the Full Sample; columns (4)-(6) use Survey Respondents.
(1), (4): Control Group Mean   (2), (5): ITT   (3), (6): LATE

Outcome                                     (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment              0.19     0.38              0.24     0.39
                                                     (0.05)                     (0.06)
                                                     [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                0.03     0.04     0.11     0.03     0.04     0.10
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                    0.13     0.04     0.11     0.14     0.04     0.10
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                    0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                        0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                           0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                               0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                     (0.01)   (0.03)            (0.01)   (0.03)
                                                     [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                      1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information was collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
Table 4
Treatment Contrast (Composite Variables)

(1): Control Group Complier Mean   (2): ITT   (3): LATE

Outcome                                           (1)      (2)      (3)
Academically Challenging Curriculum               -0.33    0.31     0.80
                                                           (0.10)   (0.24)
                                                           [0.00]   [0.00]
Project-Based Independent Classroom Activities    -0.06    0.13     0.33
                                                           (0.07)   (0.17)
                                                           [0.07]   [0.06]
Integrated Use of Technology                      -0.11    0.11     0.28
                                                           (0.08)   (0.19)
                                                           [0.19]   [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
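The composite construction described in the notes to Table 4 can be sketched in a few lines. This is only an illustration under assumptions that the notes do not settle: responses are coded as integers from 0 (lowest category) to `n_categories - 1` (highest), every item uses the same number of categories, there are no missing items, and standardization uses the population standard deviation. The function name `composite_scores` is hypothetical.

```python
from statistics import mean, pstdev


def composite_scores(responses, n_categories):
    """Rescale, average, and standardize survey items into one composite.

    responses: list of rows, one per student, each a list of integer codes
               from 0 (lowest category) to n_categories - 1 (highest).
    """
    # Step 1: map the categories to evenly spaced values between 0.0 and 1.0.
    rescaled = [[code / (n_categories - 1) for code in row] for row in responses]
    # Step 2: average the component items within each student.
    averages = [mean(row) for row in rescaled]
    # Step 3: standardize to mean 0 and SD 1 across students.
    mu, sigma = mean(averages), pstdev(averages)
    return [(a - mu) / sigma for a in averages]
```

With a five-point agreement scale, for example, the five categories map to 0.0, 0.25, 0.5, 0.75, and 1.0 before averaging.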
Table 5
AP Course Impact on Science Skill STEM Interest Confidence Stress and Grades
(1) (2) (3)
Outcome
Control
Group
Complier
Mean
ITT LATE
Science Skill -019 009 023
(006) (016)
[015] [014]
STEM Interest 062 004 009
(002) (007)
[016] [016]
Confidence in College
Science 092 -004 -010
(002) (005)
[011] [006]
Stress 012 007 017
(003) (007)
[002] [001]
Grades in Science Courses 280 -012 -029
(007) (016)
[008] [007]
Grades in Other Courses 314 -007 -018
(002) (006)
[000] [000]
Number of Observations 1819 for grades 1417 for other
outcomes
Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
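
The survey results are weighted by the inverse probability of completing the survey, and endnote 20 describes the weight as one over the probit probability. The Python sketch below shows only that last step, under assumptions: the fitted probit index for each respondent is taken as given (the values in the usage lines are made up), and the function names are hypothetical. Estimating the probit itself is not shown.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_probability_weight(index):
    """Weight for a survey completer whose fitted probit index is `index`.

    A completer with a low predicted completion probability gets a large weight.
    """
    return 1.0 / normal_cdf(index)


def weighted_mean(values, weights):
    """Weighted average of an outcome among survey completers."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Usage with hypothetical fitted indices and outcomes for three completers.
indices = [0.0, 1.0, 2.0]
weights = [inverse_probability_weight(z) for z in indices]
outcome_mean = weighted_mean([1.0, 0.0, 0.0], weights)
```

Because a probability never exceeds one, every weight is at least 1.0, and respondents who resemble non-respondents on observed characteristics count for more.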
Table 6
Robustness Checks of Main ITT Results

Column headings:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Science Skill                   -0.19   0.09   0.10   0.11   0.20   0.07   0.03   0.39   -0.09 to 0.51   2.0
                                (0.06) (0.05) (0.00) (0.00) (0.00) (0.00) (0.07) (0.07)
                                [0.15] [0.06] [0.06] [0.20] [0.11] [0.01] [0.24] [0.72] [0.00]
STEM Interest                    0.62   0.04   0.05   0.03   0.03   0.03   0.02   0.12   -0.03 to 0.18   1.9
                                (0.02) (0.03) (0.00) (0.00) (0.00) (0.00) (0.03) (0.04)
                                [0.16] [0.19] [0.20] [0.09] [0.29] [0.27] [0.19] [0.60] [0.00]
Confidence in College Science    0.92  -0.04  -0.03  -0.06  -0.06  -0.04  -0.06   0.05   -0.09 to 0.10   2.0
                                (0.02) (0.02) (0.00) (0.00) (0.00) (0.00) (0.02) (0.03)
                                [0.11] [0.05] [0.07] [0.37] [0.02] [0.03] [0.10] [0.00] [0.17]
Stress                           0.12   0.07   0.05   0.06   0.08   0.07   0.01   0.11   -0.05 to 0.15   1.6
                                (0.03) (0.02) (0.00) (0.00) (0.00) (0.00) (0.03) (0.02)
                                [0.02] [0.00] [0.00] [0.14] [0.07] [0.02] [0.02] [0.79] [0.00]
Grades in Science Courses        2.80  -0.12  -0.06  -0.10  -0.07
                                (0.07) (0.04) (0.00) (0.00) (0.00)
                                [0.08] [0.01] [0.01] [0.31] [0.16] [0.30]
Grades in Other Courses          3.14  -0.07  -0.07  -0.06  -0.03
                                (0.02) (0.03) (0.00) (0.00) (0.00)
                                [0.00] [0.01] [0.01] [0.00] [0.01] [0.38]
Other columns for the two grades outcomes: Not applicable, as grades are based on transcripts
rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to
derive a confidence interval for the treatment effect itself).
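
The permutation test described for Column (4) can be sketched in Python as follows. This is a simplified illustration, not the study's procedure: it uses a raw difference in means in place of the regression in Equation 1, it reshuffles treatment labels across the whole sample rather than within School x Cohort, and the function names, seed, and data are hypothetical.

```python
import random


def mean_difference(outcomes, treated):
    """Treatment-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(mean_difference(outcomes, treated))
    labels = list(treated)  # copy, so the caller's assignment is left unchanged
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # pseudo treatment: same group sizes, random members
        if abs(mean_difference(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Usage with made-up data: four treated and four control students.
y = [10, 11, 12, 13, 0, 1, 2, 3]
d = [1, 1, 1, 1, 0, 0, 0, 0]
p = permutation_p_value(y, d)
```

In this made-up example no reshuffling can produce a gap larger than the observed one, so the share is zero. Shuffling the labels keeps the number of pseudo-treated students fixed, which mirrors re-running the original lottery.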
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (U.S. Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students’
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry,
reasoning, and communication skills” (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students’ perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students’ confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master’s degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members’
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study’s Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools’ needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1
if student i in school by cohort j completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90% of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a “multiple imputation, then deletion” strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative, inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
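
Endnote 29 refers to the Benjamini-Hochberg correction for multiple comparisons. A minimal Python sketch of the standard step-up procedure is given below for readers unfamiliar with it. The function name is hypothetical and the p-values in the usage lines are invented for illustration; they are not the study's values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


# Usage with made-up p-values for four hypothetical outcomes.
decisions = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

The step-up feature matters: a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.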
interests and beliefs. We recruited 23 schools that had not previously offered AP Biology or
Chemistry and were willing to permit us to randomize student access to the newly offered
course. At the time of our school recruitment, an estimated 50 percent of U.S. high schools
already offered AP science classes, and they tended to be in relatively higher-income
communities disproportionately serving White students (Malkus 2016). Our study drew from the
remaining population of schools, where teachers had lower levels of training than science
teachers nationally and students were disproportionately non-White and poor. Consequently, our
results on AP impacts best generalize to schools like these that are on the cusp of deciding
whether to offer an AP science course.

The estimates suggest that AP science led to improvements in science skill and STEM
interest above the courses that these students would otherwise take. Prior research points to
longer-run benefits of AP, including a higher likelihood of college enrollment and completion as
well as possible earnings gains (Jackson 2010, 2014). Our findings suggest that these long-term
effects are at least partially driven by genuine increases in skill and not due solely to
postsecondary admissions and credit-granting policies.³³ We also find that AP science classes
substantially increase students’ stress levels and reduce their confidence in completing a college
science course. Students who take AP science also receive lower grades in science and in other
(non-science) courses. The cognitive gains from AP science are consistent with evidence that
higher levels of pressure and a lower level of confidence cause students to learn more than they
would otherwise. And some of the negative effect on grades can be offset by upwardly weighting
grades in advanced courses.

Although we have no direct way to convert our study impacts into monetary values for
students or society, our evidence suggests that schools and districts are not making unwise or
costly investments in AP. Calculating the differential cost to deliver an AP course versus another
level course in the same subject is difficult, given that few schools document per-course
expenditures. One recent analysis of a U.S. district that relied on teacher salaries and course
assignments offers a partial cost analysis. Roza (2009) finds approximately $360 more in
per-pupil expenditures to deliver AP versus honors, due primarily to smaller class sizes and more
senior teachers in AP. This cost does not factor in the time that teachers spend retraining
themselves to teach the new curriculum. At the same time, relative to other policies aimed at
increasing human capital in high school that are often more costly to implement (such as
reducing class size), offering an AP course may be one of the least expensive options.

This study offers the first credible estimates on the impact of a curriculum that is now offered
in the majority of the nation’s high schools and used by most postsecondary institutions to assess
applicant potential. Our findings offer evidence to support and refute some of the claims made
about the AP program. At the same time, many important questions remain about differential AP
course impacts along student, teacher, and school attributes and on different parts of the outcome
distributions. What are the general equilibrium effects of AP expansion, for instance, on college
admissions decisions as AP expands into schools with fewer resources? Do AP courses generate
spillover effects on non-AP course-takers via changes in peer interactions and changes in how
teachers teach their non-AP classes? These are all questions that warrant further research.
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: U.S. Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the
Inverted-U-stress-performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools Have Been Leader in Advanced Placement, but Results Are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and Its
Calibration with Mathematics Achievement: Cultural Differences among
Fifteen-Year-Olds in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving
Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education, Research &
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished Manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” U.S. News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, Guido W., and Charles F. Manski. 2004. “Confidence Intervals for Partially Identified
Parameters.” Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is There a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1):
171–198.
Kim, Emily. 2015. “AP Classes Often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking It Down: Engineering Student
STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher
Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018 (NCES 2018-144): Public High School Graduation Rates.” (NCES
2018-144). U.S. Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.”
Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.”
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.”
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and
Exam-Based Courses in US Public High Schools: 2010-11. First Look.” (NCES 2013-001).
US Department of Education, Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
US Department of Education. 2014. “Education Department Awards 40 States, DC, and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

| Panel A: Schools | Participating | Others |
|---|---|---|
| Average Enrollment | 1,409 | 723 |
| Free or Reduced-Price Lunch | 0.700 | 0.438 |
| Asian | 0.055 | 0.050 |
| Black | 0.349 | 0.154 |
| Hispanic | 0.410 | 0.221 |
| White | 0.164 | 0.537 |
| Adjusted Cohort Graduation Rate | 0.843 | 0.802 |
| District Instruction Expenditures Per Pupil | $6,561 | $5,636 |
| District Student Services Expenditures Per Pupil | $3,787 | $3,385 |

| Panel B: Teachers | Participating | Others |
|---|---|---|
| Age: Under 30 | 0.407 | 0.160 |
| Age: 30-49 | 0.432 | 0.553 |
| Age: 50 or over | 0.161 | 0.287 |
| Female | 0.630 | 0.536 |
| Hispanic or Latino | 0.111 | 0.051 |
| Race: American Indian or Alaska Native | 0.000 | 0.009 |
| Race: Asian American | 0.111 | 0.041 |
| Race: Black | 0.111 | 0.060 |
| Race: Native Hawaiian or other Pacific Islander | 0.000 | 0.004 |
| Race: White | 0.778 | 0.896 |
| Years of Experience | 10.3 | 13.2 |
| Years of Experience <= 2 | 0.290 | 0.085 |
| Years of Experience <= 5 | 0.481 | 0.234 |
| Hold a Teaching Certificate | 0.926 | 0.945 |
| Undergraduate Major in STEM | 0.944 | 0.747 |
| Single Subject Credential in Science | 0.630 | 0.823 |
| Master’s Degree or Higher | 0.356 | 0.615 |
| Previously Taught AP Course | 0.469 | NA |
| Previously Taught AP, IB, or Honors Course | 0.796 | NA |
| Number of Professional Development Trainings in the Past 5 years (0-5) | 3.09 | NA |

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
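Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3) use the Full Sample; columns (4)-(6) use the Survey Sample. Standard errors are in parentheses and p-values are in brackets.

| Pre-Treatment Characteristic | (1) Control Group Mean | (2) Difference Between Treated and Controls | (3) Difference Between Control Group Non-Compliers and Compliers | (4) Control Group Mean | (5) Difference Between Treated and Controls | (6) Difference Between Control Group Non-Compliers and Compliers |
|---|---|---|---|---|---|---|
| Age as of October of 11th Grade | 16.6 | -0.03 | -0.07 | 16.6 | -0.01 | -0.01 |
| | | (0.02) | (0.07) | | (0.03) | (0.09) |
| | | [0.19] | [0.35] | | [0.65] | [0.94] |
| Math Exam Score | 0.38 | 0.08 | 0.25 | 0.44 | 0.07 | 0.30 |
| | | (0.04) | (0.10) | | (0.05) | (0.16) |
| | | [0.08] | [0.02] | | [0.17] | [0.06] |
| Reading Exam Score | 0.29 | 0.10 | 0.18 | 0.36 | 0.09 | 0.17 |
| | | (0.03) | (0.12) | | (0.04) | (0.17) |
| | | [0.00] | [0.14] | | [0.02] | [0.31] |
| HS Grade Point Average | 3.16 | 0.05 | 0.20 | 3.23 | 0.06 | 0.13 |
| | | (0.03) | (0.08) | | (0.03) | (0.10) |
| | | [0.14] | [0.02] | | [0.06] | [0.20] |
| Female | 0.59 | 0.00 | 0.10 | 0.61 | -0.01 | 0.11 |
| | | (0.03) | (0.06) | | (0.04) | (0.07) |
| | | [0.99] | [0.10] | | [0.73] | [0.12] |
| Asian American | 0.12 | 0.02 | 0.10 | 0.12 | 0.03 | 0.10 |
| | | (0.02) | (0.05) | | (0.01) | (0.07) |
| | | [0.27] | [0.06] | | [0.07] | [0.12] |
| Black | 0.32 | -0.02 | -0.06 | 0.27 | 0.00 | -0.05 |
| | | (0.02) | (0.06) | | (0.02) | (0.05) |
| | | [0.29] | [0.28] | | [0.88] | [0.40] |
| Hispanic, Native American, or Multiracial | 0.31 | 0.01 | 0.05 | 0.33 | 0.01 | 0.05 |
| | | (0.02) | (0.06) | | (0.02) | (0.07) |
| | | [0.55] | [0.41] | | [0.81] | [0.51] |
| Disabled | 0.02 | 0.00 | -0.01 | 0.01 | 0.00 | -0.01 |
| | | (0.01) | (0.01) | | (0.01) | (0.01) |
| | | [0.93] | [0.24] | | [0.57] | [0.5] |
| Gifted | 0.13 | 0.03 | 0.00 | 0.14 | 0.02 | 0.01 |
| | | (0.02) | (0.05) | | (0.02) | (0.09) |
| | | [0.06] | [1.00] | | [0.25] | [0.89] |
| English Language Learner | 0.05 | 0.01 | 0.02 | 0.04 | 0.01 | 0.04 |
| | | (0.01) | (0.02) | | (0.01) | (0.03) |
| | | [0.41] | [0.39] | | [0.54] | [0.22] |
| Eligible for Free or Reduced-Price Lunch | 0.53 | 0.01 | 0.02 | 0.51 | 0.01 | 0.07 |
| | | (0.02) | (0.07) | | (0.03) | (0.09) |
| | | [0.66] | [0.77] | | [0.72] | [0.45] |
| Language Other than English Spoken at Home | 0.34 | 0.02 | 0.03 | 0.35 | 0.01 | 0.04 |
| | | (0.02) | (0.07) | | (0.02) | (0.07) |
| | | [0.32] | [0.73] | | [0.59] | [0.56] |
| Took Recommended Prerequisite Courses | 0.79 | 0.00 | 0.09 | 0.79 | 0.02 | 0.05 |
| | | (0.02) | (0.04) | | (0.02) | (0.05) |
| | | [0.84] | [0.04] | | [0.43] | [0.31] |
| Number of Observations | 1,819 | | | 1,417 | | |

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.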
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3) use the Full Sample; columns (4)-(6) use Survey Respondents. Standard errors are in parentheses and p-values are in brackets.

| Outcome | (1) Control Group Mean | (2) ITT | (3) LATE | (4) Control Group Mean | (5) ITT | (6) LATE |
|---|---|---|---|---|---|---|
| AP Treatment Course Enrollment | 0.19 | 0.38 | | 0.24 | 0.39 | |
| | | (0.05) | | | (0.06) | |
| | | [0.00] | | | [0.00] | |
| Share of Credits During Study Year in: | | | | | | |
| AP Science | 0.03 | 0.04 | 0.11 | 0.03 | 0.04 | 0.10 |
| | | (0.01) | (0.01) | | (0.01) | (0.01) |
| | | [0.00] | [0.00] | | [0.00] | [0.00] |
| All AP | 0.13 | 0.04 | 0.11 | 0.14 | 0.04 | 0.10 |
| | | (0.01) | (0.02) | | (0.01) | (0.02) |
| | | [0.00] | [0.00] | | [0.00] | [0.00] |
| Other Advanced Science | 0.06 | -0.01 | -0.02 | 0.06 | -0.01 | -0.02 |
| | | (0.01) | (0.02) | | (0.01) | (0.02) |
| | | [0.24] | [0.23] | | [0.20] | [0.20] |
| All Other Advanced | 0.25 | -0.01 | -0.03 | 0.25 | -0.01 | -0.03 |
| | | (0.01) | (0.02) | | (0.01) | (0.03) |
| | | [0.23] | [0.23] | | [0.30] | [0.30] |
| Regular Science | 0.06 | -0.01 | -0.02 | 0.06 | -0.01 | -0.02 |
| | | (0.01) | (0.02) | | (0.01) | (0.02) |
| | | [0.24] | [0.20] | | [0.24] | [0.19] |
| All Regular | 0.62 | -0.03 | -0.09 | 0.61 | -0.03 | -0.07 |
| | | (0.01) | (0.03) | | (0.01) | (0.03) |
| | | [0.02] | [0.00] | | [0.07] | [0.03] |
| Number of Observations | 1,819 | | | 1,417 | | |

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
Table 4
Treatment Contrast (Composite Variables)

| Outcome | (1) Control Group Complier Mean | (2) ITT | (3) LATE |
|---|---|---|---|
| Academically Challenging Curriculum | -0.33 | 0.31 | 0.80 |
| | | (0.10) | (0.24) |
| | | [0.00] | [0.00] |
| Project-Based, Independent Classroom Activities | -0.06 | 0.13 | 0.33 |
| | | (0.07) | (0.17) |
| | | [0.07] | [0.06] |
| Integrated Use of Technology | -0.11 | 0.11 | 0.28 |
| | | (0.08) | (0.19) |
| | | [0.19] | [0.14] |
| Number of Observations | 1,417 | | |

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
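
The construction described in the Table 4 notes can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the example responses, and the choice of the population standard deviation for standardizing are all assumptions, and the survey weights are ignored.

```python
from statistics import mean, pstdev

# Ordered response categories, lowest to highest (from the example in the table notes).
CATEGORIES = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def rescale(response, categories=CATEGORIES):
    """Map an ordered category to [0.0, 1.0] with evenly spaced steps."""
    return categories.index(response) / (len(categories) - 1)

def composite_scores(students):
    """Average each student's rescaled items, then standardize across students.

    `students` is a list of lists of responses (one inner list per student).
    Assumption: standardization uses the population SD of the averaged scores.
    """
    averaged = [mean(rescale(r) for r in items) for items in students]
    mu, sd = mean(averaged), pstdev(averaged)
    return [(a - mu) / sd for a in averaged]

# Hypothetical responses for three students on two component items.
example = [
    ["strongly agree", "agree"],
    ["neutral", "neutral"],
    ["disagree", "strongly disagree"],
]
scores = composite_scores(example)
```

Because the composites are standardized, the ITT and LATE estimates in Table 4 can be read in standard deviation units of the composite.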
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

| Outcome | (1) Control Group Complier Mean | (2) ITT | (3) LATE |
|---|---|---|---|
| Science Skill | -0.19 | 0.09 | 0.23 |
| | | (0.06) | (0.16) |
| | | [0.15] | [0.14] |
| STEM Interest | 0.62 | 0.04 | 0.09 |
| | | (0.02) | (0.07) |
| | | [0.16] | [0.16] |
| Confidence in College Science | 0.92 | -0.04 | -0.10 |
| | | (0.02) | (0.05) |
| | | [0.11] | [0.06] |
| Stress | 0.12 | 0.07 | 0.17 |
| | | (0.03) | (0.07) |
| | | [0.02] | [0.01] |
| Grades in Science Courses | 2.80 | -0.12 | -0.29 |
| | | (0.07) | (0.16) |
| | | [0.08] | [0.07] |
| Grades in Other Courses | 3.14 | -0.07 | -0.18 |
| | | (0.02) | (0.06) |
| | | [0.00] | [0.00] |

Number of Observations: 1,819 for grades; 1,417 for other outcomes.

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
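
As a rough consistency check on the ITT and LATE columns, and assuming the LATE comes from a just-identified instrumental-variables model in which the randomized offer instruments for AP enrollment, the LATE should approximately equal the ITT divided by the first-stage effect on enrollment. Reading the rounded science skill ITT in Table 5 as 0.09 and the rounded first-stage ITT on AP Treatment Course Enrollment for survey respondents in Table 3 as 0.39, the ratio reproduces the reported LATE up to rounding:

```latex
\[
\widehat{\mathrm{LATE}}_{\text{science skill}}
\approx \frac{\widehat{\mathrm{ITT}}_{\text{science skill}}}{\widehat{\mathrm{ITT}}_{\text{enrollment}}}
= \frac{0.09}{0.39} \approx 0.23
\]
```

Because the inputs are rounded to two decimals, the same ratio for other outcomes can differ from the reported LATE in the second decimal place.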
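Table 6
Robustness Checks of Main ITT Results

Standard errors are in parentheses and p-values are in brackets. NA: Not applicable, as grades are based on transcripts rather than student survey.

| Outcome | (1) Control Group Complier Mean | (2) Main Results | (3) Robust SE | (4) p-value (permutation test) | (5) Excluding High School 56 | (6) Including Imputation of Missing Outcome Variables | (7) Excluding Covariates | (8) Excluding High School 23 | (9) Lee Lower Bound | (10) Lee Upper Bound | (11) 95% Confidence Interval from Lee Bounds | (12) Ratio of 95% CI in (11) to 95% CI in (7) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Science Skill | -0.19 | 0.09 | | | 0.10 | 0.11 | 0.20 | 0.07 | 0.03 | 0.39 | [-0.09, 0.51] | 2.0 |
| | | (0.06) | (0.05) | | (0.00) | (0.00) | (0.00) | (0.00) | (0.07) | (0.07) | | |
| | | [0.15] | [0.06] | [0.06] | [0.20] | [0.11] | [0.01] | [0.24] | [0.72] | [0.00] | | |
| STEM Interest | 0.62 | 0.04 | | | 0.05 | 0.03 | 0.03 | 0.03 | 0.02 | 0.12 | [-0.03, 0.18] | 1.9 |
| | | (0.02) | (0.03) | | (0.00) | (0.00) | (0.00) | (0.00) | (0.03) | (0.04) | | |
| | | [0.16] | [0.19] | [0.20] | [0.09] | [0.29] | [0.27] | [0.19] | [0.60] | [0.00] | | |
| Confidence in College Science | 0.92 | -0.04 | | | -0.03 | -0.06 | -0.06 | -0.04 | -0.06 | 0.05 | [-0.09, 0.10] | 2.0 |
| | | (0.02) | (0.02) | | (0.00) | (0.00) | (0.00) | (0.00) | (0.02) | (0.03) | | |
| | | [0.11] | [0.05] | [0.07] | [0.37] | [0.02] | [0.03] | [0.10] | [0.00] | [0.17] | | |
| Stress | 0.12 | 0.07 | | | 0.05 | 0.06 | 0.08 | 0.07 | 0.01 | 0.11 | [-0.05, 0.15] | 1.6 |
| | | (0.03) | (0.02) | | (0.00) | (0.00) | (0.00) | (0.00) | (0.03) | (0.02) | | |
| | | [0.02] | [0.00] | [0.00] | [0.14] | [0.07] | [0.02] | [0.02] | [0.79] | [0.00] | | |
| Grades in Science Courses | 2.80 | -0.12 | | | -0.06 | -0.10 | -0.07 | NA | NA | NA | NA | NA |
| | | (0.07) | (0.04) | | (0.00) | (0.00) | (0.00) | | | | | |
| | | [0.08] | [0.01] | [0.01] | [0.31] | [0.16] | [0.30] | | | | | |
| Grades in Other Courses | 3.14 | -0.07 | | | -0.07 | -0.06 | -0.03 | NA | NA | NA | NA | NA |
| | | (0.02) | (0.03) | | (0.00) | (0.00) | (0.00) | | | | | |
| | | [0.00] | [0.01] | [0.01] | [0.00] | [0.01] | [0.38] | | | | | |
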
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to
derive confidence interval for the treatment effect itself).
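
The permutation test described for column (4) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' procedure: it uses a raw difference in means in place of the regression-based ITT estimate, reshuffles treatment labels across all students rather than within school-by-cohort groups, and all names and data are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(outcomes, treated):
    """Treated-minus-control difference in mean outcomes."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)

def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # pseudo treatment: same number treated, random order
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm

# Hypothetical outcomes: first four students treated, last four controls.
outcomes = [0.9, 1.1, 0.8, 1.2, 0.1, 0.3, 0.2, 0.4]
treated = [True] * 4 + [False] * 4
p_value = permutation_p_value(outcomes, treated)
```

A small p-value here means that few random relabelings produce an effect as large in absolute value as the observed one.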
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students’
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry,
reasoning, and communication skills” (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students’ perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students’ confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master’s degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members’
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study’s Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools’ needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90% of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
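
The survey weights described in note 20 can be sketched as follows. This is a minimal illustration, assuming the probit has already been fitted elsewhere: the fitted index values, the outcome values, and the function names are hypothetical, and only the final step (weight = 1 / Φ(fitted index)) is taken from the note.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # its cdf plays the role of Phi in note 20

def inverse_probability_weight(index):
    """Return 1 / Phi(index), where index stands for the fitted mu_j + X_i * rho."""
    return 1.0 / STANDARD_NORMAL.cdf(index)

def weighted_mean(values, weights):
    """Weighted average of values using the supplied weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical fitted probit indices for three students who completed the survey.
fitted_indices = [1.5, 0.0, -1.0]
weights = [inverse_probability_weight(z) for z in fitted_indices]

# Hypothetical outcome values for the same three students.
outcomes = [0.2, -0.1, 0.4]
weighted_outcome_mean = weighted_mean(outcomes, weights)
```

Respondents with a low predicted probability of completing the survey receive larger weights, so they stand in for similar students who did not respond.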
References
Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey Wooldridge. 2017. “When Should
You Adjust Standard Errors for Clustering?” NBER Working Paper No. 24003.
Cambridge, MA: NBER.
Adelman, Clifford. 2006. The Toolbox Revisited: Paths to Degree Completion from High School
Through College. Washington, DC: US Department of Education.
Aguilar, Lauren, Greg Walton, and Carl Wieman. 2014. “Psychological Insights for Improved
Physics Teaching.” Physics Today 67 (5): 43–49.
Altonji, Joseph G. 1995. “The Effects of High School Curriculum on Education and Labor
Market Outcomes.” The Journal of Human Resources 30 (3): 409–438.
Anderson, Carl R. 1976. “Coping Behaviors as Intervening Mechanisms in the Inverted-U
Stress-Performance Relationship.” Journal of Applied Psychology 61 (1): 30–34.
Attewell, Paul, and Thurston Domina. 2008. “Raising the Bar: Curricular Intensity and
Academic Performance.” Educational Evaluation and Policy Analysis 30 (1): 51–71.
Avery, Christopher, Oded Gurantz, Michael Hurwitz, and Jonathan Smith. 2018. “Shifting
College Majors in Response to Advanced Placement Exam Scores.” Journal of Human
Resources 53 (4): 918–956.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical
and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society 57
(1): 289–300.
Bennett, J., S. Hogarth, F. Lubben, B. Campbell, and A. Robinson. 2010. “Talking Science: The
Research Evidence on the Use of Small Group Discussions in Science Teaching.”
International Journal of Science Education 32 (1): 69–95.
Berger, Joe. 2006. “Demoting Advanced Placement.” The New York Times, October 4.
Boekaerts, Monique, and Jeroen S. Rozendaal. 2010. “Using Multiple Calibration Indices in
Order to Capture the Complex Picture of What Affects Students’ Accuracy of Feeling of
Confidence.” Learning and Instruction 20 (5): 372–382.
Bound, John, Brad Hershbein, and Bridget Terry Long. 2009. “Playing the Admissions Game:
Student Reactions to Increasing College Competition.” The Journal of Economic
Perspectives 23 (4): 119–146.
Bowie, Liz. 2013. “Maryland Schools have been Leader in Advanced Placement, but Results are
Mixed.” The Baltimore Sun, August 17.
Bush, George W. 2006. “State of the Union Address by the President.” Washington, DC: The
White House.
Chiu, Ming Ming, and Robert M. Klassen. 2010. “Relations of Mathematics Self-Concept and its
Calibration with Mathematics Achievement: Cultural Differences among Fifteen-year-olds
in 34 Countries.” Learning and Instruction 20 (1): 2–17.
Clotfelter, Charles T., Helen F. Ladd, and Jacob L. Vigdor. 2010. “Teacher Credentials and
Student Achievement in High School: A Cross-Subject Analysis with Student Fixed
Effects.” Journal of Human Resources 45 (3): 655–681.
College Board. 2002. Equity Policy Statement. New York, NY.
__________. 2011a. AP Biology Curriculum Framework 2012-2013. New York, NY.
__________. 2011b. AP Chemistry Curriculum Framework 2013-2014. New York, NY.
__________. 2017a. AP Course and Exam Redesign. New York, NY.
__________. 2017b. AP Course Audit. New York, NY.
__________. 2018. AP Program Participation and Performance Data 2018. New York, NY.
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving
Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education, Research and
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” US News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, Guido W., and Charles F. Manski. 2004. “Confidence Intervals for Partially Identified
Parameters.” Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is there a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.
Kim, Emily. 2015. “AP Classes often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering
Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in
Higher Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018: Public High School Graduation Rates” (NCES 2018-144).
US Department of Education, Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in US High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.”
Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.”
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.”
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and
Exam-Based Courses in US Public High Schools: 2010-11, First Look” (NCES 2013-001).
US Department of Education, Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
US Department of Education. 2014. “Education Department Awards 40 States, DC, and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F. O. Gabrieli, and John D. E. Gabrieli. 2016. “Promise and Paradox: Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts: Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. The x-axis is the standardized socioeconomic status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single-mother-headed
household rate, and unemployment rate. The y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent-to-Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school-by-cohort fixed effects.
The corresponding OLS estimate is shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                    Participating    Others
Average Enrollment                                          1,409       723
Free or Reduced-Price Lunch                                 0.700     0.438
Asian                                                       0.055     0.050
Black                                                       0.349     0.154
Hispanic                                                    0.410     0.221
White                                                       0.164     0.537
Adjusted Cohort Graduation Rate                             0.843     0.802
District Instruction Expenditures Per Pupil                $6,561    $5,636
District Student Services Expenditures Per Pupil           $3,787    $3,385

Panel B: Teachers                                   Participating    Others
Age Under 30                                                0.407     0.160
Age 30-49                                                   0.432     0.553
Age 50 or over                                              0.161     0.287
Female                                                      0.630     0.536
Hispanic or Latino                                          0.111     0.051
Race: American Indian or Alaska Native                      0.000     0.009
Race: Asian American                                        0.111     0.041
Race: Black                                                 0.111     0.060
Race: Native Hawaiian or other Pacific Islander             0.000     0.004
Race: White                                                 0.778     0.896
Years of Experience                                          10.3      13.2
Years of Experience <=2                                     0.290     0.085
Years of Experience <=5                                     0.481     0.234
Hold a Teaching Certificate                                 0.926     0.945
Undergraduate Major in STEM                                 0.944     0.747
Single Subject Credential in Science                        0.630     0.823
Master’s Degree or Higher                                   0.356     0.615
Previously Taught AP Course                                 0.469        NA
Previously Taught AP, IB, or Honors Course                  0.796        NA
Number of Professional Development Trainings                 3.09        NA
in the Past 5 years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and the 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                     (1)     (2)     (3)     (4)     (5)     (6)
Age as of October of 11th Grade                 16.6   -0.03   -0.07    16.6   -0.01   -0.01
                                                      (0.02)  (0.07)          (0.03)  (0.09)
                                                      [0.19]  [0.35]          [0.65]  [0.94]
Math Exam Score                                 0.38    0.08    0.25    0.44    0.07    0.30
                                                      (0.04)  (0.10)          (0.05)  (0.16)
                                                      [0.08]  [0.02]          [0.17]  [0.06]
Reading Exam Score                              0.29    0.10    0.18    0.36    0.09    0.17
                                                      (0.03)  (0.12)          (0.04)  (0.17)
                                                      [0.00]  [0.14]          [0.02]  [0.31]
HS Grade Point Average                          3.16    0.05    0.20    3.23    0.06    0.13
                                                      (0.03)  (0.08)          (0.03)  (0.10)
                                                      [0.14]  [0.02]          [0.06]  [0.20]
Female                                          0.59    0.00    0.10    0.61   -0.01    0.11
                                                      (0.03)  (0.06)          (0.04)  (0.07)
                                                      [0.99]  [0.10]          [0.73]  [0.12]
Asian American                                  0.12    0.02    0.10    0.12    0.03    0.10
                                                      (0.02)  (0.05)          (0.01)  (0.07)
                                                      [0.27]  [0.06]          [0.07]  [0.12]
Black                                           0.32   -0.02   -0.06    0.27    0.00   -0.05
                                                      (0.02)  (0.06)          (0.02)  (0.05)
                                                      [0.29]  [0.28]          [0.88]  [0.40]
Hispanic, Native American, or Multiracial       0.31    0.01    0.05    0.33    0.01    0.05
                                                      (0.02)  (0.06)          (0.02)  (0.07)
                                                      [0.55]  [0.41]          [0.81]  [0.51]
Disabled                                        0.02    0.00   -0.01    0.01    0.00   -0.01
                                                      (0.01)  (0.01)          (0.01)  (0.01)
                                                      [0.93]  [0.24]          [0.57]   [0.5]
Gifted                                          0.13    0.03    0.00    0.14    0.02    0.01
                                                      (0.02)  (0.05)          (0.02)  (0.09)
                                                      [0.06]  [1.00]          [0.25]  [0.89]
English Language Learner                        0.05    0.01    0.02    0.04    0.01    0.04
                                                      (0.01)  (0.02)          (0.01)  (0.03)
                                                      [0.41]  [0.39]          [0.54]  [0.22]
Eligible for Free or Reduced-Price Lunch        0.53    0.01    0.02    0.51    0.01    0.07
                                                      (0.02)  (0.07)          (0.03)  (0.09)
                                                      [0.66]  [0.77]          [0.72]  [0.45]
Language Other than English Spoken at Home      0.34    0.02    0.03    0.35    0.01    0.04
                                                      (0.02)  (0.07)          (0.02)  (0.07)
                                                      [0.32]  [0.73]          [0.59]  [0.56]
Took Recommended Prerequisite Courses           0.79    0.00    0.09    0.79    0.02    0.05
                                                      (0.02)  (0.04)          (0.02)  (0.05)
                                                      [0.84]  [0.04]          [0.43]  [0.31]
Number of Observations                         1,819                   1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4): Control Group Mean
(2), (5): ITT
(3), (6): LATE

Outcome                                          (1)     (2)     (3)     (4)     (5)     (6)
AP Treatment Course Enrollment                  0.19    0.38            0.24    0.39
                                                      (0.05)                  (0.06)
                                                      [0.00]                  [0.00]
Share of Credits During Study Year in:
  AP Science                                    0.03    0.04    0.11    0.03    0.04    0.10
                                                      (0.01)  (0.01)          (0.01)  (0.01)
                                                      [0.00]  [0.00]          [0.00]  [0.00]
  All AP                                        0.13    0.04    0.11    0.14    0.04    0.10
                                                      (0.01)  (0.02)          (0.01)  (0.02)
                                                      [0.00]  [0.00]          [0.00]  [0.00]
  Other Advanced Science                        0.06   -0.01   -0.02    0.06   -0.01   -0.02
                                                      (0.01)  (0.02)          (0.01)  (0.02)
                                                      [0.24]  [0.23]          [0.20]  [0.20]
  All Other Advanced                            0.25   -0.01   -0.03    0.25   -0.01   -0.03
                                                      (0.01)  (0.02)          (0.01)  (0.03)
                                                      [0.23]  [0.23]          [0.30]  [0.30]
  Regular Science                               0.06   -0.01   -0.02    0.06   -0.01   -0.02
                                                      (0.01)  (0.02)          (0.01)  (0.02)
                                                      [0.24]  [0.20]          [0.24]  [0.19]
  All Regular                                   0.62   -0.03   -0.09    0.61   -0.03   -0.07
                                                      (0.01)  (0.03)          (0.01)  (0.03)
                                                      [0.02]  [0.00]          [0.07]  [0.03]
Number of Observations                         1,819                   1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information was collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
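As a rough consistency check on the table, and assuming the LATE behaves like the standard Wald ratio of the ITT effect on an outcome to the ITT effect on take-up, the short Python sketch below divides one by the other. Equations (1) and (2) are not reproduced in this section, so the exact specification is an assumption, and the function name wald_late is hypothetical. The inputs are the rounded full-sample values for the AP Science credit share and AP Treatment Course Enrollment.

```python
def wald_late(itt_outcome, itt_takeup):
    """Ratio of the ITT effect on the outcome to the ITT effect on take-up.

    Assumption: a simple Wald ratio; the paper's estimates also condition on
    covariates and School x Cohort fixed effects.
    """
    if itt_takeup == 0:
        raise ValueError("first stage is zero; the ratio is undefined")
    return itt_outcome / itt_takeup


# Rounded full-sample values taken from the table above.
first_stage = 0.38           # ITT on AP Treatment Course Enrollment
itt_ap_science_share = 0.04  # ITT on share of credits in AP Science

late = wald_late(itt_ap_science_share, first_stage)
print(round(late, 3))
```

With these rounded inputs the ratio is about 0.105, close to the 0.11 reported in column (3); a small gap is what one would expect from rounding and from the covariates and fixed effects in the estimated equations.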
Table 4
Treatment Contrast (Composite Variables)

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                          (1)     (2)     (3)
Academically Challenging Curriculum            -0.33    0.31    0.80
                                                      (0.10)  (0.24)
                                                      [0.00]  [0.00]
Project-Based Independent Classroom
Activities                                     -0.06    0.13    0.33
                                                      (0.07)  (0.17)
                                                      [0.07]  [0.06]
Integrated Use of Technology                   -0.11    0.11    0.28
                                                      (0.08)  (0.19)
                                                      [0.19]  [0.14]
Number of Observations                         1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
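The construction of the composite variables described in these notes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the category labels, the function names rescale and composite_scores, the order of operations (average each student's items first, then standardize across students), and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev

# Ordered from lowest to highest category (labels are illustrative).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map an ordered category to [0, 1]: lowest -> 0.0, highest -> 1.0, evenly spaced."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean([rescale(r) for r in items]) for items in responses_by_student]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        raise ValueError("no variation across students; cannot standardize")
    return [(x - mu) / sd for x in raw]


students = [
    ["strongly agree", "agree"],
    ["neutral", "neutral"],
    ["strongly disagree", "disagree"],
]
scores = composite_scores(students)
```

In this toy example the three students' averaged values are 0.875, 0.5, and 0.125, so the standardized composite is positive for the first student, zero for the second, and negative for the third.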
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                          (1)     (2)     (3)
Science Skill                                  -0.19    0.09    0.23
                                                      (0.06)  (0.16)
                                                      [0.15]  [0.14]
STEM Interest                                   0.62    0.04    0.09
                                                      (0.02)  (0.07)
                                                      [0.16]  [0.16]
Confidence in College Science                   0.92   -0.04   -0.10
                                                      (0.02)  (0.05)
                                                      [0.11]  [0.06]
Stress                                          0.12    0.07    0.17
                                                      (0.03)  (0.07)
                                                      [0.02]  [0.01]
Grades in Science Courses                       2.80   -0.12   -0.29
                                                      (0.07)  (0.16)
                                                      [0.08]  [0.07]
Grades in Other Courses                         3.14   -0.07   -0.18
                                                      (0.02)  (0.06)
                                                      [0.00]  [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had a strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during the study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
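To illustrate what weighting by the inverse probability of completing the survey does, the Python sketch below computes a weighted treatment-minus-control difference in means. It is a deliberately simplified stand-in: the estimates in the table come from regressions with covariates and School x Cohort fixed effects, and the completion probabilities here are supplied directly rather than estimated. The function names and the toy data are hypothetical.

```python
def ipw_weights(p_complete):
    """Inverse probability weights: 1 / estimated probability of completing the survey."""
    for p in p_complete:
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
    return [1.0 / p for p in p_complete]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_itt(outcomes, treated, p_complete):
    """Weighted difference in mean outcomes (treatment minus control) among respondents."""
    w = ipw_weights(p_complete)
    t = [(y, wi) for y, d, wi in zip(outcomes, treated, w) if d == 1]
    c = [(y, wi) for y, d, wi in zip(outcomes, treated, w) if d == 0]
    mean_t = weighted_mean([y for y, _ in t], [wi for _, wi in t])
    mean_c = weighted_mean([y for y, _ in c], [wi for _, wi in c])
    return mean_t - mean_c


# Toy data: four respondents; the first had only a 50% chance of responding,
# so that respondent counts twice as much as the others.
effect = weighted_itt(
    outcomes=[1, 0, 1, 0],
    treated=[1, 1, 0, 0],
    p_complete=[0.5, 1.0, 1.0, 1.0],
)
```

Respondents who resemble non-respondents on observed characteristics have lower completion probabilities and therefore receive larger weights, which is the sense in which the weighted sample is meant to resemble the full set of study participants.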
Table 6
Robustness Checks of Main ITT Results

(1): Control Group Complier Mean
(2): Main Results
(3): Robust SE
(4): p-value (permutation test)
(5): Excluding High School 56
(6): Including Imputation of Missing Outcome Variables
(7): Excluding Covariates
(8): Excluding High School 23
(9): Lee Lower Bound
(10): Lee Upper Bound
(11): 95% Confidence Interval from Lee Bounds
(12): Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                             (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)          (11)  (12)
Science Skill                     -0.19    0.09                    0.10    0.11    0.20    0.07    0.03    0.39   -0.09, 0.51   2.0
                                         (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                         [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                      0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12   -0.03, 0.18   1.9
                                         (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                         [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science      0.92   -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06    0.05   -0.09, 0.10   2.0
                                         (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                         [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                             0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11   -0.05, 0.15   1.6
                                         (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                         [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses          2.80   -0.12                   -0.06   -0.10   -0.07
                                         (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                         [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses            3.14   -0.07                   -0.07   -0.06   -0.03
                                         (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                         [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
Remaining columns for the two grades outcomes: not applicable, as grades are based on transcripts rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to
derive a confidence interval for the treatment effect itself).
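The permutation test described for column (4) can be sketched in Python as follows. This is a simplified illustration rather than the authors' procedure: it uses a raw difference in means instead of the regression-based ITT estimate, reshuffles treatment labels across the whole sample (keeping the number treated fixed) rather than within School x Cohort groups, and the function names, seed, and toy data are hypothetical.

```python
import random


def diff_in_means(outcomes, treated):
    """Mean outcome of the treated minus mean outcome of the controls."""
    t = [y for y, d in zip(outcomes, treated) if d == 1]
    c = [y for y, d in zip(outcomes, treated) if d == 0]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-treatment effects whose absolute value exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # random pseudo assignment, same number treated
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm


# Toy data with complete separation between the groups: no reshuffling can
# produce a strictly larger absolute difference, so the p-value is 0.
outcomes = [10, 11, 12, 13, 14, 0, 1, 2, 3, 4]
treated = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
p_value = permutation_p_value(outcomes, treated)
```

The strict inequality mirrors the wording of the notes ("exceeded"); a variant that counts ties or adds one to the numerator and denominator would give slightly more conservative p-values.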
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students’
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry,
reasoning, and communication skills” (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students’ perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. If AP raises the standard to which students compare
themselves, their confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course and classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master’s degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members’
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students’ 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student’s 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study’s Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization given the schoolsrsquo needs for speed in deciding whether the student was
allowed to register for the new class We added an entire planning year to our study design to
avoid these issues but many schools were unable to plan ahead 18 To evaluate whether this non-compliance is ignorable we conducted the test recommended by
Huber (2013) for each of our six main outcomes (ie those subsequently shown in Table 5) We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable 19 One district in our study offered both AP courses Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology To
account for the two courses offered we treat the school as two separate groups School-
Chemistry and School-Biology For those students who were not offered an AP course we
randomly assign them to one of two control groups proportional to the number of treated
students who chose each course For example if 60 of the treated students chose Biology then
we randomly assign 60 of the control students to the School-Biology control group In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school-by-cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics as in the previous equations, $\mu_j$ are school-by-cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives
more weight in the regression to study participants who completed the survey and yet had pre-study
characteristics that were similar to those of study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and
with 90% of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows the multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum, with more homework, than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple-choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
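As a rough illustration of the Benjamini-Hochberg step-up procedure referenced in note 29, the following Python sketch flags which hypotheses are rejected at a chosen level. It is a minimal sketch of the textbook procedure, not the authors' code; the function name and the example p-values are hypothetical and are not the study's estimates.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Textbook step-up rule: sort the m p-values, find the largest rank k
    with p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (illustrative only).
example_p = [0.001, 0.010, 0.020, 0.300, 0.450, 0.800]
print(benjamini_hochberg(example_p))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold, which is why the loop records the largest qualifying rank rather than stopping at the first failure.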
Davis, Jennifer R. 2014. “A Little Goes a Long Way: Pressure for College Students to Succeed.”
Journal of Undergraduate Research 12 (1): 1–9.
Dobbie, Will, and Roland G. Fryer Jr. 2015. “The Medium-Term Impacts of High-Achieving
Charter Schools.” Journal of Political Economy 123 (5): 985–1037.
Dougherty, Chrys, and Lynn Mellor. 2009. “Preparation Matters.” National Center for
Educational Achievement, Washington, DC.
Dounay Zinth, Jennifer. 2016. “50-State Comparison: Advanced Placement Policies.” Education
Commission of the States.
Drew, Christopher. 2011. “Rethinking Advanced Placement.” The New York Times, January 7.
Duffett, Ann, and Steve Farkas. 2009. “Growing Pains in the Advanced Placement Program: Do
Tough Trade-offs Lie Ahead?” Thomas B. Fordham Institute, Washington, DC.
Ellis, Jessica, Bailey K. Fosdick, and Chris Rasmussen. 2016. “Women 1.5 Times More Likely to
Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical
Confidence a Potential Culprit.” PLOS ONE 11 (7): 1–14.
Foust, Regan Clark, Holly Hertberg-Davis, and Carolyn M. Callahan. 2009. “Students’
Perceptions of the Non-academic Advantages and Disadvantages of Participation in
Advanced Placement Courses and International Baccalaureate Programs.” Adolescence
44 (174): 289–312.
Geiser, Saul, and Veronica Santelices. 2004. “The Role of Advanced Placement and Honors
Courses in College Admissions.” Center for Studies in Higher Education, Research &
Occasional Paper Series CSHE.4.04.
Goodman, Joshua Samuel. 2012. “The Labor of Division: Returns to Compulsory Math
Coursework.” Unpublished manuscript.
Harel, O. 2009. “The Estimation of R-squared and Adjusted R-squared in Incomplete Data Sets
Using Multiple Imputation.” Journal of Applied Statistics 36 (10): 1109–1118.
Hippel, Paul T. von. 2007. “Regression with Missing Ys: An Improved Strategy for Analyzing
Multiply Imputed Data.” Sociological Methodology 37 (1): 83–117.
Holstead, Michael S., Terry E. Spradlin, Margaret E. McGillivray, and Nathan Burroughs. 2010.
“The Impact of Advanced Placement Incentive Programs.” Center for Evaluation and
Education Policy, Indiana University, Education Policy Brief 8 (1).
Hopkins, Katy. 2012. “Weigh the Benefits, Stress of AP Courses for Your Student.” US News
& World Report, May 10.
Huber, Martin. 2013. “A Simple Test for the Ignorability of Non-compliance in Experiments.”
Economics Letters 120 (3): 389–391.
Imbens, Guido W., and Charles F. Manski. 2004. “Confidence Intervals for Partially Identified
Parameters.” Econometrica 72 (6): 1845–1857.
Jackson, C. Kirabo. 2010. “A Little Now for a Lot Later: A Look at a Texas Advanced Placement
Incentive Program.” Journal of Human Resources 45 (3): 591–639.
__________. 2014. “Do College-Preparatory Programs Improve Long-Term Outcomes?”
Economic Inquiry 52 (1): 72–99.
Joensen, Juanna Schrøter, and Helena Skyt Nielsen. 2009. “Is There a Causal Effect of High
School Math on Labor Market Outcomes?” Journal of Human Resources 44 (1): 171–198.
Kim, Emily. 2015. “AP Classes Often Translate to Advanced Pressure.” Los Angeles Times,
September 22.
Klopfenstein, Kristin, and Kit Lively. 2016. “Do Grade Weights Promote More Advanced
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High-School Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering
Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher
Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018: Public High School Graduation Rates” (NCES 2018-144).
US Department of Education, Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in US High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures,
Version 1.0.” Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
It Is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at Its Rapid
Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.”
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.”
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and
Exam-Based Courses in US Public High Schools: 2010-11, First Look” (NCES 2013-001).
US Department of Education, Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. “AP Classes Are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
US Department of Education. 2014. “Education Department Awards 40 States, DC, and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F. O. Gabrieli, and John D. E. Gabrieli. 2016. “Promise and Paradox: Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single-mother-headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school-by-cohort fixed effects.
The corresponding OLS estimate is shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School Science Teachers

Panel A: Schools                                  Participating  Others
Average Enrollment                                1,409          723
Free or Reduced-Price Lunch                       0.700          0.438
Asian                                             0.055          0.050
Black                                             0.349          0.154
Hispanic                                          0.410          0.221
White                                             0.164          0.537
Adjusted Cohort Graduation Rate                   0.843          0.802
District Instruction Expenditures Per Pupil       $6,561         $5,636
District Student Services Expenditures Per Pupil  $3,787         $3,385

Panel B: Teachers                                 Participating  Others
Age Under 30                                      0.407          0.160
Age 30-49                                         0.432          0.553
Age 50 or over                                    0.161          0.287
Female                                            0.630          0.536
Hispanic or Latino                                0.111          0.051
Race: American Indian or Alaska Native            0.000          0.009
Race: Asian American                              0.111          0.041
Race: Black                                       0.111          0.060
Race: Native Hawaiian or other Pacific Islander   0.000          0.004
Race: White                                       0.778          0.896
Years of Experience                               10.3           13.2
Years of Experience <=2                           0.290          0.085
Years of Experience <=5                           0.481          0.234
Hold a Teaching Certificate                       0.926          0.945
Undergraduate Major in STEM                       0.944          0.747
Single Subject Credential in Science              0.630          0.823
Master’s Degree or Higher                         0.356          0.615
Previously Taught AP Course                       0.469          NA
Previously Taught AP, IB, or Honors Course        0.796          NA
Number of Professional Development Trainings      3.09           NA
  in the Past 5 Years (0-5)
Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and the 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
Columns (1) and (4): Control Group Mean.
Columns (2) and (5): Difference Between Treated and Controls.
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers.

Pre-Treatment Characteristic                      (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                  16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                  0.38     0.08     0.25     0.44     0.07     0.30
                                                        (0.04)   (0.10)            (0.05)   (0.16)
                                                        [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                               0.29     0.10     0.18     0.36     0.09     0.17
                                                        (0.03)   (0.12)            (0.04)   (0.17)
                                                        [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                           3.16     0.05     0.20     3.23     0.06     0.13
                                                        (0.03)   (0.08)            (0.03)   (0.10)
                                                        [0.14]   [0.02]            [0.06]   [0.20]
Female                                           0.59     0.00     0.10     0.61    -0.01     0.11
                                                        (0.03)   (0.06)            (0.04)   (0.07)
                                                        [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                   0.12     0.02     0.10     0.12     0.03     0.10
                                                        (0.02)   (0.05)            (0.01)   (0.07)
                                                        [0.27]   [0.06]            [0.07]   [0.12]
Black                                            0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                        (0.02)   (0.06)            (0.02)   (0.05)
                                                        [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial        0.31     0.01     0.05     0.33     0.01     0.05
                                                        (0.02)   (0.06)            (0.02)   (0.07)
                                                        [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                         0.02     0.00    -0.01     0.01     0.00    -0.01
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.93]   [0.24]            [0.57]    [0.5]
Gifted                                           0.13     0.03     0.00     0.14     0.02     0.01
                                                        (0.02)   (0.05)            (0.02)   (0.09)
                                                        [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                         0.05     0.01     0.02     0.04     0.01     0.04
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch         0.53     0.01     0.02     0.51     0.01     0.07
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home       0.34     0.02     0.03     0.35     0.01     0.04
                                                        (0.02)   (0.07)            (0.02)   (0.07)
                                                        [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses            0.79     0.00     0.09     0.79     0.02     0.05
                                                        (0.02)   (0.04)            (0.02)   (0.05)
                                                        [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                          1,819                      1,417
Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
Columns (1) and (4): Control Group Mean. Columns (2) and (5): ITT. Columns (3) and (6): LATE.

Outcome                                 (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment         0.19     0.38              0.24     0.39
                                              (0.05)                     (0.06)
                                              [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                           0.03     0.04     0.11     0.03     0.04     0.10
                                              (0.01)   (0.01)            (0.01)   (0.01)
                                              [0.00]   [0.00]            [0.00]   [0.00]
  All AP                               0.13     0.04     0.11     0.14     0.04     0.10
                                              (0.01)   (0.02)            (0.01)   (0.02)
                                              [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science               0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                              (0.01)   (0.02)            (0.01)   (0.02)
                                              [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                   0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                              (0.01)   (0.02)            (0.01)   (0.03)
                                              [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                      0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                              (0.01)   (0.02)            (0.01)   (0.02)
                                              [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                          0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                              (0.01)   (0.03)            (0.01)   (0.03)
                                              [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                1,819                      1,417
Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information was collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
Table 4
Treatment Contrast (Composite Variables)

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                                               (1)      (2)      (3)
Academically Challenging Curriculum                 -0.33     0.31     0.80
                                                            (0.10)   (0.24)
                                                            [0.00]   [0.00]
Project-Based Independent Classroom Activities      -0.06     0.13     0.33
                                                            (0.07)   (0.17)
                                                            [0.07]   [0.06]
Integrated Use of Technology                        -0.11     0.11     0.28
                                                            (0.08)   (0.19)
                                                            [0.19]   [0.14]
Number of Observations                              1,417
Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
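The construction described in the notes to Table 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names and example responses are hypothetical, categories are assumed to be coded as integers from 0 (lowest) to n-1 (highest), and the population standard deviation is assumed for the standardization step, which the notes do not specify.

```python
from statistics import mean, pstdev


def rescale(category_index, n_categories):
    """Map an ordered category (0 = lowest) onto evenly spaced values in [0, 1]."""
    return category_index / (n_categories - 1)


def composite_scores(responses, n_categories):
    """Average each student's rescaled items, then standardize across students.

    responses: one list of category indices per student (hypothetical layout).
    """
    averaged = [
        mean(rescale(item, n_categories) for item in student)
        for student in responses
    ]
    center = mean(averaged)
    spread = pstdev(averaged)  # assumption: population SD
    return [(value - center) / spread for value in averaged]


# Three hypothetical students answering two five-category items.
example = [[4, 4], [2, 2], [0, 0]]
print(composite_scores(example, n_categories=5))
```

With five response categories, the rescaled values are 0.0, 0.25, 0.5, 0.75, and 1.0, which matches the even spacing described in the notes.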
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Column (1): Control Group Complier Mean. Column (2): ITT. Column (3): LATE.

Outcome                               (1)      (2)      (3)
Science Skill                       -0.19     0.09     0.23
                                            (0.06)   (0.16)
                                            [0.15]   [0.14]
STEM Interest                        0.62     0.04     0.09
                                            (0.02)   (0.07)
                                            [0.16]   [0.16]
Confidence in College Science        0.92    -0.04    -0.10
                                            (0.02)   (0.05)
                                            [0.11]   [0.06]
Stress                               0.12     0.07     0.17
                                            (0.03)   (0.07)
                                            [0.02]   [0.01]
Grades in Science Courses            2.80    -0.12    -0.29
                                            (0.07)   (0.16)
                                            [0.08]   [0.07]
Grades in Other Courses              3.14    -0.07    -0.18
                                            (0.02)   (0.06)
                                            [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes
Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
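The relationship between the ITT and LATE columns can be illustrated with the standard Wald ratio, in which the LATE is the ITT effect on the outcome divided by the ITT effect on AP course enrollment. The sketch below is an approximation under that assumption and is not the authors' estimation code, which relies on regressions with covariates and fixed effects. The function name is hypothetical, and the inputs read the Science Skill ITT in Table 5 as 0.09 and the survey-sample first-stage effect in Table 3 as 0.39.

```python
def wald_late(itt_outcome, itt_takeup):
    """Wald ratio: ITT effect on the outcome divided by ITT effect on take-up."""
    if itt_takeup == 0:
        raise ValueError("first-stage effect must be nonzero")
    return itt_outcome / itt_takeup


# Assumed inputs: Science Skill ITT (Table 5) and first-stage ITT (Table 3).
science_skill_late = wald_late(0.09, 0.39)
print(round(science_skill_late, 2))  # prints 0.23
```

Ratios computed from the rounded table entries will not always reproduce the reported LATE exactly, since the published figures are themselves rounded.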
Table 6
Robustness Checks of Main ITT Results

Column key:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                             (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)     (10)            (11)  (12)
Science Skill                     -0.19     0.09                       0.10     0.11     0.20     0.07     0.03     0.39     -0.09, 0.51   2.0
                                          (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                          [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                      0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12     -0.03, 0.18   1.9
                                          (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                          [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science      0.92    -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06     0.05     -0.09, 0.10   2.0
                                          (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                          [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                             0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11     -0.05, 0.15   1.6
                                          (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                          [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses          2.80    -0.12                      -0.06    -0.10    -0.07
                                          (0.07)   (0.04)            (0.00)   (0.00)   (0.00)
                                          [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses            3.14    -0.07                      -0.07    -0.06    -0.03
                                          (0.02)   (0.03)            (0.00)   (0.00)   (0.00)
                                          [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]
Columns (8)-(12) for the two grades rows: Not applicable, as grades are based on transcripts rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP Biology and AP Chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee bounds (applying Imbens and Manski’s (2004) method to
derive a confidence interval for the treatment effect itself).
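The permutation test described for column (4) can be sketched as follows. This is a simplified illustration rather than the authors' procedure: it uses a raw difference in means instead of the regression with covariates and fixed effects, it reshuffles treatment labels across the whole sample rather than within school-by-cohort groups, and the function names and example data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated_values = [y for y, d in zip(outcomes, treated) if d]
    control_values = [y for y, d in zip(outcomes, treated) if not d]
    return mean(treated_values) - mean(control_values)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # pseudo treatment with the same number treated
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical outcomes and treatment indicators (illustrative only).
example_outcomes = [0.2, 1.1, -0.4, 0.9, 0.3, 1.4, -0.1, 0.8]
example_treated = [0, 1, 0, 1, 0, 1, 0, 1]
print(permutation_p_value(example_outcomes, example_treated))
```

Shuffling the label list keeps the number of treated units fixed in every pseudo assignment, so both groups are always non-empty.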
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students’
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped
classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry,
reasoning, and communication skills” (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students’ perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. If the AP course raises the standard to which
students compare themselves, their confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course and for classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master’s degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM) Thus the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training We also did not survey teachers regarding their Teach for America (TFA) experience
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers 12 We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants Reliability
metrics are high (inter-item reliability = 099 person reliability= 071) Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al
(2017) 13 Each year in the spring semester our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration 14 Unweighted results which are similar are contained in the Online Appendix tables However
if study participants who did not take the survey differ in unobserved ways then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias 15 Online Appendix Table 1 presents the balance between treatment and control group membersrsquo
characteristics before imputation of missing values (as described below) these results are very
similar to those shown in Table 2 16 Given the high degree of correlation between studentsrsquo 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores) we created one reading and math score for
each student that is the average of both scores or just the 8th grade score For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort we only use the studentrsquos 8th grade test scores as their 10th grade test scores would be
endogenous 17 The studyrsquos Principal Investigator personally handled the randomization of offers of
enrollment in the course so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools’ needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60 percent of the treated students chose Biology, then
we randomly assign 60 percent of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$,
where $\mathit{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school by cohort $j$ completed any
part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the
previous equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative
normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as
$1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90 percent of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
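The Benjamini-Hochberg correction mentioned in the endnote on multiple comparisons can be illustrated with a short sketch. This is a generic implementation of the step-up procedure, not the authors' code; the function name and the example p-values in the usage note are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Generic Benjamini-Hochberg step-up procedure (illustrative sketch only).
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_rank = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[idx] = True
    return rejected
```

For instance, calling `benjamini_hochberg([0.01, 0.04, 0.03, 0.20])` with these made-up values rejects only the first hypothesis, because 0.03 and 0.04 exceed their rank-specific thresholds of 0.025 and 0.0375.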
Course-Taking?” Education Finance and Policy 11 (3): 310–324.
Klopfenstein, Kristin, and M. Kathleen Thomas. 2009. “The Link Between Advanced Placement
Experience and Early College Success.” Southern Economic Journal 75 (3): 873–891.
__________. 2010. “Advanced Placement Participation: Evaluating the Policies of States and
Colleges.” In AP: A Critical Examination of the Advanced Placement Program, eds.
Philip M. Sadler, Gerhard Sonnert, Robert H. Tai, and Kristin Klopfenstein, 167–188.
Cambridge: Harvard Education Press.
Kurth, Lori A., Charles W. Anderson, and Annemarie S. Palincsar. 2002. “The Case of Carla:
Dilemmas of Helping All Students to Understand Science.” Science Education 86 (3):
287–313.
Lee, David S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on
Treatment Effects.” The Review of Economic Studies 76 (3): 1071–1102.
Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations
of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347
(6219): 262–265.
Levine, Phillip B., and David J. Zimmerman. 1995. “The Benefit of Additional High School Math
and Science Classes for Young Men and Women.” Journal of Business & Economic
Statistics 13 (2): 137–149.
Litzler, Elizabeth, Cate C. Samuelson, and Julie A. Lorah. 2014. “Breaking it Down: Engineering
Student STEM Confidence at the Intersection of Race/Ethnicity and Gender.” Research in Higher
Education 55 (8): 810–832.
Long, Mark C., Dylan Conger, and Patrice Iatarola. 2012. “Effects of High School Course-taking
on Secondary and Postsecondary Success.” American Educational Research Journal 49
(2): 285–322.
Long, Mark C., Dylan Conger, and Raymond McGhee Jr. 2018. “Life on the Frontier of AP
Expansion: Can Schools in Less-Resourced Communities Successfully Implement
Advanced Placement Science Courses?” Conditionally accepted by Educational
Researcher.
Malkus, Nat. 2016. “AP at Scale: Public School Students in Advanced Placement, 1990–2013.”
American Enterprise Institute, Washington, DC.
Marx, Gabby. 2014. “Are AP Courses Worth the Stress?” WiscNews, May 23.
McFarland, Joel, Bill Hussar, Xiaolei Wang, Jijun Zhang, Ke Wang, Amy Rathbun, Amy
Barmer, Emily Forrest Cataldi, and Farrah Bullock Mann. 2018. “The Condition of
Education 2018 (NCES 2018-144): Public High School Graduation Rates.” (NCES 2018-144).
U.S. Department of Education. Washington, DC: National Center for Education
Statistics.
National Research Council. 2002. “Learning and Understanding: Improving Advanced Study of
Mathematics and Science in U.S. High Schools.” Washington, DC: National Academies
Press.
__________. 2012. A Framework for K-12 Science Education: Practices, Crosscutting
Concepts, and Core Ideas. Washington, DC: The National Academies Press.
Obama White House. 2016. “STEM for All.” February 11. Washington, DC.
Reardon, Sean F., Demetra Kalogrides, and Kenneth Shores. 2016. “Stanford Education Data
Archive (SEDA): Summary of Data Cleaning, Estimation, and Scaling Procedures.”
Version 1.0. Stanford University.
Rose, Heather. 2004. “Has Curriculum Closed the Test Score Gap in Math?” Topics in Economic
Analysis & Policy 4 (1): 1–30.
Rose, Heather, and Julian R. Betts. 2004. “The Effect of High School Courses on Earnings.” The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. “Breaking Down School Budgets.” Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. “The Role of
Advanced High School Coursework in Increasing STEM Career Interest.” Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. “Accounting for Advanced High School Coursework
in College Admission Decisions.” College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. “Measuring Students’ Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation.”
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. “Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes.” Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. “Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence.” Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. “Confidence Judgments in Studies of Individual
Differences.” Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. “Overconfidence Across World Regions.” Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. “Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth.” The New York Times, April 29.
Tai, Robert H. 2008. “Posing Tougher Questions about the Advanced Placement Program.”
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. “The IB Diploma Programme Statistical Bulletin.”
Theokas, Christina, and Reid Saaris. 2013. “Finding America’s Missing AP and IB Students.”
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. “Dual Credit and
Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look.” (NCES 2013-001).
U.S. Department of Education. Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. “AP Classes are a Scam.” Atlantic, October 13.
Tucker, Jill. 2012. “Stressful AP Courses: A Push for a Cap.” SF Gate.
U.S. Department of Education. 2014. “Education Department Awards 40 States, D.C., and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests.” Washington, DC.
Weinstein, Paul. 2016. “Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement.” Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. “Promise and Paradox: Measuring
Students’ Non-Cognitive Skills and the Impact of Schooling.” Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity
of Habit-Formation.” Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district’s
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor’s degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district’s average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district’s enrollment. The dashed line is a lowess curve created using
Stata’s default settings and roughly shows the predicted test score as a function of the
neighborhood’s SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other U.S. High Schools and High School
Science Teachers
Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385
Panel B: Teachers                                   Participating   Others
Age: Under 30                                       0.407           0.160
Age: 30-49                                          0.432           0.553
Age: 50 or over                                     0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <= 2                            0.290           0.085
Years of Experience <= 5                            0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master’s Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 years (0-5)
Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). “Others” in Panel A refers to other public
high schools in the U.S. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). “Others” in Panel B refers to public and private high school
teachers in the U.S. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics
Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers
Pre-Treatment Characteristic                  (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade               16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                               0.38     0.08     0.25     0.44     0.07     0.30
                                                       (0.04)   (0.10)            (0.05)   (0.16)
                                                       [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                            0.29     0.10     0.18     0.36     0.09     0.17
                                                       (0.03)   (0.12)            (0.04)   (0.17)
                                                       [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                        3.16     0.05     0.20     3.23     0.06     0.13
                                                       (0.03)   (0.08)            (0.03)   (0.10)
                                                       [0.14]   [0.02]            [0.06]   [0.20]
Female                                        0.59     0.00     0.10     0.61     -0.01    0.11
                                                       (0.03)   (0.06)            (0.04)   (0.07)
                                                       [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                0.12     0.02     0.10     0.12     0.03     0.10
                                                       (0.02)   (0.05)            (0.01)   (0.07)
                                                       [0.27]   [0.06]            [0.07]   [0.12]
Black                                         0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                       (0.02)   (0.06)            (0.02)   (0.05)
                                                       [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial     0.31     0.01     0.05     0.33     0.01     0.05
                                                       (0.02)   (0.06)            (0.02)   (0.07)
                                                       [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                      0.02     0.00     -0.01    0.01     0.00     -0.01
                                                       (0.01)   (0.01)            (0.01)   (0.01)
                                                       [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                        0.13     0.03     0.00     0.14     0.02     0.01
                                                       (0.02)   (0.05)            (0.02)   (0.09)
                                                       [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                      0.05     0.01     0.02     0.04     0.01     0.04
                                                       (0.01)   (0.02)            (0.01)   (0.03)
                                                       [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch      0.53     0.01     0.02     0.51     0.01     0.07
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home    0.34     0.02     0.03     0.35     0.01     0.04
                                                       (0.02)   (0.07)            (0.02)   (0.07)
                                                       [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses         0.79     0.00     0.09     0.79     0.02     0.05
                                                       (0.02)   (0.04)            (0.02)   (0.05)
                                                       [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                        1,819 (Full Sample)        1,417 (Survey Sample)
Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment
Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4): Control Group Mean
(2), (5): ITT
(3), (6): LATE
Outcome                           (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment    0.19     0.38              0.24     0.39
                                           (0.05)                     (0.06)
                                           [0.00]                     [0.00]
Share of Credits During Study Year in:
AP Science                        0.03     0.04     0.11     0.03     0.04     0.10
                                           (0.01)   (0.01)            (0.01)   (0.01)
                                           [0.00]   [0.00]            [0.00]   [0.00]
All AP                            0.13     0.04     0.11     0.14     0.04     0.10
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.00]   [0.00]            [0.00]   [0.00]
Other Advanced Science            0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.23]            [0.20]   [0.20]
All Other Advanced                0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                           (0.01)   (0.02)            (0.01)   (0.03)
                                           [0.23]   [0.23]            [0.30]   [0.30]
Regular Science                   0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.20]            [0.24]   [0.19]
All Regular                       0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                           (0.01)   (0.03)            (0.01)   (0.03)
                                           [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations            1,819 (Full Sample)        1,417 (Survey Respondents)
Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
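As a rough illustration of how the ITT and LATE columns of Table 3 relate, the sketch below computes the simple Wald ratio (ITT effect on the outcome divided by the ITT effect on enrollment). This is an assumption made for illustration: the paper's LATE estimates come from variants of Equation (1) with covariates and fixed effects, so the ratio only approximates the reported values. The function name is hypothetical, and the inputs assume the Table 3 entries read as 0.04 (AP Science credit share ITT) and 0.38 (enrollment ITT).

```python
def wald_late(itt_on_outcome, itt_on_takeup):
    """Wald estimator: the ITT effect scaled by the first-stage effect on take-up."""
    if itt_on_takeup == 0:
        raise ValueError("First stage is zero; the ratio is undefined.")
    return itt_on_outcome / itt_on_takeup


# Full-sample AP Science credit share, using the rounded table entries as inputs.
approx_late = wald_late(0.04, 0.38)
print(round(approx_late, 3))
```

Because the inputs are themselves rounded to two decimals, the ratio will not match the published LATE to the last digit.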
Table 4
Treatment Contrast (Composite Variables)
(1): Control Group Complier Mean
(2): ITT
(3): LATE
Outcome                                             (1)      (2)      (3)
Academically Challenging Curriculum                 -0.33    0.31     0.80
                                                             (0.10)   (0.24)
                                                             [0.00]   [0.00]
Project-Based, Independent Classroom Activities     -0.06    0.13     0.33
                                                             (0.07)   (0.17)
                                                             [0.07]   [0.06]
Integrated Use of Technology                        -0.11    0.11     0.28
                                                             (0.08)   (0.19)
                                                             [0.19]   [0.14]
Number of Observations                              1,417
Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
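The composite construction described in the Table 4 notes can be sketched as follows. The five-point scale, the function names, and the choice of the population standard deviation are assumptions made for illustration; the notes do not specify these details, and the actual component variables are listed in Online Appendix Table 5.

```python
from statistics import mean, pstdev

# Hypothetical five-point agreement scale, ordered from lowest to highest category.
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def to_unit_interval(response, scale=SCALE):
    """Map a category to [0, 1]: lowest -> 0.0, highest -> 1.0, others evenly spaced."""
    return scale.index(response) / (len(scale) - 1)


def composite_scores(responses_by_student):
    """Average each student's converted responses, then standardize across students."""
    raw = [mean(to_unit_interval(r) for r in responses)
           for responses in responses_by_student]
    center, spread = mean(raw), pstdev(raw)  # assumes the raw scores are not all equal
    return [(value - center) / spread for value in raw]
```

With this scale, "neutral" maps to 0.5 and "agree" to 0.75, and the resulting composite has mean 0 and standard deviation 1 across the students supplied.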
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades
(1): Control Group Complier Mean
(2): ITT
(3): LATE
Outcome                           (1)      (2)      (3)
Science Skill                     -0.19    0.09     0.23
                                           (0.06)   (0.16)
                                           [0.15]   [0.14]
STEM Interest                     0.62     0.04     0.09
                                           (0.02)   (0.07)
                                           [0.16]   [0.16]
Confidence in College Science     0.92     -0.04    -0.10
                                           (0.02)   (0.05)
                                           [0.11]   [0.06]
Stress                            0.12     0.07     0.17
                                           (0.03)   (0.07)
                                           [0.02]   [0.01]
Grades in Science Courses         2.80     -0.12    -0.29
                                           (0.07)   (0.16)
                                           [0.08]   [0.07]
Grades in Other Courses           3.14     -0.07    -0.18
                                           (0.02)   (0.06)
                                           [0.00]   [0.00]
Number of Observations            1,819 for grades; 1,417 for other outcomes
Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
Table 6
Robustness Checks of Main ITT Results
(1): Control Group Complier Mean
(2): Main Results
(3): Robust SE
(4): p-value (permutation test)
(5): Excluding High School 56
(6): Including Imputation of Missing Outcome Variables
(7): Excluding Covariates
(8): Excluding High School 23
(9): Lee Lower Bound
(10): Lee Upper Bound
(11): 95% Confidence Interval from Lee Bounds
(12): Ratio of 95% CI in (11) to 95% CI in (7)
Outcome                       (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)            (12)
Science Skill                 -0.19    0.09                       0.10     0.11     0.20     0.07     0.03     0.39     -0.09 to 0.51   2.0
                                       (0.06)   (0.05)            (0.00)   (0.00)   (0.00)   (0.00)   (0.07)   (0.07)
                                       [0.15]   [0.06]   [0.06]   [0.20]   [0.11]   [0.01]   [0.24]   [0.72]   [0.00]
STEM Interest                 0.62     0.04                       0.05     0.03     0.03     0.03     0.02     0.12     -0.03 to 0.18   1.9
                                       (0.02)   (0.03)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.04)
                                       [0.16]   [0.19]   [0.20]   [0.09]   [0.29]   [0.27]   [0.19]   [0.60]   [0.00]
Confidence in College Science 0.92     -0.04                      -0.03    -0.06    -0.06    -0.04    -0.06    0.05     -0.09 to 0.10   2.0
                                       (0.02)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.02)   (0.03)
                                       [0.11]   [0.05]   [0.07]   [0.37]   [0.02]   [0.03]   [0.10]   [0.00]   [0.17]
Stress                        0.12     0.07                       0.05     0.06     0.08     0.07     0.01     0.11     -0.05 to 0.15   1.6
                                       (0.03)   (0.02)            (0.00)   (0.00)   (0.00)   (0.00)   (0.03)   (0.02)
                                       [0.02]   [0.00]   [0.00]   [0.14]   [0.07]   [0.02]   [0.02]   [0.79]   [0.00]
Grades in Science Courses     2.80     -0.12                      -0.06    -0.10    -0.07
                                       (0.07)   (0.04)            (0.00)   (0.00)   (0.00)
                                       [0.08]   [0.01]   [0.01]   [0.31]   [0.16]   [0.30]
Grades in Other Courses       3.14     -0.07                      -0.07    -0.06    -0.03
                                       (0.02)   (0.03)            (0.00)   (0.00)   (0.00)
                                       [0.00]   [0.01]   [0.01]   [0.00]   [0.01]   [0.38]
Columns (8)-(12) for both grades outcomes: Not applicable, as grades are based on transcripts rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$) from Equation 1.
Column (8) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95 percent confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to
derive confidence interval for the treatment effect itself).
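The permutation test described for Column (4) can be sketched as below. This is a simplified illustration, not the authors' procedure: it uses a raw difference in means instead of the regression with covariates and fixed effects, and it reshuffles the pseudo treatment across the whole sample. The function names, the seed, and the example data in the usage note are hypothetical.

```python
import random


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # reassign the pseudo treatment at random
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations
```

When the treated and control outcomes are perfectly separated (for example, ten controls at 0 and ten treated units at 10), no reshuffled assignment can produce a larger absolute difference, so the function returns 0.0.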
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (U.S. Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students’
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry,
reasoning, and communication skills” (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students’ perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students’ confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90% of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
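The Benjamini-Hochberg correction mentioned in note 29 can be illustrated with a minimal sketch. It assumes the standard step-up procedure (reject every hypothesis whose sorted p-value falls at or below the largest rank k satisfying p_(k) <= (k/m) * alpha); the function name and the example p-values are hypothetical and are not the study's estimates or code.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose p-value is at or below (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (illustration only).
example_p = [0.001, 0.012, 0.020, 0.300, 0.450, 0.800]
print(benjamini_hochberg(example_p))
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank-specific threshold, as long as a larger p-value further down the ordering meets its threshold.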
Rose, Heather, and Julian R. Betts. 2004. "The Effect of High School Courses on Earnings." The
Review of Economics and Statistics 86 (2): 497–513.
Roza, Marguerite. 2009. "Breaking Down School Budgets." Education Next 9 (3).
Sadler, Philip M., Gerhard Sonnert, Zahra Hazari, and Robert H. Tai. 2014. "The Role of
Advanced High School Coursework in Increasing STEM Career Interest." Science
Educator 23 (1): 1–13.
Sadler, Philip M., and Robert H. Tai. 2007. "Accounting for Advanced High School Coursework
in College Admission Decisions." College and University 82 (4): 7–14.
Seeratan, Kavita L., Kevin W. McElhaney, Jessica Mislevy, Raymond McGhee Jr., Dylan
Conger, and Mark C. Long. 2017. "Measuring Students' Ability to Engage in Scientific
Inquiry: A New Instrument to Assess Data Analysis, Explanation, and Argumentation."
Educational Measurement. Forthcoming.
Smith, Jonathan, Michael Hurwitz, and Christopher Avery. 2017. "Giving College Credit Where
it is Due: Advanced Placement Exam Scores and College Outcomes." Journal of Labor
Economics 35 (1): 67–147.
Stankov, Lazar. 2013. "Noncognitive Predictors of Intelligence and Academic Achievement: An
Important Role of Confidence." Personality and Individual Differences 55 (7): 727–732.
Stankov, Lazar, and John D. Crawford. 1996. "Confidence Judgments in Studies of Individual
Differences." Personality and Individual Differences 21 (6): 971–986.
Stankov, Lazar, and Jihyun Lee. 2014. "Overconfidence Across World Regions." Journal of
Cross-Cultural Psychology 45 (5): 821–837.
Steinberg, Jacques. 2009. "Many Teachers in Advanced Placement Voice Concern at its Rapid
Growth." The New York Times, April 29.
Tai, Robert H. 2008. "Posing Tougher Questions about the Advanced Placement Program."
Liberal Education 94 (3): 38–43.
The IB Programme. 2016. "The IB Diploma Programme Statistical Bulletin."
Theokas, Christina, and Reid Saaris. 2013. "Finding America's Missing AP and IB Students."
Education Trust, June 5.
Thomas, Nina, Stephanie Marken, Lucinda Gray, and Laurie Lewis. 2013. "Dual Credit and
Exam-Based Courses in U.S. Public High Schools: 2010-11. First Look." (NCES 2013-001).
U.S. Department of Education, Washington, DC: National Center for Education
Statistics.
Tierney, John. 2012. "AP Classes are a Scam." Atlantic, October 13.
Tucker, Jill. 2012. "Stressful AP Courses: A Push for a Cap." SF Gate.
U.S. Department of Education. 2014. "Education Department Awards 40 States, D.C., and the
Virgin Islands $28.4 Million in Grants to Help Low-Income Students Take Advanced
Placement Tests." Washington, DC.
Weinstein, Paul. 2016. "Diminishing Credit: How Colleges and Universities Restrict the Use of
Advanced Placement." Progressive Policy Institute, Washington, DC.
West, Martin R., Matthew A. Kraft, Amy S. Finn, Rebecca E. Martin, Angela L. Duckworth,
Christopher F.O. Gabrieli, and John D.E. Gabrieli. 2016. "Promise and Paradox: Measuring
Students' Non-Cognitive Skills and the Impact of Schooling." Educational Evaluation
and Policy Analysis 38 (1): 148–170.
Yerkes, Robert M., and John D. Dodson. 1908. "The Relation of Strength of Stimulus to Rapidity
of Habit-Formation." Journal of Comparative Neurology 18 (5): 459–482.
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district's
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor's degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district's average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district's enrollment. The dashed line is a lowess curve created using
Stata's default settings and roughly shows the predicted test score as a function of the
neighborhood's SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                    Participating  Others
Average Enrollment                                  1409           723
Free or Reduced-Price Lunch                         0.700          0.438
Asian                                               0.055          0.050
Black                                               0.349          0.154
Hispanic                                            0.410          0.221
White                                               0.164          0.537
Adjusted Cohort Graduation Rate                     0.843          0.802
District Instruction Expenditures Per Pupil         $6,561         $5,636
District Student Services Expenditures Per Pupil    $3,787         $3,385

Panel B: Teachers                                   Participating  Others
Age: Under 30                                       0.407          0.160
Age: 30-49                                          0.432          0.553
Age: 50 or over                                     0.161          0.287
Female                                              0.630          0.536
Hispanic or Latino                                  0.111          0.051
Race: American Indian or Alaska Native              0.000          0.009
Race: Asian American                                0.111          0.041
Race: Black                                         0.111          0.060
Race: Native Hawaiian or other Pacific Islander     0.000          0.004
Race: White                                         0.778          0.896
Years of Experience                                 10.3           13.2
Years of Experience <= 2                            0.290          0.085
Years of Experience <= 5                            0.481          0.234
Hold a Teaching Certificate                         0.926          0.945
Undergraduate Major in STEM                         0.944          0.747
Single Subject Credential in Science                0.630          0.823
Master's Degree or Higher                           0.356          0.615
Previously Taught AP Course                         0.469          NA
Previously Taught AP, IB, or Honors Course          0.796          NA
Number of Professional Development Trainings        3.09           NA
  in the Past 5 years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts
(https://www2.ed.gov/about/inits/ed/edfacts/index.html). "Others" in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey
(https://nces.ed.gov/surveys/sass). "Others" in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4): Control Group Mean
(2), (5): Difference Between Treated and Controls
(3), (6): Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1819                       1417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4): Control Group Mean
(2), (5): ITT
(3), (6): LATE

Outcome                           (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment    0.19     0.38              0.24     0.39
                                           (0.05)                     (0.06)
                                           [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                      0.03     0.04     0.11     0.03     0.04     0.10
                                           (0.01)   (0.01)            (0.01)   (0.01)
                                           [0.00]   [0.00]            [0.00]   [0.00]
  All AP                          0.13     0.04     0.11     0.14     0.04     0.10
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science          0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced              0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                           (0.01)   (0.02)            (0.01)   (0.03)
                                           [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                 0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                           (0.01)   (0.02)            (0.01)   (0.02)
                                           [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                     0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                           (0.01)   (0.03)            (0.01)   (0.03)
                                           [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations            1819                       1417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
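The survey-completion weights used in columns (4) through (6) are described in the endnotes as one over the probit-predicted probability of completing the survey. The sketch below illustrates only that final step, assuming the probit index (fixed effect plus covariate term) has already been estimated elsewhere; the function name and the example index values are hypothetical.

```python
from statistics import NormalDist


def inverse_probability_weight(linear_index):
    """Return 1 / Phi(index), where Phi is the standard normal CDF.

    `linear_index` stands in for a fitted probit index; estimating it
    is outside the scope of this sketch.
    """
    response_probability = NormalDist().cdf(linear_index)
    return 1.0 / response_probability


# Hypothetical fitted indices for three survey respondents (illustration only).
indices = [2.0, 0.0, -1.0]
weights = [inverse_probability_weight(z) for z in indices]
print(weights)
```

Respondents who resemble non-respondents have a low predicted completion probability and therefore receive a larger weight; every weight is at least 1 because the probability cannot exceed 1.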
Table 4
Treatment Contrast (Composite Variables)

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                                         (1)      (2)      (3)
Academically Challenging Curriculum             -0.33    0.31     0.80
                                                         (0.10)   (0.24)
                                                         [0.00]   [0.00]
Project-Based Independent Classroom Activities  -0.06    0.13     0.33
                                                         (0.07)   (0.17)
                                                         [0.07]   [0.06]
Integrated Use of Technology                    -0.11    0.11     0.28
                                                         (0.08)   (0.19)
                                                         [0.19]   [0.14]
Number of Observations                          1417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
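The composite construction described in the notes to Table 4 can be sketched as follows. This is an illustration under stated assumptions, not the study's code: responses are assumed to be coded as integers from 0 (lowest category) to n - 1 (highest), standardization is assumed to use the population standard deviation of the sample passed in, and all names are hypothetical.

```python
from statistics import mean, pstdev


def rescale_likert(code, n_categories):
    """Map an ordinal code 0..n-1 onto evenly spaced values from 0.0 to 1.0."""
    return code / (n_categories - 1)


def composite_scores(responses, n_categories=5):
    """Average each student's rescaled items, then standardize across students.

    `responses` holds one list of ordinal item codes per student.
    """
    raw = [
        mean(rescale_likert(code, n_categories) for code in student)
        for student in responses
    ]
    center = mean(raw)
    spread = pstdev(raw)  # assumption: population SD; the paper does not specify
    return [(value - center) / spread for value in raw]


# Three hypothetical students answering two five-category items.
example = [[4, 4], [2, 2], [0, 0]]
print(composite_scores(example))
```

With five categories the rescaled values are 0.0, 0.25, 0.5, 0.75, and 1.0, matching the "evenly spaced" description in the notes.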
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1): Control Group Complier Mean
(2): ITT
(3): LATE

Outcome                           (1)      (2)      (3)
Science Skill                     -0.19    0.09     0.23
                                           (0.06)   (0.16)
                                           [0.15]   [0.14]
STEM Interest                     0.62     0.04     0.09
                                           (0.02)   (0.07)
                                           [0.16]   [0.16]
Confidence in College Science     0.92     -0.04    -0.10
                                           (0.02)   (0.05)
                                           [0.11]   [0.06]
Stress                            0.12     0.07     0.17
                                           (0.03)   (0.07)
                                           [0.02]   [0.01]
Grades in Science Courses         2.80     -0.12    -0.29
                                           (0.07)   (0.16)
                                           [0.08]   [0.07]
Grades in Other Courses           3.14     -0.07    -0.18
                                           (0.02)   (0.06)
                                           [0.00]   [0.00]
Number of Observations            1819 for grades; 1417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
Table 6
Robustness Checks of Main ITT Results

(1): Control Group Complier Mean
(2): Main Results
(3): Robust SE
(4): p-value (permutation test)
(5): Excluding High School 56
(6): Including Imputation of Missing Outcome Variables
(7): Excluding Covariates
(8): Excluding High School 23
(9): Lee Lower Bound
(10): Lee Upper Bound
(11): 95% Confidence Interval from Lee Bounds
(12): Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                         (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)          (12)
Science Skill                   -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09, 0.51   2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                   0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03, 0.18   1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science   0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09, 0.10   2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                          0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05, 0.15   1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses       2.80    -0.12                   -0.06   -0.10   -0.07
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses         3.14    -0.07                   -0.07   -0.06   -0.03
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
For the two grade outcomes, the remaining columns are not applicable, as grades are based on transcripts
rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to
derive confidence interval for the treatment effect itself).
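The permutation test described for column (4) can be sketched in simplified form. The version below is an illustration only: it uses a raw difference in means as the test statistic and reshuffles treatment labels across the whole sample, whereas the estimates in the table come from regressions with covariates and School x Cohort fixed effects. The function name, the seed, and the example data are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo-assignments whose |effect| exceeds the observed |effect|."""

    def diff_in_means(flags):
        treated_values = [y for y, d in zip(outcomes, flags) if d]
        control_values = [y for y, d in zip(outcomes, flags) if not d]
        return mean(treated_values) - mean(control_values)

    observed = abs(diff_in_means(treated))
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    pseudo = list(treated)         # copy, so the caller's list is not reshuffled
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pseudo)        # keeps the number of treated units unchanged
        if abs(diff_in_means(pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical data with complete separation between the two groups.
y = [10, 11, 12, 13, 0, 1, 2, 3]
d = [1, 1, 1, 1, 0, 0, 0, 0]
print(permutation_p_value(y, d))
```

A design-faithful version would presumably reshuffle within each School x Cohort cell and re-estimate the full regression for every pseudo-assignment; the sketch omits both for brevity.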
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (U.S. Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students'
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students' confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends, to some extent, on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM) Thus the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training We also did not survey teachers regarding their Teach for America (TFA) experience
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers 12 We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants Reliability
metrics are high (inter-item reliability = 099 person reliability= 071) Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al
(2017) 13 Each year in the spring semester our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration 14 Unweighted results which are similar are contained in the Online Appendix tables However
if study participants who did not take the survey differ in unobserved ways then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias 15 Online Appendix Table 1 presents the balance between treatment and control group membersrsquo
characteristics before imputation of missing values (as described below) these results are very
similar to those shown in Table 2 16 Given the high degree of correlation between studentsrsquo 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores) we created one reading and math score for
each student that is the average of both scores or just the 8th grade score For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort we only use the studentrsquos 8th grade test scores as their 10th grade test scores would be
endogenous 17 The studyrsquos Principal Investigator personally handled the randomization of offers of
enrollment in the course so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues, but found it infeasible to obtain the necessary test score information
prior to randomization given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19 One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20 To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i \rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i \hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90% of students receiving a weight less than 2.0.
21 We impute with the full dataset, yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a "multiple imputation, then deletion" strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22 Detailed course-by-course impacts are available from the authors.
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25 In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27 Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29 To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30 This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
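The Benjamini-Hochberg step-up procedure cited in endnote 29 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the example p-values are hypothetical, and the p-values are not the study's estimates.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null is rejected under BH."""
    m = len(p_values)
    # Rank the hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (assumption: illustrative only).
example_p = [0.001, 0.012, 0.020, 0.30, 0.45, 0.60]
example_rejected = benjamini_hochberg(example_p)
```

Because the rule is step-up, a p-value that fails its own rank threshold is still rejected whenever a larger p-value passes a later threshold.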
Figure 1
Geographic Distribution of Participating Districts
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district's
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor's degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district's average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district's enrollment. The dashed line is a lowess curve created using
Stata's default settings and roughly shows the predicted test score as a function of the
neighborhood's SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects.
Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age Under 30                                        0.407           0.160
Age 30-49                                           0.432           0.553
Age 50 or over                                      0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <= 2                            0.290           0.085
Years of Experience <= 5                            0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master's Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd/; EDFacts,
https://www2.ed.gov/about/inits/ed/edfacts/index.html. "Others" in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey,
https://nces.ed.gov/surveys/sass/. "Others" in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Sample.
(1), (4) Control Group Mean
(2), (5) Difference Between Treated and Controls
(3), (6) Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                      (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                  16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                  0.38     0.08     0.25     0.44     0.07     0.30
                                                        (0.04)   (0.10)            (0.05)   (0.16)
                                                        [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                               0.29     0.10     0.18     0.36     0.09     0.17
                                                        (0.03)   (0.12)            (0.04)   (0.17)
                                                        [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                           3.16     0.05     0.20     3.23     0.06     0.13
                                                        (0.03)   (0.08)            (0.03)   (0.10)
                                                        [0.14]   [0.02]            [0.06]   [0.20]
Female                                           0.59     0.00     0.10     0.61    -0.01     0.11
                                                        (0.03)   (0.06)            (0.04)   (0.07)
                                                        [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                   0.12     0.02     0.10     0.12     0.03     0.10
                                                        (0.02)   (0.05)            (0.01)   (0.07)
                                                        [0.27]   [0.06]            [0.07]   [0.12]
Black                                            0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                        (0.02)   (0.06)            (0.02)   (0.05)
                                                        [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial        0.31     0.01     0.05     0.33     0.01     0.05
                                                        (0.02)   (0.06)            (0.02)   (0.07)
                                                        [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                         0.02     0.00    -0.01     0.01     0.00    -0.01
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.93]   [0.24]            [0.57]    [0.5]
Gifted                                           0.13     0.03     0.00     0.14     0.02     0.01
                                                        (0.02)   (0.05)            (0.02)   (0.09)
                                                        [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                         0.05     0.01     0.02     0.04     0.01     0.04
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch         0.53     0.01     0.02     0.51     0.01     0.07
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home       0.34     0.02     0.03     0.35     0.01     0.04
                                                        (0.02)   (0.07)            (0.02)   (0.07)
                                                        [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses            0.79     0.00     0.09     0.79     0.02     0.05
                                                        (0.02)   (0.04)            (0.02)   (0.05)
                                                        [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                          1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3): Full Sample. Columns (4)-(6): Survey Respondents.
(1), (4) Control Group Mean
(2), (5) ITT
(3), (6) LATE

Outcome                                      (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment              0.19     0.38              0.24     0.39
                                                   (0.05)                     (0.06)
                                                   [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                0.03     0.04     0.11     0.03     0.04     0.10
                                                   (0.01)   (0.01)            (0.01)   (0.01)
                                                   [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                    0.13     0.04     0.11     0.14     0.04     0.10
                                                   (0.01)   (0.02)            (0.01)   (0.02)
                                                   [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                    0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                   (0.01)   (0.02)            (0.01)   (0.02)
                                                   [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                        0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                                   (0.01)   (0.02)            (0.01)   (0.03)
                                                   [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                           0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                   (0.01)   (0.02)            (0.01)   (0.02)
                                                   [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                               0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                                   (0.01)   (0.03)            (0.01)   (0.03)
                                                   [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                     1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
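The survey weights referenced in these notes are the inverse of a fitted probit response probability (endnote 20). A minimal sketch of that last step follows, assuming the fitted probit index for each respondent is already available; the function name and the index values are hypothetical and are not estimates from the study.

```python
from statistics import NormalDist


def inverse_probability_weight(linear_index):
    """Weight = 1 / Phi(index), where Phi is the standard normal CDF."""
    return 1.0 / NormalDist().cdf(linear_index)


# Hypothetical fitted probit indices (school-by-cohort effect plus covariate
# index) for three survey respondents. Assumption: illustrative values only.
fitted_indices = [2.0, 0.5, -0.5]
weights = [inverse_probability_weight(z) for z in fitted_indices]
```

A respondent whose characteristics predict a low chance of completing the survey (a low index) receives a larger weight, and no weight can fall below 1.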
Table 4
Treatment Contrast (Composite Variables)

(1) Control Group Complier Mean
(2) ITT
(3) LATE

Outcome                                                (1)      (2)      (3)
Academically Challenging Curriculum                  -0.33     0.31     0.80
                                                             (0.10)   (0.24)
                                                             [0.00]   [0.00]
Project-Based Independent Classroom Activities       -0.06     0.13     0.33
                                                             (0.07)   (0.17)
                                                             [0.07]   [0.06]
Integrated Use of Technology                         -0.11     0.11     0.28
                                                             (0.08)   (0.19)
                                                             [0.19]   [0.14]
Number of Observations                               1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
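The composite construction described in these notes (rescale each item to the 0-1 range, average, then standardize) can be sketched as below. The item coding of 1 to 5, the use of the population standard deviation, and all names and responses are assumptions made for illustration; the notes do not specify them.

```python
from statistics import mean, pstdev


def rescale_likert(response, n_categories=5):
    """Map a response coded 1..n_categories onto evenly spaced values in [0, 1]."""
    return (response - 1) / (n_categories - 1)


def composite_scores(responses_by_student):
    """Average the rescaled items per student, then standardize across students."""
    averages = [mean(rescale_likert(r) for r in items) for items in responses_by_student]
    center = mean(averages)
    spread = pstdev(averages)  # assumption: population SD; the source does not say
    return [(a - center) / spread for a in averages]


# Hypothetical responses (1 = strongly disagree ... 5 = strongly agree) to three items.
students = [[5, 4, 5], [3, 3, 2], [1, 2, 1], [4, 4, 4]]
scores = composite_scores(students)
```

After standardization the composite has mean 0 and standard deviation 1 in the sample used, so ITT and LATE estimates on it read as standard-deviation units.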
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

(1) Control Group Complier Mean
(2) ITT
(3) LATE

Outcome                              (1)      (2)      (3)
Science Skill                      -0.19     0.09     0.23
                                           (0.06)   (0.16)
                                           [0.15]   [0.14]
STEM Interest                       0.62     0.04     0.09
                                           (0.02)   (0.07)
                                           [0.16]   [0.16]
Confidence in College Science       0.92    -0.04    -0.10
                                           (0.02)   (0.05)
                                           [0.11]   [0.06]
Stress                              0.12     0.07     0.17
                                           (0.03)   (0.07)
                                           [0.02]   [0.01]
Grades in Science Courses           2.80    -0.12    -0.29
                                           (0.07)   (0.16)
                                           [0.08]   [0.07]
Grades in Other Courses             3.14    -0.07    -0.18
                                           (0.02)   (0.06)
                                           [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
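The ITT and LATE columns are linked through the first stage in Table 3. As a rough check, and assuming the LATE is the usual instrumental-variables (Wald) ratio of the ITT effect to the first-stage effect on AP enrollment, the reported values can be approximately recovered from the rounded table entries. The function name is hypothetical, and because the inputs are rounded the ratio will not match every reported LATE exactly.

```python
def wald_late(itt_effect, first_stage):
    """Wald ratio: ITT effect on the outcome divided by the ITT effect on take-up."""
    return itt_effect / first_stage


# First-stage ITT on AP enrollment among survey respondents (Table 3, column 5).
first_stage_survey = 0.39

# ITT effects from Table 5, column (2).
late_science_skill = wald_late(0.09, first_stage_survey)
late_confidence = wald_late(-0.04, first_stage_survey)
```

Both ratios round to the LATE values reported in Table 5 for science skill and for confidence in college science.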
Table 6
Robustness Checks of Main ITT Results

(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                            (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)            (11)  (12)
Science Skill                    -0.19    0.09                    0.10    0.11    0.20    0.07    0.03    0.39   -0.09 to 0.51   2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                     0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12   -0.03 to 0.18   1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science     0.92   -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06    0.05   -0.09 to 0.10   2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                            0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11   -0.05 to 0.15   1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses         2.80   -0.12                   -0.06   -0.10   -0.07
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses           3.14   -0.07                   -0.07   -0.06   -0.03
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
For the two grades rows, columns (8)-(12) are not applicable, as grades are based on transcripts rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to
derive confidence interval for the treatment effect itself).
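The permutation test described for column (4) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a raw difference in means instead of the regression-adjusted estimate from Equation 1, it reshuffles treatment across the whole sample instead of within school-by-cohort groups, and the function names and data are hypothetical.

```python
import random


def difference_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(difference_in_means(outcomes, treated))
    exceed = 0
    for _ in range(n_permutations):
        pseudo = treated[:]      # copy, then reshuffle the treatment labels
        rng.shuffle(pseudo)
        if abs(difference_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical data: the treated units have much larger outcomes than controls.
example_outcomes = [10, 11, 12, 13, 0, 1, 2, 3]
example_treated = [1, 1, 1, 1, 0, 0, 0, 0]
example_p = permutation_p_value(example_outcomes, example_treated)
```

Shuffling the labels keeps the number of pseudo-treated units fixed, so each draw mimics a re-run of the original randomization under the null of no effect.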
1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students'
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3 The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board review each syllabus and
grant permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5 Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students' confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7 We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consentassent forms and data 8 In our original research design we planned to recruit 40 schools with up to two cohorts of
students which would have powered the study to detect effect sizes smaller than those detected
here We faced several challenges in recruiting schools to participate even with the monetary
incentives Some schools were uncomfortable with randomization across classrooms while
others were simply unable to commit to offering a new course the following school year 9 Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered We also made some assignments on a rolling basis as additional
consentassent forms were submitted We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate As these students did not participate we do not have permission to obtain
information on their characteristics (eg via transcripts) and for most schools we do not know
the number of such students 10 Participating districts include Anaheim Union High School District California East Side
Union High School District California Lynwood Unified School District California Jefferson
Parish Louisiana Education Achievement Authority Michigan Charlotte-Mecklenburg
Schools North Carolina Winston-SalemForsyth Schools North Carolina Cranston Public
Schools Rhode Island El Paso Independent School District Texas Metropolitan Nashville
Public Schools Tennessee and Richmond Public Schools Virginia 11 Though the study teachers are less likely to hold a masterrsquos degree many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM) Thus the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training We also did not survey teachers regarding their Teach for America (TFA) experience
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers 12 We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants Reliability
metrics are high (inter-item reliability = 099 person reliability= 071) Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al
(2017) 13 Each year in the spring semester our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration 14 Unweighted results which are similar are contained in the Online Appendix tables However
if study participants who did not take the survey differ in unobserved ways then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias 15 Online Appendix Table 1 presents the balance between treatment and control group membersrsquo
characteristics before imputation of missing values (as described below) these results are very
similar to those shown in Table 2 16 Given the high degree of correlation between studentsrsquo 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores) we created one reading and math score for
each student that is the average of both scores or just the 8th grade score For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort we only use the studentrsquos 8th grade test scores as their 10th grade test scores would be
endogenous 17 The studyrsquos Principal Investigator personally handled the randomization of offers of
enrollment in the course so the lack of balance is simply due to unlucky randomization rather
32
than manipulation by school administrators We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization given the schoolsrsquo needs for speed in deciding whether the student was
allowed to register for the new class We added an entire planning year to our study design to
avoid these issues but many schools were unable to plan ahead 18 To evaluate whether this non-compliance is ignorable we conducted the test recommended by
Huber (2013) for each of our six main outcomes (ie those subsequently shown in Table 5) We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable 19 One district in our study offered both AP courses Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology To
account for the two courses offered we treat the school as two separate groups School-
Chemistry and School-Biology For those students who were not offered an AP course we
randomly assign them to one of two control groups proportional to the number of treated
students who chose each course For example if 60 of the treated students chose Biology then
we randomly assign 60 of the control students to the School-Biology control group In Section
VC we show that our results are also robust to dropping this school entirely 20 To compute these weights we first estimate the parameters of the following equation using a
probit regression Pr(CompletedSurveyij = 1) = Φ(microj + Xiρ+ ϵij) where CompletedSurveyij equals 1
if student i in school by cohort j completed any part of the end-of-year survey Xi is the same
vector of pre-treatment characteristics in the previous equations microj are school by cohort fixed
effects and Φ() is the cumulative normal distribution function The results of this regression are
included in Online Appendix Table 2 Students who had higher pre-treatment grades Black
students those who were not disabled and those who took prerequisite courses were more likely
to complete the survey The inverse probability weight is computed as 1Φ(microj + Xi ) and gives
more weight in the regression to study participants who completed the survey and yet had pre-
study characteristics that were similar to those study participants who did not complete the
survey These weights range from 10 to 155 with a median (mean) weight of i 12 (15) and
with 90 of students receiving a weight less than 20 21 We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable This follows a multiple imputation then deletion strategy
suggested by Hippel (2007) which improves efficiency while protecting against problematic
imputed outcome values As a robustness check Section VC provides results including
imputation of missing outcome variables 22 Detailed course-by-course impacts are available from the authors 23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4 24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet, we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative, inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination
effects that might render our estimated effects smaller. A research design with randomization
both across and within schools would allow for estimation of spillover effects, but such a design
was infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed
surveys from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
"suggestive" (0.16) to "noisy" (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
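The Benjamini-Hochberg correction mentioned in note 29 can be illustrated with a minimal sketch. This is not the authors' code: the function name and the example p-values are hypothetical, and the sketch assumes the standard step-up rule (reject the hypotheses with the k smallest p-values, where k is the largest rank whose p-value is at most rank / m times the critical level).

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where the null is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis whose rank is at most that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six outcomes (not taken from the paper).
example_p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60]
decisions = benjamini_hochberg(example_p_values)
```

In this hypothetical example only the two smallest p-values survive the correction, even though four of the six fall below 0.05 on their own.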
Figure 2
Participating Districts Neighborhood Socioeconomic Status and School Test Scores
Notes: Data from Reardon, Kalogrides, and Shores (2016). Each circle represents one school
district in the United States. X-axis is the Standardized Socioeconomic Status of the district's
neighborhood, defined as the first principal component factor score based on measures of median
income, percent with a bachelor's degree or higher, poverty rate, SNAP rate, single mother headed
household rate, and unemployment rate. Y-axis is the district's average test score in grade
equivalents, based on the averaged spring math and English scores for students in grades 3-8 for
2009-2013, with the expected level of achievement standardized to zero. The size of each circle
is proportional to the district's enrollment. The dashed line is a lowess curve created using
Stata's default settings and roughly shows the predicted test score as a function of the
neighborhood's SES.
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes: Quantiles are conditional on pretreatment characteristics and school by cohort fixed
effects. Corresponding OLS estimate shown by the dashed horizontal line. Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students.
Results are weighted by the inverse probability of completing the survey.
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers

Panel A: Schools                                      Participating   Others
Average Enrollment                                    1,409           723
Free or Reduced-Price Lunch                           0.700           0.438
Asian                                                 0.055           0.050
Black                                                 0.349           0.154
Hispanic                                              0.410           0.221
White                                                 0.164           0.537
Adjusted Cohort Graduation Rate                       0.843           0.802
District Instruction Expenditures Per Pupil           $6,561          $5,636
District Student Services Expenditures Per Pupil      $3,787          $3,385

Panel B: Teachers                                     Participating   Others
Age: Under 30                                         0.407           0.160
Age: 30-49                                            0.432           0.553
Age: 50 or over                                       0.161           0.287
Female                                                0.630           0.536
Hispanic or Latino                                    0.111           0.051
Race: American Indian or Alaska Native                0.000           0.009
Race: Asian American                                  0.111           0.041
Race: Black                                           0.111           0.060
Race: Native Hawaiian or other Pacific Islander       0.000           0.004
Race: White                                           0.778           0.896
Years of Experience                                   10.3            13.2
Years of Experience <= 2                              0.290           0.085
Years of Experience <= 5                              0.481           0.234
Hold a Teaching Certificate                           0.926           0.945
Undergraduate Major in STEM                           0.944           0.747
Single Subject Credential in Science                  0.630           0.823
Master's Degree or Higher                             0.356           0.615
Previously Taught AP Course                           0.469           NA
Previously Taught AP, IB, or Honors Course            0.796           NA
Number of Professional Development Trainings          3.09            NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data, https://nces.ed.gov/ccd, and EDFacts,
https://www2.ed.gov/about/inits/ed/edfacts/index.html. "Others" in Panel A refers to other public
high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the
Teacher Survey (N = 27) and 2011-12 Schools and Staffing Survey,
https://nces.ed.gov/surveys/sass. "Others" in Panel B refers to public and private high school
teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns: (1) and (4) Control Group Mean; (2) and (5) Difference Between Treated and Controls;
(3) and (6) Difference Between Control Group Non-Compliers and Compliers.

                                            Full Sample                Survey Sample
Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed
effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in
brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns: (1) and (4) Control Group Mean; (2) and (5) ITT; (3) and (6) LATE.

                                            Full Sample                Survey Respondents
Outcome                                     (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment              0.19     0.38              0.24     0.39
                                                     (0.05)                     (0.06)
                                                     [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                0.03     0.04     0.11     0.03     0.04     0.10
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                    0.13     0.04     0.11     0.14     0.04     0.10
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                    0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                        0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                           0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                               0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                     (0.01)   (0.03)            (0.01)   (0.03)
                                                     [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                      1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
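The inverse probability weights used for the survey-respondent columns (described in note 20) can be sketched as follows. This is an illustration only, not the authors' code: it assumes the probit index (the fitted school by cohort effect plus the covariate term) has already been estimated, and the index values below are hypothetical.

```python
from statistics import NormalDist


def inverse_probability_weight(fitted_index):
    """Weight = 1 / Phi(fitted_index), with Phi the standard normal CDF."""
    return 1.0 / NormalDist().cdf(fitted_index)


# Hypothetical fitted probit index values for three survey respondents.
fitted_indices = [2.0, 0.5, -1.0]
weights = [inverse_probability_weight(z) for z in fitted_indices]
```

A respondent who was very likely to complete the survey gets a weight close to 1, while a respondent who resembles the non-respondents (a low predicted completion probability) gets a larger weight.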
Table 4
Treatment Contrast (Composite Variables)

                                                  (1)             (2)      (3)
                                                  Control Group
Outcome                                           Complier Mean   ITT      LATE
Academically Challenging Curriculum               -0.33           0.31     0.80
                                                                  (0.10)   (0.24)
                                                                  [0.00]   [0.00]
Project-Based Independent Classroom Activities    -0.06           0.13     0.33
                                                                  (0.07)   (0.17)
                                                                  [0.07]   [0.06]
Integrated Use of Technology                      -0.11           0.11     0.28
                                                                  (0.08)   (0.19)
                                                                  [0.19]   [0.14]
Number of Observations                            1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
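The composite-variable construction described in the notes to Table 4 can be sketched as below. The function names and the example responses are hypothetical, and standardizing with the population standard deviation is an assumption; the notes do not say which standard deviation or reference sample was used.

```python
from statistics import mean, pstdev


def rescale(rank, n_categories):
    """Map an ordinal rank (0 = lowest, n_categories - 1 = highest) onto [0, 1]."""
    return rank / (n_categories - 1)


def composite_scores(responses, n_categories):
    """Average each student's rescaled items, then standardize across students."""
    averaged = [mean(rescale(r, n_categories) for r in items) for items in responses]
    center, spread = mean(averaged), pstdev(averaged)
    return [(a - center) / spread for a in averaged]


# Hypothetical data: three students, two five-point items each
# (0 = strongly disagree ... 4 = strongly agree).
scores = composite_scores([[4, 4], [2, 2], [0, 0]], n_categories=5)
```

With five categories, the converted values are 0.0, 0.25, 0.5, 0.75, and 1.0, and the resulting composite has mean 0 and standard deviation 1 across the students in the example.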
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                                  (1)             (2)      (3)
                                                  Control Group
Outcome                                           Complier Mean   ITT      LATE
Science Skill                                     -0.19           0.09     0.23
                                                                  (0.06)   (0.16)
                                                                  [0.15]   [0.14]
STEM Interest                                     0.62            0.04     0.09
                                                                  (0.02)   (0.07)
                                                                  [0.16]   [0.16]
Confidence in College Science                     0.92            -0.04    -0.10
                                                                  (0.02)   (0.05)
                                                                  [0.11]   [0.06]
Stress                                            0.12            0.07     0.17
                                                                  (0.03)   (0.07)
                                                                  [0.02]   [0.01]
Grades in Science Courses                         2.80            -0.12    -0.29
                                                                  (0.07)   (0.16)
                                                                  [0.08]   [0.07]
Grades in Other Courses                           3.14            -0.07    -0.18
                                                                  (0.02)   (0.06)
                                                                  [0.00]   [0.00]
Number of Observations                            1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability
to complete a college science course, or = 0 if somewhat not confident or not at all confident.
Stress = 1 if most recent science course had strong negative or negative impact on physical or
emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science
and other courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
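The relationship between the ITT and LATE columns can be illustrated with a simple ratio. The sketch below assumes the LATE is approximately the ITT effect divided by the first-stage effect on AP course enrollment (0.39 among survey respondents in Table 3); the paper's LATE estimates come from variants of Equation (1), so this ratio is an approximation based on rounded table values, not the authors' procedure. The function name is hypothetical.

```python
def ratio_late(itt_effect, first_stage):
    """Approximate the LATE as the ITT effect scaled by the first-stage effect."""
    return itt_effect / first_stage


# Science skill: ITT of 0.09 (Table 5) and first stage of 0.39 (Table 3).
late_science_skill = ratio_late(0.09, 0.39)
print(round(late_science_skill, 2))  # 0.23, matching the LATE reported in Table 5
```

Because the published estimates are rounded to two decimals, the same ratio reproduces some of the other LATE entries only approximately.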
Table 6
Robustness Checks of Main ITT Results

Columns: (1) Control Group Complier Mean; (2) Main Results; (3) Robust SE; (4) p-value
(permutation test); (5) Excluding High School 56; (6) Including Imputation of Missing Outcome
Variables; (7) Excluding Covariates; (8) Excluding High School 23; (9) Lee Lower Bound;
(10) Lee Upper Bound; (11) 95% Confidence Interval from Lee Bounds; (12) Ratio of 95% CI in
(11) to 95% CI in (7).

Outcome                        (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)          (12)
Science Skill                  -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09, 0.51   2.0
                                       (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                       [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                  0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03, 0.18   1.9
                                       (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                       [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science  0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09, 0.10   2.0
                                       (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                       [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                         0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05, 0.15   1.6
                                       (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                       [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses      2.80    -0.12                   -0.06   -0.10   -0.07   Not applicable, as grades are based on
                                       (0.07)  (0.04)          (0.00)  (0.00)  (0.00)  transcripts rather than student survey
                                       [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses        3.14    -0.07                   -0.07   -0.06   -0.03   Not applicable, as grades are based on
                                       (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  transcripts rather than student survey
                                       [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3)
reports robust standard errors (rather than standard errors clustered by School x Cohort) and
corresponding p-values. Column (4) reports the results of a permutation test, whereby a pseudo
treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations
where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of
the estimated treatment effect shown in column (2). Column (5) reestimates the main results with
high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome
variable is missing and multiply imputed. Column (7) reestimates the main results excluding high
school 23, which had a low rate of student survey completion and where surveys from cohort 1
were lost. Column (8) reestimates the main results excluding the vector of pre-treatment
covariates ($X_i$) from Equation 1. Columns (9) and (10) show the lower and upper bound
estimate based on Lee (2009) (i.e., trimming off those treatment observations with the
highest/lowest values of the outcome until the survey response rates are equal across treatment
and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying
Imbens and Manski's (2004) method to derive confidence interval for the treatment effect itself).
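The permutation test described for column (4) can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses a plain difference in means as the "estimated treatment effect" (the paper's estimates come from regressions with fixed effects and covariates), it reshuffles the treatment indicator across all students rather than within schools, and the data and function names are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    treat = [y for y, d in zip(outcomes, treated) if d]
    control = [y for y, d in zip(outcomes, treated) if not d]
    return mean(treat) - mean(control)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    pseudo = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pseudo)  # randomly reassign the pseudo treatment
        if abs(diff_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical data with perfect separation between the two groups.
example_outcomes = [10, 11, 12, 13, 0, 1, 2, 3]
example_treated = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = permutation_p_value(example_outcomes, example_treated)
```

In this extreme example no reshuffled assignment can produce a larger absolute difference than the observed one, so the permutation p-value is 0.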
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (US Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students'
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students' confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends, to some extent, on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools, we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
21
Figure 3
Intent to Treat Effect of AP Course on Science Skill by Conditional Quantile
Notes Quantiles are conditional on pretreatment characteristics and school by cohort fixed effects
Corresponding OLS estimate shown by the dashed horizontal line Science skill has been
standardized to have a mean of 0 and SD of 1 for the full sample of participating students
Results are weighted by the inverse probability of completing the survey
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School Science Teachers

Panel A: Schools                                    Participating   Others
Average Enrollment                                  1,409           723
Free or Reduced-Price Lunch                         0.700           0.438
Asian                                               0.055           0.050
Black                                               0.349           0.154
Hispanic                                            0.410           0.221
White                                               0.164           0.537
Adjusted Cohort Graduation Rate                     0.843           0.802
District Instruction Expenditures Per Pupil         $6,561          $5,636
District Student Services Expenditures Per Pupil    $3,787          $3,385

Panel B: Teachers                                   Participating   Others
Age Under 30                                        0.407           0.160
Age 30-49                                           0.432           0.553
Age 50 or over                                      0.161           0.287
Female                                              0.630           0.536
Hispanic or Latino                                  0.111           0.051
Race: American Indian or Alaska Native              0.000           0.009
Race: Asian American                                0.111           0.041
Race: Black                                         0.111           0.060
Race: Native Hawaiian or other Pacific Islander     0.000           0.004
Race: White                                         0.778           0.896
Years of Experience                                 10.3            13.2
Years of Experience <= 2                            0.290           0.085
Years of Experience <= 5                            0.481           0.234
Hold a Teaching Certificate                         0.926           0.945
Undergraduate Major in STEM                         0.944           0.747
Single Subject Credential in Science                0.630           0.823
Master's Degree or Higher                           0.356           0.615
Previously Taught AP Course                         0.469           NA
Previously Taught AP, IB, or Honors Course          0.796           NA
Number of Professional Development Trainings        3.09            NA
  in the Past 5 Years (0-5)

Notes: Panel A source is the 2013-14 Common Core Data (https://nces.ed.gov/ccd) and EDFacts (https://www2.ed.gov/about/inits/ed/edfacts/index.html). "Others" in Panel A refers to other public high schools in the US. Adjusted Cohort Graduation Rate is the percentage of the students in a 9th grade cohort who graduate within four years (McFarland et al. 2018). Panel B source is the Teacher Survey (N = 27) and the 2011-12 Schools and Staffing Survey (https://nces.ed.gov/surveys/sass). "Others" in Panel B refers to public and private high school teachers in the US. High school science teachers are defined as teachers of grades 9-12 whose main teaching assignment is in the natural sciences.
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1) and (4): Control Group Mean
Columns (2) and (5): Difference Between Treated and Controls
Columns (3) and (6): Difference Between Control Group Non-Compliers and Compliers

                                            Full Sample                Survey Sample
Pre-Treatment Characteristic                (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade             16.6     -0.03    -0.07    16.6     -0.01    -0.01
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                             0.38     0.08     0.25     0.44     0.07     0.30
                                                     (0.04)   (0.10)            (0.05)   (0.16)
                                                     [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                          0.29     0.10     0.18     0.36     0.09     0.17
                                                     (0.03)   (0.12)            (0.04)   (0.17)
                                                     [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                      3.16     0.05     0.20     3.23     0.06     0.13
                                                     (0.03)   (0.08)            (0.03)   (0.10)
                                                     [0.14]   [0.02]            [0.06]   [0.20]
Female                                      0.59     0.00     0.10     0.61     -0.01    0.11
                                                     (0.03)   (0.06)            (0.04)   (0.07)
                                                     [0.99]   [0.10]            [0.73]   [0.12]
Asian American                              0.12     0.02     0.10     0.12     0.03     0.10
                                                     (0.02)   (0.05)            (0.01)   (0.07)
                                                     [0.27]   [0.06]            [0.07]   [0.12]
Black                                       0.32     -0.02    -0.06    0.27     0.00     -0.05
                                                     (0.02)   (0.06)            (0.02)   (0.05)
                                                     [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial   0.31     0.01     0.05     0.33     0.01     0.05
                                                     (0.02)   (0.06)            (0.02)   (0.07)
                                                     [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                    0.02     0.00     -0.01    0.01     0.00     -0.01
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                      0.13     0.03     0.00     0.14     0.02     0.01
                                                     (0.02)   (0.05)            (0.02)   (0.09)
                                                     [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                    0.05     0.01     0.02     0.04     0.01     0.04
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch    0.53     0.01     0.02     0.51     0.01     0.07
                                                     (0.02)   (0.07)            (0.03)   (0.09)
                                                     [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home  0.34     0.02     0.03     0.35     0.01     0.04
                                                     (0.02)   (0.07)            (0.02)   (0.07)
                                                     [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses       0.79     0.00     0.09     0.79     0.02     0.05
                                                     (0.02)   (0.04)            (0.02)   (0.05)
                                                     [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                      1,819                      1,417

Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1) and (4): Control Group Mean
Columns (2) and (5): ITT
Columns (3) and (6): LATE

                                            Full Sample                Survey Respondents
Outcome                                     (1)      (2)      (3)      (4)      (5)      (6)
AP Treatment Course Enrollment              0.19     0.38              0.24     0.39
                                                     (0.05)                     (0.06)
                                                     [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                0.03     0.04     0.11     0.03     0.04     0.10
                                                     (0.01)   (0.01)            (0.01)   (0.01)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                    0.13     0.04     0.11     0.14     0.04     0.10
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                    0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                        0.25     -0.01    -0.03    0.25     -0.01    -0.03
                                                     (0.01)   (0.02)            (0.01)   (0.03)
                                                     [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                           0.06     -0.01    -0.02    0.06     -0.01    -0.02
                                                     (0.01)   (0.02)            (0.01)   (0.02)
                                                     [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                               0.62     -0.03    -0.09    0.61     -0.03    -0.07
                                                     (0.01)   (0.03)            (0.01)   (0.03)
                                                     [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                      1,819                      1,417

Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation (1). Course-taking information was collected from student transcripts. Control Group Mean uses the full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control group members who complied with their assignment (i.e., those who did not take the AP Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
Table 4
Treatment Contrast (Composite Variables)

Column (1): Control Group Complier Mean
Column (2): ITT
Column (3): LATE

Outcome                                         (1)      (2)      (3)
Academically Challenging Curriculum             -0.33    0.31     0.80
                                                         (0.10)   (0.24)
                                                         [0.00]   [0.00]
Project-Based Independent Classroom Activities  -0.06    0.13     0.33
                                                         (0.07)   (0.17)
                                                         [0.07]   [0.06]
Integrated Use of Technology                    -0.11    0.11     0.28
                                                         (0.08)   (0.19)
                                                         [0.19]   [0.14]
Number of Observations                          1,417

Notes: To construct these composite variables, we first converted the values on each component variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between 0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by the inverse probability of completing the survey. Online Appendix Table 5 provides the list of component variables. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
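
The construction of the composite variables described in the Table 4 notes can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 1-to-5 response coding, the sample responses, and the use of the population standard deviation are all hypothetical, and the sketch ignores the survey weights used in the paper.

```python
from statistics import mean, pstdev

def rescale_likert(value, n_categories):
    """Map a response coded 1..n_categories onto [0, 1], evenly spaced."""
    return (value - 1) / (n_categories - 1)

def composite_scores(responses, n_categories=5):
    """responses: one list of item codes per student (1 = lowest category)."""
    # Step 1: rescale each component item, then average within each student.
    raw = [mean(rescale_likert(v, n_categories) for v in items)
           for items in responses]
    # Step 2: standardize across students to mean 0 and SD 1.
    mu, sd = mean(raw), pstdev(raw)
    return [(r - mu) / sd for r in raw]

# Hypothetical responses for three students on three component items.
students = [[5, 4, 5], [3, 3, 2], [1, 2, 1]]
scores = composite_scores(students)
```

With five response categories, the rescaled values are 0.0, 0.25, 0.5, 0.75, and 1.0, which matches the "evenly spaced" rule in the notes.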
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

Column (1): Control Group Complier Mean
Column (2): ITT
Column (3): LATE

Outcome                                     (1)      (2)      (3)
Science Skill                               -0.19    0.09     0.23
                                                     (0.06)   (0.16)
                                                     [0.15]   [0.14]
STEM Interest                               0.62     0.04     0.09
                                                     (0.02)   (0.07)
                                                     [0.16]   [0.16]
Confidence in College Science               0.92     -0.04    -0.10
                                                     (0.02)   (0.05)
                                                     [0.11]   [0.06]
Stress                                      0.12     0.07     0.17
                                                     (0.03)   (0.07)
                                                     [0.02]   [0.01]
Grades in Science Courses                   2.80     -0.12    -0.29
                                                     (0.07)   (0.16)
                                                     [0.08]   [0.07]
Grades in Other Courses                     3.14     -0.07    -0.18
                                                     (0.02)   (0.06)
                                                     [0.00]   [0.00]
Number of Observations                      1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
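
As a rough check on how the ITT and LATE columns relate, the LATE can be approximated by dividing the ITT effect by the first-stage effect on AP enrollment (the Wald ratio). The sketch below is an assumption-laden illustration, not the authors' estimator: the paper estimates variants of Equation (1) with covariates and fixed effects, and the function name is hypothetical. The inputs are the rounded values reported in Tables 3 and 5.

```python
def wald_late(itt_effect, first_stage):
    """Rescale an intent-to-treat effect by the first-stage enrollment effect."""
    if first_stage <= 0:
        raise ValueError("first stage must be positive")
    return itt_effect / first_stage

# First stage for survey respondents (Table 3, column 5): 0.39.
late_skill = wald_late(0.09, 0.39)        # science skill ITT from Table 5
late_confidence = wald_late(-0.04, 0.39)  # confidence ITT from Table 5
```

Because the inputs are rounded, the ratios only approximately reproduce the reported LATE values (about 0.23 for science skill and about -0.10 for confidence in college science).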
Table 6
Robustness Checks of Main ITT Results

Column (1): Control Group Complier Mean
Column (2): Main Results
Column (3): Robust SE
Column (4): p-value (permutation test)
Column (5): Excluding High School 56
Column (6): Including Imputation of Missing Outcome Variables
Column (7): Excluding Covariates
Column (8): Excluding High School 23
Column (9): Lee Lower Bound
Column (10): Lee Upper Bound
Column (11): 95% Confidence Interval from Lee Bounds
Column (12): Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                       (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)          (12)
Science Skill                 -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09, 0.51   2.0
                                      (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                      [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                 0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03, 0.18   1.9
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                      [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science 0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09, 0.10   2.0
                                      (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                      [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                        0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05, 0.15   1.6
                                      (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                      [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses     2.80    -0.12                   -0.06   -0.10   -0.07   NA      NA      NA      NA            NA
                                      (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                      [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses       3.14    -0.07                   -0.07   -0.06   -0.03   NA      NA      NA      NA            NA
                                      (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                      [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]

NA: Not applicable, as grades are based on transcripts rather than the student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi) from Equation 1. Columns (9) and (10) show the lower and upper bound estimates based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to derive a confidence interval for the treatment effect itself).
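
The permutation test described for column (4) can be illustrated with a minimal sketch. It rests on simplifying assumptions that differ from the paper: the effect here is a raw difference in means rather than a regression estimate with covariates and School x Cohort fixed effects, labels are shuffled across the whole sample, and the function names and seed are hypothetical.

```python
import random

def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)

def permutation_p_value(outcomes, treated, n_perm=1000, seed=0):
    """Share of pseudo-treatment effects whose absolute value exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # randomly reassign the pseudo treatment
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_perm

# Hypothetical data: perfectly separated groups, so no shuffle can exceed the gap.
p_separated = permutation_p_value([0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
                                  [0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
```

Shuffling keeps the number of treated units fixed, so each pseudo assignment has the same treated share as the actual one.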
1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (US Department of Education 2014).
2 A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3 The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.
4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).
5 Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students' confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends to some extent on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these prerequisites.
7 We offered each participating school $10,000 to pay for teacher attendance at a one-week training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools for the staff time required for study administration efforts. We also offered $1,000 compensation for an individual selected by the school to serve as a liaison between the study team and the school to assist in collecting consent/assent forms and data.
8 In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected here. We faced several challenges in recruiting schools to participate, even with the monetary incentives. Some schools were uncomfortable with randomization across classrooms, while others were simply unable to commit to offering a new course the following school year.
9 Most of the assignments were made in one batch in the spring of the year prior to when the course would be offered. We also made some assignments on a rolling basis as additional consent/assent forms were submitted. We have no information on the students who were deemed eligible by the school to take the new AP science course but who did not sign the consent form to participate. As these students did not participate, we do not have permission to obtain information on their characteristics (e.g., via transcripts), and for most schools we do not know the number of such students.
10 Participating districts include Anaheim Union High School District, California; East Side Union High School District, California; Lynwood Unified School District, California; Jefferson Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11 Though the study teachers are less likely to hold a master's degree, many of the graduate degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study teachers are less likely to have a graduate degree but not necessarily less likely to have STEM training. We also did not survey teachers regarding their Teach for America (TFA) experience, but it is possible that the relatively high share of STEM undergraduate degrees could be driven by higher representation of TFA teachers.
12 We developed the instrument over a two-year period and pilot tested it three times (the last pilot test included 140 students) prior to administering the tool to study participants. Reliability metrics are high (inter-item reliability = 0.99, person reliability = 0.71). Further documentation of the development of the assessment instrument in the survey can be found in Seeratan et al. (2017).
13 Each year in the spring semester, our team administered and collected the participant surveys during the school day in classrooms set aside for survey administration.
14 Unweighted results, which are similar, are contained in the Online Appendix tables. However, if study participants who did not take the survey differ in unobserved ways, then our reweighting based on observed characteristics will not eliminate nonignorable nonresponse bias.
15 Online Appendix Table 1 presents the balance between treatment and control group members' characteristics before imputation of missing values (as described below); these results are very similar to those shown in Table 2.
16 Given the high degree of correlation between students' 8th and 10th grade scores (and the fact that some students did not have 10th grade scores), we created one reading and math score for each student that is the average of both scores or just the 8th grade score. For the 23 participating students who were in 10th grade during the year in which the AP course was offered to their cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be endogenous.
17 The study's Principal Investigator personally handled the randomization of offers of enrollment in the course, so the lack of balance is simply due to unlucky randomization rather than manipulation by school administrators. We considered implementing a randomized block design to avoid such issues but found it infeasible to obtain the necessary test score information prior to randomization, given the schools' needs for speed in deciding whether the student was allowed to register for the new class. We added an entire planning year to our study design to avoid these issues, but many schools were unable to plan ahead.
18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these six outcomes, which suggests that generalizing our estimated treatment effects to the full control group may not be unreasonable.
19 One district in our study offered both AP courses. Students in this school were randomly offered enrollment in an AP course and then given the option of Chemistry or Biology. To account for the two courses offered, we treat the school as two separate groups: School-Chemistry and School-Biology. For those students who were not offered an AP course, we randomly assign them to one of two control groups, proportional to the number of treated students who chose each course. For example, if 60% of the treated students chose Biology, then we randomly assign 60% of the control students to the School-Biology control group. In Section V.C, we show that our results are also robust to dropping this school entirely.
20 To compute these weights, we first estimate the parameters of the following equation using a probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1 if student $i$ in school-by-cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school-by-cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black students, those who were not disabled, and those who took prerequisite courses were more likely to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives more weight in the regression to study participants who completed the survey and yet had pre-study characteristics that were similar to those study participants who did not complete the survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students receiving a weight less than 2.0.
21 We impute with the full dataset yet only estimate regressions on the sample for which we observe each outcome variable. This follows a multiple imputation, then deletion strategy suggested by Hippel (2007), which improves efficiency while protecting against problematic imputed outcome values. As a robustness check, Section V.C provides results including imputation of missing outcome variables.
22 Detailed course-by-course impacts are available from the authors.
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group students are also more likely to report that the students in their class were driven to succeed and that the teacher set high standards. The AP science class also involved more student-led projects or experiments, hands-on learning, and small group work, all activities that are deemed to be essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012). Yet we do not find strong evidence that students in AP classes were more likely to present what they learned, apply their knowledge to solve a new problem, or work independently, and none of the component measures of technology usage were statistically significantly affected. Nor did treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear better able to implement the academic rigor expected of an AP science class than some of the inquiry-based approaches that the College Board intends for AP science. We do not find evidence that taking AP science led students to be more likely to report that they found their course more interesting, which may reflect the inability of the teachers to fully implement a creative inquiry-based environment.
25 In addition to the teacher spillover effects, peer interactions could cause contamination effects that might render our estimated effects smaller. A research design with randomization both across and within schools would allow for estimation of spillover effects, but such a design was infeasible due to the challenge and added cost of recruiting control schools.
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers received a grade of C (i.e., 2.0) or lower in their most recent science class.
27 Grade weights may also affect AP participation, which is one of the main purposes of the weights (Klopfenstein and Lively 2016).
28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors in models with fixed effects.
29 To address the possibility of a heightened probability of Type I error given the multiple outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons (Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same three outcomes that reach statistical significance without applying the correction (shown in Column (2) of Table 6) remain statistically significant after applying the correction.
30 This school had a 27 percent response rate, partially as a result of all of the completed surveys from the first cohort of administration being lost in transit after they were completed.
31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we have included the students from cohort 1 of high school number 23, where nonresponse was due mostly to the surveys being lost after administration.
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes (including student prior academic preparation, race/ethnicity, gender, and teacher preparation). We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in science, and grades in other courses). Some of the differences in the point estimates were quite large, yet so too were the standard errors. For instance, five of the seven estimated differential treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the suggestive (0.16) to noisy (0.34) range.
33 We are currently in the process of gathering records from the National Student Clearinghouse on all three cohorts of study participants. Once data collection is complete, we will have the ability to examine the effect of AP science on college enrollment, college selectivity, and college completion.
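
The inverse probability weights described in note 20 can be sketched as below. This is a minimal illustration that assumes the probit index (the estimated fixed effect plus the covariate term) has already been obtained from a fitted model; the function name and the example index values are hypothetical, and fitting the probit itself is outside the sketch.

```python
from statistics import NormalDist

def inverse_probability_weight(index):
    """index: fitted probit index for one respondent (fixed effect plus covariate term)."""
    # Phi(index) is the predicted probability of completing the survey.
    p_complete = NormalDist().cdf(index)
    return 1.0 / p_complete

# Hypothetical fitted index values for three survey respondents.
weights = [inverse_probability_weight(z) for z in (2.0, 0.0, -1.5)]
```

A respondent with a predicted completion probability of one half (index 0) receives a weight of 2, and respondents who resemble non-respondents (low index) receive larger weights, in line with the description in the note.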
22
Table 1
Participating Schools and Teachers Compared to Other US High Schools and High School
Science Teachers Panel A Schools Participating Others
Average Enrollment 1409 723
Free or Reduced-Price Lunch 0700 0438
Asian 0055 0050
Black 0349 0154
Hispanic 0410 0221
White 0164 0537
Adjusted Cohort Graduation Rate 0843 0802
District Instruction Expenditures Per Pupil $6561 $5636
District Student Services Expenditures Per Pupil $3787 $3385
Panel B Teachers Participating Others
Age Under 30 0407 0160
Age 30-49 0432 0553
Age 50 or over 0161 0287
Female 0630 0536
Hispanic or Latino 0111 0051
Race American Indian or Alaska Native 0000 0009
Race Asian American 0111 0041
Race Black 0111 0060
Race Native Hawaiian or other Pacific Islander 0000 0004
Race White 0778 0896
Years of Experience 103 132
Years of Experience lt=2 0290 0085
Years of Experience lt=5 0481 0234
Hold a Teaching Certificate 0926 0945
Undergraduate Major in STEM 0944 0747
Single Subject Credential in Science 0630 0823
Masterrsquos Degree or Higher 0356 0615
Previously Taught AP Course 0469 NA
Previously Taught AP IB or Honors Course 0796 NA
Number of Professional Development Trainings 309 NA
in the Past 5 years (0-5)
Notes Panel A source is the 2013-14 Common Core Data httpsncesedgovccd EDFacts
httpswww2edgovaboutinitsededfactsindexhtml Others in Panel A refers to other public
high schools in the US Adjusted Cohort Graduation Rate is the percentage of the students in a
9th grade cohort who graduate within four years (McFarland et al 2018) Panel B source is the
Teacher Survey (N= 27) and 2011-12 Schools and Staffing Survey
httpsncesedgovsurveyssass Others in Panel B refers to public and private high school
teachers in the US High school science teachers are defined as teachers of grades 9-12 whose
main teaching assignment is in the natural sciences
23
Table 2
TreatmentControl Balance and ComplierNon-Complier Differences on Pre-Treatment Characteristics
(1) (2) (3) (4) (5) (6)
Full Sample Survey Sample
Pre-Treatment Characteristic
Control
Group
Mean
Difference
Between
Treated and
Controls
Difference
Between
Control Group
Non-Compliers
and Compliers
Control
Group
Mean
Difference
Between
Treated and
Controls
Difference
Between
Control Group
Non-Compliers
and Compliers
Age as of October of 11th Grade 166 -003 -007 166 -001 -001
(002) (007) (003) (009)
[019] [035] [065] [094]
Math Exam Score 038 008 025 044 007 030
(004) (010) (005) (016)
[008] [002] [017] [006]
Reading Exam Score 029 010 018 036 009 017
(003) (012) (004) (017)
[000] [014] [002] [031]
HS Grade Point Average 316 005 020 323 006 013
(003) (008) (003) (010)
[014] [002] [006] [020]
Female 059 000 010 061 -001 011
(003) (006) (004) (007)
[099] [010] [073] [012]
Asian American 012 002 010 012 003 010
(002) (005) (001) (007)
[027] [006] [007] [012]
Black 032 -002 -006 027 000 -005
(002) (006) (002) (005)
[029] [028] [088] [040]
Hispanic Native American or Multiracial 031 001 005 033 001 005
24
(002) (006) (002) (007)
[055] [041] [081] [051]
Disabled 002 000 -001 001 000 -001
(001) (001) (001) (001)
[093] [024] [057] [05]
Gifted 013 003 000 014 002 001
(002) (005) (002) (009)
[006] [100] [025] [089]
English Language Learner 005 001 002 004 001 004
(001) (002) (001) (003)
[041] [039] [054] [022]
Eligible for Free or Reduced-Price Lunch 053 001 002 051 001 007
(002) (007) (003) (009)
[066] [077] [072] [045]
Language Other than English Spoken at Home 034 002 003 035 001 004
(002) (007) (002) (007)
[032] [073] [059] [056]
Took Recommended Prerequisite Courses 079 000 009 079 002 005
(002) (004) (002) (005)
[084] [004] [043] [031]
Number of Observations 1819 1417
Notes Differences in columns (2) (3) (5) and (6) are conditional on School x Cohort fixed effects Standard errors clustered by
School x Cohort are in parentheses and p-values are in brackets
25
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment
(1) (2) (3) (4) (5) (6)
Full Sample Survey Respondents
Outcome
Control
Group
Mean
ITT
LATE
Control
Group
Mean
ITT
LATE
AP Treatment Course Enrollment 019 038 024 039
(005) (006)
[000] [000] Share of Credits During Study Year in
AP Science 003 004 011 003 004 010
(001) (001) (001) (001)
[000] [000] [000] [000]
All AP 013 004 011 014 004 010
(001) (002) (001) (002)
[000] [000] [000] [000]
Other Advanced Science 006 -001 -002 006 -001 -002
(001) (002) (001) (002)
[024] [023] [020] [020]
All Other Advanced 025 -001 -003 025 -001 -003
(001) (002) (001) (003)
[023] [023] [030] [030]
Regular Science 006 -001 -002 006 -001 -002
(001) (002) (001) (002)
[024] [020] [024] [019]
All Regular 062 -003 -009 061 -003 -007
(001) (003) (001) (003)
[002] [000] [007] [003]
Number of Observations 1819 1417
Notes The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2) and subsequent ITT and LATE results are found by estimating variants of Equation
(1) Course-taking information collected from student transcripts Control Group Mean uses the
full control group for the first outcome (ie AP Treatment Course Enrollment) and those control
group members who complied with their assignment (ie those who did not take the AP
Treatment Course) for the subsequent outcomes Results in columns (4) (5) and (6) are
weighted by the inverse probability of completing the survey Standard errors clustered by School
x Cohort are in parentheses and p-values are in brackets
26
Table 4
Treatment Contrast (Composite Variables)
(1) (2) (3)
Outcome
Control
Group
Complier
Mean
ITT LATE
Academically Challenging Curriculum -033 031 080
(010) (024)
[000] [000]
Project-Based Independent Classroom
Activities -006 013 033
(007) (017)
[007] [006]
Integrated Use of Technology
-011 011 028
(008) (019)
[019] [014]
Number of Observations 1417
Notes To construct these composite variables we first converted the values on each component
variable (eg strongly agree agree neutral disagree or strongly disagree) so that the highest
category was set to 10 the lowest to 00 and the remaining categories evenly spaced between
00 and 10 We then averaged and standardized these converted values Results are weighted by
the inverse probability of completing the survey Online Appendix Table 5 provides the list of
component variables Standard errors clustered by School x Cohort are in parentheses and p-
values are in brackets
27
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                            (1)         (2)       (3)
Outcome                                Control Group    ITT       LATE
                                       Complier Mean
Science Skill                              -0.19        0.09      0.23
                                                       (0.06)    (0.16)
                                                       [0.15]    [0.14]
STEM Interest                               0.62        0.04      0.09
                                                       (0.02)    (0.07)
                                                       [0.16]    [0.16]
Confidence in College Science               0.92       -0.04     -0.10
                                                       (0.02)    (0.05)
                                                       [0.11]    [0.06]
Stress                                      0.12        0.07      0.17
                                                       (0.03)    (0.07)
                                                       [0.02]    [0.01]
Grades in Science Courses                   2.80       -0.12     -0.29
                                                       (0.07)    (0.16)
                                                       [0.08]    [0.07]
Grades in Other Courses                     3.14       -0.07     -0.18
                                                       (0.02)    (0.06)
                                                       [0.00]    [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes
Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress
= 1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
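As a rough check on how the ITT and LATE columns relate, the sketch below applies the textbook Wald ratio (ITT effect divided by the first-stage effect of the offer on AP enrollment). This is an assumption about the relationship, not the authors' estimation code: their LATE estimates come from regressions with covariates, fixed effects, and weights, and the inputs here are the rounded values printed in Tables 3 and 5, so the ratios only approximate the reported LATE column. The function name is hypothetical.

```python
def wald_late(itt, first_stage):
    """Textbook Wald ratio: reduced-form (ITT) effect divided by first-stage effect."""
    if first_stage == 0:
        raise ValueError("first stage is zero; the ratio is undefined")
    return itt / first_stage


if __name__ == "__main__":
    # First-stage ITT on AP enrollment among survey respondents (Table 3, rounded).
    first_stage_survey = 0.39
    # Rounded ITT estimates for the survey-based outcomes in Table 5.
    itt_estimates = {
        "Science Skill": 0.09,
        "STEM Interest": 0.04,
        "Confidence in College Science": -0.04,
        "Stress": 0.07,
    }
    for outcome, itt in itt_estimates.items():
        print(f"{outcome}: approximate LATE = {wald_late(itt, first_stage_survey):.2f}")
```

Small gaps between these ratios and the reported LATE values are expected because the published ITT and first-stage estimates are rounded to two decimals.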
Table 6
Robustness Checks of Main ITT Results

Columns:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                          (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)   (11)            (12)
Science Skill                  -0.19    0.09                    0.10    0.11    0.20    0.07    0.03    0.39   -0.09 to 0.51   2.0
                                       (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                       [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                   0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12   -0.03 to 0.18   1.9
                                       (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                       [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science   0.92   -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06    0.05   -0.09 to 0.10   2.0
                                       (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                       [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                          0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11   -0.05 to 0.15   1.6
                                       (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                       [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses       2.80   -0.12                   -0.06   -0.10   -0.07    n/a     n/a     n/a    n/a             n/a
                                       (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                       [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses         3.14   -0.07                   -0.07   -0.06   -0.03    n/a     n/a     n/a    n/a             n/a
                                       (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                       [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
n/a: Not applicable, as grades are based on transcripts rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports
robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding
p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was
randomly assigned 1,000 times. The p-value shows the share of these permutations where the
absolute value of the estimated pseudo treatment effect exceeded the absolute value of the
estimated treatment effect shown in column (2). Column (5) reestimates the main results with high
school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome
variable is missing and multiply imputed. Column (7) reestimates the main results excluding the
vector of pre-treatment covariates (Xi) from Equation 1. Column (8) reestimates the main results
excluding high school 23, which had a low rate of student survey completion and where surveys
from cohort 1 were lost. Columns (9) and (10) show the lower and upper bound estimates based on
Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the
outcome until the survey response rates are equal across treatment and control groups). Column
(11) shows the 95% confidence interval from Lee bounds (applying Imbens and Manski's (2004)
method to derive a confidence interval for the treatment effect itself).
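The permutation test in column (4) can be illustrated with a simplified sketch. Two simplifications are assumptions made here for brevity and are not taken from the paper: the treatment effect is a plain difference in means rather than the regression-based ITT estimate, and the pseudo treatment is reshuffled across the whole sample rather than within School x Cohort cells. Function names and the example data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return mean(t) - mean(c)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy so the caller's assignment is not reordered
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # pseudo treatment: same group sizes, random membership
        if abs(diff_in_means(outcomes, labels)) > observed:
            exceed += 1
    return exceed / n_permutations


if __name__ == "__main__":
    # Hypothetical outcomes for 8 control and 8 treated students.
    y = [0.1, -0.3, 0.2, 0.0, -0.1, 0.4, -0.2, 0.1,
         0.5, 0.3, 0.6, 0.2, 0.7, 0.1, 0.4, 0.5]
    d = [False] * 8 + [True] * 8
    print(permutation_p_value(y, d))
```

Shuffling the labels keeps the number of treated and control units fixed, so each pseudo assignment mimics the original design while breaking any real link between treatment and outcome.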
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (U.S. Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students'
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students' confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends, to some extent, on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$,
where $\mathit{CompletedSurvey}_{ij}$ equals 1 if student i in school by cohort j completed any part
of the end-of-year survey, $X_i$ is the same vector of pre-treatment characteristics in the previous
equations, $\mu_j$ are school by cohort fixed effects, and $\Phi(\cdot)$ is the cumulative normal
distribution function. The results of this regression are included in Online Appendix Table 2.
Students who had higher pre-treatment grades, Black students, those who were not disabled, and
those who took prerequisite courses were more likely to complete the survey. The inverse
probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives more weight in the
regression to study participants who completed the survey and yet had pre-study characteristics
that were similar to those study participants who did not complete the survey. These weights
range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and with 90% of students
receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a "multiple imputation, then deletion" strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
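The weighting step described in footnote 20 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the probit index (the school-by-cohort fixed effect plus the covariate term) has already been estimated elsewhere, the function names are hypothetical, and the index values and outcomes in the example are made up.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_probability_weight(probit_index):
    """Weight = 1 / Phi(index), where Phi is the standard normal CDF."""
    p_complete = STANDARD_NORMAL.cdf(probit_index)
    return 1.0 / p_complete


def weighted_mean(values, weights):
    """Weighted average of an outcome among survey respondents."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical fitted probit indices for four survey respondents.
    indices = [1.5, 0.8, 0.0, -0.5]
    weights = [inverse_probability_weight(z) for z in indices]
    outcomes = [0.2, -0.1, 0.4, 0.6]  # hypothetical standardized outcomes
    for z, w in zip(indices, weights):
        print(f"index={z:+.1f}  weight={w:.2f}")
    print("weighted mean:", round(weighted_mean(outcomes, weights), 3))
```

Respondents with a low predicted probability of completing the survey receive larger weights, so they stand in for similar students who did not respond. A weight can never fall below 1.0 because the predicted probability is at most 1.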
Table 2
Treatment/Control Balance and Complier/Non-Complier Differences on Pre-Treatment Characteristics

Columns (1)-(3) use the Full Sample; columns (4)-(6) use the Survey Sample.
(1), (4) Control Group Mean
(2), (5) Difference Between Treated and Controls
(3), (6) Difference Between Control Group Non-Compliers and Compliers

Pre-Treatment Characteristic                    (1)      (2)      (3)      (4)      (5)      (6)
Age as of October of 11th Grade                16.6    -0.03    -0.07     16.6    -0.01    -0.01
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.19]   [0.35]            [0.65]   [0.94]
Math Exam Score                                0.38     0.08     0.25     0.44     0.07     0.30
                                                       (0.04)   (0.10)            (0.05)   (0.16)
                                                       [0.08]   [0.02]            [0.17]   [0.06]
Reading Exam Score                             0.29     0.10     0.18     0.36     0.09     0.17
                                                       (0.03)   (0.12)            (0.04)   (0.17)
                                                       [0.00]   [0.14]            [0.02]   [0.31]
HS Grade Point Average                         3.16     0.05     0.20     3.23     0.06     0.13
                                                       (0.03)   (0.08)            (0.03)   (0.10)
                                                       [0.14]   [0.02]            [0.06]   [0.20]
Female                                         0.59     0.00     0.10     0.61    -0.01     0.11
                                                       (0.03)   (0.06)            (0.04)   (0.07)
                                                       [0.99]   [0.10]            [0.73]   [0.12]
Asian American                                 0.12     0.02     0.10     0.12     0.03     0.10
                                                       (0.02)   (0.05)            (0.01)   (0.07)
                                                       [0.27]   [0.06]            [0.07]   [0.12]
Black                                          0.32    -0.02    -0.06     0.27     0.00    -0.05
                                                       (0.02)   (0.06)            (0.02)   (0.05)
                                                       [0.29]   [0.28]            [0.88]   [0.40]
Hispanic, Native American, or Multiracial      0.31     0.01     0.05     0.33     0.01     0.05
                                                       (0.02)   (0.06)            (0.02)   (0.07)
                                                       [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                       0.02     0.00    -0.01     0.01     0.00    -0.01
                                                       (0.01)   (0.01)            (0.01)   (0.01)
                                                       [0.93]   [0.24]            [0.57]   [0.5]
Gifted                                         0.13     0.03     0.00     0.14     0.02     0.01
                                                       (0.02)   (0.05)            (0.02)   (0.09)
                                                       [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                       0.05     0.01     0.02     0.04     0.01     0.04
                                                       (0.01)   (0.02)            (0.01)   (0.03)
                                                       [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch       0.53     0.01     0.02     0.51     0.01     0.07
                                                       (0.02)   (0.07)            (0.03)   (0.09)
                                                       [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home     0.34     0.02     0.03     0.35     0.01     0.04
                                                       (0.02)   (0.07)            (0.02)   (0.07)
                                                       [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses          0.79     0.00     0.09     0.79     0.02     0.05
                                                       (0.02)   (0.04)            (0.02)   (0.05)
                                                       [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                         1,819 (Full Sample)        1,417 (Survey Sample)
Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed
effects. Standard errors clustered by School x Cohort are in parentheses, and p-values are in
brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment

Columns (1)-(3) use the Full Sample; columns (4)-(6) use Survey Respondents.

                                         (1)        (2)      (3)      (4)        (5)      (6)
Outcome                                Control      ITT      LATE   Control      ITT      LATE
                                      Group Mean                   Group Mean
AP Treatment Course Enrollment           0.19       0.38              0.24       0.39
                                                   (0.05)                       (0.06)
                                                   [0.00]                       [0.00]
Share of Credits During Study Year in:
  AP Science                             0.03       0.04     0.11     0.03       0.04     0.10
                                                   (0.01)   (0.01)              (0.01)   (0.01)
                                                   [0.00]   [0.00]              [0.00]   [0.00]
  All AP                                 0.13       0.04     0.11     0.14       0.04     0.10
                                                   (0.01)   (0.02)              (0.01)   (0.02)
                                                   [0.00]   [0.00]              [0.00]   [0.00]
  Other Advanced Science                 0.06      -0.01    -0.02     0.06      -0.01    -0.02
                                                   (0.01)   (0.02)              (0.01)   (0.02)
                                                   [0.24]   [0.23]              [0.20]   [0.20]
  All Other Advanced                     0.25      -0.01    -0.03     0.25      -0.01    -0.03
                                                   (0.01)   (0.02)              (0.01)   (0.03)
                                                   [0.23]   [0.23]              [0.30]   [0.30]
  Regular Science                        0.06      -0.01    -0.02     0.06      -0.01    -0.02
                                                   (0.01)   (0.02)              (0.01)   (0.02)
                                                   [0.24]   [0.20]              [0.24]   [0.19]
  All Regular                            0.62      -0.03    -0.09     0.61      -0.03    -0.07
                                                   (0.01)   (0.03)              (0.01)   (0.03)
                                                   [0.02]   [0.00]              [0.07]   [0.03]
Number of Observations                   1,819 (Full Sample)          1,417 (Survey Respondents)
Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
17. The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course so the lack of balance is simply due to unlucky randomization rather
32
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization given the schools’ needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19 One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C we show that our results are also robust to dropping this school entirely.
20 To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and
with 90% of students receiving a weight less than 2.0.
21 We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a “multiple imputation, then deletion” strategy
suggested by von Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22 Detailed course-by-course impacts are available from the authors.
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative, inquiry-based environment.
25 In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27 Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29 To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30 This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
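The inverse probability weight described in note 20 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the fixed-effect values, and the coefficient values below are hypothetical, and the probit parameters are taken as given rather than estimated from data.

```python
from statistics import NormalDist


def inverse_probability_weight(school_cohort_effect, covariates, coefficients):
    """Return 1 / Phi(mu_j + X_i * rho) for one student.

    school_cohort_effect: fitted fixed effect mu_j (hypothetical value here)
    covariates:           the student's pre-treatment characteristics X_i
    coefficients:         fitted probit coefficients rho (hypothetical values here)
    """
    index = school_cohort_effect + sum(
        x * r for x, r in zip(covariates, coefficients)
    )
    # Phi is the standard normal CDF, so this is the predicted response probability.
    p_complete = NormalDist().cdf(index)
    return 1.0 / p_complete


# Hypothetical fitted values, for illustration only.
rho_hat = [0.30, -0.20]
likely_responder = inverse_probability_weight(1.0, [1.0, 0.0], rho_hat)
unlikely_responder = inverse_probability_weight(-1.0, [0.0, 1.0], rho_hat)
```

A student with a low predicted probability of completing the survey receives a larger weight, which is why respondents who resemble nonrespondents count for more in the weighted regressions.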
                                                        (0.02)   (0.06)            (0.02)   (0.07)
                                                        [0.55]   [0.41]            [0.81]   [0.51]
Disabled                                         0.02     0.00    -0.01     0.01     0.00    -0.01
                                                        (0.01)   (0.01)            (0.01)   (0.01)
                                                        [0.93]   [0.24]            [0.57]    [0.5]
Gifted                                           0.13     0.03     0.00     0.14     0.02     0.01
                                                        (0.02)   (0.05)            (0.02)   (0.09)
                                                        [0.06]   [1.00]            [0.25]   [0.89]
English Language Learner                         0.05     0.01     0.02     0.04     0.01     0.04
                                                        (0.01)   (0.02)            (0.01)   (0.03)
                                                        [0.41]   [0.39]            [0.54]   [0.22]
Eligible for Free or Reduced-Price Lunch         0.53     0.01     0.02     0.51     0.01     0.07
                                                        (0.02)   (0.07)            (0.03)   (0.09)
                                                        [0.66]   [0.77]            [0.72]   [0.45]
Language Other than English Spoken at Home       0.34     0.02     0.03     0.35     0.01     0.04
                                                        (0.02)   (0.07)            (0.02)   (0.07)
                                                        [0.32]   [0.73]            [0.59]   [0.56]
Took Recommended Prerequisite Courses            0.79     0.00     0.09     0.79     0.02     0.05
                                                        (0.02)   (0.04)            (0.02)   (0.05)
                                                        [0.84]   [0.04]            [0.43]   [0.31]
Number of Observations                          1,819                      1,417
Notes: Differences in columns (2), (3), (5), and (6) are conditional on School x Cohort fixed effects. Standard errors clustered by
School x Cohort are in parentheses, and p-values are in brackets.
Table 3
First Stage Impacts on AP Course Enrollment and Overall Course Enrollment
                                              (1)      (2)      (3)      (4)      (5)      (6)
                                                Full Sample             Survey Respondents
                                          Control                    Control
                                            Group                      Group
Outcome                                      Mean      ITT     LATE     Mean      ITT     LATE
AP Treatment Course Enrollment               0.19     0.38              0.24     0.39
                                                    (0.05)                     (0.06)
                                                    [0.00]                     [0.00]
Share of Credits During Study Year in:
  AP Science                                 0.03     0.04     0.11     0.03     0.04     0.10
                                                    (0.01)   (0.01)            (0.01)   (0.01)
                                                    [0.00]   [0.00]            [0.00]   [0.00]
  All AP                                     0.13     0.04     0.11     0.14     0.04     0.10
                                                    (0.01)   (0.02)            (0.01)   (0.02)
                                                    [0.00]   [0.00]            [0.00]   [0.00]
  Other Advanced Science                     0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                    (0.01)   (0.02)            (0.01)   (0.02)
                                                    [0.24]   [0.23]            [0.20]   [0.20]
  All Other Advanced                         0.25    -0.01    -0.03     0.25    -0.01    -0.03
                                                    (0.01)   (0.02)            (0.01)   (0.03)
                                                    [0.23]   [0.23]            [0.30]   [0.30]
  Regular Science                            0.06    -0.01    -0.02     0.06    -0.01    -0.02
                                                    (0.01)   (0.02)            (0.01)   (0.02)
                                                    [0.24]   [0.20]            [0.24]   [0.19]
  All Regular                                0.62    -0.03    -0.09     0.61    -0.03    -0.07
                                                    (0.01)   (0.03)            (0.01)   (0.03)
                                                    [0.02]   [0.00]            [0.07]   [0.03]
Number of Observations                      1,819                      1,417
Notes: The ITT results in the AP Treatment Course Enrollment row are found by estimating
Equation (2), and subsequent ITT and LATE results are found by estimating variants of Equation
(1). Course-taking information collected from student transcripts. Control Group Mean uses the
full control group for the first outcome (i.e., AP Treatment Course Enrollment) and those control
group members who complied with their assignment (i.e., those who did not take the AP
Treatment Course) for the subsequent outcomes. Results in columns (4), (5), and (6) are
weighted by the inverse probability of completing the survey. Standard errors clustered by School
x Cohort are in parentheses, and p-values are in brackets.
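As a rough check on how the ITT and LATE columns relate, the LATE can be approximated by the Wald ratio of the ITT effect to the first-stage effect on enrollment. The sketch below is an assumption-laden simplification: the table's LATE estimates come from variants of Equation (1) with fixed effects and covariates, the inputs here are the rounded values printed in Table 3, and the function name is hypothetical.

```python
def wald_late(itt_effect, first_stage):
    """Approximate LATE as the ITT effect divided by the first-stage effect."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt_effect / first_stage


# Rounded full-sample values from Table 3: ITT on the AP Science credit share
# (0.04) and the ITT on AP Treatment Course Enrollment (0.38).
approx_late = wald_late(0.04, 0.38)
```

With these rounded inputs the ratio is about 0.105, which rounds to the 0.11 reported in column (3); small gaps in other rows are expected because the printed ITT values are themselves rounded.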
Table 4
Treatment Contrast (Composite Variables)
                                                        (1)      (2)      (3)
                                                    Control
                                                      Group
                                                   Complier
Outcome                                                Mean      ITT     LATE
Academically Challenging Curriculum                   -0.33     0.31     0.80
                                                              (0.10)   (0.24)
                                                              [0.00]   [0.00]
Project-Based Independent Classroom Activities        -0.06     0.13     0.33
                                                              (0.07)   (0.17)
                                                              [0.07]   [0.06]
Integrated Use of Technology                          -0.11     0.11     0.28
                                                              (0.08)   (0.19)
                                                              [0.19]   [0.14]
Number of Observations                                1,417
Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
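The construction of the composite variables described in the notes can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the example responses are hypothetical, every item is assumed to share the same five-point scale, and the standardization here uses the population standard deviation, which is an assumption.

```python
from statistics import mean, pstdev

# Ordered from lowest to highest category.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def rescale(response, categories=LIKERT):
    """Map the lowest category to 0.0, the highest to 1.0, evenly spaced between."""
    return categories.index(response) / (len(categories) - 1)


def composite_scores(responses_by_student):
    """Average each student's rescaled items, then standardize across students."""
    raw = [mean(rescale(r) for r in items) for items in responses_by_student]
    mu, sd = mean(raw), pstdev(raw)
    return [(x - mu) / sd for x in raw]


# Hypothetical responses for three students on two component items.
example = [
    ["agree", "strongly agree"],
    ["neutral", "disagree"],
    ["strongly disagree", "neutral"],
]
scores = composite_scores(example)
```

After standardization the composite has mean 0 and standard deviation 1 in the sample used, so the ITT and LATE estimates in the table read as effects in standard deviation units.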
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades
                                        (1)      (2)      (3)
                                    Control
                                      Group
                                   Complier
Outcome                                Mean      ITT     LATE
Science Skill                         -0.19     0.09     0.23
                                              (0.06)   (0.16)
                                              [0.15]   [0.14]
STEM Interest                          0.62     0.04     0.09
                                              (0.02)   (0.07)
                                              [0.16]   [0.16]
Confidence in College Science          0.92    -0.04    -0.10
                                              (0.02)   (0.05)
                                              [0.11]   [0.06]
Stress                                 0.12     0.07     0.17
                                              (0.03)   (0.07)
                                              [0.02]   [0.01]
Grades in Science Courses              2.80    -0.12    -0.29
                                              (0.07)   (0.16)
                                              [0.08]   [0.07]
Grades in Other Courses                3.14    -0.07    -0.18
                                              (0.02)   (0.06)
                                              [0.00]   [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes
Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
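Because six outcomes are tested, note 29 applies the Benjamini-Hochberg correction for multiple comparisons. A minimal sketch of that step-up procedure is given below. The function name is hypothetical and the p-values are made up for illustration; they are not the study's unrounded p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected.

    Step-up rule: find the largest rank k (1-based, ascending p-values) with
    p_(k) <= alpha * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            largest_k = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[i] = True
    return rejected


# Hypothetical p-values for six outcomes, for illustration only.
hypothetical_p = [0.060, 0.190, 0.050, 0.004, 0.010, 0.012]
decisions = benjamini_hochberg(hypothetical_p)
```

In this made-up example the three smallest p-values stay significant at the 0.05 level, while 0.050 does not, because its threshold at rank four of six is 0.05 × 4/6 ≈ 0.033.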
Table 6
Robustness Checks of Main ITT Results
Column headings:
(1) Control Group Complier Mean
(2) Main Results
(3) Robust SE
(4) p-value (permutation test)
(5) Excluding High School 56
(6) Including Imputation of Missing Outcome Variables
(7) Excluding Covariates
(8) Excluding High School 23
(9) Lee Lower Bound
(10) Lee Upper Bound
(11) 95% Confidence Interval from Lee Bounds
(12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                             (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)            (11)  (12)
Science Skill                     -0.19    0.09                    0.10    0.11    0.20    0.07    0.03    0.39   -0.09 to 0.51   2.0
                                         (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                         [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                      0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12   -0.03 to 0.18   1.9
                                         (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                         [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science      0.92   -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06    0.05   -0.09 to 0.10   2.0
                                         (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                         [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                             0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11   -0.05 to 0.15   1.6
                                         (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                         [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses          2.80   -0.12                   -0.06   -0.10   -0.07     n/a     n/a     n/a             n/a   n/a
                                         (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                         [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses            3.14   -0.07                   -0.07   -0.06   -0.03     n/a     n/a     n/a             n/a   n/a
                                         (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                         [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]
n/a: Not applicable, as grades are based on transcripts rather than student survey.
Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates ($X_i$)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski’s (2004) method to
derive confidence interval for the treatment effect itself).
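The permutation test behind column (4) can be sketched as follows. This is a simplified illustration under stated assumptions: the pseudo treatment effect is a raw difference in means rather than the regression estimate with fixed effects and covariates, the pseudo treatment is reshuffled across all students rather than within school-by-cohort groups, and the function name and example data are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose |effect| exceeds the observed |effect|."""
    rng = random.Random(seed)

    def diff_in_means(labels):
        t = [y for y, d in zip(outcomes, labels) if d]
        c = [y for y, d in zip(outcomes, labels) if not d]
        return mean(t) - mean(c)

    observed = abs(diff_in_means(treated))
    labels = list(treated)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling keeps the number of pseudo-treated students fixed.
        rng.shuffle(labels)
        if abs(diff_in_means(labels)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical data: outcomes perfectly separated by treatment status.
example_outcomes = [0.0] * 5 + [10.0] * 5
example_treated = [0] * 5 + [1] * 5
example_p = permutation_p_value(example_outcomes, example_treated)
```

A small share means that few random reassignments produce an effect as large as the one observed, which is the sense in which the permutation p-value supports the main estimate.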
1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (U.S. Department of Education 2014).
2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students’
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3 The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board review each syllabus and
grant permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual “AP Course Audit.” This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place “greater emphasis on discipline-specific inquiry,
reasoning, and communication skills” (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5 Students’ perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students’ confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
Table 4
Treatment Contrast (Composite Variables)

                                             (1)       (2)       (3)
                                         Control
                                           Group
                                        Complier
Outcome                                     Mean       ITT      LATE
Academically Challenging Curriculum        -0.33      0.31      0.80
                                                    (0.10)    (0.24)
                                                    [0.00]    [0.00]
Project-Based Independent Classroom
  Activities                               -0.06      0.13      0.33
                                                    (0.07)    (0.17)
                                                    [0.07]    [0.06]
Integrated Use of Technology               -0.11      0.11      0.28
                                                    (0.08)    (0.19)
                                                    [0.19]    [0.14]
Number of Observations                     1,417

Notes: To construct these composite variables, we first converted the values on each component
variable (e.g., strongly agree, agree, neutral, disagree, or strongly disagree) so that the highest
category was set to 1.0, the lowest to 0.0, and the remaining categories evenly spaced between
0.0 and 1.0. We then averaged and standardized these converted values. Results are weighted by
the inverse probability of completing the survey. Online Appendix Table 5 provides the list of
component variables. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
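The composite-variable construction described in the notes to Table 4 can be sketched as follows. This is an illustration under stated assumptions, not the study's code: responses are assumed to be coded 1 (lowest category) through K (highest), missing items are skipped when averaging, the population standard deviation is used, and all names are hypothetical.

```python
import numpy as np


def rescale_likert(responses, n_categories):
    """Map ordinal codes 1..n_categories onto evenly spaced values in [0, 1]."""
    responses = np.asarray(responses, dtype=float)
    return (responses - 1.0) / (n_categories - 1.0)


def composite_score(items, n_categories):
    """Average the rescaled items for each student, then standardize across students.

    items: array of shape (n_students, n_items) holding ordinal codes.
    """
    rescaled = rescale_likert(items, n_categories)
    per_student = np.nanmean(rescaled, axis=1)  # assumption: skip missing items
    return (per_student - np.nanmean(per_student)) / np.nanstd(per_student)


# Hypothetical responses from three students on two five-point items.
example = np.array([[1, 1], [3, 3], [5, 5]])
print(rescale_likert([1, 3, 5], 5))
print(composite_score(example, 5))
```

With five response categories, the rescaled values are 0.0, 0.25, 0.5, 0.75, and 1.0, matching the even spacing the notes describe.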
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                             (1)       (2)       (3)
                                         Control
                                           Group
                                        Complier
Outcome                                     Mean       ITT      LATE
Science Skill                              -0.19      0.09      0.23
                                                    (0.06)    (0.16)
                                                    [0.15]    [0.14]
STEM Interest                               0.62      0.04      0.09
                                                    (0.02)    (0.07)
                                                    [0.16]    [0.16]
Confidence in College Science               0.92     -0.04     -0.10
                                                    (0.02)    (0.05)
                                                    [0.11]    [0.06]
Stress                                      0.12      0.07      0.17
                                                    (0.03)    (0.07)
                                                    [0.02]    [0.01]
Grades in Science Courses                   2.80     -0.12     -0.29
                                                    (0.07)    (0.16)
                                                    [0.08]    [0.07]
Grades in Other Courses                     3.14     -0.07     -0.18
                                                    (0.02)    (0.06)
                                                    [0.00]    [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of
participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or
= 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to
complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress =
1 if most recent science course had strong negative or negative impact on physical or emotional
health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other
courses are obtained from student transcripts and measure grades during the study year.
Results, with the exception of grades during study year, are weighted by the inverse probability of
completing the survey. Standard errors clustered by School x Cohort are in parentheses, and
p-values are in brackets.
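As a rough arithmetic check on how the ITT and LATE columns relate, the LATE for a single binary offer is approximately the ITT divided by the first-stage effect of the offer on AP enrollment. The sketch below uses the rounded values printed in Tables 3 and 5, so the ratios only approximate the reported LATE estimates, which come from the weighted, covariate-adjusted models. The function name is hypothetical.

```python
def wald_late(itt: float, first_stage: float) -> float:
    """Ratio of the ITT (reduced-form) estimate to the first-stage estimate."""
    if first_stage == 0:
        raise ValueError("first stage must be nonzero")
    return itt / first_stage


# Rounded first-stage ITT for survey respondents, as printed in Table 3.
FIRST_STAGE_SURVEY = 0.39

# Rounded ITT estimates for three survey outcomes, as printed in Table 5.
itt_estimates = {
    "Science Skill": 0.09,
    "Confidence in College Science": -0.04,
    "Stress": 0.07,
}

for outcome, itt in itt_estimates.items():
    print(f"{outcome}: ITT / first stage = {wald_late(itt, FIRST_STAGE_SURVEY):.2f}")
```

Small gaps between these ratios and the printed LATE values are expected, since the inputs are rounded to two decimals.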
Table 6
Robustness Checks of Main ITT Results

Columns: (1) Control Group Complier Mean; (2) Main Results; (3) Robust SE; (4) p-value
(permutation test); (5) Excluding High School 56; (6) Including Imputation of Missing Outcome
Variables; (7) Excluding Covariates; (8) Excluding High School 23; (9) Lee Lower Bound;
(10) Lee Upper Bound; (11) 95% Confidence Interval from Lee Bounds; (12) Ratio of 95% CI in
(11) to 95% CI in (7)

Outcome                            (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)            (11)  (12)
Science Skill                    -0.19    0.09                    0.10    0.11    0.20    0.07    0.03    0.39   [-0.09, 0.51]   2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                     0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12   [-0.03, 0.18]   1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science     0.92   -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06    0.05   [-0.09, 0.10]   2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                            0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11   [-0.05, 0.15]   1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses         2.80   -0.12                   -0.06   -0.10   -0.07   Not applicable, as grades are based on
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)   transcripts rather than student survey
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses           3.14   -0.07                   -0.07   -0.06   -0.03   Not applicable, as grades are based on
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)   transcripts rather than student survey
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5)
reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the
experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and
where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (X_i)
from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those
treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and
control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to
derive confidence interval for the treatment effect itself).
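The permutation test reported in Column (4) of Table 6 can be sketched as follows. This is a simplified illustration, not the study's procedure: a raw difference in means stands in for the regression-adjusted ITT estimate, pseudo treatment is reshuffled across all observations (any blocking by school or cohort is not reproduced), and the function name and data are hypothetical.

```python
import numpy as np


def permutation_p_value(outcome, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose absolute effect exceeds the observed one."""
    rng = np.random.default_rng(seed)
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    def diff_in_means(assignment):
        return outcome[assignment].mean() - outcome[~assignment].mean()

    observed = abs(diff_in_means(treated))
    exceed = 0
    for _ in range(n_permutations):
        # Reshuffle the treatment labels, keeping the number treated fixed.
        pseudo = rng.permutation(treated)
        if abs(diff_in_means(pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical data: 10 control and 10 treated students with a large gap in outcomes.
y = [0.0] * 10 + [10.0] * 10
d = [False] * 10 + [True] * 10
print(permutation_p_value(y, d))
```

The strict inequality mirrors the wording of the table notes, which count permutations where the pseudo effect exceeded the estimated effect in absolute value.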
1 The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the
Virgin Islands in 2014 (U.S. Department of Education 2014).
2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally, often without distinctions between AP
and other rigorous course options. Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses, particularly those in math and science, on students'
high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and
Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long,
Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3 The process by which a course is designated AP involves two steps. Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.
4 In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5 Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students' confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6 The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7 We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8 In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9 Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10 Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11 Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12 We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13 Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14 Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15 Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16 Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.
17 The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18 To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19 One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups:
School-Chemistry and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20 To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\mathit{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\mathit{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and
with 90% of students receiving a weight less than 2.0.
21 We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22 Detailed course-by-course impacts are available from the authors.
23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25 In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27 Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28 See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29 To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg correction for multiple comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30 This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31 We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32 We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
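The inverse probability weight described in note 20 is the reciprocal of the fitted probit probability of completing the survey. The sketch below illustrates only that final step, assuming the fitted linear index (school-by-cohort effect plus covariate index) is already available; the index values and function names are hypothetical, and the probit estimation itself is not reproduced.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inverse_probability_weight(linear_index: float) -> float:
    """Weight 1 / Phi(index) for a survey completer with the given fitted probit index."""
    probability = normal_cdf(linear_index)
    if probability <= 0.0:
        raise ValueError("predicted completion probability must be positive")
    return 1.0 / probability


# Hypothetical fitted indices: lower values mean a lower predicted chance of responding.
for index in (-1.5, 0.0, 1.0, 2.5):
    weight = inverse_probability_weight(index)
    print(f"index={index:+.1f}  P(complete)={normal_cdf(index):.3f}  weight={weight:.2f}")
```

Respondents who resemble nonrespondents (low predicted probability) receive weights well above 1, while respondents with a high predicted probability receive weights close to 1.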
17 The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course so the lack of balance is simply due to unlucky randomization rather
32
than manipulation by school administrators We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization given the schoolsrsquo needs for speed in deciding whether the student was
allowed to register for the new class We added an entire planning year to our study design to
avoid these issues but many schools were unable to plan ahead 18 To evaluate whether this non-compliance is ignorable we conducted the test recommended by
Huber (2013) for each of our six main outcomes (ie those subsequently shown in Table 5) We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable 19 One district in our study offered both AP courses Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology To
account for the two courses offered we treat the school as two separate groups School-
Chemistry and School-Biology For those students who were not offered an AP course we
randomly assign them to one of two control groups proportional to the number of treated
students who chose each course For example if 60 of the treated students chose Biology then
we randomly assign 60 of the control students to the School-Biology control group In Section
VC we show that our results are also robust to dropping this school entirely 20 To compute these weights we first estimate the parameters of the following equation using a
probit regression Pr(CompletedSurveyij = 1) = Φ(microj + Xiρ+ ϵij) where CompletedSurveyij equals 1
if student i in school by cohort j completed any part of the end-of-year survey Xi is the same
vector of pre-treatment characteristics in the previous equations microj are school by cohort fixed
effects and Φ() is the cumulative normal distribution function The results of this regression are
included in Online Appendix Table 2 Students who had higher pre-treatment grades Black
students those who were not disabled and those who took prerequisite courses were more likely
to complete the survey The inverse probability weight is computed as 1Φ(microj + Xi ) and gives
more weight in the regression to study participants who completed the survey and yet had pre-
study characteristics that were similar to those study participants who did not complete the
survey These weights range from 10 to 155 with a median (mean) weight of i 12 (15) and
with 90 of students receiving a weight less than 20 21 We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable This follows a multiple imputation then deletion strategy
suggested by Hippel (2007) which improves efficiency while protecting against problematic
imputed outcome values As a robustness check Section VC provides results including
imputation of missing outcome variables 22 Detailed course-by-course impacts are available from the authors 23 Online Appendix Table 4 shows the unweighted results that are comparable to Table 4 24 Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards The AP science class also involved more student-led projects
or experiments hands on learning and small group work all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al 2010 National Research Council 2012)
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned apply their knowledge to solve a new problem or work independently and none of
the component measures of technology usage were statistically significantly affected Nor did
33
treatment teachers rely less on lecturing and multiple choice quizzes Thus teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment 25 In addition to the teacher spillover effects peer interactions could cause contamination effects
that might render our estimated effects smaller A research design with randomization both
across and within schools would allow for estimation of spillover effects but such a design was
infeasible due to the challenge and added cost of recruiting control schools 26 Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (ie 20) or lower in their most recent science class 27 Grade weights may also affect AP participation which is one of the main purposes of the
weights (Klopfenstein and Lively 2016) 28 See Abadie et al (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects 29 To address the possibility of a heightened probability of Type I error given the multiple
outcomes we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995) At a preferred critical p-value of 005 we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6 remain statistically significant after applying the correction 30 This school had a 27 percent response rate partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed 31 We are probably also a bit overly conservative in computing the Lee bounds estimates as we
have included the students from cohort 1 of high school number 23 where nonresponse was due
mostly to the surveys being lost after administration 32 We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation raceethnicity gender and teacher preparation)
We also conducted ITT quantile regressions for the continuous outcomes (science skill grades in
science and grades in other courses) Some of the differences in the point estimates were quite
large yet so too were the standard errors For instance five of the seven estimated differential
treatment effects on science skill exceed 025 standard deviations with p-values that fall in the
suggestive (016) to noisy (034) range 33 We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants Once data collection is complete we will have the
ability to examine the effect of AP science on college enrollment college selectivity and college
completion
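As a rough illustration of the weighting step described in note 20, the sketch below converts a fitted probit index into an inverse probability weight, 1/Φ(index). The index values are hypothetical and chosen only for illustration; they are not estimates from the study, and the probit fit itself (school by cohort effects plus covariates) is assumed to have been done elsewhere.

```python
from statistics import NormalDist


def inverse_probability_weight(linear_index: float) -> float:
    """Return 1 / Phi(linear_index) for a student who completed the survey.

    linear_index stands for the fitted probit index (school-by-cohort
    effect plus covariate terms); it is an input here, not estimated.
    """
    p_complete = NormalDist().cdf(linear_index)  # standard normal CDF
    return 1.0 / p_complete


# Hypothetical fitted indices, for illustration only.
example_indices = {
    "likely responder": 1.5,
    "even odds": 0.0,
    "unlikely responder": -1.5,
}

weights = {
    label: inverse_probability_weight(index)
    for label, index in example_indices.items()
}

for label, weight in weights.items():
    print(f"{label}: weight = {weight:.2f}")
```

A completer whose predicted completion probability is one half receives a weight of 2, and the weight grows as the predicted probability falls, which is how completers who resemble non-completers come to count for more in the regression.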
Table 5
AP Course Impact on Science Skill, STEM Interest, Confidence, Stress, and Grades

                                (1)             (2)       (3)
                                Control Group
Outcome                         Complier Mean   ITT       LATE
Science Skill                   -0.19           0.09      0.23
                                                (0.06)    (0.16)
                                                [0.15]    [0.14]
STEM Interest                   0.62            0.04      0.09
                                                (0.02)    (0.07)
                                                [0.16]    [0.16]
Confidence in College Science   0.92            -0.04     -0.10
                                                (0.02)    (0.05)
                                                [0.11]    [0.06]
Stress                          0.12            0.07      0.17
                                                (0.03)    (0.07)
                                                [0.02]    [0.01]
Grades in Science Courses       2.80            -0.12     -0.29
                                                (0.07)    (0.16)
                                                [0.08]    [0.07]
Grades in Other Courses         3.14            -0.07     -0.18
                                                (0.02)    (0.06)
                                                [0.00]    [0.00]
Number of Observations: 1,819 for grades; 1,417 for other outcomes

Notes: Science skill has been standardized to have a mean of 0 and SD of 1 for the full sample of participating students. STEM interest = 1 if high or some interest in pursuing a STEM degree, or = 0 if no interest. Confidence in college science = 1 if extremely or somewhat confident in ability to complete a college science course, or = 0 if somewhat not confident or not at all confident. Stress = 1 if most recent science course had strong negative or negative impact on physical or emotional health, or = 0 if strong positive impact, positive impact, or no impact. Grades in science and other courses are obtained from student transcripts and measure grades during the study year. Results, with the exception of grades during study year, are weighted by the inverse probability of completing the survey. Standard errors clustered by School x Cohort are in parentheses, and p-values are in brackets.
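The relationship between the ITT and LATE columns can be illustrated with a short calculation. Under the standard instrumental-variables (Wald) setup, which is assumed here rather than taken from this table, the LATE equals the ITT divided by the treatment-control difference in course take-up. The sketch backs out the take-up difference implied by each pair of estimates, using the Table 5 values with decimal points restored. Because the published estimates are rounded and the actual estimation uses covariates and weights, the implied values are only approximate.

```python
# (ITT, LATE) pairs as read from Table 5, decimal points restored.
table5 = {
    "Science Skill": (0.09, 0.23),
    "STEM Interest": (0.04, 0.09),
    "Confidence in College Science": (-0.04, -0.10),
    "Stress": (0.07, 0.17),
    "Grades in Science Courses": (-0.12, -0.29),
    "Grades in Other Courses": (-0.07, -0.18),
}


def implied_take_up_gap(itt: float, late: float) -> float:
    """Wald relation LATE = ITT / gap, rearranged to gap = ITT / LATE."""
    return itt / late


implied = {
    outcome: implied_take_up_gap(itt, late)
    for outcome, (itt, late) in table5.items()
}

for outcome, gap in implied.items():
    print(f"{outcome}: implied take-up difference of about {gap:.2f}")
```

The implied differences cluster in a fairly narrow band across the six outcomes, which is what one would expect if a single first stage underlies all of the LATE estimates.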
Table 6
Robustness Checks of Main ITT Results

Column key: (1) Control Group Complier Mean; (2) Main Results; (3) Robust SE; (4) p-value (permutation test); (5) Excluding High School 56; (6) Including Imputation of Missing Outcome Variables; (7) Excluding Covariates; (8) Excluding High School 23; (9) Lee Lower Bound; (10) Lee Upper Bound; (11) 95% Confidence Interval from Lee Bounds; (12) Ratio of 95% CI in (11) to 95% CI in (7)

Outcome                         (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)    (11)           (12)
Science Skill                   -0.19   0.09                    0.10    0.11    0.20    0.07    0.03    0.39    -0.09 to 0.51  2.0
                                        (0.06)  (0.05)          (0.00)  (0.00)  (0.00)  (0.00)  (0.07)  (0.07)
                                        [0.15]  [0.06]  [0.06]  [0.20]  [0.11]  [0.01]  [0.24]  [0.72]  [0.00]
STEM Interest                   0.62    0.04                    0.05    0.03    0.03    0.03    0.02    0.12    -0.03 to 0.18  1.9
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.04)
                                        [0.16]  [0.19]  [0.20]  [0.09]  [0.29]  [0.27]  [0.19]  [0.60]  [0.00]
Confidence in College Science   0.92    -0.04                   -0.03   -0.06   -0.06   -0.04   -0.06   0.05    -0.09 to 0.10  2.0
                                        (0.02)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.02)  (0.03)
                                        [0.11]  [0.05]  [0.07]  [0.37]  [0.02]  [0.03]  [0.10]  [0.00]  [0.17]
Stress                          0.12    0.07                    0.05    0.06    0.08    0.07    0.01    0.11    -0.05 to 0.15  1.6
                                        (0.03)  (0.02)          (0.00)  (0.00)  (0.00)  (0.00)  (0.03)  (0.02)
                                        [0.02]  [0.00]  [0.00]  [0.14]  [0.07]  [0.02]  [0.02]  [0.79]  [0.00]
Grades in Science Courses       2.80    -0.12                   -0.06   -0.10   -0.07
                                        (0.07)  (0.04)          (0.00)  (0.00)  (0.00)
                                        [0.08]  [0.01]  [0.01]  [0.31]  [0.16]  [0.30]
Grades in Other Courses         3.14    -0.07                   -0.07   -0.06   -0.03
                                        (0.02)  (0.03)          (0.00)  (0.00)  (0.00)
                                        [0.00]  [0.01]  [0.01]  [0.00]  [0.01]  [0.38]

For the two grades outcomes, the remaining columns are not applicable, as grades are based on transcripts rather than student survey.

Notes: Columns (1) and (2) repeat the main results previously shown in Table 5. Column (3) reports robust standard errors (rather than standard errors clustered by School x Cohort) and corresponding p-values. Column (4) reports the results of a permutation test whereby a pseudo treatment was randomly assigned 1,000 times. The p-value shows the share of these permutations where the absolute value of the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2). Column (5) reestimates the main results with high school 56 excluded. This school offered both AP biology and AP chemistry as part of the experiment. Column (6) reestimates the main results including observations where the outcome variable is missing and multiply imputed. Column (7) reestimates the main results excluding high school 23, which had a low rate of student survey completion and where surveys from cohort 1 were lost. Column (8) reestimates the main results excluding the vector of pre-treatment covariates (X_i) from Equation 1. Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (i.e., trimming off those treatment observations with the highest/lowest values of the outcome until the survey response rates are equal across treatment and control groups). Column (11) shows the 95% confidence interval from Lee Bounds (applying Imbens and Manski's (2004) method to derive confidence interval for the treatment effect itself).
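The permutation test described for Column (4) can be sketched as follows. This is a simplified illustration that assumes a plain difference in means as the effect estimate, whereas the estimates in the table are regression-based. The function names and the data are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, treated):
    """Mean outcome of the treated minus mean outcome of the controls."""
    treated_values = [y for y, d in zip(outcomes, treated) if d]
    control_values = [y for y, d in zip(outcomes, treated) if not d]
    return mean(treated_values) - mean(control_values)


def permutation_p_value(outcomes, treated, n_permutations=1000, seed=0):
    """Share of pseudo assignments whose absolute effect exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    pseudo = list(treated)  # copy so the real assignment is left untouched
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pseudo)  # reassign treatment, keeping group sizes fixed
        if abs(diff_in_means(outcomes, pseudo)) > observed:
            exceed += 1
    return exceed / n_permutations


# Hypothetical data: six treated and six control students.
outcomes = [0.2, 1.1, 0.9, 1.4, 0.8, 1.3, -0.3, 0.1, -0.5, 0.4, -0.1, 0.0]
treated = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

p_value = permutation_p_value(outcomes, treated)
print(f"observed effect = {diff_in_means(outcomes, treated):.3f}")
print(f"permutation p-value = {p_value:.3f}")
```

Shuffling the assignment vector keeps the number of treated students fixed in every pseudo assignment, so each draw is a valid relabeling of the same sample.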
1. The AP Test Fee Program awarded $28 million to 40 states, the District of Columbia, and the Virgin Islands in 2014 (US Department of Education 2014).
2. A relatively large literature in labor economics and the economics of education estimates the effect of advanced high school courses more generally, often without distinctions between AP and other rigorous course options. Nearly all of these nonexperimental studies find large positive effects of rigorous secondary school courses, particularly those in math and science, on students' high school, postsecondary, and labor market performance (e.g., Altonji 1995; Attewell and Domina 2008; Goodman 2012; Joensen and Nielsen 2009; Levine and Zimmerman 1995; Long, Conger, and Iatarola 2012; Rose 2004; Rose and Betts 2004).
3. The process by which a course is designated AP involves two steps. Teachers who plan to offer an AP course are encouraged (though not required) to attend a professional development training. The Board and other independent agencies offer several workshops, with the most extensive training being the AP summer institute, a week-long training that is led by an experienced AP instructor. Teachers are then expected to develop their syllabi for the course and submit them to the Board for review. A team of auditors at the Board reviews each syllabus and grants permission to a school to label the course as AP on course catalogs and student transcripts once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they do not meet the requirements upon original submission. College Board (2017b) contains a discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for assessment (i.e., course delivery and student performance are not assessed by the Board). In order to effectively run an AP Biology or Chemistry course, teachers require access to a well-equipped classroom and laboratory, including all supplies necessary to engage in experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the teachers in our study reported that their classrooms were well-stocked with supplies.
4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry, reasoning, and communication skills" (College Board 2017a). The revisions to science courses were based upon recommendations from the National Science Foundation, the National Research Council, and science educators across the United States (National Research Council 2002).
5. Students' perceptions of their own confidence and other personality traits are inherently influenced by their frames of reference in ways that other assessments of these traits (e.g., external observations) may be less influenced. By increasing the standard to which they compare themselves, students' confidence may decrease. This feature of most self-assessments could be considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome depends, to some extent, on how these changes in perceived ability influence other behaviors, such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
Table 6
Robustness Checks of Main ITT Results
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
Outcome
Control
Group
Complie
r Mean
Main
Result
s
Robus
t SE
p-value
(permutatio
n test)
Excludin
g High
School
56
Including
Imputatio
n of
Missing
Outcome
Variables
Excluding
Covariate
s
Excludin
g High
School
23
Lee
Lower
Boun
d
Lee
Upper
Boun
d
95
Confidence
Interval
from Lee
Bounds
Rati
o of
95
CI in
(11)
to
95
CI in
(7)
Science Skill -019 009 010 011 020 007 003 039
-
009
05
1 20
(006) (005) (000) (000) (000) (000) (007) (007)
[015] [006] [006] [020] [011] [001] [024] [072] [000]
STEM Interest 062 004 005 003 003 003 002 012
-
003
01
8 19
(002) (003) (000) (000) (000) (000) (003) (004)
[016] [019] [020] [009] [029] [027] [019] [060] [000] Confidence in College
Science 092 -004 -003 -006 -006 -004 -006 005
-
009
01
0 20
(002) (002) (000) (000) (000) (000) (002) (003)
[011] [005] [007] [037] [002] [003] [010] [000] [017]
Stress 012 007 005 006 008 007 001 011
-
005
01
5 16
(003) (002) (000) (000) (000) (000) (003) (002)
[002] [000] [000] [014] [007] [002] [002] [079] [000]
Grades in Science Courses 280 -012 -006 -010 -007 |
(007) (004) (000) (000) (000)
[008] [001] [001] [031] [016] [030] Not applicable as grades are based on transcripts
Grades in Other Courses 314 -007 -007 -006 -003 rather than student survey
(002) (003) (000) (000) (000) |
[000] [001] [001] [000] [001] [038]
Notes Columns (1) and (2) repeat the main results previously shown in Table 5 Column (3) reports robust standard errors (rather than
standard errors clustered by School x Cohort) and corresponding p-values Column (4) reports the results of a permutation test whereby
a pseudo treatment was randomly assigned 1000 times The p-value shows the share of these permutations where the absolute value of
43
the estimated pseudo treatment effect exceeded the absolute value of the estimated treatment effect shown in column (2) Column (5)
reestimates the main results with high school 56 excluded This school offered both AP biology and AP chemistry as part of the
experiment Column (6) reestimates the main results including observations where the outcome variable is missing and multiply
imputed Column (7) reestimates the main results excluding high school 23 which had a low rate of student survey completion and
where surveys from cohort 1 were lost Column (8) reestimates the main results excluding the vector of pre-treatment covariates (Xi)
from Equation 1 Columns (9) and (10) show the lower and upper bound estimate based on Lee (2009) (ie trimming off those
treatment observations with the highestlowest values of the outcome until the survey response rates are equal across treatment and
control groups) Column (11) shows the 95 confidence interval from Lee Bounds (applying Imbens and Manskirsquos (2004) method to
derive confidence interval for the treatment effect itself)
30
1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the
Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally often without distinctions between AP
and other rigorous course options Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses particularly those in math and science on studentsrsquo
high school postsecondary and labor market performance (eg Altonji 1995 Attewell and
Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long
Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004) 3 The process by which a course is designated AP involves two steps Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training. The Board and other independent agencies offer several workshops, with the most
extensive training being the AP summer institute, a week-long training that is led by an
experienced AP instructor. Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review. A team of auditors at the Board reviews each syllabus and
grants permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved. Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission. College Board (2017b) contains a
discussion of the annual "AP Course Audit." This review of syllabi is the only mechanism for
assessment (i.e., course delivery and student performance are not assessed by the Board). In
order to effectively run an AP Biology or Chemistry course, teachers require access to a
well-equipped classroom and laboratory, including all supplies necessary to engage in
experimentation (e.g., beakers, solutions, microscopes, measuring equipment). Most of the
teachers in our study reported that their classrooms were well-stocked with supplies.

4. In 2012, the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).

5. Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. If AP raises the standard to which students compare
themselves, their confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.

6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry, and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.

7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.

8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.

9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.

10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.

11. Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.

12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).

13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.

14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.

15. Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.

16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.

17. The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.

18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.

19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.

20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school by cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school by cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\hat{\mu}_j + X_i\hat{\rho})$ and gives
more weight in the regression to study participants who completed the survey and yet had pre-study
characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5) and
with 90% of students receiving a weight less than 2.0.

21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.

22. Detailed course-by-course impacts are available from the authors.

23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.

24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple-choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative, inquiry-based environment.

25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.

26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.

27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).

28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.

29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg correction for multiple comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.

30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.

31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.

32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.

33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
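
As an illustration of the Benjamini-Hochberg correction mentioned in note 29, the following is a minimal Python sketch of the standard step-up procedure. The function name `benjamini_hochberg` and any p-values passed to it are hypothetical; this is not the authors' code, and the paper's actual p-values are not reproduced here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis whose rank is at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected
```

Because the rule is step-up, a hypothesis whose own p-value misses its rank-specific threshold can still be rejected when a larger p-value further down the ordering meets its threshold.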
30
1 The AP Test Fee Program awarded $28 million to 40 states the District of Columbia and the
Virgin Islands in 2014 (US Department of Education 2014) 2 A relatively large literature in labor economics and the economics of education estimates the
effect of advanced high school courses more generally often without distinctions between AP
and other rigorous course options Nearly all of these nonexperimental studies find large positive
effects of rigorous secondary school courses particularly those in math and science on studentsrsquo
high school postsecondary and labor market performance (eg Altonji 1995 Attewell and
Domina 2008 Goodman 2012 Joensen and Nielsen 2009 Levine and Zimmerman 1995 Long
Conger and Iatarola 2012 Rose 2004 Rose and Betts 2004) 3 The process by which a course is designated AP involves two steps Teachers who plan to offer
an AP course are encouraged (though not required) to attend a professional development
training The Board and other independent agencies offer several workshops with the most
extensive training being the AP summer institute a week-long training that is led by an
experienced AP instructor Teachers are then expected to develop their syllabi for the course and
submit them to the Board for review A team of auditors at the Board review each syllabus and
grant permission to a school to label the course as AP on course catalogs and student transcripts
once the syllabus has been approved Teachers are also able to revise and resubmit syllabi if they
do not meet the requirements upon original submission College Board (2017b) contains a
discussion of the annual ldquoAP Course Auditrdquo This review of syllabi is the only mechanism for
assessment (ie course delivery and student performance are not assessed by the Board) In
order to effectively run an AP Biology or Chemistry course teachers require access to a well-
equipped classroom and laboratory including all supplies necessary to engage in
experimentation (eg beakers solutions microscopes measuring equipment) Most of the
teachers in our study reported that their classrooms were well-stocked with supplies 4 In 2012 the Board began redesigning several AP courses and exams to prioritize depth of
learning over breadth of coverage and to place "greater emphasis on discipline-specific inquiry,
reasoning, and communication skills" (College Board 2017a). The revisions to science courses
were based upon recommendations from the National Science Foundation, the National Research
Council, and science educators across the United States (National Research Council 2002).
5. Students' perceptions of their own confidence and other personality traits are inherently
influenced by their frames of reference in ways that other assessments of these traits (e.g.,
external observations) may be less influenced. By increasing the standard to which they compare
themselves, students' confidence may decrease. This feature of most self-assessments could be
considered a measurement error that biases treatment effects (Dobbie and Fryer Jr. 2015; West et
al. 2016). Whether this is considered a bias or a true treatment effect on a meaningful outcome
depends to some extent on how these changes in perceived ability influence other behaviors,
such as effort and academic motivation.
6. The Board recommends Chemistry I and Algebra II as prerequisites for AP Chemistry and
Biology I and Chemistry I for AP Biology, with no additional requirements beyond these
prerequisites.
7. We offered each participating school $10,000 to pay for teacher attendance at a one-week
training course, classroom supplies (e.g., lab materials, textbooks), and to compensate schools
for the staff time required for study administration efforts. We also offered $1,000 compensation
for an individual selected by the school to serve as a liaison between the study team and the
school to assist in collecting consent/assent forms and data.
8. In our original research design, we planned to recruit 40 schools with up to two cohorts of
students, which would have powered the study to detect effect sizes smaller than those detected
here. We faced several challenges in recruiting schools to participate, even with the monetary
incentives. Some schools were uncomfortable with randomization across classrooms, while
others were simply unable to commit to offering a new course the following school year.
9. Most of the assignments were made in one batch in the spring of the year prior to when the
course would be offered. We also made some assignments on a rolling basis as additional
consent/assent forms were submitted. We have no information on the students who were deemed
eligible by the school to take the new AP science course but who did not sign the consent form
to participate. As these students did not participate, we do not have permission to obtain
information on their characteristics (e.g., via transcripts), and for most schools we do not know
the number of such students.
10. Participating districts include Anaheim Union High School District, California; East Side
Union High School District, California; Lynwood Unified School District, California; Jefferson
Parish, Louisiana; Education Achievement Authority, Michigan; Charlotte-Mecklenburg
Schools, North Carolina; Winston-Salem/Forsyth Schools, North Carolina; Cranston Public
Schools, Rhode Island; El Paso Independent School District, Texas; Metropolitan Nashville
Public Schools, Tennessee; and Richmond Public Schools, Virginia.
11. Though the study teachers are less likely to hold a master's degree, many of the graduate
degrees held by teachers nationally are likely to be in education (not STEM). Thus, the study
teachers are less likely to have a graduate degree but not necessarily less likely to have STEM
training. We also did not survey teachers regarding their Teach for America (TFA) experience,
but it is possible that the relatively high share of STEM undergraduate degrees could be driven
by higher representation of TFA teachers.
12. We developed the instrument over a two-year period and pilot tested it three times (the last
pilot test included 140 students) prior to administering the tool to study participants. Reliability
metrics are high (inter-item reliability = 0.99; person reliability = 0.71). Further documentation of
the development of the assessment instrument in the survey can be found in Seeratan et al.
(2017).
13. Each year in the spring semester, our team administered and collected the participant surveys
during the school day in classrooms set aside for survey administration.
14. Unweighted results, which are similar, are contained in the Online Appendix tables. However,
if study participants who did not take the survey differ in unobserved ways, then our reweighting
based on observed characteristics will not eliminate nonignorable nonresponse bias.
15. Online Appendix Table 1 presents the balance between treatment and control group members'
characteristics before imputation of missing values (as described below); these results are very
similar to those shown in Table 2.
16. Given the high degree of correlation between students' 8th and 10th grade scores (and the fact
that some students did not have 10th grade scores), we created one reading and math score for
each student that is the average of both scores or just the 8th grade score. For the 23 participating
students who were in 10th grade during the year in which the AP course was offered to their
cohort, we only use the student's 8th grade test scores, as their 10th grade test scores would be
endogenous.
17. The study's Principal Investigator personally handled the randomization of offers of
enrollment in the course, so the lack of balance is simply due to unlucky randomization rather
than manipulation by school administrators. We considered implementing a randomized block
design to avoid such issues but found it infeasible to obtain the necessary test score information
prior to randomization, given the schools' needs for speed in deciding whether the student was
allowed to register for the new class. We added an entire planning year to our study design to
avoid these issues, but many schools were unable to plan ahead.
18. To evaluate whether this non-compliance is ignorable, we conducted the test recommended by
Huber (2013) for each of our six main outcomes (i.e., those subsequently shown in Table 5). We
find that we cannot reject the null hypothesis that non-compliance is ignorable for any of these
six outcomes, which suggests that generalizing our estimated treatment effects to the full control
group may not be unreasonable.
19. One district in our study offered both AP courses. Students in this school were randomly
offered enrollment in an AP course and then given the option of Chemistry or Biology. To
account for the two courses offered, we treat the school as two separate groups: School-Chemistry
and School-Biology. For those students who were not offered an AP course, we
randomly assign them to one of two control groups, proportional to the number of treated
students who chose each course. For example, if 60% of the treated students chose Biology, then
we randomly assign 60% of the control students to the School-Biology control group. In Section
V.C, we show that our results are also robust to dropping this school entirely.
20. To compute these weights, we first estimate the parameters of the following equation using a
probit regression: $\Pr(\text{CompletedSurvey}_{ij} = 1) = \Phi(\mu_j + X_i\rho + \epsilon_{ij})$, where $\text{CompletedSurvey}_{ij}$ equals 1
if student $i$ in school-by-cohort $j$ completed any part of the end-of-year survey, $X_i$ is the same
vector of pre-treatment characteristics in the previous equations, $\mu_j$ are school-by-cohort fixed
effects, and $\Phi(\cdot)$ is the cumulative normal distribution function. The results of this regression are
included in Online Appendix Table 2. Students who had higher pre-treatment grades, Black
students, those who were not disabled, and those who took prerequisite courses were more likely
to complete the survey. The inverse probability weight is computed as $1/\Phi(\mu_j + X_i\rho)$ and gives
more weight in the regression to study participants who completed the survey and yet had
pre-study characteristics that were similar to those study participants who did not complete the
survey. These weights range from 1.0 to 15.5, with a median (mean) weight of 1.2 (1.5), and
with 90% of students receiving a weight less than 2.0.
21. We impute with the full dataset yet only estimate regressions on the sample for which we
observe each outcome variable. This follows a multiple imputation, then deletion strategy
suggested by Hippel (2007), which improves efficiency while protecting against problematic
imputed outcome values. As a robustness check, Section V.C provides results including
imputation of missing outcome variables.
22. Detailed course-by-course impacts are available from the authors.
23. Online Appendix Table 4 shows the unweighted results that are comparable to Table 4.
24. Online Appendix Table 5 shows that AP science students report a much more intellectually
challenging curriculum with more homework than non-AP complier students. Treatment group
students are also more likely to report that the students in their class were driven to succeed and
that the teacher set high standards. The AP science class also involved more student-led projects
or experiments, hands-on learning, and small group work, all activities that are deemed to be
essential to an inquiry-based classroom (Bennett et al. 2010; National Research Council 2012).
Yet we do not find strong evidence that students in AP classes were more likely to present what
they learned, apply their knowledge to solve a new problem, or work independently, and none of
the component measures of technology usage were statistically significantly affected. Nor did
treatment teachers rely less on lecturing and multiple choice quizzes. Thus, teachers appear
better able to implement the academic rigor expected of an AP science class than some of the
inquiry-based approaches that the College Board intends for AP science. We do not find
evidence that taking AP science led students to be more likely to report that they found their
course more interesting, which may reflect the inability of the teachers to fully implement a
creative inquiry-based environment.
25. In addition to the teacher spillover effects, peer interactions could cause contamination effects
that might render our estimated effects smaller. A research design with randomization both
across and within schools would allow for estimation of spillover effects, but such a design was
infeasible due to the challenge and added cost of recruiting control schools.
26. Note that 30 percent of treatment group compliers and 21 percent of control group compliers
received a grade of C (i.e., 2.0) or lower in their most recent science class.
27. Grade weights may also affect AP participation, which is one of the main purposes of the
weights (Klopfenstein and Lively 2016).
28. See Abadie et al. (2017) for a discussion of the use of robust versus clustered standard errors
in models with fixed effects.
29. To address the possibility of a heightened probability of Type I error given the multiple
outcomes, we also apply the Benjamini-Hochberg Correction for Multiple Comparisons
(Benjamini and Hochberg 1995). At a preferred critical p-value of 0.05, we find that the same
three outcomes that reach statistical significance without applying the correction (shown in
Column (2) of Table 6) remain statistically significant after applying the correction.
30. This school had a 27 percent response rate, partially as a result of all of the completed surveys
from the first cohort of administration being lost in transit after they were completed.
31. We are probably also a bit overly conservative in computing the Lee bounds estimates, as we
have included the students from cohort 1 of high school number 23, where nonresponse was due
mostly to the surveys being lost after administration.
32. We tested for heterogeneity in treatment effects along seven student and teacher attributes
(including student prior academic preparation, race/ethnicity, gender, and teacher preparation).
We also conducted ITT quantile regressions for the continuous outcomes (science skill, grades in
science, and grades in other courses). Some of the differences in the point estimates were quite
large, yet so too were the standard errors. For instance, five of the seven estimated differential
treatment effects on science skill exceed 0.25 standard deviations, with p-values that fall in the
suggestive (0.16) to noisy (0.34) range.
33. We are currently in the process of gathering records from the National Student Clearinghouse
on all three cohorts of study participants. Once data collection is complete, we will have the
ability to examine the effect of AP science on college enrollment, college selectivity, and college
completion.
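
Illustration for note 20. The inverse probability weight described in note 20 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the fitted probit index values and the student labels below are hypothetical, and the probit regression itself is assumed to have been estimated elsewhere. Only the final step, taking one over the normal CDF of the fitted index, is shown.

```python
from statistics import NormalDist


def inverse_probability_weight(fitted_index: float) -> float:
    """Return 1 / Phi(fitted_index) for one survey respondent.

    fitted_index stands in for the fitted probit index (the school-by-cohort
    fixed effect plus the covariate term); Phi is the standard normal CDF.
    """
    response_probability = NormalDist().cdf(fitted_index)
    return 1.0 / response_probability


# Hypothetical fitted index values for three respondents (not study data).
fitted_indices = {"student_a": 2.0, "student_b": 0.0, "student_c": -1.0}

weights = {
    student: inverse_probability_weight(index)
    for student, index in fitted_indices.items()
}

for student, weight in weights.items():
    print(f"{student}: weight = {weight:.2f}")
```

A respondent whose characteristics predict a low probability of completing the survey (a low fitted index) receives a larger weight, which is how respondents who resemble nonrespondents come to count for more in the weighted regressions. Because a probability cannot exceed one, every weight is at least 1.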