An 8-Year Review and Lessons Learned from Federal Education Evaluations: 2002-2009 by M. Cahalan and S. Stout


Description

This AEA presentation gives results from a systematic review of all reports published by the two offices from 2002 to 2009.

Transcript of An 8-Year Review and Lessons Learned from Federal Education Evaluations: 2002-2009 by M. Cahalan and S. Stout

Page 1

An 8-Year Review and Lessons Learned from Federal Education Evaluations: 2002-2009

Prepared by: Margaret Cahalan, Sharon Stout

Prepared for: AEA Meeting, November 13, 2009

Draft not for Quotation or Citation: All views and data presented are solely those of the authors and do not reflect any review or authorization from the US Department of Education.

Page 2

A Review of IES/NCEE & OPEPD/PPSS Research Reports

• Part 1: Overview: all reports listed on the offices' websites and published between 2002 and 2009; looked at level, methods, topics, and findings

• Part 2: Case study example of difficulties in using evaluation research for PART, budget, and program reform: college access and persistence

• Part 3: Lessons Learned--New Directions/Models for Evaluation

Page 3

Part I: Overview of Studies of Two ED Offices

• 2002-2009: period of transition between offices

– OPEPD/PPSS: Office of Planning, Evaluation and Policy Development / Policy and Program Studies Service (known as PPSS in this presentation)

• Office responsible for most evaluations prior to 2002

• Reports to the OPEPD Assistant Secretary

• Informal agreement that it would do primarily implementation evaluation studies after 2002

– IES/NCEE: Institute of Education Sciences / National Center for Education Evaluation (known as IES in this presentation)

• Created in the 2002 reorganization (formerly OERI)

• Independent, with a Commissioner confirmed by Congress

• Legislative mandate for evaluations: scientific approach (medical model), experimental design, random assignment

Page 4

Published Reports from IES/NCEE and OPEPD/PPSS: 2002-2009

117 evaluation reports in total:

• IES/NCEE: primarily experimental design; 27 reports (2003-2009); 15 studies (a study with multiple reports counted once)

• OPEPD/PPSS: primarily implementation studies; 90 reports (2002-2009); 50 studies (a study with multiple reports counted once)

Page 5

IES/NCEE Studies: 2003-2009. 27 reports of 15 studies (largely practice based)

1. Teacher Induction

2. Supplemental Reading Comprehension Interventions

3. DC Opportunity Scholarship Program

4. Student Mentoring Program

5. Reading and Mathematics Software Products

6. Teachers Trained Through Different Routes to Certification

7. Professional Development Strategies to Improve Early Reading

8. Reading First Impact Study Comprehension

9. Classroom Literacy Interventions and Outcomes in Even Start (started by PPSS)

10. Enhanced Reading Opportunities Study

11. Four Early Elementary School Math Curricula

12. Enhanced Academic Instruction in After-School Programs (48 more hours of instruction)

13. Striving Readers Study Closing the Reading Gap

14. Early Reading First

15. National Evaluation of 21st Century Community Learning Centers (started by PPSS)

Page 6

PPSS: 50 Studies (Largely Program Based)

• Career and Technical Education (CTE)-3

• Federal college access and retention programs TRIO and GEAR UP—12

• Title 1 (general, SES, Choice, funding, accountability, AYP & improvement)—9

• Single sex schools –1

• Private schools under Title 1-1

• Literacy through Libraries Program-1

• Reading First-2

• Teacher Quality-4

• Comprehensive School Reform—2

• Class Size Reduction-1

• Voluntary school choice-1

• Small Learning Communities-1

• Faith Based-1

• Technology & Data System Use—5

• Charter Schools-2

• Sexual misconduct-1

• Magnet Program-1

• Migrant Program—Title 1

• Even Start Program-1

Page 7

Distribution of Studies by School Level

IES/NCEE: 15 studies; OPEPD/PPSS: 50 studies

Page 8

Topic Summary: IES/NCEE

Page 9

Topics for OPEPD/PPSS Elementary-Secondary Studies

Page 10

Topics/Programs of Studies with Reports Published in Secondary-Postsecondary by PPSS: 2002-2009

Page 11

Major Method IES/NCEE: 15 Studies

Page 12

OPEPD/PPSS Major Methods: 50 Studies

Page 13

Summary of Effects IES/NCEE: 15 Studies

Page 14

Effects Summary: 23 PPSS Studies that Report Comparative Effects or Outcomes

Page 15

15 IES/NCEE Studies: Effects Summary

• Reading: 6 studies (one 9th grade study positive; 3 small positive effects on decoding, not comprehension; 2 with no effects detected; findings that practices were different but not outcomes; the one study with positive results, the 9th grade study, involved more time on reading)

• DC Voucher: 1 study. First two years, no effects; third year began to see some effects for reading but not for math; 26 percent did not take up the opportunity

• Mentoring: 1 study. No overall positive effects; some positive effects for some subgroups; 60 percent of controls also had mentoring

• Reading and Math Software: 1 study (no positive effects detected overall)

• Teacher (reading PD, induction, alternative certification): 3 studies (no effects detected overall on student outcomes; some effects on teacher practice for reading)

• 4 Math Curricula: 1 study. 2 curricula had positive effects relative to the other curricula

• Afterschool: 2 studies. One found no effects on academics or behavior; the second found positive effects on math but not reading (48 more hours of instruction)

Page 16

Observations: 15 IES/NCEE Studies

• Topics
– Lack of direct link to a federal program, but sought more understanding of the intervention
– Most funded by program funds (Title 1 and Title 2 of ESEA/NCLB)

• Effects
– Few overall effects, especially on students when the counterfactual was existing practice
– Observed some effects on teacher practices associated with the intervention that did not translate into changes in students' test scores
– Some effects seen when time spent on the subject increased
– Effects sometimes emerge over time when the intervention is sustained

• Issues
– Counterfactual: why would one not assume the control group would use science-based practices?
– Contamination of the control group; spillover issues
– Intent-to-treat dropouts
– Issues implementing the intervention
– Lack of external validity across time and space: do the same circumstances ever happen again?
– Moving targets for curricula and technology

Page 17

K-12 PPSS Studies: Topics and Effects Summary

• Title 1, general & funding: Proficiency increased; however, most states will not meet goals by 2013-14. Over 1,100 schools were in need of improvement by 2006, and most were in advanced stages of corrective action. Did not show changes in time on subjects. Poorest schools get less per capita; federal funds are more targeted than state and local funds

• Teacher quality: Over 90 percent of teachers met the highly qualified standard; however, those in high-poverty schools have less experience, are less likely to have a degree in the subject they teach, and receive less professional development than those in lower-poverty schools

• Supplemental Educational Services (SES): Increase in the number of schools eligible; however, the participation rate is stable at about 17 percent. Those who participated were more likely to show growth than eligible non-participants in a sub-study of districts (selection-effects issues). Waiver studies showed mixed results: a higher proportion of eligible students participate when the district provides the service; in two large districts participants did better, in two small districts they did not

• Choice / Voluntary School Choice: About 1 percent of students participating. Mixed results: some studies show no difference in outcomes compared with non-participants; one found participating students did better. Most districts kept choice within the district, and most students did not go to high-performing schools when transferring out of low-performing schools

• Reading First: More time on reading and adoption of the practices; limited evidence that 3rd and 4th graders increased reading more than those not in the program; regression discontinuity design

Page 18

K-12 PPSS Studies (continued): Topics and Effects Summary

• Charter schools: A facilities study found loans were given to schools that served a more needy population than other charter schools in their areas; most would not have qualified for private loans (too high risk). A descriptive study on characteristics focused on accountability

• Comprehensive School Reform: No effects for those who got grants; evidence that those who implemented the practices, and over a longer term, had increased growth, especially in math

• Small Learning Communities and class size: Primarily an implementation study; performance reports before and after for schools found decreases in violence incidents and changes in 9th grade indicators, dropout rates, and plans for college; no change in test scores

• Even Start: Random assignment; no difference detected on major child or parent outcomes; counterfactual issues

• Technology: A meta-analysis found significant effects for online courses (small number of studies; see the pooling sketch after this list). Extensive reports on the status of technology use and access in schools and classrooms document growth in computer and internet use and also less use in poorer schools; use is generally not related to test score differences

• Teaching history & libraries: History professional development: most who took it were experienced teachers; it increased knowledge. Library grants increased materials and were associated with higher than expected growth in test scores
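A minimal sketch of the inverse-variance pooling behind a finding like the online-course meta-analysis above. The effect sizes and standard errors are invented for illustration; they are not the study's data.

```python
# Fixed-effect meta-analytic pooling of study effect sizes (illustrative only).
import numpy as np

# Hypothetical (effect size, standard error) pairs from a small set of studies
effects = np.array([0.35, 0.10, 0.25, 0.42, 0.05])
ses = np.array([0.15, 0.12, 0.20, 0.18, 0.10])

weights = 1 / ses**2                           # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))
z = pooled / pooled_se
print(f"Pooled effect: {pooled:.2f} (SE {pooled_se:.2f}, z = {z:.2f})")
```

With only a handful of studies, as the slide notes, the pooled estimate is fragile; a random-effects model would widen the interval further.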

Page 19

Observations: 50 PPSS Studies

• Topics
– Studies have a more direct link to the program
– Attempts to get at the overall impact of a program became black-box studies
– Studies initiated after the reorganization were implementation studies that often combined descriptive work with comparative observation of outcomes of policies or services linked to NCLB requirements, plus projections
– The intervention studied was more likely to be a supplemental service rather than a replacement of curriculum

• Effects
– More positive effects observed, using propensity matching and matched comparisons, larger n's, and aggregated data or administrative student records (see the matching sketch after this list)
– Larger effects seen when time spent on the subject or in the program increased

• Issues
– Selection effects (voluntary participation; interaction of service and motivation)
– Difficulty constructing adequate comparison groups
– Contamination of the control group; spillover issues
– Time scale: the study usually starts when the program starts; intervention implementation issues
– More external validity; less rigorous designs for internal validity
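As referenced in the Effects bullet above, here is a minimal sketch of propensity-score matching of the general kind these PPSS studies used: estimate each student's probability of participation from observed covariates, match participants to similar non-participants, and compare outcomes. The covariates, participation model, and the assumed +0.3 effect are all simulated assumptions, not any study's actual files.

```python
# Propensity-score matching sketch on simulated data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
# Hypothetical baseline covariates: prior achievement and family income (standardized)
X = rng.normal(size=(n, 2))
# Participation is more likely for lower-achieving, lower-income students (assumption)
p_participate = 1 / (1 + np.exp(0.7 + 0.5 * X[:, 0] + 0.8 * X[:, 1]))
treat = rng.binomial(1, p_participate)
# Outcome depends on covariates plus an assumed +0.3 participation effect
y = 0.4 * X[:, 0] + 0.2 * X[:, 1] + 0.3 * treat + rng.normal(size=n)

# 1) Estimate propensity scores from the observed covariates
ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

# 2) Greedy 1:1 nearest-neighbor matching on the propensity score
treated_idx = np.flatnonzero(treat == 1)
available = set(np.flatnonzero(treat == 0))
matches = []
for i in treated_idx:
    j = min(available, key=lambda k: abs(ps[k] - ps[i]))  # closest unused control
    matches.append((i, j))
    available.remove(j)

# 3) Average treated-minus-matched-control outcome difference (effect on participants)
att = np.mean([y[i] - y[j] for i, j in matches])
print(f"Estimated effect on participants (ATT): {att:.3f}")
```

The design only removes bias from the covariates that enter the propensity model, which is exactly the selection-effects caveat noted under Issues.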

Page 20

Part 2: Example of Issues Faced in Using Evaluations for PART and Budget

PPSS Postsecondary College Access and Retention Studies: Case History

Page 21

Table 1. Summary of Selected Federal College Support Programs: FY07-08
(For each program: year begun; total funding; number of grants; number served; average award; amount per person served; PART rating and year rated)

• Educational Opportunity Centers (1974): $47,006,891; 125 grants; 195,795 served; $376,005 average award; $240 per person served; Results Not Demonstrated (2005)
• McNair (1989): $45,604,173; 190 grants; 5,210 served; $240,021 average award; $8,753 per person served; Moderately Effective
• Student Support Services (1970): $271,566,777; 949 grants; 199,499 served; $286,160 average award; $1,361 per person served; Moderately Effective (2003)
• Talent Search (1971): $142,884,182; 471 grants; 366,330 served; $303,363 average award; $390 per person served; Moderately Effective (2005)
• Upward Bound (1965): $328,528,379; 1,017 grants; 71,247 served; $323,036 average award; $4,611 per person served; Ineffective (2002)
• Upward Bound Math Science (1991): $33,976,483; 126 grants; 6,808 served; $269,654 average award; $4,990 per person served; Ineffective (2002, with UB)
• Veterans Upward Bound (1974): $13,351,830; 46 grants; 5,660 served; $290,257 average award; $2,358 per person served; Ineffective (2002, with UB)
• GEAR UP State Grants (1998): $114,929,000; 40 grants; 424,513 served; $2,873,225 average award; $271 per person served; Adequate (2003)
• GEAR UP Partnership Grants (1998): $186,953,000; 174 grants; 331,245 served; $1,074,442 average award; $564 per person served; Adequate (2003)

Page 22

Rough Estimates of Percent of Eligible Population Served

Page 23

PPSS College Access and Postsecondary Studies: Topics, Effects Summary, and Policy History

• Student Support Services (SSS): Propensity matching found positive impacts of participation in SSS-supported services, whether funded by the SSS project or other sources, on college retention and completion. PART: Moderately Effective (2003)

• Talent Search: Propensity matching, using longitudinal databases in 3 states, found positive impacts on high school completion, financial aid, and postsecondary entrance. PART: Moderately Effective (2005)

• GEAR UP: A matched comparison of 7th graders in the original cohort, re-surveyed in 8th grade, found increases in parent knowledge and aspirations and in student knowledge. Did not find an impact on academic grades or course taking overall; found an impact on course taking for some subgroups. PART: Adequate (2003)

• Upward Bound Math/Science: Propensity matching found positive impacts on college entrance, 4-year degree completion, planning to major in math or science, and majoring in social science; majoring in math or natural science was positive but not statistically significant (small n's)

Page 24

Upward Bound: Ineffective, PART 2002

• Gold-standard random assignment study, with inverse probability weights for national estimates, begun in the early 1990s and followed until 2004 (see the weighted-estimate sketch after this list)
• Several Mathematica reports over 10 years (1999-2009) reported no overall positive findings but large effects for students more academically at risk and with lower expectations at baseline
• On the basis of the study, the program was given an "ineffective" rating in PART in 2002
• Zero funding in the Bush Budget for FY2005 and FY2006
• OMB efforts to reform based on the study results: UB initiative and Absolute Priority (1/3 at risk)
• New IES evaluation study started in 2006; large opposition by UB grantees; Congress cancelled it in 2007
• HEOA 2008 contains a prohibition on certain random assignment studies in which projects would be required to recruit more students than they normally do for purposes of the study; HEOA also cancelled the Absolute Priority
• Obama budgets level-funded the programs in FY10 despite establishing a large college access and completion innovation fund
• Recent re-analyses by an ED staff person who was the COR for the study found the estimates were confounded by bias in favor of the control group, caused by one project with extreme weight (26 percent of the total) that had a large imbalance between treatment and control groups: control group members were more likely to have high expectations, less likely to be academically at risk, and more likely to be older at the start of the study
• The re-analyses found statistically significant and substantial impacts of UB on the key outcomes: postsecondary entrance, award of financial aid, and postsecondary degree completion
• The re-analysis found a treatment-on-the-treated (TOT) increase of 50 percent in BA attainment for the average of 66 of the 67 projects in the sample
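As referenced in the first bullet, here is a minimal sketch of the design-weighted (inverse-probability-weighted) intent-to-treat contrast that national estimates of this kind rely on. The weights, assignment shares, and outcome model below are placeholders, not the UB evaluation data.

```python
# Design-weighted ITT difference in means (illustrative only).
import numpy as np

rng = np.random.default_rng(3)
n = 2500
weight = rng.lognormal(0.0, 1.0, n)            # hypothetical inverse-probability design weights
assigned = rng.binomial(1, 0.5, n)             # random assignment indicator
completed_ba = rng.binomial(1, 0.25 + 0.05 * assigned)  # assumed outcome model

def weighted_mean(y, w):
    """Weighted mean of y using design weights w."""
    return np.sum(w * y) / np.sum(w)

itt = (weighted_mean(completed_ba[assigned == 1], weight[assigned == 1])
       - weighted_mean(completed_ba[assigned == 0], weight[assigned == 0]))
print(f"Design-weighted ITT difference in BA attainment: {itt:.3f}")
```

Because each observation contributes in proportion to its weight, a single project holding a quarter of the total weight can dominate this contrast, which is the core of the re-analysis critique.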

Page 25

National Evaluation of Upward Bound: 1992 to 2003-04

• Controversial study: random assignment combined with a national probability sample (very rare)

• Mathematica reported a lack of overall impact but seemingly large impacts for higher academic risk students and those with lower expectations

• Ineffective PART rating; OMB attempted to reform the program with UB initiatives and an Absolute Priority to serve more at-risk students, which fit with Bush administration priorities

• Bush Budget called for zero funding of all pre-college programs in FY05 and FY06

Page 26

Policy History (continued)

• UB 2006 Absolute Priority to serve 1/3 at-risk students and 9th graders

• New IES random assignment evaluation study begun in 2006

• Following strong objections from UB projects and the Council for Opportunity in Education (COE), Congress prohibited funding in 2007, and the study was cancelled by ED in 2008

Page 27

HEOA 2008 & TRIO FY10 Budget

• HEOA 2008
– Affirms the goal of evaluation to identify: what are best practices to achieve the goals of the program?
– Contains a prohibition on certain types of random assignment studies that would require projects to recruit more students than they would normally serve solely for the purposes of the study (denial-of-service issues; work with IRBs)
– Cancelled the At-Risk Absolute Priority
– Specifies that a new rigorous UB study should begin by 2010

• Obama Budget for FY10: level-funded TRIO and GEAR UP while putting large new money into college access and completion innovation funds to attain priorities (5 million new degrees)

Page 28

ED PPSS Staff QA Re-examination of Sample Design and Re-Analyses

• Found that several serious bias issues combined to confound the results published by Mathematica in 2004 and 2009 concerning the UB program

• The re-examination found that the study impact estimates suffer from: 1) uncontrolled bias in favor of the control group; 2) unequal weighting, with one project carrying 26 percent of the weight; and 3) serious representational issues with regard to 4-year grantees

• When these issues are addressed using standard statistical methods, the study shows strong positive impacts for the UB program

Page 29

Basic Assumptions of Random Assignment Studies

1. The sample is representative of the population to which one wishes to generalize

2. Treatment and control groups are equivalent (see the balance-check sketch after this list)

3. Treatment and control groups are treated equally except for the treatment

4. Treatment and control groups are mutually exclusive with regard to the treatment
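A minimal sketch of the kind of covariate balance check that probes assumption 2, using standardized mean differences on baseline characteristics. The variable names echo those discussed in the re-analysis (expectations, academic risk, age), but the data are simulated, not the evaluation files.

```python
# Baseline covariate balance check via standardized mean differences (illustrative only).
import numpy as np
import pandas as pd

def standardized_mean_diff(x_t, x_c):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled_sd

# Hypothetical baseline file for a randomized study
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "treat": rng.binomial(1, 0.5, 1500),
    "age_at_baseline": rng.normal(14.5, 0.8, 1500),
    "academic_risk": rng.binomial(1, 0.4, 1500),
    "high_expectations": rng.binomial(1, 0.6, 1500),
})

for cov in ["age_at_baseline", "academic_risk", "high_expectations"]:
    smd = standardized_mean_diff(df.loc[df.treat == 1, cov],
                                 df.loc[df.treat == 0, cov])
    flag = "  <-- check" if abs(smd) > 0.1 else ""   # common 0.1 rule of thumb
    print(f"{cov:20s} SMD = {smd:+.3f}{flag}")
```

Running the same check within each project (rather than only overall) is what surfaced the project-69 imbalance described on the following slides.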

Page 30

Extreme Unequal Weighting and Serious Representation Issues

• The project with 26 percent of the weight (known as project 69) was the sole representative of the 4-year public stratum, but was a former 2-year school with largely less-than-2-year programs

• The project partnered with a job training program

• Inadequate representation of 4-year institutions

Figure 5. Percent of the sum of the weights, by project, for the 67 projects making up the Upward Bound national evaluation sample (study conducted 1992-93 to 2003-04). [Bar chart; y-axis: percent of weight, 0 to 30; project 69 at 26.38 percent]

NOTE: Of the 67 projects making up the UB sample, just over half (54 percent) have less than 1 percent of the weights each, and one project (69) accounts for 26.4 percent of the weights.

SOURCE: Data tabulated (December 2007) by the Policy and Program Studies Service (PPSS) of the Office of Planning, Evaluation and Policy Development (OPEPD), US Department of Education (ED), using the national evaluation of Upward Bound data files; study conducted 1992-93 to 2003-04.
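A minimal sketch of the weight-share diagnostic summarized in Figure 5 above: compute each project's share of the total sample weight and flag any project that dominates the estimate. The student-level file, project IDs, and weights below are simulated, not the UB evaluation files.

```python
# Project weight-share diagnostic on a simulated student-level file (illustrative only).
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
students = pd.DataFrame({
    "project_id": rng.integers(1, 68, size=3000),            # 67 hypothetical projects
    "weight": rng.lognormal(mean=0.0, sigma=1.0, size=3000),  # hypothetical design weights
})
# Inflate one project's weights to mimic a single extreme-weight project
students.loc[students.project_id == 30, "weight"] *= 40

share = (students.groupby("project_id")["weight"].sum()
         / students["weight"].sum() * 100).sort_values(ascending=False)
print(share.head(5).round(1))                       # largest weight shares, in percent
print("Projects under 1% of the weight:", int((share < 1).sum()))
```

A diagnostic like this makes the Figure 5 pattern visible directly from the analysis file: many projects under 1 percent of the weight and one project carrying a large share.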

Page 31

Bias in Favor of Control Group: Imbalance in Project 69

Page 32

Near Balance in the 66 Other Projects

Page 33

Uncontrolled Bias in Favor of the Control Group in the Overall Sample with Project 69 Included

Page 34

Impact Estimates from Two-Stage Instrumental Variables Regression for Percent Obtaining a BA within 6 Years, Based on the UB Random Assignment Evaluation

Note: All estimates significant at the .01 level or higher. Estimates based on 66 of 67 projects in sample representing 74 percent of UB at the time of the study. One project removed due to introducing bias into estimates and representational issues.
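A minimal sketch of a two-stage least squares (instrumental variables) estimate like the one titled above, using random assignment as the instrument for actual participation; with a single instrument and no covariates this reduces to the Wald ratio. The sample, take-up rates, and the assumed 12-point participation effect are illustrative, not the study's estimates.

```python
# Two-stage least squares / Wald-ratio TOT estimate on simulated data (illustrative only).
import numpy as np

rng = np.random.default_rng(4)
n = 3000
assigned = rng.binomial(1, 0.5, n)                       # instrument Z: random assignment
# Not everyone assigned participates, and some controls find similar services
participated = rng.binomial(1, 0.15 + 0.70 * assigned)   # endogenous treatment D
# Assumed +0.12 effect of participation on BA attainment within 6 years
ba = rng.binomial(1, 0.20 + 0.12 * participated)

def ols_slope(y, x):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    design = np.column_stack([np.ones_like(x, dtype=float), x])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

# Stage 1: effect of assignment on participation (first stage)
first_stage = ols_slope(participated.astype(float), assigned.astype(float))
# With one instrument and no covariates, 2SLS equals reduced form / first stage
reduced_form = ols_slope(ba.astype(float), assigned.astype(float))
tot = reduced_form / first_stage
print(f"First stage: {first_stage:.3f}   TOT (IV) estimate: {tot:.3f}")
```

The IV step rescales the intent-to-treat contrast by take-up, which is why treatment-on-the-treated estimates such as the 50 percent BA figure exceed the corresponding ITT differences.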

Page 35

Estimated Tax Impact of UB Compared to Estimated Cost Based on Average Length of Participation

Increased taxes collected are estimated at 4.9 times the per-participant cost of the program (see the sketch below)
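A minimal sketch of the arithmetic behind a ratio like the 4.9 figure. The tax and participation-length inputs are placeholders chosen only to illustrate the calculation; they are not the study's actual estimates, which are not given on the slide.

```python
# Benefit-cost ratio arithmetic with placeholder inputs (illustrative only).
avg_years_participation = 2.0        # assumed average length of participation
cost_per_participant_year = 4611.0   # per-person-served figure for UB from Table 1
program_cost = avg_years_participation * cost_per_participant_year

added_lifetime_taxes = 45000.0       # assumed present value of additional taxes paid
ratio = added_lifetime_taxes / program_cost
print(f"Estimated added taxes per participant are {ratio:.1f}x the program cost")
```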

Page 36

Part 3: Lessons Learned

• Pay attention to sampling and non-sampling errors: evaluate the evaluations

• Must ask the question: is the state of the art of evaluation, as we practice it, really capable of being the source of differentiating budget allocations?

• Zero-sum issues: how to get around the "zero sum" game in which projects compete with each other for scarce resources and there are efforts to game the system

• Stakeholder involvement is critical for program improvement: partnership; focus on how programs can work together and contribute to shared goals; formative assessment

• Is non-punitive accountability possible?

• Strength-finder approach as opposed to focusing on deficits: possible to work from the strengths of a program rather than its weaknesses

Page 37

Current Models in the Evaluation Profession

• Partnership: engage practitioners; formative assessment; evaluation as a tool for improvement; using data to improve

• Utilization focused: user and client focus; continuous-improvement evaluation

• Standards based (feasible, accurate, ethical, transparent, useful)

• Ex ante: theory of change; what impact can reasonably be expected given the intervention and given the system; multiple methods

• Systems theory: role in contributing to the whole; interactions

• Complexity theory: changing conditions require adaptation; a move away from summative evaluation; the same situation is never repeated exactly; rapid feedback for adaptation and innovation

Page 38

GEAR UP Next Generation Studies

• Partnership design work: systematic review, focus groups with grantees, expert papers on topics, APR analyses

• Concept of using data as feedback for program improvement; engagement of practitioners in defining interventions to study in a rigorous manner; leverage institutional research capabilities; cross-site analyses

– Issued a call for grantee partnerships to submit statements of interest in planning awards for rigorous studies (July 2009; $25,000)

– About to award 44 small planning/proposal awards to grantees, who will submit developed proposals in the spring

– Implementation awards 2010-2012

Page 39

Process of Evaluations for Improvement