An 8-Year Review and Lessons Learned from Federal Education Evaluations: 2002-2009 by M Cahalan and...
An 8-Year Review and Lessons Learned from Federal Education Evaluations: 2002-2009
Prepared by:
Margaret Cahalan
Sharon Stout
Prepared for:
AEA Meeting
November 13, 2009
Draft not for Quotation or Citation: All views and data presented are solely those of the authors and do not reflect any review or authorization from the US Department of Education
A Review of IES/NCEE & OPEPD/PPSS Research Reports
• Part 1: Overview of all reports listed on web sites and published between 2002 and 2009; looked at level, methods, topics, findings
• Part 2: Case study example of difficulties in using evaluation research for PART, budget and program reform: college access and persistence
• Part 3: Lessons learned: new directions/models for evaluation
Part I: Overview of Studies of Two ED Offices
• 2002-2009: Period of transition between offices
– OPEPD/PPSS: Office of Planning, Evaluation and Policy Development / Policy and Program Studies Service (known as PPSS in this presentation)
  • Earlier office, responsible for most evaluations prior to 2002
  • Reports to the OPEPD Assistant Secretary
  • Informal agreement that it would do primarily implementation evaluation studies after 2002
– IES/NCEE: Institute of Education Sciences / National Center for Education Evaluation (known as IES in this presentation)
  • Created in the re-organization in 2002; formerly OERI
  • Independent, with Commissioner confirmed by Congress
  • Legislative mandate for evaluations: scientific approach, medical model, experimental design, random assignment
Published Reports from IES/NCEE and OPEPD/PPSS: 2002-2009
117 evaluation reports in total
• IES/NCEE: primarily experimental design; 27 reports (2003 to 2009); 15 studies (study with multiple reports counted once)
• OPEPD/PPSS: primarily implementation studies; 90 reports (2002-2009); 50 studies (study with multiple reports counted once)
IES/NCEE Studies: 2003-2009
27 reports of 15 studies (largely practice based)
1. Teacher Induction
2. Supplemental Reading Comprehension Interventions
3. DC Opportunity Scholarship Program
4. Student Mentoring Program
5. Reading and Mathematics Software Products
6. Teachers Trained Through Different Routes to Certification
7. Professional Development Strategies to Improve Early Reading
8. Reading First Impact Study Comprehension
9. Classroom Literacy Interventions and Outcomes in Even Start (started by PPSS)
10. Enhanced Reading Opportunities Study
11. Four Early Elementary School Math Curricula
12. Enhanced Academic Instruction in After-School Programs—48 more hours
13. Striving Readers Study Closing the Reading Gap
14. Early Reading First
15. National Evaluation of 21st Century Community Learning Centers (started by PPSS)
PPSS: 50 studies (largely program based)
• Career and Technical Education (CTE): 3
• Federal college access and retention programs TRIO and GEAR UP: 12
• Title 1 (general, SES, Choice, funding, accountability, AYP & improvement): 9
• Single sex schools: 1
• Private schools under Title 1: 1
• Literacy through Libraries Program: 1
• Reading First: 2
• Teacher Quality: 4
• Comprehensive School Reform: 2
• Class Size Reduction: 1
• Voluntary school choice: 1
• Small Learning Communities: 1
• Faith Based: 1
• Technology & Data System Use: 5
• Charter Schools: 2
• Sex Misconduct: 1
• Magnet Program: 1
• Migrant Program, Title 1
• Even Start Program: 1
Distribution of Studies by School Level
IES/NCEE: 15 studies; OPEPD/PPSS: 50 studies
Topic Summary: IES/NCEE
Topics for OPEPD/PPSS Elementary-Secondary Studies
Topics/Programs of Studies with Reports Published in Secondary-Postsecondary by PPSS: 2002-2009
Major Method IES/NCEE: 15 Studies
OPEPD/PPSS Major Methods: 50 Studies
Summary of Effects IES/NCEE: 15 Studies
Effects Summary: 23 PPSS Studies that report comparative effects or outcomes
15 IES/NCEE Studies: Topics and Effects Summary
• Reading: 6 studies (one 9th grade study positive; 3 with small positive effects on decoding, not comprehension; 2 with no effects detected; findings that practices were different but outcomes were not; the one study with positive effects, the 9th grade study, involved more time on reading)
• DC Voucher: 1 study. First two years no effects; third year began to see some effects for reading, not for math; 26 percent did not take the opportunity
• Mentoring: 1 study. No overall positive effects; some positive effects for some subgroups; 60 percent of controls also had mentoring
• Reading and Math Software: 1 study (no overall positive effect detected)
• Teacher (Reading PD, Induction, Alternative Certification): 3 studies (no overall effects detected on student outcomes; some on teacher practice for reading)
• 4 Math Curricula: 1 study. 2 curricula had positive effects (measured relative to each other)
• Afterschool: 2 studies. First study: no effects on academics or behavior; second study: positive on math, not reading, with 48 more hours of instruction
Observations: 15 IES/NCEE Studies
• Topics
– Lack of direct link to a federal program, but sought more understanding of the intervention; most funded by program funds (Title 1 and Title 2 of ESEA/NCLB)
• Effects
– Few overall effects, especially on students, when the counterfactual was existing practice
– Observed some effects on teacher practices associated with the intervention that did not translate into changes in students' test scores
– Some effects seen when time spent on the subject increased
– Effects sometimes emerge over time when the intervention is sustained
• Issues
– Counterfactual: why would one not assume that science-based practices would be used?
– Contamination of control group; spillover issues
– Intent to treat; dropouts
– Issues implementing the intervention
– Lack of external validity in time and space: do the same circumstances ever happen again?
– Moving targets for curricula and technology
K-12 PPSS Studies: Topics and Effects Summary
• Title 1 (general & funding): Proficiency increased; however, most states will not meet goals by 2013-14; over 1,100 schools were in need of improvement by 2006 and most were in advanced stages of corrective action; did not show changes in time on subjects; poorest schools get less per capita; federal funding more targeted than state and local
• Teacher: Over 90 percent of teachers met the highly qualified standard; however, those in high poverty schools have less experience, are less likely to have a degree in the subject they teach, and have less professional development than those in lower poverty schools
• Supplemental Educational Services (SES): Increase in number of schools eligible; however, the participation rate is stable, with about 17 percent taking advantage; those who participated were more likely to show growth than eligible non-participants in a sub-study of districts (selection effects issues); waiver studies had mixed results: a higher proportion of eligible students participated when the district provided the service; participants did better in two large districts and did not do better in two small districts
• Choice/Voluntary School Choice: About 1 percent of students participating; mixed results: some studies show no difference in outcomes when compared with those not participating, one that students did better. Most districts kept choice within the district, and most students did not go to high performing schools when they transferred out of low performing schools
• Reading First: More time on reading and adopted practices; limited evidence that 3rd and 4th graders increased reading more than those not in programs; regression discontinuity
K-12 PPSS Studies (continued): Topics and Effects Summary
• Charter: Facilities study found loans were given to schools that served a more needy population than other charter schools in their areas; most would not have qualified for private loans (too high risk). Descriptive study on characteristics, with a focus on accountability
• Comprehensive School Reform: No effects for those who got grants; evidence that those who implemented practices, and did so over a longer term, had an increase in growth, especially in math
• Small Learning Communities and class size: Primarily implementation study; performance reports before and after for schools found increase in and decrease in violence incidence; 9th grade indicators and dropout rate; plans for college; no change in test scores
• Even Start: Random assignment; no difference detected on major child or parent outcomes; counterfactual issues
• Technology: Meta-analysis found significant effects for on-line courses, based on a small number of studies; extensive reports on the status of technology use and access in schools and classrooms. Documents growth in use of computers and the internet, and also less use in poorer schools. Use generally not related to test score differences
• Teaching History & Libraries: History professional development: most who took it were experienced teachers; increased knowledge. Library grants increased materials and were associated with higher than expected growth in test scores
Observations: 50 PPSS Studies
• Topics
– Studies had a more direct link to a program
– Attempts to get at the overall impact of a program became black box studies
– Studies initiated after the re-organization were implementation studies that often combined descriptive work with comparative observation of outcomes of policy or services linked to NCLB requirements or services; projections
– Intervention studied was more likely to be a supplemental service rather than a replacement of curriculum
• Effects
– More positive effects observed; studies used propensity matching and matched comparison, with larger n's and use of aggregated data or administrative student records
– Larger effects seen when time spent on the subject or in the program increased
• Issues
– Selection effects (voluntary participation; interaction of service and motivation)
– Difficulty in finding adequate comparison groups
– Contamination of control group; spillover issues
– Time scale: the study usually starts when the program starts, which raises issues with implementing the intervention
– More external validity; less rigorous designs for internal validity
Part 2: Example of Issues Faced in Using Evaluations for PART and Budget
PPSS Postsecondary College Access and Retention Studies: Case History
Table 1. Summary of Selected Federal College Support Programs: FY07-08
Federal program (year begun) | Total funding | Number of grants | Number served | Average award | Amount per person served | PART rating (year rated)
Educational Opportunity Centers (1974) | $47,006,891 | 125 | 195,795 | $376,005 | $240 | Results not demonstrated (2005)
McNair (1989) | $45,604,173 | 190 | 5,210 | $240,021 | $8,753 | Moderately Effective
Student Support Services (1970) | $271,566,777 | 949 | 199,499 | $286,160 | $1,361 | Moderately Effective (2003)
Talent Search (1971) | $142,884,182 | 471 | 366,330 | $303,363 | $390 | Moderately Effective (2005)
Upward Bound (1965) | $328,528,379 | 1017 | 71,247 | $323,036 | $4,611 | Ineffective (2002)
Upward Bound Math Science (1991) | $33,976,483 | 126 | 6,808 | $269,654 | $4,990 | Ineffective 2002 (with UB)
Veterans Upward Bound (1974) | $13,351,830 | 46 | 5,660 | $290,257 | $2,358 | Ineffective 2002 (with UB)
GEAR UP (State Grants) (1998) | $114,929,000 | 40 | 424,513 | $2,873.00 | $271 | Adequate (2003)
GEAR UP (Partnership Grants) (1998) | $186,953,000 | 174 | 331,245 | $1,074,442 | $564 | Adequate (2003)
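The two derived columns of Table 1 (average award, amount per person served) appear to be simple ratios of total funding to the number of grants and to the number served. The sketch below recomputes them for three rows as a quick consistency check. The figures are copied from Table 1; the function and variable names are illustrative and not part of the original presentation.

```python
# Illustrative check of Table 1's derived columns (names are hypothetical).

def derived_columns(total_funding, grants, served):
    """Return (average award per grant, amount per person served)."""
    return total_funding / grants, total_funding / served

# (total funding in dollars, number of grants, number served), as printed in Table 1
programs = {
    "McNair": (45_604_173, 190, 5_210),
    "Talent Search": (142_884_182, 471, 366_330),
    "Upward Bound": (328_528_379, 1_017, 71_247),
}

results = {name: derived_columns(*row) for name, row in programs.items()}

for name, (avg_award, per_person) in results.items():
    print(f"{name}: average award ${avg_award:,.2f}, "
          f"per person served ${per_person:,.2f}")
```

Comparing the whole-dollar part of each result with the printed table is a cheap way to spot transcription errors before using the per-person figures to compare programs.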
Rough Estimates of Percent of Eligible Population Served
PPSS College Access and Postsecondary Studies: Topics, Effects Summary and Policy History
• Student Support Services (SSS): Propensity matching found positive impact on college retention and completion for participation in SSS supported services, whether funded by the SSS project or some other sources. PART: Moderately effective (2003)
• Talent Search: Propensity matching found positive impacts in 3 states, using longitudinal data bases, on high school completion, financial aid and postsecondary entrance. PART: Moderately effective (2005)
• GEAR UP: Matched comparison of 7th graders in original cohort, when re-surveyed in 8th grade, found increases in parent knowledge and aspirations and student knowledge. Did not find impact on academic grades or course taking. Found impact on course taking for some sub-groups. PART: Adequate (2003)
• Upward Bound Math/Science: Propensity matching found positive impacts on college entrance, 4-year degree completion, planning to major in math or science and majoring in social science; majoring in math or natural science not statistically significant although in a positive direction (small n's)
Upward Bound
PART: Ineffective (2002)
• Gold standard random assignment study, with inverse of probability weights for national estimates, begun in early 1990s and followed until 2004
• Several Mathematica reports over 10 years (1999-2009) reported no overall positive findings but large effects for students more academically at risk and with lower expectations at baseline
• On the basis of the study, given "ineffective" rating in PART in 2002
• Zero funding in Bush Budget FY2005 & FY2006
• OMB efforts to reform based on study results: UB Initiative & Absolute Priority (1/3 at risk)
• New IES study to evaluate UB started in 2006; large opposition by UB grantees; Congress cancelled in 2007
• HEOA:2008 contains prohibition of certain RA, where projects are required to recruit more than they normally do for purposes of the study; HEOA also cancelled Absolute Priority
• Obama budgets level funded programs in FY10 despite establishing large college access and completion innovation fund
• Recent re-analyses by the ED staff person who was COR for the study found these estimates were confounded by bias in favor of the control group, caused by one project with extreme weight (26 percent of the total) that had a large imbalance between treatment and control group. Control group members were more likely to have high expectations, less likely to be academically at risk and more likely to be older at start of study.
• A re-analysis found statistically significant and substantial impacts for UB on the key outcomes: postsecondary entrance, award of financial aid, and postsecondary degree completion
• Re-analysis found a Treatment on Treated (TOT) 50 percent increase in BA attainment for the average of 66 of the 67 projects in the sample
National Evaluation of Upward Bound: 1992-2003-04
• Controversial study: random assignment combined with a probability national sample, which is very rare
• Mathematica reported findings of lack of overall impact but seemingly large impact for higher academic risk students and those with lower expectations
• Ineffective PART rating: OMB attempt to reform program with UB Initiatives and Absolute Priority to serve more at-risk students, which fit with priority of Bush administration
• Bush Budget called for zero funding of all pre-college programs in FY05 and FY06
Policy History (cont)
• UB 2006 Absolute Priority to serve 1/3 at-risk and 9th grade
• New IES random assignment study to evaluate begun 2006
• Following strong objections from UB projects and Council on Opportunity in Education (COE), Congress prohibited funding in 2007 and the study was cancelled by ED in 2008
HEOA:2008 & TRIO FY10 Budget
– HEOA:2008
  • Affirms goal of evaluation to identify: what are best practices to achieve goals of program?
  • Contains prohibition on certain types of random assignment studies that would require projects to recruit more than they would normally serve only for the purposes of the studies (denial of service issues; work with IRBs)
  • Cancelled At-Risk Absolute Priority
  • HEOA specifies that a new rigorous UB study should begin by 2010
– Obama Budget for FY10: level funded TRIO and GEAR UP while putting large money into new college access and completion innovation funds to attain priorities (5 million new degrees)
ED PPSS Staff QA Re-examination of Sample Design and Re-Analyses
• Found several serious bias issues combined to confound the results published by Mathematica in 2004 and 2009 concerning the UB program
• Re-examination found study impact estimates suffer from: 1) uncontrolled-for bias in favor of control group; 2) unequal weighting, with one project carrying 26 percent of the weight; and 3) serious representational issues with regard to 4-year grantees
• When these issues are addressed using standard statistical methods, the study shows strong positive impacts for the UB program
Basic Assumptions of Random Assignment Studies
1. Sample is representative of the population to which we wish to generalize
2. Treatment and control group are equivalent
3. Treatment and control group treated equally except for the treatment
4. Treatment and control group are mutually exclusive with regard to the treatment
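Assumption 2 (equivalent treatment and control groups) is commonly checked by comparing baseline characteristics across the two groups. One widely used summary is the standardized mean difference. The sketch below is a minimal illustration with made-up baseline scores, not data from the Upward Bound study, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treatment) + variance(control)) / 2)
    return (mean(treatment) - mean(control)) / pooled_sd

# HYPOTHETICAL baseline scores (e.g. an educational expectations scale).
treatment_group = [2, 3, 3, 4, 3, 2, 4, 3]
balanced_control = [3, 3, 2, 4, 3, 2, 4, 3]    # same distribution as treatment
imbalanced_control = [4, 4, 3, 4, 4, 3, 4, 4]  # control starts out ahead

smd_balanced = standardized_mean_difference(treatment_group, balanced_control)
smd_imbalanced = standardized_mean_difference(treatment_group, imbalanced_control)

print(f"balanced:   {smd_balanced:+.2f}")
print(f"imbalanced: {smd_imbalanced:+.2f}")
```

A value near zero suggests the groups are comparable on that characteristic. A large negative value, as in the second case, indicates a control group that starts out ahead, which would bias a simple treatment-control comparison in favor of the control group.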
Extreme Unequal Weighting and Serious Representation Issues
• Project with 26 percent of weight (known as 69) was sole representative of the 4-year public stratum, but was a former 2-year school with largely less than 2-year programs
• Project partnered with job training program
• Inadequate representation of 4-year
Figure 5. Percent of sum of the weights by project of the 67 projects making up the Upward Bound national evaluation sample: study conducted 1992-93 to 2003-04
[Bar chart; vertical axis: percent of weight, 0 to 30; largest bar labeled 26.38]
NOTE: Of the 67 projects making up the UB sample, just over half (54 percent) have less than 1 percent of the weights each, and one project (69) accounts for 26.4 percent of the weights.
SOURCE: Data tabulated (December 2007) by Policy and Program Studies Service (PPSS) of Office of Planning, Evaluation and Policy Development (OPEPD), US Department of Education (ED), using national evaluation of Upward Bound data files: study conducted 1992-93 to 2003-04.
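One way to see why a single project with 26.4 percent of the weight matters is Kish's effective sample size, (sum of weights)^2 / (sum of squared weights), applied here at the project level. The sketch below is illustrative only: the actual project weights are not given in the presentation, so it assumes the remaining weight is spread evenly over the other 66 projects.

```python
def kish_effective_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

n_projects = 67
dominant_share = 0.264  # share of total weight held by project 69 (figure note)

# ASSUMPTION: the remaining weight is split evenly across the other 66 projects.
other_share = (1 - dominant_share) / (n_projects - 1)
weights = [dominant_share] + [other_share] * (n_projects - 1)

effective_projects = kish_effective_size(weights)
equal_weight_baseline = kish_effective_size([1.0] * n_projects)

print(f"effective number of projects: {effective_projects:.1f}")
print(f"with equal weights:           {equal_weight_baseline:.1f}")
```

Under this assumption the 67 projects carry roughly the information of 13 equally weighted ones. Because an even split of the remaining weight minimizes the sum of squared weights, the real figure for the same dominant share could only be lower.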
Bias in Favor of Control Group: Imbalance in Project 69
Almost Balance in 66 Other Projects
Uncontrolled for Bias in Favor of Control Group in Overall Sample with Project 69 Included
Impact Estimates from Two Stage Instrumental Variables Regression for Percent Obtaining a BA in +6 years based on UB Random Assignment Evaluation
Note: All estimates significant at the .01 level or higher. Estimates based on 66 of 67 projects in sample representing 74 percent of UB at the time of the study. One project removed due to introducing bias into estimates and representational issues.
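With random assignment as a binary instrument and no covariates, two-stage instrumental variables estimation reduces to the Wald ratio: the intent-to-treat difference in the outcome divided by the difference in participation rates between the assigned and control groups. The sketch below shows that ratio with made-up inputs. It is a simplification, not the re-analysis model, which may also have used weights and covariates; the function name and all numbers are hypothetical.

```python
def wald_tot(outcome_assigned, outcome_control,
             participation_assigned, participation_control):
    """Treatment-on-treated estimate as ITT effect / difference in participation."""
    itt = outcome_assigned - outcome_control
    first_stage = participation_assigned - participation_control
    if first_stage == 0:
        raise ValueError("assignment does not change participation; TOT undefined")
    return itt / first_stage

# HYPOTHETICAL inputs, not study results:
#   share earning a BA among those assigned to treatment vs. control,
#   and share actually receiving services in each group.
tot = wald_tot(outcome_assigned=0.20, outcome_control=0.14,
               participation_assigned=0.85, participation_control=0.25)

print(f"TOT estimate: {tot:.3f}")
```

The denominator makes the role of no-shows and control group crossover explicit: the smaller the gap in actual participation between the groups, the more the intent-to-treat difference understates the effect on those actually served.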
Estimated tax impact of UB compared to estimated cost based on average length of participation
Increased taxes collected are 4.9 times higher than per participant cost of program
Part 3: Lessons Learned
• Pay attention to sampling and non-sampling errors: evaluate the evaluations
• Must ask the question: is the state of the art of evaluation as we practice it really capable of being the source of differentiating budget allocations?
• Zero sum issues: how to get around the "zero sum" game in which projects are competing with each other for scarce resources; efforts to game the system
• Stakeholder involvement is critical for program improvement: partnership; focus on how programs can work together and contribute to the shared goals; formative assessment
• Is non-punitive accountability possible?
• Strength Finder approach as opposed to focusing on deficits: possible to work from strengths of program rather than weaknesses
Current Models in Evaluation Profession
• Partnership: engage practitioners; formative assessment; evaluation as tool for improvement; using data to improve
• Utilization focused: users and client focus; continuous improvement; evaluation
• Standards based (feasible, accurate, ethical, transparent, useful)
• Ex ante: theory of change; what impact can reasonably be expected given the intervention and given the system; multiple methods
• Systems theory: role in contributing to the whole; interactions
• Complexity theory: changing conditions, adaptation needed; away from summative; never repeat same situation exactly; rapid time feedback for adaptation and innovation
GEAR UP Next Generation Studies
• Partnership design work: systemic review, focus groups with grantees, expert papers on topics, APR analyses
• Concept of using data as feedback for program improvement: engagement of practitioners in defining interventions to study in rigorous manner; leverage institutional research capabilities; cross site analyses
– Issued call for grantee partnerships to submit statements of interest in planning awards for rigorous studies (July 2009; $25,000)
– About to award 44 small planning/proposal awards to grantees, who submit developed proposals in Spring
– Implementation awards 2010-2012
Process of Evaluations for Improvement