The EEF by numbers
![Page 1: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/1.jpg)
Building Evidence in Education: Workshop for EEF evaluators
2nd June: York
6th June: London
www.educationendowmentfoundation.org.uk
![Page 2: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/2.jpg)
The EEF by numbers
• 83 evaluations funded to date
• 3,000 schools participating in projects
• 34 topics in the Toolkit
• 16 independent evaluation teams
• 600,000 pupils involved in EEF projects
• 14 members of EEF team
• £220m estimated spend over lifetime of the EEF
• 6,000 heads presented to since launch
• 10 reports published
![Page 3: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/3.jpg)
Session 1: Design
RCT design, power calculations and randomisation
Ben Styles (NFER)
Maximising power using the NPD
John Jerrim (Institute of Education)
![Page 4: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/4.jpg)
RCT design
Power calculations and randomisation
Ben Styles
Education Endowment Foundation
June 2014
![Page 5: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/5.jpg)
RCT design
• The ideal trial
• Methods of randomisation
• Power calculations
• Syntax exercise!
![Page 6: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/6.jpg)
A statistician’s ideal trial
• Randomly select eligible pupils from NPD
• No consent!
• Simple randomisation of pupils to intervention and control groups
• No attrition
• No data matching problems
• No measurement error
![Page 7: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/7.jpg)
BEFORE YOU START!
1. Trial registration: specification of primary and secondary outcomes in addition to sub-group analyses
2. Recruit participants and explain method to stakeholders
3. Select participants according to fixed eligibility criteria
4. Obtain consent
5. Baseline outcome measurement (or use existing administrative data)
6. Randomise eligible participants into groups (evaluator carries out randomisation)
7. Intervention runs in experimental group; control receives ‘business-as-usual’/an alternative activity
8. Administer follow-up measurement (evaluator)
9. Intention-to-treat analysis followed by reporting as per CONSORT guidelines
10. Control receives intervention (under what circumstances?)
![Page 8: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/8.jpg)
Why we depart from the ideal
• Schools manage pupils!
• Nature of the intervention
• Contamination: how serious is the risk?
![Page 9: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/9.jpg)
Restricted randomisation?
• Use simple randomisation where you can
• Timetable considerations in a pupil-randomised trial → stratify by school
• Important predictor variable with small and important category → stratify by predictor
• Fewer than 20 schools → minimise (http://minimpy.sourceforge.net/)
• Multiple recruitment tranches → blocked
• Pairing → BAD IDEA!
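As an illustration of the "stratify by school" option for a pupil-randomised trial, here is a minimal sketch using only the Python standard library. The function name `stratified_randomise`, the seed and the pupil and school identifiers are all hypothetical; this is not the syntax used in the workshop exercise, just one way the idea could be written down.

```python
import random
from collections import defaultdict

def stratified_randomise(units, stratum_of, seed=2014):
    """Randomise units to 'intervention' or 'control' separately within each stratum."""
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced and audited
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of[unit]].append(unit)

    allocation = {}
    for stratum in sorted(by_stratum):
        members = sorted(by_stratum[stratum])
        rng.shuffle(members)
        half = len(members) // 2
        # With an odd stratum size, a coin flip decides which arm gets the extra unit.
        if len(members) % 2 == 1 and rng.random() < 0.5:
            half += 1
        for position, unit in enumerate(members):
            allocation[unit] = "intervention" if position < half else "control"
    return allocation

# Hypothetical data: 8 pupils across 2 schools, stratified by school.
pupils = [f"pupil_{i}" for i in range(8)]
school = {p: ("school_A" if i < 4 else "school_B") for i, p in enumerate(pupils)}
allocation = stratified_randomise(pupils, school)
```

Because allocation happens within each school, every school contributes pupils to both arms, which is why the stratifying factor should then appear in the analysis.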
![Page 10: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/10.jpg)
Restricted randomisation
[Figure: simple randomisation compared with restricted randomisation]
Restricted randomisation is more complicated and can go wrong. Take strata into account in the analysis: http://www.bmj.com/content/345/bmj.e5840
![Page 11: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/11.jpg)
To remember!
If you have restricted your randomisation using a factor that is associated with the outcome (e.g. school) THEN
INCLUDE THE FACTOR AS A COVARIATE IN YOUR ANALYSIS
![Page 12: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/12.jpg)
Chance imbalance at baseline
• As distinct from bias induced by measurement attrition
• Can be quite large in small trials e.g. on baseline measure
• Include covariate in final analysis
![Page 13: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/13.jpg)
Sample size calculations
• School or pupil-randomised?
• Intra-cluster correlation
• Correlation between covariate and outcome
• Expected effect size
• p(type I error)=0.05; power=0.8
• Attrition
![Page 14: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/14.jpg)
Rule of thumb
Lehr, 1992
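The formula on this slide did not survive in the transcript. Lehr's rule of thumb is commonly stated as: for 80% power and a two-sided 5% test, the sample size per group is roughly 16 divided by the squared standardised effect size. The sketch below compares that shortcut with the normal-approximation calculation it rounds; the function names are illustrative, and it assumes simple (unclustered) randomisation with no covariates.

```python
import math
from statistics import NormalDist

def lehr_n_per_group(effect_size):
    """Lehr's shortcut: about 16 / d^2 per group (80% power, two-sided alpha = 0.05)."""
    return math.ceil(16 / effect_size ** 2)

def normal_approx_n_per_group(effect_size, alpha=0.05, power=0.8):
    """Two-group sample size per arm from the normal approximation."""
    z = NormalDist()
    multiplier = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return math.ceil(multiplier / effect_size ** 2)

rule_of_thumb = lehr_n_per_group(0.2)
approximation = normal_approx_n_per_group(0.2)
```

The constant 16 is a rounding-up of 2 × (1.96 + 0.84)², so the shortcut is slightly conservative.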
![Page 15: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/15.jpg)
Pupil randomised
• ICC = 0
• Correlation between baseline and outcome: http://educationendowmentfoundation.org.uk/uploads/pdf/Pre-testing_paper.pdf and your previous work
• Effect size: previous evidence; cost-effectiveness; EEF security ratings
• Attrition: EEF allow recruitment to be 15% above sample size after attrition
![Page 16: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/16.jpg)
Cluster-randomised
• Same as for pupils aside from ICC
• Proportion of total variance that is due to between cluster variance
• EEF pre-testing paper has some useful guidance
• Pre-test also reduces ICC e.g. from 0.2 to 0.15 for KS2 baseline, GCSE outcome
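The definition of the ICC in the second bullet can be written out as follows. The symbols are assumptions introduced here rather than notation from the slides: \sigma_b^2 is the between-cluster variance, \sigma_w^2 the within-cluster variance and m the number of pupils per cluster. The second line is the standard design effect for equal-sized clusters, which is consistent with the Deff of 27.85 shown for 180 pupils per school and an ICC of 0.15 in the sample size spreadsheet.

```latex
\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}
\qquad
\mathrm{Deff} = 1 + (m - 1)\,\rho
```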
![Page 17: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/17.jpg)
MDES
• Minimum detectable effect size
• EEF require this on the basis of real parameters for the security rating
• (avoid retrospective power calculation)
• How good were my estimates?
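A minimal sketch of an MDES calculation, assuming a two-sided test and a normal approximation. The function name `mdes` and the example standard error are hypothetical; in practice the standard error would be computed from the realised number of schools and pupils, the ICC and the pre/post correlation.

```python
from statistics import NormalDist

def mdes(standard_error, alpha=0.05, power=0.8):
    """Minimum detectable effect size: (z for alpha/2 + z for power) x standard error."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * standard_error

# Hypothetical standard error of the difference between groups, in outcome SDs.
example_mdes = mdes(0.1)
```

Reporting the MDES from real parameters answers "how good were my estimates?" without computing power retrospectively from the observed effect.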
![Page 18: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/18.jpg)
Sample size spreadsheet (fill in the highlighted boxes)

| Parameter | Scenario 1 |
| --- | --- |
| Expected number of pupils per school being sampled | 180 |
| ROH (Intra-class correlation - percentage of variance in outcome being studied attributable to school attended) | 0.15 |
| Deff (adjustment for nested design) | 27.85 |
| Confidence level (of test we will use to assess effect) | 95.0% |
| Critical T-value | 1.96 |
| Correlation between before and after scores | 0.70 |
| SD of residuals in scores (if scores have SD of 1) | 0.71 |
| Expected effect size (in terms of absolute outcome scores) | 0.2 |
| Expected effect size (in terms of residual outcome scores) | 0.28 |
| n(schools) in intervention | 31 |
| n(schools) in control | 31 |
| n(pupils) in intervention | 5580 |
| n(pupils) in control | 5580 |
| Expected SE of difference between groups (in SDs) | 0.10 |
| Power | 80.0% |
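The spreadsheet's formulas are not shown on the slide, so the following is a reconstruction under assumptions: a design effect of 1 + (m - 1) × ROH, a residual SD of sqrt(1 - r²), and a normal approximation for power. With the Scenario 1 inputs these steps reproduce the displayed values (Deff 27.85, residual SD 0.71, SE 0.10, power 80%). Variable names are illustrative.

```python
import math
from statistics import NormalDist

# Inputs taken from Scenario 1 on the slide.
pupils_per_school = 180
icc = 0.15                 # "ROH" on the spreadsheet
pre_post_corr = 0.70
effect_size = 0.2          # in SDs of the raw outcome
schools_per_arm = 31
alpha = 0.05

# Design effect: variance inflation from sampling pupils in clusters.
deff = 1 + (pupils_per_school - 1) * icc

# SD of the outcome after adjusting for the baseline (raw SD set to 1).
residual_sd = math.sqrt(1 - pre_post_corr ** 2)

# The same effect expressed in residual SDs.
effect_residual = effect_size / residual_sd

# Standard error of the difference between arms, in residual SDs.
pupils_per_arm = schools_per_arm * pupils_per_school
se_difference = math.sqrt(2 * deff / pupils_per_arm)

# Power of a two-sided test (normal approximation).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
power = NormalDist().cdf(effect_residual / se_difference - z_crit)
```

Changing `schools_per_arm` or `pre_post_corr` and re-running shows how strongly each input moves the power.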
![Page 19: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/19.jpg)
[Figure: power (0% to 100%) against effect size (0.00 to 0.30); n(intervention)=31; n(control)=31]
![Page 20: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/20.jpg)
Running the randomisation
SYNTAX EXERCISE
• In pairs, explain what each of the steps does
• How many schools were randomised in this block?
![Page 21: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/21.jpg)
Conclusions
• Always think of any RCT (any quantitative impact evaluation) as a departure from the ideal trial
• The design, power calculations, method of randomisation and analysis all interrelate and need to be consistent
![Page 22: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/22.jpg)
Maximising power using the NPD
John Jerrim (Institute of Education)
![Page 23: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/23.jpg)
Structure
How much power do EEF trials currently have?
• PISA, power, star ratings and current EEF trials
Exercise
• Work in groups to design an EEF trial
• Goal = Maximise power at minimal cost
My answers
• How might I try to maximise power?
Your answers! / Discussion
![Page 24: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/24.jpg)
Power in context
Effect sizes, PISA rankings and EEF padlock ratings
![Page 25: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/25.jpg)
How powerful are EEF trials thus far?
[Figure: detectable effect (0 to 0.5) for EEF secondary school trials, as of 01/05/2014]
Detectable effect size: mean = 0.276, median = 0.25
Between 4* and 5* by EEF guidelines…
![Page 26: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/26.jpg)
Power and the PISA reading rankings
[Figure: PISA reading rankings on an effect size axis (-0.10 to 0.80) relative to the UK’s current position. Countries shown: United Kingdom, France, Viet Nam, Switzerland, Netherlands, New Zealand, Liechtenstein, Canada, Chinese Taipei, Korea, Singapore, Shanghai-China]
Reference lines on the figure:
• Effect size = 0.10
• Effect size = 0.20 (EEF 5*)
• MEDIAN EEF TRIAL = 0.25
• Effect size = 0.30 (EEF 4*)
• Effect size = 0.40 (EEF 3*)
• Effect size = 0.50 (EEF 2*)
IMPLICATION
Effect sizes of 0.20 are damn big… particularly given pretty small doses we are giving
![Page 27: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/27.jpg)
Do we currently have a power problem?
- Quite possibly!
- So trying to get more power in future trials is very important…
![Page 28: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/28.jpg)
Exercise
![Page 29: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/29.jpg)
Task: In groups, discuss how you would design the following trial
Intervention = Teaching children how to play chess
Maximum number of treatment schools = 20 secondary schools
Year group = Year 7
Level of randomisation = School level
Test = One-to-one non-verbal IQ assessment with trained educationalist (end of year 7)
Control condition = ‘Business as usual’
Study type = ‘Efficacy’ study (proof of concept)
Objective: Maximise power at minimum cost
How would you design this trial to meet these twin objectives?
What could you do to increase power in this trial?
E.g. Would you use a baseline test? If so, what?
Exercise
![Page 30: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/30.jpg)
My answers
The usual suspects…
…and less obvious options
![Page 31: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/31.jpg)
The usual suspects…..
1. Use a regression model and include baseline covariates…
- Adding controls explains variance. Boosts power
2. Use Key Stage 2 test scores as “pre-test”…
- Point of baseline covariates is to explain variance
- KS2 scores in maths likely to be reasonably correlated with outcome (non-verbal IQ)
- CHEAP! From NPD.
3. Stratify the sample prior to randomisation
- Potentially reduces error variance. Thus boosts power.
- Additional advantages. Balance of baseline characteristics.
4. Really engage with control schools
- Make sure we minimise loss of sample through attrition
![Page 32: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/32.jpg)
Less ‘obvious’ options….
![Page 33: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/33.jpg)
Don’t test every child…
There are around 200 children per secondary school…
… One-to-one testing is expensive
… Testing more than 50 pupils buys you little additional power
RANDOMLY SAMPLE PUPILS WITHIN SCHOOLS!
Assumptions
• 20 schools
• Pre/post corr of 0.75
• 80% power
• Rho = 0.15
[Figure: detectable effect (0.35 to 0.60) against cluster size (0 to 200)]
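The diminishing return from testing more pupils per school can be sketched as below. The formula behind the slide's chart is not given, so this is an assumption-laden approximation: "20 schools" is read as 20 in total (10 per arm), the test is two-sided at 5% with a normal approximation, and the baseline is assumed to scale all variance by 1 - r². The numbers it produces are therefore not guaranteed to match the figure; the shape of the curve is the point.

```python
import math
from statistics import NormalDist

def detectable_effect(pupils_per_school, schools_per_arm=10, icc=0.15,
                      pre_post_corr=0.75, alpha=0.05, power=0.8):
    """Approximate minimum detectable effect (raw outcome SDs) for a cluster RCT."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    deff = 1 + (pupils_per_school - 1) * icc          # design effect
    pupils_per_arm = schools_per_arm * pupils_per_school
    residual_sd = math.sqrt(1 - pre_post_corr ** 2)   # variance left after the pre-test
    se = residual_sd * math.sqrt(2 * deff / pupils_per_arm)
    return multiplier * se

# Print detectable effect for a range of cluster sizes.
for m in (5, 10, 25, 50, 100, 200):
    print(m, round(detectable_effect(m), 3))
```

Under these assumptions, almost all of the gain comes from the first few dozen pupils per school, because the between-school component of the variance does not shrink as more pupils are tested.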
![Page 34: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/34.jpg)
…..use an unequal sampling fraction
• We all know that ↑ clusters (k) means ↑ power
• This example: limited to only a small number of treatment schools (20)
• ….but control condition was non-intrusive and cheap
• So don’t just recruit 20 control schools as well – recruit more!
• Nothing about RCTs means we need equal k for treatment and control
• Power calculation becomes more complex (anybody know it!?)
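One common way to handle unequal numbers of schools per arm is to replace the 2/k term in the standard error with 1/k_treatment + 1/k_control. The sketch below applies that under the same simplifying assumptions as a standard equal-cluster-size calculation (normal approximation, equal pupils sampled per school, baseline scaling all variance by 1 - r²). The function name and default values are illustrative, not taken from the slides.

```python
import math
from statistics import NormalDist

def detectable_effect_unequal(treatment_schools, control_schools,
                              pupils_per_school=50, icc=0.15,
                              pre_post_corr=0.75, alpha=0.05, power=0.8):
    """Approximate detectable effect when the arms have different numbers of schools."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    deff = 1 + (pupils_per_school - 1) * icc
    # Variance of one school's mean residual outcome (raw outcome SD set to 1).
    var_school_mean = (1 - pre_post_corr ** 2) * deff / pupils_per_school
    se = math.sqrt(var_school_mean * (1 / treatment_schools + 1 / control_schools))
    return multiplier * se

equal = detectable_effect_unequal(20, 20)
double_control = detectable_effect_unequal(20, 40)
```

Under this formula, doubling the control schools from 20 to 40 shrinks the detectable effect by about 13%, and no number of extra control schools can reduce it by more than about 29%, since the treatment arm's 1/k term remains.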
![Page 35: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/35.jpg)
Use a more homogeneous selection of schools…
[Figure: rho (0.00 to 0.30) against percentage of all UK schools in population (100 down to 0), running from ALL UK SCHOOLS to LOW PERFORMING SCHOOLS ONLY. PISA 2009 data]
All UK schools:
“Worst” 25% of schools only:
![Page 36: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/36.jpg)
Why does rho decline??
[Figure: sigma (0 to 100) against percentage of all UK schools in population (100 down to 0); labelled series: within school variation]
The within school variation barely changes …..
…. While the between school variation declines substantially
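The arithmetic behind this is simple: rho is the between-school share of total variance, so if the between-school variance falls while the within-school variance stays put, rho must fall. A small sketch with made-up variance components (the values 20, 8 and 80 are hypothetical, not PISA estimates):

```python
def icc(between_var, within_var):
    """Intra-cluster correlation: between-school share of total variance."""
    return between_var / (between_var + within_var)

# Hypothetical variance components, chosen only to illustrate the mechanism.
within = 80.0                          # assumed unchanged across populations
rho_all_schools = icc(20.0, within)    # wider population, larger between-school variance
rho_restricted = icc(8.0, within)      # narrower population, smaller between-school variance
```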
![Page 37: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/37.jpg)
Implications
• As the example is an efficacy study, why not restrict attention to low performing schools only?
- Boosts power!
- Fits with EEF mandate (close performance gap)
- Not worried about generalisability
• We implicitly do this anyway (e.g. by doing trials in just one or two LAs)…
• …but can we do it in a smarter way???
• Little appreciated trade-off between POWER and GENERALISABILITY
- Long-term implications for EEF
- Trial representative of England population very hard to achieve
![Page 38: The EEF by numbers](https://reader035.fdocuments.in/reader035/viewer/2022070504/56816606550346895dd937ab/html5/thumbnails/38.jpg)
Conclusions
Do we have a “power problem”?
• Quite possibly
• Median detectable effect size = 0.25 in EEF secondary school trials
• If we were to boost UK reading PISA scores by this amount, we would move above Canada, Taiwan and Finland in the rankings…
Ways to potentially increase power
• Include baseline covariates (from NPD where possible)
• Stratify the sample prior to randomisation
• Engage with control schools!
• Do you need to test every child? Practical alternatives?
• Could you increase number of control schools without adding much to cost (unequal randomisation fraction)?
• Could you restrict your focus to a narrower population (e.g. low performing schools only)?