Classifying Designs of MSP Evaluations
Lessons Learned and Recommendations
Barbara E. Lovitts
June 11, 2008
The Sample of Projects
First level of screening
1. Final year – APR start and end dates.
2. Type of evaluation design:
• Experimental (random assignment)
• Quasi-experimental
– Comparison group study with equating
– Regression-discontinuity study
Findings – Final Year
• Started with 124 projects.
• Ended with 88 projects (results are not final).
• Projects eliminated based on:
– Evidence in project narrative or evaluation report.
– Information provided by the Project Director.
Evaluation Design
| Type of Design | Starting Number | Ending Number |
| --- | --- | --- |
| Experimental | 3* | 0 |
| Quasi-Experimental | 47* | 19* |

*Results are not final.
Findings – Quasi-Experimental Designs
• Many studies had a one-group pre-/post design (eliminated).
• In many treatment/comparison group studies, the comparison teachers were in the same school at the same grade level as the treatment teachers (not eliminated).
Applying the Rubric
Challenges
• Projects used different designs to evaluate different outcomes (e.g., content knowledge, pedagogy, efficacy)
• Projects used different designs to evaluate different participant groups (e.g., teachers, students)
• Projects used different designs at different grade levels or for different instruments.
Applying the Rubric
Solution
• Identify each measured outcome and group (e.g., 5th grade teachers – earth science content knowledge).
• Apply the rubric to each outcome/group combination that was evaluated using an experimental or a quasi-experimental design.
Applying the Rubric
A. Baseline Equivalence of Groups (Quasi-Experimental Only)
Criterion:
• No significant pre-intervention differences between the treatment and comparison groups on variables related to the study’s key outcomes; or
• Adequate steps were taken in the statistical analysis to address the lack of baseline equivalence. (A sketch of a between-group baseline check follows below.)
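As an illustration only (the slides do not name a tool), a minimal between-group check of baseline equivalence could be run in Python with SciPy; the pre-test scores below are assumed example data, not MSP project results.

```python
# Minimal sketch of a between-group baseline-equivalence check (assumed data).
from scipy import stats

treatment_pretest = [72.5, 68.0, 75.0, 70.5, 66.0, 74.0]   # hypothetical pre-test scores
comparison_pretest = [71.0, 69.5, 73.0, 67.5, 70.0, 72.5]  # hypothetical pre-test scores

t_stat, p_value = stats.ttest_ind(treatment_pretest, comparison_pretest)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A non-significant difference (e.g., p > 0.05) is consistent with baseline equivalence
# on this measure; a significant difference would need to be addressed in the analysis.
```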
Applying the Rubric
Common Issues:
• No pre-test information on outcome-related measures.
• Within-group pre-test results are given for the treatment and comparison groups, but no tests of between-group differences are reported.
• Projects match groups on unit of assignment (e.g., schools, teachers), but do not provide data on unit of assessment (e.g., teachers, students).
Applying the Rubric
Recommendation: Baseline Equivalence
| Participant Group and Outcome | Treatment Pre-test | Comparison Pre-test | p-value |
| --- | --- | --- | --- |
|  | mean or percent | mean or percent |  |
|  | mean or percent | mean or percent |  |
|  | mean or percent | mean or percent |  |
Applying the Rubric
B. Sample Size
Criterion:
• Sample size was adequate
– Based on a power analysis (a sketch follows after this list), with the recommended:
• significance level = 0.05
• power = 0.8
• minimum detectable effect informed by the literature or otherwise justified
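As a minimal sketch of such a power analysis (assuming Python with statsmodels, and an illustrative MDE of 0.3 standard deviations rather than a literature-based value):

```python
# Sketch of an a priori power calculation for a two-group comparison.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.3,           # assumed MDE; the slides recommend justifying this from the literature
    alpha=0.05,                # recommended significance level
    power=0.8,                 # recommended power
    alternative="two-sided",
)
print(f"Required sample size per group: {n_per_group:.0f}")
```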
Applying the Rubric
Common Issues:
• Power analyses rarely conducted.
• Different sample sizes given throughout the APR and Evaluation Report.
• Sample sizes in the APR and Evaluation Report do not match.
• Report sample size for teachers but not for students, or for students but not for teachers.
• Subgroup sizes:
– are not reported
– are reported inconsistently
– vary by discipline, subdiscipline (e.g., earth science, physical science), and/or grade level
Applying the Rubric
Recommendation: Sample Size
| Participant Group and Outcome | Treatment (final sample size) | Comparison (final sample size) | Power Calculation Assumptions (if available) |
| --- | --- | --- | --- |
|  | N | N | Alpha = , Power = , MDE = |
|  | N | N | Alpha = , Power = , MDE = |

Recommended values: alpha = 0.05, power = 0.8, and a minimum detectable effect (MDE) informed by the literature.
C. Quality of the Data Collection Methods
Criterion:
• The study used existing data collection instruments that had already been deemed valid and reliable to measure key outcomes; or
• The study used data collection instruments developed specifically for the study that were sufficiently pre-tested with subjects who were comparable to the study sample.
Applying the Rubric
Common Issues:
• Locally developed instruments not tested for validity or reliability.
• Identify an instrument in the APR and select “not tested for validity or reliability,” even though a Google search shows that the instrument has been tested.
• Use many instruments but do not report validity or reliability for all of them.
• Do not provide results for all instruments.
Applying the Rubric
Recommendation: Data Collection Instruments
| Participant Group and Outcome | Name of Instrument | Evidence for Validity and Reliability |
| --- | --- | --- |
| Teacher content knowledge – math | DTAMS | {cite website or other reference where evidence can be found} |
| Teacher content knowledge – marine biology | Locally developed instrument | Narrative description of the evidence |
| Teacher content knowledge – science | Borrowed items from [instrument name(s)] | Total # of items; # of items borrowed from each instrument |
Applying the Rubric
D. Quality of the Data Collection Methods
Criterion:
• The methods, procedures, and timeframes used to collect the key outcome data from treatment and comparison groups were the same.
Applying the Rubric
Common Issues:
• Little to no information is provided in general about data collection.
• Information is provided for the treatment group but not for the comparison group.
• Treatment teachers typically receive a pre-test before the summer institute and a post-test at the end of the summer institute, and sometimes another post-test at the end of the school year.
• Comparison teachers receive a pre-test at the beginning of the school year and a post-test at the end of the school year.
• Comparison teachers receive a single test at the beginning of the year.
Applying the Rubric
Recommendation: Quality of Data Collection Methods
1. Participant Group and Outcome ______________
A. Method/procedure for collecting data from treatment group (describe):
B. Was the same method/procedure used to collect data from the comparison group? ___ Yes ___ No If no, please describe how the method/procedure was different:
(continued)
Applying the Rubric
C. Time Frame
| Participant Group and Outcome | Pre-test (month and year) | Post-test (month and year) | Repeated Post-test (month and year) |
| --- | --- | --- | --- |
| Treatment group |  |  |  |
| Comparison group |  |  |  |
Applying the Rubric
E. Data Reduction Rates
Criterion:
• The study measured the key outcome variable(s) in the post-tests for at least 70% of the original study sample (treatment and comparison groups combined);
• Or there is evidence that the high rate of data reduction was unrelated to the intervention; AND
• The proportion of the original study sample that was retained in the follow-up data collection activities (e.g., post-intervention surveys) and/or for whom post-intervention data were provided (e.g., test scores) was similar for the treatment and comparison groups (i.e., a difference of 15% or less);
• Or the proportion of the original study sample that was retained in the follow-up data collection differed between the treatment and comparison groups, and sufficient steps were taken in the statistical analysis to address this differential attrition. (A worked check of these thresholds follows below.)
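A simple worked check of the two thresholds, using assumed counts rather than project data, might look like this:

```python
# Hypothetical sample counts for illustration only.
original_treatment, original_comparison = 60, 55
posttest_treatment, posttest_comparison = 48, 40

# Criterion 1: post-test data for at least 70% of the combined original sample.
overall_retention = (posttest_treatment + posttest_comparison) / (original_treatment + original_comparison)

# Criterion 2: treatment and comparison retention rates within 15 percentage points.
treatment_retention = posttest_treatment / original_treatment
comparison_retention = posttest_comparison / original_comparison
differential_attrition = abs(treatment_retention - comparison_retention)

print(f"Overall retention: {overall_retention:.0%} (criterion: at least 70%)")
print(f"Differential attrition: {differential_attrition:.0%} (criterion: 15% or less)")
```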
Applying the Rubric
Common Issues:
• Attrition information is typically not reported.
• Abt can sometimes calculate attrition, but it is difficult because sample and subsample sizes are not reported consistently.
• If projects provide data on attrition or if Abt can calculate it, it is usually for the treatment group only.
• Projects rarely provide data on student attrition; some occasionally mention high student mobility, but it is not quantified.
Applying the Rubric
Recommendation: Data Reduction Rates
| Participant Group and Outcome | Original Sample Size | Pre-test Sample Size | Post-test Sample Size | Post-test N / Pre-test N | Post-test N / Original N |
| --- | --- | --- | --- | --- | --- |
| Treatment |  |  |  |  |  |
| Comparison |  |  |  |  |  |
Applying the Rubric
F. Relevant Data
Criterion:
• The final report includes treatment and comparison group post-test means and tests of significance for key outcomes; or
• The final report provides sufficient information to calculate statistical significance (e.g., means, sample sizes, standard deviations/standard errors). (A sketch of such a calculation follows below.)
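When only summary statistics are reported, a between-group test can still be recovered; the sketch below uses hypothetical post-test means, standard deviations, and sample sizes:

```python
# Computing a between-group significance test from reported summary statistics.
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=78.4, std1=9.2, nobs1=48,   # treatment post-test (assumed values)
    mean2=73.1, std2=10.5, nobs2=40,  # comparison post-test (assumed values)
)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```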
Applying the Rubric
Common Issues:
• Projects report that the results were significant or non-significant but do not provide supporting data.
• Projects provide p-values but do not provide means or percents.
• Projects provide means/percents and p-values, but not standard deviations.
• Projects provide within-group data for the treatment and comparison groups but do not provide between-group tests of significance.
• Projects with treatment and comparison groups provide data for the treatment group only.
• Projects provide significant results but do not identify the type of statistical test they performed.
• Projects provide an overwhelming amount of data for a large number of subgroups (e.g., on individual test or survey items).
Applying the Rubric
Recommendation: Relevant Data
| Participant Group and Outcome | Mean or Percent | SD or SE | t, F, or Chi-square | p-value |
| --- | --- | --- | --- | --- |
| Treatment |  |  |  |  |
| Comparison |  |  |  |  |