7/27/2019 Teacher Preparation Program Student Performance Data Models NCTQ Report
Just as the evaluation of teachers is evolving into a multifaceted assessment, so too is the evaluation of teacher preparation programs. Ideally, evaluation of teacher preparation programs would involve a review of the program against rigorous standards, targeted inspection by objective experts and collection of objective data, of which data on the learning gains of graduates' students are one part. This paper addresses only the use of data on the learning gains of graduates' students to evaluate teacher preparation programs. The use of these data holds great promise because it allows comparison of one program with another in the same state and can help institutions to improve program quality. With this great value, however, comes great challenge.

Because the use of student performance data to evaluate teacher prep is so challenging, states now developing appropriate models might benefit from the experience of early adopters. Here we offer six core principles for strong design based on the models developed in three pioneering states: Louisiana, North Carolina and Tennessee. The principles are outlined below, with a more detailed description of the principles following. While it is possible that current teacher prep data models may be sidelined or at least supplemented in the future as better teacher evaluations become an even richer source of data on student and teacher performance that can be connected back to teacher preparation programs, the same design principles described here will apply to future models as well.
Teacher preparation program student performance data models: Six core design principles
1. Comparisons of institutions across state lines on the basis of results from different state teacher prep data models are not possible at present. The introduction of common student assessments through the PARCC and Smarter Balanced consortia should facilitate some comparisons.

2. The purpose of this paper is to address broad design principles, not the statistical fundamentals of the various teacher prep data models. To date, states have selected different statistical models for their teacher prep data models, discussion of which is beyond the scope of this paper. (Louisiana uses a hierarchical linear model; North Carolina, a covariate adjustment model; and Tennessee, a longitudinal mixed effects model.)
To date, these models have not been formally christened with a name that distinguishes them from the student performance data models that are used to evaluate the performance of individual schools and teachers. To introduce a common nomenclature, we propose that they be called "teacher preparation student performance data models" or "teacher prep data models," and we will use "teacher prep data models" throughout this paper.
[Figure: Two diagrams compare evaluation components. Teacher Evaluation draws on observation, objective data, other measures and student performance data; Teacher Prep Program Evaluation draws on standards, inspection, objective data and student performance data.]
APRIL 2013
Student Performance Data: One Bite of the Apple
Six Core Principles for the Design and Use of Teacher Prep Data Models

Principle #1: Data need to be sufficiently specific. Teacher prep data models should generate findings at the level of specific certification programs within an institution, not just the institution in the aggregate.

Principle #2: Identifying the outliers is what's most important. The first priority needs to be a model that can accurately identify the value added by relatively larger programs producing graduates who are at the high or low ends of the effectiveness spectrum. This may involve a trade-off between the capacity of teacher prep data models to produce findings for all teacher preparation programs and their capacity to produce actionable findings.

Principle #3: Use an absolute standard for comparison. A teacher prep student data model should eventually evaluate how well a program's graduates perform relative to an absolute standard of new teacher performance.

Principle #4: Try to keep politics out of the technical design of the teacher prep student data model. The teacher prep student data model is a statistical model, not a political statement, and its design should include the student, classroom and school-level variables that analysis indicates are relevant.

Principle #5: Check the impact of the distribution of graduates among the state's K-12 schools. It is possible that the distribution of graduates among the state's K-12 schools affects the attribution of effects to teacher preparation programs.

Principle #6: Findings must be clearly communicated. Teacher prep student data model findings should be explained in reports that are readily comprehensible to policymakers, program administrators and the public at large.
Principle #1: Data need to be sufficiently specific.

Teacher prep data models should generate findings at the level of specific certification programs within an institution, not just the institution in the aggregate.

Institutions of higher education (IHEs) generally house multiple teacher preparation programs. Our own analysis finds that IHEs house an average of five core elementary, secondary and special education programs at undergraduate and/or graduate levels of training. We have found tremendous variation among these programs. Every aspect of teacher education can vary from program to program, including admission standards, required coursework, the amount of clinical practice and the assigned faculty. Yet, as the table below indicates, for the three state teacher prep data models that now issue public reports, only the model developed in North Carolina is designed to produce findings at the level of a specific program rather than of the institution.

Only North Carolina does a relatively good job of matching its student data model findings with specific programs. Absent that match, program accountability is impossible.

Given the variation among programs within the same institution, an aggregation of results from numerous programs in one institutional report makes it difficult to ascertain if individual programs are actually producing more or fewer effective teachers. For this reason, a system designed only for institutional accountability is of questionable value.

An explanation for why states are choosing to report a finding at the level of the institution as opposed to the vastly more useful finding at the level of the program is that, at least in the near term, these models would not be able to produce reliable findings if they were to further disaggregate. Many institutions simply produce too few graduates in any one program to generate sufficient performance data. As Principle 2 discusses, states should consider that it may be better to produce statistically meaningful results on the few programs that produce sufficient graduates to do so than to produce unreliable results on virtually all programs or institutions.
The mismatch between how institutions organize preparation and what states' student data models report:

Louisiana
  Structure of prep programs: Grades 1-5; Grades 6-12. Separate undergrad and grad offered.
  What the teacher prep student data model evaluates: Grades 4-9. Only undergrad data.

North Carolina
  Structure of prep programs: Grades K-6; Grades 6-9; Grades 9-12. Separate undergrad and grad offered.
  What the teacher prep student data model evaluates: Grades 3-5; Grades 6-8; Grades 9-12. Only undergrad data.

Tennessee
  Structure of prep programs: Grades K-6; Grades 7-12. Separate undergrad and grad offered.
  What the teacher prep student data model evaluates: Grades 4-8; Grades 9-12. Combined undergrad and grad data.

More on the Six Core Design Principles
Principle #2: Identifying the outliers is what's most important.

The first priority needs to be a model that can accurately identify the value added by relatively larger programs producing graduates who are at the high or low ends of the effectiveness spectrum. This may involve a trade-off between the capacity of teacher prep data models to produce findings for all teacher preparation programs and their capacity to produce actionable findings.

Expectations about what teacher prep student data model findings can communicate need to be kept realistic. At best, they will be able to distinguish those programs about which one can be fairly confident that graduates are very effective or very ineffective relative to any given standard. Given the many IHEs and other entities involved in teacher preparation within states, it would be difficult to produce a useful value-added measure for every single institution. The reason that this is the case lies in the nature of the statistical analysis as it is applied to data on graduates of a multitude of institutions.

Why does data sufficiency matter? The average value added associated with programs' teachers will most certainly vary from program to program in any data set. This variation may indeed reflect persistent average differences in quality across programs. However, there is always the possibility that the variation has been amplified by periods of coincidental clumps of either the very effective or the very ineffective teacher candidates that can generally be found in every program. Statistical methods are used to evaluate whether observed differences across sets are due to chance or reflect underlying differences in the preparation of the teachers connected with each set. Differences that are found to be very unlikely to arise due to random chance are called statistically significant. If actual quality differences across programs are small, as they often are, a large collection of data is needed to reliably identify differences among data sets that reflect true quality differences among programs and to rule out accidental results with some certainty.

To maximize the number of teacher records linked to each preparation program from which data can be obtained, teacher prep data models use a variety of data pooling methods. In Louisiana, for example, four years of successive data are pooled on novice teachers, who are defined as teachers in their first or second year of teaching; North Carolina pools five years of successive data on novice teachers, who are defined as having fewer than five years of experience.

In spite of pooling, production levels can still be so low that no amount of data pooling generates a sufficient number of teacher records for analysis. Because of this, each state has established a threshold of production for its teacher prep student data model: Louisiana's model requires a production threshold of 25 teachers per year for inclusion in its teacher prep student data model; North Carolina, 10; and Tennessee, five. In each state, there are a considerable number of small producers who fall below the threshold.

Consider this fact: Over 40 percent of IHEs nationwide produce 50 or fewer teachers in all of their preparation programs combined. The majority of institutions in any given state produce relatively few teachers, making it difficult for some and nearly impossible for others to obtain sufficient data.
Even for those institutions that have enough teacher graduates to be above the threshold for production and for which their data on graduates will be pooled, the results can still be too statistically imprecise to determine whether graduates really differ from the standard to which they are being compared. For example, 2010-2011 data show that Louisiana State University-Shreveport's graduates produce mean reading scores that are slightly better than the average novice teacher in Louisiana: -1.0 for Shreveport graduates compared to -1.2 for the average novice. But with a difference this small, the results may just reflect random variation. In fact, Louisiana's report on this result indicates that it would take a larger number of graduates than the 38 now produced annually (even using teacher records from four graduating classes and following the performance of each graduate for two years) to have any certainty that the effectiveness of Shreveport graduates is actually different from that of the average novice teacher.
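The statistical intuition behind the Shreveport example can be sketched with a back-of-the-envelope standard-error calculation. The spread of teacher-level scores used below (2.0 on the reported scale) is an assumption for illustration, not a figure from the Louisiana model, and the actual model is far more sophisticated; the qualitative point survives nonetheless: with 38 graduates, an observed gap of 0.2 is much smaller than the uncertainty of the estimate.

```python
import math

# Assumption (not from the Louisiana model): teacher-level value-added scores
# spread with a standard deviation of roughly 2.0 on the reported scale.
assumed_teacher_sd = 2.0

def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean estimated from n independent teacher records."""
    return sd / math.sqrt(n)

# Figures from the report: 38 graduates per year, program mean -1.0,
# state novice mean -1.2, hence an observed gap of 0.2.
n_graduates = 38
observed_gap = abs(-1.0 - (-1.2))

# An approximate 95% confidence interval spans about +/-1.96 standard errors.
margin = 1.96 * standard_error(assumed_teacher_sd, n_graduates)
print(f"observed gap: {observed_gap:.2f}, 95% margin: +/-{margin:.2f}")
# The gap is well inside the margin, so no reliable conclusion follows.
```

Under these assumptions the margin of error is roughly three times the observed gap, which is exactly the situation the Louisiana report describes.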
The upshot is that regardless of how data are pooled, no teacher prep student data model can produce the ideal: reliable results on the effectiveness of a single year's cohort of teachers graduating from every teacher preparation program in a state.

Teacher prep student data model findings are generally limited to a relatively small subset of programs:

- Large programs with data sets adequate for making a firm conclusion regarding their teachers' performance relative to the state's standard, and
- Programs of any size whose teachers' performance differs so much from the state's standard of comparison that it is possible to draw a statistically reliable conclusion.

Reinforcing a point made at the conclusion of the discussion of Principle 1, decisions that are made about the teacher prep student data model will affect how actionable its findings will be. For example, if more institutions can be included in the teacher prep student data model only by combining undergraduate and graduate program graduates at a given institution, the trade-off in terms of actionable findings may not be worth the additional coverage. Likewise, if more than three years of data on program graduates need to be pooled, the trade-off in terms of actionable findings may be dubious because programs and K-12 school environments may have changed over that time period.
Principle #3: Use an absolute standard for comparison.

A teacher prep student data model should eventually evaluate how well a program's graduates perform relative to an absolute standard of new teacher performance.

Teacher prep data models always produce results about teacher preparation programs relative to one another; results indicate which of the programs or institutions produce graduates that are relatively more effective than others. Moreover, the current standards for comparison are based not on any absolute measure of student progress, but instead on the performance of the average novice teacher in the state, which varies from year to year. The result is that the best program in one state may be producing graduates who are less effective than the graduates of the worst program in another state. Because there is no way to compare graduates across state lines, it is impossible for any state to know if this is the case.

More interpretive power could be gained from a state's teacher prep student data model if the standard of comparison were instead based on the amount of progress each novice teacher's students should make annually in terms of normal student learning, perhaps using as a goal college readiness at the end of high school. While it may take some time to set this type of absolute standard appropriately, and there will certainly need to be alignment with the standards in the state's teacher evaluation system, the fact that an absolute standard for novice teachers could be changed as circumstances demand means that states need not let the perfect be the enemy of the good.

Within a few years, the 46 states that have adopted the Common Core State Standards plan to be using one of two sets of standardized tests. Provided that states begin to use more uniform definitions of novice teacher (something that now varies among states) and to align their relative or absolute standards of comparison, these assessments will create even more potential to attach interpretive power to student data model results, including interstate comparisons of teacher preparation programs. Ultimately, the capacity to compare preparation programs to one another nationwide, all relative to an absolute standard based on a national conception of annual progress toward college readiness, could produce the most valuable information for teacher preparation improvement and accountability.
Principle #4: Try to keep politics out of the technical design of the teacher prep student data model.

The teacher prep student data model is a statistical model, not a political statement, and its design should include the student, classroom and school-level variables that analysis indicates are relevant.

A variety of variables can be held constant by their inclusion in the teacher prep student data model: student-level variables (e.g., gender, race, level of English proficiency), classroom/teacher-level variables (e.g., percentage of students who are identified as gifted, class mean prior achievement in math) and school-level variables (e.g., percentage of students who receive free or reduced-price meals, school mean prior achievement in reading). The decision of what variables to hold constant while comparing teachers across
preparatory institutions needs to be made based on sound scientific reasoning and experimentation that assesses the degree to which teacher prep student data model results are affected when a particular variable is included. Variables that actually affect student data model results should be included to ensure their proper interpretation. Ideally, no political considerations should enter into decisions about including variables.

How can results be different if a variable is not included? Louisiana, for example, includes the percent of special education students in a classroom as a classroom variable in its teacher prep student data model. For each additional one percent of special education students in a classroom, performance is estimated to decrease by about 1.4 percent of a standard deviation. Were this variable to be excluded from the model, the interpretation of the results on the effectiveness of teachers whose classrooms differed in their proportions of special education students would be affected: some graduates would look worse than others, but only because they teach a higher proportion of special education students, not because they are truly less effective. Thus, holding constant the share of children who need special education services in each teacher's classroom would help ensure that the report is not placing postsecondary institutions that produce a disproportionate number of teachers whose classrooms have a relatively large proportion of special education children at an unfair disadvantage.
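The omitted-variable problem described above can be illustrated with a small simulation. The program names, classroom counts, assignment patterns and noise level below are invented for illustration; only the roughly 1.4-percent-of-a-standard-deviation effect per percentage point of special education students comes from the report.

```python
import random

random.seed(0)

# Two hypothetical programs whose graduates are equally effective, but whose
# teachers are assigned classrooms with different special-education shares.
SD = 50.0                      # student-outcome standard deviation (per the report)
EFFECT_PER_PCT = -0.014 * SD   # about -0.7 points per extra percentage point

def classroom_score(sped_pct: float) -> float:
    """True teacher effect is zero for both programs; only context differs."""
    return EFFECT_PER_PCT * sped_pct + random.gauss(0, 2)

# Program A's classrooms average 5% special education students; Program B's, 15%.
a_scores = [classroom_score(5) for _ in range(500)]
b_scores = [classroom_score(15) for _ in range(500)]

raw_gap = sum(a_scores) / len(a_scores) - sum(b_scores) / len(b_scores)

# Adjusting each classroom for its special-education share removes the bias.
a_adj = [s - EFFECT_PER_PCT * 5 for s in a_scores]
b_adj = [s - EFFECT_PER_PCT * 15 for s in b_scores]
adj_gap = sum(a_adj) / len(a_adj) - sum(b_adj) / len(b_adj)

print(f"unadjusted gap: {raw_gap:.2f}, adjusted gap: {adj_gap:.2f}")
# Unadjusted, Program B looks several points worse despite equal effectiveness;
# adjusted, the gap is near zero.
```

The unadjusted comparison penalizes Program B purely for the classrooms its graduates teach, which is precisely the unfair disadvantage the paragraph above warns against.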
Principle #5: Check the impact of the distribution of graduates among the state's K-12 schools.

It is possible that the distribution of graduates among the state's K-12 schools affects the attribution of effects to teacher preparation programs.

All current teacher prep data models are considered value added because they assess the amount of student academic growth that can be attributed to the teacher in the context of the student, classroom and school variables that can have an impact on student performance. However, some statisticians argue that any variables that are finally included can only account for school characteristics if graduates from the programs being evaluated distribute themselves evenly among different kinds of K-12 schools. If all teachers from one program go to good schools and those from another go to bad schools, these statisticians caution that variables included as theoretical controls for school effects won't actually distinguish whether programs look different because their teachers vary in effectiveness or because their teachers simply manage to find jobs in schools that vary in ways that affect student performance.

To ensure that variables used as school-level controls are effective, teacher prep student data model designers might construct preparation program networks by methods described in the technical literature to assess the direct and indirect connectivity of all relevant programs in a window of 2-3 years (sufficient to allow for connectivity, but not so long that the program and/or the school may have changed). Model designers may recommend that any program that is not directly or indirectly connected in the state's network of preparation programs not be included in the teacher prep student data model.
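One simple way to implement the connectivity screen described above is to treat programs and schools as nodes in a graph, link a program to every school that employs its graduates, and then keep only the programs that fall in a shared connected component. A minimal sketch, with invented program names and placement data:

```python
from collections import defaultdict

# Hypothetical placement records: (preparation program, K-12 school).
placements = [
    ("Program A", "School 1"), ("Program A", "School 2"),
    ("Program B", "School 2"), ("Program B", "School 3"),
    ("Program C", "School 9"),  # shares no school with the other programs
]

def connected_programs(pairs):
    """Group programs into components linked directly or indirectly via shared schools."""
    graph = defaultdict(set)
    for program, school in pairs:
        graph[program].add(school)
        graph[school].add(program)
    seen, components = set(), []
    for program, _ in pairs:
        if program in seen:
            continue
        stack, component = [program], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if not node.startswith("School"):
                component.add(node)  # keep only program nodes in the output
            stack.extend(graph[node])
        components.append(component)
    return components

# Programs A and B are indirectly connected through School 2; Program C is
# isolated and, under the recommendation above, would be excluded from the model.
print(connected_programs(placements))
```

Real implementations compare programs within a component because their graduates' results are observed in overlapping school contexts, which is what lets school-level controls do their job.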
Conclusion: States that have commendably pioneered teacher prep data models have had steep learning curves and have had to make significant adjustments, both because of internal kinks and because data systems in K-12 education have been undergoing their own growing pains and adjustments. States now in the process of developing teacher prep data models should learn from the experiences of these states. They should aim to design a student data model that is part of a robust accountability system (including other forms of objective data, standards and inspection) and that will ultimately provide actionable program-specific results, based on CCSS-aligned student performance data, on the effectiveness of program graduates relative to a standard based on annual progress toward college readiness.
Principle #6: Findings must be clearly communicated.

Teacher prep student data model findings should be explained in reports that are readily comprehensible to policymakers, program administrators and the public at large.

States' teacher prep student data model reports should not be intelligible only to those well versed in statistics. While technical reports are certainly necessary, of the states currently reporting on their teacher prep student data models, only North Carolina also includes in its set of publications a report that is meaningful to the lay reader. This report translates statistics about program graduates' performance into days of instruction. In the case of the University of North Carolina (UNC) at Greensboro, for example, the report indicates that graduates add the equivalent of just over two instructional days to their students' performance in elementary mathematics compared to the average North Carolina novice teacher not produced by a UNC system program.
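A days-of-instruction translation of the kind North Carolina publishes can be sketched as a simple rescaling of an effect size against typical annual growth. Both constants below (a 180-day school year and an average annual gain of 0.40 student standard deviations) are assumptions for illustration only; North Carolina's actual conversion may use different figures.

```python
# Assumed constants for illustration (not North Carolina's actual parameters).
DAYS_PER_YEAR = 180      # instructional days in a school year
ANNUAL_GAIN_SD = 0.40    # average annual student gain, in standard-deviation units

def effect_to_days(effect_sd: float) -> float:
    """Convert a teacher effect (in student SD units) to equivalent instructional days."""
    return effect_sd / ANNUAL_GAIN_SD * DAYS_PER_YEAR

# Under these assumptions, an effect of about 0.005 SD corresponds to roughly
# two days of instruction.
print(round(effect_to_days(0.005), 2))
```

The appeal of this presentation is that a lay reader has no intuition for "0.005 standard deviations" but immediately grasps "two extra days of instruction."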
All states' teacher prep student data model reports should, at a minimum, provide the following types of information:

- Most important of all, teacher prep student data model results reported in terms that can be easily understood and compared by the lay reader, such as translating scores into a net gain or loss of instructional days.
- Identification of programs of any size whose teachers' performance differs so much from the state's standard of comparison that it is possible to draw a statistically reliable conclusion.
- The standard relative to which results are reported. For example, in Louisiana, the standard for comparison in mathematics is the mean adjustment to student outcomes that would be expected compared to the mean new teacher effect, which was reported in 2010 to be -3.1. (With the distribution of student outcomes computed to have a standard deviation of 50, this mean new teacher effect represents a decrease in student performance of 6.2 percent of a standard deviation.)
- Clear institutional identification of the groups of graduates about which results are reported: for example, "University of Tennessee, Martin, Undergraduate and Graduate, Grades 4-8" or "Southeastern Louisiana University, Undergraduate, Grades 4-9."
- The IHEs/programs not included in the teacher prep student data model's analysis because they fall below the production size threshold established for the model.
- The IHEs/programs for which results are not statistically significant, making inadvisable a hard determination regarding the effectiveness of graduates.
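The arithmetic behind the Louisiana figures in the list above is a simple ratio of the reported mean new teacher effect to the reported standard deviation of student outcomes:

```python
# Figures from the Louisiana example above: a mean new-teacher effect of -3.1
# on a scale whose student-outcome standard deviation is 50.
mean_new_teacher_effect = -3.1
student_sd = 50.0

# Expressed as a share of a standard deviation: -3.1 / 50 = -0.062.
share_of_sd = mean_new_teacher_effect / student_sd
print(f"{share_of_sd:+.1%}")  # -6.2% of a standard deviation
```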
General References

Gansle, K. H., Burns, J. M., & Noell, G. (2010). Value added assessment of teacher preparation in Louisiana: 2005-2006 to 2008-2009. Overview of performance bands. Louisiana's Teacher Quality Initiative. Retrieved from http://regents.louisiana.gov/assets/docs/TeacherPreparation/200910ValueAddedAssessmentOverviewoPerormanceBandsFINAL82610.pd

Gansle, K. H., Burns, J. M., & Noell, G. (2011). Value added assessment of teacher preparation programs in Louisiana: 2007-2008 to 2009-2010. Overview of 2010-2011 results. Louisiana's Teacher Quality Initiative. Retrieved from http://www.regents.doa.louisiana.gov/assets/docs/TeacherPreparation/2010-11ValueAddedAssessmentOverviewoResultsNov212011.pd

Henry, G. T., Thompson, C. L., Bastian, K. C., Fortner, C. K., Kershaw, D. C., Marcus, J. V., & Zulli, R. A. (2011). UNC teacher preparation program effectiveness report. Chapel Hill, NC: The Carolina Institute for Public Policy. Retrieved from http://publicpolicy.unc.edu/research/TeacherPrepEectRpt_Final.pd

Henry, G. T., Thompson, C. L., Fortner, C. K., Zulli, R. A., & Kershaw, D. C. (2010). The impacts of teacher preparation on student test scores in North Carolina public schools. Chapel Hill, NC: The Carolina Institute for Public Policy. Retrieved from http://publicpolicy.unc.edu/research/Teacher_Prep_Program_Impact_Final_Report_nc.pd

Mihaly, K., McCaffrey, D., Sass, T. R., & Lockwood, J. R. (2012). Where you come from or where you go? Distinguishing between school quality and the effectiveness of teacher preparation program graduates. National Center for Analysis of Longitudinal Data in Education Research, Working Paper 63. Retrieved from http://www.caldercenter.org/upload/Mihaly_TeacherPrep.pd

Noell, G., & Burns, J. M. (2008). Value added teacher preparation assessment. Louisiana's Teacher Quality Initiative: Overview of 2007-08 study. Louisiana's Teacher Quality Initiative. Retrieved from http://www.laregentsarchive.com/Academic/TE/2009/2008-09VA(8.27.09).pd

Noell, G., Burns, J. M., & Gansle, K. H. (2009). Value added assessment of teacher preparation in Louisiana: 2005-2006 to 2007-2008. Background & new results. Louisiana's Teacher Quality Initiative. Retrieved from http://www.laregentsarchive.com/Academic/TE/2009/2008-09VA(8.27.09).pd

Tennessee State Board of Education (2008). Report card on the effectiveness of teacher training programs. Retrieved from http://www.tn.gov/sbe/2008Novemberpds/II%20A%20Teacher%20Quality%20Report%20Card%20Master.pd

Tennessee State Board of Education (2009). Report card on the effectiveness of teacher training programs. Retrieved from http://www.tn.gov/sbe/TeacherReportCard/2009/2009%20Report%20Card%20on%20Teacher%20Eectiveness.pd

Tennessee State Board of Education & the Tennessee Higher Education Commission (2010). Report card on the effectiveness of teacher training programs. Retrieved from http://www.tn.gov/sbe/Teacher%20Report%20Card%202010/2010%20Report%20Card%20on%20the%20Eectiveness%20o%20Teacher%20Training%20Programs.pd

Tennessee State Board of Education & the Tennessee Higher Education Commission (2011). Report card on the effectiveness of teacher training programs. Retrieved from http://www.tn.gov/thec/Divisions/ttt/account_report/2011reportcard/2011%20Report%20Card%20on%20the%20Eectiveness%20o%20Teacher%20Training%20Programs.pd
1420 New York Avenue, NW Washington, DC 20005
Tel: 202-393-0020 Fax: 202-393-0095 Web: www.nctq.org
Subscribe to NCTQ's blog, PDQ
Follow NCTQ on Twitter and Facebook
NCTQ is available to work with individual states to improve teacher policies. For more information, please contact:
Sandi Jacobs
Vice President
202-393-0020