Do your results really say what you think they say?
Issues of reliability and validity in evaluation measuring instruments
Krista S. Schumacher, PhD student & Program Evaluator
Oklahoma State University | JCCI Resource Development Services
AEA Meeting, October 17, 2013 | Assessment in Higher Education TIG
Key Issue
“Unfortunately, many readers and researchers fail to realize that no matter how profound the theoretical formulations, how sophisticated the design, and how elegant the analytic techniques, they cannot compensate for poor measures” (Pedhazur & Pedhazur Schmelkin, 1991).
The Problem
Review of 52 educational evaluation studies, 1971 to 1999 (Brandon & Singh, 2009): none adequately addressed measurement.
Measurement is also lacking in research on the practice of evaluation.
Literature on validity in evaluation studies ≠ measurement validity (Chen, 2010; Mark, 2011).
The Problem (cont.)
Federal emphasis on “scientifically based research”: experimental, quasi-experimental, and regression discontinuity designs, etc.
Where is measurement validity?
How can programs be compared?
How can we justify requests for continued funding?
Program Evaluation Standards: Accuracy
Standard A2: Valid Information
“Evaluation information should serve the intended purposes and support valid interpretation” (p. 171).
Standard A3: Reliable Information
“Evaluation procedures should yield sufficiently dependable and consistent information for the intended users” (p. 179).
(Yarbrough, Shulha, Hopson, & Caruthers, 2011)
Measurement Validity & Reliability Defined
Valid Inferences = Validity
Instrument measures intended construct
ReliabilityInstrument consistently measures a
construct But perhaps not the construct Reliability ≠ Validity
Consistent scores across administrations
Validity Types (basic for evaluation)
Face: On its face, the instrument seems to measure the intended construct. Assessment: subject matter expert (SME) ratings.
Content: Items are representative of the domain of interest. Assessment: SME ratings. Provides no information about the validity of inferences drawn from scores.
Construct: Instrument content reflects the intended construct. Assessment: exploratory factor analysis (EFA) or principal components analysis (PCA).
Understanding Construct Validity
Pumpkin Pie Example (Nassif & Khalil, 2006)
Construct: the pie
Factors: crust and filling
Variables (items): individual ingredients
Validity Types(more advanced)
Criterion Establishes relationship or discrimination
Assessment: Correlation of scores with other test or with outcome variable
Types of criterion validity evidence Concurrent validity
Positive correlation with scores from another instrument measuring same construct
Discriminant validity Negative correlation with scores from another instrument measuring
opposite construct; comparing scores from different groups Predictive validity
Positive correlation of scores with criterion variable test is intended to predict
E.g., SAT scores and undergraduate GPA
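As a minimal sketch of how predictive validity evidence is typically quantified, the snippet below correlates simulated instrument scores with a simulated later criterion, echoing the SAT/GPA pairing above; all numbers are hypothetical.

```python
# Minimal sketch: predictive validity as a score-criterion correlation
# (simulated, hypothetical data).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
sat_scores = rng.normal(500, 100, size=150)                    # predictor scores
gpa = 2.0 + 0.002 * sat_scores + rng.normal(0, 0.4, size=150)  # later criterion

r, p = pearsonr(sat_scores, gpa)
print(f"Validity coefficient: r = {r:.2f}, p = {p:.4f}")
```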
Reliability (basic for evaluation)
A measure of error (or of results due to chance).
Internal consistency reliability (one type of reliability):
Cronbach’s coefficient alpha (most common). A correlation coefficient:
+1 = high reliability, no error
0 = no reliability, high error
≥ .70 desired (Nunnally, 1978)
Alpha is not a measure of dimensionality: if an instrument has multiple scales (or factors), compute alpha for each scale.
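For reference, Cronbach's alpha for a k-item scale is k/(k-1) × (1 − Σ item variances / variance of the total score). The following is a minimal sketch of that computation on simulated data; the trait model and item counts are illustrative assumptions.

```python
# Minimal sketch: Cronbach's coefficient alpha for one scale (hypothetical data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = respondents, columns = items belonging to ONE scale."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(2)
trait = rng.normal(0, 1, size=(100, 1))            # shared underlying trait
scale = trait + rng.normal(0, 0.8, size=(100, 5))  # five correlated items
print(f"alpha = {cronbach_alpha(scale):.2f}")      # >= .70 desired (Nunnally, 1978)
```

Because alpha says nothing about dimensionality, a multi-scale instrument calls for one such computation per scale, as in the reliability table later in this talk.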
Psychometrically Tested Instrument in Evaluation: Example
Middle Schoolers Out to Save the World (Tyler-Wood, Knezek, & Christensen, 2010)
$1.6 million NSF Innovative Technology Experiences for Students and Teachers (ITEST) project
STEM attitudes & career interest surveys
Process:
Adapted existing psychometrically tested instruments
Instrument development discussed
Validity and reliability evidence included
Instruments published in article
Middle Schoolers Out to Save the World: Validity & Reliability
Content validity: subject matter experts (teachers; advisory board members)
Construct validity: principal components analysis
Criterion-related validity:
Concurrent: correlated scores with other instruments tested for validity and reliability
Discriminant: compared scores among varying groups (e.g., 6th graders vs. ITEST PIs)
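A group comparison like the 6th graders vs. ITEST PIs check above is often quantified with a two-sample t-test. This minimal sketch uses simulated scores; the group means and sizes are hypothetical, not values from the cited study.

```python
# Minimal sketch: discriminant validity via group comparison (simulated data).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
sixth_graders = rng.normal(3.1, 0.6, size=80)  # simulated scale scores, group 1
itest_pis = rng.normal(4.4, 0.4, size=25)      # simulated scale scores, group 2

t, p = ttest_ind(sixth_graders, itest_pis, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.4f}")  # a clear difference supports discrimination
```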
Middle Schoolers Out to Save the World: Construct Validity
Career Interest Survey items loaded on three components:
Component 1: Supportive environment
Component 2: Science education interest
Component 3: Perceived importance of science career

Item      Component loading
Item 1    .781
Item 2    .849
Item 3    .759
Item 4    .900
Item 5    .851
Item 6    .921
Item 7    .852
Item 8    .736
Item 9    .844
Item 10   .670
Item 11   .888
Item 12   .886
Middle Schoolers Out to Save the World: Reliability
Internal Consistency Reliabilities for Career Interest Scales

Scale                                                                                    # Items   Cronbach’s alpha
Perception of supportive environment for pursuing a career in science                       4           .86
Interest in pursuing educational opportunities that would lead to a career in science       5           .94
Perceived importance of a career in science                                                 3           .78
All items                                                                                  12           .94
Evaluations Lacking Instrument Validity & Reliability
Six evaluations reviewed
Approx. $9 million in federal funding
NSF programs:
STEM Talent Expansion Program (STEP)
Innovative Technology Experiences for Students and Teachers (ITEST)
Research in Disabilities Education
All used evaluator-developed instruments
Purpose of Sample Evaluation Instruments
Instruments intended to measure:
Attitudes toward science, technology, engineering & math (STEM)
Anxiety related to STEM education
Interest in STEM careers
Confidence regarding success in STEM major
Program satisfaction
Measurement Fatal Flaws in Sample Evaluations
Failed to discuss the process of instrument development: How were items developed? Were they reviewed by anyone other than the evaluators?
Failed to report reliability or validity information; evaluations that included existing instruments did not report results of psychometric testing.
One used different instruments for pre- and post-tests: how can claims of increases or decreases be made when different items are used?
Reported Findings of Sample Evaluations
IEP students less likely than non-IEP peers to be interested in STEM fields (Lam et al., 2008)
Freshman seminar increased perceived readiness for following semester (Raines, 2012)
Residential program increased STEM attitudes and career interests (Lenaburg et al., 2012)
Participants satisfied with program (Russomanno et al., 2010)
Increased perceived self-competence re: information technology (IT) (Hayden et al., 2011)
Improved perceptions of IT professionals among high school faculty (Forssen et al., 2011)
Implications for Evaluation
Funding and other program decisions: findings based on valid and reliable data provide strong justifications.
Use existing (tested) instruments when possible:
Assessment Tools in Informal Science: http://www.pearweb.org/atis/dashboard/index
Buros Center for Testing (Mental Measurements Yearbook): http://buros.org/
For newly created instruments:
Discuss the process of instrument creation
Report evidence of validity and reliability
Conclusion
No more missing pieces: measurement deserves a place of priority.
Continually ask...
• Are the data trustworthy?
• Are my conclusions justifiable?
• How do we know these results really say what we think they say?
References
Brandon, P. R., & Singh, J. M. (2009). The strength of the methodological warrants for the findings of research on program evaluation use. American Journal of Evaluation, 30(2), 123-157.
Chen, H. T. (2010). The bottom-up approach to integrative validity: A new perspective for program evaluation. Evaluation and Program Planning, 33, 205-214.
Forssen, A., Lauriski-Karriker, T., Harriger, A., & Moskal, B. (2011). Surprising possibilities imagined and realized through information technology: Encouraging high school girls' interests in information technology. Journal of STEM Education: Innovations & Research, 12(5/6), 46-57.
Hayden, K., Ouyang, Y., Scinski, L., Olszewski, B., & Bielefeldt, T. (2011). Increasing student interest and attitudes in STEM: Professional development and activities to engage and inspire learners. Contemporary Issues in Technology and Teacher Education, 11(1), 47-69.
Lam, P., Doverspike, D., Zhao, J., Zhe, J., & Menzemer, C. (2008). An evaluation of a STEM program for middle school students on learning disability related IEPs. Journal of STEM Education: Innovations & Research, 9(1/2), 21-29.
Lenaburg, L., Aguirre, O., Goodchild, F., & Kuhn, J.-U. (2012). Expanding pathways: A summer bridge program for community college STEM students. Community College Journal of Research and Practice, 36(3), 153-168.
Mark, M. M. (2011). New (and old) directions for validity concerning generalizability. New Directions for Evaluation, 2011(130), 31-42.
Nassif, N., & Khalil, Y. (2006). Making a pie as a metaphor for teaching scale validity and reliability. American Journal of Evaluation, 27(3), 393-398.
Nunnally, J. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Pedhazur, E. J., & Pedhazur Schmelkin, L. (1991). Measurement, design, and analysis: An integrated approach. New York, NY: Psychology Press.
Raines, J. M. (2012). FirstSTEP: A preliminary review of the effects of a summer bridge program on pre-college STEM majors. Journal of STEM Education: Innovations and Research, 13(1).
Russomanno, D., Best, R., Ivey, S., Haddock, J. R., Franceschetti, D., & Hairston, R. J. (2010). MemphiSTEP: A STEM talent expansion program at the University of Memphis. Journal of STEM Education: Innovations and Research, 11(1/2), 69-81.
Tyler-Wood, T., Knezek, G., & Christensen, R. (2010). Instruments for assessing interest in STEM content and careers. Journal of Technology and Teacher Education, 18(2), 341-363.
Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (Eds.). (2011). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.
Contact Information
JCCI Resource Development Services
http://www.jccionline.com
BECO Building West, 5410 Edson Lane, Suite 210B, Rockville, MD 20852
Jennifer Kerns, President
301-468-1851 | [email protected]
Krista S. Schumacher, Associate
918-284-7276 | [email protected]