Do your results really say what you think they say?
Issues of reliability and validity in evaluation measuring instruments
Krista S. Schumacher, PhD student & Program Evaluator
Oklahoma State University | JCCI Resource Development Services
AEA Meeting, October 17, 2013 | Assessment in Higher Education TIG
Key Issue
“Unfortunately, many readers and researchers fail to realize that no matter how profound the theoretical formulations, how sophisticated the design, and how elegant the analytic techniques, they cannot compensate for poor measures” (Pedhazur & Pedhazur Schmelkin, 1991).
The Problem
Review of 52 educational evaluation studies, 1971 to 1999 (Brandon & Singh, 2009): none adequately addressed measurement.
Measurement is also lacking in research on the practice of evaluation.
Literature on validity in evaluation studies ≠ measurement validity (Chen, 2010; Mark, 2011).
The Problem (cont.)
Federal emphasis on “scientifically based research”: experimental, quasi-experimental, and regression discontinuity designs, etc.
Where is measurement validity?
How can programs be compared?
How can we justify requests for continued funding?
Program Evaluation Standards: Accuracy
Standard A2: Valid Information
“Evaluation information should serve the intended purposes and support valid interpretation” (p. 171).
Standard A3: Reliable Information
“Evaluation procedures should yield sufficiently dependable and consistent information for the intended users” (p. 179).
(Yarbrough, Shulha, Hopson, & Caruthers, 2011)
Measurement Validity & Reliability Defined
Valid Inferences = Validity
Instrument measures intended construct
ReliabilityInstrument consistently measures a
construct But perhaps not the construct Reliability ≠ Validity
Consistent scores across administrations
Validity Types (basic for evaluation)
Face: On its face, the instrument seems to measure the intended construct. Assessment: subject matter expert (SME) ratings.
Content: Items are representative of the domain of interest. Assessment: SME ratings. Provides no information about the validity of inferences drawn from scores.
Construct: Instrument content reflects the intended construct. Assessment: exploratory factor analysis (EFA) or principal components analysis (PCA).
Understanding Construct Validity
Pumpkin Pie Example (Nassif & Khalil, 2006)
Construct: the pie
Factors: crust and filling
Variables (items): individual ingredients
Validity Types(more advanced)
Criterion Establishes relationship or discrimination
Assessment: Correlation of scores with other test or with outcome variable
Types of criterion validity evidence Concurrent validity
Positive correlation with scores from another instrument measuring same construct
Discriminant validity Negative correlation with scores from another instrument measuring
opposite construct; comparing scores from different groups Predictive validity
Positive correlation of scores with criterion variable test is intended to predict
E.g., SAT scores and undergraduate GPA
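As a minimal sketch of how predictive validity evidence is typically quantified, the snippet below correlates simulated instrument scores with a simulated later criterion, echoing the SAT/GPA pairing above; all numbers are hypothetical.

```python
# Minimal sketch: predictive validity as a score-criterion correlation
# (simulated, hypothetical data).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
sat_scores = rng.normal(500, 100, size=150)                    # predictor scores
gpa = 2.0 + 0.002 * sat_scores + rng.normal(0, 0.4, size=150)  # later criterion

r, p = pearsonr(sat_scores, gpa)
print(f"Validity coefficient: r = {r:.2f}, p = {p:.4f}")
```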
Reliability (basic for evaluation)
A measure of error (or of results due to chance).
Internal consistency reliability (one type of reliability):
Cronbach’s coefficient alpha (most common). A correlation coefficient:
+1 = high reliability, no error
0 = no reliability, high error
≥ .70 desired (Nunnally, 1978)
Alpha is not a measure of dimensionality: if an instrument has multiple scales (or factors), compute alpha for each scale.
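For reference, Cronbach's alpha for a k-item scale is k/(k-1) × (1 − Σ item variances / variance of the total score). The following is a minimal sketch of that computation on simulated data; the trait model and item counts are illustrative assumptions.

```python
# Minimal sketch: Cronbach's coefficient alpha for one scale (hypothetical data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = respondents, columns = items belonging to ONE scale."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(2)
trait = rng.normal(0, 1, size=(100, 1))            # shared underlying trait
scale = trait + rng.normal(0, 0.8, size=(100, 5))  # five correlated items
print(f"alpha = {cronbach_alpha(scale):.2f}")      # >= .70 desired (Nunnally, 1978)
```

Because alpha says nothing about dimensionality, a multi-scale instrument calls for one such computation per scale, as in the reliability table later in this talk.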
Psychometrically Tested Instrument in Evaluation: Example
Middle Schoolers Out to Save the World (Tyler-Wood, Knezek, & Christensen, 2010)
$1.6 million NSF Innovative Technology Experiences for Students and Teachers (ITEST) project
STEM attitudes & career interest surveys
Process:
Adapted existing psychometrically tested instruments
Instrument development discussed
Validity and reliability evidence included
Instruments published in article
Middle Schoolers Out to Save the World: Validity & Reliability
Content validity: subject matter experts (teachers; advisory board members)
Construct validity: principal components analysis
Criterion-related validity:
Concurrent: correlated scores with other instruments tested for validity and reliability
Discriminant: compared scores among varying groups (e.g., 6th graders vs. ITEST PIs)
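A group comparison like the 6th graders vs. ITEST PIs check above is often quantified with a two-sample t-test. This minimal sketch uses simulated scores; the group means and sizes are hypothetical, not values from the cited study.

```python
# Minimal sketch: discriminant validity via group comparison (simulated data).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
sixth_graders = rng.normal(3.1, 0.6, size=80)  # simulated scale scores, group 1
itest_pis = rng.normal(4.4, 0.4, size=25)      # simulated scale scores, group 2

t, p = ttest_ind(sixth_graders, itest_pis, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.4f}")  # a clear difference supports discrimination
```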
Middle Schoolers Out to Save the World: Construct Validity
Career Interest Survey items loaded on three components:
Component 1: Supportive environment
Component 2: Science education interest
Component 3: Perceived importance of science career

Item      Component loading
Item 1    .781
Item 2    .849
Item 3    .759
Item 4    .900
Item 5    .851
Item 6    .921
Item 7    .852
Item 8    .736
Item 9    .844
Item 10   .670
Item 11   .888
Item 12   .886
Middle Schoolers Out to Save the World: Reliability
Internal Consistency Reliabilities for Career Interest Scales

Scale                                                                                    # Items   Cronbach’s alpha
Perception of supportive environment for pursuing a career in science                       4           .86
Interest in pursuing educational opportunities that would lead to a career in science       5           .94
Perceived importance of a career in science                                                 3           .78
All items                                                                                  12           .94
Evaluations Lacking Instrument Validity & Reliability
Six evaluations reviewed
Approx. $9 million in federal funding
NSF programs:
STEM Talent Expansion Program (STEP)
Innovative Technology Experiences for Students and Teachers (ITEST)
Research in Disabilities Education
All used evaluator-developed instruments
Purpose of Sample Evaluation Instruments
Instruments intended to measure:
Attitudes toward science, technology, engineering & math (STEM)
Anxiety related to STEM education
Interest in STEM careers
Confidence regarding success in STEM major
Program satisfaction
Measurement Fatal Flaws in Sample Evaluations
Failed to discuss the process of instrument development: How were items developed? Were they reviewed by anyone other than the evaluators?
Failed to report reliability or validity information; evaluations that included existing instruments did not report results of psychometric testing.
One used different instruments for pre- and post-tests: how can claims of increases or decreases be made when different items are used?
Reported Findings of Sample Evaluations
IEP students less likely than non-IEP peers to be interested in STEM fields (Lam et al., 2008)
Freshman seminar increased perceived readiness for following semester (Raines, 2012)
Residential program increased STEM attitudes and career interests (Lenaburg et al., 2012)
Participants satisfied with program (Russomanno et al., 2010)
Increased perceived self-competence re: information technology (IT) (Hayden et al., 2011)
Improved perceptions of IT professionals among high school faculty (Forssen et al., 2011)
Implications for Evaluation
Funding and other program decisions: findings based on valid and reliable data provide strong justifications.
Use existing (tested) instruments when possible:
Assessment Tools in Informal Science: http://www.pearweb.org/atis/dashboard/index
Buros Center for Testing (Mental Measurements Yearbook): http://buros.org/
For newly created instruments:
Discuss the process of instrument creation
Report evidence of validity and reliability
Conclusion
No more missing pieces: measurement deserves a place of priority.
Continually ask...
• Are the data trustworthy?
• Are my conclusions justifiable?
• How do we know these results really say what we think they say?
References
Brandon, P. R., & Singh, J. M. (2009). The strength of the methodological warrants for the findings of research on program evaluation use. American Journal of Evaluation, 30(2), 123-157.
Chen, H. T. (2010). The bottom-up approach to integrative validity: A new perspective for program evaluation. Evaluation and Program Planning, 33, 205-214.
Forssen, A., Lauriski-Karriker, T., Harriger, A., & Moskal, B. (2011). Surprising possibilities imagined and realized through information technology: Encouraging high school girls' interests in information technology. Journal of STEM Education: Innovations & Research, 12(5/6), 46-57.
Hayden, K., Ouyang, Y., Scinski, L., Olszewski, B., & Bielefeldt, T. (2011). Increasing student interest and attitudes in STEM: Professional development and activities to engage and inspire learners. Contemporary Issues in Technology and Teacher Education, 11(1), 47-69.
Lam, P., Doverspike, D., Zhao, J., Zhe, J., & Menzemer, C. (2008). An evaluation of a STEM program for middle school students on learning disability related IEPs. Journal of STEM Education: Innovations & Research, 9(1/2), 21-29.
Lenaburg, L., Aguirre, O., Goodchild, F., & Kuhn, J.-U. (2012). Expanding pathways: A summer bridge program for community college STEM students. Community College Journal of Research and Practice, 36(3), 153-168.
Mark, M. M. (2011). New (and old) directions for validity concerning generalizability. New Directions for Evaluation, 2011(130), 31-42.
Nassif, N., & Khalil, Y. (2006). Making a pie as a metaphor for teaching scale validity and reliability. American Journal of Evaluation, 27(3), 393-398.
Nunnally, J. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Pedhazur, E. J., & Pedhazur Schmelkin, L. (1991). Measurement, design, and analysis: An integrated approach. New York, NY: Psychology Press.
Raines, J. M. (2012). FirstSTEP: A preliminary review of the effects of a summer bridge program on pre-college STEM majors. Journal of STEM Education: Innovations and Research, 13(1).
Russomanno, D., Best, R., Ivey, S., Haddock, J. R., Franceschetti, D., & Hairston, R. J. (2010). MemphiSTEP: A STEM talent expansion program at the University of Memphis. Journal of STEM Education: Innovations and Research, 11(1/2), 69-81.
Tyler-Wood, T., Knezek, G., & Christensen, R. (2010). Instruments for assessing interest in STEM content and careers. Journal of Technology and Teacher Education, 18(2), 341-363.
Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (Eds.). (2011). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.
Contact Information
JCCI Resource Development Services
http://www.jccionline.com
BECO Building West, 5410 Edson Lane, Suite 210B, Rockville, MD 20852
Jennifer Kerns, President
301-468-1851 | [email protected]
Krista S. Schumacher, Associate
918-284-7276 | [email protected]