Statistical Considerations for Educational Screening & Diagnostic Assessments


Page 1: Statistical Considerations for Educational Screening & Diagnostic Assessments

Yaacov Petscher, Ph.D.
Florida Center for Reading Research, Florida State University

Statistical Considerations for Educational Screening & Diagnostic Assessments

A discussion of methodological applications which have existed in the literature for a long time and are used in other disciplines but are emerging more now in education

Page 2: Statistical Considerations for Educational Screening & Diagnostic Assessments

Discussion Points

Assessment Assumptions
Contexts of Assessments
Statistical Considerations

Reliability Validity Benchmarking

“Disclaimer”
Focusing on breadth, not depth
Based on applied contract and grant research
One slide of equations

Page 3: Statistical Considerations for Educational Screening & Diagnostic Assessments

Assumptions of Assessment - Researchers

Constructs exist but we can’t see them
Constructs can be measured
Although we can measure constructs, our measurement is not perfect
There are different ways to measure any given construct
All assessment procedures have strengths and limitations

Page 4: Statistical Considerations for Educational Screening & Diagnostic Assessments

Assumptions of Assessment - Practitioner

Multiple sources of information should be part of the assessment process

Performance on tests can be generalized to non-test behaviors

Assessment can provide information that helps educators make better educational decisions

Assessment can be conducted in a fair manner

Testing and assessment can benefit our educational institutions and society as a whole

Page 5: Statistical Considerations for Educational Screening & Diagnostic Assessments

Contexts of Assessments

Instructional: Formative, Interim, Summative

Research: Individual Differences, Group Differences (RCT), Growth

Legislative Initiatives: NCLB, Reading First, Race to the Top, Common Core

Page 6: Statistical Considerations for Educational Screening & Diagnostic Assessments

Common Core Adoption

Page 7: Statistical Considerations for Educational Screening & Diagnostic Assessments

PARCC

Page 8: Statistical Considerations for Educational Screening & Diagnostic Assessments

Smarter Balanced

Page 9: Statistical Considerations for Educational Screening & Diagnostic Assessments

Within Common Core

USDOE: PARCC Assessments, Smarter Balanced Assessments, Reading for Understanding Assessments, I3 Assessments

Private Sector

Page 10: Statistical Considerations for Educational Screening & Diagnostic Assessments

Underlying “Code” of Assumptions

Researcher
Constructs exist but we can’t see them
Constructs can be measured
Although we can measure constructs, our measurement is not perfect
There are different ways to measure any given construct
All assessment procedures have strengths and limitations

Practitioner
Multiple sources of information should be part of the assessment process
Performance on tests can be generalized to non-test behaviors
Assessment can provide information that helps educators make better educational decisions
Assessment can be conducted in a fair manner
Testing and assessment can benefit our educational institutions and society as a whole

Page 11: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations - Reliability

Stability, accuracy, or consistency of test scores
Many types: internal consistency, retest, parallel-form, split-half
These should not be viewed as interchangeable; one could have very high stability but very poor internal consistency (e.g., date of birth, height, SSN)
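As a concrete illustration of the internal-consistency idea above, here is a minimal sketch of coefficient alpha computed from an examinee-by-item score matrix (the matrix below is made-up toy data, not from any study in this talk):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an examinee-by-item score matrix."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Toy 0/1 responses: 5 examinees by 4 items (hypothetical)
x = np.array([[1, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0]])
print(round(cronbach_alpha(x), 2))
```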

Page 12: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations - Reliability

Most frequently used framework is classical test theory

What does this assume?

X = T + e (an observed score is a true score plus error)
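The assumption is that observed-score variance splits into true-score and error variance, so reliability can be written as (a textbook formulation, added for context rather than taken from the slides):

\[ \rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_e}, \qquad \operatorname{Cov}(T, e) = 0 \]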

Page 13: Statistical Considerations for Educational Screening & Diagnostic Assessments

Benefits of IRT

Puts persons and items on the same scale (CTT looks at total scores and item p-values for difficulty)

Can result in shorter tests (CTT reliability increases with more items)

Can estimate the precision of scores at the individual level (CTT assumes error is the same for everyone)
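A minimal sketch of the machinery behind that last point: the 2PL item information function and the test-level standard error of ability (the a and b parameters below are invented for illustration):

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """2PL item information: I(theta) = a^2 * P(theta) * (1 - P(theta))."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

theta = np.linspace(-3, 3, 121)     # grid of ability values
a = [1.2, 0.8, 1.5]                 # hypothetical discriminations
b = [-1.0, 0.0, 1.0]                # hypothetical difficulties

# Test information is the sum of item informations; SE(theta) = 1 / sqrt(TI)
test_info = sum(item_information(theta, ai, bi) for ai, bi in zip(a, b))
se_theta = 1.0 / np.sqrt(test_info)
print(se_theta.min(), se_theta.max())   # error varies across the ability range
```

Unlike a single CTT reliability coefficient, the standard error here changes across the ability range, which is what the item and test information figures on the next slides show.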

Page 14: Statistical Considerations for Educational Screening & Diagnostic Assessments

Item Difficulty by Total Score Decile Groups

Page 15: Statistical Considerations for Educational Screening & Diagnostic Assessments

Item Difficulty by Ability

Page 16: Statistical Considerations for Educational Screening & Diagnostic Assessments

Items Don’t Always Do What We Want

Page 17: Statistical Considerations for Educational Screening & Diagnostic Assessments

Item Information

Page 18: Statistical Considerations for Educational Screening & Diagnostic Assessments

Test Information – Standard Error

Page 19: Statistical Considerations for Educational Screening & Diagnostic Assessments

Precision/Reliability

Page 20: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations - Reliability

While precision improves on the idea of reliability, can precision itself be improved?

Account for context effects (Wainer et al., 2000): Petscher & Foorman, 2011

Account for time (Verhelst, Verstralen, & Jansen, 1997): Prindle, Petscher, & Mitchell, 2013

Page 21: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations - Reliability

Context effects
Any influence or interpretation that an item may acquire as a result of its relationship to other items
A greater problem in CAT because each examinee sees a unique set of items
Emerges as both an item-level and a passage-level problem

Page 22: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations - Reliability

Common stimulus

Page 23: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations - Reliability

“If several questions within a test are experimentally linked so that the reaction to one question influences the reaction to another, the entire group of questions should be treated preferably as an ‘item’ when the data arising from application of split-half or appropriate analysis-of-variance methods are reported in the test manual”

APA Standards for Educational and Psychological Testing (1966)

Page 24: Statistical Considerations for Educational Screening & Diagnostic Assessments

Expressed in IRT

Standard 3PL model:

\[ p(x_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{\exp[a_i(\theta_j - b_i)]}{1 + \exp[a_i(\theta_j - b_i)]} \]

Testlet (context-effect) model, adding a person-by-passage effect \(\gamma_{jd(i)}\):

\[ p(x_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{\exp[a_i(\theta_j - b_i - \gamma_{jd(i)})]}{1 + \exp[a_i(\theta_j - b_i - \gamma_{jd(i)})]} \]
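A minimal numerical sketch of these two equations (the parameter values and the function name p_3pl_testlet are hypothetical, introduced only for illustration):

```python
import numpy as np

def p_3pl_testlet(theta, a, b, c, gamma=0.0):
    """3PL probability of a correct response; gamma is the person-specific
    testlet (context) effect for the passage containing the item.
    gamma = 0 reduces this to the standard 3PL."""
    z = a * (theta - b - gamma)
    return c + (1.0 - c) / (1.0 + np.exp(-z))

# One hypothetical item and examinee, with and without a testlet effect
print(p_3pl_testlet(theta=0.5, a=1.1, b=0.0, c=0.2, gamma=0.0))
print(p_3pl_testlet(theta=0.5, a=1.1, b=0.0, c=0.2, gamma=0.3))
```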

Page 25: Statistical Considerations for Educational Screening & Diagnostic Assessments

Study 1: Reading Comprehension in Florida

Page 26: Statistical Considerations for Educational Screening & Diagnostic Assessments

Precision – After 3 passages

Page 27: Statistical Considerations for Educational Screening & Diagnostic Assessments

FAIR Technical Manual

Page 28: Statistical Considerations for Educational Screening & Diagnostic Assessments

Simulations are all well and good…

How does accounting for item dependency improve testing in the real world?

Page 29: Statistical Considerations for Educational Screening & Diagnostic Assessments

N ≈ 800, randomly assigned to testing condition
Control condition used the current 2PL scoring
Experimental condition used an unrestricted bi-factor model

Evaluate: precision, number of passages, prediction to state achievement

RCT design

Page 30: Statistical Considerations for Educational Screening & Diagnostic Assessments
Page 31: Statistical Considerations for Educational Screening & Diagnostic Assessments
Page 32: Statistical Considerations for Educational Screening & Diagnostic Assessments

What this suggests

“Newer” models help us to more appropriately model the data

Precision/reliability are improved just by modeling the context effect

Improve the efficiency and precision of a computer-adaptive test by modeling the item-dependency

Page 33: Statistical Considerations for Educational Screening & Diagnostic Assessments

Study 2: Morphology CAT

Page 34: Statistical Considerations for Educational Screening & Diagnostic Assessments

Accounting for Time

Somewhat similar to the item dependency model

IRT models are concerned with accuracy; what about fluency?

CBM (DIBELS, AIMSweb, easyCBM) and brief assessments (TOWRE, TOSREC, etc.)

Prindle, Petscher, & Mitchell (2013): N = 200, a word knowledge test limited to 60 seconds; compared a 1PL model with a 1PL response-time model
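For a concrete picture, one common way to write a joint accuracy-and-speed model is the hierarchical formulation below (a sketch in the spirit of van der Linden-style response-time modeling, not necessarily the exact specification used in the studies cited above):

\[ P(x_{ij} = 1 \mid \theta_j) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}, \qquad \log t_{ij} \sim N\!\left(\beta_i - \tau_j,\; \sigma^2_i\right) \]

where \(\theta_j\) and \(\tau_j\) are a person's ability and speed, and \(b_i\) and \(\beta_i\) are an item's difficulty and time intensity.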

Page 35: Statistical Considerations for Educational Screening & Diagnostic Assessments

Results

1PL marginal α = .80

1PL with response time: marginal α = .87

Page 36: Statistical Considerations for Educational Screening & Diagnostic Assessments

What this suggests

Accounting for the response time of items can improve precision for most participants

Limitations:
More difficult to do with younger children
Requires computer delivery to record accuracy and time
Cannot do with connected text

Page 37: Statistical Considerations for Educational Screening & Diagnostic Assessments

Validity

Page 38: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations – Factor Validity

Assessments are measures of hypothetical constructs

Assessments are measured with error
Use a latent variable to leverage the common variance
How is this modeled? Unidimensional or multidimensional

Three illustrations:
Petscher & Foorman, 2012 (Syntactic Awareness)
Kieffer & Petscher, 2013 (Morphology/Vocabulary)
Justice, Petscher, & Pentimonti, 2013 (Early Literacy)

Page 39: Statistical Considerations for Educational Screening & Diagnostic Assessments
Page 40: Statistical Considerations for Educational Screening & Diagnostic Assessments

Study 1: Syntactic Awareness

Page 41: Statistical Considerations for Educational Screening & Diagnostic Assessments
Page 42: Statistical Considerations for Educational Screening & Diagnostic Assessments
Page 43: Statistical Considerations for Educational Screening & Diagnostic Assessments
Page 44: Statistical Considerations for Educational Screening & Diagnostic Assessments
Page 45: Statistical Considerations for Educational Screening & Diagnostic Assessments
Page 46: Statistical Considerations for Educational Screening & Diagnostic Assessments

Distribution of Ability

Page 47: Statistical Considerations for Educational Screening & Diagnostic Assessments

Precision (reliability) of Ability Scores

Page 48: Statistical Considerations for Educational Screening & Diagnostic Assessments

Predictive Validity of Factor Scores

Page 49: Statistical Considerations for Educational Screening & Diagnostic Assessments

Study 2: Morphological Awareness/Vocabulary

Page 50: Statistical Considerations for Educational Screening & Diagnostic Assessments

Morphological Awareness (MA) predicts Reading Comprehension (RC)

For a while, we have known that MA is correlated with reading comprehension (e.g., Carlisle, 2000; Freyd & Baron, 1982; Tyler & Nagy, 1990)

MA RC

Page 51: Statistical Considerations for Educational Screening & Diagnostic Assessments

MA predicts RC,above & beyond Vocabulary (V)

Unique contributions of MA to RC, controlling for vocabulary (e.g., Carlisle, 2000; Kieffer, Biancarosa, & Mancilla-Martinez, in press; Kieffer & Lesaux, 2008, 2012; Kieffer & Box, 2013; Nagy, Berninger, & Abbott, 2006)

MA RC

V

Page 52: Statistical Considerations for Educational Screening & Diagnostic Assessments

But wait…

Are we actually measuring MA and vocabulary as separate dimensions of lexical knowledge?

Observed correlations between MA and vocabulary are attenuated by measurement error

Reliability of researcher-created MA measures has been moderate, in the .70-.80 range and occasionally lower

So, “unique” contributions of MA beyond V could be an artifact of measurement error

MA V

Page 53: Statistical Considerations for Educational Screening & Diagnostic Assessments

But wait…

Using Confirmatory Factor Analysis (CFA), Muse (2005) found that MA could not be distinguished from vocabulary in fourth grade, and that the two instead form a unidimensional construct (see also Wagner, Muse, & Tannenbaum, 2007).

Spencer (2012) replicated this finding with eighth graders.

MA/V

Page 54: Statistical Considerations for Educational Screening & Diagnostic Assessments

On the other hand…

Using CFA, Kieffer & Lesaux (2012) found that MA was measurably separable from two other dimensions of vocabulary, though strongly related for both native English & language minority learners in Grade 6

Neugebauer, Kieffer, & Howard (under review) replicated this finding for Spanish speaking language minority learners in Grades 6-8

MA V

Page 55: Statistical Considerations for Educational Screening & Diagnostic Assessments

But

Is it possible that a multidimensional structure exists but is best captured by a general factor of lexical knowledge and specific factors of morphological awareness and vocabulary?

If the common variance is captured by a general factor as well as specific factors, does each predict a distal outcome?

Page 56: Statistical Considerations for Educational Screening & Diagnostic Assessments

Modeling Dimensionality of Lexical Knowledge: Unidimensional

Fit poorly; rejected across parametric & nonparametric EFA & CFA models

Page 57: Statistical Considerations for Educational Screening & Diagnostic Assessments

Modeling Dimensionality of Lexical Knowledge: Two-Dimensional

Page 58: Statistical Considerations for Educational Screening & Diagnostic Assessments

Modeling Dimensionality of Lexical Knowledge: Bi-factor Model

CFI = .98; TLI = .98; RMSEA = .015
vs. 1D: Δχ² = 66.71, Δdf = 34, p < .001
vs. 2D: Δχ² = 48.94, Δdf = 33, p < .05
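Those Δχ² comparisons can be spot-checked in a couple of lines (a sketch using scipy's chi-square survival function):

```python
from scipy.stats import chi2

# Nested-model chi-square difference tests reported on the slide
print(chi2.sf(66.71, df=34))   # bi-factor vs. unidimensional: p < .001
print(chi2.sf(48.94, df=33))   # bi-factor vs. two-dimensional: p < .05
```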

Page 59: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations – Factor Validity

Page 60: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations – Factor Validity

Page 61: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations – Factor Validity

Page 62: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations - SEM

Page 63: Statistical Considerations for Educational Screening & Diagnostic Assessments

Study 3Early Literacy Skills

Page 64: Statistical Considerations for Educational Screening & Diagnostic Assessments

SPOT

Measure developed by Jackie Van Lankveld
An embedded assessment administered while students read a story
Used primarily with students identified with LI (language impairment)
Measures alphabet knowledge, phonological awareness, and print knowledge

Present study: N ≈ 300; in this LI sample, how are the item responses best represented?

Page 65: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations – Factor Validity

Page 66: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations – Factor Validity

Model                          χ²        df    CFI    TLI    RMSEA (CI)
Unidimensional                 223.42    104   0.88   0.86   .064 (.052, .075)
Multidimensional, 3-factor     146.77    101   0.96   0.95   .040 (.024, .053)
Multidimensional, bi-factor    107.65     89   0.98   0.98   .027 (.000, .044)
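For readers who want to see how these indices relate to the underlying chi-squares, a minimal sketch using the standard CFI/TLI/RMSEA formulas (the baseline-model chi-square and df below are placeholders, since the slide does not report them; N ≈ 300 follows the SPOT sample described earlier):

```python
import numpy as np

def fit_indices(chisq_m, df_m, chisq_b, df_b, n):
    """Standard CFI, TLI, and RMSEA from model and baseline chi-squares."""
    d_m = max(chisq_m - df_m, 0.0)     # model misfit
    d_b = max(chisq_b - df_b, 0.0)     # baseline (null-model) misfit
    cfi = 1.0 - d_m / max(d_b, d_m)
    tli = ((chisq_b / df_b) - (chisq_m / df_m)) / ((chisq_b / df_b) - 1.0)
    rmsea = np.sqrt(d_m / (df_m * (n - 1)))
    return cfi, tli, rmsea

# Bi-factor row from the table; chisq_b and df_b are placeholder values
print(fit_indices(chisq_m=107.65, df_m=89, chisq_b=1500.0, df_b=120, n=300))
# RMSEA does not depend on the baseline and comes out near the .027 in the table
```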

Page 67: Statistical Considerations for Educational Screening & Diagnostic Assessments
Page 68: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations – Factor Validity

[Path diagram relating the SPOT factors (Phonological, Print, Alphabet) to the outcome scores SS, Core, EV, WS, and EL; the reported standardized coefficients include -.30***, -.35***, -.46***, -.53***, .62***, .69***, .71***, and .82***, along with several nonsignificant paths between -.15 and .14.]
Page 69: Statistical Considerations for Educational Screening & Diagnostic Assessments

Implications

Research: Multidimensional, General
Good for individual differences
Limited in applicability, for now (Piasta, Petscher, & Anthony, in preparation)

Practice: Multidimensional, Correlated Traits
Good for easy use in the classroom
Limited in specificity

Model                          χ²        df    CFI    TLI    RMSEA (CI)
Unidimensional                 223.42    104   0.88   0.86   .064 (.052, .075)
Multidimensional, 3-factor     146.77    101   0.96   0.95   .040 (.024, .053)
Multidimensional, bi-factor    107.65     89   0.98   0.98   .027 (.000, .044)

Page 70: Statistical Considerations for Educational Screening & Diagnostic Assessments

Benchmarking

Page 71: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statistical Considerations – Benchmarks

Students with poor reading skills have difficulty in closing achievement gaps

Accurate identification is necessary to remediate difficulties

Many assessments include guidelines for cut-points

Page 72: Statistical Considerations for Educational Screening & Diagnostic Assessments

Sample Risk Levels Chart

Page 73: Statistical Considerations for Educational Screening & Diagnostic Assessments

How to Validate – Current Theory

Variety of Methods Best Guess

+/- 1SD Percentile Ranks

Simple Stat Bivariate Correlations Interrater Reliability

More Advanced Logistic Regression Discriminant Function Analysis Achievement-IQ Discrepancies

Page 74: Statistical Considerations for Educational Screening & Diagnostic Assessments

Typical “Diagnostic/Screening” Q’s

What is the relationship (WITR) between blood characteristics and being HIV positive?

What is the relationship between electromagnetic signals and correctly distinguishing signal from noise?

What is the relationship between students’ scores on the Scholastic Reading Inventory and future risk on the SAT-10?

Page 75: Statistical Considerations for Educational Screening & Diagnostic Assessments

What is our question?

Correlational? Bivariate Correlation, Interrater Reliability

Discrimination? Logistic Regression, Discriminant Function Analysis, Receiver Operating Characteristic (ROC) Curves

Page 76: Statistical Considerations for Educational Screening & Diagnostic Assessments

ROC

Graphical representation of operating points
Multiple indices of efficiency
Moving cut-points
Outperforms other techniques in diagnostic efficiency (Hintze, 2005)

Page 77: Statistical Considerations for Educational Screening & Diagnostic Assessments

Advantages of using ROC

It summarizes the quality of a test or prediction without requiring a single cut-off value for decision making

Greater flexibility in diagnostic accuracy and predictive power

Assuming a normal distribution: the mean and standard error can be estimated, the 95% CI can be estimated, and statistical significance can be determined (see the sketch after this list)

Whether one test is better than another can be determined
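One standard way to get the standard error and confidence interval mentioned above is the Hanley & McNeil (1982) approximation; a minimal sketch (the AUC and group sizes below are illustrative placeholders):

```python
import math

def auc_se_hanley_mcneil(auc, n_pos, n_neg):
    """Hanley & McNeil (1982) standard error of an estimated AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)

auc = 0.82                 # illustrative value
se = auc_se_hanley_mcneil(auc, n_pos=100, n_neg=100)
print(auc - 1.96 * se, auc + 1.96 * se)   # approximate 95% CI
```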

Page 78: Statistical Considerations for Educational Screening & Diagnostic Assessments

Old School Discrimination

Form two groups
Give the test

4 outcomes:
People who have the attribute and were detected
People who have the attribute and were not detected
People who don’t have the attribute and were detected
People who don’t have the attribute and were not detected

Using the results

Page 79: Statistical Considerations for Educational Screening & Diagnostic Assessments

What is a ROC Curve?

[ROC curve plot: sensitivity (y-axis) against 1-specificity (x-axis), both running from 0 to 1.]

Page 80: Statistical Considerations for Educational Screening & Diagnostic Assessments

What is a ROC Curve?

[ROC curve plot: sensitivity against 1-specificity, built from cumulative frequency percentages.]

Page 81: Statistical Considerations for Educational Screening & Diagnostic Assessments

Data Scheme

SRI Lexile Score    SAT-10 < 40th %ile (cumulative, y-axis)    SAT-10 >= 40th %ile (cumulative, x-axis)
505                 35 (.35)                                    5 (.05)
520                 30 (.65)                                   10 (.15)
550                 20 (.85)                                   20 (.35)
600                 10 (.95)                                   30 (.65)
700                  5 (1.00)                                  35 (1.00)
TOTALS              N = 100                                    N = 100
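A minimal sketch of how the cumulative proportions in this table become ROC operating points and a trapezoidal area under the curve:

```python
import numpy as np

# Cumulative proportions from the data scheme (cut scores 505-700), plus the origin
tpr = np.array([0.00, 0.35, 0.65, 0.85, 0.95, 1.00])   # sensitivity (y-axis)
fpr = np.array([0.00, 0.05, 0.15, 0.35, 0.65, 1.00])   # 1 - specificity (x-axis)

# Trapezoidal area under the ROC curve
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
print(auc)   # about 0.82 for these operating points
```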

Page 82: Statistical Considerations for Educational Screening & Diagnostic Assessments

What is a ROC Curve?

[ROC curve plot: the five cut scores from the data scheme plotted as operating points, sensitivity against 1-specificity.]

Page 83: Statistical Considerations for Educational Screening & Diagnostic Assessments

Confusion Matrix

A 2x2 table crossing the screen decision with the criterion:

                     SAT-10 score
SRI score            < 40th %ile    >= 40th %ile
At-Risk              A              B
Not At-Risk          C              D

Page 84: Statistical Considerations for Educational Screening & Diagnostic Assessments

Confusion Matrix

(In the same 2x2 table, A = true positives and C = false negatives.)

Sensitivity: SE = A / (A + C)

Page 85: Statistical Considerations for Educational Screening & Diagnostic Assessments

Confusion Matrix

(B = false positives, D = true negatives.)

Specificity: SP = D / (B + D)

Page 86: Statistical Considerations for Educational Screening & Diagnostic Assessments

Confusion Matrix

(A = true positives, B = false positives.)

Positive predictive power: PPP = A / (A + B)

Page 87: Statistical Considerations for Educational Screening & Diagnostic Assessments

Confusion Matrix

(C = false negatives, D = true negatives.)

Negative predictive power: NPP = D / (C + D)

Page 88: Statistical Considerations for Educational Screening & Diagnostic Assessments

Confusion Matrix

Overall correct classification: OCC = (A + D) / (A + B + C + D)

Page 89: Statistical Considerations for Educational Screening & Diagnostic Assessments

Confusion Matrix

Base rate: BR = (A + C) / (A + B + C + D)

Page 90: Statistical Considerations for Educational Screening & Diagnostic Assessments

Confusion Matrix

                     SAT-10 score
SRI score            < 40th %ile    >= 40th %ile
At-Risk              A (TP)         B (FP)
Not At-Risk          C (FN)         D (TN)

SE = A / (A + C)          SP = D / (B + D)
PPP = A / (A + B)         NPP = D / (C + D)
OCC = (A + D) / (A + B + C + D)
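Collecting the cell formulas above in one place, a minimal sketch (cell labels follow the slides: A = true positive, B = false positive, C = false negative, D = true negative):

```python
def classification_indices(a, b, c, d):
    """Screening indices from a 2x2 confusion matrix (A=TP, B=FP, C=FN, D=TN)."""
    n = a + b + c + d
    return {
        "SE":  a / (a + c),         # sensitivity
        "SP":  d / (b + d),         # specificity
        "PPP": a / (a + b),         # positive predictive power
        "NPP": d / (c + d),         # negative predictive power
        "OCC": (a + d) / n,         # overall correct classification
        "BR":  (a + c) / n,         # base rate of the criterion
    }

# The 505 cut score worked through on the following slides
print(classification_indices(a=35, b=5, c=65, d=95))
```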

Page 91: Statistical Considerations for Educational Screening & Diagnostic Assessments

Data Scheme

SRI Lexile Score    SAT-10 < 40th %ile (cumulative, y-axis)    SAT-10 >= 40th %ile (cumulative, x-axis)
505                 35 (.35)                                    5 (.05)
520                 30 (.65)                                   10 (.15)
550                 20 (.85)                                   20 (.35)
600                 10 (.95)                                   30 (.65)
700                  5 (1.00)                                  35 (1.00)
TOTALS              N = 100                                    N = 100

Page 92: Statistical Considerations for Educational Screening & Diagnostic Assessments

[ROC curve plot: all five operating points from the data scheme, sensitivity against 1-specificity.]

Page 93: Statistical Considerations for Educational Screening & Diagnostic Assessments

[ROC curve plot highlighting the operating point for the 505 cut score: sensitivity .35, 1-specificity .05.]

Page 94: Statistical Considerations for Educational Screening & Diagnostic Assessments

Classification – Example 1

Evaluation of Cut Scores: Lexile = 505

                     SAT-10
SRI                  At-Risk    Not At-Risk    Total
At-Risk              35         5              40
Not At-Risk          65         95             160
Total                100        100            200

SE = .35    PPP = .88
SP = .95    NPP = .59
FN = .65    OCC = .65
FP = .05

Page 95: Statistical Considerations for Educational Screening & Diagnostic Assessments

[ROC curve plot highlighting the operating point for the 520 cut score: sensitivity .65, 1-specificity .15.]

Page 96: Statistical Considerations for Educational Screening & Diagnostic Assessments

Classification – Example 2

Evaluation of Cut Scores: Lexile = 520

                     SAT-10
SRI                  At-Risk    Not At-Risk    Total
At-Risk              65         15             80
Not At-Risk          35         85             120
Total                100        100            200

SE = .65    PPP = .81
SP = .85    NPP = .71
FN = .35    OCC = .75
FP = .15

Page 97: Statistical Considerations for Educational Screening & Diagnostic Assessments

Cut-Point Selection

Lexile = 520                              Lexile = 505
SE = .65    PPP = .81                     SE = .35    PPP = .88
SP = .85    NPP = .71                     SP = .95    NPP = .59
FN = .35    OCC = .75                     FN = .65    OCC = .65
FP = .15                                  FP = .05

(Cell counts for both cut scores as shown on the two preceding slides; criterion is the SAT-10 in both cases.)

Choose Lexile = 520, right?

Page 98: Statistical Considerations for Educational Screening & Diagnostic Assessments

What are you Maximizing?

Properties of the Test (population): Sensitivity, Specificity

Properties of the Sample: Positive Predictive Power, Negative Predictive Power

It’s All About the Base Rate!

Page 99: Statistical Considerations for Educational Screening & Diagnostic Assessments

Base Rates are Variables Too!!

Screening Test: the RSN test of Fan Fanaticism, SE = .95, SP = .90

Administered to Two Samples:
Sample 1 – 2,000 people in Boston, where 50% have the problem
Sample 2 – 2,000 people in New York, where 15% have the problem

Page 100: Statistical Considerations for Educational Screening & Diagnostic Assessments

Boston: Base Rate of 50%

                  Criterion
RSN               Jail      No Jail    Total
At-Risk           950       100        1,050
Not At-Risk       50        900        950
Total             1,000     1,000      2,000

Sensitivity = 950/1,000 = .95
Specificity = 900/1,000 = .90
PPP = 950/1,050 = .90
NPP = 900/950 = .95
Overall Correct Classification = (950 + 900)/2,000 = .925

Page 101: Statistical Considerations for Educational Screening & Diagnostic Assessments

NYC: Base Rate of 15%

                  Criterion
RSN               Jail      No Jail    Total
At-Risk           285       170        455
Not At-Risk       15        1,530      1,545
Total             300       1,700      2,000

Sensitivity = 285/300 = .95
Specificity = 1,530/1,700 = .90
PPP = 285/455 = .63
NPP = 1,530/1,545 = .99
Overall Correct Classification = (285 + 1,530)/2,000 = .91
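A minimal sketch of the computation behind this Boston/New York comparison: hold sensitivity and specificity fixed and let the base rate vary.

```python
def predictive_power(n, base_rate, se, sp):
    """Expected 2x2 cells and predictive power for a given base rate."""
    pos = n * base_rate              # people who truly have the attribute
    neg = n - pos
    tp, fn = se * pos, (1 - se) * pos
    tn, fp = sp * neg, (1 - sp) * neg
    return {"PPP": tp / (tp + fp), "NPP": tn / (tn + fn), "OCC": (tp + tn) / n}

print(predictive_power(2000, 0.50, se=0.95, sp=0.90))   # Boston: PPP near .90
print(predictive_power(2000, 0.15, se=0.95, sp=0.90))   # New York: PPP near .63
```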

Page 102: Statistical Considerations for Educational Screening & Diagnostic Assessments

Comparison

Indices    Boston    NYC
SE         0.95      0.95
SP         0.90      0.90
PPP        0.90      0.63
NPP        0.95      0.99
OCC        0.93      0.91


Page 104: Statistical Considerations for Educational Screening & Diagnostic Assessments

Applied to All 1st Graders in Florida: Base Rate of 15%

                  Criterion
Screen            Problem    No Problem    Total
At-Risk           26,754     15,958        42,712
Not At-Risk       1,408      143,626       145,034
Total             28,162     159,584       187,746

Sensitivity = 26,754/28,162 = .95
Specificity = 143,626/159,584 = .90
PPP = 26,754/42,712 = .63
NPP = 143,626/145,034 = .99
Overall Correct Classification = (26,754 + 143,626)/187,746 = .91

Page 105: Statistical Considerations for Educational Screening & Diagnostic Assessments

Statewide Screening

If we had employed a test with those measurement properties statewide to detect children who were at risk for reading problems, we would have mislabeled around 16,000 kids as at-risk who weren’t (37% of those flagged).

However, we would have missed only about 1,400 students who needed services but were not flagged for them (1% of those not flagged).

Page 106: Statistical Considerations for Educational Screening & Diagnostic Assessments

Concluding Thoughts - Reliability

Researchers: Evaluating other methods of reliability
Precision
Generalizability

Practitioners: What is being reported?
Internal consistency, test-retest, etc.
How reliable is it?
Nunnally/Bernstein guidelines: > .80 for research, > .90 for clinical decisions

Page 107: Statistical Considerations for Educational Screening & Diagnostic Assessments

Concluding Thoughts – Factor Validity

Researchers: Testing additional specifications outside of the traditional unidimensional/multidimensional framework
Bi-factor, Causal Indicator, etc.

Practitioners: What type of factor analysis was done?
EFA/CFA
Rules of thumb? Too many (N = 200?)

Page 108: Statistical Considerations for Educational Screening & Diagnostic Assessments

Concluding Thoughts - Benchmarking

Researchers: Improve the rigor of our methods
ROC, Diagnostic Measurement, Cost Curves

Practitioners:
Identify what “at-risk” means
Establish the goal of the screening process
Study how the screen was developed
Determine the base rate
Attend to positive and negative predictive power
Collect local data

Page 109: Statistical Considerations for Educational Screening & Diagnostic Assessments

Implications of these Considerations

We must be careful in how we choose assessments: AYP, value-added modeling, promotion/retention

Moving toward a new phase in assessments: computer-delivered and computer-adaptive (Smarter Balanced, FCRR, RFU)

Be more aware of what other disciplines are doing
Be more aware of what’s in the older literature

Technology!

Page 110: Statistical Considerations for Educational Screening & Diagnostic Assessments

Great Resources

IRT: The Theory and Practice of Item Response Theory (de Ayala); Fundamentals of Item Response Theory (Hambleton et al.)

Factor Analysis: CFA for Applied Research (Brown)

SEM: A Beginner’s Guide to SEM (Schumacker & Lomax)

ROC analysis: Analyzing ROC Curves with SAS (Gonen)

Page 111: Statistical Considerations for Educational Screening & Diagnostic Assessments

Resources

Shameless Plug

IRT: R.J. De Ayala
Factor Analysis: Rex Kline
SEM: Richard Lomax
Benchmarking: Chris Schatschneider

Page 112: Statistical Considerations for Educational Screening & Diagnostic Assessments

End