Applied Psych Test Design: Part D--Develop norm (standardization) plan

The Art and Science of Test Development, Part D: Develop norm (standardization) sample plan. The basic structure and content of this presentation are grounded extensively in the test development procedures developed by Dr. Richard Woodcock. Kevin S. McGrew, PhD, Educational Psychologist, Research Director, Woodcock-Muñoz Foundation.

description

The Art and Science of Applied Test Development. This is the fourth in a series of PPT modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification; IRT-Rasch scaling; etc.). The presentations are intended to be conceptual and not statistical in nature. Feedback is appreciated.

Transcript of Applied Psych Test Design: Part D--Develop norm (standardization) plan

Page 1: Applied Psych Test Design: Part D--Develop norm (standardization) plan

The Art and Science of Test Development—Part D

Develop norm (standardization) sample plan

The basic structure and content of this presentation are grounded extensively in the test development procedures developed by Dr. Richard Woodcock

Kevin S. McGrew, PhD.

Educational Psychologist

Research Director, Woodcock-Muñoz Foundation

Page 2: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Part A: Planning, development frameworks & domain/test specification blueprints

Part B: Test and Item Development

Part C: Use of Rasch Technology

Part D: Develop norm (standardization) plan

Part E: Calculate norms and derived scores

Part F: Psychometric/technical and statistical analysis: Internal

Part G: Psychometric/technical and statistical analysis: External

The Art and Science of Test Development

The above titled topic is presented in a series of sequential PowerPoint modules. It is strongly recommended that the modules (A-G) be viewed in sequence.

The current module is designated by red bold font lettering

Page 3: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Develop norm (standardization) sampling plan

The goal: A nationally representative sample from which to develop test norms

Page 4: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Implementation of sampling plan in practical test development framework

Gather the norm (standardization) data and use them for Rasch scaling, norm development, and psychometric and statistical analyses

Page 5: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Implementation of sampling plan in Conceptual Psychometric Validity Framework

Gather the norm (standardization) data and use them for Rasch scaling, norm development, and psychometric and statistical analyses

Page 6: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Develop norm (standardization) sample specifications for country/nation where test is intended to be used

The goal: A nationally representative sample from which to develop test norms

Page 7: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Sampling plan should be based on best available source of national statistics (select examples below)

Page 8: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Stage 1: Sampling of communities

Stage 2: Sampling of schools

Stage 3: Sampling of subjects

• School-age subjects – random sampling in grades

• Preschool, university, and adult subjects – quotas set according to selected US Census variables

Three-stage sampling plan strategy used for WJ batteries in United States
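The three stages listed above can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions: it uses simple random selection at every stage, whereas the slides describe communities being sampled according to community characteristics and non-school-age subjects being filled by quota. The function name `three_stage_sample` and the toy sampling frame are hypothetical.

```python
import random

def three_stage_sample(communities, n_communities, n_schools, n_students, seed=0):
    """Draw communities, then schools within them, then students within schools.

    `communities` maps community name -> {school name -> list of student ids}.
    Simple random selection is used at each stage (an assumption of this sketch).
    """
    rng = random.Random(seed)
    sample = []
    # Stage 1: sample communities.
    for community in rng.sample(sorted(communities), n_communities):
        schools = communities[community]
        # Stage 2: sample schools within each selected community.
        for school in rng.sample(sorted(schools), min(n_schools, len(schools))):
            students = schools[school]
            # Stage 3: sample students within each selected school.
            for student in rng.sample(students, min(n_students, len(students))):
                sample.append((community, school, student))
    return sample

# Hypothetical frame: 4 communities, 3 schools each, 10 students per school.
frame = {
    f"community_{c}": {
        f"school_{c}{s}": [f"student_{c}{s}_{i}" for i in range(10)]
        for s in range(3)
    }
    for c in range(4)
}
drawn = three_stage_sample(frame, n_communities=2, n_schools=2, n_students=5)
```

Fixing the seed makes the draw reproducible, which helps when the plan has to be documented or audited later.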

The goal: A nationally representative sample from which to develop test norms

Page 9: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Goal is a nationally representative sample upon which to base the norms for the measures (tests, clusters)

Page 10: Applied Psych Test Design: Part D--Develop norm (standardization) plan

No sampling plan is perfect: “Tweak” final norm data via subject weighting

A few tips/cautions

• Oversample small groups and then down-weight

• No amount of creative weighting can “fix” a poorly executed sampling plan

• Continuous sample analysis: Monitor the data frequently as they are collected to see how closely the actual subject characteristics “fit” the sampling plan. This lets you make adjustments during data collection (before it is too late)
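The "oversample, then down-weight" tip can be made concrete with a simple post-stratification weight: each subject's weight is the group's target population proportion divided by the group's observed sample proportion. The sketch below is an illustration with made-up groups and proportions, not the weighting procedure used for any particular battery; `poststratification_weights` is a hypothetical helper name.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_props):
    """Return one weight per subject so weighted group shares match the targets."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    # weight = target proportion / observed sample proportion for the subject's group
    group_weight = {g: population_props[g] / (counts[g] / n) for g in counts}
    return [group_weight[g] for g in sample_groups]

# Hypothetical example: small group "B" was deliberately oversampled.
sample_groups = ["A"] * 60 + ["B"] * 40      # B makes up 40% of the sample
population_props = {"A": 0.9, "B": 0.1}      # but only 10% of the population
weights = poststratification_weights(sample_groups, population_props)

# After weighting, group B contributes its population share.
weighted_share_b = (
    sum(w for w, g in zip(weights, sample_groups) if g == "B") / sum(weights)
)
```

Weights of this kind only rescale the subjects who were actually tested; consistent with the caution above, they cannot repair a group that was sampled badly in the first place.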

Page 14: Applied Psych Test Design: Part D--Develop norm (standardization) plan

WJ III Three-stage sampling plan strategy: Stage 1

Sampling of communities – communities sampled according to 10 different community characteristics

Page 15: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Important question to think about

Are students who are categorized as “low SES” (at the parent/family level) from a low-SES community ($$) likely to have the same community, life, family, and educational experiences as “low SES” students (at the parent/family level) from a high-SES community ($$$$$$$$)?

Page 16: Applied Psych Test Design: Part D--Develop norm (standardization) plan

[Figure: family/parent SES classification (Low, Middle, High) crossed with community SES classification (Low, Middle, High); Communities A, B, and C are marked. Label: Level of abilities of selected students]

Illustration of potential threat to representative sample when community SES is ignored and only family/parent SES is used in sampling plan

Page 17: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Community SES is one of the most important, yet most frequently overlooked, variables in a test standardization sampling plan

Page 18: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Sample research/norm demographic and data file variable coding sheet: WJ III

Tips/Cautions

• Spend considerable time designing this data collection form (be OCD)

• Think of all future analyses when deciding what to include/code

• If in doubt about a variable…include it (you typically can’t go back later to get information)

• Create coding system with quality data entry procedures in mind

• You can’t analyze what you haven’t gathered and coded from the beginning
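One way to build "quality data entry procedures" into the coding system is to keep the codebook in machine-readable form and check every record against it. The sketch below is a minimal illustration; the variable names and code values are hypothetical and are not taken from the WJ III coding sheet.

```python
# Hypothetical codebook: variable name -> set of allowed codes.
CODEBOOK = {
    "sex": {1, 2},
    "community_ses": {1, 2, 3},
    "grade": set(range(0, 13)),
}

def validate_record(record, codebook=CODEBOOK):
    """Return a list of problems found in one coded record (empty list = clean)."""
    problems = []
    for variable, allowed in codebook.items():
        if variable not in record:
            problems.append(f"{variable}: missing")
        elif record[variable] not in allowed:
            problems.append(f"{variable}: invalid code {record[variable]!r}")
    return problems

clean = validate_record({"sex": 1, "community_ses": 2, "grade": 5})
dirty = validate_record({"sex": 3, "grade": 5})
```

Running a check like this at entry time catches missing or out-of-range codes while the test record is still at hand.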

Page 19: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Development and implementation of sampling plan is probably the most critical phase of test development

• GIGO: garbage in, garbage out

Retain as much control over all aspects of data collection and data entry as possible

Quality of data is only as good as your pool of examiners

• Recruiting, training, supervising, and retaining good examiners requires major attention and is very important

• Examiners need supervision

• Terminate poor examiners as soon as you can

• Central office should review every single piece of information on submitted test records, especially at the beginning of an examiner’s testing (you don’t want them practicing errors)

• Run special Rasch “person fit” reports to flag test records that look suspicious (and see if they come from certain examiners)

• Best examiners are not necessarily psychologists or doctoral students.
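The Rasch “person fit” idea mentioned above can be illustrated with the outfit mean-square statistic: the average squared standardized residual between a person's observed item scores and the probabilities the dichotomous Rasch model expects. The sketch assumes item difficulties and the person's ability are already estimated and expressed in logits (the WJ batteries report a W scale instead), and all values are made up. Dedicated Rasch software would normally produce these reports.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def outfit_mean_square(responses, ability, difficulties):
    """Outfit mean-square: mean squared standardized residual across items."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_probability(ability, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

# Hypothetical item difficulties (logits), ordered from easy to hard.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected_pattern = [1, 1, 1, 0, 0]     # passes easy items, fails hard ones
suspicious_pattern = [0, 0, 1, 1, 1]   # fails easy items, passes hard ones

fit_ok = outfit_mean_square(expected_pattern, 0.0, difficulties)
fit_bad = outfit_mean_square(suspicious_pattern, 0.0, difficulties)
```

Values near 1 are what the model expects. Much larger values mark a response pattern worth inspecting, and grouping flagged records by examiner shows whether they cluster.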

(cont. next slide)

Page 20: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Spend considerable time developing the test record and variable coding sheets

Run “continuous sample analysis reports” to monitor sampling plan adherence or drift – so you can make changes quickly before it is too late

There is no single-purpose system, or collection of published software programs, that can handle the detailed and fluid work of data entry, editing and monitoring. Be prepared to develop and pay for custom software.

Dedicated professional data-entry software is a must. SPSS, Excel, etc. won’t cut it.

• Double data entry verification is a must

• Only have a few well-trained and diligent individuals enter the data

Don’t succumb to samples of convenience

Oversample small groups – and then down-weight

Seriously consider three-stage sampling plan that controls for community SES
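A “continuous sample analysis report” can be as simple as comparing the proportions observed so far with the proportions the sampling plan calls for, and flagging any gap beyond a tolerance. The sketch below is illustrative only: the region labels, target proportions, and the 0.05 tolerance are assumptions, and `sampling_drift` is a hypothetical helper.

```python
from collections import Counter

def sampling_drift(collected_groups, target_props, tolerance=0.05):
    """Compare observed group proportions with the plan; flag gaps beyond tolerance."""
    n = len(collected_groups)
    counts = Counter(collected_groups)
    report = {}
    for group, target in target_props.items():
        observed = counts.get(group, 0) / n if n else 0.0
        gap = observed - target
        report[group] = {
            "observed": observed,
            "target": target,
            "gap": gap,
            "flag": abs(gap) > tolerance,
        }
    return report

# Hypothetical interim data: region of each subject tested so far.
collected = ["north"] * 50 + ["south"] * 32 + ["west"] * 18
targets = {"north": 0.35, "south": 0.35, "west": 0.20, "east": 0.10}

report = sampling_drift(collected, targets)
flagged = sorted(group for group, row in report.items() if row["flag"])
```

Run on a regular schedule during data collection, a report like this shows which cells are over-filled or still empty while there is time to redirect examiners.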

Page 21: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Poor sampling plan and data collection cannot hide from data analysis: Select example

Rasch item difficulties (W-difficulties) for a WJ III test in the US norming sample and two neighboring (and very similar) Central European Republics

[Figure: plots comparing item W-difficulties between the US norming sample, the Central Europe Rep A sample, and the Central Europe Rep B sample]

Something is wrong with the data from Rep B. Rep A is similar to the US data; Rep A and Rep B, which should be similar, are not even similar in the plots
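The comparison described above can be approximated numerically: correlate the item difficulties from two calibrations and list the items whose difficulty shifts by more than a chosen amount once the overall mean difference is removed. The numbers below are invented for illustration and are not WJ III values; the helper names and the threshold of 10 are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def displaced_items(ref, other, threshold):
    """Indices of items whose difficulty differs by more than `threshold`
    after removing the mean difference between the two calibrations."""
    shift = sum(o - r for r, o in zip(ref, other)) / len(ref)
    return [i for i, (r, o) in enumerate(zip(ref, other))
            if abs((o - r) - shift) > threshold]

# Made-up item difficulties for the same six items in three samples.
us    = [470.0, 480.0, 490.0, 500.0, 510.0, 520.0]
rep_a = [472.0, 481.0, 489.0, 502.0, 511.0, 519.0]
rep_b = [495.0, 476.0, 512.0, 488.0, 505.0, 515.0]

r_a = pearson_r(us, rep_a)     # close to 1: items order the same way
r_b = pearson_r(us, rep_b)     # much lower: item ordering is scrambled
odd_b = displaced_items(us, rep_b, threshold=10.0)
```

A low correlation or a long list of displaced items does not say what went wrong, only that the sample or the data collection deserves a closer look.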

Page 22: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Poor sampling plan and data collection cannot hide from good data analysis:

Select example

Distribution of W-abilities. Something is wrong with the data from Rep B.

Rep B data are negatively skewed

[Figure panels: US norming; Central Europe Rep A sample; Central Europe Rep B sample]
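Skew of an ability distribution can be checked with the moment-based coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. A negative value indicates a long left tail. The scores below are invented for illustration.

```python
def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (negative = long left tail)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical ability scores.
symmetric = [480, 490, 500, 510, 520]
left_tailed = [420, 495, 500, 505, 510]

skew_symmetric = sample_skewness(symmetric)
skew_left = sample_skewness(left_tailed)
```

Computing this per country or per examiner during data collection is a cheap way to spot a distribution that is drifting away from the others.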

Page 23: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Planned “incomplete” (missing) data collection

Matrix sampling (Partial)

Multiple and EM-based data imputation
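A partial matrix sampling plan can be sketched as follows: every subject takes the core tests, and each subject is also assigned one block of additional tests, with the blocks rotated across subjects. Tests outside a subject's block are missing by design and are the candidates for later imputation. The test names, block labels, and rotation rule here are hypothetical and do not reproduce any actual standardization design.

```python
import itertools

def assign_matrix_forms(subject_ids, core_tests, matrix_blocks):
    """Give every subject the core tests plus one matrix block, rotating blocks
    so each block is administered to roughly the same number of subjects."""
    block_names = sorted(matrix_blocks)
    plan = {}
    for subject, block in zip(subject_ids, itertools.cycle(block_names)):
        plan[subject] = {
            "form": block,
            "tests": list(core_tests) + list(matrix_blocks[block]),
        }
    return plan

# Hypothetical core battery and three matrix blocks.
core = ["core_test_1", "core_test_2"]
blocks = {"A": ["matrix_test_1"], "B": ["matrix_test_2"], "C": ["matrix_test_3"]}

plan = assign_matrix_forms(range(9), core, blocks)
forms = [entry["form"] for entry in plan.values()]
```

Because every subject has the core tests, the core provides the common link for relating the blocks to one another.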

Page 24: Applied Psych Test Design: Part D--Develop norm (standardization) plan

Test 1 A B C D E X Total n

1. Verbal Comprehension C C C C C C 1346

2. Visual-Auditory Learning C C C C C C 1336

3. Spatial Relations C C C C C C 1375

4. Sound Blending C C C C C C 1382

5. Concept Formation C C C C C C 1300

6. Visual Matching C C C C C C 1364

7. Numbers Reversed C C C C C C 1325

8. Incomplete Words C M 627

9. Auditory Working Memory C M 548

11. General Information M M 389

12. Retrieval Fluency M M 388

13. Picture Recognition M M 390

14. Auditory Attention M M 388

15. Analysis-Synthesis M M 362

16. Decision Speed M M 376

17. Memory for Words M M 364

1. Letter-Word Identification C C C C C C 1323

2. Reading Fluency C M 588

5. Calculation C C C C C C 1210

6. Math Fluency C M 588

7. Spelling C M 557

8. Writing Fluency M M 352

9. Passage Comprehension C C C C C C 1203

10. Applied Problems C C C C C C 964

11. Writing Samples M M 342

Totals 450 102 117 127 108 120 372 1396

Mixed batteries

Recent example: Australian WJ III standardization partial matrix sampling plan

Yellow column (1) represents the initial “core” battery prior to implementation of the “core + matrix” battery plan

Page 25: Applied Psych Test Design: Part D--Develop norm (standardization) plan

End of Part D

Additional steps in test development process will be presented in subsequent modules as they are developed