
Page 1:

Improving Content Validity: A Confidence Interval for Small Sample Expert Agreement

Jeffrey M. Miller & Randall D. Penfield
NCME, San Diego
April 13, 2004
University of Florida
[email protected] & [email protected]

Page 2:

“Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA/APA/NCME, 1999).

Content validity refers to the degree to which the content of the items reflects the content domain of interest (APA, 1954)

INTRODUCING CONTENT VALIDITY

Page 3:

Content is a precursor to drawing a score-based inference. It is evidence-in-waiting (Shepard, 1993; Yalow & Popham, 1983)

“Unfortunately, in many technical manuals, content representation is dealt with in a paragraph, indicating that selected panels of subject matter experts (SMEs) reviewed the test content, or mapped the items to the content standards…” (Crocker, 2003).

THE NEED FOR IMPROVED REPORTING

Page 4:

Several indices for quantifying expert agreement have been proposed

The mean rating across raters is often used in calculations

However, the mean alone does not provide information regarding its proximity to the unknown population mean.

We need a usable inferential procedure to gain insight into the accuracy of the sample mean as an estimate of the population mean.

QUANTIFYING CONTENT VALIDITY

Page 5:

A simple method is to calculate the traditional Wald confidence interval:

However, this interval is inappropriate for rating scales.

THE CONFIDENCE INTERVAL

$$\bar{X} \pm t_{df}\,\frac{s_X}{\sqrt{n}}$$

1. There are too few raters and response categories to safely assume that population normality holds.

2. There is no reason to believe the distribution of ratings should be normal.

3. The rating scale is bounded, with discrete categories.
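For contrast only, here is a minimal Python sketch of the traditional Wald-style t interval just described; it is not the authors' code, and the function name and use of SciPy are assumptions for illustration.

```python
import math
import statistics
from scipy.stats import t  # assumes SciPy is available

def wald_interval(ratings, confidence=0.95):
    """Traditional Wald-style interval: X-bar +/- t_df * s_X / sqrt(n)."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    s = statistics.stdev(ratings)                       # sample standard deviation
    t_crit = t.ppf(1 - (1 - confidence) / 2, df=n - 1)  # t variate for the confidence level
    half_width = t_crit * s / math.sqrt(n)
    # Symmetric by construction, and nothing keeps it inside the bounds of the rating scale.
    return mean - half_width, mean + half_width

# Ten raters on a 1-4 agreement scale (the shorthand example used later in the talk)
print(wald_interval([3, 3, 3, 3, 3, 3, 3, 3, 3, 4]))
```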

Page 6:

Penfield (2003) demonstrated that the Score method outperformed the Wald interval, especially when:
The number of raters was small (e.g., ≤ 10)
The number of categories was small (e.g., ≤ 5)

Furthermore, this interval is asymmetric: it is based on the actual distribution of the mean rating of concern, and its limits cannot extend below or above the actual limits of the rating categories.

AN ALTERNATIVE IS THE SCORE CONFIDENCE INTERVAL FOR RATING SCALES

Page 7:

1. Obtain values for n, k, and z

n = the number of raters

k = the highest possible rating

z = the standard normal variate associated with the confidence level (e.g., ±1.96 at 95% confidence)

STEPS TO CALCULATING THE SCORE CONFIDENCE INTERVAL

Page 8:

2. Calculate the mean item rating

The sum of the ratings for an item divided by the number of raters

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Page 9:

3. Calculate p

$$p = \frac{\sum_{i=1}^{n} X_i}{nk}$$

Or, if the scale begins with 1, then

$$p = \frac{\sum_{i=1}^{n} X_i - n}{n(k-1)}$$
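To make step 3 concrete, here is a minimal Python sketch (a hypothetical helper, not from the paper); the scale_starts_at_one branch follows the adjusted formula above, and the usage line reproduces the talk's own calculation of p = 0.775.

```python
# Hypothetical helper for step 3: turn the rating sum into the proportion p.
def proportion(ratings, k, scale_starts_at_one=False):
    n = len(ratings)
    total = sum(ratings)
    if scale_starts_at_one:
        # 1..k scale: shift the ratings down to 0..(k-1) before forming p
        return (total - n) / (n * (k - 1))
    # 0..k scale (and the form used in the slides' shorthand example)
    return total / (n * k)

# Shorthand example: p = 31 / (10 * 4) = 0.775
print(proportion([3] * 9 + [4], k=4))
```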

Page 10:

4. Use p to calculate the upper and lower limits for a confidence interval for population proportion (Wilson, 1927)

$$p_L = \frac{2pnk + z^2 - z\sqrt{z^2 + 4nkp(1-p)}}{2(nk + z^2)}$$

$$p_U = \frac{2pnk + z^2 + z\sqrt{z^2 + 4nkp(1-p)}}{2(nk + z^2)}$$

Page 11:

5. Calculate the upper and lower limits of the Score confidence interval for the population mean rating

$$\text{Lower} = \bar{X} - z\sqrt{\frac{k\,p_L(1 - p_L)}{n}}$$

$$\text{Upper} = \bar{X} + z\sqrt{\frac{k\,p_U(1 - p_U)}{n}}$$
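Steps 2 through 5 can be collected into one small function. The sketch below is a plain-Python illustration of the formulas above, not the authors' published code; the function name score_ci and the assumption of a 0..k scale are mine.

```python
import math

def score_ci(ratings, k, z=1.96):
    """Score confidence interval for the population mean of a rating scale item.

    ratings : list of integer ratings (0..k assumed here)
    k       : highest possible rating
    z       : standard normal variate for the confidence level
    """
    n = len(ratings)
    x_bar = sum(ratings) / n                     # step 2: mean item rating
    p = sum(ratings) / (n * k)                   # step 3: proportion
    # step 4: Wilson (1927) limits for the population proportion
    core = 2 * p * n * k + z ** 2
    root = z * math.sqrt(z ** 2 + 4 * n * k * p * (1 - p))
    denom = 2 * (n * k + z ** 2)
    p_l = (core - root) / denom
    p_u = (core + root) / denom
    # step 5: back-transform to limits for the population mean rating
    lower = x_bar - z * math.sqrt(k * p_l * (1 - p_l) / n)
    upper = x_bar + z * math.sqrt(k * p_u * (1 - p_u) / n)
    return lower, upper
```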

Page 12:

Shorthand Example

Item: 3 + ? = 8

The content of this item represents the ability to add single-digit numbers.

1 = Strongly Disagree, 2 = Disagree, 3 = Agree, 4 = Strongly Agree

Suppose the expert review session includes 10 raters.

The responses are 3, 3, 3, 3, 3, 3, 3, 3, 3, 4

Page 13:

Shorthand Example

n = 10

k = 4

z = 1.96

The sum of the ratings = 31

$$\bar{X} = \frac{31}{10} = 3.10$$

$$p = \frac{\sum_{i=1}^{n} X_i}{nk} = \frac{31}{10 \times 4} = 0.775$$

Page 14:

Shorthand Example (cont.)

$$p_L = \frac{65.842 - 11.042}{87.683} = 0.625$$

$$p_U = \frac{65.842 + 11.042}{87.683} = 0.877$$

Page 15:

Shorthand Example (cont.)

$$\text{Lower} = 3.100 - 1.96\sqrt{0.938/10} = 2.500$$

$$\text{Upper} = 3.100 + 1.96\sqrt{0.432/10} = 3.507$$

Page 16:

We are 95% confident that the population mean rating falls somewhere between 2.500 and 3.507.
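Assuming the score_ci sketch shown after step 5 is in scope, the shorthand example can be checked directly; the rounded output matches the slides.

```python
# Check the shorthand example with the score_ci sketch from step 5
lower, upper = score_ci([3, 3, 3, 3, 3, 3, 3, 3, 3, 4], k=4, z=1.96)
print(round(lower, 3), round(upper, 3))   # -> 2.5 3.507 (i.e., 2.500 and 3.507)
```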

Page 17:

Content Validation

1. Method 1: Retain only items with a Score interval of a particular width, based on

a. An a priori determination of appropriateness
b. An empirical standard (the 25th and 75th percentiles of all widths)

2. Method 2: Retain items based on a hypothesis test that the lower limit is above a particular value
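Both retention rules above reduce to one-line checks on an item's interval. This is a hypothetical sketch with made-up threshold names (max_width, cutoff), not criteria taken from the paper.

```python
def retain_by_width(lower, upper, max_width):
    """Method 1: keep the item if its Score interval is no wider than a chosen standard."""
    return (upper - lower) <= max_width

def retain_by_lower_limit(lower, cutoff):
    """Method 2: keep the item if its lower limit exceeds a minimum acceptable mean rating."""
    return lower > cutoff

# Illustrative thresholds only
print(retain_by_width(2.50, 3.51, max_width=1.2))    # True: width 1.01 <= 1.2
print(retain_by_lower_limit(2.50, cutoff=3.0))       # False: lower limit below 3.0
```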

Page 18:

EXAMPLE WITH 4 ITEMS

          Rating Frequency for 10 Raters              95% Score CI
Item      0      1      2      3      4      Mean     Lower     Upper
1         0      0      0      4      6      3.60      3.08      3.84
2         0      0      2      5      3      3.10      2.50      3.51
3         2      0      2      6      0      2.20      1.59      2.77
4         1      2      3      3      1      2.10      1.50      2.68
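The table can be reproduced from the frequency counts, again assuming the score_ci sketch from step 5 is in scope (categories scored 0 through 4, so k = 4).

```python
# Rebuild the 4-item table from rating frequencies over categories 0-4
freqs = {1: [0, 0, 0, 4, 6],
         2: [0, 0, 2, 5, 3],
         3: [2, 0, 2, 6, 0],
         4: [1, 2, 3, 3, 1]}

for item, counts in freqs.items():
    # expand the frequency counts into one rating per rater
    ratings = [cat for cat, f in enumerate(counts) for _ in range(f)]
    mean = sum(ratings) / len(ratings)
    lower, upper = score_ci(ratings, k=4)
    print(item, round(mean, 2), round(lower, 2), round(upper, 2))
```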

Page 19:

Conclusions

1. The Score method provides a confidence interval that is not dependent on the normality assumption

2. Outperforms the Wald interval when the number of raters and scale categories is small

3. Provides a decision-making method for the fate of items in expert review sessions.

4. Computational complexity can be eased through simple programming in Excel, SPSS, and SAS

Page 20:

For further reading,

Penfield, R. D. (2003). A score method for constructing asymmetric confidence intervals for the mean of a rating scale item. Psychological Methods, 8, 149-163.

Penfield, R. D., & Miller, J. M. (in press). Improving content validation studies using an asymmetric confidence interval for the mean of expert ratings. Applied Measurement in Education.