Item analysis

Item AnalysisItem Analysis

How do we evaluate our test?

A glimpse of our practices

When do we evaluate our test?

Windows User

What is a test?An objective measure of a sample of

behavior or psychological object (Anastasi & Urbina, 1997).

A systematic procedure for measuring a sample of behavior by posing a set of questions in a uniform manner (Gronlund & Linn, 2000).

Are only tools…

QUALITIES OF A QUALITIES OF A GOOD TESTGOOD TEST

To be a “GOOD” test, a test ought to have validity, reliability, and accuracy.

The Error Components The Error Components of a Test of a Test

Random Error◦Sources: Fatigue, Cheating,

Guessing, etc.Systematic Error

◦Sources: Item bias, Technical errors, Contextual clues, etc.

True Score = Observed Score Error

Important Notes from the Classical Test TheorySystematic errors lead to poor

test reliability and validity.Interpretations of test result

may be distorted by too many errors.

Random errors are more difficult to control than systematic errors.

Systematic errors can be controlled by systematic assessment (Item Analysis).

How do we evaluate our How do we evaluate our tests?tests?◦QUALITATIVE EVALUATION Systematic inspection of test plan, tasks, and format

◦QUANTITATIVE EVALUATION Psychometric techniques in item analysis

A Systematic Inspection of A Systematic Inspection of Tests Tests

Adequacy of Assessment or Test Plan

Adequacy of Assessment Task

Adequacy of Test Format and Directions

QUANTITATIVE EVALUATION

- determining the psychometric characteristics of the test using ITEM ANALYSIS

Psychometric Psychometric Techniques for Item Techniques for Item AnalysisAnalysis

Test / Item Statistics - These are psychometric techniques generally based on a norm-referenced perspective (Method of extreme groups).◦Item Difficulty◦Item Discrimination Power◦Effectiveness of Distracters

The Method of Extreme The Method of Extreme GroupsGroupsSelecting criterion groups

(Upper & Lower Criterion Groups)◦If samples are less than 50: Use

50% groupings◦If samples used are more than

50, and for a more refined analysis: Use upper & lower 27% groupings

◦Set aside the papers which will not be used in the analysis.

The Method of Extreme The Method of Extreme GroupsGroups

Determine item statistics◦Item Difficulty◦Item Discrimination◦Effectiveness of distracters

Determine test statistics◦Solve for the mean (average) of the

difficulty and discrimination indices of all the items in the test.

What is item difficulty?What is item difficulty?Item difficulty is simply the percentage of

students taking the test who answered the item correctly. It is represented by p index.

Range: 0.0 to 1.0 or 0% to 100%It can be computed using the formula below

NLUp RR

◦Some important notations UR = No. of students from the UPPER

criterion group that had gotten the item correctly or had chosen a particular option under analysis.

LR = No. of students from the LOWER criterion group that had gotten the item correctly or had chosen a particular option under analysis.

N = No. of students that had tried to answer the item

Rule of the Thumb: Rule of the Thumb: pp - - indexindexThe nearer the p value is to 0.0, or 0.0% the more DIFFICULT the item becomes.

The nearer the p value is to 1.0, or 100 % the EASIER the item becomes.

How difficult should our How difficult should our test/item be?test/item be?

For an objective item test, the ideal difficulty would be halfway between the percentage of pure guess and 100%. (Thompson & Levitov, 1985)◦Ex. p=0.63 for a multiple choice test with 4

options.Eclectic distribution of difficult, average and easy items; with extremely limited use of items having p = 0.9 or more (Frary, 1995)

How important is item How important is item difficulty?difficulty?

An item having p = 0.0 and 1.0 does not in any way contribute to measuring individual differences, and seriously affects test validity.

Item difficulty has a profound effect on the variability of test scores and the precision to which the test discriminates between achievement groups. (Thorndike, et al.,1991)

What is item What is item discrimination?discrimination?

It is the ability of an item to discriminate between students with high or low achievement. It is represented by Di.

Range: -1.0 to 1.0 or 0% to 100%

It can be computed using the formula below

0.5NLUDi RR

Rule of the Thumb: Rule of the Thumb: D iD i- - indexindex

The higher the discrimination index, the better the item becomes. This is so because such a value indicates that the discrimination index is in favor of the upper achievement group, whom we expect to get more of the items in the test correctly.

Why should our items Why should our items discriminate?discriminate?

Items that do not discriminate can seriously affect the validity of the test.

Negatively discriminating items are useless and tend to decrease the validity of the test. (Wood, 1960)

How do we evaluate the How do we evaluate the effectiveness of distracters?effectiveness of distracters?

Solve for Di of each wrong option (applicable only for multiple choice items)

Rule of the thumb: The more negative the Di index will be the more effective is the distracter

Workshop 2Workshop 2

Systematic inspection of test items, tasks and format

Item analysis of using Microsoft Excel ® and

STATISTICA 6.0 .0 or SPSS

Some points of caution in Some points of caution in interpreting item/test interpreting item/test

statisticsstatisticsA low index of discriminating power does NOT necessarily indicate a defective item.◦Non-technical factors which contribute to item discriminating power Emphasis given to a domain or content covered by a test.

Homogeneity and characteristics of student groups

Difficulty of an item

Some points of caution in Some points of caution in interpreting item/test interpreting item/test

statisticsstatisticsItem analysis data from small samples are highly tentative and tends to fluctuate due to its norm reference perspective.

A test that had undergone psychometric analysis is NOT necessarily a STANDARDIZED test.

What should we do after What should we do after item analysis?item analysis?

Further analysis with other psychometric methods ◦Ex. Item bias analysis, Point-biserial and biserial correlations

Item Calibration ◦IRT and Rasch Scaling

Item banking

Practical Implications Practical Implications of Evaluating Testsof Evaluating Tests

It helps prevent wastage of time and effort that went in to test and assessment preparation.

It provides a basis for the general improvement of classroom instruction.

It provides a venue for teachers to develop their test construction skills.

Item analysis

Education

Transcript of Item analysis