Criteria to consider when constructing good tests


A. Validity – the degree to which the test measures what it is intended to measure. It is the usefulness of the test for a given purpose and the most important criterion of a good examination.

Factors Influencing the Validity of Tests in General

1. Appropriateness of the Test – it should measure the abilities, skills, and information it is supposed to measure.

2. Directions – they should indicate how the learners should answer and record their answers.

3. Reading Vocabulary and Sentence Structure – these should suit the intellectual maturity and background experience of the learners.

4. Difficulty of Items – items should be neither too difficult nor too easy, so that the test can discriminate the bright pupils from the slow ones.

5. Construction of Test Items – items should not provide clues, or the test becomes a test of spotting clues, and they should not be ambiguous, or the test becomes a test of interpretation.

6. Length of the Test – the test should be just long enough to measure what it is supposed to measure, and not so short that it cannot adequately measure the intended performance.

7. Arrangement of Items – items should be arranged in ascending order of difficulty, starting with easy items so that the pupils are encouraged to keep on taking the test.

8. Patterns of Answers – the correct answers should not follow a pattern that pupils can use in answering the test.

Ways of Establishing Validity

1. Face Validity – established by examining the physical appearance of the test.

2. Content Validity – established through a careful and critical examination of the objectives of the test so that it reflects the curricular objectives.

3. Criterion-related Validity – established statistically by correlating the set of scores obtained on a test with the scores obtained on another external predictor or measure.

a. Concurrent validity – describes the present status of the individual by correlating the sets of scores obtained from two measures given at the same time.

b. Predictive validity – describes the future performance of an individual by correlating the sets of scores obtained from two measures given at a longer time interval.

4. Construct Validity – established statistically by comparing psychological traits or factors that theoretically influence scores on a test.

a. Convergent Validity – established if the instrument correlates with a measure of a similar trait other than the one it is intended to measure; e.g., a Critical Thinking Test may be correlated with a Creative Thinking Test.

b. Divergent Validity – established if the instrument describes only the intended trait and not other traits; e.g., a Critical Thinking Test may not be correlated with a Reading Comprehension Test.

B. Reliability – it refers to the consistency of scores obtained by the same person when retested using the same instrument or one that is parallel to it.


Factors Affecting Reliability

1. Length of the Test – as a general rule, the longer the test, the higher the reliability. A longer test provides a more adequate sample of the behavior being measured and is less distorted by chance factors like guessing.

2. Difficulty of the Test – ideally, achievement tests should be constructed so that the average score is 50 percent correct and the scores range from near zero to perfect. The bigger the spread of the scores, the more reliable the measured differences are likely to be. A test is considered reliable if its reliability coefficient (coefficient of correlation) is not less than 0.85.

3. Objectivity – can be obtained by eliminating the bias, opinions or judgments of the person who checks the test.

Method | Type of Reliability Measure | Procedure | Statistical Measure
--- | --- | --- | ---
A. Test-Retest | Measure of stability | Give a test twice to the same group, with any time interval between tests from several minutes to several years. | Pearson r
B. Equivalent Forms | Measure of equivalence | Give parallel forms of the test with close time intervals between forms. | Pearson r
C. Test-Retest with Equivalent Forms | Measure of stability and equivalence | Give parallel forms of the test with increased time intervals between forms. | Pearson r
D. Split Half | Measure of internal consistency | Give a test once. Score equivalent halves of the test, e.g., odd- and even-numbered items. | Pearson r and Spearman-Brown Formula
E. Kuder-Richardson | Measure of internal consistency | Give the test once, then correlate the proportion/percentage of the students passing and not passing a given item. | Kuder-Richardson Formula 20 and 21

Formulas for Measures of Correlation Used in Establishing Test Validity & Reliability

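Pearson r

$$ r = \frac{\dfrac{\sum XY}{N} - \left(\dfrac{\sum X}{N}\right)\left(\dfrac{\sum Y}{N}\right)}{\sqrt{\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2}\;\sqrt{\dfrac{\sum Y^2}{N} - \left(\dfrac{\sum Y}{N}\right)^2}} $$

Where: X – scores in a test; Y – scores in a retest; N – number of examinees

Spearman-Brown Formula

$$ \text{reliability of the whole test} = \frac{2\,r_{oe}}{1 + r_{oe}} $$

Where: r_oe – reliability coefficient using the split-half or odd-even procedure

Kuder-Richardson Formula 20

$$ KR_{20} = \frac{K}{K-1}\left[1 - \frac{\sum pq}{S^2}\right] $$

Where: K – number of items; p – proportion of the examinees who got the item right; q – proportion of the examinees who got the item wrong; S² – variance, or the square of the standard deviation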
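The Pearson r formula and the split-half procedure can be checked with a short Python sketch. It follows the raw-score formula above (each sum divided by N) and applies the Spearman-Brown formula to the odd-even correlation. The function names and the layout of `item_matrix` (one row per examinee, 1 = right and 0 = wrong for each item) are illustrative assumptions, not part of the handout.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from the raw-score formula: every sum is divided by N."""
    n = len(x)
    mean_x = sum(x) / n                                # ΣX / N
    mean_y = sum(y) / n                                # ΣY / N
    mean_xy = sum(a * b for a, b in zip(x, y)) / n     # ΣXY / N
    var_x = sum(a * a for a in x) / n - mean_x ** 2    # ΣX²/N − (ΣX/N)²
    var_y = sum(b * b for b in y) / n - mean_y ** 2    # ΣY²/N − (ΣY/N)²
    return (mean_xy - mean_x * mean_y) / (sqrt(var_x) * sqrt(var_y))


def spearman_brown(r_oe):
    """Estimate whole-test reliability from the odd-even (half-test) correlation."""
    return 2 * r_oe / (1 + r_oe)


def split_half_reliability(item_matrix):
    """Score odd- and even-numbered items separately, correlate, then step up."""
    odd = [sum(row[0::2]) for row in item_matrix]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]   # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Hypothetical 4-item test taken by 4 examinees.
items = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(items), 2))   # 0.9
```

Here the odd-even correlation is about 0.82, and the Spearman-Brown step raises the whole-test estimate to 0.9. For any positive half-test correlation below 1 the correction increases the value, because the full test is twice as long as each half.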


Kuder-Richardson Formula 21

$$ KR_{21} = \frac{K}{K-1}\left[1 - \frac{K\,pq}{S^2}\right] $$
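A similar sketch for the two Kuder-Richardson formulas. KR-20 needs item-level data (p and q for every item), while KR-21 needs only the total scores and the number of items, from which p = mean/K. The handout does not say whether S² divides by N or N − 1; this sketch assumes N (`statistics.pvariance`). Function names and the 0/1 item layout are illustrative assumptions.

```python
from statistics import mean, pvariance


def kr20(item_matrix):
    """KR-20 for items scored 1 (right) or 0 (wrong); one row per examinee."""
    n = len(item_matrix)                    # number of examinees
    k = len(item_matrix[0])                 # K, number of items
    totals = [sum(row) for row in item_matrix]
    s2 = pvariance(totals)                  # S², assumed to divide by N
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n   # proportion right on item j
        sum_pq += p * (1 - p)                        # q = 1 − p
    return k / (k - 1) * (1 - sum_pq / s2)


def kr21(totals, k):
    """KR-21 from total scores only, using p = mean / K and q = 1 − p."""
    p = mean(totals) / k
    s2 = pvariance(totals)
    return k / (k - 1) * (1 - k * p * (1 - p) / s2)


# Hypothetical 4-item test taken by 4 examinees.
items = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in items]
print(round(kr20(items), 3), round(kr21(totals, 4), 3))   # 0.867 0.8
```

KR-21 comes out lower here because it treats every item as having the same difficulty (the average p), whereas these items range from p = 0.25 to p = 0.75.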

Interpretation of the Pearson r correlation value

High positive correlation – from 0.5 (positive correlation) to 1 (perfect positive correlation)

Low positive correlation – from 0 (zero correlation) to 0.5 (positive correlation)

Low negative correlation – from −0.5 (negative correlation) to 0 (zero correlation)

High negative correlation – from −1 (perfect negative correlation) to −0.5 (negative correlation)

C. Administrability – the test should be administered with ease, clarity, and uniformity so that the scores obtained are comparable. Uniformity can be obtained by setting a time limit and standardizing the oral instructions.

D. Scorability – the test should be easy to score: the directions for scoring are clear, the scoring key is simple, and provisions for answer sheets are made.

E. Economy – the test should be given in the cheapest way, which means that separate answer sheets should be provided so that the test papers can be given again from time to time.

F. Adequacy – the test should contain a wide sampling of items to determine the educational outcomes or abilities, so that the resulting scores are representative of the total performance in the areas measured.

G. Authenticity – the test should simulate real-life situations.

Shapes of the Frequency Polygons

1. Normal – bell-shaped curve
2. Positively skewed – most scores are below the mean and there are extremely high scores; $\bar{X} > \hat{X}$ (the mean is greater than the mode)
3. Negatively skewed – most scores are above the mean and there are extremely low scores; $\bar{X} < \hat{X}$ (the mean is lower than the mode)
4. Leptokurtic – highly peaked, and the tails are more elevated above the baseline
5. Mesokurtic – moderately peaked
6. Platykurtic – flattened peak
7. Bimodal curve – curve with two peaks or modes
8. Polymodal curve – curve with three or more modes
9. Rectangular distribution – there is no mode

Four Types of Measurement Scales

Measurement Scale | Characteristics | Example
--- | --- | ---
1. Nominal | Groups and labels data | Gender (1 – male, 2 – female)
2. Ordinal | Ranks data; distances between points are indefinite | Income (1 – low, 2 – average, 3 – high)
3. Interval | Distances between points are equal; no absolute zero point | Test scores and temperature (a score of zero on a test does not mean no knowledge at all)
4. Ratio | All of the above, except that it has an absolute zero point | Height, weight (a zero weight means no weight at all)

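Where (Kuder-Richardson Formula 21): $p = \bar{X}/K$ and $q = 1 - p$, with $\bar{X}$ the mean score on the test and K the number of items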


Measures of Central Tendency and Variability

Measure of Central Tendency – describes the representative value of a set of data.
Measure of Variability – describes the degree of spread or dispersion of a set of data.

Assumptions | When Used | Measure of Central Tendency | Measure of Variability
--- | --- | --- | ---
When the frequency distribution is regular/symmetrical/normal | Usually used when the data are numeric (interval or ratio) | Mean – the arithmetic average | Standard Deviation – the root-mean-square of the deviations from the mean
When the frequency distribution is irregular/skewed | Usually used when the data are ordinal | Median – the middle score in a group of ranked scores | Quartile Deviation – the average deviation of the 1st and 3rd quartiles from the median
When the distribution of scores is normal and a quick answer is needed | Usually used when the data are nominal | Mode – the score that occurs most frequently | Range – the difference between the highest and lowest scores in a set of observations

I. Procedure in the Computation of the Measures of Central Tendency

A. Mean

Procedure:

1. Mean of Ungrouped Data: used for a small number of cases (N < 30)

a. Get the sum of the scores (ΣX).
b. Divide the sum by the number of cases (N).

Formula: $$ \bar{X} = \frac{\sum X}{N} $$

2. Mean of Grouped Data: used for a large number of cases (N > 30). Two methods of computing the mean of grouped data are discussed here.

a. Using the Midpoint Method

Procedure:
1) Group the data in the form of a frequency distribution.
2) Compute the midpoint (M) of each class.
3) Multiply each midpoint by its frequency (M × F).
4) Get the sum of the products of the midpoints and frequencies (ΣMF).
5) Divide the sum by the number of cases (N).

Formula: $$ \bar{X} = \frac{\sum MF}{N} $$

b. Using the Class Deviation Method

Procedure:
1) Choose an arbitrary starting point or origin from any of the classes.
2) Get the midpoint of the class you have chosen as your starting point. Call this the assumed mean (AM).
3) Get the deviation (D) of each class from the class where the assumed mean is located. The deviation of that class is 0; successive classes above the origin get +1, +2, +3, ..., and successive classes below it get −1, −2, −3, ....
4) Multiply the frequencies by their corresponding deviations (FD).
5) Add the products of the frequencies and deviations (ΣFD).
6) Divide the sum by the number of cases (ΣFD/N).
7) Multiply the quotient by the size of the class interval (i).
8) Add the product to the assumed mean.

Formula: $$ \bar{X} = AM + i\left(\frac{\sum FD}{N}\right) $$
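Both grouped-data methods for the mean can be sketched in Python, and they should agree. Each class is written as a hypothetical `(lower, upper, frequency)` tuple using exact (real) class limits, listed from the lowest to the highest class with equal class intervals; this layout and the function names are assumptions made for the example.

```python
def grouped_mean_midpoint(classes):
    """Midpoint method: ΣMF / N."""
    n = sum(f for _, _, f in classes)
    sum_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)   # Σ(midpoint × frequency)
    return sum_mf / n


def grouped_mean_class_deviation(classes, origin_index, interval):
    """Class deviation method: AM + i(ΣFD / N)."""
    lo, hi, _ = classes[origin_index]
    am = (lo + hi) / 2                       # assumed mean: midpoint of the origin class
    n = sum(f for _, _, f in classes)
    # D is 0 at the origin, +1, +2, ... for higher classes, −1, −2, ... for lower ones.
    sum_fd = sum(f * (idx - origin_index) for idx, (_, _, f) in enumerate(classes))
    return am + interval * (sum_fd / n)


# Hypothetical frequency distribution, N = 20, class interval i = 10.
classes = [(9.5, 19.5, 2), (19.5, 29.5, 5), (29.5, 39.5, 8), (39.5, 49.5, 4), (49.5, 59.5, 1)]
print(grouped_mean_midpoint(classes))
print(grouped_mean_class_deviation(classes, origin_index=2, interval=10))
```

For this distribution both calls give 33 (up to floating-point rounding). Choosing a different origin class changes AM and ΣFD but not the result, which is why the starting point can be arbitrary.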


B. Median

1. Median of Ungrouped Data

The procedure for computing the median of ungrouped data depends on the case.

Case 1: The total number of cases is odd.

Procedure:
1.) Arrange the scores from highest to lowest or vice versa.
2.) Get the middlemost score. This score is the median.

Case 2: The total number of cases is even.

Procedure:
1.) Arrange the scores from highest to lowest or vice versa.
2.) Get the two middlemost scores.
3.) Compute the average of the two middlemost scores. The average is the median.

Case 3: The middlemost score occurs twice, thrice, or more times.

Procedure:
1.) Get the middlemost score(s), the scores identical to it, and their counterparts either above or below the middlemost score(s).
2.) Compute their average; this average is the median.
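Cases 1 and 2 translate directly into code. This sketch sorts from lowest to highest (the handout allows either direction) and does not implement the Case 3 adjustment for repeated middlemost scores. Python's `statistics.median` follows the same odd/even rule.

```python
def median_ungrouped(scores):
    ordered = sorted(scores)          # arrange the scores from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # Case 1: odd number of cases, take the middlemost score
        return ordered[mid]
    # Case 2: even number of cases, average the two middlemost scores
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([7, 3, 9, 1, 5]))   # 5
print(median_ungrouped([8, 2, 6, 4]))      # 5.0
```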

2. Median of Grouped Data

Procedure:
1.) Add up (accumulate) the frequencies starting from the lowest to the highest class. Call this the cumulative frequency (CF).
2.) Find one half of the number of cases in the distribution (N/2).
3.) Find the cumulative frequency that is equal to N/2, or closest to it but higher. The class containing this cumulative frequency is the median class.
4.) Find the lowest limit (LL) of the median class.
5.) Get the cumulative frequency of the class below the median class (CFb).
6.) Subtract this from half the number of cases in the distribution (N/2 − CFb).
7.) Get the frequency of the median class (FMdn).
8.) Find the class interval (i), then apply the formula below.

Formula: $$ \tilde{X} = LL + i\left(\frac{\frac{N}{2} - CF_b}{F_{Mdn}}\right) $$

Where: LL – lowest limit of the median class; i – class interval; N/2 – half of the number of cases; CFb – cumulative frequency below the median class; FMdn – frequency of the median class

C. Mode

Procedure:

1. Mode of Ungrouped Data – get the most frequent score. When there are two modes, the distribution is bimodal; when there are three or more modes, it is polymodal or multimodal; when there is no mode, it is described as a rectangular distribution.

2. Mode of Grouped Data

a. Crude Mode – the midpoint of the class with the highest frequency.

Procedure:
1.) Find the class with the highest frequency.
2.) Get the midpoint of that class.
3.) This midpoint is the crude mode.


b. Refined Mode – the mode obtained from an ordered arrangement or a class frequency distribution.

Procedure:
1.) Get the mean and the median of the grouped data.
2.) Multiply the median by three (3Mdn).
3.) Multiply the mean by two (2Mn).
4.) Subtract 2Mn from 3Mdn to get the mode ($\hat{X}$).

Formula: $$ \hat{X} = 3\,Mdn - 2\,Mn $$
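The grouped-data median, the crude mode, and the refined mode can be sketched together. Classes are hypothetical `(lower, upper, frequency)` tuples written with exact (real) limits, lowest class first. Using the exact lower limit for LL (e.g., 29.5 for a 30–39 class) is an assumption based on common textbook practice; the handout itself only says "lowest limit".

```python
def grouped_median(classes):
    """Median = LL + i((N/2 − CFb) / FMdn)."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf_below = 0                                  # CFb, cumulative frequency below the class
    for lo, hi, f in classes:
        if cf_below + f >= half:                  # first class whose CF reaches N/2
            return lo + (hi - lo) * (half - cf_below) / f
        cf_below += f
    raise ValueError("empty distribution")


def crude_mode(classes):
    """Midpoint of the class with the highest frequency."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2


def refined_mode(mean, median):
    """Refined mode = 3Mdn − 2Mn."""
    return 3 * median - 2 * mean


classes = [(9.5, 19.5, 2), (19.5, 29.5, 5), (29.5, 39.5, 8), (39.5, 49.5, 4), (49.5, 59.5, 1)]
mdn = grouped_median(classes)          # 29.5 + 10 × (10 − 7) / 8 = 33.25
print(mdn, crude_mode(classes))        # 33.25 34.5
print(refined_mode(33.0, mdn))         # 33.75 (33.0 is this distribution's mean)
```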

How will you interpret the measures of central tendency?

1.) The value that represents a set of data serves as the basis for determining whether the group is performing better or worse than other groups.

II. Procedure in the Computation of the Measures of Variability

A. Range (R)

1. For Ungrouped Data – the difference between the highest and the lowest score.

2. For Grouped Data – the difference between the upper limit of the highest class and the lower limit of the lowest class.

B. Standard Deviation (SD)

Procedure for Ungrouped Data:
1.) Find the mean ($\bar{X}$).
2.) Subtract the mean from each score to get the deviation ($d = X - \bar{X}$).
3.) Square each deviation ($d^2$).
4.) Get the sum of the squared deviations ($\sum d^2$).
5.) Divide the sum by the number of cases minus one ($\sum d^2 / (N - 1)$).
6.) Get the square root of the result.

Formula: $$ SD = \sqrt{\frac{\sum d^2}{N - 1}} $$

Dividing by N − 1 gives the sample standard deviation; the grouped-data formulas below divide by N.
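A minimal sketch of the range and of the ungrouped standard deviation exactly as listed, dividing Σd² by N − 1. It should agree with Python's `statistics.stdev`, which also uses N − 1. The scores are hypothetical.

```python
from math import sqrt


def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


def sd_ungrouped(scores):
    """SD = √(Σd² / (N − 1)), where d = X − mean."""
    n = len(scores)
    mean = sum(scores) / n                            # step 1: the mean
    sum_d2 = sum((x - mean) ** 2 for x in scores)     # steps 2-4: Σd²
    return sqrt(sum_d2 / (n - 1))                     # steps 5-6


scores = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical scores, mean = 5
print(score_range(scores))             # 7
print(round(sd_ungrouped(scores), 3))  # 2.138
```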

Procedure for Grouped Data

A. Using the Class Deviation Method
1.) As with the mean, get the deviation (d) of each class and the product of its frequency and deviation (fd).
2.) Multiply the product of the frequency and the deviation by the deviation again (fd²).
3.) Get the sum of the products of the frequency and the squared deviation (Σfd²).
4.) Compute the standard deviation using the formula below.

Formula: $$ SD = i\sqrt{\frac{\sum fd^2}{N} - \frac{\left(\sum fd\right)^2}{N^2}} $$

Where: i – class interval; N – number of cases; Σfd – sum of the products of frequency and deviation; Σfd² – sum of the products of frequency and squared deviation

B. Using the Midpoint Method
1.) Multiply each midpoint by its frequency (FM).
2.) Multiply each FM by its midpoint again and write the products in another column labeled FM².
3.) Use the formula below to compute the standard deviation.

Formula: $$ SD = \sqrt{\frac{\sum FM^2}{N} - \bar{X}^2} $$
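Both grouped-data methods for the standard deviation can be compared on one hypothetical distribution; as in the formulas above, both divide by N. Classes are assumed to be `(lower, upper, frequency)` tuples with exact limits and equal intervals, lowest class first, and the function names are illustrative.

```python
from math import sqrt


def sd_grouped_class_deviation(classes, origin_index, interval):
    """SD = i√(Σfd²/N − (Σfd)²/N²)."""
    n = sum(f for _, _, f in classes)
    sum_fd = 0
    sum_fd2 = 0
    for idx, (_, _, f) in enumerate(classes):
        d = idx - origin_index        # class deviation from the origin class
        sum_fd += f * d               # fd
        sum_fd2 += f * d * d          # fd multiplied by d again: fd²
    return interval * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


def sd_grouped_midpoint(classes):
    """SD = √(ΣFM²/N − mean²)."""
    n = sum(f for _, _, f in classes)
    mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / n
    sum_fm2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)   # ΣFM²
    return sqrt(sum_fm2 / n - mean ** 2)


classes = [(9.5, 19.5, 2), (19.5, 29.5, 5), (29.5, 39.5, 8), (39.5, 49.5, 4), (49.5, 59.5, 1)]
print(round(sd_grouped_class_deviation(classes, origin_index=2, interval=10), 2))  # 10.14
print(round(sd_grouped_midpoint(classes), 2))                                      # 10.14
```

The class deviation method keeps the arithmetic small (deviations of 0, ±1, ±2), which is why it was preferred for hand computation; both give the same SD.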


How will you interpret the standard deviation?
1.) The results help you determine whether the group is homogeneous or not.
2.) The results also help you determine the number of students who fall below and above the average performance.

To do this: Mean − 1 SD and Mean + 1 SD give the limits of average ability. The point right below Mean − 1 SD is the upper limit of below-average ability, and the point right above Mean + 1 SD is the lower limit of above-average ability.
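The Mean ± 1 SD rule can be sketched as a simple classification. Treating a score exactly equal to Mean − 1 SD or Mean + 1 SD as average is an assumption; the handout does not say where boundary scores go. The function name and data are hypothetical.

```python
def ability_groups(scores, mean, sd):
    """Group scores as below average, average, or above average using mean ± 1 SD."""
    lower, upper = mean - sd, mean + sd          # limits of average ability
    groups = {"below average": [], "average": [], "above average": []}
    for s in scores:
        if s < lower:
            groups["below average"].append(s)
        elif s > upper:
            groups["above average"].append(s)
        else:                                    # boundary scores counted as average
            groups["average"].append(s)
    return groups


result = ability_groups([10, 20, 30, 40, 50], mean=30, sd=10)
print({label: len(members) for label, members in result.items()})
# {'below average': 1, 'average': 3, 'above average': 1}
```

Passing the median and the quartile deviation instead of the mean and SD applies the same rule to the Median ± 1 QD interpretation.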

C. Quartile Deviation (QD)

1. Procedure in the Computation of QD for Ungrouped Data
1.) Arrange the scores in descending or ascending order.
2.) Compute ¼(N); the result gives the rank of the Q1 score in the ordered arrangement, counted from the bottom.
3.) Look for the score at this rank.
4.) Compute ¾(N); the result gives the rank of the Q3 score.
5.) Look for the Q3 score at this rank.
6.) Compute the QD:

$$ QD = \frac{Q_3 - Q_1}{2} $$
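A sketch of the ungrouped procedure. When ¼N or ¾N is not a whole number the handout does not say which rank to take; this sketch assumes the rank is rounded up, so other conventions (including those behind `statistics.quantiles`) may give slightly different quartiles.

```python
from math import ceil


def quartile_deviation_ungrouped(scores):
    """Return (Q1, Q3, QD) using ranks N/4 and 3N/4 counted from the bottom."""
    ordered = sorted(scores)             # ascending, so rank 1 is the lowest score
    n = len(ordered)
    # Assumption: a fractional rank is rounded up to the next whole rank.
    q1 = ordered[ceil(n / 4) - 1]
    q3 = ordered[ceil(3 * n / 4) - 1]
    return q1, q3, (q3 - q1) / 2


print(quartile_deviation_ungrouped([8, 3, 5, 1, 7, 2, 6, 4]))   # (2, 6, 2.0)
```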

2. Procedure in the Computation of QD for Grouped Data
1.) Compute the 1st quartile:

$$ Q_1 = LL + \left(\frac{\frac{N}{4} - CF_b}{F_q}\right) i $$

2.) Compute the 3rd quartile:

$$ Q_3 = LL + \left(\frac{\frac{3N}{4} - CF_b}{F_q}\right) i $$

3.) Compute the quartile deviation (semi-interquartile range):

$$ QD = \frac{Q_3 - Q_1}{2} $$

Where: Q1 – 1st quartile; Q3 – 3rd quartile; LL – lowest limit of the quartile class; N/4 – one-fourth of the total number of cases (3N/4 for Q3); CFb – cumulative frequency below the quartile class; Fq – frequency of the class where the quartile score falls; i – interval

How will you interpret the quartile deviation?
The results also tell whether the group is homogeneous or not, and how many of the students fall below or above the region of acceptable performance. To do this: Median − 1 QD and Median + 1 QD give the limits of average ability. The point right below Median − 1 QD is the upper limit of below-average ability, and the point right above Median + 1 QD is the lower limit of above-average ability.

STANDARD SCORES indicate the pupil's relative position by showing how far his or her raw score is above or below average. They express the pupil's performance in standard units from the mean, are represented by the normal probability curve (commonly called the normal curve), and are used to provide a common unit for comparing raw scores from different tests.

1. PERCENTILE – tells the percentage of examinees whose scores lie below a given score.

Formula: $$ P_a = LL + i\left[\frac{aN - CF_b}{F_{P_a}}\right] $$

Where: LL – lowest limit of the class containing a% of N; CFb – cumulative frequency below the class containing a% of N; F_Pa – frequency of the class containing a% of N; i – interval
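The grouped quartile and percentile formulas are the same interpolation with a different fraction of N: N/4 for Q1, 3N/4 for Q3, and aN for P_a. The sketch below therefore uses one hypothetical function for all of them. As before, classes are assumed to be `(lower, upper, frequency)` tuples with exact limits, lowest class first, and LL is taken as the exact lower limit (an assumption; the handout says "lowest limit").

```python
def grouped_percentile(classes, a):
    """P_a = LL + i[(aN − CFb) / F_Pa], with a written as a proportion (0.25 for P25)."""
    if not 0 <= a <= 1:
        raise ValueError("a must be between 0 and 1")
    n = sum(f for _, _, f in classes)
    target = a * n                                # aN
    cf_below = 0                                  # CFb
    for lo, hi, f in classes:
        if f > 0 and cf_below + f >= target:      # class containing the aN-th case
            return lo + (hi - lo) * (target - cf_below) / f
        cf_below += f
    raise ValueError("empty distribution")


def grouped_quartile_deviation(classes):
    q1 = grouped_percentile(classes, 0.25)        # uses N/4
    q3 = grouped_percentile(classes, 0.75)        # uses 3N/4
    return q1, q3, (q3 - q1) / 2


classes = [(9.5, 19.5, 2), (19.5, 29.5, 5), (29.5, 39.5, 8), (39.5, 49.5, 4), (49.5, 59.5, 1)]
print(grouped_quartile_deviation(classes))        # (25.5, 39.5, 7.0)
```

With these classes P50 equals the grouped median (33.25), which is a quick consistency check on the interpolation.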


2. Z-SCORES – tell the number of standard deviations equivalent to a given raw score.

Formula: $$ Z = \frac{X - \bar{X}}{SD} $$

Note: the z-score is negative when $X < \bar{X}$ and positive when $X > \bar{X}$.

3. T-SCORES – any set of normally distributed standard scores that has a mean of 50 and a standard deviation of 10. T-scores are computed after converting raw scores to z-scores, to get rid of negative values.

Formula: $$ T = 50 + 10Z $$
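z- and T-scores are one line each. The sketch below also shows why they provide a common unit: two hypothetical raw scores from tests with different means and SDs are compared after conversion. The function names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations the raw score lies above (+) or below (−) the mean."""
    return (x - mean) / sd


def t_score(x, mean, sd):
    """T = 50 + 10Z: mean 50, SD 10, negative only when Z < −5."""
    return 50 + 10 * z_score(x, mean, sd)


# Hypothetical: 80 on a test with mean 70 and SD 10, 60 on a test with mean 50 and SD 5.
print(z_score(80, 70, 10), t_score(80, 70, 10))   # 1.0 60.0
print(z_score(60, 50, 5), t_score(60, 50, 5))     # 2.0 70.0
```

Although 80 is the higher raw score, the 60 lies two standard deviations above its mean rather than one, so it shows the better relative standing.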

ASSIGNING GRADES/MARKS/RATINGS

A. Marking/Grading - is the process of assigning value to a performance

B. Marks/Grades/Ratings are symbols which:

Could be in –
- Percent, such as 70%, 75%, 80%, etc.
- Letters, such as A, B, C, D, or F
- Numbers, such as 1, 2, 3, 4, or 5
- Descriptive expressions, such as Outstanding (O), Very Satisfactory (VS), Satisfactory (S), Moderately Satisfactory (MS), Needs Improvement (NI), etc.

[Note: Any symbol can be used provided that it has a uniform meaning to all concerned.]

Could represent –
- How a student is performing in relation to other students (Norm-Referenced Grading)
- The extent to which a student has mastered a particular body of knowledge (Criterion-Referenced Grading)
- How a student is performing in relation to a teacher's judgment of his or her potential (Grading in Relation to Teacher's Judgment)

Could be for –
- Certification, which gives assurance that a student has mastered specific content or achieved a certain level of accomplishment
- Selection, which provides a basis for identifying or grouping students for certain educational paths or programs
- Direction, which provides information for diagnosis and planning
- Motivation, which emphasizes specific material or skills to be learned and helps students understand and improve their performance

Could be based on –
- Examination results or test data
- Observations of student work
- Group evaluation activities
- Class discussions and recitations
- Homework
- Notebooks and note taking
- Reports, themes, and research papers
- Discussions and debates
- Portfolios
- Projects
- Attitudes, etc.

Could be assigned by –

Criterion-referenced grading, or grading based on fixed or absolute standards, where a grade is assigned according to how well a student has met the criteria or the well-defined objectives of a course that were spelled out in advance. It is then up to the student to earn the grade he or she wants to receive, regardless of how other students in the class have performed. This is done by transmuting test scores into marks or ratings.

Norm-referenced grading, or grading based on relative standards, where a student's grade reflects his or her level of achievement relative to the performance of other students in the class. In this system the grade is assigned based on the average of test scores. The rating scales used in assigning grades are:

1.) The four-point rating scale, which uses the median and quartile deviation of the test scores to divide the scores into four groups; each group is assigned the corresponding grade of A, B, C, or D, or 1, 2, 3, or 4.

2.) The five-point rating scale, which uses the median and quartile deviation of the test scores to divide the scores into five groups; each group is assigned the corresponding grade of A, B, C, D, or F, or 1, 2, 3, 4, or 5.
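The handout does not give the exact cut points for the four-point scale. One plausible reading, assumed here and not taken from the source, uses Mdn + 1 QD, Mdn, and Mdn − 1 QD as boundaries, so that in a symmetric distribution each group holds roughly a quarter of the class. Placing a score that falls exactly on a boundary in the lower group is also an assumption, and the function name is hypothetical.

```python
def four_point_grade(score, mdn, qd):
    """Hypothetical four-point scale with cut points at Mdn + QD, Mdn, and Mdn − QD."""
    if score > mdn + qd:
        return "A"
    if score > mdn:
        return "B"
    if score > mdn - qd:
        return "C"
    return "D"


# Hypothetical class with median 30 and quartile deviation 5.
print([four_point_grade(s, mdn=30, qd=5) for s in (41, 33, 28, 22)])   # ['A', 'B', 'C', 'D']
```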

Point or Percentage Grading System, whereby the teacher assigns points or percentages to the various tests and class activities depending on their importance. The total of these points is the basis for the grade assigned to the student.
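A point or percentage system amounts to a weighted total. The component names and weights below are hypothetical; a teacher would set them according to the importance of each test or activity.

```python
def point_grade(scores, weights):
    """Weighted total on a 0-100 scale.

    scores:  component -> percentage score earned (0-100)
    weights: component -> percentage weight of that component
    """
    if abs(sum(weights.values()) - 100) > 1e-9:
        raise ValueError("weights must add up to 100")
    return sum(scores[name] * weights[name] / 100 for name in weights)


weights = {"quizzes": 30, "projects": 20, "periodical exam": 50}   # hypothetical weights
scores = {"quizzes": 80, "projects": 90, "periodical exam": 70}
print(point_grade(scores, weights))   # 77.0
```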

Contract Grading System where each student agrees to work for a particular grade according to agreed-upon standards.

Guidelines in Grading Students
1.) Explain your grading system to the students early in the course and remind them of the grading policies regularly.
2.) Base grades on a predetermined and reasonable set of standards.
3.) Base your grades on as much objective evidence as possible.
4.) Base grades on the student's attitude as well as achievement, especially at the elementary and high school level.
5.) Base grades on the student's relative standing compared to classmates.
6.) Base grades on a variety of sources.
7.) As a rule, do not change grades.
8.) Become familiar with the grading policy of your school and with your colleagues' standards.
9.) When failing a student, closely follow school procedures.
10.) Record grades on report cards and cumulative records.
11.) Guard against bias in grading.
12.) Keep pupils informed of their standing in the class.
