final

1.0 INTRODUCTION

1.1 Purpose

To investigate the effectiveness of a set of 30 multiple-choice questions on the English for Science and Technology (EST) subject in an upper secondary class at Sekolah Menengah Kebangsaan Gajah Berang.

1.2 Objectives

1.2.1 To plan and develop a set of 30 multiple-choice questions on English for Science and Technology based on the syllabus, content (topics and sub-topics), instructional objectives, and Table of Specifications.

1.2.2 To assemble the 30 multiple-choice questions.

1.2.3 To administer the 30 multiple-choice questions.

1.2.4 To score and grade the performance of the students in the 30 multiple-choice questions.

1.2.5 To present the performance of the students using descriptive statistics such as measures of central tendency (mean, median, mode), measures of dispersion/variability (range, variance, standard deviation), and Z-scores and T-scores.

1.2.6 To analyze the 30 multiple-choice questions using item analysis (item difficulty and item discrimination) and distracter analysis.

1.2.7 To discuss the results of the descriptive statistics, item analysis and distracter analysis.


1.2.8 To provide a conclusion for the project.

2.0 METHODOLOGY

2.1 Subjects

In conducting this research, we assessed 20 Form 5 students from Sek. Men. Keb. Gajah Berang. The questions are on English for Science and Technology (EST). The sample students provided by the teacher are representatives from 5 Science 1, 5 Science 2, 5 Science 3, and 5 Science 4. Although the classes may appear to be streamed by academic ability, according to the teacher, Madam Fang, the students form a homogeneous group: they possess similar abilities because they are all science stream students. The teacher picked five students at random from each of the four classes so that we could assess overall student performance regardless of academic ability and so that the sample would be spread across the classes. Of the 20 students, four are female and sixteen are male. According to the school’s principal and the head teacher of English Language, the students come from middle-class families, with a few from upper-middle-class families, so financial problems are not a major concern for most of them. As for the time allocated to EST, Madam De Kwee Poh (EST teacher for 5 Science 1) said that 3 hours per week are allocated for each class.


2.2 Materials

In developing the questions, we surveyed bookshops and picked several exercise books that follow the new Form 4 and Form 5 KBSM syllabuses. After discussing which sample questions were best and most closely related to the new KBSM syllabuses as well as Bloom’s Taxonomy of Educational Objectives, we decided to use a Form 4 EST exercise book published by Oxford Fajar. The test consists of 30 multiple-choice questions that vary in syllabus topic and in level of Bloom’s Taxonomy of Educational Objectives, and it serves as a mid-year assessment for the students. According to our table of specification, six questions fall under Knowledge, twenty-two under Comprehension, and one question each under Analysis and Application. According to the syllabus, eleven topics are included in the test: Treasures of Nature, Energy Comes & Goes, It’s All Chemistry, Force & Motion, Tiny Beings Great Terrors, It’s All In The Genes, Meddling With Nature, Food and Thoughts, The World at Your Fingertips (ICT), The Frontier of Space, and Reading New Horizons. EST does not have a separate syllabus for Form 4 and Form 5. According to the curriculum specification, both forms share the same syllabus and curriculum specifications, and students use the same textbook in Form 5 that they used in Form 4. Therefore, although we used a Form 4 exercise book, our student sample is from Form 5. The details of the 30 multiple-choice questions are explained in the planning and assembling stages.


2.3 PROCEDURES

2.3.1 Planning Stage

Before constructing the test questions, we met to discuss which subject we wanted to measure for this project. After a few discussions among the group members, we agreed on English for Science and Technology (EST) as the subject to be measured. We then went to Sek. Men. Keb. Gajah Berang and met the EST teacher to find out the syllabus of the subject and how many topics had already been taught, because a test should measure what has been taught by the teacher. This information guided the construction of the test. Once we had analyzed the information gathered, we developed our table of specification, which served as our guideline to make sure that the test content would be closely related to the classroom curriculum and educational objectives. The table of specification is crucial because it helps us to determine the major content areas to be covered in the test. These content areas are derived by carefully reviewing the educational objectives and selecting major content to be included in the test, which will measure different levels of Bloom’s Taxonomy of Educational Objectives. Thus, it is essential to refer to the table of specification to ensure as wide a sampling of the potential content as possible. To get a general idea of how the EST test should look, we went through several EST past-year papers and discussed closely with the teacher which questions were suitable for our test paper, to support its reliability and validity. After all of these steps had been completed, we constructed the questions for the test and gave them to the teacher to be checked for accuracy and for whether they measure what is supposed to be measured for the subject.

2.3.2 Assembling Stage


The test paper consists of 30 multiple-choice questions and has to be completed in 1 hour. Question 1 comes from Man and Human Body, topic 9 in the syllabus, and measures the comprehension level of Bloom’s Taxonomy. Question 2 is from topic 5, Natural Resources and Industrial Process, and also measures the comprehension level. Question 3 also comes under topic 5 but measures the lowest level of Bloom’s Taxonomy, which is knowledge. Topic 15, The Universe, Astronomy and Aerospace, is set for question 4, which measures the comprehension level. Question 5 tests the students’ comprehension level on topic 3, Natural Resources.

Question 6 comes under topic 6, Matter & Mass, and also measures the comprehension level. Question 7 measures the comprehension level under topic 8, The Human Body. Topic 13, Technology and Communication, is set for question 8, which measures the knowledge level. Question 9 measures the knowledge level under topic 9, Man & Human Body. Questions 10 and 11 also come under topic 9 and measure the comprehension level. The same goes for questions 12 and 13, which measure the comprehension level under topic 10, Man & Living Organism.

Topic 11, Nutrition & Food, is placed in question 14, which measures the comprehension level. The knowledge level is measured in question 15, under topic 16, The Universe, Astronomy and Aerospace. Question 16 comes under topic 7, Force & Motion, and measures the comprehension level, whereas question 17, under topic 6, Matter & Mass, measures the application level. Questions 18, 19 and 20 all come under the same topic, Matter & Mass (topic 6). The comprehension level is tested in questions 18 and 19, while the knowledge level is measured in question 20.

Man & Living Organism (topic 10) is set in question 21, which measures the knowledge level. Questions 22 and 23 cover topic 8, The Human Body, and both measure the comprehension level. Topic 10, Man & Living Organism, is placed in questions 24 and 25: question 24 tests the analysis level, whereas question 25 measures the comprehension level. Nutrition & Food (topic 11) is covered in questions 26 to 30, all of which measure the comprehension level.
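The question-by-question assignment above can be cross-checked against the table of specification in section 2.2 (six Knowledge, twenty-two Comprehension, one Analysis, one Application). The following is a minimal sketch, not part of the original procedure; the names `bloom_level`, `tally` and `expected` are hypothetical, and it assumes that questions 10 and 11 are the two comprehension items under topic 9.

```python
from collections import Counter

# Bloom level of each question, transcribed from the assembling description.
# Assumption: questions 10 and 11 are the two comprehension items under topic 9.
KNOWLEDGE = {3, 8, 9, 15, 20, 21}
APPLICATION = {17}
ANALYSIS = {24}


def bloom_level(question: int) -> str:
    """Return the Bloom level assigned to a question number (1 to 30)."""
    if question in KNOWLEDGE:
        return "Knowledge"
    if question in APPLICATION:
        return "Application"
    if question in ANALYSIS:
        return "Analysis"
    return "Comprehension"  # every remaining question


tally = Counter(bloom_level(q) for q in range(1, 31))

# Counts stated for the table of specification in section 2.2.
expected = {"Knowledge": 6, "Comprehension": 22, "Analysis": 1, "Application": 1}

for level, count in expected.items():
    status = "matches" if tally[level] == count else "differs"
    print(f"{level}: {tally[level]} questions ({status})")
```

Under that assumption the tally agrees with the table of specification for all four levels.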

2.3.3 Administering

When the test was ready, the 30 multiple-choice questions were given to the students. First, we had to make sure that the Form 5 Science students of Sekolah Menengah Kebangsaan Gajah Berang were ready for the test. We followed several suggestions to help students prepare psychologically for the test.

First, we maintained a positive attitude. We went to the school a week before we administered the test. Letting the students know that there will be a test the following week encourages a positive test-taking attitude. It helps to keep the main purposes of classroom testing in mind: to evaluate achievement and instructional procedures and to provide feedback to us and to the students. Keeping these purposes in mind helps to avoid testing traps and to maintain a positive test-taking atmosphere. Second, we maximized the achievement aspect of the test: we encouraged the students to do their best and not to be immobilized by fear. The test is something to be taken seriously, and this should be clear to the class.

On the practical side, informing the school about the test a week earlier avoids surprises and gives the students sufficient advance notice. This is not to say that the teacher should avoid frequent quizzes. When students are tested frequently, study takes place at more regular intervals rather than only on the night before a test. Informing the students about the test too late would affect their performance, and the test would then not evaluate their achievement properly.

In the classroom, before distributing the tests, we informed the students about the time limit, the restroom policy, and some of our special considerations. It is important to state the rules beforehand because students are often preoccupied with the paper once they receive it and may miss important instructions. We distributed the tests from left to right, because distributing them in a fixed order helps ensure that no student is left waiting long for a paper.

After distributing the tests, we reminded the students to check their copies. The items to be checked are the page numbers, the questions, the answer sheet, and whether they have received the correct paper. We then let the test begin and monitored the time limit.

We monitored the students while they answered the test to make sure that they were not copying each other’s answers. During monitoring, we also told the students not to cheat and that there are punishments for cheating. Preventing cheating is important because it gives us precise results of the students’ performance that we can evaluate correctly.


2.3.4 Scoring and Grading

Scoring is one of the important parts of evaluating students’ performance. We distributed the 30 multiple-choice questions to 20 students of SMK Gajah Berang. The questions test the subject English for Science & Technology (EST), and the student sample prepared by the school administrator is of mixed abilities. The students come from four different science classes: 5 Science 1, 5 Science 2, 5 Science 3, and 5 Science 4. After collecting the test papers from the students, we determined the scores. Several steps are required in scoring and grading each student.

First, preparing the answer key is the most important step in determining students’ scores. Without an answer key, scoring is difficult and may be unreliable. The answer key saves time during scoring and also helps to identify whether any questions need to be eliminated. While constructing the answer key, researchers can also judge whether the time allowed for the test is enough for students. Our 30 multiple-choice questions for English for Science and Technology are appropriate for the time limit of 1 hour.

While scoring the tests, we sat together and checked each other’s answer keys to identify possible alternative answers and potential problems. Since we did not know the students, and their teacher provided little information about their background and performance, we were not influenced by the halo effect and their marks were not affected by it. We did not return the tests to the students because we needed to compile them in our final report.

After scoring the tests, the next step is grading the results. Grading or analyzing the test helps determine whether the test is valid. No test that teachers construct for their students will be perfect; it will include inappropriate or otherwise deficient items. Thus, in the grading stage, a technique called item analysis is very important. Item analysis is used to identify items that are deficient in some way, for example through miskeying, guessing, or ambiguity.

Based on the results of the test, most of the questions function well. The questions are clear enough for students to understand. The distracters of most questions function well, and the questions are not difficult for the upper 10 students to answer. As informed by the teacher, the students had already covered the entire syllabus in the textbook in the previous year, so guessing is less likely to occur. Unfortunately, several questions are miskeyed, characterized by guessing, or ambiguous.

Miskeying is suspected when most students who did well on the test choose a wrong answer (distracter) rather than the keyed answer. In question 2, most of the upper-half students chose distracters (A, B) rather than the correct answer (C):

Question 2.

              A    B    C*   D
Upper half    4    5    1    1

Questions 16 and 19 are also miskeyed. Most of the students in the upper half chose distracters (B and A respectively) rather than the correct answers (A and C):

Question 16.

              A*   B    C    D
Upper half    1    9    0    0

Question 19.

              A    B    C*   D
Upper half    6    0    3    1

In these cases, the key does not discriminate positively, while the distracters that attract the upper-half students do. Revision is necessary and, if possible, the items should be eliminated.

One question is characterized by guessing. Guessing is most likely to occur when the item is (a) not covered in class, (b) so difficult that even the upper-half students have no idea what the correct answer is, or (c) so trivial that students are unable to choose among the options provided. In question 22, the question is not clear enough and the distracters do not function well:

Question 22.

              A*   B    C    D
Upper half    5    2    1    2

The question should be revised or eliminated. Option A is not clear enough, and the distracters attract the upper-half students at similar frequencies (2, 1 and 2).

Question 27 is ambiguous. Distracter D does not function well because it attracts the same number of upper-half students as the correct answer. In an ambiguous item, students who do well but miss the item are drawn almost entirely to one of the distracters. Thus the item should be revised.

Question 27.

              A*   B    C    D
Upper half    4    1    0    4
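The screening described above can be expressed as a small check on the upper-half counts. This is a simplified sketch under stated assumptions, not the procedure the authors ran: the helper `flag_item` and its two rules (a distracter chosen more often than the key suggests a miskey; a distracter chosen exactly as often as the key suggests ambiguity) are illustrative, and the sketch makes no attempt to detect guessing.

```python
def flag_item(upper_counts: dict[str, int], key: str) -> str:
    """Flag an item from the upper group's option counts (illustrative rules only)."""
    key_count = upper_counts[key]
    # Highest count among the wrong options.
    top_distracter = max(n for option, n in upper_counts.items() if option != key)
    if top_distracter > key_count:
        return "possible miskey"
    if top_distracter == key_count:
        return "possible ambiguity"
    return "no flag"


# Upper-half counts and keys as reported for questions 2, 16, 19, 22 and 27.
items = {
    2: ({"A": 4, "B": 5, "C": 1, "D": 1}, "C"),
    16: ({"A": 1, "B": 9, "C": 0, "D": 0}, "A"),
    19: ({"A": 6, "B": 0, "C": 3, "D": 1}, "C"),
    22: ({"A": 5, "B": 2, "C": 1, "D": 2}, "A"),
    27: ({"A": 4, "B": 1, "C": 0, "D": 4}, "A"),
}

flags = {q: flag_item(counts, key) for q, (counts, key) in items.items()}
for question, flag in flags.items():
    print(f"Question {question}: {flag}")
```

Question 22 receives no flag here because judging guessing needs the spread of responses across all options, which these two rules do not examine.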


3.0 RESULTS

3.1 Frequency Table & Histogram

Class Intervals    Tally              Frequency

24 – 26            ///                3

21 – 23            ///// ///// ///    13

18 – 20            ///                3

15 – 17            /                  1

Table 3.1: Frequency Distributions of Students Score
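The grouping in Table 3.1 can be reproduced from the 20 raw scores listed in Table 3.4. The sketch below is illustrative only; the function name `class_interval` and its parameters (lowest interval starting at 15, width 3) are assumptions chosen to match the intervals in the table.

```python
from collections import Counter

# The 20 raw scores, as listed in Table 3.4.
scores = [26, 25, 24, 23, 23, 23, 23, 23, 22, 22,
          22, 22, 21, 21, 21, 21, 20, 19, 18, 17]


def class_interval(score: int, lowest: int = 15, width: int = 3) -> tuple[int, int]:
    """Return the (lower, upper) bounds of the class interval containing score."""
    start = lowest + ((score - lowest) // width) * width
    return (start, start + width - 1)


frequency = Counter(class_interval(s) for s in scores)

# Print from the highest interval down, as in Table 3.1.
for low, high in sorted(frequency, reverse=True):
    count = frequency[(low, high)]
    print(f"{low} - {high}: {'/' * count} {count}")
```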

[Figure: Histogram showing the distribution of scores of Form 5 students of SMK Gajah Berang in the EST test of 30 MCQs. Vertical axis: number of students (frequency), 0 to 20. Horizontal axis: scores in class intervals 12 – 14, 15 – 17, 18 – 20, 21 – 23, 24 – 26, 27 – 29.]


3.2 Measures of Central Tendency

3.2.1 Mean

21.8

3.2.2 Median

22

3.2.3 Mode

23

3.3 Measures of Dispersion/Variability

3.3.1 Range

9

3.3.2 Variance

4.8

3.3.3 Standard Deviation

2.19
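The values in sections 3.2 and 3.3 can be recomputed from the 20 raw scores listed in Table 3.4. This is a minimal sketch using Python's standard `statistics` module rather than the calculator procedure used in the report. Note that the reported variance of 4.8 is the one given by the sample formula (dividing by n − 1), which is what `statistics.variance` computes.

```python
import statistics

# The 20 raw scores, as listed in Table 3.4.
scores = [26, 25, 24, 23, 23, 23, 23, 23, 22, 22,
          22, 22, 21, 21, 21, 21, 20, 19, 18, 17]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
score_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance, divides by n - 1
sd = statistics.stdev(scores)           # square root of the sample variance

print(f"Mean: {mean:.1f}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Range: {score_range}")
print(f"Variance: {variance:.2f}")
print(f"Standard deviation: {sd:.2f}")
```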


3.4 Z-score and T-score

$Z = \dfrac{X - \bar{X}}{SD}$

$T = 10Z + 50$

No.    X     X − X̄    Z-score    T-score

1      26     4.2      1.92      69.2

2      25     3.2      1.46      64.6

3      24     2.2      1.00      60.0

4      23     1.2      0.55      55.5

5      23     1.2      0.55      55.5

6      23     1.2      0.55      55.5

7      23     1.2      0.55      55.5

8      23     1.2      0.55      55.5

9      22     0.2      0.09      50.9

10     22     0.2      0.09      50.9

11     22     0.2      0.09      50.9

12     22     0.2      0.09      50.9

13     21    -0.8     -0.37      46.3

14     21    -0.8     -0.37      46.3

15     21    -0.8     -0.37      46.3

16     21    -0.8     -0.37      46.3

17     20    -1.8     -0.82      41.8

18     19    -2.8     -1.28      37.2

19     18    -3.8     -1.74      32.6

20     17    -4.8     -2.19      28.1

Table 3.4: Table showing Z-score and T-score of the subject EST for 20 students in SMK GAJAH BERANG
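The two conversions used for Table 3.4 can be sketched as follows. The function names `z_score` and `t_score` are illustrative. The sketch assumes the rounded values reported in section 3 (mean 21.8, SD 2.19) and, as the table appears to do, computes each T-score from the Z-score already rounded to two decimals.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Number of standard deviations a raw score lies above or below the mean."""
    return (raw - mean) / sd


def t_score(z: float) -> float:
    """Rescale a Z-score to a scale with mean 50 and standard deviation 10."""
    return 10 * z + 50


MEAN, SD = 21.8, 2.19  # values reported in sections 3.2 and 3.3

results = {}
for raw in (26, 22, 17):  # highest, middle and lowest scores in Table 3.4
    z = round(z_score(raw, MEAN, SD), 2)
    t = round(t_score(z), 1)
    results[raw] = (z, t)
    print(f"X = {raw}: Z = {z:+.2f}, T = {t:.1f}")
```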


3.5 Item Analysis

3.5.1 Item Difficulty, P

$P = \dfrac{\text{No. of students choosing the correct answer}}{\text{No. of students}}$

Table 3.5.1: Item analysis and distracters analysis of 30 multiple-choice questions of EST subject (item difficulty)

No.    Item difficulty (p)
1      0.55
2      0.10
3      0.75
4      0.90
5      0.95
6      0.80
7      0.35
8      0.70
9      1.00
10     0.55
11     0.55
12     1.00
13     1.00
14     1.00
15     0.95
16     0.25
17     1.00
18     0.85
19     0.30
20     0.80
21     0.80
22     0.40
23     0.50
24     0.90
25     0.90
26     1.00
27     0.20
28     0.90
29     0.85
30     0.90

The difficulty of a test item that is scored right or wrong is indicated by the fraction of students who get the item right. Twenty questions fall into the easy category, with p of 0.70 and above: questions 3, 4, 5, 6, 8, 9, 12, 13, 14, 15, 17, 18, 20, 21, 24, 25, 26, 28, 29 and 30. Seven questions fall into the moderate difficulty category, which ranges from 0.30 to 0.69: questions 1, 7, 10, 11, 19, 22 and 23. Only three questions fall into the difficult category, with p of 0.29 or less: questions 2, 16 and 27.
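The grouping of items by difficulty can be checked by applying the stated cut-offs to the p values in Table 3.5.1. The sketch below is illustrative; the function name `difficulty_category` is hypothetical, and it assumes that an item with p of exactly 0.70 counts as easy and that anything below 0.30 counts as difficult.

```python
# Item difficulty (p) for questions 1 to 30, transcribed from Table 3.5.1.
p_values = [0.55, 0.10, 0.75, 0.90, 0.95, 0.80, 0.35, 0.70, 1.00, 0.55,
            0.55, 1.00, 1.00, 1.00, 0.95, 0.25, 1.00, 0.85, 0.30, 0.80,
            0.80, 0.40, 0.50, 0.90, 0.90, 1.00, 0.20, 0.90, 0.85, 0.90]


def difficulty_category(p: float) -> str:
    """Classify an item by the proportion of students answering correctly."""
    if p >= 0.70:
        return "easy"
    if p >= 0.30:
        return "moderate"
    return "difficult"


by_category = {"easy": [], "moderate": [], "difficult": []}
for number, p in enumerate(p_values, start=1):
    by_category[difficulty_category(p)].append(number)

for category, questions in by_category.items():
    print(f"{category}: {len(questions)} questions -> {questions}")
```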

3.5.2 Item Discrimination, D

$D = \dfrac{(\text{No. of students who chose the correct answer in the upper group}) - (\text{No. of students who chose the correct answer in the lower group})}{\text{No. of students in each group}}$

Table 3.5.2: Table showing item discrimination of 30 EST questions distributed to 20 students of SMK GAJAH BERANG

No.    Item discrimination (D)
1       0.30
2       0.00
3       0.30
4       0.00
5       0.10
6       0.00
7       0.10
8       0.40
9       0.00
10      0.70
11      0.10
12      0.00
13      0.00
14      0.00
15     -0.90
16     -0.30
17      0.00
18      0.30
19      0.00
20      0.00
21      0.40
22      0.20
23      0.20
24      0.20
25      0.20
26      0.00
27      0.30
28      0.20
29     -0.10
30      0.00

Fifteen questions have no discrimination at all (0.00 or negative values): questions 2, 4, 6, 9, 12, 13, 14, 15, 16, 17, 19, 20, 26, 29 and 30. Nine questions fall into the moderate discrimination category, which ranges from 0.20 to 0.39: questions 1, 3, 18, 22, 23, 24, 25, 27 and 28. Low discrimination, from 0.10 to 0.19, was found in questions 5, 7 and 11. Lastly, only three questions have high discrimination, 0.40 and above: questions 8, 10 and 21.
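The discrimination index and its categories can be sketched in the same way. The function names are illustrative. The worked example for question 16 uses the upper-group count of 1 from section 2.3.4 and a lower-group count of 4, which is an inference from p = 0.25 (5 correct answers out of 20) rather than a figure stated in the report. An item with D of exactly 0.40 is assumed to count as high, since questions 8 and 21 are listed as such.

```python
def discrimination(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Difference in correct answers between upper and lower groups, per student."""
    return (upper_correct - lower_correct) / group_size


def discrimination_category(d: float) -> str:
    """Classify a D value using the ranges described in the text."""
    if d <= 0:
        return "none"
    if d < 0.2:
        return "low"
    if d < 0.4:
        return "moderate"
    return "high"


# Question 16: 1 correct in the upper group; 4 in the lower group is inferred.
d16 = discrimination(upper_correct=1, lower_correct=4, group_size=10)
print(f"Question 16: D = {d16:.2f}")

# Item discrimination (D) for questions 1 to 30, transcribed from Table 3.5.2.
d_values = [0.30, 0.00, 0.30, 0.00, 0.10, 0.00, 0.10, 0.40, 0.00, 0.70,
            0.10, 0.00, 0.00, 0.00, -0.90, -0.30, 0.00, 0.30, 0.00, 0.00,
            0.40, 0.20, 0.20, 0.20, 0.20, 0.00, 0.30, 0.20, -0.10, 0.00]

groups = {"none": [], "low": [], "moderate": [], "high": []}
for number, d in enumerate(d_values, start=1):
    groups[discrimination_category(d)].append(number)

for category, questions in groups.items():
    print(f"{category}: {len(questions)} questions -> {questions}")
```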


4.0 DISCUSSION

4.1 Histogram

[Figure: Negatively skewed distribution curve of scores (horizontal axis from 15 to 30), with the mean, median and mode marked. Mean (X̄): 21.8, Median: 22, Mode: 23.]

Based on the graph, the distribution is negatively skewed: the mean has the lowest value (21.8), the median lies in the middle (22), and the mode has the highest value (23). The negatively skewed distribution indicates that the class did well in the test, with the majority obtaining high scores and only a few obtaining lower scores. Most of the students (13 of 20) scored between 21 and 23, meaning that they can be classed as good students and as a homogeneous group with almost the same ability. There could be several reasons for this. The test may have been too easy because of the familiarity of the question type, since all of the students have covered all the topics and done many past-year papers and exercises in the classroom. It is therefore easy for them to score well when the test contains questions similar to ones they have already done on their own. Moreover, the students are academically strong, since they all come from the science classes, in which placement is based on performance and achievement in school.
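The direction of skew can also be checked numerically. The sketch below is not part of the original analysis: it first tests the ordering mean < median < mode used in the discussion, then computes Pearson's second (median) skewness coefficient, 3(mean − median)/SD, as a supplementary measure. It assumes the rounded values reported in section 3.

```python
# Values reported in sections 3.2 and 3.3.
MEAN, MEDIAN, MODE, SD = 21.8, 22, 23, 2.19

# Ordering used in the discussion: a negative skew puts the mean below
# the median, and the median below the mode.
ordering_suggests_negative_skew = MEAN < MEDIAN < MODE

# Pearson's second skewness coefficient (supplementary; not used in the report).
pearson_skew = 3 * (MEAN - MEDIAN) / SD

print(f"mean < median < mode: {ordering_suggests_negative_skew}")
print(f"Pearson's second skewness coefficient: {pearson_skew:.2f}")
```

A coefficient below zero agrees with the negative skew read from the graph, although its small magnitude suggests that the skew is mild.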

4.2 Measures of Central Tendency

The mean is the average score of the students in the test. The mean score is 21.8 out of 30, which indicates that the students as a group did well in the test. The mean has several characteristics that make it the most frequently used measure of central tendency. One of these characteristics is stability: since each score in the distribution enters into the computation of the mean, it is more stable over time than other measures of central tendency, which consider only one or two scores. Another characteristic is that the sum of each score’s distance from the mean is equal to zero. A third characteristic is that the mean is affected by extreme scores. This means that a few very low scores of 20 or below in the negatively skewed distribution pull the mean down toward them. Thus, the mean gives the impression that the typical student scored about 21 and passed the test with an A grade, while the students below the mean still passed the test but with a C grade.

The median is the score that splits a distribution in half: 50 percent of the scores lie above the median and 50 percent lie below it. It can also be described as the middle score, since it falls in the middle of the score distribution. The score distribution shows that at least half of the students (12 of 20) scored 22 and above on the test, which means that they passed the test with an A grade, while the remaining students fall between the A and B grades.

The mode is the most frequently occurring score in the distribution. The distribution is unimodal, which means that only one score occurs most frequently among the students’ scores, namely 23. The mode also indicates that many students scored highly in the test with an A grade. We can therefore conclude that most of the students in this class are good students and that most of them passed the test with an A grade.

4.3 Measures of Dispersion/Variability

The purpose of measures of variability is to show how the scores are spread around the mean. They are important because, together with the mean, they help to determine whether the majority of the students belong to the good or the weak group.

Range is the simplest measure of variability, calculated by subtracting the lowest score from the highest score. The range provides a quick estimate of variability but is undependable because it is based on only the two extreme scores. The addition or removal of a single score can change the range significantly. For our data, we arranged the students’ scores from the highest to the lowest, from 26 out of 30 down to 17 out of 30. The range of the data is 26 − 17 = 9.

Standard deviation (SD) is the most useful measure of variability. The calculation of the standard deviation does not make its meaning readily apparent, but essentially it is an average of the degree to which a set of scores deviates from the mean. The procedure for calculating a standard deviation involves squaring each score’s deviation from the mean and taking a square root. In practice, the calculation needs the help of a scientific calculator.

To calculate the standard deviation, first calculate the mean. Next, subtract the mean from each raw score, from the highest score to the lowest score. Then square each of these differences and add the squares together. Divide this sum by the number of scores minus one to obtain the variance. Lastly, take the square root of the variance to obtain the standard deviation. Here is the formula of the standard deviation:

$SD = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$

The variance of the test scores is 4.80 and the standard deviation is 2.19. Having the correct calculation of the standard deviation helps to evaluate the students’ performance on the test. The standard deviation is also needed to calculate the Z-score and the T-score.
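As a worked check of these two values, the arithmetic can be written out from the deviations listed in Table 3.4. This assumes the sample form of the variance (dividing by N − 1), which is the form that reproduces the reported 4.80 from the 20 scores.

```latex
\begin{aligned}
\sum (X - \bar{X})^2 &= 4.2^2 + 3.2^2 + 2.2^2 + 5(1.2)^2 + 4(0.2)^2 + 4(0.8)^2 \\
                     &\quad + 1.8^2 + 2.8^2 + 3.8^2 + 4.8^2 = 91.2 \\
\text{Variance} &= \frac{91.2}{20 - 1} = 4.80 \\
SD &= \sqrt{4.80} \approx 2.19
\end{aligned}
```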

4.4 Z-score and T-score

The Z-score is the simplest of the standard scores. It expresses test performance simply and directly as the number of standard deviation units a raw score is above or below the mean.

A Z-score is always positive when the raw score is greater than the mean. In our test, 12 students have positive Z-scores: 1.92, 1.46, 1.00, five students with 0.55, and four students with 0.09. In contrast, a Z-score is always negative when the raw score is smaller than the mean. The lowest Z-score is -2.19, followed by -1.74, -1.28, -0.82, and four students who share the same result, -0.37. Forgetting the negative sign (-) can cause serious errors in test interpretation. For this reason, Z-scores are seldom used directly in test norms but are usually transformed into a standard score system that uses only positive numbers, such as the T-score.

The term T-score has come to refer to any set of normally distributed standard scores that has a mean of 50 and a standard deviation of 10. A T-score is obtained by multiplying the Z-score by 10 and adding 50.

One reason that the T-score is preferable to the Z-score for reporting test results is that only positive numbers are produced. The T-scores of the students of Sekolah Menengah Kebangsaan Gajah Berang were calculated and listed from the highest to the lowest.

4.5 ITEM ANALYSIS AND DISTRACTER ANALYSIS

Question 1

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.55 shows that the item is appropriate for the testing application, which is analysis. The question asks students to conclude what the short passage is about.

2. Does the item discriminate adequately?

With an item discrimination of 0.3, the item does a satisfactory job of discriminating between examinees who performed well on the test and those who performed poorly.

3. Are the distracters performing adequately?

Option A does not function as a distracter because it does not attract any of the students. Option B is a weak distracter because it attracts one student from the upper group and none from the lower group. Option C is a good distracter because it attracts more of the weak students than the good students.

4. Overall evaluation

This item needs to be checked and revised because of the weaknesses in the distracters mentioned above.
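The distracter judgements used in this section follow a simple pattern that can be sketched in code. The helper `classify_distracter` is hypothetical. Treating a distracter chosen equally often by both groups as weak is an assumption, since the report does not discuss that case. The counts for question 1 are not listed in the report; the figures below are one set consistent with its description (p = 0.55, D = 0.30, option A chosen by nobody, option B by one upper-group student) and are for illustration only.

```python
def classify_distracter(upper: int, lower: int) -> str:
    """Judge a wrong option from how many upper- and lower-group students chose it."""
    if upper == 0 and lower == 0:
        return "non-functioning"  # attracts no one
    if lower > upper:
        return "good"             # attracts more of the weaker students
    return "weak"                 # attracts as many or more of the stronger students


# Illustrative counts for question 1 (assumed, see the note above): (upper, lower).
question_1 = {"A": (0, 0), "B": (1, 0), "C": (2, 6)}

verdicts = {option: classify_distracter(u, l) for option, (u, l) in question_1.items()}
for option, verdict in verdicts.items():
    print(f"Option {option}: {verdict}")
```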

Question 2

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.1 shows that the item is difficult. However, it is appropriate for the synthesis application: the item asks students to come up with a motto from what they have read.

2. Does the item discriminate adequately?

There is no discrimination for this item because the same number of students in the upper and lower groups chose the correct answer.

3. Are the distracters performing adequately?

Option A is a good distracter because it attracts more of the lower group students. Option B is a weak distracter because it attracts more of the upper group students. Option D is a non-functioning distracter because it does not attract any student from either group.

4. Overall evaluation

This item is eliminated because it cannot discriminate between those who performed well and those who performed poorly in the test.

Question 3

1. Is the item difficulty level appropriate for the testing application?

A p of 0.75 shows that the item is easy, as it only tests the students’ knowledge when they read the passage.

2. Does the item discriminate adequately?

With an item discrimination of 0.3, the item does a satisfactory job of discriminating between examinees who performed well on the test and those who performed poorly.

3. Are the distracters performing adequately?

Option A is functioning well. Option C is a non-functioning distracter, while option D is a good distracter.

4. Overall evaluation

This item needs to be checked and revised because of the weaknesses in the distracters mentioned above and because its discriminating power is only satisfactory.

Question 4

1. Is the item difficulty level appropriate for the testing application?

The item difficulty is 0.9, which shows that the item is too easy. The optimal mean p value for a multiple-choice item with four choices is 0.74.

2. Does the item discriminate adequately?

This item has no discrimination.

3. Are the distracters performing adequately?

Options A and C are weak distracters. Distracter D is non-functioning.

4. Overall evaluation

This item needs to be eliminated because it has no discrimination. It is not effective for testing students’ performance.

Question 5

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.95 suggests that this item is too easy.

2. Does the item discriminate adequately?

An item discrimination of 0.1 indicates that the item has low discrimination.

3. Are the distracters performing adequately?

None of the distracters is performing adequately: option A is weak, and options B and C are non-functioning.

4. Overall evaluation

This item should be eliminated or rewritten with improved distracters.

Question 6

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.8 shows that this item is easy.

2. Does the item discriminate adequately?

The item does not discriminate adequately because its discrimination value is 0.

3. Are the distracters performing adequately?

None of the distracters is performing adequately: options A and B are non-functioning, and option C is a weak distracter.

4. Overall evaluation

This item is eliminated.

Question 7

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.35 shows that this item has moderate difficulty.

2. Does the item discriminate adequately?

The D value of 0.1 suggests that this item has low discrimination.

3. Are the distracters performing adequately?

Distracters A and B are performing adequately, while distracter C is not.

4. Overall evaluation

This item should be checked and revised.

Question 8

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.7 shows that this item is easy.

2. Does the item discriminate adequately?

The D value of 0.4 suggests that this item has high discrimination.

3. Are the distracters performing adequately?

Some of the distracters are not performing adequately: options A and B are non-functioning, and only option C works as a good distracter.

4. Overall evaluation

This item should be retained. However, the distracters need to be improved.

Question 9

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 1 shows that this item is too easy.

2. Does the item discriminate adequately?

The item does not discriminate adequately because its discrimination value is 0.

3. Are the distracters performing adequately?

None of the distracters is performing adequately because all of the students chose the correct answer.

4. Overall evaluation

This item should be eliminated.

Question 10

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.55 shows that this item has moderate difficulty.

2. Does the item discriminate adequately?

The item discriminates adequately because its discrimination value is 0.7.

3. Are the distracters performing adequately?

All of the distracters are performing adequately except distracter C.

4. Overall evaluation

This item should be retained, but distracter C needs to be improved.

Question 11

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.55 suggests that this item has moderate difficulty.

2. Does the item discriminate adequately?

This item has a discrimination value of 0.1, which is low.

3. Are the distracters performing adequately?

Distracters A and C are not performing adequately.

4. Overall evaluation

This item should be eliminated or rewritten.

Question 12

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 1 suggests that this item is too easy.

2. Does the item discriminate adequately?

An item discrimination of 0 indicates that the item has no discrimination.

3. Are the distracters performing adequately?

None of the distracters is performing adequately because all of the students answered correctly.

4. Overall evaluation

This item should be eliminated.

Question 13

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 1 suggests that this item is too easy.

2. Does the item discriminate adequately?

An item discrimination of 0 indicates that the item has no discrimination.

3. Are the distracters performing adequately?

None of the distracters is performing adequately because all of the students answered correctly.

4. Overall evaluation

This item should be eliminated.

Question 14

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 1 suggests that this item is too easy.

2. Does the item discriminate adequately?

An item discrimination of 0 indicates that the item has no discrimination.

3. Are the distracters performing adequately?

None of the distracters is performing adequately because all of the students answered correctly.

4. Overall evaluation

This item should be eliminated.

Question 15

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.95 suggests that this item is too easy.

2. Does the item discriminate adequately?

With a negative item discrimination of -0.9, the item fails to discriminate: it favours the lower group over the upper group.

3. Are the distracters performing adequately?

None of the distracters is performing adequately because all of the students in the lower group answered correctly.

4. Overall evaluation

This item should be eliminated.

Question 16

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.25 suggests that this item is difficult.

2. Does the item discriminate adequately?

With a negative item discrimination of -0.3, the item fails to discriminate: it favours the lower group over the upper group.

3. Are the distracters performing adequately?

The distracters are not performing adequately, except for distracter D.

4. Overall evaluation

This item should be eliminated.

Question 17

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 1 suggests that this item is too easy.

2. Does the item discriminate adequately?

An item discrimination of 0 indicates that the item has no discrimination.

3. Are the distracters performing adequately?

None of the distracters is performing adequately because all of the students answered correctly.

4. Overall evaluation

This item should be eliminated.

Question 18

1. Is the item difficulty level appropriate for the testing application?

The item difficulty is 0.85, which shows that the item is easy.

2. Does the item discriminate adequately?

This item has moderate discrimination (0.3).

3. Are the distracters performing adequately?

Not all of the distracters are performing adequately. Option B is non-functioning and D is a weak distracter; only option C is a good distracter.

4. Overall evaluation

This item needs to be checked and revised.

Question 19

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.3 shows that the item has moderate difficulty.

2. Does the item discriminate adequately?

This item has no discrimination (0).

3. Are the distracters performing adequately?

The distracters are not performing adequately. A is a weak distracter and B is non-functioning, while D works as a good distracter.

4. Overall evaluation

This item should be eliminated.

Question 20

1. Is the item difficulty level appropriate for the testing application?

The item difficulty is 0.8, which shows that the item is easy.

2. Does the item discriminate adequately?

This item has a discrimination value of zero.

3. Are the distracters performing adequately?

Not all of the distracters are performing adequately. A and D are weak distracters, while C is non-functioning.

4. Overall evaluation

This item should be eliminated.

Question 21

1. Is the item difficulty level appropriate for the testing application?

The item difficulty is 0.85, which shows that the item is easy.

2. Does the item discriminate adequately?

This item has moderate discrimination (0.3).

3. Are the distracters performing adequately?

Not all of the distracters are performing adequately. Option B is non-functioning and D is a weak distracter; only option C is a good distracter.

4. Overall evaluation

This item needs to be checked and revised.

Question 22

1. Is the item difficulty level appropriate for the testing application?

An item difficulty of 0.4 shows that the item has moderate difficulty.

2. Does the item discriminate adequately?

This item has moderate discrimination (0.2).

3. Are the distracters performing adequately?

Distracters B and D are not performing adequately because each attracts the same number of students from the upper and lower groups.

4. Overall evaluation

This item needs to be checked and revised.

Question 23

1. Is the item difficulty level appropriate for the testing application?

The item difficulty is 0.5, which shows that the item has moderate difficulty.

2. Does the item discriminate adequately?

This item has moderate discrimination (0.2).

3. Are the distracters performing adequately?

Distracters C and D are not performing adequately. Distracter A is performing adequately because it attracts more students from the lower group.

4. Overall evaluation

This item needs to be checked and revised.
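
The rule of thumb behind the distracter comments for Questions 22 and 23 (a distracter works when it attracts more students from the lower group than from the upper group, and does not work when it attracts the same number from both) could be sketched in Python roughly as follows. The function name, the labels, and the treatment of an option that nobody chooses as "non-functioning" are assumptions made for illustration, and the counts in the usage lines are hypothetical rather than taken from the score sheets.

```python
def judge_distracter(upper: int, lower: int) -> str:
    """Classify one distracter from the number of upper-group and
    lower-group students who chose it (labels are illustrative)."""
    if upper == 0 and lower == 0:
        # Assumption: an option that nobody picks is non-functioning.
        return "non-functioning"
    if lower > upper:
        # The option attracts more students from the lower group.
        return "performing adequately"
    # The option attracts as many (or more) students from the upper group.
    return "not performing adequately"


# Hypothetical counts (upper, lower) for three distracters of one item.
for option, (upper, lower) in {"A": (1, 4), "B": (2, 2), "C": (0, 0)}.items():
    print(option, judge_distracter(upper, lower))
```

A finer split into "weak" and "good" distracters, as used elsewhere in this section, would need cut-off values that are not stated here, so the sketch leaves them out.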

Question 24

1. Is the item difficulty level appropriate for the testing application?

The item difficulty is 0.9, which shows that the item is easy.

2. Does the item discriminate adequately?

This item has moderate discrimination (0.2).

3. Are the distracters performing adequately?

Only option A performs adequately. C and D are non-functioning distracters.

4. Overall evaluation

This item needs to be checked and revised.

Question 25

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.9 shows that the item is too easy.

2. Does the item discriminate adequately?

This item has moderate discrimination (0.2).

3. Are the distracters performing adequately?

None of the distracters is performing adequately. Option B is non-functioning, while C and D are weak distracters.

4. Overall evaluation

This item needs to be checked and revised.

Question 26

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 1 suggests that this item is too easy.

2. Does the item discriminate adequately?

An item discrimination of 0 indicates that the item has no discrimination.

3. Are the distracters performing adequately?

None of the distracters is performing adequately because all of the students answered correctly.

4. Overall evaluation

This item should be eliminated.

Question 27

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 1 suggests that this item is too easy.

2. Does the item discriminate adequately?

An item discrimination of 0 indicates that the item has no discrimination.

3. Are the distracters performing adequately?

None of the distracters is performing adequately because all of the students answered correctly.

4. Overall evaluation

This item should be eliminated.

Question 28

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.9 suggests that this item is easy.

2. Does the item discriminate adequately?

An item discrimination of 0.2 indicates that the item has moderate discrimination.

3. Are the distracters performing adequately?

None of the distracters is performing adequately because all of them are weak.

4. Overall evaluation

This item needs to be checked and revised. The distracters need to be changed or rewritten.

Question 29

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.85 suggests that this item is easy.

2. Does the item discriminate adequately?

With a negative item discrimination of -0.1, the item fails to discriminate: it slightly favours the lower group over the upper group.

3. Are the distracters performing adequately?

None of the distracters is performing adequately: option A is a weak distracter, while B and D are non-functioning.

4. Overall evaluation

This item needs to be checked and revised.

Question 30

1. Is the item difficulty level appropriate for the testing application?

The item difficulty of 0.9 suggests that this item is too easy.

2. Does the item discriminate adequately?

An item discrimination of 0 indicates that the item has no discrimination.

3. Are the distracters performing adequately?

None of the distracters is performing adequately. A and B are weak distracters, while D is non-functioning.

4. Overall evaluation

This item should be eliminated.

4.6 LIMITATIONS

4.6.1 Inexperience

Our inexperience in writing good questions was one of the limitations we faced. The teacher, Puan Ee, pointed this out when we returned to the school for a follow-up. She advised that we should not have extracted the questions from the exercise book alone; if we wanted to extract questions, we should have drawn them from several materials.

4.6.2 Lack of Reliable Materials

Much to our mortification, many of the books on the market are not reliable. For example, of the questions that we extracted, only three items can be retained; the others need to be improved or eliminated.

4.6.3 Students’ lack of preparation

This could be seen when the students were quite shocked when we told them that they had to sit a test. Lack of preparation can affect the students’ performance. Some of the students also told us that they did not take the test seriously and basically just guessed or peeked at other students’ answers.

Apart from that, when we photocopied the questions we did not notice that questions 11, 12 and 13 already had an answer marked. This was due to our carelessness at the planning stage: while we were going through the questions to work out the answers, one of the members ticked them.

5.0 CONCLUSION

In conclusion, conducting this research gave us a lot of meaningful information for our future use. The most crucial and most difficult part is planning the questions, because many things have to be considered at this stage: the format of the questions to be constructed, how strictly the questions follow Bloom’s Taxonomy of Educational Objectives, and the level of student proficiency to be tested. All of these are taken into consideration when constructing questions. The grading and scoring stage comes next. At this stage, the calculations must be checked carefully, because any missing number will affect the rest of the calculation. Questions chosen from an exercise book or revision book should be examined thoroughly before they are put into the final draft of the test, to minimize the number of miskeyed, skewed, or ambiguous questions, which are often found in such books. Teachers are recommended to use their own item banks, if they have any. We also noticed that our student sample came from a homogeneous group: the students possess similar abilities in terms of academic achievement. Finally, we have learnt many things in this course, especially things that we are going to apply in the future. We found that this course helps prepare us to become teachers or educators later on.

Appendices

1. Syllabus
2. Table of Specifications
3. Sample 30 multiple-choice questions
4. Answer to the 30 multiple-choice questions
5. Sample multiple-choice score sheet
6. Marked score sheets (MCQ) of the students
7. Item bank questions
8. Calculation of Z-score and T-score
9. Calculations of item difficulty and item discrimination

4. Answer to the 30 multiple-choice questions

Section A

1. A    2. C    3. B    4. C    5. C    6. D    7. D    8. C    9. C    10. A
11. D   12. B   13. C   14. B   15. B   16. A   17. A   18. A   19. C   20. B
21. B   22. A   23. A   24. B   25. A   26. C   27. A   28. A   29. C   30. C

8. Calculation of Z-score and T-score

No.  X   Z-score calculation     Z-score   T-score calculation   T-score
1    26  (26 - 21.8) / 2.19       1.92     10(1.92) + 50         69.2
2    25  (25 - 21.8) / 2.19       1.46     10(1.46) + 50         64.6
3    24  (24 - 21.8) / 2.19       1        10(1) + 50            60
4    23  (23 - 21.8) / 2.19       0.55     10(0.55) + 50         55.5
5    23  (23 - 21.8) / 2.19       0.55     10(0.55) + 50         55.5
6    23  (23 - 21.8) / 2.19       0.55     10(0.55) + 50         55.5
7    23  (23 - 21.8) / 2.19       0.55     10(0.55) + 50         55.5
8    23  (23 - 21.8) / 2.19       0.55     10(0.55) + 50         55.5
9    22  (22 - 21.8) / 2.19       0.09     10(0.09) + 50         50.9
10   22  (22 - 21.8) / 2.19       0.09     10(0.09) + 50         50.9
11   22  (22 - 21.8) / 2.19       0.09     10(0.09) + 50         50.9
12   22  (22 - 21.8) / 2.19       0.09     10(0.09) + 50         50.9
13   21  (21 - 21.8) / 2.19      -0.37     10(-0.37) + 50        46.3
14   21  (21 - 21.8) / 2.19      -0.37     10(-0.37) + 50        46.3
15   21  (21 - 21.8) / 2.19      -0.37     10(-0.37) + 50        46.3
16   21  (21 - 21.8) / 2.19      -0.37     10(-0.37) + 50        46.3
17   20  (20 - 21.8) / 2.19      -0.82     10(-0.82) + 50        41.8
18   19  (19 - 21.8) / 2.19      -1.28     10(-1.28) + 50        37.2
19   18  (18 - 21.8) / 2.19      -1.74     10(-1.74) + 50        32.6
20   17  (17 - 21.8) / 2.19      -2.19     10(-2.19) + 50        28.1
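
The Z-score and T-score calculations above can be checked with a short Python sketch. It assumes that 21.8 is the mean of the 20 raw scores and that 2.19 is their sample standard deviation (n - 1 denominator) rounded to two decimals before use; the function names are illustrative only.

```python
import statistics

# Raw scores (X) of the 20 students, as listed in the table above.
scores = [26, 25, 24, 23, 23, 23, 23, 23, 22, 22,
          22, 22, 21, 21, 21, 21, 20, 19, 18, 17]

mean = statistics.mean(scores)            # 21.8
# Assumption: the table's 2.19 is the sample standard deviation,
# rounded to two decimals before it is used in the Z-score formula.
sd = round(statistics.stdev(scores), 2)   # 2.19


def z_score(x: float) -> float:
    """Z = (X - mean) / SD, rounded to two decimals as in the table."""
    return round((x - mean) / sd, 2)


def t_score(z: float) -> float:
    """T = 10Z + 50, rounded to one decimal."""
    return round(10 * z + 50, 1)


for number, x in enumerate(scores, start=1):
    z = z_score(x)
    print(number, x, z, t_score(z))
```

Rounding the standard deviation first matters for at least one row: with the unrounded value, the Z-score for X = 18 would round to -1.73 instead of the -1.74 shown in the table.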

9. Calculations of item difficulty and item discrimination

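No.  Item Difficulty, P         Item Discrimination, D
1    (7 + 4) / 20 = 0.55        (7 - 4) / 10 = 0.3
2    (1 + 1) / 20 = 0.1         (7 - 4) / 10 = 0.3
3    (9 + 6) / 20 = 0.75        (9 - 6) / 10 = 0.3
4    (9 + 9) / 20 = 0.9         (9 - 9) / 10 = 0
5    (10 + 9) / 20 = 0.95       (10 - 9) / 10 = 0.1
6    (8 + 8) / 20 = 0.8         (8 - 8) / 10 = 0
7    (4 + 3) / 20 = 0.35        (4 - 3) / 10 = 0.1
8    (9 + 5) / 20 = 0.7         (9 - 5) / 10 = 0.4
9    (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
10   (9 + 2) / 20 = 0.55        (9 - 2) / 10 = 0.7
11   (6 + 5) / 20 = 0.55        (6 - 5) / 10 = 0.1
12   (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
13   (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
14   (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
15   (9 + 10) / 20 = 0.95       (9 - 10) / 10 = -0.9
16   (1 + 4) / 20 = 0.25        (1 - 4) / 10 = -0.3
17   (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
18   (10 + 7) / 20 = 0.85       (10 - 7) / 10 = 0.3
19   (3 + 3) / 20 = 0.3         (3 - 3) / 10 = 0
20   (8 + 8) / 20 = 0.8         (8 - 8) / 10 = 0
21   (10 + 6) / 20 = 0.6        (10 - 6) / 10 = 0.4
22   (5 + 3) / 20 = 0.4         (5 - 3) / 10 = 0.2
23   (6 + 4) / 20 = 0.5         (6 - 4) / 10 = 0.2
24   (10 + 8) / 20 = 0.9        (10 - 8) / 10 = 0.2
25   (10 + 8) / 20 = 0.9        (10 - 2) / 10 = 0.2
26   (10 + 10) / 20 = 1         (10 - 10) / 10 = 0
27   (4 + 2) / 20 = 0.3         (4 - 2) / 10 = 0.2
28   (10 + 8) / 20 = 0.9        (10 - 8) / 10 = 0.2
29   (8 + 9) / 20 = 0.85        (8 - 9) / 10 = -0.1
30   (9 + 9) / 20 = 0.9         (9 - 9) / 10 = 0.3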
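
The two formulas used in the table above can be written as a small Python sketch. Judging from the denominators (20 and 10), it assumes an upper group and a lower group of 10 students each; the names `GROUP_SIZE`, `item_difficulty` and `item_discrimination` are illustrative and do not come from the report.

```python
GROUP_SIZE = 10  # assumed number of students in each of the two groups


def item_difficulty(upper_correct: int, lower_correct: int) -> float:
    """P = (correct in upper group + correct in lower group) / all students."""
    return (upper_correct + lower_correct) / (2 * GROUP_SIZE)


def item_discrimination(upper_correct: int, lower_correct: int) -> float:
    """D = (correct in upper group - correct in lower group) / group size."""
    return (upper_correct - lower_correct) / GROUP_SIZE


# Counts for items 1 and 10, taken from the table above.
for item, (upper, lower) in {1: (7, 4), 10: (9, 2)}.items():
    print(item, item_difficulty(upper, lower), item_discrimination(upper, lower))
```

Running every row of the table through these two functions is a quick way to recheck the hand calculations against the stated counts.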

References

Kubiszyn, T., and Borich, G. (1990). Educational Testing and Measurement (3rd Edition). Monterey, CA: HarperCollins Publishers.

Miller, M. D., Linn, R. L., and Gronlund, N. E. (2009). Measurement and Assessment in Teaching (10th Edition). Upper Saddle River, NJ: Pearson Education, Inc.

Reynolds, C. R., Livingston, R. B., and Willson, V. (2006). Measurement and Assessment in Education. Boston, MA: Pearson Education, Inc.

Sax, G. (1989). Principles of Educational and Psychological Measurement and Evaluation (3rd Edition). Belmont, CA: Wadsworth Publishing Company.

UNIVERSITI TEKNOLOGI MARA

KAMPUS BANDARAYA MELAKA

Prepared for:

DR. DAVID LOH ER FUU

PRINCIPLES OF TESTING AND EVALUATION (TSL 480)

UNIVERSITI TEKNOLOGI MARA

KAMPUS BANDARAYA MELAKA

Prepared by:

NOOR IZZATI MUHAMAD NASIR

2007297688

NOOR ALINA NAMAMI

2007297732

MUHAMMAD NABIL MUSTAFA

2007297686

ADI FARHAN GHAZALI

2007297782

Student of B. Ed. TESL

Faculty of Education

UiTM KAMPUS BANDARAYA MELAKA

20th April 2009