Manual on Test Item Construction Techniques


DRAFT

June 2011

Facilitators: Rana Riaz Saeed, Senior Manager-NTS Murtaza Noor, Project Manager-HEC

Handbook

For

Trainers of Learning Module

On

Test Item Construction Techniques


TABLE OF CONTENTS

1. BACKGROUND
   1.2. Objectives
   1.3. Training Module
   1.4. Aims
   1.5. Objectives of the Module
   1.6. Brief Introduction of Module Development Experts

2. ACTIVITY SCHEDULE OF WORKSHOP

Day-1:

3. INTRODUCTION TO TESTING, ASSESSMENT AND EVALUATION
   Session 1: Introduction to Testing, Assessment and Evaluation in Semester System
   Session 2: Types of Tests and Assessment
      Types of Educational Tests
      Group Work: Types of Educational Assessments
   Session 3:

4. PLANNING THE TEST
   Instructional Objectives and Course Outcomes
   Planning the Test
   The Contents
   Analysis of Curriculum and Textbooks
   Judgments of Experts
   Objectives of the Test
   Guidelines for Writing Objectives of the Test
   Bloom's Taxonomy of Educational Objectives
   Illustrative Examples
   Preparing the Test Blueprint
   Preparing an Outline of Test Contents
   Techniques of Determining Test Contents
   Table of Specifications
   Test Length
   General Principles of Writing Questions for an Achievement Test

Day-2

   Session 1: Presentation of Home Assignment
   Sessions 2 & 3

5. TYPES AND TECHNIQUES OF TEST ITEM WRITING
   What is a Test?
   Achievement Test
   Preparing the Test According to Plan
   Items Commonly Used for Tests of Achievement
   Constructing Objective Test Items
   Alternative Response Items
   Uses of True-False Items
   Suggestions for Constructing True-False Items
   Short-Answer / Completion Items
   Constructing Short-Answer Items
   Multiple Choice Questions
   Characteristics of Multiple Choice Questions
   Desired Characteristics of Items
   Advantages of MCQ
   Limitations of MCQ
   Rules for Constructing Multiple-Choice Items
   A Variety of Multiple Choice Items
   Context-Dependent Items

Day-3

6. Step 2: PREPARING THE TEST ACCORDING TO PLAN
   Step 3: Test Administration and Use
   Awarding Grades

7. HOW TO PREPARE EFFECTIVE ESSAY QUESTIONS?
   How to Prepare Better Essay Questions?
   When to Use Essay Questions?
   Guidelines for Constructing Essay Questions

Day 4, Session 1

8. EVALUATION OF ITEMS
   Item Analysis
   Item Difficulty
   Item Discrimination
   Interpreting Distracter Values
   Effectiveness of Distracters
   Session 2: Item Analysis and Subsequent Decision
   Decisions Subsequent to Item Analysis
   Session 3: Desirable Qualities of a Test
   Reliability
   How to Estimate Reliability?
   Validity
   Practicality
   Objectivity

9. PROFESSIONAL ETHICS IN EDUCATIONAL ASSESSMENT

APPENDICES
   Appendix-A: Activity Material for Flawed Items
   Appendix-B: Table of Specifications / Test Blueprint
   Appendix-C: Cognitive Domain, Instructional Objectives and Item Examples
   Appendix-D: Professional Ethics in Assessment (Chapter Reading)
   Appendix-E: PowerPoint Presentations


Handbook for Trainers of Learning Module on Test Item Construction

Background

The National Testing Service (NTS) is an autonomous, self-reliant and self-sustaining organization, and Pakistan's first premier testing service in the public sector, established with the aim of building and promoting standards in educational and professional testing and assessment. NTS assesses the competency of candidates for admission, scholarship, recruitment and promotion. It ensures the efficiency, reliability, accuracy and, most significantly, credibility of the entire system in a transparent manner under strict security arrangements. Moreover, NTS contributes to human resource development (HRD) by organizing training and capacity-building sessions and carrying out research and development (R&D).

The Higher Education Commission (HEC) is the primary regulator of higher education in Pakistan. Its main purpose is to upgrade the universities of Pakistan to be centres of education, research and development, and it facilitates the development of the higher education system in Pakistan. The HEC is also playing a leading role in building a knowledge-based economy in Pakistan by awarding hundreds of doctoral scholarships for education abroad every year.

In Pakistan there exist several educational assessment and evaluation systems, including the provincial, federal and divisional Boards of Secondary and Higher Secondary examination and universities' annual or semester examinations. Generally, the people who mark and evaluate these examinations have little or no training or orientation in test item development and marking techniques, and there is no proper institute to impart training in internationally recognized and standardized test construction and marking techniques. Keeping this background in view, NTS is planning to conduct a series of workshops for the teaching faculties of public sector universities in collaboration with HEC.

Objectives

The overall objective of the workshops is to acquaint university faculty members with the preliminary steps in student evaluation in the semester system and with effective test item construction techniques.

The main objectives of the workshops are:

To acquaint participants with test and evaluation paper development techniques in their specified subjects.

To equip participants with the knowledge, skills and techniques of efficient paper marking.

Training Module

In order to initiate the training workshops, NTS and HEC developed a training module on "Effective Test Item Construction Techniques" by engaging educational and psychometric experts from various faculties of HEC-affiliated universities and educational institutions in a two-day consultative meeting held on 17-18 January 2011 at the NTS Secretariat, Islamabad. This training module will be used for conducting training for university teachers, from both the natural and the social sciences, in all major cities of Pakistan.


Aims

This module is designed to provide university teachers with an orientation to the goals, concepts, principles and concerns of testing, assessment and evaluation. It will also provide teachers an opportunity to design and construct useful test items for assessment and evaluation in their respective subjects for undergraduate and postgraduate classes.

Objectives of the Module

The main objectives of this module are to:

1. Align assessment procedures to the learning objectives of the course.

2. Give the faculty a basic understanding of, and training in, developing and adapting materials for assessing the levels of learning (knowledge, comprehension and application) within the course contents.

3. Follow standard assessment strategies and techniques in the construction of educational tests.

4. Apply test scoring and grading schemes.

5. Pre-test test items, carry out metric analysis, and review and modify the items accordingly.

In addition, the social sciences departments selected by HEC under the project titled "Development/Strengthening of Selected Departments of Social Sciences & Humanities" will be part and parcel of this initiative, as one of the key objectives of the project is to build linkages with other professional organizations.


Brief Introduction of the Module Development Experts

Professor Dr. Iftikhar N. Hassan has a master's degree from Stanford University, California, and a Ph.D. in Clinical Psychology from Indiana University, USA. She has worked as Professor and Dean at Allama Iqbal Open University and as Pro-Vice Chancellor at Fatima Jinnah University. Her last assignment was as consultant and chair of the Department of Behavioral Sciences at Karakoram International University, Gilgit-Baltistan. She has been a researcher and writer on psychology, social issues and gender issues, with more than a hundred research publications, and she is the author of six books. She is a member of many national and international professional organizations. Her publications include "Psychology of Women" (a textbook for MA students), Psychological Profile of Rural Women, Case Studies of Successful Women, and Voiceless Melodies. Currently she is working as Executive Director of Gender and Psychological Services, an NGO.

Dr. Iffat S. Dar earned her PhD, with an emphasis on psychological assessment, from the University of Minnesota, USA, and her master's degree from Columbia University, New York, USA. Her last regular assignment was as Chief Psychologist at the Federal Public Service Commission (FPSC), Islamabad. Since her retirement she has been working as a consultant with several national and international organizations, such as the Ministry of Education Islamabad, the World Bank, the Asian Development Bank, UNESCO and UNICEF. She has been a visiting professor at various universities of Pakistan, including QAU, FJWU and Foundation University.

Prof. Dr. Rukhsana Kausar did her PhD at Surrey University, UK, and a Post-Doc at St George's Hospital, London, UK, as a Commonwealth Fellow. She has been working at the Department of Applied Psychology, University of the Punjab, for the last 23 years in various academic and administrative capacities, and is currently serving her second tenure as Chairperson of the Department of Applied Psychology. Dr. Rukhsana worked for two years at Fatima Jinnah Women University, Rawalpindi, as chairperson and Associate Dean. She has been an active researcher and has supervised numerous M.Sc., M.Phil. and PhD research theses. Her research has been presented extensively at international conferences, and she has about 45 research articles published in journals of national and international repute. She is a member of various international professional bodies in the discipline, such as BPS, APA, IMSA, EHPS, ICP, IAAP and ECP.

Dr. Mah Nazir Riaz is Professor of Psychology and Dean of Social Sciences, Frontier Women University Peshawar, Pakistan. Dr. Riaz received her doctorate in Psychometrics from the University of Peshawar, NWFP, Pakistan (1979). Her academic career spans 40 years of university teaching: she joined the University of Peshawar on October 1st, 1968 as a lecturer and retired from the same university as a Professor of Psychology in December 2003. She won the University Gold Medal (1966) and the President of Pakistan's Award on completion of her Master's studies (1966), and was recognized as a Professor of Psychology for her outstanding academic achievements (2003). She won the Star Women International Award (1996) and received the Distinguished Professor Award for meritorious services from the Ministry of Education, Govt. of NWFP (2003). She also served as Professor of Psychology at the National Institute of Psychology, Center of Excellence, Quaid-i-Azam University Islamabad, for three years (1999-2002). During this period, she won the President of Pakistan's Award "Izaz-e-Kamal" (Gold Medal and cash prize) for her lifetime achievements. She joined Frontier Women University Peshawar as Dean of Social Sciences in 2006 and was nominated as an Eminent Educationist and Researcher by the Higher Education Commission, Islamabad (2006). She has published more than 60 research papers in national and international journals. She is also the author of three textbooks: (1) Psychology, 2005; (2) Areas of Psychology, 2007; and (3) Test Construction: Development and Standardization of Psychological Tests in Pakistan, 2008. She has also contributed chapters to edited books published in Pakistan and the USA. She is a member of the International Society for Interpersonal Acceptance and Rejection (ISIPAR) and is currently the representative of ISIPAR for South Asia. During the last two decades, she has conducted several studies on parental acceptance-rejection, published in national and international journals, including cross-cultural research with Professor Rohner and Professor Abdul Khaleque, University of Connecticut, USA.

Dr. Iftikhar Ahmad is currently working as Associate Professor at the University of Management and Technology, Lahore. He served as Psychologist at the Federal Public Service Commission and later as Director Research at the Punjab Public Service Commission. He also worked for some time as Coordinator of the IBA Testing Service, Karachi. He retired last year from GC University, Lahore. His areas of interest are educational and organizational psychology, and testing and mental measurement.


Activity Schedule of Workshop
Estimated participants: 25-30

Day 1
  Opening Ceremony (0.5 hrs)
  Session 1: Introduction to Testing, Assessment & Evaluation (1.5 hrs)
    Rationale for Assessment and Evaluation in the Semester System
    Issues and Problems (Activity/Discussion)
  Tea/Refreshment Break (0.25 hrs)
  Session 2: Types of Tests and Assessment (1.5 hrs)
    Types of Educational Tests: Achievement, Diagnostic, Aptitude
    Types of Assessment: Formative-Summative, Theory-Practical
  Lunch and Prayer Break (0.75 hrs)
  Session 3: Planning the Test (2 hrs)
    Key Concepts and Contents
    Objectives of the Test
    Bloom's Taxonomy of Educational Objectives
    Table of Specifications
    General Principles of Writing Questions
  Preparing a Table of Specifications (Take-Home Activity)

Day 2
  Session 1: Presentations of Group Activity (1.5 hrs)
  Tea/Refreshment Break (0.25 hrs)
  Session 2: Types and Techniques of Test Item Writing (1.5 hrs)
    What is a Test?
  Lunch and Prayer Break (0.75 hrs)
  Session 3: Practicum: preparing a variety of items (1.5 hrs)

Day 3
  Session 1: Preparing the Test According to Plan (1.0 hrs)
    Alternative Response Items
    Use of True-False Items
    Suggestions for Constructing True-False Items
    Multiple Choice Questions
    Characteristics of Multiple Choice Questions
  Tea/Refreshment Break (0.25 hrs)
  Session 2: Activity: Preparing a Test, Peer Review, Scoring Key; Presentation: Q&A (1.0 hrs)
  Lunch and Prayer Break (0.75 hrs)
  Session 3 (1.5 hrs)
    Step 2: Preparing the Test According to Plan
      Item Development
      Test Instructions
    Step 3: Test Administration and Use
      Administration
      Scoring the Test
      Results
      Awarding Grades
      Review of Test Results
    How to Prepare Effective Essay Questions?
      How to Prepare Better Essay Questions?
      When to Use Essay Questions?
      Guidelines for Constructing Essay Questions
  Activity: Interactive discussion (1.5 hrs)

Day 4
  Session 1: Evaluation of Items (1.5 hrs)
    Item Analysis: Difficulty and Discrimination
  Tea/Refreshment Break (0.25 hrs)
  Session 2: Activity: Item Analysis and Subsequent Decisions (1.0 hrs)
    Desirable Qualities of a Test (0.75 hrs)
      Reliability
      Validity
      Practicality
      Objectivity
  Lunch and Prayer Break (1.0 hrs)
  Session 3: Professional Ethics in Educational Testing (0.5 hrs)
  Feedback and Course Evaluation (0.25 hrs)
  Closing Ceremony (0.25 hrs)


Introduction to Assessment and Evaluation in Semester System

Day 1 Session 1: Rationale to Assessment and Evaluation in Semester System

The session will begin with brief introductions: the trainer will introduce herself and then ask the participants to introduce themselves.

Basic Concept: Lecture by the Trainer
The trainer will talk about the importance of well constructed examinations, as exams are the goal posts which act as guides and motivators for students to learn. We all know from our own experience how students prepare for examinations. They learn not only what interests them most or what is presented in a better way, but also according to the type of paper they expect from the teacher. For this reason, a well prepared examination paper is a guarantee of an effective teaching-learning process.

Examinations have undergone radical change in the past fifty years due to improvements in measurement techniques and a better understanding of learning processes. In place of a lengthy three-hour essay-type examination, one can assess more comprehensively with a thirty-minute objective-type paper, which can assess not only knowledge but also comprehension and application of knowledge. Additionally, a well prepared paper can evaluate students objectively and quickly, so a large number of students in a class is not a problem.

Why should we change our traditional system to a newer one? Obviously, the volume of knowledge has increased so much that our youth need to learn a larger number of subjects for the same degree; this is why educational institutions started the semester system, and most universities are now moving toward a quarter system. Students have to cover what used to be a year's courses in one semester. This has become necessary to meet world standards of education. Additionally, there is a concept of continuous assessment, which requires more than one examination in a semester, as well as class assignments and projects to assess different types of learning, e.g., the ability to express oneself in writing, or the ability to collect data and draw conclusions from empirical information.

Issues and Problems (Practical Work/Discussion - Time: one hour)
Participants will write down the advantages and disadvantages of objective-type and essay-type examinations. This will be followed by sharing these points and discussing both the pros and cons.

Session 2: Types of Tests and Assessment
This session is based on Bloom's Taxonomy and on how one can devise techniques to assess different types of abilities.

Types of Educational Tests
There are different types of tests used by educationists and psychologists. In fact, psychologists have influenced the evaluation system more than one likes to give them credit for. One can find a large number of tests of all types of abilities, aptitudes and abnormalities in the catalogues; however, we are going to talk about only those tests which concern classroom achievement. These are also called scholastic achievement tests. We are going to talk about the following three types of tests:


1. Classroom Achievement Tests
These are the tests with which every teacher is familiar and which every teacher has to construct to judge the achievement of his or her students. If the test is well written and covers the entire course, it will be a better measure of students' achievement and can discriminate between good and poor learners. Learning needs to be understood in terms of Bloom's taxonomy and should not be just rote memorization.

2. Diagnostic Achievement Tests
A good achievement test can easily identify those students who have not comprehended a particular concept, so that the teacher can work with those students separately. In large classes with students coming from a variety of backgrounds, it is essential that the teacher knows, with the help of such tools, which students need more help. Generally one accomplishes this through quizzes and short tests after covering a certain portion of the subject matter.

3. Scholastic Aptitude Tests (SAT)
A scholastic aptitude test is a kind of ability test which can predict whether or not a student has the ability to succeed in the classroom. Most of these are paper-and-pencil group tests and are given at the time of admission to screen students. These are not intelligence tests in the classical sense; intelligence is a much wider concept than scholastic aptitude. Very good examples are the GRE and the GRE Subject Tests.

Group Work (30 Min): Types of Educational Assessments

a. Formative - Summative
b. Theory - Practical

The participants will be divided into four groups, each assigned one type of assessment; they will discuss the different types of assessment in an interactive session and present their reports.

Reference Material
Thorndike's book on educational measurement is a bible of test development; it, along with Guilford's book on statistics and more modern books on test development, should be read by all participants and provided as a reference during the workshop.


Planning the Test

Session 3: Planning the Test

Instructional Objectives and Course Outcomes

Session Learning Outcomes
The learning outcomes for this session are to help the participants:

Conceive of instructional objectives and course outcomes

Design a table of specifications in accordance with instructional objectives

Key Concepts and Content

Defining instructional objectives

Designing a table of specifications

Education is a process that helps students change in many ways. Some aspects of the change are intentional, whereas others are quite unintentional. Keeping this assumption in view, one of the important tasks of university teachers is to decide, as far as possible, how they want their students to change, and to determine their own role in this process of change. Secondly, upon the completion of a particular unit or course, teachers need to determine whether their students have changed in the desired manner. Furthermore, they have to assess the unintentional outcomes as well.

Planning the Test
Test planning includes several activities that the teacher needs to carry out to devise a new test. As a first step, the teacher must draw up a test "blueprint", specifying (1) the content and (2) the objectives of the test, along with the types of items, practice exercises, time limit, etc.

1) The Content

Analysis of Curriculum and Textbooks The outline of a classroom test should include details of test content in the specific course. Moreover, each content area should be weighted roughly in proportion to its judged importance. Usually, the weights are assigned according to the relative emphasis placed upon each topic in the textbook. The median number of pages on a given topic in the prescribed books is usually considered as an index of its importance. Analysis of curriculum prescribed by various Boards of Intermediate and Secondary Education (BISE), Public Sector and Private Universities of Pakistan, provides another method of determining the subject-matter content and instructional objectives to be measured by the proposed test.

Judgments of Experts
To devise a classroom test, the advice and assistance of subject-matter experts, serving as consultants, can prove to be of immense importance. The best way to seek consultants' judgments is to submit to them a tentative outline prepared after studying the instructional objectives as stated in representative curricula and the subject-matter content as indicated by up-to-date and widely used textbooks.


2) Objectives of the Test
The basic objective of an educational achievement test is to assess the desired changes brought about by the teaching-learning process. Obviously, each subject demands a different set of instructional objectives. For example, the major objectives of subjects like the sciences, social sciences and mathematics are knowledge, understanding, application and skill, whereas the major objectives of a language course are knowledge, comprehension and expression. The knowledge objective is considered the lowest level of learning, whereas understanding and the application of knowledge in the sciences and behavioral sciences are considered higher levels of learning.

As the basic objectives of education are concerned with the modification of human behavior, the teacher must determine measurable cognitive outcomes of instruction at the beginning of the course. The evaluation process determines the extent to which the objectives have been attained, both for the individual students and for the class as a whole. Such an evaluation provides feedback that can suggest modification of either the objectives or the instruction or both.

Some objectives are stated as broad, general, long-range goals, e.g., the ability to exercise the mental functions of reasoning, imagination, and critical appreciation. These educational objectives are too general to be measured by classroom tests and need to be operationally defined by the class teacher.

Guidelines for Writing Objectives of the Test
When writing objectives, the teacher may follow these guidelines:

1. Begin each objective with an action verb to specify observable behavior, for example, identify, formulate, describe etc.

2. Each objective must be stated in terms of student performance as an outcome or learning product.

3. Each objective should be stated clearly without creating any ambiguity.
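As an aid to guideline 1, the sketch below pairs the three most commonly tested cognitive levels with typical action verbs. It is an illustration added for this handbook; the verb lists are conventional examples rather than a prescription from the module, and the example objective is likewise hypothetical.

    # Typical action verbs for stating observable, measurable objectives.
    action_verbs = {
        "knowledge":     ["define", "identify", "list", "name", "recall"],
        "comprehension": ["describe", "explain", "interpret", "summarize", "translate"],
        "application":   ["apply", "compute", "demonstrate", "solve", "use"],
    }

    # Example: turning a vague goal into an observable objective.
    vague_goal = "Students will understand correlation."
    observable_objective = (
        "Students will compute and interpret a correlation coefficient for paired data."
    )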

Bloom's Taxonomy of Educational Objectives
Bloom's Taxonomy classifies instructional objectives into three major domains: cognitive, affective, and psychomotor. The largest proportion of educational objectives falls into the cognitive domain.

The Cognitive Domain is the core of curriculum and test development. It is largely concerned with descriptions of student behavior in terms of knowledge, understanding, and abilities that can be demonstrated.

The Affective Domain includes objectives that emphasize interests, attitudes, and values and the development of appreciations.

The Psychomotor Domain is concerned with physical, motor, or manipulative skills.

Bloom's taxonomy classifies the behaviors included in the cognitive domain into the following categories:

1. Knowledge
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation


Knowledge Level
The task at the knowledge level involves the psychological process of remembering. Items in the knowledge category involve the ability to recall important information: knowledge of specific facts, definitions of important terms, familiarity with important concepts, etc. Thus, knowledge-level questions are formulated to assess previously learned information.

Comprehension Level
The common cognitive processes required at the comprehension level are translation, interpretation, and extrapolation.

Application Level
The task at the application level requires the use of previously learned information in new and concrete situations to solve a problem. It requires mastering a concept well enough to recognize when and how to use it correctly in an unfamiliar or novel situation. The fact that most of what we learn is intended to be applied to problem situations in everyday life demonstrates well the importance of application objectives in the curriculum.

The taxonomy levels of knowledge, comprehension, and application are considered more valuable for curriculum development and educational evaluation than analysis, synthesis, and evaluation. Furthermore, the taxonomy does not suggest that all good tests or evaluation techniques must include items from every level of the taxonomy.

Illustrative Examples

Knowledge Level

Which of the following does not belong with the others?
a. Aptitude tests
b. Personality tests
c. Intelligence tests
d. Achievement tests

Which of the following measures involves nominal data?
a. the test score on an examination
b. the number on a basketball player's jersey
c. the speed of an automobile
d. the class rank of a college student

Comprehension Level

A psychologist monitors a group of nursery-school children, recording each instance of altruistic behavior as it occurs. The psychologist is using:
a. case studies
b. the experimental method
c. naturalistic observation
d. the survey method

In a study of the effect of a new teaching technique on students' achievement test scores, an important extraneous variable would be the students':
a. hair color
b. athletic skills



c. IQ scores
d. sociability

Application Level

The results of Milgram's (1963) study imply that:
a. In the real world, most people will refuse to follow orders to inflict harm on a stranger.
b. Many people will obey an authority figure even if innocent people get hurt.
c. Most people are willing to give obviously wrong answers when ordered to do so.
d. Most people stick to their own judgment, even when group members unanimously disagree.

What is your chance of flipping 4 heads or 4 tails in a row with a fair coin (i.e., one that comes up heads 50% of the time)?
a. .0625
b. .125
c. .250
d. .375
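For the facilitator's reference (a worked check added to this handbook, not part of the original item): the probability of four heads in a row is (1/2)^4 = .0625, and by symmetry four tails in a row has the same probability, so P(4 heads or 4 tails) = 2 × .0625 = .125, i.e., option b.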

Problem Solving
Problem solving refers to active efforts to discover an underlying process leading to the achievement of a goal, for example, series completion, analogies, and transformation problems.

Examples¹
1. A teacher had 28 students in her class. All but 7 of them went on a museum trip and thus were away for the day. How many students remained in the class that day?
2. The water lilies on the surface of a small pond double in area every 24 hours. From the time the first water lily appears until the pond is completely covered takes 60 days. On what day is half of the pond covered with lilies?
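For the facilitator's reference (worked answers added to this handbook): in the first problem, "all but 7" went on the trip, so 7 students remained; in the second, because the lily cover doubles every day, the pond is half covered exactly one day before it is fully covered, i.e., on day 59.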

In the following illustrations, we have used only three levels: knowledge (recall/recognition), comprehension (understanding) and application (or skills), and labeled the columns accordingly.

Preparing the Test Blueprint
The blueprint is meant to ensure the content validity of the test, the most important characteristic of an achievement test devised to determine the GPA at the end of a unit, term or course of instruction. The test may be based on several lessons or chapters in a textbook, reflecting a balance between content areas and learning objectives. The test blueprint must specify both the content and the process objectives in proportion to their relative importance and emphasis in the curriculum. Depending on the purpose of the test and the instructional objectives, the test may vary in length, difficulty, and format (objective, essay, short-answer, open-book, or take-home).

¹ Source: Sternberg, R. J. (1986). Intelligence Applied: Understanding and Increasing Your Intellectual Skills. New York: Harcourt Brace Jovanovich.



Table 1: Test Blueprint / General Layout of an Achievement Test

Purpose of the Test: Minimum competency, mastery, diagnosis, selection.
Nature of the Test: Norm-referenced or criterion-referenced.
Target Population: School children, college or university students, trainees of a course, employees of an organization.
Format of Items: Objective type (multiple-choice, true-false, matching, completion), short answer, essay type, computer-administered.
Test Length: Approximate number of items comprising each category (objective type, essay type, short answer).
Testing Time: Maximum time limit.
Mode of Test Administration: Individual, group, computer.
Examiners' Characteristics: Qualifications, training, experience.
Test Content: Verbal, pictorial, performance.
Sources of Test Content: Textbooks, subject experts, curriculum.
Number of Items Representing Various Content Strata: Depends on the relative importance of content areas.
Appropriate Difficulty: Difficulty level in relation to the purpose of the test, such as minimum competency, mastery, selection, diagnosis, etc.
Taxonomy Level of the Items: Knowledge, comprehension, application, analysis, synthesis, evaluation.
Scoring Procedure: Hand scoring, machine scoring, computer-assisted scoring, or grading (as in essay examinations).
Interpretation: Norms, percentiles, grade equivalents, etc.
Item Analysis: Qualitative, quantitative, or both.
Reliability Techniques: Test-retest, parallel form, Kuder-Richardson, examiner or inter-rater.
Validity: Content, criterion-related (predictive/concurrent), construct.

Source: Riaz, M.N. (2008). Test Construction: Development and Standardization of Psychological Tests in Pakistan. Islamabad: HEC.
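Because the blueprint is essentially a checklist of design decisions, it can also be recorded in a simple machine-readable form. The sketch below is an illustration added to this handbook, not a format prescribed by the manual; all field names and values are hypothetical.

    # An illustrative test blueprint recorded as a plain Python dictionary.
    test_blueprint = {
        "purpose": "mastery",                      # minimum competency, mastery, diagnosis, selection
        "nature": "criterion-referenced",          # or "norm-referenced"
        "target_population": "university students",
        "item_formats": ["multiple-choice", "true-false", "short answer"],
        "test_length": {"objective": 40, "short_answer": 5, "essay": 2},
        "testing_time_minutes": 90,
        "administration_mode": "group",            # individual, group, computer
        "scoring_procedure": "machine scoring",
        "reliability_techniques": ["test-retest", "Kuder-Richardson"],
        "validity_evidence": ["content", "criterion-related"],
    }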

Preparing an Outline of Test Contents
The term "test content" refers to a representative sample of the course content, skills, and cognitive levels/instructional objectives to be measured by the test. The author of the test has to prepare a test plan, or table of specifications, that clearly shows the relative emphasis on various topics and different types of behavior.

Techniques of Determining Test Contents

Analysis of Instructional Objectives
The following tables of specifications show how to develop a relatively broad outline of a classroom test. Our first illustration relates to a Premedical Science Achievement Test. We may prepare a 100-item test according to the following table of specifications showing instructional objectives and content areas.


Table 2: Specifications Related to Instructional Objectives and Content of a Premedical Science Achievement Test

1. Objectives of Instruction                                          Percent of Items
   a. Recall of basic concepts                                               30
   b. Comprehension, interpretation, analysis of scientific content         40
   c. Application of concepts, principles, etc.                             30
   Total                                                                   100

2. Content Areas                                                      Percent of Items
   a. Biology                                                               40
   b. Chemistry                                                             40
   c. Physics                                                               20
   Total                                                                   100

Source: Riaz, M.N. (2008). Test Construction: Development and Standardization of Psychological Tests in Pakistan. Islamabad: HEC.

In practice, a much more detailed outline of the contents within each cell of a table is needed before test construction proceeds. Combined in a two-way table, the above specifications are presented in the following table (Table 3).

Table 3: Number of Items in Each Category of a Premedical Science Achievement Test

Content/Subjects    Recall of Basic Concepts    Comprehension    Application of Concepts, Principles, etc.    Total
Biology                        12                     16                          12                           40
Chemistry                      12                     16                          12                           40
Physics                         6                      8                           6                           20
Total                          30                     40                          30                          100

Source: Riaz, M.N. (2008). Test Construction: Development and Standardization of Psychological Tests in Pakistan. Islamabad: HEC.
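The cell values in Table 3 follow directly from Table 2: each count is the total number of items multiplied by the content weight and by the objective weight (e.g., Biology × Recall = 100 × .40 × .30 = 12). A minimal sketch of that computation, added here for illustration:

    # Derive the Table 3 cell counts from the Table 2 weights.
    content_weights = {"Biology": 0.40, "Chemistry": 0.40, "Physics": 0.20}
    objective_weights = {"Recall": 0.30, "Comprehension": 0.40, "Application": 0.30}
    total_items = 100

    for subject, cw in content_weights.items():
        row = {obj: round(total_items * cw * ow) for obj, ow in objective_weights.items()}
        print(subject, row, "row total:", sum(row.values()))
    # Biology {'Recall': 12, 'Comprehension': 16, 'Application': 12} row total: 40
    # Chemistry {'Recall': 12, 'Comprehension': 16, 'Application': 12} row total: 40
    # Physics {'Recall': 6, 'Comprehension': 8, 'Application': 6} row total: 20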

Table 4: Number of Items in Each Category of an Achievement Test on "Principles of Psychological Measurement"

(Columns: Recall of Basic Concepts; Comprehension, Application; Analysis, Synthesis, Evaluation; Total.)

Content/Subject-matter                                  Recall   Compr./Appl.   Anal./Synth./Eval.   Total
1. Basic statistical concepts: variability,
   correlation, prediction                                 1           2                 2           5 (10%)
2. Scales, transformation, norms                           3           2                 0           5 (10%)
3. Reliability: concepts, theory and methods
   of estimation                                           3           3                 4          10 (20%)
4. Validity: content, construct, criterion-related
   validity                                                4           6                 5          15 (30%)
5. Item analysis: item characteristics, distracter
   analysis, item discrimination, item
   characteristic curves                                   4           7                 4          15 (30%)
Total                                                  15 (30%)    20 (40%)          15 (30%)          50

Source: Riaz, M.N. (2008). Test Construction: Development and Standardization of Psychological Tests in Pakistan. Islamabad: HEC.


Page 18: Manual on Test Item Construction Techniques

Training Module on “Effective Test Item Construction Techniques”

Page 15

The above table shows a test outline for an achievement test based on "Principles of Psychological Measurement" (Part II, Chapters 4-10 of Psychological Testing by Murphy, K. R., & Davidshofer, C. O., 1998).

Table of Specifications A table of specifications is a two-way table that represents along one axis the content area/topics that the teacher has taught during the specified period and the cognitive level at which it is to be measured, along the other axis. In other words, the table of specifications highlights how much emphasis is to be given to each objective or topic.

Table 5: A Classroom Test in Experimental Psychology, Semester 1

                     Instructional Objectives: Bloom's Taxonomy
Subject/Content    Knowledge & Comprehension    Application    Analysis, Synthesis & Evaluation    Totals
Topic A                      10%                    20%                     10%                      40%
Topic B                      15%                    15%                     30%                      60%
Total                        25%                    35%                     40%                     100%

While writing the test items, it may not be possible to adhere rigorously to the weights assigned to each cell of a table of specifications like the one presented in Table 5. Thus, the weights indicated in the original table may need to be changed slightly during the course of test construction, if the teacher encounters sound reasons for such a change. For instance, the teacher may find it appropriate to modify the original test plan in view of data obtained from the experimental try-out of the new test.

Preparing a Table of Specifications

Table 6: Table of Specifications Showing Only One Topic Area (Problem Solving) and Three Levels of Cognitive Objectives (Knowledge, Comprehension, Application)

Content/Topics                                     Knowledge    Comprehension    Application    Number of Items
Problem solving in search of solutions                10%             -               -               10%
Barriers to effective problem solving                 10%            15%              -               25%
Approaches to problem solving                          -             15%             20%              35%
Culture, cognitive style, and problem solving          -              -              30%              30%
Total                                                 20%            30%             50%             100%

The above table shows that the first topic is to be measured only at the knowledge level, and the fourth topic at the application level. Second and third topics are to be measured at two different levels. Topic 2: Knowledge and Comprehension; topic 3: Comprehension and Application. Preparing a test according to the above table of specifications means that 20% of items in our test measure Knowledge, 30% measure Comprehension, 50% measure Application.

Test Length
The number of items that should constitute the final form of a test is determined by the purpose of the test or its proposed uses, and by the statistical characteristics of the items. Some of the important considerations in setting test length are:


1. The optimal number of items for a homogeneous test is lower than for a highly heterogeneous test.

2. Items that are meant to assess higher thought processes like logical reasoning, creativity, abstract thinking, etc., require more time than those dependent on the ability to recall important information.

3. Another important consideration in determining the length of the test, and the time required for it, is the validity and reliability of the test. The teacher has to determine the number of items that will yield maximum validity and reliability for the particular test.
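As a rough planning aid, per-item time allowances can be turned into an overall testing-time estimate. The sketch below is an illustration added to this handbook; the per-item minutes are common rules of thumb (assumptions, not figures from the manual) and should be adjusted to item difficulty and the group being tested.

    # Illustrative per-item time allowances, in minutes (assumed rules of thumb).
    minutes_per_item = {"true-false": 0.5, "multiple-choice": 1.0,
                        "short-answer": 2.0, "essay": 15.0}

    # A hypothetical draft test plan.
    planned_counts = {"true-false": 10, "multiple-choice": 30,
                      "short-answer": 5, "essay": 1}

    total_minutes = sum(minutes_per_item[fmt] * n for fmt, n in planned_counts.items())
    print(f"Estimated testing time: {total_minutes:.0f} minutes")  # 60 minutes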

General Principles of Writing Questions for an Achievement Test
Different types of questions can be devised for an achievement test, for instance, multiple choice, fill-in-the-blank, true-false, matching, short answer and essay. Although each type of question is constructed differently, the following principles apply to constructing questions and tests in general:

1. Instructions for each type of question must be simple and brief.

2. Questions must be written in simple language. If the language is difficult or ambiguous, even a student with strong language skills and a good vocabulary may answer incorrectly if his or her interpretation of the question differs from the author's intended meaning.

3. Test items must assess the specific ability or the comprehension of content developed during the course of study.

4. Write the questions as you teach, or even before you teach, so that your teaching may be aimed at significant learning outcomes.

5. Devise questions that call for comprehension and application of knowledge and skills.

6. Some of the questions must aim at appraising examinees' ability to analyze, synthesize, and evaluate novel instances of the concepts. If the instances are the same as those used in instruction, students are only being asked to recall (knowledge level).

7. Questions should be written in different formats, e.g., multiple-choice, completion, true-false, short answer, etc., to maintain the interest and motivation of the students.

8. Prepare alternate forms of the test to deter cheating and to provide for make-up testing (if needed).

9. The items should be phrased so that the content, rather than the format, of the statements determines the answer. Sometimes an item contains "specific determiners" which provide an irrelevant cue to the correct answer. For example, statements that contain terms like always, never, entirely, absolutely, and exclusively are much more likely to be false than true, whereas terms such as may, sometimes, as a rule, and in general are much more likely to be true. Care should also be taken to avoid double negatives, complicated sentence structures, and unusual words. (A simple screening aid for such cue words is sketched after this list.)

10. The difficulty level of the items should be appropriate for the ability level of the group. Optimal difficulty for true-false items is about 75 percent, for five-option multiple-choice questions about 60 percent, and for completion items approximately 50 percent (see the sketch after this list). However, difficulty in itself is not an end; the item content should be determined by the importance of the subject matter. It is desirable to place a few easy items at the beginning to motivate students, particularly those of below-average ability.

11. The items should be devised in such a manner that different taxonomy levels are evaluated. Moreover, achievement tests should be power tests, not speed tests.

12. Items pertaining to a specific topic, or of a particular type, should be placed together in the test. Such grouping facilitates scoring and evaluation, and helps examinees answer items that are similar in content and format without fluctuation of attention or repeated changes of mind-set.

13. Directions to the examinees should be as simple, clear, and precise as possible, so that even students of below-average ability can clearly understand what they are expected to do.

14. Scoring procedures must be clearly defined before the test is administered.

15. The test constructor must clearly state the optimal testing conditions for test administration.

16. Item analysis should be carried out so that the necessary changes can be made if any ambiguity is found in the items.
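The sketch below (an illustration added to this handbook; the function names and word lists are hypothetical) shows how points 9 and 10 can be operationalized: a crude scan of a draft statement for specific determiners, and the common rule of thumb that optimal item difficulty lies midway between the chance score and a perfect score, which reproduces the 75, 60 and 50 percent figures above.

    # Cue words that tend to make a true-false statement false (point 9) ...
    FALSE_CUES = {"always", "never", "entirely", "absolutely", "exclusively"}
    # ... and words/phrases that tend to make it true.
    TRUE_CUES = {"may", "sometimes", "as a rule", "in general"}

    def find_specific_determiners(statement):
        """Return cue words found in a draft item (crude substring check)."""
        text = statement.lower()
        return [cue for cue in sorted(FALSE_CUES | TRUE_CUES) if cue in text]

    def optimal_difficulty(chance_score):
        """Midpoint between chance-level and perfect performance (point 10)."""
        return (chance_score + 1.0) / 2

    print(find_specific_determiners("Reliability always implies validity."))  # ['always']
    print(optimal_difficulty(0.5))  # 0.75 for true-false items
    print(optimal_difficulty(0.2))  # 0.60 for five-option multiple choice
    print(optimal_difficulty(0.0))  # 0.50 for completion items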

Table of Specifications (Take-Home Activity)
Participants will be asked to work individually and prepare a table of specifications for one of the courses they are teaching.


Day 2, Session 1

a. Presentations based on the home assignment.

b. Participants will share their drafts with group members for discussion and further input.

Types and Techniques of Test Item Writing

Session 2 & 3

Objectives of the Sessions

To develop understanding of the essential steps in test construction for ensuring and enhancing validity and reliability.

To enable the participants to understand the distinctions between a variety of achievement test items, their characteristics, and the appropriate usage of each.

To apprise the participants of the techniques of writing good test items.

To enable the participants to understand scoring procedures and the meaning of scores.

To develop understanding of, and appreciation for, scientific ways of awarding and using grades.

What is a Test?

A test is an instrument or a tool. It follows a systematic procedure for measuring a sample of behavior by posing a set of questions in a uniform manner. It is an attempt to measure what a person knows or can do at a particular point in time. Furthermore, a test answers the question of how well the individual performs, either in comparison with others or in comparison with a domain of performance tasks.

Achievement Test
A test designed to appraise what the individual has learned as a result of planned previous experience or training is an achievement test. Since it relates to what has already been learnt, its frame of reference is the present or the past.

Basic Assumptions
Preparation of an achievement test assumes that the content and/or skill domain covered by the test can be specified in behavioral terms, and that the knowledge and skill to be measured can be specified in a manner that is readily communicable to other persons. It is important that the test measures the important goals rather than peripheral or incidental ones. It also assumes that the test takers have had the opportunity to learn the material covered by the test. Achievement tests are designed specifically to measure the degree of accomplishment in some particular educational or training experience. They are designed to measure the knowledge and skills developed in a relatively circumscribed area (domain). This area may be as narrow as one day's class assignment or as broad as several years' study.


Achievement tests attempt to measure what a person knows or can do at a particular point in time. Furthermore, our reference is usually to the past; that is, we are interested in what has been learned as a result of a particular course or experience, or a series of experiences.

Step 2: Preparing the Test According to Plan
The next step after planning the test is preparing it in accordance with the plan. This step mainly deals with the development of items and with organizing them in the form of a test. Before we discuss preparing the test, it seems quite reasonable to talk about the different types of test items, their characteristics, uses and limitations.

Items Commonly Used for Tests of Achievement
Two major types of items have been identified:

1. Constructed Response / Supply items
2. Structured Response / Select items

1. Constructed Response / Supply Items
In supply-type items, the question is framed so that the examinee has to supply or construct the answer in his or her own words. They generally include the following types:

Essay

Short Answer

Completion

2. Structured Response / Select Items
In select-type items, as the name suggests, the examinee is required to select the correct answer from among the given (structured) options. They are often called objective items. They include:

Alternate Response

Multiple-Choice

Matching

Constructed Response / Supply type items are to be dealt with in another module. Let us restrict ourselves here to the use, limitations and construction of Structured Response / Select type items.

Constructing Objective Test Items: Simple Forms (Structured Response / Select)
The construction of test items is a crucial step, for the validity of a classroom test is determined by the extent to which the performance to be measured is called forth by the test items. It is not enough to have knowledge of the subject matter, defined learning outcomes, or a psychological understanding of the students' mental processes, although all of these are prerequisites. The ability to construct high-quality test items requires knowledge of the principles and techniques of test construction, and skill in their application. Objective test forms typically measure relatively simple learning outcomes.


Alternative Response Items
An alternative response item, by definition, is one that offers two options to choose from. Such items often consist of a declarative statement that the examinee is asked to mark true or false, right or wrong, correct or incorrect, yes or no, agree or disagree, or the like. Incomplete sentences providing two options for filling in the blank also fall into this category. A very common use of such items is to test knowledge of grammar, such as the appropriate use of tense, and also the contextual meaning of words, or the spelling of words that sound alike. In each case there are only two possible answers. The most common form such items take is the true-false question.

Uses of True-False Items
The most common use of the true-false item is in measuring the examinee's ability to identify the correctness of statements of fact, definitions of terms, statements of principles, and the like, and also to distinguish fact from opinion. Some true-false tests include numerous opinion statements to which the examinee is asked to respond true or false; however, there is no objective basis for determining whether a statement of opinion is true or false. In most situations, when a student is the respondent, he or she guesses what opinion the teacher holds and marks the answers accordingly. This, of course, is undesirable from every standpoint: testing, teaching, and learning. An alternative procedure is to attribute the opinion to some source, making it possible to mark the statements true or false with some objectivity. This allows measuring knowledge concerning the beliefs that may be held by an individual, or the values supported by an organization or institution.

Another aspect of understanding that can be measured by the true-false item is the ability to recognize cause-and-effect relationships. This type of item usually contains two true propositions in one statement, and the examinee is to judge whether the relationship between them is true or false. The true-false item can also be used to measure some simple aspects of logic.

Criticism
A common criticism of the true-false item is that an examinee may be able to recognize a false statement as incorrect but still not know what is correct. To overcome this difficulty, some teachers prefer to have students rewrite all false statements so that they become true.

Advantage A major advantage of true-false items is that they are efficient.

Students can typically respond to roughly three true-false items in the time it takes to respond to two multiple choice items.

True-false items have utility for measuring a broad range of verbal knowledge.

A wide sampling of course material can be obtained.

Limitations
The limitations of true-false items lie in the types of learning outcomes they can measure.

The ease of constructing true-false items is, unfortunately, more illusory than real.

True-false items are not especially useful beyond the knowledge area.


The exceptions to this seem to be distinguishing between fact and opinion and identifying cause-and-effect relationships. These two outcomes measured by the true-false item can be measured more effectively by other forms of selection items, especially the multiple-choice form.

Another factor that limits the usefulness of the true-false item is its susceptibility to guessing.

Successful guessing on true-false items has at least two deleterious effects (illustrated in the sketch below):

1. The reliability of each item is low.
2. Such a test has very little diagnostic value.
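To make the guessing concern concrete, here is a minimal Python sketch (my own illustration, not part of the manual) of the score a student can expect from blind guessing alone; with k answer choices, each guess succeeds with probability 1/k.

    def expected_chance_score(num_items: int, num_options: int) -> float:
        # Each blind guess succeeds with probability 1/num_options.
        return num_items / num_options

    # On a 50-item test, a pupil who knows nothing still expects:
    print(expected_chance_score(50, 2))   # true-false items: 25.0 correct
    print(expected_chance_score(50, 4))   # 4-option MCQs:    12.5 correct

Half of a true-false test can be answered correctly by chance alone, which is why individual true-false items carry so little reliable or diagnostic information.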

Another concern that needs to be considered in the design of tests with true-false items is student response sets.

A response set is a consistent tendency to follow a certain pattern in responding to test items.

Note: True-false items are most useful in situations in which there are only two possible alternatives (for instance right, left, more, less, who, whom, and so on) and special uses such as distinguishing fact from opinion, cause from effect, superstition from scientific belief, relevant from irrelevant information, valid conclusions, and the like.

Suggestions for Constructing True-False Items

Avoid trivial statements.

Avoid broad general statements.

Avoid the use of negative statements, especially double negatives. When a negative word must be used, it should be underlined or put in italics so that students do not overlook it.

Avoid complex sentences.

Avoid including two ideas in one statement, unless cause-effect relationships are being measured.

Avoid using opinions that are not attributed to a source, unless the ability to identify opinion is being specifically measured.

Avoid using true statements and false statements that are unequal in length.

Avoid using disproportionate numbers of true statements and false statements (a sketch for checking this and the preceding suggestion follows the list).
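The last two suggestions can be checked mechanically. The following minimal Python sketch (my own illustration; the audit_tf_key helper and the sample key are hypothetical) reports the proportion of TRUE answers and the longest run of identical answers in a true-false key.

    from itertools import groupby

    def audit_tf_key(key: str) -> None:
        true_share = key.count("T") / len(key)
        longest_run = max(len(list(group)) for _, group in groupby(key))
        print(f"Proportion of TRUE answers: {true_share:.0%}")
        print(f"Longest run of identical answers: {longest_run}")

    audit_tf_key("TFFTTFTFFTTF")   # a hypothetical 12-item key

A key with a heavily skewed proportion, or with long runs of the same answer, invites response sets of the kind discussed above.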

Short-Answer / Completion Items
The short-answer item and the completion item are both supply-type test items; yet they are included here because of their simplicity. They can be answered by a word, phrase, number, or symbol. The short-answer item uses a direct question, whereas the completion item consists of an incomplete statement.

Short-answer item is especially useful for measuring problem-solving ability in science and mathematics.

Complex interpretations can be made when the short- answer item is used to measure the ability to interpret diagrams, charts, graphs, and pictorial data.

When short-answer items are used the question must be stated clearly and concisely. It should be free from irrelevant clues, and require an answer that is both brief and definite.


Advantages of Short-Answer Items
The short-answer item is one of the easiest to construct. It is used almost exclusively to measure recall of memorized information. Because the student must supply the answer, the possibility of obtaining the correct answer by guessing is reduced.

Limitations of Short-Answer Items
The short-answer item is not suitable for measuring complex learning outcomes. Unless the question is carefully phrased, many answers of varying degrees of correctness must be considered for full or partial credit; hence it is difficult to score. These limitations are less troublesome when the answer is to be expressed in numbers or symbols, as in the physical sciences or mathematics.

Constructing Short-Answer Items The following suggestions will help to avoid possible pitfalls and provide greater assurance that the items will function as intended.

Word the item so that the required answer is both brief and specific. A direct question is generally more desirable than an incomplete statement.

Do not take statements directly from textbooks to use as a basis for short-answer items.

If the answer is to be expressed in numerical units, indicate the type of answer wanted.

Blanks for answers should be equal in length and placed in a column to the right of the question.

Do not include too many blanks.

Multiple Choice Questions

What is a Multiple Choice Item? The multiple choice item (MCQ) consists of two distinct parts:

1. The first part, which contains the task or problem, is called the stem of the item. The stem may be presented either as a question or as an incomplete statement; the form makes no difference as long as it presents a clear and specific problem to the examinee.

2. The second part presents a series of options or alternatives. Each option represents a possible answer to the question. In the standard form, one option is the correct or best answer, called the keyed response; the others are misleads or foils, called distracters.

The number of options used differs from one test to another. An item must have at least three answer choices to be classified as a multiple choice item; the typical pattern is to have four or five choices to reduce the probability of guessing the answer. A good item makes all of the presented options look like probable answers, at least to those examinees who do not know the answer.

Terminology: Multiple Choice Questions
1. Stem: presents the problem.
2. Keyed Response: the correct or best answer.


3. Distracters: appear to be reasonable answers to the examinee who does not know the content

4. Options: include the distracters and the keyed response.

Characteristics of Multiple Choice Questions
Multiple choice items are considered the best of the item types that can be scored objectively:

1. The MCQ is the most flexible of the objective type items. It can be used to appraise the achievement of any educational objectives that can be measured by a paper-and-pencil test except those relating to skill in written expression and originality.

2. An ingenious and talented item writer can construct an MCQ to measure a variety of educational objectives from rote learning to more complex learning outcomes like comprehension, interpretation, application of knowledge and also those that require the skills of analysis or synthesis to arrive at the correct answer.

3. Moreover, the chances of getting a correct response by guessing are significantly reduced.

However, good multiple choice items are difficult to construct. A thorough grasp of the subject matter and skillful application of certain rules is needed to construct good multiple choice items.

Desired Characteristics of Items

Desirable difficulty level

Ability to discriminate between high and low performers

Effective distracters

Advantages of MCQ

Wide sampling of content

The problem or the task is well structured or clearly defined.

Flexible Difficulty Level

Efficient scoring of items

Objective scoring

Provide scores easily understood and transformed as needed: multiple choice tests provide scores in metrics that are familiar to most score users, i.e. percentiles and grade-equivalent scores.

Limitations of MCQ
Multiple choice items, despite their advantages over other item types, have some serious limitations as well:

They take time to construct.

They are susceptible to guessing.

They do not provide any diagnostic information.

Rules for Constructing Multiple-Choice Items

1. Be sure that the stem clearly formulates a problem. The stem should be worded so that the examinee clearly understands the question being asked before he reads the answer choices.


2. The stem should be written either in direct-question form or in incomplete-statement form.

3. The stem of the item should present only one problem. Two concepts must not be combined to form a single stem.

4. Include as much of the item in the stem as possible and keep the options short: this leads to economy of space, economy of reading time, and a clear statement of the problem. Hence, include most of the information in the stem and avoid repeating it in the options. For example, if an item relates to the association of a term with its definition, it is better to include the definition in the stem and several terms as options, rather than to present the term in the stem and several definitions as alternatives.

5. Unnecessary words or phrases should not be included in the stem. Such words add to the length and complexity of the stem but do not enhance its meaningfulness. The stem should be written in a simple, concise and clear form.

6. Avoid the use of negative words in the stem of the item. There are times when it is important for the examinee to detect errors or to know exceptions; for these purposes the use of 'not' or 'except' is sometimes justified. When a negative word is used in a stem, it should be highlighted.

7. Use novel material in formulating problems to measure understanding or the ability to apply principles. Do not focus so closely on rote memory of the text that measurement of the ability to use information is neglected.

8. Use plausible distracters as alternatives. If an examinee who does not know the correct answer is not distracted by a given alternative, that alternative is not plausible, and it adds nothing to the functioning of the item.

9. Be sure that the item contains no unintentional clues to the correct answer.

10. The correct answer should appear in each position in roughly equal numbers. While constructing multiple-choice items, some examiners tend to place the correct alternative in the first position, some in the middle, and others at the end. Such tendencies should be consciously controlled; one way to do so is sketched after this list.

11. Avoid using 'none of the above', 'all of the above', 'both a and b', etc. as options for an MCQ.

12. Alternatives should be grammatically consistent with the stem. Grammatical inconsistency provides irrelevant clues.
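Here is the sketch mentioned under rule 10: a minimal Python illustration (my own; the item data are hypothetical) of shuffling options so that the keyed response falls in each position in roughly equal numbers.

    import random
    from collections import Counter

    # Hypothetical items; by convention here, the first option listed is the keyed response.
    items = [
        {"stem": "2 + 2 = ?", "options": ["4", "3", "5", "6"]},
        {"stem": "The capital of Pakistan is:", "options": ["Islamabad", "Lahore", "Karachi", "Quetta"]},
    ]

    key_positions = Counter()
    for item in items:
        keyed = item["options"][0]
        random.shuffle(item["options"])             # randomize option order in place
        item["key"] = item["options"].index(keyed)  # record the new key position (0-3)
        key_positions[item["key"]] += 1

    # Over a full test, each position should receive roughly equal counts.
    print(key_positions)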

A variety of multiple choice items

Matching Exercises

A matching exercise consists of two parallel columns, with each word, number, or symbol in one column being matched to a word, sentence, or phrase in the other column.

Items in the column for which a match is sought are called premises, and the items in the column from which the selection is made are called responses.


Uses of Matching Exercises

When you have a number of questions of the same type (homogeneous), it is advisable to frame a matching item in place of a number of similar MCQs.

Whenever learning outcomes emphasize the ability to identify the relationship between two things and a sufficient number of homogeneous premises and responses can be obtained, a matching exercise seems most appropriate.

Hence it is suggested to use only homogeneous material in a single matching exercise, e.g.:

Inventions and inventors
Authors and books
Scientists and their contributions

The major advantage of the matching exercise is its compact form, which makes it possible to measure a large amount of related factual material in a relatively short time.

Limitations

It is restricted to the measurement of factual information.

Another limitation is the difficulty of finding homogeneous material that is significant from the viewpoint of our objectives and learning outcomes.

Suggestions for Constructing Matching Exercises

Use only homogeneous material in a single matching exercise.

Include an unequal number of responses and premises and instruct the student that responses may be used once, more than once, or not at all.

Keep the list of items to be matched brief, and place the shorter responses on the right.

Arrange the list of responses in logical order. Place words in alphabetical order and numbers in sequence.

Indicate in the directions the basis for matching the responses and premises.

This avoids ambiguity and confusion, and saves testing time.

Place all of the items for one matching exercise on the same page.

Context Dependent Items
A variety of multiple choice items may be used to measure higher-level learning achievement, such as comprehension, interpretation, extrapolation, application, reasoning, and analysis, and to help students focus on the items/test. The most commonly used variation is the context-dependent item. The context or stimulus is selected in accordance with the nature of the discipline/subject and the learning outcome to be measured. The context may be in the form of a:

Paragraph

Diagram

Graph


Picture

One context/stimulus may be followed by one or more multiple choice items. Some examples of context-dependent items are given below.

Paragraph as a Context
A paragraph as a context is used to measure learning outcomes relating to reading comprehension, i.e. understanding the meaning/theme of the paragraph, understanding contextual meanings of words, relating and synthesizing various pieces of information given in the paragraph, etc.

Diagram/Picture as a Context

The questions using diagram may measure not only knowledge but understanding and application as well.

Like other contexts, a diagram may be followed by a number of MC items; it may also require the examinee to label specified parts of the diagram, or even ask about their functions.

Graph as a Context

Reading and interpreting graphs is an ability useful in most social and physical sciences. MCQs may be asked to assess the desired achievement in the respective field.

Step 2: Preparing the Test According to Plan

Item Development: Items that can be scored objectively: true-false, matching, and multiple-choice types and their variations. In preparing an objective test, care should be taken to make the items clear, precise, grammatically correct, and written in language suitable to the reading level of the group for whom the test is intended. All information and qualifications needed to select a reasonable answer should be included, but non-functional or stereotyped words and phrases should be avoided. General recommendations that apply to all kinds of test exercises:

Keep the test plan in view as test exercises are written. Items should be addressed to the cells in the blueprint / the test plan.

Draft the test items some time in advance, and then review them.

Have test items examined and critiqued, in the light of the rules for writing items, by one or more colleagues.

Check that the item has an answer that would be agreed upon by experts. If possible, one of the experts may take the test, and the expert's responses are compared with the keyed answers. This way any error can be detected, and the test developer can put confidence in the key thus finalized.

Prepare a surplus of test exercises, so that an adequate number of good items will be available for the final version of the test. A sketch of one way to track coverage against the test plan follows this list.
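Here is the coverage-tracking sketch mentioned above, in Python (my own illustration; the blueprint cells and item tags are hypothetical). It tallies drafted items against the cells of the table of specifications so under-filled cells stand out.

    from collections import Counter

    # Hypothetical blueprint: (content area, objective level) -> items required.
    blueprint = {
        ("Algebra", "Knowledge"): 4,
        ("Algebra", "Application"): 6,
        ("Geometry", "Knowledge"): 3,
        ("Geometry", "Application"): 5,
    }

    # Each drafted item is tagged with its blueprint cell.
    drafted = [("Algebra", "Knowledge"), ("Algebra", "Application"),
               ("Geometry", "Application"), ("Algebra", "Application")]

    written = Counter(drafted)
    for cell, required in blueprint.items():
        print(f"{cell}: {written[cell]} of {required} items drafted")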

Assembling a Test
After the items have been written and selected, they are organized in the form of a test.


Arranging Items in the Test
Items of the same format may be placed together; each item type requires a specific set of directions and a somewhat different mental set on the part of the examinee. So far as possible, within each item type, items dealing with the same content may be grouped together. The examinee will then be able to concentrate on a single domain at a time rather than shifting back and forth among content areas. Furthermore, the examiner will have an easier job of analyzing the results, as it will be easy to see at a glance whether errors are more frequent in one content area than another. Items may be arranged so that difficulty progresses from easy to hard. Items should be arranged in the test booklet so that answers follow no set pattern. A sketch of one way to order items follows.
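The following minimal Python sketch (my own illustration; the item records are hypothetical, and difficulty is assumed to be scaled so that lower values mean easier items) applies the arrangement advice above: group by format, then by content area, then order from easy to hard.

    # Hypothetical item records; lower difficulty values mean easier items.
    items = [
        {"format": "True-False", "content": "Algebra",  "difficulty": 0.20},
        {"format": "MCQ",        "content": "Geometry", "difficulty": 0.40},
        {"format": "MCQ",        "content": "Algebra",  "difficulty": 0.70},
        {"format": "MCQ",        "content": "Algebra",  "difficulty": 0.30},
    ]

    # Sort by format, then content, then increasing difficulty.
    arranged = sorted(items, key=lambda i: (i["format"], i["content"], i["difficulty"]))
    for item in arranged:
        print(item["format"], item["content"], item["difficulty"])

A final manual pass is still needed to confirm that the resulting answer key follows no set pattern.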

Test Instructions
The directions should be simple but complete. They should indicate the purpose of the test, the time limits, and the score value of each question. Write a set of directions for each item type used on the test, specifying what the respondent is expected to do and how to record the responses.

Answer Sheets Separate answer sheets, which are easier to score, can be used at high school level and beyond.

Test Length
Make sure that the length of the test is appropriate for the time limits. After the items have been reviewed and tentatively selected, it is important to check that the items measure a representative sample of the learning objectives and course content included in the test plan. Agreement between the test plan and the test ensures the content validity of the test.

Step 3: Test Administration and Use

Administration
All pupils must be given a fair chance to demonstrate their achievement. The physical and psychological environment should be conducive to their best efforts. Control all factors that might interfere with valid measurement: adequate workspace, quiet, proper light and ventilation are important. Pupils must be put at ease; tension and anxiety should be reduced to a minimum.

Scoring the Test
Scoring Key. If the pupils' answers are recorded on the test paper itself, the teacher may make a scoring key by marking the correct answers on a blank copy of the test. When separate answer sheets are used, a scoring stencil may be used: a blank answer sheet with holes punched where the correct answers should appear. Before the scoring procedure is applied, each test paper should be scanned to make sure that only one answer was marked for each item; any item containing more than one answer should be eliminated from scoring.

In scoring objective tests, each correct answer is usually counted as one point. When pupils are told to answer every item on the test, a pupil's score is simply the number of items answered correctly. Short-answer questions may sometimes require awarding partial credit and may pose problems in scoring; however, a detailed key prepared in advance avoids confusion. For each question, and for the test as a whole, the examiner may keep a tally of each kind of error the examinees make; a summary of these errors can then be used to plan instructional activities. A simple sketch of key-based scoring appears after this subsection.

Results
The raw score obtained by a pupil has no meaning on its own and cannot be directly interpreted. If a student obtains 75 marks out of 100, this tells us neither how s/he compares with other students, nor what s/he knows, nor what s/he does not know. The simplest meaning teachers often give to test scores is by assigning ranks, which become more interpretable when the total number of students in the class is known. Often, grades are assigned to give meaning to raw scores by comparing individual performance with the whole group that has taken the test; in educational institutions this is most often the class. For criterion-referenced tests, of course, absolute grading is used.

Awarding Grades

Relative grading
Letter grades are typically assigned on the basis of performance in relation to other group members. Some teachers assign them on the normal curve, but this is not recommended by experts (Linn and Gronlund), for classes are usually too small to attain a normal distribution. They suggest that before letter grades are assigned, the proportions of As, Bs, Cs, Ds, and Fs to be used should be determined, in the light of a consistent policy of the institution or the system.

The following distribution is recommended for an introductory course, for the purpose of illustration only; a sketch of rank-based grade assignment follows the distribution.

A = 10-20 % of the students. B = 20-30 % of the students. C = 30-50 % of the students. D = 10-20 % of the students. F = 0-10 % of the students.
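The following Python sketch (my own illustration; the scores are hypothetical, and the exact shares are taken from the middle of the illustrative ranges above) assigns letter grades by rank according to pre-decided proportions.

    proportions = [("A", 0.15), ("B", 0.25), ("C", 0.40), ("D", 0.15), ("F", 0.05)]
    scores = {"S1": 91, "S2": 67, "S3": 74, "S4": 58, "S5": 83,
              "S6": 71, "S7": 49, "S8": 77, "S9": 62, "S10": 88}

    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    grades, start = {}, 0
    for letter, share in proportions:
        count = round(share * len(ranked))
        for pupil in ranked[start:start + count]:
            grades[pupil] = letter
        start += count
    for pupil in ranked[start:]:   # any remainder from rounding gets the lowest grade
        grades[pupil] = proportions[-1][0]

    print(grades)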

The educational institution should decide on a consistent grading policy for all its departments. Grades may be awarded on the basis of percentiles, or a standard score system may be used.

In relative grading, grades provide meaning to the scores in terms of performance in reference to the group.

When grades are assigned to the obtained scores, the raw scores lose their significance.

In most systems where letter grades are used, grades are assigned numerical values, such as A=4; B=3; C=2; D=1; and F=0 (fail).


The grade point for a course is obtained by multiplying the grade value by its credit hours.

Finally, the grade point average (GPA), the average of the grade points across all the courses, is found.

The GPA, a numerical value, is often converted into an equivalent letter grade.
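A short Python sketch of this computation (my own illustration; the course list is hypothetical, and the usual credit-weighted average is assumed):

    # Hypothetical courses: (grade value, credit hours); A=4, B=3, C=2, D=1, F=0.
    courses = [(4, 3), (3, 3), (2, 4), (3, 2)]

    grade_points = [value * credits for value, credits in courses]  # grade point per course
    total_credits = sum(credits for _, credits in courses)
    gpa = sum(grade_points) / total_credits                         # credit-weighted average

    print("Grade points:", grade_points)   # [12, 9, 8, 6]
    print(f"GPA: {gpa:.2f}")               # 2.92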

Absolute grading
Assigning grades on an absolute basis involves comparing a pupil's performance to pre-specified standards set by the teacher. These standards are usually concerned with the degree of mastery to be achieved, and may be specified as the percent of correct answers to be obtained on a test designed to measure a clearly defined set of learning tasks (competencies), as on a criterion-referenced test.

Practice, though not scientific
There are instances where pre-specified standards are used to assign letter grades directly on the basis of raw scores. For example, in Pakistan the Boards of Secondary and Intermediate Education assign:

A1 on 80 % marks or beyond
A on 70 - 79 % marks
B on 60 - 69 % marks
...

Such a grade at best tells us that the score obtained by a student lies within a certain range. Like a raw score, it tells us neither how a pupil compares with other students, nor what he knows, nor what he does not know. Experts in the field recognize this system neither as absolute grading nor as relative grading. Though not scientifically recognized, this system of grading is practiced in many educational settings. A sketch of this mark-to-grade mapping follows.
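The mapping is easy to state in code. In the Python sketch below (my own illustration), only the A1, A and B bands are taken from the text; the lower bands are hypothetical placeholders.

    def board_grade(percent: float) -> str:
        if percent >= 80:
            return "A1"
        if percent >= 70:
            return "A"
        if percent >= 60:
            return "B"
        if percent >= 50:
            return "C"    # hypothetical lower band
        return "Fail"     # hypothetical lower band

    print(board_grade(75))   # -> A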

Review Test Results
Assigning and communicating the grades to the class is not enough. It is important that the teacher returns and reviews test results with the students. Feedback on performance has special value for motivating students to improve. Moreover, learning from one's mistakes is usually very effective.


How to Prepare Effective Essay Questions?
Objectives of the Sessions

1. To provide faculty with information and guidelines that help them better utilize the advantages of essay questions in assessing student performance, along with guidelines for dealing with the challenges of essay questions.

2. To help understand the main advantages and limitations of essay questions and the common misconceptions associated with their use.

3. To help distinguish between learning outcomes that are appropriately assessed using essay questions and outcomes that are likely to be better assessed by other means.

4. To evaluate existing essay questions using commonly accepted criteria.

5. To improve poorly written essay questions by using the information in this booklet to identify flaws in existing questions and correct them.

6. To construct well-written essay questions that assess given objectives.

How to Prepare Better Essay Questions?

What is an Essay Question?
There are two major purposes for using essay questions, addressing different learning outcomes. One purpose is to assess students' understanding of subject-matter content; the other is to assess students' writing abilities. These two purposes are so different in nature that it is best to treat them separately.

An essay question is "…a test item which requires a response composed by the examinee, usually in the form of one or more sentences, of a nature that no single response or pattern of responses can be listed as correct, and the accuracy and quality of which can be judged subjectively only by one skilled or informed in the subject."

An essay question should meet the following criteria:

1. Requires examinees to compose rather than select their response.

Multiple-choice questions, matching exercises, and true-false items are all examples of selected response test items because they require students to select an answer from a list of possibilities provided by the test maker, whereas essay questions require students to construct their own answer.

2. Elicits student responses that must consist of one or more sentences.

Does the following example require student responses to consist of one or more sentences?

Example A: How do you feel about the removal of prayer from the public school system?

In Example A, it is possible for a student to answer the question in one word; for instance, a student could write "good" or "bad". Moreover, this is a poor example for testing purposes because there is no basis for grading students' personal preferences and feelings. The following example improves upon Example A in such a way that it elicits a response of one or more sentences that can be graded.


Consider the following argument in favor of organized prayer in school.

School prayer should be allowed because national polls repeatedly indicate that the majority of Pakistanis are in favor of school prayers. Moreover, statistics show a steady moral decline in the country since the banning of organized prayer in school: drug use, divorce rates, and violent crime have all increased since the ban.

Analyze the argument by explaining which assumptions underlie it.

3. No single response or single response pattern is correct.

Which example question below allows for a variety of correct answers, Example B or Example C?

Example B: What was the full name of the man who assassinated President Abraham Lincoln?

Example C:

State the full name of the man who assassinated President Abraham Lincoln and explain why he committed the murder.

There is just one single correct answer to Example B because the students need to write the full name of the man who assassinated President Abraham Lincoln. The question assesses verbatim recall or memory and not the ability to think. For this reason, Example B would not be considered a typical essay question. Example C assesses students’ understanding of the assassination and it is more effective at providing students the opportunity to think and to give a variety of answers. Answers to this question may vary in length, structure, etc.

4. The accuracy and quality of students' responses to essays must be judged subjectively by a competent specialist in the subject.

The nature of essay questions is such that only competent specialists in the subject can judge to what degree student responses to an essay are complete, accurate, correct, and free from extraneous information. Ineffective essay questions allow students to generalize in their responses without being specific and thoughtful about the content matter. Effective essay questions elicit a depth of thought from students that can only be judged by someone with the appropriate experience and expertise in the content matter. Thus, content expertise is essential for both writing and grading essay questions.

Which of the following sample questions prompts student responses that can only be judged subjectively by a subject-matter expert?

Example D: Explain how Arabs treated women before the advent of Islam.

Example E: As mentioned in class, list the main ways women were treated before the advent of Islam.


In order to grade a student's response to either of the above examples, the grader needs to know the ways women were treated in the specified period in Arab countries; it takes subject-matter expertise to grade an essay response to such a question.

Review: What is an Essay Question?
An essay question is a test item which contains the following four elements:

1. Requires examinees to compose rather than select their response.
2. Elicits student responses that consist of one or more sentences.
3. No single response or single response pattern is correct.
4. The accuracy and quality of students' responses must be judged subjectively by a competent specialist in the subject.

Advantages, Limitations, and Common Misconceptions of Essay Questions
In order to use essay questions effectively, it is important to understand the following advantages, limitations and common misconceptions.

Advantages

1. Essay questions provide an effective way of assessing complex learning outcomes that cannot be assessed by other commonly used paper-and-pencil assessment procedures.

Essay questions allow you to assess students' ability to synthesize ideas, to organize and express ideas, and to evaluate the worth of ideas. These abilities cannot be effectively assessed directly with other paper-and-pencil test items.

2. Essay questions allow students to demonstrate their reasoning.

Essay questions not only allow students to present an answer to a question but also to explain how they arrived at their conclusion. This allows teachers to gain insights into a student's way of viewing and solving problems. With such insights teachers are able to detect problems students may have with their reasoning process and help them overcome those problems.

3. Essay questions provide authentic experience. Constructed responses are closer to real life than selected responses.

Problem solving and decision-making are vital life competencies. In most cases, these skills require the ability to construct a solution or decision rather than selecting a solution or decision from a limited set of possibilities. It is not very likely that an employer or customer will give a list of four options to choose from when he/she asks for a problem to be solved. In most cases, a constructed response will be required. Hence, essay items are closer to real life than selected response items because in real life students typically construct responses, not select them.

Limitations

1. Essay questions necessitate testing a limited sample of the subject matter, thereby reducing content validity.
2. Essay questions have limitations in reliability.
3. Essay questions require more time for scoring student responses.
4. Essay questions provide practice in poor or unpolished writing.


Common Misconceptions

1. By their very nature essay questions assess higher-order thinking.

Whether or not an essay item assesses higher-order thinking depends on the design of the question and how students' responses are scored. An essay question does not automatically assess higher-order thinking skills; it is possible to write essay questions that simply assess recall. Also, if a teacher designs an essay question meant to assess higher-order thinking but then scores students' responses in a way that only rewards recall ability, that teacher is not assessing higher-order thinking.

Exercise

Compare the following two examples and decide which one assesses higher-order thinking skills.

Example A

What are the major advantages and limitations of essay questions? Example B

Given their advantages and limitations, should an essay question be used to assess the following intended learning outcome? In answering this question provide brief explanations of the major advantages and limitations of essay questions. Clearly state whether you think an essay question should be used to assess students’ achievement of the given intended learning outcome and explain the reasoning for your judgment.

Intended learning outcome: Evaluate the reasons why the nursing process is an effective process for serving clients.

Example A assesses recall of factual knowledge, whereas Example B requires more of students: it requires them to recall facts, to make an evaluative judgment, and to explain the reasoning for their judgment.

2. Essay questions are easy to construct.

Essay questions are easier to construct than multiple-choice items because teachers don't have to create effective distracters. However, that doesn’t mean that good essay questions are easy to construct. They may be easier to construct in a relative sense, but they still require a lot of effort and time. Essay questions that are hastily constructed without much thought and review usually function poorly.

3. The use of essay questions eliminates the problem of guessing.

One of the drawbacks of selected response items is that students sometimes get the right answer by guessing which of the presented options is correct. This problem does not exist with essay questions because students need to generate the answer rather than identifying it from a set of options provided. At the same time, the use of essay questions introduces bluffing, another form of guessing. Some students are adept at using various methods of bluffing (vague generalities, padding, name-dropping, etc.) to add credibility to an otherwise vacuous answer. Thus, the use of essay questions changes the nature of the guessing that occurs, but does not eliminate it.


4. Essay questions benefit all students by placing emphasis on the importance of written communication skills.

Written communication is a life competency that is required for effective and successful performance in many vocations. Essay questions challenge students to organize and express subject matter and problem solutions in their own words, thereby giving them a chance to practice written communication skills that will be helpful to them in future vocational responsibilities. At the same time, the focus on written communication skills is also a serious disadvantage for students who have marginal writing skills but know the subject-matter being assessed. To the degree that students who are knowledgeable in the subject obtain low scores because of their inability to write well, the validity of the test scores will be diminished.

5. Essay questions encourage students to prepare more thoroughly.

Some research seems to indicate that students are more thorough in their preparation for essay questions than in their preparation for objective examinations such as multiple choice tests.

Review: Advantages, Limitations, and Common Misconceptions of Essay Questions

Advantages
Essay questions:

1. Provide an effective way of assessing complex learning outcomes that cannot be assessed by other commonly used paper-and-pencil assessment procedures.

2. Allow students to demonstrate their reasoning.
3. Provide authentic experience. Constructed responses are closer to real life than selected responses.

Limitations
Essay questions:

1. Necessitate testing a limited sample of the subject matter, thereby reducing content validity.

2. Have limitations in reliability.
3. Require more time for scoring student responses.
4. Provide practice in poor or unpolished writing.

Common Misconceptions
1. By their very nature essay questions assess higher-order thinking.

2. Essay questions are easy to construct.

3. The use of essay questions eliminates the problem of guessing.

4. Essay questions benefit all students by placing emphasis on the importance of written communication skills.

5. Essay questions encourage students to prepare more thoroughly.

When to Use Essay Questions?

When it is more important that the students construct rather than select the answer.

When a teacher has sufficient resources and/or help (time, teaching assistants) to score the student responses to the essay question(s)

When “the group to be tested is small.”

When a teacher is “more confident of [his/her] ability as a critical and fair reader than as an imaginative writer of good objective test items.”


Concerning the ranking of their students based on test scores, teachers should know that some research suggests that students are ranked about the same on essay questions and multiple-choice questions when test results are compared (Chase & Jacobs, 1992).

For each intended learning outcome below, the directive verb suggests whether objective items, essay questions, or either would be the appropriate method of assessment:

Intended Learning Outcome (Students will:)                    Assess with
 1. Analyze the function of humor in Shakespeare's
    "Romeo and Juliet".                                       Objective or essay
 2. Describe the attributes of a democracy.                   Objective or essay
 3. Distinguish between learning outcomes appropriately
    assessed using essay questions and outcomes better
    assessed by some other means.                             Objective or essay
 4. Evaluate the impact of the Industrial Revolution
    on the family.                                            Essay
 5. Know the definition for the Law of Demand.                Objective items
 6. Predict the outcome of an experiment.                     Objective or essay
 7. Propose a solution for the disposal of batteries
    that is friendly to users and the environment.            Essay
 8. Recall the major functions of the human heart.            Objective items
 9. Understand the "Golden Rule".                             Objective or essay
10. Use a theory in literature to analyze a poem.             Essay

The directive verb in each intended learning outcome provides clues about the method of assessment that should be used. This can be seen by taking a closer look at some of the sample intended learning outcomes above.

For example, the verb "recall" means to retrieve relevant knowledge from long-term memory. Students' ability to recall relevant knowledge can be most conveniently assessed through objectively scored test items; there is no need for students to explain or justify their answer when they are assessed on recall.

The verb "analyze" means to determine how parts relate to one another and to an overall structure or purpose. Students can demonstrate their ability to analyze the function of humor in Shakespeare's "Romeo and Juliet" either by describing the function of humor in their own words or by selecting the right or best answer among the options of a well-drafted multiple choice item.

The verb "evaluate" means to make judgments based on criteria and standards. To effectively assess students' ability to evaluate the impact of the Industrial Revolution on the family, the assessment item needs to give students the opportunity not only to make an evaluative judgment but also to explain how they arrived at it. Hence, students' ability to evaluate should be assessed with essay items, because these allow students to explain the rationale for their judgment.

Review: When to Use Essay Questions
It is appropriate to use essay questions for the following purposes:

To assess students' understanding of subject-matter content


To assess higher-order thinking skills that cannot be adequately assessed by objectively scored test items.

To assess students' ability to construct rather than to select answers.

If an intended learning outcome could be assessed through either objective items or essay questions, use essay questions in the following situations:

When it is more important that the students construct rather than select the answer

When a teacher has sufficient resources and/or help (time, teaching assistants) to score the student responses to the essay question(s)

When the group to be tested is small.

When a teacher is more confident of his/her ability as a critical and fair reader than as an imaginative writer of good objective test items

Guidelines for Constructing Essay Questions
Students should have a clear idea of what they are expected to do after they have read the problem presented in an essay item. Below are specific guidelines that can help to improve existing essay questions and create new ones.

1. Clearly define the intended learning outcome to be assessed by the item.

Knowing the intended learning outcome is crucial for designing essay questions. If the outcome to be assessed is not clear, it is likely that the question will assess for some skill, ability, or trait other than the one intended. In specifying the intended learning outcome teachers clarify the performance that students should be able to demonstrate as a result of what they have learned. The intended learning outcome typically begins with a directive verb and describes the observable behavior, action or outcome that students should demonstrate. The focus is on what students should be able to do and not on the learning or teaching process. Reviewing a list of directive verbs can help to clarify what ability students should demonstrate and to clearly define the intended learning outcome to be assessed.

2. Avoid using essay questions for intended learning outcomes that are better assessed with other kinds of assessment.

Some types of learning outcomes can be more efficiently and more reliably assessed with selected-response questions than with essay questions. In addition, some complex learning outcomes can be more directly assessed with performance assessment than with essay questions. Since essay questions sample a limited range of subject-matter content, are more time consuming to score, and involve greater subjectivity in scoring, the use of essay questions should be reserved for learning outcomes that cannot be better assessed by some other means.

3. Define the task and shape the problem situation.

Essay questions have two variable elements—the degree to which the task is structured versus unstructured and the degree to which the scope of the context is focused or unfocused. Although it is true that essay questions should always provide students with structure and focus for their responses, it is not necessarily true that more structure and more focus are better than less structure and less focus. When using more structure in


essay questions, teachers are trying to avoid at least two problems. More structure helps to avoid the problem of student responses containing ideas that were not meant to be assessed and the problem of extreme subjectivity when scoring responses. Although more structure helps to avoid these problems, how much and what kind of structure and focus to provide is dependent on the intended learning outcome that is to be assessed by the essay question and the purpose for which the essay question is to be used.

The process of writing effective essay questions involves defining the task and delimiting the scope of the task in an effort to create an effective question that is aligned with the intended learning outcome to be assessed by it. This alignment is absolutely necessary for eliciting student responses that can be accepted as evidence for determining the students’ achievement of the intended learning outcome. Hence, the essay question must be carefully and thoughtfully written in such a way that it elicits student responses that provide the teacher with valid and reliable evidence about the students’ achievement of the intended learning outcome.

Failure to establish adequate and effective limits for the student response to the essay question allows students to set their own boundaries for their response, meaning that students might provide responses that are outside of the intended task or that only address a part of the intended task. If students’ failure to answer within the intended limits of the essay question can be ascribed to poor or ineffective wording of the task, the teacher is left with unreliable and invalid information about the students’ achievement of the intended learning outcome and has no basis for grading the student responses. Therefore, it is the responsibility of the teacher to write essay questions in such a way that they provide students with clear boundaries for their response.

Task(s) and problem(s) are the key elements of essay questions. The task will specify the performance students should exhibit when responding to the essay question. A task is composed of a directive verb and the object of that verb. For example, consider the following tasks:

i. Task = Justify (directive verb) the view you prefer (object of the verb).

ii. Task = Defend (directive verb) the theory as the most suitable for the situation (object of the verb).

Tasks for essay questions are not developed from scratch, but are developed based on the intended learning outcome to be assessed. In essay questions, the task can be presented either in the form of a direct question or an imperative statement. If written as a question, then it must be readily translatable into the form of an imperative statement. For example, the following illustrates the same essay item twice, once as a question and once as an imperative statement.

Question: How are the processes of increasing production and improving quality in a manufacturing plant similar or different based on cost?

Imperative statement: Compare and contrast the processes of increasing production and improving quality in a manufacturing plant based on cost.

Whether essay questions are written as imperative statements or as questions, they should be written to align with the intended outcome and in such a way that the task is clear to the students.

The other key element of essay questions is the "problem." The "problem" in an essay question is the unsettled matter or undesirable state of affairs that needs to be resolved. The purpose of the problem is to provide the students with a context within which they can demonstrate the performance to be assessed. Ideally, students would not have previously encountered the specific problem.

Problems within essay questions differ in the complexity of thinking processes they elicit from students depending on the intended learning outcome to be assessed. For example, if the intended outcome is to assess basic recall, the essay question could be to summarize views as given in class concerning a particular conflict. The thinking process in this case is fairly simple. Students merely need to recall what was mentioned and discussed in class. Yet consider the problem within the essay question meant to assess students’ abilities to evaluate a particular conflict and to justify their reasoning. This problem is more complex. In this case, students have to recall facts about the conflict, understand the conflict, make judgments about the conflict and justify their reasoning.

Depending on the intended learning outcome to be assessed, teachers may take different approaches to develop the problem within an essay question. In some cases, the intended outcome can be assessed well using a “problem” that is inherent in the task of the essay question.

Example:

Intended Learning Outcome: Understand the interrelationship of grade histories, student reviews and course schedules for students' selection of a course and professor.

Essay Question: Explain the interrelationship of grade histories, student reviews and course schedules for students' selection of a course and professor.

In the example essay question, the problem is inherent in the task of the question and is sufficiently developed: the problem for students is to translate into their own words the interrelationships of certain factors affecting how students select courses. For intended learning outcomes meant to assess more complex thinking, a "problem situation" is often developed. The problem situation consists of a problem that students have not previously encountered and that presents some unresolved matter. The purpose of incorporating a problem situation into an essay question is to confront students with a new context, requiring them to assess the situation and derive an acceptable solution by using:

1. Their knowledge of the relevant subject matter, and 2. Their reasoning skills.

Intended learning outcome: Analyze the impact of the War on Terror on the Pakistani economy.

Less effective essay question:

Describe the impact of the War on Terror on the Pakistani economy.

More effective essay question:

Analyze the impact of the War on Terror on the Pakistani economy by describing how different effects of the war work together to influence the economy.


Example of an Evolving Essay Question that Becomes More Focused

1. Less focused essay question:

Evaluate the impact of the Industrial Revolution on England.

2. More focused essay question:

Evaluate the impact of the Industrial Revolution on the family in England.

4. Helpful Instructions: Specify the relative point value and the approximate time limit in clear directions.

Specifying the relative point value and the approximate time limit helps students allocate their time in answering several essay questions because the directions clarify the relative merit of each essay question. Without such guidelines students may feel at a loss as to how much time to spend on a question. When deciding the guidelines for how much time should be spent on a question keep the slower students and students with certain disabilities in mind. Also make sure that students can be realistically expected to provide an adequate answer in the given and/or the suggested time.

5. Helpful Guidance: State the criteria for grading

Students should know what criteria will be applied to grade their responses. As long as the criteria are the same for the grading of the different essay questions they don’t have to be repeated for each essay question but can rather be stated once for all essay questions. Consider the following example.

Example

All of your responses to essay questions will be graded based on the following criteria:

The content of your answer will be evaluated in terms of the accuracy, completeness, and relevance of the ideas expressed. The form of your answer will be evaluated in terms of clarity, organization, correct mechanics (spelling, punctuation, grammar, capitalization), and legibility.

If the form of students' responses is to be judged along with the content of an essay question, the directions should specify the relative point value for the content and the relative point value for the form.

Rubric for grading long essay exam questions (10 points possible)

Exemplary (10): The answer is complete. All information provided is accurate. The answer demonstrates a deep understanding of the content. Writing is well organized, cohesive, and easy to read.

Competent (9): The answer is missing slight details. All information provided is accurate. The answer demonstrates understanding of the content. Writing is well organized, cohesive, and easy to read.

Minor Flaws (8): The answer is missing multiple details. All information provided is accurate. The answer demonstrates basic understanding of the content. Writing is organized, cohesive, and easy to read.

Satisfactory (7): The answer does not address a portion of the question, or major details are missing. Almost all information provided is accurate. The answer demonstrates basic understanding of the content. Writing is organized, cohesive, and easy to read.

Nearly Satisfactory (6): The answer is lacking major details and/or does not address a portion of the question. Most information provided is accurate. The answer demonstrates less than basic understanding of the content. Writing may be unorganized, not cohesive, and difficult to read.

Fails to Complete (4): The answer to the question is lacking any detail. Some information provided is accurate. The answer demonstrates a lack of understanding of the content. Writing may be unorganized, not cohesive, and difficult to read.

Unable to Begin Effectively (2): The question is not answered. Little to none of the information provided is accurate. The answer demonstrates a lack of understanding of the content. Writing is unorganized, not cohesive, and very difficult to read.

No Attempt (0): The answer was left blank.
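Where both content and form are graded, the stated point values can be combined mechanically. A minimal Python sketch (my own illustration; the 70/30 weighting is hypothetical, not taken from the manual):

    def essay_score(content_points: float, form_points: float) -> float:
        # Content and form are each marked out of 10; the 70/30 split is a
        # hypothetical relative point value that should be stated in the
        # test directions.
        return 0.7 * content_points + 0.3 * form_points

    print(essay_score(8, 6))   # -> 7.4 out of 10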

6. Use several relatively short essay questions rather than one long one.

Only a very limited number of essay questions can be included on a test, because of the time it takes students to respond to them and the time it takes teachers to grade the responses. This creates a challenge in designing valid essay questions. Several shorter essay questions are better suited to assess the breadth of student learning within a subject, whereas longer essay questions are better suited to assess its depth. Hence there is a trade-off when choosing between several short essay questions and one long one: focusing on depth limits the assessment of breadth, and focusing on breadth limits the assessment of depth.

When choosing between using several short essay questions or one long one also keep in mind that short essays are generally easier to score than long essay questions.

7. Avoid the use of optional questions

Students should not be permitted to choose one essay question to answer from two or more optional questions. The use of optional questions should be avoided for the following reasons:

Students may waste time deciding on an option.

Some questions are likely to be harder which could make the comparative assessment of students' abilities unfair.

The use of optional questions makes it difficult to evaluate if all students are equally knowledgeable about topics covered in the test.

8. Improve the essay question through preview and review.

The following steps can help you improve the essay item before and after you hand it out to your students.

Preview (before handing out the essay question to the students)

a. Predict student responses.

Try to respond to the question from the perspective of a typical student.


Evaluate whether students have the content knowledge and the skills necessary to adequately respond to the question. If possible weaknesses of the essay question are detected, repair them before handing out the exam.

b. Write a model answer.

Before using a question, write model answer(s) or at least an outline of major points that should be included in an answer. Writing the model answer allows reflection on the clarity of the essay question. Furthermore, the model answer(s) serve as a basis for the grading of student responses.

Once the model answer has been written compare its alignment with the essay question and the intended learning outcome and make changes as needed to assure that the intended learning outcome, the essay question, and the model answer are aligned with each other.

c. Ask a knowledgeable colleague to critically review the essay question, the model answer, and the intended learning outcome for alignment.

Before using the essay question on a test, ask a person knowledgeable in the subject (colleague, teaching assistant, etc.) to critically review the essay question, the model answer, and the intended learning outcome to determine how well they are aligned with each other. Based on the intended learning outcome, revise the question as needed. By having someone else look at the test, the likelihood of creating effective test items is increased, simply because two minds are usually better than one. Try asking a colleague to evaluate the essay questions based on the guidelines for constructing essay questions.

Review (after receiving the student responses)

d. Review student responses to the essay question

After students complete the essay questions, carefully review the range of answers given and the manner in which students seem to have interpreted the question. Make revisions based on the findings. Writing good essay questions is a process that requires time and practice. Carefully studying the student responses can help to evaluate students' understanding of the question as well as the effectiveness of the question in assessing the intended learning outcomes.

Review: How to Construct Essay Questions

1. Clearly define the intended learning outcome to be assessed by the item.
2. Avoid using essay questions for intended learning outcomes that are better assessed with other kinds of assessment.
3. Define the task and shape the problem situation.
   a. Clearly define the task.
   b. Clearly develop the problem or problem situation.
   c. Delimit the scope of the task.
4. Helpful instructions: specify the relative point value and the approximate time limit in clear directions.
5. Helpful guidance: state the criteria for grading.


6. Use several relatively short essay questions rather than one long one.
7. Avoid the use of optional questions.
8. Improve the essay question through preview and review.
   Preview (before)
   a. Predict student responses.
   b. Write a model answer.
   c. Ask a knowledgeable colleague to critically review the essay question, the model answer, and the intended learning outcome for alignment.
   Review (after)
   a. Review student responses to the essay question.

Review Exercise: How to Construct Essay Questions

For exercises 1–2, develop an effective essay question for the given intended learning outcome. Make sure that the essay question meets the following criteria:

The essay question matches the intended learning outcome.

The task is specifically and clearly defined.

The relative point value and the approximate time limit are specified.

Exercise: Choose an intended learning outcome from a course you are currently teaching and create an effective essay question to assess students’ achievement of the outcome. Follow each of the guidelines provided for this exercise. Check off each step on the provided checklist once you have finished it.

Checklist

1 Clearly define the intended learning outcome to be assessed by the item.

2 Avoid using essay questions for objectives that are better assessed with objectively-scored items.

3 Use several relatively short essay questions rather than one long one.

4 The task is appropriately defined and the scope of the task is appropriately limited.

5 Present a novel situation.

6 Consider identifying an audience for the response.

7 Specify the relative point value and the approximate time limit.

8 Predict student responses.

9 Write a model answer.

10 Have a colleague critically review the essay question.


EVALUATION OF ITEMS

Day 4, Session 1

Students often judge, after taking an exam, whether the test was fair and good. The teacher is also usually interested in how the test worked for the students. One way to ascertain this is to undertake item analysis. It provides objective, external and empirical evidence for the quality of the items we have pre-tested. The objective of item analysis is to identify problematic or poor items, which might be confusing the respondents, might lack a clearly correct response, or might have a distracter competing with the keyed answer. A good test has good items, and good test making requires careful attention to the principles of item evaluation. The basic methods involved are the assessment of item difficulty and item discrimination. These measures comprise item analysis.

Item Analysis

Item analysis is about how difficult an item is and how well it can discriminate between the good and the poor students. In other words, item analysis provides a numerical assessment of item difficulty and item discrimination.

Item Difficulty

Item difficulty is determined from the proportion (p) of students who answered each item correctly. Item difficulty can range from zero (no one solved it) to one hundred (all persons solved it correctly). The goal is usually to have items of all difficulty levels in the test so that the test can identify poor, average as well as good students. However, most of the items are designed to be average in difficulty level, for such items are more useful. The item analysis exercise provides us the difficulty level of each item.

Optimally difficult items are those that 50%–75% of students answer correctly.

Items are considered low to moderately difficult if (p) is between 70% and 85%.

Items that only 30% or fewer solve correctly are considered difficult ones.

The Item Difficulty Percentage can also be denoted as an Item Difficulty Index by expressing it in decimals, e.g. .40 for an item which could be solved by 40% of the test-takers. Thus the index can range from 0 to 1. Items should fall in a variety of difficulty levels in order to differentiate between good and average as well as average and poor students. Easy items are usually placed in the initial part of the test to motivate students in taking the test and to alleviate test anxiety. The optimal item difficulty depends on the question type and the number of possible distracters as well.

Item Discrimination

Another way to evaluate items is to ask "Who gets this item correct -- the good, the average or the weak students?" Assessment of item discrimination answers this query. Item discrimination refers to the percentage difference in correct responses between the high scoring and the poor scoring students. In a small class of 30 students, one can administer the test items, score them and then rank the students in terms of their overall score.


Next, we separate the upper 15 students and the lower 15 into two groups: the UPPER and the LOWER group.

Finally, we find how well each item was solved correctly (p) by each group. In other words, percentage of students passing (p) each item in each of the two groups is worked out.

Discrimination (D) power of the item is then known by finding difference between the percentage of upper group and the low group. The higher the difference, the greater the discrimination power of an item.

D = p(upper group) − p(lower group)

Also see the following tables.

In a large class of 100 or more students, we take the top 25% and the bottom 25% of students to form the upper and lower groups, to cut short the labor involved. The discrimination ratio for an item falls between −1.0 and +1.0. The closer the ratio is to +1.0, the more effectively that item distinguishes students who know the material (the top group) from those who don’t (the bottom group).

An item with a discrimination of 60% or greater is considered a very good item, whereas a discrimination of less than 20% indicates a low discrimination and the item needs to be revised.

An item with a negative index of discrimination indicates that the poor students answer correctly more often than do the good students. Strange! Such items should be dropped from the test.

Ten students in a class have taken a ten-item quiz. The students’ responses are shown below from high to low. The top five students can be called the high scoring group and the bottom half the low scoring group. The number "1" indicates a correct answer; a "0" indicates an incorrect answer.

Students  Total score %  Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10

1 100 1 1 1 1 1 1 1 1 1 1

2 90 1 1 1 1 1 1 1 1 0 1

3 80 1 1 0 1 1 1 1 1 0 0

4 70 0 1 1 1 1 1 0 1 0 1

5 70 1 1 1 0 1 1 1 0 0 1

6 60 1 1 1 0 1 1 0 1 0 0

7 60 0 1 1 0 1 1 0 1 0 1

8 50 0 1 1 1 0 0 1 0 1 0

9 40 1 1 1 0 0 0 0 0 1 1

10 30 0 1 0 0 0 1 0 0 1 0


The Difficulty Index and Discrimination Index are calculated below.

Item        Correct high group   Correct low group   Difficulty %   Discrimination %

Question 1 4 2 60 40

Question 2 5 5 100 0

Question 3 4 4 80 0

Question 4 4 1 50 60

Question 5 5 2 70 60

Question 6 5 3 80 40

Question 7 4 1 50 60

Question 8 4 2 60 40

Question 9 1 3 40 -40

Question 10 4 2 60 40

Question no 2 was the easiest; no 9 was most difficult.

Question 9 also had negative discrimination and should be removed from the quiz.
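The computations in the tables above can be reproduced in a few lines of code. The following is a minimal Python sketch, not part of the original workshop material; it assumes the 0/1 response matrix is entered exactly as printed, with students already sorted from high to low scorers.
________________________________________________________________________
# A minimal sketch (not from the manual): item difficulty and
# discrimination for the 10-item quiz, from the 0/1 response matrix above.
responses = [            # rows = students sorted high to low; columns = Q1..Q10
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
]

n = len(responses)                  # 10 students
half = n // 2                       # top 5 = upper group, bottom 5 = lower group
upper, lower = responses[:half], responses[half:]

for q in range(len(responses[0])):
    p = sum(s[q] for s in responses) / n           # difficulty (proportion correct)
    p_upper = sum(s[q] for s in upper) / half
    p_lower = sum(s[q] for s in lower) / half
    d = p_upper - p_lower                          # discrimination D
    print(f"Q{q + 1}: difficulty {p:.0%}, discrimination {d:+.0%}")
________________________________________________________________________
Running this prints the difficulty and discrimination of each of the ten questions, matching the table above.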

100% discrimination would occur if all those in the upper group answered correctly and all those in the lower group answered incorrectly.

Zero discrimination occurs when equal numbers in both groups answer correctly.

Negative discrimination, a highly undesirable condition, occurs when more students in the lower group than the upper group answer correctly.

Items with 25% and above discrimination are considered good.

Discrimination by Item Difficulty Graph

Both difficulty and discrimination indices are important parameters which influence each other. A two-way chart indicating the relationship between the two indices is shown below, based on the students’ responses to the 10-item quiz above.

Difficulty by Discrimination Chart indicating overall efficacy of the quiz

Discrimination %   Difficulty %   Items
60                 50             Item 4, Item 7
60                 70             Item 5
40                 60             Item 1, Item 8, Item 10
40                 80             Item 6
0                  80             Item 3
0                  100            Item 2
-40                40             Item 9
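For larger item pools the chart is easier to produce programmatically. Below is a hypothetical sketch, assuming the matplotlib library is available; the index values are those from the tables above.
________________________________________________________________________
# A hypothetical sketch: redraw the difficulty-by-discrimination chart.
# Assumes matplotlib is installed; values are taken from the tables above.
import matplotlib.pyplot as plt

difficulty = [60, 100, 80, 50, 70, 80, 50, 60, 40, 60]      # % per item
discrimination = [40, 0, 0, 60, 60, 40, 60, 40, -40, 40]    # % per item

fig, ax = plt.subplots()
ax.scatter(difficulty, discrimination)
for i, (x, y) in enumerate(zip(difficulty, discrimination), start=1):
    ax.annotate(f"Item {i}", (x, y))                # label each point
ax.set_xlabel("Difficulty %")
ax.set_ylabel("Discrimination %")
plt.show()
________________________________________________________________________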

We find from the chart that items of medium difficulty are more discriminating and useful, unlike items that are too difficult (item 9) or too easy (items 2 and 3).

Interpreting Distracter Values

Ideally, distracters should be equally attractive, but none should be more attractive than the answer.

At a minimum, each distracter should be chosen by at least 5% of the examinees.


Weak or non-functional distracters may be substituted with new ones; make sure that the replacements align with the stem as well as the objective of the item, connect well with the rest, and are grammatically correct.

Effectiveness of Distracters

The difficulty and discrimination indices are estimates about an item which overall comprises a stem and a set of distracters or options (Appendix-A). The item analysis statistics reflect on the goodness of both the distracters and the stem. Let us look at the guidelines which can help us improve them.

1. Most MCQs have 2-4 distracters; 3 is better, and 4 is best at the college level. Where it is difficult to think of more than one distracter, frame the item as a true/false item.

2. Distracters that have less than a 5 percent response rate are weak and should be changed or improved (see the sketch after this list). Distracters which attracted no response are not working at all.

3. No distracter should be chosen more often than the keyed response in the upper group.

4. Similarly, no single distracter should pull more than about half the students.

5. If students have responded about equally to all the options, they might be marking randomly or wildly guessing. Critically check the contents of such items: they might have been written badly, so that students have no idea what you are asking, or the items might be so difficult that students are completely baffled.

6. If the low group gets the keyed answer as often as the upper group, all the distracters should be looked into again; or drop the item if you have a large pool of items.

Other theoretical points to consider:

7. Do not repeat a phrase in the options if it can be stated in the stem; thus make the distracters short and precise.

8. Options appear on separate lines and are suitably indented.

9. Distracters should be plausible and homogeneous, presented in logical or numerical order, and independent of one another.

10. Keep the alternatives mutually exclusive, homogeneous in content, and free from clues that might indicate which response is correct. They should moreover be parallel in form and similar in length.

11. The position of the keyed response should vary among the A, B, C and D positions.

12. Distracters should be related or somehow linked to each other, should appear as similar as possible to the correct answer, and should not stand out as a result of their phrasing. If the stem is in the past tense, all the options should be in the past tense. If the stem calls for a plural answer, all the options should be plural. Stem and options should have subject-verb agreement, and all options should follow grammatically from the stem.

13. Options should not include "none of the above" or "all of the above." "None of the above" is problematic in items where judgment is involved and where the options are not absolutely true or false.

14. When more than one option has some element of accuracy but the keyed response is the best, ask the students to select the "best answer" rather than the "correct answer."
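The response-rate check in guideline 2 above is easy to automate. The following is a hypothetical Python sketch; the option labels, keyed answer and response data are illustrative, not taken from the manual.
________________________________________________________________________
# A hypothetical sketch of the check in guideline 2: flag distracters
# chosen by fewer than 5% of examinees. All data here are illustrative.
from collections import Counter

def weak_distracters(choices, key, options="ABCD", threshold=0.05):
    """choices: the option each examinee picked; key: the correct option."""
    counts = Counter(choices)
    n = len(choices)
    return [opt for opt in options
            if opt != key and counts.get(opt, 0) / n < threshold]

# 40 examinees answering one item whose keyed response is 'B':
answers = ["B"] * 22 + ["A"] * 10 + ["C"] * 7 + ["D"] * 1
print(weak_distracters(answers, key="B"))   # ['D'] -> revise or replace D
________________________________________________________________________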

Session 2: Item Analysis and Subsequent Decisions

Activity: Identifying poor items and ways to improve them

Objectives:
1. To consolidate the preceding presentation
2. Applying principles & conventions of item construction / Brainstorming
3. Hands-on practice / Learning by doing


Method: Interactive Class Discussion

Material: A set of 30 flawed MCQs (Appendix-A)

Outcome: Learning how to detect and remove flaws.

Decisions Subsequent to Item Analysis

1. Revise items to remove flaws, or write alternative items.
2. Does the reviewed pool of items correspond with the original table of specifications and the stipulated objectives? Discrepancies, if any, have to be removed before using the test.
3. While assembling a test (out of the pre-tested pool of items), set the items into groups (parts of the test) with appropriate instructions.
4. Check the scoring key of the revised test.
5. Decide about the duration / time of the test for actual use on the basis of:
   a) the rate of omitted responses in the pre-test
   b) observation of the test administrator
6. Review the scoring / grading scheme, e.g. choose or drop negative marking.
7. Be informed about instructional weaknesses and student misconceptions in order to prepare students better in future. You may even coach students in MCQ-solving strategies.

Session 3: Desirable Qualities of a Test

All tests are desired to be valid and reliable, and MCQ-based tests have several advantages over other examination techniques in this regard. Here we mention four desirable qualities of a test.

1- Reliability

Reliability can be defined as the degree to which a test yields the same results if administered multiple times to the same group of test takers. In other words, a test is said to be reliable if it measures consistently. If there is consistency or homogeneity among questions, it enhances the reliability of the test. A test may have several parts (mathematics, verbal comprehension, etc.); in that case a separate reliability for each part will be worked out. Since MCQs have clear-cut, unambiguously correct or incorrect, objectively scorable answers, there is more marker reliability in such assessment.

However, no test or measure is perfect. A certain degree of error does creep in, called random or chance error. In measuring length with a ruler, for example, there may be random error associated with your eye's ability to read the markings or extrapolate between them. In addition, the ruler itself may not be very precise and accurate. Such factors fluctuate and vary from time to time, influencing students’ performance on the test adversely. When such chance or random error is kept to a minimum, test scores truly reflect students’ ability. Reliability, as a statistical estimate, ranges between 0 and 1. There are three sources of error which adversely influence reliability.


1) Factors in the test itself

Most tests contain a collection of items that represent different skills; therefore test content is usually not homogeneous.

The length of the test also matters: a test of 50 items would be more reliable than one of 30 items. Psychometric theory suggests that adding more items should increase the reliability of the test, provided the added items are good ones (a worked illustration of this relationship follows after this list).

Other sources of test error include the effectiveness of the distracters and the difficulty of the items: items that are too difficult or too easy limit the reliability of the test.

Variations in testing procedure such as making changes in the test instructions or time limit can also undermine reliability. Data of such cases should not be pooled.

2) Factors in test-takers

Changes in students' attitudes, health, mood, sleep, etc. can affect the quality of their efforts and thus their test-taking consistency. For example, test takers may make careless errors, misinterpret test instructions, forget test instructions, inadvertently omit test sections, or misread test items.

3) Scoring errors

These refer to scoring rubrics, and a host of rater errors manifested in exams.
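The relationship between test length and reliability noted under factor 1 above is conventionally quantified with the Spearman-Brown prophecy formula. The formula is not given in this manual; the sketch below is an illustrative aside.
________________________________________________________________________
# An aside, not from the manual: the Spearman-Brown prophecy formula
# estimates the reliability of a test lengthened with comparable items.
def spearman_brown(reliability, old_len, new_len):
    k = new_len / old_len                       # lengthening factor
    return k * reliability / (1 + (k - 1) * reliability)

# A 30-item test with reliability .70, lengthened to 50 similar items:
print(round(spearman_brown(0.70, 30, 50), 2))   # -> 0.8
________________________________________________________________________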

How to Estimate Reliability?

Educational tests are reliable when they have homogeneous contents. The performance of the students is then consistent across the test contents: the poor students would perform poorly throughout the test, and vice versa. Among the several methods -- test-retest, split-half, alternative form and internal consistency -- the last is particularly salient for scholastic tests. ‘KR-20’, as it is formally called, is a statistical method that is related to item analysis work as well.
________________________________________________________________________
The KR-20 formula statistically works out the reliability of an educational or ability test once the difficulty level of the items is known, that is, what proportion of students passed (p) and what proportion failed (q) each item of the test.

KR-20 = (n / (n − 1)) × (SDt² − ∑pq) / SDt²

where:
n = number of items
p = pass proportion of an item in the total group
q = fail proportion of the item in the total group
SDt² = variance of the total test
________________________________________________________________________


Let us apply this formula to the 10-item quiz that we item-analyzed in the earlier session.

Mean = 6.5   SD = 2.17

Item     p     q     pq
1       .6    .4    .24
2      1.0     0      0
3       .8    .2    .16
4       .5    .5    .25
5       .7    .3    .21
6       .8    .2    .16
7       .5    .5    .25
8       .6    .4    .24
9       .4    .6    .24
10      .6    .4    .24

∑pq = 1.99

KR-20 = (n / (n − 1)) × (SDt² − ∑pq) / SDt²

Putting in the values, with SDt² = 2.17² ≈ 4.71:

KR-20 = (10 / 9) × (4.71 − 1.99) / 4.71 ≈ 1.11 × 0.58 ≈ 0.64

n = number of items
SDt² = variance of the total class scores (the standard deviation squared)
p = proportion of students who pass an item
q = proportion of students who fail the item
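The same computation can be scripted. The following minimal sketch, not part of the manual, uses the item difficulties (p) from the table above and the quoted SD of 2.17.
________________________________________________________________________
# A minimal sketch of the KR-20 computation shown above.
def kr20(p_values, sd_total):
    n = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)   # each q = 1 - p
    var_total = sd_total ** 2                     # SDt squared
    return (n / (n - 1)) * (var_total - sum_pq) / var_total

p = [0.6, 1.0, 0.8, 0.5, 0.7, 0.8, 0.5, 0.6, 0.4, 0.6]  # item difficulties
print(round(kr20(p, 2.17), 2))   # -> 0.64
________________________________________________________________________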

Values of .8 or higher are considered satisfactory for a test of 50 or more items. For a short quiz of 10 items, which also contains some flawed items (as pointed out through the p and q values), the reliability index is modest. The quiz, when revised, would gain in reliability, and if more items are added to it, provided they are good ones, it will improve in reliability still more. The degree of reliability is a function of the number of items in a quiz, and lengthened tests cover the contents / subject matter more comprehensively. The more the merrier!

Validity

Validity indicates whether a test measures what it purports to measure. A test based on MCQs usually covers the entire course and is therefore potentially a more valid assessment than a descriptive test. The scores of a not-so-valid test are less credible in warranting a student’s mastery of the course material; it is therefore not safe to draw inferences or decisions from them. Assessing content validity is salient and essential for an educational test. Course exams or scholastic tests are required to cover and represent the entire course domain / knowledge to be called valid tests. Subject specialists or experts judge how valid a test is by its contents.


Content validation essentially involves systematic examination of the test contents to determine whether they cover a representative sample of the knowledge domain / course, along with the learning / course objectives in the right proportion, as specified in the table of specifications (see Appendix-B).

For example, a test intended to measure knowledge in a fifth-grade science course is judged by a panel of 2-3 teachers who estimate, as experts, how representative the test contents are of grade-5 science and of the learning objectives of the course, with due emphasis. They rate the test material accordingly, and the degree of agreement in their ratings is considered the content validity index. Another method to establish validity is to correlate scores on the test with another established test or criterion: for example, correlating scores on an Economics test in a local university with overall GPA or with GRE scores. This procedure is called criterion-related validity (a small illustration follows below).

Practicality

MCQs are practically useful and efficient, especially in large-scale testing situations, unlike descriptive tests, which are more resource intensive and demand time and money. Increasing enrolment in colleges and universities has led staff to incorporate objective testing within their assessment for more efficient examining of students. MCQs are versatile and adaptable to various levels of learning / educational objectives (recall, comprehension, application, analysis). MCQs test a broad area of knowledge in a short time and are moreover easy to score. They also yield rich data for psychometric analysis. To develop quality MCQs, faculty need to have sufficient know-how of the techniques of test construction, besides having the motivation to persevere in the intensive work of framing and pre-testing question papers. It also requires them to orient students to MCQs as a system of assessment and evaluation.

Objectivity

Objectivity refers to fairness and uniformity in the test scoring procedure. Examiner / rater bias is therefore absent in MCQ-based tests, which is why they are called objective tests. Further, the analysis of the test data is undertaken statistically, which assesses various dimensions of the tests and makes them more precise and accurate instruments.
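As a small illustration of the criterion-related validity procedure mentioned above, one can correlate scores on the new test with an established criterion. The sketch below uses hypothetical data and Python's statistics.correlation (available from Python 3.10).
________________________________________________________________________
# A hypothetical illustration of criterion-related validity: correlate
# new-test scores with an established criterion such as GPA.
from statistics import correlation   # requires Python 3.10+

test_scores = [55, 62, 70, 71, 78, 80, 85, 90]    # scores on the new test
gpa = [2.4, 2.7, 2.9, 3.0, 3.2, 3.1, 3.5, 3.8]    # criterion measure

r = correlation(test_scores, gpa)    # Pearson's r
print(round(r, 2))                   # a high r supports criterion validity
________________________________________________________________________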


Professional Ethics in Educational Assessment

Resource: Dr. Pat Nellor Wickwire, Educational Resources Information Center (ERIC)

ACCOUNTABILITY to all stakeholders: internal & external clients

Professional Norms / Ethical Standards

Professional Applications / Code of Conduct
1. Teacher as scientist and practitioner
2. Professional practice ethics:
   Engage Empowerment (E)
   Temper Tone (T)
   Honor Humility (H)
   Internalize Integrity (I)
   Communicate Commitment (C)
   Synthesize Standards (S)

CONTRIBUTE TO THESE HIGH PRINCIPLES

Welfare and development of the client

Equal access for all clients

Maintaining loyalty to all clients

The final decision is that of the individual

DESIGN & IMPLEMENT EDUCATIONAL ASSESSMENT AS A VALUE-ADDED COMPONENT TO LEARNING

Select the assessment method (searching available and constructable material)

Representativeness of domain sampling

Meaningful score reporting

Balance between highest quality and greatest benefits

PREPARATION

Printing question papers (uniform / varied)

Exam hall size & number of examinees

Training in test administration; monitoring students

Security of material (with precise accounting)

MARKING / GRADING & REPORTING

Timeliness, accuracy and clarity

Result card / transcript

INTERPRETATION OF ASSESSMENT RESULTS

Curved or absolute scores

How many slabs / grades; grade cut-off scores

Local examining rules and regulations

Scoring, record keeping and access to class results

COMMUNICATE

Results to be conveyed in clear and understandable terms

APPLICATION (of results / assessment)

For the welfare, development and growth of students in terms of:

Selection of future subjects / courses / careers

Diagnostic counseling & remedial education

Institutional education planning / preparing students for future programs

Changes in future instructional strategies / interventions

EVALUATION

Of formative and summative assessment

Judging learning outcomes of students

Improvements in A-E above

References

Bloom, B. S. (1956). Taxonomy of Educational Objectives. Vol. I: Cognitive Domain. New York: Longmans, Green.

Cox, K. R., & Bandaranayake, R. (1978). How to write good multiple choice questions. Medical Journal, 2, 553-554.

Kaplan, R. M., & Saccuzzo, D. P. (2002). Psychological Testing: Principles, Applications & Issues (7th ed.).

Wickwire, P. N. (2004). Application of professional ethics in educational assessment. Educational Resources Information Center (ERIC), Chapter 25, pp. 349-362. (Appendix-D)

http://testing.byu.edu/info/handbook/betteritems.pdf


List of Appendices

Appendix-A: Activity material for flawed items (Bibliography of Multiple-Choice Question Resources)
Appendix-B: Table of Specifications / Test Blueprint
Appendix-C: Cognitive Domain, Instructional Objectives and Item Examples
Appendix-D: Professional Ethics in Assessment (Chapter Reading)
Appendix-E: PowerPoint Presentations


Appendix-A Activity material for flawed items

Bibliography of Multiple-Choice Question Resources

Books:

Bloom, Benjamin B. (Ed.) Taxonomy of Educational Objectives: The Classification of Educational Goals, by a Committee of College and University Examiners. 1st ed. New York: Longmans, Green, 1956.

Davis, Barbara Gross. Tools for Teaching. San Francisco: Jossey-Bass, 1993.

Erickson, Bette LaSere, and Diane Weltner Strommer. Teaching College Freshmen. San Francisco: Jossey-Bass, 1991.

Jacobs, Lucy Cheser, and Clinton I. Chase. Developing and Using Tests Effectively: A Guide for Faculty. San Francisco: Jossey-Bass, 1992.

McKeachie, Wilbert. Teaching Tips: Strategies, Research, and Theory for College and University Teachers. 9th ed. Lexington, Mass.: D.C. Heath and Company, 1994.

Miller, Harry G., Reed G. Williams, and Thomas M. Haladyna. Beyond Facts: Objective Ways to Measure Thinking. Englewood Cliffs: Educational Technology Publications, 1978.

Articles:

Clegg, Victoria L., and William E. Cashin. "Improving Multiple-Choice Tests." Idea Paper #16, Center for Faculty Evaluation and Development, Kansas State University, 1986.

Fuhrman, Miriam. "Developing Good Multiple-Choice Tests and Test Questions." Journal of Geoscience Education 44 (1996): 379-384.

Johnson, Janice K. ". . . Or None of the Above." The Science Teacher 56.2 (1989): 57-61.

Websites:

University of Cape Town's Guide to Designing and Managing Multiple Choice Questions

Contact:

Email: [email protected], Phone: 541-346-2177 Fax: 541-346-2184 University of Oregon.


Appendix-B

Table of Specifications / Test Blueprint

Once you know the learning objectives and the item types you want to include in your test, you should create a test blueprint. A test blueprint, also known as test specifications, consists of a matrix representing the number of questions you want in your test within each topic and level of objectives. The blueprint identifies the objectives and skills that are to be tested and their relative weightage. The blueprint can help you steer through the desired coverage of topics as well as levels of objectives. Once you create your test blueprint you can begin writing your items!

                 Topic A    Topic B    Topic C    Topic D    TOTAL
Knowledge           1          2          1          1       5 (12.5%)
Comprehension       2          1          2          2       7 (17.5%)
Application         4          4          3          4       15 (37.5%)
Analysis            3          2          3          2       10 (25%)
Synthesis           -          1          1          -       2 (5%)
Evaluation          -          -          -          1       1 (2.5%)
TOTAL            10 (25%)   10 (25%)   10 (25%)   10 (25%)   40

This sketch indicates a plan for 40 items.

The 40 items equally cover the four topics being examined through this test, that is, 10 items (25% of the test) per topic.

The items will, moreover, test six objectives: knowledge, comprehension, application, analysis, synthesis and evaluation. The table further shows that the test constructor wants to write 5 items testing knowledge across the various topics, a few more (7) testing comprehension, still more (15) testing application, and so on.
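A blueprint like this is straightforward to keep as a small data structure so its totals can be checked automatically. The sketch below is illustrative; in particular, the placement of the synthesis and evaluation items among Topics B-D is inferred from the column totals rather than stated in the table.
________________________________________________________________________
# A hypothetical sketch: store the 40-item blueprint and verify its totals.
# Placement of the synthesis/evaluation items among topics is inferred.
from collections import Counter

blueprint = {
    "Knowledge":     {"A": 1, "B": 2, "C": 1, "D": 1},
    "Comprehension": {"A": 2, "B": 1, "C": 2, "D": 2},
    "Application":   {"A": 4, "B": 4, "C": 3, "D": 4},
    "Analysis":      {"A": 3, "B": 2, "C": 3, "D": 2},
    "Synthesis":     {"B": 1, "C": 1},
    "Evaluation":    {"D": 1},
}

total = sum(sum(row.values()) for row in blueprint.values())
for level, row in blueprint.items():
    n = sum(row.values())
    print(f"{level}: {n} items ({n / total:.1%})")   # e.g. Knowledge: 5 items (12.5%)

topic_totals = Counter()
for row in blueprint.values():
    topic_totals.update(row)
print("Per topic:", dict(topic_totals), "| Total items:", total)   # 10 each, 40
________________________________________________________________________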


Appendix-C

Cognitive Domain, Instructional Objectives and item example

1-Knowledge
Outcome: Identifies the meaning of a term.

Reliability is the same as:
A. consistency*
B. relevancy
C. representativeness
D. usefulness

Outcome: Identifies the order of events.

What is the first step in constructing an achievement test?
A. Decide on test length.
B. Identify the intended learning outcomes.*
C. Prepare a table of specifications.
D. Select the item types to use.

2-Comprehension
Outcome: Identifies an example of a concept or principle.

Which of the following is an example of a criterion-referenced interpretation?
A. Derik earned the highest score in science.
B. Erik completed his experiment faster than his classmates.
C. Edna’s test score was higher than 50 percent of the class.
D. Tricia set up her laboratory equipment in five minutes.*

3-Application
Outcome: Distinguishes between properly and improperly stated outcomes.

Which one of the following learning outcomes is properly stated in terms of student performance?
A. Develops an appreciation of the importance of testing.
B. Explains the purpose of test specifications.*
C. Learns how to write good test items.
D. Realizes the importance of validity.

4-Analysis
Directions: Read the following comments a teacher made about testing. Then answer the questions that follow by circling the letter of the best answer.

“Students go to school to learn, not to take tests. In addition, tests cannot be used to indicate a student’s absolute level of learning. All tests can do is rank students in order of achievement, and this relative ranking is influenced by guessing, bluffing, and the subjective opinions of the teacher doing the scoring. The teaching-learning process would benefit if we did away with tests and depended on student self-evaluation.”

Outcome: Recognizes unstated assumptions.

Which one of the following unstated assumptions is this teacher making?
A. Students go to school to learn.
B. Teachers use essay tests primarily.
C. Tests make no contribution to learning.*
D. Tests do not indicate a student’s absolute level of learning.

Outcome: Identifies the meaning of a term.

Which one of the following types of test is this teacher primarily talking about?
A. Diagnostic test
B. Formative test
C. Pretest
D. Summative test*

5-Synthesis

Given a short story, the student will write a different but plausible ending.

(See the paragraph given for the analysis items.)
Outcome: Identifies relationships.

Which one of the following propositions is most essential to the final conclusion?
A. Effective self-evaluation does not require the use of tests.*
B. Tests place students in rank order only.
C. Test scores are influenced by factors other than achievement.
D. Students do not go to school to take tests.

6-Evaluation

Given a description of a country’s economic system, the student will defend it by basing arguments on principles of socialism.

References:

1. Kubiszyn, T., & Borich, G. (1984). Educational Testing and Measurement: Classroom Application and Practice. Glenview, IL: Scott, Foresman, pp. 53-55.

2. Gronlund, N. E. (1998). Assessment of Student Achievement. Boston: Allyn and Bacon.