EXAMSOFT WHITE PAPER

Design of a Tagged Electronic Database of Exam Questions (TEDEQ) as a Tool for Assessment Management within an Undergraduate Medical Curriculum.

Dr. Dale D. Vandre, Department of Physiology and Cell Biology, College of Medicine, The Ohio State University, Columbus, OH

Eric Ermie, Office of Medical Education, College of Medicine, The Ohio State University, Columbus, OH


Abstract

An aspect of curriculum mapping that is often overlooked is exam blueprinting, which provides a necessary link between instructional content and examination items. Computerized testing not only increases the efficiency and reliability of examination delivery, it also provides an effective tool for exam blueprinting, item banking, and management of examination content and quality. We designed a method to categorize the exam items used in our preclinical medical curriculum with a unique identifying tag, creating a tagged electronic database of exam questions (TEDEQ) within the SofTeach module of the ExamSoft test management system. Utilizing the TEDEQ output, a detailed report of exam performance is now provided to students following completion of each examination.

This enables students to better evaluate their performance in relevant subject areas after each examination and to follow their cumulative performance in subject areas spread longitudinally across the integrated medical curriculum. The same reports can be used by faculty and staff to aid in academic advising and counseling of students. The TEDEQ system provides a flexible tool for curricular management and examination blueprinting that is relatively easy to implement in a medical curriculum. The information retrieved from the TEDEQ enables students and faculty to better evaluate course performance.

Introduction

The importance of assessment in medical education is well established, not only for guiding learning and ensuring competence, but also as a means of providing feedback to students as they negotiate the curriculum. Perhaps the most critical aspect of assessment is the licensure exam, which has a significant impact on determining career options. Assessment of knowledge and evaluation of the acquisition of competencies during undergraduate medical education rely heavily on multiple-choice examinations. Despite the extensive use of multiple-choice questions in medical curricula, comparatively little faculty time is dedicated to constructing the exam relative to the time involved in the design, preparation, and delivery of curricular content.1 Instructors often do not devote sufficient effort to the preparation of questions, and the exam tends to be assembled at the last minute with little or no time for adequate review of the questions or evaluation of the overall balance and quality of the exam as a whole.2 As a result, the quality of in-house multiple-choice questions, especially in the pre-clinical curriculum, may suffer from over-reliance on questions that focus on simple recall and comprehension of knowledge without effectively testing higher-order thinking skills.3,4

Curriculum mapping programs have been designed to facilitate the management of integrated medical school curricula, keeping track of institutional objectives, course objectives, content areas, learning resources, learning events, assessments, and outcomes. Many of these approaches focus on generating a database that includes a taxonomy of subjects or concepts covered in the curriculum with varying degrees of granularity. Examples include databases such as KnowledgeMap,5 Topics for Indexing Medical Education (TIME),6,7 and CurrMIT.8 Documentation of where specific topics are covered in the curriculum is an important component of the accreditation process for medical schools, and curriculum management tools provide an efficient mechanism for addressing accreditation standards.

One component of effective curriculum mapping is exam blueprinting, which is often overlooked in medical education.3,9 A test blueprint links the content delivered during a period of instruction to the items appearing on the corresponding examination, and measures how representative the test items are of the subject matter.10,11 To improve the quality of written examinations, it is essential that the examination questions reflect the content of the curriculum. Test blueprinting is therefore a critical component of improving examination quality, but linking curricular topics with those in the exam questions is not sufficient to ensure that the examination has content validity.9,12 In addition to measuring whether the examination adequately represents the learning objectives, content validity ensures that the examination is comprehensive and neither biased toward nor under-samples portions of the curriculum. Moreover, test blueprinting ensures that the questions are balanced with regard to degree of difficulty, that the items are clearly written and free of format flaws, and that the examination measures higher-order thinking skills and not just factual recall.1,3,9,13 Therefore, additional information beyond that provided by a subject taxonomy is required of individual test questions in order to effectively determine the content validity of assessments.

The introduction of computerized testing to medical education provides an opportunity to increase the efficiency and reliability of the assessment process. No difference was found in the performance of medical students on computer-based exams compared with paper-and-pencil examinations.14 In addition to delivery of examinations by computer, programs designed to maintain a database of exam questions, or item bank, from which examinations could be assembled were described nearly 30 years ago.15 However, testing software must be extremely flexible to meet the demands of a medical school curriculum. In addition to facilitating item banking, the software must be user friendly, collect item statistics, provide immediate scoring feedback, present items using various multimedia formats, and deliver the examination in a secure mode. Because of these various demands, suitable commercial software products were unavailable until recently, and as a result medical schools that were early adopters of computerized testing developed in-house solutions such as the ItemBanker program developed at the University of Iowa.16 In the ItemBanker system, each exam question is identified by a unique serial number, and the database provides a breakdown of question topic taxonomy along with statistics on performance and item difficulty.

On-line administration of licensure examinations is becoming more commonplace in professional education. The United States Medical Licensing Examination (USMLE) Step 1, Step 2, and Step 3 tests have been delivered in a computerized format since 1998, and the National Board of Medical Examiners provides an increasing number of examinations in an on-line format. Similarly, the bar examination is used to determine whether a student is qualified to practice law in a given jurisdiction. Unlike the USMLE, which is a national examination, bar examinations are administered by each state in the United States, and 38 states currently use ExamSoft Worldwide, Inc., as the provider of secure on-line computer-based testing software for the administration of the state bar examination.17 Based upon this utilization, we evaluated ExamSoft among other commercial testing software products for the administration of examinations to pre-clinical medical students, and adopted ExamSoft for the administration of multiple-choice examinations in 2009.

The construction of a well-written exam is required to effectively measure student competencies throughout the curriculum. Ultimately, exam performance is used to assess the success of the educational program in preparing the student for licensure exams and more advanced training. Therefore, a quality exam must not only measure the student's application of knowledge; it is also essential that the questions adequately reflect the course content and objectives. We describe the development of a tool to generate a tagged electronic database of exam questions (TEDEQ) that can be used for the categorization of multiple-choice examination questions. TEDEQ provides the information necessary to link existing questions to curricular objectives using a taxonomy of instructional objectives, identifies question characteristics, and helps ascertain the level of knowledge required to answer the question. The TEDEQ tool is easy to implement and integrate into existing curricula and can be customized using the SofTeach module of the ExamSoft test management system to derive the maximum amount of information from assessments. We have integrated the TEDEQ tool into the computerized administration of our exams using ExamSoft, and the information is being used to help improve examination quality, provide input into curricular management, supplement curricular mapping documentation for accreditation, and make content-area-specific performance feedback available to our students.

Methods

The preclinical Integrated Pathway (IP) program at The Ohio State University College of Medicine is broken down into organ system blocks, which are subdivided into divisions ranging in length from three to five weeks. At the end of each division, an examination of 100-125 multiple-choice questions is administered to assess whether the outlined learning objectives have been achieved. We have amassed an item bank of over 3,500 multiple-choice questions distributed across more than 22 exams during the Med 1 and Med 2 years of the IP curriculum. Previously, little or no information was gathered linking course learning objectives with the items included on the examination; rather, test items were simply grouped according to the division test of which they were part. Additionally, the only performance feedback provided to students, administrators, or faculty following an examination was the overall test score and an item analysis of each exam question.
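For readers unfamiliar with the term, the item analysis referred to here consists of classical test statistics computed per question. The Python sketch below illustrates two common measures, a difficulty index and an upper/lower-group discrimination index; the specific statistics reported by ExamSoft are not detailed in this paper, so the function and its inputs are illustrative assumptions only.

```python
def item_analysis(responses, total_scores):
    """Classical item analysis for one exam question (illustrative sketch).

    responses: list of 0/1 values, one per student, for this item.
    total_scores: the same students' overall exam scores, in the same order.
    """
    n = len(responses)
    # Difficulty index: proportion of students answering the item correctly.
    difficulty = sum(responses) / n
    # Discrimination index: difference in proportion correct between the
    # top and bottom 27% of students ranked by overall exam score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    k = max(1, round(0.27 * n))
    low = sum(responses[i] for i in order[:k]) / k
    high = sum(responses[i] for i in order[-k:]) / k
    return {"difficulty": difficulty, "discrimination": high - low}
```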

We set out to design a system that would provide more thorough feedback to students regarding their performance in content-specific areas of a particular exam as well as across longitudinal topics that span the curriculum. In addition, we wanted to generate information that would help faculty guide curricular management and improve examination quality, and give administrators an additional measure by which to compare internal course performance against student performance on the USMLE Step 1 exam. To accomplish these goals we developed a simple coding system for each question, the Tagged Electronic Database of Exam Questions, which utilizes specific data markers to categorize, organize, and track the use of items. We accomplished this using features of the question categorization tool within ExamSoft, the software currently used for secure computerized delivery of examinations within the IP program.

While the question categorization system contained within the ExamSoft software allowed for an unlimited number of categories to be assigned to each question, one of our design objectives was to limit both the number of fields applied to each question and the granularity of the topic categories in order to obtain the most meaningful data. For students, useful information would include performance feedback in subject areas of the curriculum, a study guide indicating areas of deficiency for those requiring remediation, and an aid in planning their USMLE Step 1 preparation. Further, these limitations allowed us to create a system that was not overly complex or difficult for faculty to implement. This structure ensured greater faculty buy-in without compromising the impact of the information collected, which was required to meet our goals with regard to curricular management and test construction.

The method used for item tagging/question categorization consisted of six categories, as outlined in Table 1. Each of these categories provides distinct information whose significance differs depending on the recipient audience. Categories one and two associate the exam question with the block and division of the IP curriculum and link it to the faculty member responsible for writing the item. In most cases, the identified faculty member is also responsible for the design and delivery of the learning materials used to meet the learning objective the question is designed to assess. The tagging system therefore links the question to the appropriate content expert if questions or issues arise over the validity or accuracy of the item. The "options" for each of the first two categories are simply the names of the blocks in the curriculum and the names of the faculty members.

Table 1 contains a detailed list of all the categorization options within the first five categories of the TEDEQ system.  This table is distributed to faculty for use in categorizing questions.

Table 1 – Design of the Tagged Electronic Database of Exam Questions (TEDEQ) Categories
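As a concrete illustration of the six-field structure summarized in Table 1, the following Python sketch shows one way an item's TEDEQ tag could be represented in code. The class name, field names, and example option values in the comments are illustrative assumptions rather than the actual option lists distributed to faculty.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TedeqTag:
    """Illustrative container for the six TEDEQ categories of one exam item."""
    block_division: str              # Category 1: block/division, e.g. "Neuro1"
    author: str                      # Category 2: faculty item writer, e.g. "Smith"
    cognitive_level: int             # Category 3: 1 recall, 2 interpretation/analysis,
                                     #             3 basic science vignette, 4 clinical vignette
    process_focus: int               # Category 4: normal process, abnormal process,
                                     #             therapeutics, or gender/ethnic/behavioral
    subject_areas: List[str] = field(default_factory=list)   # Category 5: up to three of the 20 USMLE subject areas
    sub_categories: List[str] = field(default_factory=list)  # Category 6: one sub-category per chosen subject area
```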

For category three we defined four component choices that would classify the question type with regard to level of cognitive complexity: 1) Recall of Factual Knowledge - memory recall/factoid questions; 2) Interpretation or Analysis of Information - questions that require the interpretation of data from a table or graph and use of that information to answer the question; 3) Application: Basic Science Vignette - questions pertaining to foundational science that contain a vignette of patient information and require multiple steps of knowledge application to deduce the correct answer; and 4) Application: Clinical Science Vignette - questions pertaining to clinical science that contain a vignette of patient information and require multiple steps of knowledge application to deduce the correct answer. To code the question, a number was assigned to each category component. For example, a recall question generated by Dr. Smith in the first division of the Neuroscience block would be coded Neuro1.Smith.1. Subsequent categories were assigned numeric values and added to the end of the code sequentially, separated by periods.
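A minimal sketch, in Python, of how such a code string might be assembled. The Neuro1.Smith.1 prefix follows the example in the text; the function name and the exact ordering of the remaining numeric components (categories four through six) are assumptions made for illustration.

```python
def build_tedeq_code(block_division, author, cognitive_level, process_focus,
                     subject_areas, sub_categories):
    """Assemble a period-delimited TEDEQ code for one exam item.

    The first three components reproduce the example from the text (a recall
    question written by Dr. Smith in the first Neuroscience division begins
    "Neuro1.Smith.1"); appending the later categories as numbers separated
    by periods follows the description above, although their exact order in
    the code is an assumption.
    """
    parts = [block_division, author, str(cognitive_level), str(process_focus)]
    # Pair each subject area (category 5) with its required sub-category (category 6).
    for subject, sub in zip(subject_areas, sub_categories):
        parts.extend([str(subject), str(sub)])
    return ".".join(parts)

# Example: a recall question on a normal process, tagged to subject area 5,
# sub-category 12 (numeric codes are illustrative).
print(build_tedeq_code("Neuro1", "Smith", 1, 1, [5], [12]))
# Neuro1.Smith.1.1.5.12
```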

Categories four, five, and six are used to map the question to content areas. The categorization was designed to meet the components of our curriculum, but it is flexible enough to be applied to other medical curricula as well. To make the tagging system as widely applicable as possible, we focused on the subject categories of the USMLE Step 1 exam.18 We first tag the question into broad categories with regard to process and focus: normal process, abnormal process, therapeutics, or gender, ethnic, and behavioral considerations.

All medical schools whose students take the USMLE Step 1 exam receive a report from the USMLE that breaks down the performance of their students in 20 specific categories, in comparison to the national average for each category. These are the same 20 categories into which content is divided in the USMLE Step 1 study guide provided to medical students for exam preparation. Since these major subject areas are comparable to those used either in the block design of our curriculum or as longitudinal subject areas that run across the IP curriculum, they were adopted as the 20 subject areas for category five. The study guide also contains specific sub-topics for each of the 20 subject areas. We reviewed the sub-topics from the USMLE Step 1 study guide and compared them with an internal set of learning objectives and topics used within our curriculum. These two lists were combined and modified as necessary to create the sub-categories that comprise category six. A sampling of those sub-categories is presented in Table 2. A set of sub-categories was created for, and matched to, each of the 20 subject areas of category five.

Table 2 contains a sampling of the sub-categories used in the TEDEQ system.  In total there are 290 sub-categories within the system that are associated with one of the 20 USMLE subject areas.  Faculty were required to select one sub-category for each subject area they associated with a question.

Faculty buy-in was key to the successful implementation of the system. As such, we met with the faculty members who serve as block leaders within the curriculum to explain the categorization system, how it worked, and what it could do for them. We provided all block leaders with instructions detailing the guidelines for applying the categorization to their exams. Faculty members were instructed that they could assign only one option from each of categories one through four. A question could be assigned up to three subject categories (category five); however, for each designation in category five a corresponding sub-category designation in category six was required. The number of sub-categories per subject category ranged from 5 to 25.
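These assignment rules are simple enough to check mechanically. The sketch below assumes the hypothetical TedeqTag structure shown earlier and a hypothetical SUBCATEGORIES mapping from subject area to its allowed sub-categories; neither the function name nor the example entries come from the actual TEDEQ option lists.

```python
# Hypothetical mapping from each USMLE subject area (category 5) to its
# allowed sub-categories (category 6). In the real system there are 290
# sub-categories in total, with 5 to 25 per subject area.
SUBCATEGORIES = {
    "Pharmacology": ["Autonomic drugs", "Antimicrobial agents", "CNS drugs"],
    "Nervous System": ["Cranial nerves", "Spinal cord", "Neurotransmission"],
    # ... remaining subject areas omitted for brevity
}

def validate_tag(tag):
    """Return a list of rule violations for a TedeqTag-like object (empty if valid)."""
    errors = []
    # Categories one through four: exactly one option each.
    for name in ("block_division", "author", "cognitive_level", "process_focus"):
        if not getattr(tag, name):
            errors.append(f"category '{name}' requires exactly one option")
    # Category five: one to three subject areas per question.
    if not 1 <= len(tag.subject_areas) <= 3:
        errors.append("a question takes one to three subject areas (category five)")
    # Category six: one sub-category per chosen subject area, drawn from
    # that subject area's own sub-category list.
    if len(tag.sub_categories) != len(tag.subject_areas):
        errors.append("each subject area needs a corresponding sub-category")
    else:
        for subject, sub in zip(tag.subject_areas, tag.sub_categories):
            if sub not in SUBCATEGORIES.get(subject, []):
                errors.append(f"'{sub}' is not a sub-category of '{subject}'")
    return errors
```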

Rather than attempt to force a curriculum-wide application of the TEDEQ system all at once, we chose to work the application into the existing framework of the exam process. Therefore, categorization was required for each examination as it was generated during the academic year. As part of normal preparation and revision, application of the TEDEQ categorization was added to the examination development process. A report was generated using ExamSoft that served as a template for faculty members to review all questions appearing on the exam. The report included the item analysis for each question (if available from previous assessment records), and a section was provided for assignment of the TEDEQ code to each question (Figure 1). The TEDEQ database was generated based upon the codes assigned to each exam item. The database could be used retroactively, since any question used on a previous assessment in ExamSoft would be recognized and assigned TEDEQ categories. This allowed second-year medical students to review category performance on previous exams from their first year of medical school.
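The retroactive use of the database follows from the fact that tags attach to the item itself rather than to a particular exam sitting. The following Python sketch illustrates the idea conceptually; it is keyed on a hypothetical item identifier and does not represent ExamSoft's actual data model or API.

```python
def apply_tags_retroactively(item_tags, past_results):
    """Attach TEDEQ codes to historical response records by item identifier.

    item_tags: {item_id: tedeq_code} for items that have now been categorized.
    past_results: iterable of dicts from earlier assessments, each with at
    least 'item_id', 'student_id', and 'correct'. Records whose items have
    since been tagged inherit the tag; untagged items pass through with a
    code of None. Both record layouts are illustrative assumptions.
    """
    tagged = []
    for record in past_results:
        code = item_tags.get(record["item_id"])
        tagged.append({**record, "tedeq_code": code})
    return tagged
```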

Figure 1 - Sample Question with TEDEQ Categorization

Figure 1 contains an example from the report faculty members use to categorize exam questions.  It displays the item analysis of the question (from use on the previous year’s exam), the question text, the image (if any) associated with the question and the categories associated with each question.

Results

The TEDEQ method has been implemented successfully across the Med 1 and Med 2 IP curriculum for the 2010-2011 academic year. It has been well received by both faculty and students, and we have successfully gathered data on all of our division assessments using this system. While data collection is continuing, there have been some immediate applications of the information provided by TEDEQ. As a feature of the ExamSoft software, students can instantly view a breakdown report of their individual performance in every category applied to exam items (Table 3).

Table 3 - Sample Exam Performance Breakdown

Table 3 is a sample of the TEDEQ report generated following each exam, which breaks down the performance of each student on that exam. The same report is also generated for faculty and staff; however, it also contains the breakdown of overall class performance for comparison purposes.

Students can access these reports at any time after an exam by logging into a website, and can view and compare their results from exam to exam throughout the course of the year. Course leaders receive an identical report, but the results represent an aggregate for the class as a whole rather than an individual student. An individual student's performance data can also be readily plotted against the class average for any coded category, which serves as an additional aid for students in evaluating their academic success in the IP program (Figure 2). For example, a student is able to evaluate their performance on recall questions in comparison to clinical vignettes, or in particular subject areas such as pharmacology. This information can be provided to the student for each exam, as shown in Figure 2, as well as a cumulative average across exams as the student progresses through the curriculum. These individual student performance reports (Table 3 and Figure 2) can also be generated for faculty and staff review as necessary for academic advising.
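The underlying computation is a straightforward aggregation of item results by category. The Python sketch below shows one way such a per-category breakdown could be produced for a single student or for the whole class; the record layout is an assumption for illustration and does not represent ExamSoft's reporting internals.

```python
from collections import defaultdict

def category_breakdown(tagged_results, student_id=None):
    """Percent correct by TEDEQ category, for one student or for the class.

    tagged_results: iterable of dicts with 'student_id', 'correct' (0/1 or bool),
    and 'categories' (the TEDEQ categories applied to that exam item).
    Pass student_id=None to aggregate over every student, mirroring the
    class-level report that course leaders receive.
    """
    correct = defaultdict(int)
    attempted = defaultdict(int)
    for record in tagged_results:
        if student_id is not None and record["student_id"] != student_id:
            continue
        for category in record["categories"]:
            attempted[category] += 1
            correct[category] += int(record["correct"])
    return {c: 100.0 * correct[c] / attempted[c] for c in attempted}

# An individual breakdown can then be plotted against the class average,
# as in Figure 2:
# student = category_breakdown(results, student_id="A123")
# class_avg = category_breakdown(results)
```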

Figure 2 - Sample Student Performance Analysis

Figure 2 compares an individual student's performance in four of the six TEDEQ categories with the performance of the whole class. This report is used by students and faculty to identify areas of strength and weakness in student performance.

Discussion

The TEDEQ method we developed for creating a linked database of information relating exam items to curricular content was implemented with several specific applications in mind. At the time of development, we did not have the ability either to accurately monitor how well examination content addressed instructional objectives or to track student performance on specific longitudinal foundational science subjects that cross blocks of the curriculum and appear on multiple assessments. Therefore, the initial goal was to create a centralized repository of information that allows for both improved quality control of assessments and more detailed tracking of student performance in selected subject areas that span the curriculum. The preclinical IP program is organized around organ systems, and integrates normal structure and function with pathophysiology and clinical aspects of disease.

The TEDEQ coding system begins by identifying the temporal location and source of each question within the curriculum. This is followed by two broad categories that define general properties of each question with respect to the cognitive skills and process the item is addressing. Each item is then assigned to a subject area corresponding to the classification of topics used by the USMLE both to guide student preparation for the Step 1 licensure examination and to break down student performance. The final level of granularity in the tagging code indicates the most relevant sub-categories within each subject area based upon curricular learning objectives. In many, but not all, cases this final level of classification is directly related to the categories also defined by the USMLE for Step 1. Utilizing a coding system that closely aligns with the USMLE Step 1 categories not only allows for the collection of the information necessary to link internal learning objectives with assessment content, but also provides an opportunity to directly compare and analyze student performance in discipline- or subject-specific components of the integrated curriculum with performance on the Step 1 examination. This makes it possible to identify areas of curricular content that excel in preparing students for the licensure examinations, as well as those areas that may need attention and improvement in order to better prepare students.
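Because category five reuses the same 20 subject areas as the school-level Step 1 performance report, internal and Step 1 results can be joined directly by subject area. A minimal Python sketch of that alignment follows; the dictionary formats are hypothetical, since the actual Step 1 report layout is not described here.

```python
def compare_with_step1(curriculum_by_subject, step1_by_subject):
    """Pair internal TEDEQ subject-area scores with Step 1 report scores.

    Both arguments are hypothetical {subject_area: percent_correct} dicts
    keyed by the shared 20 USMLE subject areas. Only subject areas present
    in both reports are compared.
    """
    shared = curriculum_by_subject.keys() & step1_by_subject.keys()
    return {
        subject: {
            "curriculum": curriculum_by_subject[subject],
            "step1": step1_by_subject[subject],
            "difference": step1_by_subject[subject] - curriculum_by_subject[subject],
        }
        for subject in shared
    }
```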

The TEDEQ database provides a critical component necessary to blueprint the medical curriculum, namely an exam blueprint,3,9 and the corresponding content validity.10,11 Another important benefit of aligning content between the curriculum and the examination is that it gives the assessment greater relevance.19 For example, the feedback report TEDEQ enables us to generate provides the student with immediate information relevant to their success in the curriculum. In addition, these performance reports allow the student learner to more easily visualize how the knowledge base builds upon itself as they progress through the curriculum, especially in longitudinal subject areas, providing academic relevance. Since the TEDEQ subject breakdown relates current performance in the curriculum to topics students will encounter in future licensure examinations, the report provides an additional tool the student can use to gauge readiness and develop a study plan for USMLE Step 1 preparation. Thus the current curricular examination gains future relevance to the student. Together, the feedback reports generated by TEDEQ contribute to giving the assessment process greater authentic relevance.19

The TEDEQ reports have already been used by the teaching faculty to identify selected areas of curricular content in which students are not performing as well as expected on assessments. For example, we have identified sub-categories within the gross anatomy content indicating that students are having difficulty with specific anatomical regions. Having identified these topics, anatomy faculty are designing additional e-learning objects targeting them that will be available to the incoming class of students to supplement the material currently presented in the gross anatomy component of the curriculum.

In summary, the TEDEQ database provides a powerful tool for curricular management that is easy for faculty to implement. The system was developed within the ExamSoft software, which has been used to deliver computerized medical examinations at our institution for the past two years. Information provided by the TEDEQ database allows for exam blueprinting, which serves as a source of additional information necessary to meet accreditation guidelines for curricular content management. The exam item breakdown reports provide useful performance feedback to students as well as faculty instructors. The feedback information can be used to guide student remediation and study habits and to direct curricular modification. In the future, we plan to use the TEDEQ database to guide the design and assembly of higher-quality examinations within the preclinical medical curriculum.

References

1. Wallach PM, Crespo LM, Holtzman KZ, Galbraith RM, Swanson DB. Use of a committee review process to improve the quality of course examinations. Adv Health Sci Educ 2006;11:61-8.

2. Jozefowicz RF, Koeppen BM, Case S, Galbraith R, Swanson D, Glew RH. The quality of in-house medical school examinations. Acad Med 2002;77:156-61.

3. Hamdy H. Blueprinting for the assessment of health care professionals. Clin Teach 2006;3:175-9.

4. Chandratilake MN, Davis MH, Ponnamperuma G. Evaluating and designing assessments for medical education: the utility formula. Internet J Med Educ 2010;1:1-9.

5. Denny JC, Smithers JD, Armstrong B, Spickard III A. “Where do we teach what?” Finding broad concepts in the medical school curriculum. J Gen Intern Med 2005;20:943-6.

6. Willett TG, Marshall KC, Broudo M, Clarke M. TIME as a generic index for outcome-based medical education. Med Teach 2007;29:655-9.

7. Willett TG, Marshall KC, Broudo M, Clarke M. It’s about TIME: a general-purpose taxonomy of subjects in medical education. Med Educ 2008;42:432-8.

8. Salas AA, Anderson MB, LaCourse L, Allen R, Candler CS, Cameron T, Lafferty D. CurrMIT: a tool for managing medical school curricula. Acad Med 2003;78:275-9.

9. Bridge PD, Musial J, Frank R, Roe T, Sawilowsky S. Measurement practices: methods for developing content-valid student examinations. Med Teach 2003;25:414-21.

10. McLaughlin K, Coderre S, Woloschuk W, Mandin H. Does blueprint publication affect students' perception of validity of the evaluation process? Adv Health Sci Educ 2005;10:15-22.

11. Coderre S, Woloschuk W, McLaughlin K. Twelve tips for blueprinting. Med Teach 2009;31:322-4.

12. Lynn MR. Determination and quantification of content validity. Nurs Res 1986;35:382-5.

13. Yaghmale F. Content validity and its estimation. J Med Educ 2003;3:25-7.

14. Kies SM, Williams BD, Freund GC. Gender plays no role in student ability to perform on computer-based examinations. BMC Med Educ 2006;6:57.

15. Hall JR, Weitz FI. Question database and program for generation of examinations in national board of medical examiner format. Proc Annu Symp Comput Appl Med Care 1983;26:454-6.

16. Peterson MW, Gordon J, Elliott S, Kreiter C. Computer-based testing: initial report of extensive use in a medical school curriculum. Teach Learn Med 2004;16:51-9.

17. ExamSoft [http://examsoft.com/main/index.php?option=com_content&view=article&id=33&Itemid=7#NEWS12].

18. United States Medical Licensing Examination Website [http://www.usmle.org/examinations/step1/2011step1.pdf].

19. D’Eon M, Crawford R. The elusive content of the medical-school curriculum: a method to the madness. Med Teach 2005;27:699-703.
