Using simulation in medical student assessment
www.wiser.pitt.edu
WR McIvor, MD Associate Professor of Anesthesiology
Associate Director of WISER for Medical Student Simulation Education
I don’t use sim to determine med student proficiency
• 12 yrs experience
• “Teach” medical students, and presume my efforts improve their KSAs (knowledge, skills, and attitudes)
• “Day 1” MS III course – 90 minutes long, from 7:30 - 9:00 am
– Goal: KSA around “Do you want to intubate my patient?”
– Value of keeping students in the sim lab who are grossly incompetent?
Factors driving assessment
• Public accountability1
• LCME2 – Educational programs must provide a general professional education that prepares students for all career options in medicine, and cite relevant outcomes indicating success in that preparation.
– Ensure students have acquired core clinical skills.
• Performance (vs time) criterion for advancement3
1 Crossing the Quality Chasm: A New Health System for the 21st Century. IOM, 2001
2 http://www.lcme.org/selfstudyguide1011.pdf
3 Educating Physicians: A Call for Reform of Medical School and Residency
Advantages of sim-based assessment
• Reproducible
• Realistic
• Safe for patients
• Assess ability across many medical and surgical scenarios
Challenges
• What should we expect of a trainee?
• How hard is this scenario?
• Limitations of what can be simulated
• A number of scenarios (4–8) are required in order to get an accurate assessment
– Necessitates short experiences
– Time validity?
• Clear understanding of what we are seeking to measure
– Knowledge
– Procedural skill
– Decision making
– Communication
– Professionalism
Simulation used to assess medical students: USMLE Step 2 CS
• Uses standardized patients (SPs) to assess:
– Communication skills
– Diagnostic skills
– Interpersonal skills
– Documentation ability
– English proficiency
• Pass/fail exam
CS test characteristics
• Utilizes a method that has 35 years of history
• The cases (12) all have the same difficulty
• Very specific instructions
– Trust the VS, unless you don’t think you should
– Do a focused, not necessarily complete, physical exam
– Some physical findings will be real, some simulated (suspend disbelief)
– Genital/rectal/pelvic simulators are used for those exams
• Only performed in Philadelphia
• Schools (certainly Pitt) rehearse this test
1 http://www.usmle.org/Examinations/step2/cs/content/description.html
Mannequin simulator limitations
• Some things the simulators do not model well
– Cyanosis
– Sweating
– Respiratory distress
• Airway problems tend to be all or nothing
– Can’t have a moderately difficult intubation
• Time issues
– Students give drugs, or mask ventilate, and expect an instantaneous change in VS
– Sometimes administer several drugs at once, which produces conflicting responses
• The frequency of simulators crashing
Key areas of human-patient simulation (HPS) assessment1
1. Defining the skills to be assessed
• Choosing appropriate sim tasks
• Appropriate simulators
2. Establishing appropriate metrics
3. Determining the source of error in measurements
4. Evidence of the validity of test scores
1Anesthesiology 2010; 112:1041–52
1. Defining the skills to be measured and choosing the correct simulation
• The assessment needs:
– A defined purpose
– Delineation of the knowledge and skills evaluated
– A context for performance-based activities
• Targeted to the examinee’s ability
• Choose scenarios based upon:
– Competency guidelines
– Curriculum information
– Simulation capabilities
2. Developing appropriate metrics – Do the scores reflect actual ability?
• Implicit and explicit scoring
– Explicit: checklists or key actions
• Established by content experts, informed by experience and practice guidelines
• Advantages: logical, objective scoring, modest reproducibility
• Disadvantages: subjectively constructed; reward a scripted approach and “shotgun” performance; do not consider the order in which actions are taken
– Implicit: entire performance is rated as a whole (“global assessment”)
• Applied to teamwork/communication assessment
• Often require multiple well-trained raters
• Typically scored retrospectively
• How to assess varying performance over time?
• “Patient” (simulator) outcome
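As a concrete illustration of explicit scoring, here is a minimal checklist-scoring sketch. All action names and weights are invented for illustration, not taken from any actual scenario; it also shows the limitation of checklists: action order is ignored, so a "shotgun" performance earns the same credit as a reasoned one.

```python
# Hypothetical key-action checklist for an acute-care scenario.
# Names and weights are invented for illustration only.
KEY_ACTIONS = {
    "check_blood_pressure": 2,
    "give_fluid_bolus": 2,
    "call_for_help": 1,
    "start_vasopressor": 1,
}

def checklist_score(observed_actions):
    """Fraction of available checklist credit the examinee earned.
    Note: order of actions is deliberately ignored, as in a simple checklist."""
    done = set(observed_actions)
    earned = sum(w for action, w in KEY_ACTIONS.items() if action in done)
    return earned / sum(KEY_ACTIONS.values())

# Two very different performances score identically, since order is ignored:
reasoned = ["check_blood_pressure", "give_fluid_bolus", "call_for_help"]
shotgun = ["call_for_help", "give_fluid_bolus", "check_blood_pressure"]
print(checklist_score(reasoned), checklist_score(shotgun))  # both 5/6
```

Both performances earn 5 of 6 possible points, illustrating why checklist scores reward scripted behavior and say nothing about clinical reasoning or sequencing.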
3. Test score reliability
• Generalizability (G) studies are conducted to identify sources of error (score inconsistency) and their interactions
• Decision (D) studies are then conducted to determine the optimal scoring design
– How many simulations and raters are necessary for reliable scores, given the construct being assessed?
• Task sampling variance has a greater impact on assessment than the rater effect
– Participants can do a great job treating hypotension and a poor job with hypoxia
– Need more sim scenarios (not more raters) to improve reliability
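The scenarios-versus-raters trade-off can be sketched with the standard generalizability-theory formula for a fully crossed person × case × rater design. The variance components below are invented for illustration (not taken from the cited study), chosen so that the person × case component dominates, as the slide describes.

```python
# Hypothetical D-study sketch: project score reliability as the number of
# cases or raters grows. Variance components are invented for illustration.

def g_coefficient(var_p, var_pc, var_pr, var_res, n_cases, n_raters):
    """Generalizability coefficient for a fully crossed person x case x rater
    design: person (true-score) variance over itself plus averaged error."""
    error = (var_pc / n_cases
             + var_pr / n_raters
             + var_res / (n_cases * n_raters))
    return var_p / (var_p + error)

# Invented components: person-by-case variance dwarfs person-by-rater variance.
VAR_P, VAR_PC, VAR_PR, VAR_RES = 1.0, 2.0, 0.1, 0.5

# Adding cases raises projected reliability substantially...
for n_cases in (4, 6, 8, 12):
    g = g_coefficient(VAR_P, VAR_PC, VAR_PR, VAR_RES, n_cases, 2)
    print(f"{n_cases} cases, 2 raters: {g:.3f}")

# ...while doubling raters barely moves it.
print(f"6 cases, 4 raters: {g_coefficient(VAR_P, VAR_PC, VAR_PR, VAR_RES, 6, 4):.3f}")
```

With these assumed components, going from 4 to 12 cases raises the projected coefficient from about 0.62 to 0.81, while doubling raters at 6 cases adds only about 0.02 — mirroring the finding that scenario sampling, not rater agreement, limits reliability.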
4. Validity of test scores – What inferences can be made from the assessment scores?
• Content validity:
– Base simulations on actual occurrences/practice characteristics
– Base scoring rubrics on evidence
– Stakeholder feedback
– Realistic modeling using real-world equipment
• Internal consistency
– Good proceduralists are likely good communicators
• Criterion validity
– Sim performance correlates positively with experience and test scores (e.g., board scores)
• A competency threshold (“cut score”) must be determined
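A minimal sketch of a criterion-validity check, using invented scores: compute the Pearson correlation between sim performance and an external criterion such as board scores. A strong positive correlation would support criterion validity; all the data here are fabricated purely to show the computation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented example data: per-examinee sim checklist scores and board scores.
sim_scores = [0.55, 0.62, 0.70, 0.74, 0.81, 0.90]
board_scores = [205, 214, 221, 218, 232, 241]

print(round(pearson_r(sim_scores, board_scores), 2))  # prints 0.97
```

In a real validity study the same computation would be run against actual examinee data, and the resulting correlation reported alongside the cut-score analysis.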
Experience with mannequin-based sim assessment1
• Med school grads are expected to manage acute care scenarios
– These scenarios can’t be modeled with SPs
– Knowledge (cognitive tests) may not be sufficient to assess management skills
– Looked to HPS for a testing platform
• Had MS IVs and interns perform 6 of 10 scenarios
1Anesthesiology 2003; 99:1270–80
Results
• Interns were more proficient than MS IVs
• Variance in student/resident scores was attributable to case content
• To improve the precision of the assessment, increase the number of cases performed
• Increasing the number of raters would not improve reliability
– Agreement among raters about key elements was established during scenario development
Results
– Based scoring on specific diagnostic and treatment guidelines
– Brief scenarios
– Evaluated technical, not non-technical skills
• Participants with ACLS/PALS certification and CCM experience performed better
Conclusions
• The rater facet did not impact overall reproducibility
– Scenarios had a high degree of content validity (performance objectives established by experts)
– Well-defined scoring rubrics
• Person × case variance was large
– The number of cases is the most important factor affecting the reliability of this assessment
• Clinical experience correlated with better performance
• HPS can be used to evaluate clinical performance in med students and residents
To be an effective assessment tool, participants must be familiar with HPS
• More penetration of HPS into med school curricula
• ACGME statement that anesthesia residency programs use simulation yearly
• MOCA’s HPS requirement
• HPS is being studied as an evaluation instrument
• HPS will become commonplace in the next few years, therefore…