Understanding differential attainment across medical training
pathways: A rapid review of the literature
Final report prepared for The General
Medical Council
Dr Sam Regan de Bere, Dr Suzanne Nunn, Dr Mona Nasser
21/08/2015
Funded by the General Medical Council.
The views expressed in this report are those of the participants
and the authors and do not
necessarily reflect those of the General Medical Council.
Executive Summary
Current narratives of differential attainment
1. Introduction
2 Background
4 Methodology
5.2 Search strategy
5.4 Quality assurance
6 Research Ethics
7 Data Analysis
8 Narrative synthesis
Study habits
Success
Ethnicity
IMG
Language
Gender
Mentoring
Selection
PMQ
Examiner bias
10.3 Possible interventions
12 References
13 Appendices
Appendix 1: Studies and other documents included in the synthesis
Appendix 2: Quality evaluation of studies using primary data
Table of Figures
Figure 1. Flow diagram of study selection
Figure 2. Analysis of included studies and documents by methodology or type
Figure 3. Publication by date
Figure 4. Conceptual map of themes identified in the published literature
Table of Abbreviations and Acronyms
AoMRC Academy of Medical Royal Colleges
BME Black and minority ethnic
BAPIO British Association of Physicians of Indian Origin
CSA Clinical skills assessment
GMC General Medical Council
IELTS International English Language Testing System
IMG International medical graduates
MRCOG Member of the Royal College of Obstetricians and
Gynaecologists (examination)
MRCPsych Member of the Royal College of Psychiatrists
MCQ Multiple choice question
OSCE Objective Structured Clinical Examinations
PMQ Primary medical qualification
RCA Royal College of Anaesthetists
RCGP Royal College of General Practitioners
RCOG Royal College of Obstetricians and Gynaecologists
USMLE United States Medical Licensing Examination
Executive Summary
Introduction
Differential attainment is a term used to describe the
variations in levels of educational
achievement that occur between different demographic groups
undertaking the same
assessment. Differential attainment has been recognised as a
challenge for medical
professionals and educators since the 1990s, and has been observed
in both undergraduate
and postgraduate contexts. It is not specific to medical education;
it is a feature of
professional education more generally.
Since 2010 the GMC has worked with others analysing data in order
to better understand
the progress of trainees through their programmes and to identify
any potential differences
between demographic groups. This rapid review of literature
published in the period
between 2004 and the present day contributes to a wider programme
of research being
carried out by the GMC to explore differential attainment across
training pathways.
Research design
The research was commissioned to provide a rapid
review of the corpus of knowledge
relating to differential attainment. The researchers adopted a
narrative synthesis
methodology in order to explore how contributions to the literature
had sought to define,
measure and explain differential attainment – and therefore to
identify key factors that
might be considered as having an impact upon attainment.
An initial scoping exercise highlighted that the current corpus of
literature comprises
materials in a variety of formats, including: qualitative and
quantitative research reports,
systematic reviews of attainment data patterns, policy documents
and academic papers,
and opinion pieces and editorials.
Narrative synthesis provides a useful framework for accessing and
analysing such diverse
and complex literatures. It lends itself to a ‘storytelling’
approach, by capturing a number of
different insights, evidence bases, theories and position pieces in
context, and presenting
them together as an overarching narrative of differential
attainment. In addition, rather
than imposing a definitive structure or sequential process, which
might preclude certain
significant contributions that do not fit the initial review terms
(1), narrative synthesis
allows researchers to move iteratively within a systematic approach
– picking up on leads to
relevant information throughout the research process.
The search was conducted using PubMed, MEDLINE and PsycINFO
databases, within a
search strategy that included Medical Subject Headings (MeSH) terms
and text-word
searches for maximal retrieval. These searches were supplemented
with further iterative
searching of reference lists, and a grey literature search of
stakeholder websites. The
research team was supplemented by an expert panel, members of which
were selected in
order to provide advice on search terms, to discuss the quality of
the retrieved literature, to
comment on any initial emergent themes and to review the final
report prior to submission
to the GMC.
We developed two frameworks against which to evaluate the retrieved
papers and grey
literature: PICOC (Population, Intervention, Comparison, Outcome,
Context) for quantitative
papers and SPICE (Setting, Perspective, Intervention/phenomena of
Interest, Comparison,
Evaluation) for qualitative papers and other documents. These
frameworks provided
transparency for our identification of included papers and other
documents. A total of 39
papers were included in the synthesis with the addition of 24
documents from the grey
literature.
The literature on differential attainment
The findings of narrative
synthesis are grounded in the literature surveyed. The research
process does not begin with a set of a priori assumptions: instead,
using this method
enables themes to emerge and be recorded as the literature is
identified. The search
process highlighted that the evidence base relating to differential
attainment is disparate,
that it includes a number of different research designs and
variously applied methods, and
that it does not feature definitive terminology across studies.
Concepts and terms are often
used interchangeably and are operationalised in research
accordingly, which makes
constant or consistent comparison difficult to validate.
Overall the peer reviewed literature was of a high quality, where
research aims, objectives,
methods and analyses were clearly articulated and justified. The
main focus of primary
research was on the relationship between ethnicity and differential
attainment in high
stakes examinations. While some studies are focused on
undergraduate populations, some
on postgraduate doctors, and a number include both, we found that
the research questions,
findings and conclusions were nevertheless relevant to
understanding the emerging
narrative of differential attainment in postgraduate cohorts.
Given the limitations in the literature, we read and re-read the
materials selected
individually and then discussed them as a team. During this process we
used conceptual mapping
to help us understand the categories and themes arising from the
entire data set. This
grounded approach led to the emergence of a three-level schema,
providing three distinct
but related categories, or layers, of information on:
• the macro or policy level (investigating the political agendas
and practical activity surrounding high stakes examinations)
• the meso or institutional level (exploring the impact of the
medical school, training contexts and/or working environment)
• the micro or individual/discrete group level (with a focus on
individuals or groups of students, doctors, examiners and so on)
Quantitative studies dominated the research base (26 studies),
focusing on the macro level
and typically using large data sets to examine causal and
associative relationships between
various demographic groups and different high stakes exams. The
focus of the qualitative
research (5 studies) was more diverse, and explored the role of
factors at the micro and
meso levels of infrastructures built to support examinations,
cultural contexts and personal
interactions.
Two large scale commissioned studies in the grey literature
examined the significance of
language and cultural factors for IMGs (2, 3) using mixed methods
approaches, and one
further study combined a literature search with interviews.(4) In
addition to this there was
one systematic review and meta-analysis of ethnicity and
performance in UK trained doctors
and medical students, focusing on quantitative reports. (5)
Investigating assessment agendas and the design of high stakes
examinations
The majority of studies dealing with the macro level focused on
differential attainment in
high stakes exams. The research upon which this aspect of the
literature is based typically
used quantitative methodologies, using large datasets, with a focus
on testing for bias in the
exam, or a component part of it using exam data. Their conclusions
are founded on typically
high quality, peer reviewed reports including clear validity
measures.
Taken as a whole, these studies have broadly demonstrated the
validity of high stakes
exams, and discounted evidence of bias in the nature and structure
of exams themselves as
causal factors for differential attainment. However, the emerging
narrative contains a
recognition that the infrastructures and processes put in place to
support selection (6) and
high stakes exams may nevertheless encompass elements that lead to
actual or implied bias
and/or differential attainment. (7)
Examples of this include: i) potential examiner bias through levels
of concordance between
examiner and candidate in practical examinations, (8-12), and ii)
the lack of a universal
terminology to classify data, which may lead to different
interpretations of bias and/or
differential attainment from the exam data. An example of this is
the variation in the ways
the Royal Colleges monitor for protected characteristics, which has
been identified as a
potential contributor to unfair bias. (13)
The impact of institutional structures and organisational
contexts
Literatures focusing on the nuts and bolts of postgraduate
education, at the level of the
medical school or workplace, highlighted the paucity of
well-developed research into
postgraduate selection. These contributions drew on primary data,
were typically published
in high quality academic sources, and editorial comments were
presented by authors with
research or experience in this area. (14, 15)
In contrast to undergraduate selection, there is little research on
postgraduate selection
processes. In the literature, selection processes were presented as
highly variable,(4, 6)
although it was recognised that a rigorous (or otherwise) selection
process might have
implications for attainment. Best practice selection methods
highlighted in the sample
involved the identification of required competencies and the
development of reliable
assessment methods for them. The narrative suggests that the
application of a validation
process should be used to assess the predictive value of the
selection methods.
Pre-entry advice and proper induction processes were identified
across the international
literature as important factors for IMGs and other students who
gained their PMQ outside
the country they wished to work in. (16, 17) One significant UK
focused study (2) identified
the GMC as having a central role in developing a ‘joined up’
approach to supporting PMQs
and IMGs in addition to individual employers.
Buddying or mentoring was highlighted as a useful approach to
assisting acculturation. A
literature search of PubMed identified mentoring programmes for
undergraduates as having
positive impacts on attainment levels but cautioned that this was
relevant only if such
programmes were based on robust designs and were evaluated to
ensure effectiveness. This
review demonstrated that most research in the area of mentoring to
improve attainment
has been undertaken in the USA. (18)
Understanding the role of the individual or discrete group
The literature pertaining to the individual or discrete groups
suggested that a combination
of factors may be associated with educational performance. These
include: learning styles
and psycho-social factors; demographic characteristics such as
gender and ethnicity; wider
social and cultural capital; language and other, more tacit,
contributors to success. The
literature exploring these factors used both qualitative and
quantitative methodologies and
was generally of a high academic quality whereby methods and
findings were justified
accordingly. Two of the four studies were UK-focused, both
examining undergraduate
medical students (19, 20); the other two were qualitative studies
that focused more narrowly on contributors to success among
specific types of student in the USA and Saudi Arabia. (21, 22)
Numerous studies focused on ethnicity in relation to analysis of
differential attainment at
macro, meso and micro levels. However, whilst this issue dominated
the literature, the
complexity of the term was largely unaddressed. Terms such as IMG
and BME were used
interchangeably and uncritically.
For example, while “BME” is a widely used term in public and
private sector organisations to
incorporate a range of minority communities living in the UK, using
it as an umbrella term to
group together diverse socio-cultural demographics has been
critiqued – but typically this is
not addressed in the sampling or conclusions drawn from the various
studies within the
literature.
Whilst perhaps more obvious, IMG is another umbrella term specific
to medicine that
requires clear definition, for similar reasons. The narrative
emerging from the literature
identifies “IMGs” as being increasingly important to the delivery
of healthcare, but
nevertheless experiencing the inherent difficulties of migration
and acculturation. However,
the specifics of these difficulties, how they might vary, and why
this might be important for
the differential attainment of IMGs are absent from these
discussions.
Similarly, ‘language’ is cited as a predictor of good performance
but it is not proven to be, of
itself, the reason why students and/or doctors fail high stakes
examinations. Moreover, any
sociological or psychological examination of ‘language’ is also
missing, and the concept is
treated as unproblematic in terms of its application as a potential
factor underpinning
attainment.
The key narratives of differential attainment
Following thematic
analysis, narrative analysis was then used to identify any relationships
emerging between and across these themes. As has already been
acknowledged, the
literatures are disparate and disjointed. However, their key
messages are similarly
structured around: i) the potential causes of differential
attainment, ii) the ways in which
differential attainment has been researched and iii) potential
interventions to further our
understanding and help inform strategies going forward.
Understanding causes and relationships
The initial research undertaken into understanding differential
attainment tended to focus
on the analysis of exam data with the aim of validating high stakes
examinations or
identifying bias. There were 5 high quality quantitative studies
included in the analysis. (7,
12, 23-25) The dominant message from these studies was that, while
the reasons for
differential attainment remained unclear, they were likely to be
multifactorial.
The chronological trajectory of the research demonstrates that
research is increasingly
emphasising the importance of educational and social factors in
contributing to
performance. In this area research is frequently qualitative. We
found 8 studies, key among
which were Woolf’s analysis exploring the relevance of stereotype
threat (26) and
Vaughan’s study using social capital theory to understand the role
of networks and social
behaviour (19).
Both of these studies focused on undergraduate medical students,
but provided a way of
analysing differential attainment that bears relevance for
postgraduate patterns. In terms of
studies examining the attainment levels of postgraduate students,
Illing’s and Roberts’
studies were the most extensive in terms of scope and data
analysed. (2, 3)
The general point to draw from this development of research foci in
both undergraduate
and postgraduate fields (and one that suggests we may be best
served by considering both)
is a growing consensus that researchers should not limit their
analysis purely to exam results.
Current thinking acknowledges the requirement to examine the
‘whole’ of the exam: its
support structures (both formal and informal) and features of its
candidature that go
beyond demographics to attitudes and behaviours.
Selection, language and the identification of facilitators, as well
as barriers, are factors that
have been emphasised across a number of studies. In much of the
literature, language is
used as a proxy for communication broadly, which is an umbrella
category incorporating
gesture, pronunciation and intonation etc. This is an important
observation, since
communication skills form part of clinical skills assessments and
these carry with them
implicit cultural assumptions relating to the doctor-patient
dynamic. The message emerging
here is that lack of acculturation will impact on performance and
ultimately attainment,
even if clinical skills are to an expected standard or level of
competency.
The literature also identifies poor induction, lack of support for
IMGs in overcoming the
difficulties inherent to migration, and career change as
factors that may disadvantage
IMGs in becoming better trained and acculturated doctors. A small
number of studies have
highlighted the importance of considering factors that support
higher levels of attainment.
Qualitatively, it is important to note these contributions to the
building narrative: limiting
analysis to why certain individuals or discrete groups might fail
to progress along the career
pathway risks ignoring evidence that identifies why other
individuals or discrete groups
succeed – all of which might help us to understand different levels
of attainment along the
spectrum.
The importance of appropriate research design
For the reasons outlined above, this review included studies
employing different research
methods, the majority of which undertook the quantitative analysis
of primary data. In
order to examine the complex nature of ‘causes’, qualitative
research approaches have
more recently been used to examine complex phenomena embedded in
the culture and
contexts of assessment.
This relatively recent turn to qualitative methodologies to capture
evidence of complexity
adds depth of understanding to the breadth of the quantitative
research literature. Indeed,
the narrative emerging from the more recent contributions to the
literature suggests that
innovative research approaches are required now that complexity is
acknowledged. Specific
recommendations within the literature include: longitudinal
tracking, interdisciplinary
research to provide fresh perspectives, and the development of more
appropriately
sophisticated theoretical frameworks.
A significant issue across the research is the lack of (either)
transparency or consistent
definition around the categories of explanation. While some
contributions acknowledge the
inherent difficulty in defining and categorising, it remains the
case that umbrella categories
like BME and IMG, ethnic group and ethnicity have not been
subjected to full interrogation.
In this sense, the development of suitable interventions to address
the problem of
differential attainment is compromised by the problem of
inconsistently applied definitions
and classifications across existing databases and research
studies.
Possible interventions and future strategies
Overall, the differential attainment literatures suggest that a
variety of factors may affect
performance and attainment. These include issues around the
background and
characteristics of the individuals, the stage they are at in their
medical career and the
organisational structure of different workplace settings. These
might have cumulative
effects over time or ‘one-off’ effects at certain stages of their
career.
Owing in part to the variety of factors identified as potentially affecting
performance and attainment,
the narrative emerging from the current body of knowledge
recognises the need for
a complex intervention incorporating analysis of the micro, meso
and macro levels of
engagement - rather than a simple intervention to establish cause
and effect relationships
of single factors.
The first consideration in designing an intervention relates to the
level at which the
intervention is required: at the individual level, the
institutional level, a broader policy level,
or a complex intervention with components on each level. It is
important to recognise that
any intervention targeted at a single level needs to be thought
through across all levels in
case unanticipated effects at other levels emerge as a
consequence.
Conclusion
This review has found that differential attainment in postgraduate
medical education in the
UK cannot be attributed to a single identifiable cause, but results
from a subtle combination
of factors yet to be fully explored. Over time, research has moved
from the quantitative
analysis of exam data towards a more cross-disciplinary approach in
order to explore a
combination of educational and social factors (rather than single
causes) as contributors to
differential attainment. Such an interdisciplinary approach is now
presented as essential for
developing a nuanced understanding of the complexities of
differential attainment across
the micro, meso and macro-structure of medical education, and is
viewed as the foundation
upon which future interventions may succeed.
1. Introduction
Differential attainment is a term used to describe
variations in educational achievement by
different demographic groups undertaking the same assessment. It is
a phenomenon readily
identified across the educational landscape, and research by HEFCE
and others has
identified a complex range of personal, cultural, institutional and
structural factors
impacting on parity.(27)
Differential attainment has been a recognised feature of medical
educational achievement
since the 1990s in both undergraduate and postgraduate contexts.
But interest in the
underperformance of ethnic minority doctors has been heightened in
recent years in the UK
with a judicial review in the High Court (April 2014) for alleged
racial discrimination against
ethnic minority doctors by the RCGP in their high stakes
examinations. The legal challenge
from BAPIO was dismissed but the Judge recommended action on
differential attainment
and that the RCGP should focus on training to ensure that
candidates are prepared for their
examinations.
The Judicial review is often presented as the catalyst for action,
whereas the GMC has been
working with others since 2010 to analyse data to better understand
the progress of
trainees through their programmes. A commissioned independent
review of the RCGP CSA
identified that overseas qualified doctors (IMGs) were 15
times more likely to fail the
CSA, and UK qualified BME doctors were four times more likely to
fail than their white
counterparts at first attempt (the difference diminished for UK BME
doctors on their second
attempt but differences for IMG BME doctors persisted on second and
third attempts). (28)
Recent analysis of exam data has shown that in a simple univariate
analysis the same
patterns of attainment were present across speciality groups.
(29)
The present literature review contributes to a wider programme of
work being carried out
by the GMC to explore differential attainment across training
pathways.
2 Background
Differential attainment is a term used to describe variations in
educational achievement by
different demographic groups undertaking the same assessment.
Characteristics including
gender, age, ethnicity, nationality and socio-economic status,
along with medical school and
postgraduate training programme, are all factors that HEFCE have
identified as having a
correlation with performance and attainment.(27)
A search of PROSPERO and the COCHRANE library revealed that there
are currently no
registered substantive reviews of differential attainment specific
to postgraduate medical
education. There is however a growing body of literature examining
potential causes and
factors relating to differential attainment across both
undergraduate and postgraduate
medical education. (20, 21, 30, 31)
3 Aims and purposes of the review
The purpose of this review is to
understand from the existing evidence the underlying
causes of differential attainment in postgraduate medical education
in the UK and English-
language speaking countries with comparable medical education
systems (USA, Canada,
New Zealand and Australia). This includes identifying different
causes and/or significance of
causes across those countries, providing a conceptual framework to
design interventions to
address these issues in the UK, identifying possible methods for
further research in this area and
rating the strengths and weaknesses of evidence that may suggest
areas for future research
and/or work.
The aims of the review are as follows:
• To establish an evidence base for differential attainment in the
UK and other comparable countries
• To identify any research methods pertinent to identifying and/or
understanding the causes of differential attainment in UK
postgraduate medical education
• To examine interventions that have been effective in reducing
differential attainment that may be applicable to UK postgraduate
medical education
• To rate the quality of evidence as a ‘springboard’ for future
work
4 Methodology
4.1 Rapid review
Systematic reviews that engage with
health policy are becoming increasingly valued by
policy makers as the evidence base becomes more complex (32).
However, policy makers
often require a synthesis of knowledge on emerging issues within a
short time frame in
order to facilitate a timely response and/or decision. A
traditional systematic review takes at
least 12 months to complete; accelerating this process to
produce a rapid review
requires the reviewers to take methodological ‘shortcuts’.
There is currently no standardised method for undertaking rapid
reviews, and indeed Oliver
argues that this may be counterproductive.(33) In a review of the
methods used in rapid
reviews Ganann et al recommend transparency of reporting methods,
in particular where
‘traditional’ processes had been streamlined. (34)
There is considerable debate about the relative merits of full
systematic over rapid reviews,
with rapid reviews considered appropriate to answer focused
questions or as an important
intermediary step to further research where interventions are
complex. Rapid reviews may
lack the depth of full systematic reviews to present detailed
recommendations, but a review
comparing cases where both rapid and full systematic reviews were
conducted found that
overall there was no significant impact on the final conclusions of
a review. (35)
4.2 Narrative synthesis
“‘Narrative synthesis’ refers to an
approach to the systematic review and synthesis of
findings from multiple studies that relies primarily on the use of
words and text to
summarise and explain the findings of a synthesis”.(1)
The flexibility of narrative synthesis lends itself to this type of
‘storytelling’ since rather than
having a definitive structure or sequential process (1) it relies
on a framework that can be
broken down into four elements, through which the researchers can
move iteratively:
• Developing a theory about how the intervention works, why and for
whom
• Developing a preliminary synthesis of findings of included
studies
• Exploring relationships within and between studies
• Assessing the robustness of the synthesis
5 Methods
5.1 Development and registration of protocol
The protocol
for the research was developed by the core research team: Drs Regan de Bere,
Nunn and Nasser with support from the expert panel. The protocol
for the research was
agreed with the GMC on 06/02/2015 and registered with PROSPERO on
26/02/2015. The
protocol was subsequently published on the PROSPERO
website http://www.crd.york.ac.uk/PROSPERO Reference no:
CRD42015017130.
5.2 Search strategy
The inclusion and exclusion criteria were
agreed between the lead researchers and the
expert panel. These criteria set the boundaries for the
research.
Table 1 Inclusion and exclusion criteria from the protocol
Inclusion:
• Published between 01/01/2004 and 01/01/2015
• UK and countries with comparable medical education systems (USA,
Canada, New Zealand and Australia)
• In the English language
• Studies using any methodology singly or in combination and ‘grey’
literature
• Studies or documents related to postgraduate, and where appropriate
to undergraduate, medical education
• Differential attainment/success or failure
Exclusion:
• Disciplines outside medicine (e.g. pharmacy, dentistry, nursing and
midwifery)
However, as the research progressed we did revisit and refine the
initial criteria as we
identified gaps and leads to important relevant literatures
previously excluded. For example,
although Norway was not on our original source list, while reviewing
the literature we included
one Norwegian study (36) since it addressed conceptual issues we
considered relevant to
the review (namely those surrounding gender and qualification
related to working
environments). We also included a study from Switzerland examining
the impact of
mentoring during postgraduate training. (37)
We searched PubMed using the following search strategy that
includes MeSH and ‘free
text’.
#7 #3 AND #6 Filters: Publication date from 2004/01/01
#6 #4 OR #5
#5 (Attainment or success* or fail*)
#4 "Educational Status"[MeSH]
#3 #1 OR #2
#2 (postgraduate AND educat* AND med*)
#1 "Education, Medical, Graduate"[MeSH]
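The way the numbered search lines combine can be illustrated with a short sketch. The Python below is a hypothetical helper and not part of the review's method: it expands the `#n` references in the PubMed search history above into a single Boolean string. The names `SEARCH_LINES` and `expand` are assumptions introduced for illustration, and the publication date filter is left out because it is applied separately in PubMed.

```python
import re

# Search lines transcribed from the PubMed strategy above (line number -> text).
SEARCH_LINES = {
    1: '"Education, Medical, Graduate"[MeSH]',
    2: "(postgraduate AND educat* AND med*)",
    3: "#1 OR #2",
    4: '"Educational Status"[MeSH]',
    5: "(Attainment or success* or fail*)",
    6: "#4 OR #5",
    7: "#3 AND #6",
}


def expand(line_no, lines=SEARCH_LINES):
    """Recursively replace each '#n' reference with the bracketed text of line n."""

    def substitute(match):
        return "(" + expand(int(match.group(1)), lines) + ")"

    return re.sub(r"#\s*(\d+)", substitute, lines[line_no])


# Fully expanded form of the final search line (#7), without the date filter.
query = expand(7)
print(query)
```

Writing the strategy out this way makes it easier to check that the adapted versions used for other databases keep the same Boolean structure.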
This search strategy was adapted for other databases such as
PsycINFO. We also searched
reference lists of key papers to:
1. Ensure that our search criteria were identifying key papers
2. Identify additional papers and/or grey literature
We also added to the studies found through the searches from our
own knowledge of the
subject literature.
We did not consult authors directly but met several leading
researchers in the field at
related GMC events on 27/02/15 and 16/03/15 where there was the
opportunity to discuss the
review.
We also placed a call on the GMC website for contributions from
other researchers and
interested parties. This call produced no new sources of
information.
The results of the searches, conversations and prior knowledge of
the literature identified
prominent topic areas and issues in the medical education
literature, as well as highlighting
those which have been less well documented. This information was
later used to conduct
additional iterative searches in educational literature in order to
fill any gaps identified.
As part of the selection process, we categorised relevant
literature in medical education that
fell outside of our inclusion criteria i.e. studies relating to
other countries. This allowed us to decide, at the later analysis stage,
whether such studies might
help us fill any gaps.
After an initial screening of the results, we used NVivo 10, a data
management software
package, to code and count the themes identified across the literature.
Individual papers may
contain several foci and each was coded individually. By listing the
number of studies that
reference each descriptive theme we developed a simple schema to
identify gaps in the
literature. From this we conducted further iterative searches in
the medical undergraduate
literature to assess if there were any generalizable findings from
those studies.
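The counting step described above can be sketched in a few lines. This is a minimal illustration only: the review used NVivo 10, not Python, and the paper identifiers, theme assignments and the `threshold` parameter below are hypothetical values chosen to show the idea of listing how many studies reference each descriptive theme and flagging thinly covered ones.

```python
from collections import Counter

# Hypothetical coding output: paper identifier -> descriptive themes coded in that paper.
coded_papers = {
    "paper_A": {"ethnicity", "language"},
    "paper_B": {"ethnicity", "high stakes exams"},
    "paper_C": {"gender"},
    "paper_D": {"ethnicity", "high stakes exams", "language"},
}


def theme_counts(coded):
    """Count how many papers reference each theme (a paper counts once per theme)."""
    counts = Counter()
    for themes in coded.values():
        counts.update(set(themes))
    return counts


def sparse_themes(counts, threshold):
    """Return themes referenced by at most `threshold` papers, as candidate gaps."""
    return sorted(theme for theme, n in counts.items() if n <= threshold)


counts = theme_counts(coded_papers)
gaps = sparse_themes(counts, threshold=1)
print(counts.most_common())
print(gaps)
```

A theme that appears in few papers is only a candidate gap; whether it warrants a further iterative search remains a judgement for the review team.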
We also undertook general searching of relevant stakeholder
websites listed below for grey
literature.
General Medical Council
British Medical Association
Royal College of Physicians and Surgeons of Glasgow
Royal College of Psychiatrists
Royal College of Ophthalmologists
Royal College of Obstetricians and Gynaecologists
Royal College of Radiologists
Royal College of Paediatrics and Child Health
Academy of Medical Royal Colleges (AoMRC)
Royal College of Physicians of Edinburgh
Royal College of Physicians of London
Royal College of Physicians of Ireland
Royal College of Surgeons of England
Royal College of Surgeons in Ireland
Royal College of Surgeons of Edinburgh
UK Higher Education Funding Council for England (HEFCE)
Other representative groups: BAPIO, Medical Women’s Federation
The initial search term used in ‘Google’ was:
name of the stakeholder AND differential attainment
We then searched iteratively within the stakeholder websites for
additional documents.
5.3 Data management and extraction
In defining eligible literature
formats, we included all content-relevant documents and
articles, regardless of the status of their publication. The final
sample therefore included
academic studies, unpublished research, conference papers, guidance
documents, opinion
pieces and so on. Editorial and opinion pieces were included since
they can provide useful
insights and offer potential solutions or identify areas for
thought. They were not formally
quality assessed, but we report on the perspective from which
each paper was written
(the author and their background) and how this may have contributed
to the shaping of
the argument.
We developed frameworks that disaggregated the elements of the
research question,
against which to map the papers. Due to their structured nature,
quantitative studies
tended to relate to the elements of the PICOC framework
(Population, Intervention,
Comparison, Outcome, Context), whilst qualitative studies were
typically more effectively
interrogated using the SPICE framework (Setting, Perspective,
Intervention/phenomena of
Interest, Comparison, Evaluation). The frameworks provided a
transparent method of
identifying papers to include and exclude from the synthesis.
We found no randomised or non-randomised controlled trials. Most
studies focused on
evaluating the effect of factors such as gender and ethnicity on
student performance.
Therefore, we have used a modified version of PICOC and SPICE
frameworks for the final
synthesis presented in this report. This is still consistent with
our methodology in the
protocol registered with PROSPERO (CRD42015017130).
5.4 Quality assurance
Due to the inclusion of a wide variety of
material in the final synthesis, and the iterative
method of study and document extraction, the transparency of all
decisions made about
inclusion is guaranteed by thorough documentation of each stage of
the review and the
decision-making processes.
We undertook a quality assessment of the studies that included
primary data using an
adapted version of the Critical Appraisal Skills Programme (CASP)
qualitative research framework. We used
this for both qualitative and quantitative studies since the key
issues around quantitative
studies related to the approach to the questions, the design of the
research as related to the
question, the study’s population and what was measured and how. The
ratings of the
studies (high quality / unclear quality / low quality) are included
where appropriate in
Appendix 1 and a fuller description of the evaluation of each study
using primary data is
included as Appendix 2. We included a question related to
generalizability of the study
(direct / indirect / unclear). This question does not contribute to
the quality evaluation but is
reported separately to account for generalizability to the
review.
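One way to picture how the two judgements sit side by side is a simple record per study. The sketch below is an assumption about how such records could be held, not a description of the team's actual tooling: the class names, the `study_id` field and the example entries are hypothetical. It shows the quality rating and the generalizability judgement stored together but summarised separately, so that generalizability never feeds into the quality rating.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    HIGH = "high quality"
    UNCLEAR = "unclear quality"
    LOW = "low quality"


class Generalizability(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"
    UNCLEAR = "unclear"


@dataclass(frozen=True)
class Appraisal:
    study_id: str  # hypothetical identifier, e.g. a reference number
    quality: Quality
    generalizability: Generalizability


def quality_summary(appraisals):
    """Count studies per quality rating; generalizability is deliberately not used."""
    return Counter(a.quality for a in appraisals)


def generalizability_summary(appraisals):
    """Count studies per generalizability judgement, reported separately."""
    return Counter(a.generalizability for a in appraisals)


# Hypothetical example entries, not taken from the appendices.
appraisals = [
    Appraisal("study_1", Quality.HIGH, Generalizability.DIRECT),
    Appraisal("study_2", Quality.HIGH, Generalizability.INDIRECT),
    Appraisal("study_3", Quality.UNCLEAR, Generalizability.UNCLEAR),
]
print(quality_summary(appraisals))
print(generalizability_summary(appraisals))
```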
The research team was supplemented by an expert panel to advise on
search terms, discuss
the retrieved literature, any initial emergent themes and review
the final report prior to
submission to the GMC. The expert panel (Sam Regan de Bere, Suzanne
Nunn, Mona Nasser,
Paul Lambe, Julian Archer, Martin Roberts, Tom Gale and Rebecca
Pitt) met to discuss
various stages of the review, including: feeding back on the
research design; ratifying the
protocol; agreeing the selected academic literature; discussing
themes emerging from the
literature; quality assessment and agreeing the structure of the
final report.
During the process of the research the panel agreed that the
retrieved literature was
representative of the field and that the search terms used had been
appropriate. The panel
did not consider that there were any significant gaps in the
literature: they suggested that,
rather than reinforcing extant knowledge by including the
literature from other health
professions, the research team should concentrate on the emerging
narratives and look to a
broader cultural literature to inform the socio/cultural and
pedagogic narratives that were
emerging if required.
The panel did identify a lack of clarity in the terminology used in
different studies across the
literature: in particular the words ‘performance’ and ‘attainment’
have been used
interchangeably. The panel suggested that, for the purposes of this
review, the following
definitions should be applied: attainment would be used in
reference to a direct
measurement, namely passing exams, whereas performance would refer
to academic
performance as a process which implies a temporal element, with
attainment being a
consequence of performance.
6 Research Ethics
The research for this review is desk based and
ethical permission was not required.
7 Data Analysis
Initial database searches identified 3,044 potentially relevant documents. Duplicates (68) were
removed, leaving 2,976 documents to be screened by title for possible inclusion in the
synthesis. Documents rejected at this stage, after exclusions were applied, were categorised
so that they could be revisited if gaps were later identified in the literature.
Ninety-six documents were retrieved as papers for further review (10% of these
being checked by SRdB against the inclusion criteria). From this tranche, 40 papers were
evaluated against the PICOC and SPICE frameworks, as described in the protocol; 8 failed on
one or more of the criteria, leaving 32 documents extracted for discussion by the expert
panel and potential synthesis. Following discussion, a further 3 papers were added on
the advice of the expert panel from their subject knowledge, and 4 papers were added as a
result of iterative searching of the reference lists in the papers identified for synthesis.
A total of 39 papers were included in the synthesis with the
addition of 24 documents from
the grey literature. A flow diagram of the search process is shown
in Figure 1 below.
The studies and other documents included in the synthesis use a
variety of formats and
methodologies, as shown in Figure 2 below.
Figure 2 Analysis of included studies and documents by methodology
or type
Quantitative research is the dominant research methodology for
published research.
Interestingly, mixed methods research studies were only found in the
grey literature. The
‘other’ category includes opinion pieces, letters and comment,
conference and other
reports. Not surprisingly, this is the area dominated by the grey
literature.
Fifteen of the documents extracted from the grey literature were
comment pieces in the
online medical news and media: Pulse (n = 4), BMA (n = 6), GPonline
(n = 1), BMJ Careers (n
= 3) and Mancunian Matters (n = 1). The most disseminated document
in the grey literature
was the AoMRC 2013-14 review (38), which was linked to the Royal
College sites and returned as
a ‘hit’ when searching them. The document itself has little to say
about differential
attainment: a short paragraph identifies the judicial review as a
catalyst for AoMRC’s
decision to “look at the wider question of differential attainment
in medical education.” (38)
An examination of the dates of publication of the included
documents testifies to a growing
interest in differential attainment. This is with the caveat that
there is a time lag between
academic research and its publication that does not apply to online
comment. But even
taking this into account a trend is clearly discernible.
Figure 3. Publication by date
Broadly speaking, the peaks of interest roughly coincide with
significant changes to the
MRCGP in 2010 (specifically the CSA component), the publication of
Esmail and Roberts’
report in 2013 and the judicial review in 2014.
8 Narrative synthesis
A narrative synthesis does not begin with a
set of a priori assumptions. Using this method,
themes emerge as the literature is identified and reviewed. The
first level of thematic
identification is descriptive and can be generated in a number of
ways including coding
followed by conceptual mapping to help us think about the
relationships between and
across the themes identified.
Using the themes coded in NVivo 10 we identified two key areas of
interest that emerged
across the literature: high stakes exams and ethnicity.
Fig 4 shows a conceptual map of the relationship between high
stakes exams and ethnicity
in the published literature, with the sub-themes or factors either
identified or investigated.
[Figure 3 chart; x-axis: year of publication, 2004 to 2015; y-axis: 0 to 14; series: Published, Grey]
Figure 4. Conceptual map of themes identified in the published
literature
N.B. the size of the ovals does not reflect significance of factor
or quality of the research
The conceptual map, while amply demonstrating complexity, also
provides a way of
populating a micro, meso, macro analytical framework that broadly
relates to three key
levels of engagement: the individual or discrete group (student/s,
doctor/s, examiners etc),
the institutional (medical school or work environment) and the
level of policy (exams).
9 Findings
9.1 The Individual or discrete group
Although not discussing postgraduate education specifically, Schrewe makes a
number of
insightful observations around the place of the individual in
medical education and the
tension between the competing discourses of diversity (respect for
the culture, gender and
ethnicity of individuals) and standardisation (uniformity and
consistency).(39) Arguing that
these discourses need to be made explicit and “brought into the
same conversation” in
order to enable students and trainers to achieve their full
potential, Schrewe suggests that a
better understanding of the common qualities required and the
extent to which individual
variation can be supported without detriment to the profession as a
whole is the question
that needs addressing with some urgency.(39)
In this section of the findings we discuss themes identified in the
literature pertaining to the
individual or discrete demographic group.
Study habits
Woolf examined ‘study habits’ as part of wider
research into ethnic underperformance in
Year 3 medical students using a questionnaire to assess surface,
deep and strategic learning
processes. Deep learning is associated with an active search for
meaning, whereas surface
learning is associated with memorising rather than understanding.
(40) Woolf found that
minority ethnic students scored lower on deep learning study habits
(p = .003) and higher
on surface learning study habits (p = .008) than their white peers.
(20) Strategic learning,
where learners adopt the best learning style to fit the needs
of the task, was identified
by Woolf as a positive predictor of performance but was statistically
related to other factors,
including living at home and having English as a first language.
It is also important to
recognise that students should not be identified with a fixed
approach to learning;
curriculum design, assessment and teaching style all encourage
students to adopt a
particular approach. (41) This suggests potentially broader
questions about ethnicity and
learning.
Psycho-social
Psycho-social is a term used to describe an
individual’s psychological development in, and
interaction with, a social environment. In the literature on
widening participation psycho-
social factors in relation to undergraduate degree choice are well
documented. (42)
As part of a larger undergraduate research study Woolf examined
personality types of white
and non-white students using an adaptation of the NEO-PI-R (43) to
identify five personality
types (neuroticism, openness to experience, agreeableness,
extraversion and
conscientiousness). The study of a total of 703 (51% minority
ethnic) students found that
ethnic minority students were lower on the personality trait
“openness to experience” (p = .0041) (20) but this was not found to have a negative effect on
final year examination
performance.
Social and cultural capital
The ‘standing’ that medical professionals have within different cultures has been shown
to have a significant effect on the choice of medicine as a career. A
study linking informed
choice and academic success in Iranian medical students provides a
useful international
review of studies that found many medical students having an
“over-dramatized and
romanticized view of medicine at the beginning of academic
studies”. (34) The Iranian study
used a multiple choice questionnaire (n = 2208) for final year
medical students and found
that informed choice had a positive effect on attainment.
Success
Esmail recommends that more research is undertaken into
factors for ethnic students’
success. (28) Whilst not looking at ethnicity, we identified one
small scale qualitative study
using interviews with 10 black male medical students and 3 black
male physicians at Florida
State University College of Medicine to explore their perceptions
of the factors contributing
to their success in being admitted to and graduating from medical
school. (22) The study,
with its gender, geographical and numerical limitations,
nevertheless presented an
interesting line of enquiry looking at contributors rather than
barriers to attainment.
The study concluded that factors contributing to success were a
balance between
educational experiences, exposure to medicine,
psychosocial-cultural experiences (including
family and other support networks) and personal attributes.
Participants in the research
specifically identified structured activities like enrichment
programmes and outreach
programmes as significant. The Minority Association of Pre-Medical
Students Programme
(MAPS) was an example cited by the study participants. MAPS
provided opportunities for
networking with other premedical students, medical students and
physicians and
importantly provided the opportunity for shadowing
experiences.
We then looked at the undergraduate literature to see if there were
any other studies that
looked at contributors. One qualitative study based in Saudi Arabia
used focus groups to
understand 19 mixed gender high achieving medical students’
perceptions of factors
contributing to their success.(21) They identified learning
strategies, resource management
problems i.e. stress.
In a study examining the differential achievement between white
medical students and their
ethnic minority peers, Vaughan (19) used social capital theory to
develop and analyse survey
data from medical students in the clinical phase of their training
(n = 158). The research
found no link between ethnic and religious homophily and
achievement. However,
interacting with problem-based learning group peers in study
related activities and having a
wider academic support network were found to be directly linked to
better achievement.
Vaughan concluded that ethnic homophily may cut minority students
off from potential and
actual resources that facilitate learning and achievement.
Therefore it is key that students
build wide relationships with colleagues at all levels of
training.
Ethnicity
The underperformance of ethnic minorities compared to
their white peers across the higher
education landscape has been consistently identified.(44) (45) The
studies discussed in this
review focusing on the performance of UK-trained medical students
and doctors from
minority ethnic groups have corroborated broader HEFCE
findings.(27)
Definitions of ethnicity are numerous and complex. In the UK
studies we discuss in this
review ethnicity was either self-declared specifically for an
individual study or was a
characteristic already identified in a data set being
analysed.
All these papers evidence a mix of educational and social factors as
contributing to the performance of individuals, in addition to individual
characteristics.
The literature examining contributors to success is important, since
looking only at why certain students might fail tells only half the story.
From the papers found, contributors to success seem to be international,
but with so few studies the results are not generalizable.
Classification systems used in the research also varied, and
included the 2001 UK census
guidelines, (19, 20, 26) individual Royal College geographical
bands, (46) white and non-
white, (47) BME as an umbrella group, (5) (11) (24) categories
approved by the UK Commission
for Racial Equality, (9) and the GMC National Training Survey (23), which
uses UK census categories.
Studies use different categorizations and therefore comparisons
between studies can be
difficult. For example, Denney cites the conflation of all BME
groups under one heading as a
limitation of the study but states that it was necessary in order
to compare and contrast the
results with other studies and because the numbers were too small
in some sub groups. (11)
Woolf adopts the same approach, arguing that ethnic categories are
to an extent artificial
because they can never take into account the subtle variations
between groups of people.(5)
There have been a number of key large scale quantitative research
projects since the 1990s
focusing on ethnicity and differential attainment. The catalyst for
this area of research was
the identification of a higher failure rate in clinical exams among
non-white students at the
University of Manchester in 1995 (48). The leading researchers in
the field in the UK are
Chris McManus, Katherine Woolf, Jane Dacre and Richard
Wakeford.
In a systematic review of ethnicity and academic performance in UK
undergraduate and
postgraduate medical students, Woolf found ethnic differences in
attainment to be
widespread across different types of medical school and different
types of exam at both
levels of study.(5) The review focused on quantitative reports that
measured performance
and concluded that differential attainment was both “consistent and
persistent”: but while
ethnicity was clearly related to exam performance the reasons for
this were not clear.(5)
The first large scale longitudinal study exploring in depth a
number of potential
psychological and demographic reasons for differential attainment
in undergraduate and
postgraduate medical students was led by Katherine Woolf
(20).
In contrast to the studies focusing on measuring differences in
attainment between
different groups, Woolf’s qualitative study (26) using focus groups
and semi-structured
interviews (n = 27 medical students and 25 clinical teachers)
followed earlier studies in the
US and examined the potential of stereotype threat to provide an
insight into the identified
gap in attainment. Stereotype threat has been identified as a
psychological phenomenon
whereby individuals who are members of a group characterized by
negative stereotypes
perform below their actual abilities when group membership is
emphasized. Woolf found
that negative stereotyping could impact on the relationship between
lecturer and student
and therefore affect learning. She concluded that while a negative
stereotype about an
ethnic group had “numerous implications for teaching and learning”
the relationship was
neither simplistic nor deterministic.(26) Woolf concluded that the
student/teacher
relationship was “vital for clinical learning”; in particular, the
negative Asian stereotype was
considered to be potentially jeopardising to Asian students’
relationships with their teachers.
Woolf recommends that employers should facilitate teachers in
getting to know their
students as individuals. Although the study was limited to one
London medical school,
stereotype threat is an interesting line of inquiry that is not
relevant only to ethnicity: for
example, Burgess has studied gender in terms of stereotype threat in
the context of career
advancement in academic medicine in the US. (49)
IMG
IMGs are an important asset to the Health Service in the UK. In
a review article in 2005
Sandhu opined that increasing numbers of IMGs would be needed to
achieve the rapid
increase of workers needed as a result of legislation relating to
the creation of a consultant
based service, and other working directives. (50)
Definitions of ethnicity are numerous and complex.
BME is a widely used term in public and private sector
organisations to
incorporate a range of minority communities living in the UK. Such
an
umbrella term has been critiqued in terms of the validity of
grouping
together diverse groups in this way.
Conversely, for quantitative studies broad terms may need to be used
to
obtain statistically significant results.
Sandhu raises the concern that this requirement combined with the
UK being a very
attractive place for medical graduates to work and continue their
training could encourage
an influx of inexperienced doctors or doctors having poor
communication skills seeking
opportunities in the competitive specialities. Sandhu advocates
that more realistic
information about postgraduate opportunities and training be
available to enable potential
IMGs to make a more informed choice, but also praises the
motivation and determination of
IMGs as a group.
A study in the US found great persistence on the part of IMGs in
pursuit of a US residency
position.(51) The linked data study of a cohort comprising 10,328
IMGs who were both US
citizen IMGs and non-US IMGs highlighted the importance of IMGs to
the delivery of
national healthcare.
In a large scale analysis of RCOG data, Rushd undertook
a retrospective analysis of the
performance of IMGs who appeared for the first time in the Part 1
(n = 11,863) and Part 2
written (n = 5336) MRCOG examinations between 2000 and 2010. (46)
Rushd’s evaluation of
the first time performance of IMGs in the MRCOG part 1 and 2
written examinations
critiques IMG as a category by identifying variation in performance
between students across
the RCOG geographical bands.
Rushd was unable to perform statistical comparisons to the results
of the study since
geographical bands are not comparable: they contain different
countries, different
academic standards, different teaching methods etc. Rushd, however,
found that variation of
found that variation of
IMG performance was likely to be multifactorial and suggests that
the introduction of e-
learning modules may “go some way in equalising the learning
opportunities among
geographic regions and could prove useful for both trainers and
trainees.” (46)
Aside from Illing’s study, discussed below, (2) we only found one
qualitative study examining
barriers and facilitators encountered by IMGs. The study was
situated in the Netherlands
and the findings related mainly to sociocultural rather than
educational factors, including
being able to access information and financial support. (16) Lack
of command of the Dutch
language (particularly the medical terminology) and age were seen
as barriers to securing
employment and entrance to specialism. Age was only a barrier in
some specialisms since
they set an upper age limit for postgraduate specialist
training.
The study concluded that better support to overcome difficulties
inherent to migration and
career change would result in better trained and acculturated
doctors. The GMC has
recently undertaken some work in this area and developed a ‘Welcome
to UK Practice
programme’ to raise awareness about practice in the UK. (52)
In contrast to Vaughan, who cautioned against homophily, (19) a
presenter at the RCPsych
conference (2014) encouraged IMGs to join and become active in
diaspora organisations,
thereby familiarising themselves with working in the NHS and
broadening their network of
professional contacts. (53)
The RCPsych convened a conference in 2014 to focus on familiarising
IMGs with working in
psychiatry in the UK. The conference was organised in recognition
“that IMGs face more
problems than British graduates in succeeding in the system.” (53)
The college is keen to
support IMGs by commissioning an external review of the MRCPsych
exam and ARCPs and
appointing an Associate Dean for Trainee Support.
Feedback from the delegates was positive and the college plans to
run another in
2015. There was a recognition by delegates of the importance of
trainers, the role of
employers in developing meaningful induction programmes and giving
IMGs additional
support and remediation if required. Among the recommendations
proposed at the
conference were that the College appoint local and national IMG
Champions and improve
examiner training to help recognise unconscious biases (accents,
manner etc).
IMG is a category that needs to be problematized and properly
defined.
The literature identifies IMGs as increasingly important
internationally to the
delivery of healthcare.
IMGs are noted for their persistence and tenacity in pursuing
postgraduate
qualification.
IMGs face the inherent difficulties of migration and acculturation.
These include
language, accessing information, financial support and limited
knowledge of the
healthcare system.
Language
A number of studies discussed language either as a sole
focus or as part of a number of
compounding factors. Woolf’s longitudinal study using exam data and
questionnaires over
two consecutive year 5 cohorts (n = 703: 51% minority ethnic) found
that speaking English
as a first language, with one parent also speaking English as a
first language and being
schooled in the UK, was a predictor of good performance in final
year UCL medical students.
However not having this level of English was not the reason why
minority ethnic students
underperformed. She suggests that where examinations like the OSCE
require
communication skills “country of schooling could be a proxy for
communication or cultural
differences.”(20)
This finding concurs with those of Watmough (21), discussed above,
who was also unable to
identify language as a determining factor in success in the RCA
postgraduate examination.
The most significant study exploring language and cultural factors
was undertaken by
Roberts and funded by the Economic and Social Research Council
(ESRC). (3) This study
used a sociolinguistic methodology to examine both how candidates
performed in the RCGP
exam and how the specific conditions of the exam operated to
determine behaviour.
In specific relation to the CSA, but with wider implications for
other practical exams in both
undergraduate and postgraduate contexts, Roberts’ study found the
“relatively
decontextualized nature of the CSA made it a ‘talk-heavy’
assessment from which a number
of effects flow”. These include “communicative performance factors’
which relate to how
IMGs talk and interact with role playing patients, examiner
perceptions of candidates
sounding formulaic and not engaging with the patient through a
patient centred model.”
The researchers suggest that the sociolinguistic “fingerprint” of
the exam which assumes a
patient centred approach could constitute a “hidden
curriculum.”(3)
The study concludes that “Rather than talk of ‘cultural bias’ or
not, there needs to be a
debate about tolerances and communicative flexibility, about what
are acceptable
competencies in an increasingly diverse society and how, within
these competencies, talk
and interaction can be more explicitly addressed. ‘Cultural bias’
implies that there is a goal
of neutrality that must be reached and that there is one ‘culture’,
one way of doing
things.”(3)
Memon argues that oral examination is an important element of
postgraduate examinations,
but ensuring its reliability and validity across specialisms is
complex to design and
implement. (35) Memon cites the work commissioned by the RCGP in
this area of
postgraduate examination as an example of good practice in
providing an evidence base for
the validity and reliability of the oral elements of their exam.
Memon cautions that IMGs
taking exams in other specialities may be disadvantaged if their
English is less fluent and
articulate than UK trained candidates.
Knight, an MRCGP examiner, argues in an editorial piece that while
there is evidence that
the MRCGP is reliable, IMGs are prone to failure because the exam
is in English and they
spend much of their practice consulting in other languages. (36)
Aside from language Knight
also cites other factors that may impact on IMG success in the
MRCGP, including differing
clinical environments in the UK from the one in which they trained
and that they may spend
much of their consulting time in the UK speaking in a language (or
languages) other than
English.(36) Knight with Roberts identify the failure to
acknowledge or assess multilingual
expertise, which both see as an asset in an increasingly diverse UK
society.
The specialities with the highest proportion of IMG candidates are
the MRCGP and the
MRCP (particularly psychiatry). (45) These specialities require
significant levels of cultural
awareness and advanced communication skills, both of which may
place IMG students at a
disadvantage. (17)
Issues around IMG students and language are not unique to the UK,
but are also evident in
other countries where there are minority groups. (54)
While language may be a predictor of good performance it is not, of
itself, the reason why students fail.
Language is often conflated with sociolinguistic performance.
There is currently no acknowledgement or assessment of multilingual
expertise.
Gender
Two papers (both American) compared female attainment
against male attainment in
obstetrics and gynaecology (Obs/Gyn) (55, 56). Both studies
conclude that women
outperformed men in the Obs/Gyn specialism.
Bibbo’s study found that on the pre-clerkship measure (MCAT) men
outperformed women,
but on the overall clerkship scores women outperformed men. This
was due to women’s
higher achievement on the standardised National Board of Medical
Examiners (NBME)
subject examination. Drawing on other literature, a number of
proposals were made as to
why this might be the case, including men being less interested in
the specialism and
consequently less motivated, combined with the perception that
patients prefer a female
physician. Women, in contrast, are potentially more motivated
because they want to enter
this specialism due to gender identification and the dominance of
women already in the
field.(55)
Cuddy’s study on examinee gender and United States Medical
Licensing Examination (USMLE)
performance also found men outperforming women at Clinical
Knowledge (CK) Step 1 of the
exam but with women outperforming men at CK Step 2 (clinical
skills), and with women
outperforming men in most content areas of Obs/Gyn, paediatrics and
psychiatry; in contrast,
men outperformed women in medicine, surgery and preventative
medicine.(56)
In a Norwegian study of 2474 residents who began
specialization in 1999-2001
(36), Johannsen found that although women progressed more slowly
than men, the gender
variation was not significant when the effects of childbirth and
having children under 18
were controlled for. However, gender was found to have a strong
influence on choice of speciality
due to longer required working hours, for example in emergency
services.
In combination these studies identify a gender split in
specialisms, for example the dominance of women in Obs/Gyn.
Identified gender differences in exam performance may potentially
be linked to gender motivation to succeed in specific specialisms
and/or gender identification with certain specialisms.
Studies suggest that changes to the hospital environment, working
practices and cultures could encourage a more even gender split
across the specialities.
9.2 The institutional
The Medical School and the working environment
In a Norwegian study
(36), Johannsen looked at hospital specific factors in speciality
choice
and qualification. The study found that hospital factors were
significant predictors of the
participants’ (n = 2474) timely attainment of specialization.
Working at university hospitals
(regional) or central hospitals was associated with a reduction in
the time taken to complete
the specialization, “whereas an increased patient load and less
supervision had the opposite
effect.” Johannsen’s study suggested that more flexibility in the
curriculum would be
beneficial.
Illing, using quantitative and qualitative data, describes how
senior overseas doctors who
come to the UK with established clinical practices may find
adapting to a different
workplace culture difficult and may not have access to the support
available to less experienced
doctors.(2) IMGs may also have difficulty understanding roles and
responsibilities in the
NHS structure, in addition to patient-centred culture and a holistic
model of care. (2)
Two studies identified a need for a greater emphasis on Equality
and Diversity and cultural
awareness in training within organisations with targeted events and
diversity initiatives used
as opportunities. (3, 4)
As part of McManus’s data linkage study into PLAB and UK graduates’
performance on
MRCP(UK) and MRCGP examinations, a comparison between graduates
from different UK
medical schools was performed. (7) The study found “clear and large
differences in
performance at MRCP(UK) between graduates of different medical
schools.” (7) However,
the study concluded that the identified differences in training
could not account for the
poorer performance of IMGs.
Esmail advocates examining the distribution of IMGs and BME doctors
across UK medical
schools in order to ascertain if the selection and training
placement processes could operate
against the interests of weaker candidates, thus encouraging a
cycle of educational
deprivation. (10) This observation is supported by Tiffin.
(23)
Mentoring
There is a significant body of literature around
mentoring for medical students and doctors
at all levels of study, with the majority of studies being
undertaken in the USA. (18) Frei’s
review concludes that mentoring is “an important career advancement
tool for medical
students” and that more programmes should be set up in Europe, but
monitored and
assessed for impact.(18)
In the context of postgraduate medical
training the literature on mentoring is not
well developed, although there is support from the Royal Colleges
and the NHS generally.
(57) (58) Stamm’s study examining mentoring as part of a
developmental network, set in
Switzerland, found that only 50% of doctors undergoing specialist
training (n = 326) took
advantage of mentoring despite the positive benefits identified and
that, of those, females
received less mentoring than their male colleagues. Reasons for
this gender gap were
identified as primarily due to extraprofessional concerns. Stamm
concludes that, given the
often less straightforward career path for females, mentoring is
particularly important.(59)
Steven’s qualitative study over six NHS sites identified benefits
across the professional-personal interface. Steven suggests that
successful mentoring makes doctors feel more
confident and satisfied in their work, and that this will have
beneficial impacts for organizations.
(60)
Selection
The literature on postgraduate selection is less
developed than that of undergraduate
selection into medical school. The UK general practitioner
selection process uses a national
machine markable shortlisting test to assess both cognitive and
non-cognitive skills and a
‘corporately owned’ and validated selection methodology. Plint (42)
summarises the success
of the process and the confidence in it from both students and
deaneries as: “Corporate
commitment to national process; legitimate authority and locus of
control; process of
incremental convergence, rather than imposition; development and
adoption of validated
selection method; representative infrastructure operating the
process.”
McManus undertook a significant study examining the educational
background and
qualifications of UK students from ethnic minorities and the
selection for medical school.
(47) The study addressed the assumption that entrants to medical
school are equivalent in
their academic ability and that, following on from this, differential
attainment in
undergraduate medical exams and beyond is accounted for at some
point after selection
to medical school. The study found, however, that non-white students
had slightly lower A
level and GCSE grades than their white peers. It concluded that while
GCSE and A level grades
might explain some of the effects found, they could not entirely
explain the poorer
performance of non-white students at medical school and
beyond.
Citing the GP selection process as an example, Paterson recommends
a more robust
selection process.(4)
9.3 Policy
Predictors of success at postgraduate level
Woloschuk’s small scale
study (n = 244 medical graduates) at the University of
Calgary, Canada, found that measures of undergraduate performance
seemed to be poor predictors
of postgraduate success. In particular they found a ‘weak’
relationship between
performance in the Medical Council of Canada (MCC) national
licensing exam, which they
describe as a “rite of passage” to postgraduate training, and
residency. They suggest that
success may be due to non-cognitive attributes, for example work
ethic, personality and
motivation (61).
Understanding workplace culture is important.
Mentoring programmes are beneficial but they need to be robust and
evaluated to ensure they are effective.
The literature on postgraduate selection is less developed than
that of undergraduate selection into medical school.
Prior attainment could not entirely explain the poorer performance
of non-white students at medical school and beyond.
Selection processes for postgraduate study are highly
variable.
PMQ
In a high quality comparative study of UK trained doctors and
those whose PMQ was gained
outside the UK sitting the RCA exam from 1999-2008, Watmough (62)
found that candidates
from Egypt, Iraq, Ireland and Pakistan performed significantly
worse than those from
Australia, New Zealand, South Africa, Zimbabwe and the UK. From
June 1990 to February
2008, there were 9,315 attempts at the MCQ by 5,797 graduates from
70 countries, with 25
countries having candidates who made 15 or more attempts. The
analysis was undertaken
using data from the written part of the exam which uses multiple
choice questions to test a
range of generic clinical skills. The MCQ is a high stakes exam
essential for career
progression to consultant level. The study did not find a coherent
pattern to attainment and
concluded that “some IMG graduates who sit UK postgraduate exams
may require
additional support prior to taking the exam.” Importantly, the
underperformance of
students from Ireland and Pakistan, where English is the main
language in medical
education, indicates that language is not a key factor in
differential attainment in this exam.
The authors suggest that, rather than language, it may be that
cultural ties ease the transition
of working in the UK; however, the poor performance of candidates
from the Republic of
Ireland casts doubt on this supposition.
High stakes examinations
High stakes exams all contain a number of
components: assessment of practical skills using
‘real’ or simulated clinical scenarios, multiple choice, written
and oral. Different elements of
the exam are marked in different ways: by computer, by examiner and
by assessment of
skills.
It is important that the transparency and fairness of ‘high stakes’
exams be demonstrated
given the influence they have on a doctor’s career progression and
employment
opportunities. Memon, in relation to the specifics of oral
examination in postgraduate
examinations, argued for the Royal Colleges to undertake much more
rigorous validity and
reliability testing on their high stakes exams. (63)
Wakeford’s large scale assessment of validity and differential
performance by ethnicity in
the RCGP and MRCP(UK) examinations followed in the wake of a
Judicial Review.(24) It
sought to evaluate if the performance of candidates in the MRCP(UK)
was predictive of their
attainment in the MRCGP (usually taken 3-4 years after). The study
found substantial
correlations between a candidate’s performance in the two exams,
which provides support
for the validity of each. (24)
Wakeford identified a higher correlation between PACES and the new
CSA than the old,
suggesting that the new CSA is a more valid assessment. (24) In
addition, the study found
that BME candidates in particular showed a higher correlation
between PACES and the CSA
than white students “suggesting that there is less extraneous
variance between BME
candidates making it a more valid assessment.”(24)
MRCGP and MRCGP Clinical Skills Assessment (CSA)
The CSA exam was revised in the autumn of 2010 to improve the
reliability of the
assessment. Esmail and Roberts’ key study, using previously
unavailable data from the
GMC and the RCGP, examined ethnic minority candidates’
performance in the MRCGP
exams between 2010 and 2012, thereby testing the new
CSA.(28)
The headline conclusion was that “subjective bias due to racial
discrimination in the clinical
skills assessment may be a cause of failure for UK trained
graduates and international
medical graduates.”(10)
Discrimination is an incendiary term. Judith Hawkins, writing in
Mancunian Matters,
explained the format of the CSA to its non-medical audience and
outlined Esmail’s findings.
The article stimulated a heated online debate among readers who
were only too willing to
support claims of racial discrimination.(64)
Esmail and Roberts suggested that the different training experience
and other cultural
factors (patient/doctor relationship and proficiency in spoken
English for example) between
UK and non-UK trained candidates could affect exam outcomes.
However, they did not
consider that these cultural factors could entirely account for
differential attainment
between white and BME UK trained candidates. It was suggested that
discrimination could
occur at a number of points in the CSA: the behaviour of
standardised patients to white and
non-white candidates and bias on the part of the
examiners.(10)
McManus’s study (7) leads on from this study by Esmail and Roberts
(10) although there are
significant differences between the two: McManus’s study analyses
PLAB part 1 (which
Esmail does not) and it analyses a larger dataset (n = 7,829) from
MRCP(UK) compared to (n
= 5,055 candidates + 1,175 not trained in the UK). Both studies use
candidates’ marks at 1st
attempt for all analyses. McManus’s study found that IMGs’ lower
performance was
“unlikely to result from systemic examiner bias or
discrimination.”
Knight states in an opinion piece that, while there is evidence
that the MRCGP is reliable,
IMGs are prone to failure because the exam is in English and they
spend much of their
practice consulting in other languages. (15)
The CSA exam was revised in the autumn of 2010: the new CSA has
been found not to
discriminate between white and BME candidates. (24) However, the
CSA will inevitably carry
implicit cultural associations specific to UK medicine. Esmail
states that the CSA is not, and
was not intended to be, a culturally neutral exam. Therefore UK
graduates are likely to be
initially more successful, because they are acculturated.
(28)
The CSA was consistently identified in the medical news media as a
particular issue for IMGs.
Commentary was prompted by both Esmail’s study (65, 66) and the
judicial review. (67) (68)
MRCOG
The MRCOG is an internationally recognised standard and at the time
of Rushd’s study more
than 85% of the total candidates were IMGs. The study found that
MRCOG examination
success rates were significantly different according to the
university of medical graduation.
Rushd also identified a variation in performance among graduates
from different medical
schools in Parts 1 and 2 of the MRCOG written examination which
was comparable to
those schools’ performance on the MRCP(UK). (46)
PLAB and IELTS
If IMGs are going to sit the PLAB they need to demonstrate that
they have achieved an
acceptable level of English via IELTS in the previous two years.
PLAB was reviewed in 2011 to
assess whether the knowledge and skills demonstrated by passing the
PLAB continued to be
equivalent to those demonstrated by an F1 doctor. A key component
of this review was to
examine any disparity between IMGs, who successfully passed the
PLAB test, and their UK
graduate peers in postgraduate examinations.(7) Aside from
difficulties relating to direct
comparison the study concludes that there are good correlations
between PLAB and the
MRCP(UK) and MRCGP which means that PLAB is a valid assessment of
skills relevant to
progression during UK postgraduate training. It should be noted
however, that PLAB is not
designed to predict postgraduate exam performance, or to ensure
that those passing PLAB
can achieve at postgraduate level.
In order to produce outcome equivalence between IMG and white
graduates it was
suggested by McManus that the PLAB pass mark could be set higher;
however, this would
have significant impacts on health service delivery.(7)
Tiffin’s study of UK based trainee doctors with at least one
competency related ARCP
outcome in the study period (n = 53,463, of whom 11,419 were IMGs
registered following a pass via the
PLAB route) also found that the PLAB test was
not generally equivalent
to the requirements for UK graduates, with the standard of English
competency and the
PLAB pass mark needing to be raised to ensure equity.(23) Tiffin
also discusses how PLAB
candidates with lower scores may not be able to secure a post in
their preferred specialism
and therefore successfully apply for “shortage specialities” like
psychiatry and general
practice. Given that these specialisms require enhanced
communication skills some IMGs
may immediately be disadvantaged. (23)
Sandhu notes that the requirement to pass these exams cuts into
IMGs’ time and can
erode the time available for research, resulting in IMGs’ CVs being weak
in publications, which can
affect their chances of being shortlisted for jobs in spite of clinical
experience.(50)
Much of the research into differential attainment is quantitative
with a focus on testing for
bias in the exam, or a component part of it using exam data. Taken
as a whole these studies
have broadly demonstrated the validity of some high stakes exams
and discounted evidence
of bias in the exams themselves leading to differential attainment.
This view is endorsed by
Patterson with the caveat that it is not an endorsement of all
assessment tools.(14)
Examiner bias
Examiner bias in relation to examinations like the MRCGP
and the MRCP, in which candidates
are judged ‘live’ and examiners can therefore identify a
candidate’s gender and ethnicity, has
been frequently questioned (11) (8, 9).
Examiner bias is a potential risk in any examination and a threat
to the validity of an
examination. The first study in this area, by Dewhurst (9), focused
on the MRCP(UK) and
found that any potential examiner prejudice was only significant when
two non-white examiners
examined a non-white candidate. This, Dewhurst suggested, was not
conscious and may
relate to a consistency in communication style and cultural
understanding.
McManus’s investigation into possible bias as a threat to
validity used data from
MRCP(UK) PACES and nPACES examinations.(8) The study found that
having two
independent examiners reduced any potential for bias and judged it
a preferable method of
assessment over a single examiner. This is an example of how the
infrastructure around an
exam can potentially affect the outcome for the candidate.
Denny’s study investigating potential examiner bias as responsible
for differential
attainment in the MRCGP CSA found no evidence to support examiner
bias, finding that
differential attainment was linked to the candidates’ demographic
rather than the
examiners’. (11)
In a letter to the BMJ, Shaw opined that the new CSA high failure
rate of ethnic candidates
was an unintended consequence of selection and examination