Understanding differential attainment across medical training pathways: A rapid review of the literature
Final report prepared for The General Medical Council
Dr Sam Regan de Bere, Dr Suzanne Nunn, Dr Mona Nasser
21/08/2015
Funded by the General Medical Council.
The views expressed in this report are those of the participants and the authors and do not
necessarily reflect those of the General Medical Council.
Executive Summary
Current narratives of differential attainment
1. Introduction
2 Background
4 Methodology
5.2 Search strategy
5.4 Quality assurance
6 Research Ethics
7 Data Analysis
8 Narrative synthesis
Study habits
Success
Ethnicity
IMG
Language
Gender
Mentoring
Selection
PMQ
Examiner bias
10.3 Possible interventions
12 References
13 Appendices
Appendix 1: Studies and other documents included in the synthesis
Appendix 2: Quality evaluation of studies using primary data
Table of Figures
Figure 1. Flow diagram of study selection
Figure 2. Analysis of included studies and documents by methodology or type
Figure 3. Publication by date
Figure 4. Conceptual map of themes identified in the published literature
Table of Abbreviations and Acronyms
AoMRC Academy of Medical Royal Colleges
BME Black and minority ethnic
BAPIO British Association of Physicians of Indian Origin
CSA Clinical skills assessment
GMC General Medical Council
IELTS International English Language Testing System
IMG International medical graduates
MRCOG Member of the Royal College of Obstetricians and Gynaecologists (examination)
MRCPsych Member of the Royal College of Psychiatrists
MCQ Multiple choice question
OSCE Objective Structured Clinical Examinations
PMQ Primary medical qualification
RCA Royal College of Anaesthetists
RCGP Royal College of General Practitioners
RCOG Royal College of Obstetricians and Gynaecologists
USMLE United States Medical Licensing Examination
Executive Summary
Introduction
Differential attainment is a term used to describe the variations in levels of educational
achievement that occur between different demographic groups undertaking the same
assessment. Differential attainment has been recognised as a challenge for medical
professionals and educators since the 1990s, and has been observed in both undergraduate
and postgraduate contexts. It is not specific to medical education; it is a feature of
professional education more generally.
Since 2010 the GMC has worked with others analysing data in order to better understand
the progress of trainees through their programmes and to identify any potential differences
between demographic groups. This rapid review of literature published in the period
between 2004 and the present day contributes to a wider programme of research being
carried out by the GMC to explore differential attainment across training pathways.
Research design
The research was commissioned to provide a rapid review of the corpus of knowledge
relating to differential attainment. The researchers adopted a narrative synthesis
methodology in order to explore how contributions to the literature had sought to define,
measure and explain differential attainment – and therefore to identify key factors that
might be considered as having an impact upon attainment.
An initial scoping exercise highlighted that the current corpus of literature comprises
materials in a variety of formats, including: qualitative and quantitative research reports,
systematic reviews of attainment data patterns, policy documents and academic papers,
and opinion pieces and editorials.
Narrative synthesis provides a useful framework for accessing and analysing such diverse
and complex literatures. It lends itself to a ‘storytelling’ approach, by capturing a number of
different insights, evidence bases, theories and position pieces in context, and presenting
them together as an overarching narrative of differential attainment. In addition, rather
than imposing a definitive structure or sequential process, which might preclude certain
significant contributions that do not fit the initial review terms (1), narrative synthesis
allows researchers to move iteratively within a systematic approach – picking up on leads to
relevant information throughout the research process.
The search was conducted using PubMed, MEDLINE and PsycINFO databases, within a
search strategy that included Medical Subject Headings (MeSH) terms and text-word
searches for maximal retrieval. These searches were supplemented with further iterative
searching of reference lists, and a grey literature search of stakeholder websites. The
research team was supplemented by an expert panel, members of which were selected in
order to provide advice on search terms, to discuss the quality of the retrieved literature, to
comment on any initial emergent themes and to review the final report prior to submission
to the GMC.
We developed two frameworks against which to evaluate the retrieved papers and grey
literature: PICOC (Population, Intervention, Comparison, Outcome, Context) for quantitative
papers and SPICE (Setting, Perspective, Intervention/phenomena of Interest, Comparison,
Evaluation) for qualitative papers and other documents. These frameworks provided
transparency for our identification of included papers and other documents. A total of 39
papers were included in the synthesis with the addition of 24 documents from the grey
literature.
The literature on differential attainment
The findings of narrative synthesis are grounded in the literature surveyed. The research
process does not begin with a set of a priori assumptions: instead, using this method
enables themes to emerge and be recorded as the literature is identified. The search
process highlighted that the evidence base relating to differential attainment is disparate,
that it includes a number of different research designs and variously applied methods, and
that it does not feature definitive terminology across studies. Concepts and terms are often
used interchangeably and are operationalised in research accordingly, which makes
constant or consistent comparison difficult to validate.
Overall the peer reviewed literature was of a high quality, where research aims, objectives,
methods and analyses were clearly articulated and justified. The main focus of primary
research was on the relationship between ethnicity and differential attainment in high
stakes examinations. While some studies are focused on undergraduate populations, some
on postgraduate doctors, and a number include both, we found that the research questions,
findings and conclusions were nevertheless relevant to understanding the emerging
narrative of differential attainment in postgraduate cohorts.
Given the limitations in the literature, we read and re-read the materials selected,
individually, and then discussed them as a team. During this process we used conceptual mapping
to help us understand the categories and themes arising from the entire data set. This
grounded approach led to the emergence of a three-level schema, providing three distinct
but related categories, or layers, of information on:
• the macro or policy level (investigating the political agendas and practical activity surrounding high stakes examinations)
• the meso or institutional level (exploring the impact of the medical school, training contexts and/or working environment)
• the micro or individual/discrete group level (with a focus on individuals or groups of students, doctors, examiners and so on)
Quantitative studies dominated the research base (26 studies), focusing on the macro level
and typically using large data sets to examine causal and associative relationships between
various demographic groups and different high stakes exams. The focus of the qualitative
research (5 studies) was more diverse, and explored the role of factors at the micro and
meso levels of infrastructures built to support examinations, cultural contexts and personal
interactions.
Two large scale commissioned studies in the grey literature examined the significance of
language and cultural factors for IMGs (2, 3) using mixed methods approaches, and one
further study combined a literature search with interviews.(4) In addition to this there was
one systematic review and meta-analysis of ethnicity and performance in UK trained doctors
and medical students, focusing on quantitative reports. (5)
Investigating assessment agendas and the design of high stakes examinations
The majority of studies dealing with the macro level focused on differential attainment in
high stakes exams. The research upon which this aspect of the literature is based typically
used quantitative methodologies, using large datasets, with a focus on testing for bias in the
exam, or a component part of it using exam data. Their conclusions are founded on typically
high quality, peer reviewed reports including clear validity measures.
Taken as a whole, these studies have broadly demonstrated the validity of high stakes
exams, and discounted evidence of bias in the nature and structure of exams themselves as
causal factors for differential attainment. However, the emerging narrative contains a
recognition that the infrastructures and processes put in place to support selection (6) and
high stakes exams may nevertheless encompass elements that lead to actual or implied bias
and/or differential attainment. (7)
Examples of this include: i) potential examiner bias through levels of concordance between
examiner and candidate in practical examinations (8-12), and ii) the lack of a universal
terminology to classify data, which may lead to different interpretations of bias and/or
differential attainment from the exam data. An example of this is the variation in the ways
the Royal Colleges monitor for protected characteristics, which has been identified as a
potential contributor to unfair bias. (13)
The impact of institutional structures and organisational contexts
Literatures focusing on the nuts and bolts of postgraduate education, at the level of the
medical school or workplace, highlighted the paucity of well-developed research into
postgraduate selection. These contributions drew on primary data, were typically published
in high quality academic sources, and editorial comments were presented by authors with
research or experience in this area. (14, 15)
In contrast to undergraduate selection, there is little research on postgraduate selection
processes. In the literature, selection processes were presented as highly variable,(4, 6)
although it was recognised that a rigorous (or otherwise) selection process might have
implications for attainment. Best practice selection methods highlighted in the sample
involved the identification of required competencies and the development of reliable
assessment methods for them. The narrative suggests that the application of a validation
process should be used to assess the predictive value of the selection methods.
Pre-entry advice and proper induction processes were identified across the international
literature as important factors for IMGs and other students who gained their PMQ outside
the country they wished to work in. (16, 17) One significant UK focused study (2) identified
the GMC as having a central role in developing a ‘joined up’ approach to supporting PMQs
and IMGs in addition to individual employers. (2)
Buddying or mentoring was highlighted as a useful approach to assisting acculturation. A
literature search of PubMed identified mentoring programmes for undergraduates as having
positive impacts on attainment levels but cautioned that this was relevant only if such
programmes were based on robust designs and were evaluated to ensure effectiveness. This
review demonstrated that most research in the area of mentoring to improve attainment
has been undertaken in the USA. (18)
Understanding the role of the individual or discrete group
The literature pertaining to the individual or discrete groups suggested that a combination
of factors may be associated with educational performance. These include: learning styles
and psycho-social factors; demographic characteristics such as gender and ethnicity; wider
social and cultural capital; language and other, more tacit, contributors to success. The
literature exploring these factors used both qualitative and quantitative methodologies and
was generally of a high academic quality whereby methods and findings were justified
accordingly. Two of the four studies were UK-focused, both examining undergraduate
medical students (19, 20); the other two were qualitative studies, more narrowly focused on
specific types of student in the USA and Saudi Arabia, that examined contributors to success. (21, 22)
Numerous studies focused on ethnicity in relation to analysis of differential attainment at
macro, meso and micro levels. However, whilst this issue dominated the literature, the
complexity of the term was largely unaddressed. Terms such as IMG and BME were used
interchangeably and uncritically.
For example, while “BME” is a widely used term in public and private sector organisations to
incorporate a range of minority communities living in the UK, using it as an umbrella term to
group together diverse socio-cultural demographics has been critiqued – but typically this is
not addressed in the sampling or conclusions drawn from the various studies within the
literature.
Whilst perhaps more obvious, IMG is another umbrella term specific to medicine that
requires clear definition, for similar reasons. The narrative emerging from the literature
identifies “IMGs” as being increasingly important to the delivery of healthcare, but
nevertheless experiencing the inherent difficulties of migration and acculturation. However,
the specifics of these difficulties, how they might vary – and why this might be important for
differential attainment of IMGs – are absent from these discussions.
Similarly, ‘language’ is cited as a predictor of good performance but it is not proven to be, of
itself, the reason why students and/or doctors fail high stakes examinations. Moreover, any
sociological or psychological examination of ‘language’ is also missing, and the concept is
treated as unproblematic in terms of its application as a potential factor underpinning
attainment.
The key narratives of differential attainment
Following thematic analysis, narrative analysis was then used to identify any relationships
emerging between and across these themes. As has already been acknowledged, the
literatures are disparate and disjointed. However, their key messages are similarly
structured around: i) the potential causes of differential attainment, ii) the ways in which
differential attainment has been researched and iii) potential interventions to further our
understanding and help inform strategies going forward.
Understanding causes and relationships
The initial research undertaken into understanding differential attainment tended to focus
on the analysis of exam data with the aim of validating high stakes examinations or
identifying bias. There were 5 high quality quantitative studies included in the analysis. (7,
12, 23-25). The dominant message from these studies was that, while the reasons for
differential attainment remained unclear, they were likely to be multifactorial.
The chronological trajectory of the research demonstrates that research is increasingly
emphasising the importance of educational and social factors in contributing to
performance. In this area research is frequently qualitative. We found 8 studies, key among
which were Woolf’s analysis exploring the relevance of stereotype threat (26) and
Vaughan’s study using social capital theory to understand the role of networks and social
behaviour (19).
Both of these studies focused on undergraduate medical students, but provided a way of
analysing differential attainment that bears relevance for postgraduate patterns. In terms of
studies examining the attainment levels of postgraduate students, Illing’s and Roberts’
studies were the most extensive in terms of scope and data analysed. (2, 3)
The general point to draw from this development of research foci in both undergraduate
and postgraduate fields (and one that suggests we may be best served by considering both),
is growing consensus that researchers should not limit their analysis purely to exam results.
Current thinking acknowledges the requirement to examine the ‘whole’ of the exam; its
support structures (both formal and informal) and features of its candidature that go
beyond demographics to attitudes and behaviours.
Selection, language and the identification of facilitators, as well as barriers, are factors that
have been emphasised across a number of studies. In much of the literature, language is
used as a proxy for communication broadly, which is an umbrella category incorporating
gesture, pronunciation and intonation etc. This is an important observation, since
communication skills form part of clinical skills assessments and these carry with them
implicit cultural assumptions relating to the doctor-patient dynamic. The message emerging
here is that lack of acculturation will impact on performance and ultimately attainment,
even if clinical skills are to an expected standard or level of competency.
The literature also identifies poor induction, lack of support for IMGs in overcoming the
difficulties inherent to migration, and career change as factors that may disadvantage
IMGs in becoming better trained and acculturated doctors. A small number of studies have
highlighted the importance of considering factors that support higher levels of attainment.
Qualitatively, it is important to note these contributions to the building narrative: limiting
analysis to why certain individuals or discrete groups might fail to progress along the career
pathway risks ignoring evidence that identifies why other individuals or discrete groups
succeed – all of which might help us to understand different levels of attainment along the
spectrum.
The importance of appropriate research design
For the reasons outlined above, this review included studies employing different research
methods, the majority of which undertook the quantitative analysis of primary data. In
order to examine the complex nature of ‘causes’, qualitative research approaches have
more recently been used to examine complex phenomena embedded in the culture and
contexts of assessment.
This relatively recent turn to qualitative methodologies to capture evidence of complexity
adds depth of understanding to the breadth of the quantitative research literature. Indeed,
the narrative emerging from the more recent contributions to the literature suggests that
innovative research approaches are required now that complexity is acknowledged. Specific
recommendations within the literature include: longitudinal tracking, interdisciplinary
research to provide fresh perspectives, and the development of more appropriately
sophisticated theoretical frameworks.
A significant issue across the research is the lack of (either) transparency or consistent
definition around the categories of explanation. While some contributions acknowledge the
inherent difficulty in defining and categorising, it remains the case that umbrella categories
like BME and IMG, ethnic group and ethnicity have not been subjected to full interrogation.
In this sense, the development of suitable interventions to address the problem of
differential attainment is compromised by the problem of inconsistently applied definitions
and classifications across existing databases and research studies.
Possible interventions and future strategies
Overall, the differential attainment literatures suggest that a variety of factors may affect
performance and attainment. These include issues around the background and
characteristics of the individuals, the stage they are at in their medical career and the
organisational structure of different workplace settings. These might have cumulative
effects over time or ‘one-off’ effects at certain stages of their career.
Due in part to the variety of factors identified as potentially affecting performance and attainment,
the narrative emerging from the current body of knowledge recognises the need for
a complex intervention incorporating analysis of the micro, meso and macro levels of
engagement - rather than a simple intervention to establish cause and effect relationships
of single factors.
The first consideration in designing an intervention relates to the level at which the
intervention is required: at the individual level, the institutional level, a broader policy level,
or a complex intervention with components on each level. It is important to recognise that
any intervention targeted at a single level needs to be thought through across all levels in
case unanticipated effects at other levels emerge as a consequence.
Conclusion
This review has found that differential attainment in postgraduate medical education in the
UK cannot be attributed to a single identifiable cause, but results from a subtle combination
of factors yet to be fully explored. Over time, research has moved from the quantitative
analysis of exam data towards a more cross-disciplinary approach in order to explore a
combination of educational and social factors (rather than single causes) as contributors to
differential attainment. Such an interdisciplinary approach is now presented as essential for
developing a nuanced understanding of the complexities of differential attainment across
the micro, meso and macro-structure of medical education, and is viewed as the foundation
upon which future interventions may succeed.
1. Introduction
Differential attainment is a term used to describe variations in educational achievement by
different demographic groups undertaking the same assessment. It is a phenomenon readily
identified across the educational landscape, and research by HEFCE and others has
identified a complex range of personal, cultural, institutional and structural factors
impacting on parity.(27)
Differential attainment has been a recognised feature of medical educational achievement
since the 1990s in both undergraduate and postgraduate contexts. But interest in the
underperformance of ethnic minority doctors has been heightened in recent years in the UK
with a judicial review in the High Court (April 2014) for alleged racial discrimination against
ethnic minority doctors by the RCGP in their high stakes examinations. The legal challenge
from BAPIO was dismissed but the Judge recommended action on differential attainment
and that the RCGP should focus on training to ensure that candidates are prepared for their
examinations.
The Judicial review is often presented as the catalyst for action, whereas the GMC has been
working with others since 2010 to analyse data to better understand the progress of
trainees through their programmes. A commissioned independent review of the RCGP CSA
identified that overseas qualified doctors (IMGs) were 15 times more likely to fail the
CSA, and UK qualified BME doctors were four times more likely to fail than their white
counterparts at first attempt (the difference diminished for UK BME doctors on their second
attempt but differences for IMG BME doctors persisted on second and third attempts). (28)
Recent analysis of exam data has shown that in a simple univariate analysis the same
patterns of attainment were present across speciality groups. (29)
The present literature review contributes to a wider programme of work being carried out
by the GMC to explore differential attainment across training pathways.
2 Background
Differential attainment is a term used to describe variations in educational achievement by
different demographic groups undertaking the same assessment. Characteristics including
gender, age, ethnicity, nationality and socio-economic status, along with medical school and
postgraduate training programme, are all factors that HEFCE have identified as having a
correlation with performance and attainment.(27)
A search of PROSPERO and the Cochrane Library revealed that there are currently no
registered substantive reviews of differential attainment specific to postgraduate medical
education. There is however a growing body of literature examining potential causes and
factors relating to differential attainment across both undergraduate and postgraduate
medical education. (20, 21, 30, 31)
3 Aims and purposes of the review
The purpose of this review is to understand from the existing evidence the underlying
causes of differential attainment in postgraduate medical education in the UK and English-
language speaking countries with comparable medical education systems (USA, Canada,
New Zealand and Australia). This includes identifying different causes and/or significance of
causes across those countries, providing a conceptual framework to design interventions to
address these issues in the UK, identifying possible methods for further research in this area and
rating the strengths and weaknesses of evidence that may suggest areas for future research
and/or work.
The aims of the review are as follows:
o To establish an evidence base for differential attainment in the UK and other
comparable countries
o To identify any research methods pertinent to identifying and/or understanding the
causes of differential attainment in UK postgraduate medical education
o To examine interventions that have been effective in reducing differential
attainment that may be applicable to UK postgraduate medical education
o To rate the quality of evidence as a ‘springboard’ for future work
4 Methodology
4.1 Rapid review
Systematic reviews that engage with health policy are becoming increasingly valued by
policy makers as the evidence base becomes more complex (32). However, policy makers
often require a synthesis of knowledge on emerging issues within a short time frame in
order to facilitate a timely response and/or decision. A traditional systematic review takes at
least 12 months to complete. Accelerating this process to produce a rapid review
requires the reviewers to take methodological ‘shortcuts’ that streamline the process.
There is currently no standardised method for undertaking rapid reviews, and indeed Oliver
argues that this may be counterproductive.(33) In a review of the methods used in rapid
reviews Ganann et al recommend transparency of reporting methods, in particular where
‘traditional’ processes had been streamlined. (34)
There is considerable debate about the relative merits of full systematic reviews over rapid reviews,
with rapid reviews considered appropriate to answer focused questions or as an important
intermediary step to further research where interventions are complex. Rapid reviews may
lack the depth of full systematic reviews to present detailed recommendations, but a review
comparing cases where both rapid and full systematic reviews were conducted found that
overall there was no significant impact on the final conclusions of a review. (35)
4.2 Narrative synthesis
“‘Narrative synthesis’ refers to an approach to the systematic review and synthesis of
findings from multiple studies that relies primarily on the use of words and text to
summarise and explain the findings of a synthesis”.(1)
The flexibility of narrative synthesis lends itself to this type of ‘storytelling’ since rather than
having a definitive structure or sequential process (1) it relies on a framework that can be
broken down into four elements, through which the researchers can move iteratively:
• Developing a theory about how the intervention works, why and for whom
• Developing a preliminary synthesis of findings of included studies
• Exploring relationships within and between studies
• Assessing the robustness of the synthesis
5 Methods
5.1 Development and registration of protocol
The protocol for the research was developed by the core research team: Drs Regan de Bere,
Nunn and Nasser with support from the expert panel. The protocol for the research was
agreed with the GMC on 06/02/2015 and registered with PROSPERO on 26/02/2015. The
protocol was subsequently published on the PROSPERO
website http://www.crd.york.ac.uk/PROSPERO Reference no: CRD42015017130.
5.2 Search strategy
The inclusion and exclusion criteria were agreed between the lead researchers and the
expert panel. These criteria set the boundaries for the research.
Table 1 Inclusion and exclusion criteria from the protocol
Inclusion
• Published between 01/01/2004 and 01/01/2015
• UK and countries with comparable medical education systems (USA, Canada, New Zealand and Australia)
• In the English language
• Studies using any methodology singly or in combination and ‘grey’ literature
• Studies or documents related to postgraduate, and where appropriate to undergraduate medical education
• Differential attainment / success or failure
Exclusion
• Disciplines outside medicine (e.g. pharmacy, dentistry, nursing and midwifery)
However, as the research progressed we revisited and refined the initial criteria as we
identified gaps and leads to important relevant literature previously excluded. For example,
although Norway was not on our original source list, we included
one Norwegian study (36) since it addressed conceptual issues we considered relevant to
the review (namely those surrounding gender and qualification related to working
environments). We also included a study from Switzerland examining the impact of
mentoring during postgraduate training. (37)
We searched PubMed using the following search strategy that includes MeSH and ‘free
text’.
#7 #3 AND #6 Filters: Publication date from 2004/01/01
#6 #4 OR #5
#5 (Attainment or success* or fail*)
#4 "Educational Status"[MeSH]
#3 #1 OR #2
#2 (postgraduate AND educat* AND med*)
#1 "Education, Medical, Graduate"[MeSH]
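The numbered search lines combine into a single Boolean query by substituting each #n reference with the text of line n. The following is a minimal illustrative sketch of that substitution only; the `steps` dictionary and the `expand` helper are hypothetical names introduced here, and the publication date filter is assumed to be applied separately in the database interface.

```python
import re

# Search lines as listed in the strategy above, keyed by their line number.
steps = {
    1: '"Education, Medical, Graduate"[MeSH]',
    2: '(postgraduate AND educat* AND med*)',
    3: '#1 OR #2',
    4: '"Educational Status"[MeSH]',
    5: '(Attainment or success* or fail*)',
    6: '#4 OR #5',
    7: '#3 AND #6',  # date filter (from 2004/01/01) is not represented here
}

def expand(step_no, steps):
    """Recursively replace every #n reference with the bracketed text of line n."""
    def substitute(match):
        return "(" + expand(int(match.group(1)), steps) + ")"
    return re.sub(r"#(\d+)", substitute, steps[step_no])

# Fully expanded form of line #7, with no remaining #n references.
query = expand(7, steps)
print(query)
```

Expanding the references in this way makes it easier to see that the final query requires a postgraduate medical education term and an attainment-related term to be present together.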
This search strategy was adapted for other databases such as PsycINFO. We also searched
reference lists of key papers to:
1. Ensure that our search criteria were identifying key papers
2. Identify additional papers and/or grey literature
We also added to the studies found through the searches from our own knowledge of the
subject literature.
We did not consult authors directly but met several leading researchers in the field at
related GMC events on 27/02/15 and 16/03/15, where there was the opportunity to discuss the
review.
We also placed a call on the GMC website for contributions from other researchers and
interested parties. This call produced no new sources of information.
The results of the searches, conversations and prior knowledge of the literature identified
prominent topic areas and issues in the medical education literature, as well as highlighting
those which have been less well documented. This information was later used to conduct
additional iterative searches in educational literature in order to fill any gaps identified.
As part of the selection process, we categorised relevant literature in medical education that
fell outside our inclusion criteria, i.e. studies relating to other countries. The rationale for
this was to allow us to decide, at the later analysis stage, whether such studies might
help us fill any gaps.
After an initial screening of the results, we used NVivo 10, a data management software
package, to calculate the themes identified across the literature. Individual papers may
contain several foci and each is coded individually. By listing the number of studies that
reference each descriptive theme we developed a simple schema to identify gaps in the
literature. From this we conducted further iterative searches in the medical undergraduate
literature to assess if there were any generalizable findings from those studies.
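The counting step described above can be illustrated with a short sketch. This is not the NVivo workflow itself: the paper identifiers, theme labels and threshold below are invented for illustration, and the function names are hypothetical. It only shows how listing the number of studies per descriptive theme can point to sparsely covered areas.

```python
from collections import Counter

# Hypothetical coding output: each paper is coded against one or more themes.
coded_papers = {
    "paper_A": {"ethnicity", "high stakes exams"},
    "paper_B": {"ethnicity", "language"},
    "paper_C": {"gender"},
    "paper_D": {"ethnicity", "high stakes exams", "language"},
}

def studies_per_theme(coded):
    """Count how many papers reference each theme (a paper counts once per theme)."""
    counts = Counter()
    for themes in coded.values():
        counts.update(set(themes))
    return counts

def sparse_themes(counts, threshold):
    """Return themes referenced by no more than `threshold` papers, sorted by name."""
    return sorted(theme for theme, n in counts.items() if n <= threshold)

counts = studies_per_theme(coded_papers)
gaps = sparse_themes(counts, threshold=1)
print(counts.most_common())
print(gaps)
```

In this invented example the theme with a single supporting paper would be flagged as a candidate for further iterative searching.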
We also undertook general searching of relevant stakeholder websites listed below for grey
literature.
General Medical Council
British Medical Association
Royal College of Physicians and Surgeons of Glasgow
Royal College of Psychiatrists
Royal College of Ophthalmologists
Royal College of Obstetricians and Gynaecologists
Royal College of Radiologists
Royal College of Paediatrics and Child Health
Academy of Medical Royal Colleges (AoMRC)
Royal College of Physicians of Edinburgh
Royal College of Physicians of London
Royal College of Physicians of Ireland
Royal College of Surgeons of England
Royal College of Surgeons in Ireland
Royal College of Surgeons of Edinburgh
UK Higher Education Funding Council for England (HEFCE)
Other representative groups: BAPIO, Medical Women’s Federation
The initial search term used in ‘Google’ was:
name of the stakeholder AND differential attainment
We then searched iteratively within the stakeholder websites for additional documents.
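As an illustration of the initial search term pattern (name of the stakeholder AND differential attainment), the sketch below builds one search string per stakeholder. The use of quotation marks for exact-phrase matching is an assumption made here and is not stated in the review; `build_query` is a hypothetical helper, and only three of the listed stakeholders are shown.

```python
# A few of the stakeholders named in the list above.
stakeholders = [
    "General Medical Council",
    "British Medical Association",
    "Royal College of Psychiatrists",
]

def build_query(name, topic="differential attainment"):
    """Combine a stakeholder name and the topic with AND.

    Quoting each phrase is an assumption, not a detail taken from the review.
    """
    return f'"{name}" AND "{topic}"'

queries = [build_query(name) for name in stakeholders]
for q in queries:
    print(q)
```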
5.3 Data management and extraction
In defining eligible literature formats, we included all content-relevant documents and
articles, regardless of the status of their publication. The final sample therefore included
academic studies, unpublished research, conference papers, guidance documents, opinion
pieces and so on. Editorial and opinion pieces are included since they can provide useful
insights and offer potential solutions or identify areas for thought. They will not be formally
quality assessed but we will report on the perspective from which the paper was written
(the author and their background) and how this may have contributed to the shaping of
his/her argument.
We developed frameworks that disaggregated the elements of the research question,
against which to map the papers. Due to their structured nature, quantitative studies
tended to relate to the elements of the PICOC framework (Population, Intervention,
Comparison, Outcome, Context), whilst qualitative studies were typically more effectively
interrogated using the SPICE framework (Setting, Perspective, Intervention/phenomena of
Interest, Comparison, Evaluation). The frameworks provided a transparent method of
identifying papers to include and exclude from the synthesis.
We found no randomised or non-randomised controlled trials. Most studies focused on
evaluating certain factors like gender and ethnicity on the performance of the students.
Therefore, we have used a modified version of PICOC and SPICE frameworks for the final
synthesis presented in this report. This is still consistent with our methodology in the
protocol registered with PROSPERO (CRD42015017130).
5.4 Quality assurance
Due to the inclusion of a wide variety of material in the final synthesis, and the iterative
method of study and document extraction, the transparency of all decisions made about
inclusion is guaranteed by thorough documentation of each stage of the review and the
decision-making processes.
We undertook a quality assessment of the studies that included primary data using an
adapted version of the Critical Appraisal Skills Programme (CASP) framework for qualitative research. We used
this for both qualitative and quantitative studies since the key issues around quantitative
studies related to the approach to the questions, the design of the research as related to the
question, the study’s population and what was measured and how. The ratings of the
studies (high quality / unclear quality / low quality) are included where appropriate in
Appendix 1 and a fuller description of the evaluation of each study using primary data is
included as Appendix 2. We included a question related to generalizability of the study
(direct / indirect / unclear). This question does not contribute to the quality evaluation but is
reported separately to account for generalizability to the review.
The research team was supplemented by an expert panel to advise on search terms, discuss
the retrieved literature and any initial emergent themes, and review the final report prior to
submission to the GMC. The expert panel (Sam Regan de Bere, Suzanne Nunn, Mona Nasser,
Paul Lambe, Julian Archer, Martin Roberts, Tom Gale and Rebecca Pitt) met to discuss
various stages of the review, including: feeding back on the research design; ratifying the
protocol; agreeing the selected academic literature; discussing themes emerging from the
literature; quality assessment; and agreeing the structure of the final report.
During the process of the research the panel agreed that the retrieved literature was
representative of the field and that the search terms used had been appropriate. The panel
did not consider that there were any significant gaps in the literature: they suggested that,
rather than reinforcing extant knowledge by including the literature from other health
professions, the research team should concentrate on the emerging narratives and look to a
broader cultural literature to inform the socio/cultural and pedagogic narratives that were
emerging if required.
The panel did identify a lack of clarity in the terminology used in different studies across the
literature: in particular the words ‘performance’ and ‘attainment’ have been used
interchangeably. The panel suggested that, for the purposes of this review, the following
definitions should be applied: attainment would be used in reference to a direct
measurement, namely passing exams, whereas performance would refer to academic
performance as a process which implies a temporal element, with attainment being a
consequence of performance.
6 Research Ethics
The research for this review is desk based and ethical permission was not required.
7 Data Analysis
Initial database searches identified 3,044 potentially relevant documents. Duplicates (68) were
removed, leaving 2,976 documents to be screened by title for possible inclusion in the
synthesis. Documents rejected at this stage, after exclusions were applied, were categorised
in case any gaps were identified in the literature and these documents needed to be
revisited. Ninety-six documents were retrieved as papers for further review (10% of these
being checked by SRdB against the inclusion criteria). From this tranche, 40 papers were
evaluated against the PICOC and SPICE frameworks, as described in the protocol; 8 failed on
one or more of the criteria, leaving 32 documents extracted for discussion by the expert
panel and potential synthesis. Following discussion, a further 3 papers were added on
the advice of the expert panel from their subject knowledge and 4 papers were added as a
result of iterative searching of the reference lists in the papers identified for synthesis.
A total of 39 papers were included in the synthesis with the addition of 24 documents from
the grey literature. A flow diagram of the search process is shown in Figure 1 below.
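The counts reported in this section can be checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable names are illustrative, and the combined total of published and grey documents is derived here on the assumption that the two sets do not overlap.

```python
# Figures as reported in the text above.
identified = 3044          # documents from initial database searches
duplicates = 68
screened_by_title = identified - duplicates

retrieved_for_review = 96
evaluated_against_frameworks = 40
failed_criteria = 8
extracted_for_panel = evaluated_against_frameworks - failed_criteria

added_by_panel = 3
added_from_reference_lists = 4
papers_in_synthesis = extracted_for_panel + added_by_panel + added_from_reference_lists

grey_documents = 24
# Derived total, assuming no overlap between published papers and grey literature.
total_documents = papers_in_synthesis + grey_documents

print(screened_by_title, extracted_for_panel, papers_in_synthesis, total_documents)
```

The intermediate values agree with those reported: 2,976 documents screened by title, 32 extracted for panel discussion and 39 papers included in the synthesis.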
The studies and other documents included in the synthesis use a variety of formats and
methodologies, as shown in Figure 2 below.
Figure 2 Analysis of included studies and documents by methodology or type
Quantitative research is the dominant research methodology for published research.
Interestingly, mixed methods research studies were only found in the grey literature. The
‘other’ category includes opinion pieces, letters and comment, conference and other
reports. Not surprisingly, this is the area dominated by the grey literature.
Fifteen of the documents extracted from the grey literature were comment pieces in the
online medical news and media: Pulse (n = 4), BMA (n = 6), GPonline (n = 1), BMJ Careers (n
= 3) and Mancunian Matters (n = 1). The most disseminated document in the grey literature
was the AoMRC 2013-14 review; (38) it was linked to the Royal College sites and returned as
a ‘hit’ when searching them. The document itself has little to say about differential
attainment: a short paragraph identifies the judicial review as a catalyst for the AoMRC’s
decision to “look at the wider question of differential attainment in medical education.” (38)
An examination of the dates of publication of the included documents testifies to a growing
interest in differential attainment. This is with the caveat that there is a time lag between
academic research and its publication that does not apply to online comment. But even
taking this into account a trend is clearly discernible.
Figure 3. Publication by date
Broadly speaking, the peaks of interest coincide with significant changes to the
MRCGP in 2010 (specifically the CSA component), the publication of Esmail and Roberts’
report in 2013 and the judicial review in 2014.
8 Narrative synthesis
A narrative synthesis does not begin with a set of a priori assumptions. Using this method
themes emerge as the literature is identified and reviewed. The first level of thematic
identification is descriptive and can be generated in a number of ways including coding
followed by conceptual mapping to help us think about the relationships between and
across the themes identified.
Using the themes coded in NVivo 10 we identified two key areas of interest that emerged
across the literature: high stakes exams and ethnicity.
Fig 4 shows a conceptual map of the relationship between high stakes exams and ethnicity
in the published literature, with the sub-themes or factors either identified or investigated.
[Figure 3: chart of documents by year of publication, 2004 to 2015 (y-axis 0 to 14); series: Published, Grey]
Figure 4. Conceptual map of themes identified in the published literature
N.B. the size of the ovals does not reflect significance of factor or quality of the research
The conceptual map, while amply demonstrating complexity, also provides a way of
populating a micro, meso, macro analytical framework that broadly relates to three key
levels of engagement: the individual or discrete group (student/s, doctor/s, examiners etc.),
the institutional (medical school or work environment) and the level of policy (exams).
9 Findings
9.1 The Individual or discrete group
Although not discussing postgraduate education specifically, Schrewe makes a number of
insightful observations around the place of the individual in medical education and the
tension between the competing discourses of diversity (respect for the culture, gender and
ethnicity of individuals) and standardisation (uniformity and consistency).(39) Arguing that
these discourses need to be made explicit and “brought into the same conversation” in
order to enable students and trainers to achieve their full potential, Schrewe suggests that a
better understanding of the common qualities required and the extent to which individual
variation can be supported without detriment to the profession as a whole is the question
that needs addressing with some urgency.(39)
In this section of the findings we discuss themes identified in the literature pertaining to the
individual or discrete demographic group.
Study habits
Woolf examined ‘study habits’ as part of wider research into ethnic underperformance in
Year 3 medical students using a questionnaire to assess surface, deep and strategic learning
processes. Deep learning is associated with an active search for meaning, whereas surface
learning is associated with memorising rather than understanding. (40) Woolf found that
minority ethnic students scored lower on deep learning study habits (p = .003) and higher
on surface learning study habits (p = .008) than their white peers. (20) Strategic learning,
where learners adopt the best learning style to fit the needs of the task, was identified
by Woolf as a positive predictor of performance but was statistically related to other factors,
including living at home and having English as a first language. It is also important to
recognise that students should not be identified with a fixed approach to learning;
curriculum design, assessment and teaching style all encourage students to adopt a
particular approach. (41) This suggests potentially broader questions about ethnicity and
learning.
Psycho-social
Psycho-social is a term used to describe an individual’s psychological development in, and
interaction with a social environment. In the literature on widening participation psycho-
social factors in relation to undergraduate degree choice are well documented. (42)
As part of a larger undergraduate research study Woolf examined personality types of white
and non-white students using an adaptation of the NEO-PI-R (43) to identify five personality
types (neuroticism, openness to experience, agreeableness, extraversion and
conscientiousness). The study of a total of 703 (51% minority ethnic) students found that
ethnic minority students were lower on the personality trait “openness to experience” (p =
0041) (20) but this was not found to have a negative effect on final year examination
performance.
Social and cultural capital
The ‘standing’ that medical professionals have within different cultures has been shown to
have a significant effect on the choice of medicine as a career. A study linking informed
choice and academic success in Iranian medical students provides a useful international
review of studies that found many medical students having an “over-dramatized and
romanticized view of medicine at the beginning of academic studies”. (34) The Iranian study
used a multiple choice questionnaire (n = 2208) for final year medical students and found
that informed choice had a positive effect on attainment.
Success
Esmail recommends that more research is undertaken into factors for ethnic students’
success. (28) Whilst not looking at ethnicity, we identified one small scale qualitative study
using interviews with 10 black male medical students and 3 black male physicians at Florida
State University College of Medicine to explore their perceptions of the factors contributing
to their success in being admitted to and graduating from medical school. (22) The study,
with its gender, geographical and numerical limitations, nevertheless presented an
interesting line of enquiry looking at contributors rather than barriers to attainment.
The study concluded that factors contributing to success were a balance between
educational experiences, exposure to medicine, psychosocial-cultural experiences (including
family and other support networks) and personal attributes. Participants in the research
specifically identified structured activities like enrichment programmes and outreach
programmes as significant. The Minority Association of Pre-Medical Students Programme
(MAPS) was an example cited by the study participants. MAPS provided opportunities for
networking with other premedical students, medical students and physicians and
importantly provided the opportunity for shadowing experiences.
We then looked at the undergraduate literature to see if there were any other studies that
looked at contributors. One qualitative study based in Saudi Arabia used focus groups to
understand the perceptions of 19 mixed-gender, high-achieving medical students of the factors
contributing to their success.(21) They identified learning strategies, resource management
problems i.e. stress.
In a study examining the differential achievement between white medical students and their
ethnic minority peers, Vaughan (19) used social capital theory to develop and analyse survey
data from medical students in the clinical phase of their training (n = 158). The research
found no link between ethnic and religious homophily and achievement. However,
interacting with problem-based learning group peers in study related activities and having a
wider academic support network were found to be directly linked to better achievement.
Vaughan concluded that ethnic homophily may cut minority students off from potential and
actual resources that facilitate learning and achievement. It is therefore key that students
build wide relationships with colleagues at all levels of training.
Ethnicity
The underperformance of ethnic minorities compared to their white peers across the higher
education landscape has been consistently identified.(44) (45) The studies discussed in this
review focusing on the performance of UK-trained medical students and doctors from
minority ethnic groups have corroborated broader HEFCE findings.(27)
Definitions of ethnicity are numerous and complex. In the UK studies we discuss in this
review ethnicity was either self-declared specifically for an individual study or was a
characteristic already identified in a data set being analysed.
• All these papers evidence a mix of educational and social factors as contributing to the performance of individuals, in addition to individual characteristics.
• The literature examining contributors to success is important, since looking only at why certain students might fail tells only half the story.
• From the papers found, contributors to success seem to be international, but with so few studies the results are not generalizable.
Classification systems used in the research also varied, and included the 2001 UK census
guidelines, (19, 20, 26) individual Royal College geographical bands, (46) white and non-white,
(47) BME as an umbrella group, (5) (11) (24) categories approved by the UK Commission
for Racial Equality, (9) and the GMC National Training Survey, (23) which uses UK census categories.
Studies use different categorizations and therefore comparisons between studies can be
difficult. For example, Denney cites the conflation of all BME groups under one heading as a
limitation of the study but states that it was necessary in order to compare and contrast the
results with other studies and because the numbers were too small in some sub groups. (11)
Woolf adopts the same approach, arguing that ethnic categories are to an extent artificial
because they can never take into account the subtle variations between groups of people.(5)
There have been a number of key large scale quantitative research projects since the 1990s
focusing on ethnicity and differential attainment. The catalyst for this area of research was
the identification of a higher failure rate in clinical exams among non-white students at the
University of Manchester in 1995. (48) The leading researchers in the field in the UK are
Chris McManus, Katherine Woolf, Jane Dacre and Richard Wakeford.
In a systematic review of ethnicity and academic performance in UK undergraduate and
postgraduate medical students, Woolf found ethnic differences in attainment to be
widespread across different types of medical school and different types of exam at both
levels of study.(5) The review focused on quantitative reports that measured performance
and concluded that differential attainment was both “consistent and persistent”: but while
ethnicity was clearly related to exam performance the reasons for this were not clear.(5)
The first large scale longitudinal study exploring in depth a number of potential
psychological and demographic reasons for differential attainment in undergraduate and
postgraduate medical students was led by Katherine Woolf (20).
In contrast to the studies focusing on measuring differences in attainment between
different groups, Woolf’s qualitative study (26) using focus groups and semi-structured
interviews (n = 27 medical students and 25 clinical teachers) followed earlier studies in the
US and examined the potential of stereotype threat to provide an insight into the identified
gap in attainment. Stereotype threat has been identified as a psychological phenomenon
whereby individuals who are members of a group characterized by negative stereotypes
perform below their actual abilities when group membership is emphasized. Woolf found
that negative stereotyping could impact on the relationship between lecturer and student
and therefore affect learning. She concluded that while a negative stereotype about an
ethnic group had “numerous implications for teaching and learning” the relationship was
neither simplistic nor deterministic.(26) Woolf concluded that the student/teacher
relationship was “vital for clinical learning”; in particular, the negative Asian stereotype was
considered potentially to jeopardise Asian students’ relationships with their teachers.
Woolf recommends that employers should facilitate teachers in getting to know their
students as individuals. Although the study was limited to one London medical school,
stereotype threat is an interesting line of inquiry that is not only relevant to ethnicity. For
example, Burgess has studied gender in terms of stereotype threat in the context of career
advancement in academic medicine in the US. (49)
IMG
IMGs are an important asset to the Health Service in the UK. In a review article in 2005
Sandhu opined that increasing numbers of IMGs would be needed to achieve the rapid
increase of workers needed as a result of legislation relating to the creation of a consultant
based service, and other working directives. (50)
• Definitions of ethnicity are numerous and complex.
• BME is a widely used term in public and private sector organisations to incorporate a range of minority communities living in the UK. Such an umbrella term has been critiqued in terms of the validity of grouping together diverse groups in this way.
• Conversely, for quantitative studies broad terms may need to be used to obtain statistically significant results.
Sandhu raises the concern that this requirement combined with the UK being a very
attractive place for medical graduates to work and continue their training could encourage
an influx of inexperienced doctors or doctors having poor communication skills seeking
opportunities in the competitive specialities. Sandhu advocates that more realistic
information about postgraduate opportunities and training be available to enable potential
IMGs to make a more informed choice, but also praises the motivation and determination of
IMGs as a group.
A study in the US found great persistence on the part of IMGs in pursuit of a US residency
position.(51) The linked data study of a cohort comprising 10,328 IMGs who were both US
citizen IMGs and non-US IMGs highlighted the importance of IMGs to the delivery of
national healthcare.
In a large scale analysis of RCOG data Rushd undertook retrospective analysis on the
performance of IMGs who appeared for the first time in the Part 1 (n = 11,863) and Part 2
written (n = 5336) MRCOG examinations between 2000 and 2010. (46) Rushd’s evaluation of
the first time performance of IMGs in the MRCOG part 1 and 2 written examinations
critiques IMG as a category by identifying variation in performance between students across
the RCOG geographical bands.
Rushd was unable to perform statistical comparisons to the results of the study since
geographical bands are not comparable: they contain different countries, different
academic standards, different teaching methods etc. Rushd, however, found that variation of
IMG performance was likely to be multifactorial and suggests that the introduction of e-
learning modules may “go some way in equalising the learning opportunities among
geographic regions and could prove useful for both trainers and trainees.” (46)
Aside from Illing’s study, discussed below, (2) we only found one qualitative study examining
barriers and facilitators encountered by IMGs. The study was situated in the Netherlands
and the findings related mainly to sociocultural rather than educational factors, including
being able to access information and financial support. (16) Lack of command of the Dutch
language (particularly the medical terminology) and age were seen as barriers to securing
employment and entrance to specialism. Age was only a barrier in some specialisms since
they set an upper age limit for postgraduate specialist training.
The study concluded that better support to overcome difficulties inherent to migration and
career change would result in better trained and acculturated doctors. The GMC has
recently undertaken some work in this area and developed a ‘Welcome to UK Practice’
programme to raise awareness about practice in the UK. (52)
In contrast to Vaughan, who cautioned against homophily, (19) a presenter at the RCPsych
conference (2014) encouraged IMGs to join and become active in diaspora organisations,
thereby familiarising themselves with working in the NHS and broadening their network of
professional contacts. (53)
The RCPsych convened a conference in 2014 to focus on familiarising IMGs with working in
psychiatry in the UK. The conference was organised in recognition “that IMGs face more
problems than British graduates in succeeding in the system.” (53) The college is keen to
support IMGs by commissioning an external review of the MRCPsych exam and ARCPs and
appointing an Associate Dean for Trainee Support.
Feedback from the delegates was positive and the college plans to run another in
2015. There was a recognition by delegates of the importance of trainers, the role of
employers in developing meaningful induction programmes and giving IMGs additional
support and remediation if required. Among the recommendations proposed at the
conference were that the College appoint local and national IMG Champions and improve
examiner training to help recognise unconscious biases (accents, manner etc).
• IMG is a category that needs to be problematized and properly defined.
• The literature identifies IMGs as increasingly important internationally to the delivery of healthcare.
• IMGs are noted for their persistence and tenacity in pursuing postgraduate qualification.
• IMGs face the inherent difficulties of migration and acculturation. These include language, accessing information, financial support and limited knowledge of the healthcare system.
Language
A number of studies discussed language either as a sole focus or as part of a number of
compounding factors. Woolf’s longitudinal study using exam data and questionnaires over
two consecutive year 5 cohorts (n = 703: 51% minority ethnic) found that speaking English
as a first language, with one parent also speaking English as a first language and being
schooled in the UK, was a predictor of good performance in final year UCL medical students.
However, not having this level of English was not the reason why minority ethnic students
underperformed. She suggests that where examinations like the OSCE require
communication skills “country of schooling could be a proxy for communication or cultural
differences.”(20)
This finding concurs with those of Watmough (21), discussed above, who was also unable to
identify language as a determining factor in success in the RCA postgraduate examination.
The most significant study exploring language and cultural factors was undertaken by
Roberts and funded by the Economic and Social Research Council (ESRC). (3) This study
used a sociolinguistic methodology to examine both how candidates performed in the RCGP
exam and how the specific conditions of the exam operated to determine behaviour.
In specific relation to the CSA, but with wider implications for other practical exams in both
undergraduate and postgraduate contexts, Roberts’ study found the “relatively
decontextualized nature of the CSA made it a ‘talk-heavy’ assessment from which a number
of effects flow”. These include “communicative performance factors’ which relate to how
IMGs talk and interact with role playing patients, examiner perceptions of candidates
sounding formulaic and not engaging with the patient through a patient centred model.”
The researchers suggest that the sociolinguistic “fingerprint” of the exam which assumes a
patient centred approach could constitute a “hidden curriculum.”(3)
The study concludes that “Rather than talk of ‘cultural bias’ or not, there needs to be a
debate about tolerances and communicative flexibility, about what are acceptable
competencies in an increasingly diverse society and how, within these competencies, talk
and interaction can be more explicitly addressed. ‘Cultural bias’ implies that there is a goal
of neutrality that must be reached and that there is one ‘culture’, one way of doing
things.”(3)
Memon argues that oral examination is an important element of postgraduate examinations,
but ensuring its reliability and validity across specialisms is complex to design and
implement. (35) Memon cites the work commissioned by the RCGP in this area of
postgraduate examination as an example of good practice in providing an evidence base for
the validity and reliability of the oral elements of their exam. Memon cautions that IMGs
taking exams in other specialities may be disadvantaged if their English is less fluent and
articulate than UK trained candidates.
Knight, an MRCGP examiner, argues in an editorial piece that while there is evidence that
the MRCGP is reliable, IMGs are prone to failure because the exam is in English and they
spend much of their practice consulting in other languages. (36) Aside from language, Knight
also cites other factors that may impact on IMG success in the MRCGP, including clinical
environments in the UK that differ from the one in which they trained. (36) Knight and
Roberts both identify the failure to acknowledge or assess multilingual
expertise, which both see as an asset in an increasingly diverse UK society.
The specialities with the highest proportion of IMG candidates are the MRCGP and the
MRCP (particularly psychiatry). (45) These specialities require significant levels of cultural
awareness and advanced communication skills, both of which may place IMG students at a
disadvantage. (17)
Issues around IMG students and language are not unique to the UK, but are also evident in
other countries where there are minority groups. (54)
While language may be a predictor of good performance it is not, of itself, the reason why students fail.
Language is often conflated with sociolinguistic performance.
There is currently no acknowledgement or assessment of multilingual expertise.
Gender
Two papers (both American) compared female attainment against male attainment in
obstetrics and gynaecology (Obs/Gyn) (55, 56). Both studies conclude that women
outperformed men in the Obs/Gyn specialism.
Bibbo’s study found that on the pre-clerkship MCAT measures men outperformed women,
but on the overall clerkship scores women outperformed men. This was due to women’s
higher achievement on the standardised National Board of Medical Examiners (NBME)
subject examination. Drawing on other literature, a number of proposals were made as to
why this might be the case, including men being less interested in the specialism and
consequently less motivated, combined with the perception that patients prefer a female
physician. Women, in contrast, may be more motivated because they want to enter
this specialism due to gender identification and the dominance of women already in the
field.(55)
Cuddy’s study on examinee gender and United States Medical Licensing Examination (USMLE)
performance also found men outperforming women at Clinical Knowledge (CK) Step 1 of the
exam but with women outperforming men at CK Step 2 (clinical skills), and with women
outperforming men in most content areas of Obs/Gyn, paediatrics and psychiatry. In contrast,
men outperformed women in medicine, surgery and preventative medicine.(56)
In a study of 2474 Norwegian residents who began specialization in 1999-2001
(36), Johannsen found that although women progressed more slowly than men, the gender
variation was not significant when the effects of childbirth and having children under 18
were controlled for. Gender was, however, found to have a strong influence on choice of
speciality due to longer required working hours, for example in emergency services.
In combination these studies identify a gender split in specialisms, for example the dominance of women in Obs/Gyn.
Identified gender differences in exam performance may be linked to gender motivation to succeed in specific specialisms and/or gender identification with certain specialisms.
Studies suggest that changes to the hospital environment, working practices and cultures could encourage a more even gender split across the specialities.
9.2 The institutional
The Medical School and the working environment
In a Norwegian study (36), Johannsen looked at hospital specific factors in speciality choice
and qualification. The study found that hospital factors were significant predictors of the
participants’ (n = 2474) timely attainment of specialization. Working at university hospitals
(regional) or central hospitals was associated with a reduction in the time taken to complete
the specialization, “whereas an increased patient load and less supervision had the opposite
effect.” Johannsen’s study suggested that more flexibility in the curriculum would be
beneficial.
Illing, using quantitative and qualitative data, describes how senior overseas doctors who
come to the UK with established clinical practices may find adapting to a different
workplace culture difficult and not have access to the support available to less experienced
doctors.(2) IMGs may also have difficulty understanding roles and responsibilities in the
NHS structure, as well as its patient-centred culture and holistic model of care. (2)
Two studies identified a need for a greater emphasis on Equality and Diversity and cultural
awareness in training within organisations with targeted events and diversity initiatives used
as opportunities. (3, 4)
As part of McManus’s data linkage study into PLAB and UK graduates’ performance on
MRCP(UK) and MRCGP examinations, a comparison between graduates from different UK
medical schools was performed. (7) The study found “clear and large differences in
performance at MRCP(UK) between graduates of different medical schools.” (7) However,
the study concluded that the identified differences in training could not account for the
poorer performance of IMGs.
Esmail advocates examining the distribution of IMGs and BME doctors across UK medical
schools in order to ascertain if the selection and training placement processes could operate
against the interests of weaker candidates, thus encouraging a cycle of educational
deprivation. (10) This observation is supported by Tiffin. (23)
Mentoring
There is a significant body of literature around mentoring for medical students and doctors
at all levels of study, with the majority of studies being undertaken in the USA. (18) Frei’s
review concludes that mentoring is “an important career advancement tool for medical
students” and that more programmes should be set up in Europe, but monitored and
assessed for impact.(18)
In terms of mentoring in the context of postgraduate medical training, the literature is not
well developed, although there is support from the Royal Colleges and the NHS generally.
(57) (58) Stamm’s study examining mentoring as part of a developmental network, set in
Switzerland, found that only 50% of doctors undergoing specialist training (n = 326) took
advantage of mentoring despite the positive benefits identified and that, of those, females
received less mentoring than their male colleagues. This gender gap was attributed
primarily to extraprofessional concerns. Stamm concludes that, given the
often less straightforward career path for females, mentoring is particularly important.(59)
Steven’s qualitative study over six NHS sites identified benefits across the professional-
personal interface. Steven suggests that successful mentoring makes doctors feel more
confident and satisfied in their work, and this will have beneficial impacts for organizations.
(60)
Selection
The literature on postgraduate selection is less developed than that on undergraduate
selection into medical school. The UK general practitioner selection process uses a national
machine-markable shortlisting test to assess both cognitive and non-cognitive skills and a
‘corporately owned’ and validated selection methodology. Plint (42) summarises the success
of the process and the confidence in it from both students and deaneries as: “Corporate
commitment to national process; legitimate authority and locus of control; process of
incremental convergence, rather than imposition; development and adoption of validated
selection method; representative infrastructure operating the process.”
McManus undertook a significant study examining the educational background and
qualifications of UK students from ethnic minorities and the selection for medical school.
(47) The study addressed the assumption that entrants to medical school are equivalent in
their academic ability and that, following on from this, differential attainment in
undergraduate medical exams and beyond was accounted for at some point after selection
to medical school. The study found, however, that non-white students had slightly lower A
level and GCSE grades than their white peers. It concluded that while GCSE and A level grades
might explain some of the effects found, they could not entirely explain the poorer
performance of non-white students at medical school and beyond.
Citing the GP selection process as an example, Paterson recommends a more robust
selection process.(4)
9.3 Policy
Predictors of success at postgraduate level
Woloschuk’s small scale study (n = 244 medical graduates) at the University of Calgary,
Canada, found that measures of undergraduate performance seemed to be poor predictors
of postgraduate success. In particular they found a ‘weak’ relationship between
performance in the Medical Council of Canada (MCC) national licensing exam, which they
describe as a “rite of passage” to postgraduate training, and performance in residency. They
suggest that success may be due to non-cognitive attributes, for example work ethic,
personality and motivation (61).
Understanding workplace culture is important.
Mentoring programmes are beneficial but they need to be robust and evaluated to ensure they are effective.
The literature on postgraduate selection is less developed than that of undergraduate selection into medical school.
Prior attainment could not entirely explain the poorer performance of non-white students at medical school and beyond.
Selection processes for postgraduate study are highly variable.
PMQ
In a high quality comparative study of UK trained doctors and those whose PMQ was gained
outside the UK sitting the RCA exam from 1999-2008, Watmough (62) found that candidates
from Egypt, Iraq, Ireland and Pakistan performed significantly worse than those from
Australia, New Zealand, South Africa, Zimbabwe and the UK. From June 1990 to February
2008, there were 9,315 attempts at the MCQ by 5,797 graduates from 70 countries, with 25
countries having candidates who made 15 or more attempts. The analysis was undertaken
using data from the written part of the exam which uses multiple choice questions to test a
range of generic clinical skills. The MCQ is a high stakes exam essential for career
progression to consultant level. The study did not find a coherent pattern to attainment and
concluded that “some IMG graduates who sit UK postgraduate exams may require
additional support prior to taking the exam.” Importantly, the underperformance of
students from Ireland and Pakistan, where English is the main language in medical
education, indicates that language is not a key factor in differential attainment in this exam.
The authors suggest that, rather than language, it may be that cultural ties ease the transition
to working in the UK; however, the poor performance of candidates from the Republic of
Ireland casts doubt on this supposition.
High stakes examinations
High stakes exams all contain a number of components: assessment of practical skills using
‘real’ or simulated clinical scenarios, multiple choice, written and oral. Different elements of
the exam are marked in different ways: by computer, by examiner and by assessment of
skills.
It is important that the transparency and fairness of ‘high stakes’ exams be demonstrated
given the influence they have on a doctor’s career progression and employment
opportunities. Memon, in relation to the specifics of oral examination in postgraduate
examinations, argued for the Royal Colleges to undertake much more rigorous validity and
reliability testing on their high stakes exams. (63)
Wakeford’s large scale assessment of validity and differential performance by ethnicity in
the RCGP and MRCP(UK) examinations followed in the wake of a Judicial Review.(24) It
sought to evaluate if the performance of candidates in the MRCP(UK) was predictive of their
attainment in the MRCGP (usually taken 3-4 years after). The study found substantial
correlations between a candidate’s performance in the two exams, which provides support
for the validity of each. (24)
Wakeford identified a higher correlation between PACES and the new CSA than the old,
suggesting that the new CSA is a more valid assessment. (24) In addition, the study found
that BME candidates in particular showed a higher correlation between PACES and the CSA
than white students, “suggesting that there is less extraneous variance between BME
candidates making it a more valid assessment.”(24)
MRCGP and MRCGP Clinical Skills Assessment (CSA)
The CSA exam was revised in the autumn of 2010 to improve the reliability of the
assessment. Esmail and Roberts’ key study, using previously unavailable data from the
GMC and the RCGP, examined ethnic minority candidates’ performance in the MRCGP
exams between 2010 and 2012, thereby testing the new CSA.(28)
The headline conclusion was that “subjective bias due to racial discrimination in the clinical
skills assessment may be a cause of failure for UK trained graduates and international
medical graduates.”(10)
Discrimination is an incendiary term. Judith Hawkins writing in Mancunian Matters
explained the format of the CSA to its non-medical audience and outlined Esmail’s findings.
The article stimulated a heated online debate among readers who were only too willing to
support claims of racial discrimination.(64)
Esmail and Roberts suggested that the different training experience and other cultural
factors (patient/doctor relationship and proficiency in spoken English for example) between
UK and non-UK trained candidates could affect exam outcomes. However, they did not
consider that these cultural factors could entirely account for differential attainment
between white and BME UK trained candidates. It was suggested that discrimination could
occur at a number of points in the CSA: in the behaviour of standardised patients towards
white and non-white candidates, and in bias on the part of the examiners.(10)
McManus’s study (7) leads on from this study by Esmail and Roberts (10) although there are
significant differences between the two: McManus’s study analyses PLAB part 1 (which
Esmail does not) and it analyses a larger dataset (n = 7,829) from MRCP(UK), compared to
Esmail and Roberts’ (n = 5,055 candidates + 1,175 not trained in the UK). Both studies use
candidates’ marks at first attempt for all analyses. McManus’s study found that IMGs’ lower
performance was “unlikely to result from systemic examiner bias or discrimination.”
Knight states in an opinion piece that, while there is evidence that the MRCGP is reliable,
IMGs are prone to failure because the exam is in English and they spend much of their
practice consulting in other languages. (15)
The CSA exam was revised in the autumn of 2010: the new CSA has been found not to
discriminate between white and BME candidates. (24) However, the CSA will inevitably carry
implicit cultural associations specific to UK medicine. Esmail states that the CSA is not, and
was not intended to be, a culturally neutral exam. Therefore, UK graduates are likely to be
initially more successful, because they are acculturated. (28)
The CSA was consistently identified in the medical news media as a particular issue for IMGs.
Commentary was prompted by both Esmail’s study (65, 66) and the judicial review. (67) (68)
MRCOG
The MRCOG is an internationally recognised standard and at the time of Rushd’s study more
than 85% of the total candidates were IMGs. The study found that MRCOG examination
success rates were significantly different according to the university of medical graduation.
Rushd also identified a variation in performance among graduates from different medical
schools in Parts 1 and 2 of the MRCOG written examination which was comparable to
those schools’ performance on the MRCP(UK). (46)
PLAB and IELTS
If IMGs are going to sit the PLAB they need to demonstrate that they have achieved an
acceptable level of English via IELTS in the previous two years. PLAB was reviewed in 2011 to
assess whether the knowledge and skills demonstrated by passing the PLAB continued to be
equivalent to those demonstrated by an F1 doctor. A key component of this review was to
examine any disparity between IMGs, who successfully passed the PLAB test, and their UK
graduate peers in postgraduate examinations.(7) Aside from difficulties relating to direct
comparison, the study concludes that there are good correlations between PLAB and the
MRCP(UK) and MRCGP which means that PLAB is a valid assessment of skills relevant to
progression during UK postgraduate training. It should be noted however, that PLAB is not
designed to predict postgraduate exam performance, or to ensure that those passing PLAB
can achieve at postgraduate level.
In order to produce outcome equivalence between IMG and white graduates, it was
suggested by McManus that the PLAB pass mark could be set higher; however, this would
have significant impacts on health service delivery.(7)
Tiffin’s study of UK based trainee doctors with at least one competency-related ARCP
outcome in the study period (n = 53,463, of whom 11,419 were IMGs registered following a
pass via the PLAB route) also found that the PLAB test was not generally equivalent
to the requirements for UK graduates, and that the standard of English competency and the
PLAB pass mark would need to be raised to ensure equity.(23) Tiffin also discusses how PLAB
candidates with lower scores may not be able to secure a post in their preferred specialism
and therefore successfully apply for “shortage specialities” like psychiatry and general
practice. Given that these specialisms require enhanced communication skills, some IMGs
may immediately be disadvantaged. (23)
Sandhu notes that the requirement to pass these exams cuts into IMGs’ time and can
erode the time available for research, leaving IMGs’ CVs weak in publications, which can
affect their chances of being shortlisted for jobs in spite of clinical experience.(50)
Much of the research into differential attainment is quantitative, with a focus on using exam
data to test for bias in the exam or a component part of it. Taken as a whole, these studies
have broadly demonstrated the validity of some high stakes exams and discounted evidence
of bias in the exams themselves leading to differential attainment. This view is endorsed by
Patterson with the caveat that it is not an endorsement of all assessment tools.(14)
Examiner bias
Examiner bias in relation to examinations like the MRCGP and the MRCP, in which candidates
are judged ‘live’ and examiners can therefore identify a candidate’s gender and ethnicity, has
been frequently questioned (11) (8, 9).
Examiner bias is a potential risk in any examination and a threat to the validity of an
examination. The first study in this area, by Dewhurst (9), focused on the MRCP(UK) and
found that any potential examiner prejudice was only significant when two non-white examiners
examined a non-white candidate. This, Dewhurst suggested, was not conscious and may
relate to a consistency in communication style and cultural understanding.
McManus’s investigation into possible bias as a threat to validity used data from
MRCP(UK) PACES and nPACES examinations.(8) The study found that having two
independent examiners reduced any potential for bias and judged it a preferable method of
assessment over a single examiner. This is an example of how the infrastructure around an
exam can potentially impact on the outcome for the candidate.
Denny’s study investigating potential examiner bias as responsible for differential
attainment in the MRCGP CSA found no evidence to support examiner bias, finding instead that
differential attainment was linked to the candidates’ demographic rather than the
examiners. (11)
In a letter to the BMJ, Shaw opined that the new CSA high failure rate of ethnic candidates
was an unintended consequence of selection and examination