BMJ Open...Ahmed F (1) Lyratzopoulos G (1) Cambridge Centre for Health Services Research, University...
For peer review only
Accuracy of routinely recorded ethnic group information compared with self-reported ethnicity: Evidence from the
English Cancer Patient Experience survey
Journal: BMJ Open
Manuscript ID: bmjopen-2013-002882
Article Type: Research
Date Submitted by the Author: 14-Mar-2013
Complete List of Authors: Saunders, Catherine; University of Cambridge, Abel, Gary; University of Cambridge, El Turabi, Anas; University of Cambridge,
Ahmed, Faraz; University of Cambridge, Lyratzopoulos, Georgios; University of Cambridge,
Primary Subject Heading: Health services research
Secondary Subject Heading: Health informatics
Keywords: Ethnicity, Hospital Records, Quality in health care < HEALTH SERVICES ADMINISTRATION & MANAGEMENT, Equality
For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml
BMJ Open: first published as 10.1136/bmjopen-2013-002882 on 28 June 2013. Downloaded from http://bmjopen.bmj.com/ on August 25, 2021 by guest. Protected by copyright.
STROBE Statement—checklist of items that should be included in reports of observational studies
Item
No Recommendation
Title and abstract 1 (a) Indicate the study’s design with a commonly used term in the title or the abstract
Study design is stated in title and in the design section of abstract
(b) Provide in the abstract an informative and balanced summary of what was done
and what was found
Please see Methods/Results section of abstract
Introduction
Background/rationale 2 Explain the scientific background and rationale for the investigation being reported
Given in the background section of the paper. Specifically, previous studies of the
quality of coding in routine English NHS records have focused on completeness
rather than accuracy, and so this work attempts to fill this research gap
Objectives 3 State specific objectives, including any prespecified hypotheses
Specifically stated as the last line of the introduction
Methods
Study design 4 Present key elements of study design early in the paper
Stated in the abstract and further presented in the methods
Setting 5 Describe the setting, locations, and relevant dates, including periods of recruitment,
exposure, follow-up, and data collection
Stated in the abstract and further presented in Methods (Data)
Participants 6 Cross-sectional study—Give the eligibility criteria, and the sources and methods of
selection of participants
Stated in the abstract and further presented in the methods (data) flowchart (figure 1)
Variables 7 Clearly define all outcomes, exposures, predictors, potential confounders, and effect
modifiers. Give diagnostic criteria, if applicable
Please see Methods, analysis (p9, line 4 onwards)
Data sources/measurement 8* For each variable of interest, give sources of data and details of methods of
assessment (measurement). Describe comparability of assessment methods if there is
more than one group
Please see Methods, analysis (p9 line 4 onwards)
Bias 9 Describe any efforts to address potential sources of bias
Describing measurement bias in hospital record ethnicity coding is an outcome for
this study
In Methods, analysis, we describe our approach to potential clustering by hospital
Study size 10 Explain how the study size was arrived at
Given in Methods, Data and the flowchart in figure 1
Quantitative variables 11 Explain how quantitative variables were handled in the analyses. If applicable,
describe which groupings were chosen and why
All variable groups are explained and described in methods, analysis
Statistical methods 12 (a) Describe all statistical methods, including those used to control for confounding
Described in methods, analysis
(b) Describe any methods used to examine subgroups and interactions
No subgroup analyses considered
(c) Explain how missing data were addressed
We performed these analyses (considering the small proportion of cases with missing
ethnicity coding) and mention these results in our discussion. These analyses were
extensive but not the main focus of this paper and so we did not include them as a
substantial research output in this paper
(d) Cross-sectional study—If applicable, describe analytical methods taking account of
sampling strategy
Discussion of the generalizability of the sample was included in the discussion.
Clustering at the hospital level was considered in the analysis, looking at results with
and without a random effect for hospital.
(e) Describe any sensitivity analyses
We repeated the analyses on data from 2011/2, but as information on deprivation for
these years was not recorded and the results were similar across both years we only
present the results for 2010 (including the data from 2011/2 in the probability lookup
table calculation only)
Discordance of ethnicity using ONS 16 and ONS 6 categorisations is also presented
(the latter in the appendix).
Results
Participants 13* (a) Report numbers of individuals at each stage of study—eg numbers potentially eligible,
examined for eligibility, confirmed eligible, included in the study, completing follow-up, and
analysed
Figure 1
(b) Give reasons for non-participation at each stage
Figure 1
(c) Consider use of a flow diagram
Figure 1
Descriptive data 14* (a) Give characteristics of study participants (eg demographic, clinical, social) and information
on exposures and potential confounders
Given as part of table 1
(b) Indicate number of participants with missing data for each variable of interest
Methods (data) section and figure 1
Outcome data 15*
Cross-sectional study—Report numbers of outcome events or summary measures
Table 1, given in the text of the results
Main results 16 (a) Give unadjusted estimates and, if applicable, confounder-adjusted estimates and their
precision (eg, 95% confidence interval). Make clear which confounders were adjusted for and
why they were included
Table 1; crude and adjusted results presented with 95% confidence intervals.
(b) Report category boundaries when continuous variables were categorized
Done so for age in table 1.
(c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful
time period
Not applicable
Other analyses 17 Report other analyses done—eg analyses of subgroups and interactions, and sensitivity
analyses
Results from sensitivity analyses (considering different categorisations of ethnicity) presented
in the appendix.
Discussion
Key results 18 Summarise key results with reference to study objectives
In first paragraph of discussion
Limitations 19 Discuss limitations of the study, taking into account sources of potential bias or imprecision.
Discuss both direction and magnitude of any potential bias
In discussion
Interpretation 20 Give a cautious overall interpretation of results considering objectives, limitations, multiplicity
of analyses, results from similar studies, and other relevant evidence
In discussion
Generalisability 21 Discuss the generalisability (external validity) of the study results
Discussed in the discussion section of this paper, and also as part of the last bullet point in
‘Article Summary’.
Other information
Funding 22 Give the source of funding and the role of the funders for the present study and, if applicable,
for the original study on which the present article is based
In acknowledgements section
*Give information separately for cases and controls in case-control studies and, if applicable, for exposed and
unexposed groups in cohort and cross-sectional studies.
Note: An Explanation and Elaboration article discusses each checklist item and gives methodological background and
published examples of transparent reporting. The STROBE checklist is best used in conjunction with this article (freely
available on the Web sites of PLoS Medicine at http://www.plosmedicine.org/, Annals of Internal Medicine at
http://www.annals.org/, and Epidemiology at http://www.epidem.com/). Information on the STROBE Initiative is
available at www.strobe-statement.org.
Accuracy of routinely recorded ethnic group information
compared with self-reported ethnicity: Evidence from the
English Cancer Patient Experience survey
Saunders CL (1)
Abel GA (1)
El Turabi A (1)
Ahmed F (1)
Lyratzopoulos G (1)
Cambridge Centre for Health Services Research, University of Cambridge, Institute of Public
Health, Forvie Site, Robinson Way, Cambridge, CB2 0SR
Author for correspondence:
Catherine Saunders
Mail to: [email protected]
Word count: 295
Abstract: 335 words
Total main text: 2,621
ABSTRACT
Objective: To describe the accuracy of ethnicity coding in contemporary NHS hospital
records compared to the ‘gold standard’ of self-reported ethnicity.
Design: Secondary analysis of data from a cross-sectional survey (2011)
Setting: All NHS hospitals in England providing cancer treatment
Participants: 58721 patients with cancer for whom ethnicity information [Office for
National Statistics (ONS) 2001 16-group classification] was available from self-reports
(considered to represent the ‘gold standard’) and their hospital record.
Methods: We calculated the sensitivity and positive predictive value (PPV) of hospital record
ethnicity. Further, we used a logistic regression model to explore independent predictors of
discordance between recorded and self-reported ethnicity.
Results: Overall, 4.9% (4.7-5.1%) of people had their self-reported ethnic group incorrectly
recorded in their hospital records. Recorded White British ethnicity had high sensitivity
(97.8% (97.7-98.0%)) and PPV (98.1% (98.0-98.2%)) for self-reported White British ethnicity.
Recorded ethnicity information for the 15 other ethnic groups was substantially less
accurate with 41.2% (39.7-42.7%) incorrect. Recorded ‘Mixed’ ethnicity had low sensitivity
(12%-31%) and PPVs (12%-42%). Recorded ‘Indian’, ‘Chinese’, ‘Black-Caribbean’ and ‘Black
African’ ethnic groups had intermediate levels of sensitivity (65%-80%) and PPV (80%-89%).
In multivariable analysis, belonging to an ethnic minority group was the only
independent predictor of discordant ethnicity information. There was strong evidence that
the degree of discordance of ethnicity information varied substantially between different
hospitals (p<0.0001).
Discussion: Current levels of accuracy of ethnicity information in NHS hospital records
support valid profiling of White / non-White ethnic differences. However, profiling of ethnic
differences in process or outcome measures for specific minority groups may contain a
substantial and variable degree of misclassification error. These considerations should be
taken into account when interpreting ethnic variation audits based on routine data and
inform initiatives aimed at improving the accuracy of ethnicity information in hospital
records.
ARTICLE SUMMARY
Article focus
• Accurate recording of ethnicity in administrative health data is a prerequisite for efforts to
ensure equality or reduce inequalities in health care.
• This paper describes the accuracy of ethnicity coding in English NHS hospital records
compared to the ‘gold standard’ of self-reported ethnicity, and identifies areas where
improvement is needed.
Key messages
• Hospital records will usually code the ethnicity of patients with self-reported White British
ethnic group correctly, but levels of incorrect coding of the ethnicity of all ethnic minority
groups are high
• Belonging to an ethnic minority group is the only independent predictor of having an
incorrectly coded hospital record ethnicity; there is substantial variation in the quality of
coding between hospitals
• Probability look-up tables provided in this paper can be used for weighting of incidence or
prevalence estimates where hospital record ethnicity is being used, or in regression
analysis to improve estimation of ethnic variation in processes or outcomes of care
Strengths and limitations
• This was a unique opportunity to carry out an audit of the accuracy of hospital record
coding of ethnicity in England, using data from a large national survey of recently treated
patients.
• We were not able to account for different processes by which ethnicity was
ascertained in hospital records and the study population was skewed towards older
ages which may limit the generalisability of the findings.
BACKGROUND
Modern health care systems aspire to equal access and quality of care for patients of any
ethnic group 1,2. In England, there are nevertheless well-documented ethnic inequalities in
processes and experiences of care 3,4. Good practice in measuring these inequalities is
needed 5. Therefore, the availability of complete and valid information on patients’ ethnicity
in routine NHS data is a fundamental first step to enable equality audits to inform
improvement actions. Further, there are variations in the incidence and prevalence of some
conditions between different patient ethnic groups. Understanding such variations can be
vital for service planning and required capacity estimates. National Health Service hospitals
began to routinely record ethnicity information in their Patient Administration System (PAS)
records in the mid-1990s 6. (Patient Administration System records are a precursor to
Hospital Episode Statistics (HES) data – hereafter we refer to hospital records as ‘HES
records’ for simplicity, as this is the more commonly used convention to denote patient
administrative data in the NHS). Although implementation of this measure has been slow 7,
HES data in recent years have high completeness of ethnic group information (typically
exceeding 90%) 6,8. There is however little evidence about the accuracy of routinely recorded
information on patients’ ethnicity. In the past, audits of the quality of ethnicity information
in NHS records have principally focused on data completeness, rather than accuracy 6 and
completeness of ethnicity coding information currently forms part of the Commissioning
Outcomes Data Set 6, a prescribed standard for data quality that is linked to hospital
reimbursement.
Ethnicity can be classified by referring to a community of people who share the same
culture, and/or by referring to an ancestral population which comprises their self-identity 9.
Self-reported ethnicity captures both the shared experiences/culture of an individual and
their self-identity. Methods such as surname recognition algorithms or geo-coding (or
combinations of both these approaches) have been developed to indirectly infer the
ethnicity of individuals 10–15, but both have limitations that do not apply to self-reported
ethnicity. For example, geo-coding methods rely on high levels of geographical (residential)
segregation between different ethnic groups. Surname recognition methods rely on low
levels of inter-communal marriages and the existence of distinctive surname nomenclatures
(which do not always exist for some ethnic groups, e.g. Black populations in the United
States). Further, by definition indirect methods will misclassify the ethnicity of some
patients, and disproportionately do so for the ethnicity of people from ethnic minorities. For
all the above reasons self-report is currently considered the gold standard measure of
ethnicity 16,17.
Patient surveys, when linked to routine ethnicity data, provide opportunities to determine
the accuracy of ethnic group information contained in routine health system records. As this
linkage is rarely done, few studies have cross-examined the accuracy of ethnic group
information included in routine healthcare records against self-reported ethnicity
information. Those that have examined this have been conducted in the USA 18–20.
Against this background and using data from an English national patient survey we aimed to
examine the overall accuracy of recorded versus self-reported ethnicity; and whether
discordance between recorded and self-reported ethnicity information varied between
patients of different ethnic groups.
METHODS
Data
We used publicly available anonymous data from the 2010 Cancer Patient Experience Survey
(CPES) in England 21. An unusual feature of this survey is that hospital-recorded ethnicity was
collected when compiling the sample and linked to patient responses by the survey provider.
All patients treated for cancer in an English NHS hospital during the first quarter of 2010
were invited to participate in the survey, with a response rate of 67% 21. Of 67713
respondents, 64118 (94.7%) provided valid self-reported ethnicity information. Because
self-reported ethnicity was used as the gold standard against which the accuracy of
HES-recorded ethnicity was compared, data on respondents with missing self-reported ethnicity
were excluded from further analysis (see figure 1). Data on an additional 348 respondents
with missing information on deprivation status were also excluded from further analysis,
leaving 63,770 respondents. An additional 5049 records with missing HES-recorded ethnicity
were excluded, leaving 58721 records for analysis. For all respondents included in our
analyses completely observed data were available for patients’ HES-recorded age, gender
and socio-economic status (using Index of Multiple Deprivation (IMD) score of lower super
output area of residence). In both HES-records and patient reports, ethnicity was classified
using the same 16-group categorisation [Office for National Statistics (ONS16) 2001]
(Appendix 1).
Analysis
We first described the overall degree of discordance between HES-recorded and
self-reported information on ethnicity using Cohen’s kappa statistic. We further calculated
the sensitivity and the positive predictive value (PPV) of HES-recorded ethnicity in respect of
self-reported ethnicity. In this context, sensitivity denotes the proportion of patients with a
given self-reported ethnicity with concordant ethnicity information in their HES record; and
PPV denotes the probability that a patient with a particular HES-recorded ethnic group will
self-report the same ethnic group.
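The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `agreement_metrics` and the toy counts are hypothetical and are not taken from the survey data, and the original analysis was carried out in Stata rather than Python.

```python
from collections import Counter


def agreement_metrics(pairs):
    """Return per-group sensitivity, per-group PPV and Cohen's kappa.

    pairs: iterable of (self_reported, recorded) ethnic group labels,
    with self-report treated as the gold standard.
    """
    pairs = list(pairs)
    n = len(pairs)
    self_totals = Counter(s for s, _ in pairs)
    rec_totals = Counter(r for _, r in pairs)
    agree = Counter(s for s, r in pairs if s == r)
    groups = sorted(set(self_totals) | set(rec_totals))

    # Sensitivity: among patients self-reporting group g, share recorded as g.
    sensitivity = {g: agree[g] / self_totals[g] for g in groups if self_totals[g]}
    # PPV: among patients recorded as group g, share self-reporting g.
    ppv = {g: agree[g] / rec_totals[g] for g in groups if rec_totals[g]}

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_obs = sum(agree.values()) / n
    p_exp = sum(self_totals[g] * rec_totals[g] for g in groups) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return sensitivity, ppv, kappa


# Hypothetical toy data (not survey counts): (self_reported, recorded).
toy = (
    [("White British", "White British")] * 90
    + [("White Irish", "White British")] * 5
    + [("White Irish", "White Irish")] * 4
    + [("White British", "White Irish")] * 1
)
sens, ppv, kappa = agreement_metrics(toy)
```

In this toy example the smaller group has low sensitivity (4 of 9 patients) but a higher PPV (4 of 5 records), which shows how the two measures can diverge for the same group.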
Subsequently, we explored independent predictors of discordant ethnicity information by
constructing a multivariable logistic regression model with concordant/discordant ethnicity
status as the binary outcome variable and self-reported ethnicity as a covariate. Adjustment
was also made for other patient socio-demographic characteristics (age in 10-year age
groups, gender and socio-economic status). For age and gender we used HES-recorded
information because the completeness of these variables among survey respondents was
higher than self-reported age and gender, and as the degree of concordance between HES-
recorded and self-reported age (based on year of birth) and gender were very high (>99.5%
for both).
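As a simplified illustration of the quantity such a model estimates, the sketch below computes a crude (unadjusted) odds ratio of discordance for one group against a reference group, with a Woolf-type 95% confidence interval. It does not reproduce the multivariable model described above. A logistic regression with a single binary covariate would return the same odds ratio, whereas the adjusted estimates require the full set of covariates. The function name and all counts are hypothetical.

```python
import math


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio of discordance with a Woolf (log-scale) interval.

    a: discordant records in the group of interest
    b: concordant records in the group of interest
    c: discordant records in the reference group
    d: concordant records in the reference group
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to illustrate the calculation.
or_, lo, hi = crude_odds_ratio(a=40, b=60, c=20, d=980)
```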
To explore whether clustering of patients in some groups in hospitals with higher or lower
discordance levels could in part explain the findings, we subsequently repeated the
regression model described above including a random effect for hospital – in addition, this
model allows us to explore the degree of variation in the level of ethnicity information
discordance between different hospitals. Because less detailed classifications of ethnicity are
often used in health research, we also carried out supplementary analysis looking at
discordance when using 6 (as opposed to 16) ethnic groups (i.e. White, Mixed, Asian or Asian
British, Black or Black British, Chinese and Other).
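The collapse from 16 to 6 groups can be sketched as a simple mapping. The group labels below follow the ONS 2001 classification as commonly published; their exact spelling in the survey data (Appendix 1) is an assumption, and the helper `discordant` is hypothetical.

```python
# Assumed labels for the ONS 2001 16-group classification, mapped to the
# 6 broad groups named in the text.
ONS16_TO_ONS6 = {
    "White British": "White",
    "White Irish": "White",
    "Any other White background": "White",
    "White and Black Caribbean": "Mixed",
    "White and Black African": "Mixed",
    "White and Asian": "Mixed",
    "Any other Mixed background": "Mixed",
    "Indian": "Asian or Asian British",
    "Pakistani": "Asian or Asian British",
    "Bangladeshi": "Asian or Asian British",
    "Any other Asian background": "Asian or Asian British",
    "Black Caribbean": "Black or Black British",
    "Black African": "Black or Black British",
    "Any other Black background": "Black or Black British",
    "Chinese": "Chinese",
    "Any other ethnic group": "Other",
}


def discordant(self_reported, recorded, collapse=False):
    """True if the two labels disagree, optionally after collapsing to 6 groups."""
    if collapse:
        self_reported = ONS16_TO_ONS6[self_reported]
        recorded = ONS16_TO_ONS6[recorded]
    return self_reported != recorded
```

Under this mapping, a patient who self-reports White Irish but is recorded as White British is discordant in the 16-group analysis and concordant in the 6-group analysis, so only disagreements that cross the broad groups remain after collapsing.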
Lastly, we constructed a probability ‘look-up’ table indicating the probability that HES-
recorded ethnicity represents true (self-reported) ethnicity for each of the 16 different
ethnic groups. These probabilities can be used for weighting of incidence or prevalence
estimates or used in regression analysis to improve estimation of ethnic variation in
processes or outcomes of care 14,22. For the calculation of the probability ‘look-up’ table only,
we combined data from two surveys (2010 and 2011/12) to improve the precision of the
probabilities presented (increasing our sample size to 133,204). STATA 11 was used for all
analyses.
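A minimal sketch of how such a look-up table might be built and applied is given below. It assumes a cross-tabulation of recorded against self-reported ethnicity is available. The function names and the toy counts are hypothetical. The reweighting step further assumes that the misclassification pattern in the target dataset matches that of the data used to build the table.

```python
from collections import defaultdict


def build_lookup(counts):
    """Return P(self-reported group | recorded group) from cross-tab counts.

    counts: dict mapping (recorded, self_reported) -> number of patients.
    """
    row_totals = defaultdict(int)
    for (recorded, _), n in counts.items():
        row_totals[recorded] += n
    lookup = defaultdict(dict)
    for (recorded, self_reported), n in counts.items():
        lookup[recorded][self_reported] = n / row_totals[recorded]
    return dict(lookup)


def reweight(recorded_counts, lookup):
    """Expected counts by self-reported group, given counts by recorded group."""
    expected = defaultdict(float)
    for recorded, n in recorded_counts.items():
        for self_reported, p in lookup[recorded].items():
            expected[self_reported] += n * p
    return dict(expected)


# Hypothetical cross-tabulation: (recorded, self_reported) -> count.
toy_counts = {
    ("White British", "White British"): 950,
    ("White British", "White Irish"): 50,
    ("White Irish", "White Irish"): 80,
    ("White Irish", "White British"): 20,
}
lookup = build_lookup(toy_counts)
expected = reweight({"White British": 200, "White Irish": 10}, lookup)
```

Each row of the table sums to 1, so reweighting redistributes patients between groups without changing the total count.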
RESULTS
Overall the frequency of discordance of HES-recorded and self-reported ethnicity
information was 4.9% (4.7-5.1%). Patients who identified themselves as White British had
the lowest frequency of discordant ethnicity information in their HES records (2.2% (2.0-
2.3%)). In contrast, patients who identified themselves as belonging to any other ethnic
group had substantially higher frequency of discordant HES-recorded ethnicity (41.2% (39.7-
42.7%)). Cohen’s Kappa for HES-recorded and self-reported ethnicity was 0.64 overall and
0.54 if White British patients were excluded. The frequency of discordance was particularly
high for patients who self-reported that they belonged to the Any Other Black Background
(90% (97.4-100%)) and the Any Other Mixed Background (87.8% (78.2-97.3%)) groups (Table
1).
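The method used for these confidence intervals is not stated in this section. As a rough check, a normal-approximation (Wald) interval, which is an assumption here, applied to the rounded overall discordance of 4.9% among the 58721 analysed records gives limits consistent with those reported. The function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Overall discordance and analysis sample size as reported in the text.
lower, upper = wald_ci(0.049, 58721)
```

The Wald interval is a poor choice for proportions close to 0 or 1 or for small groups, so it should not be expected to reproduce the intervals reported for the smaller ethnic groups.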
HES-recorded ‘White British’ ethnicity had high sensitivity (97.8% (97.7-98.0%)) and PPV
(98.1% (98.0-98.2%)) for self-reported White British ethnicity (Figure 2, estimates and confidence intervals in
appendix table 2). In contrast, HES-recorded ‘Mixed’ ethnicity had very low sensitivities
(12%-31%) and PPVs (12%-42%). HES-recorded ‘Indian’, ‘Pakistani’, ‘Bangladeshi’, ‘Chinese’,
‘Black-Caribbean’ and ‘Black African’ ethnicity had intermediate levels of sensitivity
(65%-80%) and PPV (80%-89%). HES-recorded ‘White Irish’ ethnicity had low
sensitivity (47.8% (44.5-51.0%)) but high PPV (81.5% (77.9-84.6%)). This means that of all
individuals who self-identify as White Irish, only 48% would have their ethnic
group recorded as such in their HES records; however, among patients whose HES records
indicate that they are ‘White Irish’, 82% would also identify themselves as belonging to this
group.
Whilst crudely there was some evidence that age, gender and deprivation are associated
with discordance of ethnicity information, in multivariable logistic regression analysis
adjusting for other patient characteristics the sole independent predictor of discordance was
self-reported ethnicity (Table 1). Repeating this model with a random effect for hospital
produced only trivial differences to associations of discordance with patient characteristics.
This means that the association between discordance and self-reported ethnic minority
group cannot be explained by ethnic minority patients attending hospitals that have overall
poor levels of accuracy of ethnicity information. There was however strong evidence
(p<0.0001) of variation in discordance between different hospitals. Specifically, if all
hospitals were to be arranged in order of frequency of discordance of ethnicity information
among their HES records, the odds ratio of discordance between the hospitals at the 97.5th
and 2.5th centiles (the 95% reference range) would be about 13, indicating a 13-fold difference
in the odds of discordance across hospitals.
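The link between this odds ratio and the hospital random effect can be sketched as follows, assuming (the text does not state this) that hospital effects are normally distributed on the log-odds scale with standard deviation sigma. The 97.5th and 2.5th centiles then lie 2 × 1.96 standard deviations apart, so the odds ratio between them is exp(2 × 1.96 × sigma). The function names are hypothetical, and the implied sigma is a back-calculation from the rounded value of 13, not a figure reported in the paper.

```python
import math


def reference_range_or(sigma, z=1.96):
    """Odds ratio between the 97.5th and 2.5th centile hospitals."""
    return math.exp(2 * z * sigma)


def implied_sigma(odds_ratio, z=1.96):
    """Random-effect standard deviation implied by a reference-range odds ratio."""
    return math.log(odds_ratio) / (2 * z)


sigma = implied_sigma(13)  # roughly 0.65 on the log-odds scale
```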
Similar findings to those observed in the main analysis were observed when using a 6-group
classification (Appendix 2), indicating that a large degree of discordance is between rather
than within the 6 groups used in the cruder classification.
Finally, we provide probability tables (Table 2) that can be used in estimating ethnic group
variations in incidence, prevalence or measures of care quality when using HES data. Examples
of such applications in the context of US health research have been reported previously 14,22.
DISCUSSION
Using data from a recent national survey of hospital patients with cancer we explored the
accuracy of ethnicity information in HES data. We found that overall the level of discordance
of ethnicity information between HES records and patient self-reports is low, particularly for
the majority White British patients. There is however a substantial degree of inaccuracy in
the recorded ethnicity of patients who self-report themselves as belonging to ethnic
minority groups. For many major ethnic groups (‘Indian’, ‘Pakistani’, ‘Bangladeshi’, ‘Chinese’,
‘Black-Caribbean’ and ‘Black African’) routine hospital data will mis-code between 20 and
35% of all patients who self-report that they belong to these ethnic groups (sensitivity
65%-80%). Further, up to 20% of patients recorded as belonging to some major ethnic groups will
self-report that they actually belong to other ethnic groups (PPV 80%-89%). For
patients who self-report being of mixed ethnic groups HES records are usually discordant.
We provide probability tables that can be used in re-estimating ethnic group variations when
investigating ethnic variation in processes or quality of care from UK hospital records in
order to improve precision of such estimates.
The study used a unique opportunity provided by patient survey data to explore the
accuracy of ethnicity information in UK routine health care data. Previous evidence on the
accuracy of ethnicity information in administrative datasets relates to US settings 18–20. Our
findings concord with previous US literature which also indicates that the accuracy of
ethnicity information tends to be highest for the majority white population, lower for major
ethnic minority groups (like African American or Hispanic) and lowest for smaller ethnic
groups (such as American Indian/Alaskan Natives) and Mixed or Other racial/ethnic groups
18,19. Other strengths of the study include its large sample size, enabling the profiling of
discordance for small ethnic groups; and the use of regression analyses to explore
independent predictors of discordance and to examine variation between different
hospitals.
A limitation is that the study population included patients who attended hospital for cancer
treatment (most of whom are aged 65 years or older). Therefore in principle the
generalisability of the findings (particularly regarding younger patients) might be limited. We
were also unable to explore potential inaccuracies in ethnicity categorisation resulting from
longitudinal person-level discordance among patients with more than one hospital care
episode. Previous research indicates that up to 3% of patients with more than one episode
of care have longitudinally discordant ethnic group information 23. Evolving socio-cultural
trends, or changes in Census methodology (such as the introduction of Mixed ethnic groups
to the UK census in 2001) could contribute to changes in a person’s self-identified ethnic
group over their lifetime. Lastly, we were not able to account for the process by which
ethnicity was ascertained in hospital records. It is likely that this process involves a
combination of self-reports, reports by relatives or carers (e.g. in the case of infirm patients,
or patients with language or other communication difficulties) or even guesswork by
hospital staff (e.g. when there are language barriers, or in clinical emergencies) 16.
The 2001 Census expanded the previous ONS classification of ethnicity (originally
containing 10 groups) to the 16-group classification also used in this survey 2, a change
which was subsequently reflected in HES coding. The impact of this secular change may
explain some of the discordance observed. For example, the discordance in self-reported
Irish White ethnicity may reflect the fact that this ethnic group was only included in the 2001
classification. The 2011 Census included two new ethnic groups: ‘Gypsy or Irish Traveller’
and ‘Arab’ 24. It remains to be seen if this change will be reflected in the routinely collected
health data.
Whilst this study considers the accuracy of recorded ethnicity, there are issues with defining
ethnicity using any simple classification, for example the potential for ‘concealed
heterogeneity’ within each of the ethnic groups 16. It is possible, for example, that within the
Indian ethnic group there are certain social, linguistic or religious subgroups which are more or less
likely to have a discordant ethnicity. Indeed, evidence indicates that ethnic minority patients
with discordant ethnicity information may be systematically different to other patients from
the same ethnic group with concordant ethnicity 20. Within-ethnic group heterogeneity is a
complex issue inherent in any type of ethnicity research, and the nature of our study did not
allow us to address this.
Another issue we have not addressed in this paper is the completeness of ethnicity
information, which has the potential to bias estimates of ethnic variation in addition to the
misclassification detailed here. We performed analyses similar to those discussed here for
predictors of missing ethnicity (not shown). We found only small variations by self-reported
ethnic group but a much larger degree of variation between hospitals than that seen for
discordance, implying that missing ethnicity is primarily driven by hospital-level
processes.
Our findings have implications for policy and future research. First, although currently HES
data have a high level of completeness of ethnic information, a degree of caution is required
when interpreting evidence on ethnic inequalities in care quality or disease incidence and
prevalence that is based solely on HES data, particularly for minority ethnic groups found to
have higher discordance rates. Misclassification of ethnicity in HES data could result in either
under-estimation of ethnic variation, or inability to detect such variation when it exists (‘type
2’ error).
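The attenuation described in this paragraph can be illustrated with a small calculation. The sketch below is a deliberately simplified Python example, not the authors' method: it assumes that a fraction equal to the PPV of patients recorded as a minority group truly belong to it, that the remaining patients carry the comparison group's outcome rate, and that the comparison group itself is recorded without error. All rates are hypothetical.

```python
def observed_rate(ppv, rate_in_group, rate_in_others):
    """Outcome rate among patients *recorded* as a group, when only a
    fraction `ppv` of them truly self-report that group (simplified model)."""
    return ppv * rate_in_group + (1.0 - ppv) * rate_in_others


# Hypothetical outcome rates, chosen only for illustration.
TRUE_RATE_MINORITY = 0.30
TRUE_RATE_OTHERS = 0.20

true_gap = TRUE_RATE_MINORITY - TRUE_RATE_OTHERS

# Observed gap between the recorded minority group and the comparison group
# for a perfectly coded group and for two lower PPV values.
observed_gaps = {
    ppv: observed_rate(ppv, TRUE_RATE_MINORITY, TRUE_RATE_OTHERS) - TRUE_RATE_OTHERS
    for ppv in (1.0, 0.8, 0.4)
}

for ppv, gap in observed_gaps.items():
    print(f"PPV {ppv:.0%}: observed gap {gap:.3f} (true gap {true_gap:.3f})")
```

In this simplified model the observed gap shrinks in proportion to the PPV, which is why a real difference becomes harder to detect as recording accuracy falls.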
Second, we provide a list of probabilities that can be used to improve estimates of ethnic
variation in health care in a UK setting 14,22. These probabilities can be used both to improve
the precision of prevalence estimates and in statistical models where hospital record ethnic
group is a predictor 14.
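One way such probabilities might be applied to prevalence-type estimates is to redistribute counts by HES code across self-reported groups. The Python sketch below is an assumption-laden illustration rather than the authors' procedure: it uses only rows H and J of Table 2 (a full analysis would use every row), the HES counts are hypothetical, and the function name is invented for this example.

```python
# Self-reported groups, in the column order of Table 2.
GROUPS = [
    "British (White)", "Irish (White)", "Any other White background",
    "White and Black Caribbean (Mixed)", "White and Black African (Mixed)",
    "White and Asian (Mixed)", "Any other Mixed background",
    "Indian (Asian or Asian British)", "Pakistani (Asian or Asian British)",
    "Bangladeshi (Asian or Asian British)", "Any other Asian background",
    "Caribbean (Black or Black British)", "African (Black or Black British)",
    "Any other Black background", "Chinese (other ethnic group)",
    "Any other ethnic group",
]

# Rows H and J of Table 2: probability (%) of each self-reported group
# given the HES code.
TABLE2_ROWS = {
    "H": [1.9, 0.0, 0.4, 0.0, 0.1, 0.6, 0.1, 88.4, 2.8, 0.4, 4.6, 0.2, 0.1, 0.1, 0.0, 0.1],
    "J": [2.3, 0.0, 0.8, 0.0, 0.0, 0.5, 0.3, 5.9, 87.7, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.5],
}


def expected_self_reported(counts_by_hes_code):
    """Spread each HES-coded count across self-reported groups using Table 2."""
    expected = dict.fromkeys(GROUPS, 0.0)
    for code, n in counts_by_hes_code.items():
        for group, pct in zip(GROUPS, TABLE2_ROWS[code]):
            expected[group] += n * pct / 100.0
    return expected


# Hypothetical numbers of patients recorded with HES codes H and J.
hes_counts = {"H": 1000, "J": 400}
expected = expected_self_reported(hes_counts)

for group in ("Indian (Asian or Asian British)", "Pakistani (Asian or Asian British)"):
    print(f"{group}: expected {expected[group]:.1f} self-reporting patients")
```

Because the sketch covers only patients coded H or J, it ignores people in these self-reported groups who were recorded under other codes; it shows the mechanics of the look-up, not a complete estimate.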
Third, as the completeness of ethnicity information in hospital records is currently high,
more attention needs to be given to the accuracy of recorded information. The substantial
variation between hospitals in the accuracy of ethnicity information indicates that there is
great potential for improving the quality of ethnicity information in poorly performing
hospitals. Improvement in the quality of HES data (which is generally desirable 25) should
also encompass improvements in the quality of ethnicity coding. Future research should
explore optimal ways for efficiently obtaining current self-reported information on ethnicity
in patient records.
Contributors: All authors had substantial input into the study concept, design, analysis and
access to the data. GL had the original idea for the study which was further conceptualised in
discussions between all authors; methods were principally developed by CLS and further
amplified by GAA. All authors communicated frequently during the course of the project,
including through face-to-face meetings. CLS and GL wrote the first draft which was edited by
all authors over multiple versions. All authors saw and approved the final manuscript.
Competing interests: None.
Acknowledgement: We thank the UK Data Archive for access to the anonymous survey data
(UKDA study number: 69488), the Department of Health as the depositor and principal
investigator of the Cancer Patient Experience Survey 2010, Quality Health as data collector;
and all NHS Acute Trusts in England, for provision of data samples.
Ethics approval: Not required.
Funding statement: The paper is independent research arising from a Post-Doctoral
Fellowship award to GL supported by the National Institute for Health Research (PDF-2011-04-
047). The views expressed in this publication are those of the authors and not necessarily
those of the NHS, the National Institute for Health Research or the Department of Health.
Data sharing: Not applicable (the data are available for academic research from the UK Data
Archive – see acknowledgement).
STROBE: Checklist for an observational study: Attached separately.
REFERENCES
1. US Department of Health and Human Services Agency for Healthcare Research and Quality 2009 National
Healthcare Disparities and Quality Reports. (2009). At <http://www.ahrq.gov/qual/qrdr09.htm>
2. Department of Health Equity and excellence: liberating the NHS (White Paper). (2010). At
<http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_117353>
3. Cancer Incidence and Survival By Major Ethnic Group, England, 2002 - 2006. (2009). At
<http://www.ncin.org.uk/publications/reports/default.aspx>
4. Mindell, J., Klodawski, E. & Fitzpatrick, J. Using routine data to measure ethnic differentials in access to
coronary revascularization. Journal of public health (Oxford, England) 30, 45–53 (2008).
5. Mir, G. et al. Principles for research on ethnicity and health: the Leeds Consensus Statement. European
journal of public health cks028 (2012). doi:10.1093/eurpub/cks028
6. The NHS Information Centre How good is HES ethnic coding and where do the problems lie? (2011). At
<http://www.hesonline.nhs.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=171>
7. Szczepura, A. Access to health care for ethnic minority populations. Postgraduate medical journal 81, 141–7
(2005).
8. Jack, R. H., Linklater, K. M., Hofman, D., Fitzpatrick, J. & Møller, H. Ethnicity coding in a regional cancer
registry and in Hospital Episode Statistics. BMC public health 6, 281 (2006).
9. Isajiw, W. W. Definition and Dimensions of Ethnicity: a theoretical framework. Challenges of Measuring an
Ethnic World: Science, politics and reality: Proceedings of the Joint Canada-United States Conference on the
Measurement of Ethnicity April 1-3, 1992, 407–27. At
<https://tspace.library.utoronto.ca/bitstream/1807/68/2/Def_DimofEthnicity.pdf>
10. Lyratzopoulos, G., McElduff, P., Heller, R. F., Hanily, M. & Lewis, P. S. Comparative levels and time trends in
blood pressure, total cholesterol, body mass index and smoking among Caucasian and South-Asian
participants of a UK primary-care based cardiovascular risk factor screening programme. BMC public health 5,
125 (2005).
11. Shah, B. R. et al. Surname lists to identify South Asian and Chinese ethnicity from secondary data in Ontario,
Canada: a validation study. BMC medical research methodology 10, 42 (2010).
12. Maringe, C. et al. Cancer incidence in South Asian migrants to England, 1986-2004: Unraveling ethnic from
socioeconomic differentials. International journal of cancer. Journal international du cancer 132, 1886–94
(2013).
13. Quan, H. et al. Development and validation of a surname list to define Chinese ethnicity. Medical care 44,
328–33 (2006).
14. Elliott, M. N., Fremont, A., Morrison, P. A., Pantoja, P. & Lurie, N. A new method for estimating race/ethnicity
and associated disparities where administrative records lack self-reported race/ethnicity. Health services
research 43, 1722–36 (2008).
15. Nitsch, D. et al. Validation and utility of a computerized South Asian names and group recognition algorithm
in ascertaining South Asian ethnicity in the national renal registry. QJM : monthly journal of the Association of
Physicians 102, 865–72 (2009).
16. Aspinall, P. J. The utility and validity for public health of ethnicity categorization in the 1991, 2001 and 2011
British Censuses. Public health 125, 680–7 (2011).
17. Mays, V. M., Ponce, N. A., Washington, D. L. & Cochran, S. D. Classification of race and ethnicity: implications
for public health. Annual review of public health 24, 83–110 (2003).
18. Arday, S. L., Arday, D. R., Monroe, S. & Zhang, J. HCFA’s racial and ethnic data: current accuracy and recent
improvements. Health care financing review 21, 107–16 (2000).
19. Lee, L. M., Lehman, J. S., Bindman, A. B. & Fleming, P. L. Validation of race/ethnicity and transmission mode in
the US HIV/AIDS reporting system. American journal of public health 93, 914–7 (2003).
20. Zaslavsky, A. M., Ayanian, J. Z. & Zaborski, L. B. The validity of race and ethnicity in enrollment data for
Medicare beneficiaries. Health services research 47, 1300–21 (2012).
21. Department of Health National Cancer Patient Experience Survey 2010 [computer file]. UKDA-SN-6742-1.
Colchester, Essex: UK Data Archive [distributor]. At <http://dx.doi.org/10.5255/UKDA-SN-6742-1>
22. McCaffrey, D. F. & Elliott, M. N. Power of tests for a dichotomous independent variable measured with error.
Health services research 43, 1085–101 (2008).
23. Georghiou, T. & Thorlby, R. Quality of ethnicity coding in Hospital Episode Statistics: beyond completeness.
Paper presented at Healthcare Commission seminar “Measuring what matters: measuring ethnicity in health
data” 12/02/2007. (2007).
24. Office of National Statistics 2011 Census. At <http://www.ons.gov.uk/ons/guide-method/census/2011/index.html>
25. Spencer, S. A. & Davies, M. P. Hospital episode statistics: improving the quality and value of hospital data: a
national internet e-survey of hospital consultants. BMJ open 2, (2012).
Figure 1. Survey responders and exclusions
101773 surveys sent (67% response rate)
67713 responders
Excluding patients with missing self-reported ethnicity: 64118
Excluding patients with missing age, gender and Index of Multiple Deprivation (IMD) score: 63770
Excluding patients with missing HES recorded ethnicity: 58721
Columns: Total respondents, n; Total respondents, %; Total discordant, n; Total discordant, %; Crude OR of discordance (95% CI); joint p-value; Adjusted OR (95% CI); joint p-value
Age
16-24 399 0.7 30 7.5 1.77 (1.28-2.46) <0.0001 0.9 (0.55-1.48) 0.54
25-34 950 1.6 93 9.8 2.37 (1.86-3.01) 0.94 (0.71-1.26)
35-44 3,139 5.3 231 7.4 1.73 (1.51-1.99) 1.02 (0.84-1.23)
45-54 7,763 13.2 484 6.2 1.45 (1.3-1.61) 1.12 (0.97-1.29)
55-64 15,375 26.2 743 4.8 1.11 (0.99-1.23) 1.11 (0.98-1.25)
65-74 18,366 31.3 805 4.4 reference reference
75-84 10,747 18.3 412 3.8 0.87 (0.76-0.99) 0.99 (0.86-1.14)
85+ 1,982 3.4 80 4.0 0.92 (0.71-1.18) 1.08 (0.83-1.42)
Gender
Men 27,395 46.7 1,268 4.6 reference 0.014 reference 0.69
Women 31,326 53.3 1,610 5.1 1.12 (1.02-1.22) 1.02 (0.93-1.12)
Deprivation
Lowest 13,301 22.7 488 3.7 reference <0.0001 reference 0.69
2 13,432 22.9 509 3.8 1.03 (0.91-1.18) 1.04 (0.9-1.2)
3 12,498 21.3 559 4.5 1.23 (1.05-1.44) 0.99 (0.85-1.14)
4 10,756 18.3 652 6.1 1.69 (1.36-2.11) 1.03 (0.89-1.2)
Highest 8,734 14.9 670 7.7 2.18 (1.68-2.84) 0.94 (0.8-1.11)
Ethnic Group
British (White) 54,589 93.0 1,177 2.2 reference <0.0001 reference <0.0001
Irish (White) 938 1.6 490 52.2 49.63 (32.72-75.28) 47.73 (41-55.55)
Any other White background 1,066 1.8 485 45.5 37.88 (24.95-57.52) 36.49 (31.52-42.25)
White and Black Caribbean (Mixed) 72 0.1 50 69.4 103.14 (55.73-190.86) 109.97 (64.65-187.04)
White and Black African (Mixed) 41 0.1 33 80.5 187.19 (90.23-388.35) 202.59 (90.28-454.64)
White and Asian (Mixed) 77 0.1 65 84.4 245.81 (125.89-479.94) 283.62 (149.43-538.33)
Any other Mixed background 49 0.1 43 87.8 325.22 (124.17-851.84) 399.68 (166.26-960.79)
Indian (Asian or Asian British) 516 0.9 101 19.6 11.04 (7.16-17.04) 8.06 (6.33-10.27)
Pakistani (Asian or Asian British) 206 0.4 46 22.3 13.05 (8.39-20.28) 10.26 (7.19-14.66)
Bangladeshi (Asian or Asian British) 50 0.1 13 26.0 15.94 (7.44-34.18) 12.02 (6.17-23.41)
Any other Asian background 129 0.2 57 44.2 35.93 (20.28-63.63) 30.42 (20.94-44.2)
Caribbean (Black or Black British) 471 0.8 136 28.9 18.42 (12.64-26.85) 10.37 (8.21-13.1)
African (Black or Black British) 319 0.5 111 34.8 24.22 (16.97-34.56) 15.54 (11.9-20.28)
Any other Black background 10 0.0 9 90.0 408.42 (49.3-3383.53) 410.23 (49.73-3384.32)
Chinese (other ethnic group) 124 0.2 30 24.2 14.48 (8.77-23.92) 10.66 (6.86-16.57)
Any other ethnic group 64 0.1 32 50.0 45.38 (27.55-74.75) 34.04 (20.05-57.79)
Hospital
OR 95% Reference Range 158 - 30.48 <0.0001 12.85 <0.0001
Table 1. Crude and adjusted predictors of discordant hospital record ethnicity coding
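The crude odds ratio point estimates in Table 1 can be reproduced from the tabulated counts. The short Python sketch below does this for two rows; the function name is an illustrative assumption, and the sketch does not attempt the confidence intervals or adjusted estimates, since the modelling details needed for those are not given in this section.

```python
def crude_odds_ratio(n_group, discordant_group, n_ref, discordant_ref):
    """Odds of discordance in a group divided by the odds in the reference group."""
    odds_group = discordant_group / (n_group - discordant_group)
    odds_ref = discordant_ref / (n_ref - discordant_ref)
    return odds_group / odds_ref


# Counts copied from Table 1.
# Irish (White) against the reference group British (White).
or_irish = crude_odds_ratio(938, 490, 54589, 1177)
# Women against the reference group men.
or_women = crude_odds_ratio(31326, 1610, 27395, 1268)

print(f"Irish (White) vs British (White): {or_irish:.2f}")
print(f"Women vs men: {or_women:.2f}")
```

Both values agree with the crude odds ratios shown in the table for those rows (49.63 and 1.12).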
[Figure 2 plot: sensitivity (%) plotted against positive predictive value (%), both on scales from 0 to 100, with one labelled point for each ethnic group: White British, Irish, Other White, White & Black Caribbean, White & Black African, White & Asian, Other Mixed, Indian, Pakistani, Bangladeshi, Other Asian, Caribbean, African, Other Black, Chinese, Other]
Figure 2. Sensitivity* and positive predictive value** of hospital-recorded ethnicity compared with
self-reported ethnicity as the gold standard
*If a patient self-reports that they belong to a particular ethnic group, then the sensitivity of the hospital record
ethnicity coding is the probability that the hospital record will record the same (correct) ethnicity
**If a patient’s hospital record states that they belong to a particular ethnic group, then the positive predictive value of
the hospital record ethnicity code is the probability that this code has been recorded correctly and that the patient will
self-report the same ethnicity
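The two definitions in the footnotes above translate directly into counts over paired (self-reported, hospital-recorded) labels. The Python sketch below is a minimal illustration with a hypothetical toy dataset; the function name and data are assumptions made for this example and are not taken from the study.

```python
from collections import Counter


def sensitivity_and_ppv(pairs, group):
    """Return (sensitivity, PPV) for one group.

    `pairs` is an iterable of (self_reported, hospital_recorded) labels,
    with self-report treated as the gold standard.
    """
    counts = Counter(pairs)
    agree = counts[(group, group)]
    self_total = sum(n for (s, _), n in counts.items() if s == group)
    recorded_total = sum(n for (_, r), n in counts.items() if r == group)
    sensitivity = agree / self_total if self_total else float("nan")
    ppv = agree / recorded_total if recorded_total else float("nan")
    return sensitivity, ppv


# Hypothetical toy data: 10 patients self-reporting Indian, 10 Pakistani.
toy_pairs = (
    [("Indian", "Indian")] * 8
    + [("Indian", "Other Asian")] * 2
    + [("Pakistani", "Indian")] * 1
    + [("Pakistani", "Pakistani")] * 9
)

sens_indian, ppv_indian = sensitivity_and_ppv(toy_pairs, "Indian")
print(f"Indian: sensitivity {sens_indian:.1%}, PPV {ppv_indian:.1%}")
```

In the toy data, sensitivity falls because two self-reported Indian patients are recorded under another code, while PPV falls because one self-reported Pakistani patient is recorded as Indian.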
Probability (%) of true self-reported ethnic group for each HES code
Rows: HES code. Columns, left to right: British (White); Irish (White); Any other White background; White and Black Caribbean (Mixed); White and Black African (Mixed); White and Asian (Mixed); Any other Mixed background; Indian (Asian or Asian British); Pakistani (Asian or Asian British); Bangladeshi (Asian or Asian British); Any other Asian background; Caribbean (Black or Black British); African (Black or Black British); Any other Black background; Chinese (other ethnic group); Any other ethnic group
A 98.1 0.9 0.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0
B 20.1 78.7 1.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
C 53.1 1.2 42.1 0.2 0.1 0.5 0.7 0.2 0.0 0.0 0.5 0.1 0.3 0.1 0.2 0.8
D 11.2 0.0 2.8 38.3 0.0 0.0 2.8 0.0 0.0 0.0 0.0 41.1 2.8 0.9 0.0 0.0
E 12.2 0.0 6.1 4.1 26.5 0.0 8.2 0.0 0.0 0.0 0.0 8.2 34.7 0.0 0.0 0.0
F 25.0 0.0 2.8 0.0 0.0 40.3 2.8 8.3 1.4 1.4 13.9 1.4 0.0 0.0 1.4 1.4
G 30.6 0.9 10.8 4.5 4.5 7.2 12.6 4.5 0.9 0.9 3.6 5.4 2.7 0.9 4.5 5.4
H 1.9 0.0 0.4 0.0 0.1 0.6 0.1 88.4 2.8 0.4 4.6 0.2 0.1 0.1 0.0 0.1
J 2.3 0.0 0.8 0.0 0.0 0.5 0.3 5.9 87.7 1.0 1.0 0.0 0.0 0.0 0.0 0.5
K 2.1 0.0 0.0 0.0 0.0 1.1 1.1 5.3 2.1 82.1 4.2 0.0 1.1 0.0 1.1 0.0
L 4.4 0.0 4.2 0.2 0.5 3.7 3.0 21.5 7.6 0.9 41.0 0.5 0.2 0.7 3.5 8.1
M 4.2 0.0 0.1 3.6 0.5 0.0 0.6 0.0 0.0 0.0 0.4 84.8 3.6 2.0 0.0 0.1
N 5.0 0.2 1.4 0.6 2.6 0.0 0.6 0.0 0.0 0.0 0.2 6.6 81.3 1.4 0.0 0.2
P 3.1 0.0 0.9 3.6 2.7 0.9 1.8 0.4 0.0 0.0 1.3 38.1 42.6 4.0 0.4 0.0
R 4.2 0.0 0.8 0.0 0.0 0.8 0.0 0.4 0.0 0.0 4.2 0.0 0.0 0.0 83.2 6.3
S 43.5 0.6 17.3 0.4 0.9 1.6 1.8 3.5 1.1 0.6 9.1 2.5 3.4 0.9 1.6 11.3
Z 90.6 1.5 2.8 0.1 0.1 0.3 0.2 0.9 0.5 0.0 0.7 1.0 0.6 0.1 0.3 0.3
X or Missing 91.3 1.3 2.5 0.1 0.1 0.2 0.2 1.3 0.4 0.1 0.5 0.7 0.7 0.1 0.4 0.2
Table 2. The probability (%) of ‘true’ ethnic group for each hospital-recorded ethnic group (i.e. the probability that a given person with a specific HES-recorded
ethnicity belongs to each (self-reported) group)
From 2001-02 onwards:
A = British (White)
B = Irish (White)
C = Any other White background
D = White and Black Caribbean (Mixed)
E = White and Black African (Mixed)
F = White and Asian (Mixed)
G = Any other Mixed background
H = Indian (Asian or Asian British)
J = Pakistani (Asian or Asian British)
K = Bangladeshi (Asian or Asian British)
L = Any other Asian background
M = Caribbean (Black or Black British)
N = African (Black or Black British)
P = Any other Black background
R = Chinese (other ethnic group)
S = Any other ethnic group
Z = Not stated
X = Not known
From 1995-96 to 2000-01:
0 = White
1 = Black - Caribbean
2 = Black - African
3 = Black - Other
4 = Indian
5 = Pakistani
6 = Bangladeshi
7 = Chinese
8 = Any other ethnic group
9 = Not given
X = Not known
Appendix table 1. NCPES questionnaire ethnicity question (left) and HES ethnicity codes (right)
Columns: Ethnic group; Sensitivity of hospital-recorded information, % (95% CI); PPV of hospital-recorded information, % (95% CI)
Ethnicity in 2001 Census groups (16)
British (White) 97.8 (97.7-98) 98.1 (98-98.2)
Irish (White) 47.8 (44.5-51) 81.5 (77.9-84.6)
Any other White background 54.5 (51.5-57.5) 39.3 (36.8-41.8)
White and Black Caribbean (Mixed) 30.6 (20.2-42.5) 41.5 (28.1-55.9)
White and Black African (Mixed) 19.5 (8.8-34.9) 34.8 (16.4-57.3)
White and Asian (Mixed) 15.6 (8.3-25.6) 38.7 (21.8-57.8)
Any other Mixed background 12.2 (4.6-24.8) 12 (4.5-24.3)
Indian (Asian or Asian British) 80.4 (76.7-83.8) 89.1 (85.9-91.7)
Pakistani (Asian or Asian British) 77.7 (71.4-83.2) 86 (80.2-90.7)
Bangladeshi (Asian or Asian British) 74 (59.7-85.4) 80.4 (66.1-90.6)
Any other Asian background 55.8 (46.8-64.5) 38.3 (31.3-45.7)
Caribbean (Black or Black British) 71.1 (66.8-75.2) 85.2 (81.3-88.6)
African (Black or Black British) 65.2 (59.7-70.4) 80.3 (74.9-85)
Any other Black background 10 (0.3-44.5) 0.9 (0-5)
Chinese (other ethnic group) 75.8 (67.3-83) 87.9 (80.1-93.4)
Appendix table 2. Sensitivity and PPV: estimates and 95% CI
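This section does not state how the 95% confidence intervals were computed. As one common choice for proportions based on small counts, the Python sketch below computes an exact binomial (Clopper-Pearson) interval by bisection, using only the standard library. It is an illustration under that assumption, with invented function names, and the direct binomial sum is intended for small group sizes only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); direct sum, suitable for small n."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target, increasing):
    """Solve f(p) = target on [0, 1] for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if (f(mid) < target) == increasing:
            lo = mid  # the root lies above mid
        else:
            hi = mid  # the root lies below mid
    return (lo + hi) / 2.0


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n trials."""
    if x == 0:
        lower = 0.0
    else:
        # P(X >= x | p) increases with p; the lower bound makes it alpha / 2.
        lower = _bisect(lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, True)
    if x == n:
        upper = 1.0
    else:
        # P(X <= x | p) decreases with p; the upper bound makes it alpha / 2.
        upper = _bisect(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


# Any other Black background: 1 of 10 self-reporting patients recorded concordantly
# (10 patients, 9 discordant, from Table 1).
lower, upper = exact_ci(1, 10)
print(f"Sensitivity 10.0% (95% CI {100 * lower:.1f}-{100 * upper:.1f})")
```

For this smallest group the exact interval is close to the tabulated 0.3-44.5, although that agreement alone does not establish which method the authors used.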
Columns: Total, N; Total, %; N discordant (ONS6); % discordant; Crude OR of discordance (95% CI); joint p-value; Adjusted OR of discordance (95% CI); joint p-value
Age
16-24 399 0.7 17 4.3 4.15 (2.47-6.96) <0.0001 0.87 (0.43-1.79) 0.10
25-34 950 1.6 37 3.9 3.78 (2.63-5.42) 1.23 (0.77-1.94)
35-44 3,139 5.3 83 2.6 2.53 (1.99-3.22) 1.32 (0.96-1.82)
45-54 7,763 13.2 166 2.1 2.04 (1.68-2.46) 1.26 (0.98-1.62)
55-64 15,375 26.2 200 1.3 1.23 (1.05-1.44) 1.16 (0.93-1.46)
65-74 18,366 31.3 195 1.1 reference reference
75-84 10,747 18.3 86 0.8 0.75 (0.58-0.98) 0.84 (0.63-1.12)
85+ 1,982 3.4 12 0.6 0.57 (0.3-1.09) 0.73 (0.39-1.39)
Gender
Men 27,395 46.7 321 1.2 reference 0.0003 reference 0.49
Women 31,326 53.3 475 1.5 1.3 (1.13-1.49) 1.06 (0.9-1.26)
Deprivation
Lowest 13,301 22.7 119 0.9 reference <0.0001 reference 0.30
2 13,432 22.9 120 0.9 1 (0.75-1.34) 1.13 (0.84-1.52)
3 12,498 21.3 143 1.1 1.28 (0.92-1.78) 1.23 (0.92-1.64)
4 10,756 18.3 191 1.8 2 (1.44-2.79) 1.36 (1.02-1.8)
Highest 8,734 14.9 223 2.6 2.9 (2.14-3.93) 1.27 (0.94-1.7)
Ethnic Group
White 56,593 96.4 332 0.6 baseline <0.0001 baseline <0.0001
Mixed 239 0.4 179 74.9 505.56 (333.91-765.45) 564.17 (399.47-796.78)
Asian 901 1.5 100 11.1 21.16 (14.5-30.88) 14.3 (10.97-18.65)
Black 800 1.4 123 15.4 30.79 (19.65-48.23) 17.51 (13.35-22.96)
Chinese 124 0.2 30 24.2 54.08 (32.35-90.42) 38.62 (24.39-61.14)
Other 64 0.1 32 50.0 169.46 (103.63-277.12) 109.64 (63.49-189.34)
Hospital
OR 95% Reference Range 158 - 73.97 <0.0001 22.96 <0.0001
Appendix table 3. Crude and adjusted predictors of discordant hospital record ethnicity coding (based on 6
combined ethnicity groups)
Accuracy of routinely recorded ethnic group information compared with self-reported ethnicity: Evidence from the
English Cancer Patient Experience survey
Journal: BMJ Open
Manuscript ID: bmjopen-2013-002882.R1
Article Type: Research
Date Submitted by the Author: 02-May-2013
Complete List of Authors: Saunders, Catherine; University of Cambridge, Abel, Gary; University of Cambridge, El Turabi, Anas; University of Cambridge,
Ahmed, Faraz; University of Cambridge, Lyratzopoulos, Georgios; University of Cambridge,
<b>Primary Subject Heading</b>:
Health services research
Secondary Subject Heading: Health informatics, Oncology
Keywords: Ethnicity, Hospital Records, Quality in health care < HEALTH SERVICES ADMINISTRATION & MANAGEMENT, Equality
Accuracy of routinely recorded ethnic group information
compared with self-reported ethnicity: Evidence from the
English Cancer Patient Experience survey
Saunders CL (1)
Abel GA (1)
El Turabi A (1)
Ahmed F (1)
Lyratzopoulos G (1)
Cambridge Centre for Health Services Research, University of Cambridge, Institute of Public
Health, Forvie Site, Robinson Way, Cambridge, CB2 0SR
Author for correspondence:
Catherine Saunders
Mail to: [email protected]
Word count: 295
Abstract: 335 words
Total main text: 2,915
ABSTRACT
Objective: To describe the accuracy of ethnicity coding in contemporary NHS hospital
records compared to the ‘gold standard’ of self-reported ethnicity.
Design: Secondary analysis of data from a cross-sectional survey (2011)
Setting: All NHS hospitals in England providing cancer treatment
Participants: 58721 patients with cancer for whom ethnicity information [Office for
National Statistics (ONS) 2001 16-group classification] was available from self-reports
(considered to represent the ‘gold standard’) and their hospital record.
Methods: We calculated the sensitivity and positive predictive value (PPV) of hospital record
ethnicity. Further we used a logistic regression model to explore independent predictors of
discordance between recorded and self-reported ethnicity.
Results: Overall, 4.9% (4.7-5.1%) of people had their self-reported ethnic group incorrectly
recorded in their hospital records. Recorded White British ethnicity had high sensitivity
(97.8% (97.7-98.0%)) and PPV (98.1% (98.0-98.2%)) for self-reported White British ethnicity.
Recorded ethnicity information for the 15 other ethnic groups was substantially less
accurate with 41.2% (39.7-42.7%) incorrect. Recorded ‘Mixed’ ethnicity had low sensitivity
(12%-31%) and PPVs (12%-42%). Recorded ‘Indian’, ‘Chinese’, ‘Black-Caribbean’ and ‘Black
African’ ethnic groups had intermediate levels of sensitivity (65%-80%) and PPV (80%-89%).
In multivariable analysis, belonging to an ethnic minority group was the only
independent predictor of discordant ethnicity information. There was strong evidence that
the degree of discordance of ethnicity information varied substantially between different
hospitals (p<0.0001).
Discussion: Current levels of accuracy of ethnicity information in NHS hospital records
support valid profiling of White / non-White ethnic differences. However profiling of ethnic
differences in process or outcome measures for specific minority groups may contain a
substantial and variable degree of misclassification error. These considerations should be
taken into account when interpreting ethnic variation audits based on routine data and
inform initiatives aimed at improving the accuracy of ethnicity information in hospital
records.
ARTICLE SUMMARY
Article focus
• Accurate recording of ethnicity in administrative health data is a prerequisite for efforts to
ensure equality or reduce inequalities in health care.
• This paper describes the accuracy of ethnicity coding in English NHS hospital records
compared to the ‘gold standard’ of self-reported ethnicity, and identifies areas where
improvement is needed.
Key messages
• Hospital records will usually code the ethnicity of patients with self-reported White British
ethnic group correctly, but levels of incorrect coding of the ethnicity of all ethnic minority
groups are high.
• Belonging to an ethnic minority group is the only independent predictor of having an
incorrectly coded hospital record ethnicity; there is substantial variation in the quality of
coding between hospitals.
• Probability look-up tables provided in this paper can be used for weighting of incidence or
prevalence estimates where hospital record ethnicity is being used, or in regression
analysis to improve estimation of ethnic variation in processes or outcomes of care.
Strengths and limitations
• This was a unique opportunity to carry out an audit of the accuracy of hospital record
coding of ethnicity in England, using data from a large national survey of recently treated
patients.
• We were not able to account for different processes by which ethnicity was
ascertained in hospital records and the study population was skewed towards older
ages which may limit the generalisability of the findings.
BACKGROUND
Modern health care systems aspire to equal access and quality of care for patients of any
ethnic group 1 2. In England, there are nevertheless well-documented ethnic inequalities in
processes and experiences of care 3 4. Good practice in measuring these inequalities is
needed 5. Therefore, the availability of complete and valid information on patients’ ethnicity
in routine NHS data is a fundamental first step to enable equality audits to inform
improvement actions. Further, there are variations in the incidence and prevalence of some
conditions between different patient ethnic groups. Understanding such variations can be
vital for service planning and required capacity estimates. National Health Service hospitals
began to routinely record ethnicity information in their Patient Administration System (PAS)
records in the mid-1990s 6. (Patient Administration System records are a precursor to
Hospital Episode Statistics (HES) data; hereafter we refer to hospital records as ‘HES
records’ for simplicity, as this is the more commonly used convention to denote patient
administrative data in the NHS.) Although implementation of this measure has been slow 7,
HES data in recent years have high completeness of ethnic group information (typically
exceeding 90%) 6 8. There is, however, little evidence about the accuracy of routinely recorded
information on patients’ ethnicity. In the past, audits of the quality of ethnicity information
in NHS records have principally focused on data completeness rather than accuracy 6, and
completeness of ethnicity coding information currently forms part of the Commissioning
Outcomes Data Set 6, a prescribed standard for data quality that is linked to hospital
reimbursement.
Ethnicity can be classified by referring to a community of people who share the same
culture, and/or by referring to an ancestral population which comprises their self-identity 9.
Self-reported ethnicity captures both the shared experiences/culture of an individual and
their self-identity. Methods such as surname recognition algorithms or geo-coding (or
combinations of both these approaches) have been developed to indirectly infer the
ethnicity of individuals 10–15, but both have limitations that do not apply to self-reported
ethnicity. For example, geo-coding methods rely on high levels of geographical (residential)
segregation between different ethnic groups. Surname recognition methods rely on low
levels of inter-ethnic marriages and the existence of distinctive surname nomenclatures
(which do not always exist for some ethnic groups, e.g. Black populations in the United
States). Further, by definition indirect methods will misclassify the ethnicity of some
patients, and disproportionately do so for the ethnicity of people from ethnic minorities. For
all the above reasons self-report is currently considered the gold standard measure of
ethnicity 16 17.
Patient surveys, when linked to routine ethnicity data, provide opportunities to determine
the accuracy of ethnic group information contained in routine health system records. As this
linkage is rarely done, few studies have cross-examined the accuracy of ethnic group
information included in routine healthcare records against self-reported ethnicity
information. Those that have examined this have been conducted in the USA 18–20.
Against this background and using data from an English national patient survey we aimed to
examine the overall accuracy of recorded versus self-reported ethnicity; and whether
discordance between recorded and self-reported ethnicity information varied between
patients of different ethnic groups.
METHODS
Data
We used publicly available anonymous data from the 2010 Cancer Patient Experience Survey
(CPES) in England 21. An unusual feature of this survey is that hospital-recorded ethnicity was
collected when compiling the sample and linked to patient responses by the survey provider.
All patients treated for cancer in an English NHS hospital during the first quarter of 2010
were invited to participate in the survey, with a response rate of 67% 21. Of 67713
respondents, 64118 (94.7%) provided valid self-reported ethnicity information. Because self-
reported ethnicity was used as the gold standard against which the accuracy of HES-
recorded ethnicity was compared, data on respondents with missing self-reported ethnicity
were excluded from further analysis (see figure 1). Data on an additional 348 respondents
with missing information on deprivation status were also excluded from further analysis,
leaving 63770 respondents. An additional 5049 records with missing HES-recorded ethnicity
were excluded, leaving 58721 records for analysis. For all respondents included in our
analyses completely observed data were available for patients’ HES-recorded age, gender
and socio-economic status (using Index of Multiple Deprivation (IMD) score of lower super
output area of residence). In both HES records and patient reports, ethnicity was classified
using the same 16-group categorisation [Office for National Statistics (ONS16) 2001]
(Appendix 1).
Analysis
We first described the overall degree of discordance between HES-recorded and self-
reported information on ethnicity using Cohen’s kappa statistic. We further calculated
the sensitivity and the positive predictive value (PPV) of HES-recorded ethnicity with respect to
self-reported ethnicity. In this context, sensitivity denotes the proportion of patients with a
given self-reported ethnicity with concordant ethnicity information in their HES record; and
PPV denotes the probability that a patient with a particular HES-recorded ethnic group will
self-report the same ethnic group.
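As an illustrative sketch of these three measures (not the STATA code used for the analyses), the following Python function computes Cohen's kappa across all groups, plus the sensitivity and PPV for one chosen group, from paired labels. The function name, the group labels "A" and "B" and the toy counts are hypothetical and are not taken from the survey data.

```python
from collections import Counter


def agreement_stats(pairs, group):
    """Return (kappa, sensitivity, ppv) for (self_reported, hes_recorded) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    self_counts = Counter(s for s, _ in pairs)
    hes_counts = Counter(h for _, h in pairs)

    # Observed agreement: share of patients whose two labels match.
    p_obs = sum(1 for s, h in pairs if s == h) / n
    # Agreement expected by chance, from the two marginal distributions.
    p_exp = sum(self_counts[g] * hes_counts[g] for g in self_counts) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp)

    # Patients who both self-report and are recorded as `group`.
    both = sum(1 for s, h in pairs if s == group and h == group)
    sensitivity = both / self_counts[group]  # denominator: self-reported group
    ppv = both / hes_counts[group]           # denominator: HES-recorded group
    return kappa, sensitivity, ppv


# Hypothetical toy data: 100 patients in two groups, "A" and "B".
toy_pairs = (
    [("A", "A")] * 90 + [("A", "B")] * 2 + [("B", "A")] * 3 + [("B", "B")] * 5
)
kappa, sens_b, ppv_b = agreement_stats(toy_pairs, "B")
```

In this toy example sensitivity and PPV for group "B" differ (5/8 versus 5/7) because the two measures use different denominators.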
Subsequently, we explored independent predictors of discordant ethnicity information by
constructing a multivariable logistic regression model with concordant/discordant ethnicity
status as the binary outcome variable and self-reported ethnicity as a covariate. Adjustment
was also made for other patient socio-demographic characteristics (age in 10-year
groups, gender and postcode-linked area-based deprivation). For age and gender we used
HES-recorded information because the completeness of these variables among survey
respondents was higher than for self-reported age and gender, and because the degree of
concordance between HES-recorded and self-reported age (based on year of birth) and
gender was very high (>99.5% for both).
To explore whether clustering of patients in some groups in hospitals with higher or lower
discordance levels could in part explain the findings, we subsequently repeated the
regression model described above including a random effect for hospital. In addition, this
model allowed us to explore the degree of variation in the level of ethnicity information
discordance between different hospitals. Because less detailed classifications of ethnicity are
often used in health research, we also carried out a supplementary analysis looking at
discordance when using 6 (as opposed to 16) ethnic groups (i.e. White, Mixed, Asian or Asian
British, Black or Black British, Chinese and Other).
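The two regression models described above can be written out as follows. This is a notational sketch only: the symbols are illustrative labels rather than notation from the analysis, and two details are assumptions, namely that deprivation enters as a categorical term and that the hospital random effect is a normally distributed random intercept. Here $D_{ih}=1$ denotes discordant ethnicity information for patient $i$ attending hospital $h$.

```latex
\begin{align}
\operatorname{logit}\Pr(D_{ih}=1)
  &= \beta_0 + \beta^{\text{eth}}_{e(i)} + \beta^{\text{age}}_{a(i)}
     + \beta^{\text{sex}}_{s(i)} + \beta^{\text{dep}}_{d(i)} \\
\operatorname{logit}\Pr(D_{ih}=1 \mid u_h)
  &= \beta_0 + \beta^{\text{eth}}_{e(i)} + \beta^{\text{age}}_{a(i)}
     + \beta^{\text{sex}}_{s(i)} + \beta^{\text{dep}}_{d(i)} + u_h,
  \qquad u_h \sim N(0, \sigma_u^2)
\end{align}
```

Under this formulation, $e(i)$, $a(i)$, $s(i)$ and $d(i)$ index the self-reported ethnic group, 10-year age group, gender and deprivation category of patient $i$, and $\sigma_u^2$ summarises between-hospital variation in discordance.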
Lastly, we constructed a probability ‘look-up’ table indicating the probability that HES-
recorded ethnicity represents true (self-reported) ethnicity for each of the 16 different
ethnic groups. These probabilities can be used for weighting of incidence or prevalence
estimates or used in regression analysis to improve estimation of ethnic variation in
processes or outcomes of care 14 22. For the calculation of the probability ‘look-up’ table only,
we combined data from two surveys (2010 and 2011/12) to improve the precision of the
probabilities presented (increasing our sample size to 133,204). STATA 11 was used for all
analyses.
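One way such a look-up table might be applied to weight counts is sketched below in Python. It assumes the table is arranged as the probability of each self-reported group given the HES-recorded group; the function name, the three groups shown and all probabilities and counts are hypothetical values for illustration, not figures from the survey.

```python
def reweight_counts(recorded_counts, lookup):
    """Redistribute HES-recorded counts into expected self-reported counts.

    lookup[r][s] is P(self-reported group = s | HES-recorded group = r);
    each row must sum to 1.
    """
    expected = {}
    for recorded, n in recorded_counts.items():
        row = lookup[recorded]
        if abs(sum(row.values()) - 1.0) > 1e-9:
            raise ValueError(f"probabilities for {recorded!r} do not sum to 1")
        for self_reported, p in row.items():
            expected[self_reported] = expected.get(self_reported, 0.0) + n * p
    return expected


# Hypothetical probabilities and counts, for illustration only.
lookup = {
    "White British": {"White British": 0.98, "White Irish": 0.01, "Indian": 0.01},
    "White Irish": {"White British": 0.15, "White Irish": 0.82, "Indian": 0.03},
    "Indian": {"White British": 0.10, "White Irish": 0.00, "Indian": 0.90},
}
recorded_counts = {"White British": 1000, "White Irish": 100, "Indian": 50}
expected = reweight_counts(recorded_counts, lookup)
```

The total number of patients is preserved; only their distribution across groups changes.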
RESULTS
Overall the frequency of discordance of HES-recorded and self-reported ethnicity
information was 4.9% (4.7-5.1%). Patients who identified themselves as White British had
the lowest frequency of discordant ethnicity information in their HES records (2.2% (2.0-
2.3%)). In contrast, patients who identified themselves as belonging to any other ethnic
group had substantially higher frequency of discordant HES-recorded ethnicity (41.2% (39.7-
42.7%)). Cohen’s Kappa for HES-recorded and self-reported ethnicity was 0.64 overall and
0.54 if White British patients were excluded. The frequency of discordance was particularly
high for patients who self-reported that they belonged to the Any Other Black Background
(90% (97.4-100%)) and the Any Other Mixed Background (87.8% (78.2-97.3%)) groups (Table
1).
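The overall discordance estimate can be approximately reproduced from the counts reported in this paper (2878 discordant records among 58721 analysed). The Python sketch below uses a normal-approximation (Wald) interval; this is an assumption, as the method used to derive the confidence intervals is not stated.

```python
import math


def proportion_ci(events, n, z=1.959964):
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts taken from the text: 2878 discordant records out of 58721.
p, lo, hi = proportion_ci(2878, 58721)
summary = f"{100 * p:.1f}% ({100 * lo:.1f}-{100 * hi:.1f}%)"
print(summary)  # 4.9% (4.7-5.1%)
```

For the much smaller minority groups an exact or Wilson interval would usually be preferable to the Wald interval.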
HES-recorded ‘White British’ ethnicity had high sensitivity (97.8% (97.7-98%)) and PPV (98.1% (98-
98.2%)) for self-reported White British ethnicity (Figure 2; estimates and confidence intervals in
appendix table 2 for the 6 and 16 group classification). In contrast, HES-recorded ‘Mixed’
ethnicity had very low sensitivities (12%-31%) and PPVs (12%-42%). HES-recorded ‘Indian’,
‘Pakistani’, ‘Bangladeshi’, ‘Chinese’, ‘Black-Caribbean’ and ‘Black African’ ethnicity had
intermediate levels of sensitivity (65%-80%) and PPV (80%-89%). HES-recorded
‘White Irish’ ethnicity had low sensitivity (47.8% (44.5-51.0%)) but high PPV (81.5% (77.9-
84.6%)). This means that of all individuals who self-identify as White Irish only
48% would have their ethnic group recorded as such in their HES records; however, among
patients whose HES records indicate that they are ‘White Irish’, 82% would identify
themselves as belonging to this group too.
Whilst crudely there was some evidence that age, gender and deprivation are associated
with discordance of ethnicity information, in multivariable logistic regression analysis
adjusting for other patient characteristics the sole independent predictor of discordance was
self-reported ethnicity (Table 1). Repeating this model with a random effect for hospital
produced only trivial differences to associations of discordance with patient characteristics.
This means that the association between discordance and self-reported ethnic minority
group cannot be explained by ethnic minority patients attending hospitals that have overall
poor levels of accuracy of ethnicity information. There was however strong evidence
(p<0.0001) of variation in discordance between different hospitals (accuracy of coding of
ethnicity across hospitals ranged from 67-100%). Specifically, if all hospitals were to be
arranged in order of frequency of discordance of ethnicity information, and after accounting
for differences in the proportion of ethnic minority patients attending the hospital, the odds
ratio of discordance between the hospitals at the 97.5th and 2.5th centiles (the 95% reference
range) would be about 13, indicating a 13-fold difference in the odds of discordance across
hospitals. After accounting for differences in the proportion of ethnic minority patients
attending each hospital, the hospital at the bottom 2.5th centile of coding accuracy had
concordant self-reported and hospital-recorded ethnicity codes in 90% of records.
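The 13-fold odds ratio can be related to the between-hospital standard deviation of the random effect. The Python sketch below assumes a normally distributed random intercept on the log-odds scale, so that the 97.5th and 2.5th centiles lie 2 × 1.96 standard deviations apart. The standard deviation itself is not reported here; the value computed is only what an odds ratio of about 13 would imply under that assumption, and the function names are hypothetical.

```python
import math


def centile_odds_ratio(sigma, z=1.959964):
    """Odds ratio between hospitals at the 97.5th and 2.5th centiles of a
    normal random intercept with standard deviation sigma (log-odds scale)."""
    return math.exp(2 * z * sigma)


def implied_sigma(odds_ratio, z=1.959964):
    """Invert centile_odds_ratio: the sigma implied by a given odds ratio."""
    return math.log(odds_ratio) / (2 * z)


sigma = implied_sigma(13)  # roughly 0.65 on the log-odds scale
```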
Similar findings to those observed in the main analysis were observed when using a 6-group
classification (table 2). The total number of respondents with discordant ethnicity decreased
from 2878 (4.9%) to 796 (1.4%) when using the 6-group classification, indicating that
much of the discordance was between categories that fall within the same group of the
cruder 6-group classification. However, self-reporting a non-White ethnic background
remained as strongly associated with having an
incorrectly coded hospital-record ethnic group as for the 16 group classification. Details of
the ethnic groups to which people from each self-reported ethnic background were
misclassified are given in table 3, showing the variation between and within the 6 and 16
group coding.
People who self-report mixed ethnic backgrounds are particularly likely to have an
incorrectly coded ethnic group in their hospital records, with high proportions under both the 16
(79.9%) and 6 (74.9%) group classifications. Table 3 shows that, over 90% of the time, the
hospital-recorded ethnicity of these patients falls into one of three groups: the concordant
mixed category, white (either British or Other white) or the corresponding ethnic minority
category. This explains why for this group the broader 6 category classification does not improve the
accuracy of hospital record ethnicity coding.
Finally, the probability table (Table 3) can be used in estimating ethnic group variations in
incidence, prevalence or measures of care quality when using HES data. Examples of such
applications in the context of US health research have been reported previously 14 22.
DISCUSSION
Using data from a recent national survey of hospital patients with cancer we explored the
accuracy of ethnicity information in HES data. We found that overall the level of discordance
of ethnicity information between HES records and patient self-reports is low, particularly for
the majority White British patients. There is however a substantial degree of inaccuracy in
the recorded ethnicity of patients who self-report themselves as belonging to ethnic
minority groups. For many major ethnic groups (‘Indian’, ‘Pakistani’, ‘Bangladeshi’, ‘Chinese’,
‘Black-Caribbean’ and ‘Black African’) routine hospital data will mis-code between 20% and
35% of all patients who self-report that they belong to these ethnic groups (sensitivity 65%-
80%). Further, up to 20% of patients recorded as belonging to some major ethnic groups will
self-report that they actually belong to other ethnic groups (PPV 80%-89%). For
patients who self-report being of mixed ethnic groups HES records are usually discordant.
We provide probability tables that can be used in re-estimating ethnic group variations when
investigating ethnic variation in processes or quality of care from UK hospital records in
order to improve precision of such estimates.
The study used a unique opportunity provided by patient survey data to explore the
accuracy of ethnicity information in UK routine health care data. Previous evidence on the
accuracy of ethnicity information in administrative datasets relates to US settings 18–20. Our
findings are consistent with previous US literature, which also indicates that the accuracy of
ethnicity information tends to be highest for the majority white population, lower for major
ethnic minority groups (like African American or Hispanic) and lowest for smaller ethnic
groups (such as American Indian/Alaskan Natives) and Mixed or Other racial/ethnic groups 18
19. Other strengths of the study include its large sample size, enabling the profiling of
discordance for small ethnic groups; and the use of regression analyses to explore
independent predictors of discordance and to examine variation between different
hospitals.
A limitation is that the study population comprised patients who attended hospital for cancer
treatment (most of whom are aged 65 years or older). Therefore in principle the
generalisability of the findings (particularly regarding younger patients) might be limited.
Equally, the need for accurate recording of ethnicity among cancer patients has been
identified 23. We were also unable to explore potential inaccuracies in ethnicity
categorisation resulting from longitudinal person-level discordance among patients with
more than one hospital care episode. Previous research indicates that up to 3% of patients
with more than one episode of care have longitudinally discordant ethnic group information
24. Evolving socio-cultural trends, or changes in Census methodology (such as the
introduction of Mixed ethnic groups to the UK census in 2001) could contribute to changes
in a person’s self-identified ethnic group over their lifetime. Lastly, we were not able to
account for the process by which ethnicity was ascertained in hospital records. It is likely
that this process involves a combination of self-reports, reports by relatives or carers (e.g. in
the case of infirm patients, or patients with language or other communication difficulties) or
even guesswork by hospital staff (e.g. when there are language barriers, or in clinical
emergencies) 16.
The 2001 Census expanded the previous ONS classification of ethnicity (originally
containing 10 groups) to the 16-group classification also used in this survey 2, a change
which was subsequently reflected in HES coding. The impact of this secular change may
explain some of the discordance observed. For example, the discordance in self-reported
White Irish ethnicity may reflect the fact that this ethnic group was included only from the 2001
classification onwards. The 2011 Census included two new ethnic groups: ‘Gypsy or Irish Traveller’
and ‘Arab’ 25. It remains to be seen if this change will be reflected in routinely collected
health data.
Whilst this study considers the accuracy of recorded ethnicity, there are issues with defining
ethnicity using any simple classification, for example the potential for ‘concealed
heterogeneity’ within each of the ethnic groups 16. It is possible, for example, that within the
Indian ethnic group there are certain social, linguistic or religious subgroups which are more or less
likely to have a discordant ethnicity. Indeed, evidence indicates that ethnic minority patients
with discordant ethnicity information may be systematically different from other patients from
the same ethnic group with concordant ethnicity 20. Within-ethnic group heterogeneity is
a complex issue inherent in any type of ethnicity research, and the nature of our study did not
allow us to address this.
Another issue we have not addressed in this paper is the completeness of ethnicity
information, which has the potential to bias estimates of ethnic variation in addition to the
misclassification detailed here. We performed analyses similar to those discussed here for
predictors of missing ethnicity (not shown). We found only small variations by self-reported
ethnic group but a much larger degree of variation between hospitals than that seen for
discordance, implying that missing ethnicity is primarily driven by hospital-level
processes.
Our findings have implications for policy and future research. First, HES data currently
have a high level of completeness of ethnic information, and if the aim of an audit is to
compare outcomes between White and non-White groups the current classification system
will perform well. However, a degree of caution is required when interpreting more detailed
evidence on ethnic inequalities in care quality or disease incidence and prevalence that is
based solely on HES data, particularly for minority ethnic groups found to have higher
discordance rates. Misclassification of ethnicity in HES data could result in either
underestimation of ethnic variation, or inability to detect such variation when it exists (‘type 2’
error).
Second, we provide a list of probabilities that can be used to improve estimates of ethnic
variation in health care in a UK setting 14 22. These probabilities can be used both to improve
the precision of prevalence estimates and in statistical models where hospital record ethnic
group is a predictor 14.
Third, as the completeness of ethnicity information in hospital records is currently high,
more attention needs to be given to the accuracy of recorded information. The substantial
variation between hospitals in the accuracy of ethnicity information indicates that there is
great potential for improving the quality of ethnicity information in poorly performing
hospitals. Improvement in the quality of HES data (which is generally desirable 26) should
also encompass improvements in the quality of ethnicity coding. Qualitative studies have
found willingness among ethnic minority groups to provide this information 27, and future
research should explore optimal ways for efficiently obtaining current self-reported
information on ethnicity in patient records.
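The way misclassification can lead to underestimation of ethnic variation, noted in the first implication above, can be illustrated with a simplified two-group calculation in Python. All rates and PPV values below are hypothetical, and the sketch assumes that misclassification is unrelated to the outcome and that patients wrongly recorded in a group have the outcome rate of the other group.

```python
def observed_rate(ppv, rate_group, rate_others):
    """Outcome rate among patients *recorded* as belonging to a group, when
    only a fraction `ppv` of them truly self-report that group."""
    return ppv * rate_group + (1 - ppv) * rate_others


# Hypothetical true outcome rates: 30% in a minority group, 20% in everyone else.
true_gap = 0.30 - 0.20

obs_minority = observed_rate(0.80, 0.30, 0.20)  # recorded minority, PPV 80%
obs_majority = observed_rate(0.98, 0.20, 0.30)  # recorded majority, PPV 98%
observed_gap = obs_minority - obs_majority
```

In this example a true difference of 10 percentage points appears as a difference of about 7.8 points in the recorded data.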
Contributors: All authors had substantial input into the study concept, design, analysis and
access to the data. GL had the original idea for the study which was further conceptualised in
discussions between all authors; methods were principally developed by CLS and further
amplified by GAA. All authors communicated frequently during the course of the project,
including through face-to-face meetings. CLS and GL wrote the first draft which was edited by
all authors over multiple versions. All authors saw and approved the final manuscript.
Competing interests: None.
Acknowledgement: We thank the UK Data Archive for access to the anonymous survey data
(UKDA study number: 69488), the Department of Health as the depositor and principal
investigator of the Cancer Patient Experience Survey 2010, Quality Health as data collector;
and all NHS Acute Trusts in England, for provision of data samples.
Ethics approval: Not required.
Funding statement: The paper is independent research arising from a Post-Doctoral
Fellowship award to GL supported by the National Institute for Health Research (PDF-2011-04-
047). The views expressed in this publication are those of the authors and not necessarily
those of the NHS, the National Institute for Health Research or the Department of Health.
Data sharing: Not applicable (the data are available for academic research from the UK Data
Archive – see acknowledgement).
STROBE: Checklist for an observational study: Attached separately.
REFERENCES
1 US Department of Health and Human Services Agency for Healthcare Research and Quality. 2009 National
Healthcare Disparities and Quality Reports. 2009. http://www.ahrq.gov/qual/qrdr09.htm (accessed 29
Nov 2012).
2 Department of Health. Equity and excellence: liberating the NHS (White Paper). 2010.
http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_117353
(accessed 29 Nov 2012).
3 Cancer Incidence and Survival By Major Ethnic Group, England, 2002 - 2006. 2009.
http://www.ncin.org.uk/publications/reports/default.aspx
4 Mindell J, Klodawski E, Fitzpatrick J. Using routine data to measure ethnic differentials in access to coronary
revascularization. J Public Health 2008;30:45–53.
5 Mir G, Salway S, Kai J, et al. Principles for research on ethnicity and health: the Leeds Consensus Statement.
Eur J Public Health 2012;:cks028–.
6 The NHS Information Centre. How good is HES ethnic coding and where do the problems lie? 2011.
http://www.hesonline.nhs.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=171 (accessed 7
Feb 2013).
7 Szczepura A. Access to health care for ethnic minority populations. Postgrad Med J 2005;81:141–7.
8 Jack RH, Linklater KM, Hofman D, et al. Ethnicity coding in a regional cancer registry and in Hospital Episode
Statistics. BMC Public Health 2006;6:281.
9 Isajiw WW. Definition and Dimensions of Ethnicity: a theoretical framework. In: Challenges of Measuring an
Ethnic World: Science, politics and reality: Proceedings of the Joint Canada-United States Conference on the
Measurement of Ethnicity April 1-3, 1992. Washington, D.C.: U.S. Government Printing Office 407–27.
10 Lyratzopoulos G, McElduff P, Heller RF, et al. Comparative levels and time trends in blood pressure, total
cholesterol, body mass index and smoking among Caucasian and South-Asian participants of a UK primary-
care based cardiovascular risk factor screening programme. BMC Public Health 2005;5:125.
11 Shah BR, Chiu M, Amin S, et al. Surname lists to identify South Asian and Chinese ethnicity from secondary
data in Ontario, Canada: a validation study. BMC Med Res Methodol 2010;10:42.
12 Maringe C, Mangtani P, Rachet B, et al. Cancer incidence in South Asian migrants to England, 1986-2004:
Unraveling ethnic from socioeconomic differentials. Int J Cancer 2013;132:1886–94.
13 Quan H, Wang F, Schopflocher D, et al. Development and validation of a surname list to define Chinese
ethnicity. Med Care 2006;44:328–33.
14 Elliott MN, Fremont A, Morrison PA, et al. A new method for estimating race/ethnicity and associated
disparities where administrative records lack self-reported race/ethnicity. Health Serv Res 2008;43:1722–36.
15 Nitsch D, Kadalayil L, Mangtani P, et al. Validation and utility of a computerized South Asian names and group
recognition algorithm in ascertaining South Asian ethnicity in the national renal registry. QJM 2009;102:865–
72.
16 Aspinall PJ. The utility and validity for public health of ethnicity categorization in the 1991, 2001 and 2011
British Censuses. Public Health 2011;125:680–7.
17 Mays VM, Ponce NA, Washington DL, et al. Classification of race and ethnicity: implications for public health.
Annu Rev Public Health 2003;24:83–110.
18 Arday SL, Arday DR, Monroe S, et al. HCFA’s racial and ethnic data: current accuracy and recent
improvements. Health Care Financ Rev 2000;21:107–16.
19 Lee LM, Lehman JS, Bindman AB, et al. Validation of race/ethnicity and transmission mode in the US HIV/AIDS
reporting system. Am J Public Health 2003;93:914–7.
20 Zaslavsky AM, Ayanian JZ, Zaborski LB. The validity of race and ethnicity in enrollment data for Medicare
beneficiaries. Health Serv Res 2012;47:1300–21.
21 Department of Health. National Cancer Patient Experience Survey 2010 [computer file]. UKDA-SN-6742-1.
Colchester, Essex: UK Data Archive [distributor]. http://dx.doi.org/10.5255/UKDA-SN-6742-1
22 McCaffrey DF, Elliott MN. Power of tests for a dichotomous independent variable measured with error.
Health Serv Res 2008;43:1085–101.
23 Iqbal G, Gumber A, Szczepura A, et al. Improving ethnic data collection for statistics of cancer incidence,
management, mortality and survival in the UK.
http://www2.warwick.ac.uk/fac/med/research/csri/ethnicityhealth/research/crc.pdf (accessed 25 Apr 2013).
24 Georghiou T, Thorlby R. Quality of ethnicity coding in Hospital Episode Statistics: beyond completeness. Paper
presented at Healthcare Commission seminar “Measuring what matters: measuring ethnicity in health data”
12/02/2007. 2007.
25 Office for National Statistics. 2011 Census. http://www.ons.gov.uk/ons/guide-method/census/2011/index.html (accessed 8 Mar 2013).
26 Spencer SA, Davies MP. Hospital episode statistics: improving the quality and value of hospital data: a
national internet e-survey of hospital consultants. BMJ Open 2012;2. doi:10.1136/bmjopen-2012-001651
27 Iqbal G, Johnson MR, Szczepura A, et al. UK ethnicity data collection for healthcare statistics: the South Asian
perspective. BMC Public Health 2012;12:243.
Figure legends:
Figure 1. Survey responders and exclusions
Figure 2. Sensitivity* and positive predictive value** of hospital record recorded ethnicity compared with self-
reported ethnicity as a gold-standard
*If a patient self-reports that they belong to a particular ethnic group, then the sensitivity of the hospital record ethnicity coding
is the probability that the hospital record will record the same (correct) ethnicity
**If a patient’s hospital record states that they belong to a particular ethnic group, then the positive predictive value of the
hospital record ethnicity code is the probability that this code has been recorded correctly and that the patient will self-report
the same ethnicity
| Characteristic | Total respondents, n | Total respondents, % | Total discordant, n | Total discordant, % | Crude OR of discordance | Joint p-value | Adjusted OR | Joint p-value |
|---|---|---|---|---|---|---|---|---|
| Age | | | | | | | | |
| 16-24 | 399 | 0.7 | 30 | 7.5 | 1.77 (1.28-2.46) | <0.0001 | 0.9 (0.55-1.48) | 0.54 |
| 25-34 | 950 | 1.6 | 93 | 9.8 | 2.37 (1.86-3.01) | | 0.94 (0.71-1.26) | |
| 35-44 | 3,139 | 5.3 | 231 | 7.4 | 1.73 (1.51-1.99) | | 1.02 (0.84-1.23) | |
| 45-54 | 7,763 | 13.2 | 484 | 6.2 | 1.45 (1.3-1.61) | | 1.12 (0.97-1.29) | |
| 55-64 | 15,375 | 26.2 | 743 | 4.8 | 1.11 (0.99-1.23) | | 1.11 (0.98-1.25) | |
| 65-74 | 18,366 | 31.3 | 805 | 4.4 | reference | | reference | |
| 75-84 | 10,747 | 18.3 | 412 | 3.8 | 0.87 (0.76-0.99) | | 0.99 (0.86-1.14) | |
| 85+ | 1,982 | 3.4 | 80 | 4.0 | 0.92 (0.71-1.18) | | 1.08 (0.83-1.42) | |
| Gender | | | | | | | | |
| Men | 27,395 | 46.7 | 1,268 | 4.6 | reference | 0.014 | reference | 0.69 |
| Women | 31,326 | 53.3 | 1,610 | 5.1 | 1.12 (1.02-1.22) | | 1.02 (0.93-1.12) | |
| Deprivation | | | | | | | | |
| Lowest | 13,301 | 22.7 | 488 | 3.7 | reference | <0.0001 | reference | 0.69 |
| 2 | 13,432 | 22.9 | 509 | 3.8 | 1.03 (0.91-1.18) | | 1.04 (0.9-1.2) | |
| 3 | 12,498 | 21.3 | 559 | 4.5 | 1.23 (1.05-1.44) | | 0.99 (0.85-1.14) | |
| 4 | 10,756 | 18.3 | 652 | 6.1 | 1.69 (1.36-2.11) | | 1.03 (0.89-1.2) | |
| Highest | 8,734 | 14.9 | 670 | 7.7 | 2.18 (1.68-2.84) | | 0.94 (0.8-1.11) | |
| Ethnic Group | | | | | | | | |
| British (White) | 54,589 | 93.0 | 1,177 | 2.2 | reference | <0.0001 | reference | <0.0001 |
| Irish (White) | 938 | 1.6 | 490 | 52.2 | 49.63 (32.72-75.28) | | 47.73 (41-55.55) | |
| Any other White background | 1,066 | 1.8 | 485 | 45.5 | 37.88 (24.95-57.52) | | 36.49 (31.52-42.25) | |
| White and Black Caribbean (Mixed) | 72 | 0.1 | 50 | 69.4 | 103.14 (55.73-190.86) | | 109.97 (64.65-187.04) | |
| White and Black African (Mixed) | 41 | 0.1 | 33 | 80.5 | 187.19 (90.23-388.35) | | 202.59 (90.28-454.64) | |
| White and Asian (Mixed) | 77 | 0.1 | 65 | 84.4 | 245.81 (125.89-479.94) | | 283.62 (149.43-538.33) | |
| Any other Mixed background | 49 | 0.1 | 43 | 87.8 | 325.22 (124.17-851.84) | | 399.68 (166.26-960.79) | |
| Indian (Asian or Asian British) | 516 | 0.9 | 101 | 19.6 | 11.04 (7.16-17.04) | | 8.06 (6.33-10.27) | |
| Pakistani (Asian or Asian British) | 206 | 0.4 | 46 | 22.3 | 13.05 (8.39-20.28) | | 10.26 (7.19-14.66) | |
| Bangladeshi (Asian or Asian British) | 50 | 0.1 | 13 | 26.0 | 15.94 (7.44-34.18) | | 12.02 (6.17-23.41) | |
| Any other Asian background | 129 | 0.2 | 57 | 44.2 | 35.93 (20.28-63.63) | | 30.42 (20.94-44.2) | |
| Caribbean (Black or Black British) | 471 | 0.8 | 136 | 28.9 | 18.42 (12.64-26.85) | | 10.37 (8.21-13.1) | |
| African (Black or Black British) | 319 | 0.5 | 111 | 34.8 | 24.22 (16.97-34.56) | | 15.54 (11.9-20.28) | |
| Any other Black background | 10 | 0.0 | 9 | 90.0 | 408.42 (49.3-3383.53) | | 410.23 (49.73-3384.32) | |
| Chinese (other ethnic group) | 124 | 0.2 | 30 | 24.2 | 14.48 (8.77-23.92) | | 10.66 (6.86-16.57) | |
| Any other ethnic group | 64 | 0.1 | 32 | 50.0 | 45.38 (27.55-74.75) | | 34.04 (20.05-57.79) | |
| Hospital | | | | | | | | |
| OR 95% Reference Range | 158 | - | | | 30.48 | <0.0001 | 12.85 | <0.0001 |
Table 1. Crude and adjusted predictors of discordant hospital record ethnicity coding
| Characteristic | Total, n | Total, % | N discordant (ONS6) | Discordant (ONS6), % | Crude OR of discordance | Joint p-value | Adjusted OR of discordance | Joint p-value |
|---|---|---|---|---|---|---|---|---|
| Age | | | | | | | | |
| 16-24 | 399 | 0.7 | 17 | 4.3 | 4.15 (2.47-6.96) | <0.0001 | 0.87 (0.43-1.79) | 0.10 |
| 25-34 | 950 | 1.6 | 37 | 3.9 | 3.78 (2.63-5.42) | | 1.23 (0.77-1.94) | |
| 35-44 | 3,139 | 5.3 | 83 | 2.6 | 2.53 (1.99-3.22) | | 1.32 (0.96-1.82) | |
| 45-54 | 7,763 | 13.2 | 166 | 2.1 | 2.04 (1.68-2.46) | | 1.26 (0.98-1.62) | |
| 55-64 | 15,375 | 26.2 | 200 | 1.3 | 1.23 (1.05-1.44) | | 1.16 (0.93-1.46) | |
| 65-74 | 18,366 | 31.3 | 195 | 1.1 | reference | | reference | |
| 75-84 | 10,747 | 18.3 | 86 | 0.8 | 0.75 (0.58-0.98) | | 0.84 (0.63-1.12) | |
| 85+ | 1,982 | 3.4 | 12 | 0.6 | 0.57 (0.3-1.09) | | 0.73 (0.39-1.39) | |
| Gender | | | | | | | | |
| Men | 27,395 | 46.7 | 321 | 1.2 | reference | 0.0003 | reference | 0.49 |
| Women | 31,326 | 53.3 | 475 | 1.5 | 1.3 (1.13-1.49) | | 1.06 (0.9-1.26) | |
| Deprivation | | | | | | | | |
| Lowest | 13,301 | 22.7 | 119 | 0.9 | reference | <0.0001 | reference | 0.30 |
| 2 | 13,432 | 22.9 | 120 | 0.9 | 1 (0.75-1.34) | | 1.13 (0.84-1.52) | |
| 3 | 12,498 | 21.3 | 143 | 1.1 | 1.28 (0.92-1.78) | | 1.23 (0.92-1.64) | |
| 4 | 10,756 | 18.3 | 191 | 1.8 | 2 (1.44-2.79) | | 1.36 (1.02-1.8) | |
| Highest | 8,734 | 14.9 | 223 | 2.6 | 2.9 (2.14-3.93) | | 1.27 (0.94-1.7) | |
| Ethnic Group | | | | | | | | |
| White | 56,593 | 96.4 | 332 | 0.6 | baseline | <0.0001 | baseline | <0.0001 |
| Mixed | 239 | 0.4 | 179 | 74.9 | 505.56 (333.91-765.45) | | 564.17 (399.47-796.78) | |
| Asian | 901 | 1.5 | 100 | 11.1 | 21.16 (14.5-30.88) | | 14.3 (10.97-18.65) | |
| Black | 800 | 1.4 | 123 | 15.4 | 30.79 (19.65-48.23) | | 17.51 (13.35-22.96) | |
| Chinese | 124 | 0.2 | 30 | 24.2 | 54.08 (32.35-90.42) | | 38.62 (24.39-61.14) | |
| Other | 64 | 0.1 | 32 | 50.0 | 169.46 (103.63-277.12) | | 109.64 (63.49-189.34) | |
| Hospital | | | | | | | | |
| OR 95% Reference Range | 158 | - | | | 73.97 | <0.0001 | 22.96 | <0.0001 |
Table 2. Crude and adjusted predictors of discordant hospital record ethnicity coding (ONS 6-group classification)
Probability (%) of true self-reported ethnic group for each HES code

| HES code | British (White) | Irish (White) | Any other White background | White and Black Caribbean (Mixed) | White and Black African (Mixed) | White and Asian (Mixed) | Any other Mixed background | Indian (Asian or Asian British) | Pakistani (Asian or Asian British) | Bangladeshi (Asian or Asian British) | Any other Asian background | Caribbean (Black or Black British) | African (Black or Black British) | Any other Black background | Chinese (other ethnic group) | Any other ethnic group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A = British (White) | 98.1 | 0.9 | 0.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| B = Irish (White) | 20.1 | 78.7 | 1.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| C = Any other White background | 53.1 | 1.2 | 42.1 | 0.2 | 0.1 | 0.5 | 0.7 | 0.2 | 0.0 | 0.0 | 0.5 | 0.1 | 0.3 | 0.1 | 0.2 | 0.8 |
| D = White and Black Caribbean (Mixed) | 11.2 | 0.0 | 2.8 | 38.3 | 0.0 | 0.0 | 2.8 | 0.0 | 0.0 | 0.0 | 0.0 | 41.1 | 2.8 | 0.9 | 0.0 | 0.0 |
| E = White and Black African (Mixed) | 12.2 | 0.0 | 6.1 | 4.1 | 26.5 | 0.0 | 8.2 | 0.0 | 0.0 | 0.0 | 0.0 | 8.2 | 34.7 | 0.0 | 0.0 | 0.0 |
| F = White and Asian (Mixed) | 25.0 | 0.0 | 2.8 | 0.0 | 0.0 | 40.3 | 2.8 | 8.3 | 1.4 | 1.4 | 13.9 | 1.4 | 0.0 | 0.0 | 1.4 | 1.4 |
| G = Any other Mixed background | 30.6 | 0.9 | 10.8 | 4.5 | 4.5 | 7.2 | 12.6 | 4.5 | 0.9 | 0.9 | 3.6 | 5.4 | 2.7 | 0.9 | 4.5 | 5.4 |
| H = Indian (Asian or Asian British) | 1.9 | 0.0 | 0.4 | 0.0 | 0.1 | 0.6 | 0.1 | 88.4 | 2.8 | 0.4 | 4.6 | 0.2 | 0.1 | 0.1 | 0.0 | 0.1 |
| J = Pakistani (Asian or Asian British) | 2.3 | 0.0 | 0.8 | 0.0 | 0.0 | 0.5 | 0.3 | 5.9 | 87.7 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 |
| K = Bangladeshi (Asian or Asian British) | 2.1 | 0.0 | 0.0 | 0.0 | 0.0 | 1.1 | 1.1 | 5.3 | 2.1 | 82.1 | 4.2 | 0.0 | 1.1 | 0.0 | 1.1 | 0.0 |
| L = Any other Asian background | 4.4 | 0.0 | 4.2 | 0.2 | 0.5 | 3.7 | 3.0 | 21.5 | 7.6 | 0.9 | 41.0 | 0.5 | 0.2 | 0.7 | 3.5 | 8.1 |
| M = Caribbean (Black or Black British) | 4.2 | 0.0 | 0.1 | 3.6 | 0.5 | 0.0 | 0.6 | 0.0 | 0.0 | 0.0 | 0.4 | 84.8 | 3.6 | 2.0 | 0.0 | 0.1 |
| N = African (Black or Black British) | 5.0 | 0.2 | 1.4 | 0.6 | 2.6 | 0.0 | 0.6 | 0.0 | 0.0 | 0.0 | 0.2 | 6.6 | 81.3 | 1.4 | 0.0 | 0.2 |
| P = Any other Black background | 3.1 | 0.0 | 0.9 | 3.6 | 2.7 | 0.9 | 1.8 | 0.4 | 0.0 | 0.0 | 1.3 | 38.1 | 42.6 | 4.0 | 0.4 | 0.0 |
| R = Chinese (other ethnic group) | 4.2 | 0.0 | 0.8 | 0.0 | 0.0 | 0.8 | 0.0 | 0.4 | 0.0 | 0.0 | 4.2 | 0.0 | 0.0 | 0.0 | 83.2 | 6.3 |
| S = Any other ethnic group | 43.5 | 0.6 | 17.3 | 0.4 | 0.9 | 1.6 | 1.8 | 3.5 | 1.1 | 0.6 | 9.1 | 2.5 | 3.4 | 0.9 | 1.6 | 11.3 |
| Z = Not stated | 90.6 | 1.5 | 2.8 | 0.1 | 0.1 | 0.3 | 0.2 | 0.9 | 0.5 | 0.0 | 0.7 | 1.0 | 0.6 | 0.1 | 0.3 | 0.3 |
| X or Missing | 91.3 | 1.3 | 2.5 | 0.1 | 0.1 | 0.2 | 0.2 | 1.3 | 0.4 | 0.1 | 0.5 | 0.7 | 0.7 | 0.1 | 0.4 | 0.2 |
Table 3. The probability (%) of ‘true’ ethnic group for each hospital record-recorded ethnic group (i.e. the probability that a given person with a specific HES-recorded
ethnicity belongs to any (self-reported) group)
Accuracy of routinely recorded ethnic group information
compared with self-reported ethnicity: Evidence from the
English Cancer Patient Experience survey
Saunders CL (1)
Abel GA (1)
El Turabi A (1)
Ahmed F (1)
Lyratzopoulos G (1)
Cambridge Centre for Health Services Research, University of Cambridge, Institute of Public
Health, Forvie Site, Robinson Way, Cambridge, CB2 0SR
Author for correspondence:
Catherine Saunders
Mail to: [email protected]
Word count: 295
Abstract: 335 words
Total main text: 2,621,915
ABSTRACT
Objective: To describe the accuracy of ethnicity coding in contemporary NHS hospital
records compared to the ‘gold standard’ of self-reported ethnicity.
Design: Secondary analysis of data from a cross-sectional survey (2011)
Setting: All NHS hospitals in England providing cancer treatment
Participants: 58721 patients with cancer for whom ethnicity information [Office for
National Statistics (ONS) 2001 16-group classification] was available from self-reports
(considered to represent the ‘gold standard’) and their hospital record.
Methods: We calculated the sensitivity and positive predictive value (PPV) of hospital record
ethnicity. Further, we used a logistic regression model to explore independent predictors of
discordance between recorded and self-reported ethnicity.
Results: Overall, 4.9% (4.7-5.1%) of people had their self-reported ethnic group incorrectly
recorded in their hospital records. Recorded White British ethnicity had high sensitivity
(97.8% (97.7-98.0%)) and PPV (98.1% (98.0-98.2%)) for self-reported White British ethnicity.
Recorded ethnicity information for the 15 other ethnic groups was substantially less
accurate with 41.2% (39.7-42.7%) incorrect. Recorded ‘Mixed’ ethnicity had low sensitivity
(12%-31%) and PPVs (12%-42%). Recorded ‘Indian’, ‘Chinese’, ‘Black-Caribbean’ and ‘Black
African’ ethnic groups had intermediate levels of sensitivity (65%-80%) and PPV (80%-89%).
In multivariable analysis, belonging to an ethnic minority group was the only
independent predictor of discordant ethnicity information. There was strong evidence that
the degree of discordance of ethnicity information varied substantially between different
hospitals (p<0.0001).
Discussion: Current levels of accuracy of ethnicity information in NHS hospital records
support valid profiling of White / non-White ethnic differences. However, profiling of ethnic
differences in process or outcome measures for specific minority groups may contain a
substantial and variable degree of misclassification error. These considerations should be
taken into account when interpreting ethnic variation audits based on routine data and
inform initiatives aimed at improving the accuracy of ethnicity information in hospital
records.
ARTICLE SUMMARY
Article focus
• Accurate recording of ethnicity in administrative health data is a prerequisite for efforts to
ensure equality or reduce inequalities in health care.
• This paper describes the accuracy of ethnicity coding in English NHS hospital records
compared to the ‘gold standard’ of self-reported ethnicity, and identifies areas where
improvement is needed.
Key messages
• Hospital records will usually code the ethnicity of patients with self-reported White British
ethnic group correctly, but levels of incorrect coding of the ethnicity of all ethnic minority
groups are high
• Belonging to an ethnic minority group is the only independent predictor of having an
incorrectly coded hospital record ethnicity; there is substantial variation in the quality of
coding between hospitals
• Probability look-up tables provided in this paper can be used for weighting of incidence or
prevalence estimates where hospital record ethnicity is being used, or in regression
analysis to improve estimation of ethnic variation in processes or outcomes of care
Strengths and limitations
• This was a unique opportunity to carry out an audit of the accuracy of hospital record
coding of ethnicity in England, using data from a large national survey of recently treated
patients.
• We were not able to account for different processes by which ethnicity was
ascertained in hospital records and the study population was skewed towards older
ages which may limit the generalisability of the findings.
BACKGROUND
Modern health care systems aspire to equal access and quality of care for patients of any
ethnic group 1 2. In England, there are nevertheless well-documented ethnic inequalities in
processes and experiences of care 3 4. Good practice in measuring these inequalities is
needed 5. Therefore, the availability of complete and valid information on patients’ ethnicity
in routine NHS data is a fundamental first step to enable equality audits to inform
improvement actions. Further, there are variations in the incidence and prevalence of some
conditions between different patient ethnic groups. Understanding such variations can be
vital for service planning and required capacity estimates. National Health Service hospitals
began to routinely record ethnicity information in their Patient Administration System (PAS)
records in the mid-1990s 6. (Patient Administration System records are a precursor to
Hospital Episode Statistics (HES) data – hereafter we refer to hospital records as ‘HES
records’ for simplicity, as this is the more commonly used convention to denote patient
administrative data in the NHS). Although implementation of this measure has been slow 7,
HES data in recent years have high completeness of ethnic group information (typically
exceeding 90%) 6 8. There is however little evidence about the accuracy of routinely recorded
information on patients’ ethnicity. In the past, audits of the quality of ethnicity information
in NHS records have principally focused on data completeness, rather than accuracy 6 and
completeness of ethnicity coding information currently forms part of the Commissioning
Outcomes Data Set 6, a prescribed standard for data quality that is linked to hospital
reimbursement.
Ethnicity can be classified by referring to a community of people who share the same
culture, and/or by referring to an ancestral population which comprises their self-identity 9.
Self-reported ethnicity captures both the shared experiences/culture of an individual and
their self-identity. Methods such as surname recognition algorithms or geo-coding (or
combinations of both these approaches) have been developed to indirectly infer the
ethnicity of individuals 10–15, but both have limitations that do not apply to self-reported
ethnicity. For example, geo-coding methods rely on high levels of geographical (residential)
segregation between different ethnic groups. Surname recognition methods rely on low
levels of inter-ethnic marriage and the existence of distinctive surname
nomenclatures (which do not always exist for some ethnic groups, e.g. Black populations in
the United States). Further, by definition indirect methods will misclassify the ethnicity of
some patients, and disproportionately do so for the ethnicity of people from ethnic
minorities. For all the above reasons self-report is currently considered the gold standard
measure of ethnicity 16 17.
Patient surveys, when linked to routine ethnicity data, provide opportunities to determine
the accuracy of ethnic group information contained in routine health system records. As this
linkage is rarely done, few studies have cross-examined the accuracy of ethnic group
information included in routine healthcare records against self-reported ethnicity
information. Those that have examined this have been conducted in the USA 18–20.
Against this background and using data from an English national patient survey we aimed to
examine the overall accuracy of recorded versus self-reported ethnicity; and whether
discordance between recorded and self-reported ethnicity information varied between
patients of different ethnic groups.
METHODS
Data
We used publicly available anonymous data from the 2010 Cancer Patient Experience Survey
(CPES) in England 21. An unusual feature of this survey is that hospital recorded ethnicity was
collected when compiling the sample and linked to patient responses by the survey provider.
All patients treated for cancer in an English NHS hospital during the first quarter of 2010
were invited to participate in the survey, with a response rate of 67% 21. Of 67713
respondents, 64118 (94.7%) provided valid self-reported ethnicity information. Because
self-reported ethnicity was used as the gold-standard against which the accuracy of
HES-recorded ethnicity was compared, data on respondents with missing self-reported ethnicity
were excluded from further analysis (see figure 1). Data on an additional 348 respondents
with missing information on deprivation status were also excluded from further analysis,
leaving 63,770 respondents. An additional 5049 records with missing HES-recorded ethnicity
were excluded, leaving 58721 records for analysis. For all respondents included in our
analyses completely observed data were available for patients’ HES-recorded age, gender
and socio-economic status (using Index of Multiple Deprivation (IMD) score of lower super
output area of residence). In both HES-records and patient reports, ethnicity was classified
using the same 16-group categorisation [Office for National Statistics (ONS16) 2001]
(Appendix 1).
Analysis
We first described the overall degree of discordance between HES-recorded and self-
reported information on ethnicity using the Cohen’s Kappa statistic. We further calculated
the sensitivity and the positive predictive value (PPV) of HES-recorded ethnicity in respect of
self-reported ethnicity. In this context, sensitivity denotes the proportion of patients with a
given self-reported ethnicity with concordant ethnicity information in their HES record; and
PPV denotes the probability that a patient with a particular HES-recorded ethnic group will
self-report the same ethnic group.
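
The definitions above can be illustrated with a minimal Python sketch. The analyses in this paper were run in Stata, so this is not the authors' code; the cross-tabulation `toy` and the group labels "A", "B" and "C" are hypothetical and serve only to show how sensitivity, PPV and Cohen's Kappa follow from a table of self-reported against recorded ethnicity.

```python
from typing import Dict

# counts[self_reported][recorded] = number of patients (hypothetical layout)
Counts = Dict[str, Dict[str, int]]


def sensitivity(counts: Counts, group: str) -> float:
    """Share of patients self-reporting `group` whose record also says `group`."""
    row = counts[group]
    return row.get(group, 0) / sum(row.values())


def ppv(counts: Counts, group: str) -> float:
    """Share of patients recorded as `group` who also self-report `group`."""
    recorded_total = sum(row.get(group, 0) for row in counts.values())
    return counts[group].get(group, 0) / recorded_total


def cohens_kappa(counts: Counts) -> float:
    """Agreement beyond chance between self-reported and recorded group."""
    n = sum(sum(row.values()) for row in counts.values())
    groups = set(counts) | {g for row in counts.values() for g in row}
    observed = sum(counts.get(g, {}).get(g, 0) for g in groups) / n
    expected = sum(
        (sum(counts.get(g, {}).values()) / n)                  # self-reported margin
        * (sum(row.get(g, 0) for row in counts.values()) / n)  # recorded margin
        for g in groups
    )
    return (observed - expected) / (1 - expected)


# Hypothetical counts, not taken from the survey.
toy: Counts = {
    "A": {"A": 90, "B": 5, "C": 5},
    "B": {"A": 10, "B": 30, "C": 0},
    "C": {"A": 5, "B": 5, "C": 10},
}

print(sensitivity(toy, "B"), ppv(toy, "B"), cohens_kappa(toy))
```

Sensitivity conditions on the self-reported row and PPV on the recorded column, which is why a group such as ‘White Irish’ can have low sensitivity and high PPV at the same time.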
Subsequently, we explored independent predictors of discordant ethnicity information by
constructing a multivariable logistic regression model with concordant/discordant ethnicity
status as the binary outcome variable and self-reported ethnicity as a covariate. Adjustment
was also made for other patient socio-demographic characteristics (age in 10-year age
groups, gender and postcode-linked area-based deprivation). For age
and gender we used HES-recorded information because the completeness of these variables
among survey respondents was higher than self-reported age and gender, and as the degree
of concordance between HES-recorded and self-reported age (based on year of birth) and
gender were very high (>99.5% for both).
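
As a point of orientation for the regression output, the crude (unadjusted) odds ratios in Table 1 can be recovered directly from the counts in that table. The Python sketch below is illustrative only: the function name `crude_odds_ratio` is hypothetical, and the sketch does not reproduce the adjusted model or the confidence intervals.

```python
def crude_odds_ratio(disc_group: int, total_group: int,
                     disc_ref: int, total_ref: int) -> float:
    """Odds of discordance in a group divided by the odds in the reference group."""
    odds_group = disc_group / (total_group - disc_group)
    odds_ref = disc_ref / (total_ref - disc_ref)
    return odds_group / odds_ref


# Counts from Table 1.
# Reference group, British (White): 1,177 discordant of 54,589 respondents.
# Irish (White): 490 discordant of 938 respondents.
or_irish = crude_odds_ratio(490, 938, 1177, 54589)

# Indian (Asian or Asian British): 101 discordant of 516 respondents.
or_indian = crude_odds_ratio(101, 516, 1177, 54589)

print(round(or_irish, 2), round(or_indian, 2))
```

Both values should come out close to the crude odds ratios reported for these groups in Table 1.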
To explore whether clustering of patients in some groups in hospitals with higher or lower
discordance levels could in part explain the findings, we subsequently repeated the
regression model described above including a random effect for hospital – in addition, this
model allows us to explore the degree of variation in the level of ethnicity information
discordance between different hospitals. Because less detailed classifications of ethnicity are
often used in health research, we also carried out supplementary analysis looking at
discordance when using 6 (as opposed to 16) ethnic groups (i.e. White, Mixed, Asian or Asian
British, Black or Black British, Chinese and Other).
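
The collapse from 16 to 6 groups can be sketched as a simple mapping. This is a minimal Python illustration, not the authors' code: the dictionary name `ONS16_TO_ONS6` and the helper `is_discordant` are hypothetical, and the group labels follow those used in the tables of this paper.

```python
# Mapping from the 16-group labels to the 6 broader groups.
ONS16_TO_ONS6 = {
    "British (White)": "White",
    "Irish (White)": "White",
    "Any other White background": "White",
    "White and Black Caribbean (Mixed)": "Mixed",
    "White and Black African (Mixed)": "Mixed",
    "White and Asian (Mixed)": "Mixed",
    "Any other Mixed background": "Mixed",
    "Indian (Asian or Asian British)": "Asian or Asian British",
    "Pakistani (Asian or Asian British)": "Asian or Asian British",
    "Bangladeshi (Asian or Asian British)": "Asian or Asian British",
    "Any other Asian background": "Asian or Asian British",
    "Caribbean (Black or Black British)": "Black or Black British",
    "African (Black or Black British)": "Black or Black British",
    "Any other Black background": "Black or Black British",
    "Chinese (other ethnic group)": "Chinese",
    "Any other ethnic group": "Other",
}


def is_discordant(self_reported: str, recorded: str, collapse: bool = False) -> bool:
    """True if the two labels disagree, optionally after collapsing to 6 groups."""
    if collapse:
        self_reported = ONS16_TO_ONS6[self_reported]
        recorded = ONS16_TO_ONS6[recorded]
    return self_reported != recorded


# A self-reported Caribbean patient recorded as African is discordant under the
# 16-group classification but concordant under the 6-group classification.
print(is_discordant("Caribbean (Black or Black British)",
                    "African (Black or Black British)"))
print(is_discordant("Caribbean (Black or Black British)",
                    "African (Black or Black British)", collapse=True))
```

Discordance under the 6-group classification can therefore never exceed discordance under the 16-group classification for the same records.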
Lastly, we constructed a probability ‘look-up’ table indicating the probability that HES-
recorded ethnicity represents true (self-reported) ethnicity for each of the 16 different
ethnic groups. These probabilities can be used for weighting of incidence or prevalence
estimates or used in regression analysis to improve estimation of ethnic variation in
processes or outcomes of care 14 22. For the calculation of the probability ‘look-up’ table only,
we combined data from two surveys (2010 and 2011/12) to improve the precision of the
probabilities presented (increasing our sample size to 133,204). STATA 11 was used for all
analyses.
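
One way the look-up probabilities might be used for weighting is sketched below in Python. This is an assumption about a simple application, not the method of references 14 or 22: the probabilities are the non-zero entries of rows A and B of Table 3, while the recorded counts and the names `LOOKUP` and `expected_self_reported` are hypothetical.

```python
from collections import defaultdict
from typing import Dict

# Non-zero entries (%) of two rows of Table 3, keyed by HES code.
LOOKUP: Dict[str, Dict[str, float]] = {
    "A": {  # recorded as British (White)
        "British (White)": 98.1,
        "Irish (White)": 0.9,
        "Any other White background": 0.7,
        "Caribbean (Black or Black British)": 0.1,
    },
    "B": {  # recorded as Irish (White)
        "British (White)": 20.1,
        "Irish (White)": 78.7,
        "Any other White background": 1.0,
        "White and Asian (Mixed)": 0.2,
    },
}


def expected_self_reported(recorded_counts: Dict[str, int]) -> Dict[str, float]:
    """Spread each recorded count across self-reported groups using the look-up rows."""
    expected: Dict[str, float] = defaultdict(float)
    for code, n in recorded_counts.items():
        for group, pct in LOOKUP[code].items():
            expected[group] += n * pct / 100.0
    return dict(expected)


# Hypothetical numbers of records carrying each HES code.
estimate = expected_self_reported({"A": 1000, "B": 100})
for group, value in sorted(estimate.items()):
    print(f"{group}: {value:.1f}")
```

Because published rows are rounded to one decimal place, the expected counts may not sum exactly to the number of records supplied.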
RESULTS
Overall the frequency of discordance of HES-recorded and self-reported ethnicity
information was 4.9% (4.7-5.1%). Patients who identified themselves as White British had
the lowest frequency of discordant ethnicity information in their HES records (2.2% (2.0-
2.3%)). In contrast, patients who identified themselves as belonging to any other ethnic
group had substantially higher frequency of discordant HES-recorded ethnicity (41.2% (39.7-
42.7%)). Cohen’s Kappa for HES-recorded and self-reported ethnicity was 0.64 overall and
0.54 if White British patients were excluded. The frequency of discordance was particularly
high for patients who self-reported that they belonged to the Any Other Black Background
(90% (97.4-100%)) and the Any Other Mixed Background (87.8% (78.2-97.3%)) groups (Table
1).
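
The headline discordance frequencies can be checked against the counts in Table 1 with a few lines of Python. This is an illustrative sketch only; the variable names are hypothetical and the confidence intervals are not recomputed.

```python
# Counts from Table 1.
total_n = 58721          # all respondents analysed
total_discordant = 2878  # men (1,268) plus women (1,610) with discordant records
wb_n = 54589             # self-reported British (White)
wb_discordant = 1177

overall_pct = 100 * total_discordant / total_n
white_british_pct = 100 * wb_discordant / wb_n
# Everyone who did not self-report British (White).
other_pct = 100 * (total_discordant - wb_discordant) / (total_n - wb_n)

print(round(overall_pct, 1), round(white_british_pct, 1), round(other_pct, 1))
```

The three percentages correspond to the overall, White British and all-other-groups discordance frequencies quoted in this paragraph.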
HES-recorded ‘White British’ ethnicity had high sensitivity (97.8% (97.7-98.0%)) and PPV (98.1% (98.0-98.2%))
for self-reported White British ethnicity (Figure 2, estimates and confidence intervals in
appendix table 2 for the 6 and 16 group classification). In contrast, HES-recorded ‘Mixed’
ethnicity had very low sensitivities (12%-31%) and PPVs (12%-42%). HES-recorded ‘Indian’,
‘Pakistani’, ‘Bangladeshi’, ‘Chinese’, ‘Black-Caribbean’ and ‘Black African’ ethnicity had
intermediate levels of sensitivity (65%-80%) and PPV (80%-89%). HES-recorded
‘White Irish’ ethnicity had low sensitivity (47.8% (44.5-51.0%)) but high PPV (81.5% (77.9-84.6%)).
This means that of all individuals who self-identify as White Irish only
48% would have their ethnic group recorded as such in their HES records; however, among
patients whose HES records indicate that they are ‘White Irish’, 82% would identify
themselves as belonging to this group.
Whilst crudely there was some evidence that age, gender and deprivation are associated
with discordance of ethnicity information, in multivariable logistic regression analysis
adjusting for other patient characteristics the sole independent predictor of discordance was
self-reported ethnicity (Table 1). Repeating this model with a random effect for hospital
produced only trivial differences to associations of discordance with patient characteristics.
This means that the association between discordance and self-reported ethnic minority
group cannot be explained by ethnic minority patients attending hospitals that have overall
poor levels of accuracy of ethnicity information. There was however strong evidence
(p<0.0001) of variation in discordance between different hospitals (accuracy of coding of
ethnicity across hospitals ranged from 67-100%). Specifically, if all hospitals were to be
arranged in order of frequency of discordance of ethnicity information among their HES
records, and after accounting for differences in the proportion of ethnic minority patients
attending the hospital, the odds ratio of discordance between the hospitals at the 97.5th and
2.5th centiles (the 95% reference range) would be about 13, indicating a 13-fold difference in
the odds of discordance across hospitals. After accounting for differences in the proportion
of ethnic minority patients attending each hospital, the hospital at the bottom 2.5th centile of
coding accuracy had concordant self-reported and hospital-recorded ethnicity codes
in 90% of records.
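
The 95% reference range odds ratio can be related to the spread of the hospital random effect under one common assumption, which the paper does not state explicitly: that hospital effects are normally distributed on the log-odds scale with standard deviation sigma. Under that assumption the odds ratio between the 97.5th and 2.5th centiles is exp(2 × 1.96 × sigma). The Python sketch below uses hypothetical function names and the adjusted value of 12.85 from Table 1.

```python
import math

Z_975 = 1.959964  # 97.5th centile of the standard normal distribution


def reference_range_or(sigma: float) -> float:
    """Odds ratio between the 97.5th and 2.5th centile hospitals (assumed normal effects)."""
    return math.exp(2 * Z_975 * sigma)


def sigma_from_reference_range_or(odds_ratio: float) -> float:
    """Inverse of reference_range_or: random-effect SD implied by a reference range OR."""
    return math.log(odds_ratio) / (2 * Z_975)


# Adjusted hospital reference range OR from Table 1.
sigma_adjusted = sigma_from_reference_range_or(12.85)
print(round(sigma_adjusted, 2))
```

If the assumption holds, an odds ratio of about 13 corresponds to a between-hospital standard deviation of roughly two-thirds of a unit on the log-odds scale.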
Similar findings to those observed in the main analysis were observed when using a 6-group
classification (Table 2). The total number of respondents with discordant ethnicity decreased
when using a 6-group classification to 796 subjects (1.4%) from 2878 (4.9%), indicating that
much of the discordance was within the cruder 6-group classification. However, self-reporting
a non-White ethnic background remained as strongly associated with having an incorrectly
coded hospital-record ethnic group as for the 16-group classification. Details of the ethnic
groups to which people from
Formatted: Superscript
Page 39 of 68
For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml
BMJ Open
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960
on August 25, 2021 by guest. P
rotected by copyright.http://bm
jopen.bmj.com
/B
MJ O
pen: first published as 10.1136/bmjopen-2013-002882 on 28 June 2013. D
ownloaded from
For peer review only
13
each self-reported ethnic background were misclassified to are given in table 3, showing the
variation between and within the 6 and 16 group coding.
People who self-report mixed ethnic backgrounds are particularly likely to have an
incorrectly coded ethnic group in their hospital records, with discordance high under both the
16-group (79.9%) and 6-group (74.9%) classifications. Table 3 shows that the self-reported
ethnicity falls into one of three groups (the concordant Mixed category, White (either
British or Other White) or the corresponding ethnic minority category) over 90% of the time,
explaining why for this group the broader 6-category classification does not improve the
accuracy of hospital record ethnicity coding.
Finally, we provide probability tables (Table 3) that can be used in estimating
ethnic group variations in incidence, prevalence or measures of care quality when using HES
data. Examples of such applications in the context of US health research have been reported
previously 14 22.
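One way such probability tables might be applied is to redistribute counts observed by HES code across self-reported groups. The sketch below uses three rows of Table 3, restricted to three self-reported groups for brevity. The HES counts are hypothetical, the function name is illustrative, and this is not necessarily the estimation approach of the cited US studies.

```python
# P(self-reported group | HES code), in %, copied from Table 3.
# Only three self-reported groups are kept here, so rows do not sum to 100;
# the remainder belongs to groups omitted from this sketch.
PROB_SELF_GIVEN_HES = {
    "A": {"British (White)": 98.1, "Indian": 0.0, "Pakistani": 0.0},
    "H": {"British (White)": 1.9, "Indian": 88.4, "Pakistani": 2.8},
    "J": {"British (White)": 2.3, "Indian": 5.9, "Pakistani": 87.7},
}


def expected_self_reported_counts(hes_counts):
    """Turn counts by HES code into expected counts by self-reported group."""
    expected = {}
    for hes_code, n in hes_counts.items():
        for group, pct in PROB_SELF_GIVEN_HES[hes_code].items():
            expected[group] = expected.get(group, 0.0) + n * pct / 100.0
    return expected


# Hypothetical numbers of records by HES code (not taken from the study)
hes_counts = {"A": 1000, "H": 100, "J": 50}
expected = expected_self_reported_counts(hes_counts)
for group, value in expected.items():
    print(f"{group}: {value:.2f}")
```

A full application would use all 16 self-reported groups, so that each HES row contributes its entire count.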
DISCUSSION
Using data from a recent national survey of hospital patients with cancer we explored the
accuracy of ethnicity information in HES data. We found that overall the level of discordance
of ethnicity information between HES records and patient self-reports is low, particularly for
the majority White British patients. There is, however, a substantial degree of inaccuracy in
the recorded ethnicity of patients who self-report as belonging to ethnic
minority groups. For many major ethnic groups (‘Indian’, ‘Pakistani’, ‘Bangladeshi’, ‘Chinese’,
‘Black-Caribbean’ and ‘Black African’) routine hospital data will mis-code between 20% and
35% of all patients who self-report that they belong to these ethnic groups (sensitivity
65%-80%). Further, up to 20% of patients recorded as belonging to some major ethnic groups will
self-report that they actually belong to other ethnic groups (PPV 80%-89%). For
patients who self-report being of mixed ethnic groups, HES records are usually discordant.
We provide probability tables that can be used in re-estimating ethnic group variations when
investigating ethnic variation in processes or quality of care from UK hospital records in
order to improve precision of such estimates.
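The sensitivity range quoted above can be checked against the counts in Table 1, since sensitivity for a self-reported group is the proportion of its respondents whose hospital record is concordant. A minimal sketch follows; the counts are copied from Table 1 and the variable and function names are illustrative.

```python
# (total respondents, number discordant) by self-reported group, from Table 1
TABLE1_COUNTS = {
    "Indian": (516, 101),
    "Pakistani": (206, 46),
    "Bangladeshi": (50, 13),
    "Chinese": (124, 30),
    "Caribbean": (471, 136),
    "African": (319, 111),
}


def sensitivity_pct(total: int, discordant: int) -> float:
    """Percentage of self-reported group members with a concordant hospital record."""
    return 100.0 * (total - discordant) / total


sensitivities = {
    group: round(sensitivity_pct(total, discordant), 1)
    for group, (total, discordant) in TABLE1_COUNTS.items()
}
for group, value in sensitivities.items():
    print(f"{group}: {value}%")
```

The resulting values run from about 65% (African) to about 80% (Indian), in line with the sensitivities listed in Appendix table 2.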
The study used a unique opportunity provided by patient survey data to explore the
accuracy of ethnicity information in UK routine health care data. Previous evidence on the
accuracy of ethnicity information in administrative datasets relates to US settings 18–20. Our
findings concord with previous US literature, which also indicates that the accuracy of
ethnicity information tends to be highest for the majority white population, lower for major
ethnic minority groups (like African American or Hispanic) and lowest for smaller ethnic
groups (such as American Indian/Alaskan Natives) and Mixed or Other racial/ethnic groups 18
19. Other strengths of the study include its large sample size, enabling the profiling of
discordance for small ethnic groups; and the use of regression analyses to explore
independent predictors of discordance and to examine variation between different
hospitals.
A limitation is that the study population included patients who attended hospital for cancer
treatment (most of whom were aged 65 years or older). Therefore, in principle, the
generalisability of the findings (particularly regarding younger patients) might be limited;
equally, the need for accurate recording of ethnicity among cancer patients has been
identified 23. We were also unable to explore potential inaccuracies in ethnicity
categorisation resulting from longitudinal person-level discordance among patients with
more than one hospital care episode. Previous research indicates that up to 3% of patients
with more than one episode of care have longitudinally discordant ethnic group information
24. Evolving socio-cultural trends, or changes in Census methodology (such as the
introduction of Mixed ethnic groups to the UK census in 2001), could contribute to changes
in a person’s self-identified ethnic group over their lifetime. Lastly, we were not able to
account for the process by which ethnicity was ascertained in hospital records. It is likely
that this process involves a combination of self-reports, reports by relatives or carers (e.g. in
the case of infirm patients, or patients with language or other communication difficulties) or
even guesswork by hospital staff (e.g. when there are language barriers, or in clinical
emergencies) 16.
The 2001 Census expanded the previous ONS classification of ethnicity (originally
containing 10 groups) to the 16-group classification also used in this survey 2, a change
which was subsequently reflected in HES coding. The impact of this secular change may
explain some of the discordance observed. For example, the discordance in self-reported
Irish White ethnicity may reflect the fact that this ethnic group was only included in the 2001
classification. The 2011 Census included two new ethnic groups: ‘Gypsy or Irish Traveller’
and ‘Arab’ 25. It remains to be seen whether this change will be reflected in routinely collected
health data.
Whilst this study considers the accuracy of recorded ethnicity, there are issues with defining
ethnicity using any simple classification, for example the potential for ‘concealed
heterogeneity’ within each of the ethnic groups 16. It is possible, for example, that within the
Indian ethnic group there are certain social, linguistic or religious subgroups which are more
or less likely to have a discordant ethnicity. Indeed, evidence indicates that ethnic minority
patients with discordant ethnicity information may be systematically different from other
patients from the same ethnic group with concordant ethnicity 20. Within-ethnic group
heterogeneity is a complex issue inherent in any type of ethnicity research, and the nature of
our study did not allow us to address this.
Another issue we have not addressed in this paper is the completeness of ethnicity
information, which has the potential to bias estimates of ethnic variation in addition to the
misclassification detailed here. We performed analyses similar to those discussed here for
predictors of missing ethnicity (not shown). We found only small variations by self-reported
ethnic group but a much larger degree of variation between hospitals than that seen for
discordance, implying that missing ethnicity is primarily driven by hospital-level
processes.
Our findings have implications for policy and future research. First, HES data currently
have a high level of completeness of ethnic information, and if the aim of an audit is to
compare outcomes between White and non-White groups the current classification system
will perform well. However, a degree of caution is required when interpreting more detailed
evidence on ethnic inequalities in care quality or disease incidence and prevalence that is
based solely on HES data, particularly for minority ethnic groups found to have higher
discordance rates. Misclassification of ethnicity in HES data could result in either
under-estimation of ethnic variation, or inability to detect such variation when it exists
(‘type 2’ error).
Second, we provide a list of probabilities that can be used to improve estimates of ethnic
variation in health care in a UK setting 14 22. These probabilities can be used both to improve
the precision of prevalence estimates and in statistical models where hospital record ethnic
group is a predictor 14.
Third, as the completeness of ethnicity information in hospital records is currently high,
more attention needs to be given to the accuracy of recorded information. The substantial
variation between hospitals in the accuracy of ethnicity information indicates that there is
great potential for improving the quality of ethnicity information in poorly performing
hospitals. Improvement in the quality of HES data (which is generally desirable 26) should
also encompass improvements in the quality of ethnicity coding. Qualitative studies have
found willingness among ethnic minority groups to provide this information 27, and future
research should explore optimal ways of efficiently obtaining current self-reported
information on ethnicity in patient records.
Contributors: All authors had substantial input into the study concept, design, analysis and
access to the data. GL had the original idea for the study which was further conceptualised in
discussions between all authors; methods were principally developed by CLS and further
amplified by GAA. All authors communicated frequently during the course of the project,
including through face-to-face meetings. CLS and GL wrote the first draft which was edited by
all authors over multiple versions. All authors saw and approved the final manuscript.
Competing interests: None.
Acknowledgement: We thank the UK Data Archive for access to the anonymous survey data
(UKDA study number: 69488), the Department of Health as the depositor and principal
investigator of the Cancer Patient Experience Survey 2010, Quality Health as data collector;
and all NHS Acute Trusts in England, for provision of data samples.
Ethics approval: Not required.
Funding statement: The paper is independent research arising from a Post-Doctoral
Fellowship award to GL supported by the National Institute for Health Research (PDF-2011-04-
047). The views expressed in this publication are those of the authors and not necessarily
those of the NHS, the National Institute for Health Research or the Department of Health.
Data sharing: Not applicable (the data are available for academic research from the UK Data
Archive – see acknowledgement).
STROBE: Checklist for an observational study: Attached separately.
REFERENCES
1 US Department of Health and Human Services Agency for Healthcare Research and Quality. 2009 National
Healthcare Disparities and Quality Reports. 2009. http://www.ahrq.gov/qual/qrdr09.htm (accessed 29
Nov 2012).
2 Department of Health. Equity and excellence: liberating the NHS (White Paper). 2010.
http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_117353
(accessed 29 Nov 2012).
3 Cancer Incidence and Survival By Major Ethnic Group, England, 2002 - 2006. 2009.
http://www.ncin.org.uk/publications/reports/default.aspx
4 Mindell J, Klodawski E, Fitzpatrick J. Using routine data to measure ethnic differentials in access to coronary
revascularization. J Public Health 2008;30:45–53.
5 Mir G, Salway S, Kai J, et al. Principles for research on ethnicity and health: the Leeds Consensus Statement.
Eur J Public Health 2012;:cks028–.
6 The NHS Information Centre. How good is HES ethnic coding and where do the problems lie? 2011.
http://www.hesonline.nhs.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=171 (accessed 7
Feb 2013).
7 Szczepura A. Access to health care for ethnic minority populations. Postgrad Med J 2005;81:141–7.
8 Jack RH, Linklater KM, Hofman D, et al. Ethnicity coding in a regional cancer registry and in Hospital Episode
Statistics. BMC Public Health 2006;6:281.
9 Isajiw WW. Definition and Dimensions of Ethnicity: a theoretical framework. In: Challenges of Measuring an
Ethnic World: Science, politics and reality: Proceedings of the Joint Canada-United States Conference on the
Measurement of Ethnicity April 1-3, 1992. Washington, D.C.: U.S. Government Printing Office 407–27.
10 Lyratzopoulos G, McElduff P, Heller RF, et al. Comparative levels and time trends in blood pressure, total
cholesterol, body mass index and smoking among Caucasian and South-Asian participants of a UK primary-
care based cardiovascular risk factor screening programme. BMC Public Health 2005;5:125.
11 Shah BR, Chiu M, Amin S, et al. Surname lists to identify South Asian and Chinese ethnicity from secondary
data in Ontario, Canada: a validation study. BMC Medical Research Methodology 2010;10:42.
12 Maringe C, Mangtani P, Rachet B, et al. Cancer incidence in South Asian migrants to England, 1986-2004:
Unraveling ethnic from socioeconomic differentials. Int J Cancer 2013;132:1886–94.
13 Quan H, Wang F, Schopflocher D, et al. Development and validation of a surname list to define Chinese
ethnicity. Medical Care 2006;44:328–33.
14 Elliott MN, Fremont A, Morrison PA, et al. A new method for estimating race/ethnicity and associated
disparities where administrative records lack self-reported race/ethnicity. Health Serv Res
2008;43:1722–36.
15 Nitsch D, Kadalayil L, Mangtani P, et al. Validation and utility of a computerized South Asian names and group
recognition algorithm in ascertaining South Asian ethnicity in the national renal registry. QJM: monthly
journal of the Association of Physicians 2009;102:865–72.
16 Aspinall PJ. The utility and validity for public health of ethnicity categorization in the 1991, 2001 and 2011
British Censuses. Public Health 2011;125:680–7.
17 Mays VM, Ponce NA, Washington DL, et al. Classification of race and ethnicity: implications for public health.
Annual Review of Public Health 2003;24:83–110.
18 Arday SL, Arday DR, Monroe S, et al. HCFA’s racial and ethnic data: current accuracy and recent
improvements. Health Care Financing Review 2000;21:107–16.
19 Lee LM, Lehman JS, Bindman AB, et al. Validation of race/ethnicity and transmission mode in the US HIV/AIDS
reporting system. American Journal of Public Health 2003;93:914–7.
20 Zaslavsky AM, Ayanian JZ, Zaborski LB. The validity of race and ethnicity in enrollment data for Medicare
beneficiaries. Health Services Research 2012;47:1300–21.
21 Department of Health. National Cancer Patient Experience Survey 2010 [computer file]. UKDA-SN-6742-1.
Colchester, Essex: UK Data Archive [distributor]. http://dx.doi.org/10.5255/UKDA-SN-6742-1
22 McCaffrey DF, Elliott MN. Power of tests for a dichotomous independent variable measured with error.
Health Services Research 2008;43:1085–101.
23 Iqbal G, Gumber A, Szczepura A, et al. Improving ethnic data collection for statistics of cancer incidence,
management, mortality and survival in the UK.
http://www2.warwick.ac.uk/fac/med/research/csri/ethnicityhealth/research/crc.pdf (accessed 25 Apr 2013).
24 Georghiou T, Thorlby R. Quality of ethnicity coding in Hospital Episode Statistics: beyond completeness. Paper
presented at Healthcare Commission seminar “Measuring what matters: measuring ethnicity in health data”
12/02/2007. 2007.
25 Office of National Statistics. 2011 Census.
http://www.ons.gov.uk/ons/guide-method/census/2011/index.html (accessed 8 Mar 2013).
26 Spencer SA, Davies MP. Hospital episode statistics: improving the quality and value of hospital data: a
national internet e-survey of hospital consultants. BMJ Open 2012;2. doi:10.1136/bmjopen-2012-001651
27 Iqbal G, Johnson MR, Szczepura A, et al. UK ethnicity data collection for healthcare statistics: the South Asian
perspective. BMC Public Health 2012;12:243.
Figure 1. Survey responders and exclusions
[Flowchart: 101773 surveys sent, 67% response rate. Counts shown at successive stages: 67713,
64118, 63770 and 58721. Exclusions shown: patients with missing self-reported ethnicity;
patients with missing age, gender and Index of Multiple Deprivation (IMD) score; patients with
missing HES recorded ethnicity.]
Characteristic | Total respondents n | % | Total discordant n | % | Crude OR of discordance | joint p-value | Adjusted OR | joint p-value
Age
16-24 | 399 | 0.7 | 30 | 7.5 | 1.77 (1.28-2.46) | <0.0001 | 0.9 (0.55-1.48) | 0.54
25-34 | 950 | 1.6 | 93 | 9.8 | 2.37 (1.86-3.01) | | 0.94 (0.71-1.26) |
35-44 | 3,139 | 5.3 | 231 | 7.4 | 1.73 (1.51-1.99) | | 1.02 (0.84-1.23) |
45-54 | 7,763 | 13.2 | 484 | 6.2 | 1.45 (1.3-1.61) | | 1.12 (0.97-1.29) |
55-64 | 15,375 | 26.2 | 743 | 4.8 | 1.11 (0.99-1.23) | | 1.11 (0.98-1.25) |
65-74 | 18,366 | 31.3 | 805 | 4.4 | reference | | reference |
75-84 | 10,747 | 18.3 | 412 | 3.8 | 0.87 (0.76-0.99) | | 0.99 (0.86-1.14) |
85+ | 1,982 | 3.4 | 80 | 4.0 | 0.92 (0.71-1.18) | | 1.08 (0.83-1.42) |
Gender
Men | 27,395 | 46.7 | 1,268 | 4.6 | reference | 0.014 | reference | 0.69
Women | 31,326 | 53.3 | 1,610 | 5.1 | 1.12 (1.02-1.22) | | 1.02 (0.93-1.12) |
Deprivation
Lowest | 13,301 | 22.7 | 488 | 3.7 | reference | <0.0001 | reference | 0.69
2 | 13,432 | 22.9 | 509 | 3.8 | 1.03 (0.91-1.18) | | 1.04 (0.9-1.2) |
3 | 12,498 | 21.3 | 559 | 4.5 | 1.23 (1.05-1.44) | | 0.99 (0.85-1.14) |
4 | 10,756 | 18.3 | 652 | 6.1 | 1.69 (1.36-2.11) | | 1.03 (0.89-1.2) |
Highest | 8,734 | 14.9 | 670 | 7.7 | 2.18 (1.68-2.84) | | 0.94 (0.8-1.11) |
Ethnic Group
British (White) | 54,589 | 93.0 | 1,177 | 2.2 | reference | <0.0001 | reference | <0.0001
Irish (White) | 938 | 1.6 | 490 | 52.2 | 49.63 (32.72-75.28) | | 47.73 (41-55.55) |
Any other White background | 1,066 | 1.8 | 485 | 45.5 | 37.88 (24.95-57.52) | | 36.49 (31.52-42.25) |
White and Black Caribbean (Mixed) | 72 | 0.1 | 50 | 69.4 | 103.14 (55.73-190.86) | | 109.97 (64.65-187.04) |
White and Black African (Mixed) | 41 | 0.1 | 33 | 80.5 | 187.19 (90.23-388.35) | | 202.59 (90.28-454.64) |
White and Asian (Mixed) | 77 | 0.1 | 65 | 84.4 | 245.81 (125.89-479.94) | | 283.62 (149.43-538.33) |
Any other Mixed background | 49 | 0.1 | 43 | 87.8 | 325.22 (124.17-851.84) | | 399.68 (166.26-960.79) |
Indian (Asian or Asian British) | 516 | 0.9 | 101 | 19.6 | 11.04 (7.16-17.04) | | 8.06 (6.33-10.27) |
Pakistani (Asian or Asian British) | 206 | 0.4 | 46 | 22.3 | 13.05 (8.39-20.28) | | 10.26 (7.19-14.66) |
Bangladeshi (Asian or Asian British) | 50 | 0.1 | 13 | 26.0 | 15.94 (7.44-34.18) | | 12.02 (6.17-23.41) |
Any other Asian background | 129 | 0.2 | 57 | 44.2 | 35.93 (20.28-63.63) | | 30.42 (20.94-44.2) |
Caribbean (Black or Black British) | 471 | 0.8 | 136 | 28.9 | 18.42 (12.64-26.85) | | 10.37 (8.21-13.1) |
African (Black or Black British) | 319 | 0.5 | 111 | 34.8 | 24.22 (16.97-34.56) | | 15.54 (11.9-20.28) |
Any other Black background | 10 | 0.0 | 9 | 90.0 | 408.42 (49.3-3383.53) | | 410.23 (49.73-3384.32) |
Chinese (other ethnic group) | 124 | 0.2 | 30 | 24.2 | 14.48 (8.77-23.92) | | 10.66 (6.86-16.57) |
Any other ethnic group | 64 | 0.1 | 32 | 50.0 | 45.38 (27.55-74.75) | | 34.04 (20.05-57.79) |
Hospital
OR 95% Reference Range | 158 | - | | | 30.48 | <0.0001 | 12.85 | <0.0001
Table 1. Crude and adjusted predictors of discordant hospital record ethnicity coding
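The crude odds ratios in Table 1 can be reproduced from the respondent and discordance counts in the same table. The sketch below does this for two groups against the White British reference. It gives point estimates only: the method behind the reported confidence intervals is not described here, so no interval is attempted. Function and variable names are illustrative.

```python
def odds(events: int, total: int) -> float:
    """Odds of the event given the number of events and the group size."""
    return events / (total - events)


def crude_odds_ratio(group, reference) -> float:
    """Crude OR of discordance; each argument is (total respondents, number discordant)."""
    return odds(group[1], group[0]) / odds(reference[1], reference[0])


# Counts copied from Table 1
white_british = (54589, 1177)
irish = (938, 490)
indian = (516, 101)

or_irish = crude_odds_ratio(irish, white_british)
or_indian = crude_odds_ratio(indian, white_british)
print(f"Irish (White) vs British (White): {or_irish:.2f}")
print(f"Indian vs British (White): {or_indian:.2f}")
```

These values agree with the crude odds ratios of 49.63 and 11.04 reported in Table 1.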
Characteristic | Total n | % | N discordant (ONS6) | % | Crude OR of discordance | joint p-value | Adjusted OR of discordance | joint p-value
Age
16-24 | 399 | 0.7 | 17 | 4.3 | 4.15 (2.47-6.96) | <0.0001 | 0.87 (0.43-1.79) | 0.10
25-34 | 950 | 1.6 | 37 | 3.9 | 3.78 (2.63-5.42) | | 1.23 (0.77-1.94) |
35-44 | 3,139 | 5.3 | 83 | 2.6 | 2.53 (1.99-3.22) | | 1.32 (0.96-1.82) |
45-54 | 7,763 | 13.2 | 166 | 2.1 | 2.04 (1.68-2.46) | | 1.26 (0.98-1.62) |
55-64 | 15,375 | 26.2 | 200 | 1.3 | 1.23 (1.05-1.44) | | 1.16 (0.93-1.46) |
65-74 | 18,366 | 31.3 | 195 | 1.1 | reference | | reference |
75-84 | 10,747 | 18.3 | 86 | 0.8 | 0.75 (0.58-0.98) | | 0.84 (0.63-1.12) |
85+ | 1,982 | 3.4 | 12 | 0.6 | 0.57 (0.3-1.09) | | 0.73 (0.39-1.39) |
Gender
Men | 27,395 | 46.7 | 321 | 1.2 | reference | 0.0003 | reference | 0.49
Women | 31,326 | 53.3 | 475 | 1.5 | 1.3 (1.13-1.49) | | 1.06 (0.9-1.26) |
Deprivation
Lowest | 13,301 | 22.7 | 119 | 0.9 | reference | <0.0001 | reference | 0.30
2 | 13,432 | 22.9 | 120 | 0.9 | 1 (0.75-1.34) | | 1.13 (0.84-1.52) |
3 | 12,498 | 21.3 | 143 | 1.1 | 1.28 (0.92-1.78) | | 1.23 (0.92-1.64) |
4 | 10,756 | 18.3 | 191 | 1.8 | 2 (1.44-2.79) | | 1.36 (1.02-1.8) |
Highest | 8,734 | 14.9 | 223 | 2.6 | 2.9 (2.14-3.93) | | 1.27 (0.94-1.7) |
Ethnic Group
White | 56,593 | 96.4 | 332 | 0.6 | baseline | <0.0001 | baseline | <0.0001
Mixed | 239 | 0.4 | 179 | 74.9 | 505.56 (333.91-765.45) | | 564.17 (399.47-796.78) |
Asian | 901 | 1.5 | 100 | 11.1 | 21.16 (14.5-30.88) | | 14.3 (10.97-18.65) |
Black | 800 | 1.4 | 123 | 15.4 | 30.79 (19.65-48.23) | | 17.51 (13.35-22.96) |
Chinese | 124 | 0.2 | 30 | 24.2 | 54.08 (32.35-90.42) | | 38.62 (24.39-61.14) |
Other | 64 | 0.1 | 32 | 50.0 | 169.46 (103.63-277.12) | | 109.64 (63.49-189.34) |
Hospital
OR 95% Reference Range | 158 | - | | | 73.97 | <0.0001 | 22.96 | <0.0001
Table 2. Crude and adjusted predictors of discordant hospital record ethnicity coding (ONS 6-group classification)
[Figure 2: plot of sensitivity (%) on the vertical axis against positive predictive value (%) on the
horizontal axis, both scaled 0 to 100, with one labelled point per ethnic group: White British,
Irish, Other White, White & Black Caribbean, White & Black African, White & Asian, Other Mixed,
Indian, Pakistani, Bangladeshi, Other Asian, Caribbean, African, Other Black, Chinese, Other.]
Figure 2. Sensitivity* and positive predictive value** of hospital record recorded ethnicity compared with
self-reported ethnicity as a gold-standard
*If a patient self-reports that they belong to a particular ethnic group, then the sensitivity of the hospital record
ethnicity coding is the probability that the hospital record will record the same (correct) ethnicity
**If a patient’s hospital record states that they belong to a particular ethnic group, then the positive predictive value of
the hospital record ethnicity code is the probability that this code has been recorded correctly and that the patient will
self-report the same ethnicity
Probability (%) of true self-reported ethnic group for each HES code
Rows are HES codes. Columns are self-reported ethnic groups, given in the same order as the rows
and labelled with the letter of the corresponding HES code (A = British (White) through
S = Any other ethnic group).
HES code | A | B | C | D | E | F | G | H | J | K | L | M | N | P | R | S
A = British (White) | 98.1 | 0.9 | 0.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0
B = Irish (White) | 20.1 | 78.7 | 1.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
C = Any other White background | 53.1 | 1.2 | 42.1 | 0.2 | 0.1 | 0.5 | 0.7 | 0.2 | 0.0 | 0.0 | 0.5 | 0.1 | 0.3 | 0.1 | 0.2 | 0.8
D = White and Black Caribbean (Mixed) | 11.2 | 0.0 | 2.8 | 38.3 | 0.0 | 0.0 | 2.8 | 0.0 | 0.0 | 0.0 | 0.0 | 41.1 | 2.8 | 0.9 | 0.0 | 0.0
E = White and Black African (Mixed) | 12.2 | 0.0 | 6.1 | 4.1 | 26.5 | 0.0 | 8.2 | 0.0 | 0.0 | 0.0 | 0.0 | 8.2 | 34.7 | 0.0 | 0.0 | 0.0
F = White and Asian (Mixed) | 25.0 | 0.0 | 2.8 | 0.0 | 0.0 | 40.3 | 2.8 | 8.3 | 1.4 | 1.4 | 13.9 | 1.4 | 0.0 | 0.0 | 1.4 | 1.4
G = Any other Mixed background | 30.6 | 0.9 | 10.8 | 4.5 | 4.5 | 7.2 | 12.6 | 4.5 | 0.9 | 0.9 | 3.6 | 5.4 | 2.7 | 0.9 | 4.5 | 5.4
H = Indian (Asian or Asian British) | 1.9 | 0.0 | 0.4 | 0.0 | 0.1 | 0.6 | 0.1 | 88.4 | 2.8 | 0.4 | 4.6 | 0.2 | 0.1 | 0.1 | 0.0 | 0.1
J = Pakistani (Asian or Asian British) | 2.3 | 0.0 | 0.8 | 0.0 | 0.0 | 0.5 | 0.3 | 5.9 | 87.7 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5
K = Bangladeshi (Asian or Asian British) | 2.1 | 0.0 | 0.0 | 0.0 | 0.0 | 1.1 | 1.1 | 5.3 | 2.1 | 82.1 | 4.2 | 0.0 | 1.1 | 0.0 | 1.1 | 0.0
L = Any other Asian background | 4.4 | 0.0 | 4.2 | 0.2 | 0.5 | 3.7 | 3.0 | 21.5 | 7.6 | 0.9 | 41.0 | 0.5 | 0.2 | 0.7 | 3.5 | 8.1
M = Caribbean (Black or Black British) | 4.2 | 0.0 | 0.1 | 3.6 | 0.5 | 0.0 | 0.6 | 0.0 | 0.0 | 0.0 | 0.4 | 84.8 | 3.6 | 2.0 | 0.0 | 0.1
N = African (Black or Black British) | 5.0 | 0.2 | 1.4 | 0.6 | 2.6 | 0.0 | 0.6 | 0.0 | 0.0 | 0.0 | 0.2 | 6.6 | 81.3 | 1.4 | 0.0 | 0.2
P = Any other Black background | 3.1 | 0.0 | 0.9 | 3.6 | 2.7 | 0.9 | 1.8 | 0.4 | 0.0 | 0.0 | 1.3 | 38.1 | 42.6 | 4.0 | 0.4 | 0.0
R = Chinese (other ethnic group) | 4.2 | 0.0 | 0.8 | 0.0 | 0.0 | 0.8 | 0.0 | 0.4 | 0.0 | 0.0 | 4.2 | 0.0 | 0.0 | 0.0 | 83.2 | 6.3
S = Any other ethnic group | 43.5 | 0.6 | 17.3 | 0.4 | 0.9 | 1.6 | 1.8 | 3.5 | 1.1 | 0.6 | 9.1 | 2.5 | 3.4 | 0.9 | 1.6 | 11.3
Z = Not stated | 90.6 | 1.5 | 2.8 | 0.1 | 0.1 | 0.3 | 0.2 | 0.9 | 0.5 | 0.0 | 0.7 | 1.0 | 0.6 | 0.1 | 0.3 | 0.3
X or Missing | 91.3 | 1.3 | 2.5 | 0.1 | 0.1 | 0.2 | 0.2 | 1.3 | 0.4 | 0.1 | 0.5 | 0.7 | 0.7 | 0.1 | 0.4 | 0.2
Table 3. The probability (%) of ‘true’ ethnic group for each hospital record-recorded ethnic group (i.e. the
probability that a given person with a specific HES-recorded ethnicity belongs to any (self-reported) group)
From 2001-02 onwards:
A = British (White)
B = Irish (White)
C = Any other White background
D = White and Black Caribbean (Mixed)
E = White and Black African (Mixed)
F = White and Asian (Mixed)
G = Any other Mixed background
H = Indian (Asian or Asian British)
J = Pakistani (Asian or Asian British)
K = Bangladeshi (Asian or Asian British)
L = Any other Asian background
M = Caribbean (Black or Black British)
N = African (Black or Black British)
P = Any other Black background
R = Chinese (other ethnic group)
S = Any other ethnic group
Z = Not stated
X = Not known
From 1995-96 to 2000-01:
0 = White
1 = Black - Caribbean
2 = Black - African
3 = Black - Other
4 = Indian
5 = Pakistani
6 = Bangladeshi
7 = Chinese
8 = Any other ethnic group
9 = Not given
X = Not known
Appendix table 1. NCPES questionnaire ethnicity question (left) and HES ethnicity codes (right)
Ethnicity in 2001 Census groups (16) | Sensitivity of hospital-recorded information | PPV of hospital-recorded information
British (White) | 97.8 (97.7-98) | 98.1 (98-98.2)
Irish (White) | 47.8 (44.5-51) | 81.5 (77.9-84.6)
Any other White background | 54.5 (51.5-57.5) | 39.3 (36.8-41.8)
White and Black Caribbean (Mixed) | 30.6 (20.2-42.5) | 41.5 (28.1-55.9)
White and Black African (Mixed) | 19.5 (8.8-34.9) | 34.8 (16.4-57.3)
White and Asian (Mixed) | 15.6 (8.3-25.6) | 38.7 (21.8-57.8)
Any other Mixed background | 12.2 (4.6-24.8) | 12 (4.5-24.3)
Indian (Asian or Asian British) | 80.4 (76.7-83.8) | 89.1 (85.9-91.7)
Pakistani (Asian or Asian British) | 77.7 (71.4-83.2) | 86 (80.2-90.7)
Bangladeshi (Asian or Asian British) | 74 (59.7-85.4) | 80.4 (66.1-90.6)
Any other Asian background | 55.8 (46.8-64.5) | 38.3 (31.3-45.7)
Caribbean (Black or Black British) | 71.1 (66.8-75.2) | 85.2 (81.3-88.6)
African (Black or Black British) | 65.2 (59.7-70.4) | 80.3 (74.9-85)
Any other Black background | 10 (0.3-44.5) | 0.9 (0-5)
Chinese (other ethnic group) | 75.8 (67.3-83) | 87.9 (80.1-93.4)

Ethnicity in broader groups (6) | Sensitivity of hospital-recorded information | PPV of hospital-recorded information
White | 99.4 (99.3-99.5) | 99.6 (99.6-99.7)
Mixed | 25.1 (19.7-31.1) | 38.2 (30.6-46.3)
Asian or Asian British | 88.9 (86.7-90.9) | 90.4 (88.3-92.3)
Black or Black British | 84.6 (81.9-87.1) | 89 (86.5-91.1)
Chinese | 75.8 (67.3-83) | 87.9 (80.1-93.4)
Other ethnic group | 50 (37.2-62.8) | 9.7 (6.7-13.5)
Appendix table 2. Sensitivity and PPV. Estimates and 95% CI
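As an illustration of how estimates of this kind can be computed, the sketch below treats self-reported ethnicity as the reference standard and derives sensitivity and PPV for one group from paired labels, with an exact (Clopper-Pearson) binomial interval. The function names and the toy data are hypothetical. The appendix does not name the interval method, so the exact interval is an assumption, although 1 correct record out of 10 gives roughly 0.3% to 44.5%, which agrees with the interval printed for the 10% estimate above.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n, found by bisection.
    Adequate for small n; a statistics library is preferable for large n."""

    def solve(f, target):
        # f is decreasing in p; find the p where f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def sensitivity_ppv(pairs, group):
    """pairs: iterable of (self_reported, hospital_recorded) labels.
    Returns ((agree, n_self_reported), (agree, n_recorded)) for `group`."""
    pairs = list(pairs)
    agree = sum(1 for s, r in pairs if s == group and r == group)
    n_self = sum(1 for s, _ in pairs if s == group)
    n_rec = sum(1 for _, r in pairs if r == group)
    return (agree, n_self), (agree, n_rec)


# Toy data (NOT from the study): 10 self-reported Mixed patients, 1 recorded as Mixed.
pairs = (
    [("Mixed", "Mixed")] * 1
    + [("Mixed", "White")] * 9
    + [("White", "White")] * 88
    + [("White", "Mixed")] * 2
)
(tp, n_self), (_, n_rec) = sensitivity_ppv(pairs, "Mixed")
sens_ci = exact_ci(tp, n_self)
print(f"sensitivity {tp / n_self:.1%} (95% CI {sens_ci[0]:.1%} to {sens_ci[1]:.1%})")
print(f"PPV {tp / n_rec:.1%}")
```

Sensitivity divides agreeing records by the number self-reporting the group, whereas PPV divides the same numerator by the number recorded in that group, which is why the two columns can differ sharply for small groups.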
             Total   %     N discordant  %     Crude OR of              joint     Adjusted OR of           joint
                           (ONS6)              discordance              p-value   discordance              p-value
Age                                                                     <0.0001                            0.10
16-24        399     0.7   17            4.3   4.15 (2.47-6.96)                   0.87 (0.43-1.79)
25-34        950     1.6   37            3.9   3.78 (2.63-5.42)                   1.23 (0.77-1.94)
35-44        3,139   5.3   83            2.6   2.53 (1.99-3.22)                   1.32 (0.96-1.82)
45-54        7,763   13.2  166           2.1   2.04 (1.68-2.46)                   1.26 (0.98-1.62)
55-64        15,375  26.2  200           1.3   1.23 (1.05-1.44)                   1.16 (0.93-1.46)
65-74        18,366  31.3  195           1.1   reference                          reference
75-84        10,747  18.3  86            0.8   0.75 (0.58-0.98)                   0.84 (0.63-1.12)
85+          1,982   3.4   12            0.6   0.57 (0.3-1.09)                    0.73 (0.39-1.39)
Gender                                                                  0.0003                             0.49
Men          27,395  46.7  321           1.2   reference                          reference
Women        31,326  53.3  475           1.5   1.3 (1.13-1.49)                    1.06 (0.9-1.26)
Deprivation                                                             <0.0001                            0.30
Lowest       13,301  22.7  119           0.9   reference                          reference
2            13,432  22.9  120           0.9   1 (0.75-1.34)                      1.13 (0.84-1.52)
3            12,498  21.3  143           1.1   1.28 (0.92-1.78)                   1.23 (0.92-1.64)
4            10,756  18.3  191           1.8   2 (1.44-2.79)                      1.36 (1.02-1.8)
Highest      8,734   14.9  223           2.6   2.9 (2.14-3.93)                    1.27 (0.94-1.7)
Ethnic Group                                                            <0.0001                            <0.0001
White        56,593  96.4  332           0.6   baseline                           baseline
Mixed        239     0.4   179           74.9  505.56 (333.91-765.45)             564.17 (399.47-796.78)
Asian        901     1.5   100           11.1  21.16 (14.5-30.88)                 14.3 (10.97-18.65)
Black        800     1.4   123           15.4  30.79 (19.65-48.23)                17.51 (13.35-22.96)
Chinese      124     0.2   30            24.2  54.08 (32.35-90.42)                38.62 (24.39-61.14)
Other        64      0.1   32            50.0  169.46 (103.63-277.12)             109.64 (63.49-189.34)
Hospital (OR 95% Reference Range)   158 - 73.97 <0.0001 22.96 <0.0001
Appendix table 3. Crude and adjusted predictors of discordant hospital record ethnicity coding (based on 6 combined ethnicity groups)
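For readers who want to check the crude odds ratios against the counts in the table, a minimal sketch follows. The function name is hypothetical. The point estimate is the ordinary cross-product ratio; the interval is a standard Wald (log-scale) approximation, added here as an assumption, and it is not expected to reproduce the published intervals exactly.

```python
from math import exp, log, sqrt


def crude_or(d1, n1, d0, n0):
    """Crude odds ratio of discordance for a group (d1 discordant of n1)
    against a reference group (d0 discordant of n0), with a Wald 95% CI."""
    a, b = d1, n1 - d1  # group: discordant, concordant
    c, d = d0, n0 - d0  # reference: discordant, concordant
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return odds_ratio, exp(log(odds_ratio) - 1.96 * se), exp(log(odds_ratio) + 1.96 * se)


# Counts taken from Appendix table 3.
women_vs_men = crude_or(475, 31326, 321, 27395)
mixed_vs_white = crude_or(179, 239, 332, 56593)
print(f"Women vs men:   OR {women_vs_men[0]:.2f}")
print(f"Mixed vs White: OR {mixed_vs_white[0]:.2f}")
```

With the table's counts the point estimates come out at about 1.30 and 505.56, in line with the crude column. The adjusted estimates cannot be recovered this way because they depend on the full regression model.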
STROBE Statement—checklist of items that should be included in reports of observational studies
Item No Recommendation
Title and abstract 1 (a) Indicate the study’s design with a commonly used term in the title or the abstract
Study design is stated in title and in the design section of abstract
(b) Provide in the abstract an informative and balanced summary of what was done
and what was found
Please see Methods/Results section of abstract
Introduction
Background/rationale 2 Explain the scientific background and rationale for the investigation being reported
Given in the background section of the paper. Specifically, previous studies of the
quality of coding in routine English NHS records have focused on completeness
rather than accuracy, and this work attempts to fill that research gap
Objectives 3 State specific objectives, including any prespecified hypotheses
Specifically stated as the last line of the introduction
Methods
Study design 4 Present key elements of study design early in the paper
Stated in the abstract and further presented in the methods
Setting 5 Describe the setting, locations, and relevant dates, including periods of recruitment,
exposure, follow-up, and data collection
Stated in the abstract and further presented in Methods (Data)
Participants 6 Cross-sectional study—Give the eligibility criteria, and the sources and methods of
selection of participants
Stated in the abstract and further presented in Methods (Data) and the flowchart (figure 1)
Variables 7 Clearly define all outcomes, exposures, predictors, potential confounders, and effect
modifiers. Give diagnostic criteria, if applicable
Please see Methods, analysis (p9, line 4 onwards)
Data sources/measurement 8* For each variable of interest, give sources of data and details of methods of
assessment (measurement). Describe comparability of assessment methods if there is
more than one group
Please see Methods, analysis (p9 line 4 onwards)
Bias 9 Describe any efforts to address potential sources of bias
Describing measurement bias in hospital record ethnicity coding is an outcome for
this study
In Methods, analysis, we describe our approach to potential clustering by hospital
Study size 10 Explain how the study size was arrived at
Given in Methods, Data and the flowchart in figure 1
Quantitative variables 11 Explain how quantitative variables were handled in the analyses. If applicable,
describe which groupings were chosen and why
All variable groups are explained and described in methods, analysis
Statistical methods 12 (a) Describe all statistical methods, including those used to control for confounding
Described in methods, analysis
(b) Describe any methods used to examine subgroups and interactions
No subgroup analyses considered
(c) Explain how missing data were addressed
We performed these analyses (considering the small proportion of cases with missing
ethnicity coding) and mention these results in our discussion. These analyses were
extensive but not the main focus of this paper and so we did not include them as a
substantial research output in this paper
(d) Cross-sectional study—If applicable, describe analytical methods taking account of
sampling strategy
Discussion of the generalizability of the sample was included in the discussion.
Clustering at the hospital level was considered in the analysis, looking at results with
and without a random effect for hospital.
(e) Describe any sensitivity analyses
We repeated the analyses on data from 2011/2, but as information on deprivation for
these years was not recorded and the results were similar across both years we only
present the results for 2010 (including the data from 2011/2 in the probability lookup
table calculation only)
Discordance of ethnicity using ONS 16 and ONS 6 categorisations is also presented
(the latter in the appendix).
Results
Participants 13* (a) Report numbers of individuals at each stage of study—eg numbers potentially eligible,
examined for eligibility, confirmed eligible, included in the study, completing follow-up, and
analysed
Figure 1
(b) Give reasons for non-participation at each stage
Figure 1
(c) Consider use of a flow diagram
Figure 1
Descriptive data 14* (a) Give characteristics of study participants (eg demographic, clinical, social) and information
on exposures and potential confounders
Given as part of table 1
(b) Indicate number of participants with missing data for each variable of interest
Methods (data) section and figure 1
Outcome data 15* Cross-sectional study—Report numbers of outcome events or summary measures
Table 1, given in the text of the results
Main results 16 (a) Give unadjusted estimates and, if applicable, confounder-adjusted estimates and their
precision (eg, 95% confidence interval). Make clear which confounders were adjusted for and
why they were included
Table 1; crude and adjusted results presented with 95% confidence intervals.
(b) Report category boundaries when continuous variables were categorized
Done so for age in table 1.
(c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful
time period
Not applicable
Other analyses 17 Report other analyses done—eg analyses of subgroups and interactions, and sensitivity
analyses
Results from sensitivity analyses (considering different categorisations of ethnicity) presented
in the appendix.
Discussion
Key results 18 Summarise key results with reference to study objectives
In first paragraph of discussion
Limitations 19 Discuss limitations of the study, taking into account sources of potential bias or imprecision.
Discuss both direction and magnitude of any potential bias
In discussion
Interpretation 20 Give a cautious overall interpretation of results considering objectives, limitations, multiplicity
of analyses, results from similar studies, and other relevant evidence
In discussion
Generalisability 21 Discuss the generalisability (external validity) of the study results
Discussed in the discussion section of this paper, and also as part of the last bullet point in
‘Article Summary’.
Other information
Funding 22 Give the source of funding and the role of the funders for the present study and, if applicable,
for the original study on which the present article is based
In acknowledgements section
*Give information separately for cases and controls in case-control studies and, if applicable, for exposed and
unexposed groups in cohort and cross-sectional studies.
Note: An Explanation and Elaboration article discusses each checklist item and gives methodological background and
published examples of transparent reporting. The STROBE checklist is best used in conjunction with this article (freely
available on the Web sites of PLoS Medicine at http://www.plosmedicine.org/, Annals of Internal Medicine at
http://www.annals.org/, and Epidemiology at http://www.epidem.com/). Information on the STROBE Initiative is
available at www.strobe-statement.org.