CHEST RADIOGRAPH SCORING SYSTEMS FOR THE
DIAGNOSIS OF ACTIVE PULMONARY TUBERCULOSIS
Lancelot M Pinto, MD
Department of Epidemiology and Biostatistics
McGill University, Montreal
May 2012
A thesis submitted to McGill University in partial fulfillment of the requirements of the
degree of Master of Science
© Lancelot M Pinto 2012
Table of Contents
ABSTRACT
RÉSUMÉ
ACKNOWLEDGEMENTS
PREFACE - CONTRIBUTIONS OF CO-AUTHORS
CHAPTER 1: INTRODUCTION
CHAPTER 2: SYSTEMATIC REVIEW OF THE LITERATURE (MANUSCRIPT 1)
    2.1 ABSTRACT
    2.2 INTRODUCTION
    2.3 METHODS
    2.4 RESULTS
    2.5 DISCUSSION
    2.6 CONCLUSIONS
    2.7 TABLES AND FIGURES
CHAPTER 3: BRIDGING CHAPTER: THE NEED FOR A CHEST RADIOGRAPH SCORING SYSTEM FOR THE DIAGNOSIS OF PULMONARY TUBERCULOSIS
CHAPTER 4: DEVELOPMENT OF A RELIABLE AND SIMPLE RADIOGRAPHIC SCORING SYSTEM TO AID THE DIAGNOSIS OF PULMONARY TUBERCULOSIS (MANUSCRIPT 2)
    3.1 ABSTRACT
    3.2 INTRODUCTION
    3.3 METHODS
    3.4 RESULTS
    3.5 DISCUSSION
    3.6 CONCLUSIONS
    3.7 TABLES AND FIGURES
CHAPTER 5: CONCLUSIONS
REFERENCES
APPENDIX
    6.1 SEARCH STRATEGY FOR THE SYSTEMATIC REVIEW
    6.2 DATA EXTRACTION FORM FOR THE SYSTEMATIC REVIEW
    6.3 CHEST RADIOGRAPH READING AND REPORTING SYSTEM (CRRS) FORM
ABSTRACT
Background: Chest radiography is often the only tool available for the investigation of
tuberculosis (TB) suspects with negative sputum smears, thus playing a crucial role in
clinical decision-making. However, chest radiographs lack specificity for TB, and their
interpretation is subjective and not standardized, and therefore not highly reproducible.
Efforts to improve the interpretation of chest radiography are warranted, especially with
the growing use of digital radiology.
Objectives: To systematically review the literature on the use of scoring systems to aid
the diagnosis of active pulmonary TB (PTB), and to derive a new, simple scoring system
using features noted on the Chest Radiograph Reading and Recording System (CRRS), a
tool designed for the documentation of radiographic abnormalities in epidemiological
surveys for PTB.
Methods: A systematic review of the literature was performed to assess the utility of
chest radiograph scoring systems for the diagnosis of PTB, and to use this information to
derive a scoring system using the CRRS. Chest radiographs of outpatients with suspected
PTB, consecutively recruited over 3 years at clinics in South Africa, were read by two
independent readers using CRRS. Multivariable analysis was used to identify features
significantly associated with culture-positive PTB, and these were assigned weights and
used to generate a composite score.
Results: A systematic review of the literature identified 12 studies that used
radiographic features as part of scoring systems for the diagnosis of PTB. Six of these
were tested in smear-negative patients. There was no scoring system found that involved
the exclusive use of radiographic features. Upper lobe infiltrates and cavities were the
radiographic features most commonly associated with the disease. The sensitivities of the
scoring systems were uniformly high, but all of them lacked specificity.
For the study in South Africa, 473 patients were included in the analysis. Large upper
lobe opacities, cavities, unilateral pleural effusion and adenopathy were significantly
associated with culture-confirmed PTB, had high inter-reader reliability, and received 2,
2, 1 and 2 points, respectively in the final score. When applied to all TB suspects, using a
cut-off of ≥ 2, the score had a high negative predictive value (92%; 95% CI 87, 95).
Among TB suspects with negative sputum smears, the score correctly ruled out active
disease in 214 of 229 patients (NPV 93%; 95% CI 89, 96).
Conclusions: Existing radiographic scoring systems for the diagnosis of PTB appear to
be sensitive, but lack specificity. The scoring system derived from CRRS is a simple and
reliable tool that may be useful for ruling out active PTB in smear-negative patients.
Validation studies are needed to confirm these initial findings.
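The weighted score summarized in the Results above can be illustrated with a short Python sketch. It is not the author's code: the point weights (2, 2, 1 and 2) and the cut-off of ≥ 2 come from the abstract, while the feature names, function names, and the example reading are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only. Point weights and the cut-off are taken from the
# abstract; the dictionary keys and function names are hypothetical.
WEIGHTS = {
    "large_upper_lobe_opacity": 2,
    "cavity": 2,
    "unilateral_pleural_effusion": 1,
    "adenopathy": 2,
}
CUTOFF = 2  # a composite score >= 2 is treated as a positive reading


def composite_score(features):
    """Sum the weights of the features marked present (True) on a reading."""
    return sum(w for name, w in WEIGHTS.items() if features.get(name, False))


def is_positive(features, cutoff=CUTOFF):
    """Return True when the composite score reaches the cut-off."""
    return composite_score(features) >= cutoff


def negative_predictive_value(true_negatives, false_negatives):
    """Proportion of score-negative patients who truly do not have PTB."""
    return true_negatives / (true_negatives + false_negatives)


if __name__ == "__main__":
    # Hypothetical reading: a cavity plus a unilateral pleural effusion.
    reading = {"cavity": True, "unilateral_pleural_effusion": True}
    print(composite_score(reading), is_positive(reading))
    # Counts reported for smear-negative suspects: 214 of 229 correctly ruled out.
    print(round(100 * negative_predictive_value(214, 229 - 214), 1))
```

With these weights, an isolated unilateral pleural effusion (1 point) is the only single feature that falls below the cut-off; any other listed feature alone gives a positive reading.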
RÉSUMÉ
Background: Chest radiography is often the only tool available for tuberculosis (TB)
screening in patients with negative sputum smears, which gives it a crucial role in
clinical decision-making. However, chest radiographs lack specificity for TB, and their
interpretation is subjective and not standardized, and therefore not highly reproducible.
Efforts to improve the interpretation of chest radiography are warranted, especially
given the growing use of digital radiology.
Objectives: The objectives include a systematic search of the literature on the use of
scoring systems to aid the diagnosis of active pulmonary TB (PTB), and the derivation of
a new, simple scoring system from the Chest Radiograph Reading and Recording System
(CRRS), a tool designed for the documentation of radiographic abnormalities in
epidemiological studies of PTB.
Methods: A systematic search of the literature was performed to assess the utility of
chest radiograph scoring systems for the diagnosis of PTB, and to use this information to
derive a scoring system from the CRRS. Chest radiographs of outpatients with suspected
PTB, consecutively recruited over 3 years at clinics in South Africa, were read by two
independent readers using CRRS. Multivariable analysis was used to identify features
significantly associated with culture-positive PTB, and these were assigned respective
weights and used to generate a composite score.
Results: A systematic search of the literature identified 12 studies that used scoring
systems to analyze radiographic features in the course of diagnosing PTB. Six of them
included only smear-negative patients. No scoring system involved the exclusive use of
radiographic features. Cavities and upper lobe infiltrates were the radiographic features
most commonly associated with the disease. The sensitivities of the scoring systems were
uniformly high, but each of them lacked specificity.
In the study in South Africa, 473 patients were included in the analysis. Large upper
lobe opacities, cavities, unilateral pleural effusion and the presence of adenopathy were
significantly associated with culture-confirmed PTB, had high inter-reader reliability,
and received 2, 2, 1 and 2 points, respectively, in the final score. When applied to all
suspected TB cases, using a cut-off of ≥ 2, the score had a high negative predictive
value (92%; 95% CI 87-95). Among smear-negative TB suspects, the score correctly excluded
active disease in 214 of 229 patients (NPV 93%; 95% CI 89-96).
Conclusions: Current radiographic scoring systems for the diagnosis of PTB appear to be
sensitive, but lack specificity. The scoring system derived from the CRRS is a simple and
reliable tool that may be useful for excluding active PTB in smear-negative patients.
Validation studies are needed to confirm these initial results.
ACKNOWLEDGEMENTS
Sincere gratitude and heartfelt thanks go to my supervisor, Prof. Madhukar Pai, whose
guidance and mentoring throughout the course of my graduate studies has been
invaluable. Madhu leads by example, and his dedication to the cause of tuberculosis (TB)
control, his passion for teaching Epidemiology and his zeal to use the highest quality
research to answer questions that will hopefully help us overcome TB eventually, has
been truly inspirational.
I am grateful to Karen Steingart for being my guide and second reader for the systematic
review. Karen embodies the zest for tireless perfection, and I am immensely grateful for
her enthusiasm, patience and time spent in analyzing a huge amount of data. I thank
Keertan Dheda, Dick Menzies, Kevin Schwartzman and Rodney Dawson for their
insightful help and critiques throughout the course of the project. They have been most
helpful in assisting me with the conduct and the analysis of the studies, and their
comments and suggestions have helped me at every step of the way. I thank the team at
the University of Cape Town who helped me and made my stay at Cape Town productive
and enjoyable. I thank all my colleagues at the Pai TB group, and at the Respiratory
Epidemiology and Clinical Research Unit (RECRU). Their constant support and
encouragement has been priceless.
Special thanks go to my wife Franzina, who has always been my best friend and a
constant source of inspiration, and to my parents and family who have been pillars of
support and guidance all my life.
PREFACE - CONTRIBUTIONS OF CO-AUTHORS
For the systematic review, Lancelot M. Pinto (LP) was the lead reviewer and first author,
while Madhukar Pai (MP) and Karen R. Steingart (KRS) contributed to the conception
and design. LP was the lead reader, and KRS was the second reader for the screening of
citations, full-text review, and data extraction. LP prepared the manuscript, and MP,
Keertan Dheda (KD) and KRS provided editorial and methodological advice.
For the study in South Africa, LP was the lead author, while KD and MP contributed to
the conception and design. LP, MP and KD provided advice for the analysis and
interpretation of results. LP prepared the manuscript, and MP and KD contributed in
providing editorial advice.
CHAPTER 1: INTRODUCTION
Tuberculosis (TB), caused by Mycobacterium tuberculosis, is a disease that burdens
individuals and health systems globally. In 2010, there were 8.8 million new cases of TB,
12 million prevalent cases, and 1.45 million deaths, 0.35 million of which were among
persons living with HIV (PLWH) (1). The South-east Asian, African and Western Pacific
regions bore 85% of the global burden of disease, while Africa alone accounted for 80%
of HIV-associated incident cases worldwide.
The global case detection rate for all forms of TB is only 63-68%(1), and this low rate is
a serious impediment to the control of the disease. Limitations of existing diagnostic tests
are considered to contribute to the low case detection rate(1). Sputum smear microscopy
and chest radiography are two of the most commonly used tests for TB in most high TB
burden countries. Smear microscopy has low sensitivity and fails to detect nearly half of
all TB cases(2). Chest radiography is a rapid point-of-care test that has been used for over
a century to diagnose pulmonary TB (PTB) (3). The test is easily performed and
incorporated in screening and diagnostic algorithms, and can be especially valuable for
the diagnosis of disease in patients suspected of having the disease (TB suspects) in
whom sputum smears are negative for acid-fast bacilli (AFB) – i.e. smear-negative TB.
While chest radiography is acknowledged to be a sensitive tool for detecting pulmonary
abnormalities, its use for the diagnosis of PTB has been limited by modest specificity and
high inter- and intra-observer differences in reporting of radiographs(4).
Scoring systems for chest radiographs have been used successfully to standardize and
improve the accuracy of detection of various pulmonary disorders. The International
Labor Organization (ILO) employs a classification scheme that trains readers to describe
chest radiographs in a standardized manner with regards to the location, nature, size and
profusion of abnormalities when compared to standard films, which represent the various
types of abnormalities. The system is specifically oriented towards documenting and
identifying features associated with the different occupation-related lung diseases such as
coal workers' pneumoconiosis, silicosis, and asbestosis (5). The score has been found to
improve reproducibility of the reading of chest radiographs for both pulmonary(6) and
pleural abnormalities(7). The Chrispin-Norman score is a standardized scoring system
used for grading chest radiographs in children with cystic fibrosis, with the aim of
documenting, and objectively assessing progression of disease with serial radiographs(8).
The score has been found to have good inter-reader reproducibility, and correlates well
with lung function(9). The lung injury score is a scoring system that divides the
visualized lung fields into zones and assesses the extent of lung injury as a marker of
the severity of acute respiratory distress syndrome (ARDS). It has been
found to be useful in improving agreement among readers, especially when the radiographs
are read by radiologists(10).
A similar standardized scoring system for TB that assigns weights to specific features of
chest radiographs consistent with TB, if accurate and reproducible, could potentially
augment TB case detection rates using largely pre-existing resources. Such a scoring
system would ideally need to have a documentation of specific features visualized on a
chest radiograph, and a weighted score for the various features visualized relative to their
association with active PTB.
In this manuscript-based thesis, I attempt to explore whether a standardized scoring
system for chest radiographs could improve the performance characteristics of chest
radiography as a diagnostic test for pulmonary TB (PTB), both as a means for improving
the inter-reader reproducibility, and for improving overall diagnostic accuracy.
The first manuscript is a systematic review of the available evidence for the use of
radiographic scoring systems for the diagnosis of PTB. This is a comprehensive literature
review that involved searching multiple databases with the assistance of a second reader,
with both readers having independently searched the literature for relevant studies that
involved radiographic scoring systems for TB. The aim of the literature review was to
identify such studies, and to try and assimilate a set of features that were consistently
associated with PTB, with the aim of testing these features (decided a priori) for the
derivation of a scoring system using the Chest Radiograph Reading and Recording
System (CRRS).
The CRRS is a tool that was designed for use in epidemiological surveys for TB(11).
The tool involves a checklist of features visualized on the chest radiograph, and has been
found to be associated with high intra- and inter-reader reproducibility, making it a useful
tool for use in the derivation of a weighted scoring system. The derivation of a scoring
system for the diagnosis of PTB was attempted among subjects suspected of having TB
in a study conducted at the University of Cape Town, South Africa, using the CRRS. This
study involved the reading of chest radiographs from 473 subjects suspected of having
the disease, by two independent readers, and the analysis of various features visualized
on the radiographs for their association with PTB, followed by deriving a weighted
scoring system using these features. This study is described in the second manuscript of
the thesis.
Together, both manuscripts add to the evidence-base on use of scoring systems to
improve the accuracy and reliability of chest radiography for pulmonary TB diagnosis,
and will help improve the field of TB diagnostics within the existing framework of
limited resources.
CHAPTER 2: SYSTEMATIC REVIEW OF THE LITERATURE
(MANUSCRIPT 1)
Chest radiograph scoring systems for the diagnosis of active tuberculosis in adults:
A systematic review
2.1 ABSTRACT
Rationale: The use of chest radiography as a diagnostic test for active pulmonary
tuberculosis (PTB) is limited by the lack of standardization in the reading of chest
radiographs. Thus, despite being sensitive for the detection of pulmonary abnormalities, it
lacks both specificity for PTB and reproducibility. Scoring systems have been employed
successfully for improving the performance characteristics of chest radiography for
various pulmonary diseases, and could potentially improve the objectivity, accuracy and
reproducibility of radiography for the diagnosis of PTB.
Objectives: To systematically review the literature to assess the diagnostic accuracy of
chest radiograph scoring systems for PTB in patients clinically suspected of having the
disease. A secondary objective was to assess the reproducibility of such systems for PTB.
Methods: We searched multiple databases for studies that evaluated the diagnostic
accuracy and reproducibility of chest radiograph scoring systems for PTB. We
summarized results for individual features visualized on a chest radiograph that were
predictive of PTB, and the various features that were used in scoring systems to assess
the likelihood of the disease.
Results: We identified 12 studies that described clinical-radiographic scoring systems, 11
of which were created with the aim of predicting the likelihood of PTB among patients
who were to be admitted to hospitals. Six of these were tested in smear-negative patients,
and no scoring system involved the exclusive use of radiographic features. Upper lobe
infiltrates and cavities were the radiographic features most commonly associated with TB
disease. The sensitivity estimates of the scoring systems were uniformly high, but all of
them lacked specificity. Studies involving newer techniques such as computer-assisted
diagnosis (CAD) had to be excluded due to methodological inadequacies.
Conclusions: The systematic review identified clinical-radiographic scoring systems,
most of which are useful in ruling-out PTB as part of the assessment of the need for
respiratory isolation of patients in healthcare settings. However, the low specificity
precludes their use as rule-in tests for PTB. Scoring systems that rely exclusively on chest
radiographs for the diagnosis of PTB are lacking. There is a need to derive accurate
scoring systems for PLWH and patients evaluated in outpatient settings, especially in
low-resource settings.
2.2 INTRODUCTION
In 2010, there were 8.8 million new cases of TB, 12 million prevalent cases, and 1.45
million deaths, 0.35 million of which were among persons living with HIV (PLWH)(1).
The global case detection rate for the disease is low, despite recognition of the need for
early diagnosis as a key element in the efforts to curb the epidemic (1). The existing
diagnostic tests in a majority of low-resource settings are sputum smear microscopy and
chest radiography. Smear microscopy lacks sensitivity, and has a limited role in the
diagnosis of extrapulmonary TB, pediatric TB, and TB in HIV-infected patients.
Chest radiography, used for over a century, is a rapid point-of-care test that is known to
be sensitive for detecting pulmonary abnormalities (3). The ease of use, relative low cost,
and quick turnaround time make it a convenient test in high-burden, low-income
settings. However, its use for the diagnosis of PTB has been limited by a lack of
specificity, and a lack of reproducibility in reporting of radiographs(4). Consequently, the
probability of diagnosing active PTB based on a chest radiograph reading is dependent on
the reader and not well standardized.
An analogous impediment to the use of chest radiography for the diagnosis of
occupational lung diseases was overcome by the development of standardized methods
for the reading of chest radiographs, a system that is now employed successfully by the
International Union Against Cancer, the International Labor Organization, and the
National Institute for Occupational Safety and Health(5, 12). Scoring systems have also
been developed for grading the severity and extent of pulmonary disease among patients
with cystic fibrosis(8) and form part of the lung injury score for assessing the severity of
adult respiratory distress syndrome(10, 13). Similarly, a standardized scoring system for
TB that assigns weights to specific features of chest radiographs consistent with TB, if
accurate and reproducible, could potentially augment TB case detection rates using
largely pre-existing resources. Such a standardized scoring system for TB also has the
potential to be combined with newer nucleic acid amplification tests such as Xpert
MTB/RIF (Cepheid Inc, Sunnyvale, CA) (14), either as a triage test to reduce costs or as
an add-on test in Xpert MTB/RIF negative persons.
To our knowledge, no previous systematic reviews have assessed the performance of
radiograph scoring systems for pulmonary TB. Therefore, we carried out a systematic
review to estimate the diagnostic accuracy of chest radiograph scoring systems for TB in
patients suspected of having the disease. A secondary objective was to assess the
reproducibility of chest radiograph scoring systems for active PTB.
2.3 METHODS
We followed guidelines for systematic reviews of diagnostic test accuracy recommended
by the Cochrane Collaboration Diagnostic Test Accuracy Working Group, including
writing a detailed protocol before starting the review(15, 16).
Types of studies: We included randomized controlled trials and observational studies of
all study designs (e.g. cross-sectional, case-control and cohort) that assessed the
performance of radiographic scoring systems for the diagnosis of pulmonary TB.
Participants: Participants were subjects suspected of having pulmonary TB who were 15
years of age and older. We restricted studies to those that included a minimum of 10
cases of active TB. With the aim of evaluating subjects similar to those who present in
routine clinical practice, we excluded studies that exclusively studied specific patient
groups such as patients with pneumoconioses, malignancies (both hematological and
solid organ), immune-mediated inflammatory disease, including patients on
immunosuppressive medications such as tumor necrosis factor-alpha inhibitors, and
patients on hemodialysis. Studies that were conducted on individuals who were not
suspected of having TB, such as investigations of asymptomatic contacts of patients with
active TB were also excluded.
Index test: Any chest radiograph scoring system
Comparator: No chest radiograph scoring system
Target condition: TB of the pulmonary parenchyma, pleura, intrathoracic lymph nodes.
We included miliary TB if the disease involved either pulmonary parenchyma or multiple
sites, one of which was the lung.
Reference standards: We considered liquid or solid cultures as the reference standards
for active pulmonary TB.
Definitions: A radiograph scoring system was defined as a system that assigned numeric
weights to specific features of chest radiographs consistent with PTB (such as cavitary
lesions), with or without the presence of clinical or lab components in the system.
True positives (TP) were TB suspects correctly classified as PTB by the scoring system
when compared with the reference standard.
False positives (FP) were TB suspects who did not have active PTB but were
misclassified by the scoring system as having active PTB.
False negatives (FN) were subjects with active PTB who were misclassified by the
scoring system as not having the disease.
True negatives (TN) were TB suspects who did not have active PTB and were correctly
classified by the scoring system.
Sensitivity refers to the proportion of patients with active PTB correctly identified by the
index test when compared with the reference standard: [TP/(FN + TP)] *100
Specificity refers to the proportion of patients without active PTB correctly identified by the
index test when compared with the reference standard: [TN/(FP + TN)] *100
Positive predictive value (PPV) refers to the proportion of patients correctly identified as
having active PTB by the scoring system when compared to all patients identified as
having active PTB by the scoring system: [TP/(TP+FP)]
Negative predictive value (NPV) refers to the proportion of patients correctly identified as
not having active PTB by the scoring system when compared to all patients identified as
not having active PTB by the scoring system: [TN/(TN+FN)]
Diagnostic Odds ratio (DOR) refers to the odds of a participant with active PTB having a
specific clinical or radiographic manifestation as compared to the odds of a participant
without active PTB having the same clinical or radiographic manifestation. It is
computed by the formula: [(TP*TN)/(FP*FN)]
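The accuracy measures defined above all follow from the four cells of a 2x2 table. The Python sketch below is illustrative only: the class name, method names, and the example counts are hypothetical and are not data from any included study. For uniformity it reports PPV and NPV as percentages as well, although the definitions above give them as proportions.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cells of a 2x2 table: index test result versus reference standard."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def sensitivity(self):
        return 100 * self.tp / (self.tp + self.fn)

    def specificity(self):
        return 100 * self.tn / (self.fp + self.tn)

    def ppv(self):
        return 100 * self.tp / (self.tp + self.fp)

    def npv(self):
        return 100 * self.tn / (self.tn + self.fn)

    def diagnostic_odds_ratio(self):
        # Undefined when FP or FN is zero; a continuity correction would be
        # needed in that case (not applied in this sketch).
        return (self.tp * self.tn) / (self.fp * self.fn)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    table = TwoByTwo(tp=90, fp=60, fn=10, tn=140)
    print(table.sensitivity(), table.specificity())
    print(table.ppv(), table.npv())
    print(table.diagnostic_odds_ratio())
```

The hypothetical table shows the pattern the review describes for existing scores: high sensitivity and NPV alongside modest specificity and PPV.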
Reproducibility refers to percent agreement on reported features when a chest radiograph
is read more than once. The agreement could either be intra-reader, when the same chest
radiograph is read more than once by the same reader, blinded to his/her previous
reporting of the radiographs, or inter-reader, when more than one reader reports the
features of the same chest radiograph. This agreement is a reflection of the repeatability
of the test, and is independent of the accuracy with reference to the reference standard for
the diagnosis of active PTB. The observed level of agreement is the ratio of the number
of readings that are in agreement to the total number of readings. It is expressed as a
percentage: Agreement = [Number of readings in agreement/total number of
readings]*100. Cohen's kappa (κ) is the chance-adjusted measure of agreement defined
as the ratio of the actual agreement beyond chance to the potential agreement beyond
chance.
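As a worked illustration of these two agreement measures, the following Python sketch computes percent agreement and Cohen's kappa for two readers who each record a feature as present (1) or absent (0) on the same set of radiographs. The function names and the example readings are hypothetical. The expected chance agreement is computed from each reader's marginal frequencies, which is the standard form of Cohen's kappa.

```python
def percent_agreement(reader_a, reader_b):
    """Observed agreement: readings in agreement / total readings * 100."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must supply the same non-zero number of readings")
    matches = sum(a == b for a, b in zip(reader_a, reader_b))
    return 100 * matches / len(reader_a)


def cohens_kappa(reader_a, reader_b):
    """Agreement beyond chance divided by potential agreement beyond chance."""
    n = len(reader_a)
    p_observed = percent_agreement(reader_a, reader_b) / 100
    categories = set(reader_a) | set(reader_b)
    # Chance agreement from each reader's marginal frequency per category.
    p_expected = sum(
        (reader_a.count(c) / n) * (reader_b.count(c) / n) for c in categories
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical readings of one feature on ten radiographs (1 = present).
    reader_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    reader_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
    print(percent_agreement(reader_a, reader_b))
    print(round(cohens_kappa(reader_a, reader_b), 2))
```

In this example the readers agree on 8 of 10 radiographs, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw percentage.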
Search strategy and study selection
We searched Medline (1946-2011), Embase (1947-2011) and Web of Science (1899 –
2011) on 28 July 2011 for relevant articles, using published hedges for diagnostic tests to
improve sensitivity (17, 18). We used the terms sensitiv*[tw] OR diagnos*[tw] OR di [fs]
AND radiograph*[MeSH] OR chest xray[tw] OR mass chest x-Ray[MeSH] OR
photofluorograph*[tw] OR scor*[tw] AND tuberculosis (sub-headings : lymphNode /
miliary/ multidrug-resistant/ Pleural/ Pulmonary) [MeSH] OR Mycobacterium
tuberculosis [MeSH]. The detailed search strategy for Medline can be found in Appendix
6.1. We also reviewed reference lists of included articles and any relevant review articles
identified through the search, for possible eligible articles, and hand-searched relevant
World Health Organization reports.
Relevant studies, restricted to those published in English, French and Spanish, were
selected independently by two reviewers (LP and KRS) and disagreements were resolved
by consensus. Citations deemed appropriate by either reviewer after screening titles and
abstracts were selected for full-text review. A list of excluded studies with their reasons
for exclusion is available upon request from the authors.
Assessment of study quality
Two reviewers (LP and KRS) independently assessed study quality using the core set of
11 items from Quality Assessment of Diagnostic Accuracy Studies (QUADAS), a
validated tool to evaluate the presence of bias and variation in diagnostic accuracy
studies(19). As recommended, each item was scored as 'yes', 'no', or 'unclear'.
Data extraction
Data were extracted from each study, using a data extraction form, that was piloted and
then finalized, based on the experience gained from the pilot data extraction process. Two
reviewers (LP and KRS) independently extracted data, and disagreements were resolved
by consensus. The following data were extracted: author, study design, manner of patient
selection, country income status, eligibility criteria for participants, demographic details
of participants, details on the number and qualifications of the readers of the chest
radiographs, and TP, FP, FN, and TN for both the individual features visualized on the
chest radiograph (such as infiltrates and cavities) and the scoring system. The data
extraction form is included in Appendix 6.2.
Statistical analysis
For the studies that provided TP, FP, FN, and TN values, sensitivity and specificity
estimates and their corresponding 95% confidence intervals were calculated for the
scoring system at the cut-off for the diagnosis of active PTB used by the study authors
(mostly based on optimal sensitivities and specificities using receiver operator curves).
Forest plots were generated to display sensitivity and specificity estimates using Meta-
Disc (version 1.4) (20). Odds ratios for the presence of individual radiographic features
for the diagnosis of active PTB were determined when the study provided the relevant
data. Meta-analyses of odds ratios for specific radiographic features were performed only
if the features were defined in a sufficiently homogeneous manner across studies, the
populations were similar, and the odds ratios were considered homogeneous.
Heterogeneity was assessed using the I-squared (I²) statistic; effects were pooled if I² was
≤ 75%. When results were pooled, meta-analysis was performed using the DerSimonian
and Laird random effects approach, with the aim of incorporating the heterogeneity of
effects across studies(21). All analyses were performed using STATA 11 (Stata
Corporation, College Station, Texas, USA). Pooling was performed using the command
“metandi”(22). Formal assessment of publication bias using methods such as funnel plots
or regression tests was not performed because such techniques have not been found to be
useful for diagnostic data(23). An estimation of language bias was attempted by
retrieving citations from the search strategy with and without a language filter, and we
report the “filtered” citations as a percentage of the overall citations retrieved.
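The calculation of sensitivity and specificity from TP, FP, FN, and TN described above can be sketched as follows. This is a minimal illustration in Python, not the Meta-DiSc or Stata procedure used in the review. The Wilson score interval is an assumption made for the sketch (the review does not state which interval method was applied), and the counts and the function names `wilson_ci` and `accuracy_from_2x2` are hypothetical.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity, each as (estimate, lower, upper)."""
    sens = tp / (tp + fn)  # proportion of culture-positive patients who score positive
    spec = tn / (tn + fp)  # proportion of culture-negative patients who score negative
    return {
        "sensitivity": (sens, *wilson_ci(tp, tp + fn)),
        "specificity": (spec, *wilson_ci(tn, tn + fp)),
    }


# Hypothetical counts, not taken from any included study.
result = accuracy_from_2x2(tp=45, fp=120, fn=5, tn=80)
for name, (est, lower, upper) in result.items():
    print(f"{name}: {est:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Exact (Clopper-Pearson) intervals would be slightly wider for the same counts; the choice of interval does not change the point estimates.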
2.4 RESULTS
We identified 11907 citations, of which 8066 were unique articles after
exclusion of duplicates. We conducted the search with and without the language
filters to assess the degree of language bias, and found that our language-restricted search
strategy retrieved 80.5% of all citations. After screening of titles and abstracts, 168 articles satisfied the eligibility
criteria for further review and their full-texts were retrieved. After full-text review, 156
articles were excluded for various reasons. Thus, 12 articles were included in the
systematic review (24-35). Details of the selection process are outlined in Fig.1 in
accordance with PRISMA guidelines for reporting of systematic reviews(36).
Included studies
We did not identify any scoring system that was based exclusively on radiographic
criteria. The 12 included studies all involved scoring systems that comprised both clinical
and radiographic features. Table 1 lists the characteristics of the 12 included studies,
containing a total of 5767 participants. The median number of TB suspects included in
the studies was 283 (interquartile range 177 to 431).
Six studies included all patients suspected of having TB (24-29), five studies included
patients suspected of having TB who were found to have negative sputum smears (30-34),
and one study excluded patients with HIV/AIDS(35). Nine studies were performed
in high-income countries. Five studies involved radiologists; two studies included
pulmonologists and five studies did not report the background of the radiograph reader.
The demographic characteristics of the patients are shown in Table 2. When reported, the
majority of patients were male. Eleven studies included PLWH, who represented 11% to
61% of eligible patients.
Excluded studies
We identified two studies that were designed with the aim of deriving a clinical-
radiographic scoring system for PLWH among hospitalized patients found to be sputum
smear-negative (37, 38). Both studies satisfied a majority of our inclusion criteria and
found the presence of mediastinal adenopathy and cavities to be significantly associated
with PTB in univariate analysis. However, neither of the studies derived a score for PTB.
The study by Le Minor et al. concluded that the “numbers were insufficient to develop a
score for TB”(38), while the study by Davis et al. stated that “After exhaustive testing,
we were unable to identify any combination of factors which reliably predicted
bacteriologically confirmed tuberculosis”(37).
We also excluded 13 studies that used automated computer-assisted diagnosis (CAD) as
none of these studies used culture as a reference standard, a criterion for inclusion in this
review (39-51). Five studies that involved the grading of chest radiographs were excluded
as these studies were designed to grade the severity of PTB based on the extent of
abnormalities visualized on the chest radiograph, and not the diagnostic accuracy of
scoring systems (52-56). We also excluded three studies that used the Chest Radiograph
Reading and Recording System (CRRS) (11, 57, 58), despite these studies demonstrating
the CRRS tool to have good reliability for features of PTB visualized on a chest
radiograph, as these studies did not use culture as a reference standard.
Assessment of methodological quality
As seen in Table 3, all studies suffered from incorporation bias, as the results of the chest
radiograph and/or the clinical components of the scoring system played a role in the
decision of which patients would be investigated further with culture, the reference
standard. Six (50%) of the total 12 studies also did not include a sample that was
considered representative of the target population. Six (50%) studies were unclear about
whether the person assigning scores to the patients for the various components of the
scoring system was blinded to the results of the reference standard.
Findings
Studies that included all patients who were suspected of having TB
We identified six studies that included all patients suspected of having TB (24-29). All
studies were performed in an inpatient setting. All studies were aimed at deriving optimal
prediction scores to identify patients who were likely to have PTB and require respiratory
isolation (Tables 1 and 2). In univariate analyses, the most common radiographic features
across studies found to be significantly associated with PTB were upper lobe infiltrates
(odds ratio [OR] range, 2.38 to 10.11; pooled OR using a random effects model 6.65,
95% CI 4.42, 10.01; five studies) (Table 4 and Figure 2), and cavities (OR range, 2.11 to
10.08; estimates not pooled due to heterogeneity of effects; three studies) (Table 5 and
Figure 3).
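The pooling step can be illustrated with a short Python sketch of the DerSimonian and Laird random effects estimator and the I-squared statistic. It is an illustration under stated assumptions and not the Stata analysis used in the review. The odds ratios and confidence intervals are those listed in Table 4 for the studies of all TB suspects, the standard errors are back-calculated from the rounded intervals, and one study reported relative risks. For these reasons the output should not be expected to reproduce the pooled estimate reported above. The function name `dersimonian_laird` is hypothetical.

```python
import math


def dersimonian_laird(odds_ratios, ci_bounds, z=1.96):
    """Pool odds ratios on the log scale with the DerSimonian-Laird estimator."""
    log_or = [math.log(o) for o in odds_ratios]
    # Standard error recovered from the width of each 95% CI on the log scale.
    se = [(math.log(hi) - math.log(lo)) / (2 * z) for lo, hi in ci_bounds]
    w_fixed = [1 / s ** 2 for s in se]

    fixed_mean = sum(w * y for w, y in zip(w_fixed, log_or)) / sum(w_fixed)
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(w_fixed, log_or))
    df = len(log_or) - 1

    # I-squared: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Method-of-moments estimate of the between-study variance (tau squared).
    c = sum(w_fixed) - sum(w ** 2 for w in w_fixed) / sum(w_fixed)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_random = [1 / (s ** 2 + tau2) for s in se]
    pooled = sum(w * y for w, y in zip(w_random, log_or)) / sum(w_random)
    se_pooled = math.sqrt(1 / sum(w_random))
    return {
        "pooled_or": math.exp(pooled),
        "ci": (math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled)),
        "i2": i2,
        "tau2": tau2,
    }


# Upper lobe infiltrates, studies of all TB suspects (values as listed in Table 4).
ORS = [5.14, 10.1, 2.38, 7.7, 5.15]
CIS = [(2.56, 10.33), (5.29, 19.3), (0.67, 8.41), (5.9, 10.0), (3.03, 8.75)]

result = dersimonian_laird(ORS, CIS)
if result["i2"] <= 75:  # pooling threshold stated in the Methods
    low, high = result["ci"]
    print(f"Pooled OR {result['pooled_or']:.2f} (95% CI {low:.2f}, {high:.2f}), "
          f"I-squared {result['i2']:.0f}%")
else:
    print(f"I-squared {result['i2']:.0f}%: estimates not pooled")
```

Working on the log scale keeps the confidence interval symmetric around the pooled log odds ratio, which is why the back-transformed interval is asymmetric around the pooled OR.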
The details of the parameters included in the scores and their respective weights are
summarized in Table 6, along with the performance characteristics of the scoring system
and the final rule to aid in decision-making. Studies used several different methods to
derive weights for the scoring system: logistic regression of the parameters found
significant by univariate analysis (three studies); classification and regression tree
(CART) analysis (one study) (25); general regression neural network (GRNN) analysis
(one study) (26); and chi-squared recursive partitioning (one study) (27). All six studies
achieved a sensitivity of the scoring system greater than 80% (median 95%, range, 81%
to 100%). For the five studies that reported specificity data, specificity estimates were
low (median 42%, range 22% to 72%), suggesting a poor rule-in value for PTB.
Figure 4 presents individual study results of sensitivity and specificity estimates (and
their 95% confidence intervals) as forest plots for the four studies that provided
sufficient data. As the scoring systems included varying clinical and radiographic
parameters to calculate the likelihood of PTB, we did not consider these systems to be
homogeneous, and therefore did not perform a meta-analysis of the accuracy estimates of
these scoring systems.
Studies that only included patients suspected of having TB who were found to be sputum
smear negative
We identified five studies in this category (30-34). Four studies were conducted in an
inpatient setting for the purpose of determining a clinical rule for respiratory isolation
(30, 31, 33, 34), while one study was performed in an outpatient setting (32) (Tables 1
and 2). As with the previously described set of studies that included all patients
suspected of having TB, in the univariate analysis, the most common radiographic
features across studies found to be associated with PTB were upper lobe infiltrates (OR
range, 2.47 to 9.07, pooled OR using a random effects model 3.57, 95% CI 2.38, 5.37,
five studies) (Table 4 and Figure 5), and cavities (OR range, 1.97 to 25.66, estimates not
pooled due to heterogeneity of effects, three studies) (Table 5 and Figure 6).
To derive weights for the scoring system, two studies used logistic regression of the
parameters found significant in the univariate analysis, while three studies involved
validation of previous studies. One of the validation studies used bootstrapping, which is
a resampling method aimed at improving the internal validity of the data (31). All studies
achieved a sensitivity of the scoring system greater than 93% (median 96%, range, 93%
to 98%). However, specificity estimates were low (median 35%, range 14% to 50%),
again suggesting a poor rule-in value for PTB. Figure 7 presents individual study results
of sensitivity and specificity estimates (and their 95% confidence intervals) as forest
plots for the four studies that provided sufficient data. As is the case with the above
studies, we did not consider these systems to be homogeneous, and therefore did not
perform a meta-analysis of the accuracy estimates of these scoring systems.
One study, performed in an inpatient setting, excluded PLWH(35) (Tables 2 and 3).
Logistic regression was used to derive weights for the score, but the study also validated
the score derived by Wisnivesky et al.(34) This study found a sensitivity of 97% and
specificity of 42%. The details of the scoring system and its performance characteristics
can be found in Table 6.
Reproducibility
None of the included studies reported data on intra-reporter or inter-reporter
reproducibility.
2.5 DISCUSSION
Chest radiography is an important tool for physicians to assess the probability of active
pulmonary TB among individuals who have symptoms suggestive of disease, and often
the only tool available for assessing this probability among those suspected of having the
disease who are found to be negative on sputum smear examination. In the absence of
newer tests for TB that are universally affordable and accessible, there is a need to
improve existing tests such as chest radiography, which suffers from a lack of
standardization.
We conducted this systematic review with the aim of assessing the diagnostic accuracy of
standardized radiographic scoring systems for the diagnosis of PTB, and whether
standardization improves the performance of the test. Our review failed to find any study
that exclusively relied on radiographic features to derive a score, and all the included
studies had a combination of defined radiographic criteria with different clinical criteria.
Most of the included studies were hospital-based, decision-to-isolate studies. A patient
with PTB can generate up to 44 quanta per hour (one quantum is defined as the
infectious dose)(59), highlighting the necessity for rapid respiratory isolation of patients
with PTB in the hospital setting. Yet, the unnecessary respiratory isolation of patients
considerably increases costs to the healthcare system(60). Scoring systems that improve
the accuracy of the decisions to subject those patients suspected of having PTB to
respiratory isolation can considerably improve the efficiency of healthcare systems and
utilization of resources. The scores developed suffered from low specificity, and had a
high rule-out value (high negative predictive value) but a poor rule-in value (low positive
predictive value) for PTB. However, such scores may still be useful for limiting the
number of patients for whom further investigations would be warranted, especially
among patients who are smear-negative.
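The relationship between sensitivity, specificity, prevalence, and rule-in or rule-out value can be made concrete with a short Python sketch based on Bayes' theorem. The sensitivity and specificity used are the median values reported above for the studies of smear-negative patients (96% and 35%). The prevalence values are hypothetical and chosen only for illustration, as is the function name `predictive_values`.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Median accuracy of the smear-negative scoring systems; prevalences are hypothetical.
for prevalence in (0.05, 0.10, 0.30):
    ppv, npv = predictive_values(0.96, 0.35, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With a highly sensitive but poorly specific score, the negative predictive value stays high across this range of prevalence while the positive predictive value remains modest, which is the pattern described for the included scoring systems.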
The prediction rule developed by Wisnivesky et al was validated in three studies, two of
which were conducted in patients who had negative sputum smears. The scoring system
consistently demonstrated sensitivity higher than 95%, but had poor specificity. As a
rule-out test, this scoring system appears to be validated in multiple studies. The study by
Soto et al(32) was a validation study of a score derived by the same research group in
an earlier study (31). Although the cut-off for the score was modified in the validation
cohort, it performed well in a sub-group of patients with no prior history of TB. However,
we limited the analysis of the performance characteristics of this scoring system to the
population of all patients, as this was the intent of the validation study, and not the post-
hoc analysis of the performance in the selected subgroup.
We identified only one study that assessed clinical-radiographic scoring systems for use
in the out-patient setting. Our systematic review also failed to identify a
clinical-radiographic scoring system that was derived for PLWH suspected of having active PTB.
Bock et al.(24) performed a sub-group analysis in PLWH, but found no radiographic
feature to be significantly associated with active PTB in this sub-group, a finding that is
consistent with the atypical nature of radiographic manifestations of PTB described
among PLWH (61).
Automated computer-assisted diagnosis (CAD) employs techniques such as texture
analysis for reading chest radiographs, and appears to be a promising modality for
standardizing and improving the diagnostic performance of digital chest radiography
(62). However, our review suggested a lack of methodologically high-quality studies.
Further development of the field should focus on the validation of such techniques in
larger populations and with a structured epidemiological approach using appropriate
reference standards.
The strength of our systematic review is in the extensive review of the literature, with two
reviewers independently performing the review, and basing every decision on discussion
and consensus. We restricted our search to articles written in English, French and
Spanish, but an assessment for language bias suggested that we included a high
proportion of the available literature. We conducted citation searches of the included
articles and review articles to identify any published study that we may have failed to
include because of the language restriction, but did not identify any such studies.
However, we may have inadvertently failed to include articles of relevance in other
languages, and acknowledge this as a shortcoming of the review. All the included studies
suffered from incorporation bias, as the results of the chest radiograph and/or the clinical
components of the scoring system played a role in the decision of which patients would
be investigated further with TB culture. This may have over-estimated the accuracy of the
scoring systems in relation to culture. Six of the 12 studies also did not include a sample
that was considered representative of the target population (selection bias). Six of the 12
studies were unclear about whether the person assigning scores to the subjects for the
various components of the scoring system was blinded to the results of the reference test.
Selection bias and absence of blinding are features of study design that have been
associated with inflated accuracy estimates(63, 64). These limitations in the quality of the
included studies need to be taken into consideration when drawing conclusions.
2.6 CONCLUSIONS
Our systematic review revealed no scoring system designed to assess the likelihood of
active PTB based exclusively on radiographic features. Measures to create such a system
would help standardize the interpretation of chest radiographs for the diagnosis of active
PTB. The systematic review identified clinical-radiographic scoring systems, most of
which were created with the aim of predicting the likelihood of active PTB among
patients who were to be admitted to hospitals. Such scoring systems are intended for
assessing the need for respiratory isolation of patients in healthcare settings. Although
most of these systems have high sensitivity, they have low specificity for active PTB.
There is a need to derive accurate scoring systems for PLWH and patients evaluated in
out-patient settings, especially in low-resource settings. Technological advances in the
interpretation of chest radiographs, such as CAD, need to be refined and validated in
well-designed studies to assess their utility.
2.7 TABLES AND FIGURES
Table 1. Characteristics of studies included
Study | Country | Setting | No. of eligible TB suspects included (% of eligible participants) | Design | Inclusion criteria | Chest radiograph reader(s) | Reference standard – type of culture

Studies that included all TB suspects
Bock et al (1996)15 | USA | In-patient | 295 (78) | Cross-sectional, retrospective | 1. Patients with active TB; 2. Patients with TB in the differential diagnosis; 3. AFB smears and cultures ordered; 4. HIV+ with abnormal CXR | Radiologist | Solid and liquid
El-Solh et al (1997)16 | USA | In-patient | 286 (100)* | Cross-sectional, retrospective | All isolated patients, based on symptoms, prior history of TB exposure, HIV status, medical and social risk factors, and radiographic findings | Radiologist and pulmonologist | Liquid
El-Solh et al (1999)17 | USA | In-patient | 119 (100)* | Cross-sectional | All patients in whom an AFB smear and culture was requested | Radiologist and pulmonologist | Liquid
Moran et al (2009)18 | USA | In-patient | 2535 (91)* | Cross-sectional | Admission diagnosis of pneumonia or suspected TB | Emergency medicine resident | NR
Mylotte et al (1997)19 | USA | In-patient | 220 (100)* | Cross-sectional, retrospective | All patients in whom an AFB smear and culture was requested by the admitting physician | Not reported | Liquid
Solari et al (2008)20 | Peru | Emergency Department | 345 (70.8) | Cross-sectional | Productive cough for > 1 week, or cough of any duration and 1. Fever > 3 weeks or 2. Weight loss of at least 3 kg in previous month or 3. Night sweats or hemoptysis or differential diagnosis of PTB from attending physician | Internist, internal medicine resident | Solid

Studies that included smear-negative TB suspects
Lagrange-Xelot et al (2010)21 | France | In-patient | 134 (100) | Cross-sectional | Suspected TB, as recommended by French guidelines | Not reported | Liquid
Soto et al (2008)22 | Peru | In-patient | 262 (100) | Cross-sectional | Cough >= 1 week AND one or more of the following: 1. Fever; 2. Weight loss >= 4 kg in 1 month; 3. Breathlessness; 4. Constitutional symptoms (malaise or hyporexia for a minimum of 2 months) | Not reported | Solid
Soto et al (2011)23 | Peru | Out-patient | 663 (96.9) | Cross-sectional | Cough >= 2 weeks AND one or more of the following: 1. Fever; 2. Weight loss; 3. Breathlessness | 1. General practitioner; 2. TB specialist; tie breaker: experienced radiologist | Solid, liquid or concentrated smear
Wisnivesky et al (2000)25 | USA | In-patient | 112 (100) | Case-control | Cases - isolated TB patients; controls - randomly selected from a log of patients who submitted smears and cultures, matched on age (+/- 3 years), sex and year of presentation, 3 smears negative, culture negative and isolated in a hospital | 1. Radiologist; 2. Radiologist | Solid and liquid
Wisnivesky et al (2005)24 | USA | In-patient | 516 (100) | Cross-sectional | Patients admitted and isolated because of suspicion of PTB | Not reported | Solid and liquid

Study that included only HIV-uninfected patients
Rakoczy et al (2008)26 | USA | In-patient | 280 (100)* | Case-control | Cases - all TB inpatients; controls - all inpatients placed under airborne precautions with negative smears and cultures, matched with cases on time of admission (+/- 6 days) | Not reported | Not reported

*Studies had derivation and validation cohorts. The number of TB suspects represents those in the validation cohorts
Table 2. Demographic characteristics of subjects in the included studies
Study | Age (years) | No. of males (%) | No. of persons living with HIV (%) | Patients with active TB (%)

Studies that included all TB suspects
Bock et al (1996)15+ | mean 41 | 296 (79) | 230 (61.0) | 53 (14.1)
El-Solh et al (1997)16## | mean (SD): PLWH 36.6 (0.4); non-PLWH 50.4 (1.2) | NR | 316 (56.1) | 47 (8.3)
El-Solh et al (1999)17 | NR | NR | 66 (55.5) | 11 (9.2)
Moran et al (2009)18## | median (IQR) 48 (38-63) | 3567 (63)* | 1058 (20.8) | 224 (4.4)
Mylotte et al (1997)19 | mean (SD) 44 (16) | NR | 129 (59.0) | 8 (3.6)
Solari et al (2008)20 | median 33 | 222 (64.4) | 45 (13.0) | 109 (31.6)

Studies that included smear-negative TB suspects (age reported as mean (SD))
Lagrange-Xelot et al (2010)21 | 43 (14.0) | 94 (70) | 60 (40.0) | 26 (19.0)
Soto et al (2008)22 | NR | 166 (63.4) | 28 (10.9)** | 27 (10.3)
Soto et al (2011)23 | 41.4 (17.2) | 370 (55.8) | 98 (24.0)# | 184 (27.8)
Wisnivesky et al (2000)25 | cases – 40 (2); controls – 40 (2) | 82 (73.2) | NR | 56 (50)
Wisnivesky et al (2005)24 | 46.3 (11.9) | 285 (55.2) | 362 (70.0) | 19 (3.7)

Study that excluded PLWH
Rakoczy et al (2008)26 | cases – 60; controls – 51.8 | cases - 33 (67); controls – 29 (59) | 0 | 33 (11.8)

IQR, interquartile range; NR, not reported; PLWH, persons living with HIV
* Sex not documented in 1.4% patients
** 6 patients refused testing
# 255 patients refused testing
## The demographic characteristics represent those of the included subjects in the
combined derivation and validation cohort
+ The demographic characteristics represent those of the eligible subjects
Table 3. Quality assessment of the included studies using the QUADAS tool
Studies assessed (columns): Bock et al (1996)15; El-Solh et al (1997)16; El-Solh et al (1999)17; Moran et al (2009)18; Mylotte et al (1997)19; Solari et al (2008)20; Lagrange-Xelot et al (2010)21; Soto et al (2008)22; Soto et al (2011)23; Wisnivesky et al (2000)25; Wisnivesky et al (2005)24; Rakoczy et al (2008)26
QUADAS items (rows):
Representative sample?
Acceptable reference standard?
Acceptable delay?
Partial verification avoided?
Differential verification avoided?
Incorporation avoided?
Reference standard blinded?
Index results blinded?
Relevant clinical information available?
Uninterpretable results reported?
Withdrawals explained?
Legend: Yes, No, Unclear
[The item-by-study ratings were presented as shaded cells and are not reproduced here.]
Table 4 Association of upper lobe opacities visualized on the chest radiograph and
active pulmonary TB
Study | OR | 95% CI

Studies that included all TB suspects
Bock et al (1996)15 | 5.14 | 2.56, 10.33
El-Solh et al (1997)16* | 10.1 | 5.29, 19.3
El-Solh et al (1997)16*, when adenopathy also present | 18 | 5.47, 59.36
El-Solh et al (1999)17 | 2.38 | 0.67, 8.41
Moran et al (2009)18**,# | 7.7 | 5.9, 10
Solari et al (2008)20 | 5.15 | 3.03, 8.75

Studies that included smear-negative TB suspects
Lagrange-Xelot et al (2010)21 | 3.83 | 1.52, 9.6
Soto et al (2008)22** | 4.81 | 1.93, 11.92
Soto et al (2011)23** | 2.47 | 1.71, 3.57
Wisnivesky et al (2000)25+ | 9.07 | 2.5, 32.9
Wisnivesky et al (2005)24+ | 3.96 | 1.57, 9.98

* reported “upper zone disease”
** reported “apical infiltrates”
+ reported “upper lobe consolidation”
# The study reported relative risks (RR) as the measure of association
Table 5 Association of the presence of a cavity visualized on the chest radiograph
and active pulmonary TB
Study | OR | 95% CI

Studies that included all TB suspects
Bock et al (1996)15 | 4.75 | 1.43, 15.74
Moran et al (2009)18* | 7.68 | 5.88, 10.05
Solari et al (2008)20 | 2.11 | 1.25, 3.55

Studies that included smear-negative TB suspects
Lagrange-Xelot et al (2010)21 | 25.66 | 6.42, 102.69
Soto et al (2008)22 | 1.97 | 0.76, 4.87
Wisnivesky et al (2000)25+ | 2.04 | 0.7, 5.96

*The study reported relative risks (RR) as the measure of association
Table 6 Details and performance characteristics of included scoring systems
For each study: features, test details, rule, and performance [sensitivity % (95% CI)#; specificity % (95% CI)#; PPV % (95% CI)#; NPV % (95% CI)#; area under the curve (SE)]

Studies that included all TB suspects

Bock et al (1996)15
Test details: logistic regression, OR (95% CI)
Features:
1. CXR with upper lobe infiltrate: 5 (2.38,10.51)
2. CXR with cavity: 3.93 (1.06,14.62)
3. Knew someone with active TB: 2.42 (1.1,5.32)
4. Self-report of positive tuberculin skin test in past: 5.67 (1.57,22.01)
5. Self-report of isoniazid prophylaxis therapy in the past: 0.18 (0.04,0.82)
Rule: Any of 1-3 or 4 (in the absence of 5) considered test positive
Performance: sensitivity 81; specificity 26

El-Solh et al (1997)16
Test details: used classification and regression tree analysis
Features: Upper zone disease; weight loss; diabetes mellitus
Rule: Upper zone disease absent and fever absent = test negative. Upper zone disease absent, but fever present = test negative if no weight loss and CD4+ > 200. Everyone else to be considered test positive
Performance: sensitivity 100 (78,100); specificity 50 (44,57); PPV 22; NPV 100; area under the curve 0.878 (0.029)

El-Solh et al (1999)17
Test details: used general regression neural network
Features:
Age, CD4+ counts, diabetes mellitus, HIV, tuberculin skin test positivity
Chest pain, weight loss, cough, night sweats, fever, shortness of breath
Upper lobe infiltrate, lower lobe infiltrate, upper lobe cavity, lower lobe cavity, adenopathy, unilateral pleural effusion, bilateral pleural effusion, pleural thickening, miliary pattern, normal
Performance: sensitivity 100 (91,100); specificity 72 (65,77); area under the curve 0.947 (0.028)

Moran et al (2009)18
Test details: used chi-squared recursive partitioning
Features:
1. Apical infiltrate
2. Cavitation
3. Immigrant
4. Weight loss
5. Positive TB history
6. Homeless
7. Incarcerated
Rule: Any of 1-7 present = test positive
Performance: sensitivity 96; specificity 49; PPV 8; NPV 100

Mylotte et al (1997)19
Test details: OR (95% CI) and score per feature
Features:
1. AFB positive smear: OR 5.8 (3-11); score 3
2. Localized CXR change: OR 2.5 (1.3,4.9); score 2
3. Correctional facility residence: OR 2.3 (1.2,4.4); score 2
4. History of weight loss: OR 1.8 (1,3.2); score 1
Rule: Score > 3 = test positive
Performance: sensitivity 88; specificity 63.2; PPV 8; NPV 99; area under the curve 0.86 (0.04)

Solari et al (2008)20
Test details: OR (95% CI) and score per feature
Features:
Age: OR 0.97 (0.96,0.99); score 0 for age < 35, -1 for age 35-60, -2 for age > 61
Weight loss: OR 2.79 (1.51,5.18); score 5
History of PTB: OR 0.51 (0.28,0.95); score -3
Miliary pattern: OR 8.04 (2.79,23.16); score 10
Cavity: OR 2.54 (1.4,4.62); score 5
Upper lobe infiltrate: OR 5.64 (3.2,9.93); score 9
Rule: Score > 3 = test positive
Performance: sensitivity 93; specificity 42; PPV 43; NPV 93; area under the curve 0.809 (0.05)

Studies that included smear-negative TB suspects

Lagrange-Xelot et al (2010)21 (validation of score developed by Wisnivesky et al25)
Features and scores:
TB risk factors or chronic symptoms: 4
Self-report of positive tuberculin skin test in past: 5
Shortness of breath: -3
Temperature < 38.5 deg C: 0
Temperature 38.5 - 39 deg C: 3
Temperature > 39 deg C: 6
Crackles on physical examination: -3
Upper lobe disease on CXR: 6
Rule: > 1 = test positive
Performance: sensitivity 96.2; specificity 21.3; PPV 23; NPV 96

Soto et al (2008)22
Test details: OR (95% CI) and score per feature
Features:
Hemoptysis: OR 3.24 (1.11,9.22); score 2
Weight loss: OR 2.35 (0.86,6.43); score 1
Age > 45: OR 2.01 (1.01,3.01); score -1
Expectoration: OR 0.35 (0.14,0.9); score -1
Apical infiltrate: OR 4.29 (1.7,10.86); score 3
Miliary infiltrate: OR 9.31 (2.21,39.24); score 4
Rule: < 0 = low probability; > 4 = high probability
Area under the curve: 0.83 (0.07)
Performance values as listed in the source, in order: at a cut-off < 0; at a cut-off of > 2; 93; 50; at a cut-off > 4; 92

Soto et al (2011)23
Features: same as above – validation study
Rule: > 5 = high probability
Performance at a cut-off < 0: sensitivity 97.8 (94.5,99.4); specificity 14 (11,17.4)
Performance at a cut-off > 5: sensitivity 23.9 (17.9,30.7); specificity 93.1 (90.5,95.2)

Wisnivesky et al (2000)25
Test details: OR (95% CI) and score per feature
Features:
TB risk factors or chronic symptoms: OR 7.9 (4.4,24.2); score 4
Self-report of positive tuberculin skin test in past: OR 13.2 (4.4,40.7); score 5
Shortness of breath: OR 0.2 (0.1,0.5); score -3
Temperature: OR 2.8 (1.1,8.3); score 0 for < 38.5 deg C, 3 for 38.5 - 39 deg C, 6 for > 39 deg C
Crackles on physical examination: OR 0.3 (0.1,0.5); score -3
Upper lobe disease on CXR: OR 14.6 (3.7,57.5); score 6
Rule: > 1 = test positive
Performance: sensitivity 98 (95,100); specificity 46 (33,59); PPV 3.3*; NPV 99.9*

Wisnivesky et al (2005)24
Features: same as above
Performance: sensitivity 95 (74,100); specificity 35 (31,40); PPV 9.6; NPV 99.7

Study that excluded PLWH

Rakoczy et al (2008)26
Test details: OR (95% CI) and score per feature
Features:
Chronic symptoms: OR 10.21 (2.95,35.4); score 6
Immunosuppression* (other than HIV): OR 8.14 (2.08,31.8); score 4
Foreign birth: OR 7.01 (2.1,23.8); score 2
CXR upper zone findings: OR 5.28 (1.6,17.2); score 2
Shortness of breath: OR 0.13 (0.04,0.45); score -2
Rule: > 4 = test positive
Performance: sensitivity 97; specificity 42; PPV 61; NPV 95
Validation of the score developed by Wisnivesky: sensitivity 96; specificity 18

*Calculated assuming a prevalence of 2%
# 95% CIs are included when the published study either provided these, or the data were sufficient for the CIs to be calculated
CXR, chest x-ray; NPV, negative predictive value; PPV, positive predictive value
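To illustrate how a weighted clinical-radiographic score of this kind is applied, the following Python sketch encodes the weights and cut-off listed in Table 6 for the score developed by Wisnivesky et al. It is a sketch only. The dictionary keys, function names, and example patients are hypothetical, and the handling of a temperature of exactly 39 deg C is an assumption, as the table does not specify it.

```python
# Weights as listed in Table 6 for the Wisnivesky et al score.
WEIGHTS = {
    "tb_risk_factors_or_chronic_symptoms": 4,
    "self_reported_positive_tst": 5,
    "shortness_of_breath": -3,
    "crackles_on_examination": -3,
    "upper_lobe_disease_on_cxr": 6,
}


def temperature_points(temp_c):
    """Points for temperature: < 38.5 scores 0, 38.5 to 39 scores 3, > 39 scores 6."""
    if temp_c < 38.5:
        return 0
    if temp_c <= 39.0:  # assumption: exactly 39 deg C falls in the middle band
        return 3
    return 6


def total_score(findings, temp_c):
    """Sum the weights of the findings that are present, plus the temperature points."""
    points = sum(WEIGHTS[name] for name, present in findings.items() if present)
    return points + temperature_points(temp_c)


def is_test_positive(score, cutoff=1):
    """Rule from Table 6: a score greater than 1 is considered test positive."""
    return score > cutoff


# Hypothetical patients, for illustration only.
patient_a = {"upper_lobe_disease_on_cxr": True, "shortness_of_breath": True}
patient_b = {"shortness_of_breath": True, "crackles_on_examination": True}

for label, findings in (("A", patient_a), ("B", patient_b)):
    score = total_score(findings, temp_c=37.0)
    print(f"Patient {label}: score {score}, test positive: {is_test_positive(score)}")
```

Negative weights for shortness of breath and crackles lower the score, so a patient with only those findings falls below the cut-off, while upper lobe disease on the chest radiograph alone is enough to exceed it.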
Figure 1. PRISMA flow diagram for included and excluded studies
Identification
Records identified through database searching (n = 11599): MEDLINE n = 3586; EMBASE n = 5740; Web of Science n = 2273
Additional records identified through other sources (n = 308)
Records after duplicates removed (n = 8066)
Screening
Records screened (n = 8066)
Records excluded (n = 7916)
Eligibility
Full-text articles assessed for eligibility (n = 168)
Full-text articles excluded (n = 156):
Not a scoring system – 44
Pediatric study – 35
Reference standard not satisfied – 34
Specific features of CXR not analyzed – 16
Review – 10
Other language – 3
Screening of asymptomatic patients – 3
Not M.TB – 3
Abstract – 2
Duplicate – 2
Editorial/commentary – 2
Article not obtained – 4
Included
Studies included in qualitative synthesis (n = 12)
Studies included in quantitative synthesis (meta-analysis) (n = 5)
Figure 2. Diagnostic odds ratio for active pulmonary TB among all TB suspects with
an upper lobe infiltrate visualized on the chest radiograph
The size of the black squares corresponds to the relative weight that was assigned to each study.
The diamond represents the pooled estimate for the diagnostic odds ratio.
The blue lines represent the confidence intervals around the respective estimates.
Pooling was performed using the DerSimonian and Laird random effects method
Figure 3. Diagnostic odds ratio for active pulmonary TB among all TB suspects with
a cavity visualized on the chest radiograph
The size of the black squares corresponds to the relative weight that was assigned to each study.
The blue lines represent the confidence intervals around the respective estimates.
Pooling was not performed as there was significant heterogeneity of effects (I² = 93%)
Figure 4. Scoring systems for studies that included all TB suspects. The figures show
the estimated sensitivity (A) and specificity (B) of the study (black square) and its
95% confidence interval (blue horizontal line).
Figure 5. Diagnostic odds ratio for active pulmonary TB among smear-negative TB
suspects with an upper lobe infiltrate visualized on the chest radiograph
The size of the black squares corresponds to the relative weight that was assigned to each study.
The diamond represents the pooled estimate for the diagnostic odds ratio.
The blue lines represent the confidence intervals around the respective estimates.
Pooling was performed using the DerSimonian and Laird random effects method.
Figure 6. Diagnostic odds ratio for active pulmonary TB among smear-negative TB
suspects with a cavity visualized on the chest radiograph
The size of the black squares corresponds to the relative weight that was assigned to each study.
The blue lines represent the confidence intervals around the respective estimates.
Pooling was not performed as there was significant heterogeneity of effects (I² = 88%)
Figure 7. Scoring systems for studies that included smear-negative TB suspects. The
figures show the estimated sensitivity (A) and specificity (B) of the study (black
square) and its 95% confidence interval (blue horizontal line).
CHAPTER 3: Bridging chapter: The need for a chest radiograph
scoring system for the diagnosis of pulmonary tuberculosis
The systematic review was conducted with the aim of:
a) identifying scoring systems that had been designed and used for the diagnosis of TB
using radiographic features visualized on the chest radiograph and assessing their
performance characteristics, and
b) identifying features that were consistently found across studies to be associated with
PTB that would help in deciding a priori which features we would include in the analysis
for the derivation of a scoring system.
Our systematic review found no scoring system designed to assess the likelihood of
active PTB based exclusively on radiographic features. Although we found 12 studies
that analyzed scoring systems that included radiographic features, all of these were
clinical-radiographic scoring systems.
Three caveats need to be acknowledged while interpreting the results of the systematic
review:
Firstly, 11 of the 12 studies were conducted among inpatients in hospitals, a population in
whom the manifestations of the disease are likely to be subject to a spectrum bias as
compared to outpatients(65).
Secondly, 9 of the 12 studies were conducted in high-income, low TB burden settings,
and the radiographic manifestations of the disease, and pre-test probabilities are likely to
be different in such settings as compared to low-income, high-burden settings(66).
Lastly, the purpose of most of the studies was to assess the likelihood of TB for purposes
of respiratory isolation in hospitals, and the derivation of such scores could have different
aims (and consequently different cut-offs) from scoring systems derived with diagnostic
purposes in mind.
We therefore concluded that there was a need to derive, for diagnostic purposes, a scoring
system that relied exclusively on radiographic features visualized on the chest radiographs
of outpatients in low-income, high-burden settings whose clinical features were
consistent with the disease.
At the University of Cape Town (Cape Town, South Africa), a parent prospective study
(TB-NEAT) was conducted to evaluate several TB diagnostic tests and their contributions
to the diagnosis of active TB in an HIV-endemic setting (67-69). The study consecutively
recruited outpatients with suspected pulmonary TB at two primary care clinics over a 3-
year period. The study involved the documentation of abnormalities visualized on the
chest radiographs of all subjects by two readers who independently read the radiographs
using a standardized validated tool, the CRRS. The large number of subjects involved in
a study conducted in a low-income, high HIV and TB burden outpatient setting provided
an ideal setting for the derivation of a chest radiograph scoring system.
The radiographic features that were found to be consistently associated with PTB in the
systematic review were cavities, upper lobe opacities, unilateral pleural effusions, and hilar
and/or mediastinal adenopathy, and these were included in the univariate analysis for their
association with PTB for the derivation of the score.
We aimed to use the knowledge gained from the literature review to inform the derivation
of a score that we hoped could have significant implications in the diagnosis of the
disease, and we present the derivation of the score in the following chapter.
CHAPTER 4: Development of a reliable and simple radiographic
scoring system to aid the diagnosis of pulmonary tuberculosis
(MANUSCRIPT 2)
3.1 ABSTRACT
Rationale: In tuberculosis (TB) suspects whose sputum smears are AFB negative, chest
radiography is often the only alternative diagnostic tool available. However, chest
radiographs lack specificity for TB, and their interpretation is subjective and not
standardized, and therefore not reproducible. Efforts to improve the interpretation of
chest radiography are warranted.
Objectives: To derive a scoring system to aid the diagnosis of PTB, using features noted
on the Chest Radiograph Reading and Recording System (CRRS).
Methods: Chest radiographs of outpatients with suspected PTB, consecutively-recruited
over 3 years at clinics in South Africa, were read by two independent readers using
CRRS. Multivariable analysis was used to identify features significantly associated with
culture-positive active pulmonary TB, and these were weighted and used to generate a
score.
Results: 473 patients were included in the analysis. Large upper lobe opacities, cavities,
unilateral pleural effusion and adenopathy were significantly associated with PTB, had
high inter-reader reliability, and received 2, 2, 1 and 2 points, respectively in the final
score. Using a cut-off of ≥ 2, the score had a high negative predictive value (92%, 95% CI
87,95). Among TB suspects with negative sputum smears, 214 of the 229 patients who
scored below the cut-off did not have active disease (NPV 93%; 95% CI 89,96).
Conclusions: The derived scoring system is a simple and reliable tool that is useful for
ruling out active PTB in smear-negative patients, thus potentially reducing the need for
further testing in high burden settings. Validation studies are now warranted.
3.2 INTRODUCTION
Despite the fact that tuberculosis (TB) is often curable, the disease continues to be a
major problem globally (1). Existing diagnostic tests for the disease are limited in their
scope, either due to performance limitations of traditional tests (70), or due to the cost of
more accurate tests such as liquid culture and nucleic acid amplification-based tests such
as Xpert MTB/RIF (67, 71), often making these newer tests inaccessible to clinics in
developing countries. As a result, TB case detection rates remain low worldwide (1).
Sputum smear microscopy and chest radiography have well recognized limitations, but
are often the only diagnostic tools available for clinicians in resource-limited settings.
Therefore, efforts are necessary to optimize and improve their performance. While much
work has been done to optimize sputum microscopy using strategies such as light-emitting
diodes (72), sputum concentration techniques (73), and same-day diagnosis (74),
similar work is needed to improve the diagnostic accuracy of chest radiography for TB.
Most studies assessing the accuracy of chest radiography use subjective assessments of
the probability of TB by readers. As a result, despite being a sensitive test for the
diagnosis of active TB among patients with chest symptoms (75), the use of chest
radiography for mass screening of individuals for TB has been limited by high inter- and
intra-observer variability, thereby affecting the accuracy and reproducibility of the test
(4). In our review of the literature, it was evident that certain features noted on a chest
radiograph, such as apical infiltrates and cavities, are known to be highly suggestive of
active TB (24-35, 37, 38). However, to the best of our knowledge, there is no simple
scoring system that combines the systematic assessment of each of the relevant visualized
features with quantification of these parameters to generate an exclusively radiographic
score that predicts the likelihood of active pulmonary TB.
The Chest Radiograph Reading and Recording System (CRRS) is a system developed to
standardize the reading of chest radiographs in epidemiological studies of TB and lung
disease (76). The system has been validated in high-burden TB settings for documenting
the various features visualized on a chest radiograph, and these studies suggest that it may
be a useful tool to improve inter- and intra-reader reliability (77, 78). However, the
CRRS, although validated and shown to be reliable, is an epidemiological research tool and a
radiology training tool. Chest radiographs are systematically read by working through a
long predefined checklist (a copy of the form can be found in Appendix 6.3), culminating
in a trained reader's subjective assessment of the likelihood of active TB. Our study aims
to refine the system further and improve its diagnostic utility, by deriving a weighted
radiographic scoring system that can aid in the rapid, yet reliable reading of chest
radiographs, making the test easy to use for health care and research workers in high
burden settings. Moreover, we sought to investigate how this knowledge could be
integrated with clinical practice.
3.3 METHODS
Study subjects and data collection:
At the University of Cape Town (Cape Town, South Africa), a parent prospective study
(TB-NEAT) was conducted to evaluate several TB diagnostic tests and their contributions
to the diagnosis of active TB in an HIV-endemic setting (67-69). The study consecutively
recruited outpatients with suspected pulmonary TB at two primary care clinics over a 3-
year period.
To qualify as a TB suspect, an individual had to present to the hospital with at least two
of the following symptoms: cough for ≥ 2 weeks, haemoptysis, fatigue, night sweats,
fever for ≥ 2 weeks, weight loss, loss of appetite, or being bedridden (one of the
symptoms, if the patient was HIV-infected). Only patients ≥ 18 years were enrolled into
the study. After giving written informed consent, all patients underwent diagnostic
testing, which included two sputum samples evaluated by concentrated smear
microscopy, two sputum cultures using the MGIT 960 liquid culture system (BD
Diagnostic Systems, Sparks, MD, USA), chest radiography, standardized interferon-
gamma release assays, HIV testing, and CD4 T cell count for those who were HIV-
infected. Epidemiological data were captured by a trained interviewer-administered
questionnaire, which was completed by all patients.
CRRS training, and reading of the chest radiographs:
The CRRS training course is held bi-annually at the University of Cape Town Lung Institute in
Cape Town (see http://www.lunginstitute.co.za/content/talks.html). The course
involves a two-day programme of interactive training using standard chest radiographs.
On the first day of the course attendees are instructed on chest anatomy and disease
presentation, and a standardized approach to identifying radiological abnormalities is
introduced. On the second day attendees read archived radiographs using the structured
CRRS form (Appendix 6.3) and consolidate their understanding about the detail required
for standardized reporting. The form takes approximately 6 to 10 minutes to complete.
On the third day an examination using 24 standardized radiographs is undertaken, and
trainees are awarded either "A" or "B" grade accreditation based on their interpretation
of this examination set. For the study, chest radiographs were read by two
independent readers (pulmonology fellows in the Department of Medicine), trained in
CRRS, and their findings were recorded on a computerized form. The CRRS involves the
use of a systematic checklist that details abnormal features visualized on a chest
radiograph, such as opacities and cavities. After a clinician reads the chest radiograph
with the assistance of the checklist provided by the CRRS, he/she is asked to gauge the
likelihood of active PTB. However, this assessment of the likelihood of active disease is
subjective, and relies on the overall impression of the reader, guided by the checklist.
Discrepancies, where appropriate, were resolved through reading by a third senior reader
(a faculty pulmonologist in the department). Where a consensus read was not possible,
the read from the senior reader was used in the analysis. Radiographs that were taken ≥ 3
months after the study entry date were discarded, as these did not represent the
radiographic presentation at the time of symptomatic manifestations and bacteriological
evidence of the disease.
Ethics approval:
The study was approved by the University of Cape Town's Health Sciences Faculty
Research Ethics Committee (REC REF 421/2006) and the McGill University Faculty of
Medicine ethics committee (Study no. A11-E69-11B).
Derivation of the score:
The CRRS includes a detailed description of the features visualized on the chest
radiograph with the aim of documenting these features for epidemiological surveys. For
the purpose of the diagnostic use of the tool, the identification of variables from the
CRRS to be included in the analysis was guided by a review of the literature (24-35, 37,
38). Features visualized on the chest radiograph suggestive of active TB that were
consistent across the reviewed studies were the preferential distribution of lesions in the
upper lobes of the lungs, the specificity of cavities for the disease, and the specificity of
unilateral pleural effusions when compared to bilateral effusions for the disease (24-35,
37, 38). These factors were taken into consideration for the a priori selection of variables.
Clinical characteristics that were known to influence the interpretation of the radiograph,
independent of disease status, were also analyzed. These characteristics included age, sex,
smoking status, HIV status and past history of TB.
The outcome of interest was the presence of active pulmonary TB, defined as the growth
of M. tuberculosis on at least one culture. Patients with two negative cultures were
classified as having a final culture-negative result. Similarly, a patient with two negative
sputum smears was classified as having a smear-negative status.
A univariate analysis was performed to identify significant associations (we used a liberal
threshold of P<0.2 for statistical significance, as we were exploring variables to include
in the multivariable analysis at this stage) between the pre-defined radiographic and
clinical features and the outcome of interest. Chi-square tests were used for categorical
radiographic variables and t-tests were used for continuous variables.
Variables found to be significant in the univariate analysis, or which were identified a
priori by the literature review were entered into a multivariate logistic regression model.
Factors found to be independent predictors of the outcome (P<0.05) were selected for the
final model, and a stepwise backward elimination process was employed, using the
likelihood ratio test (79), to eliminate variables that did not significantly contribute to the
model. Although HIV status was not significantly associated with disease, we adjusted
for it in the final model, as HIV is known to alter the radiographic presentation of active
TB.
We assigned scores to each radiographic feature found to be an independent predictor of
outcome in the final model, weighted according to the beta-coefficients from the final
multivariate logistic model. Weights were rounded up to the nearest integer.
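As an illustration of this weighting step, the sketch below recomputes the weights from the multivariable odds ratios reported in Table 3. It assumes that each beta-coefficient equals the natural logarithm of the corresponding odds ratio (the coefficients themselves are not listed in this chapter); the dictionary keys and the function name are illustrative only.

```python
import math

# Multivariable odds ratios as reported in Table 3.
# Assumption: beta-coefficient = ln(odds ratio).
multivariable_or = {
    "UL large opacity": 4.18,
    "cavity": 3.95,
    "unilateral pleural effusion": 2.08,
    "adenopathy": 3.75,
}


def weight_from_or(odds_ratio):
    """Round the log odds ratio up to the nearest integer."""
    return math.ceil(math.log(odds_ratio))


weights = {name: weight_from_or(o) for name, o in multivariable_or.items()}
print(weights)
```

Under this assumption the rounded-up coefficients give 2, 2, 1 and 2 points, in agreement with the scores listed in Table 3.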
Data analysis:
The various major criteria in the CRRS were analyzed for inter-reader reliability between
the two junior readers, and a kappa statistic (the chance-adjusted measure of inter-reader
agreement, defined as the ratio of the actual agreement beyond chance to the potential
agreement beyond chance) was calculated (80). Based on the weights
assigned to the four radiographic features found to be significantly associated with active
pulmonary TB, we calculated a total score for each patient's radiograph and analyzed the
performance characteristics (sensitivity, specificity and predictive value) of the score at
various cut-points for the diagnosis of culture-confirmed active TB. We also analyzed the
performance of the score in the subset of patients with smear-negative pulmonary TB,
and among patients who were HIV-infected. Data were analyzed using STATA version
11.0 (Stata Corp, College Station, Texas, USA).
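The kappa statistic described above can be sketched for a single dichotomous feature read by two readers. This is a minimal illustration, not the analysis code used in the study (which was run in STATA); the function name and the 2x2 counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cohen_kappa(both_present, reader1_only, reader2_only, both_absent):
    """Cohen's kappa for two readers rating one feature as present/absent."""
    n = both_present + reader1_only + reader2_only + both_absent
    observed = (both_present + both_absent) / n
    # Marginal proportion of "present" calls for each reader.
    p1 = (both_present + reader1_only) / n
    p2 = (both_present + reader2_only) / n
    # Agreement expected by chance alone.
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    # Actual agreement beyond chance / potential agreement beyond chance.
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 100 radiographs (not study data).
kappa = cohen_kappa(both_present=40, reader1_only=10,
                    reader2_only=5, both_absent=45)
print(round(kappa, 2))
```

With these hypothetical counts the observed agreement is 0.85 and the chance-expected agreement is 0.50, so kappa is 0.35 / 0.50 = 0.70.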
3.4 RESULTS
Demographic characteristics of subjects:
Of 645 patients recruited into the parent study, 473 patients were included in the final
analysis. As outlined in Figure 1, the major reasons for exclusion were inability to
produce sputum, contaminated sputa, missing chest radiographs, and the chest radiograph
being read solely by one reader. As seen in Table 1, there were no significant differences
in the demographic features of the patients who were included and those who were
excluded from the analysis.
The mean age of the patients was 39.2 years (SD 12.1). Sixty-seven (14.2%) patients refused
HIV testing. Among those tested, 121 patients (25.6% of all included patients) were found to
be HIV-infected (median CD4+ cell count = 185/cu.mm among the 115 patients with
available CD4+ counts). Culture-confirmed active TB was found in 138 (29.2%) of the patients
with suspected TB, and 91 (19.2%) of all TB suspects were positive on sputum smear microscopy.
Reliability of the CRRS:
The inter-reader reliability of the CRRS for various major categories is summarized in
Table 2. The kappa-statistic ranged from moderate (0.56) for small opacities to excellent
(0.77) for pleural effusions. The kappa-statistic for the overall judgment on whether the
reader considered the features of the chest radiograph to be consistent with active TB was
0.52 (S.E.0.05).
Development of the score:
Results of the univariate analysis of the various chosen radiographic and clinical criteria
are summarized in Table 3. A backward elimination process was attempted, but all the
variables initially selected for inclusion in the multivariable model were found
significant, and were therefore retained in the final analysis. The final model adjusted for
age and HIV status, but these were not included in the score, as the aim was to develop a
radiograph-based score. Based on the beta-coefficients of the variables of the
multivariable logistic regression, scores were assigned to the individual radiographic
features. The results of the multivariate analysis and scores assigned to the variables in
the final model are summarized in Table 3.
Performance of the score:
The score thus developed was tested at different cut-offs, the results of which are shown
in Table 4. At a cut-off of ≥ 2, the score had a high negative predictive value (92%,
95%CI 87,95), and misclassified 20 of 138 patients with active TB. At this cut-off, using
the score was associated with a better specificity (64%, 95% CI 59,69) than the subjective
assessment of the probability of TB by the readers (28%, 95% CI 23,33). This was
accompanied by a loss in sensitivity that was not statistically significant (86%, 95% CI
79,91 vs 93%, 95% CI 88,97) (Table 4). The gain in specificity at higher scores was
accompanied by appreciable losses in sensitivity.
In sputum smear-negative patients, at the same cut-off, the test had a good rule-out value
(NPV 93%, 95% CI 89,96): of the 229 smear-negative patients who scored below the cut-off,
214 did not have active disease. The performance of the score in smear-negative TB suspects is
shown in Table 5. The score had a better negative predictive value for HIV-uninfected
individuals (92%, 95% CI 86,96) than for HIV-infected individuals (86%, 95% CI
75,94) (Table 6), although the difference was not statistically significant (p=0.21).
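The headline figures at the cut-off of ≥ 2 can be checked with simple arithmetic from the counts quoted in the text. The sketch below is illustrative only; the function names are not from the study, and it reads the "214 of 229" figure as 214 true negatives among 229 smear-negative patients with a score below the cut-off.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of patients with disease whom the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def negative_predictive_value(true_neg, false_neg):
    """Proportion of test-negative patients who do not have disease."""
    return true_neg / (true_neg + false_neg)


# All patients: 138 had culture-confirmed TB and 20 of them were misclassified.
sens_all = sensitivity(true_pos=138 - 20, false_neg=20)

# Smear-negative patients: 229 scored below the cut-off, 214 without disease.
npv_smear_neg = negative_predictive_value(true_neg=214, false_neg=229 - 214)

print(f"Sensitivity, all patients: {100 * sens_all:.1f}%")
print(f"NPV, smear-negative patients: {100 * npv_smear_neg:.1f}%")
```

These proportions round to the 86% sensitivity and 93% negative predictive value reported for this cut-off.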
3.5 DISCUSSION
Although there are diagnostic scoring systems for pulmonary TB that rely on a
combination of clinical and radiographic features (24-35, 37, 38), there is currently no
validated scoring system that relies exclusively on radiographic features. Such a scoring
system, if simple and rapid, would be invaluable for the management of TB suspects,
especially for clinicians practicing in busy, resource-poor settings. The system would be
particularly useful in the algorithm for the management of smear-negative TB suspects. It
could also serve as a valuable field research tool. This study attempts to derive such a
scoring system that is simple and reliable, and has improved accuracy in a high TB and
HIV burden setting, using CRRS, a standardized reading and recording system.
The features that were found to be significant in the multivariable model (upper lobe
opacities, cavities, unilateral effusions and adenopathy) are consistent with the reported
literature (24-35, 37, 38), and validate the use of these features to predict active TB. The
derived scoring system relies on 4 features (UL opacities, cavitation, unilateral effusion
and adenopathy), giving a maximum score of 7 points (2 + 2 + 1 + 2). The major advantage over the
existing CRRS system is the quantification of disease risk (as opposed to a subjective
impression). Another attractive feature of the scoring system is its potential simplicity.
Although the present study involved readers who were pulmonology fellows subjected to
rigorous training for the use of the CRRS as a research and epidemiological tool, training
for the recognition of the four features identified to be relevant for the score would
potentially be easier, and could make the score a quick and easy tool among general
practitioners in the clinical setting. However, the score would need to be prospectively
tested in such a setting before the results of the present study can be definitively confirmed.
Age was found to be inversely associated with the probability of disease, a finding that is
consistent with a high-burden setting such as South Africa, where active disease shows a
peak incidence in younger ages (81).
We envisage the utility of the score to be most relevant in TB suspects with negative
sputum smears. Such patients comprise ~85% of those presenting to high-burden TB
clinics with suspected TB (~15% of suspects are smear-positive, representing about half of
those with TB). Thus, those with negative smears present a huge service burden to clinics,
adding to health care management costs. Patients with active pulmonary TB who have
negative sputum smears are also known to have adverse outcomes (82). It is thus
important to aggressively investigate such patients for active TB, yet the limitations of
resources often limit the scope of such investigations. A score using an inexpensive test
such as radiography could save significant resources if it were shown to have a high rule-
out value (83). The recent WHO policy on Xpert MTB/RIF (Cepheid, Sunnyvale, CA),
an automated molecular test, for the rapid diagnosis of TB and MDR-TB recommends
chest radiography as one of the strategies to screen individuals prior to the use of Xpert
MTB/RIF, as the cost of screening with Xpert MTB/RIF is very high in most developing
countries (21). In our cohort of patients, the scoring system at a cut-off of ≥ 2 correctly
ruled out disease in 214 smear-negative suspects (45% of symptomatic patients; NPV 93%), thus
potentially avoiding further testing in these patients. An analysis, in the same cohort of
patients, of various strategies to pre-screen patients prior to the use of Xpert MTB/RIF
found that a subjective assessment of chest radiographs (CRRS) was able to accurately
rule out TB in 18% of patients with no false negative results, reflecting the high
sensitivity but poor specificity of such subjective assessments (84).
While the score had a high negative predictive value in HIV-infected patients (86.4%, 95%
CI 75,94), the rule-out value may not be high enough to be clinically useful for this
subgroup, which is known to be at a high risk of progression of disease and death (85). However,
the imprecision of the negative predictive value is a reflection of the small sample size,
and further studies will have to be conducted to assess the utility of the scoring system
in HIV-infected individuals.
The finding of high inter-reader reliability for the major features among readers trained to
report radiographs using the CRRS is consistent with earlier reports (76-78). The high
reliability is useful, given that the lack of standardization and reproducibility in the
reading of chest radiographs for pulmonary TB has always been an impediment to the
accuracy of the test. This reinforces the use of the CRRS as an attractive tool for reading
and reporting in epidemiological surveys, the purpose for which it was designed.
However, we show here that a quick, simple scoring system potentially has similar rule-out
value, with risk probability profiling that is useful for clinical decision-making. Using
this score will likely not require training as intensive as the CRRS requires, but this
needs validation in prospective studies.
Our study has several limitations. Due to logistical difficulties, not all chest radiographs
that had discordant readings were resolved by a third reader, and we had to accept the
report of the senior reader for analysis. We had to exclude several patients from the
analysis, primarily due to the absence of chest radiographs, and this could have
introduced a bias in our study. However, the comparisons of the demographic features of
the patients included and excluded suggest that the absence of radiographs could be
random, and was not associated with the study or outcome variables. We tested the score
in the same population from which the score was derived, and this could lead to an
overestimation of the performance of the score. Although the consistency of the features
used in the score with what is described in the literature suggests that these features are
likely to be reproducible, a validation study would be needed to confirm this.
Despite the high negative predictive value of the score, misclassification (false
negative) occurred in 20 patients with active disease. This highlights the principle that all
diagnostic tests including the chest radiograph must be interpreted within the clinical
context of the case, and appropriate advice given to patients in case they have progressive
or ongoing symptoms. The validity of such an approach and its effectiveness in avoiding
mortality and morbidity will need prospective study.
3.6 CONCLUSIONS
In conclusion, the CRRS has a high inter-reader reliability, and our study suggests its
usefulness for documenting radiographic abnormalities among TB suspects. The
radiographic scoring system developed is the first of its kind that employs a simple,
reliable and validated reading system, and has a high rule-out value for smear-negative
disease, thereby avoiding extensive and expensive testing in this cohort of TB suspects.
Using this user-friendly and potentially rapid scoring system, and training physicians to
identify these features could make this a simple and quick test in a resource-poor clinical
setting to rule out active TB. Further validation studies are now necessary to confirm our
findings.
Table 1. Characteristics of included and excluded patients
Characteristic              Included patients   Excluded patients   p-value
                            (n = 473)           (n = 172)
Age, mean (SD)              39.3 (12.1)         39.6 (13)           0.79
No. of males (%)            329 (69.6)          110 (64)            0.18
Race                                                                0.36
  black African (%)         342 (72.3)          118 (68.6)
  white/mixed (%)           131 (27.7)          54 (31.4)
HIV status
  positive (%)              121 (25.6)          52 (30.2)           0.24
  negative (%)              285 (60.3)          94 (54.7)           0.2
  unknown/refused (%)       67 (14.2)           26 (15.1)           0.77
  median CD4+ (IQR)         185 (105,349)
Culture result                                                      0.29
  positive (%)              138 (29.2)          43 (25)
  negative (%)              335 (70.8)          99 (57.6)
  no results                                    30 (17.4)
Smear result                                                        0.73
  at least 1 positive (%)   91 (19.2)           35 (20.4)
  negative (%)              382 (80.8)          115 (66.9)
Table 2. Inter-reader reliability for the major features on the Chest Radiograph
Reading and Recording System
Feature                        % agreement   kappa (standard error)
Large opacity (> 1 cm)         52.97         0.7 (0.05)
Small opacity (< 1 cm)         92.55         0.56 (0.05)
Cavity                         91.06         0.64 (0.05)
Effusion                       88.6          0.77 (0.9)
“Consistent with active TB”*   86.07         0.52 (0.05)
*This judgment (consistent or inconsistent with active TB) was derived through
subjective interpretation by the reader at the end of the CRRS read, about the overall feel
for whether the radiograph was consistent with TB. A data entry space is available on the
form for this purpose (Appendix 6.3).
Table 3. Analysis of radiographic and clinical features in the univariate and
multivariable logistic regression model, and weights assigned in the final
radiographic score in 473 patients (n= 138 with TB and 335 with non-TB).
Feature                       Patients with TB   Patients without TB   OR (95% CI),         OR (95% CI),        Score
                              with feature       with feature          univariate           multivariate        assigned
                              (n=138)            (n=335)               analysis             analysis
Large opacity (> 1 cm):
  UL opacity                  108                105                   7.89 (4.95,12.56)    4.18 (2.11,8.28)    2
  ML/LL opacity               116                153                   6.27 (3.79,10.38)    1.82 (0.92,3.58)
Small opacity (< 1 cm):
  UL opacity                  121                220                   3.72 (2.13,6.48)     1.25 (0.61,2.56)
  ML/LL opacity               136                285                   11.93 (2.86,49.75)   2.83 (0.58,13.9)
Cavity, any location          55                 23                    8.99 (5.22,15.48)    3.95 (2.04,7.62)    2
Pleural effusion:
  Unilateral                  35                 40                    2.51 (1.51,4.16)     2.08 (1.1,3.9)      1
  Bilateral                   1                  4                     0.6 (0.67,5.45)
Apical cap                    53                 69                    2.4 (1.56,3.71)      0.78 (0.41,1.49)
Adenopathy, any location      23                 21                    2.99 (1.59,5.61)     3.75 (1.71,8.23)    2
Tracheal deviation/
Mediastinal shift/
Hilar elevation               31                 27                    3.3 (1.87,5.79)      1.16 (0.56,2.38)
HIV infection                 43                 78                    1.49 (0.96,2.32)     1.26 (0.72,2.2)*
Sex - males                   100                229                   1.22 (0.79,1.89)
Smoker current                79                 196                   0.91 (0.6,1.38)
Smoker past                   18                 35                    1.29 (0.7,2.37)
Smoker ever                   133                318                   0.96 (0.61,1.5)
Age mean, SD                  36.84 (11.55)      40.26 (12.15)         0.98 (0.96,0.99)     0.96 (0.94,0.98)*
* adjusted for, but not assigned weights in the final model.
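The univariate odds ratios in Table 3 can be recomputed from the counts in its first two columns. The sketch below does this for large upper lobe opacities (108 of 138 patients with TB, 105 of 335 without). The confidence interval uses the standard log-based (Woolf) approximation, which is an assumption here because the interval method is not stated in the text; the function name is illustrative.

```python
import math


def odds_ratio_with_ci(tb_with, tb_total, non_tb_with, non_tb_total, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table (Woolf method)."""
    a = tb_with                      # TB, feature present
    b = tb_total - tb_with           # TB, feature absent
    c = non_tb_with                  # no TB, feature present
    d = non_tb_total - non_tb_with   # no TB, feature absent
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


or_ul, lower_ul, upper_ul = odds_ratio_with_ci(108, 138, 105, 335)
print(f"{or_ul:.2f} ({lower_ul:.2f},{upper_ul:.2f})")
```

For this feature the calculation gives an odds ratio of about 7.89 with an interval of roughly 4.95 to 12.56, in line with the tabulated univariate estimate.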
Table 4. Performance characteristics of the score at different cut-offs in 473 patients
(n= 138 with TB and 335 with non-TB).
Cut-off                       Sensitivity   Specificity   PPV            NPV          Area under the ROC
                              (95% CI)      (95% CI)      (95% CI)       (95% CI)     curve** (95% CI)
≥ 1                           89 (83,94)    58 (53,64)    47 (41,53)     93 (89,96)   0.74 (0.7,0.77)
≥ 2                           86 (79,91)    64 (59,69)    49 (43,56)     92 (87,95)   0.75 (0.71,0.79)
≥ 3                           57 (49,66)    87 (83,91)    64.2 (55,73)   83 (79,87)   0.72 (0.68,0.77)
≥ 4                           14 (9,21)     99 (97,100)   79 (58,93)     74 (69,78)   0.56 (0.53,0.59)
“Consistent with active TB”   93 (88,97)    28 (23,33)    36 (31,41)     91 (83,96)   0.6 (0.57,0.64)
reported by readers*
Calculated Score = (UL large opacity*2) + (cavity, any location*2) + (unilateral pleural
effusion*1) + (adenopathy, any location*2)
*This judgment (consistent or inconsistent with active TB) was derived through subjective
interpretation by the reader at the end of the CRRS read, about the overall feel for whether the
radiograph was consistent with TB. A data entry space is available on the form for this purpose
(Appendix 6.3).
**The various AUROCs are derived from using the test as a dichotomous test at each cut-point
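The calculated score and the dichotomous AUROC described in the footnotes above can be sketched as follows. This is a minimal illustration rather than the study's analysis code: the feature keys and function names are hypothetical, and the AUROC helper relies on the fact that the area under a single-cut-point ROC curve equals the average of sensitivity and specificity.

```python
# Weights from the calculated-score formula given with Table 4.
WEIGHTS = {
    "ul_large_opacity": 2,
    "cavity": 2,
    "unilateral_effusion": 1,
    "adenopathy": 2,
}


def radiographic_score(findings):
    """Sum the weights of the features recorded as present (True)."""
    return sum(w for name, w in WEIGHTS.items() if findings.get(name, False))


def score_positive(findings, cutoff=2):
    """Classify a radiograph as score-positive at the given cut-off."""
    return radiographic_score(findings) >= cutoff


def dichotomous_auc(sens, spec):
    """Area under the ROC curve for a test used at a single cut-point."""
    return (sens + spec) / 2


example = {"cavity": True, "unilateral_effusion": True}  # hypothetical read
print(radiographic_score(example), score_positive(example))
print(round(dichotomous_auc(0.86, 0.64), 2))
```

A radiograph showing only a unilateral pleural effusion scores 1 and therefore falls below the cut-off of ≥ 2, whereas any one of the three 2-point features is enough to reach it.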
Table 5. Performance of scoring system among smear-negative TB patients in
comparison to checklist-based diagnosis (n = 382)
Performance characteristics        Sensitivity   Specificity**   PPV          NPV          Area under
                                                                                           ROC curve
Checklist-based diagnosis among    83 (70,93)    28 (23,33)      15 (11,20)   92 (84,96)   0.55 (0.5,0.61)
smear-negative patients*
Score at cut-off ≥ 2 among         69 (55,82)    64 (59,69)      22 (16,30)   93 (89,96)   0.67 (0.6,0.74)
smear-negative patients
*This judgment (consistent or inconsistent with active TB) was derived through
subjective interpretation by the reader at the end of the CRRS read, about the overall feel
for whether the radiograph was consistent with TB. A data entry space is available on the
form for this purpose (Appendix 6.3).
** p-value for the difference in specificity < 0.001
Table 6. Performance of scoring system among HIV-infected and uninfected
patients (121 HIV-infected patients, 285 HIV-uninfected patients)
Performance characteristics    Sensitivity   Specificity*   PPV          NPV          Area under
                                                                                      ROC curve
Score at cut-off ≥ 2 among     81 (67,92)    65 (54,76)     57 (43,69)   86 (75,94)   0.73 (0.65,0.81)
HIV-positive patients
Score at cut-off ≥ 2 among     86 (77,93)    62 (55,69)     47 (39,56)   92 (86,96)   0.74 (0.69,0.79)
HIV-negative patients
* p-value for the difference in specificity = 0.21
CHAPTER 5: CONCLUSIONS
Chest radiography is an important diagnostic tool for physicians to assess the likelihood
of active pulmonary TB among individuals with symptoms suggestive of disease, and is
especially important in the cohort of patients who are sputum smear-negative. In the
absence of newer tests for TB that are universally affordable and accessible, there is a
need to improve existing tests such as chest radiography, which suffers from a lack of
standardization.
This thesis was an attempt to (a) systematically review the literature for the use of chest
radiograph scoring systems, with the aim of improving the overall accuracy of chest
radiography as a diagnostic tool for TB, and (b) use the knowledge gained from the
systematic review to inform the derivation of a scoring system for the diagnosis of active
PTB among TB suspects in Cape Town, South Africa.
The systematic review failed to identify a single scoring system that was based
exclusively on radiographic features. We, however, identified 12 studies that used both
clinical and radiographic features as part of scoring systems for the diagnosis of PTB.
Most of the included studies were hospital-based, decision-to-isolate studies, and such
scoring systems serve an important purpose in trying to identify infectious individuals
who need to be isolated to prevent nosocomial transmission of disease. However, all the
scores developed suffered from low specificity, and had a high rule-out value (high
negative predictive value) but a poor rule-in value (low positive predictive value) for
PTB. Such scores may still be useful for limiting the number of patients for whom further
investigations would be warranted, especially among patients who are smear-negative,
but are of limited value in ruling-in the disease and initiating treatment without further
diagnostic testing.
Automated computer-assisted diagnosis (CAD) employs techniques such as texture
analysis for reading chest radiographs, and appears to be a promising modality for
standardizing and improving the diagnostic performance of digital chest radiography
(62). However, although we identified 13 studies that employed CAD, our review
suggested a lack of methodologically high-quality studies. Further development of the
field should focus on the validation of such techniques in larger populations and with a
structured epidemiological approach using appropriate reference standards.
The presence of only one study that recruited patients in the outpatient setting, and the
lack of a scoring system for use among patients infected with HIV suggest the need to
derive accurate scoring systems for these cohorts of patients, especially in low-resource
settings, and we attempted to address this need with the study conducted in South Africa.
The study among TB suspects in Cape Town, South Africa, attempted to derive a
radiographic scoring system that is simple and reliable, with improved accuracy in a
setting with a high burden of TB and HIV, using the CRRS, a standardized reading and
recording system.
Our analysis of the association of specific radiographic features with active PTB found
upper lobe opacities, cavities, unilateral effusions and adenopathy to be associated with
the disease, and these findings are consistent with the reported literature (24-35, 37, 38),
and validate the use of these features to predict active TB. The derived scoring system
relies on 4 features (UL opacities, cavitation, unilateral effusion and adenopathy) giving a
maximum score of 5 points. The major advantage over the existing CRRS system is the
quantification of disease risk (as opposed to a subjective impression) using 4 features
visualized on the chest radiograph.
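The structure of such an additive score can be sketched in a few lines of Python. The sketch below is illustrative only: the point values in `HYPOTHETICAL_POINTS` are placeholders chosen so that four features sum to a maximum of 5, and they are not the weights derived in the thesis. The function names are likewise assumptions, and the default cut-off of 2 mirrors the cut-off of ≥ 2 discussed in this chapter.

```python
# Hypothetical point values: placeholders only, NOT the weights derived in
# the thesis. They are chosen so that the four features sum to 5.
HYPOTHETICAL_POINTS = {
    "upper_lobe_opacities": 2,
    "cavitation": 1,
    "unilateral_effusion": 1,
    "adenopathy": 1,
}


def radiographic_score(findings):
    """Sum the points of every feature marked present in `findings`."""
    unknown = set(findings) - set(HYPOTHETICAL_POINTS)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return sum(
        HYPOTHETICAL_POINTS[name]
        for name, present in findings.items()
        if present
    )


def rule_out(findings, cutoff=2):
    """Return True when the score is below the cut-off (score >= cutoff is positive)."""
    return radiographic_score(findings) < cutoff


example = {"cavitation": True, "adenopathy": False}
print(radiographic_score(example), rule_out(example))
```

A reader would record which of the four features are present, and the sum replaces a subjective overall impression with a number that can be compared against a fixed cut-off.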
The utility of the score is especially relevant in TB suspects with negative sputum smears.
Such patients comprise ~85% of those presenting to high-burden TB clinics with suspected
TB (~15% are smear-positive, which represents about half of those with TB). A score
using an inexpensive test such as radiography could save significant resources if it were
shown to have a high rule-out value (83). In the study, the scoring system at a cut-off of ≥
2 correctly ruled out disease in 93% of smear negative suspects (45% of symptomatic
patients), thus potentially avoiding further testing in these patients. While the score had a
high negative predictive value in HIV-infected patients (86.4%, 95% CI 75-94%), the rule-out
value may not be high enough to be clinically useful for this subgroup known to be at a
high risk of progression of disease and death (85). However, the imprecision of the
negative predictive value is a reflection of the small sample size, and further studies will
have to be conducted to assess the utility of the scoring system in HIV-infected
individuals.
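The link between sample size and the precision of a predictive value can be illustrated with a confidence interval for a proportion. The sketch below uses the Wilson score interval, which is an assumption made for illustration and not necessarily the method used in the thesis. The counts are hypothetical, chosen only to give a proportion near 86%.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (default about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: the same proportion observed in a small and a large sample.
for true_negatives, negatives in [(51, 59), (510, 590)]:
    low, high = wilson_interval(true_negatives, negatives)
    print(f"{true_negatives}/{negatives}: {low:.3f} to {high:.3f}")
```

The same point estimate gives a much narrower interval in the larger hypothetical sample, which is why further studies in HIV-infected individuals are needed before the rule-out value can be judged.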
Despite the high negative predictive value of the score, misclassification (false negative)
occurred in 20 patients with active disease. This highlights the principle that all
diagnostic tests including the chest radiograph must be interpreted within the clinical
context of the case, and appropriate advice given to patients in case they have progressive
or ongoing symptoms.
The finding of high inter-reader reliability for the major features among readers trained to
report radiographs using the CRRS is consistent with earlier reports (76-78) and supports
the use of the CRRS as a tool for reading and reporting radiographs in epidemiological
surveys, the purpose for which it was designed.
In conclusion, the thesis, through the systematic review, identified a lack of objective
scoring systems for the radiographic diagnosis of TB, especially in the outpatient setting,
and among HIV-infected patients. The study in South Africa tried to bridge this gap by
deriving such a scoring system using the CRRS. We found the CRRS to have high inter-
reader reliability, making it a useful tool for documenting radiographic abnormalities
among TB suspects. The radiographic scoring system developed is the first of its kind
that employs a simple, reliable and validated reading system, and has a high rule-out
value for smear-negative disease, thereby avoiding extensive and expensive testing in this
cohort of TB suspects. Using this user-friendly and potentially rapid scoring system, and
training physicians to identify these features, could make this a simple and quick test for
ruling out active TB in resource-poor clinical settings. Further validation studies are now
necessary to confirm our findings.
REFERENCES
1. World Health Organization. Global tuberculosis control 2011.
http://www.who.int/tb/publications/global_report/2011/gtbr11_full.pdf (accessed April 4,
2012)
2. Aber VR, Allen BW, Mitchison DA, Ayuma P, Edwards EA, Keyes AB. Quality
control in tuberculosis bacteriology. 1. Laboratory studies on isolated positive cultures
and the efficiency of direct smear examination. Tubercle. 1980 Sep;61(3):123-33.
3. Lerner BH. The perils of 'x-ray vision': How radiographic images have
historically influenced perception. Perspectives in Biology and Medicine. 1992;35
(3):382-97.
4. Koppaka R, Bock N. How reliable is chest radiography? In: Frieden T, editor.
Toman's tuberculosis: case detection, treatment, and monitoring – questions and answers.
2nd ed. Geneva: World Health Organization; 2004. p. 51-60.
5. International Labour Office. Guidelines for the use of the ILO international
classification of radiographs of pneumoconiosis. Revised ed 2000. (Occupational Safety
and Health Series, No. 22). Geneva, Switzerland: ILO, 2002.
6. Albin M, Engholm G, Frostrom K, Kheddache S, Larsson S, Swantesson L. Chest
x ray films from construction workers: International Labour Office (ILO 1980)
classification compared with routine readings. Br J Ind Med. 1992;49(12):862-8.
7. Bourbeau J, Ernst P. Between- and within-reader variability in the assessment of
pleural abnormality using the ILO 1980 International Classification of Pneumoconioses.
Am J Ind Med. 1988;14(5):537-43.
8. Chrispin AR, Norman AP. The systematic evaluation of the chest radiograph in
cystic fibrosis. Pediatr Radiol. 1974;2(2):101-5.
9. De Jong PA, Achterberg JA, Kessels OA, Van Ginneken B, Hogeweg L, Beek
FJ, et al. Modified Chrispin-Norman chest radiography score for cystic fibrosis: Observer
agreement and correlation with lung function. Eur Radiol. 2011 April;21(4):722-9.
10. Beards SC, Jackson A, Hunt L, Wood A, Frerk CM, Brear G, et al. Interobserver
variation in the chest radiograph component of the lung injury score. Anaesthesia. 1995
Nov;50(11):928-32.
11. Den Boon S, Bateman ED, Enarson DA, Borgdorff MW, Verver S, Lombard CJ,
et al. Development and evaluation of a new chest radiograph reading and recording
system for epidemiological surveys of tuberculosis and lung disease. Int J Tuberc Lung
Dis. 2005 Oct;9(10):1088-96.
12. Bohlig H. [UICC-Cincinnati classification of radiographic findings in
pneumoconioses. Collective work of a subcommittee of the International Union Against
Cancer (UICC)]. Fortschr Geb Rontgenstr Nuklearmed. 1971 Nov;115(5):665-83.
13. Murray JF, Matthay MA, Luce JM, Flick MR. An expanded definition of the adult
respiratory distress syndrome. Am Rev Respir Dis. 1988 Sep;138(3):720-3.
14. Boehme CC, Nabeta P, Hillemann D, Nicol MP, Shenai S, Krapp F, et al. Rapid
molecular detection of tuberculosis and rifampin resistance. N Engl J Med. 2010 Sep
9;363(11):1005-15.
15. Leeflang MM, Deeks JJ, Gatsonis C, Bossuyt PM. Systematic reviews of
diagnostic test accuracy. Ann Intern Med. 2008 Dec 16;149(12):889-97.
16. Macaskill P, Gatsonis C, Deeks JJ, Harbord RM, Takwoingi Y. Analysing and Presenting
Results. In: Deeks JJ, Bossuyt PM, Gatsonis C, editors. Cochrane handbook for
systematic reviews of diagnostic test accuracy, version 0.9.0. London: The Cochrane
Collaboration. Available: http://srdta.cochrane.org/ Accessed 6 March 2011.
17. Wilczynski NL, Haynes RB. EMBASE search strategies for identifying
methodologically sound diagnostic studies for use by clinicians and researchers. BMC
Med. 2005;3:7.
18. Haynes RB, Wilczynski NL. Optimal search strategies for retrieving scientifically
strong studies of diagnosis from Medline: analytical survey. BMJ. 2004 May
1;328(7447):1040.
19. Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of
QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in
systematic reviews. BMC Med Res Methodol. 2003 Nov 10;3:25.
20. Zamora J, Abraira V, Muriel A, Khan K, Coomarasamy A. Meta-DiSc: a software
for meta-analysis of test accuracy data. BMC Med Res Methodol. 2006;6:31.
21. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials.
1986 Sep;7(3):177-88.
22. Harbord RM, Whiting P. metandi: Meta-analysis of diagnostic accuracy using
hierarchical logistic regression. The Stata Journal. 2009;9(2):211-29.
23. Tatsioni A, Zarin DA, Aronson N, Samson DJ, Flamm CR, Schmid C, et al.
Challenges in systematic reviews of diagnostic technologies. Ann Intern Med. 2005 Jun
21;142(12 Pt 2):1048-55.
24. Bock NN, McGowan Jr JE, Ahn J, Tapia J, Blumberg HM. Clinical predictors of
tuberculosis as a guide for a respiratory isolation policy. American Journal of Respiratory
and Critical Care Medicine. 1996;154 (5):1468-72.
25. El-Solh A, Mylotte J, Sherif S, Serghani J, Grant BJ. Validity of a decision tree
for predicting active pulmonary tuberculosis. American Journal of Respiratory & Critical
Care Medicine. 1997 May;155(5):1711-6.
26. El-Solh AA, Hsiao CB, Goodnough S, Serghani J, Grant BJ. Predicting active
pulmonary tuberculosis using an artificial neural network. Chest. 1999 Oct;116(4):968-
73.
27. Moran GJ, Barrett TW, Mower WR, Krishnadasan A, Abrahamian FM, Ong S, et
al. Decision Instrument for the Isolation of Pneumonia Patients With Suspected
Pulmonary Tuberculosis Admitted Through US Emergency Departments. Annals of
Emergency Medicine. 2009 May;53(5):625-32.
28. Mylotte JM, Rodgers J, Fassl M, Seibel K, Vacanti A. Derivation and validation
of a pulmonary tuberculosis prediction model. Infection Control & Hospital
Epidemiology. 1997 Aug;18(8):554-60.
29. Solari L, Acuna-Villaorduna C, Soto A, Agapito J, Perez F, Samalvides F, et al. A
clinical prediction rule for pulmonary tuberculosis in emergency departments.
International Journal of Tuberculosis and Lung Disease. 2008 Jun;12(6):619-24.
30. Lagrange-Xelot M, Porcher R, Gallien S, Wargnier A, Pavie J, de Castro N, et al.
Prevalence and clinical predictors of pulmonary tuberculosis among isolated inpatients: a
prospective study. Clinical Microbiology and Infection. 2011 Apr;17(4):610-4.
31. Soto A, Solari L, Agapito J, Acuna-Villaorduna C, Lambert ML, Gotuzzo E, et al.
Development of a clinical scoring system for the diagnosis of smear-negative pulmonary
tuberculosis. Brazilian Journal of Infectious Diseases. 2008 Apr;12(2):128-32.
32. Soto A, Solari L, Diaz J, Mantilla A, Matthys F, van der Stuyft P. Validation of a
Clinical-Radiographic Score to Assess the Probability of Pulmonary Tuberculosis in
Suspect Patients with Negative Sputum Smears. PLoS One. 2011 Apr;6(4).
33. Wisnivesky JP, Henschke C, Balentine J, Willner C, Deloire AM, McGinn TG.
Prospective validation of a prediction model for isolating inpatients with suspected
pulmonary tuberculosis. Archives of Internal Medicine. 2005 Feb;165(4):453-7.
34. Wisnivesky JP, Kaplan J, Henschke C, McGinn TG, Crystal RG. Evaluation of
clinical parameters to predict Mycobacterium tuberculosis in inpatients. Archives of
Internal Medicine. 2000 Sep 11;160(16):2471-6.
35. Rakoczy KS, Cohen SH, Nguyen HH. Derivation and Validation of a Clinical
Prediction Score for Isolation of Inpatients With Suspected Pulmonary Tuberculosis.
Infection Control and Hospital Epidemiology. 2008 Oct;29(10):927-32.
36. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for
systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.
37. Davis JL, Worodria W, Kisembo H, Metcalfe JZ, Cattamanchi A, Kawooya M, et
al. Clinical and Radiographic Factors Do Not Accurately Diagnose Smear-Negative
Tuberculosis in HIV-infected Inpatients in Uganda: A Cross-Sectional Study. PLoS One.
2010 Mar;5(3).
38. Le Minor O, Germani Y, Chartier L, Lan NH, Lan NT, Duc NH, et al. Predictors
of pneumocystosis or tuberculosis in HIV-infected Asian patients with AFB smear-
negative sputum pneumonia. Journal of Acquired Immune Deficiency Syndromes:
JAIDS. 2008 Aug 15;48(5):620-7.
39. Hogeweg LE, Mol C, de Jong PA, van Ginneken B. Rib suppression in chest
radiographs to improve classification of textural abnormalities. Medical Imaging 2010:
Computer-Aided Diagnosis. 2010;7624.
40. Mouton A, Pitcher RD, Douglas TS. Computer-Aided Detection of Pulmonary
Pathology in Pediatric Chest Radiographs. Medical Image Computing and Computer-
Assisted Intervention - MICCAI 2010, Pt III. 2010;6363:619-25.
41. Arzhaeva Y, Tax DMJ, van Ginneken B. Dissimilarity-based classification in the
absence of local ground truth: Application to the diagnostic interpretation of chest
radiographs. Pattern Recognition. 2009 Sep;42(9):1768-76.
42. Katsuragawa S, Doi K. Computer-aided diagnosis in chest radiography.
Computerized Medical Imaging and Graphics. 2007 Jun;31 (4-5):212-23.
43. Le K. Chest X-Ray Analysis for Computer-Aided Diagnostic. Advanced
Computing, Pt III. 2011;133:300-9.
44. Patil SA, Udupi VR, Kane CD, Wasif AI, Desai JV, Jadhav AN. Geometrical and
Texture Features Estimation of Lung Cancer and TB Images Using Chest X-ray
Database. 2009 International Conference on Biomedical and Pharmaceutical Engineering.
2009:9-15.
45. Rijal OM, Noor NM, Shaban H, Teng SL. A statistical comparison of digital X-
ray images for MTB patients. 2005 27th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society. 2005:6418-21.
46. Rijal OM, Iqbal M, Yunus A, Noor NM. Some Critical Remarks on the Initial
Detection of Lung Ailments Using Clinical Data and Chest Radiography. Proceedings of
the 13th WSEAS International Conference on Computers. 2009:470-5.
47. Sarkar S, Chaudhuri S. Evaluation and progression analysis of pulmonary
tuberculosis from digital chest radiographs. Computerized Medical Imaging and
Graphics. 1998 Mar-Apr;22(2):145-55.
48. Shen R, Cheng I, Basu A. A Hybrid Knowledge-Guided Detection Technique for
Screening of Infectious Pulmonary Tuberculosis From Chest Radiographs. IEEE
Transactions on Biomedical Engineering. 2010 Nov;57(11):2646-56.
49. van Ginneken B, Romeny BMT, Viergever MA. Automatic segmentation and
texture analysis of PA chest radiographs to detect abnormalities related to interstitial
disease and tuberculosis. CARS 2002: Computer Assisted Radiology and Surgery,
Proceedings. 2002:685-8.
50. van Ginneken B, Romeny BMT. Automatic segmentation of lung fields in chest
radiographs. Medical Physics. 2000 Oct;27(10):2445-55.
51. van Ginneken B, Katsuragawa S, Romeny BMT, Doi K, Viergever MA.
Automatic detection of abnormalities in chest radiographs using local texture analysis.
IEEE Transactions on Medical Imaging. 2002 Feb;21(2):139-49.
52. Caplin M, Grange JM, Morley S, Brown RA, Kemp M, Gibson JA, et al.
Relationship between radiological classification and the serological and haematological
features of untreated pulmonary tuberculosis in Indonesia. Tubercle. 1989 Jun;70(2):103-
13.
53. Churchyard GJ, Fielding K, Roux S, Corbett EL, Chaisson RE, De Cock KM, et
al. Twelve-monthly versus six-monthly radiological screening for active case-finding of
tuberculosis: a randomised controlled trial. Thorax. 2011 Feb;66(2):134-9.
54. Ralph AP, Ardian M, Wiguna A, Maguire GP, Becker NG, Drogumuller G, et al.
A simple, valid, numerical score for grading chest x-ray severity in adult smear-positive
pulmonary tuberculosis. Thorax. 2010 Oct;65(10):863-9.
55. Tuberculosis Research Centre M. A concurrent comparison of home and
sanatorium treatment of pulmonary tuberculosis in South India. Bull World Health
Organ. 1959;21(1):51-144.
56. Wejse C, Gustafson P, Nielsen J, Gomes VF, Aaby P, Andersen PL, et al.
TBscore: Signs and symptoms from tuberculosis patients in a low-resource setting have
predictive value and may be used to assess clinical course. Scandinavian Journal of
Infectious Diseases. 2008;40(2):111-20.
57. Agizew T, Bachhuber MA, Nyirenda S, Makwaruzi VZ, Tedla Z, Tallaksen RJ, et
al. Association of chest radiographic abnormalities with tuberculosis disease in
asymptomatic HIV-infected adults. Int J Tuberc Lung Dis. 2010 Mar;14(3):324-31.
58. Dawson R, Masuka P, Edwards DJ, Bateman ED, Bekker LG, Wood R, et al.
Chest radiograph reading and recording system: evaluation for tuberculosis screening in
patients with advanced HIV. Int J Tuberc Lung Dis. 2010 Jan;14(1):52-8.
59. Riley RL, Nardell EA. Clearing the air. The theory and application of ultraviolet
air disinfection. Am Rev Respir Dis. [Review]. 1989 May;139(5):1286-94.
60. Kellerman S, Tokars JI, Jarvis WR. The cost of selected tuberculosis control
measures at hospitals with a history of Mycobacterium tuberculosis outbreaks. Infect
Control Hosp Epidemiol. 1997 Aug;18(8):542-7.
61. Pitchenik AE, Rubinson HA. The radiographic appearance of tuberculosis in
patients with the acquired immune deficiency syndrome (AIDS) and pre-AIDS. Am Rev
Respir Dis. 1985 Mar;131(3):393-6.
62. Doi K. Current status and future potential of computer-aided diagnosis in medical
imaging. Br J Radiol. 2005;78 Spec No 1:S3-S19.
63. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et
al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999
Sep 15;282(11):1061-6.
64. Rutjes AW, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PM.
Evidence of bias and variation in diagnostic accuracy studies. CMAJ. 2006 Feb
14;174(4):469-76.
65. Willis BH. Spectrum bias - Why clinicians need to be cautious when applying
diagnostic test studies. Fam Pract. 2008;25(5):390-6.
66. Laifer G, Widmer AF, Simcock M, Bassetti S, Trampuz A, Frei R, Tamm M, Battegay M, Fluckiger U. TB in a Low-Incidence Country:
Differences Between New Immigrants, Foreign-Born Residents and Native Residents.
Am J Med. 2007 April;120(4):350-6.
67. Theron G, Peter J, van Zyl-Smit R, Mishra H, Streicher E, Murray S, et al.
Evaluation of the Xpert MTB/RIF assay for the diagnosis of pulmonary tuberculosis in a
high HIV prevalence setting. American Journal of Respiratory and Critical Care
Medicine. 2011 Jul 1;184(1):132-40.
68. Ling DI, Pai M, Davids V, Brunet L, Lenders L, Meldau R, et al. Are interferon-
gamma release assays useful for diagnosing active tuberculosis in a high-burden setting?
Eur Respir J. 2011 Sep;38(3):649-56.
69. Dheda K, Davids V, Lenders L, Roberts T, Meldau R, Ling D, et al. Clinical
utility of a commercial LAM-ELISA assay for TB diagnosis in HIV-infected patients
using urine and sputum samples. PLoS One. 2010;5(3):e9848.
70. Urbanczik R. Present position of microscopy and of culture in diagnostic
mycobacteriology. Zentralbl Bakteriol Mikrobiol Hyg A. 1985 Aug;260(1):81-7.
71. Boehme CC, Nicol MP, Nabeta P, Michael JS, Gotuzzo E, Tahirli R, et al.
Feasibility, diagnostic accuracy, and effectiveness of decentralised use of the Xpert
MTB/RIF test for diagnosis of tuberculosis and multidrug resistance: a multicentre
implementation study. Lancet. 2011 Apr 30;377(9776):1495-505.
72. Steingart KR, Henry M, Ng V, Hopewell PC, Ramsay A, Cunningham J, et al.
Fluorescence versus conventional sputum smear microscopy for tuberculosis: a
systematic review. Lancet Infect Dis. 2006 Sep;6(9):570-81.
73. Steingart KR, Ng V, Henry M, Hopewell PC, Ramsay A, Cunningham J, et al.
Sputum processing methods to improve the sensitivity of smear microscopy for
tuberculosis: a systematic review. Lancet Infect Dis. 2006 Oct;6(10):664-74.
74. Cattamanchi A, Huang L, Worodria W, den Boon S, Kalema N, Katagira W, et al.
Integrated strategies to optimize sputum smear microscopy: a prospective observational
study. American Journal of Respiratory and Critical Care Medicine. 2011 Feb
15;183(4):547-51.
75. Nagpaul D, Naganathan N, Prakash M. Diagnostic photofluorography and sputum
microscopy in tuberculosis case-findings. Proceedings of the 9th Eastern Region
Tuberculosis Conference and 29th National Conference on Tuberculosis and Chest
Diseases, Delhi, November 1974 Delhi, The Tuberculosis Association of
India/International Union Against Tuberculosis. 1975.
76. Den Boon S, Bateman ED, Enarson DA, Borgdorff MW, Verver S, Lombard CJ,
et al. Development and evaluation of a new chest radiograph reading and recording
system for epidemiological surveys of tuberculosis and lung disease. International Journal
of Tuberculosis and Lung Disease. 2005 Oct;9(10):1088-96.
77. Agizew T, Bachhuber MA, Nyirenda S, Makwaruzi V, Tedla Z, Tallaksen RJ, et
al. Association of chest radiographic abnormalities with tuberculosis disease in
asymptomatic HIV-infected adults. International Journal of Tuberculosis and Lung
Disease. 2010 Mar;14(3):324-31.
78. Dawson R, Masuka P, Edwards DJ, Bateman ED, Bekker LG, Wood R, et al.
Chest radiograph reading and recording system: evaluation for tuberculosis screening in
patients with advanced HIV. International Journal of Tuberculosis and Lung Disease.
2010 Jan;14(1):52-8.
79. Vuong QH. Likelihood Ratio Tests for Model Selection and Non-Nested
Hypotheses. Econometrica. 1989;57(2):307-33.
80. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas.
1960(70):213-20.
81. Dye C, Williams BG. The population dynamics and control of tuberculosis.
Science. 2010 May 14;328(5980):856-61.
82. Harries AD, Nyirenda TE, Banerjee A, Boeree MJ, Salaniponi FM. Treatment
outcome of patients with smear-negative and smear-positive pulmonary tuberculosis in
the National Tuberculosis Control Programme, Malawi. Trans R Soc Trop Med Hyg.
1999 Jul-Aug;93(4):443-6.
83. Jones TF, Schaffner W. Miniature chest radiograph screening for tuberculosis in
jails: a cost-effectiveness analysis. American Journal of Respiratory and Critical Care
Medicine. 2001 Jul 1;164(1):77-81.
84. Theron G, Pooran A, Peter J, van Zyl-Smit R, Mishra HK, Meldau R, et al. Do
adjunct TB tests, when combined with Xpert MTB/RIF, improve accuracy and the cost of
diagnosis in a resource-poor setting? Eur Respir J. 2011;ERJ Express. Published on
November 10, 2011 as doi: 10.1183/09031936.00145511.
85. Venkatesh KK, Swaminathan S, Andrews JR, Mayer KH. Tuberculosis and HIV
co-infection: screening and treatment strategies. Drugs. 2011 Jun 18;71(9):1133-52.
APPENDIX
6.1 SEARCH STRATEGY FOR THE SYSTEMATIC REVIEW
NEW MEDLINE search strategy:
1. sensitiv:.mp.
2. diagnos:.mp.
3. di.fs.
4. 1 or 2 or 3
5. radiograph:.tw.
6. radiolog:.tw.
7. Mass Chest X-Ray/
8. chest x-ray.tw.
9. scor:.tw.
10. Radiography, Thoracic/
11. chest xray.tw.
12. 5 or 6 or 7 or 8 or 9 or 10 or 11
13. pulmonary tuberculosis.tw.
14. pulmonary tb.tw.
15. lung tuberculosis.tw.
16. lung tb.tw.
17. Tuberculosis, Lymph Node/
18. Tuberculosis, Miliary/
19. Tuberculosis, Multidrug-Resistant/
20. Tuberculosis, Pleural/
21. Tuberculosis, Pulmonary/
22. Mycobacterium tuberculosis/
23. miliary tuberculosis.tw.
24. tuberculous pleurisy.tw.
25. tuberculous pleural effusion.tw.
26. pleural tuberculosis.tw.
27. tuberculous lymphadenitis.tw.
28. lymph node tuberculosis.tw.
29. lymph node tb.tw.
30. miliary tb.tw.
31. pleural tb.tw.
32. 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27
or 28 or 29 or 30 or 31
33. 4 and 12 and 32
OLD MEDLINE/ MEDLINE in-process search strategy:
1. radiograph:.tw.
2. radiolog:.tw.
3. Mass Chest X-Ray/
4. chest x-ray.tw.
5. scor:.tw.
6. Radiography, Thoracic/
7. chest xray.tw.
8. 1 or 2 or 3 or 4 or 5 or 6 or 7
9. pulmonary tuberculosis.tw.
10. pulmonary tb.tw.
11. lung tuberculosis.tw.
12. lung tb.tw.
13. Tuberculosis, Lymph Node/
14. Tuberculosis, Miliary/
15. Tuberculosis, Multidrug-Resistant/
16. Tuberculosis, Pleural/
17. Tuberculosis, Pulmonary/
18. Mycobacterium tuberculosis/
19. miliary tuberculosis.tw.
20. tuberculous pleurisy.tw.
21. tuberculous pleural effusion.tw.
22. pleural tuberculosis.tw.
23. tuberculous lymphadenitis.tw.
24. lymph node tuberculosis.tw.
25. lymph node tb.tw.
26. miliary tb.tw.
27. pleural tb.tw.
28. 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or
24 or 25 or 26 or 27
29. 8 and 28
EMBASE search strategy:
1. specificity.tw.
2. predict:.tw.
3. di.fs.
4. 1 or 2 or 3
5. radiograph:.tw.
6. radiolog:.tw.
7. chest x-ray.tw.
8. scor:.tw.
9. thorax radiography/
10. chest xray.tw.
11. 5 or 6 or 7 or 8 or 9 or 10
12. pulmonary tuberculosis.tw.
13. pulmonary tb.tw.
14. lung tuberculosis.tw.
15. lung tb.tw.
16. tuberculous lymphadenitis/
17. miliary tuberculosis/
18. multidrug resistant tuberculosis/
19. tuberculous pleurisy/
20. lung tuberculosis/
21. Mycobacterium tuberculosis/
22. miliary tuberculosis.tw.
23. tuberculous pleurisy.tw.
24. tuberculous pleural effusion.tw.
25. pleural tuberculosis.tw.
26. tuberculous lymphadenitis.tw.
27. lymph node tuberculosis.tw.
28. lymph node tb.tw.
29. miliary tb.tw.
30. pleural tb.tw.
31. 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26
or 27 or 28 or 30
32. 4 and 11 and 31
Web of Science search strategy:
#1 Topic=(pulmonary tuberculosis) OR Topic=(pulmonary tb) OR Topic=(lung
tuberculosis) OR Topic=(lung tb) OR Topic=(tuberculous lymphadenitis) OR Topic=(tb
lymphadenitis) OR Topic=(tb lymph node) OR Topic=(miliary tuberculosis) OR
Topic=(miliary tb) OR Topic=(multidrug resistant tb) OR Topic=(multidrug resistant
tuberculosis) OR Topic=(pleural tuberculosis) OR Topic=(pleural tb) OR
Topic=(tuberculous pleurisy) OR Topic=(tb pleurisy) OR Topic=(mycobacterium
tuberculosis)
#2 Topic=(sensitiv*) OR Topic=(specific*) OR Topic=(diagnos*) OR
Topic=(accura*) OR Topic=(predict*) OR Topic=(reliab*) OR Topic=(reproducib*)
#3 TS=(radiograph*) OR TS=(radiolog*) OR TS=(chest x-ray) OR TS=(chest xray)
OR TS=(scor*)
#4 #3 AND #2 AND #1
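Each strategy above follows the same logic: synonyms within a concept are combined with OR, and the resulting concept blocks are combined with AND (three blocks in most strategies, two in the OLD MEDLINE strategy). A minimal Python sketch of that logic, treating each search line as a set of record identifiers, is given below. The identifiers and the helper names `combine_or` and `combine_and` are hypothetical and are used only for illustration.

```python
def combine_or(*result_sets):
    """Union of result sets, as in a line such as '4. 1 or 2 or 3'."""
    return set().union(*result_sets)


def combine_and(first, *rest):
    """Intersection of result sets, as in the final line of each strategy."""
    return set(first).intersection(*rest)


# Hypothetical record identifiers retrieved by each group of search lines.
diagnostic_terms = combine_or({101, 102}, {102, 103}, {105})
imaging_terms = combine_or({102, 104}, {103, 105})
tuberculosis_terms = combine_or({103, 106}, {105})

# Only records matching all three concept blocks are retained.
final_set = combine_and(diagnostic_terms, imaging_terms, tuberculosis_terms)
print(sorted(final_set))
```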
6.2 DATA EXTRACTION FORM FOR THE SYSTEMATIC REVIEW
First author:
___________________________
Year of publication:
Language:
1-English
2-French
3-Spanish
Corresponding author email
address:
__________________________
Was author contacted?
0 - No
1 – Yes
If yes, date contacted
__ __ - __ __ - 2011
(DD – MM)
Data assessor
Karen Steingart
Lancelot Pinto
DEMOGRAPHICS AND STUDY DETAILS
Setting
1- Inpatient
2- Outpatient
3- Mixed (in-patients and out-patients)
4- Other _______________
9- NR/unclear
Country of investigation
not reported
Case country world bank
classification
1-Middle/low income
2- High income
3- Both
Study design
1- Randomized controlled trial
2- Cross-sectional
3- Cohort
4- Retrospective chart review
5- Others (specify)_______________
9- NR/unclear
Participant selection
1- Consecutive
2- Random
3- Convenience
4- Other_______________
9- NR/unclear
Start inclusion of patients
(year)
not reported
Eligibility inclusion criteria
(fill in the definition where
applicable, for example,
weight loss kg/percent ideal
body weight)
1- cough for ________weeks
2- fever for _________ weeks
3- weight loss _______-
________________
4- night sweats
5- hemoptysis
6- breathlessness
7- past history of tuberculosis
8- contact with TB patient
9- HIV positive individuals
10- any __ of the checked boxes
above
11-all of the checked boxes above
12- unclear/not specified
not reported
Exclusion criteria
1-HIV positive individuals
2-HIV negative individuals
3- known TB patients on treatment
4-smear positive patients
5-other
________________________
9-none specified
not reported
Age distribution
Mean, SD _________________
Median, IQR __________________
Range from ______ to _____years
not reported
Number of eligible patients
with chest radiographs, %
_________/_________
____________ %
not reported
Gender (% males of total) not reported
Type of radiograph
Digital
Conventional
Not specified
Number of readers for each
radiograph, and specialty
1 – specialty: ________________
2 – specialty: _________________,
____________________
3 – specialty: _______________, _______________,
___________
9 – not reported
Reference standard used for
the diagnosis of active
tuberculosis
1- solid culture x __________
2- liquid culture x _________
3- culture, not specified
Number of patients in the
study with active TB,
percentage
____________ of ___________
________ %
Number of patients in the
study who were HIV
positive, percentage
____________ of ___________
________ %
not reported
CD4+ count of HIV
positive individuals
mean ____________
median ___________
not reported
Number of patients HIV
positive on HAART,
percentage
____________ of __________
________ %
not reported
DETAILS PERTAINING TO THE INDEX TEST
Were individual features of
radiographs analyzed, or
was the end-point
dichotomous (consistent
with TB or not)
1- specific features analyzed
2- dichotomous
Which of these features
were analyzed? (fill in any
further specifics e.g.
laterality, size)
1- Upper lobe infiltrate
__________________________
2- Upper lobe involvement, unspecified type
3- Presence of cavity
_________________________________
4- Presence of infiltrate, unspecified location________
5- Presence of
lymphadenopathy____________________
6- Presence of pleural effusion
______________________
7- Presence of a miliary pattern
_____________________
8- Presence of volume loss
__________________________
9- Presence of calcification
__________________________
10- Other
____________________________________________
___
Were patient numbers for
specific features and
ORs/RRs reported?
1- yes
2- no
FEATURE 1: ______________
             Active TB +   Active TB -
Feature +    ________      ________
Feature -    ________      ________
Calculated OR/RR (circle OR or RR), 95% CI = ______________ (          )
Reported OR/RR, 95% CI = ______________ (          )
FEATURE 2: ______________
             Active TB +   Active TB -
Feature +    ________      ________
Feature -    ________      ________
Calculated OR/RR (circle OR or RR), 95% CI = ______________ (          )
Reported OR/RR, 95% CI = ______________ (          )
FEATURE 3: ______________
             Active TB +   Active TB -
Feature +    ________      ________
Feature -    ________      ________
Calculated OR/RR (circle OR or RR), 95% CI = ______________ (          )
Reported OR/RR, 95% CI = ______________ (          )
FEATURE 4: ______________
             Active TB +   Active TB -
Feature +    ________      ________
Feature -    ________      ________
Calculated OR/RR (circle OR or RR), 95% CI = ______________ (          )
Reported OR/RR, 95% CI = ______________ (          )
FEATURE 5: ______________
             Active TB +   Active TB -
Feature +    ________      ________
Feature -    ________      ________
Calculated OR/RR (circle OR or RR), 95% CI = ______________ (          )
Reported OR/RR, 95% CI = ______________ (          )
FEATURE 6: ______________
             Active TB +   Active TB -
Feature +    ________      ________
Feature -    ________      ________
Calculated OR/RR (circle OR or RR), 95% CI = ______________ (          )
Reported OR/RR, 95% CI = ______________ (          )
FEATURE 7: ______________
             Active TB +   Active TB -
Feature +    ________      ________
Feature -    ________      ________
Calculated OR/RR (circle OR or RR), 95% CI = ______________ (          )
Reported OR/RR, 95% CI = ______________ (          )
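Where a study reports only the 2x2 counts for a feature, the "calculated" odds ratio and its confidence interval can be obtained as sketched below. This is a minimal Python sketch assuming the standard logit (Woolf) method; the 0.5 continuity correction for zero cells is a common convention adopted here as an assumption, not a rule stated in this form. The function name and the example counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a logit-based confidence interval.

    a: feature present, active TB     b: feature present, no active TB
    c: feature absent, active TB      d: feature absent, no active TB
    """
    if min(a, b, c, d) == 0:
        # Assumed convention: add 0.5 to every cell when any cell is zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for one radiographic feature.
or_value, ci_low, ci_high = odds_ratio_ci(a=40, b=20, c=60, d=180)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```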
Was a multivariable logistic
regression with only
radiographic features
performed?
1- Yes
2- No
Was a multivariable logistic
regression with clinical and
radiographic features
performed?
1- Yes
2- No
DETAILS OF THE MULTIVARIABLE LOGISTIC REGRESSION
FEATURE | beta-coefficient (CI) | OR (CI) | Weight given in the score developed | Percentage weight in the overall maximum score
RELIABILITY OF THE
SCORING SYSTEM
1- tested
2- not tested
Intra-observer agreement
(same observer, reported
twice)
Overall _____________________
Kappa _______________________
Inter-observer agreement
(different observers)
Overall _____________________
Kappa _______________________
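The two quantities recorded here, overall agreement and kappa, differ in that kappa corrects the observed agreement for the agreement expected by chance. A minimal Python sketch of Cohen's kappa for two readers is given below; the readings are hypothetical (1 = feature present, 0 = feature absent) and the function name is an illustrative assumption.

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers rating the same radiographs."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must rate the same non-empty set")
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement: product of the readers' marginal proportions per category.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n**2
    if expected == 1:
        return 1.0  # both readers used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical readings of ten radiographs by two readers.
reader_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reader_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
print(f"kappa = {cohens_kappa(reader_a, reader_b):.2f}")
```

In this invented example the readers agree on 8 of 10 radiographs, but half of that agreement is expected by chance, so kappa is lower than the overall agreement.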
PERFORMANCE CHARACTERISTICS OF THE SCORE
Cut-off used | Sensitivity | Specificity | PPV | NPV | AUC | Any other indicator described
QUALITY ASSESSMENT
Study_______________________ Year______________
Item | Yes | No | Unclear
Representative sample (note 1)
Acceptable ref std
Acceptable delay
Partial verification avoided (note 2)
Differential verification avoided (note 3)
Incorporation avoided?
Ref std results blinded?
Index results blinded?
Relevant clinical information available? (note 4)
Uninterpretable results reported?
Withdrawals explained (note 5)
Conducted without conflict of interest?
Notes:
1 Representative sample requires PTB suspects and a consecutive sample.
Case-control studies do not qualify as a representative sample.
2 Occurs when only a selected sample of patients who underwent the index test is
verified by the reference standard, and that sample is dependent on the results of the test.
For example, patients with suspected coronary artery disease whose exercise test results
are positive may be more likely to undergo coronary angiography (the reference standard)
than those whose exercise test results are negative.
3 Occurs when those who tested negative or strongly positive were given a less or more
thorough reference standard for verification.
4 Relevant clinical information (were the same clinical data available when test results
were interpreted as would be available when the test is used in practice?)
5 Indeterminate results and withdrawals must be described and/or shown in a flow
diagram; otherwise code as unclear.