CHEST RADIOGRAPH SCORING SYSTEMS FOR THE

DIAGNOSIS OF ACTIVE PULMONARY TUBERCULOSIS

Lancelot M Pinto, MD

Department of Epidemiology and Biostatistics

McGill University, Montreal

May 2012

A thesis submitted to McGill University in partial fulfillment of the requirements of the

degree of Master of Science

© Lancelot M Pinto 2012

Table of Contents

ABSTRACT

RÉSUMÉ

ACKNOWLEDGEMENTS

PREFACE - CONTRIBUTIONS OF CO-AUTHORS

CHAPTER 1: INTRODUCTION

CHAPTER 2: SYSTEMATIC REVIEW OF THE LITERATURE (MANUSCRIPT 1)
    2.1 ABSTRACT
    2.2 INTRODUCTION
    2.3 METHODS
    2.4 RESULTS
    2.5 DISCUSSION
    2.6 CONCLUSIONS
    2.7 TABLES AND FIGURES

CHAPTER 3: BRIDGING CHAPTER: THE NEED FOR A CHEST RADIOGRAPH SCORING SYSTEM FOR THE DIAGNOSIS OF PULMONARY TUBERCULOSIS

CHAPTER 4: DEVELOPMENT OF A RELIABLE AND SIMPLE RADIOGRAPHIC SCORING SYSTEM TO AID THE DIAGNOSIS OF PULMONARY TUBERCULOSIS (MANUSCRIPT 2)
    3.1 ABSTRACT
    3.2 INTRODUCTION
    3.3 METHODS
    3.4 RESULTS
    3.5 DISCUSSION
    3.6 CONCLUSIONS
    3.7 TABLES AND FIGURES

CHAPTER 5: CONCLUSIONS

REFERENCES

APPENDIX
    6.1 SEARCH STRATEGY FOR THE SYSTEMATIC REVIEW
    6.2 DATA EXTRACTION FORM FOR THE SYSTEMATIC REVIEW
    6.3 CHEST RADIOGRAPH READING AND REPORTING SYSTEM (CRRS) FORM

ABSTRACT

Background: Chest radiography is often the only tool available for the investigation of

tuberculosis (TB) suspects with negative sputum smears, thus playing a crucial role in

clinical decision-making. However, chest radiographs lack specificity for TB, and their

interpretation is subjective and not standardized, and therefore not highly reproducible.

Efforts to improve the interpretation of chest radiography are warranted, especially with

the growing use of digital radiology.

Objectives: To systematically review the literature on the use of scoring systems to aid

the diagnosis of active pulmonary TB (PTB), and to derive a new, simple scoring system

using features noted on the Chest Radiograph Reading and Recording System (CRRS), a

tool designed for the documentation of radiographic abnormalities in epidemiological

surveys for PTB.

Methods: A systematic review of the literature was performed to assess the utility of

chest radiograph scoring systems for the diagnosis of PTB, and to use this information to

derive a scoring system using the CRRS. Chest radiographs of outpatients with suspected

PTB, consecutively recruited over 3 years at clinics in South Africa, were read by two

independent readers using CRRS. Multivariable analysis was used to identify features

significantly associated with culture-positive PTB, and these were assigned weights and

used to generate a composite score.

Results: A systematic review of the literature identified 12 studies that used

radiographic features as part of scoring systems for the diagnosis of PTB. Six of these

were tested in smear-negative patients. There was no scoring system found that involved

the exclusive use of radiographic features. Upper lobe infiltrates and cavities were the

radiographic features most commonly associated with the disease. The sensitivities of the

scoring systems were uniformly high, but all of them lacked specificity.

For the study in South Africa, 473 patients were included in the analysis. Large upper

lobe opacities, cavities, unilateral pleural effusion and adenopathy were significantly

associated with culture-confirmed PTB, had high inter-reader reliability, and received 2,

2, 1 and 2 points, respectively, in the final score. When applied to all TB suspects, using a

cut-off of ≥ 2, the score had a high negative predictive value (92%; 95% CI 87, 95).

Among TB suspects with negative sputum smears, the score correctly ruled out active

disease in 214 of 229 patients (NPV 93%; 95% CI 89, 96).

Conclusions: Existing radiographic scoring systems for the diagnosis of PTB appear to

be sensitive, but lack specificity. The scoring system derived from CRRS is a simple and

reliable tool that may be useful for ruling out active PTB in smear-negative patients.

Validation studies are needed to confirm these initial findings.

RÉSUMÉ

Background: Chest radiography is often the only tool available for tuberculosis (TB) screening in patients with negative sputum smears, which gives it a crucial role in clinical decision-making. However, chest radiographs lack specificity for TB, and their interpretation is subjective and not standardized, and therefore not very reproducible. Efforts to improve the interpretation of chest radiography are warranted, especially given the growing use of digital radiology.

Objectives: The objectives include a systematic search of the literature on the use of scoring systems to aid the diagnosis of active pulmonary tuberculosis (PTB), and the derivation of a new, simple scoring system from the Chest Radiograph Reading and Recording System (CRRS), a tool designed for the documentation of radiographic abnormalities in epidemiological studies of PTB.

Methods: A systematic search of the literature was performed to assess the utility of chest radiograph scoring systems for the diagnosis of PTB, and to use this information to derive a scoring system from the CRRS. Chest radiographs of outpatients with suspected PTB, recruited consecutively over 3 years at clinics in South Africa, were read by two independent readers using CRRS. Multivariable analysis was used to identify features significantly associated with culture-positive PTB; these were assigned respective weights and used to generate a composite score.

Results: A systematic search of the literature identified 12 studies that used scoring systems to analyze radiographic features in the course of diagnosing PTB. Six of them included only smear-negative patients. No scoring system involved the exclusive use of radiographic features. Cavities and upper lobe infiltrates were the radiographic features most commonly associated with the disease. The sensitivities of the scoring systems were uniformly high, but each of them lacked specificity.

In the study in South Africa, 473 patients were included in the analysis. Large upper lobe opacities, cavities, unilateral pleural effusion and the presence of adenopathy were significantly associated with culture-confirmed PTB, had high inter-reader reliability, and received 2, 2, 1 and 2 points, respectively, in the final score. When applied to all suspected TB cases, using a cut-off of ≥ 2, the score had a high negative predictive value (92%; 95% CI 87-95). Among TB suspects with negative smears, the score correctly excluded active disease in 214 of 229 patients (NPV 93%; 95% CI 89-96).

Conclusions: Current radiographic scoring systems for the diagnosis of PTB appear to be sensitive, but lack specificity. The scoring system derived from the CRRS is a simple and reliable tool that may be useful for excluding active PTB in smear-negative patients. Validation studies are needed to confirm these initial results.

ACKNOWLEDGEMENTS

Sincere gratitude and heartfelt thanks go to my supervisor, Prof. Madhukar Pai, whose

guidance and mentoring throughout the course of my graduate studies have been

invaluable. Madhu leads by example, and his dedication to the cause of tuberculosis (TB)

control, his passion for teaching Epidemiology and his zeal to use the highest quality

research to answer questions that will hopefully help us overcome TB eventually have

been truly inspirational.

I am grateful to Karen Steingart for being my guide and second reader for the systematic

review. Karen embodies the zest for tireless perfection, and I am immensely grateful for

her enthusiasm, patience and time spent in analyzing a huge amount of data. I thank

Keertan Dheda, Dick Menzies, Kevin Schwartzman and Rodney Dawson for their

insightful help and critiques throughout the course of the project. They have been most

helpful in assisting me with the conduct and the analysis of the studies, and their

comments and suggestions have helped me at every step of the way. I thank the team at

the University of Cape Town who helped me and made my stay at Cape Town productive

and enjoyable. I thank all my colleagues at the Pai TB group, and at the Respiratory

Epidemiology and Clinical Research Unit (RECRU). Their constant support and

encouragement have been priceless.

Special thanks go to my wife Franzina, who has always been my best friend and a

constant source of inspiration, and to my parents and family who have been pillars of

support and guidance all my life.

PREFACE - CONTRIBUTIONS OF CO-AUTHORS

For the systematic review, Lancelot M. Pinto (LP) was the lead reviewer and first author,

while Madhukar Pai (MP) and Karen R. Steingart (KRS) contributed to the conception

and design. LP was the lead reader, and KRS was the second reader for the screening of

citations, full-text review, and data extraction. LP prepared the manuscript, and MP,

Keertan Dheda (KD) and KRS provided editorial and methodological advice.

For the study in South Africa, LP was the lead author, while KD and MP contributed to

the conception and design. LP, MP and KD provided advice for the analysis and

interpretation of results. LP prepared the manuscript, and MP and KD contributed in

providing editorial advice.

CHAPTER 1: INTRODUCTION

Tuberculosis (TB), caused by Mycobacterium tuberculosis, is a disease that burdens

individuals and health systems globally. In 2010, there were 8.8 million new cases of TB,

12 million prevalent cases, and 1.45 million deaths, 0.35 million of which were among

persons living with HIV (PLWH) (1). The South-east Asian, African and Western Pacific

regions bore 85% of the global burden of disease, while Africa alone accounted for 80%

of HIV-associated incident cases worldwide.

The global case detection rate for all forms of TB is only 63-68%(1), and this low rate is

a serious impediment to the control of the disease. Limitations of existing diagnostic tests

are considered to contribute to the low case detection rate(1). Sputum smear microscopy

and chest radiography are two of the most commonly used tests for TB in most high TB

burden countries. Smear microscopy has low sensitivity and fails to detect nearly half of

all TB cases(2). Chest radiography is a rapid point-of-care test that has been used for over

a century to diagnose pulmonary TB (PTB) (3). The test is easily performed and

incorporated in screening and diagnostic algorithms, and can be especially valuable for

the diagnosis of disease in patients suspected of having the disease (TB suspects) in

whom sputum smears are negative for acid-fast bacilli (AFB) – i.e. smear-negative TB.

While chest radiography is acknowledged to be a sensitive tool for detecting pulmonary

abnormalities, its use for the diagnosis of PTB has been limited by modest specificity and

high inter- and intra-observer differences in reporting of radiographs(4).

Scoring systems for chest radiographs have been used successfully to standardize and

improve the accuracy of detection of various pulmonary disorders. The International

Labor Organization (ILO) employs a classification scheme that trains readers to describe

chest radiographs in a standardized manner with regard to the location, nature, size and

profusion of abnormalities when compared to standard films, which represent the various

types of abnormalities. The system is specifically oriented towards documenting and

identifying features associated with the different occupation-related lung diseases such as

coal workers' pneumoconiosis, silicosis, and asbestosis (5). The score has been found to

improve reproducibility of the reading of chest radiographs for both pulmonary(6) and

pleural abnormalities(7). The Chrispin-Norman score is a standardized scoring system

used for grading chest radiographs in children with cystic fibrosis, with the aim of

documenting, and objectively assessing progression of disease with serial radiographs(8).

The score has been found to have good inter-reader reproducibility, and correlates well

with lung function(9). The lung injury score is a scoring system that divides the

visualized lung fields into zones and uses the extent of lung injury as a marker of the

severity of acute respiratory distress syndrome (ARDS); it has been

found to be useful in improving agreement among readers, especially when the radiographs are read by

radiologists(10).

A similar standardized scoring system for TB that assigns weights to specific features of

chest radiographs consistent with TB, if accurate and reproducible, could potentially

augment TB case detection rates using largely pre-existing resources. Such a scoring

system would ideally need to have a documentation of specific features visualized on a

chest radiograph, and a weighted score for the various features visualized relative to their

association with active PTB.

In this manuscript-based thesis, I attempt to explore whether a standardized scoring

system for chest radiographs could improve the performance characteristics of chest

radiography as a diagnostic test for pulmonary TB (PTB), both as a means for improving

the inter-reader reproducibility, and for improving overall diagnostic accuracy.

The first manuscript is a systematic review of the available evidence for the use of

radiographic scoring systems for the diagnosis of PTB. This is a comprehensive literature

review that involved searching multiple databases with the assistance of a second reader,

with both readers having independently searched the literature for relevant studies that

involved radiographic scoring systems for TB. The aim of the literature review was to

identify such studies, and to try and assimilate a set of features that were consistently

associated with PTB, with the aim of testing these features (decided a priori) for the

derivation of a scoring system using the Chest Radiograph Reading and Recording

System (CRRS).

The CRRS is a tool that was designed for use in epidemiological surveys for TB(11).

The tool involves a checklist of features visualized on the chest radiograph, and has been

found to be associated with high intra- and inter-reader reproducibility, making it a useful

tool for use in the derivation of a weighted scoring system. The derivation of a scoring

system for the diagnosis of PTB was attempted among subjects suspected of having TB

in a study conducted at the University of Cape Town, South Africa, using the CRRS. This

study involved the reading of chest radiographs from 473 subjects suspected of having

the disease, by two independent readers, and the analysis of various features visualized

on the radiographs for their association with PTB, followed by deriving a weighted

scoring system using these features. This study is described in the second manuscript of

the thesis.
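
To make the idea of a weighted score concrete, the following minimal Python sketch applies the feature weights and the cut-off of ≥ 2 reported in the abstract of this thesis. The dictionary keys, function names and example readings are illustrative assumptions; they are not CRRS field names and this is not the analysis code used in the study.

```python
# Weights and cut-off as reported in the thesis abstract. The keys are
# hypothetical labels chosen for this sketch, not official CRRS field names.
FEATURE_WEIGHTS = {
    "large_upper_lobe_opacity": 2,
    "cavity": 2,
    "unilateral_pleural_effusion": 1,
    "adenopathy": 2,
}
CUT_OFF = 2  # a total score >= 2 is treated as score-positive


def radiograph_score(features_present):
    """Sum the weights of the features a reader recorded as present."""
    features = set(features_present)
    unknown = features - set(FEATURE_WEIGHTS)
    if unknown:
        raise ValueError(f"Unrecognized features: {sorted(unknown)}")
    return sum(FEATURE_WEIGHTS[name] for name in features)


def is_score_positive(features_present):
    """Apply the cut-off to decide whether the radiograph is score-positive."""
    return radiograph_score(features_present) >= CUT_OFF


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    example_readings = [
        set(),
        {"unilateral_pleural_effusion"},
        {"cavity", "adenopathy"},
    ]
    for reading in example_readings:
        print(sorted(reading), radiograph_score(reading), is_score_positive(reading))
```

With these weights, an isolated unilateral pleural effusion is the only single feature that stays below the cut-off; any other single feature reaches it.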

Together, the two manuscripts add to the evidence base on the use of scoring systems to

improve the accuracy and reliability of chest radiography for pulmonary TB diagnosis,

and will help improve the field of TB diagnostics within the existing framework of

limited resources.

CHAPTER 2: SYSTEMATIC REVIEW OF THE LITERATURE

(MANUSCRIPT 1)

Chest radiograph scoring systems for the diagnosis of active tuberculosis in adults:

A systematic review

2.1 ABSTRACT

Rationale: The use of chest radiography as a diagnostic test for active pulmonary

tuberculosis (PTB) is limited by the lack of standardization in the reading of chest

radiographs. Thus, despite being sensitive for the detection of pulmonary abnormalities, it

lacks both specificity for PTB and reproducibility. Scoring systems have been employed

successfully for improving the performance characteristics of chest radiography for

various pulmonary diseases, and could potentially improve the objectivity, accuracy and

reproducibility of radiography for the diagnosis of PTB.

Objectives: To systematically review the literature to assess the diagnostic accuracy of

chest radiograph scoring systems for PTB in patients clinically suspected of having the

disease. A secondary objective was to assess the reproducibility of such systems for PTB.

Methods: We searched multiple databases for studies that evaluated the diagnostic

accuracy and reproducibility of chest radiograph scoring systems for PTB. We

summarized results for individual features visualized on a chest radiograph that were

predictive of PTB, and the various features that were used in scoring systems to assess

the likelihood of the disease.

Results: We identified 12 studies that described clinical-radiographic scoring systems, 11

of which were created with the aim of predicting the likelihood of PTB among patients

who were to be admitted to hospitals. Six of these were tested in smear-negative patients,

and no scoring system involved the exclusive use of radiographic features. Upper lobe

infiltrates and cavities were the radiographic features most commonly associated with TB

disease. The sensitivity estimates of the scoring systems were uniformly high, but all of

them lacked specificity. Studies involving newer techniques such as computer-assisted

diagnosis (CAD) had to be excluded due to methodological inadequacies.

Conclusions: The systematic review identified clinical-radiographic scoring systems,

most of which are useful in ruling-out PTB as part of the assessment of the need for

respiratory isolation of patients in healthcare settings. However, the low specificity

precludes their use as rule-in tests for PTB. Scoring systems that rely exclusively on chest

radiographs for the diagnosis of PTB are lacking. There is a need to derive accurate

scoring systems for PLWH and patients evaluated in outpatient settings, especially in

low-resource settings.

2.2 INTRODUCTION

In 2010, there were 8.8 million new cases of TB, 12 million prevalent cases, and 1.45

million deaths, 0.35 million of which were among persons living with HIV (PLWH)(1).

The global case detection rate for the disease is low, despite recognition of the need for

early diagnosis as a key element in the efforts to curb the epidemic (1). The existing

diagnostic tests in a majority of low-resource settings are sputum smear microscopy and

chest radiography. Smear microscopy lacks sensitivity, and has a limited role in the

diagnosis of extrapulmonary TB, pediatric TB, and TB in HIV-infected patients.

Chest radiography, used for over a century, is a rapid point-of-care test that is known to

be sensitive for detecting pulmonary abnormalities (3). The ease of use, relative low cost,

and quick turnaround time make it a convenient test in high-burden, low-income

settings. However, its use for the diagnosis of PTB has been limited by a lack of

specificity, and a lack of reproducibility in reporting of radiographs(4). Consequently, the

probability of diagnosing active PTB based on a chest radiograph reading is dependent on

the reader and not well standardized.

An analogous impediment to the use of chest radiography for the diagnosis of

occupational lung diseases was overcome by the development of standardized methods

for the reading of chest radiographs, a system that is now employed successfully by the

International Union Against Cancer, the International Labor Organization, and the

National Institute for Occupational Safety and Health(5, 12). Scoring systems have also

been developed for grading the severity and extent of pulmonary disease among patients

with cystic fibrosis(8) and form part of the lung injury score for assessing the severity of

adult respiratory distress syndrome(10, 13). Similarly, a standardized scoring system for

TB that assigns weights to specific features of chest radiographs consistent with TB, if

accurate and reproducible, could potentially augment TB case detection rates using

largely pre-existing resources. Such a standardized scoring system for TB also has the

potential to be combined with newer nucleic acid amplification tests such as Xpert

MTB/RIF (Cepheid Inc, Sunnyvale, CA) (14), either as a triage test to reduce costs or as

an add-on test in Xpert MTB/RIF negative persons.

To our knowledge, no previous systematic reviews have assessed the performance of

radiograph scoring systems for pulmonary TB. Therefore, we carried out a systematic

review to estimate the diagnostic accuracy of chest radiograph scoring systems for TB in

patients suspected of having the disease. A secondary objective was to assess the

reproducibility of chest radiograph scoring systems for active PTB.

2.3 METHODS

We followed guidelines for systematic reviews of diagnostic test accuracy recommended

by the Cochrane Collaboration Diagnostic Test Accuracy Working Group, including

writing a detailed protocol before starting the review(15, 16).

Types of studies: We included randomized controlled trials and observational studies of

all study designs (e.g. cross-sectional, case-control and cohort) that assessed the

performance of radiographic scoring systems for the diagnosis of pulmonary TB.

Participants: Participants were subjects suspected of having pulmonary TB who were 15

years of age and older. We restricted studies to those that included a minimum of 10

cases of active TB. With the aim of evaluating subjects similar to those who present in

routine clinical practice, we excluded studies that exclusively studied specific patient

groups such as patients with pneumoconioses, malignancies (both hematological and

solid organ), immune-mediated inflammatory disease, including patients on

immunosuppressive medications such as tumor necrosis factor-alpha inhibitors, and

patients on hemodialysis. Studies that were conducted on individuals who were not

suspected of having TB, such as investigations of asymptomatic contacts of patients with

active TB were also excluded.

Index test: Any chest radiograph scoring system

Comparator: No chest radiograph scoring system

Target condition: TB of the pulmonary parenchyma, pleura, or intrathoracic lymph nodes.

We included miliary TB if the disease involved either pulmonary parenchyma or multiple

sites, one of which was the lung.

Reference standards: We considered liquid or solid cultures as the reference standards

for active pulmonary TB.

Definitions: A radiograph scoring system was defined as a system that assigned numeric

weights to specific features of chest radiographs consistent with PTB (such as cavitary

lesions), with or without the presence of clinical or lab components in the system.

True positives (TP) were TB suspects correctly classified as PTB by the scoring system

when compared with the reference standard.

False positives (FP) were TB suspects who did not have active PTB but were

misclassified by the scoring system as having active PTB.

False negatives (FN) were subjects with active PTB who were misclassified by the

scoring system as not having the disease.

True negatives (TN) were TB suspects who did not have active PTB and were correctly

classified by the scoring system.

Sensitivity refers to the proportion of patients with active PTB correctly identified by the

index test when compared with the reference standard: [TP/(FN + TP)] *100

Specificity refers to the proportion of patients without active PTB correctly identified by the

index test when compared with the reference standard: [TN/(FP + TN)] *100

Positive predictive value (PPV) refers to the proportion of patients correctly identified as

having active PTB by the scoring system when compared to all patients identified as

having active PTB by the scoring system: [TP/(TP+FP)]

Negative predictive value (NPV) refers to the proportion of patients correctly identified as

not having active PTB by the scoring system when compared to all patients identified as

not having active PTB by the scoring system: [TN/(TN+FN)]

Diagnostic Odds ratio (DOR) refers to the odds of a participant with active PTB having a

specific clinical or radiographic manifestation as compared to the odds of a participant

without active PTB having the same clinical or radiographic manifestation. It is

computed by the formula: [(TP*TN)/(FP*FN)]
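
The accuracy measures defined above can be computed directly from the four cells of a 2x2 table. The short Python sketch below is one way to do so; the function name and the example counts are hypothetical, and the predictive values are expressed as percentages (an assumption made for consistency with sensitivity and specificity).

```python
def accuracy_measures(tp, fp, fn, tn):
    """Compute diagnostic accuracy measures from the cells of a 2x2 table.

    Sensitivity, specificity, PPV and NPV are returned as percentages;
    the diagnostic odds ratio (DOR) is returned as a plain ratio.
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("Cell counts must be non-negative.")

    def ratio(numerator, denominator):
        # Undefined ratios (zero denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": 100.0 * ratio(tp, tp + fn),
        "specificity": 100.0 * ratio(tn, tn + fp),
        "ppv": 100.0 * ratio(tp, tp + fp),
        "npv": 100.0 * ratio(tn, tn + fn),
        "dor": ratio(tp * tn, fp * fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from any study).
    for name, value in accuracy_measures(tp=90, fp=40, fn=10, tn=160).items():
        print(f"{name}: {value:.1f}")
```

The DOR is undefined when either FP or FN is zero; the sketch returns NaN in that case rather than applying a continuity correction.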

Reproducibility refers to percent agreement on reported features when a chest radiograph

is read more than once. The agreement could either be intra-reader, when the same chest

radiograph is read more than once by the same reader, blinded to his/her previous

reporting of the radiographs, or inter-reader, when more than one reader reports the

features of the same chest radiograph. This agreement is a reflection of the repeatability

of the test, and is independent of the accuracy with reference to the reference standard for

the diagnosis of active PTB. The observed level of agreement is the ratio of the number

of readings that are in agreement to the total number of readings. It is expressed as a

percentage: Agreement = [Number of readings in agreement/total number of

readings]*100. Cohen's kappa (κ) is the chance-adjusted measure of agreement defined

as the ratio of the actual agreement beyond chance to the potential agreement beyond

chance.
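
As an illustration of these two agreement measures, the Python sketch below computes the observed percent agreement and Cohen's kappa for two readers who each record one categorical reading per radiograph. The function names and the example readings are hypothetical.

```python
from collections import Counter


def percent_agreement(reader_a, reader_b):
    """Observed agreement: readings in agreement / total readings * 100."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("Both readers must rate the same non-empty set of films.")
    in_agreement = sum(a == b for a, b in zip(reader_a, reader_b))
    return 100.0 * in_agreement / len(reader_a)


def cohens_kappa(reader_a, reader_b):
    """Kappa = (observed - expected agreement) / (1 - expected agreement)."""
    n = len(reader_a)
    observed = percent_agreement(reader_a, reader_b) / 100.0
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    # Agreement expected by chance, from each reader's marginal frequencies.
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical readings: 1 = feature present, 0 = feature absent.
    reader_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
    reader_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
    print(f"Agreement: {percent_agreement(reader_1, reader_2):.1f}%")
    print(f"Kappa: {cohens_kappa(reader_1, reader_2):.2f}")
```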

Search strategy and study selection

We searched Medline (1946-2011), Embase (1947-2011) and Web of Science (1899 –

2011) on 28 July 2011 for relevant articles, using published hedges for diagnostic tests to

improve sensitivity (17, 18). We used the terms sensitiv*[tw] OR diagnos*[tw] OR di [fs]

AND radiograph*[MeSH] OR chest xray[tw] OR mass chest x-Ray[MeSH] OR

photofluorograph*[tw] OR scor*[tw] AND tuberculosis (sub-headings : lymphNode /

miliary/ multidrug-resistant/ Pleural/ Pulmonary) [MeSH] OR Mycobacterium

tuberculosis [MeSH]. The detailed search strategy for Medline can be found in Appendix

6.1. We also reviewed reference lists of included articles and any relevant review articles

identified through the search, for possible eligible articles, and hand-searched relevant

World Health Organization reports.

Relevant studies, restricted to those published in English, French and Spanish, were

selected independently by two reviewers (LP and KRS) and disagreements were resolved

by consensus. Citations deemed appropriate by either reviewer after screening titles and

abstracts were selected for full-text review. A list of excluded studies with their reasons

for exclusion is available upon request from the authors.

Assessment of study quality

Two reviewers (LP and KRS) independently assessed study quality using the core set of

11 items from Quality Assessment of Diagnostic Accuracy Studies (QUADAS), a

validated tool to evaluate the presence of bias and variation in diagnostic accuracy

studies(19). As recommended, each item was scored as 'yes', 'no', or 'unclear'.

Data extraction

Data were extracted from each study using a data extraction form that was piloted and

then finalized based on the experience gained from the pilot data extraction process. Two

reviewers (LP and KRS) independently extracted data, and disagreements were resolved

by consensus. The following data were extracted: author, study design, manner of patient

selection, country income status, eligibility criteria for participants, demographic details

of participants, details on the number and qualifications of the readers of the chest

radiographs, and TP, FP, FN, and TN for both, the individual features visualized on the

chest radiograph (such as infiltrates and cavities), and for the scoring system. The data

extraction form is included in Appendix 5.2.
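The four cell counts extracted from each study (TP, FP, FN and TN) are sufficient to compute sensitivity and specificity with confidence intervals. The following Python sketch illustrates the arithmetic. It assumes a Wilson score interval, which is one common choice and not necessarily the interval implemented in the software used for this review; the function names and the example counts are hypothetical and are not taken from any included study.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy_from_counts(tp, fp, fn, tn):
    """Return (estimate, lower, upper) for sensitivity and specificity."""
    sensitivity = tp / (tp + fn)   # proportion of culture-positive patients flagged
    specificity = tn / (tn + fp)   # proportion of culture-negative patients cleared
    return {
        "sensitivity": (sensitivity, *wilson_interval(tp, tp + fn)),
        "specificity": (specificity, *wilson_interval(tn, tn + fp)),
    }


# Hypothetical 2x2 counts, chosen only to illustrate the calculation.
result = accuracy_from_counts(tp=45, fp=120, fn=5, tn=80)
for name, (estimate, lower, upper) in result.items():
    print(f"{name}: {estimate:.2f} ({lower:.2f}, {upper:.2f})")
```

A score interval is used here rather than the simple normal approximation because the latter can extend beyond 0 or 1 when sensitivity is close to 100%, as is the case for several of the included scoring systems.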

Statistical analysis

For the studies that provided TP, FP, FN, and TN values, sensitivity and specificity

estimates and their corresponding 95% confidence intervals were calculated for the

scoring system at the cut-off for the diagnosis of active PTB used by the study authors

(mostly based on optimal sensitivities and specificities using receiver operating characteristic curves).

Forest plots were generated to display sensitivity and specificity estimates using Meta-

Disc (version 1.4) (20). Odds ratios for the presence of individual radiographic features

for the diagnosis of active PTB were determined when the study provided the relevant

data. Meta-analyses of odds ratios for specific radiographic features were performed only

if the features were defined in a sufficiently homogenous manner across studies, the

populations were similar, and the odds ratios were considered homogenous

(heterogeneity was assessed using the I-squared statistic; effects were pooled if the I²

was <75%). When results were pooled, meta-analysis was performed using the DerSimonian

and Laird random effects approach, with the aim of incorporating the heterogeneity of

effects across studies(21). All analyses were performed using STATA 11 (Stata

Corporation, College Station, Texas, USA). Pooling was performed using the command

“metandi”(22). Formal assessment of publication bias using methods such as funnel plots

or regression tests was not performed because such techniques have not been found to be

useful for diagnostic data(23). An estimation of language bias was attempted by

retrieving citations from the search strategy with and without a language filter, and we

report the “filtered” citations as a percentage of the overall citations retrieved.
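The pooling procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used in the review: the standard errors are back-calculated from the published 95% confidence intervals, the function and variable names are hypothetical, and the inputs are the upper lobe odds ratios listed in Table 4 for studies that included all TB suspects. Because the original analysis may have started from the raw 2x2 counts, the output is not expected to match the reported pooled estimate exactly.

```python
from math import exp, log, sqrt


def dersimonian_laird(odds_ratios, ci_lower, ci_upper, z=1.96):
    """Pool odds ratios on the log scale (DerSimonian and Laird random effects)."""
    y = [log(o) for o in odds_ratios]
    # Assumption: within-study variance recovered from the width of the 95% CI.
    var = [((log(u) - log(l)) / (2 * z)) ** 2 for l, u in zip(ci_lower, ci_upper)]
    w = [1 / v for v in var]

    # Fixed-effect estimate and Cochran's Q.
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (tau^2) and the I-squared statistic, in percent.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    return {
        "pooled_or": exp(pooled),
        "ci_low": exp(pooled - z * se),
        "ci_high": exp(pooled + z * se),
        "tau2": tau2,
        "i2": i2,
    }


# Upper lobe odds ratios and 95% CIs as listed in Table 4 (all TB suspects).
result = dersimonian_laird(
    odds_ratios=[5.14, 10.1, 2.38, 7.7, 5.15],
    ci_lower=[2.56, 5.29, 0.67, 5.9, 3.03],
    ci_upper=[10.33, 19.3, 8.41, 10.0, 8.75],
)
print(result)
```

The I² value returned by the function is the quantity compared against the pooling threshold described above.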

2.4 RESULTS

We identified 11907 citations, of which 8066 unique articles remained after

exclusion of duplicates. We conducted the search with and without the language

filters to assess the degree of bias, and found that our search strategy included 80.5% of

all studies. After screening of titles and abstracts, 168 articles satisfied the eligibility

criteria for further review and their full-texts were retrieved. After full-text review, 156

articles were excluded for various reasons. Thus, 12 articles were included in the

systematic review (24-35). Details of the selection process are outlined in Fig.1 in

accordance with PRISMA guidelines for reporting of systematic reviews(36).

Included studies

We did not identify any scoring system that was based exclusively on radiographic

criteria. The 12 included studies all involved scoring systems that comprised both clinical

and radiographic features. Table 1 lists the characteristics of the 12 included studies,

containing a total of 5767 participants. The median number of TB suspects included in

the studies was 283 (interquartile range 177 to 431).

Six studies included all patients suspected of having TB (24-29), five studies included

patients suspected of having TB who were found to have negative sputum smears (30-34),

and one study excluded patients with HIV/AIDS(35). Nine studies were performed

in high-income countries. Five studies involved radiologists; two studies included

pulmonologists and five studies did not report the background of the radiograph reader.

The demographic characteristics of the patients are shown in Table 2. When reported, the

majority of patients were male. Eleven studies included PLWH, who represented 11% to

61% of eligible patients.

Excluded studies

We identified two studies that were designed with the aim of deriving a clinical-

radiographic scoring system for PLWH among hospitalized patients found to be sputum

smear-negative (37, 38). Both studies satisfied a majority of our inclusion criteria and

found the presence of mediastinal adenopathy and cavities to be significantly associated

with PTB in univariate analysis. However, neither of the studies derived a score for PTB.

The study by Le Minor et al. concluded that the “numbers were insufficient to develop a

score for TB”(38), while the study by Davis et al. stated that “After exhaustive testing,

we were unable to identify any combination of factors which reliably predicted

bacteriologically confirmed tuberculosis”(37).

We also excluded 13 studies that used automated computer-assisted diagnosis (CAD) as

none of these studies used culture as a reference standard, a criterion for inclusion in this

review (39-51). Five studies that involved the grading of chest radiographs were excluded

as these studies were designed to grade the severity of PTB based on the extent of

abnormalities visualized on the chest radiograph, and not the diagnostic accuracy of

scoring systems (52-56). We also excluded three studies that used the Chest Radiograph

Reading and Recording System (CRRS) (11, 57, 58), despite these studies demonstrating

the CRRS tool to have good reliability for features of PTB visualized on a chest

radiograph, as these studies did not use culture as a reference standard.

Assessment of methodological quality

As seen in Table 3, all studies suffered from incorporation bias, as the results of the chest

radiograph and/or the clinical components of the scoring system played a role in the

decision of which patients would be investigated further with culture, the reference

standard. Six (50%) of the total 12 studies also did not include a sample that was

considered representative of the target population. Six (50%) studies were unclear about

whether the person assigning scores to the patients for the various components of the

scoring system was blinded to the results of the reference standard.

Findings

Studies that included all patients who were suspected of having TB

We identified six studies that included all patients suspected of having TB (24-29). All

studies were performed in an inpatient setting. All studies were aimed at deriving optimal

prediction scores to identify patients who were likely to have PTB and require respiratory

isolation (Tables 1 and 2). In univariate analyses, the most common radiographic features

across studies found to be significantly associated with PTB were upper lobe infiltrates

[odds ratio (OR) range, 2.38 to 10.11; pooled OR using a random effects model 6.65,

95% CI 4.42, 10.01; five studies] (Table 4 and Figure 2), and cavities (OR range, 2.11 to

10.08, estimates not pooled due to heterogeneity of effects, three studies) (Table 5 and

Figure 3).

The details of the parameters included in the scores and their respective weights are

summarized in Table 6, along with the performance characteristics of the scoring system

and the final rule to aid in decision-making. Studies used several different methods to

derive weights for the scoring system: logistic regression of the parameters found

significant by univariate analysis (three studies); classification and regression tree

(CART) analysis (one study) (25); general regression neural network (GRNN) analysis

(one study) (26); and chi-squared recursive partitioning (one study) (27). All six studies

achieved a sensitivity of the scoring system greater than 80% (median 95%, range, 81%

to 100%). For the five studies that reported specificity data, specificity estimates were

low (median 42%, range 22% to 72%), suggesting a poor rule-in value for PTB.

Figure 4 presents individual study results of sensitivity and specificity estimates (and

their 95% confidence intervals) in both forest plots for the four studies that provided

sufficient data. As the scoring systems included varying clinical and radiographic

parameters to calculate the likelihood of PTB, we did not consider these systems to be

homogenous, and therefore, did not perform a meta-analysis of the accuracy estimates of

these scoring systems.

Studies that only included patients suspected of having TB who were found to be sputum

smear negative

We identified five studies in this category (30-34). Four studies were conducted in an

inpatient setting for the purpose of determining a clinical rule for respiratory isolation

(30, 31, 33, 34), while one study was performed in an outpatient setting (32) (Tables 1

and 2). As with the previously described set of studies that included all patients

suspected of having TB, in the univariate analysis, the most common radiographic

features across studies found to be associated with PTB were upper lobe infiltrates (OR

range, 2.47 to 9.07, pooled OR using a random effects model 3.57, 95% CI 2.38, 5.37,

five studies) (Table 4 and Figure 5), and cavities (OR range, 1.97 to 25.66, estimates not

pooled due to heterogeneity of effects, three studies) (Table 5 and Figure 6).

To derive weights for the scoring system, two studies used logistic regression of the

parameters found significant in the univariate analysis, while three studies involved

validation of previous studies. One of the validation studies used bootstrapping, which is

a resampling method aimed at improving the internal validity of the data (31). All studies

achieved a sensitivity of the scoring system of at least 93% (median 96%, range, 93%

to 98%). However, specificity estimates were low (median 35%, range 14% to 50%),

again suggesting a poor rule-in value for PTB. Figure 7 presents individual study results

of sensitivity and specificity estimates (and their 95% confidence intervals) in both forest

plots for the four studies that provided sufficient data. As is the case with the above

studies, we did not consider these systems to be homogenous, and therefore, did not

perform a meta-analysis of the accuracy estimates of these scoring systems.
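Bootstrapping, mentioned above as a resampling method, repeatedly draws samples of the same size with replacement from the observed data and recomputes the statistic of interest on each draw. The Python sketch below illustrates only this resampling principle by deriving a percentile interval for sensitivity; it is not the validation procedure used in the cited study, and the data, function names and number of resamples are hypothetical.

```python
import random


def bootstrap_percentile_ci(values, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for an arbitrary statistic."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    n = len(values)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def proportion(flags):
    return sum(flags) / len(flags)


# Hypothetical data: score result (1 = test positive) for 50 culture-positive patients.
score_positive = [1] * 45 + [0] * 5
observed_sensitivity = proportion(score_positive)
lower, upper = bootstrap_percentile_ci(score_positive, proportion)
print(f"sensitivity {observed_sensitivity:.2f}, bootstrap 95% CI ({lower:.2f}, {upper:.2f})")
```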

One study, performed in an inpatient setting, excluded PLWH(35) (Tables 2 and 3).

Logistic regression was used to derive weights for the score, but the study also validated

the score derived by Wisnivesky et al.(34) This study found a sensitivity of 97% and

specificity of 42%. The details of the scoring system and its performance characteristics

can be found in Table 6.

Reproducibility

None of the included studies reported data on intra-reporter or inter-reporter

reproducibility.

2.5 DISCUSSION

Chest radiography is an important tool for physicians to assess the probability of active

pulmonary TB among individuals who have symptoms suggestive of disease, and often

the only tool available for assessing this probability among those suspected of having the

disease who are found to be negative on sputum smear examination. In the absence of

newer tests for TB that are universally affordable and accessible, there is a need to

improve existing tests such as chest radiography, which suffers from a lack of

standardization.

We conducted this systematic review with the aim of assessing the diagnostic accuracy of

standardized radiographic scoring systems for the diagnosis of PTB, and whether

standardization improves the performance of the test. Our review failed to find any study

that exclusively relied on radiographic features to derive a score, and all the included

studies had a combination of defined radiographic criteria with different clinical criteria.

Most of the included studies were hospital-based, decision-to-isolate studies. A patient

with PTB can generate up to 44 quanta per hour (one quantum is defined as the

infectious dose)(59), highlighting the necessity for rapid respiratory isolation of patients

with PTB in the hospital setting. Yet, the unnecessary respiratory isolation of patients

considerably increases costs to the healthcare system(60). Scoring systems that improve

the accuracy of the decisions to subject those patients suspected of having PTB to

respiratory isolation can considerably improve the efficiency of healthcare systems and

utilization of resources. The scores developed suffered from low specificity, and had a

high rule-out value (high negative predictive value) but a poor rule-in value (low positive

predictive value) for PTB. However, such scores may still be useful for limiting the

number of patients for whom further investigations would be warranted, especially

among patients who are smear-negative.
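The relationship between accuracy and predictive values noted above follows from Bayes' theorem and depends on the prevalence of PTB in the population tested. The Python sketch below illustrates this with a sensitivity of 97% and a specificity of 42%, the values reported for one of the included scores; the prevalence values and the function name are hypothetical and chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical prevalences spanning low-burden and high-burden settings.
for prevalence in (0.02, 0.10, 0.30):
    ppv, npv = predictive_values(0.97, 0.42, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With a highly sensitive but poorly specific score, the negative predictive value stays high across this range while the positive predictive value remains low at low prevalence, which is the pattern of a rule-out test.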

The prediction rule developed by Wisnivesky et al was validated in three studies, two of

which were conducted in patients who had negative sputum smears. The scoring system

consistently demonstrated sensitivity higher than 95%, but had poor specificity. As a

rule-out test, this scoring system appears to be validated in multiple studies. The study by

Soto et al(32) was a validation study of a score derived by the same research group in

an earlier study (31). Although the cut-off for the score was modified in the validation

cohort, it performed well in a sub-group of patients with no prior history of TB. However,

we limited the analysis of the performance characteristics of this scoring system to the

population of all patients, as this was the intent of the validation study, and not the post-

hoc analysis of the performance in the selected subgroup.

We identified only one study that assessed clinical-radiographic scoring systems for use

in the out-patient setting. Our systematic review also failed to identify a clinical-

radiographic scoring system that was derived for PLWH suspected of having active PTB.

Bock et al.(24) performed a sub-group analysis in PLWH, but found no radiographic

feature to be significantly associated with active PTB in this sub-group, a finding that is

consistent with the atypical nature of radiographic manifestations of PTB described

among PLWH (61).

Automated computer-assisted diagnosis (CAD) employs techniques such as texture

analysis for reading chest radiographs, and appears to be a promising modality for

standardizing and improving the diagnostic performance of digital chest radiography

(62). However, our review suggested a lack of methodologically high-quality studies.

Further development of the field should focus on the validation of such techniques in

larger populations and with a structured epidemiological approach using appropriate

reference standards.

The strength of our systematic review is in the extensive review of the literature, with two

reviewers independently performing the review, and basing every decision on discussion

and consensus. We restricted our search to articles written in English, French and

Spanish, but an assessment for language bias suggested that we included a high

proportion of the available literature. We conducted citation searches of the included

articles and review articles to identify any published study that we may have failed to

include because of the language restriction, but did not identify any such studies.

However, we may have inadvertently failed to include articles of relevance in other

languages, and acknowledge this as a shortcoming of the review. All the included studies

suffered from incorporation bias, as the results of the chest radiograph and/or the clinical

components of the scoring system played a role in the decision of which patients would

be investigated further with TB culture. This may have over-estimated the accuracy of the

scoring systems in relation to culture. Six of the 12 studies also did not include a sample

that was considered representative of the target population (selection bias). Six of the 12

studies were unclear about whether the person assigning scores to the subjects for the

various components of the scoring system was blinded to the results of the reference test.

Selection bias and absence of blinding are features of study design that have been

associated with inflated accuracy estimates(63, 64). These limitations in the quality of the

included studies need to be taken into consideration when drawing conclusions.

2.6 CONCLUSIONS

Our systematic review revealed no scoring system designed to assess the likelihood of

active PTB based exclusively on radiographic features. Measures to create such a system

would help standardize the interpretation of chest radiographs for the diagnosis of active

PTB. The systematic review identified clinical-radiographic scoring systems, most of

which were created with the aim of predicting the likelihood of active PTB among

patients who were to be admitted to hospitals. Such scoring systems are intended for

assessing the need for respiratory isolation of patients in healthcare settings. Although

most of these systems have high sensitivity, they have low specificity for active PTB.

There is a need to derive accurate scoring systems for PLWH and patients evaluated in

out-patient settings, especially in low-resource settings. Technological advances in the

interpretation of chest radiographs, such as CAD, need to be refined and validated in

well-designed studies to assess their utility.

2.7 TABLES AND FIGURES

Table 1. Characteristics of studies included

| Study | Country | Setting | No. of eligible TB suspects included (% of eligible participants) | Design | Inclusion criteria | Chest radiograph reader(s) | Reference standard – type of culture |
|---|---|---|---|---|---|---|---|
| Studies that included all TB suspects | | | | | | | |
| Bock et al (1996)15 | USA | In-patient | 295 (78) | Cross-sectional, retrospective | 1. Patients with active TB; 2. Patients with TB in the differential diagnosis; 3. AFB smears and cultures ordered; 4. HIV + with abnormal CXR | Radiologist | Solid and liquid |
| El-Solh et al (1997)16 | USA | In-patient | 286 (100)* | Cross-sectional, retrospective | All isolated patients, based on symptoms, prior history of TB exposure, HIV status, medical and social risk factors, and radiographic findings | Radiologist and pulmonologist | Liquid |
| El-Solh et al (1999)17 | USA | In-patient | 119 (100)* | Cross-sectional | All patients in whom an AFB smear and culture was requested | Radiologist and pulmonologist | Liquid |
| Moran et al (2009)18 | USA | In-patient | 2535 (91)* | Cross-sectional | Admission diagnosis of pneumonia or suspected TB | Emergency medicine resident | NR |
| Mylotte et al (1997)19 | USA | In-patient | 220 (100)* | Cross-sectional, retrospective | All patients in whom an AFB smear and culture was requested by the admitting physician | Not reported | Liquid |
| Solari et al (2008)20 | Peru | Emergency Department | 345 (70.8) | Cross-sectional | Productive cough for > 1 week, or cough of any duration and 1. Fever > 3 weeks or 2. Weight loss of at least 3kg in previous month or 3. Night sweats or hemoptysis or differential diagnosis of PTB from attending physician | Internist, internal medicine resident | Solid |
| Studies that included smear-negative TB suspects | | | | | | | |
| Lagrange-Xelot et al (2010)21 | France | In-patient | 134 (100) | Cross-sectional | Suspected TB, as recommended by French guidelines | Not reported | Liquid |
| Soto et al (2008)22 | Peru | In-patient | 262 (100) | Cross-sectional | Cough >= 1 week AND one or more of the following: 1. Fever; 2. Weight loss >= 4kg in 1 month; 3. Breathlessness; 4. Constitutional symptoms (malaise or hyporexia for a minimum of 2 months) | Not reported | Solid |
| Soto et al (2011)23 | Peru | Out-patient | 663 (96.9) | Cross-sectional | Cough >= 2 weeks AND one or more of the following: 1. Fever; 2. Weight loss; 3. Breathlessness | 1. General practitioner; 2. TB specialist; tie breaker: experienced radiologist | Solid, liquid or concentrated smear |
| Wisnivesky et al (2000)25 | USA | In-patient | 112 (100) | Case-control | Cases - isolated TB patients; controls - randomly selected from a log of patients who submitted smears and cultures, matched on age (+/- 3 years), sex and year of presentation, 3 smears negative, culture negative and isolated in a hospital | 1. Radiologist; 2. Radiologist | Solid and liquid |
| Wisnivesky et al (2005)24 | USA | In-patient | 516 (100) | Cross-sectional | Patients admitted and isolated because of suspicion of PTB | Not reported | Solid and liquid |
| Study that included only HIV-uninfected patients | | | | | | | |
| Rakoczy et al (2008)26 | USA | In-patient | 280 (100)* | Case-control | Cases - all TB inpatients; controls - all inpatients placed under airborne precautions with negative smears and cultures, matched with cases on time of admission (+/- 6 days) | Not reported | Not reported |

*Studies had derivation and validation cohorts. The number of TB suspects represents those in the validation cohorts

Table 2. Demographic characteristics of subjects in the included studies

| Study | Age (years) | No. of Males (%) | No. of Persons Living with HIV (%) | Patients with Active TB (%) |
|---|---|---|---|---|
| Studies that included all TB suspects | | | | |
| Bock et al (1996)15 + | mean 41 | 296 (79) | 230 (61.0) | 53 (14.1) |
| El-Solh et al (1997)16 ## | mean (SD): PLWH 36.6 (0.4); non-PLWH 50.4 (1.2) | NR | 316 (56.1) | 47 (8.3) |
| El-Solh et al (1999)17 | NR | NR | 66 (55.5) | 11 (9.2) |
| Moran et al (2009)18 ## | median (IQR) 48 (38-63) | 3567 (63)* | 1058 (20.8) | 224 (4.4) |
| Mylotte et al (1997)19 | mean (SD) 44 (16) | NR | 129 (59.0) | 8 (3.6) |
| Solari et al (2008)20 | median 33 | 222 (64.4) | 45 (13.0) | 109 (31.6) |
| Studies that included smear-negative TB suspects | mean (SD) | | | |
| Lagrange-Xelot et al (2010)21 | 43 (14.0) | 94 (70) | 60 (40.0) | 26 (19.0) |
| Soto et al (2008)22 | NR | 166 (63.4) | 28 (10.9)** | 27 (10.3) |
| Soto et al (2011)23 | 41.4 (17.2) | 370 (55.8) | 98 (24.0)# | 184 (27.8) |
| Wisnivesky et al (2000)25 | cases – 40 (2); controls – 40 (2) | 82 (73.2) | NR | 56 (50) |
| Wisnivesky et al (2005)24 | 46.3 (11.9) | 285 (55.2) | 362 (70.0) | 19 (3.7) |
| Study that excluded PLWH | | | | |
| Rakoczy et al (2008)26 | cases – 60; controls – 51.8 | cases - 33 (67); controls – 29 (59) | 0 | 33 (11.8) |

IQR, interquartile range; NR, not reported; PLWH, persons living with HIV

* Sex not documented in 1.4% patients

** 6 patients refused testing

# 255 patients refused testing

## The demographic characteristics represent those of the included subjects in the

combined derivation and validation cohort

+ The demographic characteristics represent those of the eligible subjects

Table 3. Quality assessment of the included studies using the QUADAS tool

Studies (columns): Bock et al (1996)15; El-Solh et al (1997)16; El-Solh et al (1999)17; Moran et al (2009)18; Mylotte et al (1997)19; Solari et al (2008)20; Lagrange-Xelot et al (2010)21; Soto et al (2008)22; Soto et al (2011)23; Wisnivesky et al (2000)25; Wisnivesky et al (2005)24; Rakoczy et al (2008)26

Items (rows):

Representative sample?

Acceptable reference standard?

Acceptable delay?

Partial verification avoided?

Differential verification avoided?

Incorporation avoided?

Reference standard blinded?

Index results blinded?

Relevant clinical information available?

Uninterpretable results reported?

Withdrawals explained?

Response categories: Yes; No; Unclear

Table 4 Association of upper lobe opacities visualized on the chest radiograph and

active pulmonary TB

| Study | OR | 95% CI |
|---|---|---|
| Studies that included all TB suspects | | |
| Bock et al (1996)15 | 5.14 | 2.56,10.33 |
| El-Solh et al (1997)16 * | 10.1 | 5.29,19.3 |
| | 18 | 5.47,59.36 when adenopathy also present |
| El-Solh et al (1999)17 | 2.38 | 0.67,8.41 |
| Moran et al (2009)18 **,# | 7.7 | 5.9,10 |
| Solari et al (2008)20 | 5.15 | 3.03,8.75 |
| Studies that included smear-negative TB suspects | | |
| Lagrange-Xelot et al (2010)21 | 3.83 | 1.52,9.6 |
| Soto et al (2008)22 ** | 4.81 | 1.93,11.92 |
| Soto et al (2011)23 ** | 2.47 | 1.71,3.57 |
| Wisnivesky et al (2000)25 + | 9.07 | 2.5,32.9 |
| Wisnivesky et al (2005)24 + | 3.96 | 1.57,9.98 |

* reported “upper zone disease”,

**reported “apical infiltrates”

+ reported “upper lobe consolidation”

# The study reported relative risks (RR) as the measure of association

Table 5 Association of the presence of a cavity visualized on the chest radiograph

and active pulmonary TB

| Study | OR | 95% CI |
|---|---|---|
| Studies that included all TB suspects | | |
| Bock et al (1996)15 | 4.75 | 1.43,15.74 |
| Moran et al (2009)18 * | 7.68 | 5.88,10.05 |
| Solari et al (2008)20 | 2.11 | 1.25,3.55 |
| Studies that included smear-negative TB suspects | | |
| Lagrange-Xelot et al (2010)21 | 25.66 | 6.42,102.69 |
| Soto et al (2008)22 | 1.97 | 0.76,4.87 |
| Wisnivesky et al (2000)25 + | 2.04 | 0.7,5.96 |

*The study reported relative risks (RR) as the measure of association

Table 6 Details and performance characteristics of included scoring systems

| Study | Feature | Test details | Rule | Sensitivity % (95%CI)# | Specificity % (95%CI)# | PPV % (95% CI)# | NPV % (95% CI)# | Area under the curve (SE) |
|---|---|---|---|---|---|---|---|---|
| Studies that included all TB suspects | | | | | | | | |
| Bock et al (1996)15 | 1. CXR with upper lobe infiltrate; 2. CXR with cavity; 3. Knew someone with active TB; 4. Self-report of positive tuberculin skin test in past; 5. Self-report of isoniazid prophylaxis therapy in the past | Logistic regression. OR, 95%CI: 5 (2.38,10.51); 3.93 (1.06,14.62); 2.42 (1.1,5.32); 5.67 (1.57,22.01); 0.18 (0.04,0.82) | Any of 1-3 or 4 (in the absence of 5) considered test positive | 81 | 26 | | | |
| El-Solh et al (1997)16 | Upper zone disease; weight loss; diabetes mellitus | Used classification and regression tree analysis | Upper zone disease absent and fever absent = test negative. Upper zone disease absent, but fever present = test negative if no weight loss and CD4+ > 200. Everyone else to be considered test positive | 100 (78,100) | 50 (44,57) | 22 | 100 | 0.878 (0.029) |
| El-Solh et al (1999)17 | Age, CD4+ counts, diabetes mellitus, HIV, tuberculin skin test positivity. Chest pain, weight loss, cough, night sweats, fever, shortness of breath. Upper lobe infiltrate, lower lobe infiltrate, upper lobe cavity, lower lobe cavity, adenopathy, unilateral pleural effusion, bilateral pleural effusion, pleural thickening, miliary pattern, normal | Used general regression neural network | | 100 (91,100) | 72 (65,77) | | | 0.947 (0.028) |
| Moran et al (2009)18 | 1. Apical infiltrate; 2. Cavitation; 3. Immigrant; 4. Weight loss; 5. Positive TB history; 6. Homeless; 7. Incarcerated | Used chi-squared recursive partitioning | Any of 1-7 present = test positive | 96 | 49 | 8 | 100 | |
| Mylotte et al (1997)19 | 1. AFB positive smear; 2. Localized CXR change; 3. Correctional facility residence; 4. History of weight loss | OR, 95%CI: 5.8 (3-11); 2.5 (1.3,4.9); 2.3 (1.2,4.4); 1.8 (1,3.2). Score: 3; 2; 2; 1 | Score > 3 = test positive | 88 | 63.2 | 8 | 99 | 0.86 (0.04) |
| Solari et al (2008)20 | Age < 35; Age 35-60; Age > 61; Weight loss; History of PTB; Miliary pattern; Cavity; Upper lobe infiltrate | OR, 95%CI: 0.97 (0.96,0.99); 2.79 (1.51,5.18); 0.51 (0.28,0.95); 8.04 (2.79,23.16); 2.54 (1.4,4.62); 5.64 (3.2,9.93). Score: 0; -1; -2; 5; -3; 10; 5; 9 | Score > 3 = test positive | 93 | 42 | 43 | 93 | 0.809 (0.05) |
| Studies that included smear-negative TB suspects | | | | | | | | |
| Lagrange-Xelot et al (2010)21 (validation of score developed by Wisnivesky et al25) | TB risk factors or chronic symptoms; Self-report of positive tuberculin skin test in past; Shortness of breath; Temperature < 38.5 deg C; Temperature 38.5 - 39 deg C; Temperature > 39 deg C; Crackles on physical examination; Upper lobe disease on CXR | Score: 4; 5; -3; 0; 3; 6; -3; 6 | > 1 = test positive | 96.2 | 21.3 | 23 | 96 | |
| Soto et al (2008)22 | Hemoptysis; Weight loss; Age > 45; Expectoration; Apical infiltrate; Miliary infiltrate | OR (95% CI): 3.24 (1.11,9.22); 2.35 (0.86,6.43); 2.01 (1.01,3.01); 0.35 (0.14,0.9); 4.29 (1.7,10.86); 9.31 (2.21,39.24). Score: 2; 1; -1; -1; 3; 4 | < 0 = low probability; > 4 = high probability | At a cut-off < 0: 93 | At a cut-off < 0: 50. At a cut-off > 4: 92 | | | At a cut-off of > 2: 0.83 (0.07) |
| Soto et al (2011)23 | Same as above – validation study | | > 5 = high probability | At a cut-off < 0: 97.8 (94.5,99.4). At a cut-off > 5: 23.9 (17.9,30.7) | At a cut-off < 0: 14 (11,17.4). At a cut-off > 5: 93.1 (90.5,95.2) | | | |
| Wisnivesky et al (2000)25 | TB risk factors or chronic symptoms; Self-report of positive tuberculin skin test in past; Shortness of breath; Temperature < 38.5 deg C; Temperature 38.5 - 39 deg C; Temperature > 39 deg C; Crackles on physical examination; Upper lobe disease on CXR | OR, 95%CI: 7.9 (4.4,24.2); 13.2 (4.4,40.7); 0.2 (0.1,0.5); 2.8 (1.1,8.3); 0.3 (0.1,0.5); 14.6 (3.7,57.5). Score: 4; 5; -3; 0; 3; 6; -3; 6 | > 1 = test positive | 98 (95,100) | 46 (33,59) | 3.3* | 99.9* | |
| Wisnivesky et al (2005)24 | Same as above | | | 95 (74,100) | 35 (31,40) | 9.6 | 99.7 | |
| Study that excluded PLWH | | | | | | | | |
| Rakoczy et al (2008)26 | Chronic symptoms; Immunosuppression* (other than HIV); foreign birth; CXR upper zone findings; shortness of breath | OR, 95% CI: 10.21 (2.95,35.4); 8.14 (2.08,31.8); 7.01 (2.1,23.8); 5.28 (1.6,17.2); 0.13 (0.04,0.45). Score: 6; 4; 2; 2; -2 | > 4 = test positive | 97 | 42 | 61 | 95 | |
| | Validation of the score developed by Wisnivesky | | | 96 | 18 | | | |

*Calculated assuming a prevalence of 2%

# 95% CIs are included when the published study either provided these, or the data were sufficient for the CIs to be calculated

CXR , chest x-ray; NPV, negative predictive value; PPV, positive predictive value
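To show how a point-based rule of the kind summarized in Table 6 is applied to an individual patient, the Python sketch below uses the weights and threshold listed in the table for the score by Mylotte et al (3, 2, 2 and 1 points; score > 3 = test positive). The dictionary keys, function names and example patients are hypothetical, and the sketch is an illustration of the arithmetic rather than a clinical tool.

```python
# Point weights as listed in Table 6 for Mylotte et al (1997); key names are illustrative.
MYLOTTE_WEIGHTS = {
    "afb_positive_smear": 3,
    "localized_cxr_change": 2,
    "correctional_facility_residence": 2,
    "history_of_weight_loss": 1,
}
THRESHOLD = 3  # Table 6: a score greater than 3 is considered test positive


def total_score(findings, weights=MYLOTTE_WEIGHTS):
    """Sum the points for every feature recorded as present."""
    return sum(points for feature, points in weights.items() if findings.get(feature, False))


def is_test_positive(findings, threshold=THRESHOLD):
    return total_score(findings) > threshold


# Hypothetical patients.
patient_a = {"localized_cxr_change": True, "history_of_weight_loss": True}
patient_b = {"afb_positive_smear": True, "history_of_weight_loss": True}

for label, patient in (("A", patient_a), ("B", patient_b)):
    print(label, total_score(patient), is_test_positive(patient))
```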

Figure 1. PRISMA flow diagram for included and excluded studies

Identification

Records identified through database searching (n = 11599): MEDLINE n = 3586; EMBASE n = 5740; Web of Science n = 2273

Additional records identified through other sources (n = 308)

Records after duplicates removed (n = 8066)

Screening

Records screened (n = 8066)

Records excluded (n = 7916)

Eligibility

Full-text articles assessed for eligibility (n = 168)

Full-text articles excluded (n = 156):

Not a scoring system – 44

Pediatric study – 35

Reference standard not satisfied – 34

Specific features of CXR not analyzed – 16

Review – 10

Other language – 3

Screening of asymptomatic patients – 3

Not M.TB – 3

Abstract – 2

Duplicate – 2

Editorial/commentary – 2

Article not obtained – 4

Included

Studies included in qualitative synthesis (n = 12)

Studies included in quantitative synthesis (meta-analysis) (n = 5)

Figure 2. Diagnostic odds ratio for active pulmonary TB among all TB suspects with

an upper lobe infiltrate visualized on the chest radiograph

The size of the black squares corresponds to the relative weight that was assigned to each study.

The diamond represents the pooled estimate for the diagnostic odds ratio.

The blue lines represent the confidence intervals around the respective estimates.

Pooling was performed using the DerSimonian and Laird random effects method

Figure 3. Diagnostic odds ratio for active pulmonary TB among all TB suspects with

a cavity visualized on the chest radiograph

The size of the black squares corresponds to the relative weight that was assigned to each study.

The blue lines represent the confidence intervals around the respective estimates.

Pooling was not performed as there was significant heterogeneity of effects (I² = 93%)

Figure 4. Scoring systems for studies that included all TB suspects. The figures show

the estimated sensitivity (A) and specificity (B) of the study (black square) and its

95% confidence interval (blue horizontal line).

(A)

(B)

Figure 5. Diagnostic odds ratio for active pulmonary TB among smear-negative TB

suspects with an upper lobe infiltrate visualized on the chest radiograph

The size of the black squares corresponds to the relative weight that was assigned to each study.

The diamond represents the pooled estimate for the diagnostic odds ratio.

The blue lines represent the confidence intervals around the respective estimates.

Pooling was performed using the DerSimonian and Laird random effects method.

Figure 6. Diagnostic odds ratio for active pulmonary TB among smear-negative TB

suspects with a cavity visualized on the chest radiograph

The size of the black squares corresponds to the relative weight that was assigned to each study.

The blue lines represent the confidence intervals around the respective estimates.

Pooling was not performed as there was significant heterogeneity of effects (I² = 88%)

Figure 7. Scoring systems for studies that included smear-negative TB suspects. The

figures show the estimated sensitivity (A) and specificity (B) of the study (black

square) and its 95% confidence interval (blue horizontal line).

CHAPTER 3: Bridging chapter: The need for a chest radiograph

scoring system for the diagnosis of pulmonary tuberculosis

The systematic review was conducted with the aim of:

a) identifying scoring systems that had been designed and used for the diagnosis of TB

using radiographic features visualized on the chest radiograph and assessing their

performance characteristics, and

b) identifying features that were consistently found across studies to be associated with

PTB that would help in deciding a priori which features we would include in the analysis

for the derivation of a scoring system.

Our systematic review found no scoring system designed to assess the likelihood of

active PTB based exclusively on radiographic features. Although we found 12 studies

that analyzed scoring systems that included radiographic features, all of these were

clinical-radiographic scoring systems.

Three caveats need to be acknowledged while interpreting the results of the systematic

review:

Firstly, 11 of the 12 studies were conducted among inpatients in hospitals, a population in

whom the manifestations of the disease are likely to be subject to a spectrum bias as

compared to outpatients (65).

Secondly, 9 of the 12 studies were conducted in high-income, low TB burden settings,

and the radiographic manifestations of the disease, and pre-test probabilities are likely to

be different in such settings as compared to low-income, high-burden settings (66).

Lastly, the purpose of most of the studies was to assess the likelihood of TB for purposes

of respiratory isolation in hospitals, and the derivation of such scores could have different

aims (and consequently different cut-offs) from scoring systems derived with diagnostic

purposes in mind.

We therefore concluded that there was a need to derive, for diagnostic purposes, a scoring

system that relied exclusively on radiographic features visualized on the chest radiographs

of outpatients in low-income, high-burden settings whose clinical features were

consistent with the disease.

At the University of Cape Town (Cape Town, South Africa), a parent prospective study

(TB-NEAT) was conducted to evaluate several TB diagnostic tests and their contributions

to the diagnosis of active TB in an HIV-endemic setting (67-69). The study consecutively

recruited outpatients with suspected pulmonary TB at two primary care clinics over a 3-

year period. The study involved the documentation of abnormalities visualized on the

chest radiographs of all subjects by two readers who independently read the radiographs

using a standardized validated tool, the Chest Radiograph Reading and Recording System (CRRS). The large number of subjects involved in

a study conducted in a low-income, high HIV and TB burden outpatient setting provided

an ideal setting for the derivation of a chest radiograph scoring system.

The radiographic features that were found consistently associated with PTB in the

systematic review were cavities, upper lobe opacities, unilateral pleural effusions, and hilar

and/or mediastinal adenopathy; these were included in the univariate analysis for their

association with PTB for the derivation of the score.

We aimed to use the knowledge gained from the literature review to inform the derivation

of a score that we hoped could have significant implications in the diagnosis of the

disease, and we present the derivation of the score in the following chapter.

CHAPTER 4: Development of a reliable and simple radiographic

scoring system to aid the diagnosis of pulmonary tuberculosis

(MANUSCRIPT 2)

3.1 ABSTRACT

Rationale: In tuberculosis (TB) suspects whose sputum smears are AFB negative, chest

radiography is often the only alternative diagnostic tool available. However, chest

radiographs lack specificity for TB, and their interpretation is subjective and not

standardized, and therefore not reproducible. Efforts to improve the interpretation of

chest radiography are warranted.

Objectives: To derive a scoring system to aid the diagnosis of PTB, using features noted

on the Chest Radiograph Reading and Recording System (CRRS).

Methods: Chest radiographs of outpatients with suspected PTB, consecutively-recruited

over 3 years at clinics in South Africa, were read by two independent readers using

CRRS. Multivariable analysis was used to identify features significantly associated with

culture-positive active pulmonary TB, and these were weighted and used to generate a

score.

Results: 473 patients were included in the analysis. Large upper lobe opacities, cavities,

unilateral pleural effusion and adenopathy were significantly associated with PTB, had

high inter-reader reliability, and received 2, 2, 1 and 2 points, respectively in the final

score. Using a cut-off of 2, the score had a high negative predictive value (92%, 95%CI

87,95). Among TB suspects with negative sputum smears, the score correctly ruled out

active disease in 214 of 229 patients (NPV 93%; 95%CI 89,96).

Conclusions: The derived scoring system is a simple and reliable tool that is useful for

ruling out active PTB in smear-negative patients, thus potentially reducing the need for

further testing in high burden settings. Validation studies are now warranted.

3.2 INTRODUCTION

Despite the fact that tuberculosis (TB) is often curable, the disease continues to be a

major problem globally (1). Existing diagnostic tests for the disease are limited in their

scope, either due to performance limitations of traditional tests (70), or due to the cost of

more accurate tests such as liquid culture and nucleic acid amplification-based tests such

as Xpert MTB/RIF (67, 71), often making these newer tests inaccessible to clinics in

developing countries. As a result, TB case detection rates remain low worldwide (1).

Sputum smear microscopy and chest radiography have well recognized limitations, but

are often the only diagnostic tools available for clinicians in resource-limited settings.

Therefore, efforts are necessary to optimize and improve their performance. While much

work has been done to optimize sputum microscopy using strategies such as light-

emitting diodes (72), sputum concentration techniques (73), and same-day diagnosis (74),

similar work is needed to improve the diagnostic accuracy of chest radiography for TB.

Most studies assessing the accuracy of chest radiography use subjective assessments of

the probability of TB by readers. As a result, despite being a sensitive test for the

diagnosis of active TB among patients with chest symptoms (75), the use of chest

radiography for mass screening of individuals for TB has been limited by high inter- and

intra-observer variability, thereby affecting the accuracy and reproducibility of the test

(4). In our review of the literature, it was evident that certain features noted on a chest

radiograph, such as apical infiltrates and cavities, are known to be highly suggestive of

active TB (24-35, 37, 38). However, to the best of our knowledge, there is no simple

scoring system that combines the systematic assessment of each of the relevant visualized

features with quantification of these parameters to generate an exclusively radiographic

score that predicts the likelihood of active pulmonary TB.

The Chest Radiograph Reading and Recording System (CRRS) is a system developed to

standardize the reading of chest radiographs in epidemiological studies of TB and lung

disease (76). The system has been validated in high-burden TB settings for documenting

the various features visualized on a chest radiograph, and these studies suggest that it may

be a useful tool to improve inter- and intra-reader reliability (77, 78). However, the

CRRS, although validated and shown to be reliable, is an epidemiological research and a

radiology training tool. Chest radiographs are systematically read by working through a

long predefined checklist (a copy of the form can be found in Appendix 6.3) culminating

in a trained reader's subjective assessment of the likelihood of active TB. Our study aims

to refine the system further and improve its diagnostic utility, by deriving a weighted

radiographic scoring system that can aid in the rapid, yet reliable reading of chest

radiographs, making the test easy to use for health care and research workers in high

burden settings. Moreover, we sought to investigate how this knowledge could be

integrated with clinical practice.

3.3 METHODS

Study subjects and data collection:

At the University of Cape Town (Cape Town, South Africa), a parent prospective study

(TB-NEAT) was conducted to evaluate several TB diagnostic tests and their contributions

to the diagnosis of active TB in an HIV-endemic setting (67-69). The study consecutively

recruited outpatients with suspected pulmonary TB at two primary care clinics over a 3-

year period.

To qualify as a TB suspect, an individual had to present to the hospital with at least two

of the following symptoms: cough for ≥ 2 weeks, haemoptysis, fatigue, night sweats,

fever for ≥ 2 weeks, weight loss, loss of appetite, or being bedridden (one of the

symptoms, if the patient was HIV-infected). Only patients ≥ 18 years were enrolled into

the study. After giving written informed consent, all patients underwent diagnostic

testing, which included two sputum samples evaluated by concentrated smear

microscopy, two sputum cultures using the MGIT 960 liquid culture system (BD

Diagnostic Systems, Sparks, MD, USA), chest radiography, standardized interferon-

gamma release assays, HIV testing, and CD4 T cell count for those who were HIV-

infected. Epidemiological data were captured by a trained interviewer-administered

questionnaire, which was completed by all patients.

CRRS training, and reading of the chest radiographs:

The CRRS training course is held bi-annually at the University of Cape Town Lung Institute in

Cape Town (see http://www.lunginstitute.co.za/content/talks.html). The course

involves a two-day programme of interactive training using standard chest radiographs.

On the first day of the course attendees are instructed on chest anatomy and disease

presentation, and a standardized approach to identifying radiological abnormalities is

introduced. On the second day attendees read archived radiographs using the structured

CRRS form (Appendix 6.3) and consolidate their understanding about the detail required

for standardized reporting. The form takes approximately 6 to 10 minutes to complete.

On the third day an examination using 24 standardized radiographs is undertaken and

trainees are awarded either "A" or "B" grade accreditation based on their interpretation

of an examination set of radiographs. For the study, chest radiographs were read by two

independent readers (pulmonology fellows in the Department of Medicine), trained in

CRRS, and their findings were recorded on a computerized form. The CRRS involves the

use of a systematic checklist that details abnormal features visualized on a chest

radiograph, such as opacities and cavities. After a clinician reads the chest radiograph

with the assistance of the checklist provided by the CRRS, he/she is asked to gauge the

likelihood of active PTB. However, this assessment of the likelihood of active disease is

subjective, and relies on the overall impression of the reader, guided by the checklist.

Discrepancies, where appropriate, were resolved through reading by a third senior reader

(a faculty pulmonologist in the department). Where a consensus read was not possible,

the read from the senior reader was used in the analysis. Radiographs that were taken ≥ 3

months after the study entry date were discarded, as these did not represent the

radiographic presentation at the time of symptomatic manifestations and bacteriological

evidence of the disease.

Ethics approval:

The study was approved by the University of Cape Town's Health Sciences Faculty

Research Ethics Committee (REC REF 421/2006) and the McGill University Faculty of

Medicine ethics committee (Study no. A11-E69-11B).

Derivation of the score:

The CRRS includes a detailed description of the features visualized on the chest

radiograph with the aim of documenting these features for epidemiological surveys. For

the purpose of the diagnostic use of the tool, the identification of variables from the

CRRS to be included in the analysis was guided by a review of the literature (24-35, 37,

38). Features visualized on the chest radiograph suggestive of active TB that were

consistent across the reviewed studies were the preferential distribution of lesions in the

upper lobes of the lungs, the specificity of cavities for the disease, and the specificity of

unilateral pleural effusions when compared to bilateral effusions for the disease (24-35,

37, 38). These factors were taken into consideration for the a priori selection of variables.

Clinical characteristics that were known to influence the interpretation of the radiograph,

independent of disease status, were also analyzed. These characteristics included age, sex,

smoking status, HIV status and past history of TB.

The outcome of interest was the presence of active pulmonary TB, defined as the growth

of M. tuberculosis on at least one culture. Patients with two negative cultures were

classified as having a final culture-negative result. Similarly, a patient with two negative

sputum smears was classified as having a smear-negative status.

A univariate analysis was performed to identify significant associations (we used a liberal

threshold of P<0.2 for statistical significance, as we were exploring variables to include

in the multivariable analysis at this stage) between the pre-defined radiographic and

clinical features and the outcome of interest. Chi-square tests were used for categorical

radiographic variables and t-tests were used for continuous variables.
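The univariate associations are reported in Table 3 as odds ratios with 95% confidence intervals. As an illustration only, the sketch below computes a crude odds ratio from a 2x2 table with a Woolf (log-based) interval. The thesis does not state which interval method was used, so the choice of the Woolf interval, the function name and the variable names are assumptions made for this example. The counts are those given for cavities in Table 3.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf (log-based) confidence interval.

    a: patients with TB and the feature
    b: patients with TB without the feature
    c: patients without TB with the feature
    d: patients without TB without the feature
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf method; an assumption here).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Cavity, any location (Table 3): 55 of 138 with TB, 23 of 335 without TB.
cavity_or, cavity_lo, cavity_hi = odds_ratio_with_ci(55, 138 - 55, 23, 335 - 23)
print(f"OR = {cavity_or:.2f} (95% CI {cavity_lo:.2f}, {cavity_hi:.2f})")
```

Under these assumptions the sketch returns an odds ratio of 8.99 (5.22, 15.48) for cavities, which agrees with the univariate estimate in Table 3.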

Variables found to be significant in the univariate analysis, or which were identified a

priori by the literature review were entered into a multivariate logistic regression model.

Factors found to be independent predictors of the outcome (P<0.05) were selected for the

final model, and a stepwise backward elimination process was employed, using the

likelihood ratio test (79), to eliminate variables that did not significantly contribute to the

model. Although HIV status was not significantly associated with disease, we adjusted

for it in the final model, as HIV is known to alter the radiographic presentation of active

TB.

We assigned scores to each radiographic feature found to be an independent predictor of

outcome in the final model, weighted according to the beta-coefficients from the final

multivariate logistic model. Weights were rounded up to the next integer.
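The weighting step can be sketched as follows. The sketch assumes that each beta-coefficient is the natural logarithm of the adjusted odds ratio reported in Table 3 and that "rounded up" means taking the ceiling of that coefficient. Both points are assumptions made for this illustration, as are the dictionary keys and the function name; the fitted coefficients themselves are not listed in the text.

```python
import math

# Adjusted odds ratios from Table 3 for the four features retained in the score.
adjusted_or = {
    "upper_lobe_large_opacity": 4.18,
    "cavity": 3.95,
    "unilateral_pleural_effusion": 2.08,
    "adenopathy": 3.75,
}


def weight_from_or(odds_ratio):
    """Convert an adjusted odds ratio to an integer weight.

    Assumption: beta = ln(OR), rounded up to the next integer.
    """
    beta = math.log(odds_ratio)
    return math.ceil(beta)


weights = {feature: weight_from_or(value) for feature, value in adjusted_or.items()}
print(weights)
```

With these inputs the ceiling rule gives weights of 2, 2, 1 and 2, the same points that the score assigns to the four features.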

Data analysis:

The various major criteria in the CRRS were analyzed for inter-reader reliability between

the two junior readers and a kappa statistic (the chance-adjusted measure of agreement,

defined as the ratio of the actual agreement beyond chance to the potential agreement

beyond chance for inter-reader agreement) was calculated (80). Based on the weights

assigned to the four radiographic features found to be significantly associated with active

pulmonary TB, we calculated a total score for each patient's radiograph and analyzed the

performance characteristics (sensitivity, specificity and predictive value) of the score at

various cut-points for the diagnosis of culture-confirmed active TB. We also analyzed the

performance of the score in the subset of patients with smear-negative pulmonary TB,

and among patients who were HIV-infected. Data were analyzed using STATA version

11.0 (Stata Corp, College Station, Texas, USA).
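The kappa statistic described above can be illustrated for two readers and a single binary feature (present or absent). The counts in the example call are hypothetical and are not study data; the function and argument names are likewise chosen only for this sketch.

```python
def cohen_kappa(both_yes, reader1_only, reader2_only, both_no):
    """Cohen's kappa for two readers rating one binary feature.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    n = both_yes + reader1_only + reader2_only + both_no
    observed = (both_yes + both_no) / n
    # Marginal proportion of "feature present" calls for each reader.
    reader1_yes = (both_yes + reader1_only) / n
    reader2_yes = (both_yes + reader2_only) / n
    # Agreement expected by chance if the readers rated independently.
    expected = reader1_yes * reader2_yes + (1 - reader1_yes) * (1 - reader2_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only (not taken from the study).
kappa = cohen_kappa(both_yes=40, reader1_only=10, reader2_only=10, both_no=40)
print(round(kappa, 2))
```

In this hypothetical table the readers agree on 80% of radiographs, chance agreement is 50%, and kappa is therefore 0.6.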

3.4 RESULTS

Demographic characteristics of subjects:

Of 645 patients recruited into the parent study, 473 patients were included in the final

analysis. As outlined in Figure 1, the major reasons for exclusion were inability to

produce sputum, contaminated sputa, missing chest radiographs, and the chest radiograph

being read solely by one reader. As seen in Table 1, there were no significant differences

in the demographic features of the patients who were included and those who were

excluded from the analysis.

The mean age of the patients was 39.2 years (SD 12.1). 67 (14.2%) patients refused HIV-

testing. Among those tested, 121 (25.6%) were found to be HIV-infected (median CD4+

cell count = 185/cu.mm. among 115 patients with available CD4+ counts). 138 (29.2%)

of patients with suspected TB were found to have culture-confirmed active TB, and

91 (19.2%) of all TB suspect patients were positive on sputum smear microscopy.

Reliability of the CRRS:

The inter-reader reliability of the CRRS for various major categories is summarized in

Table 2. The kappa-statistic ranged from moderate (0.56) for small opacities to excellent

(0.77) for pleural effusions. The kappa-statistic for the overall judgment on whether the

reader considered the features of the chest radiograph to be consistent with active TB was

0.52 (S.E.0.05).

Development of the score:

Results of the univariate analysis of the various chosen radiographic and clinical criteria

are summarized in Table 3. A backward elimination process was attempted, but all the

variables initially selected for inclusion in the multivariable model were found

significant, and were therefore retained in the final analysis. The final model adjusted for

age and HIV status, but these were not included in the score, as the aim was to develop a

radiograph-based score. Based on the beta-coefficients of the variables of the

multivariable logistic regression, scores were assigned to the individual radiographic

features. The results of the multivariate analysis and scores assigned to the variables in

the final model are summarized in Table 3.

Performance of the score:

The score thus developed was tested at different cut-offs, the results of which are shown

in Table 4. At a cut-off of ≥ 2, the score had a high negative predictive value (92%,

95%CI 87,95), and misclassified 20 of 138 patients with active TB. At this cut-off, using

the score was associated with a better specificity (64%, 95% CI 59,69) than the subjective

assessment of the probability of TB by the readers (28%, 95% CI 23,33). This was

accompanied by a loss in sensitivity that was not statistically significant (86%, 95% CI

79,91 vs 93%, 95% CI 88,97) (Table 4). The gain in specificity at higher scores was

accompanied by appreciable losses in sensitivity.
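The calculation of the score and the application of the cut-off can be sketched as follows. The weights and the cut-off of ≥ 2 are those reported in this chapter; the dictionary keys, the function names and the representation of a reading as a dictionary of booleans are assumptions made for this illustration.

```python
# Points per radiographic feature, as assigned in the final score.
WEIGHTS = {
    "upper_lobe_large_opacity": 2,
    "cavity": 2,
    "unilateral_pleural_effusion": 1,
    "adenopathy": 2,
}
CUT_OFF = 2  # a score >= 2 is treated as a positive result


def radiographic_score(findings):
    """Sum the points of the features recorded as present on the radiograph."""
    return sum(points for feature, points in WEIGHTS.items()
               if findings.get(feature, False))


def is_score_positive(score, cut_off=CUT_OFF):
    """Return True when the score reaches the cut-off."""
    return score >= cut_off


# Hypothetical reading: unilateral pleural effusion only.
reading = {"unilateral_pleural_effusion": True}
score = radiographic_score(reading)
print(score, is_score_positive(score))
```

A radiograph showing only a unilateral effusion scores 1 and falls below the cut-off, whereas any single one of the other three features is enough to reach it.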

In sputum-smear negative patients, at the same cut-off, the test had a good rule-out value

(NPV 93%, 95% CI 89,96), and correctly classified 214 of 229 smear-negative patients as

not having active disease. The performance of the score in smear-negative TB patients is

shown in Table 5. The score had a better negative predictive value for HIV-uninfected

individuals (92%, 95% CI 86,96) as compared to HIV-infected individuals (86%, 95% CI

75,94) (Table 6), although the difference was not statistically significant (p=0.21).
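The negative predictive value among smear-negative patients can be recomputed from the counts given above (214 correctly ruled out among 229 smear-negative patients with a score below the cut-off). The sketch below attaches a Wilson score interval. The thesis does not state which interval method was used, so the Wilson interval is an assumption made for this example, and small differences from the reported bounds are possible.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Proportion with a Wilson score confidence interval."""
    p = successes / n
    denominator = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denominator
    return p, centre - half_width, centre + half_width


# Smear-negative patients with a score below the cut-off: 214 of 229 had no active TB.
npv, npv_lo, npv_hi = wilson_interval(214, 229)
print(f"NPV = {npv:.1%} (95% CI {npv_lo:.1%}, {npv_hi:.1%})")
```

Under this assumption the interval runs from roughly 89% to 96%, in line with the reported NPV of 93% (95% CI 89,96).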

3.5 DISCUSSION

Although there are diagnostic scoring systems for pulmonary TB that rely on a

combination of clinical and radiographic features (24-35, 37, 38), there is currently no

validated scoring system that relies exclusively on radiographic features. Such a scoring

system, if simple and rapid, would be invaluable for the management of TB suspects,

especially for clinicians practicing in busy, resource-poor settings. The system would be

particularly useful in the algorithm for the management of smear-negative TB suspects. It

could also serve as a valuable field research tool. This study attempts to derive such a

scoring system that is simple and reliable, and has improved accuracy in a high TB and

HIV burden setting, using CRRS, a standardized reading and recording system.

The features that were found to be significant in the multivariable model (upper lobe

opacities, cavities, unilateral effusions and adenopathy) are consistent with the reported

literature (24-35, 37, 38), and validate the use of these features to predict active TB. The

derived scoring system relies on 4 features (UL opacities, cavitation, unilateral effusion

and adenopathy) giving a maximum score of 7 points. The major advantage over the

existing CRRS system is the quantification of disease risk (as opposed to a subjective

impression). Another attractive feature of the scoring system is its potential simplicity.

Although the present study involved readers who were pulmonology fellows subjected to

rigorous training for the use of the CRRS as a research and epidemiological tool, training

for the recognition of the four features identified to be relevant for the score would

potentially be easier, and could make the score a quick and easy tool among general

practitioners in the clinical setting. However, the score would need to be prospectively

tested in such a setting before the results of the present study can be definitively confirmed.

Age was found to be inversely associated with the probability of disease, a finding that is

consistent with a high-burden setting such as South Africa, where active disease shows a

peak incidence in younger ages (81).

We envisage the utility of the score to be most relevant in TB suspects with negative

sputum smears. Such patients comprise ~85% of those presenting to high-burden TB

clinics with suspected TB (~15% are smear-positive, representing about half of those

with TB). Thus, those with negative smears present a huge service burden to clinics

adding to health care management costs. Patients with active pulmonary TB who have

negative sputum smears are also known to have adverse outcomes (82). It is thus

important to aggressively investigate such patients for active TB, yet the limitations of

resources often limit the scope of such investigations. A score using an inexpensive test

such as radiography could save significant resources if it were shown to have a high rule-

out value (83). The recent WHO policy on Xpert MTB/RIF (Cepheid, Sunnyvale, CA),

an automated molecular test, for the rapid diagnosis of TB and MDR-TB recommends

chest radiography as one of the strategies to screen individuals prior to the use of Xpert

MTB/RIF, as the cost of screening with Xpert MTB/RIF is very high in most developing

countries (21). In our cohort of patients, the scoring system at a cut-off of ≥ 2 correctly

ruled out disease in 93% of smear negative suspects (45% of symptomatic patients), thus

potentially avoiding further testing in these patients. An analysis, in the same cohort of

patients, of various strategies to pre-screen patients prior to the use of Xpert MTB/RIF

found that a subjective assessment of chest radiographs (CRRS) was able to accurately

rule out TB in 18% of patients with no false negative results, reflecting the high

sensitivity but poor specificity of such subjective assessments (84).

While the score had a high negative predictive value in HIV-infected patients (86.4%, 95%

CI 75,94), the rule-out value may not be high enough to be clinically useful for this

subgroup known to be at a high risk of progression of disease and death (85). However,

the imprecision of the negative predictive value is a reflection of the small sample size,

and further studies will have to be conducted to assess the utility of the scoring system in

HIV-infected individuals.

The finding of high inter-reader reliability for the major features among readers trained to

report radiographs using the CRRS is consistent with earlier reports (76-78). The high

reliability is useful, given that the lack of standardized reading of chest radiographs for

pulmonary TB, and the resulting poor reproducibility, have always been impediments to the

accuracy of the test. This reinforces the use of the CRRS system as an attractive tool for

epidemiological surveys for reading and reporting, a purpose for which it was designed.

However, we show here that a quick, simple scoring system potentially has similar rule-

out value with risk probability profiling that is useful for clinical decision-making. Using

this score will likely not require as intensive training as the CRRS system requires, but this

needs validation in prospective studies.

Our study has several limitations. Due to logistical difficulties, not all chest radiographs

that had discordant readings were resolved by a third reader, and we had to accept the

report of the senior reader for analysis. We had to exclude several patients from the

analysis, primarily due to the absence of chest radiographs, and this could have

introduced a bias in our study. However, the comparisons of the demographic features of

the patients included and excluded suggest that the absence of radiographs could be

random, and was not associated with the study or outcome variables. We tested the score

in the same population from which the score was derived, and this could lead to an

overestimation of the performance of the score. However, the consistency of the features

used in the score with what is described in the literature suggests that these features are

likely to be reproducible. A validation study would nonetheless be needed to confirm this

further. Despite the high negative predictive value of the score, misclassification (false

negative) occurred in 20 patients with active disease. This highlights the principle that all

diagnostic tests including the chest radiograph must be interpreted within the clinical

context of the case, and appropriate advice given to patients in case they have progressive

or ongoing symptoms. The validity of such an approach and its effectiveness in avoiding

mortality and morbidity will need prospective study.

3.6 CONCLUSIONS

In conclusion, the CRRS has a high inter-reader reliability, and our study suggests its

usefulness for documenting radiographic abnormalities among TB suspects. The

radiographic scoring system developed is the first of its kind that employs a simple,

reliable and validated reading system, and has a high rule-out value for smear-negative

disease, thereby potentially avoiding extensive and expensive testing in this cohort of TB suspects.

Using this user-friendly and potentially rapid scoring system, and training physicians to

identify these features could make this a simple and quick test in a resource-poor clinical

setting to rule out active TB. Further validation studies are now necessary to confirm our

findings.

3.7 TABLES AND FIGURES

Figure 1. Patient flow diagram with reasons for exclusion

Table 1. Characteristics of included and excluded patients

Characteristic              Included patients   Excluded patients   p-value
                            (n = 473)           (n = 172)
Age, mean (SD)              39.3 (12.1)         39.6 (13)           0.79
No. of males (%)            329 (69.6)          110 (64)            0.18
Race                                                                0.36
  black African (%)         342 (72.3)          118 (68.6)
  white/mixed (%)           131 (27.7)          54 (31.4)
HIV status
  positive (%)              121 (25.6)          52 (30.2)           0.24
  negative (%)              285 (60.3)          94 (54.7)           0.2
  unknown/refused (%)       67 (14.2)           26 (15.1)           0.77
  median CD4+ (IQR)         185 (105,349)       -
Culture result                                                      0.29
  positive (%)              138 (29.2)          43 (25)
  negative (%)              335 (70.8)          99 (57.6)
  no results (%)            -                   30 (17.4)
Smear result                                                        0.73
  at least 1 positive (%)   91 (19.2)           35 (20.4)
  negative (%)              382 (80.8)          115 (66.9)

Table 2. Inter-reader reliability for the major features on the Chest Radiograph Reading and Recording System

Feature                         % agreement   kappa (standard error)
Large opacity (> 1 cm)          52.97         0.7 (0.05)
Small opacity (< 1 cm)          92.55         0.56 (0.05)
Cavity                          91.06         0.64 (0.05)
Effusion                        88.6          0.77 (0.9)
“Consistent with active TB”*    86.07         0.52 (0.05)

*This judgment (consistent or inconsistent with active TB) was derived through subjective interpretation by the reader at the end of the CRRS read, about the overall feel for whether the radiograph was consistent with TB. A data entry space is available on the form for this purpose (Appendix 6.3).

Table 3. Analysis of radiographic and clinical features in the univariate and multivariable logistic regression model, and weights assigned in the final radiographic score in 473 patients (n = 138 with TB and 335 without TB).

Feature                         Patients with TB   Patients without TB   OR (95% CI),         OR (95% CI),         Score
                                with feature       with feature          univariate           multivariate         assigned
                                (n = 138)          (n = 335)             analysis             analysis
Large opacity (> 1 cm):
  UL opacity                    108                105                   7.89 (4.95,12.56)    4.18 (2.11,8.28)     2
  ML/LL opacity                 116                153                   6.27 (3.79,10.38)    1.82 (0.92,3.58)
Small opacity (< 1 cm):
  UL opacity                    121                220                   3.72 (2.13,6.48)     1.25 (0.61,2.56)
  ML/LL opacity                 136                285                   11.93 (2.86,49.75)   2.83 (0.58,13.9)
Cavity, any location            55                 23                    8.99 (5.22,15.48)    3.95 (2.04,7.62)     2
Pleural effusion:
  Unilateral                    35                 40                    2.51 (1.51,4.16)     2.08 (1.1,3.9)       1
  Bilateral                     1                  4                     0.6 (0.67,5.45)
Apical cap                      53                 69                    2.4 (1.56,3.71)      0.78 (0.41,1.49)
Adenopathy, any location        23                 21                    2.99 (1.59,5.61)     3.75 (1.71,8.23)     2
Tracheal deviation/Mediastinal
  shift/Hilar elevation         31                 27                    3.3 (1.87,5.79)      1.16 (0.56,2.38)
HIV infection                   43                 78                    1.49 (0.96,2.32)     1.26 (0.72,2.2)*
Sex - males                     100                229                   1.22 (0.79,1.89)
Smoker current                  79                 196                   0.91 (0.6,1.38)
Smoker past                     18                 35                    1.29 (0.7,2.37)
Smoker ever                     133                318                   0.96 (0.61,1.5)
Age, mean (SD)                  36.84 (11.55)      40.26 (12.15)         0.98 (0.96,0.99)     0.96 (0.94,0.98)*

* adjusted for, but not assigned weights in the final model.

Table 4. Performance characteristics of the score at different cut-offs in 473 patients (n = 138 with TB and 335 without TB).

Cut-off                        Sensitivity, %   Specificity, %   PPV, %         NPV, %         Area under the
                               (95% CI)         (95% CI)         (95% CI)       (95% CI)       ROC curve** (95% CI)
≥ 1                            89 (83,94)       58 (53,64)       47 (41,53)     93 (89,96)     0.74 (0.7,0.77)
≥ 2                            86 (79,91)       64 (59,69)       49 (43,56)     92 (87,95)     0.75 (0.71,0.79)
≥ 3                            57 (49,66)       87 (83,91)       64.2 (55,73)   83 (79,87)     0.72 (0.68,0.77)
≥ 4                            14 (9,21)        99 (97,100)      79 (58,93)     74 (69,78)     0.56 (0.53,0.59)
“Consistent with active TB”
  reported by readers*         93 (88,97)       28 (23,33)       36 (31,41)     91 (83,96)     0.6 (0.57,0.64)

Calculated Score = (UL large opacity*2) + (cavity, any location*2) + (unilateral pleural effusion*1) + (adenopathy, any location*2)

*This judgment (consistent or inconsistent with active TB) was derived through subjective interpretation by the reader at the end of the CRRS read, about the overall feel for whether the radiograph was consistent with TB. A data entry space is available on the form for this purpose (Appendix 6.3).

**The various AUROCs are derived from using the test as a dichotomous test at each cut-point.

Table 5. Performance of scoring system among smear-negative TB patients in comparison to checklist-based diagnosis (n = 382)

Performance characteristics      Sensitivity, %   Specificity, %**   PPV, %       NPV, %       Area under
                                                                                               ROC curve
Checklist-based diagnosis among
  smear-negative patients*       83 (70,93)       28 (23,33)         15 (11,20)   92 (84,96)   0.55 (0.5,0.61)
Score at cut-off ≥ 2 among
  smear-negative patients        69 (55,82)       64 (59,69)         22 (16,30)   93 (89,96)   0.67 (0.6,0.74)

*This judgment (consistent or inconsistent with active TB) was derived through subjective interpretation by the reader at the end of the CRRS read, about the overall feel for whether the radiograph was consistent with TB. A data entry space is available on the form for this purpose (Appendix 6.3).

** p-value for the difference in specificity < 0.001

Table 6. Performance of scoring system among HIV-infected and uninfected patients (121 HIV-infected patients, 285 HIV-uninfected patients)

| Performance characteristics | Sensitivity | Specificity* | PPV | NPV | Area under ROC curve |
| --- | --- | --- | --- | --- | --- |
| Score at cut-off ≥ 2 among HIV-positive patients | 81 (67,92) | 65 (54,76) | 57 (43,69) | 86 (75,94) | 0.73 (0.65,0.81) |
| Score at cut-off ≥ 2 among HIV-negative patients | 86 (77,93) | 62 (55,69) | 47 (39,56) | 92 (86,96) | 0.74 (0.69,0.79) |

* p-value for the difference in specificity = 0.21

CHAPTER 5: CONCLUSIONS

Chest radiography is an important diagnostic tool for physicians to assess the likelihood

of active pulmonary TB among individuals with symptoms suggestive of disease, and is

especially important in the cohort of patients who are sputum smear-negative. In the

absence of newer tests for TB that are universally affordable and accessible, there is a

need to improve existing tests such as chest radiography, which suffers from a lack of

standardization.

This thesis was an attempt to (a) systematically review the literature for the use of chest

radiograph scoring systems, with the aim of improving the overall accuracy of chest

radiography as a diagnostic tool for TB, and (b) use the knowledge gained from the

systematic review to inform the derivation of a scoring system for the diagnosis of active

PTB among TB suspects in Cape Town, South Africa.

The systematic review failed to identify a single scoring system that was based

exclusively on radiographic features. We, however, identified 12 studies that used both

clinical and radiographic features as part of scoring systems for the diagnosis of PTB.

Most of the included studies were hospital-based, decision-to-isolate studies, and such

scoring systems serve an important purpose in trying to identify infectious individuals

who need to be isolated to prevent nosocomial transmission of disease. However, all the

scores developed suffered from low specificity, and had a high rule-out value (high

negative predictive value) but a poor rule-in value (low positive predictive value) for

PTB. Such scores may still be useful for limiting the number of patients for whom further

investigations would be warranted, especially among patients who are smear-negative,

but are of limited value in ruling-in the disease and initiating treatment without further

diagnostic testing.

Automated computer-assisted diagnosis (CAD) employs techniques such as texture

analysis for reading chest radiographs, and appears to be a promising modality for

standardizing and improving the diagnostic performance of digital chest radiography

(62). However, although we identified 13 studies that employed CAD, our review

suggested a lack of methodologically high-quality studies. Further development of the

field should focus on the validation of such techniques in larger populations and with a

structured epidemiological approach using appropriate reference standards.

The presence of only one study that recruited patients in the out-patient setting, and the

lack of a scoring system for use among patients infected with HIV suggest the need to

derive accurate scoring systems for these cohorts of patients, especially in low-resource

settings, and we attempted to address this need with the study conducted in South Africa.

The study among TB suspects in Cape Town, South Africa attempted to derive a

radiographic scoring system that is simple and reliable, and has improved accuracy in a

high TB and HIV burden setting, using CRRS, a standardized reading and recording

system.

Our analysis of the association of specific radiographic features with active PTB found

upper lobe opacities, cavities, unilateral effusions and adenopathy to be associated with

the disease, and these findings are consistent with the reported literature (24-35, 37, 38),

and validate the use of these features to predict active TB. The derived scoring system

relies on 4 features (UL opacities, cavitation, unilateral effusion and adenopathy) giving a

maximum score of 5 points. The major advantage over the existing CRRS system is the

quantification of disease risk (as opposed to a subjective impression) using 4 features

visualized on the chest radiograph.

The utility of the score is especially relevant in TB suspects with negative sputum smears. Such patients comprise ~85% of those presenting to high-burden TB clinics with suspected TB (~15% are smear-positive, representing about half of those with TB). A score using an inexpensive test such as radiography could save significant resources if it were shown to have a high rule-out value (83). In the study, the scoring system at a cut-off of ≥ 2 correctly ruled out disease in 93% of smear-negative suspects (45% of symptomatic patients), thus potentially avoiding further testing in these patients. While the score had a high negative predictive value in HIV-infected patients (86.4%, 95% CI 75-94), the rule-out value may not be high enough to be clinically useful for this subgroup, which is known to be at a high risk of progression of disease and death (85). However, the imprecision of the negative predictive value is a reflection of the small sample size, and further studies will have to be conducted to assess the utility of the scoring system in HIV-infected individuals.
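Rule-out and rule-in values depend on disease prevalence as well as on sensitivity and specificity, and the relationship follows from Bayes' theorem. The following is a minimal sketch, not the analysis code of the study: the function name is hypothetical, and the prevalence of 13% used in the usage note is an illustrative assumption, not a figure reported in the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a dichotomous test via Bayes' theorem.

    All three arguments are proportions between 0 and 1.
    """
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivity and specificity must lie in [0, 1]")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")

    # Expected proportions of the tested population in each cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined: no positive or no negative results")

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv
```

Calling `predictive_values(0.69, 0.64, 0.13)` with the sensitivity and specificity reported in Table 5 for a cut-off of ≥ 2 and the assumed prevalence gives a low positive and a high negative predictive value. The same test applied where prevalence is higher would have a lower negative predictive value.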

Despite the high negative predictive value of the score, misclassification (false negatives) occurred in 20 patients with active disease. This highlights the principle that all diagnostic tests, including the chest radiograph, must be interpreted within the clinical context of the case, and that appropriate advice must be given to patients in case they have progressive or ongoing symptoms.

The finding of high inter-reader reliability for the major features among readers trained to report radiographs using the CRRS is consistent with earlier reports (76-78), reinforcing the use of the CRRS as an attractive tool for reading and reporting in epidemiological surveys, the purpose for which it was designed.
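Agreement between two readers on a categorical finding is commonly summarized with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The following is a minimal sketch, assuming kappa is the statistic of interest (the reference list includes Cohen's paper); the function name and the example readings in the usage note are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers rating the same radiographs.

    Each argument is a sequence of category labels (for example 1 for
    "feature present" and 0 for "feature absent"), in the same order.
    """
    if len(reader_a) != len(reader_b) or len(reader_a) == 0:
        raise ValueError("both readers must rate the same non-empty set of films")

    n = len(reader_a)
    # Proportion of films on which the two readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n

    # Agreement expected by chance from each reader's marginal frequencies.
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return (observed - expected) / (1.0 - expected)
```

For example, `cohen_kappa([1, 1, 1, 0], [1, 1, 0, 0])` compares two hypothetical readers on four films. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.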

In conclusion, the thesis, through the systematic review, identified a lack of objective

scoring systems for the radiographic diagnosis of TB, especially in the outpatient setting,

and among HIV-infected patients. The study in South Africa tried to bridge this gap by

deriving such a scoring system using the CRRS. We found the CRRS to have high inter-

reader reliability, making it a useful tool for documenting radiographic abnormalities

among TB suspects. The radiographic scoring system developed is the first of its kind

that employs a simple, reliable and validated reading system, and has a high rule-out

value for smear-negative disease, thereby avoiding extensive and expensive testing in this

cohort of TB suspects. Using this user-friendly and potentially rapid scoring system and training physicians to identify these features could make this a simple and quick test for ruling out active TB in a resource-poor clinical setting. Further validation studies are now

necessary to confirm our findings.

REFERENCES

1. World Health Organization. Global tuberculosis control 2011.

http://www.who.int/tb/publications/global_report/2011/gtbr11_full.pdf (accessed April 4,

2012)

2. Aber VR, Allen BW, Mitchison DA, Ayuma P, Edwards EA, Keyes AB. Quality

control in tuberculosis bacteriology. 1. Laboratory studies on isolated positive cultures

and the efficiency of direct smear examination. Tubercle. 1980 Sep;61(3):123-33.

3. Lerner BH. The perils of 'x-ray vision': How radiographic images have

historically influenced perception. Perspectives in Biology and Medicine. 1992;35

(3):382-97.

4. Koppaka R, Bock N. How reliable is chest radiography? In: Frieden T, editor.

Toman's tuberculosis: case detection, treatment, and monitoring – questions and answers. 2nd ed. Geneva: World Health Organization; 2004. p. 51-60.

5. International Labour Office. Guidelines for the use of the ILO international

classification of radiographs of pneumoconiosis. Revised ed 2000. (Occupational Safety

and Health Series, No. 22). Geneva, Switzerland: ILO, 2002.

6. Albin M, Engholm G, Frostrom K, Kheddache S, Larsson S, Swantesson L. Chest

x ray films from construction workers: International Labour Office (ILO 1980)

classification compared with routine readings. Br J Ind Med. 1992;49(12):862-8.

7. Bourbeau J, Ernst P. Between- and within-reader variability in the assessment of

pleural abnormality using the ILO 1980 International Classification of Pneumoconioses.

Am J Ind Med. 1988;14(5):537-43.

8. Chrispin AR, Norman AP. The systematic evaluation of the chest radiograph in

cystic fibrosis. Pediatr Radiol. 1974;2(2):101-5.

9. De Jong PA, Achterberg JA, Kessels OA, Van Ginneken B, Hogeweg L, Beek

FJ, et al. Modified Chrispin-Norman chest radiography score for cystic fibrosis: Observer

agreement and correlation with lung function. Eur Radiol. 2011 April;21(4):722-9.

10. Beards SC, Jackson A, Hunt L, Wood A, Frerk CM, Brear G, et al. Interobserver

variation in the chest radiograph component of the lung injury score. Anaesthesia. 1995

Nov;50(11):928-32.

11. Den Boon S, Bateman ED, Enarson DA, Borgdorff MW, Verver S, Lombard CJ,

et al. Development and evaluation of a new chest radiograph reading and recording

system for epidemiological surveys of tuberculosis and lung disease. Int J Tuberc Lung

Dis. 2005 Oct;9(10):1088-96.

12. Bohlig H. [UICC-Cincinnati classification of radiographic findings in

pneumoconioses. Collective work of a subcommittee of the International Union Against

Cancer (UICC)]. Fortschr Geb Rontgenstr Nuklearmed. 1971 Nov;115(5):665-83.

13. Murray JF, Matthay MA, Luce JM, Flick MR. An expanded definition of the adult

respiratory distress syndrome. Am Rev Respir Dis. 1988 Sep;138(3):720-3.

14. Boehme CC, Nabeta P, Hillemann D, Nicol MP, Shenai S, Krapp F, et al. Rapid

molecular detection of tuberculosis and rifampin resistance. N Engl J Med. 2010 Sep

9;363(11):1005-15.

15. Leeflang MM, Deeks JJ, Gatsonis C, Bossuyt PM. Systematic reviews of

diagnostic test accuracy. Ann Intern Med. 2008 Dec 16;149(12):889-97.

16. Macaskill P, Gatsonis C, Deeks JJ, Harbord RM, Takwoingi Y. Analysing and Presenting Results. In: Deeks JJ, Bossuyt PM, Gatsonis C, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy, version 0.9.0. London: The Cochrane Collaboration. Available: http://srdta.cochrane.org/ Accessed 6 March 2011.

17. Wilczynski NL, Haynes RB. EMBASE search strategies for identifying

methodologically sound diagnostic studies for use by clinicians and researchers. BMC

Med. 2005;3:7.

18. Haynes RB, Wilczynski NL. Optimal search strategies for retrieving scientifically

strong studies of diagnosis from Medline: analytical survey. BMJ. 2004 May

1;328(7447):1040.

19. Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of

QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in

systematic reviews. BMC Med Res Methodol. 2003 Nov 10;3:25.

20. Zamora J, Abraira V, Muriel A, Khan K, Coomarasamy A. Meta-DiSc: a software

for meta-analysis of test accuracy data. BMC Med Res Methodol. 2006;6:31.

21. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials.

1986 Sep;7(3):177-88.

22. Harbord RM, Whiting P. metandi: Meta-analysis of diagnostic accuracy using hierarchical logistic regression. The Stata Journal. 2009;9(2):211-29.

23. Tatsioni A, Zarin DA, Aronson N, Samson DJ, Flamm CR, Schmid C, et al.

Challenges in systematic reviews of diagnostic technologies. Ann Intern Med. 2005 Jun

21;142(12 Pt 2):1048-55.

24. Bock NN, McGowan Jr JE, Ahn J, Tapia J, Blumberg HM. Clinical predictors of

tuberculosis as a guide for a respiratory isolation policy. American Journal of Respiratory

and Critical Care Medicine. 1996;154 (5):1468-72.

25. El-Solh A, Mylotte J, Sherif S, Serghani J, Grant BJ. Validity of a decision tree

for predicting active pulmonary tuberculosis. American Journal of Respiratory & Critical

Care Medicine. 1997 May;155(5):1711-6.

26. El-Solh AA, Hsiao CB, Goodnough S, Serghani J, Grant BJ. Predicting active

pulmonary tuberculosis using an artificial neural network. Chest. 1999 Oct;116(4):968-

73.

27. Moran GJ, Barrett TW, Mower WR, Krishnadasan A, Abrahamian FM, Ong S, et

al. Decision Instrument for the Isolation of Pneumonia Patients With Suspected

Pulmonary Tuberculosis Admitted Through US Emergency Departments. Annals of

Emergency Medicine. 2009 May;53(5):625-32.

28. Mylotte JM, Rodgers J, Fassl M, Seibel K, Vacanti A. Derivation and validation

of a pulmonary tuberculosis prediction model. Infection Control & Hospital

Epidemiology. 1997 Aug;18(8):554-60.

29. Solari L, Acuna-Villaorduna C, Soto A, Agapito J, Perez F, Samalvides F, et al. A

clinical prediction rule for pulmonary tuberculosis in emergency departments.

International Journal of Tuberculosis and Lung Disease. 2008 Jun;12(6):619-24.

30. Lagrange-Xelot M, Porcher R, Gallien S, Wargnier A, Pavie J, de Castro N, et al.

Prevalence and clinical predictors of pulmonary tuberculosis among isolated inpatients: a

prospective study. Clinical Microbiology and Infection. 2011 Apr;17(4):610-4.

31. Soto A, Solari L, Agapito J, Acuna-Villaorduna C, Lambert ML, Gotuzzo E, et al.

Development of a clinical scoring system for the diagnosis of smear-negative pulmonary

tuberculosis. Brazilian Journal of Infectious Diseases. 2008 Apr;12(2):128-32.

32. Soto A, Solari L, Diaz J, Mantilla A, Matthys F, van der Stuyft P. Validation of a

Clinical-Radiographic Score to Assess the Probability of Pulmonary Tuberculosis in

Suspect Patients with Negative Sputum Smears. PLoS One. 2011 Apr;6(4).

33. Wisnivesky JP, Henschke C, Balentine J, Willner C, Deloire AM, McGinn TG.

Prospective validation of a prediction model for isolating inpatients with suspected

pulmonary tuberculosis. Archives of Internal Medicine. 2005 Feb;165(4):453-7.

34. Wisnivesky JP, Kaplan J, Henschke C, McGinn TG, Crystal RG. Evaluation of

clinical parameters to predict Mycobacterium tuberculosis in inpatients. Archives of

Internal Medicine. 2000 Sep 11;160(16):2471-6.

35. Rakoczy KS, Cohen SH, Nguyen HH. Derivation and Validation of a Clinical

Prediction Score for Isolation of Inpatients With Suspected Pulmonary Tuberculosis.

Infection Control and Hospital Epidemiology. 2008 Oct;29(10):927-32.

36. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for

systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.

37. Davis JL, Worodria W, Kisembo H, Metcalfe JZ, Cattamanchi A, Kawooya M, et

al. Clinical and Radiographic Factors Do Not Accurately Diagnose Smear-Negative

Tuberculosis in HIV-infected Inpatients in Uganda: A Cross-Sectional Study. PLoS One.

2010 Mar;5(3).

38. Le Minor O, Germani Y, Chartier L, Lan NH, Lan NT, Duc NH, et al. Predictors

of pneumocystosis or tuberculosis in HIV-infected Asian patients with AFB smear-

negative sputum pneumonia. Journal of Acquired Immune Deficiency Syndromes:

JAIDS. 2008 Aug 15;48(5):620-7.

39. Hogeweg LE, Mol C, de Jong PA, van Ginneken B. Rib suppression in chest

radiographs to improve classification of textural abnormalities. Medical Imaging 2010:

Computer - Aided Diagnosis. 2010;7624.

40. Mouton A, Pitcher RD, Douglas TS. Computer-Aided Detection of Pulmonary

Pathology in Pediatric Chest Radiographs. Medical Image Computing and Computer-

Assisted Intervention - MICCAI 2010, Pt III. 2010;6363:619-25.

41. Arzhaeva Y, Tax DMJ, van Ginneken B. Dissimilarity-based classification in the

absence of local ground truth: Application to the diagnostic interpretation of chest

radiographs. Pattern Recognition. 2009 Sep;42(9):1768-76.

42. Katsuragawa S, Doi K. Computer-aided diagnosis in chest radiography.

Computerized Medical Imaging and Graphics. 2007 Jun;31 (4-5):212-23.

43. Le K. Chest X-Ray Analysis for Computer-Aided Diagnostic. Advanced

Computing, Pt III. 2011;133:300-9.

44. Patil SA, Udupi VR, Kane CD, Wasif AI, Desai JV, Jadhav AN. Geometrical and

Texture Features Estimation of Lung Cancer and TB Images Using Chest X-ray

Database. 2009 International Conference on Biomedical and Pharmaceutical Engineering.

2009:9-15.

45. Rijal OM, Noor NM, Shaban H, Teng SL. A statistical comparison of digital X-

ray images for MTB patients. 2005 27th Annual International Conference of the IEEE

Engineering in Medicine and Biology Society. 2005:6418-21.

46. Rijal OM, Iqbal M, Yunus A, Noor NM. Some Critical Remarks on the Initial

Detection of Lung Ailments Using Clinical Data and Chest Radiography. Proceedings of

the 13th WSEAS International Conference on Computers. 2009:470-5.

47. Sarkar S, Chaudhuri S. Evaluation and progression analysis of pulmonary

tuberculosis from digital chest radiographs. Computerized Medical Imaging and

Graphics. 1998 Mar-Apr;22(2):145-55.

48. Shen R, Cheng I, Basu A. A Hybrid Knowledge-Guided Detection Technique for

Screening of Infectious Pulmonary Tuberculosis From Chest Radiographs. IEEE

Transactions on Biomedical Engineering. 2010 Nov;57(11):2646-56.

49. van Ginneken B, Romeny BMT, Viergever MA. Automatic segmentation and

texture analysis of PA chest radiographs to detect abnormalities related to interstitial

disease and tuberculosis. CARS 2002: Computer Assisted Radiology and Surgery,

Proceedings. 2002:685-8.

50. van Ginneken B, Romeny BMT. Automatic segmentation of lung fields in chest

radiographs. Medical Physics. 2000 Oct;27(10):2445-55.

51. van Ginneken B, Katsuragawa S, Romeny BMT, Doi K, Viergever MA.

Automatic detection of abnormalities in chest radiographs using local texture analysis.

IEEE Transactions on Medical Imaging. 2002 Feb;21(2):139-49.

52. Caplin M, Grange JM, Morley S, Brown RA, Kemp M, Gibson JA, et al.

Relationship between radiological classification and the serological and haematological

features of untreated pulmonary tuberculosis in Indonesia. Tubercle. 1989 Jun;70(2):103-

13.

53. Churchyard GJ, Fielding K, Roux S, Corbett EL, Chaisson RE, De Cock KM, et

al. Twelve-monthly versus six-monthly radiological screening for active case-finding of

tuberculosis: a randomised controlled trial. Thorax. 2011 Feb;66(2):134-9.

54. Ralph AP, Ardian M, Wiguna A, Maguire GP, Becker NG, Drogumuller G, et al.

A simple, valid, numerical score for grading chest x-ray severity in adult smear-positive

pulmonary tuberculosis. Thorax. 2010 Oct;65(10):863-9.

55. Tuberculosis Research Centre M. A concurrent comparison of home and

sanatorium treatment of pulmonary tuberculosis in South India. Bull World Health

Organ. 1959;21(1):51-144.

56. Wejse C, Gustafson P, Nielsen J, Gomes VF, Aaby P, Andersen PL, et al.

TBscore: Signs and symptoms from tuberculosis patients in a low-resource setting have

predictive value and may be used to assess clinical course. Scandinavian Journal of

Infectious Diseases. 2008;40(2):111-20.

57. Agizew T, Bachhuber MA, Nyirenda S, Makwaruzi VZ, Tedla Z, Tallaksen RJ, et

al. Association of chest radiographic abnormalities with tuberculosis disease in

asymptomatic HIV-infected adults. Int J Tuberc Lung Dis. 2010 Mar;14(3):324-31.

58. Dawson R, Masuka P, Edwards DJ, Bateman ED, Bekker LG, Wood R, et al.

Chest radiograph reading and recording system: evaluation for tuberculosis screening in

patients with advanced HIV. Int J Tuberc Lung Dis. 2010 Jan;14(1):52-8.

59. Riley RL, Nardell EA. Clearing the air. The theory and application of ultraviolet

air disinfection. Am Rev Respir Dis. [Review]. 1989 May;139(5):1286-94.

60. Kellerman S, Tokars JI, Jarvis WR. The cost of selected tuberculosis control

measures at hospitals with a history of Mycobacterium tuberculosis outbreaks. Infect

Control Hosp Epidemiol. 1997 Aug;18(8):542-7.

61. Pitchenik AE, Rubinson HA. The radiographic appearance of tuberculosis in

patients with the acquired immune deficiency syndrome (AIDS) and pre-AIDS. Am Rev

Respir Dis. 1985 Mar;131(3):393-6.

62. Doi K. Current status and future potential of computer-aided diagnosis in medical

imaging. Br J Radiol. 2005;78 Spec No 1:S3-S19.

63. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et

al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999

Sep 15;282(11):1061-6.

64. Rutjes AW, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PM.

Evidence of bias and variation in diagnostic accuracy studies. CMAJ. 2006 Feb

14;174(4):469-76.

65. Willis BH. Spectrum bias - Why clinicians need to be cautious when applying

diagnostic test studies. Fam Pract. 2008;25(5):390-6.

66. Laifer G, Widmer AF, Simcock M, Bassetti S, Trampuz A, Frei R, et al. TB in a Low-Incidence Country:

Differences Between New Immigrants, Foreign-Born Residents and Native Residents.

Am J Med. 2007 April;120(4):350-6.

67. Theron G, Peter J, van Zyl-Smit R, Mishra H, Streicher E, Murray S, et al.

Evaluation of the Xpert MTB/RIF assay for the diagnosis of pulmonary tuberculosis in a

high HIV prevalence setting. American Journal of Respiratory and Critical Care

Medicine. 2011 Jul 1;184(1):132-40.

68. Ling DI, Pai M, Davids V, Brunet L, Lenders L, Meldau R, et al. Are interferon-

gamma release assays useful for diagnosing active tuberculosis in a high-burden setting?

Eur Respir J. 2011 Sep;38(3):649-56.

69. Dheda K, Davids V, Lenders L, Roberts T, Meldau R, Ling D, et al. Clinical

utility of a commercial LAM-ELISA assay for TB diagnosis in HIV-infected patients

using urine and sputum samples. PLoS One. 2010;5(3):e9848.

70. Urbanczik R. Present position of microscopy and of culture in diagnostic

mycobacteriology. Zentralbl Bakteriol Mikrobiol Hyg A. 1985 Aug;260(1):81-7.

71. Boehme CC, Nicol MP, Nabeta P, Michael JS, Gotuzzo E, Tahirli R, et al.

Feasibility, diagnostic accuracy, and effectiveness of decentralised use of the Xpert

MTB/RIF test for diagnosis of tuberculosis and multidrug resistance: a multicentre

implementation study. Lancet. 2011 Apr 30;377(9776):1495-505.

72. Steingart KR, Henry M, Ng V, Hopewell PC, Ramsay A, Cunningham J, et al.

Fluorescence versus conventional sputum smear microscopy for tuberculosis: a

systematic review. Lancet Infect Dis. 2006 Sep;6(9):570-81.

73. Steingart KR, Ng V, Henry M, Hopewell PC, Ramsay A, Cunningham J, et al.

Sputum processing methods to improve the sensitivity of smear microscopy for

tuberculosis: a systematic review. Lancet Infect Dis. 2006 Oct;6(10):664-74.

74. Cattamanchi A, Huang L, Worodria W, den Boon S, Kalema N, Katagira W, et al.

Integrated strategies to optimize sputum smear microscopy: a prospective observational

study. American Journal of Respiratory and Critical Care Medicine. 2011 Feb

15;183(4):547-51.

75. Nagpaul D, Naganathan N, Prakash M. Diagnostic photofluorography and sputum

microscopy in tuberculosis case-findings. Proceedings of the 9th Eastern Region

Tuberculosis Conference and 29th National Conference on Tuberculosis and Chest

Diseases, Delhi, November 1974 Delhi, The Tuberculosis Association of

India/International Union Against Tuberculosis. 1975.

76. Den Boon S, Bateman ED, Enarson DA, Borgdorff MW, Verver S, Lombard CJ,

et al. Development and evaluation of a new chest radiograph reading and recording

system for epidemiological surveys of tuberculosis and lung disease. International Journal

of Tuberculosis and Lung Disease. 2005 Oct;9(10):1088-96.

77. Agizew T, Bachhuber MA, Nyirenda S, Makwaruzi V, Tedla Z, Tallaksen RJ, et

al. Association of chest radiographic abnormalities with tuberculosis disease in

asymptomatic HIV-infected adults. International Journal of Tuberculosis and Lung

Disease. 2010 Mar;14(3):324-31.

78. Dawson R, Masuka P, Edwards DJ, Bateman ED, Bekker LG, Wood R, et al.

Chest radiograph reading and recording system: evaluation for tuberculosis screening in

patients with advanced HIV. International Journal of Tuberculosis and Lung Disease.

2010 Jan;14(1):52-8.

79. Vuong QH. Likelihood Ratio Tests for Model Selection and Non-Nested

Hypotheses. Econometrica. 1989;57(2):307-33.

80. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas.

1960(70):213-20.

81. Dye C, Williams BG. The population dynamics and control of tuberculosis.

Science. 2010 May 14;328(5980):856-61.

82. Harries AD, Nyirenda TE, Banerjee A, Boeree MJ, Salaniponi FM. Treatment

outcome of patients with smear-negative and smear-positive pulmonary tuberculosis in

the National Tuberculosis Control Programme, Malawi. Trans R Soc Trop Med Hyg.

1999 Jul-Aug;93(4):443-6.

83. Jones TF, Schaffner W. Miniature chest radiograph screening for tuberculosis in

jails: a cost-effectiveness analysis. American Journal of Respiratory and Critical Care

Medicine. 2001 Jul 1;164(1):77-81.

84. Theron G, Pooran A, Peter J, van Zyl-Smit R, Mishra HK, Meldau R, et al. Do

adjunct TB tests, when combined with Xpert MTB/RIF, improve accuracy and the cost of

diagnosis in a resource-poor setting? Eur Respir J. 2011;ERJ Express. Published on

November 10, 2011 as doi: 10.1183/09031936.00145511.

85. Venkatesh KK, Swaminathan S, Andrews JR, Mayer KH. Tuberculosis and HIV

co-infection: screening and treatment strategies. Drugs. 2011 Jun 18;71(9):1133-52.

APPENDIX

6.1 SEARCH STRATEGY FOR THE SYSTEMATIC REVIEW

NEW MEDLINE search strategy:

1. sensitiv:.mp.

2. diagnos:.mp.

3. di.fs.

4. 1 or 2 or 3

5. radiograph:.tw.

6. radiolog:.tw.

7. Mass Chest X-Ray/

8. chest x-ray.tw.

9. scor:.tw.

10. Radiography, Thoracic/

11. chest xray.tw.

12. 5 or 6 or 7 or 8 or 9 or 10 or 11

13. pulmonary tuberculosis.tw.

14. pulmonary tb.tw.

15. lung tuberculosis.tw.

16. lung tb.tw.

17. Tuberculosis, Lymph Node/

18. Tuberculosis, Miliary/

19. Tuberculosis, Multidrug-Resistant/

20. Tuberculosis, Pleural/

21. Tuberculosis, Pulmonary/

22. Mycobacterium tuberculosis/

23. miliary tuberculosis.tw.

24. tuberculous pleurisy.tw.

25. tuberculous pleural effusion.tw.

26. pleural tuberculosis.tw.

27. tuberculous lymphadenitis.tw.

28. lymph node tuberculosis.tw.

29. lymph node tb.tw.

30. miliary tb.tw.

31. pleural tb.tw.

32. 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26 or 27

or 28 or 29 or 30 or 31

33. 4 and 12 and 32

OLD MEDLINE/ MEDLINE in-process search strategy:

1. radiograph:.tw.

2. radiolog:.tw.

3. Mass Chest X-Ray/

4. chest x-ray.tw.

5. scor:.tw.

6. Radiography, Thoracic/

7. chest xray.tw.

8. 1 or 2 or 3 or 4 or 5 or 6 or 7

9. pulmonary tuberculosis.tw.

10. pulmonary tb.tw.

11. lung tuberculosis.tw.

12. lung tb.tw.

13. Tuberculosis, Lymph Node/

14. Tuberculosis, Miliary/

15. Tuberculosis, Multidrug-Resistant/

16. Tuberculosis, Pleural/

17. Tuberculosis, Pulmonary/

18. Mycobacterium tuberculosis/

19. miliary tuberculosis.tw.

20. tuberculous pleurisy.tw.

21. tuberculous pleural effusion.tw.

22. pleural tuberculosis.tw.

23. tuberculous lymphadenitis.tw.

24. lymph node tuberculosis.tw.

25. lymph node tb.tw.

26. miliary tb.tw.

27. pleural tb.tw.

28. 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or

24 or 25 or 26 or 27

29. 8 and 28

EMBASE search strategy:

1. specificity.tw.

2. predict:.tw.

3. di.fs.

4. 1 or 2 or 3

5. radiograph:.tw.

6. radiolog:.tw.

7. chest x-ray.tw.

8. scor:.tw.

9. thorax radiography/

10. chest xray.tw.

11. 5 or 6 or 7 or 8 or 9 or 10

12. pulmonary tuberculosis.tw.

13. pulmonary tb.tw.

14. lung tuberculosis.tw.

15. lung tb.tw.

16. tuberculous lymphadenitis/

17. miliary tuberculosis/

18. multidrug resistant tuberculosis/

19. tuberculous pleurisy/

20. lung tuberculosis/

21. Mycobacterium tuberculosis/

22. miliary tuberculosis.tw.

23. tuberculous pleurisy.tw.

24. tuberculous pleural effusion.tw.

25. pleural tuberculosis.tw.

26. tuberculous lymphadenitis.tw.

27. lymph node tuberculosis.tw.

28. lymph node tb.tw.

29. miliary tb.tw.

30. pleural tb.tw.

31. 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 22 or 23 or 24 or 25 or 26

or 27 or 28 or 30

32. 4 and 11 and 31

Web of Science search strategy:

#1 Topic=(pulmonary tuberculosis) OR Topic=(pulmonary tb) OR Topic=(lung

tuberculosis) OR Topic=(lung tb) OR Topic=(tuberculous lymphadenitis) OR Topic=(tb

lymphadenitis) OR Topic=(tb lymph node) OR Topic=(miliary tuberculosis) OR

Topic=(miliary tb) OR Topic=(multidrug resistant tb) OR Topic=(multidrug resistant

tuberculosis) OR Topic=(pleural tuberculosis) OR Topic=(pleural tb) OR

Topic=(tuberculous pleurisy) OR Topic=(tb pleurisy) OR Topic=(mycobacterium

tuberculosis)

#2 Topic=(sensitiv*) OR Topic=(specific*) OR Topic=(diagnos*) OR

Topic=(accura*) OR Topic=(predict*) OR Topic=(reliab*) OR Topic=(reproducib*)

#3 TS=(radiograph*) OR TS=(radiolog*) OR TS=(chest x-ray) OR TS=(chest xray)

OR TS=(scor*)

#4 #3 AND #2 AND #1

6.2 DATA EXTRACTION FORM FOR THE SYSTEMATIC REVIEW

First author:

___________________________

Year of publication:

Language:

1-English

2-French

3-Spanish

Corresponding author email

address:

__________________________

Was author contacted?

0 - No

1 – Yes

If yes, date contacted

__ __ - __ __ - 2011

(DD – MM)

Data assessor

Karen Steingart

Lancelot Pinto

DEMOGRAPHICS AND STUDY DETAILS

Setting

1- Inpatient

2- Outpatient

3- Mixed (in-patients and out-patients)

4- Other _______________

9- NR/unclear

Country of investigation

not reported

Case country world bank

classification

1-Middle/low income

2- High income

3- Both

Study design

1- Randomized controlled trial

2- Cross-sectional

3- Cohort

4- Retrospective chart review

5- Others (specify)_______________

9- NR/unclear

Participant selection

1- Consecutive

2- Random

3- Convenience

4- Other_______________

9- NR/unclear

Start inclusion of patients

(year)

not reported

Eligibility inclusion criteria

(fill in the definition where

applicable, for example,

weight loss kg/percent ideal

body weight)

1- cough for ________weeks

2- fever for _________ weeks

3- weight loss _______-

________________

4- night sweats

5- hemoptysis

6- breathlessness

7- past history of tuberculosis

8- contact with TB patient

9- HIV positive individuals

10- any __ of the checked boxes

above

11-all of the checked boxes above

12- unclear/not specified

not reported

Exclusion criteria

1-HIV positive individuals

2-HIV negative individuals

3- known TB patients on treatment

4-smear positive patients

5-other

________________________

9-none specified

not reported

Age distribution

Mean, SD _________________

Median, IQR __________________

Range from ______ to _____years

not reported

Number of eligible patients

with chest radiographs, %

_________/_________

____________ %

not reported

Gender (% males of total) not reported

Type of radiograph

Digital

Conventional

Not specified

Number of readers for each

radiograph, and specialty

1 – specialty: ________________

2 – specialty: _________________,

____________________

3 – specialty: _______________, _______________,

___________

9 – not reported

Reference standard used for

the diagnosis of active

tuberculosis

1- solid culture x __________

2- liquid culture x _________

3- culture, not specified

Number of patients in the

study with active TB,

percentage

____________ of ___________

________ %

Number of patients in the

study who were HIV

positive, percentage

____________ of ___________

________ %

not reported

CD4+ count of HIV

positive individuals

mean ____________

median ___________

not reported

Number of patients HIV

positive on HAART,

percentage

____________ of __________

________ %

not reported

DETAILS PERTAINING TO THE INDEX TEST

Were individual features of

radiographs analyzed, or

was the end-point

dichotomous (consistent

with TB or not)

1- specific features analyzed

2- dichotomous

Which of these features

were analyzed? (fill in any

further specifics e.g.

laterality, size)

1- Upper lobe infiltrate __________________________

2- Upper lobe involvement, unspecified type

3- Presence of cavity _________________________________

4- Presence of infiltrate, unspecified location ________

5- Presence of lymphadenopathy ____________________

6- Presence of pleural effusion ______________________

7- Presence of a miliary pattern _____________________

8- Presence of volume loss __________________________

9- Presence of calcification __________________________

10- Other _______________________________________________

Were patient numbers for specific features and ORs/RRs reported?

1- yes

2- no

FEATURE 1 ______________

                Active TB +     Active TB -
Feature +       __________      __________
Feature -       __________      __________

Calculated (circle OR or RR) OR/RR, 95% CI = ______________ (          )

Reported OR/RR, 95% CI = ______________ (          )

FEATURE 2 ______________

                Active TB +     Active TB -
Feature +       __________      __________
Feature -       __________      __________

Calculated (circle OR or RR) OR/RR, 95% CI = ______________ (          )

Reported OR/RR, 95% CI = ______________ (          )

FEATURE 3 ______________

                Active TB +     Active TB -
Feature +       __________      __________
Feature -       __________      __________

Calculated (circle OR or RR) OR/RR, 95% CI = ______________ (          )

Reported OR/RR, 95% CI = ______________ (          )

FEATURE 4 ______________

                Active TB +     Active TB -
Feature +       __________      __________
Feature -       __________      __________

Calculated (circle OR or RR) OR/RR, 95% CI = ______________ (          )

Reported OR/RR, 95% CI = ______________ (          )

FEATURE 5 ______________

                Active TB +     Active TB -
Feature +       __________      __________
Feature -       __________      __________

Calculated (circle OR or RR) OR/RR, 95% CI = ______________ (          )

Reported OR/RR, 95% CI = ______________ (          )

FEATURE 6 ______________

                Active TB +     Active TB -
Feature +       __________      __________
Feature -       __________      __________

Calculated (circle OR or RR) OR/RR, 95% CI = ______________ (          )

Reported OR/RR, 95% CI = ______________ (          )

FEATURE 7 ______________

                Active TB +     Active TB -
Feature +       __________      __________
Feature -       __________      __________

Calculated (circle OR or RR) OR/RR, 95% CI = ______________ (          )

Reported OR/RR, 95% CI = ______________ (          )
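The "Calculated OR/RR, 95% CI" fields above can be filled in from the four cell counts of each feature's 2x2 table. The following is a minimal sketch, not the method documented in the thesis: it assumes the standard log-based (Woolf) interval for the odds ratio and the log-based (Katz) interval for the risk ratio, and the function names and cell labels `a`, `b`, `c`, `d` are illustrative only.

```python
import math


def _log_ci(estimate, se_log, z=1.96):
    """Return (lower, upper) confidence limits computed on the log scale."""
    log_est = math.log(estimate)
    return math.exp(log_est - z * se_log), math.exp(log_est + z * se_log)


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: feature +, active TB +    b: feature +, active TB -
    c: feature -, active TB +    d: feature -, active TB -
    """
    if min(a, b, c, d) <= 0:
        # Assumption: no continuity correction is applied for empty cells.
        raise ValueError("all four cells must be positive for this method")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (estimate, *_log_ci(estimate, se_log, z))


def risk_ratio(a, b, c, d, z=1.96):
    """Risk ratio (risk of active TB with vs. without the feature), Katz interval."""
    if a <= 0 or c <= 0:
        raise ValueError("cells a and c must be positive for this method")
    estimate = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return (estimate, *_log_ci(estimate, se_log, z))


# Hypothetical counts, for illustration only (not data from any study).
or_est, or_low, or_high = odds_ratio(30, 10, 20, 40)
rr_est, rr_low, rr_high = risk_ratio(30, 10, 20, 40)
print(f"OR = {or_est:.2f} ({or_low:.2f} to {or_high:.2f})")
print(f"RR = {rr_est:.2f} ({rr_low:.2f} to {rr_high:.2f})")
```

Comparing the calculated value with the "Reported OR/RR" field is a quick check for transcription errors in the extracted counts.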

Was a multivariable logistic regression with only radiographic features performed?

1- Yes

2- No

Was a multivariable logistic regression with clinical and radiographic features performed?

1- Yes

2- No

DETAILS OF THE MULTIVARIABLE LOGISTIC REGRESSION

FEATURE | beta-coefficient (CI) | OR (CI) | Weight given in the score developed | Percentage weight in the overall maximum score
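The last column of this table can be derived from the weights column. The sketch below is illustrative only and rests on two assumptions that are not stated in the form: every feature is scored as present or absent, and all weights are non-negative, so the maximum score equals the sum of the weights. The feature names and weights are hypothetical.

```python
def percentage_weights(weights):
    """Map each feature to its share (in %) of the maximum possible score.

    Assumes binary features with non-negative weights, so that the
    maximum score is simply the sum of all weights.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("negative weights need a different definition of the maximum")
    maximum = sum(weights.values())
    if maximum == 0:
        raise ValueError("at least one weight must be positive")
    return {feature: 100.0 * w / maximum for feature, w in weights.items()}


# Hypothetical weights, for illustration only.
example = {"upper lobe infiltrate": 2, "cavity": 1, "lymphadenopathy": 1}
for feature, share in percentage_weights(example).items():
    print(f"{feature}: {share:.0f}% of the maximum score")
```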

RELIABILITY OF THE SCORING SYSTEM

1- tested

2- not tested

Intra-observer agreement (same observer, reported twice)

Overall _____________________

Kappa _______________________

Inter-observer agreement (different observers)

Overall _____________________

Kappa _______________________
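For reference, the "Overall" and "Kappa" fields can be related as follows when two readings are each recorded as dichotomous (consistent with TB or not). This is a minimal sketch of Cohen's kappa under that assumption; studies may instead report weighted kappa or other statistics, and the function name and counts are illustrative.

```python
def agreement(both_pos, first_only, second_only, both_neg):
    """Return (overall agreement, Cohen's kappa) for two dichotomous readings.

    both_pos:    radiographs read as positive in both readings
    first_only:  positive in the first reading only
    second_only: positive in the second reading only
    both_neg:    negative in both readings
    """
    n = both_pos + first_only + second_only + both_neg
    if n == 0:
        raise ValueError("no radiographs were read")
    observed = (both_pos + both_neg) / n
    first_pos = (both_pos + first_only) / n
    second_pos = (both_pos + second_only) / n
    # Agreement expected by chance from the marginal proportions.
    expected = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts, for illustration only.
overall, kappa = agreement(40, 10, 5, 45)
print(f"Overall agreement = {overall:.2f}, kappa = {kappa:.2f}")
```

The same function applies to intra-observer agreement (one reader, two readings) and inter-observer agreement (two readers).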

PERFORMANCE CHARACTERISTICS OF THE SCORE

Cut-off used | Sensitivity | Specificity | PPV | NPV | AUC | Any other indicator described
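The sensitivity, specificity, PPV and NPV columns all follow from the 2x2 table of score result against the reference standard at the chosen cut-off. The sketch below assumes that a score at or above the cut-off counts as a positive test; the function names and the example data are hypothetical and not taken from any included study.

```python
def confusion_at_cutoff(scores, has_tb, cutoff):
    """Count (tp, fp, fn, tn), treating score >= cutoff as a positive test."""
    tp = fp = fn = tn = 0
    for score, diseased in zip(scores, has_tb):
        positive = score >= cutoff
        if positive and diseased:
            tp += 1
        elif positive:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    """Return the proportion, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def accuracy_measures(tp, fp, fn, tn):
    """Standard accuracy measures from the four cells of the 2x2 table."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical scores and reference-standard results, for illustration only.
scores = [0, 1, 2, 3, 4, 5]
has_tb = [False, False, True, False, True, True]
cells = confusion_at_cutoff(scores, has_tb, cutoff=3)
print(cells, accuracy_measures(*cells))
```

PPV and NPV depend on the prevalence of active TB in the study sample, so they are best read alongside the "Number of patients in the study with active TB" field.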

QUALITY ASSESSMENT

Study_______________________ Year______________

Item                                              Yes    No    Unclear

Representative sample (note 1)

Acceptable ref std

Acceptable delay

Partial verification avoided (note 2)

Differential verification avoided (note 3)

Incorporation avoided?

Ref std results blinded?

Index results blinded?

Relevant clinical information available? (note 4)

Uninterpretable results reported?

Withdrawals explained (note 5)

Conducted without conflict of interest?

Notes:

1 Representative sample requires PTB suspects and a consecutive sample. Case-control studies do not qualify as a representative sample.

2 Partial verification occurs when only a selected sample of patients who underwent the index test is verified by the reference standard, and that sample is dependent on the results of the test. For example, patients with suspected coronary artery disease whose exercise test results are positive may be more likely to undergo coronary angiography (the reference standard) than those whose exercise test results are negative.

3 Differential verification occurs when those who tested negative or strongly positive were given a less or more thorough reference standard for verification.

4 Relevant clinical information (were the same clinical data available when test results were interpreted as would be available when the test is used in practice?)

5 Indeterminate results and withdrawals have to be described and/or shown in a flow diagram; otherwise code as unclear.

6.3 CHEST RADIOGRAPH READING AND REPORTING SYSTEM (CRRS) FORM